diff --git a/.nojekyll b/.nojekyll
new file mode 100644

diff --git a/404.html b/404.html
new file mode 100644
[Generated HTML page: "404 - Not found"]

diff --git a/CMakeLists.txt b/CMakeLists.txt
new file mode 100644
+add_subdirectory(Install)
+add_subdirectory(Manual)
+add_subdirectory(Miscellaneous)
+add_subdirectory(Reference)

diff --git a/Case Studies/CONTRIBUTING/index.html b/Case Studies/CONTRIBUTING/index.html
new file mode 100644

Contributing Guide

+

Thank you for investing your time contributing a community case study!

+

In this guide, you will get an overview of the contribution workflow.

+

Getting started

+

Read our Code of Conduct to keep our community approachable and respectable.

+

Refer to the Manual and Tutorials to familiarize yourself with the Accera language and programming model.

+

Components of a good case study

+

A good case study should have these components and characteristics:

+
  1. Solves one specific task, such as matrix multiplication, matrix convolution, or vector addition. If you have a series of tasks to solve, break them up into multiple case studies that reference one another.
  2. Includes working Accera Python code implementing that task. At the end of the case study, the code should produce a HAT package using accera.Package.build().
  3. Describes the thought process, considerations, pros and cons of your implementation in a README.md.
  4. If the case study generates several implementations (for example, using Parameter Grids), include the following:
     • Benchmark results on a target machine (for example, your laptop). You can run hatlib.run_benchmark on your HAT package.
     • A description of the make and model of that target machine (for example, Intel Xeon E5). If you are unsure, you can use the output of this command:

       python -m cpuinfo

For some examples, refer to the published case studies in the Table of Contents.

+

Publishing your case study

+

All community case studies are published directly from the author's GitHub repository and linked to from the Accera GitHub repository.

+

Once you are ready to publish your case study:

  1. Make your case study GitHub repository public (if you haven't done so already).
  2. Edit Case Studies/README.md to add your case study to the Table of Contents. The link should point to the git SHA for your latest commit. The format to use is: https://github.com/user/repo/blob/git_sha/path_to_case_study/README.md.
  3. Create a Pull Request to submit your edits to Case Studies/README.md.
+ +
+
diff --git a/Case Studies/index.html b/Case Studies/index.html
new file mode 100644
[Generated HTML index page, no text content]

diff --git a/Install/Building_on_MacOS/index.html b/Install/Building_on_MacOS/index.html
new file mode 100644

Building on MacOS

+ +

Installing on MacOS

+

Install Dependencies

+

Accera requires the following tools and libraries:

+
  • A C++ compiler that supports C++ 17, such as clang, which is bundled in XCode
  • CMake 3.14 or newer
  • Python 3.7 or newer
  • Ninja
  • Ccache
  • LLVM OpenMP 5, if using parallelization

Homebrew is a package manager that makes it easy to install the prerequisites. Homebrew can be downloaded and installed by:

+
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+
+

If you already have Homebrew installed, update it to the latest version by typing:

+
brew update
+
+

Install the dependencies:

Intel MacOS:

    brew install cmake python ninja-build ccache libomp pkg-config

Apple Silicon:

    brew install cmake python ninja ccache libomp pkg-config

Clang

+

Select the clang compiler from XCode:

+
xcode-select --install
+
+

Clone Accera

+

A version of git should already be included in XCode.

+

Clone the git repository:

+
git clone --recurse-submodules https://github.com/microsoft/Accera
+
+

Build and install Accera

+

Run the build.sh script to install dependencies and build the Accera Python package (replace <path_to_accera> with the path to the cloned Accera repository).

+
cd <path_to_accera>
+sh ./build.sh
+
+

Update or install the resulting .whl file from the dist subdirectory. The name depends on your Python version, your OS, and your CPU architecture.

pip install -U ./dist/accera-0.0.1-cp37-cp37-macosx_10_15_x86_64.whl --find-links=dist
+

+

Build and install using CMake

+

Accera can also be built using CMake (intended for expert users).

+

Install dependencies

+
cd <path_to_accera>
+git submodule init
+git submodule update
+./external/vcpkg/bootstrap-vcpkg.sh
+./external/vcpkg/vcpkg install catch2 tomlplusplus accera-llvm --overlay-ports=external/llvm
+
+

The last command typically takes a few hours to build and then install Accera's fork of LLVM. We recommend reserving at least 20GB of disk space for the LLVM build.

+

Configure CMake

+
cd <path_to_accera>
+mkdir build
+cd build
+
+cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja
+
+

Build and run tests

+
cmake --build . --config Release
+ctest -C Release
+
+

Install

+
cmake --build . --config Release --target install
+
+ +
+
diff --git a/Install/Building_on_Ubuntu/index.html b/Install/Building_on_Ubuntu/index.html
new file mode 100644

Building on Ubuntu

+ +

Installing on Ubuntu

+

Quickstart

+

If you have access to Codespaces, you can launch a Linux VM in the browser or in Visual Studio Code with all the pre-requisites installed:

+
  1. Go to https://github.com/microsoft/Accera, use the "<> Code" drop-down menu, and in the Codespaces tab, click Create codespace on main.
  2. Run sh build.sh

Step 2 will take some time to build Accera's LLVM fork. Grab a coffee and come back in about an hour or so.

+

Build Script

+

If you do not have access to Codespaces or prefer to build locally, you can use the build.sh script to build Accera.

+

Install Dependencies

+

Accera requires the following tools and libraries:

+
  • A C++ compiler that supports C++ 17, such as GCC 8
  • CMake 3.14 or newer
  • Python 3.7 or newer
  • Ninja
  • Ccache
  • LLVM OpenMP 5, if using parallelization
sudo apt update
+sudo apt-get install gcc-8 g++-8 cmake python3 python3-pip ninja-build ccache libomp-11-dev pkg-config zip
+
+

Some Ubuntu distributions install an older version of CMake. Check the version of cmake using cmake --version, and download a newer version if older than 3.14.

+
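If cmake --version reports something older than 3.14, one possible route (an assumption on our part, not part of the official instructions) is to install a newer CMake from PyPI alongside the system packages:

    python3 -m pip install -U cmake
    cmake --version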

Clone Accera

+

Install git if you don't already have it:

+
sudo apt-get install git
+
+

Clone the git repository

+
git clone --recurse-submodules https://github.com/microsoft/Accera
+
+

Build and install Accera

+

Run the build.sh script to install dependencies and build the Accera Python package (replace <path_to_accera> with the path to the cloned Accera repository).

+
cd <path_to_accera>
+sh ./build.sh
+
+

Update or install the resulting .whl files from the dist subdirectory. The --find-links option tells pip to look at the dist subdirectory for the dependent packages. +The name depends on your Python version, your OS and your CPU architecture. +

pip install -U ./dist/accera-0.0.1-cp37-cp37m-linux_x86_64.whl --find-links=dist
+

+

CMake Builds

+

Accera can also be built using CMake (intended for expert users).

+

Install dependencies

+
cd <path_to_accera>
+git submodule init
+git submodule update
+./external/vcpkg/bootstrap-vcpkg.sh
+./external/vcpkg/vcpkg install catch2 tomlplusplus accera-llvm --overlay-ports=external/llvm
+
+

The last command typically takes a few hours to build and then install Accera's fork of LLVM. We recommend reserving at least 20GB of disk space for the LLVM build.

+

Configure CMake

+
cd <path_to_accera>
+mkdir build
+cd build
+
+cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja
+
+

Build and run tests

+
cmake --build . --config Release
+ctest -C Release
+
+

Install

+
cmake --build . --config Release --target install
+
+ +
+
diff --git a/Install/Building_on_Windows/index.html b/Install/Building_on_Windows/index.html
new file mode 100644

Building on Windows

+ +

Installing on Windows

+

Install Dependencies

+

Visual Studio

+

Accera requires a C++ compiler that supports C++ 17. You can download Visual Studio 2019 Enterprise Edition or Visual Studio 2022 Community Edition. For VS 2019, install Update 10 or later, which includes the LLVM OpenMP libraries (VS 2022 includes them out of the box).

+

Select Desktop Development with C++.

+

Accera requires Spectre-mitigated libraries:

+
  1. Go to Individual Components
  2. Type in "Spectre" in the search box
  3. Select the latest version of the MSVC libraries, e.g., MSVC v142 - VS 2019 C++ x64/x86 Spectre-mitigated libs (Latest) (your actual version may vary)

CMake

+

Accera requires CMake 3.14 or newer. A version of CMake that satisfies this requirement is included with Visual Studio 2019 and Visual Studio 2022.

+

Python

+

Accera's packages require Python 3.7 64-bit or newer, plus a version of pip that supports 64-bit packages (win_amd64). One way to obtain this is to download and install Miniconda. Download "Miniconda3 Windows 64-bit".

+
Optional: Create a conda environment
+

After installing Miniconda, you can optionally create an environment to manage different Python versions.

+

From an "Anaconda Prompt", create and then activate an environment for Python 3.7 (or a newer version if you prefer). Make sure to activate this environment in any other applications that you use to develop Accera.

+
conda create -n py37 python=3.7
+conda activate py37
+
+

Clone Accera

+

Visual Studio 2019 and 2022 include a version of git. To use it, launch Visual Studio 2019 or 2022, and select Clone a repository.

+

Repository location:

+
https://github.com/microsoft/Accera
+
+

Build and install Accera

+

From a command line with Python in your PATH, such as an Anaconda Command Prompt, set up the Visual Studio command-line environment (vcvars64.bat) and then run build.bat to generate the Accera Python packages.

+

For Visual Studio 2022: +

"%ProgramFiles%\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
+

+

For Visual Studio 2019: +

"%ProgramFiles(x86)%\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"
+

+
cd <path_to_accera>
+build.bat
+
+

Replace <path_to_accera> with the path to the cloned Accera repository.

+

Update or install the resulting .whl file from the dist subdirectory. The --find-links option tells pip to look at the dist subdirectory for the dependent packages. +The whl filename depends on your Python version, your OS, and your CPU architecture.

+
pip install -U dist\accera-0.0.1-cp37-cp37m-win_amd64.whl --find-links=dist
+
+

Build and install using CMake

+

Accera can also be built using CMake (intended for expert users).

+

Install dependencies

+
cd <path_to_accera>
+git submodule init
+git submodule update
+external\vcpkg\bootstrap-vcpkg.bat
+external\vcpkg\vcpkg install catch2:x64-windows tomlplusplus:x64-windows accera-llvm:x64-windows --overlay-ports=external\llvm
+
+

The last command typically takes a few hours to build and then install Accera's fork of LLVM. We recommend reserving at least 20GB of disk space for the LLVM build.

+

Configure CMake

+
cd <path_to_accera>
+mkdir build
+cd build
+
+# For Visual Studio 2019:
+cmake .. -DCMAKE_BUILD_TYPE=Release -G"Visual Studio 16 2019" -Ax64
+
+# For Visual Studio 2022:
+cmake .. -DCMAKE_BUILD_TYPE=Release -G"Visual Studio 17 2022" -Ax64
+
+

Build and run tests

+
cmake --build . --config Release -- /m
+ctest -C Release
+
+

Install

+
cmake --build . --config Release --target install -- /m
+
+ +
+
diff --git a/Install/Installing_Accera_on_MacOS/index.html b/Install/Installing_Accera_on_MacOS/index.html
new file mode 100644

Installing Accera on MacOS

+ +

Installing on MacOS

+

Install dependencies

+

Accera requires the following tools and libraries for building the generated code:

+
  • A C++ compiler, such as clang, which is bundled in XCode
  • Python 3.7 or newer
  • OpenMP 5, if using parallelization

Homebrew is a package manager that makes it easy to install the prerequisites. Homebrew can be downloaded and installed by:

+
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+
+

If you already have Homebrew installed, update it to the latest version by typing:

+
brew update
+
+

Install the dependencies:

+
brew install cmake python@3.7
+
+

Install the optional dependency if using parallelization:

+
brew install libomp
+
+

Clang

+

Select the clang compiler from XCode:

+
xcode-select --install
+
+

Install Accera

+

The accera Python package can be installed from PyPI:

+
pip install accera
+
+ +
+
diff --git a/Install/Installing_Accera_on_Ubuntu/index.html b/Install/Installing_Accera_on_Ubuntu/index.html
new file mode 100644

Installing Accera on Ubuntu

+ +

Installing on Ubuntu

+

Install dependencies

+

Accera requires the following tools and libraries for building the generated code:

+
  • A C++ compiler, such as GCC 8
  • Python 3.7 or newer
  • OpenMP 5, if using parallelization

Ubuntu 20.04 is recommended. A quick way to start is to use a new Docker container for Ubuntu 20.04:

+
docker run -v $PWD:/code -it --entrypoint "/bin/bash" ubuntu:focal
+
+

Install Accera's dependencies:

+
apt update
+apt-get install gcc-8 g++-8 python3 python3-pip libncurses5
+
+

Install the optional dependency if using parallelization:

+
apt-get install libomp-11-dev
+
+

Install Accera

+

The accera Python package can be installed from PyPI:

+
pip install accera
+
+ +
+
diff --git a/Install/Installing_Accera_on_Windows/index.html b/Install/Installing_Accera_on_Windows/index.html
new file mode 100644

Installing Accera on Windows

+ +

Installing on Windows

+

Install dependencies

+

Visual Studio

+

Accera's generated code requires a C++ compiler. Download Visual Studio 2019 Enterprise Edition or Visual Studio 2022 Community Edition, and select Desktop development with C++ during installation.

+

If you've selected VS 2019 and would like to use parallelization, ensure that Update 10 or later is installed. Both VS 2019 Update 10 or later and VS 2022 include the LLVM OpenMP libraries.

+

Python

+

Accera's packages require Python 3.7 64-bit or newer, plus a version of pip that supports 64-bit packages (win_amd64). One way to obtain this is to download and install Miniconda. Download "Miniconda3 Windows 64-bit".

+
Optional: Create a conda environment
+

After installing Miniconda, you can optionally create an environment to manage different Python versions.

+

From an "Anaconda Prompt", create and then activate an environment for Python 3.7 (or a newer version if you prefer):

+
conda create -n py37 python=3.7
+conda activate py37
+
+

Install Accera

+

The accera Python package can be installed from PyPI:

+
pip install accera
+
+ +
+
diff --git a/Install/index.html b/Install/index.html
new file mode 100644

Install from PyPI

+

The quickest way to get up and running is to install the pre-built Python packages:

+ +
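For example, on any of the supported platforms (this is the same command shown in the platform-specific pages that follow):

    pip install accera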

Build and Install

+

You can also build and install the latest version of Accera by following these instructions:

+ + +
+
diff --git a/Manual/00 Introduction/index.html b/Manual/00 Introduction/index.html
new file mode 100644

Introduction

+

Accera is a framework with a Python-based Domain-specific Language (eDSL) that produces optimized compute-intensive code. Accera's primary focus is the optimization of affine and semi-affine nested for-loops for CPU and GPU targets.

+

Optimization of compute-intensive code in a traditional programming language is not only challenging and time-consuming, but manual optimization of the simplest numerical algorithms demands significant engineering effort and requires an advanced understanding of computer architecture and fluency in C++, C, or Assembly Language. Even with all these efforts, implemented code is prone to critical bugs and requires extensive engineering effort for maintenance. Accera aims at resolving all these issues by providing optimized solutions for compute-intensive algorithms that are highly efficient, readable, and maintainable.

+

Accera has THREE primary goals:

+
  • Performance: To generate the fastest implementation for any compute-intensive algorithm.
  • Readability: To ensure effective implementation of algorithms without sacrificing the readability of code.
  • Writability: To provide a user-friendly programming model designed for agility and maintainability.

Accera is designed based on the following guiding principles:

+

1: Strict separation of logic from implementation

+

Traditional programming languages are prone to the tight coupling of code logic (what the program does) with its implementation (how the program is implemented). Consider an example of multiplying a 16×11 matrix A by an 11×10 matrix B. The algorithm's logic calculates the sum over k of A[i,k]·B[k,j] for each value of i and j. In Python, this logic can be expressed as: +

# C += A @ B
+for i in range(16):
+    for j in range(10):
+        for k in range(11):
+            C[i, j] += A[i, k] * B[k, j]
+
+The above code expresses more than just the logic of matrix multiplication. It insists on a specific execution flow: first perform all the steps required to calculate C(0,0) in ascending order of k; then proceed to C(0,1). However, in principle, a single order of execution should not be imposed because the iterations of this loop can be performed in any order while keeping the logic intact. Moreover, the above logic doesn't utilize important optimization techniques, such as double-buffered caching or vectorization.

+

Accera, on the other hand, provides a strict distinction between logic and its implementation. The programmer first implements the logic without performance considerations using a pseudocode-like syntax independent of the target platform. Once the logic is specified, only then does the programmer move to define the concrete implementation details.

+

2: Mindfully trade-off safety versus expressivity

+

Accera offers a programming model where a default implementation of the specified logic can be transformed and manipulated in different ways. If used correctly, these transformations are safe, which means that the underlying logic remains intact. This allows the programmer to entirely focus on the performance of the logic without worrying about its correctness. Moreover, these safe transformations allow automatic search algorithms to aggressively search the space of transformations to converge faster and find better optima.

+

Traditionally, this safety is achieved by trading off the true potential of a programming language since it demands restricting its scope. Nevertheless, extensive constraints significantly restrict the expressivity and the power of the programming language, eventually preventing the end-users from developing highly-optimized and sophisticated implementations.

+

Accera moderates this trade-off between safety and expressivity by explicitly defining what level of safety guarantees are being given by each transformation under different circumstances. Some situations are safer than others. However, the programmer knows exactly what safeties are being guaranteed in all cases.

+

3: The programmer is in control

+

Accera gives the programmer maximum control over the generated logic by providing access to the underlying knobs that determine how algorithms are optimized. Convenience methods and carefully used default values can prevent verbosity. As per the use case, these helper methods can always be tuned, even overridden.

+
+ +
+
diff --git a/Manual/01 Arrays and Scalars/index.html b/Manual/01 Arrays and Scalars/index.html
new file mode 100644

Section 1: Arrays and Scalars

+

Arrays

+

Accera stores data in multi-dimensional arrays of scalar elements where all the array elements share the same primary data type (e.g., float32, int8). An array has a constant number of dimensions d known at compile-time (e.g., a matrix is a 2-dimensional array). Each dimension has a positive size, and the sequence of d sizes is called the shape of the array. An element of an array is referred to by a d-coordinate zero-based index vector.

+

Affine memory layout

+

Arrays are multi-dimensional, while computer memories have a linear (one-dimensional) address space. There are many strategies to represent a multi-dimensional array in one-dimensional computer memory. Accera arrays must have an affine memory layout, where each array has an affine memory map that is a d-dimensional vector denoted by a and a memory offset value denoted by o. The array element that corresponds to the index vector i is stored at memory address i· a+o (where i· a denotes a vector dot product).

+

Affine memory maps are rich enough to represent many standard array layouts. For example, in affine maps, 2-dimensional arrays (matrices) can be represented as row-major, column-major, triangular, banded, and Toeplitz matrices. However, affine maps cannot represent z-ordering or striped or blocked layouts.

+

Array shape

+

In an affine memory map, each dimension corresponds to an element of the map vector, and the dimension whose element has the largest absolute value is called the major dimension. The user must specify all dimension sizes except for the major dimension when constructing an Array. If the major dimension size is not specified, Accera assumes that it is arbitrary (or infinite); in other words, the iterations of the loops determine how much of the array is visited along this dimension.

+

For example, a row-major matrix must have a compile-time-constant number of columns. However, the number of rows can be left undefined, and the loops' sizes control how many rows are processed.

+

Compile-time and runtime dimension sizes

+

The number of dimensions of Accera arrays are known at compile-time. However, the user can choose to specify the sizes of each dimension at compile-time or at runtime. Runtime dimension sizes are only resolved at runtime, typically as inputs to an Accera function.

+

For example, a function that implements generalized matrix multiply can receive the M, N, K dimension sizes as inputs along with the M × N, M × K, and N × K Arrays.

+

Furthermore, an Array can have a mixture of compile-time and runtime dimension sizes.

+

Default and inferred memory layout

+

Although the user can explicitly specify the memory map, Accera offers some conveniences. The user can set the layout as FIRST_MAJOR (e.g., for two-dimensional arrays, first-major is equivalent to row-major) or LAST_MAJOR. In both cases, the affine map is inferred from the array shape. Specifically, if the layout is LAST_MAJOR and the shape is denoted by the vector s, then the map a is set to [1, s0, s0×s1, s0×s1×s2, ...]. If the layout is FIRST_MAJOR and the dimension equals 4, then a is set to [s1×s2×s3, s2×s3, s3, 1]. In both cases, the size of the major dimension is not used in the definition of a, which is why the major dimension size does not need to be specified. If no layout is specified, the default layout is FIRST_MAJOR.

+
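As an illustration, a non-default layout can be requested when constructing the array. In the sketch below, the keyword argument name layout is an assumption on our part; the layout values themselves (acc.Array.Layout.FIRST_MAJOR and acc.Array.Layout.LAST_MAJOR) are the ones referenced in this section.

    import accera as acc

    # assumed constructor keyword: layout
    A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32,
                  shape=(10, 20), layout=acc.Array.Layout.LAST_MAJOR)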

Array properties

+

Accera arrays are defined with either internal scope or external scope. An internal array is a private array that exists inside a specific Accera function only and cannot be accessed outside of that function. An external array is defined outside of an Accera function and passed in as an argument. The memory layout of an external array is specified as a part of the Accera function signature. Moreover, external arrays are assumed to be disjoint, i.e., they do not share any memory.

+

Accera arrays are either mutable or immutable. The elements of a mutable array can be set by an Accera function, while an immutable array is read-only.

+

Array properties are not explicitly set by the user but are implied by the role of the array (see below).

+

Array roles

+

Accera supports the following five array roles, where each role is treated differently:

  • Input
  • Input/Output
  • Output
  • Constant
  • Temporary

Input arrays

+

Input arrays are immutable external arrays whose element type, shape, and affine layout can be known at compile-time. However, their contents are only available at runtime. If the Accera function is emitted as a function in C, each input array is passed as a const pointer argument. For example, we can construct a 10×20 input array of 32-bit floating-point numbers by writing +

import accera as acc
+
+A = acc.Array(shape=(10, 20), role=acc.Role.INPUT, element_type=acc.ScalarType.float32)
+
+The layout of this array would be the default layout, which is acc.Array.Layout.FIRST_MAJOR.

+

The shape (and similarly, the layout) of Input arrays can also be set at runtime:

+
N = acc.create_dimensions()
+A = acc.Array(shape=(N, 20), role=acc.Role.INPUT, element_type=acc.ScalarType.float32)
+
+

Input/output arrays

+

Input/Output arrays are similar to the input arrays except that they are mutable external arrays, i.e., their values can be changed. This type of array is used to output the results of the loop-nest computation. If the Accera function is emitted as a function in C, each input/output array is passed as a non-const pointer argument.

+

Output arrays

+

Output arrays are variable-shaped mutable external arrays whose shapes and affine layout are known at runtime. The key differences with Input/Output arrays are:

+
  • Output arrays are dynamically allocated at runtime. The caller of an Accera function that uses Output arrays will need to implement the __accera_allocate function to allocate memory (and also perform the subsequent deallocation).
  • Output arrays are uninitialized by default. Accera will produce an error if operators such as += are used on an Output array without prior initialization through assignment.
  • For simplicity, output dimensions (acc.Role.OUTPUT) must be used for specifying an Output array shape or layout (this limitation may be lifted in the future).

Output arrays are useful for operations that adjust the array shape depending on the input values. For example, the Range operation generates variable output sizes based on the start, end, and step inputs:

+
import accera as acc
+
+# inputs
+Start = acc.Scalar()
+End = acc.Scalar()
+Step = acc.Scalar()
+
+# compute the variable output size
+N = acc.create_dimensions(role=acc.Role.OUTPUT)
+N.value = acc.floor((End - Start) / Step)
+
+# create an Output array with the variable output size
+A = acc.Array(shape=(N, ), role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32)
+
+

The layout of this array is the default layout, which is acc.Array.Layout.FIRST_MAJOR.

+

Constant arrays

+

These are the only Accera arrays whose contents are known at compile-time. Constant arrays are immutable internal arrays whose memory layout can be chosen automatically without any external constraints since they are internally scoped. For example, a constant array can be automatically laid out according to the loop nest's memory access pattern. The layout of a constant array could even depend on its contents (e.g., its sparsity pattern). The dimension sizes of a constant array must be known at compile-time.

+

We must provide the constant array data (the element values) when constructing it. This data can be any Python buffer or a numpy array: +

import accera as acc
+import numpy as np
+
+matrix = np.random.rand(16, 16)
+B = acc.Array(role=acc.Role.CONST, data=matrix)
+

+

Temporary arrays

+

Temporary arrays are mutable internal arrays that are used when two Accera schedules are fused into one (more on fusing in Section 4). The elements of a temporary array are initialized to zeros and used to store intermediate values. Similar to constant arrays, temporary arrays can be laid out arbitrarily. In fact, the Accera compiler can even choose not to store them in physical memory at all.

+

Scalars

+

A scalar represents a single number whose value is mutable and set at runtime. Scalars are useful as input arguments to functions or when computing a single-valued numeric result.

+
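A minimal sketch of a scalar used inside iteration logic follows; the array name, sizes, and the scaling-factor use case are ours, and we assume the scalar's element type is compatible with the float32 array elements:

    import accera as acc

    alpha = acc.Scalar()   # runtime scalar input, e.g. a scaling factor
    A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))

    nest = acc.Nest(shape=(16, 16))
    i, j = nest.get_indices()

    @nest.iteration_logic
    def _():
        A[i, j] *= alpha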

Section 2 lists the operations that can be performed on scalars.

+
+ +
+
diff --git a/Manual/02 Simple Affine Loop Nests/index.html b/Manual/02 Simple Affine Loop Nests/index.html
new file mode 100644

Section 2: Simple affine loop nests

+

This section introduces loop nests and the different types of loop nests provided in the Accera programming model.

+

Affine loop nests

+

Many important compute-intensive workloads can be expressed using nested for-loops. An algorithm that can be defined using nested for-loops is called a loop nest. Accera only supports the class of affine loop nests. A loop nest is affine if the indices of the elements accessed on each iteration are an affine function of the loop iterator variables. For example, the following loop nest is affine: +

for i in range(M):
+    for j in range(N):
+        C[2*i+2, j+2] += A[3*i, j] + B[j, i]
+
+because 2*i+2, j+2, 3*i, j and i are all affine functions of the iterator variables i and j.

+

On the other hand, the following loop nest is not affine: +

for i in range(M):
+    for j in range(N):
+        C[i*i, j] += A[i*i, j] + B[i*j, i]
+
+because i*i and i*j are quadratic (non-affine) functions of i and j.

+

Simple affine loop nests, a.k.a. simple nests

+

Simple affine loop nests, hereinafter referred to as simple nests, are an important subclass of affine loop nests that satisfies the following properties:

  1. The loops are perfectly nested: all the computation is entirely contained within the deepest loop.
  2. All the loops are normalized: each loop starts at 0, increments by 1, and ends at a compile-time constant size.
  3. The loop iterations are order invariant: the logic doesn't change if the loop iterations are executed in a different sequential order.
  4. No conditional exit: the loop doesn't contain break or continue commands.

+

The matrix-matrix multiplication example given in the introduction is an example of a simple nest. Another example is 2-dimensional convolution, which is the fundamental operation in convolutional neural networks, and can be written in Python as: +

# Convolve M x N data matrix A with S x T filter matrix B and add output to matrix C
+for i in range(M):
+    for j in range(N):
+        for k in range(S):
+            for l in range(T):
+                C[i, j] += A[i + k, j + l] * B[k, l]
+

+

While Accera supports arbitrary affine loop nests, the programmer defines the logic of their algorithms using simple nests. More complex nests are obtained by applying schedule transformations (see Section 3) or by fusing multiple schedules (see Section 4).

+

Defining the loop nest logic

+

The programmer's goal is to create a highly optimized target-specific implementation of an affine loop nest. The first step towards this goal is to define the logic of one or more simple nests. The logic is a target-independent pseudo-code of a simple nest, written without considering performance. For example, the following code defines the logic of the matrix-matrix multiplication loop nest:

+

# Import accera
+import accera as acc
+
+# Define matrix sizes
+M = 16
+N = 10
+S = 11
+
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, S))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(S, N))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+
+# Define a simple affine loop nest and name its loops i, j, k
+nest = acc.Nest(shape=(M, N, S))
+i, j, k = nest.get_indices()
+
+# Define the logic of each iteration in the nest
+@nest.iteration_logic
+def _():
+    C[i,j] += A[i,k] * B[k,j]
+
+We start by defining the arrays that participate in the computation: A and B are input arrays and C is an input/output array. Next, we initialize nest to be an empty skeleton of a loop nest, with nested loops of sizes M, N, S. These loops are logical -- think of them as pseudo-code loops -- they do not define the execution order of the iterations. The index variables that correspond to the three loops are named i, j, k respectively.

+

The last part of the example sets the iteration logic to C[i, j] += A[i, k] * B[k, j]. Note that this iteration logic follows an affine memory access pattern. The syntax in the example makes use of Python decorators and is shorthand for the more explicit syntax: +

def logic_fn():
+    C[i, j] += A[i, k] * B[k, j]
+
+nest.iteration_logic(logic_fn)
+

+

The iteration spaces above have compile-time shapes. We can define runtime shapes by replacing any or all of the constant matrix sizes M, N, and S with an acc.Dimension placeholder:

+
M = acc.create_dimensions() # replace M with a runtime dimension
+N = 10 # a compile-time dimension
+S = 11
+
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, S))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(S, N))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+
+# Define a simple affine loop nest and name its loops i, j, k
+nest = acc.Nest(shape=(M, N, S))
+
+

The iteration space dimensions will now be runtime variables that need to be provided to the function (more on this later).

+

Supported operations

+

The iteration logic can include the following operations (assuming accera was imported as acc):

+

Assignment operators

Operation | Types (Operands must be of same type) | Description
a = b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Assigns the value of scalar b to scalar a

Arithmetic operators

Operation | Types (Operands must be of same type) | Description
a + b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the sum of scalars a and b
a - b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the difference between scalars a and b
a * b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the product of scalars a and b
a / b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the quotient of scalars a and b. If the operands are integers, an integer division result is returned
a ** b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the b'th power of scalar a
a // b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the floor of the quotient of scalars a and b
a % b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the signed remainder after dividing scalar a by scalar b
-a | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the additive inverse of scalar a

Comment: Accera also supports the corresponding compound-assignment operators, such as a += b, a -= b, etc.

+

Relational operators

Operation | Types (Operands must be of same type) | Description
a == b | acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if scalar a equals scalar b, else False
a != b | acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if scalar a is not equal to scalar b, else False
a < b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if scalar a is strictly smaller than scalar b, else False
a <= b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if scalar a is smaller than or equal to scalar b, else False
a > b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if scalar a is strictly greater than scalar b, else False
a >= b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if scalar a is greater than or equal to scalar b, else False

Logical operators

Operation | Types (Operands must be of same type) | Description
acc.logical_and(a, b) | acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if scalars a and b are non-zero, else False
acc.logical_or(a, b) | acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if either scalar a or scalar b are non-zero, else False
acc.logical_not(a) | acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if a is zero, else False

Bitwise operators

Operation | Types (Operands must be of same type) | Description
a & b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns the bitwise AND of the bits in scalars a and b
a \| b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns the bitwise OR of the bits in scalars a and b
a ^ b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns the bitwise XOR of the bits in scalars a and b
~a | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns the bitwise inverse of the bits in scalar a
a << b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns scalar a whose bitwise representation is shifted left by b bits
a >> b | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns scalar a whose bitwise representation is shifted right by b bits

Comment: Accera also supports the corresponding compound-assignment operators, such as a &= b, a |= b, etc.

+

Intrinsics

Operation | Types (Operands must be of same type) | Description
acc.abs(a) | acc.ScalarType.float16/32/64 | Returns the absolute value of scalar a
acc.max(a, b) | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the larger of the two scalars a and b
acc.min(a, b) | acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the smaller of the two scalars a and b
acc.ceil(a) | acc.ScalarType.float16/32/64 | Returns the value of scalar a rounded up to the nearest integer as an int64 type
acc.floor(a) | acc.ScalarType.float16/32/64 | Returns the value of scalar a rounded down to the nearest integer as an int64 type
acc.sqrt(a) | acc.ScalarType.float16/32/64 | Returns the square root of scalar a
acc.exp(a) | acc.ScalarType.float16/32/64 | Returns the exponential e raised to the scalar a
acc.log(a) | acc.ScalarType.float16/32/64 | Returns the natural logarithm (base e) of scalar a
acc.log10(a) | acc.ScalarType.float16/32/64 | Returns the common logarithm (base 10) of scalar a
acc.log2(a) | acc.ScalarType.float16/32/64 | Returns the binary logarithm (base 2) of scalar a
acc.sin(a) | acc.ScalarType.float16/32/64 | Returns the sine of scalar a, where a is in radians
acc.cos(a) | acc.ScalarType.float16/32/64 | Returns the cosine of scalar a, where a is in radians
acc.tan(a) | acc.ScalarType.float16/32/64 | Returns the tangent of scalar a, where a is in radians
acc.sinh(a) | acc.ScalarType.float16/32/64 | Returns the hyperbolic sine of scalar a, where a is in radians
acc.cosh(a) | acc.ScalarType.float16/32/64 | Returns the hyperbolic cosine of scalar a, where a is in radians
acc.tanh(a) | acc.ScalarType.float16/32/64 | Returns the hyperbolic tangent of scalar a, where a is in radians
+
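As an illustration of using an intrinsic inside iteration logic, the sketch below computes an element-wise exponential (the array names and sizes are ours, chosen only for the example):

    import accera as acc

    A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(16, 16))
    C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))

    nest = acc.Nest(shape=(16, 16))
    i, j = nest.get_indices()

    @nest.iteration_logic
    def _():
        # element-wise exponential using the acc.exp intrinsic
        C[i, j] = acc.exp(A[i, j])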

Implicit type casting

+

Accera operators require operands to be the same type. Computations that use multiple types can take advantage of Accera's implicit type casting support when converting from smaller-sized types to larger-sized types.

+

To do implicit casting, simply assign a source type to its implicitly-castable destination type. No additional casting operation is needed for converting between these types.

Source types | Destination type (implicitly-castable)
acc.ScalarType.bool, acc.ScalarType.uint8 | acc.ScalarType.int8
acc.ScalarType.bool, acc.ScalarType.int8 | acc.ScalarType.uint8
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.uint16 | acc.ScalarType.int16
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16 | acc.ScalarType.uint16
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.uint32 | acc.ScalarType.int32
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32 | acc.ScalarType.uint32
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32, acc.ScalarType.uint32, acc.ScalarType.uint64 | acc.ScalarType.int64
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32, acc.ScalarType.uint32, acc.ScalarType.int64 | acc.ScalarType.uint64
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16 | acc.ScalarType.float16
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16 | acc.ScalarType.bfloat16
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32, acc.ScalarType.uint32, acc.ScalarType.int64, acc.ScalarType.float16, acc.ScalarType.bfloat16 | acc.ScalarType.float32
acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32, acc.ScalarType.uint32, acc.ScalarType.int64, acc.ScalarType.float16, acc.ScalarType.bfloat16, acc.ScalarType.float32 | acc.ScalarType.float64
+

To override the casting behavior above, or cast a larger-sized type to a smaller-sized type, use the acc.cast operation.

+

Comment: implicit casting of constants may result in truncation.

+

Accera program stages

+

Let’s take a step back to describe the stages of an Accera program:

+
  • Nest: A nest captures the logic of a simple nest, without any optimizations or implementation details.
  • Schedule: A Nest is used to create a schedule. The schedule controls the order in which the nest iterations are visited. Multiple schedules can be fused into a single schedule, which may no longer represent a simple nest.
  • Plan: A Schedule is used to create a plan. A plan controls the implementation details that are specific for a target platform (e.g., data caching strategy, vectorization, assignment of arrays and caches to different types of memory).
  • Package: A Plan is used to create a function in a function package. The package is then compiled and emitted.

Once a package is emitted, the Accera functions contained in it can be called from external client code. This external code is typically not written using Accera.

+

Accera currently supports the following package formats:

+
  • HAT, which is a schematized version of a standard C library. The external client code can be written in C or C++ and linked with the HAT package.
  • MLIR, which uses standard MLIR dialects. The external code must also be in MLIR.

Overall, to build and emit nest (defined above), we would write:

+
# create a default schedule from the nest
+schedule = nest.create_schedule()
+
+# create a default plan from the schedule
+plan = schedule.create_plan()
+
+# create a HAT package. Create a function in the package based on the plan
+package = acc.Package()
+package.add(plan, args=(A, B, C), base_name="simple_matmul")
+
+# build the HAT package
+package.build(format=acc.Package.Format.HAT_DYNAMIC, name="linear_algebra")
+
+

It may not be immediately clear why so many stages are needed just to compile a simple nest. Therefore, let’s discuss each stage in detail to understand their importance.

+

In the example above, the call to package.add takes three arguments: the first is the plan that defines the function's implementation; the second is the order of the input and input/output arrays in the function signature; and the third is a base name for the function. The full name of the function is the base name followed by an automatically-generated unique identifier. For example, the function in the example could appear in the package as simple_matmul_8f24bef5. The automatically-generated suffix ensures that each function in the package has a unique name. More details on function packages can be found in Section 10.

+

The Array shapes above are known at compile-time. If one or all of the shapes are known at runtime, we provide dimensions as arguments to the function:

+
M, N, S = acc.create_dimensions() # runtime dimensions
+
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, S))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(S, N))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+
+...
+
+# create a default schedule from the nest
+schedule = nest.create_schedule()
+
+# create a default plan from the schedule
+plan = schedule.create_plan()
+
+# create a HAT package. Create a function in the package based on the plan, with
+# the dimensions as additional arguments (in any order)
+package = acc.Package()
+package.add(plan, args=(M, N, S, A, B, C), base_name="simple_matmul_runtime_shapes")
+
+# build the HAT package
+package.build(format=acc.Package.Format.HAT_DYNAMIC, name="linear_algebra")
+
+

Convenience syntax

+

For convenience, Accera also provides shortcuts to avoid unnecessary verbosity. Specifically, we can create a function in a package directly from a nest, as follows: +

package.add(nest, args=(A, B, C), base_name="simple_matmul")
+
+The abbreviated syntax makes it seem like a callable function is generated directly from nest. However, what actually happens behind the scenes is that nest creates a default schedule, which creates a default plan, which is added as a function in the package. Accera has a similar convenience syntax to create a function from a schedule: +
package.add(schedule, args=(A, B, C), base_name="simple_matmul")
+
+and to create a plan directly from a nest: +
plan = nest.create_plan()
+

+
+ +
+
diff --git a/Manual/03 Schedules/index.html b/Manual/03 Schedules/index.html
new file mode 100644

Section 3: Schedules

+

We begin with nest from Section 2 which captures the logic of matrix-matrix multiplication. We use nest to create a Schedule that controls the execution order of the nest's iterations. Schedules are target-independent in the sense that the same schedule can be used to emit code for multiple target platforms.

+

We create a default schedule as follows: +

schedule = nest.create_schedule()
+

+

The default schedule is equivalent to the following straightforward for-loop version of the loop nest: +

for i in range(3):
+    for j in range(12):
+        for k in range(15):
+            C[i, j] += A[i, k] * B[k, j]
+
+In other words, each of the logical pseudo-code loops in nest becomes an actual for-loop in the default schedule. The for-loop sizes can be known at compile-time or at runtime.

+

We can now transform this schedule in various ways. However, these transformations do not change the underlying logic defined in nest and merely change the order of the loop iterations. We can even generate as many independent schedules as we want by calling nest.create_schedule().

+

Iteration spaces: a geometric representation of schedules

+

In the Accera programming model, a schedule is geometrically interpreted as a multi-dimensional discrete hypercube called the iteration space of the nest. The elements of the iteration space represent the individual iterations of the loop nest. Initially, the dimensions of the iteration space correspond to the logical loops defined in nest.

+

For example, the default iteration space for the matrix-matrix multiplication nest forms a three-dimensional discrete hypercube, whose shape is (3, 12, 15):

+

(3, 12, 15) iteration space

+

The (3, 12, 15) iteration space. The arrows labelled 1, 2, and 3 indicate the dimension order and direction.

+

How does an iteration space imply an order over the iterations?

+

The dimensions of the iteration space are ordered, and this order corresponds to the original order of the logical loops in nest by default. In fact, the order over the dimensions induces a lexicographic sequence over the individual elements of the iteration space.

+

(3, 12, 15) iteration sequence

+

Video showing sequence of iterations for the (3, 12, 15) iteration space.

+

This geometric interpretation of schedules helps us visualize how different transformations modify them. While some transformations merely rearrange the elements of the iteration space, others increase its dimensions, and some even pad the space with empty (no-op) elements. The transformed iteration space defines a new lexicographic order over the individual iterations.

+

Comment: It is important not to confuse arrays, like A, B, C, with iteration spaces, like schedule. A possible source of confusion could be that both arrays and iteration spaces have a multidimensional rectilinear structure (i.e., they both look like hypercubes). However, arrays and iteration spaces are fundamentally different. Arrays are data structures whose elements are scalars. Iteration spaces are abstract geometric representations of schedules and their elements represent individual iterations of a loop nest. Transformations apply to iteration spaces, not to arrays.

+

Comment: Accera's geometric interpretation of schedules resembles the iteration domain polyhedron, which is the cornerstone of the polyhedral model of compiler optimization. However, unlike polyhedrons, Accera iteration spaces are not embedded in a continuous space and cannot be manipulated by algebraic transformations. Accera iteration spaces always remain rectilinear and are inherently discrete objects.

+

Iteration space slices

+

Iteration space slices are an abstract concept that appears in several aspects of the Accera programming model. Since the iteration space dimensions are ordered, each element of the iteration space can be identified by a vector of coordinates. For example, the vector (1, 6, 7) identifies the iteration at position 1 along the first dimension, position 6 along the second dimension, and position 7 along the third dimension. If one or more coordinates are replaced with the wildcard symbol *, we get an iteration space slice, which is a set of iterations obtained by replacing the wildcard with all possible values. For example, (*, *, 2) represents a slice containing all the elements with 2 as their last coordinate. The dimension of a slice equals the number of wildcards in its definition.
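To make the slice concept concrete, here is a small illustration in plain Python (not Accera API) that enumerates the elements of the (*, *, 2) slice of the (3, 12, 15) iteration space:
import itertools

shape = (3, 12, 15)
# Replace each wildcard with every possible value; the last coordinate stays fixed at 2.
slice_elements = [(i, j, 2) for i, j in itertools.product(range(shape[0]), range(shape[1]))]
print(len(slice_elements))  # 36 elements; two wildcards, so the slice is two-dimensional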

+

(*, *, 2) slice

+

The (3, 12, 15) iteration space. Highlighted elements belong to the (*, *, 2) slice.

+

Iteration space slices in four dimensions, denoted by indices (i, j, jj, k):

+ + + + + + + + + + + + + + + +
(1, *, *, *)(*, *, *, 3)(2, *, 0, *)
(1, *, *, *)(*, *, *, 3)(2, *, 0, *)
+

Loops, indices, and dimensions

+

When we defined nest, we used variables such as i, j, and k to name the loops in the loop-nest. When we described the default schedule using equivalent for-loops, i, j, and k became the index variables of those loops. Now, when we represent a schedule as an iteration space, these variables are used as the names of the corresponding iteration space dimensions. From here on, we move seamlessly between these different representations and use the terms loop, index, and dimension interchangeably.

+

Schedule transformations

+

Iteration space transformations change the shape of the iteration space, possibly by adding dimensions or padding the space with empty elements.

+

The iteration space always retains its rectilinear (hypercube) shape. In some cases, Accera transformations must pad the iteration space with empty elements to avoid creating a jagged iteration space structure.

+

reorder

+
# Reorder the indices.
+schedule.reorder(k, i, j)
+
+

The reorder transformation sets the order of indices in the schedule. From the iteration space point-of-view, reorder performs a pivot rotation of the iteration space, which orients its dimensions in a specified order. Since the iteration space elements are executed in lexicographic order, pivoting the iteration space is equivalent to reordering the loops.

+

For example, we can write: +

schedule.reorder(j, k, i)
+
+After this transformation, schedule becomes equivalent to: +
for j in range(12):
+    for k in range(15):
+        for i in range(3):
+            C[i, j] += A[i, k] * B[k, j]
+

+ + + + + + + + + + + + + +
(3, 12, 15) iteration space(12, 15, 3) iteration space
Default schedule, order is (i, j, k)After reorder(j, k, i), order is (j, k, i)
+

Invalid orders

+

Some orders are not allowed. Describing these restrictions in full requires concepts that have not been introduced yet, so we state them here and discuss them in the upcoming sections (a short example follows below). The restrictions are: +1. The inner dimension created by a split transformation (see below) must be ordered later than its corresponding outer dimension. +2. The fusing dimension created by a fuse operation (see Section 4) must always precede any unfused dimensions.
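For example, here is a hedged sketch of restriction 1, using the split transformation introduced below on the matrix-multiplication schedule:
jj = schedule.split(j, 3)          # creates inner index jj from outer index j
schedule.reorder(i, j, jj, k)      # valid: the outer index j precedes its inner index jj
# schedule.reorder(i, jj, j, k)    # invalid: jj may not be ordered before its outer index j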

+

Note that reorder also has the following overloaded form: +

schedule.reorder(order=(j, k, i))
+
+This form is better suited for use with parameters (see Section 9).

+

split

+
# Splits dimension i into equally-sized parts, orients those parts along a new dimension ii, and stacks those parts along dimension i
+ii = schedule.split(i, size)
+
+

From the iteration space point-of-view, the split transformation takes a dimension i and a size, modifies i, and creates a new dimension ii. Assume that the original size of dimension i was n: The split transformation splits the dimension i into ceil(n/size) parts of size size, orients each of these parts along dimension ii, and stacks the ceil(n/size) parts along the dimension i. If the split size does not divide the dimension size, empty elements are added such that the split size does divide the dimension size. As a result of the split, the size of i becomes ceil(n/size), the size of the new dimension ii equals size, and the iteration space remains rectilinear.

+

In loop terms, ii = schedule.split(i, size) splits loop i into two loops: an inner loop ii and an outer loop, which inherits the original name i. Note that the outer loop always precedes the corresponding inner loop in the loop ordering.

+

For example, starting from nest defined in Section 2, we could write: +

schedule = nest.create_schedule()
+jj = schedule.split(j, 3)
+
+The resulting iteration space has a shape of (3, 4, 3, 15) and corresponds to the following python code: +
for i in range(3):
+    for j in range(0, 12, 3): # length 4, stride 3
+        for jj in range(3):
+            for k in range(15):
+                C[i, j+jj] += A[i, k] * B[k, j+jj]
+
+Note that loop j is no longer normalized (it has a stride of 3 rather than 1), which means that the nest is no longer a simple nest. As mentioned in the previous section, Nest objects always represent simple nests, but Schedule objects can represent more complex affine loop nests.

+ + + + + + + + + + + + + +
(3, 12, 15) iteration space(3, 4, 3, 15) iteration space
Default scheduleAfter split(j, 3)
+

After performing a split, both the outer index and the inner index can be split again. For example, +

schedule = nest.create_schedule()
+jj = schedule.split(j, 3)
+jjj = schedule.split(j, 2) # split the outer index j again
+
+After the first split, the iteration space has the shape (3, 4, 3, 15). After the second split, the shape becomes (3, 2, 2, 3, 15). The transformed schedule corresponds to the following Python code: +
for i in range(3):
+    for j in range(0, 12, 6): # length 2, stride 6
+        for jjj in range(0, 6, 3): # length 2, stride 3
+            for jj in range(3):
+                for k in range(15):
+                    C[i, j+jj+jjj] += A[i, k] * B[k, j+jj+jjj]
+

+

The split does not necessarily need to divide the dimension size. For example, consider the following code: +

schedule = nest.create_schedule()
+jj = schedule.split(j, 5)  # original size of dimension j was 12
+
+From the iteration space point-of-view, this code splits dimension j into three parts of size 5, where the last part is padded with empty (no-op) elements. Before the transformation, the iteration space shape is (3, 12, 15), and after the transformation, the shape is (3, 3, 5, 15) (so, 135 empty elements were added).

+ + + + + + + + + + + + + +
(3, 12, 15) iteration space(3, 3, 5, 15) iteration space
Default scheduleAfter split(j, 5) (no-op elements in blue)
+

In loop form, the transformed iteration space corresponds to the following Python code: +

for i in range(3):
+    for j in range(0, 12, 5):
+        for jj in range(5):
+            for k in range(15):
+                if j+jj < 12:
+                    C[i, j+jj] += A[i, k] * B[k, j+jj]
+
+Note that Accera optimizes away costly if statements by unswitching the loops, which results in code that looks more like this: +
for i in range(3):
+    for j in range(0, 10, 5):
+        for jj in range(5):
+            for k in range(15):
+                C[i, j+jj] += A[i, k] * B[k, j+jj]
+        # loop unswitching: handle the last iteration of the j loop separately
+        for j in range(10, 12):
+            for k in range(15):
+                C[i, j] += A[i, k] * B[k, j]
+

+

Meaningless splits

+

Next, we will describe Accera’s behavior in a few degenerate cases. If the split size equals the dimension size, the transformation simply renames the split dimension. For example, +

schedule = nest.create_schedule()
+jj = schedule.split(j, 12) # original size of dimension j was 12
+
+After the split, the size of j becomes 1 and the size of jj is 12. The new shape of the iteration space is (3, 1, 12, 15). The dimension j becomes meaningless and therefore the schedule is basically unchanged.

+

If the split size exceeds the dimension size, Accera will treat it as if the split size doesn't divide the dimension size. This special case is handled by adding empty elements. For example, +

schedule = nest.create_schedule()
+jj = schedule.split(j, 13)  # original size of dimension j was 12
+
+After the split, the size of j becomes 1 and the size of jj, 13. The new shape of the iteration space is (3, 1, 13, 15), which means that 45 empty elements were added. These empty elements are removed during code generation, which means that the schedule is basically unchanged.

+

Finally, note that jj = schedule.split(j, 1) simply adds a meaningless new dimension jj of size 1, and again, the schedule is unchanged.

+

Convenience syntax: tile

+

The tile transformation is a convenience syntax and does not provide any unique functionality. Consider the following code +

schedule = nest.create_schedule()
+jj, kk = schedule.tile({
+    j: 2,
+    k: 3
+})
+
+The tile transformation above is shorthand for the following sequence of transformations: +
jj = schedule.split(j, 2)
+kk = schedule.split(k, 3)
+

+

It will result in a sequence of indices that are ordered as: +

(i, j, jj, k, kk)
+
+In other words, the tile transformation takes a mapping from indices to sizes, splitting each index by the corresponding size. The indices involved in the split are then ordered such that each outer index (parent index) precedes its inner indices (child indices). On the other hand, indices that did not participate in the transformation retain their relative positions.

+

skew

+
# Skew dimension i with respect to dimension j.
+schedule.skew(i, j)
+
+

The skew transformation is the easiest to explain for a two-dimensional iteration space of shape (N, M). Skewing dimension i (the row dimension) with respect to j (the column dimension) modifies the iteration space column-by-column: column j gets j empty elements added to its start and M-j-1 empty elements to its end. As a result, each column grows from size N to size N+M-1. Geometrically, the original iteration space elements take the form of a 45-degree parallelogram, embedded within a bounding rectangle of shape (N+M-1, M). The element that used to be at coordinate (i, j) moves to coordinate (i+j, j).

+

Similarly, skewing j with respect to i adds empty elements at the beginning and end of each row, resulting in an iteration space of shape (N, N+M-1). In higher dimensions, we simply apply the two-dimensional skew transformation independently to each two-dimensional slice along the two specified dimensions.

+

To demonstrate the importance of this transformation, consider convolving a 10-element vector with a 3-element filter. The loop logic for this operation is defined as follows: +

import accera as acc
+
+N = 10  # input size
+K = 3  # filter size
+M = N - K + 1  # output size = 8
+
+A = acc.Array(role=acc.Role.INPUT, shape=(N,))
+B = acc.Array(role=acc.Role.INPUT, shape=(K,))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(M,))
+
+nest = acc.Nest(shape=(M, K))
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    C[i] += A[i+j] * B[j]
+
+schedule = nest.create_schedule()
+
+schedule corresponds to an iteration space of shape (8,3), where the first dimension corresponds to the 8 elements of the output vector. This schedule calculates the outputs one by one: first C[0], then C[1], etc.

+

Here is the equivalent Python code: +

for i in range(8):
+    for j in range(3):
+        C[i] += A[i+j] * B[j]
+

+

Now, say that we apply the skew transformation as follows: +

schedule.skew(i, j)
+
+This transformation results in an iteration space of shape (10, 3), where the first dimension corresponds to the 10 elements of the input. This transformed schedule processes the input elements one-by-one: it extracts all the information from A[0] (A[0] is only used in the calculation of C[0]), then moves on to A[1] (which contributes to both C[0] and C[1]), and so on.

+

In this example, the default schedule achieves memory locality with respect to array C, whereas the skewed schedule achieves memory locality with respect to array A.

+

In loop form, the transformed iteration space corresponds to the following Python code:

+
for i in range(10):
+    for j in range(3):
+        if (i-j) >= 0 and (i-j) < 8:
+            C[i-j] += A[i] * B[j]
+
+

Behind the scenes, unswitching the loops results in code that looks more like this:

+
# triangle of height 2, width 3
+for j in range(1):
+    C[0-j] += A[0] * B[j]
+for j in range(2):
+    C[1-j] += A[1] * B[j]
+
+# rectangle of shape (6, 3)
+for i in range(2, 8):
+    for j in range(3):
+        C[i-j] += A[i] * B[j]
+
+# upside-down triangle of height 2, width 3
+for j in range(2):
+    C[6+j] += A[8] * B[2-j]
+for j in range(1):
+    C[7+j] += A[9] * B[2-j]
+
+

Finally, note that some loops have small sizes that can be replaced by unrolls. To enable the unrolling of these small loops, we can use this optional parameter:

+
schedule.skew(i, j, unroll_loops_smaller_than=3)
+
+

This will unroll all loops that are smaller than 3, which include the range(2) and range(1) loops in the example above.

+

pad

+
# Adds empty elements to the beginning of dimension i.
+schedule.pad(i, size)
+
+

The pad transformation pads the beginning of dimension i with empty elements. This operation is meaningless by itself, but can be useful when used with splitting or fusing.
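As a hedged sketch (sizes chosen purely for illustration), padding a dimension before splitting can make the split divide the padded size evenly:
schedule = nest.create_schedule()
schedule.pad(j, 2)         # dimension j grows from 12 to 14 elements; the first 2 are empty
jj = schedule.split(j, 7)  # the padded dimension now splits evenly into two parts of size 7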

+

Order-invariant schedules and safety

+

A schedule is order-invariant if its underlying logic doesn't depend on the execution order of its iterations. For example, schedules created from a single Nest (via create_schedule()) are order-invariant. All of the schedules discussed so far have been order-invariant.

+

A schedule is safe if its underlying logic is guaranteed to remain intact regardless of the applied transformations. Not all schedules are safe, but order-invariant schedules are. This is because the transformations introduced in this section only change the execution order of iterations without adding or removing any work.

+

In Section 4, we introduce fused schedules, which are not order-invariant, but may still be safe.

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Manual/04 Fusing/index.html b/Manual/04 Fusing/index.html new file mode 100644 index 00000000..dcdfd1f9 --- /dev/null +++ b/Manual/04 Fusing/index.html @@ -0,0 +1,2667 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + 04 Fusing - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Section 4: Fusing

+

With the fuse operation, multiple schedules can be combined into a single schedule representing the union of the work in the original schedules. These fused schedules can be transformed by any of the transformations presented in Section 3.

+

Full fusing

+
import accera as acc
+
+# Fuse three schedules to create a fused schedule
+schedule = acc.fuse(schedule0, schedule1, ...)
+
+

Full fusing is the most straightforward form of fusing: each dimension is fused with the corresponding dimension from the other schedules.

+

Full fusing of same-shaped iteration spaces

+

First, consider the simplest case where we fuse schedules with identical iteration space shapes. This fusing adds a new dimension, called the fusing dimension, to the fused schedule schedule; this dimension does not exist in the original schedules. By default, the fusing dimension is the last dimension in the fused schedule, and its size is equal to the number of fused schedules. The slices along the fusing dimension contain copies of the iteration logic of the original schedules: the first slice along the fusing dimension contains a copy of the iteration logic of schedule0, the second slice contains that of schedule1, and so on. Since the fusing dimension is the last dimension, the fused schedule is logically equivalent to executing an iteration of schedule0, followed by an iteration of schedule1, and so on.

+

Consider a scenario where we want first to shift and then scale each element of a matrix. In other words, we want to perform the equivalent of the below Python code:
+

C = (C + A) * B
+

+

If all three matrices are 10 by 10, one way to do this without fusing is to write: +

A = acc.Array(role=acc.Role.INPUT, shape=(10, 10))
+B = acc.Array(role=acc.Role.INPUT, shape=(10, 10))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(10, 10))
+
+# Create nest_simple and schedule_simple
+nest_simple = acc.Nest(shape=(10, 10))
+i, j = nest_simple.get_indices()
+
+@nest_simple.iteration_logic
+def _():
+    C[i, j] = (C[i, j] + A[i, j]) * B[i, j]
+
+schedule_simple = nest_simple.create_schedule()
+
+Note that each iteration in schedule_simple executes simultaneously on all three arrays. However, there can be a case where concurrent operation on these arrays creates excessive pressure on the computer’s memory cache, resulting in lower performance. In such a case, simultaneous operation on two arrays instead of three has a computational advantage.

+

Therefore, we may first want to compute C += A and then compute C *= B. Better yet, we may want to compute C in 2×2 blocks. We first compute C[0:2, 0:2] += A[0:2, 0:2]. Subsequently, we compute C[0:2, 0:2] *= B[0:2, 0:2]. Finally, we move on to the next block and compute C[2:4, 0:2] += A[2:4, 0:2], and so on. This way, fusing offers remarkable flexibility to explore all of these different execution possibilities.

+

First, we define two separate nests, one for the C += A logic and one for the C *= B logic, and get their corresponding default schedules: +

# Create nest0 and schedule0
+nest0 = acc.Nest(shape=(10, 10))
+i0, j0 = nest0.get_indices()
+
+@nest0.iteration_logic
+def _():
+    C[i0, j0] += A[i0, j0]
+
+schedule0 = nest0.create_schedule()
+
+# Create nest1 and schedule1
+nest1 = acc.Nest(shape=(10, 10))
+i1, j1 = nest1.get_indices()
+
+@nest1.iteration_logic
+def _():
+    C[i1, j1] *= B[i1, j1]
+
+schedule1 = nest1.create_schedule()
+

+

Before fusing, both schedule0 and schedule1 have a shape (10, 10). Now, let’s fuse them: +

# Create a fused schedule
+schedule = acc.fuse(schedule0, schedule1)
+i, j, f = schedule.get_indices()
+
+Fusing creates a new fused schedule schedule with a shape (10, 10, 2). It does not change schedule0 and schedule1. The last dimension in schedule is the so-called fusing dimension f. Its slice (*, *, 0) contains a copy of schedule0, and its slice (*, *, 1) contains a copy of schedule1.

+ + + + + + + + + + + + + +
Before fusingAfter fusing
Before fusingAfter fuse(schedule0, schedule1)
+

In loop form, schedule is now equivalent to the following Python code:

+
for i in range(10):
+    for j in range(10):
+        # f = 0
+        C[i, j] += A[i, j]
+        # f = 1
+        C[i, j] *= B[i, j]
+
+

full fusion traversal

+

Resulting iteration sequence for C = (C + A) * B. (White elements represent C + A; purple elements are C * B)

+

Tiling

+

Recall that we discussed computing the output block-by-block: first computing C[0:2, 0:2] += A[0:2, 0:2], then computing C[0:2, 0:2] *= B[0:2, 0:2], and so on. This can be achieved with the following sequence of transformations: +

ii, jj = schedule.tile({
+    i: 2,
+    j: 2
+})
+schedule.reorder(i, j, f, ii, jj)
+
+The resulting schedule is equivalent to the following Python code: +
for i in range(0, 10, 2):
+    for j in range(0, 10, 2):
+        # f = 0
+        for ii in range(2):
+            for jj in range(2):
+                C[i+ii, j+jj] += A[i+ii, j+jj]
+        # f = 1
+        for ii in range(2):
+            for jj in range(2):
+                C[i+ii, j+jj] *= B[i+ii, j+jj]
+

+

Constraints of Fusing Dimension

+

The fusing dimension comes with certain constraints that are discussed from the safety perspective with examples.

+

Constraint 1: the fusing dimension is executed sequentially

+

Unlike other dimensions that allow parallelization, vectorization, or tensorization (see Section 7 ), none of these operations can be applied to the fusing dimension. The fusing dimension must be executed sequentially. This constraint enables the safety guarantee discussed below.
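For example (a sketch; parallelize is introduced in Section 7), such operations may target the fused dimensions but not the fusing dimension:
plan = schedule.create_plan()
plan.parallelize(indices=(i, j))   # allowed: i and j are ordinary (fused) dimensions
# plan.parallelize(indices=(f,))   # not allowed: the fusing dimension must execute sequentially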

+

Safety

+

Before applying any subsequent transformations, the fused schedule is always logically equivalent to executing the original schedules sequentially for each value of the fused dimensions. However, is it safe? Recall that a schedule is considered safe if the underlying logic is guaranteed to be unchanged regardless of the applied transformation. The safety of a fused schedule depends on circumstances that may break logic equivalence:

+

Accera preserves the order of the fused schedules for each value of the fused dimensions, regardless of how the fused schedule is transformed. For example, in the example above, the fused dimensions are i and j. Therefore, for any concrete value of i and j, the corresponding operation from schedule0 is guaranteed to execute before the corresponding operation from schedule1, regardless of how the fused schedule is transformed. More specifically, for each i and j, the operation C[i, j] += A[i, j] is guaranteed to execute before the operation C[i, j] *= B[i, j], no matter how we transform the fused schedule. Since those are the only operations that interact with C[i,j], the Accera guarantee is sufficient, and we can claim that the fused schedule is safe. With this assurance, the programmer can apply any sequence of transformations without worrying about the correctness of the resulting implementation.

+

However, not every fusing operation creates a safe schedule. For example, consider a scenario where we fused schedule0 and schedule1 differently: +

# Reorder schedule1 before fusing
+schedule1.reorder(j1, i1)
+# Fuse schedule0 with the reordered schedule1
+schedule_t = acc.fuse(schedule0, schedule1)
+a, b, f = schedule_t.get_indices()
+
+In this unnatural example, i0 and j1 are fused and named a. Similarly, i1 and j0 are fused and named b. As mentioned above, Accera guarantees that, for each value of a and b, the operation C[a, b] += A[a, b] is executed before C[b, a] *= B[b, a]. The fusing operation itself preserves the logical equivalence. However, the underlying logic is changed by the transformation performed before fusing: +
schedule1.reorder(j1, i1)
+
+To understand this change in the logic, note that the resulting schedule is equivalent to the following Python code: +
for a in range(10):
+    for b in range(10):
+        C[a, b] += A[a, b]
+        C[b, a] *= B[b, a]
+
+The above code sets C[1,0] to C[1,0] * B[1,0] + A[1,0], whereas the original fused logic set C[1,0] to (C[1,0] + A[1,0]) * B[1,0]. In this case, we can conclude that schedule_t is definitely not safe. If the programmer decides to create an unsafe schedule, they take upon themselves the responsibility of maintaining logical equivalence.

+

Fusing iteration spaces with different shapes

+

If the iteration spaces have different shapes, Accera matches their shapes by padding them appropriately with empty (no-op) elements.

+

Partial fusing

+

Instead of fusing all the dimensions, we may want to fuse a subset of dimensions, leaving the rest unfused. To fuse the first s dimensions, we use the syntax: +

# Fuse the first s dimensions of three schedules
+schedule = acc.fuse((schedule0, schedule1, ...), partial=s)
+
+The order of the dimensions in the fused schedule is as follows: first the s fused dimensions, then the fusing dimension f, followed by the unfused dimensions of schedule0, schedule1, and so on.

+

We can easily calculate the number of dimensions in the fused schedule. For example, if we fuse the first s dimensions of a d0-dimensional space schedule0 and a d1-dimensional space schedule1, the fused iteration space will have s fused dimensions, d0 + d1 - 2s unfused dimensions, and the special fusing dimension f, for a total of d0 + d1 - s + 1 dimensions.

+

The fuse operation uses padding to ensure that the fused iteration space is not jagged in any direction. For example, say that we partially fuse the first 2 dimensions of schedule0, which is 4-dimensional, and schedule1, which is 3-dimensional: +

schedule = acc.fuse((schedule0, schedule1), partial=2)
+i, j = schedule.get_fused_indices()
+f = schedule.get_fusing_index()
+k, l, m = schedule.get_unfused_indices()
+# Alternative way:
+# i, j, f, k, l, m = schedule.get_indices()
+
+First come the fused dimensions i and j. Next is the fusing dimension f of size 2, followed by the unfused dimensions k and l from schedule0 and m from schedule1. The slice (*, *, 0, *, *, 0) contains a copy of schedule0, the slice (*, *, 1, 0, 0, *) contains a copy of schedule1, and the rest of schedule is padded with empty elements. Note that full fusing is a special case of partial fusing, where s is the larger of the dimensions of schedule0 and schedule1.

+

Constraint 2: the fusing dimension always precedes unfused dimensions

+

Another constraint introduced by partial fusing is that the fusing dimension must precede all of the unfused dimensions in its dimension order. This constraint applies to dimensions derived from the fusing dimension and the unfused dimensions via splitting.

+

Safety

+

The safety guarantees for partial fusing are a natural extension of the guarantees for full fusing. For each value of the fused dimensions, Accera preserves the fused schedules' order regardless of how the fused schedule is transformed. In other words, for each concrete value of fused dimensions, all the corresponding work in schedule0 (across all of its unfused dimensions) is performed before the corresponding work in schedule1 (across all of its unfused dimensions). This remains true no matter how we transform the fused schedule. While fusing, the programmer needs to consider if this property implies safety. The examples below show how this can be done.

+

Partial fusing example: fully-connected neural layer with activation

+

Consider applying an element-wise operation, such as the ReLU function commonly used in AI, to the result of a matrix-matrix multiplication. This is called a fully connected layer with a ReLU activation in the language of neural networks. The function relu(x) is simply max(x, 0).

+

Imagine that we have an element-wise operator relu, and we want to implement the equivalent Python code: +

C = relu(C + A @ B)
+
+Here, A has a shape of (8, 4), B has a shape of (4, 8), and C has a shape of (8, 8). Let’s now define two nests, one for C += A @ B and the other for C = relu(C), and obtain their corresponding default schedules: +
# Create nest0 and schedule0
+nest0 = acc.Nest(shape=(8, 8, 4))
+i0, j0, k0 = nest0.get_indices()
+
+# nest0 performs C += A @ B
+@nest0.iteration_logic
+def _():
+    C[i0, j0] += A[i0, k0] * B[k0, j0]
+
+schedule0 = nest0.create_schedule()
+
+# Create nest1 and schedule1
+nest1 = acc.Nest(shape=(8, 8))
+i1, j1 = nest1.get_indices()
+
+# nest1 performs C = relu(C)
+@nest1.iteration_logic
+def _():
+    C[i1, j1] = acc.max(C[i1, j1], 0)
+
+schedule1 = nest1.create_schedule()
+
+In schedule0 and schedule1, the first dimension represents the rows of C and the second dimension represents the columns of C. Additionally, schedule0 has a third dimension that schedule1 does not have. Therefore, we fuse the first two dimensions of the iteration spaces and leave the third dimension of schedule0 unfused. +
schedule = acc.fuse((schedule0, schedule1), partial=2)
+i, j = schedule.get_fused_indices()
+f = schedule.get_fusing_index()
+k0 = schedule.get_unfused_indices()[0]
+# Alternative way:
+# i, j, f, k0 = schedule.get_indices()
+

+

The fused iteration space schedule has a shape of (8, 8, 2, 4). Its slice (*, *, 0, *) contains a copy of schedule0, the slice (*, *, 1, 0) contains a copy of schedule1, and the rest of its elements are padded. Note that the code above overwrites the index k0, which initially was an index of schedule0; it now corresponds to the unfused index in schedule. Note that the name k0 is a stylistic choice; we could have chosen a different name.

+ + + + + + + + + + + + + +
Before fusingAfter fusing
Before fusingAfter fuse((schedule0, schedule1), partial=2) (padded elements in blue)
+

Safety

+

Is schedule safe? Recall that for each value of i and j, Accera guarantees that the corresponding work in schedule0 (C[i,j] += A[i,k0] * B[k0,j] for all values of k0) is executed before the corresponding work in schedule1 (C[i,j] = max(C[i,j], 0)), and this holds regardless of how the fused schedule is transformed. Since these are the only operations that touch C[i,j] and the ReLU operation is always executed last, this guarantees that schedule is safe. Therefore, we can focus all of our attention on optimizing performance without worrying about correctness from this point onwards.

+

The resulting schedule is now equivalent to the following Python code:

+
for i in range(8):
+    for j in range(8):
+        # f = 0
+        for k0 in range(4):
+            C[i,j] += A[i,k0] * B[k0,j]
+        # f = 1
+        C[i,j] = max(C[i,j], 0)
+
+

relu(C + A @ B) traversal

+

Iteration sequence for C = relu(C + A @ B). (White elements represent C + A @ B; purple elements are relu(C); blue elements are padding.)

+

Partial fusing example: multiplying three matrices

+

Consider fusing two matrix-matrix multiplications to get matrix-matrix-matrix multiplication. Specifically, say that our goal is to calculate the equivalent of the following Python code: +

E += A @ B @ D
+
+Where A has a shape (4, 5), B (5, 6), D (6, 10), and E (4, 10).

+

We start by defining the arrays. In addition to A, B, D, and E, we define a temporary array C to store the intermediate result of A@B. +

A = acc.Array(role=acc.Role.INPUT, shape=(4, 5))
+B = acc.Array(role=acc.Role.INPUT, shape=(5, 6))
+C = acc.Array(role=acc.Role.TEMP, shape=(4, 6))
+D = acc.Array(role=acc.Role.INPUT, shape=(6, 10))
+E = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(4, 10))
+
+Note that C has the role of TEMP. Temporary arrays are mutable and initialized with zeros. Moreover, these arrays are logical objects that may not exist in memory during the entire computation.

+

Next, define a simple nest to compute C += A @ B and another simple nest to compute E += C @ D. +

# Create nest0 and schedule0 for C = A @ B
+nest0 = acc.Nest(shape=(4, 6, 5))
+i0, j0, k0 = nest0.get_indices()
+
+@nest0.iteration_logic
+def _():
+    C[i0, j0] += A[i0, k0] * B[k0, j0]
+
+schedule0 = nest0.create_schedule()
+
+# Create nest1 and schedule1 E += C @ D
+nest1 = acc.Nest(shape=(4, 10, 6))
+i1, j1, k1 = nest1.get_indices()
+
+@nest1.iteration_logic
+def _():
+    E[i1, j1] += C[i1, k1] * D[k1, j1]
+
+schedule1 = nest1.create_schedule()
+
+The temporary array C stores the output of schedule0, which is then used as one of the inputs of schedule1. Dimensions i0 and j0 correspond to the rows and columns of C in schedule0. Similarly, dimensions i1 and k1 correspond to the rows and columns of C in schedule1. Therefore, we fuse i0 with i1 and j0 with k1. We need to correctly line up the dimensions of the two iteration spaces and perform partial fusing. +
schedule1.reorder(i1, k1, j1)
+schedule = acc.fuse((schedule0, schedule1), partial=2)
+i, j = schedule.get_fused_indices()
+f = schedule.get_fusing_index()
+k0, j1 = schedule.get_unfused_indices()
+# Alternative way:
+# i, j, f, k0, j1 = schedule.get_indices()
+

+ + + + + + + + + + + + + +
Before reorder(i1, k1, j1)After reorder(i1, k1, j1)
Before reorder(i1, k1, j1)After reorder(i1, k1, j1)
+

The fused iteration space has a shape of (4, 6, 2, 5, 10). i is the result of fusing i0 and i1, j is the result of fusing j0 and k1 and f is the fusing dimension. On the other hand, k0 is the unfused dimension from schedule0, and j1 is the unfused dimension from schedule1. The slice (*, *, 0, *, 0) contains a copy of schedule0 and the slice (*, *, 1, 0, *) contains a copy of schedule1. The rest of the iteration space is padded with empty elements.

+

After fusing

+

After fuse((schedule0, schedule1), partial=2) (White elements represent C += A @ B; purple elements are E += C @ D; blue elements are padding.)

+

Safety

+

Is schedule safe? Again, recall that for each value of i and j, Accera guarantees that all of the corresponding work in schedule0 (C[i, j] += A[i, k0] * B[k0, j] for all values of k0) is executed before any of the corresponding work in schedule1 (E[i, j1] += C[i, j] * D[j, j1] for all values of j1). In other words, each element of C is entirely computed before it is used. This confirms that the schedule is safe.

+

Initially, the fused schedule is equivalent to the following Python code: +

for i in range(4):
+    for j in range(6):
+        for f in range(2):
+            for k0 in range(5):
+                for j1 in range(10):
+                    if f == 0 and j1 == 0:
+                        # f = 0, create C[i, j]
+                        C[i, j] += A[i, k0] * B[k0, j]
+                    if f == 1 and k0 == 0:
+                        # f = 1, use C[i, j]
+                        E[i, j1] += C[i, j] * D[j, j1]
+

+

The simplified loops after unswitching:

+
for i in range(4):
+    for j in range(6):
+        # f = 0, create C[i, j]
+        for k0 in range(5):
+            C[i, j] += A[i, k0] * B[k0, j]
+        # f = 1, use C[i, j]
+        for j1 in range(10):
+            E[i, j1] += C[i, j] * D[j, j1]
+
+

The advantage of this schedule is that only one element of C is active at any time in the computation. Accera can reuse the same memory location to store the active element of C instead of storing all of C in physical memory.

+

Tiling

+

As a further optimization, we can compute a 2×3 block of C, do all the work that uses this block, and then move on to the next block: +

ii, jj = schedule.tile({
+    i: 2,
+    j: 3
+})
+schedule.reorder(i, j, f, ii, jj, k0, j1)
+
+This schedule is equivalent to the following Python code: +
for i in range(0, 4, 2):
+    for j in range(0, 6, 3):
+        # f = 0
+        for ii in range(2):
+            for jj in range(3):
+                for k0 in range(5):
+                    C[i+ii, j+jj] += A[i+ii, k0] * B[k0, j+jj]
+        # f = 1
+        for ii in range(2):
+            for jj in range(3):
+                for j1 in range(10):
+                    E[i+ii, j1] += C[i+ii, j+jj] * D[j+jj, j1]
+

+ + +
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Manual/05 Targets/index.html b/Manual/05 Targets/index.html new file mode 100644 index 00000000..dcb2d7dc --- /dev/null +++ b/Manual/05 Targets/index.html @@ -0,0 +1,2209 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + 05 Targets - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Section 5: Targets

+

Accera is a cross compiler, which means that it can generate executable code for different target platforms. A target is described using the Target class. Accera already supports many different targets, for example: +

import accera as acc
+
+corei9 = acc.Target(acc.Target.Model.INTEL_7960X, num_threads=44)
+
+or

+

v100 = acc.Target(acc.Target.Model.NVIDIA_V100)
+
+or

+
corei7 = acc.Target(known_name="Intel 7700T")
+
+

To query the list of known names: +

dir(acc.Target.Model)
+

+

We can also define custom targets: +

my_target = acc.Target(name="Custom processor", category=acc.Target.Category.CPU, architecture=acc.Target.Architecture.X86_64, family="Broadwell", extensions=["MMX", "SSE", "SSE2", "SSE3", "SSSE3", "SSE4", "SSE4.1", "SSE4.2", "AVX", "AVX2", "FMA3"], num_cores=22, num_threads=44, frequency_GHz=3.2, turbo_frequency_GHz=3.8, cache_sizes=[32, 256, 56320], cache_lines=[64, 64, 64])
+

+

One benefit of targets is that they provide a standard way of accessing useful constants. For example, we may want to split an iteration space dimension by the number of elements that fit in a vector register. +

schedule.split(i, size=corei9.vector_bytes // 4)  # number of float32 elements per vector register
+
+We may tile the iteration space for GPU targets based on input shapes and available resources like shared memory. If you are not sure of what to use, try starting with the default: +
# find block_x and block_y in powers of two, such that block_x*block_y = v100.default_block_size.
+import math
+block_x = int(pow(2, math.log2(v100.default_block_size) // 2))
+block_y = v100.default_block_size // block_x
+ii, jj = schedule.tile({
+    i: block_x,
+    j: block_y
+})
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Manual/06 Plans - Caching/index.html b/Manual/06 Plans - Caching/index.html new file mode 100644 index 00000000..59917322 --- /dev/null +++ b/Manual/06 Plans - Caching/index.html @@ -0,0 +1,2401 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + 06 Plans Caching - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Section 6: Plans - Caching

+

In the previous sections, we defined the logic and then scheduled its iterations. Now, let's move on to completing the implementation with target-specific options.

+

First, we create a plan from the schedule: +

plan = schedule.create_plan()
+
+The Accera programming model allows us to create multiple plans from a single schedule. More importantly, we can modify individual plans without changing the schedule. We can manually specify the target platform by calling create_plan that takes a target argument. The default value of this target argument is acc.Target.HOST, which refers to the current host computer.
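For example (a sketch reusing the corei9 target defined in Section 5), we can create one plan per target from the same schedule:
host_plan = schedule.create_plan()           # defaults to acc.Target.HOST
corei9_plan = schedule.create_plan(corei9)   # plan specialized for the corei9 target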

+

In this section, we discuss how to add data caching strategies to a plan.

+

Not yet implemented: Data caching strategies are not supported when one or more of the Array's dimension sizes are specified at runtime.

+

Key slices

+

Recall that a slice is a set of iteration space elements that match a coordinate template with wildcards, such as (1, *, 3). A key-slice is a slice with only right-aligned wildcards, such as (1, 2, *) and (3, *, *). The level of a key-slice is determined by the number of wildcards in its definition. For example, (1, 2, *) is a level 1 key-slice and (3, *, *) is a level 2 key-slice.

+

Note that the key-slices change when the dimensions of the iteration space are reordered. However, it is always true that the entire d-dimensional iteration space is a level d key-slice and each individual element is a level zero key-slice. Each iteration therefore belongs to one key-slice from each level, zero through d, for a total of d+1 different key-slices. When the schedule is executed, the key-slices containing the current iteration are called the current key-slices.

+

In the Accera programming model, key-slices are significant because they partition the iteration space into sets of consecutive iterations. Therefore, they can describe the phases of computation at different levels of granularity. The term key-slice suggests using them to key (trigger) different actions. Specifically, each time the current level-l key slice changes, we use this event to trigger a cache update.

+

As mentioned above, a key-slice can be identified by its level. Another way to specify a key-slice is to take advantage of the fact that the iteration space dimensions are named and ordered. To specify a key-slice for a dimension, replace it and all subsequent dimensions with wildcard symbols. For example, if the names of the iteration space dimensions are (i, j, k), then a key-slice that corresponds to the dimension j is one of (0, *, *), (1, *, *), etc. Both ways of specifying a key-slice are useful, and Accera uses them interchangeably.

+

Active elements and active blocks

+

A loop nest operates on the data that is stored in arrays. Each key-slice can access a subset of the array elements, which we call the active elements that correspond to that specific key-slice. Since the current iteration belongs to key-slices at different levels, we need to define corresponding sets of active elements at different levels.

+

More precisely, array A elements that are read from or written to by the iterations of the current level l key-slice are called the level l active elements of A. This set of elements does not necessarily take the shape of a block. Therefore, the level l active block of A can be defined as the smallest block of elements that contains all of the level l active elements in A. Accera uses active blocks to define caching strategies.

+

Just like we can specify a key-slice using a dimension, we can also refer to the active block that corresponds to a specific dimension. For example, if the names of the iteration space dimensions are (i, j, k) and the current iteration is one of the iterations for which i=3, then the active block in A that corresponds to dimension j is the block that includes all the elements touched by the key-slice (3, *, *).

+

Caches

+

An Accera cache is a local copy of an active block. A cache is contiguous in memory and its memory layout may differ from the layout of the original array. The loop nest iterations operate on the cache elements instead of the original array elements.

+

The contents of the active block are copied into the cache at the start of the corresponding key-slice. If the array is mutable (namely, an input/output array or a temporary array), the cache contents are copied back into the original array at the end of the key-slice.

+

Caching by level

+

To define a cache for a given array, all we need to do is specify the desired level. For example: +

AA = plan.cache(A, level=2)
+
+The return value AA is a handle that can be used to refer to the cache in subsequent operations. We can choose the cache layout, just as we did when we defined the original array. +
AA = plan.cache(A, level=2, layout=acc.Array.Layout.FIRST_MAJOR)
+

+

Caching by dimension

+

As mentioned above, we can specify an active block using a dimension. We use this to define a cache as follows: +

AA = plan.cache(A, index=j)
+

+

Caching by element budget

+

Note that the current active blocks of an array are nested, and their sizes are monotonic (nondecreasing) in their level. Therefore, we can also select the largest active block that does not exceed a certain number of elements: +

AA = plan.cache(A, max_elements=1024)
+

+

Thrifty caching

+

By default, Accera caching strategies are thrifty in the sense that the data is physically copied into an allocated cache only if the cached data somehow differs from the original active block. Therefore, if the original active block is already in the correct memory layout and resides contiguously in memory, Accera skips the caching steps and uses the original array instead. Note that a physical copy is still created on a GPU if the cache is supposed to be allocated in a different type of memory than the original array (e.g., the array is in global memory, but the cache is supposed to be in shared memory).

+

For example, assume that A is a two-dimensional array and its active block at the chosen level is always one of its rows. If A is row-major, the rows are already stored contiguously. Additionally, the data in the active block and the data to be copied to cache are identical: both are contiguous and share the same memory layout. In this case, there is no benefit in using cache over the original array. The thrifty caching strategy will skip the caching steps and use the original array instead.

+

On the other hand, if A is column-major, its rows are not stored contiguously. In this case, copying the active row into a contiguous temporary location could be computationally advantageous. Therefore, the thrifty caching strategy would create the cache and populate it with the data.

+

Thrifty caching can be turned off using the optional argument thrifty=False. If turned off, a physical copy is always created.
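For example, to force a physical copy even when the active block is already contiguous and correctly laid out:
AA = plan.cache(A, level=2, thrifty=False)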

+

Hierarchical caching

+

Caches can be composed hierarchically. Namely, a high-level key-slice can trigger a copy from the original array into a big cache, and a lower level key-slice can be used to trigger a copy from the big cache into a smaller cache.

+

For example, +

AA = plan.cache(A, level=4)
+AAA = plan.cache(AA, level=2)
+

+

Multicaching

+

While caches are defined with a key-slice level, a higher-level key slice trigger_level can be specified as the trigger key-slice for copying multiple successive active blocks of elements to a local copy. These copied active blocks have their layouts defined as usual, and only the trigger level for copying them has been changed. Since active blocks are not mutually exclusive, this can result in the same element being copied into multiple locations as separate caches. Therefore, a trigger_level may only be specified on an INPUT or CONST array as Accera does not support multicache write coherence.

+

For example, +

AA = plan.cache(A, level=2, trigger_level=4)
+

+

Mapping caches to specific types of memory

+

Some target platforms have different types of memory that can hold Accera caches. In the case of a GPU target, caches can be located in global or shared memory. To explicitly choose the location of the cache, we write: +

AA = plan.cache(A, level=4, location=v100.MemorySpace.SHARED)
+

+

Double buffering

+

Caches can double-buffer data by loading the next active block's data into a temporary buffer while the current active block is being used, and then moving that data into the cache buffer once the current active block is done. If the cache trigger level is the highest level in the loop nest, double buffering has no effect, because it depends on there being another loop outside of the cache trigger loop. For shared-memory caches on a GPU, this temporary buffer is automatically allocated in private memory. Since the next iteration's data is loaded into a temporary buffer while the current iteration's data is in the cache buffer, any overlap between these active blocks would result in a write-coherency issue similar to the one that occurs with multicaching. Because of this, double_buffer may only be specified on an INPUT or CONST array, as Accera does not perform multicache write coherence. +

AA = plan.cache(A, level=3, double_buffer=True)
+

+

Full schedule with equivalent pseudo-code: +

...
+M, N, K = 1024, 1024, 1024
+m_tile, n_tile, k_tile = 32, 64, 128
+nest = acc.Nest(shape=(M, N, K))
+i, j, k = nest.get_indices()
+@nest.iteration_logic
+def _():
+    C[i,j] += A[i,k] * B[k,j]
+schedule = nest.create_schedule()
+ii, jj, kk = schedule.tile({
+    i: m_tile,
+    j: n_tile,
+    k: k_tile
+})
+schedule.reorder(i, j, k, ii, jj, kk)
+
+plan = schedule.create_plan()
+plan.cache(A, index=ii, double_buffer=True)
+...
+
+equivalent to: +
for i in range(0, M, m_tile):
+    for j in range(0, N, n_tile):
+        for ii_cache in range(0, m_tile):
+            for kk_cache in range(0, k_tile):
+                cache_A[ii_cache, kk_cache] = A[i+ii_cache, kk_cache]
+        for k in range(0, K-k_tile, k_tile): # Note: this loop doesn't run for the final K tile
+            for ii_cache in range(0, m_tile):
+                for kk_cache in range(0, k_tile):
+                    temp_A[ii_cache, kk_cache] = A[i+ii_cache, (k + k_tile) + kk_cache]
+            for ii in range(0, m_tile):
+                for jj in range(0, n_tile):
+                    for kk in range(0, k_tile):
+                        C[i+ii, j+jj] += cache_A[ii, kk] * B[k+kk, j+jj]
+            for ii_cache in range(0, m_tile):
+                for kk_cache in range(0, k_tile):
+                    cache_A[ii_cache, kk_cache] = temp_A[ii_cache, kk_cache]
+        # handle the final K tile, whose data is already in cache_A
+        for ii in range(0, m_tile):
+            for jj in range(0, n_tile):
+                for kk in range(0, k_tile):
+                    C[i+ii, j+jj] += cache_A[ii, kk] * B[(K - k_tile) + kk, j+jj]
+

+

Caching strategy

+

On GPUs, the mapping between threads and data can be controlled by specifying the strategy option. Currently, we support the BLOCKED and STRIPED access patterns, which are explained in detail at accera.CacheStrategy. +The choice of pattern depends on the hardware architecture and the algorithm the cache is used for, since different access patterns incur different performance overhead.

+
AA = plan.cache(A, level=3, double_buffer=True, strategy=acc.CacheStrategy.BLOCKED)
+
+

The above example will create a cache where each thread copies a contiguous chunk (block) of elements based on their thread index.
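As a rough illustration in plain Python (not Accera API; the real stripe granularity is target-dependent), the two strategies assign elements to threads roughly as follows:
num_threads, num_elements = 4, 16
chunk = num_elements // num_threads

# BLOCKED: each thread copies one contiguous chunk of elements
blocked = {t: list(range(t * chunk, (t + 1) * chunk)) for t in range(num_threads)}
# thread 0 -> [0, 1, 2, 3], thread 1 -> [4, 5, 6, 7], ...

# STRIPED: threads copy interleaved elements, cycling through the thread indices
striped = {t: list(range(t, num_elements, num_threads)) for t in range(num_threads)}
# thread 0 -> [0, 4, 8, 12], thread 1 -> [1, 5, 9, 13], ...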

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Manual/07 Plans - Operations and Optimizations/index.html b/Manual/07 Plans - Operations and Optimizations/index.html new file mode 100644 index 00000000..7c000952 --- /dev/null +++ b/Manual/07 Plans - Operations and Optimizations/index.html @@ -0,0 +1,2562 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + 07 Plans Operations and Optimizations - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Section 7: Plans - Operations and Optimizations

+

We can control target-specific operations and optimizations using a plan. Examples include instruction pipelining, applying SIMD vector instructions, and so on.

+

unroll

+

By default, each dimension of the iteration space is implemented as a for-loop. The unroll instruction marks a dimension for unrolling rather than looping. Imagine the following nest that multiplies the entries of an array by a constant: +

import accera as acc
+
+my_target = acc.Target(category=acc.Target.Category.CPU)
+
+nest = acc.Nest(shape=(3,5))
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    A[i, j] *= 2.0
+
+plan = nest.create_plan(my_target)
+
+If we build plan as is, the resulting implementation would be equivalent to the following Python code: +
for i in range(3):
+    for j in range(5):
+        A[i, j] *= 2.0
+
+If we add the instruction plan.unroll(index=j), the resulting implementation becomes equivalent to: +
for i in range(3):
+    A[i, 0] *= 2.0
+    A[i, 1] *= 2.0
+    A[i, 2] *= 2.0
+    A[i, 3] *= 2.0
+    A[i, 4] *= 2.0
+
+If, instead of unrolling j, we add the instruction plan.unroll(index=i), the resulting implementation becomes equivalent to: +
for j in range(5):
+    A[0, j] *= 2.0
+for j in range(5):
+    A[1, j] *= 2.0
+for j in range(5):
+    A[2, j] *= 2.0
+
+And, of course, we can also unroll both dimensions, removing for-loops completely.
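For instance (a small sketch continuing the example above), unrolling both indices removes both loops and leaves fifteen straight-line statements:
plan.unroll(index=i)
plan.unroll(index=j)
# equivalent to writing out A[0, 0] *= 2.0, A[0, 1] *= 2.0, ..., A[2, 4] *= 2.0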

+

vectorize

+

Modern target platforms support SIMD vector instructions. These instructions perform the same operation on an entire vector of elements, all at once. By default, each dimension of an iteration space becomes a for-loop. The vectorize instruction instead labels a dimension for vectorized execution, rather than for-looping.

+

For example, assume that a host supports 256-bit vector instructions, indicating that its vector instructions operate on eight floating-point elements at once. Also, consider that we already have arrays A, B, and C, and we write the following code: +

nest = acc.Nest(shape=(64,))
+i = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    C[i] = A[i] * B[i]
+
+schedule = nest.create_schedule()
+ii = schedule.split(i, 8)
+
+plan = schedule.create_plan()
+plan.vectorize(index=ii)
+
+The dimension marked for the vectorization is of size 8, which is a supported vector size on the specific target platform. Therefore, the resulting binary will contain something like: +
  00000001400010B0: C5 FC 10 0C 11     vmovups     ymm1,ymmword ptr [rcx+rdx]
+  00000001400010B5: C5 F4 59 0A        vmulps      ymm1,ymm1,ymmword ptr [rdx]
+  00000001400010B9: C4 C1 7C 11 0C 10  vmovups     ymmword ptr [r8+rdx],ymm1
+  00000001400010BF: 48 8D 52 20        lea         rdx,[rdx+20h]
+  00000001400010C3: 48 83 E8 01        sub         rax,1
+  00000001400010C7: 75 E7              jne         00000001400010B0
+
+Note how the multiplication instruction vmulps and the memory move instruction vmovups deal with eight 32-bit floating-point values at a time.

+

Different targets support different vector instructions having different vector sizes. The following table includes iteration logic that vectorizes correctly on most targets with vectorization support, such as Intel Haswell, Broadwell or newer, and ARM v7/A32. Other examples of iteration logic may or may not vectorize correctly. Variables prefixed with v are vector types, and those prefixed with s are scalar types.

| Vector pseudocode | Equivalent to | Supported types |
|---|---|---|
| v1 += s0 * v0 | for i in range(vector_size): v1[i] += s0 * v0[i] | float32 |
| v1 += v0 * s0 | for i in range(vector_size): v1[i] += v0[i] * s0 | float32 |
| v1 += v0 / s0 | for i in range(vector_size): v1[i] += v0[i] / s0 | float32 |
| v1 -= s0 * v0 | for i in range(vector_size): v1[i] -= s0 * v0[i] | float32 |
| v1 -= v0 * s0 | for i in range(vector_size): v1[i] -= v0[i] * s0 | float32 |
| v1 -= v0 / s0 | for i in range(vector_size): v1[i] -= v0[i] / s0 | float32 |
| v2 += v0 * v1 | for i in range(vector_size): v2[i] += v0[i] * v1[i] | float32 |
| vector inner (dot) product: s0 += dot(v0, v1) | for i in range(vector_size): s0 += v0[i] * v1[i] | float32 |
| v2 = v0 + v1 | for i in range(vector_size): v2[i] = v0[i] + v1[i] | int8/16/32/64, float32 |
| v2 = v0 - v1 | for i in range(vector_size): v2[i] = v0[i] - v1[i] | int8/16/32/64, float32 |
| v2 = v0 * v1 | for i in range(vector_size): v2[i] = v0[i] * v1[i] | int8/16/32/64, float32 |
| v2 = v0 / v1 | for i in range(vector_size): v2[i] = v0[i] / v1[i] | float32 |
| v1 = abs(v0) | for i in range(vector_size): v1[i] = abs(v0[i]) | int8/16/32/64, float32 |
| v2 = (v0 == v1) | for i in range(vector_size): v2[i] = 0XF..F if v0[i] == v1[i] else 0 | int8/16/32/64, float32 |
| v2 = (v0 > v1) | for i in range(vector_size): v2[i] = 0XF..F if v0[i] > v1[i] else 0 | int8/16/32/64, float32 |
| v2 = (v0 >= v1) | for i in range(vector_size): v2[i] = 0XF..F if v0[i] >= v1[i] else 0 | int8/16/32/64, float32 |
| v2 = (v0 < v1) | for i in range(vector_size): v2[i] = 0XF..F if v0[i] < v1[i] else 0 | int8/16/32/64, float32 |
| v2 = (v0 <= v1) | for i in range(vector_size): v2[i] = 0XF..F if v0[i] <= v1[i] else 0 | int8/16/32/64, float32 |
| v1 = v0 << s0 | for i in range(vector_size): v1[i] = v0[i] << s0 | int16/32/64, float32 |
| v1 = v0 >> s0 | for i in range(vector_size): v1[i] = v0[i] >> s0 | int16/32/64, float32 |
| s0 = sum(v0) | for i in range(vector_size): s0 += v0[i] | int8/16/32/64, float32 |
| s0 = max(v0 + v1) | for i in range(vector_size): s0 = max(v0[i] + v1[i], s0) | int8/16/32/64, float32 |
| s0 = max(v0 - v1) | for i in range(vector_size): s0 = max(v0[i] - v1[i], s0) | int8/16/32/64, float32 |
+

Additionally, Accera can perform vectorized load and store operations to/from vector registers and memory if the memory locations are contiguous.

+

To vectorize dimension i, the number of active elements that corresponds to dimension i must exactly match the vector instruction width of the target processor. For example, if the target processor has vector instructions that operate on either 4 or 8 floating-point elements at once, then the number of active elements can either be 4 or 8. Additionally, those active elements must occupy adjacent memory locations (they cannot be spread out).
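As a minimal sketch (assuming the same 64-element nest as above and a target whose vector units also support 4-element operations), a split of size 4 can be vectorized in the same way:

schedule = nest.create_schedule()
ii = schedule.split(i, 4)     # 4 active, contiguous elements, matching a 128-bit vector width
plan = schedule.create_plan()
plan.vectorize(index=ii)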

+

tensorize

+

Some hardware also has specialized instructions for performing matrix multiplications. These instructions operate on specific matrix dimensions and data types. The tensorization instructions take tiles of the A, B, and C matrices and compute the C = A * B + C operation.

+

The tensorize operation takes 3 indices:

+
plan.tensorize(indices=(i,j,k))
+
+

Tensorization is limited and is only valid on loop structures of the form

+
for i in range(M):
+    for k in range(K):
+        for j in range(N):
+            C[i, j] += A[i, k] * B[k, j]
+
+

where the target provides MxNxK tensorization hardware support for the element data types of A, B, and C.
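As a hedged sketch (assuming a matrix multiplication nest with indices i, j, k as in the earlier examples, and a target with tensorization support), the indices passed to tensorize are typically inner indices obtained by splitting the loops down to a tile shape the hardware supports. The tile sizes 16, 16, and 4 below are placeholders, and depending on the target an mma_shape argument may also be required (see the Plan.tensorize reference):

schedule = nest.create_schedule()
ii = schedule.split(i, 16)   # assumed M tile size
kk = schedule.split(k, 4)    # assumed K tile size
jj = schedule.split(j, 16)   # assumed N tile size

# Outer loops first, then the tile loops in the required i, k, j order
schedule.reorder(i, k, j, ii, kk, jj)

plan = schedule.create_plan(target)   # target is assumed to support tensorization
plan.tensorize(indices=(ii, jj, kk))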

+

Convenience syntax: kernelize

+

The kernelize instruction is a convenience syntax that does not provide any unique functionality. Specifically, kernelize is equivalent to a sequence of unroll instructions, followed by an optional vectorize instruction.

+

A typical Accera design pattern is to first break a loop-nest into tiles and then apply an optimized kernel to each tile. For example, imagine that the loop nest multiplies two 256×256 matrices and the kernel is a highly optimized procedure for multiplying 4×4 matrices. Accera will introduce different ways to write highly optimized kernels in the future. However, currently, it only supports automatic kernelization using the kernelize instruction. As mentioned above, kernelize is shorthand for unrolling and vectorizing. These instructions structure the code in a way that makes it easy for downstream compiler heuristics to automatically generate kernels.

+

Consider, once again, the matrix multiplication example we discussed previously in Section 2. +Assume that we declare the schedule and reorder as follows:

+

schedule = nest.create_schedule()
+schedule.reorder(i, k, j)
+
+Notice that i, k, j are the last three dimensions in the iteration space and the resulting implementation becomes equivalent to:

+
for i in range(M):
+    for k in range(S):
+        for j in range(N):
+            C[i, j] += A[i, k] * B[k, j]
+
+

The instruction: +

plan.kernelize(unroll_indices=(i, k), vectorize_indices=j)
+
+is just shorthand for +
plan.unroll(i)
+plan.unroll(k)
+plan.vectorize(j)
+
+Applying this sequence of instructions allows the compiler to automatically create an optimized kernel from loops i, k, j.

+

For simplicity, assume that the matrix sizes defined by M, N, and S are 3, 4, and 2 respectively.

+

After applying kernelize, the schedule is equivalent to the following Python code: +

C[0,0:4] += A[0,0] * B[0,0:4] # vectorized
+C[0,0:4] += A[0,1] * B[1,0:4] # vectorized
+C[1,0:4] += A[1,0] * B[0,0:4] # vectorized
+C[1,0:4] += A[1,1] * B[1,0:4] # vectorized
+C[2,0:4] += A[2,0] * B[0,0:4] # vectorized
+C[2,0:4] += A[2,1] * B[1,0:4] # vectorized
+

+

This would result in the following vectorized instructions on an Intel Haswell CPU: +

  0000000000000200: C4 C1 78 10 00     vmovups     xmm0,xmmword ptr [r8]
+  0000000000000205: C4 E2 79 18 09     vbroadcastss xmm1,dword ptr [rcx]
+  000000000000020A: C5 F8 10 12        vmovups     xmm2,xmmword ptr [rdx]
+  000000000000020E: C4 E2 69 A8 C8     vfmadd213ps xmm1,xmm2,xmm0
+  0000000000000213: C5 F8 10 5A 10     vmovups     xmm3,xmmword ptr [rdx+10h]
+  0000000000000218: C4 E2 79 18 61 04  vbroadcastss xmm4,dword ptr [rcx+4]
+  000000000000021E: C4 E2 61 A8 E1     vfmadd213ps xmm4,xmm3,xmm1
+  0000000000000223: C4 E2 79 18 49 08  vbroadcastss xmm1,dword ptr [rcx+8]
+  0000000000000229: C4 E2 69 A8 C8     vfmadd213ps xmm1,xmm2,xmm0
+  000000000000022E: C4 E2 79 18 69 0C  vbroadcastss xmm5,dword ptr [rcx+0Ch]
+  0000000000000234: C4 E2 61 A8 E9     vfmadd213ps xmm5,xmm3,xmm1
+  0000000000000239: C4 E2 79 18 49 10  vbroadcastss xmm1,dword ptr [rcx+10h]
+  000000000000023F: C4 E2 69 A8 C8     vfmadd213ps xmm1,xmm2,xmm0
+  0000000000000244: C4 E2 79 18 41 14  vbroadcastss xmm0,dword ptr [rcx+14h]
+  000000000000024A: C4 E2 61 A8 C1     vfmadd213ps xmm0,xmm3,xmm1
+  000000000000024F: C4 C1 58 58 09     vaddps      xmm1,xmm4,xmmword ptr [r9]
+  0000000000000254: C4 C1 50 58 51 10  vaddps      xmm2,xmm5,xmmword ptr [r9+10h]
+  000000000000025A: C4 C1 78 58 41 20  vaddps      xmm0,xmm0,xmmword ptr [r9+20h]
+

+

parallelize

+

The parallelize instruction executes one or more loops in parallel across multiple cores.

+

xeonPlat = acc.Target("Intel 9221", num_threads=16)
+plan = schedule.create_plan(xeonPlat)
+plan.parallelize(indices=(i,j,k))
+
+Specifying multiple indices is equivalent to using the collapse argument in OpenMP. Therefore, the indices must be contiguous in the iteration space dimension order.
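For example, with the default iteration space order i, j, k, parallelizing only the two outermost loops is valid because i and j are contiguous in that order:

plan.parallelize(indices=(i, j))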

+

Static scheduling policy

+

The static scheduling policy is selected by setting the argument policy="static" in the call to parallelize. If n iterations are parallelized across c cores, static scheduling partitions the work into c fixed parts, some of size floor(n/c) and some of size ceil(n/c), and executes each part on a different core.
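For example, reusing the plan and indices from the example above:

plan.parallelize(indices=(i, j, k), policy="static")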

+

Dynamic scheduling policy

+

The dynamic scheduling policy is selected by setting the argument policy="dynamic" in the call to parallelize. Dynamic scheduling creates a single work queue that is shared across the different cores.
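For example:

plan.parallelize(indices=(i, j, k), policy="dynamic")  # each core takes its next chunk of work from the shared queue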

+

Specifying thread limit

+

Setting the argument max_threads to a positive integer places an upper bound on the number of threads used to distribute the workload.
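For example, to use at most four threads regardless of how many cores the target reports:

plan.parallelize(indices=(i, j, k), policy="dynamic", max_threads=4)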

+

Not yet implemented: Pinning to specific cores

+

The pin argument allows the parallel work to be pinned to specific cores.

+

bind

+

Some target platforms, such as GPUs, are specifically designed to execute nested loops. They can take an entire grid of work and schedule its execution on multiple cores. On a GPU, this grid is broken up into multiple blocks, where each block contains multiple threads. Block iterators and thread iterators are identified by special variables in the Target object. To take advantage of a target platform's ability to execute grids, we must bind dimensions of the iteration space with these special iterator variables.

+

For example, +

v100 = acc.Target("Tesla V100")
+plan.bind(mapping={
+        i: v100.GridUnit.BLOCK_X,
+        j: v100.GridUnit.THREAD_X,
+        k: v100.GridUnit.THREAD_Y
+    }
+)
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Manual/08 Deferred Layout of Constant Arrays/index.html b/Manual/08 Deferred Layout of Constant Arrays/index.html new file mode 100644 index 00000000..938eab3e --- /dev/null +++ b/Manual/08 Deferred Layout of Constant Arrays/index.html @@ -0,0 +1,2227 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + 08 Deferred Layout of Constant Arrays - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Section 8: Deferred layout of constant arrays

+

Let's revisit the memory layout of constant arrays. As explained in Section 1, the contents of constant arrays are known at compile-time, and these contents are immutable. Accera stores constant arrays in a non-standard memory layout optimized for a particular plan. In some cases, storing multiple copies of each array element may even prove advantageous (e.g., storing a matrix in row-major and column-major layouts).

+

Deferred layout based on a cache

+

Accera's cache strategy creates local copies of an array's active blocks. The constant array can be arranged based on the defined cache. Specifically, the array is stored by serializing the active blocks consecutively. If the caching strategy is thrifty=True, the active blocks are ready to use without copying the data.

+

To define an array layout based on a cache, the Accera DSL has to overcome a chicken-and-egg problem: on the one hand, arrays need to be defined before the nest logic; on the other hand, the array layout depends on a cache, which is defined only as part of a plan. Accera resolves this by splitting the array definition into two parts: we still define the constant array upfront, but we avoid committing to a specific layout.

import accera as acc
+import numpy as np
+
+matrix = np.random.rand(16, 16)
+A = acc.Array(role=acc.Role.CONST, data=matrix, layout=acc.Array.Layout.DEFERRED)
+
Now we define the nest logic, the schedule, and the plan. Suppose that we define a plan named plan and use it to create a cache of A at dimension i:
AA = plan.cache(A, i, layout=acc.Array.Layout.FIRST_MAJOR, thrifty=True)
+
+We can now use the cache AA to determine the layout of the original array A: +
A.deferred_layout(cache=AA)
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Manual/09 Parameters/index.html b/Manual/09 Parameters/index.html new file mode 100644 index 00000000..3fbb961e --- /dev/null +++ b/Manual/09 Parameters/index.html @@ -0,0 +1,2537 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + 09 Parameters - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Section 9: Parameters

+

Accera's parameters are placeholders that get replaced with concrete values when adding a function to a package. A parameter can be used in a Nest, a Schedule, or a Plan.

+

Parameterized nests

+

Recall that a Nest represents the loop-nest logic. We can parameterize the nest's shape and iteration logic. For example, consider the following parameterized version of matrix multiplication:

+

# Create parameters
+P0, P1, P2, P3 = acc.create_parameters()
+
+A = acc.Array(role=acc.Role.INPUT, shape=(P0, P2))
+B = acc.Array(role=acc.Role.INPUT, shape=(P2, P1))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(P0, P1))
+
+# Define a simple nest
+nest = acc.Nest(shape=(P0, P1, P2))
+i, j, k = nest.get_indices()
+
+# Define the loop nest logic and add it to the nest
+@nest.iteration_logic
+def _():
+    C[i, j] += P3 * A[i, k] * B[k, j]
+
+# create a package
+package = acc.Package()
+
+# Use the templated nest to add two different functions to the package
+package.add(nest, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1.0}, base_name="matmul_16_16_16_1")
+package.add(nest, args=(A, B, C), parameters={P0:32, P1:32, P2:32, P3:2.0}, base_name="matmul_32_32_32_2")
+
+In the above scenario, the shape of the nest is parameterized by (P0, P1, P2) and its iteration logic includes the parameter P3. The nest is used twice with different settings of these parameters to create two separate functions in the package.

+

Parameterized schedules and plans

+

Parameters can also appear in schedules and plans. For example, we can add the following code snippet:

+
P4, P5 = acc.create_parameters()
+
+# Create a parameterized schedule
+schedule = nest.create_schedule()
+ii = schedule.split(i, size=P4)
+
+# Create a parameterized plan
+plan = schedule.create_plan()
+plan.cache(A, level=P5)
+
+# Add another function to the package
+package.add(plan, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1.0, P4:4, P5:2}, base_name="alternative_matmul_16_16_16")
+
+

Supported operations

+

Accera's parameters support the basic arithmetic operations, as well as relational, bitwise, and intrinsic operations. For example, we can add the following code snippet instead:

+
fma_unit_count, vector_size, P5 = acc.create_parameters()
+
+# Create a parameterized schedule
+schedule = nest.create_schedule()
+ii = schedule.split(i, size=fma_unit_count * vector_size)
+iii = schedule.split(ii, size=vector_size)
+
+# Create a parameterized plan
+plan = schedule.create_plan()
+plan.cache(A, level=P5)
+
+# Add another function to the package
+package.add(plan, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1.0, fma_unit_count:4, vector_size:16, P5:2}, base_name="alternative_matmul_16_16_16")
+
+

The supported operations include the following:

+

Arithmetic operators

| Operation | Types | Description |
|---|---|---|
| a + b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the sum of parameters (or parameter and scalar) a and b |
| a - b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the difference between parameters (or parameter and scalar) a and b |
| a * b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the product of parameters (or parameter and scalar) a and b |
| a / b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the quotient of parameters (or parameter and scalar) a and b |
| a ** b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the b'th power of parameter a |
| a // b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the floor of the quotient of parameters (or parameter and scalar) a and b |
| a % b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the signed remainder after dividing parameter a by parameter or scalar b |
| -a | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns the additive inverse of parameter a |
+

Comparison Operations

| Operation | Types | Description |
|---|---|---|
| a == b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if parameter or scalar a equals parameter or scalar b, else False |
| a != b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if parameter or scalar a is not equal to parameter or scalar b, else False |
| a < b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if parameter or scalar a is strictly smaller than parameter or scalar b, else False |
| a <= b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if parameter or scalar a is smaller than or equal to parameter or scalar b, else False |
| a > b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if parameter or scalar a is strictly greater than parameter or scalar b, else False |
| a >= b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 | Returns True if parameter or scalar a is greater than or equal to parameter or scalar b, else False |
+

Bitwise operators

| Operation | Types | Description |
|---|---|---|
| a & b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns the bitwise AND of the bits in parameters (or parameter and scalar) a and b |
| a \| b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns the bitwise OR of the bits in parameters (or parameter and scalar) a and b |
| a ^ b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns the bitwise XOR of the bits in parameters (or parameter and scalar) a and b |
| ~a | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns the bitwise inverse of the bits in parameter a |
| a << b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns parameter a whose bitwise representation is shifted left by b bits |
| a >> b | acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 | Returns parameter a whose bitwise representation is shifted right by b bits |
+

Intrinsics

| Operation | Types | Description |
|---|---|---|
| acc.abs(a) | acc.ScalarType.float16/32/64 | Returns the absolute value of parameter a |
+

Tuple parameter values

+

Parameters can be used as placeholders for tuples, specifically for tuples of indices. For example, assume that we want to parameterize the order of the iteration space dimensions. We can then write: +

P6 = acc.create_parameters()
+schedule.reorder(order=P6)
+
+Later, we can set the value of P6 to the index tuple (j,k,i).
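For example, continuing with the parameterized nest from above, the index tuple is supplied together with the other parameter values when the function is added to the package (the base name below is arbitrary):

package.add(schedule, args=(A, B, C), parameters={P0: 16, P1: 16, P2: 16, P3: 1.0, P6: (j, k, i)}, base_name="matmul_16_16_16_kji")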

+

Create parameters from an entire parameter grid

+

Consider the parameterized nest defined above. Rather than setting a specific value for each parameter, imagine that we have a set of different values for each parameter. For example, consider that we want P0 to have a value in set {8, 16}, P1 in {16, 32}, P2 to be always 16, and P3 in {1,2}. We can define the parameter grid with this data, which lists all the valid parameter combinations. In our case, this grid includes the following parameter settings: +

{P0:8, P1:16, P2:16, P3:1.0}
+{P0:8, P1:16, P2:16, P3:2.0}
+{P0:8, P1:32, P2:16, P3:1.0}
+{P0:8, P1:32, P2:16, P3:2.0}
+{P0:16, P1:16, P2:16, P3:1.0}
+{P0:16, P1:16, P2:16, P3:2.0}
+{P0:16, P1:32, P2:16, P3:1.0}
+{P0:16, P1:32, P2:16, P3:2.0}
+

+

Accera provides an easy way to add all the functions that correspond to the parameter grid at once: +

parameters = create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]})
+package.add(nest, args=(A, B, C), base_name="matmul", parameters=parameters)
+
+In this case, package.add generates a function eight times, once for each parameter combination in the grid. Other than nest, package.add can alternatively accept a Schedule (if we are performing schedule transformations), or a Plan (if we are setting target-specific options). All eight functions share the same base name. However, Accera automatically adds a unique suffix to each function name to prevent duplication. This pattern allows optional filtering by inspecting the generated parameter values list before calling package.add.
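As a sketch of that filtering pattern (assuming, as described for package.add, that create_parameter_grid returns a list of parameter-to-value dictionaries, and using an arbitrary filter condition for illustration):

parameters = create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]})

# Keep only the combinations we actually want to build, then add them in one call
parameters = [p for p in parameters if p[P0] <= p[P1]]
package.add(nest, args=(A, B, C), base_name="matmul", parameters=parameters)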

+

You can define a lambda or function to filter out combinations from the parameter grid. The arguments to the filter are the values of a parameter combination, and it should return True if the combination should be included, and False otherwise: +

parameters = create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, filter_func=lambda p0, p1, p2, p3: p2 < p1 and 4 * (p0 * p3 + p1 * p2 + p1 * p3 + p2 * p3) / 1024 < 256)
+

+

To limit the size of the parameter grid (and therefore the number of functions generated) to at most 5: +

parameters = create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, sample=5)
+

+

If the parameter is a loop order which is a list or tuple of indices, create_parameter_grid can generate all the permutations of loop order. Furthermore, you can pass in a filter function to filter out invalid loop orders: +

parameters = create_parameter_grid({P0:(i, j, k, ii, jj, kk)}, filter_func = lambda *p : schedule.is_valid_loop_order(p[0][0]))
+

+

Schedule.is_valid_loop_order() is a pre-defined filter function that determines if a given loop order is valid for that schedule.

+

Note that the order of the list or tuple of indices provided to create_parameter_grid does not matter.

+

To filter parameters with more complicated logic, you can define your own filter function that wraps Schedule.is_valid_loop_order():

+
def my_filter(parameters_choice):
+    P1, P2, P3, P4, P5, loop_order = parameters_choice
+
+    return P1 > P2 \
+        and P3 > P4 \
+        and P1 * P5 < P3 \
+        and P2 * P5 < P4 \
+        and schedule.is_valid_loop_order(loop_order)
+
+ parameters = acc.create_parameter_grid({
+        P1: [64, 128, 256],
+        P2: [32, 128], 
+        P3: [16, 32, 128],
+        P4: [8, 64],
+        P5: [4],
+        loop_order: (i, j, k, ii, jj, kk)
+    }, my_filter)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Manual/10 Packages/index.html b/Manual/10 Packages/index.html new file mode 100644 index 00000000..dcb66cfa --- /dev/null +++ b/Manual/10 Packages/index.html @@ -0,0 +1,2275 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + 10 Packages - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Section 10: Building Packages

+

The Package class represents a collection of Accera-generated functions. Whenever a package is built, it creates a stand-alone function library that other pieces of software can use. Currently, Accera supports two package formats: HAT and MLIR.

+

HAT package format

+

HAT "Header Annotated with TOML" is a format for packaging compiled libraries in the C programming language. HAT implies that a standard C header is styled with useful metadata in the TOML markup language.

+

Consider a nest that holds some loop-nest logic. To build a HAT package containing a function with this logic for the Windows operating system, we write the following lines of code: +

package = acc.Package()
+package.add(nest, args=(A, B), base_name="myFunc")
+package.build(format=acc.Package.Format.HAT_DYNAMIC, name="myPackage", platform=acc.Package.Platform.WINDOWS)
+

+

The result is two files: myPackage.hat and myPackage.dll. The output directory defaults to the current working directory. We can change the output directory by setting output_dir to a relative or absolute path:

+
package.build(format=acc.Package.Format.HAT_DYNAMIC, name="myPackage", platform=acc.Package.Platform.WINDOWS, output_dir="hat_packages")
+
+

MLIR package format

+

MLIR format is used for debugging the multiple stages of MLIR lowering, from the Accera DSL all the way to runnable code. +

package.build(format=acc.Package.Format.MLIR, name="myPackage")
+

+

Function names in packages

+

We can specify the base name of a function when it is added to a package. The full function name is the base name followed by an automatically generated unique identifier. For example, if the base name is "myFunc" then the function name could be "myFunc_8f24bef5". If no base name is defined, the automatically-generated unique identifier becomes the function name.

+

The unique identifier ensures that no two functions share the same name. However, invoking the function from the client code becomes cumbersome because the function name changes each time the Accera package is updated and rebuilt. Therefore, the HAT file includes the client code to call the function without the unique identifier. Concretely, if the function signature in C is: +

void myFunc_8f24bef5(const float* A, float* B);
+
+then the HAT file also contains the line: +
void (*myFunc)(const float* A, float* B) = myFunc_8f24bef5;
+
+The above code makes the abbreviated name myFunc an alias of the full function name myFunc_8f24bef5. If multiple functions share the same base name, the first function in the HAT file gets the alias.

+

Debug mode

+

A package can be built with mode=acc.Package.Mode.DEBUG. Doing so creates a special version of each function that validates its own correctness every time the function is called. From the outside, a debugging package looks identical to a standard package. However, each of its functions actually contains two different implementations: the Accera implementation (with all of the fancy scheduling and planning) and the trivial default implementation (without any scheduling or planning). When called, the function runs both implementations and asserts that their outputs are within the predefined tolerance. If the outputs don't match, the function prints error messages to stderr.

package.build(format=acc.Package.Format.HAT_DYNAMIC, name="myPackage", mode=acc.Package.Mode.DEBUG, tolerance=1.0e-6)
+

+

Not yet implemented: Debug mode is not supported for GPU targets.

+

Adding descriptions

+

Accera allows us to specify some standard descriptive fields in a package: +

package.add_description(version="1.0", license="https://mit-license.org/", author="Microsoft Research")
+
+Additionally, we can add arbitrary metadata to the package description as follows: +
package.add_description(other={"title": "My Package Title", "source": "https://github.com/", "citations": ["https://arxiv.org/2021.12345/", "https://arxiv.org/2021.56789/"]})
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Manual/index.html b/Manual/index.html new file mode 100644 index 00000000..5191a854 --- /dev/null +++ b/Manual/index.html @@ -0,0 +1,2192 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Index - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+ +
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/accera/index.html b/Reference/accera/index.html new file mode 100644 index 00000000..d9359e62 --- /dev/null +++ b/Reference/accera/index.html @@ -0,0 +1,2320 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Accera - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

Module functions

+ +

Top level enumerations

+ +

Classes

+

class accera.Array

+

A multidimensional array of scalar elements.

+

Constructors

+
    +
  • Array (role[, data, element_type, layout, offset, shape])
  • +
+

Enumerations

+ +

Methods

+ +
+

class accera.Cache

+

A local copy of an Array block.

+
+

class accera.Index

+

An index representing one of the loops in a Nest or one of the iteration-space dimensions of a Schedule or a Plan.

+
+

class accera.Nest

+

The logic of a loop nest.

+

Constructors

+ +

Methods

+ +
+

class accera.Package

+

Represents a collection of functions that can be built and emitted for use in client code.

+

Constructors

+ +

Enumerations

+ +

Methods

+
    +
  • add_description ([author, license, other, version])
  • +
  • add (args, source[, base_name, parameters])
  • +
  • build (name[, format, mode, platform, tolerance, output_dir])
  • +
+
+

class accera.Parameter

+

A placeholder that can be used instead of concrete values when constructing or calling the methods of a Nest, Schedule, or Plan.

+
+

class accera.Plan

+

A scheduled (ordered) loop nest with target-specific implementation details.

+

Methods

+
    +
  • cache (source[, index, trigger_index, layout, level, trigger_level, max_elements, thrifty, location, double_buffer, double_buffer_location, vectorize])
  • +
  • bind (indices, grid)
  • +
  • kernelize (unroll_indices[, vectorize_indices])
  • +
  • parallelize (indices[, pin, policy, max_threads])
  • +
  • tensorize (indices, mma_shape[, use_static_offsets, num_total_passes, num_fused_passes, scheduling_policy])
  • +
  • unroll (index)
  • +
  • vectorize (index)
  • +
+
+

class accera.Scalar

+

A scalar element.

+

Constructors

+ +
+

class accera.Dimension

+

A specialization of Scalar with element_type as ScalarType.index.

+

Constructors

+ +
+

class accera.Schedule

+

A scheduled (ordered) loop nest with no target-specific implementation details.

+

Methods

+ +
+

class accera.FusedSchedule

+

Child class of class accera.Schedule created as a result of fusing multiple schedules.

+

Methods (in addition to the inherited functions from class accera.Schedule)

+ +
+

class accera.Target

+

A target platform for the cross-compiler.

+

Constructors

+
    +
  • Target ([architecture, cache_lines, cache_sizes, category, extensions, family, frequency_GHz, model, name, num_cores, num_threads, turbo_frequency_GHz])
  • +
+

Enumerations

+ +
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Array/Array/index.html b/Reference/classes/Array/Array/index.html new file mode 100644 index 00000000..83fa29ea --- /dev/null +++ b/Reference/classes/Array/Array/index.html @@ -0,0 +1,2317 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Array - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Array(role[, data, element_type, layout, offset, shape])

+

Constructs an array.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
roleThe role of the array determines if the array scope is internal or external, if the array is mutable or immutable, and if the array memory is dynamically allocated.accera.Role
dataThe contents of a constant array. Required for accera.Role.CONST arrays but should not be specified for other roles.Python buffer or numpy.ndarray.
element_typeThe array element type.accera.ScalarType, default: accera.ScalarType.float32.
layoutThe affine memory map.accera.Array.Layout, or tuple of (integers or accera.Dimension), default: accera.Array.Layout.FIRST_MAJOR.
offsetThe offset of the affine memory mapinteger (positive, zero, or negative), default: 0.
shapeThe array shape. Required for roles other than accera.Role.CONST, should not be specified for accera.Role.CONST.tuple of (integers or accera.Dimension).
+

Examples

+

Construct an input array: +

import accera as acc
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20))  # the default layout is acc.Array.Layout.FIRST_MAJOR
+

+

Construct an input array with an explicit standard layout: +

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20), layout=acc.Array.Layout.LAST_MAJOR)
+

+

Construct an input array with an explicit affine memory map: +

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20), layout=(1, 10))
+

+

Construct an input array with an infinite (undefined) major dimension: +

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, acc.inf), layout=acc.Array.Layout.LAST_MAJOR)
+

+

Construct an input array with both runtime and compile-time dimension sizes:

M = acc.create_dimensions()
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, 20))
+

+

Construct an input/output array: +

A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(10, 20))
+

+

Construct an input/output array with runtime input dimension sizes: +

M, N = acc.create_dimensions()
+A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+

+

Construct an output array with runtime output dimension sizes: +

M, N = acc.create_dimensions(role=acc.Role.OUTPUT)
+A = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+

+

Construct an output array with explicit affine memory map: +

M, N = acc.create_dimensions(role=acc.Role.OUTPUT)
+A = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N), layout=(1, M))
+

+

Construct a constant array: +

D = np.random.rand(10, 16)
+A = acc.Array(role=acc.Role.CONST, data=D)
+

+

Construct a constant array with an explicit element type and layout, which does not necessarily match the input data: +

D = np.random.rand(10, 16)
+A = acc.Array(role=acc.Role.CONST, element_type=acc.ScalarType.float32, layout=acc.Array.Layout.LAST_MAJOR, data=D)
+

+

Construct a temporary array: +

A = acc.Array(role=acc.Role.TEMP, element_type=acc.ScalarType.float32, shape=(10, 20), layout=acc.Array.Layout.LAST_MAJOR)
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Array/Layout/index.html b/Reference/classes/Array/Layout/index.html new file mode 100644 index 00000000..e19cbd98 --- /dev/null +++ b/Reference/classes/Array/Layout/index.html @@ -0,0 +1,2238 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Layout - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Array.Layout

+ + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.Array.Layout.FIRST_MAJORSpecifies a memory layout where the first major axis is in contiguous memory. For example, in a matrix, this corresponds to "row-major".
accera.Array.Layout.LAST_MAJORSpecifies a memory layout where the last major axis is in contiguous memory. For example, in a matrix, this corresponds to "column-major".
accera.Array.Layout.DEFERREDDefer specifying the memory layout for a Role.CONST array until a cache is created.
+
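For example, the layout is passed to the Array constructor (mirroring the constructors shown on the accera.Array page):

import accera as acc

# Row-major ("first major") 10x20 input array
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20), layout=acc.Array.Layout.FIRST_MAJOR)

# Column-major ("last major") variant of the same array
B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20), layout=acc.Array.Layout.LAST_MAJOR)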
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Array/deferred_layout/index.html b/Reference/classes/Array/deferred_layout/index.html new file mode 100644 index 00000000..b3adf190 --- /dev/null +++ b/Reference/classes/Array/deferred_layout/index.html @@ -0,0 +1,2274 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Deferred layout - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Array.deferred_layout(cache)

+

Specifies the layout for a Role.CONST array based on a Cache. For more details, see Deferred layout of constant arrays

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
cacheThe cache that defines the layout to set.accera.Cache
+

Examples

+

Create a constant 16x16 array without specifying a layout. Later on, define its layout based on a cache:

+
import numpy as np
+import accera as acc
+
+matrix = np.random.rand(16, 16)
+
+# Create a constant array with a deferred layout
+A = acc.Array(role=acc.Role.CONST, data=matrix, layout=acc.Array.Layout.DEFERRED)
+B = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=matrix.shape)
+
+nest = acc.Nest(shape=matrix.shape)
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    B[i, j] += A[i, j]
+
+plan = nest.create_plan()
+
+# create a cache for the constant array
+AA = plan.cache(A, i, layout=acc.Array.Layout.FIRST_MAJOR, thrifty=True)
+
+# update the constant array's layout based on the cache
+A.deferred_layout(cache=AA)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Array/sub_array/index.html b/Reference/classes/Array/sub_array/index.html new file mode 100644 index 00000000..2a90fa13 --- /dev/null +++ b/Reference/classes/Array/sub_array/index.html @@ -0,0 +1,2287 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Sub array - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Array.sub_array(offsets, shape[, strides])

+

Creates a sub-array of a specific shape from an array. The sub-array is created from elements at specified offsets and strides into the original array.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
offsetsThe offsets into the original array.Tuple[int]
shapeThe size of the sub-array.Tuple[int]
strides(Optional) The strides in the original array used to create the sub-array.Tuple[int]
+

Examples

+

Create a sub-array of size 2x3 from an array of size 5x5 at an offset of {1, 1} and a stride of {2, 1}:

+
import numpy as np
+import accera as acc
+
+N = 5
+subArrayNumRows = 2
+subArrayNumCols = 3
+
+matrix = np.random.rand(N, N)
+Arr = acc.Array(role=acc.Role.INPUT, data=matrix)
+
+# Zero out a sub array of size [2, 3] such that the resulting array looks like this:
+# xxxxx
+# x000x
+# xxxxx
+# x000x
+# xxxxx
+
+nest = acc.Nest(shape=(subArrayNumRows, subArrayNumCols))
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    SubArr = Arr.sub_array([1, 1], [subArrayNumRows, subArrayNumCols], [2, 1])
+    SubArr[i, j] = 0.0
+
+schedule = nest.create_schedule()
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Dimension/Dimension/index.html b/Reference/classes/Dimension/Dimension/index.html new file mode 100644 index 00000000..f0780b95 --- /dev/null +++ b/Reference/classes/Dimension/Dimension/index.html @@ -0,0 +1,2276 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Dimension - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Dimension([role, value])

+

Constructs a runtime dimension size with optional initialization.

+

Note: This constructor is meant for advanced use cases that involve Python generator expressions. For the simplified syntax to create dimensions, see create_dimensions.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
roleThe role of the dimension determines if it is mutable or immutable.accera.Role. default: accera.Role.INPUT.
nameThe name of the dimension variable. Default is an empty string.string
valueThe optional value to initialize the dimension. Only applies to mutable dimensions (accera.Role.OUTPUT)integer or Dimension
+

Returns

+

Dimension

+

Examples

+

Construct an output array with runtime dimensions using a Python generator expression over an input shape:

import accera as acc
+
+# input_shape is a tuple or list of acc.Dimensions or integers
+output_shape = tuple(acc.Dimension(role=acc.Role.OUTPUT, value=i) for i in input_shape)
+A = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=output_shape)
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/FusedSchedule/get_fused_indices/index.html b/Reference/classes/FusedSchedule/get_fused_indices/index.html new file mode 100644 index 00000000..a5e884ac --- /dev/null +++ b/Reference/classes/FusedSchedule/get_fused_indices/index.html @@ -0,0 +1,2236 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Get fused indices - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.FusedSchedule.get_fused_indices()

+

Gets the fused indices of a fused schedule.

+

Returns

+

Tuple of Index

+

Examples

+
i, j = fused_schedule.get_fused_indices()
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/FusedSchedule/get_fusing_index/index.html b/Reference/classes/FusedSchedule/get_fusing_index/index.html new file mode 100644 index 00000000..f1762b2d --- /dev/null +++ b/Reference/classes/FusedSchedule/get_fusing_index/index.html @@ -0,0 +1,2236 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Get fusing index - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.FusedSchedule.get_fusing_index()

+

Gets the fusing index of a fused schedule.

+

Returns

+

Instance of Index

+

Examples

+
f = fused_schedule.get_fusing_index()
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/FusedSchedule/get_unfused_indices/index.html b/Reference/classes/FusedSchedule/get_unfused_indices/index.html new file mode 100644 index 00000000..79f9ca2e --- /dev/null +++ b/Reference/classes/FusedSchedule/get_unfused_indices/index.html @@ -0,0 +1,2236 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Get unfused indices - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.FusedSchedule.get_unfused_indices()

+

Gets the unfused indices of a fused schedule.

+

Returns

+

Tuple of Index

+

Examples

+
 k, l = fused_schedule.get_unfused_indices()
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Nest/Nest/index.html b/Reference/classes/Nest/Nest/index.html new file mode 100644 index 00000000..36fa0f65 --- /dev/null +++ b/Reference/classes/Nest/Nest/index.html @@ -0,0 +1,2252 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Nest - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Nest(shape)

+

Creates an affine loop nest.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
shapeThe shape of the iteration spacetuple of positive integers
+

Examples

+

Create a nest with 3 nested for-loops of sizes 16, 10, and 11:

+
nest = acc.Nest(shape=(16, 10, 11))
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Nest/create_plan/index.html b/Reference/classes/Nest/create_plan/index.html new file mode 100644 index 00000000..469a4653 --- /dev/null +++ b/Reference/classes/Nest/create_plan/index.html @@ -0,0 +1,2265 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Create plan - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Nest.create_plan([target])

+

Creates a plan using the default schedule for the nest.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
targetThe target platform. Defaults to acc.Target.HOSTTarget
+

Returns

+

Plan

+

Examples

+

Create a plan for the host computer, using the default schedule for a nest:

+
plan = nest.create_plan()
+
+

Create a plan for an Intel Core i9 7900X, using the default schedule for a nest:

+
corei9 = acc.Target("Intel 7900X", num_threads=44)
+plan = nest.create_plan(corei9)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Nest/create_schedule/index.html b/Reference/classes/Nest/create_schedule/index.html new file mode 100644 index 00000000..8f898591 --- /dev/null +++ b/Reference/classes/Nest/create_schedule/index.html @@ -0,0 +1,2236 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Create schedule - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Nest.create_schedule()

+

Creates a default schedule for a nest.

+

Returns

+

Schedule

+

Examples

+
schedule = nest.create_schedule()
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Nest/get_indices/index.html b/Reference/classes/Nest/get_indices/index.html new file mode 100644 index 00000000..811b81bb --- /dev/null +++ b/Reference/classes/Nest/get_indices/index.html @@ -0,0 +1,2237 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Get indices - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Nest.get_indices()

+

Gets the iteration space dimensions for a nest.

+

Returns

+

Tuple of Index

+

Examples

+

Get the iteration space dimensions for a 3-dimensional nest:

+
i, j, k = nest.get_indices()
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Nest/iteration_logic/index.html b/Reference/classes/Nest/iteration_logic/index.html new file mode 100644 index 00000000..b5e44e16 --- /dev/null +++ b/Reference/classes/Nest/iteration_logic/index.html @@ -0,0 +1,2278 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Iteration logic - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Nest.iteration_logic(logic)

+

Adds an iteration logic function to a Nest.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
logicPython function that represents the logic to run in the innermost loop of the nest.
+

Examples

+

The preferred syntax uses Python decorators, as follows: +

import accera as acc
+
+A = acc.Array(role=acc.Role.INPUT, shape=(16, 64))
+B = acc.Array(role=acc.Role.INPUT, shape=(64, 32))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(16, 32))
+
+nest = acc.Nest(shape=(16, 32, 64))
+i, j, k = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    C[i,j] += A[i,k] * B[k,j]
+

+

The alternative syntax avoids decorators and instead defines the logic in a function: +

import accera as acc
+
+A = acc.Array(role=acc.Role.INPUT, shape=(16, 64))
+B = acc.Array(role=acc.Role.INPUT, shape=(64, 32))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(16, 32))
+
+nest = acc.Nest(shape=(16, 32, 64))
+i, j, k = nest.get_indices()
+
+def logic_fn():
+    C[i, j] += A[i, k] * B[k, j]
+
+nest.iteration_logic(logic_fn)
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Package/Format/index.html b/Reference/classes/Package/Format/index.html new file mode 100644 index 00000000..0048dd9d --- /dev/null +++ b/Reference/classes/Package/Format/index.html @@ -0,0 +1,2243 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Format - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Package.Format

+ + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.Package.Format.HAT_DYNAMICHAT package format, dynamically linked.
accera.Package.Format.HAT_STATICHAT package format, statically linked.
accera.Package.Format.MLIR_DYNAMICMLIR (debugging) package format, dynamically linked.
accera.Package.Format.MLIR_STATICMLIR (debugging) package format, statically linked.
+

When cross-compiling, use either accera.Package.Format.HAT_STATIC or accera.Package.Format.MLIR_STATIC.
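For example, when cross-compiling a statically-linked package for Linux (mirroring the examples on the Package.build page):

package.build(format=acc.Package.Format.HAT_STATIC, name="myPackage", platform=acc.Package.Platform.LINUX)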

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Package/Mode/index.html b/Reference/classes/Package/Mode/index.html new file mode 100644 index 00000000..434926a0 --- /dev/null +++ b/Reference/classes/Package/Mode/index.html @@ -0,0 +1,2234 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Mode - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Package.Mode

+ + + + + + + + + + + + + + + + + +
typedescription
accera.Package.Mode.DEBUGDebug mode (automatically tests logical equivalence).
accera.Package.Mode.RELEASERelease (maximally optimized).
+
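For example, building a package in debug mode (mirroring the debug-mode example on the Package.build page):

package.build(format=acc.Package.Format.HAT_DYNAMIC, name="myPackage", mode=acc.Package.Mode.DEBUG, tolerance=1.0e-6)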
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Package/Package/index.html b/Reference/classes/Package/Package/index.html new file mode 100644 index 00000000..9d4e3da7 --- /dev/null +++ b/Reference/classes/Package/Package/index.html @@ -0,0 +1,2228 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Package - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Package.Package()

+

A package of functions that can be built and linked with client code.

+

Examples

+

Create a package:

+
package = acc.Package()
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Package/Platform/index.html b/Reference/classes/Package/Platform/index.html new file mode 100644 index 00000000..74763b36 --- /dev/null +++ b/Reference/classes/Package/Platform/index.html @@ -0,0 +1,2254 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Platform - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Package.Platform

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.Package.Platform.HOSTThe host computer's platform
accera.Package.Platform.WINDOWSThe Windows platform
accera.Package.Platform.LINUXThe Linux platform
accera.Package.Platform.MACOSThe MacOS platform
accera.Package.Platform.ANDRIODThe Android platform
accera.Package.Platform.IOSThe iOS platform
accera.Package.Platform.RASPBIANThe Raspbian platform
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Package/add/index.html b/Reference/classes/Package/add/index.html new file mode 100644 index 00000000..64ca9c64 --- /dev/null +++ b/Reference/classes/Package/add/index.html @@ -0,0 +1,2279 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Add - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Package.add(source, args[, base_name, parameters])

+

Adds one or more functions to the package.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype
sourceThe source which defines the function's implementation.Nest or Schedule or Plan
argsThe order of external-scope arrays, scalars, and dimensions used in the function signature.tuple of Array, Scalar, or Dim
base_nameA base name for the function. The full name for the function will be the base name followed by an automatically-generated unique identifier.string
parametersA value for each parameter if the function's implementation is parameterized. See Parameters. A list of dictionaries can also be provided, in which case, multiple functions are generated.Parameter to value dictionary or a list of Parameter to value dictionaries.
+

Examples

+

Adding a function defined by a Plan:

+
package.add(plan, args=(A, B, C), base_name="simple_matmul")
+
+

Convenience syntax to add a function defined by a Schedule. A default Plan will be created automatically:

+
package.add(schedule, args=(A, B, C), base_name="simple_matmul")
+
+

Convenience syntax to add a function defined by a Nest. A default Schedule and Plan will be created internally:

+
package.add(nest, args=(A, B, C), base_name="simple_matmul")
+
+

Adding a function with concrete values specified for its parameters (P0, P1, P2, P3).

+
package.add(nest, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1}, base_name="matmul_16_16_16_1")
+
+

Adding a function with runtime dimension sizes M, N, K and arrays A, B, and C:

+
package.add(nest, args=(M, N, K, A, B, C), base_name="matmul_M_N_K")
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Package/add_description/index.html b/Reference/classes/Package/add_description/index.html new file mode 100644 index 00000000..6f2ab388 --- /dev/null +++ b/Reference/classes/Package/add_description/index.html @@ -0,0 +1,2270 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Add description - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Package.add_description([author, license, other, version])

+

Adds descriptive metadata to the HAT package.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
authorName of the individual or group that authored the package.string
licenseThe internet URL of the license used to release the package.string
otherUser-specific descriptive metadatadictionary
versionThe package version.string
+

Examples

+

Adds the standard version, license, and author description fields to the package: +

package.add_description(version="1.0", license="https://mit-license.org/", author="Microsoft Research")
+

+

Adds arbitrary user-defined metadata to describe the package: +

package.add_description(other={"title": "My Package Title", "source": "https://github.com/", "citations": ["https://arxiv.org/2021.12345/", "https://arxiv.org/2021.56789/"]})
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Package/build/index.html b/Reference/classes/Package/build/index.html new file mode 100644 index 00000000..9a5dbcf3 --- /dev/null +++ b/Reference/classes/Package/build/index.html @@ -0,0 +1,2301 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Build - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Package.build(name[, format, mode, platform, tolerance, output_dir])

+

Builds a HAT package.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
nameThe package name.string
formatThe format of the package.accera.Package.Format, defaults to Package.Format.HAT_STATIC
modeThe package mode, such as whether it is optimized or used for debugging.accera.Package.Mode, defaults to Package.Mode.RELEASE
platformThe platform where the package runs.accera.Package.Platform
toleranceThe tolerance for correctness checking when mode = Package.Mode.Debug.float, defaults to 1e-5
output_dirThe path to an output directory. Defaults to the current directory if unspecified.string
+

Examples

+

Build a dynamically-linked HAT package called myPackage containing func1 for the host platform in the current directory:

+
package = acc.Package()
+package.add(plan, base_name="func1")
+package.build(format=acc.Package.Format.HAT_DYNAMIC, name="myPackage")
+
+

Build a statically-linked HAT package called myPackage containing func1 for the host platform in the hat_packages subdirectory:

+
package = acc.Package()
+package.add(plan, base_name="func1")
+package.build(format=acc.Package.Format.HAT_STATIC, name="myPackage", output_dir="hat_packages")
+
+

Build a statically-linked myPackage with additional intermediate MLIR files for debugging purposes. To build a dynamically-linked package, use acc.Package.Format.MLIR_DYNAMIC:

+
package = acc.Package()
+package.add(plan, base_name="func1")
+package.build(format=acc.Package.Format.MLIR_STATIC, name="myPackage")
+
+

Build a package with error checking for func1, outputting error messages to stderr if the default implementation and the Accera implementation do not match within a tolerance of 1.0e-6:

+
package = acc.Package()
+package.add(plan, base_name="func1")
+package.build(format=acc.Package.Format.HAT_DYNAMIC, name="myPackage", mode=acc.Package.Mode.DEBUG, tolerance=1.0e-6)
+
+

Cross-compile a statically-linked HAT package called myPackagePi3 containing func1 for the Raspberry Pi 3. Note that dynamically-linked HAT packages are not supported for cross-compilation:

+
pi3 = Target("Raspberry Pi 3B", category=Target.Category.CPU)
+plan = schedule.create_plan(target=pi3)
+package = acc.Package()
+package.add(plan, base_name="func1")
+package.build(format=acc.Package.Format.HAT_STATIC, name="myPackagePi3", platform=acc.Package.Platform.RASPBIAN)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Plan/bind/index.html b/Reference/classes/Plan/bind/index.html new file mode 100644 index 00000000..3ebaa3fc --- /dev/null +++ b/Reference/classes/Plan/bind/index.html @@ -0,0 +1,2267 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Bind - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Plan.bind(mapping)

+

Only available for targets that can execute a grid of work (such as GPUs). The bind function binds dimensions of the iteration space to axes of the target-specific grid (such as v100.GridUnit.BLOCK_X, v100.GridUnit.THREAD_X or v100.GridUnit.WARP_X on an Nvidia GPU).

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
mappingMapping of indices to GPU thread or block identifiers.dict of Index to target-specific identifiers
+

Examples

+

Mark the i, j, and k indices to execute on Nvidia V100's BLOCK_X, THREAD_X, and THREAD_Y grid axes, respectively.

+
v100 = acc.Target(Target.Model.NVIDIA_V100)
+plan.bind({
+    i: v100.GridUnit.BLOCK_X,
+    j: v100.GridUnit.THREAD_X,
+    k: v100.GridUnit.THREAD_Y
+})
+
+

In some cases, such as tensorization where it can be non-trivial to assign threads to their respective data, it is often simpler to bind iteration-space indices to warps (Nvidia) or waves (AMD) in the x and y dimensions rather than to individual threads. This also abstracts the computation above the level of individual threads: instead of each thread performing its calculation independently, a group of threads (a warp) works collaboratively on a larger piece of the computation, as in warp-synchronous primitives such as CUDA's WMMA API. For example:

+
v100 = acc.Target(Target.Model.NVIDIA_V100)
+plan.bind({
+    i: v100.GridUnit.BLOCK_X,
+    j: v100.GridUnit.BLOCK_Y,
+    ii: v100.GridUnit.WARP_Y,
+    jj: v100.GridUnit.WARP_X
+})
+
+

In this case, a warp/wave of threads is assigned to each unique (ii, jj) combination in the iteration space. The spatial arrangement of warps over the data is determined by the ranges of these indices. For example, if both ii and jj have the range [0, 32) with a step size of 16, there are a total of 4 warps (2 in the x-dimension and 2 in the y-dimension) covering a 32x32 data region.

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Plan/cache/index.html b/Reference/classes/Plan/cache/index.html new file mode 100644 index 00000000..5fffff63 --- /dev/null +++ b/Reference/classes/Plan/cache/index.html @@ -0,0 +1,2348 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Cache - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Plan.cache(source[, index, trigger_index, layout, level, trigger_level, max_elements, element_type, strategy, thrifty, location, double_buffer, double_buffer_location, vectorize])

+

Adds a caching strategy to a plan.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype
sourceThe array or cache from which this cache is copied.Array or Cache.
indexThe index used to determine the cache level. Specify one and only one of index, level, max_elements.Index.
trigger_indexThe index used to determine what level to fill the cache at. trigger_index can't come after index in the schedule order and will default to index if not specified. Specify at most one of trigger_index or trigger_level.Index.
layoutThe affine memory map, if different from the source.accera.Layout.
levelThe key-slice level to cache (the number of wildcard dimensions in a key-slice). Specify one and only one of index, level, max_elements.positive integer.
trigger_levelThe key-slice level to fill the cache at. trigger_level can't be smaller than level, and will default to level if not specified. Specify at most one of trigger_index or trigger_level.positive integer
max_elementsThe maximum elements to include in the cached region. Specify one and only one of index, level, max_elements.positive integer
element_typeThe element type to use in the cache. Defaults to the element type of the cached arrayScalarType
strategyThe thread to data mapping pattern to use when collaboratively caching by multiple threads. Defaults to AUTO which will resolve to the strategy best suited for the current target environment.CacheStrategy
thriftyUse thrifty caching (copy data into a cache only if the cached data differs from the original active block).bool
locationThe type of memory used to store the cache.MemorySpace
double_bufferWhether to make this cache a double-buffering cache. Only valid on INPUT and CONST arrays.bool
double_buffer_locationWhich memory space to put the double buffer temp array in. Requires that double_buffer is set to True. Defaults to AUTO.MemorySpace or AUTO
vectorizeWhether to vectorize the cache operations. Defaults to AUTO, which will behave like vectorize=True if the loop-nest has any vectorized loop via plan.vectorize(index) or vectorize=False if the loop-nest has no vectorized loops.bool
+

AUTO will configure the double buffering location based on the following (see the double-buffering sketch at the end of the Examples section below):
location | double_buffer | double_buffer_location = AUTO
--- | --- | ---
MemorySpace.SHARED | True | MemorySpace.PRIVATE
!MemorySpace.SHARED | True | Same value as location

+

Returns

+

A Cache handle that represents the created cache.

+

Examples

+

Create a cache of array A at level 2. +

AA = plan.cache(A, level=2)
+

+

Create a cache of array A with the Array.Layout.FIRST_MAJOR layout: +

AA = plan.cache(A, level=2, layout=acc.Array.Layout.FIRST_MAJOR)
+

+

Create a cache of array A for dimension j: +

AA = plan.cache(A, index=j)
+

+

Create a cache of array A for the largest active block that does not exceed 1024 elements: +

AA = plan.cache(A, max_elements=1024)
+

+

Create a level 2 cache of array A from its level 4 cache: +

AA = plan.cache(A, level=4)
+AAA = plan.cache(AA, level=2)
+

+

Not yet implemented: Create a cache of array A at index i in GPU shared memory: +

v100 = Target(Target.Model.NVIDIA_V100)
+AA = plan.cache(A, i, location=v100.MemorySpace.SHARED)
+

+
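A hedged sketch of the double_buffer argument from the Arguments table; the index ii is an assumption for illustration:

# Double-buffer the cache of the INPUT array A at index ii. With
# double_buffer_location left at its default (AUTO), the temporary buffer
# location follows the table in the Arguments section above.
AA = plan.cache(A, index=ii, double_buffer=True)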
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Plan/kernelize/index.html b/Reference/classes/Plan/kernelize/index.html new file mode 100644 index 00000000..668926ef --- /dev/null +++ b/Reference/classes/Plan/kernelize/index.html @@ -0,0 +1,2264 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Kernelize - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Plan.kernelize(unroll_indices[, vectorize_indices])

+

A convenience method for applying a sequence of unroll transformations, optionally followed by a sequence of vectorize transformations.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
unroll_indicesThe iteration-space dimensions to unrolltuple of accera.Index.
vectorize_indicesThe optional iteration-space dimensions to vectorizeaccera.Index or tuple of accera.Index.
+

Examples

+

Unroll i and k, and then vectorize j:

+
schedule.reorder(i, k, j)
+plan = schedule.create_plan()
+plan.kernelize(unroll_indices=(i, k), vectorize_indices=j)
+
+
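Per the description above, the kernelize call in this example should be equivalent to the following sequence of unroll and vectorize calls (a sketch for illustration):

# Equivalent sequence, reading kernelize as a convenience wrapper:
plan.unroll(index=i)
plan.unroll(index=k)
plan.vectorize(index=j)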

Another example is to unroll i and then vectorize j and k:

+
schedule.reorder(i, j, k)
+plan = schedule.create_plan()
+plan.kernelize(unroll_indices=(i,), vectorize_indices=(j, k))
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Plan/parallelize/index.html b/Reference/classes/Plan/parallelize/index.html new file mode 100644 index 00000000..326a0a36 --- /dev/null +++ b/Reference/classes/Plan/parallelize/index.html @@ -0,0 +1,2327 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Parallelize - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Plan.parallelize(indices[, pin, policy, max_threads])

+

Executes one or more loops in parallel on multiple cores or processors.

+

Only available for targets with multiple cores or processors.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
indicesThe iteration-space dimensions to run in parallel. To assign multiple threads to an index, first split that index, then parallelize its split indices.
Unsplit indices will be assigned one thread each; split indices will be assigned threads based on the number of split blocks. This is limited by the number of threads supported by the target.
tuple of accera.Index
pinPin the computation to a subset of cores or processors.tuple of target-specific identifiers
policyThe scheduling policy to apply ("dynamic" or "static").string. Defaults to "static".
max_threadsThe maximum number of threads to use when distributing the workload. The actual number of threads used is the lowest value among (a) max_threads, (b) the number of threads supported by the target and (c) the number of iterations in the domain as specified by indices.int. Defaults to None.
+

Examples

+

Parallelize the i, j, and k dimensions using default number of threads:

+
nest = Nest(shape=(2, 3, 4))
+i, j, k = nest.get_indices()
+plan.parallelize(indices=(i, j, k)) # This will use 2 x 3 x 4 = 24 threads
+
+

Parallelize the i dimension after splitting using default number of threads:

+
nest = Nest(shape=(20,))
+schedule = nest.create_schedule()
+i = schedule.get_indices()
+ii = schedule.split(i, 4)
+plan.parallelize(indices=i) # This will use 20 / 4 = 5 threads
+
+

Parallelize the i, j, and k dimensions using thread limit:

+
nest = Nest(shape=(2, 3, 4))
+i, j, k = nest.get_indices()
+plan.parallelize(indices=(i, j, k), max_threads=4) # This will use 4 threads
+
+

Parallelize the i dimension with thread limit set higher than the number of iterations:

+
nest = Nest(shape=(2, 3, 4))
+i, j, k = nest.get_indices()
+plan.parallelize(indices=i, max_threads=4) # This will use 2 threads since 'i' has only 2 iterations
+
+

Not yet implemented: Parallelize the i, j, and k dimensions by pinning them to specific cores on an Intel Xeon E5:

+
plan.parallelize(indices=(i, j, k), pin=(xeonE5.cores[0], xeonE5.cores[1], xeonE5.cores[2]))
+
+

Apply a dynamic scheduling policy, which uses a queue to partition the work across multiple cores:

+
plan.parallelize(indices=(i, j, k), policy="dynamic")
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Plan/tensorize/index.html b/Reference/classes/Plan/tensorize/index.html new file mode 100644 index 00000000..643f18e6 --- /dev/null +++ b/Reference/classes/Plan/tensorize/index.html @@ -0,0 +1,2295 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Tensorize - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Plan.tensorize(indices, mma_shape [, use_static_offsets, num_total_passes, num_fused_passes, scheduling_policy, prologue_op, epilogue_op])

+

Only available for targets with native matrix multiplication instruction (tensor core) support. Marks the dimensions of the iteration-space for tensorization. Only perfectly nested loops of the following form can be tensorized:

+
for i in range(M):
+    for k in range(K):
+        for j in range(N):
+            C[i, j] += A[i, k] * B[k, j]
+
+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
indicesThe 3-dimensional iteration space to tensorize.3-D tuple of accera.Index
mma_shapeThe type of MMA operation to use.accera.MMAShape
use_static_offsetsThis is an optimization flag, which when enabled will use precomputed offset maps stored in device constant memory. Defaults to False.bool
num_total_passesThis controls the total number of passes to run. Defaults to 1.positive integer
num_fused_passesThis controls the number of passes for which register allocation is done; the higher the value, the more registers are allocated. Defaults to None, which fuses all the passes specified by num_total_passes.positive integer
scheduling_policyFor multi-block MMA operations, this controls whether matrix multiplication is done block-by-block or pass-by-pass (affects register usage). Default value is accera.MMASchedulingPolicy.PASS_ORDERaccera.MMASchedulingPolicy
prologue_opThe element-wise operation to apply on matrix fragment data as a part of initialization (pre-tensorization). Default value is accera.MMAFragmentOp.NONEaccera.MMAFragmentOp
epilogue_opThe element-wise operation to apply on matrix fragment data as a part of the final store (post-tensorization). Default value is accera.MMAFragmentOp.NONEaccera.MMAFragmentOp
+

The different values of the enum MMAShape are explained here: accera.MMAShape

+

The different values of the enum MMASchedulingPolicy (applicable only for AMD targets supporting MFMA ops, such as accera.Target.Model.AMD_MI100) are mentioned here: accera.MMASchedulingPolicy

+

The different values of the enum MMAFragmentOp are explained here: accera.MMAFragmentOp

+

Examples

+

Mark the dimensions ii, jj, and kk for tensorized execution:

+
plan.tensorize(indices=(ii,jj,kk))
+
+
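A hedged sketch using the optional arguments documented above; as in the example above, mma_shape is omitted for brevity (in practice an accera.MMAShape value is supplied), and the specific values are illustrative assumptions:

plan.tensorize(indices=(ii, jj, kk),
               use_static_offsets=True,   # use precomputed offset maps in constant memory
               num_total_passes=4,        # run 4 passes in total
               num_fused_passes=2,        # allocate registers for 2 passes at a time
               scheduling_policy=acc.MMASchedulingPolicy.PASS_ORDER)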
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Plan/unroll/index.html b/Reference/classes/Plan/unroll/index.html new file mode 100644 index 00000000..aa054f5f --- /dev/null +++ b/Reference/classes/Plan/unroll/index.html @@ -0,0 +1,2252 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Unroll - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Plan.unroll(index)

+

Marks a dimension of the iteration-space for unrolling.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
indexThe index to unroll.Index
+

Examples

+

Mark the i dimension for unrolling:

+
plan.unroll(index=i)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Plan/vectorize/index.html b/Reference/classes/Plan/vectorize/index.html new file mode 100644 index 00000000..bce732db --- /dev/null +++ b/Reference/classes/Plan/vectorize/index.html @@ -0,0 +1,2252 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Vectorize - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Plan.vectorize(index)

+

Only available for targets that have SIMD registers and support vector instructions. Marks a dimension of the iteration-space for vectorization.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
indexThe index to vectorize.Index
+

Examples

+

Mark the dimension ii for vectorized execution:

+
plan.vectorize(index=ii)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Scalar/Scalar/index.html b/Reference/classes/Scalar/Scalar/index.html new file mode 100644 index 00000000..916a335a --- /dev/null +++ b/Reference/classes/Scalar/Scalar/index.html @@ -0,0 +1,2267 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Scalar - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Scalar([element_type, value])

+

Constructs a scalar that holds a number.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
element_typeThe element type.accera.ScalarType, default: accera.ScalarType.float32.
valueAn optional value.A number.
+

Examples

+

Construct a float32 scalar: +

import accera as acc
+
+X = acc.Scalar()
+

+

Construct a float32 scalar and initialize it: +

Pi = acc.Scalar(value=3.14)
+

+

Construct integer scalars and perform arithmetic operations on them: +

X = acc.Scalar(element_type=acc.ScalarType.int32)
+Y = acc.Scalar(element_type=acc.ScalarType.int32)
+Y.value = X + 2
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Schedule/create_plan/index.html b/Reference/classes/Schedule/create_plan/index.html new file mode 100644 index 00000000..2734db8e --- /dev/null +++ b/Reference/classes/Schedule/create_plan/index.html @@ -0,0 +1,2259 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Create plan - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Schedule.create_plan([target])

+

Creates a plan for running this schedule.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
targetThe target platform. Defaults to acc.Target.HOSTTarget
+

Returns

+

Plan

+

Examples

+

Create a plan for the host target, or for a specific target (minimal sketches consistent with the other reference pages):

+
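# Create a plan for the host target (the default):
plan = schedule.create_plan()

# Or create a plan for a specific target, e.g. the pi3 target constructed in
# the Package.build cross-compilation example:
plan = schedule.create_plan(target=pi3)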
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Schedule/get_indices/index.html b/Reference/classes/Schedule/get_indices/index.html new file mode 100644 index 00000000..697d0e2e --- /dev/null +++ b/Reference/classes/Schedule/get_indices/index.html @@ -0,0 +1,2237 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Get indices - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Schedule.get_indices()

+

Gets the iteration space dimensions for a schedule.

+

Returns

+

Tuple of Index

+

Examples

+

Get the iteration space dimensions for a 3-dimensional nest:

+
i, j, k = schedule.get_indices()
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Schedule/is_valid_loop_order/index.html b/Reference/classes/Schedule/is_valid_loop_order/index.html new file mode 100644 index 00000000..b46ee568 --- /dev/null +++ b/Reference/classes/Schedule/is_valid_loop_order/index.html @@ -0,0 +1,2274 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Is valid loop order - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Schedule.is_valid_loop_order(*order)

+

The is_valid_loop_order function determines if an order of indices is valid. For a description of valid schedule orders, refer to reorder.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
*orderThe order of indices to check for validityvariable Index arguments
+

Examples

+

Checks if an order is valid:

+
print(schedule.is_valid_loop_order(k, i, j))
+
+

Uses this function as part of a parameter filter to determine which permutations of loop order parameters are valid:

+
P1, P2, P3, P4, P5, loop_order = acc.create_parameters()
+schedule.reorder(order=loop_order)
+
+def my_filter(parameters_choice):
+    P1, P2, P3, P4, P5, loop_order = parameters_choice
+
+    return P1 > P2 \
+        and P3 > P4 \
+        and P1 * P5 < P3 \
+        and P2 * P5 < P4 \
+        and schedule.is_valid_loop_order(loop_order)
+
+parameters = acc.create_parameter_grid({
+        P1: [64, 128, 256],
+        P2: [32, 128], 
+        P3: [16, 32, 128],
+        P4: [8, 64],
+        P5: [4],
+        loop_order: (i, j, k, ii, jj, kk)
+    }, my_filter)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Schedule/pad/index.html b/Reference/classes/Schedule/pad/index.html new file mode 100644 index 00000000..2bfd0569 --- /dev/null +++ b/Reference/classes/Schedule/pad/index.html @@ -0,0 +1,2257 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Pad - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Schedule.pad(index, size)

+

Pads the beginning of a specified dimension of the iteration-space with empty (no-op) elements.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
indexThe dimension to padIndex
sizeThe number of elements to padnon-negative integer
+

Examples

+

Pads the beginning of dimension i with 10 empty elements

+
schedule.pad(i, 10)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Schedule/reorder/index.html b/Reference/classes/Schedule/reorder/index.html new file mode 100644 index 00000000..3a234127 --- /dev/null +++ b/Reference/classes/Schedule/reorder/index.html @@ -0,0 +1,2263 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Reorder - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Schedule.reorder(order, *args)

+

The reorder transformation sets the order of the indices in the schedule.

+

An order is invalid if it violates either of these rules:
1. The outer dimension created by a split transformation must always precede the corresponding inner dimension.
2. The fusing dimension created by a fuse operation must always precede any unfused dimensions.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
orderEither the order of indices to set or the outermost index if using variable argumentstuple of Index or Index.
*argsOptional variable arguments containing subsequent indices to setvariable Index arguments
+

Examples

+

Reorder a schedule by moving the k dimension to the outermost loop:

+
schedule.reorder(k, i, j)
+
+

Using a tuple to reorder a schedule. This overloaded form is better suited for parameters:

+
schedule.reorder(order=(k, i, j))
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Schedule/skew/index.html b/Reference/classes/Schedule/skew/index.html new file mode 100644 index 00000000..03ebe27b --- /dev/null +++ b/Reference/classes/Schedule/skew/index.html @@ -0,0 +1,2265 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Skew - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Schedule.skew(index, reference_index [, unroll_loops_smaller_than])

+

Transforms a dimension with respect to a reference dimension into a parallelogram by padding with empty elements.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
indexThe dimension to skewIndex
reference_indexThe reference dimensionIndex
unroll_loops_smaller_thanUnroll loops that are smaller than this range (non-inclusive)non-negative integer
+

Examples

+

Skew dimension i with respect to dimension j:

+
schedule.skew(i, j)
+
+

Skew dimension j with respect to dimension i, and unroll if the resulting loops are smaller than 3:

+
schedule.skew(j, i, unroll_loops_smaller_than=3)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Schedule/split/index.html b/Reference/classes/Schedule/split/index.html new file mode 100644 index 00000000..c9d579e8 --- /dev/null +++ b/Reference/classes/Schedule/split/index.html @@ -0,0 +1,2268 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Split - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Schedule.split(index, size)

+

The split transformation takes a dimension i and a size, modifies i, and creates a new dimension ii.

+

Assume that the original size of dimension i was n: The split transformation splits dimension i into ceil(n/size) parts of size size, arranges each of those parts along dimension ii, and stacks the ceil(n/size) parts along dimension i.

+

If the split size does not divide the dimension size, empty elements are added such that the split size does divide the dimension size.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
indexThe dimension to splitIndex
sizeThe split sizenon-negative integer
+

Returns

+

Index for the new inner dimension

+

Examples

+

Split the i dimension by 5, creating a new dimension ii:

+
ii = schedule.split(i, 5)
+
+
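A sketch of the padding behavior described above, assuming a 10-element dimension split by 4 (so ceil(10/4) = 3 outer iterations, with 2 padded no-op elements):

nest = Nest(shape=(10,))
schedule = nest.create_schedule()
i = schedule.get_indices()
ii = schedule.split(i, 4)   # i now has 3 iterations and ii has 4; 2 elements are padding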
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Schedule/tile/index.html b/Reference/classes/Schedule/tile/index.html new file mode 100644 index 00000000..d5e532b2 --- /dev/null +++ b/Reference/classes/Schedule/tile/index.html @@ -0,0 +1,2265 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Tile - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Schedule.tile(shape)

+

The tile transformation is a convenience syntax that takes a tuple of indices and a tuple of sizes, and splits each index by the corresponding size. The indices involved in the split are then ordered such that all the outer indices precede all of their respective inner indices.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
shapeMapping of indices to tile sizesdict of Index and non-negative integers
+

Returns

+

Tuple of Index representing the new inner dimensions.

+

Examples

+

Tile the i, j, and k dimensions by 8, 2, and 3, respectively.

+
ii, jj, kk = schedule.tile({
+    i: 8,
+    j: 2,
+    k: 3
+})
+
+
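Per the description above, this tile call should be roughly equivalent to the following splits followed by a reorder (a sketch of that reading):

ii = schedule.split(i, 8)
jj = schedule.split(j, 2)
kk = schedule.split(k, 3)
schedule.reorder(i, j, k, ii, jj, kk)   # all outer indices precede their inner indices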
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Target/Architecture/index.html b/Reference/classes/Target/Architecture/index.html new file mode 100644 index 00000000..e17dd198 --- /dev/null +++ b/Reference/classes/Target/Architecture/index.html @@ -0,0 +1,2248 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Architecture - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Target.Architecture

+

Defines the supported target architectures.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.Target.Architecture.HOSTThe host computer's architecture
accera.Target.Architecture.ARMThe ARM architecture
accera.Target.Architecture.AARCH64The 64-bit ARM architecture
accera.Target.Architecture.X86The 32-bit x86 architecture
accera.Target.Architecture.X86_64The 64-bit x86 architecture
+


+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Target/Category/index.html b/Reference/classes/Target/Category/index.html new file mode 100644 index 00000000..193a7344 --- /dev/null +++ b/Reference/classes/Target/Category/index.html @@ -0,0 +1,2235 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Category - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Target.Category

+

Defines the target processor category.

+ + + + + + + + + + + + + + + + + +
typedescription
accera.Target.Category.CPU
accera.Target.Category.GPU
+
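For example, a category can be supplied when constructing a target by name, as in the Package.build cross-compilation example:

pi3 = acc.Target("Raspberry Pi 3B", category=acc.Target.Category.CPU)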
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Target/Model/index.html b/Reference/classes/Target/Model/index.html new file mode 100644 index 00000000..4bee62fe --- /dev/null +++ b/Reference/classes/Target/Model/index.html @@ -0,0 +1,4589 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Model - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Target.Model

+

Defines constants for some well-known CPU models.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.Target.Model.AMD_1200AMD 1200
accera.Target.Model.AMD_1300XAMD 1300X
accera.Target.Model.AMD_1400AMD 1400
accera.Target.Model.AMD_1500XAMD 1500X
accera.Target.Model.AMD_1600AMD 1600
accera.Target.Model.AMD_1600XAMD 1600X
accera.Target.Model.AMD_1700AMD 1700
accera.Target.Model.AMD_1700XAMD 1700X
accera.Target.Model.AMD_1800XAMD 1800X
accera.Target.Model.AMD_1900XAMD 1900X
accera.Target.Model.AMD_1920XAMD 1920X
accera.Target.Model.AMD_1950XAMD 1950X
accera.Target.Model.AMD_200GEAMD 200GE
accera.Target.Model.AMD_2200GAMD 2200G
accera.Target.Model.AMD_2200GEAMD 2200GE
accera.Target.Model.AMD_2200UAMD 2200U
accera.Target.Model.AMD_220GEAMD 220GE
accera.Target.Model.AMD_2300UAMD 2300U
accera.Target.Model.AMD_2300XAMD 2300X
accera.Target.Model.AMD_2400GAMD 2400G
accera.Target.Model.AMD_2400GEAMD 2400GE
accera.Target.Model.AMD_240GEAMD 240GE
accera.Target.Model.AMD_2500UAMD 2500U
accera.Target.Model.AMD_2500XAMD 2500X
accera.Target.Model.AMD_2600AMD 2600
accera.Target.Model.AMD_2600EAMD 2600E
accera.Target.Model.AMD_2600HAMD 2600H
accera.Target.Model.AMD_2600XAMD 2600X
accera.Target.Model.AMD_2700AMD 2700
accera.Target.Model.AMD_2700EAMD 2700E
accera.Target.Model.AMD_2700UAMD 2700U
accera.Target.Model.AMD_2700XAMD 2700X
accera.Target.Model.AMD_2700X_GOLD_EDITIONAMD 2700X Gold Edition
accera.Target.Model.AMD_2800HAMD 2800H
accera.Target.Model.AMD_2920XAMD 2920X
accera.Target.Model.AMD_2950XAMD 2950X
accera.Target.Model.AMD_2970WXAMD 2970WX
accera.Target.Model.AMD_2990WXAMD 2990WX
accera.Target.Model.AMD_3000GAMD 3000G
accera.Target.Model.AMD_300UAMD 300U
accera.Target.Model.AMD_3050UAMD 3050U
accera.Target.Model.AMD_3101AMD 3101
accera.Target.Model.AMD_3150UAMD 3150U
accera.Target.Model.AMD_3151AMD 3151
accera.Target.Model.AMD_3200GAMD 3200G
accera.Target.Model.AMD_3200UAMD 3200U
accera.Target.Model.AMD_3201AMD 3201
accera.Target.Model.AMD_3250UAMD 3250U
accera.Target.Model.AMD_3251AMD 3251
accera.Target.Model.AMD_3255AMD 3255
accera.Target.Model.AMD_3300UAMD 3300U
accera.Target.Model.AMD_3301AMD 3301
accera.Target.Model.AMD_3351AMD 3351
accera.Target.Model.AMD_3400GAMD 3400G
accera.Target.Model.AMD_3401AMD 3401
accera.Target.Model.AMD_3451AMD 3451
accera.Target.Model.AMD_3500AMD 3500
accera.Target.Model.AMD_3500UAMD 3500U
accera.Target.Model.AMD_3500XAMD 3500X
accera.Target.Model.AMD_3550HAMD 3550H
accera.Target.Model.AMD_3580UAMD 3580U
accera.Target.Model.AMD_3600AMD 3600
accera.Target.Model.AMD_3600XAMD 3600X
accera.Target.Model.AMD_3600XTAMD 3600XT
accera.Target.Model.AMD_3700UAMD 3700U
accera.Target.Model.AMD_3700XAMD 3700X
accera.Target.Model.AMD_3750HAMD 3750H
accera.Target.Model.AMD_3780UAMD 3780U
accera.Target.Model.AMD_3800XAMD 3800X
accera.Target.Model.AMD_3800XTAMD 3800XT
accera.Target.Model.AMD_3900AMD 3900
accera.Target.Model.AMD_3900XAMD 3900X
accera.Target.Model.AMD_3900XTAMD 3900XT
accera.Target.Model.AMD_3950XAMD 3950X
accera.Target.Model.AMD_3960XAMD 3960X
accera.Target.Model.AMD_3970XAMD 3970X
accera.Target.Model.AMD_3980XAMD 3980X
accera.Target.Model.AMD_3990XAMD 3990X
accera.Target.Model.AMD_4300GAMD 4300G
accera.Target.Model.AMD_4300GEAMD 4300GE
accera.Target.Model.AMD_4300UAMD 4300U
accera.Target.Model.AMD_4500UAMD 4500U
accera.Target.Model.AMD_4600GAMD 4600G
accera.Target.Model.AMD_4600GEAMD 4600GE
accera.Target.Model.AMD_4600HAMD 4600H
accera.Target.Model.AMD_4600HSAMD 4600HS
accera.Target.Model.AMD_4600UAMD 4600U
accera.Target.Model.AMD_4680UAMD 4680U
accera.Target.Model.AMD_4700GAMD 4700G
accera.Target.Model.AMD_4700GEAMD 4700GE
accera.Target.Model.AMD_4700UAMD 4700U
accera.Target.Model.AMD_4800HAMD 4800H
accera.Target.Model.AMD_4800HSAMD 4800HS
accera.Target.Model.AMD_4800UAMD 4800U
accera.Target.Model.AMD_4900HAMD 4900H
accera.Target.Model.AMD_4900HSAMD 4900HS
accera.Target.Model.AMD_4980UAMD 4980U
accera.Target.Model.AMD_5300GAMD 5300G
accera.Target.Model.AMD_5300GEAMD 5300GE
accera.Target.Model.AMD_5300UAMD 5300U
accera.Target.Model.AMD_5400UAMD 5400U
accera.Target.Model.AMD_5500UAMD 5500U
accera.Target.Model.AMD_5600GAMD 5600G
accera.Target.Model.AMD_5600GEAMD 5600GE
accera.Target.Model.AMD_5600HAMD 5600H
accera.Target.Model.AMD_5600HSAMD 5600HS
accera.Target.Model.AMD_5600UAMD 5600U
accera.Target.Model.AMD_5600XAMD 5600X
accera.Target.Model.AMD_5700GAMD 5700G
accera.Target.Model.AMD_5700GEAMD 5700GE
accera.Target.Model.AMD_5700UAMD 5700U
accera.Target.Model.AMD_5800AMD 5800
accera.Target.Model.AMD_5800HAMD 5800H
accera.Target.Model.AMD_5800HSAMD 5800HS
accera.Target.Model.AMD_5800UAMD 5800U
accera.Target.Model.AMD_5800XAMD 5800X
accera.Target.Model.AMD_5900AMD 5900
accera.Target.Model.AMD_5900HSAMD 5900HS
accera.Target.Model.AMD_5900HXAMD 5900HX
accera.Target.Model.AMD_5900XAMD 5900X
accera.Target.Model.AMD_5950XAMD 5950X
accera.Target.Model.AMD_5980HSAMD 5980HS
accera.Target.Model.AMD_5980HXAMD 5980HX
accera.Target.Model.AMD_7232PAMD 7232P
accera.Target.Model.AMD_7251AMD 7251
accera.Target.Model.AMD_7252AMD 7252
accera.Target.Model.AMD_7261AMD 7261
accera.Target.Model.AMD_7262AMD 7262
accera.Target.Model.AMD_7272AMD 7272
accera.Target.Model.AMD_7281AMD 7281
accera.Target.Model.AMD_7282AMD 7282
accera.Target.Model.AMD_72F3AMD 72F3
accera.Target.Model.AMD_7301AMD 7301
accera.Target.Model.AMD_7302AMD 7302
accera.Target.Model.AMD_7302PAMD 7302P
accera.Target.Model.AMD_7313AMD 7313
accera.Target.Model.AMD_7313PAMD 7313P
accera.Target.Model.AMD_7343AMD 7343
accera.Target.Model.AMD_7351AMD 7351
accera.Target.Model.AMD_7351PAMD 7351P
accera.Target.Model.AMD_7352AMD 7352
accera.Target.Model.AMD_7371AMD 7371
accera.Target.Model.AMD_73F3AMD 73F3
accera.Target.Model.AMD_7401AMD 7401
accera.Target.Model.AMD_7401PAMD 7401P
accera.Target.Model.AMD_7402AMD 7402
accera.Target.Model.AMD_7402PAMD 7402P
accera.Target.Model.AMD_7413AMD 7413
accera.Target.Model.AMD_7443AMD 7443
accera.Target.Model.AMD_7443PAMD 7443P
accera.Target.Model.AMD_7451AMD 7451
accera.Target.Model.AMD_7452AMD 7452
accera.Target.Model.AMD_7453AMD 7453
accera.Target.Model.AMD_74F3AMD 74F3
accera.Target.Model.AMD_7501AMD 7501
accera.Target.Model.AMD_7502AMD 7502
accera.Target.Model.AMD_7502PAMD 7502P
accera.Target.Model.AMD_7513AMD 7513
accera.Target.Model.AMD_7532AMD 7532
accera.Target.Model.AMD_7542AMD 7542
accera.Target.Model.AMD_7543AMD 7543
accera.Target.Model.AMD_7543PAMD 7543P
accera.Target.Model.AMD_7551AMD 7551
accera.Target.Model.AMD_7551PAMD 7551P
accera.Target.Model.AMD_7552AMD 7552
accera.Target.Model.AMD_75F3AMD 75F3
accera.Target.Model.AMD_7601AMD 7601
accera.Target.Model.AMD_7642AMD 7642
accera.Target.Model.AMD_7643AMD 7643
accera.Target.Model.AMD_7662AMD 7662
accera.Target.Model.AMD_7663AMD 7663
accera.Target.Model.AMD_7702AMD 7702
accera.Target.Model.AMD_7702PAMD 7702P
accera.Target.Model.AMD_7713AMD 7713
accera.Target.Model.AMD_7713PAMD 7713P
accera.Target.Model.AMD_7742AMD 7742
accera.Target.Model.AMD_7763AMD 7763
accera.Target.Model.AMD_7F32AMD 7F32
accera.Target.Model.AMD_7F52AMD 7F52
accera.Target.Model.AMD_7F72AMD 7F72
accera.Target.Model.AMD_7H12AMD 7H12
accera.Target.Model.AMD_7V12AMD 7V12
accera.Target.Model.AMD_FIREFLIGHTAMD FireFlight
accera.Target.Model.AMD_PRO_1200AMD PRO 1200
accera.Target.Model.AMD_PRO_1300AMD PRO 1300
accera.Target.Model.AMD_PRO_1500AMD PRO 1500
accera.Target.Model.AMD_PRO_1600AMD PRO 1600
accera.Target.Model.AMD_PRO_1700AMD PRO 1700
accera.Target.Model.AMD_PRO_1700XAMD PRO 1700X
accera.Target.Model.AMD_PRO_200GEAMD PRO 200GE
accera.Target.Model.AMD_PRO_2200GAMD PRO 2200G
accera.Target.Model.AMD_PRO_2200GEAMD PRO 2200GE
accera.Target.Model.AMD_PRO_2300UAMD PRO 2300U
accera.Target.Model.AMD_PRO_2400GAMD PRO 2400G
accera.Target.Model.AMD_PRO_2400GEAMD PRO 2400GE
accera.Target.Model.AMD_PRO_2500UAMD PRO 2500U
accera.Target.Model.AMD_PRO_2600AMD PRO 2600
accera.Target.Model.AMD_PRO_2700AMD PRO 2700
accera.Target.Model.AMD_PRO_2700UAMD PRO 2700U
accera.Target.Model.AMD_PRO_2700XAMD PRO 2700X
accera.Target.Model.AMD_PRO_300GEAMD PRO 300GE
accera.Target.Model.AMD_PRO_300UAMD PRO 300U
accera.Target.Model.AMD_PRO_3200GAMD PRO 3200G
accera.Target.Model.AMD_PRO_3200GEAMD PRO 3200GE
accera.Target.Model.AMD_PRO_3300UAMD PRO 3300U
accera.Target.Model.AMD_PRO_3400GAMD PRO 3400G
accera.Target.Model.AMD_PRO_3400GEAMD PRO 3400GE
accera.Target.Model.AMD_PRO_3500UAMD PRO 3500U
accera.Target.Model.AMD_PRO_3600AMD PRO 3600
accera.Target.Model.AMD_PRO_3700AMD PRO 3700
accera.Target.Model.AMD_PRO_3700UAMD PRO 3700U
accera.Target.Model.AMD_PRO_3900AMD PRO 3900
accera.Target.Model.AMD_PRO_4350GAMD PRO 4350G
accera.Target.Model.AMD_PRO_4350GEAMD PRO 4350GE
accera.Target.Model.AMD_PRO_4450UAMD PRO 4450U
accera.Target.Model.AMD_PRO_4650GAMD PRO 4650G
accera.Target.Model.AMD_PRO_4650GEAMD PRO 4650GE
accera.Target.Model.AMD_PRO_4650UAMD PRO 4650U
accera.Target.Model.AMD_PRO_4750GAMD PRO 4750G
accera.Target.Model.AMD_PRO_4750GEAMD PRO 4750GE
accera.Target.Model.AMD_PRO_4750UAMD PRO 4750U
accera.Target.Model.AMD_PRO_5350GAMD PRO 5350G
accera.Target.Model.AMD_PRO_5350GEAMD PRO 5350GE
accera.Target.Model.AMD_PRO_5450UAMD PRO 5450U
accera.Target.Model.AMD_PRO_5650GAMD PRO 5650G
accera.Target.Model.AMD_PRO_5650GEAMD PRO 5650GE
accera.Target.Model.AMD_PRO_5650UAMD PRO 5650U
accera.Target.Model.AMD_PRO_5750GAMD PRO 5750G
accera.Target.Model.AMD_PRO_5750GEAMD PRO 5750GE
accera.Target.Model.AMD_PRO_5850UAMD PRO 5850U
accera.Target.Model.AMD_R1102GAMD R1102G
accera.Target.Model.AMD_R1305GAMD R1305G
accera.Target.Model.AMD_R1505GAMD R1505G
accera.Target.Model.AMD_R1606GAMD R1606G
accera.Target.Model.AMD_V1202BAMD V1202B
accera.Target.Model.AMD_V1404IAMD V1404I
accera.Target.Model.AMD_V1500BAMD V1500B
accera.Target.Model.AMD_V1605BAMD V1605B
accera.Target.Model.AMD_V1756BAMD V1756B
accera.Target.Model.AMD_V1780BAMD V1780B
accera.Target.Model.AMD_V1807BAMD V1807B
accera.Target.Model.AMD_V2516AMD V2516
accera.Target.Model.AMD_V2546AMD V2546
accera.Target.Model.AMD_V2718AMD V2718
accera.Target.Model.AMD_V2748AMD V2748
accera.Target.Model.ARM_CORTEX_M4ARM Cortex-M4
accera.Target.Model.ARM_CORTEX_M4FARM Cortex-M4F
accera.Target.Model.APPLE_M1_MAXApple M1 Max
accera.Target.Model.INTEL_1000G1Intel 1000G1
accera.Target.Model.INTEL_1000G4Intel 1000G4
accera.Target.Model.INTEL_1005G1Intel 1005G1
accera.Target.Model.INTEL_10100Intel 10100
accera.Target.Model.INTEL_10100FIntel 10100F
accera.Target.Model.INTEL_10100TIntel 10100T
accera.Target.Model.INTEL_10300Intel 10300
accera.Target.Model.INTEL_10300TIntel 10300T
accera.Target.Model.INTEL_1030G4Intel 1030G4
accera.Target.Model.INTEL_1030G7Intel 1030G7
accera.Target.Model.INTEL_10320Intel 10320
accera.Target.Model.INTEL_1035G1Intel 1035G1
accera.Target.Model.INTEL_1035G4Intel 1035G4
accera.Target.Model.INTEL_1035G7Intel 1035G7
accera.Target.Model.INTEL_10400Intel 10400
accera.Target.Model.INTEL_10400FIntel 10400F
accera.Target.Model.INTEL_10400TIntel 10400T
accera.Target.Model.INTEL_10500Intel 10500
accera.Target.Model.INTEL_10500TIntel 10500T
accera.Target.Model.INTEL_10600Intel 10600
accera.Target.Model.INTEL_10600KIntel 10600K
accera.Target.Model.INTEL_10600KFIntel 10600KF
accera.Target.Model.INTEL_10600TIntel 10600T
accera.Target.Model.INTEL_1060G7Intel 1060G7
accera.Target.Model.INTEL_1065G7Intel 1065G7
accera.Target.Model.INTEL_1068G7Intel 1068G7
accera.Target.Model.INTEL_10700Intel 10700
accera.Target.Model.INTEL_10700FIntel 10700F
accera.Target.Model.INTEL_10700KIntel 10700K
accera.Target.Model.INTEL_10700KFIntel 10700KF
accera.Target.Model.INTEL_10700TIntel 10700T
accera.Target.Model.INTEL_10850KIntel 10850K
accera.Target.Model.INTEL_10900Intel 10900
accera.Target.Model.INTEL_10900FIntel 10900F
accera.Target.Model.INTEL_10900KIntel 10900K
accera.Target.Model.INTEL_10900KFIntel 10900KF
accera.Target.Model.INTEL_10900TIntel 10900T
accera.Target.Model.INTEL_10910Intel 10910
accera.Target.Model.INTEL_11100BIntel 11100B
accera.Target.Model.INTEL_1115G7Intel 1115G7
accera.Target.Model.INTEL_1125G7Intel 1125G7
accera.Target.Model.INTEL_1135G7Intel 1135G7
accera.Target.Model.INTEL_11400Intel 11400
accera.Target.Model.INTEL_11400FIntel 11400F
accera.Target.Model.INTEL_11400TIntel 11400T
accera.Target.Model.INTEL_1145G7Intel 1145G7
accera.Target.Model.INTEL_11500Intel 11500
accera.Target.Model.INTEL_11500BIntel 11500B
accera.Target.Model.INTEL_11500TIntel 11500T
accera.Target.Model.INTEL_1155G7Intel 1155G7
accera.Target.Model.INTEL_11600Intel 11600
accera.Target.Model.INTEL_11600KIntel 11600K
accera.Target.Model.INTEL_11600KFIntel 11600KF
accera.Target.Model.INTEL_11600TIntel 11600T
accera.Target.Model.INTEL_1165G7Intel 1165G7
accera.Target.Model.INTEL_11700Intel 11700
accera.Target.Model.INTEL_11700BIntel 11700B
accera.Target.Model.INTEL_11700FIntel 11700F
accera.Target.Model.INTEL_11700KIntel 11700K
accera.Target.Model.INTEL_11700KFIntel 11700KF
accera.Target.Model.INTEL_11700TIntel 11700T
accera.Target.Model.INTEL_11850HIntel 11850H
accera.Target.Model.INTEL_1185G7Intel 1185G7
accera.Target.Model.INTEL_11900Intel 11900
accera.Target.Model.INTEL_11900FIntel 11900F
accera.Target.Model.INTEL_11900KIntel 11900K
accera.Target.Model.INTEL_11900KBIntel 11900KB
accera.Target.Model.INTEL_11900KFIntel 11900KF
accera.Target.Model.INTEL_11900TIntel 11900T
accera.Target.Model.INTEL_1195G7Intel 1195G7
accera.Target.Model.INTEL_2104GIntel 2104G
accera.Target.Model.INTEL_2124Intel 2124
accera.Target.Model.INTEL_2124GIntel 2124G
accera.Target.Model.INTEL_2126GIntel 2126G
accera.Target.Model.INTEL_2134Intel 2134
accera.Target.Model.INTEL_2136Intel 2136
accera.Target.Model.INTEL_2144GIntel 2144G
accera.Target.Model.INTEL_2146GIntel 2146G
accera.Target.Model.INTEL_2174GIntel 2174G
accera.Target.Model.INTEL_2176GIntel 2176G
accera.Target.Model.INTEL_2186GIntel 2186G
accera.Target.Model.INTEL_2314Intel 2314
accera.Target.Model.INTEL_2324GIntel 2324G
accera.Target.Model.INTEL_2334Intel 2334
accera.Target.Model.INTEL_2336Intel 2336
accera.Target.Model.INTEL_2356GIntel 2356G
accera.Target.Model.INTEL_2374GIntel 2374G
accera.Target.Model.INTEL_2378Intel 2378
accera.Target.Model.INTEL_2378GIntel 2378G
accera.Target.Model.INTEL_2386GIntel 2386G
accera.Target.Model.INTEL_2388GIntel 2388G
accera.Target.Model.INTEL_3204Intel 3204
accera.Target.Model.INTEL_4108Intel 4108
accera.Target.Model.INTEL_4109TIntel 4109T
accera.Target.Model.INTEL_4110Intel 4110
accera.Target.Model.INTEL_4112Intel 4112
accera.Target.Model.INTEL_4114Intel 4114
accera.Target.Model.INTEL_4208Intel 4208
accera.Target.Model.INTEL_4209TIntel 4209T
accera.Target.Model.INTEL_4210Intel 4210
accera.Target.Model.INTEL_4210RIntel 4210R
accera.Target.Model.INTEL_4214Intel 4214
accera.Target.Model.INTEL_4214RIntel 4214R
accera.Target.Model.INTEL_4214YIntel 4214Y
accera.Target.Model.INTEL_4215Intel 4215
accera.Target.Model.INTEL_4215RIntel 4215R
accera.Target.Model.INTEL_4216Intel 4216
accera.Target.Model.INTEL_5215Intel 5215
accera.Target.Model.INTEL_5215LIntel 5215L
accera.Target.Model.INTEL_5215MIntel 5215M
accera.Target.Model.INTEL_5217Intel 5217
accera.Target.Model.INTEL_5218Intel 5218
accera.Target.Model.INTEL_5218BIntel 5218B
accera.Target.Model.INTEL_5218NIntel 5218N
accera.Target.Model.INTEL_5218RIntel 5218R
accera.Target.Model.INTEL_5218TIntel 5218T
accera.Target.Model.INTEL_5220Intel 5220
accera.Target.Model.INTEL_5220RIntel 5220R
accera.Target.Model.INTEL_5220SIntel 5220S
accera.Target.Model.INTEL_5220TIntel 5220T
accera.Target.Model.INTEL_5222Intel 5222
accera.Target.Model.INTEL_6035Intel 6035
accera.Target.Model.INTEL_6098PIntel 6098P
accera.Target.Model.INTEL_6100Intel 6100
accera.Target.Model.INTEL_6100TIntel 6100T
accera.Target.Model.INTEL_6209UIntel 6209U
accera.Target.Model.INTEL_6210UIntel 6210U
accera.Target.Model.INTEL_6212UIntel 6212U
accera.Target.Model.INTEL_6222VIntel 6222V
accera.Target.Model.INTEL_6226Intel 6226
accera.Target.Model.INTEL_6226RIntel 6226R
accera.Target.Model.INTEL_6230Intel 6230
accera.Target.Model.INTEL_6230NIntel 6230N
accera.Target.Model.INTEL_6230RIntel 6230R
accera.Target.Model.INTEL_6230TIntel 6230T
accera.Target.Model.INTEL_6234Intel 6234
accera.Target.Model.INTEL_6238Intel 6238
accera.Target.Model.INTEL_6238LIntel 6238L
accera.Target.Model.INTEL_6238MIntel 6238M
accera.Target.Model.INTEL_6238RIntel 6238R
accera.Target.Model.INTEL_6238TIntel 6238T
accera.Target.Model.INTEL_6240Intel 6240
accera.Target.Model.INTEL_6240LIntel 6240L
accera.Target.Model.INTEL_6240MIntel 6240M
accera.Target.Model.INTEL_6240RIntel 6240R
accera.Target.Model.INTEL_6240YIntel 6240Y
accera.Target.Model.INTEL_6242Intel 6242
accera.Target.Model.INTEL_6242RIntel 6242R
accera.Target.Model.INTEL_6244Intel 6244
accera.Target.Model.INTEL_6246Intel 6246
accera.Target.Model.INTEL_6246RIntel 6246R
accera.Target.Model.INTEL_6248Intel 6248
accera.Target.Model.INTEL_6248RIntel 6248R
accera.Target.Model.INTEL_6252Intel 6252
accera.Target.Model.INTEL_6252NIntel 6252N
accera.Target.Model.INTEL_6254Intel 6254
accera.Target.Model.INTEL_6258RIntel 6258R
accera.Target.Model.INTEL_6262VIntel 6262V
accera.Target.Model.INTEL_6300Intel 6300
accera.Target.Model.INTEL_6300TIntel 6300T
accera.Target.Model.INTEL_6320Intel 6320
accera.Target.Model.INTEL_6400Intel 6400
accera.Target.Model.INTEL_6400TIntel 6400T
accera.Target.Model.INTEL_6402PIntel 6402P
accera.Target.Model.INTEL_6500Intel 6500
accera.Target.Model.INTEL_6500TIntel 6500T
accera.Target.Model.INTEL_6585RIntel 6585R
accera.Target.Model.INTEL_6600Intel 6600
accera.Target.Model.INTEL_6600KIntel 6600K
accera.Target.Model.INTEL_6600TIntel 6600T
accera.Target.Model.INTEL_6685RIntel 6685R
accera.Target.Model.INTEL_6700Intel 6700
accera.Target.Model.INTEL_6700KIntel 6700K
accera.Target.Model.INTEL_6700TIntel 6700T
accera.Target.Model.INTEL_6785RIntel 6785R
accera.Target.Model.INTEL_6820HQIntel 6820HQ
accera.Target.Model.INTEL_7100Intel 7100
accera.Target.Model.INTEL_7100TIntel 7100T
accera.Target.Model.INTEL_7101EIntel 7101E
accera.Target.Model.INTEL_7101TEIntel 7101TE
accera.Target.Model.INTEL_7300Intel 7300
accera.Target.Model.INTEL_7300TIntel 7300T
accera.Target.Model.INTEL_7320Intel 7320
accera.Target.Model.INTEL_7350KIntel 7350K
accera.Target.Model.INTEL_7400Intel 7400
accera.Target.Model.INTEL_7400TIntel 7400T
accera.Target.Model.INTEL_7500Intel 7500
accera.Target.Model.INTEL_7500TIntel 7500T
accera.Target.Model.INTEL_7505Intel 7505
accera.Target.Model.INTEL_7600Intel 7600
accera.Target.Model.INTEL_7600KIntel 7600K
accera.Target.Model.INTEL_7600TIntel 7600T
accera.Target.Model.INTEL_7640XIntel 7640X
accera.Target.Model.INTEL_7700Intel 7700
accera.Target.Model.INTEL_7700KIntel 7700K
accera.Target.Model.INTEL_7700TIntel 7700T
accera.Target.Model.INTEL_7740XIntel 7740X
accera.Target.Model.INTEL_7800XIntel 7800X
accera.Target.Model.INTEL_7820XIntel 7820X
accera.Target.Model.INTEL_7900XIntel 7900X
accera.Target.Model.INTEL_7920XIntel 7920X
accera.Target.Model.INTEL_7940XIntel 7940X
accera.Target.Model.INTEL_7960XIntel 7960X
accera.Target.Model.INTEL_7980XEIntel 7980XE
accera.Target.Model.INTEL_8086KIntel 8086K
accera.Target.Model.INTEL_8100Intel 8100
accera.Target.Model.INTEL_8100FIntel 8100F
accera.Target.Model.INTEL_8100TIntel 8100T
accera.Target.Model.INTEL_8253Intel 8253
accera.Target.Model.INTEL_8256Intel 8256
accera.Target.Model.INTEL_8260Intel 8260
accera.Target.Model.INTEL_8260LIntel 8260L
accera.Target.Model.INTEL_8260MIntel 8260M
accera.Target.Model.INTEL_8260YIntel 8260Y
accera.Target.Model.INTEL_8268Intel 8268
accera.Target.Model.INTEL_8270Intel 8270
accera.Target.Model.INTEL_8272CLIntel 8272CL
accera.Target.Model.INTEL_8273CLIntel 8273CL
accera.Target.Model.INTEL_8276Intel 8276
accera.Target.Model.INTEL_8276LIntel 8276L
accera.Target.Model.INTEL_8276MIntel 8276M
accera.Target.Model.INTEL_8280Intel 8280
accera.Target.Model.INTEL_8280LIntel 8280L
accera.Target.Model.INTEL_8280MIntel 8280M
accera.Target.Model.INTEL_8284Intel 8284
accera.Target.Model.INTEL_8300Intel 8300
accera.Target.Model.INTEL_8300TIntel 8300T
accera.Target.Model.INTEL_8350KIntel 8350K
accera.Target.Model.INTEL_8351NIntel 8351N
accera.Target.Model.INTEL_8352SIntel 8352S
accera.Target.Model.INTEL_8352VIntel 8352V
accera.Target.Model.INTEL_8352YIntel 8352Y
accera.Target.Model.INTEL_8358Intel 8358
accera.Target.Model.INTEL_8358PIntel 8358P
accera.Target.Model.INTEL_8360YIntel 8360Y
accera.Target.Model.INTEL_8362Intel 8362
accera.Target.Model.INTEL_8368Intel 8368
accera.Target.Model.INTEL_8368QIntel 8368Q
accera.Target.Model.INTEL_8380Intel 8380
accera.Target.Model.INTEL_8400Intel 8400
accera.Target.Model.INTEL_8400TIntel 8400T
accera.Target.Model.INTEL_8500Intel 8500
accera.Target.Model.INTEL_8500TIntel 8500T
accera.Target.Model.INTEL_8550UIntel 8550U
accera.Target.Model.INTEL_8600Intel 8600
accera.Target.Model.INTEL_8600KIntel 8600K
accera.Target.Model.INTEL_8600TIntel 8600T
accera.Target.Model.INTEL_8650UIntel 8650U
accera.Target.Model.INTEL_8700Intel 8700
accera.Target.Model.INTEL_8700KIntel 8700K
accera.Target.Model.INTEL_8700TIntel 8700T
accera.Target.Model.INTEL_9221Intel 9221
accera.Target.Model.INTEL_9222Intel 9222
accera.Target.Model.INTEL_9242Intel 9242
accera.Target.Model.INTEL_9282Intel 9282
accera.Target.Model.INTEL_9800XIntel 9800X
accera.Target.Model.INTEL_9820XIntel 9820X
accera.Target.Model.INTEL_9900XIntel 9900X
accera.Target.Model.INTEL_9920XIntel 9920X
accera.Target.Model.INTEL_9940XIntel 9940X
accera.Target.Model.INTEL_9960XIntel 9960X
accera.Target.Model.INTEL_9980XEIntel 9980XE
accera.Target.Model.INTEL_9990XEIntel 9990XE
accera.Target.Model.INTEL_E3_1220_V6Intel E3-1220 v6
accera.Target.Model.INTEL_E3_1225_V6Intel E3-1225 v6
accera.Target.Model.INTEL_E3_1230_V6Intel E3-1230 v6
accera.Target.Model.INTEL_E3_1240_V6Intel E3-1240 v6
accera.Target.Model.INTEL_E3_1245_V6Intel E3-1245 v6
accera.Target.Model.INTEL_E3_1270_V6Intel E3-1270 v6
accera.Target.Model.INTEL_E3_1275_V6Intel E3-1275 v6
accera.Target.Model.INTEL_E3_1280_V6Intel E3-1280 v6
accera.Target.Model.INTEL_E3_1285_V6Intel E3-1285 v6
accera.Target.Model.INTEL_E5_1607_V2Intel E5-1607 v2
accera.Target.Model.INTEL_E5_1620_V2Intel E5-1620 v2
accera.Target.Model.INTEL_E5_1650_V2Intel E5-1650 v2
accera.Target.Model.INTEL_E5_1650_V3Intel E5-1650 v3
accera.Target.Model.INTEL_E5_1660_V2Intel E5-1660 v2
accera.Target.Model.INTEL_E5_1660_V3Intel E5-1660 v3
accera.Target.Model.INTEL_E5_1680_V2Intel E5-1680 v2
accera.Target.Model.INTEL_E5_1680_V3Intel E5-1680 v3
accera.Target.Model.INTEL_E5_2620_V3Intel E5-2620 v3
accera.Target.Model.INTEL_E5_2673_V4Intel E5-2673 v4
accera.Target.Model.INTEL_G3900Intel G3900
accera.Target.Model.INTEL_G3900TIntel G3900T
accera.Target.Model.INTEL_G3900TEIntel G3900TE
accera.Target.Model.INTEL_G3920Intel G3920
accera.Target.Model.INTEL_G4400Intel G4400
accera.Target.Model.INTEL_G4400TIntel G4400T
accera.Target.Model.INTEL_G4400TEIntel G4400TE
accera.Target.Model.INTEL_G4500Intel G4500
accera.Target.Model.INTEL_G4500TIntel G4500T
accera.Target.Model.INTEL_G4520Intel G4520
accera.Target.Model.INTEL_W_1250Intel W-1250
accera.Target.Model.INTEL_W_1250PIntel W-1250P
accera.Target.Model.INTEL_W_1270Intel W-1270
accera.Target.Model.INTEL_W_1270PIntel W-1270P
accera.Target.Model.INTEL_W_1290Intel W-1290
accera.Target.Model.INTEL_W_1290PIntel W-1290P
accera.Target.Model.INTEL_W_1290TIntel W-1290T
accera.Target.Model.INTEL_W_1350Intel W-1350
accera.Target.Model.INTEL_W_1350PIntel W-1350P
accera.Target.Model.INTEL_W_1370Intel W-1370
accera.Target.Model.INTEL_W_1370PIntel W-1370P
accera.Target.Model.INTEL_W_1390Intel W-1390
accera.Target.Model.INTEL_W_1390PIntel W-1390P
accera.Target.Model.INTEL_W_1390TIntel W-1390T
accera.Target.Model.INTEL_W_2102Intel W-2102
accera.Target.Model.INTEL_W_2104Intel W-2104
accera.Target.Model.INTEL_W_2123Intel W-2123
accera.Target.Model.INTEL_W_2125Intel W-2125
accera.Target.Model.INTEL_W_2133Intel W-2133
accera.Target.Model.INTEL_W_2135Intel W-2135
accera.Target.Model.INTEL_W_2140BIntel W-2140B
accera.Target.Model.INTEL_W_2150BIntel W-2150B
accera.Target.Model.INTEL_W_3175XIntel W-3175X
accera.Target.Model.INTEL_W_3223Intel W-3223
accera.Target.Model.INTEL_W_3225Intel W-3225
accera.Target.Model.INTEL_W_3235Intel W-3235
accera.Target.Model.INTEL_W_3245Intel W-3245
accera.Target.Model.INTEL_W_3245MIntel W-3245M
accera.Target.Model.INTEL_W_3265Intel W-3265
accera.Target.Model.INTEL_W_3265MIntel W-3265M
accera.Target.Model.INTEL_W_3275Intel W-3275
accera.Target.Model.INTEL_W_3275MIntel W-3275M
accera.Target.Model.RASPBERRY_PI_3BRaspberry Pi 3B
accera.Target.Model.RASPBERRY_PI_4BRaspberry Pi 4B
accera.Target.Model.RASPBERRY_PI_ZERORaspberry Pi Zero
+

The enum also defines constants for some well-known GPU models.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.Target.Model.AMD_MI100AMD MI100
accera.Target.Model.AMD_MI200AMD MI200
accera.Target.Model.AMD_MI50AMD MI50
accera.Target.Model.AMD_RADEON7AMD Radeon7
accera.Target.Model.NVIDIA_A100NVidia A100
accera.Target.Model.NVIDIA_P100NVidia P100
accera.Target.Model.NVIDIA_RTX_A6000NVidia RTX A6000
accera.Target.Model.NVIDIA_V100NVidia V100
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Target/Runtime/index.html b/Reference/classes/Target/Runtime/index.html new file mode 100644 index 00000000..fdee1c72 --- /dev/null +++ b/Reference/classes/Target/Runtime/index.html @@ -0,0 +1,2243 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Runtime - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Target.Runtime

+

The runtime for code generation and/or compilation.

+ + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.Target.Runtime.CUDAThe NVidia CUDA runtime.
accera.Target.Runtime.ROCMThe AMD ROCm runtime.
accera.Target.Runtime.VULKANThe Vulkan runtime.
accera.Target.Runtime.OPENMPThe OpenMP runtime.
+
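
The snippet below is a minimal sketch of selecting a runtime when defining a GPU target, mirroring the GPU tutorial in this documentation. The array shape and iteration logic are placeholders, and the example assumes a Vulkan-capable toolchain is installed.

import accera as acc
+
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(16, 16))
+B = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))
+
+nest = acc.Nest(shape=(16, 16))
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    B[i, j] += A[i, j]
+
+# Select the Vulkan runtime when defining the GPU target
+gpu_target = acc.Target(category=acc.Target.Category.GPU, runtime=acc.Target.Runtime.VULKAN)
+plan = nest.create_schedule().create_plan(gpu_target)
+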
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/classes/Target/Target/index.html b/Reference/classes/Target/Target/index.html new file mode 100644 index 00000000..92a14151 --- /dev/null +++ b/Reference/classes/Target/Target/index.html @@ -0,0 +1,2404 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Target - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Target([architecture, cache_lines, cache_sizes, category, extensions, family, frequency_GHz, known_name, model, name, num_cores, num_threads, runtime, tensor_core_info, turbo_frequency_GHz, vector_bytes, vector_registers])

+

Defines the capabilities of a target processor.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
architectureThe processor architectureaccera.Target.Architecture
cache_linesCache line sizes (bytes)list of positive integers
cache_sizesCache sizes (kilobytes)list of positive integers
categoryThe processor categoryaccera.Target.Category
extensionsSupported processor extensionslist of extension codes
familyThe processor familystring
frequency_GHzThe processor frequency (GHz)positive number
known_nameA name of a device known to Accerastring | accera.Target.Model / "HOST"
modelThe processor modelaccera.Target.Model
nameThe processor namestring
num_coresNumber of corespositive integer
num_threadsNumber of threadspositive integer
runtimeThe runtimeaccera.Target.Runtime
tensor_core_infoThe tensor core capabilities, such as the supported input type, output type, and shapesaccera.Targets.TensorCoreInformation
turbo_frequency_GHzTurbo frequency (GHz)positive number
vector_bytesBytes per vector registerpositive number
vector_registersTotal number of SIMD registerspositive number
+

Known device names

+

Accera provides a pre-defined list of known targets through the accera.Target.Model enumeration.

+

These known targets provide typical hardware settings and may not match your specific hardware exactly. If your target closely matches (but is not identical to) one of these targets, you can start with the known target and update its properties accordingly.

+

If your target is your host machine, Accera will first try to find your host machine's CPU in the list of known devices and then use its corresponding capabilities. If none is found, we recommend inspecting the closest matching device in the accera.Target.Model enumeration in order to generate optimal code. If there is no closely matching device for your host machine, see the following section on how to define a CPU target in Accera.

+

Examples

+

Let's have a look at some examples to understand how to define a CPU target in Accera.

+

Create a custom CPU target: +

cpu_target = acc.Target(name="Custom processor", category=acc.Target.Category.CPU, architecture=acc.Target.Architecture.X86_64, num_cores=10)
+

+

We can also create a known CPU target and selectively override some of its fields.

+
gen10 = acc.Target(
+                known_name="Intel 7940X",
+                category=acc.Target.Category.CPU,
+                extensions=["SSE4.1", "SSE4.2", "AVX2"])
+
+

In this example, we created a target device of a known CPU but overrode the extensions to remove AVX512 support.

+

You can use this example as a starting point to define any other Intel Core Processor. Their specifications are listed in the table above.

+

Create a pre-defined GPU target representing an NVidia Tesla V100 processor:

+
v100 = acc.Target(model=acc.Target.Model.NVIDIA_V100)
+
+

Here is another example to create a custom GPU target:

+
gpu_target = acc.Target(name="Custom GPU processor", category=acc.Target.Category.GPU, default_block_size=16)
+
+

Additional Notes on Instruction Set Extensions

+

It is important to identify the number of vector registers and the number of bytes per SIMD register. These values help you determine whether you are leveraging the vector units of the underlying hardware to their full capability.

+

AVX

+

Advanced Vector Extensions (AVX) promotes legacy 128-bit SIMD instructions that operate on XMM registers to use a vector-extension (VEX) prefix and operate on 256-bit YMM registers.

+

Intel AVX introduced support for 256-bit wide SIMD registers (YMM0-YMM7 in operating modes that are 32-bit or less, YMM0-YMM15 in 64-bit mode; 64-bit mode is Accera's default when defining a target). The lower 128 bits of each YMM register are aliased to the corresponding 128-bit XMM register. An AVX-capable target therefore provides 16 XMM registers and 16 YMM registers, where each YMM register extends the corresponding 128-bit XMM register to 256 bits.

+

AVX512

+

AVX-512 is a further extension offering 32 ZMM registers, and each SIMD register is 512 bits (64 bytes) wide.

+

SSE4 Extension

+

SSE4 operates on 128-bit wide XMM registers. Eight registers (XMM0-XMM7) are available in 32-bit mode; in 64-bit mode, eight additional registers (XMM8-XMM15) are accessible using REX prefixes, for a total of 16.

+
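
To make these notes concrete, here is a sketch of how the register counts and widths could be encoded when defining custom CPU targets. The processor names, extension strings, and exact values below are illustrative assumptions rather than settings taken from a real device database.

import accera as acc
+
+# Hypothetical AVX2 target: 16 YMM registers, each 32 bytes (256 bits) wide
+avx2_target = acc.Target(
+    name="Custom AVX2 processor",
+    category=acc.Target.Category.CPU,
+    architecture=acc.Target.Architecture.X86_64,
+    extensions=["SSE4.1", "SSE4.2", "AVX2"],
+    vector_bytes=32,
+    vector_registers=16)
+
+# Hypothetical AVX-512 target: 32 ZMM registers, each 64 bytes (512 bits) wide
+avx512_target = acc.Target(
+    name="Custom AVX-512 processor",
+    category=acc.Target.Category.CPU,
+    architecture=acc.Target.Architecture.X86_64,
+    extensions=["SSE4.1", "SSE4.2", "AVX2", "AVX512"],
+    vector_bytes=64,
+    vector_registers=32)
+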
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/enumerations/CacheStrategy/index.html b/Reference/enumerations/CacheStrategy/index.html new file mode 100644 index 00000000..20d8aa40 --- /dev/null +++ b/Reference/enumerations/CacheStrategy/index.html @@ -0,0 +1,2233 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + CacheStrategy - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.CacheStrategy

+ + + + + + + + + + + + + + + + + +
typedescription
accera.CacheStrategy.BLOCKEDEvery thread copies a contiguous block of memory based on their thread index. e.g. If 100 elements are cached by 10 threads, thread 0 copies elements [0, 10), thread 1 copies elements [10, 20) and so on.
accera.CacheStrategy.STRIPEDEvery thread copies a part of their contribution in a round-robin fashion. e.g. In the previous example, thread 0 will now copy elements [0, 2), [20, 22), [40, 42), [60, 62) and [80, 82), thread 1 will copy [2, 4), [22, 24), [42, 44), [62, 64) and [82, 84) and so on. The minimum number of contiguous elements that each thread copies is governed by the vectorization parameter, which in this example is 2.
+

Different caching strategies can have noticeably different performance characteristics, arising from overheads such as bank conflicts and the degree of memory coalescing.

+
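
As an illustrative sketch, the strategy is typically chosen when adding a cache to a GPU plan. The example below assumes that plan.cache accepts a strategy argument of this enum type; the matrix sizes, split factors, and choice of STRIPED are placeholders.

import accera as acc
+
+M, N, K = 256, 256, 256
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+
+nest = acc.Nest(shape=(M, N, K))
+i, j, k = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    C[i, j] += A[i, k] * B[k, j]
+
+schedule = nest.create_schedule()
+ii = schedule.split(i, 16)
+jj = schedule.split(j, 16)
+schedule.reorder(i, j, ii, jj, k)
+
+target = acc.Target(model=acc.Target.Model.NVIDIA_V100)
+plan = schedule.create_plan(target)
+plan.bind({
+    i: target.GridUnit.BLOCK_X,
+    j: target.GridUnit.BLOCK_Y,
+    ii: target.GridUnit.THREAD_X,
+    jj: target.GridUnit.THREAD_Y
+})
+
+# Assumption: plan.cache exposes a `strategy` argument that accepts these enum values
+plan.cache(A, index=ii, strategy=acc.CacheStrategy.STRIPED)
+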
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/enumerations/MMAFragmentOp/index.html b/Reference/enumerations/MMAFragmentOp/index.html new file mode 100644 index 00000000..2ab3e9e8 --- /dev/null +++ b/Reference/enumerations/MMAFragmentOp/index.html @@ -0,0 +1,2240 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + MMAFragmentOp - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.MMAFragmentOp

+ + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.MMAFragmentOp.NONENo-op which does not modify the fragment data, i.e. f(x) = x.
accera.MMAFragmentOp.ReLURectified linear unit activation function (details), i.e. f(x) = max(0, x).
accera.MMAFragmentOp.ReLU_NoConditionalRectified linear unit activation function which does not generate divergent code, i.e. f(x) = x * bool(x > 0).
accera.MMAFragmentOp.CLEARSets the data to constant 0, i.e. f(x) = 0.
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/enumerations/MMASchedulingPolicy/index.html b/Reference/enumerations/MMASchedulingPolicy/index.html new file mode 100644 index 00000000..dc1dbba5 --- /dev/null +++ b/Reference/enumerations/MMASchedulingPolicy/index.html @@ -0,0 +1,2232 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + MMASchedulingPolicy - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.MMASchedulingPolicy

+ + + + + + + + + + + + + + + + + +
typedescription
accera.MMASchedulingPolicy.PASS_ORDERProcess pass groups (fused passes) sequentially; within each pass group, compute all the MFMA blocks. This allocates the accumulator registers required for all the blocks, but only allocates the input (A, B) registers required for the current pass group.
accera.MMASchedulingPolicy.BLOCK_ORDERProcess MFMA blocks sequentially; for each block, iterate over all the passes. This allocates the accumulator registers required for only one block and the input (A, B) registers required for the entire pass group currently being processed. In this mode, input data for the same pass group is loaded into registers multiple times, once per block.
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/enumerations/MMAShape/index.html b/Reference/enumerations/MMAShape/index.html new file mode 100644 index 00000000..3913aaef --- /dev/null +++ b/Reference/enumerations/MMAShape/index.html @@ -0,0 +1,2406 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + MMAShape - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.MMAShape

+

The following tables show the matrix multiplication parameters associated with the different enum values, for different data types, for a single pass. For example, a single pass of the M32xN32xK2_B1 operation takes input matrices of dimensions [32x2] (A) and [2x32] (B) and produces a matrix multiplication result of dimensions [32x32] (C). These operations can then be composed to perform matrix multiplication of larger matrices.

+

More information about the corresponding Matrix Arithmetic Instructions (MAI) can be found here.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Supported MMA shapes and their compatible types for AMD targets
accera.MMAShapeMFMA InstructionM, N, KInput Type (ScalarType)Output Type (ScalarType)Compute Type (C++)
M64xN64xK1_B4V_MFMA_F32_16x16x1F3264, 64, 1float32float32float
M64xN64xK1_B2V_MFMA_F32_32x32x1F32
M32xN32xK2_B1V_MFMA_F32_32x32x2F3232, 32, 2
M16xN16xK4_B1V_MFMA_F32_16x16x4F3216, 16, 4
M64xN64xK2_B4V_MFMA_F32_16X16X2BF1664, 64, 2bfloat16bfloat16/float32
M64xN64xK2_B2V_MFMA_F32_32X32X2BF16bfloat16/float32
M32xN32xK4_B1V_MFMA_F32_32X32X4BF1632, 32, 4bfloat16/float32
M16xN16xK8_B1V_MFMA_F32_16X16X8BF1616, 16, 8bfloat16/float32
M64xN64xK4_B4V_MFMA_F32_16x16x4F1664, 64, 4float16float16/32
V_MFMA_I32_16X16X4I8int8int8/16/32int
M64xN64xK4_B2V_MFMA_F32_32x32x4F16float16float16/32float
V_MFMA_I32_32X32X4I8int8int8/16/32int
M32xN32xK8_B1V_MFMA_F32_32x32x8F1632, 32, 8float16float16/32float
V_MFMA_I32_32X32X8I8int8int8/16/32int
M16xN16xK16_B1V_MFMA_F32_16x16x16F1616, 16, 16float16float16/32float
V_MFMA_I32_16X16X16I8int8int8/16/32int
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Supported MMA shapes and their compatible types for Nvidia targets
accera.MMAShapeM, N, KInput Type (ScalarType)Output Type (ScalarType)Compute Type (C++)
M16xN16xK8_B116, 16, 8float32float32tf32*
M16xN16xK16_B116, 16, 16float16float16/32float
bfloat16float32
u/int8int32int
M32xN8xK16_B132, 8, 16float16float16/32float
bfloat16float32
u/int8int32int
M8xN32xK16_B18, 32, 16float16float16/32float
bfloat16float32
u/int8int32int
+ +
+ +

*TensorFloat-32 is a floating-point type introduced in the Nvidia Ampere architecture for accelerating FP32 performance. Information about this can be found here and in more detail in the architecture whitepaper. In this mode, multiplication is performed in TF32 precision and accumulation happens in FP32 precision.

+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/enumerations/Role/index.html b/Reference/enumerations/Role/index.html new file mode 100644 index 00000000..7c3ee67d --- /dev/null +++ b/Reference/enumerations/Role/index.html @@ -0,0 +1,2244 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Role - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.Role

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.Role.CONSTA constant array (immutable internal-scope) whose contents are known at compile-time.
accera.Role.INPUTAn input array (immutable external-scope).
accera.Role.INPUT_OUTPUTAn input/output array (mutable external-scope).
accera.Role.OUTPUTAn output array (mutable external-scope) which is allocated at runtime.
accera.Role.TEMPA temporary array (mutable internal-scope).
+
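
For orientation, these roles appear as the role argument when declaring arrays. The sketch below shows the most common combinations used elsewhere in this reference; the shapes are placeholders, and the OUTPUT array uses runtime dimensions created with create_dimensions as described on the corresponding reference page.

import accera as acc
+
+# Immutable input and mutable input/output arrays (external scope)
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(16, 16))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))
+
+# An output array allocated at runtime, with mutable runtime dimensions
+M, N = acc.create_dimensions(role=acc.Role.OUTPUT)
+OUT = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+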
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/enumerations/ScalarType/index.html b/Reference/enumerations/ScalarType/index.html new file mode 100644 index 00000000..b83e2e3d --- /dev/null +++ b/Reference/enumerations/ScalarType/index.html @@ -0,0 +1,2276 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + ScalarType - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.ScalarType

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
typedescription
accera.ScalarType.boolboolean
accera.ScalarType.float1616-bit floating point number
accera.ScalarType.float3232-bit floating point number
accera.ScalarType.float6464-bit floating point number
accera.ScalarType.bfloat1616-bit Brain floating point number
accera.ScalarType.int88-bit signed integer
accera.ScalarType.int1616-bit signed integer
accera.ScalarType.int3232-bit signed integer
accera.ScalarType.int6464-bit signed integer
accera.ScalarType.uint88-bit unsigned integer
accera.ScalarType.uint1616-bit unsigned integer
accera.ScalarType.uint3232-bit unsigned integer
accera.ScalarType.uint6464-bit unsigned integer
+
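
As a quick illustration, these values are passed wherever an element_type is expected; the element types and shapes below are arbitrary choices.

import accera as acc
+
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(32, 32))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.int8, shape=(32, 32))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.int32, shape=(32, 32))
+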
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/functions/cast/index.html b/Reference/functions/cast/index.html new file mode 100644 index 00000000..7c8c1495 --- /dev/null +++ b/Reference/functions/cast/index.html @@ -0,0 +1,2298 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Cast - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.cast(value, type)

+

The cast operation converts a value from one acc.ScalarType to another.

+

Accera performs implicit casting between most types. Therefore, this operation should only be used to override the implicit casting behavior documented in Section 2.

+

Limitation: casting constants may result in truncation.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
valueThe value to cast
typeThe destination typeacc.ScalarType
+

Returns

+

The result after casting

+

Examples

+

Casting from float32 to int16:

+
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20))
+B = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.int16, shape=(10, 20))
+
+nest = acc.Nest(10, 20)
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    B[i, j] = acc.cast(A[i, j], acc.ScalarType.int16) # explicit cast to int16
+    ...
+
+

In comparison, casting from int16 to float32 is implicit, which means the cast operation can be omitted:

+
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.int16, shape=(10, 20))
+B = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(10, 20))
+
+nest = acc.Nest(10, 20)
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    B[i, j] = A[i, j] # implicit cast to float32
+    ...
+
+

Casting a constant to int8:

+
A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.int8, shape=(10, 20))
+
+nest = acc.Nest(10, 20)
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    A[i, j] = acc.cast(10, acc.ScalarType.int8)
+    ...
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/functions/create_dimensions/index.html b/Reference/functions/create_dimensions/index.html new file mode 100644 index 00000000..c53e1499 --- /dev/null +++ b/Reference/functions/create_dimensions/index.html @@ -0,0 +1,2307 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Create dimensions - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.create_dimensions([role])

+

Creates placeholder dimensions of the specified role. These typically represent runtime Array and Nest dimensions.

+

There are two roles for runtime dimensions:

+
    +
  • accera.Role.INPUT - immutable dimension that is provided by an input parameter to an Accera function
  • +
  • accera.Role.OUTPUT - mutable dimension that is set within an Accera function
  • +
+

A third type of dimension, the compile-time dimension, is not covered here because it is just a constant.

+

Arguments

+ + + + + + + + + + + + + + + +
argumentdescriptiontype/default
roleThe role of the dimension determines if it is mutable or immutable.accera.Role. default: accera.Role.INPUT. Must be set to accera.Role.OUTPUT if intended for an accera.Role.OUTPUT Array.
+

Returns

+

Tuple of Dimension

+

Examples

+

Construct an input array with runtime input dimensions: +

import accera as acc
+M, K = acc.create_dimensions()
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
+

+

Construct an input/output array using a combination of runtime and compile-time dimensions, respectively: +

M = acc.create_dimensions()
+A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, 20))
+

+

Adding a function for an input/output array with runtime input dimensions: +

M, N = acc.create_dimensions()
+A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+
+nest = acc.Nest(M, N)
+...
+
+package = acc.Package()
+package.add(nest, args=(A, M, N), base_name="myFunc")
+

+

Construct an output array with runtime (mutable) output dimensions: +

M, N = acc.create_dimensions(role=acc.Role.OUTPUT)
+A = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+

+

Assign the value of a runtime input dimension to a runtime output dimension: +

M = acc.create_dimensions()
+N = acc.create_dimensions(role=acc.Role.OUTPUT)
+
+N.value = M
+

+

Assign an integer value to a runtime output dimension: +

N = acc.create_dimensions(role=acc.Role.OUTPUT)
+N.value = 100
+

+

Assign a value to a runtime output dimension using an expression of runtime input dimensions: +

M, K = acc.create_dimensions()
+N = acc.create_dimensions(role=acc.Role.OUTPUT)
+
+N.value = M + K + 1
+

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/functions/create_parameter_grid/index.html b/Reference/functions/create_parameter_grid/index.html new file mode 100644 index 00000000..0b5df493 --- /dev/null +++ b/Reference/functions/create_parameter_grid/index.html @@ -0,0 +1,2281 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Create parameter grid - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.create_parameter_grid(parameter_choices, [filter_func, sample, seed])

+

Create a parameter grid from a dictionary that maps each parameter to its possible values.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
parameter_choicesA dictionary that maps each parameter to its possible valuesdictionary
filter_funcA callable to filter parameter_choices which returns a bool to indicate whether a given parameter combination should be included in the gridCallable
sampleA number to limit the size of the parameter grid. The grid is randomly sampled.integer
seedThe seed value for random sampling.integer
+

Returns

+

List of dictionary

+

Examples

+

Create a parameter grid from a dictionary that maps each parameter to its possible values:

+
parameters = acc.create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]})
+package.add(nest, args=(A, B, C), base_name="matmul", parameters=parameters)
+
+

Define a lambda or function to filter out combinations from the parameter grid. The arguments to the filter are the values of a parameter combination. The filter function should return True if the combination should be included, and False otherwise:

+
parameters = acc.create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, filter_func=lambda p0, p1, p2, p3: p2 < p1 and 4 * (p0 * p3 + p1 * p2 + p1 * p3 + p2 * p3) / 1024 < 256)
+
+

Parameter grids can result in a large number of possible combinations. We can limit the number of combinations by random sampling:

+
parameters = acc.create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, sample=5)
+
+
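
When sampling, passing a seed makes the sampled grid reproducible across runs; the seed value below is arbitrary.

parameters = acc.create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, sample=5, seed=2023)
+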
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/functions/create_parameters/index.html b/Reference/functions/create_parameters/index.html new file mode 100644 index 00000000..73900931 --- /dev/null +++ b/Reference/functions/create_parameters/index.html @@ -0,0 +1,2236 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Create parameters - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.create_parameters()

+

Creates placeholder parameters.

+

Returns

+

Tuple of Parameter

+

Examples

+

Create 3 parameters m, n, k. Use them to parameterize the nest shape:

+
m, n, k = acc.create_parameters()
+nest = acc.Nest(shape=(m, n, k))
+
+
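
A common follow-up is to bind concrete values to the parameters when adding a function to a package, as in the parameter grid reference. The sketch below is self-contained; the chosen values (16, 16, 16) and the base name are arbitrary, and it assumes that the parameters argument of Package.add accepts a parameter-to-value mapping.

import accera as acc
+
+m, n, k = acc.create_parameters()
+
+A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(m, k))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(k, n))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(m, n))
+
+nest = acc.Nest(shape=(m, n, k))
+i, j, kk = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    C[i, j] += A[i, kk] * B[kk, j]
+
+package = acc.Package()
+# Bind concrete values for the parameters when adding the function
+package.add(nest, args=(A, B, C), parameters={m: 16, n: 16, k: 16}, base_name="matmul_16_16_16")
+package.build(name="parameterized_matmul")
+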
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/functions/fuse/index.html b/Reference/functions/fuse/index.html new file mode 100644 index 00000000..72403449 --- /dev/null +++ b/Reference/functions/fuse/index.html @@ -0,0 +1,2275 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Fuse - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

accera.fuse(schedules[, *args, partial])

+

The fuse operation combines multiple iteration spaces into a single "fused" iteration space. The fused iteration space represents the union of the work in the original spaces.

+

In cases where it doesn't make sense to fuse all of the iteration space dimensions, we can choose to fuse a prefix of the dimensions and leave the rest unfused.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + +
argumentdescriptiontype/default
schedulesIf performing partial fusing, this is a tuple of the schedules to fuse. If performing full fusing, this contains the first schedule to fuse, while args will contain the subsequent schedules.
*argsOptional variable arguments containing subsequent schedules to fusevariable Schedule arguments
partialThe number of dimensions to fuse. If not specified, all dimensions will be fusednon-negative integer
+

Returns

+

Instance of FusedSchedule

+

Examples

+

Full fusing of same-shaped iteration spaces:

+
# Fuse all dimensions of schedule0 and schedule1
+schedule = acc.fuse(schedule0, schedule1)
+
+

Partial iteration space fusing:

+
# Fuse the first two dimensions of schedule0 and schedule1
+schedule = acc.fuse((schedule0, schedule1), partial=2)
+
+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Reference/safety_analysis/index.html b/Reference/safety_analysis/index.html new file mode 100644 index 00000000..1089a7cc --- /dev/null +++ b/Reference/safety_analysis/index.html @@ -0,0 +1,2218 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Safety analysis - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Accera v1.2 Reference

+

Safety Analysis

+

One of the most important features of Accera is that it provides safety guarantees: the underlying logic is preserved no matter how we transform the schedule. Not all Accera schedules are safe, but those that are safe are much easier to work with.

+

Order-invariant Schedules

+

Order-invariant schedules are always safe because Accera transformations never remove any iterations. They only change the order of the loop-nest iterations, or add empty iterations in the form of padding when necessary. Recall that a Nest represents a simple nest. A simple nest is assumed to be order-invariant, and therefore any schedule created by a call to create_schedule() is safe.
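
To make this concrete, here is a minimal sketch (the shape and logic are placeholders): a schedule obtained directly from a simple nest remains logically equivalent under transformations such as splitting and reordering.

import accera as acc
+
+A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(64, 64))
+
+nest = acc.Nest(shape=(64, 64))
+i, j = nest.get_indices()
+
+@nest.iteration_logic
+def _():
+    A[i, j] += 1.0
+
+schedule = nest.create_schedule()  # order-invariant, hence safe
+ii = schedule.split(i, 8)          # these transformations preserve the logic
+schedule.reorder(j, i, ii)
+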

+

Safety and Fusing

+

Fusing is another way to create a schedule (see Section 4 of the Accera manual). Say that we have a sequence of n schedules: schedule0, schedule1, ... and we partially fuse their first m dimensions. Namely: +

schedule = acc.fuse((schedule0, schedule1, ...), partial=m)
+
+At this point, schedule is equivalent to sequentially executing the individual schedules for each iteration of the fused dimensions. However, is the fused schedule safe? In other words, does schedule guarantee the preservation of underlying logic, regardless of the applied transformation?

+

The dimensions of schedule fall into three categories:

+
    +
  • Fused dimensions: at first, this category contains the next m dimensions of schedule. If any of these dimensions are split, the derived dimensions are also added to this category.
  • +
  • Fusing dimensions: at first, this category contains a single dimension, the first dimension of schedule. However, if this dimension is split, its derived dimensions are added to this category.
  • +
  • Unfused dimensions: all the remaining dimensions.
  • +
+

Note that the individual schedules being fused may have been created by previous fusing operations. The categories above relate to the role of each dimension in the current fusing operation.

+

Theorem

+

Imagine that we apply a sequence of transformations to schedule, which may derive new dimensions. Derived dimensions belong to the same category as the dimension from which they were derived. Suppose the fusing dimension (and its derived dimensions) precedes all the unfused dimensions. In that case, for any value of the fused dimensions, all the corresponding work from schedule0 is executed before any of the corresponding work from schedule1. Similarly, all the corresponding work from schedule1 is executed before any of the corresponding work from schedule2; and so on.

+

Proof

+

For simplicity, assume that there is only one fusing dimension, f. Also, assume that we've only fused two schedules, schedule0 and schedule1. Note that these simplifying assumptions can easily be relaxed.

+

Assume that f precedes all of the unfused dimensions. Therefore, dimensions that precede f are necessarily fused dimensions. Let U be a sequence of concrete values for all the fused dimensions, and let V denote only those values that correspond to dimensions that precede f. The work from schedule0 that corresponds to the concrete values in U is contained in the slice (V, 0, *, ..., *). Similarly, the work from schedule1 that corresponds to the values in U is contained in (V, 1, *, ..., *). Finally, note that the former slice lexicographically precedes the latter, concluding the proof.

+

An example

+

To make the theorem less abstract, we demonstrate how it applies to a simple example. Assume that we start with two three-dimensional schedules, schedule0 and schedule1, and we fuse their first two dimensions: +

i0, j0, k0 = schedule0.get_indices() # redundant operation, included for clarity
+i1, j1, k1 = schedule1.get_indices() # redundant operation, included for clarity
+schedule = acc.fuse((schedule0, schedule1), partial=2)
+i, j, f, k0, k1 = schedule.get_indices()
+
+Next, say that we transform schedule by tiling dimensions j and k0 to reorder the dimensions as follows: +
jj, kk0 = schedule.tile({
+    j: 4,
+    k0: 4
+})
+schedule.reorder(j, i, f, k0, k1, kk0, jj)
+
+Dimensions i, j, and jj are fused dimensions, while k0, kk0, and k1 are unfused dimensions. Note that the fusing dimension f precedes all of the unfused dimensions, satisfying the theorem's condition. Next, choose concrete values for the fused dimensions, say, i=4, j=3, and jj=2. The work from schedule0 that corresponds to these values is contained in the slice (3, 4, 0, *, *, *, *). Similarly, the work from schedule1 that corresponds to these values is contained in the slice (3, 4, 1, *, *, *, *). The former slice lexicographically precedes the latter and is therefore executed first.

+

Safety

+

The theorem holds for any schedule, but it does not imply that every schedule is safe. Additional effort is required to prove whether a specific schedule is safe. When performing a fuse operation, we must examine the specific circumstances and consider whether the theorem provides a sufficient condition for safety.

+
+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Tutorials/Hello_MatMul/index.html b/Tutorials/Hello_MatMul/index.html new file mode 100644 index 00000000..0d7c9585 --- /dev/null +++ b/Tutorials/Hello_MatMul/index.html @@ -0,0 +1,2484 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Hello MatMul - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Hello MatMul

+ +

Hello MatMul

+

By the end of this tutorial, you will learn how to:

+
    +
  • Implement a simple Matrix Multiplication (MatMul) function using Accera's Domain Specific Language (DSL)
  • +
  • Produce a HAT package containing the MatMul function
  • +
  • Call the function from C or C++ code
  • +
+

Prerequisites

+
    +
  • This tutorial assumes you already have Accera installed. If not, you can find the installation instructions here
  • +
  • You should also be familiar with writing Python and C++
  • +
+

A naive MatMul algorithm

+

Let's consider the example of multiplying matrices A and B, and adding the result into matrix C. In NumPy syntax, this can be expressed as:

+
C += A @ B
+
+

A naive algorithm for matrix multiplication typically contains 3 nested for loops. Expressed in Python, this could look like:

+
# A.shape = (M, K), B.shape = (K, N), C.shape = (M, N)
+
+for i in range(M):
+    for j in range(N):
+        for k in range(K):
+            C[i, j] += A[i, k] * B[k, j]
+
+

Accera Python DSL

+

We will now walk through a naive Matrix Multiplication (MatMul) using Accera.

+

Create an empty file called hello_matmul_generator.py. First we'll import Accera's module.

+
import accera as acc
+
+

Define some matrix sizes. A will be M by K, B will be K by N, and C will be M by N.

+
# Define our matrix sizes
+M = 128
+N = 256
+K = 256
+
+

Declare arrays A, B, and C. These are our input and input/output matrices.

+
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+
+

Here, we will use the Nest class to define our 3-layered nested for loop. The loop ranges are M, N, and K, with the outermost loop (M) listed first. We then get the loop nest indices so that we can express the computation.

+
# Define the loop nest
+nest = acc.Nest(shape=(M, N, K))
+
+# Get the loop nest indices
+i, j, k = nest.get_indices()
+
+

Next we define the logic of each iteration of the loop nest: +

# Define the loop nest logic
+@nest.iteration_logic
+def _():
+    C[i, j] += A[i, k] * B[k, j]
+

+

We have finished defining the logic of MatMul. Now let's define the schedule, which controls how the logic is executed. To do this, we first create the schedule from the nest:

+
sched = nest.create_schedule()
+
+

At this point, sched represents the default schedule for our algorithm. We can now perform some basic transformations on this schedule. For example, the following line of code will split the k index into blocks of 4 (so k, k+4, k+8, and so on).

+
# Split the k loop into blocks of 4, effectively doing this
+# (assuming K is divisible by 4):
+#
+# for i in range(M):
+#    for j in range(N):
+#        # Split k into two loops
+#        for k in range(0, K, 4):
+#            for kk in range(4):
+#                C[i, j] += A[i, k + kk] * B[k + kk, j]
+#
+# If k is not divisible by 4, Accera will take care of the boundary
+# case for you.
+kk = sched.split(k, 4)
+
+

The split indices are now k and kk.

+

The next step is to create a plan from the schedule. For instance, we can use this plan to unroll the innermost loop.

+
plan = sched.create_plan()
+# Unroll kk, effectively doing this
+# (assuming K is divisible by 4):
+#
+# for i in range(M):
+#    for j in range(N):
+#        for k in range(0, K, 4):
+#            # Unrolled kk
+#            C[i, j] += A[i, k + 0] * B[k + 0, j]
+#            C[i, j] += A[i, k + 1] * B[k + 1, j]
+#            C[i, j] += A[i, k + 2] * B[k + 2, j]
+#            C[i, j] += A[i, k + 3] * B[k + 3, j]
+#
+# If k is not divisible by 4, Accera will take care of the boundary
+# case for you.
+plan.unroll(kk)
+
+

Use the plan to add a callable function named hello_matmul_py to a HAT package.

+
# Create a package and add a function to the package based on the plan
+package = acc.Package()
+package.add(plan, args=(A, B, C), base_name="hello_matmul_py")
+
+

Finally, we build the HAT package: +

# Build the HAT package
+package.build(name="hello_matmul")
+

+

By now, you should have all the code necessary to generate your first Accera MatMul function. You can also find the complete Python script here.

+

Generate HAT package

+

Next, we run the generator script to produce a HAT package.

+
Windows/MacOS
+
python hello_matmul_generator.py
+
+
Ubuntu
+
python3 hello_matmul_generator.py
+
+

After this runs, you should see a header file hello_matmul.hat and some object files (such as hello_matmul.obj or hello_matmul.o). The .hat file format is described here. In Accera, we call these files the "HAT package".

+

Runner code

+

We will now walk through how to call our MatMul implementation from the HAT package.

+

Create a file called hello_matmul_runner.cpp with the code below. You can also find it here.

+
#include <stdio.h>
+#include <algorithm>
+
+// Include the HAT file that declares our MatMul function
+#include "hello_matmul.hat"
+
+#define M 128
+#define N 256
+#define K 256
+
+int main(int argc, const char** argv)
+{
+    // Prepare our matrices
+    float A[M*K];
+    float B[K*N];
+    float C[M*N];
+
+    // Fill with data
+    std::fill_n(A, M*K, 2.0f);
+    std::fill_n(B, K*N, 3.0f);
+    std::fill_n(C, M*N, 0.42f);
+
+    printf("Calling MatMul M=%d, K=%d, N=%d\n", M, K, N);
+    hello_matmul_py(A, B, C);
+
+    printf("Result (first few elements): ");
+    for (int i = 0; i < 10; ++i)
+    {
+        printf("%f ", C[i]);
+    }
+    printf("\n");
+    return 0;
+}
+
+

The code above creates the A, B, and C matrices, and calls the function hello_matmul_py to perform MatMul.

+

Now that we have written the code, we will compile and link it with the HAT package to create an executable. Save the file to your working directory, in the same location as hello_matmul_generator.py and the generated *.hat and object files.

+

Build and run

+
Windows
+

We will need the 64-bit Visual C++ tools to link against the generated 64-bit .obj. From an "x64 Native Tools Command Prompt":

+
cl.exe hello_matmul_runner.cpp *.lib
+hello_matmul_runner.exe
+
+
MacOS
+
clang hello_matmul_runner.cpp *.a -o hello_matmul_runner
+./hello_matmul_runner
+
+
Ubuntu
+
gcc hello_matmul_runner.cpp *.a -o hello_matmul_runner
+./hello_matmul_runner
+
+

The output should look like:

+
Calling MatMul M=128, K=256, N=256
+Result (first few elements): 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922
+
+

You can now experiment with the generated MatMul function with your own inputs.

+

Optimized MatMul algorithm

+

The above example illustrates a naive algorithm. To see what a more optimized version could look like, see the Optimized MatMul tutorial.

+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Tutorials/Hello_MatMul_GPU/index.html b/Tutorials/Hello_MatMul_GPU/index.html new file mode 100644 index 00000000..cc7e67a8 --- /dev/null +++ b/Tutorials/Hello_MatMul_GPU/index.html @@ -0,0 +1,2514 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Hello MatMul GPU - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Hello MatMul GPU

+ +

Hello MatMul GPU

+

In this tutorial, you will learn how to implement a simple Matrix Multiplication (MatMul) function for execution on a GPU. We will use the Accera's Domain Specific Language (DSL) to produce a HAT package containing the MatMul function that can be called from the host to launch the MatMul function on the GPU.

+

Prerequisites

+
    +
  • You should have Accera installed. If not, you can find the installation instructions here.
  • +
  • Be familiar with writing Python and C++ code.
  • +
  • Be familiar with basic GPU programming and concepts.
  • +
  • You have completed the Hello_MatMul tutorial.
  • +
  • You have installed the Vulkan SDK and runtime.
  • +
+

Review: the naive MatMul algorithm

+

As in the Hello_MatMul tutorial, we'll consider the example of multiplying matrices A and B and adding the result into matrix C. In NumPy syntax, this can be expressed as:

+
C += A @ B
+
+

A naive algorithm for matrix multiplication typically contains 3 nested for-loops. Expressed in Python, this can look like:

+
for i in range(M):
+    for j in range(N):
+        for k in range(K):
+            C[i, j] += A[i, k] * B[k, j]
+
+

Accera Python DSL

+

We will now walk through a basic Matrix Multiplication (MatMul) using Accera. Additionally, we will direct Accera to execute this MatMul function on the default GPU.

+

Create an empty file called hello_matmul_gpu_generator.py. Import dependent modules:

+
import accera as acc
+
+

Define some matrix sizes, where A's shape is M by K, B's is K by N, and C's, M by N.

+
# Define our matrix sizes
+M = 1024
+N = 512
+K = 256
+
+

Declare arrays A, B, and C. These are our input and input/output matrices and hold 32-bit floating-point elements.

+
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+
+

Use the Nest class to define our 3-layered nested for-loop and get the indices: +

# Define the loop nest
+nest = acc.Nest(shape=(M, N, K))
+
+# Get the loop nest indices
+i, j, k = nest.get_indices()
+

+

Next, we define the logic for every iteration of the loop nest: +

# Define the loop nest logic
+@nest.iteration_logic
+def _():
+    C[i, j] += A[i, k] * B[k, j]
+

+

We have finished defining the logic of MatMul. Notice how, up to this point, it is identical to what we did for the CPU example. Let's now define the schedule to control the execution logic. To do this, we first create the schedule from the nest:

+
schedule = nest.create_schedule()
+
+

We will transform the iteration space and change the plan according to some predefined constants to execute this efficiently on our chosen hardware target. The values of these constants can come from the hardware target's characteristics and the shapes of the arrays, or they can be found through auto-tuning. These will be explained in detail in a subsequent tutorial. For now, define:

+
block_x = 16
+block_y = 16
+
+

Transform the iteration space to specify the thread block behavior, i.e., how the work is partitioned into GPU thread blocks: +

ii = schedule.split(i, block_x)
+jj = schedule.split(j, block_y)
+

+

Set the order to traverse the iteration space. Note that the precise order of execution on GPU targets will be unknown due to the parallel nature of the hardware. Nevertheless, setting the order here is important since the coarse grain parallelization (e.g., grid) should precede the more fine-grained (e.g., warps/wavefronts): +

schedule.reorder(i, j, ii, jj, k)
+

+

Create a plan from the schedule. The plan allows us to control specific execution behavior on the hardware target. Settings such as grid launch dimensions and thread block sizes are essential for high performance: +

target = acc.Target(category=acc.Target.Category.GPU, runtime=acc.Target.Runtime.VULKAN)
+plan = schedule.create_plan(target)
+

+

Bind dimensions of the schedule to execution units on the GPU. Use the outer dimensions i, j to be the block indices x,y in the grid, and the ii and jj dimensions to be the thread indices x,y in the block: +

plan.bind({
+    i: target.GridUnit.BLOCK_X,
+    j: target.GridUnit.BLOCK_Y,
+    ii: target.GridUnit.THREAD_X,
+    jj: target.GridUnit.THREAD_Y
+})
+

+

Use the plan to add a callable function named hello_matmul_gpu to a HAT package.

+
# Create a package and add a function to the package based on the plan
+package = acc.Package()
+package.add(plan, args=(A, B, C), base_name="hello_matmul_gpu")
+
+

Finally, we build the HAT package: +

# Build a statically-linked HAT package to be consumed by the C++ runner
+package.build(name="hello_matmul_gpu", format=acc.Package.Format.HAT_STATIC)
+

+

By now, you have all the code necessary to generate an Accera MatMul function that runs on the GPU. You can find the complete Python script here.

+

Generate HAT package

+

Next, we run the generator script to produce a HAT package.

+
Windows/MacOS
+
python hello_matmul_gpu_generator.py
+
+
Ubuntu
+
python3 hello_matmul_gpu_generator.py
+
+

After this script runs, you should see a header file hello_matmul_gpu.hat and some object files (such as hello_matmul_gpu.obj or hello_matmul_gpu.o). The build process also generates a supporting module, AcceraGPUUtilities.hat and its object file for GPU initialization and uninitialization. In Accera, we call these files the "HAT package".

+

Runner code

+

Let's see how we can call our MatMul implementation from the HAT package.

+

Create a file called hello_matmul_gpu_runner.cpp containing the code below. You can find it here.

+
#include <stdio.h>
+#include <algorithm>
+
+// Include the HAT file that declares GPU initialization/uninitialization functions
+#include "AcceraGPUUtilities.hat"
+
+// Include the HAT file that declares our MatMul function
+#include "hello_matmul_gpu.hat"
+
+#define M 1024
+#define N 512
+#define K 256
+
+int main(int argc, const char** argv)
+{
+    // Prepare our matrices (using the heap for large matrices)
+    float* A = new float[M*K];
+    float* B = new float[K*N];
+    float* C = new float[M*N];
+
+    // Fill with data
+    std::fill_n(A, M*K, 2.0f);
+    std::fill_n(B, K*N, 3.0f);
+    std::fill_n(C, M*N, 0.42f);
+
+    // Initialize the GPU
+    AcceraGPUInitialize();
+
+    printf("Calling MatMul M=%d, K=%d, N=%d\n", M, K, N);
+    hello_matmul_gpu(A, B, C);
+
+    printf("Result (first 10 elements): ");
+    for (int i = 0; i < 10; ++i)
+    {
+        printf("%f ", C[i]);
+    }
+    printf("\n");
+
+    // Uninitialize the GPU
+    AcceraGPUDeInitialize();
+
+    delete[] A;
+    delete[] B;
+    delete[] C;
+    return 0;
+}
+
+

The above code creates the A, B, and C matrices and calls the function hello_matmul_gpu to perform MatMul.

+

Now that we have the code, compile and link it with the HAT package to create an executable. Save the file to your working directory, in the same location as hello_matmul_gpu_generator.py and the generated *.hat and object files.

+

Build and run

+

Accera includes a shared library that wraps the Vulkan APIs (acc-vulkan-runtime-wrappers.so, acc-vulkan-runtime-wrappers.dll, or acc-vulkan-runtime-wrappers.dylib). We need to provide the path to this shared library when building and running the executable.

+

Find the installed path to the "accera" package:

+
Windows/MacOS
+
pip show accera
+
+
Ubuntu
+
pip3 show accera
+
+

From the output above, find the Location entry, for example: +

Location: /usr/local/lib/python3.8/dist-packages
+

+

We will use this path below.
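Alternatively, you can print this path directly from Python, assuming the accera package is importable in the same environment; the printed directory is the <Location_path>/accera value used for ACCERA_PATH below:

# Prints the install directory of the accera package (the ACCERA_PATH value below)
import os
import accera
print(os.path.dirname(accera.__file__))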

+
Windows
+

We will need the 64-bit Visual C++ tools to link against the generated 64-bit .obj. From an "x64 Native Tools Command Prompt":

+

Set the ACCERA_PATH environment variable to the full install path of the "accera" package (derived from pip show accera to locate acc-vulkan-runtime-wrappers.dll):

+
set ACCERA_PATH=<Location_path>\accera
+
+

Set the PATH environment variable to allow the runner to locate acc-vulkan-runtime-wrappers.dll:

+
set PATH=%PATH%;%ACCERA_PATH%
+
+

Now build and run:

+
cl.exe hello_matmul_gpu_runner.cpp *.lib %ACCERA_PATH%/*.lib
+hello_matmul_gpu_runner.exe
+
+
MacOS
+

Set the ACCERA_PATH environment variable to the full install path of the "accera" package (derived from pip show accera to locate acc-vulkan-runtime-wrappers.dylib):

+
export ACCERA_PATH=<Location_path>/accera
+
+

Now build and run:

+
clang++ hello_matmul_gpu_runner.cpp *.a $ACCERA_PATH/*.dylib -o hello_matmul_gpu_runner
+DYLD_LIBRARY_PATH=$ACCERA_PATH ./hello_matmul_gpu_runner
+
+
Ubuntu
+

Set the ACCERA_PATH environment variable to the full install path of the "accera" package (derived from pip3 show accera to locate acc-vulkan-runtime-wrappers.so):

+
export ACCERA_PATH=<Location_path>/accera
+
+

Now build and run:

+
g++ hello_matmul_gpu_runner.cpp *.a $ACCERA_PATH/*.so -o hello_matmul_gpu_runner
+LD_LIBRARY_PATH=$ACCERA_PATH ./hello_matmul_gpu_runner
+
+

The output should look like this:

+
Calling MatMul M=1024, K=256, N=512
+Result (first 10 elements): 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922
+
+

You can now experiment with the generated MatMul function with your own inputs.
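As a quick sanity check of the numbers above: every element of C starts at 0.42 and accumulates K products of 2.0 * 3.0, so the expected value is 0.42 + 6 * 256 = 1536.42, which matches the printed 1536.419922 up to float32 rounding. In plain Python:

# Expected value of every element of C for the inputs used by the runner
K = 256
expected = 0.42 + 2.0 * 3.0 * K
print(expected)  # 1536.42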

+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Tutorials/Optimized_MatMul/index.html b/Tutorials/Optimized_MatMul/index.html new file mode 100644 index 00000000..cfaa9cd6 --- /dev/null +++ b/Tutorials/Optimized_MatMul/index.html @@ -0,0 +1,2505 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Optimized MatMul - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Optimized MatMul

+ +

Optimized MatMul

+

Optimizing MatMul is highly dependent on the target platform. The code in the example below is optimized specifically for an Intel Xeon E5-2673 v3 CPU. However, it should work equally well on CPUs with similar hardware characteristics, such as an AMD Epyc 7551.

+

By the end of this tutorial, you will learn how to: +* Implement a performant Matrix Multiplication (MatMul) function targeting AVX2 FMA3 CPUs such as Intel Haswell or the AMD Epyc families. +* Produce a HAT package containing the optimized MatMul function. +* Call the function from C or C++ code.

+

Prerequisites

+
    +
  • You should have Accera installed. If not, you can find the instructions here.
  • +
  • You are familiar with writing Python and C++.
  • +
  • You know about SIMD instructions and registers.
  • +
  • You have completed the Hello_MatMul tutorial.
  • +
+

Review: the naive MatMul algorithm

+

As in the Hello_MatMul tutorial, we'll consider the example of multiplying matrices A and B and adding the result into matrix C. In NumPy syntax, this can be expressed as:

+
C += A @ B
+
+

A naive algorithm for matrix multiplication typically contains 3 nested for-loops. Expressed in Python, this will look like:

+
for i in range(M):
+    for j in range(N):
+        for k in range(K):
+            C[i, j] += A[i, k] * B[k, j]
+
+

Accera Python DSL

+

We will walk through how to specify an optimized Matrix Multiplication (MatMul) using Accera. This tutorial assumes the following:

+
    +
  • Specific matrix sizes. Inputs A and B are 784 x 128 and 128 x 512 matrices, respectively. The output C is a 784 x 512 matrix. These can represent a mid-level layer in a Resnet-50 model. The A matrix contains the activation values from the previous layer, and the B matrix contains the weights of the neural network layer.
  • +
  • Row-major layout of the matrix elements.
  • +
  • The target hardware is capable of AVX2 FMA3 instructions, such as the Intel Xeon E5-2673 v3 or the AMD Epyc 7551.
  • +
+

Create an empty file called optimized_matmul_generator.py. Import dependent modules: +

import accera as acc
+

+

Define some matrix sizes, where A's shape is M by K, B's is K by N, and C's, M by N. +

# Define our matrix sizes
+M = 784
+N = 512
+K = 128
+

+

Declare arrays A, B, and C. These are our input and input/output matrices and hold 32-bit floating-point elements. +

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+

+

Use the Nest class to define a 3-layered nested for-loop and get the indices: +

# Define the loop nest
+nest = acc.Nest(shape=(M, N, K))
+
+# Get the loop nest indices
+i, j, k = nest.get_indices()
+

+

Next, we define the logic for every iteration of the loop nest: +

# Define the loop nest logic
+@nest.iteration_logic
+def _():
+    C[i, j] += A[i, k] * B[k, j]
+

+

We have finished defining the logic of MatMul. Let's now define the schedule, which controls how that logic is executed. To do this, we first create the schedule from the nest: +

schedule = nest.create_schedule()
+

+

We will transform the iteration space and configure the plan according to some predefined constants so that this executes efficiently on our chosen hardware target. The values of these constants come either from the hardware target characteristics and the shapes of the arrays, or from auto-tuning. These will be explained in detail in a subsequent tutorial. For now, define: +

tile_size_i = 6
+tile_size_j = 256
+tile_size_k = 128
+inner_dim_unroll = 4
+num_rows_in_kernel = 6
+

+

We create a CPU target, which exposes hardware characteristics such as the SIMD vector size and the number of vector execution units; we will use these to choose our split sizes. +

target = acc.Target(category=acc.Target.Category.CPU)
+

+

Transform the iteration space to specify the tiling behavior: +

ii = schedule.split(i, tile_size_i)
+jj = schedule.split(j, tile_size_j)
+kk = schedule.split(k, tile_size_k)
+

+

Next, let's split the iteration space to match the kernel characteristics: +

kkk = schedule.split(kk, inner_dim_unroll)
+iii = schedule.split(ii, num_rows_in_kernel)
+jjj = schedule.split(jj, (target.vector_bytes // 4) * 2) # There are 2 vfma execution units, each holding (target.vector_bytes // 4) 32-bit float elements
+jjjj = schedule.split(jjj, target.vector_bytes // 4) # Each SIMD register holds (target.vector_bytes // 4) 32-bit float elements
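As a worked example of the arithmetic above, assuming an AVX2 target where target.vector_bytes is 32 (256-bit vector registers), each register holds 8 float32 elements, so jjjj covers 8 elements and jjj covers 16 (two vfma execution units). In plain Python:

# Assumes vector_bytes == 32 (AVX2); the real value comes from the target object
vector_bytes = 32
floats_per_register = vector_bytes // 4   # 8
jjj_split = floats_per_register * 2       # 16, for the two vfma execution units
print(floats_per_register, jjj_split)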
+

+

Accera will handle any boundary conditions that arise from these splits and apply optimizations such as loop unswitching to ensure that efficient code is generated in those cases.
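For these specific sizes (and assuming the AVX2 vector width above), the only split that leaves a remainder is the outer i split: 784 = 130 * 6 + 4, so a main region of full 6-row tiles is accompanied by a 4-row boundary region. The exact loop structure is up to Accera; the plain Python below only checks which splits divide evenly:

# Check which tile splits leave a remainder (boundary case)
M, N, K = 784, 512, 128
tile_size_i, tile_size_j, tile_size_k = 6, 256, 128
print(M % tile_size_i, N % tile_size_j, K % tile_size_k)  # 4 0 0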

+

Set the order to traverse the iteration space. We start with the outer indices that control the tiling, then move to the innermost indices that are used in the kernel: +

schedule.reorder(j, k, i, jj, kk, ii, kkk, iii, jjj, jjjj)
+

+

Create a plan from the schedule and the current target. The plan allows us to control specific execution behavior on the hardware target, such as vectorization and caching, which are essential for high performance: +

plan = schedule.create_plan(target)
+

+

Add caching. We use an input cache for array B because its active footprint exceeds our caching threshold. The B cache will be packed according to the access pattern specified by the schedule. We use an input/output cache for array C. See caching for more information: +

# Cache the B array by prefetching and packing the memory footprint along slices of the jj dimension.
+plan.cache(B, jj)
+# Cache the C array along slices of jj dimension. Since the C array is the output, its footprint is
+# the size of the kernel. If the kernel is small enough, Accera will use registers for this
+# accumulation before writing these values back to C.
+plan.cache(C, jj)
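For a rough sense of scale (a back-of-the-envelope estimate, not necessarily the exact buffer Accera allocates): the slice of B touched inside one (j, k) tile is tile_size_k x tile_size_j float32 elements, i.e. 128 * 256 * 4 bytes = 128 KiB, large enough that packing it into a contiguous cache can pay off. In plain Python:

# Rough footprint of the B sub-array touched per (j, k) tile
tile_size_j, tile_size_k = 256, 128
b_tile_bytes = tile_size_k * tile_size_j * 4   # float32 elements
print(b_tile_bytes // 1024, "KiB")             # 128 KiB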
+

+

Kernelize the inner dimensions, which applies unroll and vectorize transformations allowing use of SIMD registers: +

plan.kernelize(unroll_indices=[jjj, iii, kkk], vectorize_indices=jjjj)
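kernelize is a convenience for applying unroll and vectorize together: roughly, the call above corresponds to unrolling jjj, iii, and kkk and vectorizing jjjj. This is a sketch only; plan.vectorize is assumed to behave as described in the Accera manual, and you would use either kernelize or the explicit calls, not both:

# Sketch of the rough equivalent of the kernelize call above -- kept commented
# out so it is not applied in addition to kernelize.
# plan.unroll(jjj)
# plan.unroll(iii)
# plan.unroll(kkk)
# plan.vectorize(jjjj)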
+

+

Use the plan to add a function named optimized_matmul_py to a HAT package. +

# Create a package and add a function to the package based on the plan
+package = acc.Package()
+package.add(plan, args=(A, B, C), base_name="optimized_matmul_py")
+

+

Finally, we build the HAT package: +

# Build a statically-linked HAT package to be consumed by the C++ runner
+package.build(name="optimized_matmul", format=acc.Package.Format.HAT_STATIC)
+

+

By now, you should have all the code necessary to generate an optimized Accera MatMul function. You can find the complete Python script here.

+

Generate HAT package

+

Next, we run the generator script to produce a HAT package.

+
Windows/MacOS
+
python optimized_matmul_generator.py
+
+
Ubuntu
+
python3 optimized_matmul_generator.py
+
+

The generator script produces a HAT file (optimized_matmul.hat). Examine this file, and you will see that it contains the exported function with the following meta-data:

+
[functions.optimized_matmul_py_4a6286d9]
+name = 'optimized_matmul_py_4a6286d9'
+description = ''
+calling_convention = "cdecl"
+arguments = [
+    {name = '', description = '', logical_type = "affine_array", declared_type = 'float*', element_type = 'float', usage = "input_output", shape = [ 784, 128 ], affine_map = [ 128, 1 ], affine_offset = 0},
+    {name = '', description = '', logical_type = "affine_array", declared_type = 'float*', element_type = 'float', usage = "input_output", shape = [ 128, 512 ], affine_map = [ 512, 1 ], affine_offset = 0},
+    {name = '', description = '', logical_type = "affine_array", declared_type = 'float*', element_type = 'float', usage = "input_output", shape = [ 784, 512 ], affine_map = [ 512, 1 ], affine_offset = 0}
+]
+return = {name = '', description = '', logical_type = "void", declared_type = 'void', element_type = 'void', usage = "output"}
+
+

The C declaration from the header is: +

void optimized_matmul_py_4a6286d9(float*, float*, float*);
+

+

Accera automatically appends a unique identifier to the function implementation, such as optimized_matmul_py_4a6286d9 to support auto-tuning. This name is re-generated every time the HAT package is rebuilt. To make it easier for client code to use the function, Accera also provides a fixed-name alias, optimized_matmul_py, for the same function.

+

To see how Accera generates code for the iteration space transformations and the plan, you can change format=HAT to format=MLIR, which will output MLIR for each lowering phase. Stepping through the progression of lowerings, you can see how Accera moves from a simple representation of the Accera DSL to the final optimized assembly.
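For example, the build call might become something like the following. This is a sketch: the exact MLIR format enum name is an assumption based on the text above and may differ between Accera versions:

# Assumed MLIR output format -- check your Accera version for the exact enum name
package.build(name="optimized_matmul", format=acc.Package.Format.MLIR)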

+

Compare this to the previous tutorial, whose naive DSL is given here and whose final assembly can be viewed here.

+

Runner code

+

Let's see how to call our MatMul implementation from the HAT package.

+

Create a file called optimized_matmul_runner.cpp with the code below. You can also find it here.

+
#include <stdio.h>
+#include <algorithm>
+
+// Include the HAT file that declares our MatMul function
+#include "optimized_matmul.hat"
+
+#define M 784
+#define N 512
+#define K 128
+
+int main(int argc, const char** argv)
+{
+    // Prepare our matrices (using the heap for large matrices)
+    float* A = new float[M*K];
+    float* B = new float[K*N];
+    float* C = new float[M*N];
+
+    // Fill with data
+    std::fill_n(A, M*K, 2.0f);
+    std::fill_n(B, K*N, 3.0f);
+    std::fill_n(C, M*N, 0.42f);
+
+    printf("Calling MatMul M=%d, K=%d, N=%d\n", M, K, N);
+    optimized_matmul_py(A, B, C);
+
+    printf("Result (first 10 elements): ");
+    for (int i = 0; i < 10; ++i)
+    {
+        printf("%f ", C[i]);
+    }
+    printf("\n");
+
+    delete[] A;
+    delete[] B;
+    delete[] C;
+    return 0;
+}
+
+

The above code creates the matrices A, B, and C and calls the function optimized_matmul_py to perform MatMul.

+

Now that we have the code, let's compile and link it with the HAT package to create an executable. Save the file to your working directory, in the same location as optimized_matmul_generator.py and the generated *.hat and object files.

+

Build and run

+
Windows
+

We will need the 64-bit Visual C++ tools to link against the generated 64-bit .obj. From an "x64 Native Tools Command Prompt":

+
cl.exe optimized_matmul_runner.cpp *.lib
+optimized_matmul_runner.exe
+
+
MacOS
+
clang++ optimized_matmul_runner.cpp *.a -o optimized_matmul_runner
+./optimized_matmul_runner
+
+
Ubuntu
+
g++ optimized_matmul_runner.cpp *.a -o optimized_matmul_runner
+./optimized_matmul_runner
+
+

The output should look like:

+
Calling MatMul M=784, K=128, N=512
+Result (first 10 elements): 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983
+
+

You can now experiment with the generated MatMul function with your own inputs.

+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Tutorials/Pi3_Cross_Compilation/index.html b/Tutorials/Pi3_Cross_Compilation/index.html new file mode 100644 index 00000000..79b5466a --- /dev/null +++ b/Tutorials/Pi3_Cross_Compilation/index.html @@ -0,0 +1,2449 @@ + + + + + + + + + + + + + + + + + + + + + + + + Pi3 Cross Compilation - Accera + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + + + +

Cross Compiling for the Raspberry Pi 3

+

By the end of this tutorial, you will learn how to:

+
    +
  • Cross compile a simple Matrix Multiplication (MatMul) function for execution on a Raspberry Pi 3.
  • +
  • Produce a HAT package containing the MatMul function that can be called on the Pi 3 target.
  • +
  • Call the function on a Raspberry Pi 3 from C/C++ code.
  • +
+

Prerequisites

+
    +
  • You should have Accera installed. If not, you can find the instructions here.
  • +
  • Be familiar with writing Python and C++ code.
  • +
  • Have access to a Raspberry Pi 3 device.
  • +
+

A naive MatMul algorithm

+

Consider the example of multiplying matrices A and B and adding the result into matrix C. In NumPy syntax, this can be expressed as:

+
C += A @ B
+
+

A naive algorithm for matrix multiplication typically contains 3 nested for-loops. In Python, this can be expressed as:

+
# A.shape = (M, K), B.shape = (K, N), C.shape = (M, N)
+
+for i in range(M):
+    for j in range(N):
+        for k in range(K):
+            C[i, j] += A[i, k] * B[k, j]
+
+

Accera Python DSL

+

Let's walk through a naïve Matrix Multiplication (MatMul) using Accera. Instead of using the default target (the host machine), we specify a target representing a Raspberry Pi 3, so that the host cross-compiles for a different target.
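To make the cross-compilation explicit, the only difference from the host-targeted tutorials is which target the plan is created for, as the sketch below shows; the full script later in this tutorial uses the Pi 3 form in context:

# Host-targeted plan (as in the earlier tutorials) vs. cross-compiled plan -- sketch only
# host_plan = schedule.create_plan()                    # defaults to the host machine
# pi3 = acc.Target(acc.Target.Model.RASPBERRY_PI_3B)
# pi3_plan = schedule.create_plan(pi3)                  # targets the Raspberry Pi 3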

+

Create an empty file called hello_matmul_pi3_generator.py. First, we import Accera's module:

+
import accera as acc
+
+

Define some matrix sizes, where A's shape is M by K, B's is K by N, and C's, M by N.

+
# Define our matrix sizes
+M = 128
+N = 256
+K = 256
+
+

Declare arrays A, B, and C. These are our input and input/output matrices.

+
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
+B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))
+C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
+
+

We now use the Nest class to define our 3-layered nested for-loop. The range indices are M, N, and K, with the outermost loop (M) listed first. We can get the loop nest indices to perform the computation.

+
# Define the loop nest
+nest = acc.Nest(shape=(M, N, K))
+
+# Get the loop nest indices
+i, j, k = nest.get_indices()
+
+

Next, we define the logic for every iteration of the loop nest: +

# Define the loop nest logic
+@nest.iteration_logic
+def _():
+    C[i, j] += A[i, k] * B[k, j]
+

+

We have finished defining the logic of MatMul. Let's now define the schedule which controls the execution of logic. For this, we first create the schedule from the nest:

+
sched = nest.create_schedule()
+
+

At this point, sched represents the default schedule for our algorithm. We can also perform some basic transformations on this schedule. For example, the following lines of code split the k index into blocks of 4 (k, k+4, k+8, and so on).

+
# Split the k loop into blocks of 4, effectively doing this
+# (assuming K is divisible by 4):
+#
+# for i in range(M):
+#    for j in range(N):
+#        # Split k into two loops
+#        for k in range(0, K, 4):
+#            for kk in range(4):
+#                C[i, j] += A[i, k + kk] * B[k + kk, j]
+#
+# If k is not divisible by 4, Accera will take care of the boundary
+# case for you.
+kk = sched.split(k, 4)
+
+

The k index has now been split into an outer index k and an inner index kk.

+

The next step is to create a plan from the schedule. For instance, we can use this plan to unroll the innermost loop.

+
# Create a plan, specify the target to be a Raspberry Pi 3
+pi3 = acc.Target(acc.Target.Model.RASPBERRY_PI_3B)
+plan = sched.create_plan(pi3)
+
+# Unroll kk, effectively doing this
+# (assuming K is divisible by 4):
+#
+# for i in range(M):
+#    for j in range(N):
+#        for k in range(0, K, 4):
+#            # Unrolled kk
+#            C[i, j] += A[i, k + 0] * B[k + 0, j]
+#            C[i, j] += A[i, k + 1] * B[k + 1, j]
+#            C[i, j] += A[i, k + 2] * B[k + 2, j]
+#            C[i, j] += A[i, k + 3] * B[k + 3, j]
+#
+# If k is not divisible by 4, Accera will take care of the boundary
+# case for you.
+plan.unroll(kk)
+
+

Use the plan to add a callable function named hello_matmul_pi3_py to a HAT package.

+
# Create a package and add a function to the package based on the plan
+package = acc.Package()
+package.add(plan, args=(A, B, C), base_name="hello_matmul_pi3_py")
+
+

Finally, we build the statically-linked HAT package for the Raspbian platform: +

# Build the HAT package
+package.build(name="hello_matmul_pi3", format=acc.Package.Format.HAT_STATIC, platform=acc.Package.Platform.RASPBIAN)
+
+After following the above steps, you should now have all the code necessary to generate your Accera MatMul function that can be called on a Raspberry Pi 3 target. You can find the complete Python script here.

+

Generate HAT package

+

Next, we run the generator script to produce a HAT package for the Raspberry Pi 3 target.

+

Windows/MacOS

+
python hello_matmul_pi3_generator.py
+
+

Ubuntu

+
python3 hello_matmul_pi3_generator.py
+
+

After we run the script, there should be a header file hello_matmul_pi3.hat and an object file hello_matmul_pi3.o in the ELF format. The .hat file format is described here. Collectively, we call the .hat file and object file a "HAT package".

+

Runner code

+

Let's now see how we can call our MatMul implementation from the HAT package on the Raspberry Pi 3.

+

Create a file called hello_matmul_pi3_runner.cpp with the code below. You can find it here.

+
#include <stdio.h>
+#include <algorithm>
+
+// Include the HAT file that declares our MatMul function
+#include "hello_matmul_p3.HAT"
+
+#define M 128
+#define N 256
+#define K 256
+
+int main(int argc, const char** argv)
+{
+    // Prepare our matrices
+    float A[M*K];
+    float B[K*N];
+    float C[M*N];
+
+    // Fill with data
+    std::fill_n(A, M*K, 2.0f);
+    std::fill_n(B, K*N, 3.0f);
+    std::fill_n(C, M*N, 0.42f);
+
+    printf("Calling MatMul M=%d, K=%d, N=%d\n", M, K, N);
+    hello_matmul_pi3_py(A, B, C);
+
+    printf("Result (first few elements): ");
+    for (int i = 0; i < 10; ++i)
+    {
+        printf("%f ", C[i]);
+    }
+    printf("\n");
+    return 0;
+}
+
+

The above code creates the A, B, and C matrices and calls the function hello_matmul_pi3_py to perform MatMul.

+

Now that we have written the code, we compile and link it with the HAT package to create an executable file. Save this file to your working directory, in the same location as hello_matmul_pi3_generator.py and the generated *.hat and *.o files.

+

Build and run

+

On the Raspberry Pi 3 device

+

For this step, you'll be working with your Raspberry Pi device. If your Pi device is accessible over the network, copy hello_matmul_pi3_runner.cpp, hello_matmul_pi3.hat, and hello_matmul_pi3.o to it using the Unix scp tool or the Windows WinSCP tool; otherwise, use a USB thumb drive to transfer the files manually. You do not need to copy the other generated files and folders.

+

You also need gcc. It is often installed by default on Raspberry Pi 3 systems, but running this command will install it if needed:

+
sudo apt-get install -y gcc
+
+

This has been verified with "Raspbian GNU/Linux 9 (stretch)" and gcc 4:6.3.0-4, and should work with subsequent versions. +Now, you can run the following commands to build and run.

+
gcc hello_matmul_pi3_runner.cpp hello_matmul_pi3.o -o hello_matmul_pi3_runner
+./hello_matmul_pi3_runner
+
+

The output should look like:

+
Calling MatMul M=128, K=256, N=256
+Result (first few elements): 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922
+
+

You can now experiment with the generated MatMul function with your own inputs. To try different inputs, you can modify hello_matmul_pi3_runner.cpp on the Raspberry Pi 3 and recompile it with the existing HAT package.

+ +
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Tutorials/cross_compilation_pi3/hello_matmul_pi3_generator.py b/Tutorials/cross_compilation_pi3/hello_matmul_pi3_generator.py new file mode 100644 index 00000000..f413edf8 --- /dev/null +++ b/Tutorials/cross_compilation_pi3/hello_matmul_pi3_generator.py @@ -0,0 +1,42 @@ +#!/usr/bin/env python3 +# Cross compilation for pi3 sample: Accera Hello MatMul generator +import accera as acc + +# Define our matrix sizes +M = 128 +N = 256 +K = 256 + +A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K)) +B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N)) +C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N)) + +# Define the loop nest +nest = acc.Nest(shape=(M, N, K)) + +# Get the loop nest indices +i, j, k = nest.get_indices() + +# Define the loop nest logic +@nest.iteration_logic +def _(): + C[i, j] += A[i, k] * B[k, j] + +schedule = nest.create_schedule() + +# Split the k loop into blocks of 4 +kk = schedule.split(k, 4) + +# Create a plan, specify the target to be a Raspberry Pi 3 +pi3 = acc.Target(acc.Target.Model.RASPBERRY_PI_3B) +plan = schedule.create_plan(pi3) + +# Then unroll kk +plan.unroll(kk) + +# Create a package and add a function to the package based on the plan +package = acc.Package() +package.add(plan, args=(A, B, C), base_name="hello_matmul_pi3_py") + +# Build the HAT package +package.build(name="hello_matmul_pi3", format=acc.Package.Format.HAT_STATIC, platform=acc.Package.Platform.RASPBIAN) diff --git a/Tutorials/cross_compilation_pi3/hello_matmul_pi3_runner.cpp b/Tutorials/cross_compilation_pi3/hello_matmul_pi3_runner.cpp new file mode 100644 index 00000000..92b72e91 --- /dev/null +++ b/Tutorials/cross_compilation_pi3/hello_matmul_pi3_runner.cpp @@ -0,0 +1,33 @@ +#include +#include + +// Include the HAT file that declares our MatMul function +#include "hello_matmul_pi3.hat" + +#define M 128 +#define N 256 +#define K 256 + +int main(int argc, const char** argv) +{ + // Prepare our matrices + float A[M*K]; + float B[K*N]; + float C[M*N]; + + // Fill with data + std::fill_n(A, M*K, 2.0f); + std::fill_n(B, K*N, 3.0f); + std::fill_n(C, M*N, 0.42f); + + printf("Calling MatMul M=%d, K=%d, N=%d\n", M, K, N); + hello_matmul_py(A, B, C); + + printf("Result (first few elements): "); + for (int i = 0; i < 10; ++i) + { + printf("%f ", C[i]); + } + printf("\n"); + return 0; +} diff --git a/Tutorials/hello_matmul/hello_matmul_generator.py b/Tutorials/hello_matmul/hello_matmul_generator.py new file mode 100644 index 00000000..3f8ac360 --- /dev/null +++ b/Tutorials/hello_matmul/hello_matmul_generator.py @@ -0,0 +1,38 @@ +#!/usr/bin/env python3 +# Accera Hello MatMul sample: generator +import accera as acc + +# Define our matrix sizes +M = 128 +N = 256 +K = 256 + +A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K)) +B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N)) +C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N)) + +# Define the loop nest +nest = acc.Nest(shape=(M, N, K)) + +# Get the loop nest indices +i, j, k = nest.get_indices() + +# Define the loop nest logic +@nest.iteration_logic +def _(): + C[i, j] += A[i, k] * B[k, j] + +schedule = nest.create_schedule() +# Split the k loop into blocks of 4 +kk = schedule.split(k, 4) + +plan = schedule.create_plan() +# Then unroll kk +plan.unroll(kk) + +# Create a package and add a 
function to the package based on the plan +package = acc.Package() +package.add(plan, args=(A, B, C), base_name="hello_matmul_py") + +# Build the HAT package +package.build("hello_matmul") diff --git a/Tutorials/hello_matmul/hello_matmul_runner.cpp b/Tutorials/hello_matmul/hello_matmul_runner.cpp new file mode 100644 index 00000000..e5693cda --- /dev/null +++ b/Tutorials/hello_matmul/hello_matmul_runner.cpp @@ -0,0 +1,33 @@ +#include +#include + +// Include the HAT file that declares our MatMul function +#include "hello_matmul.hat" + +#define M 128 +#define N 256 +#define K 256 + +int main(int argc, const char** argv) +{ + // Prepare our matrices + float A[M*K]; + float B[K*N]; + float C[M*N]; + + // Fill with data + std::fill_n(A, M*K, 2.0f); + std::fill_n(B, K*N, 3.0f); + std::fill_n(C, M*N, 0.42f); + + printf("Calling MatMul M=%d, K=%d, N=%d\n", M, K, N); + hello_matmul_py(A, B, C); + + printf("Result (first few elements): "); + for (int i = 0; i < 10; ++i) + { + printf("%f ", C[i]); + } + printf("\n"); + return 0; +} diff --git a/Tutorials/hello_matmul/mlir/0_Initial.mlir b/Tutorials/hello_matmul/mlir/0_Initial.mlir new file mode 100644 index 00000000..cace9f5c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/0_Initial.mlir @@ -0,0 +1,50 @@ +#map0 = affine_map<(d0, d1) -> (d0 * 256 + d1)> +#map1 = affine_map<()[s0] -> (s0)> + + +#domain0 = #accln<"idomain{{i,0}={0:128:1}, {j,1}={0:256:1}, {k,2}={0:256:1}}"> + +#xdomain0 = #accln<"xfdomain{dims: {{i,0}, {j,1}, {k,2}}, indices: {{{i,0} : {0:128:1}}, {{j,1} : {0:256:1}}, {{k,2} : {0:256:1} = {(d0, d1) -> (d0 + d1), {{k_o,3}, {k_i,4}}}}, {{k_o,3} : {0:256:4}}, {{k_i,4} : {0:4:1}}}}"> + +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + accv.func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, #map0>, %arg1: memref<256x256xf32, #map0>, %arg2: memref<128x256xf32, #map0>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + "accln.nest"() ( { + %0 = accln.sym_index {name = "k_i", reference = "k"} #accln<"index{k_i,4}"> loc(unknown) + %1 = accln.sym_index {name = "k_o", reference = "k"} #accln<"index{k_o,3}"> loc(unknown) + %2 = accln.sym_index {name = "k_i"} #accln<"index{k_i,4}"> loc(unknown) + %3 = accln.sym_index {name = "k_o"} #accln<"index{k_o,3}"> loc(unknown) + %4 = accln.sym_index {name = "i"} #accln<"index{i,0}"> loc(unknown) + %5 = accln.sym_index {name = "j"} #accln<"index{j,1}"> loc(unknown) + %6 = accln.sym_index {name = "k"} #accln<"index{k,2}"> loc(unknown) + "accln.kernel"() ( { + %8 = "accv.slice"(%arg2, %4, %5) {sliceDimensions = [0, 1]} : (memref<128x256xf32, #map0>, index, index) -> memref loc(unknown) + %9 = "accv.slice"(%arg0, %4, %6) {sliceDimensions = [0, 1]} : (memref<128x256xf32, #map0>, index, index) -> memref loc(unknown) + %10 = "accv.slice"(%arg1, %6, %5) {sliceDimensions = [0, 1]} : (memref<256x256xf32, #map0>, index, index) -> memref loc(unknown) + %11 = "accv.get_element"(%9) : (memref) -> f32 loc(unknown) + %12 = "accv.get_element"(%10) : (memref) -> f32 loc(unknown) + %13 = "accv.bin_op"(%11, %12) {predicate = 2 : i64} : (f32, f32) -> f32 loc(unknown) + %14 = "accv.get_element"(%8) : (memref) -> f32 loc(unknown) + %15 = "accv.bin_op"(%14, %13) {predicate = 0 : i64} : (f32, f32) -> f32 loc(unknown) + "accv.copy"(%15, %8) : (f32, memref) -> () loc(unknown) + %16 = "accv.slice"(%arg2, %4, %5) {sliceDimensions = [0, 1]} : (memref<128x256xf32, #map0>, index, 
index) -> memref loc(unknown) + %17 = "accv.get_element"(%8) : (memref) -> f32 loc(unknown) + "accv.copy"(%17, %16) : (f32, memref) -> () loc(unknown) + accln.terminator loc(unknown) + }) {sym_name = "_"} : () -> () loc(unknown) + %7 = "accln.null_pred"() : () -> i1 loc(unknown) + "accln.scheduled_kernel"(%7) {kernel = @_, sym_name = "scheduled__"} : (i1) -> () loc(unknown) + "accln.schedule"() ( { + "accln.exec_plan"() {exec_target = 0 : i64} : () -> () loc(unknown) + accln.terminator loc(unknown) + }) {domain = #xdomain0, kernels = [@scheduled__], loopattrs = [], order = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k_o,3}">, #accln<"index{k_i,4}">], parallel = [], unroll_and_jammed = {}, unrolled = [4 : index]} : () -> () loc(unknown) + accln.terminator loc(unknown) + }) {domain = #domain0, exec_target = 0 : i64, kernels = []} : () -> () loc(unknown) + accv.return loc(unknown) + } loc(unknown) + accv.func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, #map0>, %arg1: memref<256x256xf32, #map0>, %arg2: memref<128x256xf32, #map0>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, #map0>, memref<256x256xf32, #map0>, memref<128x256xf32, #map0>) -> () loc(unknown) + accv.return loc(unknown) + } loc(unknown) + } loc(unknown) +} loc(unknown) diff --git a/Tutorials/hello_matmul/mlir/10_CSE.mlir b/Tutorials/hello_matmul/mlir/10_CSE.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/10_CSE.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, 
%arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/10_Canonicalizer.mlir b/Tutorials/hello_matmul/mlir/10_Canonicalizer.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/10_Canonicalizer.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = 
addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/11_CSE.mlir b/Tutorials/hello_matmul/mlir/11_CSE.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/11_CSE.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + 
d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes 
{exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/11_GpuKernelOutlining.mlir b/Tutorials/hello_matmul/mlir/11_GpuKernelOutlining.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/11_GpuKernelOutlining.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : 
memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/12_GpuKernelOutlining.mlir b/Tutorials/hello_matmul/mlir/12_GpuKernelOutlining.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/12_GpuKernelOutlining.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + 
d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/12_LegalizeStandardForSPIRV.mlir b/Tutorials/hello_matmul/mlir/12_LegalizeStandardForSPIRV.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/12_LegalizeStandardForSPIRV.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, 
affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/13_ConvertAcceraToSPIRV.mlir b/Tutorials/hello_matmul/mlir/13_ConvertAcceraToSPIRV.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/13_ConvertAcceraToSPIRV.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 
= constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> 
(d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/13_LegalizeStandardForSPIRV.mlir b/Tutorials/hello_matmul/mlir/13_LegalizeStandardForSPIRV.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/13_LegalizeStandardForSPIRV.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : 
f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/14_Canonicalizer.mlir b/Tutorials/hello_matmul/mlir/14_Canonicalizer.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/14_Canonicalizer.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : 
memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/14_ConvertAcceraToSPIRV.mlir b/Tutorials/hello_matmul/mlir/14_ConvertAcceraToSPIRV.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/14_ConvertAcceraToSPIRV.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + 
%8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/15_Canonicalizer.mlir b/Tutorials/hello_matmul/mlir/15_Canonicalizer.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/15_Canonicalizer.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for 
%arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git 
a/Tutorials/hello_matmul/mlir/15_SPIRVLowerABIAttributes.mlir b/Tutorials/hello_matmul/mlir/15_SPIRVLowerABIAttributes.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/15_SPIRVLowerABIAttributes.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, 
affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/16_SPIRVLowerABIAttributes.mlir b/Tutorials/hello_matmul/mlir/16_SPIRVLowerABIAttributes.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/16_SPIRVLowerABIAttributes.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 
{RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/16_SPIRVUpdateVCE.mlir b/Tutorials/hello_matmul/mlir/16_SPIRVUpdateVCE.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/16_SPIRVUpdateVCE.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, 
%arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/17_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir b/Tutorials/hello_matmul/mlir/17_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/17_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : 
memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/17_SPIRVUpdateVCE.mlir b/Tutorials/hello_matmul/mlir/17_SPIRVUpdateVCE.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/17_SPIRVUpdateVCE.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = 
"e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: 
memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/18_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir b/Tutorials/hello_matmul/mlir/18_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/18_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> 
(d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/18_EmitVulkanWrapper.mlir b/Tutorials/hello_matmul/mlir/18_EmitVulkanWrapper.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/18_EmitVulkanWrapper.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 
{RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/19_EmitVulkanWrapper.mlir b/Tutorials/hello_matmul/mlir/19_EmitVulkanWrapper.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/19_EmitVulkanWrapper.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 
256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/19_SCFToStandard.mlir b/Tutorials/hello_matmul/mlir/19_SCFToStandard.mlir new file mode 100644 index 00000000..a108e75d --- /dev/null +++ b/Tutorials/hello_matmul/mlir/19_SCFToStandard.mlir @@ -0,0 +1,75 @@ +module @hello_matmul { + func 
@hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + br ^bb1(%c0 : index) + ^bb1(%0: index): // 2 preds: ^bb0, ^bb8 + %1 = cmpi "slt", %0, %c128 : index + cond_br %1, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + br ^bb3(%c0 : index) + ^bb3(%2: index): // 2 preds: ^bb2, ^bb7 + %3 = cmpi "slt", %2, %c256 : index + cond_br %3, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + br ^bb5(%c0 : index) + ^bb5(%4: index): // 2 preds: ^bb4, ^bb6 + %5 = cmpi "slt", %4, %c256 : index + cond_br %5, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %6 = load %arg0[%0, %4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %7 = load %arg1[%4, %2] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = mulf %6, %7 {RelaxedPrecision} : f32 + %9 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %10 = addf %9, %8 {RelaxedPrecision} : f32 + store %10, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %11, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = addi %4, %c1 : index + %13 = load %arg0[%0, %12] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %14 = load %arg1[%12, %2] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = mulf %13, %14 {RelaxedPrecision} : f32 + %16 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %17 = addf %16, %15 {RelaxedPrecision} : f32 + store %17, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %18, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = addi %4, %c2 : index + %20 = load %arg0[%0, %19] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %21 = load %arg1[%19, %2] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = mulf %20, %21 {RelaxedPrecision} : f32 + %23 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %24 = addf %23, %22 {RelaxedPrecision} : f32 + store %24, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %25, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = addi %4, %c3 : index + %27 = load %arg0[%0, %26] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %28 = load %arg1[%26, %2] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %29 = mulf %27, %28 {RelaxedPrecision} : f32 + %30 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %31 = addf %30, %29 {RelaxedPrecision} : f32 + store %31, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %32 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %32, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, 
d1) -> (d0 * 256 + d1)>> + %33 = addi %4, %c4 : index + br ^bb5(%33 : index) + ^bb7: // pred: ^bb5 + %34 = addi %2, %c1 : index + br ^bb3(%34 : index) + ^bb8: // pred: ^bb3 + %35 = addi %0, %c1 : index + br ^bb1(%35 : index) + ^bb9: // pred: ^bb1 + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/1_Canonicalizer.mlir b/Tutorials/hello_matmul/mlir/1_Canonicalizer.mlir new file mode 100644 index 00000000..dbf434d4 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/1_Canonicalizer.mlir @@ -0,0 +1,34 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + accv.func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = accln.sym_index {name = "i"} #accln<"index{i,0}"> + %1 = accln.sym_index {name = "j"} #accln<"index{j,1}"> + %2 = accln.sym_index {name = "k"} #accln<"index{k,2}"> + "accln.nest"() ( { + "accln.kernel"() ( { + %4 = load %arg0[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg1[%2, %1] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = "accv.bin_op"(%4, %5) {predicate = 2 : i64} : (f32, f32) -> f32 + %7 = load %arg2[%0, %1] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = "accv.bin_op"(%7, %6) {predicate = 0 : i64} : (f32, f32) -> f32 + store %8, %arg2[%0, %1] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = load %arg2[%0, %1] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %9, %arg2[%0, %1] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + accln.terminator + }) {sym_name = "_"} : () -> () + %3 = "accln.null_pred"() : () -> i1 + "accln.scheduled_kernel"(%3) {kernel = @_, sym_name = "scheduled__"} : (i1) -> () + "accln.schedule"() ( { + "accln.exec_plan"() {exec_target = 0 : i64} : () -> () + accln.terminator + }) {domain = #accln<"xfdomain{dims: {{i,0}, {j,1}, {k,2}}, indices: {{{i,0} : {0:128:1}}, {{j,1} : {0:256:1}}, {{k,2} : {0:256:1} = {(d0, d1) -> (d0 + d1), {{k_o,3}, {k_i,4}}}}, {{k_o,3} : {0:256:4}}, {{k_i,4} : {0:4:1}}}}">, kernels = [@scheduled__], loopattrs = [], order = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k_o,3}">, #accln<"index{k_i,4}">], parallel = [], unroll_and_jammed = {}, unrolled = [4 : index]} : () -> () + accln.terminator + }) {domain = #accln<"idomain{{i,0}={0:128:1}, {j,1}={0:256:1}, {k,2}={0:256:1}}">, exec_target = 0 : i64, kernels = []} : () -> () + accv.return + } + accv.func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + 
d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + accv.return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/20_ConvertValueToLLVM.mlir b/Tutorials/hello_matmul/mlir/20_ConvertValueToLLVM.mlir new file mode 100644 index 00000000..069088f1 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/20_ConvertValueToLLVM.mlir @@ -0,0 +1,375 @@ +module @hello_matmul { + llvm.func @hello_matmul_py_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, 
array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = llvm.mlir.constant(4 : index) : !llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + %40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + %47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : !llvm.i64 + %83 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = llvm.mlir.constant(256 : index) : !llvm.i64 + %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : !llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + %149 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, %155 : !llvm.i64 + %157 = llvm.getelementptr %149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : !llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul %31, %212 
: !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr %230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.getelementptr 
%271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + llvm.return + } + llvm.func @hello_matmul_py(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/hello_matmul/mlir/20_SCFToStandard.mlir b/Tutorials/hello_matmul/mlir/20_SCFToStandard.mlir new file mode 100644 index 00000000..1c519d1f --- /dev/null +++ b/Tutorials/hello_matmul/mlir/20_SCFToStandard.mlir @@ -0,0 +1,75 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : 
index + %c3 = constant 3 : index + br ^bb1(%c0 : index) + ^bb1(%0: index): // 2 preds: ^bb0, ^bb8 + %1 = cmpi "slt", %0, %c128 : index + cond_br %1, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + br ^bb3(%c0 : index) + ^bb3(%2: index): // 2 preds: ^bb2, ^bb7 + %3 = cmpi "slt", %2, %c256 : index + cond_br %3, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + br ^bb5(%c0 : index) + ^bb5(%4: index): // 2 preds: ^bb4, ^bb6 + %5 = cmpi "slt", %4, %c256 : index + cond_br %5, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %6 = load %arg0[%0, %4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %7 = load %arg1[%4, %2] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = mulf %6, %7 {RelaxedPrecision} : f32 + %9 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %10 = addf %9, %8 {RelaxedPrecision} : f32 + store %10, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %11, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = addi %4, %c1 : index + %13 = load %arg0[%0, %12] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %14 = load %arg1[%12, %2] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = mulf %13, %14 {RelaxedPrecision} : f32 + %16 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %17 = addf %16, %15 {RelaxedPrecision} : f32 + store %17, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %18, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = addi %4, %c2 : index + %20 = load %arg0[%0, %19] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %21 = load %arg1[%19, %2] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = mulf %20, %21 {RelaxedPrecision} : f32 + %23 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %24 = addf %23, %22 {RelaxedPrecision} : f32 + store %24, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %25, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = addi %4, %c3 : index + %27 = load %arg0[%0, %26] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %28 = load %arg1[%26, %2] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %29 = mulf %27, %28 {RelaxedPrecision} : f32 + %30 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %31 = addf %30, %29 {RelaxedPrecision} : f32 + store %31, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %32 = load %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %32, %arg2[%0, %2] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %33 = addi %4, %c4 : index + br ^bb5(%33 : index) + ^bb7: // pred: ^bb5 + %34 = addi %2, %c1 : index + br ^bb3(%34 : index) + ^bb8: // pred: ^bb3 + %35 = addi %0, %c1 : index + br ^bb1(%35 : index) + ^bb9: // pred: ^bb1 + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) 
-> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/21_ConvertValueToLLVM.mlir b/Tutorials/hello_matmul/mlir/21_ConvertValueToLLVM.mlir new file mode 100644 index 00000000..752331e8 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/21_ConvertValueToLLVM.mlir @@ -0,0 +1,375 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = llvm.mlir.constant(4 : index) : !llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + %40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + %47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : 
!llvm.i64 + %83 = llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = llvm.mlir.constant(256 : index) : !llvm.i64 + %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : !llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + 
%149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, %155 : !llvm.i64 + %157 = llvm.getelementptr %149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : !llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul 
%31, %212 : !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr %230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = 
llvm.getelementptr %271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + llvm.return + } + llvm.func @hello_matmul_py_0f07b3ac(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = 
llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/hello_matmul/mlir/21_LLVMLegalizeForExport.mlir b/Tutorials/hello_matmul/mlir/21_LLVMLegalizeForExport.mlir new file mode 100644 index 00000000..069088f1 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/21_LLVMLegalizeForExport.mlir @@ -0,0 +1,375 @@ +module @hello_matmul { + llvm.func @hello_matmul_py_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = 
"nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = llvm.mlir.constant(4 : index) : !llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + 
%40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + %47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : !llvm.i64 + %83 = llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = llvm.mlir.constant(256 : index) : !llvm.i64 + %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = 
llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : !llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + %149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, %155 : !llvm.i64 + %157 = llvm.getelementptr %149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 
: !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : !llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul %31, %212 : !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr 
%230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.getelementptr %271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + llvm.return + } + llvm.func @hello_matmul_py(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = 
llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/hello_matmul/mlir/22_FunctionPointerResolution.mlir b/Tutorials/hello_matmul/mlir/22_FunctionPointerResolution.mlir new file mode 100644 index 00000000..069088f1 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/22_FunctionPointerResolution.mlir @@ -0,0 +1,375 @@ +module @hello_matmul { + llvm.func @hello_matmul_py_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = llvm.mlir.constant(4 : index) : !llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + %40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + %47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = 
llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : !llvm.i64 + %83 = llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = llvm.mlir.constant(256 : index) : !llvm.i64 + %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : 
!llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + %149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, %155 : !llvm.i64 + %157 = llvm.getelementptr %149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : 
!llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul %31, %212 : !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr %230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> 
+ %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.getelementptr %271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + llvm.return + } + llvm.func @hello_matmul_py(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = 
llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} 
diff --git a/Tutorials/hello_matmul/mlir/22_LLVMLegalizeForExport.mlir b/Tutorials/hello_matmul/mlir/22_LLVMLegalizeForExport.mlir new file mode 100644 index 00000000..752331e8 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/22_LLVMLegalizeForExport.mlir @@ -0,0 +1,375 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = 
llvm.mlir.constant(4 : index) : !llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + %40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + %47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : !llvm.i64 + %83 = llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = 
llvm.mlir.constant(256 : index) : !llvm.i64 + %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : !llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + %149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, 
%155 : !llvm.i64 + %157 = llvm.getelementptr %149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : !llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul %31, %212 : !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr %230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.getelementptr %271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + 
llvm.return + } + llvm.func @hello_matmul_py_0f07b3ac(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/hello_matmul/mlir/23_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir b/Tutorials/hello_matmul/mlir/23_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir new file mode 100644 index 00000000..069088f1 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/23_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir @@ -0,0 +1,375 @@ +module @hello_matmul { + llvm.func @hello_matmul_py_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = 
llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = llvm.mlir.constant(4 : index) : !llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + %40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + 
%47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : !llvm.i64 + %83 = llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = llvm.mlir.constant(256 : index) : !llvm.i64 + %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : !llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + %149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, %155 : !llvm.i64 + %157 = llvm.getelementptr %149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = 
llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : !llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul %31, %212 : !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr %230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = 
llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.getelementptr %271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + llvm.return + } + llvm.func @hello_matmul_py(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = 
llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/hello_matmul/mlir/23_FunctionPointerResolution.mlir b/Tutorials/hello_matmul/mlir/23_FunctionPointerResolution.mlir new file mode 100644 index 00000000..752331e8 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/23_FunctionPointerResolution.mlir @@ -0,0 +1,375 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue 
%arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = llvm.mlir.constant(4 : index) : !llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + %40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + %47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : !llvm.i64 + %83 = llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = llvm.mlir.constant(256 : index) : !llvm.i64 + %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : !llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = 
llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + %149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, %155 : !llvm.i64 + %157 = llvm.getelementptr %149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : !llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul %31, %212 : !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr %230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.getelementptr %271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + llvm.return + } + llvm.func @hello_matmul_py_0f07b3ac(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/hello_matmul/mlir/24_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir 
b/Tutorials/hello_matmul/mlir/24_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir new file mode 100644 index 00000000..752331e8 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/24_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir @@ -0,0 +1,375 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = llvm.mlir.constant(4 : index) : 
!llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + %40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + %47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : !llvm.i64 + %83 = llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = llvm.mlir.constant(256 : index) : !llvm.i64 
+ %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : !llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + %149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, %155 : !llvm.i64 + %157 = llvm.getelementptr 
%149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : !llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul %31, %212 : !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> 
+ %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr %230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.getelementptr %271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + llvm.return + } + llvm.func @hello_matmul_py_0f07b3ac(%arg0: !llvm.ptr, 
%arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = 
llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/hello_matmul/mlir/2_LoopNestToValueFunc.mlir b/Tutorials/hello_matmul/mlir/2_LoopNestToValueFunc.mlir new file mode 100644 index 00000000..872f0280 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/2_LoopNestToValueFunc.mlir @@ -0,0 +1,31 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + accv.func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + "accv.lambda"() ( { + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + affine.for %arg6 = 0 to 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %1 = load %arg0[%arg3, %0] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = load %arg1[%0, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %3 = "accv.bin_op"(%1, %2) {predicate = 2 : i64} : (f32, f32) -> f32 + %4 = load 
%arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = "accv.bin_op"(%4, %3) {predicate = 0 : i64} : (f32, f32) -> f32 + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %6, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i,4}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_0", type = () -> ()} : () -> () + accv.return + } + accv.func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + accv.return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/3_ValueFuncToTarget.mlir b/Tutorials/hello_matmul/mlir/3_ValueFuncToTarget.mlir new file mode 100644 index 00000000..9ce82506 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/3_ValueFuncToTarget.mlir @@ -0,0 +1,106 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %c0) + %1 = load %arg0[%arg3, %0] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = load %arg1[%0, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %3 = "accv.bin_op"(%1, %2) {predicate = 2 : i64} : (f32, f32) -> f32 + %4 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = "accv.bin_op"(%4, %3) {predicate = 0 : i64} : (f32, f32) -> f32 + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = load %arg2[%arg3, %arg4] : memref<128x256xf32, 
affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %6, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0) + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %7) + %9 = load %arg0[%arg3, %8] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %10 = load %arg1[%8, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%9, %10) {predicate = 2 : i64} : (f32, f32) -> f32 + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = "accv.bin_op"(%12, %11) {predicate = 0 : i64} : (f32, f32) -> f32 + store %13, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %14 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %14, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0) + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %15) + %17 = load %arg0[%arg3, %16] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = load %arg1[%16, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = "accv.bin_op"(%17, %18) {predicate = 2 : i64} : (f32, f32) -> f32 + %20 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %21 = "accv.bin_op"(%20, %19) {predicate = 0 : i64} : (f32, f32) -> f32 + store %21, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %22, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0) + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %23) + %25 = load %arg0[%arg3, %24] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg1[%24, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %27 = "accv.bin_op"(%25, %26) {predicate = 2 : i64} : (f32, f32) -> f32 + %28 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %29 = "accv.bin_op"(%28, %27) {predicate = 0 : i64} : (f32, f32) -> f32 + store %29, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %30 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %30, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", 
accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + func @NestFunction_0(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %c0) + %1 = load %arg0[%arg3, %0] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = load %arg1[%0, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %3 = "accv.bin_op"(%1, %2) {predicate = 2 : i64} : (f32, f32) -> f32 + %4 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = "accv.bin_op"(%4, %3) {predicate = 0 : i64} : (f32, f32) -> f32 + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %6, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0) + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %7) + %9 = load %arg0[%arg3, %8] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %10 = load %arg1[%8, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%9, %10) {predicate = 2 : i64} : (f32, f32) -> f32 + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = "accv.bin_op"(%12, %11) {predicate = 0 : i64} : (f32, f32) -> f32 + store %13, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %14 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %14, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0) + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %15) + %17 = load %arg0[%arg3, %16] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = load %arg1[%16, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = "accv.bin_op"(%17, %18) {predicate = 2 : i64} : (f32, f32) -> f32 + %20 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %21 = "accv.bin_op"(%20, %19) {predicate = 0 : i64} : (f32, f32) -> f32 + store %21, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %22, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0) + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %23) + %25 = load %arg0[%arg3, %24] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg1[%24, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + 
d1)>> + %27 = "accv.bin_op"(%25, %26) {predicate = 2 : i64} : (f32, f32) -> f32 + %28 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %29 = "accv.bin_op"(%28, %27) {predicate = 0 : i64} : (f32, f32) -> f32 + store %29, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %30 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %30, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/4_LinalgLowerToAffineLoops.mlir b/Tutorials/hello_matmul/mlir/4_LinalgLowerToAffineLoops.mlir new file mode 100644 index 00000000..40e2455d --- /dev/null +++ b/Tutorials/hello_matmul/mlir/4_LinalgLowerToAffineLoops.mlir @@ -0,0 +1,52 @@ +module @hello_matmul { + accv.module "hello_matmul" { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = "accv.bin_op"(%0, %1) {predicate = 2 : i64} : (f32, f32) -> f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = "accv.bin_op"(%3, %2) {predicate = 0 : i64} : (f32, f32) -> f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg5) + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = "accv.bin_op"(%7, %8) {predicate = 2 : i64} : (f32, f32) -> f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%10, %9) {predicate = 0 : i64} : (f32, f32) -> f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg5) + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : 
memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = "accv.bin_op"(%14, %15) {predicate = 2 : i64} : (f32, f32) -> f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = "accv.bin_op"(%17, %16) {predicate = 0 : i64} : (f32, f32) -> f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg5) + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = "accv.bin_op"(%21, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = "accv.bin_op"(%24, %23) {predicate = 0 : i64} : (f32, f32) -> f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_1,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/4_SymbolDCE.mlir b/Tutorials/hello_matmul/mlir/4_SymbolDCE.mlir new file mode 100644 index 00000000..17fad421 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/4_SymbolDCE.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %c0) + %1 = load %arg0[%arg3, %0] : 
memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = load %arg1[%0, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %3 = "accv.bin_op"(%1, %2) {predicate = 2 : i64} : (f32, f32) -> f32 + %4 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = "accv.bin_op"(%4, %3) {predicate = 0 : i64} : (f32, f32) -> f32 + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %6, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0) + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %7) + %9 = load %arg0[%arg3, %8] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %10 = load %arg1[%8, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%9, %10) {predicate = 2 : i64} : (f32, f32) -> f32 + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = "accv.bin_op"(%12, %11) {predicate = 0 : i64} : (f32, f32) -> f32 + store %13, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %14 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %14, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0) + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %15) + %17 = load %arg0[%arg3, %16] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = load %arg1[%16, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = "accv.bin_op"(%17, %18) {predicate = 2 : i64} : (f32, f32) -> f32 + %20 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %21 = "accv.bin_op"(%20, %19) {predicate = 0 : i64} : (f32, f32) -> f32 + store %21, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %22, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0) + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %23) + %25 = load %arg0[%arg3, %24] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg1[%24, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %27 = "accv.bin_op"(%25, %26) {predicate = 2 : i64} : (f32, f32) -> f32 + %28 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %29 = "accv.bin_op"(%28, %27) {predicate = 0 : i64} : (f32, f32) -> f32 + store %29, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %30 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %30, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize 
= [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/5_LinalgLowerToAffineLoops.mlir b/Tutorials/hello_matmul/mlir/5_LinalgLowerToAffineLoops.mlir new file mode 100644 index 00000000..eff89a6f --- /dev/null +++ b/Tutorials/hello_matmul/mlir/5_LinalgLowerToAffineLoops.mlir @@ -0,0 +1,52 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = "accv.bin_op"(%0, %1) {predicate = 2 : i64} : (f32, f32) -> f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = "accv.bin_op"(%3, %2) {predicate = 0 : i64} : (f32, f32) -> f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg5) + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = "accv.bin_op"(%7, %8) {predicate = 2 : i64} : (f32, f32) -> f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%10, %9) {predicate = 0 : i64} : (f32, f32) -> f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg5) + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = "accv.bin_op"(%14, %15) {predicate = 2 
: i64} : (f32, f32) -> f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = "accv.bin_op"(%17, %16) {predicate = 0 : i64} : (f32, f32) -> f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg5) + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = "accv.bin_op"(%21, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = "accv.bin_op"(%24, %23) {predicate = 0 : i64} : (f32, f32) -> f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/5_SimplifyAffineStructures.mlir b/Tutorials/hello_matmul/mlir/5_SimplifyAffineStructures.mlir new file mode 100644 index 00000000..40e2455d --- /dev/null +++ b/Tutorials/hello_matmul/mlir/5_SimplifyAffineStructures.mlir @@ -0,0 +1,52 @@ +module @hello_matmul { + accv.module "hello_matmul" { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = "accv.bin_op"(%0, %1) {predicate = 2 : i64} : (f32, f32) -> f32 + 
%3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = "accv.bin_op"(%3, %2) {predicate = 0 : i64} : (f32, f32) -> f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg5) + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = "accv.bin_op"(%7, %8) {predicate = 2 : i64} : (f32, f32) -> f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%10, %9) {predicate = 0 : i64} : (f32, f32) -> f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg5) + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = "accv.bin_op"(%14, %15) {predicate = 2 : i64} : (f32, f32) -> f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = "accv.bin_op"(%17, %16) {predicate = 0 : i64} : (f32, f32) -> f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg5) + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = "accv.bin_op"(%21, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = "accv.bin_op"(%24, %23) {predicate = 0 : i64} : (f32, f32) -> f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_1,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, 
d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/6_Canonicalizer.mlir b/Tutorials/hello_matmul/mlir/6_Canonicalizer.mlir new file mode 100644 index 00000000..40e2455d --- /dev/null +++ b/Tutorials/hello_matmul/mlir/6_Canonicalizer.mlir @@ -0,0 +1,52 @@ +module @hello_matmul { + accv.module "hello_matmul" { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = "accv.bin_op"(%0, %1) {predicate = 2 : i64} : (f32, f32) -> f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = "accv.bin_op"(%3, %2) {predicate = 0 : i64} : (f32, f32) -> f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg5) + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = "accv.bin_op"(%7, %8) {predicate = 2 : i64} : (f32, f32) -> f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%10, %9) {predicate = 0 : i64} : (f32, f32) -> f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg5) + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = "accv.bin_op"(%14, %15) {predicate = 2 : i64} : (f32, f32) -> f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = "accv.bin_op"(%17, %16) {predicate = 0 : i64} : (f32, f32) -> f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg5) + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 
256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = "accv.bin_op"(%21, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = "accv.bin_op"(%24, %23) {predicate = 0 : i64} : (f32, f32) -> f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_1,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/6_SimplifyAffineStructures.mlir b/Tutorials/hello_matmul/mlir/6_SimplifyAffineStructures.mlir new file mode 100644 index 00000000..eff89a6f --- /dev/null +++ b/Tutorials/hello_matmul/mlir/6_SimplifyAffineStructures.mlir @@ -0,0 +1,52 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = "accv.bin_op"(%0, %1) {predicate = 2 : i64} : (f32, f32) -> f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = "accv.bin_op"(%3, %2) {predicate = 0 : i64} : (f32, f32) -> f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg5) + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, 
affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = "accv.bin_op"(%7, %8) {predicate = 2 : i64} : (f32, f32) -> f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%10, %9) {predicate = 0 : i64} : (f32, f32) -> f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg5) + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = "accv.bin_op"(%14, %15) {predicate = 2 : i64} : (f32, f32) -> f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = "accv.bin_op"(%17, %16) {predicate = 0 : i64} : (f32, f32) -> f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg5) + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = "accv.bin_op"(%21, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = "accv.bin_op"(%24, %23) {predicate = 0 : i64} : (f32, f32) -> f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/7_Canonicalizer.mlir 
b/Tutorials/hello_matmul/mlir/7_Canonicalizer.mlir new file mode 100644 index 00000000..eff89a6f --- /dev/null +++ b/Tutorials/hello_matmul/mlir/7_Canonicalizer.mlir @@ -0,0 +1,52 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + affine.for %arg3 = 0 to 128 { + affine.for %arg4 = 0 to 256 { + affine.for %arg5 = 0 to 256 step 4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = "accv.bin_op"(%0, %1) {predicate = 2 : i64} : (f32, f32) -> f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = "accv.bin_op"(%3, %2) {predicate = 0 : i64} : (f32, f32) -> f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg5) + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = "accv.bin_op"(%7, %8) {predicate = 2 : i64} : (f32, f32) -> f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%10, %9) {predicate = 0 : i64} : (f32, f32) -> f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg5) + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = "accv.bin_op"(%14, %15) {predicate = 2 : i64} : (f32, f32) -> f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = "accv.bin_op"(%17, %16) {predicate = 0 : i64} : (f32, f32) -> f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg5) + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = "accv.bin_op"(%21, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = "accv.bin_op"(%24, %23) {predicate = 0 : i64} : (f32, f32) -> f32 + store %25, %arg2[%arg3, %arg4] 
: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{k_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j,1}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 256]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i,0}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 256]} + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/7_ConvertAffineToStandard.mlir b/Tutorials/hello_matmul/mlir/7_ConvertAffineToStandard.mlir new file mode 100644 index 00000000..aaa4db3d --- /dev/null +++ b/Tutorials/hello_matmul/mlir/7_ConvertAffineToStandard.mlir @@ -0,0 +1,64 @@ +module @hello_matmul { + accv.module "hello_matmul" { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + %c128 = constant 128 : index + %c1 = constant 1 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + %c0_0 = constant 0 : index + %c256 = constant 256 : index + %c1_1 = constant 1 : index + scf.for %arg4 = %c0_0 to %c256 step %c1_1 { + %c0_2 = constant 0 : index + %c256_3 = constant 256 : index + %c4 = constant 4 : index + scf.for %arg5 = %c0_2 to %c256_3 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = "accv.bin_op"(%0, %1) {predicate = 2 : i64} : (f32, f32) -> f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = "accv.bin_op"(%3, %2) {predicate = 0 : i64} : (f32, f32) -> f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %c1_4 = constant 1 : index + %6 = addi %arg5, %c1_4 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = 
"accv.bin_op"(%7, %8) {predicate = 2 : i64} : (f32, f32) -> f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%10, %9) {predicate = 0 : i64} : (f32, f32) -> f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %c2 = constant 2 : index + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = "accv.bin_op"(%14, %15) {predicate = 2 : i64} : (f32, f32) -> f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = "accv.bin_op"(%17, %16) {predicate = 0 : i64} : (f32, f32) -> f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %c3 = constant 3 : index + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = "accv.bin_op"(%21, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = "accv.bin_op"(%24, %23) {predicate = 0 : i64} : (f32, f32) -> f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/8_ConvertAffineToStandard.mlir b/Tutorials/hello_matmul/mlir/8_ConvertAffineToStandard.mlir new file mode 100644 index 00000000..de49a5d2 --- /dev/null +++ b/Tutorials/hello_matmul/mlir/8_ConvertAffineToStandard.mlir @@ -0,0 +1,64 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "hello_matmul" { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + %c128 = constant 
128 : index + %c1 = constant 1 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + %c0_0 = constant 0 : index + %c256 = constant 256 : index + %c1_1 = constant 1 : index + scf.for %arg4 = %c0_0 to %c256 step %c1_1 { + %c0_2 = constant 0 : index + %c256_3 = constant 256 : index + %c4 = constant 4 : index + scf.for %arg5 = %c0_2 to %c256_3 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = "accv.bin_op"(%0, %1) {predicate = 2 : i64} : (f32, f32) -> f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = "accv.bin_op"(%3, %2) {predicate = 0 : i64} : (f32, f32) -> f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %c1_4 = constant 1 : index + %6 = addi %arg5, %c1_4 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = "accv.bin_op"(%7, %8) {predicate = 2 : i64} : (f32, f32) -> f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = "accv.bin_op"(%10, %9) {predicate = 0 : i64} : (f32, f32) -> f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %c2 = constant 2 : index + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = "accv.bin_op"(%14, %15) {predicate = 2 : i64} : (f32, f32) -> f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = "accv.bin_op"(%17, %16) {predicate = 0 : i64} : (f32, f32) -> f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %c3 = constant 3 : index + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = "accv.bin_op"(%21, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = "accv.bin_op"(%24, %23) {predicate = 0 : i64} : (f32, f32) -> f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, 
affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/hello_matmul/mlir/8_ConvertValueToStd.mlir b/Tutorials/hello_matmul/mlir/8_ConvertValueToStd.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/8_ConvertValueToStd.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, 
%c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/9_Canonicalizer.mlir b/Tutorials/hello_matmul/mlir/9_Canonicalizer.mlir new file mode 100644 index 00000000..aaf902ec --- /dev/null +++ b/Tutorials/hello_matmul/mlir/9_Canonicalizer.mlir @@ -0,0 +1,57 @@ +module @hello_matmul { + func @hello_matmul_py_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 
+ d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/9_ConvertValueToStd.mlir b/Tutorials/hello_matmul/mlir/9_ConvertValueToStd.mlir new file mode 100644 index 00000000..269aff3c --- /dev/null +++ b/Tutorials/hello_matmul/mlir/9_ConvertValueToStd.mlir @@ -0,0 +1,57 @@ +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c128 = constant 128 : index + %c0 = constant 0 : index + %c256 = constant 256 : index + %c4 = constant 4 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + scf.for %arg3 = %c0 to %c128 step %c1 { + scf.for %arg4 = %c0 to %c256 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c4 { + %0 = load %arg0[%arg3, %arg5] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %1 = load %arg1[%arg5, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %2 = mulf %0, %1 {RelaxedPrecision} : f32 + %3 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %4 = addf %3, %2 {RelaxedPrecision} : f32 + store %4, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %5 = load 
%arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %5, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %6 = addi %arg5, %c1 : index + %7 = load %arg0[%arg3, %6] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %8 = load %arg1[%6, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %9 = mulf %7, %8 {RelaxedPrecision} : f32 + %10 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %11 = addf %10, %9 {RelaxedPrecision} : f32 + store %11, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %12 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %12, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %13 = addi %arg5, %c2 : index + %14 = load %arg0[%arg3, %13] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %15 = load %arg1[%13, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %16 = mulf %14, %15 {RelaxedPrecision} : f32 + %17 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %18 = addf %17, %16 {RelaxedPrecision} : f32 + store %18, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %19 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %19, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %20 = addi %arg5, %c3 : index + %21 = load %arg0[%arg3, %20] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %22 = load %arg1[%20, %arg4] : memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %23 = mulf %21, %22 {RelaxedPrecision} : f32 + %24 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %25 = addf %24, %23 {RelaxedPrecision} : f32 + store %25, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + %26 = load %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + store %26, %arg2[%arg3, %arg4] : memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>> + } + } + } + return + } + func @hello_matmul_py_0f07b3ac(%arg0: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg1: memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, %arg2: memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0, %arg1, %arg2) : (memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<256x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>, memref<128x256xf32, affine_map<(d0, d1) -> (d0 * 256 + d1)>>) -> () + return + } +} diff --git a/Tutorials/hello_matmul/mlir/hello_matmul.s b/Tutorials/hello_matmul/mlir/hello_matmul.s new file mode 100644 index 00000000..9085c7cc --- /dev/null +++ b/Tutorials/hello_matmul/mlir/hello_matmul.s @@ -0,0 +1,387 @@ + .text + .def @feat.00; + .scl 3; + .type 0; + .endef + .globl @feat.00 +.set @feat.00, 0 + .file "LLVMDialectModule" + .def hello_matmul_py_0f07b3ac_impl_16252232176815793891; + .scl 2; + .type 32; + .endef + .globl hello_matmul_py_0f07b3ac_impl_16252232176815793891 # -- Begin function hello_matmul_py_0f07b3ac_impl_16252232176815793891 + .p2align 4, 0x90 
+hello_matmul_py_0f07b3ac_impl_16252232176815793891: # @hello_matmul_py_0f07b3ac_impl_16252232176815793891 +.Lfunc_begin0: + .file 1 "D:\\win\\repos\\accera-samples\\tutorials\\hello_matmul\\_tmp\\hello_matmul\\hello_matmul_llvm.mlir" + .loc 1 6 0 # hello_matmul\hello_matmul_llvm.mlir:6:0 +# %bb.0: + pushq %rsi + pushq %rdi + pushq %rbx + movq 152(%rsp), %rax + movl $3072, %r8d # imm = 0xC00 +.Ltmp0: + .loc 1 41 5 prologue_end # hello_matmul\hello_matmul_llvm.mlir:41:5 + addq 96(%rsp), %r8 + addq $12, %rdx + xorl %r9d, %r9d + .p2align 4, 0x90 +.LBB0_1: # %.preheader1 + # =>This Loop Header: Depth=1 + # Child Loop BB0_2 Depth 2 + # Child Loop BB0_3 Depth 3 + .loc 1 0 5 is_stmt 0 # hello_matmul\hello_matmul_llvm.mlir:0:5 + movq %r9, %r10 + shlq $8, %r10 + movq %r8, %r11 + xorl %ecx, %ecx + .p2align 4, 0x90 +.LBB0_2: # %.preheader + # Parent Loop BB0_1 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB0_3 Depth 3 + leaq (%rcx,%r10), %rsi + .loc 1 83 11 is_stmt 1 # hello_matmul\hello_matmul_llvm.mlir:83:11 + vmovss (%rax,%rsi,4), %xmm0 # xmm0 = mem[0],zero,zero,zero + movq $-4, %rdi + movq %r11, %rbx + .p2align 4, 0x90 +.LBB0_3: # Parent Loop BB0_1 Depth=1 + # Parent Loop BB0_2 Depth=2 + # => This Inner Loop Header: Depth=3 + .loc 1 62 11 # hello_matmul\hello_matmul_llvm.mlir:62:11 + vmovss 4(%rdx,%rdi,4), %xmm1 # xmm1 = mem[0],zero,zero,zero + .loc 1 84 11 # hello_matmul\hello_matmul_llvm.mlir:84:11 + vfmadd132ss -3072(%rbx), %xmm0, %xmm1 # xmm1 = (xmm1 * mem) + xmm0 + .loc 1 114 5 # hello_matmul\hello_matmul_llvm.mlir:114:5 + vmovss %xmm1, (%rax,%rsi,4) + .loc 1 125 12 # hello_matmul\hello_matmul_llvm.mlir:125:12 + vmovss 8(%rdx,%rdi,4), %xmm0 # xmm0 = mem[0],zero,zero,zero + .loc 1 147 12 # hello_matmul\hello_matmul_llvm.mlir:147:12 + vfmadd132ss -2048(%rbx), %xmm1, %xmm0 # xmm0 = (xmm0 * mem) + xmm1 + .loc 1 177 5 # hello_matmul\hello_matmul_llvm.mlir:177:5 + vmovss %xmm0, (%rax,%rsi,4) + .loc 1 188 12 # hello_matmul\hello_matmul_llvm.mlir:188:12 + vmovss 12(%rdx,%rdi,4), %xmm1 # xmm1 = mem[0],zero,zero,zero + .loc 1 210 12 # hello_matmul\hello_matmul_llvm.mlir:210:12 + vfmadd132ss -1024(%rbx), %xmm0, %xmm1 # xmm1 = (xmm1 * mem) + xmm0 + .loc 1 240 5 # hello_matmul\hello_matmul_llvm.mlir:240:5 + vmovss %xmm1, (%rax,%rsi,4) + .loc 1 251 12 # hello_matmul\hello_matmul_llvm.mlir:251:12 + vmovss 16(%rdx,%rdi,4), %xmm0 # xmm0 = mem[0],zero,zero,zero + .loc 1 273 12 # hello_matmul\hello_matmul_llvm.mlir:273:12 + vfmadd132ss (%rbx), %xmm1, %xmm0 # xmm0 = (xmm0 * mem) + xmm1 + .loc 1 303 5 # hello_matmul\hello_matmul_llvm.mlir:303:5 + vmovss %xmm0, (%rax,%rsi,4) + .loc 1 50 11 # hello_matmul\hello_matmul_llvm.mlir:50:11 + addq $4, %rdi + addq $4096, %rbx # imm = 0x1000 + cmpq $252, %rdi + .loc 1 51 5 # hello_matmul\hello_matmul_llvm.mlir:51:5 + jb .LBB0_3 +# %bb.4: # in Loop: Header=BB0_2 Depth=2 + .loc 1 307 12 # hello_matmul\hello_matmul_llvm.mlir:307:12 + incq %rcx + .loc 1 46 5 # hello_matmul\hello_matmul_llvm.mlir:46:5 + addq $4, %r11 + .loc 1 45 11 # hello_matmul\hello_matmul_llvm.mlir:45:11 + cmpq $256, %rcx # imm = 0x100 + .loc 1 46 5 # hello_matmul\hello_matmul_llvm.mlir:46:5 + jne .LBB0_2 +# %bb.5: # in Loop: Header=BB0_1 Depth=1 + .loc 1 310 12 # hello_matmul\hello_matmul_llvm.mlir:310:12 + incq %r9 + .loc 1 41 5 # hello_matmul\hello_matmul_llvm.mlir:41:5 + addq $1024, %rdx # imm = 0x400 + .loc 1 40 11 # hello_matmul\hello_matmul_llvm.mlir:40:11 + cmpq $128, %r9 + .loc 1 41 5 # hello_matmul\hello_matmul_llvm.mlir:41:5 + jne .LBB0_1 +# %bb.6: + .loc 1 313 5 # 
hello_matmul\hello_matmul_llvm.mlir:313:5 + popq %rbx + popq %rdi + popq %rsi + retq +.Ltmp1: +.Lfunc_end0: + # -- End function + .def hello_matmul_py_0f07b3ac; + .scl 2; + .type 32; + .endef + .globl hello_matmul_py_0f07b3ac # -- Begin function hello_matmul_py_0f07b3ac + .p2align 4, 0x90 +hello_matmul_py_0f07b3ac: # @hello_matmul_py_0f07b3ac +.Lfunc_begin1: + .loc 1 315 0 # hello_matmul\hello_matmul_llvm.mlir:315:0 +# %bb.0: + pushq %rsi + pushq %rdi + pushq %rbx + .loc 1 41 5 prologue_end # hello_matmul\hello_matmul_llvm.mlir:41:5 + addq $3072, %rdx # imm = 0xC00 + addq $12, %rcx + xorl %r9d, %r9d + .p2align 4, 0x90 +.LBB1_1: # %.preheader1.i + # =>This Loop Header: Depth=1 + # Child Loop BB1_2 Depth 2 + # Child Loop BB1_3 Depth 3 + .loc 1 0 5 is_stmt 0 # hello_matmul\hello_matmul_llvm.mlir:0:5 + movq %r9, %r10 + shlq $8, %r10 + movq %rdx, %r11 + xorl %eax, %eax + .p2align 4, 0x90 +.LBB1_2: # %.preheader.i + # Parent Loop BB1_1 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB1_3 Depth 3 + leaq (%rax,%r10), %rsi + .loc 1 83 11 is_stmt 1 # hello_matmul\hello_matmul_llvm.mlir:83:11 + vmovss (%r8,%rsi,4), %xmm0 # xmm0 = mem[0],zero,zero,zero + movq $-4, %rdi + movq %r11, %rbx + .p2align 4, 0x90 +.LBB1_3: # Parent Loop BB1_1 Depth=1 + # Parent Loop BB1_2 Depth=2 + # => This Inner Loop Header: Depth=3 + .loc 1 62 11 # hello_matmul\hello_matmul_llvm.mlir:62:11 + vmovss 4(%rcx,%rdi,4), %xmm1 # xmm1 = mem[0],zero,zero,zero + .loc 1 84 11 # hello_matmul\hello_matmul_llvm.mlir:84:11 + vfmadd132ss -3072(%rbx), %xmm0, %xmm1 # xmm1 = (xmm1 * mem) + xmm0 + .loc 1 114 5 # hello_matmul\hello_matmul_llvm.mlir:114:5 + vmovss %xmm1, (%r8,%rsi,4) + .loc 1 125 12 # hello_matmul\hello_matmul_llvm.mlir:125:12 + vmovss 8(%rcx,%rdi,4), %xmm0 # xmm0 = mem[0],zero,zero,zero + .loc 1 147 12 # hello_matmul\hello_matmul_llvm.mlir:147:12 + vfmadd132ss -2048(%rbx), %xmm1, %xmm0 # xmm0 = (xmm0 * mem) + xmm1 + .loc 1 177 5 # hello_matmul\hello_matmul_llvm.mlir:177:5 + vmovss %xmm0, (%r8,%rsi,4) + .loc 1 188 12 # hello_matmul\hello_matmul_llvm.mlir:188:12 + vmovss 12(%rcx,%rdi,4), %xmm1 # xmm1 = mem[0],zero,zero,zero + .loc 1 210 12 # hello_matmul\hello_matmul_llvm.mlir:210:12 + vfmadd132ss -1024(%rbx), %xmm0, %xmm1 # xmm1 = (xmm1 * mem) + xmm0 + .loc 1 240 5 # hello_matmul\hello_matmul_llvm.mlir:240:5 + vmovss %xmm1, (%r8,%rsi,4) + .loc 1 251 12 # hello_matmul\hello_matmul_llvm.mlir:251:12 + vmovss 16(%rcx,%rdi,4), %xmm0 # xmm0 = mem[0],zero,zero,zero + .loc 1 273 12 # hello_matmul\hello_matmul_llvm.mlir:273:12 + vfmadd132ss (%rbx), %xmm1, %xmm0 # xmm0 = (xmm0 * mem) + xmm1 + .loc 1 303 5 # hello_matmul\hello_matmul_llvm.mlir:303:5 + vmovss %xmm0, (%r8,%rsi,4) + .loc 1 50 11 # hello_matmul\hello_matmul_llvm.mlir:50:11 + addq $4096, %rbx # imm = 0x1000 + addq $4, %rdi + cmpq $252, %rdi + .loc 1 51 5 # hello_matmul\hello_matmul_llvm.mlir:51:5 + jb .LBB1_3 +# %bb.4: # in Loop: Header=BB1_2 Depth=2 + .loc 1 307 12 # hello_matmul\hello_matmul_llvm.mlir:307:12 + incq %rax + .loc 1 46 5 # hello_matmul\hello_matmul_llvm.mlir:46:5 + addq $4, %r11 + .loc 1 45 11 # hello_matmul\hello_matmul_llvm.mlir:45:11 + cmpq $256, %rax # imm = 0x100 + .loc 1 46 5 # hello_matmul\hello_matmul_llvm.mlir:46:5 + jne .LBB1_2 +# %bb.5: # in Loop: Header=BB1_1 Depth=1 + .loc 1 310 12 # hello_matmul\hello_matmul_llvm.mlir:310:12 + incq %r9 + .loc 1 41 5 # hello_matmul\hello_matmul_llvm.mlir:41:5 + addq $1024, %rcx # imm = 0x400 + .loc 1 40 11 # hello_matmul\hello_matmul_llvm.mlir:40:11 + cmpq $128, %r9 + .loc 1 41 5 # 
hello_matmul\hello_matmul_llvm.mlir:41:5 + jne .LBB1_1 +.Ltmp2: +# %bb.6: # %hello_matmul_py_0f07b3ac_impl_16252232176815793891.exit + .loc 1 377 5 # hello_matmul\hello_matmul_llvm.mlir:377:5 + popq %rbx + popq %rdi + popq %rsi + retq +.Ltmp3: +.Lfunc_end1: + # -- End function + .section .debug_abbrev,"dr" +.Lsection_abbrev: + .byte 1 # Abbreviation Code + .byte 17 # DW_TAG_compile_unit + .byte 1 # DW_CHILDREN_yes + .byte 37 # DW_AT_producer + .byte 14 # DW_FORM_strp + .byte 19 # DW_AT_language + .byte 5 # DW_FORM_data2 + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 16 # DW_AT_stmt_list + .byte 23 # DW_FORM_sec_offset + .byte 27 # DW_AT_comp_dir + .byte 14 # DW_FORM_strp + .ascii "\264B" # DW_AT_GNU_pubnames + .byte 25 # DW_FORM_flag_present + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 2 # Abbreviation Code + .byte 46 # DW_TAG_subprogram + .byte 0 # DW_CHILDREN_no + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 64 # DW_AT_frame_base + .byte 24 # DW_FORM_exprloc + .byte 49 # DW_AT_abstract_origin + .byte 19 # DW_FORM_ref4 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 3 # Abbreviation Code + .byte 46 # DW_TAG_subprogram + .byte 0 # DW_CHILDREN_no + .byte 110 # DW_AT_linkage_name + .byte 14 # DW_FORM_strp + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 58 # DW_AT_decl_file + .byte 11 # DW_FORM_data1 + .byte 59 # DW_AT_decl_line + .byte 11 # DW_FORM_data1 + .byte 63 # DW_AT_external + .byte 25 # DW_FORM_flag_present + .byte 32 # DW_AT_inline + .byte 11 # DW_FORM_data1 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 4 # Abbreviation Code + .byte 46 # DW_TAG_subprogram + .byte 1 # DW_CHILDREN_yes + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 64 # DW_AT_frame_base + .byte 24 # DW_FORM_exprloc + .byte 110 # DW_AT_linkage_name + .byte 14 # DW_FORM_strp + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 58 # DW_AT_decl_file + .byte 11 # DW_FORM_data1 + .byte 59 # DW_AT_decl_line + .byte 5 # DW_FORM_data2 + .byte 63 # DW_AT_external + .byte 25 # DW_FORM_flag_present + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 5 # Abbreviation Code + .byte 29 # DW_TAG_inlined_subroutine + .byte 0 # DW_CHILDREN_no + .byte 49 # DW_AT_abstract_origin + .byte 19 # DW_FORM_ref4 + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 88 # DW_AT_call_file + .byte 11 # DW_FORM_data1 + .byte 89 # DW_AT_call_line + .byte 5 # DW_FORM_data2 + .byte 87 # DW_AT_call_column + .byte 11 # DW_FORM_data1 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 0 # EOM(3) + .section .debug_info,"dr" +.Lsection_info: +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit +.Ldebug_info_start0: + .short 4 # DWARF version number + .secrel32 .Lsection_abbrev # Offset Into Abbrev. 
Section + .byte 8 # Address Size (in bytes) + .byte 1 # Abbrev [1] 0xb:0x6f DW_TAG_compile_unit + .secrel32 .Linfo_string0 # DW_AT_producer + .short 2 # DW_AT_language + .secrel32 .Linfo_string1 # DW_AT_name + .secrel32 .Lline_table_start0 # DW_AT_stmt_list + .secrel32 .Linfo_string2 # DW_AT_comp_dir + # DW_AT_GNU_pubnames + .quad .Lfunc_begin0 # DW_AT_low_pc + .long .Lfunc_end1-.Lfunc_begin0 # DW_AT_high_pc + .byte 2 # Abbrev [2] 0x2a:0x13 DW_TAG_subprogram + .quad .Lfunc_begin0 # DW_AT_low_pc + .long .Lfunc_end0-.Lfunc_begin0 # DW_AT_high_pc + .byte 1 # DW_AT_frame_base + .byte 87 + .long 61 # DW_AT_abstract_origin + .byte 3 # Abbrev [3] 0x3d:0xc DW_TAG_subprogram + .secrel32 .Linfo_string3 # DW_AT_linkage_name + .secrel32 .Linfo_string3 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 6 # DW_AT_decl_line + # DW_AT_external + .byte 1 # DW_AT_inline + .byte 4 # Abbrev [4] 0x49:0x30 DW_TAG_subprogram + .quad .Lfunc_begin1 # DW_AT_low_pc + .long .Lfunc_end1-.Lfunc_begin1 # DW_AT_high_pc + .byte 1 # DW_AT_frame_base + .byte 87 + .secrel32 .Linfo_string4 # DW_AT_linkage_name + .secrel32 .Linfo_string4 # DW_AT_name + .byte 1 # DW_AT_decl_file + .short 315 # DW_AT_decl_line + # DW_AT_external + .byte 5 # Abbrev [5] 0x63:0x15 DW_TAG_inlined_subroutine + .long 61 # DW_AT_abstract_origin + .quad .Lfunc_begin1 # DW_AT_low_pc + .long .Ltmp2-.Lfunc_begin1 # DW_AT_high_pc + .byte 1 # DW_AT_call_file + .short 376 # DW_AT_call_line + .byte 5 # DW_AT_call_column + .byte 0 # End Of Children Mark + .byte 0 # End Of Children Mark +.Ldebug_info_end0: + .section .debug_str,"dr" +.Linfo_string: +.Linfo_string0: + .asciz "mlir" # string offset=0 +.Linfo_string1: + .asciz "LLVMDialectModule" # string offset=5 +.Linfo_string2: + .asciz "/" # string offset=23 +.Linfo_string3: + .asciz "hello_matmul_py_0f07b3ac_impl_16252232176815793891" # string offset=25 +.Linfo_string4: + .asciz "hello_matmul_py_0f07b3ac" # string offset=76 + .section .debug_pubnames,"dr" + .long .LpubNames_end0-.LpubNames_begin0 # Length of Public Names Info +.LpubNames_begin0: + .short 2 # DWARF Version + .secrel32 .Lcu_begin0 # Offset of Compilation Unit Info + .long 122 # Compilation Unit Length + .long 73 # DIE offset + .asciz "hello_matmul_py_0f07b3ac" # External Name + .long 61 # DIE offset + .asciz "hello_matmul_py_0f07b3ac_impl_16252232176815793891" # External Name + .long 0 # End Mark +.LpubNames_end0: + .section .debug_pubtypes,"dr" + .long .LpubTypes_end0-.LpubTypes_begin0 # Length of Public Types Info +.LpubTypes_begin0: + .short 2 # DWARF Version + .secrel32 .Lcu_begin0 # Offset of Compilation Unit Info + .long 122 # Compilation Unit Length + .long 0 # End Mark +.LpubTypes_end0: + .globl _fltused + .section .debug_line,"dr" +.Lsection_line: +.Lline_table_start0: diff --git a/Tutorials/hello_matmul/mlir/hello_matmul_llvm.mlir b/Tutorials/hello_matmul/mlir/hello_matmul_llvm.mlir new file mode 100644 index 00000000..b4941f5d --- /dev/null +++ b/Tutorials/hello_matmul/mlir/hello_matmul_llvm.mlir @@ -0,0 +1,379 @@ + + + + +module @hello_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.func @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, 
%arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(128 : index) : !llvm.i64 + %25 = llvm.mlir.constant(0 : index) : !llvm.i64 + %26 = llvm.mlir.constant(256 : index) : !llvm.i64 + %27 = llvm.mlir.constant(4 : index) : !llvm.i64 + %28 = llvm.mlir.constant(1 : index) : !llvm.i64 + %29 = llvm.mlir.constant(2 : index) : !llvm.i64 + %30 = llvm.mlir.constant(3 : index) : !llvm.i64 + llvm.br ^bb1(%25 : !llvm.i64) + ^bb1(%31: !llvm.i64): // 2 preds: ^bb0, ^bb8 + %32 = llvm.icmp "slt" %31, %24 : !llvm.i64 + llvm.cond_br %32, ^bb2, ^bb9 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%25 : !llvm.i64) + ^bb3(%33: !llvm.i64): // 2 preds: ^bb2, ^bb7 + %34 = llvm.icmp "slt" %33, %26 : !llvm.i64 + llvm.cond_br %34, ^bb4, ^bb8 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%25 : !llvm.i64) + ^bb5(%35: !llvm.i64): // 2 preds: ^bb4, ^bb6 + %36 = llvm.icmp "slt" %35, %26 : !llvm.i64 + llvm.cond_br %36, ^bb6, ^bb7 + ^bb6: // pred: ^bb5 + %37 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x 
i64>)> + %38 = llvm.mlir.constant(0 : index) : !llvm.i64 + %39 = llvm.mlir.constant(256 : index) : !llvm.i64 + %40 = llvm.mul %31, %39 : !llvm.i64 + %41 = llvm.add %38, %40 : !llvm.i64 + %42 = llvm.mlir.constant(1 : index) : !llvm.i64 + %43 = llvm.mul %35, %42 : !llvm.i64 + %44 = llvm.add %41, %43 : !llvm.i64 + %45 = llvm.getelementptr %37[%44] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %46 = llvm.load %45 : !llvm.ptr + %47 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.mlir.constant(0 : index) : !llvm.i64 + %49 = llvm.mlir.constant(256 : index) : !llvm.i64 + %50 = llvm.mul %35, %49 : !llvm.i64 + %51 = llvm.add %48, %50 : !llvm.i64 + %52 = llvm.mlir.constant(1 : index) : !llvm.i64 + %53 = llvm.mul %33, %52 : !llvm.i64 + %54 = llvm.add %51, %53 : !llvm.i64 + %55 = llvm.getelementptr %47[%54] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %56 = llvm.load %55 : !llvm.ptr + %57 = llvm.fmul %46, %56 {RelaxedPrecision} : !llvm.float + %58 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.mlir.constant(0 : index) : !llvm.i64 + %60 = llvm.mlir.constant(256 : index) : !llvm.i64 + %61 = llvm.mul %31, %60 : !llvm.i64 + %62 = llvm.add %59, %61 : !llvm.i64 + %63 = llvm.mlir.constant(1 : index) : !llvm.i64 + %64 = llvm.mul %33, %63 : !llvm.i64 + %65 = llvm.add %62, %64 : !llvm.i64 + %66 = llvm.getelementptr %58[%65] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %67 = llvm.load %66 : !llvm.ptr + %68 = llvm.fadd %67, %57 {RelaxedPrecision} : !llvm.float + %69 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %70 = llvm.mlir.constant(0 : index) : !llvm.i64 + %71 = llvm.mlir.constant(256 : index) : !llvm.i64 + %72 = llvm.mul %31, %71 : !llvm.i64 + %73 = llvm.add %70, %72 : !llvm.i64 + %74 = llvm.mlir.constant(1 : index) : !llvm.i64 + %75 = llvm.mul %33, %74 : !llvm.i64 + %76 = llvm.add %73, %75 : !llvm.i64 + %77 = llvm.getelementptr %69[%76] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %68, %77 : !llvm.ptr + %78 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %79 = llvm.mlir.constant(0 : index) : !llvm.i64 + %80 = llvm.mlir.constant(256 : index) : !llvm.i64 + %81 = llvm.mul %31, %80 : !llvm.i64 + %82 = llvm.add %79, %81 : !llvm.i64 + %83 = llvm.mlir.constant(1 : index) : !llvm.i64 + %84 = llvm.mul %33, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.getelementptr %78[%85] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %87 = llvm.load %86 : !llvm.ptr + %88 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.mlir.constant(0 : index) : !llvm.i64 + %90 = llvm.mlir.constant(256 : index) : !llvm.i64 + %91 = llvm.mul %31, %90 : !llvm.i64 + %92 = llvm.add %89, %91 : !llvm.i64 + %93 = llvm.mlir.constant(1 : index) : !llvm.i64 + %94 = llvm.mul %33, %93 : !llvm.i64 + %95 = llvm.add %92, %94 : !llvm.i64 + %96 = llvm.getelementptr %88[%95] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %87, %96 : !llvm.ptr + %97 = llvm.add %35, %28 : !llvm.i64 + %98 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %99 = llvm.mlir.constant(0 : index) : !llvm.i64 + %100 = llvm.mlir.constant(256 : index) : !llvm.i64 + %101 = llvm.mul %31, %100 : !llvm.i64 + %102 = llvm.add %99, %101 : !llvm.i64 + %103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %104 = llvm.mul %97, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = 
llvm.getelementptr %98[%105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %107 = llvm.load %106 : !llvm.ptr + %108 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %110 = llvm.mlir.constant(256 : index) : !llvm.i64 + %111 = llvm.mul %97, %110 : !llvm.i64 + %112 = llvm.add %109, %111 : !llvm.i64 + %113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %114 = llvm.mul %33, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.getelementptr %108[%115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %117 = llvm.load %116 : !llvm.ptr + %118 = llvm.fmul %107, %117 {RelaxedPrecision} : !llvm.float + %119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %121 = llvm.mlir.constant(256 : index) : !llvm.i64 + %122 = llvm.mul %31, %121 : !llvm.i64 + %123 = llvm.add %120, %122 : !llvm.i64 + %124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %125 = llvm.mul %33, %124 : !llvm.i64 + %126 = llvm.add %123, %125 : !llvm.i64 + %127 = llvm.getelementptr %119[%126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %128 = llvm.load %127 : !llvm.ptr + %129 = llvm.fadd %128, %118 {RelaxedPrecision} : !llvm.float + %130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %132 = llvm.mlir.constant(256 : index) : !llvm.i64 + %133 = llvm.mul %31, %132 : !llvm.i64 + %134 = llvm.add %131, %133 : !llvm.i64 + %135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %136 = llvm.mul %33, %135 : !llvm.i64 + %137 = llvm.add %134, %136 : !llvm.i64 + %138 = llvm.getelementptr %130[%137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %129, %138 : !llvm.ptr + %139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.mul %31, %141 : !llvm.i64 + %143 = llvm.add %140, %142 : !llvm.i64 + %144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %145 = llvm.mul %33, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.getelementptr %139[%146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %148 = llvm.load %147 : !llvm.ptr + %149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %151 = llvm.mlir.constant(256 : index) : !llvm.i64 + %152 = llvm.mul %31, %151 : !llvm.i64 + %153 = llvm.add %150, %152 : !llvm.i64 + %154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %155 = llvm.mul %33, %154 : !llvm.i64 + %156 = llvm.add %153, %155 : !llvm.i64 + %157 = llvm.getelementptr %149[%156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %148, %157 : !llvm.ptr + %158 = llvm.add %35, %29 : !llvm.i64 + %159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %161 = llvm.mlir.constant(256 : index) : !llvm.i64 + %162 = llvm.mul %31, %161 : !llvm.i64 + %163 = llvm.add %160, %162 : !llvm.i64 + %164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %165 = llvm.mul %158, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.getelementptr %159[%166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %168 = llvm.load %167 : !llvm.ptr + %169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %170 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(256 : index) : !llvm.i64 + %172 = llvm.mul %158, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %33, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %178 = llvm.load %177 : !llvm.ptr + %179 = llvm.fmul %168, %178 {RelaxedPrecision} : !llvm.float + %180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %182 = llvm.mlir.constant(256 : index) : !llvm.i64 + %183 = llvm.mul %31, %182 : !llvm.i64 + %184 = llvm.add %181, %183 : !llvm.i64 + %185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %186 = llvm.mul %33, %185 : !llvm.i64 + %187 = llvm.add %184, %186 : !llvm.i64 + %188 = llvm.getelementptr %180[%187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %189 = llvm.load %188 : !llvm.ptr + %190 = llvm.fadd %189, %179 {RelaxedPrecision} : !llvm.float + %191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %193 = llvm.mlir.constant(256 : index) : !llvm.i64 + %194 = llvm.mul %31, %193 : !llvm.i64 + %195 = llvm.add %192, %194 : !llvm.i64 + %196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %197 = llvm.mul %33, %196 : !llvm.i64 + %198 = llvm.add %195, %197 : !llvm.i64 + %199 = llvm.getelementptr %191[%198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %190, %199 : !llvm.ptr + %200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(256 : index) : !llvm.i64 + %203 = llvm.mul %31, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %33, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.load %208 : !llvm.ptr + %210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %212 = llvm.mlir.constant(256 : index) : !llvm.i64 + %213 = llvm.mul %31, %212 : !llvm.i64 + %214 = llvm.add %211, %213 : !llvm.i64 + %215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %216 = llvm.mul %33, %215 : !llvm.i64 + %217 = llvm.add %214, %216 : !llvm.i64 + %218 = llvm.getelementptr %210[%217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %209, %218 : !llvm.ptr + %219 = llvm.add %35, %30 : !llvm.i64 + %220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %222 = llvm.mlir.constant(256 : index) : !llvm.i64 + %223 = llvm.mul %31, %222 : !llvm.i64 + %224 = llvm.add %221, %223 : !llvm.i64 + %225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %226 = llvm.mul %219, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.getelementptr %220[%227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %229 = llvm.load %228 : !llvm.ptr + %230 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %232 = llvm.mlir.constant(256 : index) : !llvm.i64 + %233 = llvm.mul %219, %232 : !llvm.i64 + %234 = llvm.add %231, %233 : !llvm.i64 + %235 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %236 = llvm.mul %33, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.getelementptr %230[%237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %239 = llvm.load %238 : !llvm.ptr + %240 = llvm.fmul %229, %239 {RelaxedPrecision} : !llvm.float + %241 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %242 = llvm.mlir.constant(0 : index) : !llvm.i64 + %243 = llvm.mlir.constant(256 : index) : !llvm.i64 + %244 = llvm.mul %31, %243 : !llvm.i64 + %245 = llvm.add %242, %244 : !llvm.i64 + %246 = llvm.mlir.constant(1 : index) : !llvm.i64 + %247 = llvm.mul %33, %246 : !llvm.i64 + %248 = llvm.add %245, %247 : !llvm.i64 + %249 = llvm.getelementptr %241[%248] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %250 = llvm.load %249 : !llvm.ptr + %251 = llvm.fadd %250, %240 {RelaxedPrecision} : !llvm.float + %252 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %253 = llvm.mlir.constant(0 : index) : !llvm.i64 + %254 = llvm.mlir.constant(256 : index) : !llvm.i64 + %255 = llvm.mul %31, %254 : !llvm.i64 + %256 = llvm.add %253, %255 : !llvm.i64 + %257 = llvm.mlir.constant(1 : index) : !llvm.i64 + %258 = llvm.mul %33, %257 : !llvm.i64 + %259 = llvm.add %256, %258 : !llvm.i64 + %260 = llvm.getelementptr %252[%259] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %251, %260 : !llvm.ptr + %261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %263 = llvm.mlir.constant(256 : index) : !llvm.i64 + %264 = llvm.mul %31, %263 : !llvm.i64 + %265 = llvm.add %262, %264 : !llvm.i64 + %266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %267 = llvm.mul %33, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.getelementptr %261[%268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %270 = llvm.load %269 : !llvm.ptr + %271 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %272 = llvm.mlir.constant(0 : index) : !llvm.i64 + %273 = llvm.mlir.constant(256 : index) : !llvm.i64 + %274 = llvm.mul %31, %273 : !llvm.i64 + %275 = llvm.add %272, %274 : !llvm.i64 + %276 = llvm.mlir.constant(1 : index) : !llvm.i64 + %277 = llvm.mul %33, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.getelementptr %271[%278] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %270, %279 : !llvm.ptr + %280 = llvm.add %35, %27 : !llvm.i64 + llvm.br ^bb5(%280 : !llvm.i64) + ^bb7: // pred: ^bb5 + %281 = llvm.add %33, %28 : !llvm.i64 + llvm.br ^bb3(%281 : !llvm.i64) + ^bb8: // pred: ^bb3 + %282 = llvm.add %31, %28 : !llvm.i64 + llvm.br ^bb1(%282 : !llvm.i64) + ^bb9: // pred: ^bb1 + llvm.return + } + llvm.func @hello_matmul_py_0f07b3ac(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "hello_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = 
llvm.mlir.constant(256 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(256 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(256 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(256 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(256 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(256 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(256 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, 
array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @hello_matmul_py_0f07b3ac_impl_16252232176815793891(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/hello_matmul_gpu/hello_matmul_gpu_generator.py b/Tutorials/hello_matmul_gpu/hello_matmul_gpu_generator.py new file mode 100644 index 00000000..02d0e83f --- /dev/null +++ b/Tutorials/hello_matmul_gpu/hello_matmul_gpu_generator.py @@ -0,0 +1,59 @@ +#!/usr/bin/env python3 +# Accera Hello MatMul GPU sample: generator +import accera as acc + +# Define our matrix sizes +M = 1024 +N = 512 +K = 256 + +# Define the arguments we want to take for the MatMul function +A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K)) +B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N)) +C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N)) + +# Define the loop nest +nest = acc.Nest(shape=(M, N, K)) + +# Get the loop nest indices +i, j, k = nest.get_indices() + +# Define the loop nest logic +@nest.iteration_logic +def _(): + C[i, j] += A[i, k] * B[k, j] + +# Create the schedule from the nest +schedule = nest.create_schedule() + +# Define constants +block_x = 16 +block_y = 16 +block_z = 1 + +# Transform the schedule +ii = schedule.split(i, block_x) +jj = schedule.split(j, block_y) + +# Set the dimension order +schedule.reorder(i, j, ii, jj, k) + +# Create the GPU plan +target = acc.Target(category=acc.Target.Category.GPU, runtime=acc.Target.Runtime.VULKAN) +plan = schedule.create_plan(target) + +# Bind dimensions to a grid of execution units +plan.bind({ + i: target.GridUnit.BLOCK_X, + j: target.GridUnit.BLOCK_Y, + ii: target.GridUnit.THREAD_X, + jj: target.GridUnit.THREAD_Y +}) + +# Create a package and add a function to the package based on the plan +package = acc.Package() +package.add(plan, args=(A, B, C), base_name="hello_matmul_gpu") + +# Build a statically-linked HAT package to be consumed by the C++ runner +# Change format=acc.Package.Format.HAT_STATIC to format=acc.Package.Format.MLIR_STATIC to also generate MLIR to _tmp/hello_matmul_gpu +package.build("hello_matmul_gpu", 
format=acc.Package.Format.HAT_STATIC) diff --git a/Tutorials/hello_matmul_gpu/hello_matmul_gpu_runner.cpp b/Tutorials/hello_matmul_gpu/hello_matmul_gpu_runner.cpp new file mode 100644 index 00000000..bf6862d8 --- /dev/null +++ b/Tutorials/hello_matmul_gpu/hello_matmul_gpu_runner.cpp @@ -0,0 +1,46 @@
+#include <stdio.h>
+#include <algorithm>
+
+// Include the HAT file that declares GPU initialization/uninitialization functions
+#include "AcceraGPUUtilities.hat"
+
+// Include the HAT file that declares our MatMul function
+#include "hello_matmul_gpu.hat"
+
+#define M 1024
+#define N 512
+#define K 256
+
+int main(int argc, const char** argv)
+{
+    // Prepare our matrices (using the heap for large matrices)
+    float* A = new float[M*K];
+    float* B = new float[K*N];
+    float* C = new float[M*N];
+
+    // Fill with data
+    std::fill_n(A, M*K, 2.0f);
+    std::fill_n(B, K*N, 3.0f);
+    std::fill_n(C, M*N, 0.42f);
+
+    // Initialize the GPU
+    AcceraGPUInitialize();
+
+    printf("Calling MatMul M=%d, K=%d, N=%d\n", M, K, N);
+    hello_matmul_gpu(A, B, C);
+
+    printf("Result (first 10 elements): ");
+    for (int i = 0; i < 10; ++i)
+    {
+        printf("%f ", C[i]);
+    }
+    printf("\n");
+
+    // Uninitialize the GPU
+    AcceraGPUDeInitialize();
+
+    delete[] A;
+    delete[] B;
+    delete[] C;
+    return 0;
+}
diff --git a/Tutorials/index.html b/Tutorials/index.html new file mode 100644 index 00000000..ed092f1a --- /dev/null +++ b/Tutorials/index.html @@ -0,0 +1,2205 @@
+ Index - Accera

Accera Tutorials

| Tutorial | Description |
| -------- | ----------- |
| Hello Matrix Multiplication | Start here if you are completely new to Accera and would like to learn more about the workflow |
| Optimized Matrix Multiplication | Once you understand the basics, we'll look at how to optimize matrix multiplication for a specific hardware target |
| Hello Matrix Multiplication on GPU | We'll look at how to apply the basic concepts for GPU targets |
| Cross Compilation for Raspberry Pi 3 | After you know how to generate code for the host target, we'll look at how to generate code for other targets |
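
For orientation before diving into the tutorials above, the sketch below shows the minimal end-to-end Accera workflow that the Hello Matrix Multiplication tutorial covers: declare the arrays, describe the loop nest, and build a HAT package. It is a hedged illustration rather than the tutorial's exact code: the base name hello_matmul_cpu is made up here, and it assumes that Package.add accepts a bare Nest so that Accera applies its default schedule and host plan (the GPU generator in this diff shows the explicit schedule/plan version).

```python
#!/usr/bin/env python3
# Minimal CPU sketch of the Accera workflow (illustrative only; the name
# "hello_matmul_cpu" is hypothetical and not one of the tutorials above).
import accera as acc

M, N, K = 1024, 512, 256

# Declare the function arguments: two inputs and one input/output array
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))
C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))

# Describe the iteration space and the per-iteration logic
nest = acc.Nest(shape=(M, N, K))
i, j, k = nest.get_indices()

@nest.iteration_logic
def _():
    C[i, j] += A[i, k] * B[k, j]

# Package the function; adding the nest directly is assumed to use the
# default schedule and host plan
package = acc.Package()
package.add(nest, args=(A, B, C), base_name="hello_matmul_cpu")
package.build("hello_matmul_cpu", format=acc.Package.Format.HAT_STATIC)
```

The resulting HAT package can then be consumed from C or C++ in the same way the GPU runner above includes hello_matmul_gpu.hat and calls the generated function.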
+
+ + + Last update: + 2023-03-27 + + +
+ + + + + + +
+
+ + +
+ + + +
+ +
+ + +
+ +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/Tutorials/optimized_matmul/mlir/0_Initial.mlir b/Tutorials/optimized_matmul/mlir/0_Initial.mlir new file mode 100644 index 00000000..9dff55f3 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/0_Initial.mlir @@ -0,0 +1,105 @@ +#map0 = affine_map<(d0, d1) -> (d0 * 512 + d1)> +#map1 = affine_map<()[s0] -> (s0)> +#map2 = affine_map<(d0, d1) -> (d0 * 128 + d1)> + + +#domain0 = #accln<"idomain{{i,0}={0:784:1}, {j,1}={0:512:1}, {k,2}={0:128:1}}"> + +#xdomain0 = #accln<"xfdomain{dims: {{i,0}, {j,1}, {k,2}}, indices: {{{i,0} : {0:784:1} = {(d0, d1) -> (d0 + d1), {{i_o,7}, {i_i,8}}}}, {{j,1} : {0:512:1} = {(d0, d1) -> (d0 + d1), {{j_o,3}, {j_i,4}}}}, {{k,2} : {0:128:1} = {(d0, d1) -> (d0 + d1), {{k_o,5}, {k_i,6}}}}, {{j_o,3} : {0:512:256}}, {{j_i,4} : {0:256:1} = {(d0, d1) -> (d0 + d1), {{j_i_o,13}, {j_i_i,14}}}}, {{k_o,5} : {0:128:128}}, {{k_i,6} : {0:128:1} = {(d0, d1) -> (d0 + d1), {{k_i_o,9}, {k_i_i,10}}}}, {{i_o,7} : {0:784:1}}, {{i_i,8} : {0:1:1} = {(d0, d1) -> (d0 + d1), {{i_i_o,11}, {i_i_i,12}}}}, {{k_i_o,9} : {0:128:4}}, {{k_i_i,10} : {0:4:1}}, {{i_i_o,11} : {0:1:6}}, {{i_i_i,12} : {0:6:1}}, {{j_i_o,13} : {0:256:16}}, {{j_i_i,14} : {0:16:1} = {(d0, d1) -> (d0 + d1), {{j_i_i_o,15}, {j_i_i_i,16}}}}, {{j_i_i_o,15} : {0:16:8}}, {{j_i_i_i,16} : {0:8:1}}}}"> + +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + accv.func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, #map2>, %arg1: memref<128x512xf32, #map0>, %arg2: memref<784x512xf32, #map0>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + "accln.nest"() ( { + %0 = accln.sym_index {name = "i_o"} #accln<"index{i_o,7}"> loc(unknown) + %1 = accln.sym_index {name = "j_i_i_i"} #accln<"index{j_i_i_i,16}"> loc(unknown) + %2 = accln.sym_index {name = "j_i_i_o"} #accln<"index{j_i_i_o,15}"> loc(unknown) + %3 = accln.sym_index {name = "i_i_i"} #accln<"index{i_i_i,12}"> loc(unknown) + %4 = accln.sym_index {name = "k_i_i"} #accln<"index{k_i_i,10}"> loc(unknown) + %5 = accln.sym_index {name = "i_i_o"} #accln<"index{i_i_o,11}"> loc(unknown) + %6 = accln.sym_index {name = "k_i_o"} #accln<"index{k_i_o,9}"> loc(unknown) + %7 = accln.sym_index {name = "j_i_o"} #accln<"index{j_i_o,13}"> loc(unknown) + %8 = accln.sym_index {name = "i_o"} #accln<"index{i_o,7}"> loc(unknown) + %9 = accln.sym_index {name = "k_o"} #accln<"index{k_o,5}"> loc(unknown) + %10 = accln.sym_index {name = "j_o"} #accln<"index{j_o,3}"> loc(unknown) + %11 = accln.sym_index {name = "k_o"} #accln<"index{k_o,5}"> loc(unknown) + %12 = accln.sym_index {name = "j_i_i_i"} #accln<"index{j_i_i_i,16}"> loc(unknown) + %13 = accln.sym_index {name = "j_i_i_o"} #accln<"index{j_i_i_o,15}"> loc(unknown) + %14 = accln.sym_index {name = "i_i_i"} #accln<"index{i_i_i,12}"> loc(unknown) + %15 = accln.sym_index {name = "k_i_i"} #accln<"index{k_i_i,10}"> loc(unknown) + %16 = accln.sym_index {name = "i_i_o"} #accln<"index{i_i_o,11}"> loc(unknown) + %17 = accln.sym_index {name = "k_i_o"} #accln<"index{k_i_o,9}"> loc(unknown) + %18 = accln.sym_index {name = "j_i_o"} #accln<"index{j_i_o,13}"> loc(unknown) + %19 = accln.sym_index {name = "i_o"} #accln<"index{i_o,7}"> loc(unknown) + %20 = accln.sym_index {name = "k_o"} #accln<"index{k_o,5}"> loc(unknown) + %21 = accln.sym_index {name = "j_o"} #accln<"index{j_o,3}"> loc(unknown) + %22 = accln.sym_index {name = "j_i_i_i", reference = "j_i_i"} 
#accln<"index{j_i_i_i,16}"> loc(unknown) + %23 = accln.sym_index {name = "j_i_i_o", reference = "j_i_i"} #accln<"index{j_i_i_o,15}"> loc(unknown) + %24 = accln.sym_index {name = "j_i_i_i"} #accln<"index{j_i_i_i,16}"> loc(unknown) + %25 = accln.sym_index {name = "j_i_i_o"} #accln<"index{j_i_i_o,15}"> loc(unknown) + %26 = accln.sym_index {name = "j_i_i", reference = "j_i"} #accln<"index{j_i_i,14}"> loc(unknown) + %27 = accln.sym_index {name = "j_i_o", reference = "j_i"} #accln<"index{j_i_o,13}"> loc(unknown) + %28 = accln.sym_index {name = "j_i_i"} #accln<"index{j_i_i,14}"> loc(unknown) + %29 = accln.sym_index {name = "j_i_o"} #accln<"index{j_i_o,13}"> loc(unknown) + %30 = accln.sym_index {name = "i_i_i", reference = "i_i"} #accln<"index{i_i_i,12}"> loc(unknown) + %31 = accln.sym_index {name = "i_i_o", reference = "i_i"} #accln<"index{i_i_o,11}"> loc(unknown) + %32 = accln.sym_index {name = "i_i_i"} #accln<"index{i_i_i,12}"> loc(unknown) + %33 = accln.sym_index {name = "i_i_o"} #accln<"index{i_i_o,11}"> loc(unknown) + %34 = accln.sym_index {name = "k_i_i", reference = "k_i"} #accln<"index{k_i_i,10}"> loc(unknown) + %35 = accln.sym_index {name = "k_i_o", reference = "k_i"} #accln<"index{k_i_o,9}"> loc(unknown) + %36 = accln.sym_index {name = "k_i_i"} #accln<"index{k_i_i,10}"> loc(unknown) + %37 = accln.sym_index {name = "k_i_o"} #accln<"index{k_i_o,9}"> loc(unknown) + %38 = accln.sym_index {name = "i_i", reference = "i"} #accln<"index{i_i,8}"> loc(unknown) + %39 = accln.sym_index {name = "i_o", reference = "i"} #accln<"index{i_o,7}"> loc(unknown) + %40 = accln.sym_index {name = "i_i"} #accln<"index{i_i,8}"> loc(unknown) + %41 = accln.sym_index {name = "i_o"} #accln<"index{i_o,7}"> loc(unknown) + %42 = accln.sym_index {name = "k_i", reference = "k"} #accln<"index{k_i,6}"> loc(unknown) + %43 = accln.sym_index {name = "k_o", reference = "k"} #accln<"index{k_o,5}"> loc(unknown) + %44 = accln.sym_index {name = "k_i"} #accln<"index{k_i,6}"> loc(unknown) + %45 = accln.sym_index {name = "k_o"} #accln<"index{k_o,5}"> loc(unknown) + %46 = accln.sym_index {name = "j_i", reference = "j"} #accln<"index{j_i,4}"> loc(unknown) + %47 = accln.sym_index {name = "j_o", reference = "j"} #accln<"index{j_o,3}"> loc(unknown) + %48 = accln.sym_index {name = "j_i"} #accln<"index{j_i,4}"> loc(unknown) + %49 = accln.sym_index {name = "j_o"} #accln<"index{j_o,3}"> loc(unknown) + %50 = accln.sym_index {name = "i"} #accln<"index{i,0}"> loc(unknown) + %51 = accln.sym_index {name = "j"} #accln<"index{j,1}"> loc(unknown) + %52 = accln.sym_index {name = "k"} #accln<"index{k,2}"> loc(unknown) + "accln.kernel"() ( { + %58 = "accv.slice"(%arg2, %50, %51) {sliceDimensions = [0, 1]} : (memref<784x512xf32, #map0>, index, index) -> memref loc(unknown) + %59 = "accv.slice"(%arg0, %50, %52) {sliceDimensions = [0, 1]} : (memref<784x128xf32, #map2>, index, index) -> memref loc(unknown) + %60 = "accv.slice"(%arg1, %52, %51) {sliceDimensions = [0, 1]} : (memref<128x512xf32, #map0>, index, index) -> memref loc(unknown) + %61 = "accv.get_element"(%59) : (memref) -> f32 loc(unknown) + %62 = "accv.get_element"(%60) : (memref) -> f32 loc(unknown) + %63 = "accv.bin_op"(%61, %62) {predicate = 2 : i64} : (f32, f32) -> f32 loc(unknown) + %64 = "accv.get_element"(%58) : (memref) -> f32 loc(unknown) + %65 = "accv.bin_op"(%64, %63) {predicate = 0 : i64} : (f32, f32) -> f32 loc(unknown) + "accv.copy"(%65, %58) : (f32, memref) -> () loc(unknown) + %66 = "accv.slice"(%arg2, %50, %51) {sliceDimensions = [0, 1]} : (memref<784x512xf32, #map0>, index, index) 
-> memref loc(unknown) + %67 = "accv.get_element"(%58) : (memref) -> f32 loc(unknown) + "accv.copy"(%67, %66) : (f32, memref) -> () loc(unknown) + accln.terminator loc(unknown) + }) {sym_name = "_"} : () -> () loc(unknown) + %53 = "accln.null_pred"() : () -> i1 loc(unknown) + "accln.scheduled_kernel"(%53) {kernel = @_, sym_name = "scheduled__"} : (i1) -> () loc(unknown) + %54 = "accxp.make_cache"() {memorySpace = 0 : i64} : () -> memref<16x128x16xf32, 3> loc(unknown) + %55 = "accxp.cache_region"(%arg1, %54, %21, %20) ( { + accxp.cache_region_terminator loc(unknown) + }) {cacheAccessIndexing = 0 : i64, cacheAccessMaps = {globalInputToLogicalCache = affine_map<(d0, d1, d2, d3) -> (d2 - d1, d3 - d0)>, globalInputToPhysicalCache = affine_map<(d0, d1, d2, d3) -> (((d3 - d0) floordiv 16) mod 16, (d2 - d1) mod 128, (d3 - d0) mod 16)>, logicalCacheToGlobalInput = affine_map<(d0, d1, d2, d3) -> (d2 + d1, d3 + d0)>, logicalCacheToPhysicalCache = affine_map<(d0, d1) -> ((d1 floordiv 16) mod 16, d0 mod 128, d1 mod 16)>}, cacheDimGlobalIndices = [#accln<"index{k,2}">, #accln<"index{j,1}">], cacheGlobalDimensionSizes = [128, 512], id = 0 : i64, injectionIndex = #accln<"index{k_o,5}">, inputAccessIndexing = 0 : i64, inputAccessMaps = {globalInputToPhysicalCache = affine_map<(d0, d1) -> (d0, d1)>}} : (memref<128x512xf32, #map0>, memref<16x128x16xf32, 3>, index, index) -> index loc(unknown) + %56 = "accxp.make_cache"() {memorySpace = 0 : i64} : () -> memref<16x6x16xf32, 3> loc(unknown) + %57 = "accxp.cache_region"(%arg2, %56, %10, %9, %8) ( { + accxp.cache_region_terminator loc(unknown) + }) {cacheAccessIndexing = 0 : i64, cacheAccessMaps = {globalInputToLogicalCache = affine_map<(d0, d1, d2, d3, d4) -> (d3 - d2, d4 - d0)>, globalInputToPhysicalCache = affine_map<(d0, d1, d2, d3, d4) -> (((d4 - d0) floordiv 16) mod 16, (d3 - d2) mod 6, (d4 - d0) mod 16)>, logicalCacheToGlobalInput = affine_map<(d0, d1, d2, d3, d4) -> (d3 + d2, d4 + d0)>, logicalCacheToPhysicalCache = affine_map<(d0, d1) -> ((d1 floordiv 16) mod 16, d0 mod 6, d1 mod 16)>}, cacheDimGlobalIndices = [#accln<"index{i,0}">, #accln<"index{j,1}">], cacheGlobalDimensionSizes = [784, 512], id = 1 : i64, injectionIndex = #accln<"index{i_o,7}">, inputAccessIndexing = 0 : i64, inputAccessMaps = {globalInputToPhysicalCache = affine_map<(d0, d1) -> (d0, d1)>}} : (memref<784x512xf32, #map0>, memref<16x6x16xf32, 3>, index, index, index) -> index loc(unknown) + "accln.schedule"(%55, %57) ( { + "accln.exec_plan"() {exec_target = 0 : i64} : () -> () loc(unknown) + accln.terminator loc(unknown) + }) {domain = #xdomain0, kernels = [@scheduled__], loopattrs = [{accxp_vectorizationInfo = #accxp<"vectorizationinfo{8,16,1}">, scheduledIndex = #accln<"index{j_i_i_i,16}">}], order = [#accln<"index{j_o,3}">, #accln<"index{k_o,5}">, #accln<"index{i_o,7}">, #accln<"index{j_i_o,13}">, #accln<"index{k_i_o,9}">, #accln<"index{i_i_o,11}">, #accln<"index{k_i_i,10}">, #accln<"index{i_i_i,12}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], parallel = [], unroll_and_jammed = {}, unrolled = [15 : index, 11 : index]} : (index, index) -> () loc(unknown) + accln.terminator loc(unknown) + }) {domain = #domain0, exec_target = 0 : i64, kernels = []} : () -> () loc(unknown) + accv.return loc(unknown) + } loc(unknown) + accv.func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, #map2>, %arg1: memref<128x512xf32, #map0>, %arg2: memref<784x512xf32, #map0>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, 
accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, #map2>, memref<128x512xf32, #map0>, memref<784x512xf32, #map0>) -> () loc(unknown) + accv.return loc(unknown) + } loc(unknown) + } loc(unknown) +} loc(unknown) diff --git a/Tutorials/optimized_matmul/mlir/10_CSE.mlir b/Tutorials/optimized_matmul/mlir/10_CSE.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/10_CSE.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> 
+ %56 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 {RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, 
affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %224 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf %229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 
= addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %278 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, 
%arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 {RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addf %311, %310 {RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} 
: f32 + store %354, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} : f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf %401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %419 = mulf %417, %418 {RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = mulf %471, %472 {RelaxedPrecision} : f32 + %474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} : f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> 
+ %483 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store %499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 
{RelaxedPrecision} : f32 + %517 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 {RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 {RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 
* 512 + d1)>> + %549 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + %571 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 {RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 
+ d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 {RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, 
%1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf %170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load 
%arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%220 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load 
%arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%316 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf %362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load 
%arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/10_Canonicalizer.mlir b/Tutorials/optimized_matmul/mlir/10_Canonicalizer.mlir new file mode 100644 index 00000000..ff398abb --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/10_Canonicalizer.mlir @@ -0,0 +1,11596 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + 
"accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %arg3, %arg5 : index + %7 = addi %6, %c8 : index + %8 = vector.transfer_read %arg1[%arg4, %7], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %8, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %9 = addi %arg3, %arg5 : index + %10 = addi %9, %c16 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %12 = addi %arg3, %arg5 : index + %13 = addi %12, %c24 : index + %14 = vector.transfer_read %arg1[%arg4, %13], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %15 = addi %arg3, %arg5 : index + %16 = addi %15, %c32 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, 
vector<8xf32> + store %17, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %18 = addi %arg3, %arg5 : index + %19 = addi %18, %c40 : index + %20 = vector.transfer_read %arg1[%arg4, %19], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %20, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %21 = addi %arg3, %arg5 : index + %22 = addi %21, %c48 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %24 = addi %arg3, %arg5 : index + %25 = addi %24, %c56 : index + %26 = vector.transfer_read %arg1[%arg4, %25], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %27 = addi %arg3, %arg5 : index + %28 = addi %27, %c64 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %30 = addi %arg3, %arg5 : index + %31 = addi %30, %c72 : index + %32 = vector.transfer_read %arg1[%arg4, %31], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %32, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %33 = addi %arg3, %arg5 : index + %34 = addi %33, %c80 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %36 = addi %arg3, %arg5 : index + %37 = addi %36, %c88 : index + %38 = vector.transfer_read %arg1[%arg4, %37], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %39 = addi %arg3, %arg5 : index + %40 = addi %39, %c96 : index + %41 = vector.transfer_read %arg1[%arg4, %40], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %41, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %42 = addi %arg3, %arg5 : index + %43 = addi %42, %c104 : index + %44 = vector.transfer_read %arg1[%arg4, %43], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %44, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %45 = addi %arg3, %arg5 : index + %46 = addi %45, %c112 : index + %47 = vector.transfer_read %arg1[%arg4, %46], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %47, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %48 = addi %arg3, %arg5 : index + %49 = addi %48, %c120 : index + %50 = vector.transfer_read %arg1[%arg4, %49], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %51 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %52 = cmpi "slt", %arg5, %c0 : index + %53 = subi %c-1, %arg5 : index + %54 = select %52, %53, %arg5 : index + %55 = divi_signed %54, %c16 : index + %56 = subi %c-1, %55 : index + %57 = select %52, %56, %55 : index + %58 = remi_signed %57, %c16 : index + %59 = cmpi "slt", %58, %c0 : index + %60 = addi %58, %c16 : index + %61 = select %59, %60, %58 : index + %62 = remi_signed %arg4, %c128 : index + %63 = cmpi "slt", %62, %c0 : index + %64 = addi 
%62, %c128 : index + %65 = select %63, %64, %62 : index + %66 = remi_signed %arg5, %c16 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = addi %66, %c16 : index + %69 = select %67, %68, %66 : index + %70 = cmpi "slt", %69, %c0 : index + %71 = subi %c-1, %69 : index + %72 = select %70, %71, %69 : index + %73 = divi_signed %72, %c8 : index + %74 = subi %c-1, %73 : index + %75 = select %70, %74, %73 : index + %76 = remi_signed %75, %c2 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = addi %76, %c2 : index + %79 = select %77, %78, %76 : index + store %51, %3[%61, %65, %79] : memref<16x128x2xvector<8xf32>> + %80 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %81 = addi %arg5, %c8 : index + %82 = cmpi "slt", %81, %c0 : index + %83 = subi %c-1, %81 : index + %84 = select %82, %83, %81 : index + %85 = divi_signed %84, %c16 : index + %86 = subi %c-1, %85 : index + %87 = select %82, %86, %85 : index + %88 = remi_signed %87, %c16 : index + %89 = cmpi "slt", %88, %c0 : index + %90 = addi %88, %c16 : index + %91 = select %89, %90, %88 : index + %92 = remi_signed %arg4, %c128 : index + %93 = cmpi "slt", %92, %c0 : index + %94 = addi %92, %c128 : index + %95 = select %93, %94, %92 : index + %96 = cmpi "slt", %arg5, %c0 : index + %97 = subi %c-1, %arg5 : index + %98 = select %96, %97, %arg5 : index + %99 = divi_signed %98, %c8 : index + %100 = subi %c-1, %99 : index + %101 = select %96, %100, %99 : index + %102 = addi %arg5, %c8 : index + %103 = cmpi "slt", %102, %c0 : index + %104 = subi %c-1, %102 : index + %105 = select %103, %104, %102 : index + %106 = divi_signed %105, %c16 : index + %107 = subi %c-1, %106 : index + %108 = select %103, %107, %106 : index + %109 = muli %108, %c-2 : index + %110 = addi %101, %109 : index + %111 = cmpi "slt", %arg5, %c0 : index + %112 = subi %c-1, %arg5 : index + %113 = select %111, %112, %arg5 : index + %114 = divi_signed %113, %c8 : index + %115 = subi %c-1, %114 : index + %116 = select %111, %115, %114 : index + %117 = addi %arg5, %c8 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c16 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c1 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = subi %c-1, %126 : index + %129 = select %127, %128, %126 : index + %130 = divi_signed %129, %c2 : index + %131 = subi %c-1, %130 : index + %132 = select %127, %131, %130 : index + %133 = muli %132, %c-2 : index + %134 = addi %110, %133 : index + %135 = addi %134, %c1 : index + store %80, %3[%91, %95, %135] : memref<16x128x2xvector<8xf32>> + %136 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %137 = cmpi "slt", %arg5, %c0 : index + %138 = subi %c-1, %arg5 : index + %139 = select %137, %138, %arg5 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = cmpi "slt", %arg5, %c0 : index + %144 = subi %c-1, %arg5 : index + %145 = select %143, %144, %arg5 : index + %146 = divi_signed %145, %c16 : index + %147 = subi %c-1, %146 : index + %148 = select %143, %147, %146 : index + %149 = addi %148, %c1 : index + %150 = cmpi "slt", %149, %c0 : index + %151 = subi %c-1, %149 : index + %152 = select %150, %151, %149 : index + %153 = divi_signed %152, %c16 : index + %154 = subi %c-1, %153 : index + %155 = select %150, %154, %153 : index + %156 = muli %155, %c-16 : index + %157 = addi %142, 
%156 : index + %158 = addi %157, %c1 : index + %159 = remi_signed %arg4, %c128 : index + %160 = cmpi "slt", %159, %c0 : index + %161 = addi %159, %c128 : index + %162 = select %160, %161, %159 : index + %163 = remi_signed %arg5, %c16 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = addi %163, %c16 : index + %166 = select %164, %165, %163 : index + %167 = cmpi "slt", %166, %c0 : index + %168 = subi %c-1, %166 : index + %169 = select %167, %168, %166 : index + %170 = divi_signed %169, %c8 : index + %171 = subi %c-1, %170 : index + %172 = select %167, %171, %170 : index + %173 = remi_signed %172, %c2 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = addi %173, %c2 : index + %176 = select %174, %175, %173 : index + store %136, %3[%158, %162, %176] : memref<16x128x2xvector<8xf32>> + %177 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %178 = addi %arg5, %c24 : index + %179 = cmpi "slt", %178, %c0 : index + %180 = subi %c-1, %178 : index + %181 = select %179, %180, %178 : index + %182 = divi_signed %181, %c16 : index + %183 = subi %c-1, %182 : index + %184 = select %179, %183, %182 : index + %185 = remi_signed %184, %c16 : index + %186 = cmpi "slt", %185, %c0 : index + %187 = addi %185, %c16 : index + %188 = select %186, %187, %185 : index + %189 = remi_signed %arg4, %c128 : index + %190 = cmpi "slt", %189, %c0 : index + %191 = addi %189, %c128 : index + %192 = select %190, %191, %189 : index + %193 = cmpi "slt", %arg5, %c0 : index + %194 = subi %c-1, %arg5 : index + %195 = select %193, %194, %arg5 : index + %196 = divi_signed %195, %c8 : index + %197 = subi %c-1, %196 : index + %198 = select %193, %197, %196 : index + %199 = addi %arg5, %c24 : index + %200 = cmpi "slt", %199, %c0 : index + %201 = subi %c-1, %199 : index + %202 = select %200, %201, %199 : index + %203 = divi_signed %202, %c16 : index + %204 = subi %c-1, %203 : index + %205 = select %200, %204, %203 : index + %206 = muli %205, %c-2 : index + %207 = addi %198, %206 : index + %208 = cmpi "slt", %arg5, %c0 : index + %209 = subi %c-1, %arg5 : index + %210 = select %208, %209, %arg5 : index + %211 = divi_signed %210, %c8 : index + %212 = subi %c-1, %211 : index + %213 = select %208, %212, %211 : index + %214 = addi %arg5, %c24 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c16 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c3 : index + %224 = cmpi "slt", %223, %c0 : index + %225 = subi %c-1, %223 : index + %226 = select %224, %225, %223 : index + %227 = divi_signed %226, %c2 : index + %228 = subi %c-1, %227 : index + %229 = select %224, %228, %227 : index + %230 = muli %229, %c-2 : index + %231 = addi %207, %230 : index + %232 = addi %231, %c3 : index + store %177, %3[%188, %192, %232] : memref<16x128x2xvector<8xf32>> + %233 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %234 = cmpi "slt", %arg5, %c0 : index + %235 = subi %c-1, %arg5 : index + %236 = select %234, %235, %arg5 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = cmpi "slt", %arg5, %c0 : index + %241 = subi %c-1, %arg5 : index + %242 = select %240, %241, %arg5 : index + %243 = divi_signed %242, %c16 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = addi %245, %c2 : index + %247 = cmpi "slt", %246, %c0 : index + 
%248 = subi %c-1, %246 : index + %249 = select %247, %248, %246 : index + %250 = divi_signed %249, %c16 : index + %251 = subi %c-1, %250 : index + %252 = select %247, %251, %250 : index + %253 = muli %252, %c-16 : index + %254 = addi %239, %253 : index + %255 = addi %254, %c2 : index + %256 = remi_signed %arg4, %c128 : index + %257 = cmpi "slt", %256, %c0 : index + %258 = addi %256, %c128 : index + %259 = select %257, %258, %256 : index + %260 = remi_signed %arg5, %c16 : index + %261 = cmpi "slt", %260, %c0 : index + %262 = addi %260, %c16 : index + %263 = select %261, %262, %260 : index + %264 = cmpi "slt", %263, %c0 : index + %265 = subi %c-1, %263 : index + %266 = select %264, %265, %263 : index + %267 = divi_signed %266, %c8 : index + %268 = subi %c-1, %267 : index + %269 = select %264, %268, %267 : index + %270 = remi_signed %269, %c2 : index + %271 = cmpi "slt", %270, %c0 : index + %272 = addi %270, %c2 : index + %273 = select %271, %272, %270 : index + store %233, %3[%255, %259, %273] : memref<16x128x2xvector<8xf32>> + %274 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %275 = addi %arg5, %c40 : index + %276 = cmpi "slt", %275, %c0 : index + %277 = subi %c-1, %275 : index + %278 = select %276, %277, %275 : index + %279 = divi_signed %278, %c16 : index + %280 = subi %c-1, %279 : index + %281 = select %276, %280, %279 : index + %282 = remi_signed %281, %c16 : index + %283 = cmpi "slt", %282, %c0 : index + %284 = addi %282, %c16 : index + %285 = select %283, %284, %282 : index + %286 = remi_signed %arg4, %c128 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c128 : index + %289 = select %287, %288, %286 : index + %290 = cmpi "slt", %arg5, %c0 : index + %291 = subi %c-1, %arg5 : index + %292 = select %290, %291, %arg5 : index + %293 = divi_signed %292, %c8 : index + %294 = subi %c-1, %293 : index + %295 = select %290, %294, %293 : index + %296 = addi %arg5, %c40 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c16 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = cmpi "slt", %arg5, %c0 : index + %306 = subi %c-1, %arg5 : index + %307 = select %305, %306, %arg5 : index + %308 = divi_signed %307, %c8 : index + %309 = subi %c-1, %308 : index + %310 = select %305, %309, %308 : index + %311 = addi %arg5, %c40 : index + %312 = cmpi "slt", %311, %c0 : index + %313 = subi %c-1, %311 : index + %314 = select %312, %313, %311 : index + %315 = divi_signed %314, %c16 : index + %316 = subi %c-1, %315 : index + %317 = select %312, %316, %315 : index + %318 = muli %317, %c-2 : index + %319 = addi %310, %318 : index + %320 = addi %319, %c5 : index + %321 = cmpi "slt", %320, %c0 : index + %322 = subi %c-1, %320 : index + %323 = select %321, %322, %320 : index + %324 = divi_signed %323, %c2 : index + %325 = subi %c-1, %324 : index + %326 = select %321, %325, %324 : index + %327 = muli %326, %c-2 : index + %328 = addi %304, %327 : index + %329 = addi %328, %c5 : index + store %274, %3[%285, %289, %329] : memref<16x128x2xvector<8xf32>> + %330 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %331 = cmpi "slt", %arg5, %c0 : index + %332 = subi %c-1, %arg5 : index + %333 = select %331, %332, %arg5 : index + %334 = divi_signed %333, %c16 : index + %335 = subi %c-1, %334 : index + %336 = select %331, %335, %334 : index + %337 = cmpi "slt", %arg5, %c0 : index + %338 = subi %c-1, %arg5 
: index + %339 = select %337, %338, %arg5 : index + %340 = divi_signed %339, %c16 : index + %341 = subi %c-1, %340 : index + %342 = select %337, %341, %340 : index + %343 = addi %342, %c3 : index + %344 = cmpi "slt", %343, %c0 : index + %345 = subi %c-1, %343 : index + %346 = select %344, %345, %343 : index + %347 = divi_signed %346, %c16 : index + %348 = subi %c-1, %347 : index + %349 = select %344, %348, %347 : index + %350 = muli %349, %c-16 : index + %351 = addi %336, %350 : index + %352 = addi %351, %c3 : index + %353 = remi_signed %arg4, %c128 : index + %354 = cmpi "slt", %353, %c0 : index + %355 = addi %353, %c128 : index + %356 = select %354, %355, %353 : index + %357 = remi_signed %arg5, %c16 : index + %358 = cmpi "slt", %357, %c0 : index + %359 = addi %357, %c16 : index + %360 = select %358, %359, %357 : index + %361 = cmpi "slt", %360, %c0 : index + %362 = subi %c-1, %360 : index + %363 = select %361, %362, %360 : index + %364 = divi_signed %363, %c8 : index + %365 = subi %c-1, %364 : index + %366 = select %361, %365, %364 : index + %367 = remi_signed %366, %c2 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = addi %367, %c2 : index + %370 = select %368, %369, %367 : index + store %330, %3[%352, %356, %370] : memref<16x128x2xvector<8xf32>> + %371 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %372 = addi %arg5, %c56 : index + %373 = cmpi "slt", %372, %c0 : index + %374 = subi %c-1, %372 : index + %375 = select %373, %374, %372 : index + %376 = divi_signed %375, %c16 : index + %377 = subi %c-1, %376 : index + %378 = select %373, %377, %376 : index + %379 = remi_signed %378, %c16 : index + %380 = cmpi "slt", %379, %c0 : index + %381 = addi %379, %c16 : index + %382 = select %380, %381, %379 : index + %383 = remi_signed %arg4, %c128 : index + %384 = cmpi "slt", %383, %c0 : index + %385 = addi %383, %c128 : index + %386 = select %384, %385, %383 : index + %387 = cmpi "slt", %arg5, %c0 : index + %388 = subi %c-1, %arg5 : index + %389 = select %387, %388, %arg5 : index + %390 = divi_signed %389, %c8 : index + %391 = subi %c-1, %390 : index + %392 = select %387, %391, %390 : index + %393 = addi %arg5, %c56 : index + %394 = cmpi "slt", %393, %c0 : index + %395 = subi %c-1, %393 : index + %396 = select %394, %395, %393 : index + %397 = divi_signed %396, %c16 : index + %398 = subi %c-1, %397 : index + %399 = select %394, %398, %397 : index + %400 = muli %399, %c-2 : index + %401 = addi %392, %400 : index + %402 = cmpi "slt", %arg5, %c0 : index + %403 = subi %c-1, %arg5 : index + %404 = select %402, %403, %arg5 : index + %405 = divi_signed %404, %c8 : index + %406 = subi %c-1, %405 : index + %407 = select %402, %406, %405 : index + %408 = addi %arg5, %c56 : index + %409 = cmpi "slt", %408, %c0 : index + %410 = subi %c-1, %408 : index + %411 = select %409, %410, %408 : index + %412 = divi_signed %411, %c16 : index + %413 = subi %c-1, %412 : index + %414 = select %409, %413, %412 : index + %415 = muli %414, %c-2 : index + %416 = addi %407, %415 : index + %417 = addi %416, %c7 : index + %418 = cmpi "slt", %417, %c0 : index + %419 = subi %c-1, %417 : index + %420 = select %418, %419, %417 : index + %421 = divi_signed %420, %c2 : index + %422 = subi %c-1, %421 : index + %423 = select %418, %422, %421 : index + %424 = muli %423, %c-2 : index + %425 = addi %401, %424 : index + %426 = addi %425, %c7 : index + store %371, %3[%382, %386, %426] : memref<16x128x2xvector<8xf32>> + %427 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %428 = cmpi "slt", %arg5, %c0 : index + %429 = subi %c-1, 
%arg5 : index + %430 = select %428, %429, %arg5 : index + %431 = divi_signed %430, %c16 : index + %432 = subi %c-1, %431 : index + %433 = select %428, %432, %431 : index + %434 = cmpi "slt", %arg5, %c0 : index + %435 = subi %c-1, %arg5 : index + %436 = select %434, %435, %arg5 : index + %437 = divi_signed %436, %c16 : index + %438 = subi %c-1, %437 : index + %439 = select %434, %438, %437 : index + %440 = addi %439, %c4 : index + %441 = cmpi "slt", %440, %c0 : index + %442 = subi %c-1, %440 : index + %443 = select %441, %442, %440 : index + %444 = divi_signed %443, %c16 : index + %445 = subi %c-1, %444 : index + %446 = select %441, %445, %444 : index + %447 = muli %446, %c-16 : index + %448 = addi %433, %447 : index + %449 = addi %448, %c4 : index + %450 = remi_signed %arg4, %c128 : index + %451 = cmpi "slt", %450, %c0 : index + %452 = addi %450, %c128 : index + %453 = select %451, %452, %450 : index + %454 = remi_signed %arg5, %c16 : index + %455 = cmpi "slt", %454, %c0 : index + %456 = addi %454, %c16 : index + %457 = select %455, %456, %454 : index + %458 = cmpi "slt", %457, %c0 : index + %459 = subi %c-1, %457 : index + %460 = select %458, %459, %457 : index + %461 = divi_signed %460, %c8 : index + %462 = subi %c-1, %461 : index + %463 = select %458, %462, %461 : index + %464 = remi_signed %463, %c2 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = addi %464, %c2 : index + %467 = select %465, %466, %464 : index + store %427, %3[%449, %453, %467] : memref<16x128x2xvector<8xf32>> + %468 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %469 = addi %arg5, %c72 : index + %470 = cmpi "slt", %469, %c0 : index + %471 = subi %c-1, %469 : index + %472 = select %470, %471, %469 : index + %473 = divi_signed %472, %c16 : index + %474 = subi %c-1, %473 : index + %475 = select %470, %474, %473 : index + %476 = remi_signed %475, %c16 : index + %477 = cmpi "slt", %476, %c0 : index + %478 = addi %476, %c16 : index + %479 = select %477, %478, %476 : index + %480 = remi_signed %arg4, %c128 : index + %481 = cmpi "slt", %480, %c0 : index + %482 = addi %480, %c128 : index + %483 = select %481, %482, %480 : index + %484 = cmpi "slt", %arg5, %c0 : index + %485 = subi %c-1, %arg5 : index + %486 = select %484, %485, %arg5 : index + %487 = divi_signed %486, %c8 : index + %488 = subi %c-1, %487 : index + %489 = select %484, %488, %487 : index + %490 = addi %arg5, %c72 : index + %491 = cmpi "slt", %490, %c0 : index + %492 = subi %c-1, %490 : index + %493 = select %491, %492, %490 : index + %494 = divi_signed %493, %c16 : index + %495 = subi %c-1, %494 : index + %496 = select %491, %495, %494 : index + %497 = muli %496, %c-2 : index + %498 = addi %489, %497 : index + %499 = cmpi "slt", %arg5, %c0 : index + %500 = subi %c-1, %arg5 : index + %501 = select %499, %500, %arg5 : index + %502 = divi_signed %501, %c8 : index + %503 = subi %c-1, %502 : index + %504 = select %499, %503, %502 : index + %505 = addi %arg5, %c72 : index + %506 = cmpi "slt", %505, %c0 : index + %507 = subi %c-1, %505 : index + %508 = select %506, %507, %505 : index + %509 = divi_signed %508, %c16 : index + %510 = subi %c-1, %509 : index + %511 = select %506, %510, %509 : index + %512 = muli %511, %c-2 : index + %513 = addi %504, %512 : index + %514 = addi %513, %c9 : index + %515 = cmpi "slt", %514, %c0 : index + %516 = subi %c-1, %514 : index + %517 = select %515, %516, %514 : index + %518 = divi_signed %517, %c2 : index + %519 = subi %c-1, %518 : index + %520 = select %515, %519, %518 : index + %521 = muli %520, %c-2 : index + %522 = 
addi %498, %521 : index + %523 = addi %522, %c9 : index + store %468, %3[%479, %483, %523] : memref<16x128x2xvector<8xf32>> + %524 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %525 = cmpi "slt", %arg5, %c0 : index + %526 = subi %c-1, %arg5 : index + %527 = select %525, %526, %arg5 : index + %528 = divi_signed %527, %c16 : index + %529 = subi %c-1, %528 : index + %530 = select %525, %529, %528 : index + %531 = cmpi "slt", %arg5, %c0 : index + %532 = subi %c-1, %arg5 : index + %533 = select %531, %532, %arg5 : index + %534 = divi_signed %533, %c16 : index + %535 = subi %c-1, %534 : index + %536 = select %531, %535, %534 : index + %537 = addi %536, %c5 : index + %538 = cmpi "slt", %537, %c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-16 : index + %545 = addi %530, %544 : index + %546 = addi %545, %c5 : index + %547 = remi_signed %arg4, %c128 : index + %548 = cmpi "slt", %547, %c0 : index + %549 = addi %547, %c128 : index + %550 = select %548, %549, %547 : index + %551 = remi_signed %arg5, %c16 : index + %552 = cmpi "slt", %551, %c0 : index + %553 = addi %551, %c16 : index + %554 = select %552, %553, %551 : index + %555 = cmpi "slt", %554, %c0 : index + %556 = subi %c-1, %554 : index + %557 = select %555, %556, %554 : index + %558 = divi_signed %557, %c8 : index + %559 = subi %c-1, %558 : index + %560 = select %555, %559, %558 : index + %561 = remi_signed %560, %c2 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = addi %561, %c2 : index + %564 = select %562, %563, %561 : index + store %524, %3[%546, %550, %564] : memref<16x128x2xvector<8xf32>> + %565 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %566 = addi %arg5, %c88 : index + %567 = cmpi "slt", %566, %c0 : index + %568 = subi %c-1, %566 : index + %569 = select %567, %568, %566 : index + %570 = divi_signed %569, %c16 : index + %571 = subi %c-1, %570 : index + %572 = select %567, %571, %570 : index + %573 = remi_signed %572, %c16 : index + %574 = cmpi "slt", %573, %c0 : index + %575 = addi %573, %c16 : index + %576 = select %574, %575, %573 : index + %577 = remi_signed %arg4, %c128 : index + %578 = cmpi "slt", %577, %c0 : index + %579 = addi %577, %c128 : index + %580 = select %578, %579, %577 : index + %581 = cmpi "slt", %arg5, %c0 : index + %582 = subi %c-1, %arg5 : index + %583 = select %581, %582, %arg5 : index + %584 = divi_signed %583, %c8 : index + %585 = subi %c-1, %584 : index + %586 = select %581, %585, %584 : index + %587 = addi %arg5, %c88 : index + %588 = cmpi "slt", %587, %c0 : index + %589 = subi %c-1, %587 : index + %590 = select %588, %589, %587 : index + %591 = divi_signed %590, %c16 : index + %592 = subi %c-1, %591 : index + %593 = select %588, %592, %591 : index + %594 = muli %593, %c-2 : index + %595 = addi %586, %594 : index + %596 = cmpi "slt", %arg5, %c0 : index + %597 = subi %c-1, %arg5 : index + %598 = select %596, %597, %arg5 : index + %599 = divi_signed %598, %c8 : index + %600 = subi %c-1, %599 : index + %601 = select %596, %600, %599 : index + %602 = addi %arg5, %c88 : index + %603 = cmpi "slt", %602, %c0 : index + %604 = subi %c-1, %602 : index + %605 = select %603, %604, %602 : index + %606 = divi_signed %605, %c16 : index + %607 = subi %c-1, %606 : index + %608 = select %603, %607, %606 : index + %609 = muli %608, %c-2 : index + %610 = addi %601, %609 : index + %611 = addi %610, %c11 : index + %612 = cmpi "slt", %611, 
%c0 : index + %613 = subi %c-1, %611 : index + %614 = select %612, %613, %611 : index + %615 = divi_signed %614, %c2 : index + %616 = subi %c-1, %615 : index + %617 = select %612, %616, %615 : index + %618 = muli %617, %c-2 : index + %619 = addi %595, %618 : index + %620 = addi %619, %c11 : index + store %565, %3[%576, %580, %620] : memref<16x128x2xvector<8xf32>> + %621 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %622 = cmpi "slt", %arg5, %c0 : index + %623 = subi %c-1, %arg5 : index + %624 = select %622, %623, %arg5 : index + %625 = divi_signed %624, %c16 : index + %626 = subi %c-1, %625 : index + %627 = select %622, %626, %625 : index + %628 = cmpi "slt", %arg5, %c0 : index + %629 = subi %c-1, %arg5 : index + %630 = select %628, %629, %arg5 : index + %631 = divi_signed %630, %c16 : index + %632 = subi %c-1, %631 : index + %633 = select %628, %632, %631 : index + %634 = addi %633, %c6 : index + %635 = cmpi "slt", %634, %c0 : index + %636 = subi %c-1, %634 : index + %637 = select %635, %636, %634 : index + %638 = divi_signed %637, %c16 : index + %639 = subi %c-1, %638 : index + %640 = select %635, %639, %638 : index + %641 = muli %640, %c-16 : index + %642 = addi %627, %641 : index + %643 = addi %642, %c6 : index + %644 = remi_signed %arg4, %c128 : index + %645 = cmpi "slt", %644, %c0 : index + %646 = addi %644, %c128 : index + %647 = select %645, %646, %644 : index + %648 = remi_signed %arg5, %c16 : index + %649 = cmpi "slt", %648, %c0 : index + %650 = addi %648, %c16 : index + %651 = select %649, %650, %648 : index + %652 = cmpi "slt", %651, %c0 : index + %653 = subi %c-1, %651 : index + %654 = select %652, %653, %651 : index + %655 = divi_signed %654, %c8 : index + %656 = subi %c-1, %655 : index + %657 = select %652, %656, %655 : index + %658 = remi_signed %657, %c2 : index + %659 = cmpi "slt", %658, %c0 : index + %660 = addi %658, %c2 : index + %661 = select %659, %660, %658 : index + store %621, %3[%643, %647, %661] : memref<16x128x2xvector<8xf32>> + %662 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %663 = addi %arg5, %c104 : index + %664 = cmpi "slt", %663, %c0 : index + %665 = subi %c-1, %663 : index + %666 = select %664, %665, %663 : index + %667 = divi_signed %666, %c16 : index + %668 = subi %c-1, %667 : index + %669 = select %664, %668, %667 : index + %670 = remi_signed %669, %c16 : index + %671 = cmpi "slt", %670, %c0 : index + %672 = addi %670, %c16 : index + %673 = select %671, %672, %670 : index + %674 = remi_signed %arg4, %c128 : index + %675 = cmpi "slt", %674, %c0 : index + %676 = addi %674, %c128 : index + %677 = select %675, %676, %674 : index + %678 = cmpi "slt", %arg5, %c0 : index + %679 = subi %c-1, %arg5 : index + %680 = select %678, %679, %arg5 : index + %681 = divi_signed %680, %c8 : index + %682 = subi %c-1, %681 : index + %683 = select %678, %682, %681 : index + %684 = addi %arg5, %c104 : index + %685 = cmpi "slt", %684, %c0 : index + %686 = subi %c-1, %684 : index + %687 = select %685, %686, %684 : index + %688 = divi_signed %687, %c16 : index + %689 = subi %c-1, %688 : index + %690 = select %685, %689, %688 : index + %691 = muli %690, %c-2 : index + %692 = addi %683, %691 : index + %693 = cmpi "slt", %arg5, %c0 : index + %694 = subi %c-1, %arg5 : index + %695 = select %693, %694, %arg5 : index + %696 = divi_signed %695, %c8 : index + %697 = subi %c-1, %696 : index + %698 = select %693, %697, %696 : index + %699 = addi %arg5, %c104 : index + %700 = cmpi "slt", %699, %c0 : index + %701 = subi %c-1, %699 : index + %702 = select %700, %701, %699 : 
index + %703 = divi_signed %702, %c16 : index + %704 = subi %c-1, %703 : index + %705 = select %700, %704, %703 : index + %706 = muli %705, %c-2 : index + %707 = addi %698, %706 : index + %708 = addi %707, %c13 : index + %709 = cmpi "slt", %708, %c0 : index + %710 = subi %c-1, %708 : index + %711 = select %709, %710, %708 : index + %712 = divi_signed %711, %c2 : index + %713 = subi %c-1, %712 : index + %714 = select %709, %713, %712 : index + %715 = muli %714, %c-2 : index + %716 = addi %692, %715 : index + %717 = addi %716, %c13 : index + store %662, %3[%673, %677, %717] : memref<16x128x2xvector<8xf32>> + %718 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %719 = cmpi "slt", %arg5, %c0 : index + %720 = subi %c-1, %arg5 : index + %721 = select %719, %720, %arg5 : index + %722 = divi_signed %721, %c16 : index + %723 = subi %c-1, %722 : index + %724 = select %719, %723, %722 : index + %725 = cmpi "slt", %arg5, %c0 : index + %726 = subi %c-1, %arg5 : index + %727 = select %725, %726, %arg5 : index + %728 = divi_signed %727, %c16 : index + %729 = subi %c-1, %728 : index + %730 = select %725, %729, %728 : index + %731 = addi %730, %c7 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c16 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = muli %737, %c-16 : index + %739 = addi %724, %738 : index + %740 = addi %739, %c7 : index + %741 = remi_signed %arg4, %c128 : index + %742 = cmpi "slt", %741, %c0 : index + %743 = addi %741, %c128 : index + %744 = select %742, %743, %741 : index + %745 = remi_signed %arg5, %c16 : index + %746 = cmpi "slt", %745, %c0 : index + %747 = addi %745, %c16 : index + %748 = select %746, %747, %745 : index + %749 = cmpi "slt", %748, %c0 : index + %750 = subi %c-1, %748 : index + %751 = select %749, %750, %748 : index + %752 = divi_signed %751, %c8 : index + %753 = subi %c-1, %752 : index + %754 = select %749, %753, %752 : index + %755 = remi_signed %754, %c2 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = addi %755, %c2 : index + %758 = select %756, %757, %755 : index + store %718, %3[%740, %744, %758] : memref<16x128x2xvector<8xf32>> + %759 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %760 = addi %arg5, %c120 : index + %761 = cmpi "slt", %760, %c0 : index + %762 = subi %c-1, %760 : index + %763 = select %761, %762, %760 : index + %764 = divi_signed %763, %c16 : index + %765 = subi %c-1, %764 : index + %766 = select %761, %765, %764 : index + %767 = remi_signed %766, %c16 : index + %768 = cmpi "slt", %767, %c0 : index + %769 = addi %767, %c16 : index + %770 = select %768, %769, %767 : index + %771 = remi_signed %arg4, %c128 : index + %772 = cmpi "slt", %771, %c0 : index + %773 = addi %771, %c128 : index + %774 = select %772, %773, %771 : index + %775 = cmpi "slt", %arg5, %c0 : index + %776 = subi %c-1, %arg5 : index + %777 = select %775, %776, %arg5 : index + %778 = divi_signed %777, %c8 : index + %779 = subi %c-1, %778 : index + %780 = select %775, %779, %778 : index + %781 = addi %arg5, %c120 : index + %782 = cmpi "slt", %781, %c0 : index + %783 = subi %c-1, %781 : index + %784 = select %782, %783, %781 : index + %785 = divi_signed %784, %c16 : index + %786 = subi %c-1, %785 : index + %787 = select %782, %786, %785 : index + %788 = muli %787, %c-2 : index + %789 = addi %780, %788 : index + %790 = cmpi "slt", %arg5, %c0 : index + %791 = subi %c-1, %arg5 : index + %792 = select %790, %791, %arg5 : index + %793 = 
divi_signed %792, %c8 : index + %794 = subi %c-1, %793 : index + %795 = select %790, %794, %793 : index + %796 = addi %arg5, %c120 : index + %797 = cmpi "slt", %796, %c0 : index + %798 = subi %c-1, %796 : index + %799 = select %797, %798, %796 : index + %800 = divi_signed %799, %c16 : index + %801 = subi %c-1, %800 : index + %802 = select %797, %801, %800 : index + %803 = muli %802, %c-2 : index + %804 = addi %795, %803 : index + %805 = addi %804, %c15 : index + %806 = cmpi "slt", %805, %c0 : index + %807 = subi %c-1, %805 : index + %808 = select %806, %807, %805 : index + %809 = divi_signed %808, %c2 : index + %810 = subi %c-1, %809 : index + %811 = select %806, %810, %809 : index + %812 = muli %811, %c-2 : index + %813 = addi %789, %812 : index + %814 = addi %813, %c15 : index + store %759, %3[%770, %774, %814] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %arg3, %arg5 : index + %7 = addi %6, %c8 : index + %8 = vector.transfer_read %arg1[%arg4, %7], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %8, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %9 = addi %arg3, %arg5 : index + %10 = addi %9, %c16 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %12 = addi %arg3, %arg5 : index + %13 = addi %12, %c24 : index + %14 = vector.transfer_read %arg1[%arg4, %13], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %15 = addi %arg3, %arg5 : index + %16 = addi %15, %c32 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %18 = addi %arg3, %arg5 : index + %19 = addi %18, %c40 : index + %20 = vector.transfer_read %arg1[%arg4, %19], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %20, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %21 = addi %arg3, %arg5 : index + %22 = addi %21, %c48 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %24 = addi %arg3, %arg5 : index + %25 = addi %24, %c56 : index + %26 = vector.transfer_read %arg1[%arg4, %25], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %27 = addi %arg3, %arg5 : index + %28 = addi %27, %c64 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %30 = addi %arg3, %arg5 : index + %31 = addi %30, %c72 : index + %32 = vector.transfer_read %arg1[%arg4, %31], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %32, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %33 = addi %arg3, %arg5 : index + %34 = addi %33, %c80 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, 
%c10] : memref<1x16xvector<8xf32>> + %36 = addi %arg3, %arg5 : index + %37 = addi %36, %c88 : index + %38 = vector.transfer_read %arg1[%arg4, %37], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %39 = addi %arg3, %arg5 : index + %40 = addi %39, %c96 : index + %41 = vector.transfer_read %arg1[%arg4, %40], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %41, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %42 = addi %arg3, %arg5 : index + %43 = addi %42, %c104 : index + %44 = vector.transfer_read %arg1[%arg4, %43], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %44, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %45 = addi %arg3, %arg5 : index + %46 = addi %45, %c112 : index + %47 = vector.transfer_read %arg1[%arg4, %46], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %47, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %48 = addi %arg3, %arg5 : index + %49 = addi %48, %c120 : index + %50 = vector.transfer_read %arg1[%arg4, %49], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %51 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %52 = cmpi "slt", %arg5, %c0 : index + %53 = subi %c-1, %arg5 : index + %54 = select %52, %53, %arg5 : index + %55 = divi_signed %54, %c16 : index + %56 = subi %c-1, %55 : index + %57 = select %52, %56, %55 : index + %58 = remi_signed %57, %c16 : index + %59 = cmpi "slt", %58, %c0 : index + %60 = addi %58, %c16 : index + %61 = select %59, %60, %58 : index + %62 = remi_signed %arg4, %c128 : index + %63 = cmpi "slt", %62, %c0 : index + %64 = addi %62, %c128 : index + %65 = select %63, %64, %62 : index + %66 = remi_signed %arg5, %c16 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = addi %66, %c16 : index + %69 = select %67, %68, %66 : index + %70 = cmpi "slt", %69, %c0 : index + %71 = subi %c-1, %69 : index + %72 = select %70, %71, %69 : index + %73 = divi_signed %72, %c8 : index + %74 = subi %c-1, %73 : index + %75 = select %70, %74, %73 : index + %76 = remi_signed %75, %c2 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = addi %76, %c2 : index + %79 = select %77, %78, %76 : index + store %51, %3[%61, %65, %79] : memref<16x128x2xvector<8xf32>> + %80 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %81 = addi %arg5, %c8 : index + %82 = cmpi "slt", %81, %c0 : index + %83 = subi %c-1, %81 : index + %84 = select %82, %83, %81 : index + %85 = divi_signed %84, %c16 : index + %86 = subi %c-1, %85 : index + %87 = select %82, %86, %85 : index + %88 = remi_signed %87, %c16 : index + %89 = cmpi "slt", %88, %c0 : index + %90 = addi %88, %c16 : index + %91 = select %89, %90, %88 : index + %92 = remi_signed %arg4, %c128 : index + %93 = cmpi "slt", %92, %c0 : index + %94 = addi %92, %c128 : index + %95 = select %93, %94, %92 : index + %96 = cmpi "slt", %arg5, %c0 : index + %97 = subi %c-1, %arg5 : index + %98 = select %96, %97, %arg5 : index + %99 = divi_signed %98, %c8 : index + %100 = subi %c-1, %99 : index + %101 = select %96, %100, %99 : index + %102 = addi %arg5, %c8 : index + %103 = cmpi "slt", %102, %c0 : index + %104 = subi %c-1, %102 : index + %105 = select %103, %104, %102 : index + %106 = divi_signed %105, %c16 : index + %107 = subi %c-1, %106 : index + %108 = select %103, %107, %106 : index + %109 = muli %108, %c-2 : index + %110 = addi %101, %109 : index + 
%111 = cmpi "slt", %arg5, %c0 : index + %112 = subi %c-1, %arg5 : index + %113 = select %111, %112, %arg5 : index + %114 = divi_signed %113, %c8 : index + %115 = subi %c-1, %114 : index + %116 = select %111, %115, %114 : index + %117 = addi %arg5, %c8 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c16 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c1 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = subi %c-1, %126 : index + %129 = select %127, %128, %126 : index + %130 = divi_signed %129, %c2 : index + %131 = subi %c-1, %130 : index + %132 = select %127, %131, %130 : index + %133 = muli %132, %c-2 : index + %134 = addi %110, %133 : index + %135 = addi %134, %c1 : index + store %80, %3[%91, %95, %135] : memref<16x128x2xvector<8xf32>> + %136 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %137 = cmpi "slt", %arg5, %c0 : index + %138 = subi %c-1, %arg5 : index + %139 = select %137, %138, %arg5 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = cmpi "slt", %arg5, %c0 : index + %144 = subi %c-1, %arg5 : index + %145 = select %143, %144, %arg5 : index + %146 = divi_signed %145, %c16 : index + %147 = subi %c-1, %146 : index + %148 = select %143, %147, %146 : index + %149 = addi %148, %c1 : index + %150 = cmpi "slt", %149, %c0 : index + %151 = subi %c-1, %149 : index + %152 = select %150, %151, %149 : index + %153 = divi_signed %152, %c16 : index + %154 = subi %c-1, %153 : index + %155 = select %150, %154, %153 : index + %156 = muli %155, %c-16 : index + %157 = addi %142, %156 : index + %158 = addi %157, %c1 : index + %159 = remi_signed %arg4, %c128 : index + %160 = cmpi "slt", %159, %c0 : index + %161 = addi %159, %c128 : index + %162 = select %160, %161, %159 : index + %163 = remi_signed %arg5, %c16 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = addi %163, %c16 : index + %166 = select %164, %165, %163 : index + %167 = cmpi "slt", %166, %c0 : index + %168 = subi %c-1, %166 : index + %169 = select %167, %168, %166 : index + %170 = divi_signed %169, %c8 : index + %171 = subi %c-1, %170 : index + %172 = select %167, %171, %170 : index + %173 = remi_signed %172, %c2 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = addi %173, %c2 : index + %176 = select %174, %175, %173 : index + store %136, %3[%158, %162, %176] : memref<16x128x2xvector<8xf32>> + %177 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %178 = addi %arg5, %c24 : index + %179 = cmpi "slt", %178, %c0 : index + %180 = subi %c-1, %178 : index + %181 = select %179, %180, %178 : index + %182 = divi_signed %181, %c16 : index + %183 = subi %c-1, %182 : index + %184 = select %179, %183, %182 : index + %185 = remi_signed %184, %c16 : index + %186 = cmpi "slt", %185, %c0 : index + %187 = addi %185, %c16 : index + %188 = select %186, %187, %185 : index + %189 = remi_signed %arg4, %c128 : index + %190 = cmpi "slt", %189, %c0 : index + %191 = addi %189, %c128 : index + %192 = select %190, %191, %189 : index + %193 = cmpi "slt", %arg5, %c0 : index + %194 = subi %c-1, %arg5 : index + %195 = select %193, %194, %arg5 : index + %196 = divi_signed %195, %c8 : index + %197 = subi %c-1, %196 : index + %198 = select %193, %197, %196 : index + %199 = addi %arg5, %c24 : index + %200 = cmpi "slt", %199, %c0 : index + %201 = subi %c-1, 
%199 : index + %202 = select %200, %201, %199 : index + %203 = divi_signed %202, %c16 : index + %204 = subi %c-1, %203 : index + %205 = select %200, %204, %203 : index + %206 = muli %205, %c-2 : index + %207 = addi %198, %206 : index + %208 = cmpi "slt", %arg5, %c0 : index + %209 = subi %c-1, %arg5 : index + %210 = select %208, %209, %arg5 : index + %211 = divi_signed %210, %c8 : index + %212 = subi %c-1, %211 : index + %213 = select %208, %212, %211 : index + %214 = addi %arg5, %c24 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c16 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c3 : index + %224 = cmpi "slt", %223, %c0 : index + %225 = subi %c-1, %223 : index + %226 = select %224, %225, %223 : index + %227 = divi_signed %226, %c2 : index + %228 = subi %c-1, %227 : index + %229 = select %224, %228, %227 : index + %230 = muli %229, %c-2 : index + %231 = addi %207, %230 : index + %232 = addi %231, %c3 : index + store %177, %3[%188, %192, %232] : memref<16x128x2xvector<8xf32>> + %233 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %234 = cmpi "slt", %arg5, %c0 : index + %235 = subi %c-1, %arg5 : index + %236 = select %234, %235, %arg5 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = cmpi "slt", %arg5, %c0 : index + %241 = subi %c-1, %arg5 : index + %242 = select %240, %241, %arg5 : index + %243 = divi_signed %242, %c16 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = addi %245, %c2 : index + %247 = cmpi "slt", %246, %c0 : index + %248 = subi %c-1, %246 : index + %249 = select %247, %248, %246 : index + %250 = divi_signed %249, %c16 : index + %251 = subi %c-1, %250 : index + %252 = select %247, %251, %250 : index + %253 = muli %252, %c-16 : index + %254 = addi %239, %253 : index + %255 = addi %254, %c2 : index + %256 = remi_signed %arg4, %c128 : index + %257 = cmpi "slt", %256, %c0 : index + %258 = addi %256, %c128 : index + %259 = select %257, %258, %256 : index + %260 = remi_signed %arg5, %c16 : index + %261 = cmpi "slt", %260, %c0 : index + %262 = addi %260, %c16 : index + %263 = select %261, %262, %260 : index + %264 = cmpi "slt", %263, %c0 : index + %265 = subi %c-1, %263 : index + %266 = select %264, %265, %263 : index + %267 = divi_signed %266, %c8 : index + %268 = subi %c-1, %267 : index + %269 = select %264, %268, %267 : index + %270 = remi_signed %269, %c2 : index + %271 = cmpi "slt", %270, %c0 : index + %272 = addi %270, %c2 : index + %273 = select %271, %272, %270 : index + store %233, %3[%255, %259, %273] : memref<16x128x2xvector<8xf32>> + %274 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %275 = addi %arg5, %c40 : index + %276 = cmpi "slt", %275, %c0 : index + %277 = subi %c-1, %275 : index + %278 = select %276, %277, %275 : index + %279 = divi_signed %278, %c16 : index + %280 = subi %c-1, %279 : index + %281 = select %276, %280, %279 : index + %282 = remi_signed %281, %c16 : index + %283 = cmpi "slt", %282, %c0 : index + %284 = addi %282, %c16 : index + %285 = select %283, %284, %282 : index + %286 = remi_signed %arg4, %c128 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c128 : index + %289 = select %287, %288, %286 : index + %290 = cmpi "slt", %arg5, %c0 : index + %291 = subi %c-1, %arg5 : index + %292 = 
select %290, %291, %arg5 : index + %293 = divi_signed %292, %c8 : index + %294 = subi %c-1, %293 : index + %295 = select %290, %294, %293 : index + %296 = addi %arg5, %c40 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c16 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = cmpi "slt", %arg5, %c0 : index + %306 = subi %c-1, %arg5 : index + %307 = select %305, %306, %arg5 : index + %308 = divi_signed %307, %c8 : index + %309 = subi %c-1, %308 : index + %310 = select %305, %309, %308 : index + %311 = addi %arg5, %c40 : index + %312 = cmpi "slt", %311, %c0 : index + %313 = subi %c-1, %311 : index + %314 = select %312, %313, %311 : index + %315 = divi_signed %314, %c16 : index + %316 = subi %c-1, %315 : index + %317 = select %312, %316, %315 : index + %318 = muli %317, %c-2 : index + %319 = addi %310, %318 : index + %320 = addi %319, %c5 : index + %321 = cmpi "slt", %320, %c0 : index + %322 = subi %c-1, %320 : index + %323 = select %321, %322, %320 : index + %324 = divi_signed %323, %c2 : index + %325 = subi %c-1, %324 : index + %326 = select %321, %325, %324 : index + %327 = muli %326, %c-2 : index + %328 = addi %304, %327 : index + %329 = addi %328, %c5 : index + store %274, %3[%285, %289, %329] : memref<16x128x2xvector<8xf32>> + %330 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %331 = cmpi "slt", %arg5, %c0 : index + %332 = subi %c-1, %arg5 : index + %333 = select %331, %332, %arg5 : index + %334 = divi_signed %333, %c16 : index + %335 = subi %c-1, %334 : index + %336 = select %331, %335, %334 : index + %337 = cmpi "slt", %arg5, %c0 : index + %338 = subi %c-1, %arg5 : index + %339 = select %337, %338, %arg5 : index + %340 = divi_signed %339, %c16 : index + %341 = subi %c-1, %340 : index + %342 = select %337, %341, %340 : index + %343 = addi %342, %c3 : index + %344 = cmpi "slt", %343, %c0 : index + %345 = subi %c-1, %343 : index + %346 = select %344, %345, %343 : index + %347 = divi_signed %346, %c16 : index + %348 = subi %c-1, %347 : index + %349 = select %344, %348, %347 : index + %350 = muli %349, %c-16 : index + %351 = addi %336, %350 : index + %352 = addi %351, %c3 : index + %353 = remi_signed %arg4, %c128 : index + %354 = cmpi "slt", %353, %c0 : index + %355 = addi %353, %c128 : index + %356 = select %354, %355, %353 : index + %357 = remi_signed %arg5, %c16 : index + %358 = cmpi "slt", %357, %c0 : index + %359 = addi %357, %c16 : index + %360 = select %358, %359, %357 : index + %361 = cmpi "slt", %360, %c0 : index + %362 = subi %c-1, %360 : index + %363 = select %361, %362, %360 : index + %364 = divi_signed %363, %c8 : index + %365 = subi %c-1, %364 : index + %366 = select %361, %365, %364 : index + %367 = remi_signed %366, %c2 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = addi %367, %c2 : index + %370 = select %368, %369, %367 : index + store %330, %3[%352, %356, %370] : memref<16x128x2xvector<8xf32>> + %371 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %372 = addi %arg5, %c56 : index + %373 = cmpi "slt", %372, %c0 : index + %374 = subi %c-1, %372 : index + %375 = select %373, %374, %372 : index + %376 = divi_signed %375, %c16 : index + %377 = subi %c-1, %376 : index + %378 = select %373, %377, %376 : index + %379 = remi_signed %378, %c16 : index + %380 = cmpi "slt", %379, %c0 : index + %381 = addi %379, %c16 : index + %382 = select %380, %381, %379 : index + 
%383 = remi_signed %arg4, %c128 : index + %384 = cmpi "slt", %383, %c0 : index + %385 = addi %383, %c128 : index + %386 = select %384, %385, %383 : index + %387 = cmpi "slt", %arg5, %c0 : index + %388 = subi %c-1, %arg5 : index + %389 = select %387, %388, %arg5 : index + %390 = divi_signed %389, %c8 : index + %391 = subi %c-1, %390 : index + %392 = select %387, %391, %390 : index + %393 = addi %arg5, %c56 : index + %394 = cmpi "slt", %393, %c0 : index + %395 = subi %c-1, %393 : index + %396 = select %394, %395, %393 : index + %397 = divi_signed %396, %c16 : index + %398 = subi %c-1, %397 : index + %399 = select %394, %398, %397 : index + %400 = muli %399, %c-2 : index + %401 = addi %392, %400 : index + %402 = cmpi "slt", %arg5, %c0 : index + %403 = subi %c-1, %arg5 : index + %404 = select %402, %403, %arg5 : index + %405 = divi_signed %404, %c8 : index + %406 = subi %c-1, %405 : index + %407 = select %402, %406, %405 : index + %408 = addi %arg5, %c56 : index + %409 = cmpi "slt", %408, %c0 : index + %410 = subi %c-1, %408 : index + %411 = select %409, %410, %408 : index + %412 = divi_signed %411, %c16 : index + %413 = subi %c-1, %412 : index + %414 = select %409, %413, %412 : index + %415 = muli %414, %c-2 : index + %416 = addi %407, %415 : index + %417 = addi %416, %c7 : index + %418 = cmpi "slt", %417, %c0 : index + %419 = subi %c-1, %417 : index + %420 = select %418, %419, %417 : index + %421 = divi_signed %420, %c2 : index + %422 = subi %c-1, %421 : index + %423 = select %418, %422, %421 : index + %424 = muli %423, %c-2 : index + %425 = addi %401, %424 : index + %426 = addi %425, %c7 : index + store %371, %3[%382, %386, %426] : memref<16x128x2xvector<8xf32>> + %427 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %428 = cmpi "slt", %arg5, %c0 : index + %429 = subi %c-1, %arg5 : index + %430 = select %428, %429, %arg5 : index + %431 = divi_signed %430, %c16 : index + %432 = subi %c-1, %431 : index + %433 = select %428, %432, %431 : index + %434 = cmpi "slt", %arg5, %c0 : index + %435 = subi %c-1, %arg5 : index + %436 = select %434, %435, %arg5 : index + %437 = divi_signed %436, %c16 : index + %438 = subi %c-1, %437 : index + %439 = select %434, %438, %437 : index + %440 = addi %439, %c4 : index + %441 = cmpi "slt", %440, %c0 : index + %442 = subi %c-1, %440 : index + %443 = select %441, %442, %440 : index + %444 = divi_signed %443, %c16 : index + %445 = subi %c-1, %444 : index + %446 = select %441, %445, %444 : index + %447 = muli %446, %c-16 : index + %448 = addi %433, %447 : index + %449 = addi %448, %c4 : index + %450 = remi_signed %arg4, %c128 : index + %451 = cmpi "slt", %450, %c0 : index + %452 = addi %450, %c128 : index + %453 = select %451, %452, %450 : index + %454 = remi_signed %arg5, %c16 : index + %455 = cmpi "slt", %454, %c0 : index + %456 = addi %454, %c16 : index + %457 = select %455, %456, %454 : index + %458 = cmpi "slt", %457, %c0 : index + %459 = subi %c-1, %457 : index + %460 = select %458, %459, %457 : index + %461 = divi_signed %460, %c8 : index + %462 = subi %c-1, %461 : index + %463 = select %458, %462, %461 : index + %464 = remi_signed %463, %c2 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = addi %464, %c2 : index + %467 = select %465, %466, %464 : index + store %427, %3[%449, %453, %467] : memref<16x128x2xvector<8xf32>> + %468 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %469 = addi %arg5, %c72 : index + %470 = cmpi "slt", %469, %c0 : index + %471 = subi %c-1, %469 : index + %472 = select %470, %471, %469 : index + %473 = divi_signed %472, %c16 
: index + %474 = subi %c-1, %473 : index + %475 = select %470, %474, %473 : index + %476 = remi_signed %475, %c16 : index + %477 = cmpi "slt", %476, %c0 : index + %478 = addi %476, %c16 : index + %479 = select %477, %478, %476 : index + %480 = remi_signed %arg4, %c128 : index + %481 = cmpi "slt", %480, %c0 : index + %482 = addi %480, %c128 : index + %483 = select %481, %482, %480 : index + %484 = cmpi "slt", %arg5, %c0 : index + %485 = subi %c-1, %arg5 : index + %486 = select %484, %485, %arg5 : index + %487 = divi_signed %486, %c8 : index + %488 = subi %c-1, %487 : index + %489 = select %484, %488, %487 : index + %490 = addi %arg5, %c72 : index + %491 = cmpi "slt", %490, %c0 : index + %492 = subi %c-1, %490 : index + %493 = select %491, %492, %490 : index + %494 = divi_signed %493, %c16 : index + %495 = subi %c-1, %494 : index + %496 = select %491, %495, %494 : index + %497 = muli %496, %c-2 : index + %498 = addi %489, %497 : index + %499 = cmpi "slt", %arg5, %c0 : index + %500 = subi %c-1, %arg5 : index + %501 = select %499, %500, %arg5 : index + %502 = divi_signed %501, %c8 : index + %503 = subi %c-1, %502 : index + %504 = select %499, %503, %502 : index + %505 = addi %arg5, %c72 : index + %506 = cmpi "slt", %505, %c0 : index + %507 = subi %c-1, %505 : index + %508 = select %506, %507, %505 : index + %509 = divi_signed %508, %c16 : index + %510 = subi %c-1, %509 : index + %511 = select %506, %510, %509 : index + %512 = muli %511, %c-2 : index + %513 = addi %504, %512 : index + %514 = addi %513, %c9 : index + %515 = cmpi "slt", %514, %c0 : index + %516 = subi %c-1, %514 : index + %517 = select %515, %516, %514 : index + %518 = divi_signed %517, %c2 : index + %519 = subi %c-1, %518 : index + %520 = select %515, %519, %518 : index + %521 = muli %520, %c-2 : index + %522 = addi %498, %521 : index + %523 = addi %522, %c9 : index + store %468, %3[%479, %483, %523] : memref<16x128x2xvector<8xf32>> + %524 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %525 = cmpi "slt", %arg5, %c0 : index + %526 = subi %c-1, %arg5 : index + %527 = select %525, %526, %arg5 : index + %528 = divi_signed %527, %c16 : index + %529 = subi %c-1, %528 : index + %530 = select %525, %529, %528 : index + %531 = cmpi "slt", %arg5, %c0 : index + %532 = subi %c-1, %arg5 : index + %533 = select %531, %532, %arg5 : index + %534 = divi_signed %533, %c16 : index + %535 = subi %c-1, %534 : index + %536 = select %531, %535, %534 : index + %537 = addi %536, %c5 : index + %538 = cmpi "slt", %537, %c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-16 : index + %545 = addi %530, %544 : index + %546 = addi %545, %c5 : index + %547 = remi_signed %arg4, %c128 : index + %548 = cmpi "slt", %547, %c0 : index + %549 = addi %547, %c128 : index + %550 = select %548, %549, %547 : index + %551 = remi_signed %arg5, %c16 : index + %552 = cmpi "slt", %551, %c0 : index + %553 = addi %551, %c16 : index + %554 = select %552, %553, %551 : index + %555 = cmpi "slt", %554, %c0 : index + %556 = subi %c-1, %554 : index + %557 = select %555, %556, %554 : index + %558 = divi_signed %557, %c8 : index + %559 = subi %c-1, %558 : index + %560 = select %555, %559, %558 : index + %561 = remi_signed %560, %c2 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = addi %561, %c2 : index + %564 = select %562, %563, %561 : index + store %524, %3[%546, %550, %564] : 
memref<16x128x2xvector<8xf32>> + %565 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %566 = addi %arg5, %c88 : index + %567 = cmpi "slt", %566, %c0 : index + %568 = subi %c-1, %566 : index + %569 = select %567, %568, %566 : index + %570 = divi_signed %569, %c16 : index + %571 = subi %c-1, %570 : index + %572 = select %567, %571, %570 : index + %573 = remi_signed %572, %c16 : index + %574 = cmpi "slt", %573, %c0 : index + %575 = addi %573, %c16 : index + %576 = select %574, %575, %573 : index + %577 = remi_signed %arg4, %c128 : index + %578 = cmpi "slt", %577, %c0 : index + %579 = addi %577, %c128 : index + %580 = select %578, %579, %577 : index + %581 = cmpi "slt", %arg5, %c0 : index + %582 = subi %c-1, %arg5 : index + %583 = select %581, %582, %arg5 : index + %584 = divi_signed %583, %c8 : index + %585 = subi %c-1, %584 : index + %586 = select %581, %585, %584 : index + %587 = addi %arg5, %c88 : index + %588 = cmpi "slt", %587, %c0 : index + %589 = subi %c-1, %587 : index + %590 = select %588, %589, %587 : index + %591 = divi_signed %590, %c16 : index + %592 = subi %c-1, %591 : index + %593 = select %588, %592, %591 : index + %594 = muli %593, %c-2 : index + %595 = addi %586, %594 : index + %596 = cmpi "slt", %arg5, %c0 : index + %597 = subi %c-1, %arg5 : index + %598 = select %596, %597, %arg5 : index + %599 = divi_signed %598, %c8 : index + %600 = subi %c-1, %599 : index + %601 = select %596, %600, %599 : index + %602 = addi %arg5, %c88 : index + %603 = cmpi "slt", %602, %c0 : index + %604 = subi %c-1, %602 : index + %605 = select %603, %604, %602 : index + %606 = divi_signed %605, %c16 : index + %607 = subi %c-1, %606 : index + %608 = select %603, %607, %606 : index + %609 = muli %608, %c-2 : index + %610 = addi %601, %609 : index + %611 = addi %610, %c11 : index + %612 = cmpi "slt", %611, %c0 : index + %613 = subi %c-1, %611 : index + %614 = select %612, %613, %611 : index + %615 = divi_signed %614, %c2 : index + %616 = subi %c-1, %615 : index + %617 = select %612, %616, %615 : index + %618 = muli %617, %c-2 : index + %619 = addi %595, %618 : index + %620 = addi %619, %c11 : index + store %565, %3[%576, %580, %620] : memref<16x128x2xvector<8xf32>> + %621 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %622 = cmpi "slt", %arg5, %c0 : index + %623 = subi %c-1, %arg5 : index + %624 = select %622, %623, %arg5 : index + %625 = divi_signed %624, %c16 : index + %626 = subi %c-1, %625 : index + %627 = select %622, %626, %625 : index + %628 = cmpi "slt", %arg5, %c0 : index + %629 = subi %c-1, %arg5 : index + %630 = select %628, %629, %arg5 : index + %631 = divi_signed %630, %c16 : index + %632 = subi %c-1, %631 : index + %633 = select %628, %632, %631 : index + %634 = addi %633, %c6 : index + %635 = cmpi "slt", %634, %c0 : index + %636 = subi %c-1, %634 : index + %637 = select %635, %636, %634 : index + %638 = divi_signed %637, %c16 : index + %639 = subi %c-1, %638 : index + %640 = select %635, %639, %638 : index + %641 = muli %640, %c-16 : index + %642 = addi %627, %641 : index + %643 = addi %642, %c6 : index + %644 = remi_signed %arg4, %c128 : index + %645 = cmpi "slt", %644, %c0 : index + %646 = addi %644, %c128 : index + %647 = select %645, %646, %644 : index + %648 = remi_signed %arg5, %c16 : index + %649 = cmpi "slt", %648, %c0 : index + %650 = addi %648, %c16 : index + %651 = select %649, %650, %648 : index + %652 = cmpi "slt", %651, %c0 : index + %653 = subi %c-1, %651 : index + %654 = select %652, %653, %651 : index + %655 = divi_signed %654, %c8 : index + %656 = subi %c-1, 
%655 : index + %657 = select %652, %656, %655 : index + %658 = remi_signed %657, %c2 : index + %659 = cmpi "slt", %658, %c0 : index + %660 = addi %658, %c2 : index + %661 = select %659, %660, %658 : index + store %621, %3[%643, %647, %661] : memref<16x128x2xvector<8xf32>> + %662 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %663 = addi %arg5, %c104 : index + %664 = cmpi "slt", %663, %c0 : index + %665 = subi %c-1, %663 : index + %666 = select %664, %665, %663 : index + %667 = divi_signed %666, %c16 : index + %668 = subi %c-1, %667 : index + %669 = select %664, %668, %667 : index + %670 = remi_signed %669, %c16 : index + %671 = cmpi "slt", %670, %c0 : index + %672 = addi %670, %c16 : index + %673 = select %671, %672, %670 : index + %674 = remi_signed %arg4, %c128 : index + %675 = cmpi "slt", %674, %c0 : index + %676 = addi %674, %c128 : index + %677 = select %675, %676, %674 : index + %678 = cmpi "slt", %arg5, %c0 : index + %679 = subi %c-1, %arg5 : index + %680 = select %678, %679, %arg5 : index + %681 = divi_signed %680, %c8 : index + %682 = subi %c-1, %681 : index + %683 = select %678, %682, %681 : index + %684 = addi %arg5, %c104 : index + %685 = cmpi "slt", %684, %c0 : index + %686 = subi %c-1, %684 : index + %687 = select %685, %686, %684 : index + %688 = divi_signed %687, %c16 : index + %689 = subi %c-1, %688 : index + %690 = select %685, %689, %688 : index + %691 = muli %690, %c-2 : index + %692 = addi %683, %691 : index + %693 = cmpi "slt", %arg5, %c0 : index + %694 = subi %c-1, %arg5 : index + %695 = select %693, %694, %arg5 : index + %696 = divi_signed %695, %c8 : index + %697 = subi %c-1, %696 : index + %698 = select %693, %697, %696 : index + %699 = addi %arg5, %c104 : index + %700 = cmpi "slt", %699, %c0 : index + %701 = subi %c-1, %699 : index + %702 = select %700, %701, %699 : index + %703 = divi_signed %702, %c16 : index + %704 = subi %c-1, %703 : index + %705 = select %700, %704, %703 : index + %706 = muli %705, %c-2 : index + %707 = addi %698, %706 : index + %708 = addi %707, %c13 : index + %709 = cmpi "slt", %708, %c0 : index + %710 = subi %c-1, %708 : index + %711 = select %709, %710, %708 : index + %712 = divi_signed %711, %c2 : index + %713 = subi %c-1, %712 : index + %714 = select %709, %713, %712 : index + %715 = muli %714, %c-2 : index + %716 = addi %692, %715 : index + %717 = addi %716, %c13 : index + store %662, %3[%673, %677, %717] : memref<16x128x2xvector<8xf32>> + %718 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %719 = cmpi "slt", %arg5, %c0 : index + %720 = subi %c-1, %arg5 : index + %721 = select %719, %720, %arg5 : index + %722 = divi_signed %721, %c16 : index + %723 = subi %c-1, %722 : index + %724 = select %719, %723, %722 : index + %725 = cmpi "slt", %arg5, %c0 : index + %726 = subi %c-1, %arg5 : index + %727 = select %725, %726, %arg5 : index + %728 = divi_signed %727, %c16 : index + %729 = subi %c-1, %728 : index + %730 = select %725, %729, %728 : index + %731 = addi %730, %c7 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c16 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = muli %737, %c-16 : index + %739 = addi %724, %738 : index + %740 = addi %739, %c7 : index + %741 = remi_signed %arg4, %c128 : index + %742 = cmpi "slt", %741, %c0 : index + %743 = addi %741, %c128 : index + %744 = select %742, %743, %741 : index + %745 = remi_signed %arg5, %c16 : index + %746 = cmpi "slt", %745, %c0 : index + 
%747 = addi %745, %c16 : index + %748 = select %746, %747, %745 : index + %749 = cmpi "slt", %748, %c0 : index + %750 = subi %c-1, %748 : index + %751 = select %749, %750, %748 : index + %752 = divi_signed %751, %c8 : index + %753 = subi %c-1, %752 : index + %754 = select %749, %753, %752 : index + %755 = remi_signed %754, %c2 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = addi %755, %c2 : index + %758 = select %756, %757, %755 : index + store %718, %3[%740, %744, %758] : memref<16x128x2xvector<8xf32>> + %759 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %760 = addi %arg5, %c120 : index + %761 = cmpi "slt", %760, %c0 : index + %762 = subi %c-1, %760 : index + %763 = select %761, %762, %760 : index + %764 = divi_signed %763, %c16 : index + %765 = subi %c-1, %764 : index + %766 = select %761, %765, %764 : index + %767 = remi_signed %766, %c16 : index + %768 = cmpi "slt", %767, %c0 : index + %769 = addi %767, %c16 : index + %770 = select %768, %769, %767 : index + %771 = remi_signed %arg4, %c128 : index + %772 = cmpi "slt", %771, %c0 : index + %773 = addi %771, %c128 : index + %774 = select %772, %773, %771 : index + %775 = cmpi "slt", %arg5, %c0 : index + %776 = subi %c-1, %arg5 : index + %777 = select %775, %776, %arg5 : index + %778 = divi_signed %777, %c8 : index + %779 = subi %c-1, %778 : index + %780 = select %775, %779, %778 : index + %781 = addi %arg5, %c120 : index + %782 = cmpi "slt", %781, %c0 : index + %783 = subi %c-1, %781 : index + %784 = select %782, %783, %781 : index + %785 = divi_signed %784, %c16 : index + %786 = subi %c-1, %785 : index + %787 = select %782, %786, %785 : index + %788 = muli %787, %c-2 : index + %789 = addi %780, %788 : index + %790 = cmpi "slt", %arg5, %c0 : index + %791 = subi %c-1, %arg5 : index + %792 = select %790, %791, %arg5 : index + %793 = divi_signed %792, %c8 : index + %794 = subi %c-1, %793 : index + %795 = select %790, %794, %793 : index + %796 = addi %arg5, %c120 : index + %797 = cmpi "slt", %796, %c0 : index + %798 = subi %c-1, %796 : index + %799 = select %797, %798, %796 : index + %800 = divi_signed %799, %c16 : index + %801 = subi %c-1, %800 : index + %802 = select %797, %801, %800 : index + %803 = muli %802, %c-2 : index + %804 = addi %795, %803 : index + %805 = addi %804, %c15 : index + %806 = cmpi "slt", %805, %c0 : index + %807 = subi %c-1, %805 : index + %808 = select %806, %807, %805 : index + %809 = divi_signed %808, %c2 : index + %810 = subi %c-1, %809 : index + %811 = select %806, %810, %809 : index + %812 = muli %811, %c-2 : index + %813 = addi %789, %812 : index + %814 = addi %813, %c15 : index + store %759, %3[%770, %774, %814] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg4, %arg7 : index + %7 = addi %6, %arg9 : index + %8 = addi %arg4, %arg7 : index + %9 = addi %8, %arg9 : index + %10 = addi %arg4, %arg7 : index + %11 = addi %10, %arg9 : index + %12 = addi %arg4, %arg7 : index + %13 = addi %12, %arg9 : index + %14 = addi %arg4, %arg7 : index + %15 = addi %14, %arg9 : index + %16 = addi %arg4, 
%arg7 : index + %17 = addi %16, %arg9 : index + %18 = addi %arg4, %arg7 : index + %19 = addi %18, %arg9 : index + %20 = addi %arg6, %arg8 : index + %21 = addi %arg6, %arg8 : index + %22 = addi %arg6, %arg8 : index + %23 = addi %arg6, %arg8 : index + %24 = addi %arg6, %arg8 : index + %25 = addi %arg6, %arg8 : index + %26 = addi %arg6, %arg8 : index + %27 = addi %arg6, %arg8 : index + %28 = load %arg0[%5, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = load %arg0[%7, %21] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg0[%9, %22] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg0[%11, %23] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %32 = load %arg0[%13, %24] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %33 = load %arg0[%15, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %34 = load %arg0[%17, %26] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg0[%19, %27] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %36 = cmpi "slt", %arg5, %c0 : index + %37 = subi %c-1, %arg5 : index + %38 = select %36, %37, %arg5 : index + %39 = divi_signed %38, %c16 : index + %40 = subi %c-1, %39 : index + %41 = select %36, %40, %39 : index + %42 = remi_signed %41, %c16 : index + %43 = cmpi "slt", %42, %c0 : index + %44 = addi %42, %c16 : index + %45 = select %43, %44, %42 : index + %46 = addi %arg6, %arg8 : index + %47 = remi_signed %46, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + %65 = load %3[%45, %50, %64] : memref<16x128x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = cmpi "slt", %arg5, %c0 : index + %68 = subi %c-1, %arg5 : index + %69 = select %67, %68, %arg5 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = addi %arg6, %arg8 : index + %78 = remi_signed %77, %c128 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = addi %78, %c128 : index + %81 = select %79, %80, %78 : index + %82 = remi_signed %arg5, %c16 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = addi %82, %c16 : index + %85 = select %83, %84, %82 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = subi %c-1, %85 : index + %88 = select %86, %87, %85 : index + %89 = divi_signed %88, %c8 : index + %90 = subi %c-1, %89 : index + %91 = select %86, %90, %89 : index + %92 = remi_signed %91, %c2 : index + %93 = cmpi "slt", %92, %c0 : index + %94 = addi %92, %c2 : index + %95 = select %93, %94, %92 : index + %96 = load %3[%76, %81, %95] : memref<16x128x2xvector<8xf32>> + %97 = vector.extractelement %96[%c1_i64 : i64] : vector<8xf32> + %98 = cmpi "slt", %arg5, %c0 : index + %99 = subi %c-1, %arg5 : index + %100 = select %98, %99, %arg5 : 
index + %101 = divi_signed %100, %c16 : index + %102 = subi %c-1, %101 : index + %103 = select %98, %102, %101 : index + %104 = remi_signed %103, %c16 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = addi %104, %c16 : index + %107 = select %105, %106, %104 : index + %108 = addi %arg6, %arg8 : index + %109 = remi_signed %108, %c128 : index + %110 = cmpi "slt", %109, %c0 : index + %111 = addi %109, %c128 : index + %112 = select %110, %111, %109 : index + %113 = remi_signed %arg5, %c16 : index + %114 = cmpi "slt", %113, %c0 : index + %115 = addi %113, %c16 : index + %116 = select %114, %115, %113 : index + %117 = cmpi "slt", %116, %c0 : index + %118 = subi %c-1, %116 : index + %119 = select %117, %118, %116 : index + %120 = divi_signed %119, %c8 : index + %121 = subi %c-1, %120 : index + %122 = select %117, %121, %120 : index + %123 = remi_signed %122, %c2 : index + %124 = cmpi "slt", %123, %c0 : index + %125 = addi %123, %c2 : index + %126 = select %124, %125, %123 : index + %127 = load %3[%107, %112, %126] : memref<16x128x2xvector<8xf32>> + %128 = vector.extractelement %127[%c2_i64 : i64] : vector<8xf32> + %129 = cmpi "slt", %arg5, %c0 : index + %130 = subi %c-1, %arg5 : index + %131 = select %129, %130, %arg5 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = remi_signed %134, %c16 : index + %136 = cmpi "slt", %135, %c0 : index + %137 = addi %135, %c16 : index + %138 = select %136, %137, %135 : index + %139 = addi %arg6, %arg8 : index + %140 = remi_signed %139, %c128 : index + %141 = cmpi "slt", %140, %c0 : index + %142 = addi %140, %c128 : index + %143 = select %141, %142, %140 : index + %144 = remi_signed %arg5, %c16 : index + %145 = cmpi "slt", %144, %c0 : index + %146 = addi %144, %c16 : index + %147 = select %145, %146, %144 : index + %148 = cmpi "slt", %147, %c0 : index + %149 = subi %c-1, %147 : index + %150 = select %148, %149, %147 : index + %151 = divi_signed %150, %c8 : index + %152 = subi %c-1, %151 : index + %153 = select %148, %152, %151 : index + %154 = remi_signed %153, %c2 : index + %155 = cmpi "slt", %154, %c0 : index + %156 = addi %154, %c2 : index + %157 = select %155, %156, %154 : index + %158 = load %3[%138, %143, %157] : memref<16x128x2xvector<8xf32>> + %159 = vector.extractelement %158[%c3_i64 : i64] : vector<8xf32> + %160 = cmpi "slt", %arg5, %c0 : index + %161 = subi %c-1, %arg5 : index + %162 = select %160, %161, %arg5 : index + %163 = divi_signed %162, %c16 : index + %164 = subi %c-1, %163 : index + %165 = select %160, %164, %163 : index + %166 = remi_signed %165, %c16 : index + %167 = cmpi "slt", %166, %c0 : index + %168 = addi %166, %c16 : index + %169 = select %167, %168, %166 : index + %170 = addi %arg6, %arg8 : index + %171 = remi_signed %170, %c128 : index + %172 = cmpi "slt", %171, %c0 : index + %173 = addi %171, %c128 : index + %174 = select %172, %173, %171 : index + %175 = remi_signed %arg5, %c16 : index + %176 = cmpi "slt", %175, %c0 : index + %177 = addi %175, %c16 : index + %178 = select %176, %177, %175 : index + %179 = cmpi "slt", %178, %c0 : index + %180 = subi %c-1, %178 : index + %181 = select %179, %180, %178 : index + %182 = divi_signed %181, %c8 : index + %183 = subi %c-1, %182 : index + %184 = select %179, %183, %182 : index + %185 = remi_signed %184, %c2 : index + %186 = cmpi "slt", %185, %c0 : index + %187 = addi %185, %c2 : index + %188 = select %186, %187, %185 : index + %189 = load %3[%169, %174, %188] : memref<16x128x2xvector<8xf32>> + %190 = 
vector.extractelement %189[%c4_i64 : i64] : vector<8xf32> + %191 = cmpi "slt", %arg5, %c0 : index + %192 = subi %c-1, %arg5 : index + %193 = select %191, %192, %arg5 : index + %194 = divi_signed %193, %c16 : index + %195 = subi %c-1, %194 : index + %196 = select %191, %195, %194 : index + %197 = remi_signed %196, %c16 : index + %198 = cmpi "slt", %197, %c0 : index + %199 = addi %197, %c16 : index + %200 = select %198, %199, %197 : index + %201 = addi %arg6, %arg8 : index + %202 = remi_signed %201, %c128 : index + %203 = cmpi "slt", %202, %c0 : index + %204 = addi %202, %c128 : index + %205 = select %203, %204, %202 : index + %206 = remi_signed %arg5, %c16 : index + %207 = cmpi "slt", %206, %c0 : index + %208 = addi %206, %c16 : index + %209 = select %207, %208, %206 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c8 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c2 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c2 : index + %219 = select %217, %218, %216 : index + %220 = load %3[%200, %205, %219] : memref<16x128x2xvector<8xf32>> + %221 = vector.extractelement %220[%c5_i64 : i64] : vector<8xf32> + %222 = cmpi "slt", %arg5, %c0 : index + %223 = subi %c-1, %arg5 : index + %224 = select %222, %223, %arg5 : index + %225 = divi_signed %224, %c16 : index + %226 = subi %c-1, %225 : index + %227 = select %222, %226, %225 : index + %228 = remi_signed %227, %c16 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = addi %228, %c16 : index + %231 = select %229, %230, %228 : index + %232 = addi %arg6, %arg8 : index + %233 = remi_signed %232, %c128 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = addi %233, %c128 : index + %236 = select %234, %235, %233 : index + %237 = remi_signed %arg5, %c16 : index + %238 = cmpi "slt", %237, %c0 : index + %239 = addi %237, %c16 : index + %240 = select %238, %239, %237 : index + %241 = cmpi "slt", %240, %c0 : index + %242 = subi %c-1, %240 : index + %243 = select %241, %242, %240 : index + %244 = divi_signed %243, %c8 : index + %245 = subi %c-1, %244 : index + %246 = select %241, %245, %244 : index + %247 = remi_signed %246, %c2 : index + %248 = cmpi "slt", %247, %c0 : index + %249 = addi %247, %c2 : index + %250 = select %248, %249, %247 : index + %251 = load %3[%231, %236, %250] : memref<16x128x2xvector<8xf32>> + %252 = vector.extractelement %251[%c6_i64 : i64] : vector<8xf32> + %253 = cmpi "slt", %arg5, %c0 : index + %254 = subi %c-1, %arg5 : index + %255 = select %253, %254, %arg5 : index + %256 = divi_signed %255, %c16 : index + %257 = subi %c-1, %256 : index + %258 = select %253, %257, %256 : index + %259 = remi_signed %258, %c16 : index + %260 = cmpi "slt", %259, %c0 : index + %261 = addi %259, %c16 : index + %262 = select %260, %261, %259 : index + %263 = addi %arg6, %arg8 : index + %264 = remi_signed %263, %c128 : index + %265 = cmpi "slt", %264, %c0 : index + %266 = addi %264, %c128 : index + %267 = select %265, %266, %264 : index + %268 = remi_signed %arg5, %c16 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = addi %268, %c16 : index + %271 = select %269, %270, %268 : index + %272 = cmpi "slt", %271, %c0 : index + %273 = subi %c-1, %271 : index + %274 = select %272, %273, %271 : index + %275 = divi_signed %274, %c8 : index + %276 = subi %c-1, %275 : index + %277 = select %272, %276, %275 : index + %278 = remi_signed %277, %c2 : index + %279 = cmpi "slt", %278, 
%c0 : index + %280 = addi %278, %c2 : index + %281 = select %279, %280, %278 : index + %282 = load %3[%262, %267, %281] : memref<16x128x2xvector<8xf32>> + %283 = vector.extractelement %282[%c7_i64 : i64] : vector<8xf32> + %284 = mulf %28, %66 {RelaxedPrecision} : f32 + %285 = mulf %29, %97 {RelaxedPrecision} : f32 + %286 = mulf %30, %128 {RelaxedPrecision} : f32 + %287 = mulf %31, %159 {RelaxedPrecision} : f32 + %288 = mulf %32, %190 {RelaxedPrecision} : f32 + %289 = mulf %33, %221 {RelaxedPrecision} : f32 + %290 = mulf %34, %252 {RelaxedPrecision} : f32 + %291 = mulf %35, %283 {RelaxedPrecision} : f32 + %292 = cmpi "slt", %arg5, %c0 : index + %293 = subi %c-1, %arg5 : index + %294 = select %292, %293, %arg5 : index + %295 = divi_signed %294, %c16 : index + %296 = subi %c-1, %295 : index + %297 = select %292, %296, %295 : index + %298 = remi_signed %297, %c16 : index + %299 = cmpi "slt", %298, %c0 : index + %300 = addi %298, %c16 : index + %301 = select %299, %300, %298 : index + %302 = addi %arg7, %arg9 : index + %303 = remi_signed %302, %c6 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = addi %303, %c6 : index + %306 = select %304, %305, %303 : index + %307 = remi_signed %arg5, %c16 : index + %308 = cmpi "slt", %307, %c0 : index + %309 = addi %307, %c16 : index + %310 = select %308, %309, %307 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c8 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = remi_signed %316, %c2 : index + %318 = cmpi "slt", %317, %c0 : index + %319 = addi %317, %c2 : index + %320 = select %318, %319, %317 : index + %321 = load %2[%301, %306, %320] : memref<16x6x2xvector<8xf32>> + %322 = vector.extractelement %321[%c0_i64 : i64] : vector<8xf32> + %323 = cmpi "slt", %arg5, %c0 : index + %324 = subi %c-1, %arg5 : index + %325 = select %323, %324, %arg5 : index + %326 = divi_signed %325, %c16 : index + %327 = subi %c-1, %326 : index + %328 = select %323, %327, %326 : index + %329 = remi_signed %328, %c16 : index + %330 = cmpi "slt", %329, %c0 : index + %331 = addi %329, %c16 : index + %332 = select %330, %331, %329 : index + %333 = addi %arg7, %arg9 : index + %334 = remi_signed %333, %c6 : index + %335 = cmpi "slt", %334, %c0 : index + %336 = addi %334, %c6 : index + %337 = select %335, %336, %334 : index + %338 = remi_signed %arg5, %c16 : index + %339 = cmpi "slt", %338, %c0 : index + %340 = addi %338, %c16 : index + %341 = select %339, %340, %338 : index + %342 = cmpi "slt", %341, %c0 : index + %343 = subi %c-1, %341 : index + %344 = select %342, %343, %341 : index + %345 = divi_signed %344, %c8 : index + %346 = subi %c-1, %345 : index + %347 = select %342, %346, %345 : index + %348 = remi_signed %347, %c2 : index + %349 = cmpi "slt", %348, %c0 : index + %350 = addi %348, %c2 : index + %351 = select %349, %350, %348 : index + %352 = load %2[%332, %337, %351] : memref<16x6x2xvector<8xf32>> + %353 = vector.extractelement %352[%c1_i64 : i64] : vector<8xf32> + %354 = cmpi "slt", %arg5, %c0 : index + %355 = subi %c-1, %arg5 : index + %356 = select %354, %355, %arg5 : index + %357 = divi_signed %356, %c16 : index + %358 = subi %c-1, %357 : index + %359 = select %354, %358, %357 : index + %360 = remi_signed %359, %c16 : index + %361 = cmpi "slt", %360, %c0 : index + %362 = addi %360, %c16 : index + %363 = select %361, %362, %360 : index + %364 = addi %arg7, %arg9 : index + %365 = remi_signed %364, %c6 : index + %366 = cmpi "slt", 
%365, %c0 : index + %367 = addi %365, %c6 : index + %368 = select %366, %367, %365 : index + %369 = remi_signed %arg5, %c16 : index + %370 = cmpi "slt", %369, %c0 : index + %371 = addi %369, %c16 : index + %372 = select %370, %371, %369 : index + %373 = cmpi "slt", %372, %c0 : index + %374 = subi %c-1, %372 : index + %375 = select %373, %374, %372 : index + %376 = divi_signed %375, %c8 : index + %377 = subi %c-1, %376 : index + %378 = select %373, %377, %376 : index + %379 = remi_signed %378, %c2 : index + %380 = cmpi "slt", %379, %c0 : index + %381 = addi %379, %c2 : index + %382 = select %380, %381, %379 : index + %383 = load %2[%363, %368, %382] : memref<16x6x2xvector<8xf32>> + %384 = vector.extractelement %383[%c2_i64 : i64] : vector<8xf32> + %385 = cmpi "slt", %arg5, %c0 : index + %386 = subi %c-1, %arg5 : index + %387 = select %385, %386, %arg5 : index + %388 = divi_signed %387, %c16 : index + %389 = subi %c-1, %388 : index + %390 = select %385, %389, %388 : index + %391 = remi_signed %390, %c16 : index + %392 = cmpi "slt", %391, %c0 : index + %393 = addi %391, %c16 : index + %394 = select %392, %393, %391 : index + %395 = addi %arg7, %arg9 : index + %396 = remi_signed %395, %c6 : index + %397 = cmpi "slt", %396, %c0 : index + %398 = addi %396, %c6 : index + %399 = select %397, %398, %396 : index + %400 = remi_signed %arg5, %c16 : index + %401 = cmpi "slt", %400, %c0 : index + %402 = addi %400, %c16 : index + %403 = select %401, %402, %400 : index + %404 = cmpi "slt", %403, %c0 : index + %405 = subi %c-1, %403 : index + %406 = select %404, %405, %403 : index + %407 = divi_signed %406, %c8 : index + %408 = subi %c-1, %407 : index + %409 = select %404, %408, %407 : index + %410 = remi_signed %409, %c2 : index + %411 = cmpi "slt", %410, %c0 : index + %412 = addi %410, %c2 : index + %413 = select %411, %412, %410 : index + %414 = load %2[%394, %399, %413] : memref<16x6x2xvector<8xf32>> + %415 = vector.extractelement %414[%c3_i64 : i64] : vector<8xf32> + %416 = cmpi "slt", %arg5, %c0 : index + %417 = subi %c-1, %arg5 : index + %418 = select %416, %417, %arg5 : index + %419 = divi_signed %418, %c16 : index + %420 = subi %c-1, %419 : index + %421 = select %416, %420, %419 : index + %422 = remi_signed %421, %c16 : index + %423 = cmpi "slt", %422, %c0 : index + %424 = addi %422, %c16 : index + %425 = select %423, %424, %422 : index + %426 = addi %arg7, %arg9 : index + %427 = remi_signed %426, %c6 : index + %428 = cmpi "slt", %427, %c0 : index + %429 = addi %427, %c6 : index + %430 = select %428, %429, %427 : index + %431 = remi_signed %arg5, %c16 : index + %432 = cmpi "slt", %431, %c0 : index + %433 = addi %431, %c16 : index + %434 = select %432, %433, %431 : index + %435 = cmpi "slt", %434, %c0 : index + %436 = subi %c-1, %434 : index + %437 = select %435, %436, %434 : index + %438 = divi_signed %437, %c8 : index + %439 = subi %c-1, %438 : index + %440 = select %435, %439, %438 : index + %441 = remi_signed %440, %c2 : index + %442 = cmpi "slt", %441, %c0 : index + %443 = addi %441, %c2 : index + %444 = select %442, %443, %441 : index + %445 = load %2[%425, %430, %444] : memref<16x6x2xvector<8xf32>> + %446 = vector.extractelement %445[%c4_i64 : i64] : vector<8xf32> + %447 = cmpi "slt", %arg5, %c0 : index + %448 = subi %c-1, %arg5 : index + %449 = select %447, %448, %arg5 : index + %450 = divi_signed %449, %c16 : index + %451 = subi %c-1, %450 : index + %452 = select %447, %451, %450 : index + %453 = remi_signed %452, %c16 : index + %454 = cmpi "slt", %453, %c0 : index + %455 = addi %453, %c16 
: index + %456 = select %454, %455, %453 : index + %457 = addi %arg7, %arg9 : index + %458 = remi_signed %457, %c6 : index + %459 = cmpi "slt", %458, %c0 : index + %460 = addi %458, %c6 : index + %461 = select %459, %460, %458 : index + %462 = remi_signed %arg5, %c16 : index + %463 = cmpi "slt", %462, %c0 : index + %464 = addi %462, %c16 : index + %465 = select %463, %464, %462 : index + %466 = cmpi "slt", %465, %c0 : index + %467 = subi %c-1, %465 : index + %468 = select %466, %467, %465 : index + %469 = divi_signed %468, %c8 : index + %470 = subi %c-1, %469 : index + %471 = select %466, %470, %469 : index + %472 = remi_signed %471, %c2 : index + %473 = cmpi "slt", %472, %c0 : index + %474 = addi %472, %c2 : index + %475 = select %473, %474, %472 : index + %476 = load %2[%456, %461, %475] : memref<16x6x2xvector<8xf32>> + %477 = vector.extractelement %476[%c5_i64 : i64] : vector<8xf32> + %478 = cmpi "slt", %arg5, %c0 : index + %479 = subi %c-1, %arg5 : index + %480 = select %478, %479, %arg5 : index + %481 = divi_signed %480, %c16 : index + %482 = subi %c-1, %481 : index + %483 = select %478, %482, %481 : index + %484 = remi_signed %483, %c16 : index + %485 = cmpi "slt", %484, %c0 : index + %486 = addi %484, %c16 : index + %487 = select %485, %486, %484 : index + %488 = addi %arg7, %arg9 : index + %489 = remi_signed %488, %c6 : index + %490 = cmpi "slt", %489, %c0 : index + %491 = addi %489, %c6 : index + %492 = select %490, %491, %489 : index + %493 = remi_signed %arg5, %c16 : index + %494 = cmpi "slt", %493, %c0 : index + %495 = addi %493, %c16 : index + %496 = select %494, %495, %493 : index + %497 = cmpi "slt", %496, %c0 : index + %498 = subi %c-1, %496 : index + %499 = select %497, %498, %496 : index + %500 = divi_signed %499, %c8 : index + %501 = subi %c-1, %500 : index + %502 = select %497, %501, %500 : index + %503 = remi_signed %502, %c2 : index + %504 = cmpi "slt", %503, %c0 : index + %505 = addi %503, %c2 : index + %506 = select %504, %505, %503 : index + %507 = load %2[%487, %492, %506] : memref<16x6x2xvector<8xf32>> + %508 = vector.extractelement %507[%c6_i64 : i64] : vector<8xf32> + %509 = cmpi "slt", %arg5, %c0 : index + %510 = subi %c-1, %arg5 : index + %511 = select %509, %510, %arg5 : index + %512 = divi_signed %511, %c16 : index + %513 = subi %c-1, %512 : index + %514 = select %509, %513, %512 : index + %515 = remi_signed %514, %c16 : index + %516 = cmpi "slt", %515, %c0 : index + %517 = addi %515, %c16 : index + %518 = select %516, %517, %515 : index + %519 = addi %arg7, %arg9 : index + %520 = remi_signed %519, %c6 : index + %521 = cmpi "slt", %520, %c0 : index + %522 = addi %520, %c6 : index + %523 = select %521, %522, %520 : index + %524 = remi_signed %arg5, %c16 : index + %525 = cmpi "slt", %524, %c0 : index + %526 = addi %524, %c16 : index + %527 = select %525, %526, %524 : index + %528 = cmpi "slt", %527, %c0 : index + %529 = subi %c-1, %527 : index + %530 = select %528, %529, %527 : index + %531 = divi_signed %530, %c8 : index + %532 = subi %c-1, %531 : index + %533 = select %528, %532, %531 : index + %534 = remi_signed %533, %c2 : index + %535 = cmpi "slt", %534, %c0 : index + %536 = addi %534, %c2 : index + %537 = select %535, %536, %534 : index + %538 = load %2[%518, %523, %537] : memref<16x6x2xvector<8xf32>> + %539 = vector.extractelement %538[%c7_i64 : i64] : vector<8xf32> + %540 = addf %322, %284 {RelaxedPrecision} : f32 + %541 = addf %353, %285 {RelaxedPrecision} : f32 + %542 = addf %384, %286 {RelaxedPrecision} : f32 + %543 = addf %415, %287 
{RelaxedPrecision} : f32 + %544 = addf %446, %288 {RelaxedPrecision} : f32 + %545 = addf %477, %289 {RelaxedPrecision} : f32 + %546 = addf %508, %290 {RelaxedPrecision} : f32 + %547 = addf %539, %291 {RelaxedPrecision} : f32 + %548 = cmpi "slt", %arg5, %c0 : index + %549 = subi %c-1, %arg5 : index + %550 = select %548, %549, %arg5 : index + %551 = divi_signed %550, %c16 : index + %552 = subi %c-1, %551 : index + %553 = select %548, %552, %551 : index + %554 = remi_signed %553, %c16 : index + %555 = cmpi "slt", %554, %c0 : index + %556 = addi %554, %c16 : index + %557 = select %555, %556, %554 : index + %558 = addi %arg7, %arg9 : index + %559 = remi_signed %558, %c6 : index + %560 = cmpi "slt", %559, %c0 : index + %561 = addi %559, %c6 : index + %562 = select %560, %561, %559 : index + %563 = remi_signed %arg5, %c16 : index + %564 = cmpi "slt", %563, %c0 : index + %565 = addi %563, %c16 : index + %566 = select %564, %565, %563 : index + %567 = cmpi "slt", %566, %c0 : index + %568 = subi %c-1, %566 : index + %569 = select %567, %568, %566 : index + %570 = divi_signed %569, %c8 : index + %571 = subi %c-1, %570 : index + %572 = select %567, %571, %570 : index + %573 = remi_signed %572, %c2 : index + %574 = cmpi "slt", %573, %c0 : index + %575 = addi %573, %c2 : index + %576 = select %574, %575, %573 : index + %577 = load %2[%557, %562, %576] : memref<16x6x2xvector<8xf32>> + %578 = vector.insertelement %540, %577[%c0_i64 : i64] : vector<8xf32> + %579 = cmpi "slt", %arg5, %c0 : index + %580 = subi %c-1, %arg5 : index + %581 = select %579, %580, %arg5 : index + %582 = divi_signed %581, %c16 : index + %583 = subi %c-1, %582 : index + %584 = select %579, %583, %582 : index + %585 = remi_signed %584, %c16 : index + %586 = cmpi "slt", %585, %c0 : index + %587 = addi %585, %c16 : index + %588 = select %586, %587, %585 : index + %589 = addi %arg7, %arg9 : index + %590 = remi_signed %589, %c6 : index + %591 = cmpi "slt", %590, %c0 : index + %592 = addi %590, %c6 : index + %593 = select %591, %592, %590 : index + %594 = remi_signed %arg5, %c16 : index + %595 = cmpi "slt", %594, %c0 : index + %596 = addi %594, %c16 : index + %597 = select %595, %596, %594 : index + %598 = cmpi "slt", %597, %c0 : index + %599 = subi %c-1, %597 : index + %600 = select %598, %599, %597 : index + %601 = divi_signed %600, %c8 : index + %602 = subi %c-1, %601 : index + %603 = select %598, %602, %601 : index + %604 = remi_signed %603, %c2 : index + %605 = cmpi "slt", %604, %c0 : index + %606 = addi %604, %c2 : index + %607 = select %605, %606, %604 : index + store %578, %2[%588, %593, %607] : memref<16x6x2xvector<8xf32>> + %608 = cmpi "slt", %arg5, %c0 : index + %609 = subi %c-1, %arg5 : index + %610 = select %608, %609, %arg5 : index + %611 = divi_signed %610, %c16 : index + %612 = subi %c-1, %611 : index + %613 = select %608, %612, %611 : index + %614 = remi_signed %613, %c16 : index + %615 = cmpi "slt", %614, %c0 : index + %616 = addi %614, %c16 : index + %617 = select %615, %616, %614 : index + %618 = addi %arg7, %arg9 : index + %619 = remi_signed %618, %c6 : index + %620 = cmpi "slt", %619, %c0 : index + %621 = addi %619, %c6 : index + %622 = select %620, %621, %619 : index + %623 = remi_signed %arg5, %c16 : index + %624 = cmpi "slt", %623, %c0 : index + %625 = addi %623, %c16 : index + %626 = select %624, %625, %623 : index + %627 = cmpi "slt", %626, %c0 : index + %628 = subi %c-1, %626 : index + %629 = select %627, %628, %626 : index + %630 = divi_signed %629, %c8 : index + %631 = subi %c-1, %630 : index + %632 = select 
%627, %631, %630 : index + %633 = remi_signed %632, %c2 : index + %634 = cmpi "slt", %633, %c0 : index + %635 = addi %633, %c2 : index + %636 = select %634, %635, %633 : index + %637 = load %2[%617, %622, %636] : memref<16x6x2xvector<8xf32>> + %638 = vector.insertelement %541, %637[%c1_i64 : i64] : vector<8xf32> + %639 = cmpi "slt", %arg5, %c0 : index + %640 = subi %c-1, %arg5 : index + %641 = select %639, %640, %arg5 : index + %642 = divi_signed %641, %c16 : index + %643 = subi %c-1, %642 : index + %644 = select %639, %643, %642 : index + %645 = remi_signed %644, %c16 : index + %646 = cmpi "slt", %645, %c0 : index + %647 = addi %645, %c16 : index + %648 = select %646, %647, %645 : index + %649 = addi %arg7, %arg9 : index + %650 = remi_signed %649, %c6 : index + %651 = cmpi "slt", %650, %c0 : index + %652 = addi %650, %c6 : index + %653 = select %651, %652, %650 : index + %654 = remi_signed %arg5, %c16 : index + %655 = cmpi "slt", %654, %c0 : index + %656 = addi %654, %c16 : index + %657 = select %655, %656, %654 : index + %658 = cmpi "slt", %657, %c0 : index + %659 = subi %c-1, %657 : index + %660 = select %658, %659, %657 : index + %661 = divi_signed %660, %c8 : index + %662 = subi %c-1, %661 : index + %663 = select %658, %662, %661 : index + %664 = remi_signed %663, %c2 : index + %665 = cmpi "slt", %664, %c0 : index + %666 = addi %664, %c2 : index + %667 = select %665, %666, %664 : index + store %638, %2[%648, %653, %667] : memref<16x6x2xvector<8xf32>> + %668 = cmpi "slt", %arg5, %c0 : index + %669 = subi %c-1, %arg5 : index + %670 = select %668, %669, %arg5 : index + %671 = divi_signed %670, %c16 : index + %672 = subi %c-1, %671 : index + %673 = select %668, %672, %671 : index + %674 = remi_signed %673, %c16 : index + %675 = cmpi "slt", %674, %c0 : index + %676 = addi %674, %c16 : index + %677 = select %675, %676, %674 : index + %678 = addi %arg7, %arg9 : index + %679 = remi_signed %678, %c6 : index + %680 = cmpi "slt", %679, %c0 : index + %681 = addi %679, %c6 : index + %682 = select %680, %681, %679 : index + %683 = remi_signed %arg5, %c16 : index + %684 = cmpi "slt", %683, %c0 : index + %685 = addi %683, %c16 : index + %686 = select %684, %685, %683 : index + %687 = cmpi "slt", %686, %c0 : index + %688 = subi %c-1, %686 : index + %689 = select %687, %688, %686 : index + %690 = divi_signed %689, %c8 : index + %691 = subi %c-1, %690 : index + %692 = select %687, %691, %690 : index + %693 = remi_signed %692, %c2 : index + %694 = cmpi "slt", %693, %c0 : index + %695 = addi %693, %c2 : index + %696 = select %694, %695, %693 : index + %697 = load %2[%677, %682, %696] : memref<16x6x2xvector<8xf32>> + %698 = vector.insertelement %542, %697[%c2_i64 : i64] : vector<8xf32> + %699 = cmpi "slt", %arg5, %c0 : index + %700 = subi %c-1, %arg5 : index + %701 = select %699, %700, %arg5 : index + %702 = divi_signed %701, %c16 : index + %703 = subi %c-1, %702 : index + %704 = select %699, %703, %702 : index + %705 = remi_signed %704, %c16 : index + %706 = cmpi "slt", %705, %c0 : index + %707 = addi %705, %c16 : index + %708 = select %706, %707, %705 : index + %709 = addi %arg7, %arg9 : index + %710 = remi_signed %709, %c6 : index + %711 = cmpi "slt", %710, %c0 : index + %712 = addi %710, %c6 : index + %713 = select %711, %712, %710 : index + %714 = remi_signed %arg5, %c16 : index + %715 = cmpi "slt", %714, %c0 : index + %716 = addi %714, %c16 : index + %717 = select %715, %716, %714 : index + %718 = cmpi "slt", %717, %c0 : index + %719 = subi %c-1, %717 : index + %720 = select %718, %719, %717 : index 
+ %721 = divi_signed %720, %c8 : index + %722 = subi %c-1, %721 : index + %723 = select %718, %722, %721 : index + %724 = remi_signed %723, %c2 : index + %725 = cmpi "slt", %724, %c0 : index + %726 = addi %724, %c2 : index + %727 = select %725, %726, %724 : index + store %698, %2[%708, %713, %727] : memref<16x6x2xvector<8xf32>> + %728 = cmpi "slt", %arg5, %c0 : index + %729 = subi %c-1, %arg5 : index + %730 = select %728, %729, %arg5 : index + %731 = divi_signed %730, %c16 : index + %732 = subi %c-1, %731 : index + %733 = select %728, %732, %731 : index + %734 = remi_signed %733, %c16 : index + %735 = cmpi "slt", %734, %c0 : index + %736 = addi %734, %c16 : index + %737 = select %735, %736, %734 : index + %738 = addi %arg7, %arg9 : index + %739 = remi_signed %738, %c6 : index + %740 = cmpi "slt", %739, %c0 : index + %741 = addi %739, %c6 : index + %742 = select %740, %741, %739 : index + %743 = remi_signed %arg5, %c16 : index + %744 = cmpi "slt", %743, %c0 : index + %745 = addi %743, %c16 : index + %746 = select %744, %745, %743 : index + %747 = cmpi "slt", %746, %c0 : index + %748 = subi %c-1, %746 : index + %749 = select %747, %748, %746 : index + %750 = divi_signed %749, %c8 : index + %751 = subi %c-1, %750 : index + %752 = select %747, %751, %750 : index + %753 = remi_signed %752, %c2 : index + %754 = cmpi "slt", %753, %c0 : index + %755 = addi %753, %c2 : index + %756 = select %754, %755, %753 : index + %757 = load %2[%737, %742, %756] : memref<16x6x2xvector<8xf32>> + %758 = vector.insertelement %543, %757[%c3_i64 : i64] : vector<8xf32> + %759 = cmpi "slt", %arg5, %c0 : index + %760 = subi %c-1, %arg5 : index + %761 = select %759, %760, %arg5 : index + %762 = divi_signed %761, %c16 : index + %763 = subi %c-1, %762 : index + %764 = select %759, %763, %762 : index + %765 = remi_signed %764, %c16 : index + %766 = cmpi "slt", %765, %c0 : index + %767 = addi %765, %c16 : index + %768 = select %766, %767, %765 : index + %769 = addi %arg7, %arg9 : index + %770 = remi_signed %769, %c6 : index + %771 = cmpi "slt", %770, %c0 : index + %772 = addi %770, %c6 : index + %773 = select %771, %772, %770 : index + %774 = remi_signed %arg5, %c16 : index + %775 = cmpi "slt", %774, %c0 : index + %776 = addi %774, %c16 : index + %777 = select %775, %776, %774 : index + %778 = cmpi "slt", %777, %c0 : index + %779 = subi %c-1, %777 : index + %780 = select %778, %779, %777 : index + %781 = divi_signed %780, %c8 : index + %782 = subi %c-1, %781 : index + %783 = select %778, %782, %781 : index + %784 = remi_signed %783, %c2 : index + %785 = cmpi "slt", %784, %c0 : index + %786 = addi %784, %c2 : index + %787 = select %785, %786, %784 : index + store %758, %2[%768, %773, %787] : memref<16x6x2xvector<8xf32>> + %788 = cmpi "slt", %arg5, %c0 : index + %789 = subi %c-1, %arg5 : index + %790 = select %788, %789, %arg5 : index + %791 = divi_signed %790, %c16 : index + %792 = subi %c-1, %791 : index + %793 = select %788, %792, %791 : index + %794 = remi_signed %793, %c16 : index + %795 = cmpi "slt", %794, %c0 : index + %796 = addi %794, %c16 : index + %797 = select %795, %796, %794 : index + %798 = addi %arg7, %arg9 : index + %799 = remi_signed %798, %c6 : index + %800 = cmpi "slt", %799, %c0 : index + %801 = addi %799, %c6 : index + %802 = select %800, %801, %799 : index + %803 = remi_signed %arg5, %c16 : index + %804 = cmpi "slt", %803, %c0 : index + %805 = addi %803, %c16 : index + %806 = select %804, %805, %803 : index + %807 = cmpi "slt", %806, %c0 : index + %808 = subi %c-1, %806 : index + %809 = select %807, 
%808, %806 : index + %810 = divi_signed %809, %c8 : index + %811 = subi %c-1, %810 : index + %812 = select %807, %811, %810 : index + %813 = remi_signed %812, %c2 : index + %814 = cmpi "slt", %813, %c0 : index + %815 = addi %813, %c2 : index + %816 = select %814, %815, %813 : index + %817 = load %2[%797, %802, %816] : memref<16x6x2xvector<8xf32>> + %818 = vector.insertelement %544, %817[%c4_i64 : i64] : vector<8xf32> + %819 = cmpi "slt", %arg5, %c0 : index + %820 = subi %c-1, %arg5 : index + %821 = select %819, %820, %arg5 : index + %822 = divi_signed %821, %c16 : index + %823 = subi %c-1, %822 : index + %824 = select %819, %823, %822 : index + %825 = remi_signed %824, %c16 : index + %826 = cmpi "slt", %825, %c0 : index + %827 = addi %825, %c16 : index + %828 = select %826, %827, %825 : index + %829 = addi %arg7, %arg9 : index + %830 = remi_signed %829, %c6 : index + %831 = cmpi "slt", %830, %c0 : index + %832 = addi %830, %c6 : index + %833 = select %831, %832, %830 : index + %834 = remi_signed %arg5, %c16 : index + %835 = cmpi "slt", %834, %c0 : index + %836 = addi %834, %c16 : index + %837 = select %835, %836, %834 : index + %838 = cmpi "slt", %837, %c0 : index + %839 = subi %c-1, %837 : index + %840 = select %838, %839, %837 : index + %841 = divi_signed %840, %c8 : index + %842 = subi %c-1, %841 : index + %843 = select %838, %842, %841 : index + %844 = remi_signed %843, %c2 : index + %845 = cmpi "slt", %844, %c0 : index + %846 = addi %844, %c2 : index + %847 = select %845, %846, %844 : index + store %818, %2[%828, %833, %847] : memref<16x6x2xvector<8xf32>> + %848 = cmpi "slt", %arg5, %c0 : index + %849 = subi %c-1, %arg5 : index + %850 = select %848, %849, %arg5 : index + %851 = divi_signed %850, %c16 : index + %852 = subi %c-1, %851 : index + %853 = select %848, %852, %851 : index + %854 = remi_signed %853, %c16 : index + %855 = cmpi "slt", %854, %c0 : index + %856 = addi %854, %c16 : index + %857 = select %855, %856, %854 : index + %858 = addi %arg7, %arg9 : index + %859 = remi_signed %858, %c6 : index + %860 = cmpi "slt", %859, %c0 : index + %861 = addi %859, %c6 : index + %862 = select %860, %861, %859 : index + %863 = remi_signed %arg5, %c16 : index + %864 = cmpi "slt", %863, %c0 : index + %865 = addi %863, %c16 : index + %866 = select %864, %865, %863 : index + %867 = cmpi "slt", %866, %c0 : index + %868 = subi %c-1, %866 : index + %869 = select %867, %868, %866 : index + %870 = divi_signed %869, %c8 : index + %871 = subi %c-1, %870 : index + %872 = select %867, %871, %870 : index + %873 = remi_signed %872, %c2 : index + %874 = cmpi "slt", %873, %c0 : index + %875 = addi %873, %c2 : index + %876 = select %874, %875, %873 : index + %877 = load %2[%857, %862, %876] : memref<16x6x2xvector<8xf32>> + %878 = vector.insertelement %545, %877[%c5_i64 : i64] : vector<8xf32> + %879 = cmpi "slt", %arg5, %c0 : index + %880 = subi %c-1, %arg5 : index + %881 = select %879, %880, %arg5 : index + %882 = divi_signed %881, %c16 : index + %883 = subi %c-1, %882 : index + %884 = select %879, %883, %882 : index + %885 = remi_signed %884, %c16 : index + %886 = cmpi "slt", %885, %c0 : index + %887 = addi %885, %c16 : index + %888 = select %886, %887, %885 : index + %889 = addi %arg7, %arg9 : index + %890 = remi_signed %889, %c6 : index + %891 = cmpi "slt", %890, %c0 : index + %892 = addi %890, %c6 : index + %893 = select %891, %892, %890 : index + %894 = remi_signed %arg5, %c16 : index + %895 = cmpi "slt", %894, %c0 : index + %896 = addi %894, %c16 : index + %897 = select %895, %896, %894 : index + %898 
= cmpi "slt", %897, %c0 : index + %899 = subi %c-1, %897 : index + %900 = select %898, %899, %897 : index + %901 = divi_signed %900, %c8 : index + %902 = subi %c-1, %901 : index + %903 = select %898, %902, %901 : index + %904 = remi_signed %903, %c2 : index + %905 = cmpi "slt", %904, %c0 : index + %906 = addi %904, %c2 : index + %907 = select %905, %906, %904 : index + store %878, %2[%888, %893, %907] : memref<16x6x2xvector<8xf32>> + %908 = cmpi "slt", %arg5, %c0 : index + %909 = subi %c-1, %arg5 : index + %910 = select %908, %909, %arg5 : index + %911 = divi_signed %910, %c16 : index + %912 = subi %c-1, %911 : index + %913 = select %908, %912, %911 : index + %914 = remi_signed %913, %c16 : index + %915 = cmpi "slt", %914, %c0 : index + %916 = addi %914, %c16 : index + %917 = select %915, %916, %914 : index + %918 = addi %arg7, %arg9 : index + %919 = remi_signed %918, %c6 : index + %920 = cmpi "slt", %919, %c0 : index + %921 = addi %919, %c6 : index + %922 = select %920, %921, %919 : index + %923 = remi_signed %arg5, %c16 : index + %924 = cmpi "slt", %923, %c0 : index + %925 = addi %923, %c16 : index + %926 = select %924, %925, %923 : index + %927 = cmpi "slt", %926, %c0 : index + %928 = subi %c-1, %926 : index + %929 = select %927, %928, %926 : index + %930 = divi_signed %929, %c8 : index + %931 = subi %c-1, %930 : index + %932 = select %927, %931, %930 : index + %933 = remi_signed %932, %c2 : index + %934 = cmpi "slt", %933, %c0 : index + %935 = addi %933, %c2 : index + %936 = select %934, %935, %933 : index + %937 = load %2[%917, %922, %936] : memref<16x6x2xvector<8xf32>> + %938 = vector.insertelement %546, %937[%c6_i64 : i64] : vector<8xf32> + %939 = cmpi "slt", %arg5, %c0 : index + %940 = subi %c-1, %arg5 : index + %941 = select %939, %940, %arg5 : index + %942 = divi_signed %941, %c16 : index + %943 = subi %c-1, %942 : index + %944 = select %939, %943, %942 : index + %945 = remi_signed %944, %c16 : index + %946 = cmpi "slt", %945, %c0 : index + %947 = addi %945, %c16 : index + %948 = select %946, %947, %945 : index + %949 = addi %arg7, %arg9 : index + %950 = remi_signed %949, %c6 : index + %951 = cmpi "slt", %950, %c0 : index + %952 = addi %950, %c6 : index + %953 = select %951, %952, %950 : index + %954 = remi_signed %arg5, %c16 : index + %955 = cmpi "slt", %954, %c0 : index + %956 = addi %954, %c16 : index + %957 = select %955, %956, %954 : index + %958 = cmpi "slt", %957, %c0 : index + %959 = subi %c-1, %957 : index + %960 = select %958, %959, %957 : index + %961 = divi_signed %960, %c8 : index + %962 = subi %c-1, %961 : index + %963 = select %958, %962, %961 : index + %964 = remi_signed %963, %c2 : index + %965 = cmpi "slt", %964, %c0 : index + %966 = addi %964, %c2 : index + %967 = select %965, %966, %964 : index + store %938, %2[%948, %953, %967] : memref<16x6x2xvector<8xf32>> + %968 = cmpi "slt", %arg5, %c0 : index + %969 = subi %c-1, %arg5 : index + %970 = select %968, %969, %arg5 : index + %971 = divi_signed %970, %c16 : index + %972 = subi %c-1, %971 : index + %973 = select %968, %972, %971 : index + %974 = remi_signed %973, %c16 : index + %975 = cmpi "slt", %974, %c0 : index + %976 = addi %974, %c16 : index + %977 = select %975, %976, %974 : index + %978 = addi %arg7, %arg9 : index + %979 = remi_signed %978, %c6 : index + %980 = cmpi "slt", %979, %c0 : index + %981 = addi %979, %c6 : index + %982 = select %980, %981, %979 : index + %983 = remi_signed %arg5, %c16 : index + %984 = cmpi "slt", %983, %c0 : index + %985 = addi %983, %c16 : index + %986 = select %984, %985, %983 
: index + %987 = cmpi "slt", %986, %c0 : index + %988 = subi %c-1, %986 : index + %989 = select %987, %988, %986 : index + %990 = divi_signed %989, %c8 : index + %991 = subi %c-1, %990 : index + %992 = select %987, %991, %990 : index + %993 = remi_signed %992, %c2 : index + %994 = cmpi "slt", %993, %c0 : index + %995 = addi %993, %c2 : index + %996 = select %994, %995, %993 : index + %997 = load %2[%977, %982, %996] : memref<16x6x2xvector<8xf32>> + %998 = vector.insertelement %547, %997[%c7_i64 : i64] : vector<8xf32> + %999 = cmpi "slt", %arg5, %c0 : index + %1000 = subi %c-1, %arg5 : index + %1001 = select %999, %1000, %arg5 : index + %1002 = divi_signed %1001, %c16 : index + %1003 = subi %c-1, %1002 : index + %1004 = select %999, %1003, %1002 : index + %1005 = remi_signed %1004, %c16 : index + %1006 = cmpi "slt", %1005, %c0 : index + %1007 = addi %1005, %c16 : index + %1008 = select %1006, %1007, %1005 : index + %1009 = addi %arg7, %arg9 : index + %1010 = remi_signed %1009, %c6 : index + %1011 = cmpi "slt", %1010, %c0 : index + %1012 = addi %1010, %c6 : index + %1013 = select %1011, %1012, %1010 : index + %1014 = remi_signed %arg5, %c16 : index + %1015 = cmpi "slt", %1014, %c0 : index + %1016 = addi %1014, %c16 : index + %1017 = select %1015, %1016, %1014 : index + %1018 = cmpi "slt", %1017, %c0 : index + %1019 = subi %c-1, %1017 : index + %1020 = select %1018, %1019, %1017 : index + %1021 = divi_signed %1020, %c8 : index + %1022 = subi %c-1, %1021 : index + %1023 = select %1018, %1022, %1021 : index + %1024 = remi_signed %1023, %c2 : index + %1025 = cmpi "slt", %1024, %c0 : index + %1026 = addi %1024, %c2 : index + %1027 = select %1025, %1026, %1024 : index + store %998, %2[%1008, %1013, %1027] : memref<16x6x2xvector<8xf32>> + %1028 = cmpi "slt", %arg5, %c0 : index + %1029 = subi %c-1, %arg5 : index + %1030 = select %1028, %1029, %arg5 : index + %1031 = divi_signed %1030, %c16 : index + %1032 = subi %c-1, %1031 : index + %1033 = select %1028, %1032, %1031 : index + %1034 = remi_signed %1033, %c16 : index + %1035 = cmpi "slt", %1034, %c0 : index + %1036 = addi %1034, %c16 : index + %1037 = select %1035, %1036, %1034 : index + %1038 = addi %arg7, %arg9 : index + %1039 = remi_signed %1038, %c6 : index + %1040 = cmpi "slt", %1039, %c0 : index + %1041 = addi %1039, %c6 : index + %1042 = select %1040, %1041, %1039 : index + %1043 = remi_signed %arg5, %c16 : index + %1044 = cmpi "slt", %1043, %c0 : index + %1045 = addi %1043, %c16 : index + %1046 = select %1044, %1045, %1043 : index + %1047 = cmpi "slt", %1046, %c0 : index + %1048 = subi %c-1, %1046 : index + %1049 = select %1047, %1048, %1046 : index + %1050 = divi_signed %1049, %c8 : index + %1051 = subi %c-1, %1050 : index + %1052 = select %1047, %1051, %1050 : index + %1053 = remi_signed %1052, %c2 : index + %1054 = cmpi "slt", %1053, %c0 : index + %1055 = addi %1053, %c2 : index + %1056 = select %1054, %1055, %1053 : index + %1057 = load %2[%1037, %1042, %1056] : memref<16x6x2xvector<8xf32>> + %1058 = vector.insertelement %540, %1057[%c0_i64 : i64] : vector<8xf32> + %1059 = cmpi "slt", %arg5, %c0 : index + %1060 = subi %c-1, %arg5 : index + %1061 = select %1059, %1060, %arg5 : index + %1062 = divi_signed %1061, %c16 : index + %1063 = subi %c-1, %1062 : index + %1064 = select %1059, %1063, %1062 : index + %1065 = remi_signed %1064, %c16 : index + %1066 = cmpi "slt", %1065, %c0 : index + %1067 = addi %1065, %c16 : index + %1068 = select %1066, %1067, %1065 : index + %1069 = addi %arg7, %arg9 : index + %1070 = remi_signed %1069, %c6 : index 
+ %1071 = cmpi "slt", %1070, %c0 : index + %1072 = addi %1070, %c6 : index + %1073 = select %1071, %1072, %1070 : index + %1074 = remi_signed %arg5, %c16 : index + %1075 = cmpi "slt", %1074, %c0 : index + %1076 = addi %1074, %c16 : index + %1077 = select %1075, %1076, %1074 : index + %1078 = cmpi "slt", %1077, %c0 : index + %1079 = subi %c-1, %1077 : index + %1080 = select %1078, %1079, %1077 : index + %1081 = divi_signed %1080, %c8 : index + %1082 = subi %c-1, %1081 : index + %1083 = select %1078, %1082, %1081 : index + %1084 = remi_signed %1083, %c2 : index + %1085 = cmpi "slt", %1084, %c0 : index + %1086 = addi %1084, %c2 : index + %1087 = select %1085, %1086, %1084 : index + store %1058, %2[%1068, %1073, %1087] : memref<16x6x2xvector<8xf32>> + %1088 = cmpi "slt", %arg5, %c0 : index + %1089 = subi %c-1, %arg5 : index + %1090 = select %1088, %1089, %arg5 : index + %1091 = divi_signed %1090, %c16 : index + %1092 = subi %c-1, %1091 : index + %1093 = select %1088, %1092, %1091 : index + %1094 = remi_signed %1093, %c16 : index + %1095 = cmpi "slt", %1094, %c0 : index + %1096 = addi %1094, %c16 : index + %1097 = select %1095, %1096, %1094 : index + %1098 = addi %arg7, %arg9 : index + %1099 = remi_signed %1098, %c6 : index + %1100 = cmpi "slt", %1099, %c0 : index + %1101 = addi %1099, %c6 : index + %1102 = select %1100, %1101, %1099 : index + %1103 = remi_signed %arg5, %c16 : index + %1104 = cmpi "slt", %1103, %c0 : index + %1105 = addi %1103, %c16 : index + %1106 = select %1104, %1105, %1103 : index + %1107 = cmpi "slt", %1106, %c0 : index + %1108 = subi %c-1, %1106 : index + %1109 = select %1107, %1108, %1106 : index + %1110 = divi_signed %1109, %c8 : index + %1111 = subi %c-1, %1110 : index + %1112 = select %1107, %1111, %1110 : index + %1113 = remi_signed %1112, %c2 : index + %1114 = cmpi "slt", %1113, %c0 : index + %1115 = addi %1113, %c2 : index + %1116 = select %1114, %1115, %1113 : index + %1117 = load %2[%1097, %1102, %1116] : memref<16x6x2xvector<8xf32>> + %1118 = vector.insertelement %541, %1117[%c1_i64 : i64] : vector<8xf32> + %1119 = cmpi "slt", %arg5, %c0 : index + %1120 = subi %c-1, %arg5 : index + %1121 = select %1119, %1120, %arg5 : index + %1122 = divi_signed %1121, %c16 : index + %1123 = subi %c-1, %1122 : index + %1124 = select %1119, %1123, %1122 : index + %1125 = remi_signed %1124, %c16 : index + %1126 = cmpi "slt", %1125, %c0 : index + %1127 = addi %1125, %c16 : index + %1128 = select %1126, %1127, %1125 : index + %1129 = addi %arg7, %arg9 : index + %1130 = remi_signed %1129, %c6 : index + %1131 = cmpi "slt", %1130, %c0 : index + %1132 = addi %1130, %c6 : index + %1133 = select %1131, %1132, %1130 : index + %1134 = remi_signed %arg5, %c16 : index + %1135 = cmpi "slt", %1134, %c0 : index + %1136 = addi %1134, %c16 : index + %1137 = select %1135, %1136, %1134 : index + %1138 = cmpi "slt", %1137, %c0 : index + %1139 = subi %c-1, %1137 : index + %1140 = select %1138, %1139, %1137 : index + %1141 = divi_signed %1140, %c8 : index + %1142 = subi %c-1, %1141 : index + %1143 = select %1138, %1142, %1141 : index + %1144 = remi_signed %1143, %c2 : index + %1145 = cmpi "slt", %1144, %c0 : index + %1146 = addi %1144, %c2 : index + %1147 = select %1145, %1146, %1144 : index + store %1118, %2[%1128, %1133, %1147] : memref<16x6x2xvector<8xf32>> + %1148 = cmpi "slt", %arg5, %c0 : index + %1149 = subi %c-1, %arg5 : index + %1150 = select %1148, %1149, %arg5 : index + %1151 = divi_signed %1150, %c16 : index + %1152 = subi %c-1, %1151 : index + %1153 = select %1148, %1152, %1151 : index + 
%1154 = remi_signed %1153, %c16 : index + %1155 = cmpi "slt", %1154, %c0 : index + %1156 = addi %1154, %c16 : index + %1157 = select %1155, %1156, %1154 : index + %1158 = addi %arg7, %arg9 : index + %1159 = remi_signed %1158, %c6 : index + %1160 = cmpi "slt", %1159, %c0 : index + %1161 = addi %1159, %c6 : index + %1162 = select %1160, %1161, %1159 : index + %1163 = remi_signed %arg5, %c16 : index + %1164 = cmpi "slt", %1163, %c0 : index + %1165 = addi %1163, %c16 : index + %1166 = select %1164, %1165, %1163 : index + %1167 = cmpi "slt", %1166, %c0 : index + %1168 = subi %c-1, %1166 : index + %1169 = select %1167, %1168, %1166 : index + %1170 = divi_signed %1169, %c8 : index + %1171 = subi %c-1, %1170 : index + %1172 = select %1167, %1171, %1170 : index + %1173 = remi_signed %1172, %c2 : index + %1174 = cmpi "slt", %1173, %c0 : index + %1175 = addi %1173, %c2 : index + %1176 = select %1174, %1175, %1173 : index + %1177 = load %2[%1157, %1162, %1176] : memref<16x6x2xvector<8xf32>> + %1178 = vector.insertelement %542, %1177[%c2_i64 : i64] : vector<8xf32> + %1179 = cmpi "slt", %arg5, %c0 : index + %1180 = subi %c-1, %arg5 : index + %1181 = select %1179, %1180, %arg5 : index + %1182 = divi_signed %1181, %c16 : index + %1183 = subi %c-1, %1182 : index + %1184 = select %1179, %1183, %1182 : index + %1185 = remi_signed %1184, %c16 : index + %1186 = cmpi "slt", %1185, %c0 : index + %1187 = addi %1185, %c16 : index + %1188 = select %1186, %1187, %1185 : index + %1189 = addi %arg7, %arg9 : index + %1190 = remi_signed %1189, %c6 : index + %1191 = cmpi "slt", %1190, %c0 : index + %1192 = addi %1190, %c6 : index + %1193 = select %1191, %1192, %1190 : index + %1194 = remi_signed %arg5, %c16 : index + %1195 = cmpi "slt", %1194, %c0 : index + %1196 = addi %1194, %c16 : index + %1197 = select %1195, %1196, %1194 : index + %1198 = cmpi "slt", %1197, %c0 : index + %1199 = subi %c-1, %1197 : index + %1200 = select %1198, %1199, %1197 : index + %1201 = divi_signed %1200, %c8 : index + %1202 = subi %c-1, %1201 : index + %1203 = select %1198, %1202, %1201 : index + %1204 = remi_signed %1203, %c2 : index + %1205 = cmpi "slt", %1204, %c0 : index + %1206 = addi %1204, %c2 : index + %1207 = select %1205, %1206, %1204 : index + store %1178, %2[%1188, %1193, %1207] : memref<16x6x2xvector<8xf32>> + %1208 = cmpi "slt", %arg5, %c0 : index + %1209 = subi %c-1, %arg5 : index + %1210 = select %1208, %1209, %arg5 : index + %1211 = divi_signed %1210, %c16 : index + %1212 = subi %c-1, %1211 : index + %1213 = select %1208, %1212, %1211 : index + %1214 = remi_signed %1213, %c16 : index + %1215 = cmpi "slt", %1214, %c0 : index + %1216 = addi %1214, %c16 : index + %1217 = select %1215, %1216, %1214 : index + %1218 = addi %arg7, %arg9 : index + %1219 = remi_signed %1218, %c6 : index + %1220 = cmpi "slt", %1219, %c0 : index + %1221 = addi %1219, %c6 : index + %1222 = select %1220, %1221, %1219 : index + %1223 = remi_signed %arg5, %c16 : index + %1224 = cmpi "slt", %1223, %c0 : index + %1225 = addi %1223, %c16 : index + %1226 = select %1224, %1225, %1223 : index + %1227 = cmpi "slt", %1226, %c0 : index + %1228 = subi %c-1, %1226 : index + %1229 = select %1227, %1228, %1226 : index + %1230 = divi_signed %1229, %c8 : index + %1231 = subi %c-1, %1230 : index + %1232 = select %1227, %1231, %1230 : index + %1233 = remi_signed %1232, %c2 : index + %1234 = cmpi "slt", %1233, %c0 : index + %1235 = addi %1233, %c2 : index + %1236 = select %1234, %1235, %1233 : index + %1237 = load %2[%1217, %1222, %1236] : memref<16x6x2xvector<8xf32>> + %1238 
= vector.insertelement %543, %1237[%c3_i64 : i64] : vector<8xf32> + %1239 = cmpi "slt", %arg5, %c0 : index + %1240 = subi %c-1, %arg5 : index + %1241 = select %1239, %1240, %arg5 : index + %1242 = divi_signed %1241, %c16 : index + %1243 = subi %c-1, %1242 : index + %1244 = select %1239, %1243, %1242 : index + %1245 = remi_signed %1244, %c16 : index + %1246 = cmpi "slt", %1245, %c0 : index + %1247 = addi %1245, %c16 : index + %1248 = select %1246, %1247, %1245 : index + %1249 = addi %arg7, %arg9 : index + %1250 = remi_signed %1249, %c6 : index + %1251 = cmpi "slt", %1250, %c0 : index + %1252 = addi %1250, %c6 : index + %1253 = select %1251, %1252, %1250 : index + %1254 = remi_signed %arg5, %c16 : index + %1255 = cmpi "slt", %1254, %c0 : index + %1256 = addi %1254, %c16 : index + %1257 = select %1255, %1256, %1254 : index + %1258 = cmpi "slt", %1257, %c0 : index + %1259 = subi %c-1, %1257 : index + %1260 = select %1258, %1259, %1257 : index + %1261 = divi_signed %1260, %c8 : index + %1262 = subi %c-1, %1261 : index + %1263 = select %1258, %1262, %1261 : index + %1264 = remi_signed %1263, %c2 : index + %1265 = cmpi "slt", %1264, %c0 : index + %1266 = addi %1264, %c2 : index + %1267 = select %1265, %1266, %1264 : index + store %1238, %2[%1248, %1253, %1267] : memref<16x6x2xvector<8xf32>> + %1268 = cmpi "slt", %arg5, %c0 : index + %1269 = subi %c-1, %arg5 : index + %1270 = select %1268, %1269, %arg5 : index + %1271 = divi_signed %1270, %c16 : index + %1272 = subi %c-1, %1271 : index + %1273 = select %1268, %1272, %1271 : index + %1274 = remi_signed %1273, %c16 : index + %1275 = cmpi "slt", %1274, %c0 : index + %1276 = addi %1274, %c16 : index + %1277 = select %1275, %1276, %1274 : index + %1278 = addi %arg7, %arg9 : index + %1279 = remi_signed %1278, %c6 : index + %1280 = cmpi "slt", %1279, %c0 : index + %1281 = addi %1279, %c6 : index + %1282 = select %1280, %1281, %1279 : index + %1283 = remi_signed %arg5, %c16 : index + %1284 = cmpi "slt", %1283, %c0 : index + %1285 = addi %1283, %c16 : index + %1286 = select %1284, %1285, %1283 : index + %1287 = cmpi "slt", %1286, %c0 : index + %1288 = subi %c-1, %1286 : index + %1289 = select %1287, %1288, %1286 : index + %1290 = divi_signed %1289, %c8 : index + %1291 = subi %c-1, %1290 : index + %1292 = select %1287, %1291, %1290 : index + %1293 = remi_signed %1292, %c2 : index + %1294 = cmpi "slt", %1293, %c0 : index + %1295 = addi %1293, %c2 : index + %1296 = select %1294, %1295, %1293 : index + %1297 = load %2[%1277, %1282, %1296] : memref<16x6x2xvector<8xf32>> + %1298 = vector.insertelement %544, %1297[%c4_i64 : i64] : vector<8xf32> + %1299 = cmpi "slt", %arg5, %c0 : index + %1300 = subi %c-1, %arg5 : index + %1301 = select %1299, %1300, %arg5 : index + %1302 = divi_signed %1301, %c16 : index + %1303 = subi %c-1, %1302 : index + %1304 = select %1299, %1303, %1302 : index + %1305 = remi_signed %1304, %c16 : index + %1306 = cmpi "slt", %1305, %c0 : index + %1307 = addi %1305, %c16 : index + %1308 = select %1306, %1307, %1305 : index + %1309 = addi %arg7, %arg9 : index + %1310 = remi_signed %1309, %c6 : index + %1311 = cmpi "slt", %1310, %c0 : index + %1312 = addi %1310, %c6 : index + %1313 = select %1311, %1312, %1310 : index + %1314 = remi_signed %arg5, %c16 : index + %1315 = cmpi "slt", %1314, %c0 : index + %1316 = addi %1314, %c16 : index + %1317 = select %1315, %1316, %1314 : index + %1318 = cmpi "slt", %1317, %c0 : index + %1319 = subi %c-1, %1317 : index + %1320 = select %1318, %1319, %1317 : index + %1321 = divi_signed %1320, %c8 : index + %1322 
= subi %c-1, %1321 : index + %1323 = select %1318, %1322, %1321 : index + %1324 = remi_signed %1323, %c2 : index + %1325 = cmpi "slt", %1324, %c0 : index + %1326 = addi %1324, %c2 : index + %1327 = select %1325, %1326, %1324 : index + store %1298, %2[%1308, %1313, %1327] : memref<16x6x2xvector<8xf32>> + %1328 = cmpi "slt", %arg5, %c0 : index + %1329 = subi %c-1, %arg5 : index + %1330 = select %1328, %1329, %arg5 : index + %1331 = divi_signed %1330, %c16 : index + %1332 = subi %c-1, %1331 : index + %1333 = select %1328, %1332, %1331 : index + %1334 = remi_signed %1333, %c16 : index + %1335 = cmpi "slt", %1334, %c0 : index + %1336 = addi %1334, %c16 : index + %1337 = select %1335, %1336, %1334 : index + %1338 = addi %arg7, %arg9 : index + %1339 = remi_signed %1338, %c6 : index + %1340 = cmpi "slt", %1339, %c0 : index + %1341 = addi %1339, %c6 : index + %1342 = select %1340, %1341, %1339 : index + %1343 = remi_signed %arg5, %c16 : index + %1344 = cmpi "slt", %1343, %c0 : index + %1345 = addi %1343, %c16 : index + %1346 = select %1344, %1345, %1343 : index + %1347 = cmpi "slt", %1346, %c0 : index + %1348 = subi %c-1, %1346 : index + %1349 = select %1347, %1348, %1346 : index + %1350 = divi_signed %1349, %c8 : index + %1351 = subi %c-1, %1350 : index + %1352 = select %1347, %1351, %1350 : index + %1353 = remi_signed %1352, %c2 : index + %1354 = cmpi "slt", %1353, %c0 : index + %1355 = addi %1353, %c2 : index + %1356 = select %1354, %1355, %1353 : index + %1357 = load %2[%1337, %1342, %1356] : memref<16x6x2xvector<8xf32>> + %1358 = vector.insertelement %545, %1357[%c5_i64 : i64] : vector<8xf32> + %1359 = cmpi "slt", %arg5, %c0 : index + %1360 = subi %c-1, %arg5 : index + %1361 = select %1359, %1360, %arg5 : index + %1362 = divi_signed %1361, %c16 : index + %1363 = subi %c-1, %1362 : index + %1364 = select %1359, %1363, %1362 : index + %1365 = remi_signed %1364, %c16 : index + %1366 = cmpi "slt", %1365, %c0 : index + %1367 = addi %1365, %c16 : index + %1368 = select %1366, %1367, %1365 : index + %1369 = addi %arg7, %arg9 : index + %1370 = remi_signed %1369, %c6 : index + %1371 = cmpi "slt", %1370, %c0 : index + %1372 = addi %1370, %c6 : index + %1373 = select %1371, %1372, %1370 : index + %1374 = remi_signed %arg5, %c16 : index + %1375 = cmpi "slt", %1374, %c0 : index + %1376 = addi %1374, %c16 : index + %1377 = select %1375, %1376, %1374 : index + %1378 = cmpi "slt", %1377, %c0 : index + %1379 = subi %c-1, %1377 : index + %1380 = select %1378, %1379, %1377 : index + %1381 = divi_signed %1380, %c8 : index + %1382 = subi %c-1, %1381 : index + %1383 = select %1378, %1382, %1381 : index + %1384 = remi_signed %1383, %c2 : index + %1385 = cmpi "slt", %1384, %c0 : index + %1386 = addi %1384, %c2 : index + %1387 = select %1385, %1386, %1384 : index + store %1358, %2[%1368, %1373, %1387] : memref<16x6x2xvector<8xf32>> + %1388 = cmpi "slt", %arg5, %c0 : index + %1389 = subi %c-1, %arg5 : index + %1390 = select %1388, %1389, %arg5 : index + %1391 = divi_signed %1390, %c16 : index + %1392 = subi %c-1, %1391 : index + %1393 = select %1388, %1392, %1391 : index + %1394 = remi_signed %1393, %c16 : index + %1395 = cmpi "slt", %1394, %c0 : index + %1396 = addi %1394, %c16 : index + %1397 = select %1395, %1396, %1394 : index + %1398 = addi %arg7, %arg9 : index + %1399 = remi_signed %1398, %c6 : index + %1400 = cmpi "slt", %1399, %c0 : index + %1401 = addi %1399, %c6 : index + %1402 = select %1400, %1401, %1399 : index + %1403 = remi_signed %arg5, %c16 : index + %1404 = cmpi "slt", %1403, %c0 : index + %1405 = 
addi %1403, %c16 : index + %1406 = select %1404, %1405, %1403 : index + %1407 = cmpi "slt", %1406, %c0 : index + %1408 = subi %c-1, %1406 : index + %1409 = select %1407, %1408, %1406 : index + %1410 = divi_signed %1409, %c8 : index + %1411 = subi %c-1, %1410 : index + %1412 = select %1407, %1411, %1410 : index + %1413 = remi_signed %1412, %c2 : index + %1414 = cmpi "slt", %1413, %c0 : index + %1415 = addi %1413, %c2 : index + %1416 = select %1414, %1415, %1413 : index + %1417 = load %2[%1397, %1402, %1416] : memref<16x6x2xvector<8xf32>> + %1418 = vector.insertelement %546, %1417[%c6_i64 : i64] : vector<8xf32> + %1419 = cmpi "slt", %arg5, %c0 : index + %1420 = subi %c-1, %arg5 : index + %1421 = select %1419, %1420, %arg5 : index + %1422 = divi_signed %1421, %c16 : index + %1423 = subi %c-1, %1422 : index + %1424 = select %1419, %1423, %1422 : index + %1425 = remi_signed %1424, %c16 : index + %1426 = cmpi "slt", %1425, %c0 : index + %1427 = addi %1425, %c16 : index + %1428 = select %1426, %1427, %1425 : index + %1429 = addi %arg7, %arg9 : index + %1430 = remi_signed %1429, %c6 : index + %1431 = cmpi "slt", %1430, %c0 : index + %1432 = addi %1430, %c6 : index + %1433 = select %1431, %1432, %1430 : index + %1434 = remi_signed %arg5, %c16 : index + %1435 = cmpi "slt", %1434, %c0 : index + %1436 = addi %1434, %c16 : index + %1437 = select %1435, %1436, %1434 : index + %1438 = cmpi "slt", %1437, %c0 : index + %1439 = subi %c-1, %1437 : index + %1440 = select %1438, %1439, %1437 : index + %1441 = divi_signed %1440, %c8 : index + %1442 = subi %c-1, %1441 : index + %1443 = select %1438, %1442, %1441 : index + %1444 = remi_signed %1443, %c2 : index + %1445 = cmpi "slt", %1444, %c0 : index + %1446 = addi %1444, %c2 : index + %1447 = select %1445, %1446, %1444 : index + store %1418, %2[%1428, %1433, %1447] : memref<16x6x2xvector<8xf32>> + %1448 = cmpi "slt", %arg5, %c0 : index + %1449 = subi %c-1, %arg5 : index + %1450 = select %1448, %1449, %arg5 : index + %1451 = divi_signed %1450, %c16 : index + %1452 = subi %c-1, %1451 : index + %1453 = select %1448, %1452, %1451 : index + %1454 = remi_signed %1453, %c16 : index + %1455 = cmpi "slt", %1454, %c0 : index + %1456 = addi %1454, %c16 : index + %1457 = select %1455, %1456, %1454 : index + %1458 = addi %arg7, %arg9 : index + %1459 = remi_signed %1458, %c6 : index + %1460 = cmpi "slt", %1459, %c0 : index + %1461 = addi %1459, %c6 : index + %1462 = select %1460, %1461, %1459 : index + %1463 = remi_signed %arg5, %c16 : index + %1464 = cmpi "slt", %1463, %c0 : index + %1465 = addi %1463, %c16 : index + %1466 = select %1464, %1465, %1463 : index + %1467 = cmpi "slt", %1466, %c0 : index + %1468 = subi %c-1, %1466 : index + %1469 = select %1467, %1468, %1466 : index + %1470 = divi_signed %1469, %c8 : index + %1471 = subi %c-1, %1470 : index + %1472 = select %1467, %1471, %1470 : index + %1473 = remi_signed %1472, %c2 : index + %1474 = cmpi "slt", %1473, %c0 : index + %1475 = addi %1473, %c2 : index + %1476 = select %1474, %1475, %1473 : index + %1477 = load %2[%1457, %1462, %1476] : memref<16x6x2xvector<8xf32>> + %1478 = vector.insertelement %547, %1477[%c7_i64 : i64] : vector<8xf32> + %1479 = cmpi "slt", %arg5, %c0 : index + %1480 = subi %c-1, %arg5 : index + %1481 = select %1479, %1480, %arg5 : index + %1482 = divi_signed %1481, %c16 : index + %1483 = subi %c-1, %1482 : index + %1484 = select %1479, %1483, %1482 : index + %1485 = remi_signed %1484, %c16 : index + %1486 = cmpi "slt", %1485, %c0 : index + %1487 = addi %1485, %c16 : index + %1488 = select %1486, 
%1487, %1485 : index + %1489 = addi %arg7, %arg9 : index + %1490 = remi_signed %1489, %c6 : index + %1491 = cmpi "slt", %1490, %c0 : index + %1492 = addi %1490, %c6 : index + %1493 = select %1491, %1492, %1490 : index + %1494 = remi_signed %arg5, %c16 : index + %1495 = cmpi "slt", %1494, %c0 : index + %1496 = addi %1494, %c16 : index + %1497 = select %1495, %1496, %1494 : index + %1498 = cmpi "slt", %1497, %c0 : index + %1499 = subi %c-1, %1497 : index + %1500 = select %1498, %1499, %1497 : index + %1501 = divi_signed %1500, %c8 : index + %1502 = subi %c-1, %1501 : index + %1503 = select %1498, %1502, %1501 : index + %1504 = remi_signed %1503, %c2 : index + %1505 = cmpi "slt", %1504, %c0 : index + %1506 = addi %1504, %c2 : index + %1507 = select %1505, %1506, %1504 : index + store %1478, %2[%1488, %1493, %1507] : memref<16x6x2xvector<8xf32>> + %1508 = addi %arg4, %arg7 : index + %1509 = addi %1508, %arg9 : index + %1510 = addi %arg4, %arg7 : index + %1511 = addi %1510, %arg9 : index + %1512 = addi %arg4, %arg7 : index + %1513 = addi %1512, %arg9 : index + %1514 = addi %arg4, %arg7 : index + %1515 = addi %1514, %arg9 : index + %1516 = addi %arg4, %arg7 : index + %1517 = addi %1516, %arg9 : index + %1518 = addi %arg4, %arg7 : index + %1519 = addi %1518, %arg9 : index + %1520 = addi %arg4, %arg7 : index + %1521 = addi %1520, %arg9 : index + %1522 = addi %arg4, %arg7 : index + %1523 = addi %1522, %arg9 : index + %1524 = addi %arg6, %arg8 : index + %1525 = addi %arg6, %arg8 : index + %1526 = addi %arg6, %arg8 : index + %1527 = addi %arg6, %arg8 : index + %1528 = addi %arg6, %arg8 : index + %1529 = addi %arg6, %arg8 : index + %1530 = addi %arg6, %arg8 : index + %1531 = addi %arg6, %arg8 : index + %1532 = load %arg0[%1509, %1524] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1533 = load %arg0[%1511, %1525] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1534 = load %arg0[%1513, %1526] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1535 = load %arg0[%1515, %1527] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1536 = load %arg0[%1517, %1528] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1537 = load %arg0[%1519, %1529] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1538 = load %arg0[%1521, %1530] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1539 = load %arg0[%1523, %1531] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1540 = addi %arg5, %c8 : index + %1541 = cmpi "slt", %1540, %c0 : index + %1542 = subi %c-1, %1540 : index + %1543 = select %1541, %1542, %1540 : index + %1544 = divi_signed %1543, %c16 : index + %1545 = subi %c-1, %1544 : index + %1546 = select %1541, %1545, %1544 : index + %1547 = remi_signed %1546, %c16 : index + %1548 = cmpi "slt", %1547, %c0 : index + %1549 = addi %1547, %c16 : index + %1550 = select %1548, %1549, %1547 : index + %1551 = addi %arg6, %arg8 : index + %1552 = remi_signed %1551, %c128 : index + %1553 = cmpi "slt", %1552, %c0 : index + %1554 = addi %1552, %c128 : index + %1555 = select %1553, %1554, %1552 : index + %1556 = cmpi "slt", %arg5, %c0 : index + %1557 = subi %c-1, %arg5 : index + %1558 = select %1556, %1557, %arg5 : index + %1559 = divi_signed %1558, %c8 : index + %1560 = subi %c-1, %1559 : index + %1561 = select %1556, %1560, %1559 : index + %1562 = addi %arg5, %c8 : index + %1563 = cmpi "slt", %1562, %c0 : index + %1564 = subi %c-1, %1562 : index + %1565 = select %1563, %1564, %1562 : index + %1566 = 
divi_signed %1565, %c16 : index + %1567 = subi %c-1, %1566 : index + %1568 = select %1563, %1567, %1566 : index + %1569 = muli %1568, %c-2 : index + %1570 = addi %1561, %1569 : index + %1571 = cmpi "slt", %arg5, %c0 : index + %1572 = subi %c-1, %arg5 : index + %1573 = select %1571, %1572, %arg5 : index + %1574 = divi_signed %1573, %c8 : index + %1575 = subi %c-1, %1574 : index + %1576 = select %1571, %1575, %1574 : index + %1577 = addi %arg5, %c8 : index + %1578 = cmpi "slt", %1577, %c0 : index + %1579 = subi %c-1, %1577 : index + %1580 = select %1578, %1579, %1577 : index + %1581 = divi_signed %1580, %c16 : index + %1582 = subi %c-1, %1581 : index + %1583 = select %1578, %1582, %1581 : index + %1584 = muli %1583, %c-2 : index + %1585 = addi %1576, %1584 : index + %1586 = addi %1585, %c1 : index + %1587 = cmpi "slt", %1586, %c0 : index + %1588 = subi %c-1, %1586 : index + %1589 = select %1587, %1588, %1586 : index + %1590 = divi_signed %1589, %c2 : index + %1591 = subi %c-1, %1590 : index + %1592 = select %1587, %1591, %1590 : index + %1593 = muli %1592, %c-2 : index + %1594 = addi %1570, %1593 : index + %1595 = addi %1594, %c1 : index + %1596 = load %3[%1550, %1555, %1595] : memref<16x128x2xvector<8xf32>> + %1597 = vector.extractelement %1596[%c0_i64 : i64] : vector<8xf32> + %1598 = addi %arg5, %c8 : index + %1599 = cmpi "slt", %1598, %c0 : index + %1600 = subi %c-1, %1598 : index + %1601 = select %1599, %1600, %1598 : index + %1602 = divi_signed %1601, %c16 : index + %1603 = subi %c-1, %1602 : index + %1604 = select %1599, %1603, %1602 : index + %1605 = remi_signed %1604, %c16 : index + %1606 = cmpi "slt", %1605, %c0 : index + %1607 = addi %1605, %c16 : index + %1608 = select %1606, %1607, %1605 : index + %1609 = addi %arg6, %arg8 : index + %1610 = remi_signed %1609, %c128 : index + %1611 = cmpi "slt", %1610, %c0 : index + %1612 = addi %1610, %c128 : index + %1613 = select %1611, %1612, %1610 : index + %1614 = cmpi "slt", %arg5, %c0 : index + %1615 = subi %c-1, %arg5 : index + %1616 = select %1614, %1615, %arg5 : index + %1617 = divi_signed %1616, %c8 : index + %1618 = subi %c-1, %1617 : index + %1619 = select %1614, %1618, %1617 : index + %1620 = addi %arg5, %c8 : index + %1621 = cmpi "slt", %1620, %c0 : index + %1622 = subi %c-1, %1620 : index + %1623 = select %1621, %1622, %1620 : index + %1624 = divi_signed %1623, %c16 : index + %1625 = subi %c-1, %1624 : index + %1626 = select %1621, %1625, %1624 : index + %1627 = muli %1626, %c-2 : index + %1628 = addi %1619, %1627 : index + %1629 = cmpi "slt", %arg5, %c0 : index + %1630 = subi %c-1, %arg5 : index + %1631 = select %1629, %1630, %arg5 : index + %1632 = divi_signed %1631, %c8 : index + %1633 = subi %c-1, %1632 : index + %1634 = select %1629, %1633, %1632 : index + %1635 = addi %arg5, %c8 : index + %1636 = cmpi "slt", %1635, %c0 : index + %1637 = subi %c-1, %1635 : index + %1638 = select %1636, %1637, %1635 : index + %1639 = divi_signed %1638, %c16 : index + %1640 = subi %c-1, %1639 : index + %1641 = select %1636, %1640, %1639 : index + %1642 = muli %1641, %c-2 : index + %1643 = addi %1634, %1642 : index + %1644 = addi %1643, %c1 : index + %1645 = cmpi "slt", %1644, %c0 : index + %1646 = subi %c-1, %1644 : index + %1647 = select %1645, %1646, %1644 : index + %1648 = divi_signed %1647, %c2 : index + %1649 = subi %c-1, %1648 : index + %1650 = select %1645, %1649, %1648 : index + %1651 = muli %1650, %c-2 : index + %1652 = addi %1628, %1651 : index + %1653 = addi %1652, %c1 : index + %1654 = load %3[%1608, %1613, %1653] : 
memref<16x128x2xvector<8xf32>> + %1655 = vector.extractelement %1654[%c1_i64 : i64] : vector<8xf32> + %1656 = addi %arg5, %c8 : index + %1657 = cmpi "slt", %1656, %c0 : index + %1658 = subi %c-1, %1656 : index + %1659 = select %1657, %1658, %1656 : index + %1660 = divi_signed %1659, %c16 : index + %1661 = subi %c-1, %1660 : index + %1662 = select %1657, %1661, %1660 : index + %1663 = remi_signed %1662, %c16 : index + %1664 = cmpi "slt", %1663, %c0 : index + %1665 = addi %1663, %c16 : index + %1666 = select %1664, %1665, %1663 : index + %1667 = addi %arg6, %arg8 : index + %1668 = remi_signed %1667, %c128 : index + %1669 = cmpi "slt", %1668, %c0 : index + %1670 = addi %1668, %c128 : index + %1671 = select %1669, %1670, %1668 : index + %1672 = cmpi "slt", %arg5, %c0 : index + %1673 = subi %c-1, %arg5 : index + %1674 = select %1672, %1673, %arg5 : index + %1675 = divi_signed %1674, %c8 : index + %1676 = subi %c-1, %1675 : index + %1677 = select %1672, %1676, %1675 : index + %1678 = addi %arg5, %c8 : index + %1679 = cmpi "slt", %1678, %c0 : index + %1680 = subi %c-1, %1678 : index + %1681 = select %1679, %1680, %1678 : index + %1682 = divi_signed %1681, %c16 : index + %1683 = subi %c-1, %1682 : index + %1684 = select %1679, %1683, %1682 : index + %1685 = muli %1684, %c-2 : index + %1686 = addi %1677, %1685 : index + %1687 = cmpi "slt", %arg5, %c0 : index + %1688 = subi %c-1, %arg5 : index + %1689 = select %1687, %1688, %arg5 : index + %1690 = divi_signed %1689, %c8 : index + %1691 = subi %c-1, %1690 : index + %1692 = select %1687, %1691, %1690 : index + %1693 = addi %arg5, %c8 : index + %1694 = cmpi "slt", %1693, %c0 : index + %1695 = subi %c-1, %1693 : index + %1696 = select %1694, %1695, %1693 : index + %1697 = divi_signed %1696, %c16 : index + %1698 = subi %c-1, %1697 : index + %1699 = select %1694, %1698, %1697 : index + %1700 = muli %1699, %c-2 : index + %1701 = addi %1692, %1700 : index + %1702 = addi %1701, %c1 : index + %1703 = cmpi "slt", %1702, %c0 : index + %1704 = subi %c-1, %1702 : index + %1705 = select %1703, %1704, %1702 : index + %1706 = divi_signed %1705, %c2 : index + %1707 = subi %c-1, %1706 : index + %1708 = select %1703, %1707, %1706 : index + %1709 = muli %1708, %c-2 : index + %1710 = addi %1686, %1709 : index + %1711 = addi %1710, %c1 : index + %1712 = load %3[%1666, %1671, %1711] : memref<16x128x2xvector<8xf32>> + %1713 = vector.extractelement %1712[%c2_i64 : i64] : vector<8xf32> + %1714 = addi %arg5, %c8 : index + %1715 = cmpi "slt", %1714, %c0 : index + %1716 = subi %c-1, %1714 : index + %1717 = select %1715, %1716, %1714 : index + %1718 = divi_signed %1717, %c16 : index + %1719 = subi %c-1, %1718 : index + %1720 = select %1715, %1719, %1718 : index + %1721 = remi_signed %1720, %c16 : index + %1722 = cmpi "slt", %1721, %c0 : index + %1723 = addi %1721, %c16 : index + %1724 = select %1722, %1723, %1721 : index + %1725 = addi %arg6, %arg8 : index + %1726 = remi_signed %1725, %c128 : index + %1727 = cmpi "slt", %1726, %c0 : index + %1728 = addi %1726, %c128 : index + %1729 = select %1727, %1728, %1726 : index + %1730 = cmpi "slt", %arg5, %c0 : index + %1731 = subi %c-1, %arg5 : index + %1732 = select %1730, %1731, %arg5 : index + %1733 = divi_signed %1732, %c8 : index + %1734 = subi %c-1, %1733 : index + %1735 = select %1730, %1734, %1733 : index + %1736 = addi %arg5, %c8 : index + %1737 = cmpi "slt", %1736, %c0 : index + %1738 = subi %c-1, %1736 : index + %1739 = select %1737, %1738, %1736 : index + %1740 = divi_signed %1739, %c16 : index + %1741 = subi %c-1, %1740 : 
index + %1742 = select %1737, %1741, %1740 : index + %1743 = muli %1742, %c-2 : index + %1744 = addi %1735, %1743 : index + %1745 = cmpi "slt", %arg5, %c0 : index + %1746 = subi %c-1, %arg5 : index + %1747 = select %1745, %1746, %arg5 : index + %1748 = divi_signed %1747, %c8 : index + %1749 = subi %c-1, %1748 : index + %1750 = select %1745, %1749, %1748 : index + %1751 = addi %arg5, %c8 : index + %1752 = cmpi "slt", %1751, %c0 : index + %1753 = subi %c-1, %1751 : index + %1754 = select %1752, %1753, %1751 : index + %1755 = divi_signed %1754, %c16 : index + %1756 = subi %c-1, %1755 : index + %1757 = select %1752, %1756, %1755 : index + %1758 = muli %1757, %c-2 : index + %1759 = addi %1750, %1758 : index + %1760 = addi %1759, %c1 : index + %1761 = cmpi "slt", %1760, %c0 : index + %1762 = subi %c-1, %1760 : index + %1763 = select %1761, %1762, %1760 : index + %1764 = divi_signed %1763, %c2 : index + %1765 = subi %c-1, %1764 : index + %1766 = select %1761, %1765, %1764 : index + %1767 = muli %1766, %c-2 : index + %1768 = addi %1744, %1767 : index + %1769 = addi %1768, %c1 : index + %1770 = load %3[%1724, %1729, %1769] : memref<16x128x2xvector<8xf32>> + %1771 = vector.extractelement %1770[%c3_i64 : i64] : vector<8xf32> + %1772 = addi %arg5, %c8 : index + %1773 = cmpi "slt", %1772, %c0 : index + %1774 = subi %c-1, %1772 : index + %1775 = select %1773, %1774, %1772 : index + %1776 = divi_signed %1775, %c16 : index + %1777 = subi %c-1, %1776 : index + %1778 = select %1773, %1777, %1776 : index + %1779 = remi_signed %1778, %c16 : index + %1780 = cmpi "slt", %1779, %c0 : index + %1781 = addi %1779, %c16 : index + %1782 = select %1780, %1781, %1779 : index + %1783 = addi %arg6, %arg8 : index + %1784 = remi_signed %1783, %c128 : index + %1785 = cmpi "slt", %1784, %c0 : index + %1786 = addi %1784, %c128 : index + %1787 = select %1785, %1786, %1784 : index + %1788 = cmpi "slt", %arg5, %c0 : index + %1789 = subi %c-1, %arg5 : index + %1790 = select %1788, %1789, %arg5 : index + %1791 = divi_signed %1790, %c8 : index + %1792 = subi %c-1, %1791 : index + %1793 = select %1788, %1792, %1791 : index + %1794 = addi %arg5, %c8 : index + %1795 = cmpi "slt", %1794, %c0 : index + %1796 = subi %c-1, %1794 : index + %1797 = select %1795, %1796, %1794 : index + %1798 = divi_signed %1797, %c16 : index + %1799 = subi %c-1, %1798 : index + %1800 = select %1795, %1799, %1798 : index + %1801 = muli %1800, %c-2 : index + %1802 = addi %1793, %1801 : index + %1803 = cmpi "slt", %arg5, %c0 : index + %1804 = subi %c-1, %arg5 : index + %1805 = select %1803, %1804, %arg5 : index + %1806 = divi_signed %1805, %c8 : index + %1807 = subi %c-1, %1806 : index + %1808 = select %1803, %1807, %1806 : index + %1809 = addi %arg5, %c8 : index + %1810 = cmpi "slt", %1809, %c0 : index + %1811 = subi %c-1, %1809 : index + %1812 = select %1810, %1811, %1809 : index + %1813 = divi_signed %1812, %c16 : index + %1814 = subi %c-1, %1813 : index + %1815 = select %1810, %1814, %1813 : index + %1816 = muli %1815, %c-2 : index + %1817 = addi %1808, %1816 : index + %1818 = addi %1817, %c1 : index + %1819 = cmpi "slt", %1818, %c0 : index + %1820 = subi %c-1, %1818 : index + %1821 = select %1819, %1820, %1818 : index + %1822 = divi_signed %1821, %c2 : index + %1823 = subi %c-1, %1822 : index + %1824 = select %1819, %1823, %1822 : index + %1825 = muli %1824, %c-2 : index + %1826 = addi %1802, %1825 : index + %1827 = addi %1826, %c1 : index + %1828 = load %3[%1782, %1787, %1827] : memref<16x128x2xvector<8xf32>> + %1829 = vector.extractelement %1828[%c4_i64 
: i64] : vector<8xf32> + %1830 = addi %arg5, %c8 : index + %1831 = cmpi "slt", %1830, %c0 : index + %1832 = subi %c-1, %1830 : index + %1833 = select %1831, %1832, %1830 : index + %1834 = divi_signed %1833, %c16 : index + %1835 = subi %c-1, %1834 : index + %1836 = select %1831, %1835, %1834 : index + %1837 = remi_signed %1836, %c16 : index + %1838 = cmpi "slt", %1837, %c0 : index + %1839 = addi %1837, %c16 : index + %1840 = select %1838, %1839, %1837 : index + %1841 = addi %arg6, %arg8 : index + %1842 = remi_signed %1841, %c128 : index + %1843 = cmpi "slt", %1842, %c0 : index + %1844 = addi %1842, %c128 : index + %1845 = select %1843, %1844, %1842 : index + %1846 = cmpi "slt", %arg5, %c0 : index + %1847 = subi %c-1, %arg5 : index + %1848 = select %1846, %1847, %arg5 : index + %1849 = divi_signed %1848, %c8 : index + %1850 = subi %c-1, %1849 : index + %1851 = select %1846, %1850, %1849 : index + %1852 = addi %arg5, %c8 : index + %1853 = cmpi "slt", %1852, %c0 : index + %1854 = subi %c-1, %1852 : index + %1855 = select %1853, %1854, %1852 : index + %1856 = divi_signed %1855, %c16 : index + %1857 = subi %c-1, %1856 : index + %1858 = select %1853, %1857, %1856 : index + %1859 = muli %1858, %c-2 : index + %1860 = addi %1851, %1859 : index + %1861 = cmpi "slt", %arg5, %c0 : index + %1862 = subi %c-1, %arg5 : index + %1863 = select %1861, %1862, %arg5 : index + %1864 = divi_signed %1863, %c8 : index + %1865 = subi %c-1, %1864 : index + %1866 = select %1861, %1865, %1864 : index + %1867 = addi %arg5, %c8 : index + %1868 = cmpi "slt", %1867, %c0 : index + %1869 = subi %c-1, %1867 : index + %1870 = select %1868, %1869, %1867 : index + %1871 = divi_signed %1870, %c16 : index + %1872 = subi %c-1, %1871 : index + %1873 = select %1868, %1872, %1871 : index + %1874 = muli %1873, %c-2 : index + %1875 = addi %1866, %1874 : index + %1876 = addi %1875, %c1 : index + %1877 = cmpi "slt", %1876, %c0 : index + %1878 = subi %c-1, %1876 : index + %1879 = select %1877, %1878, %1876 : index + %1880 = divi_signed %1879, %c2 : index + %1881 = subi %c-1, %1880 : index + %1882 = select %1877, %1881, %1880 : index + %1883 = muli %1882, %c-2 : index + %1884 = addi %1860, %1883 : index + %1885 = addi %1884, %c1 : index + %1886 = load %3[%1840, %1845, %1885] : memref<16x128x2xvector<8xf32>> + %1887 = vector.extractelement %1886[%c5_i64 : i64] : vector<8xf32> + %1888 = addi %arg5, %c8 : index + %1889 = cmpi "slt", %1888, %c0 : index + %1890 = subi %c-1, %1888 : index + %1891 = select %1889, %1890, %1888 : index + %1892 = divi_signed %1891, %c16 : index + %1893 = subi %c-1, %1892 : index + %1894 = select %1889, %1893, %1892 : index + %1895 = remi_signed %1894, %c16 : index + %1896 = cmpi "slt", %1895, %c0 : index + %1897 = addi %1895, %c16 : index + %1898 = select %1896, %1897, %1895 : index + %1899 = addi %arg6, %arg8 : index + %1900 = remi_signed %1899, %c128 : index + %1901 = cmpi "slt", %1900, %c0 : index + %1902 = addi %1900, %c128 : index + %1903 = select %1901, %1902, %1900 : index + %1904 = cmpi "slt", %arg5, %c0 : index + %1905 = subi %c-1, %arg5 : index + %1906 = select %1904, %1905, %arg5 : index + %1907 = divi_signed %1906, %c8 : index + %1908 = subi %c-1, %1907 : index + %1909 = select %1904, %1908, %1907 : index + %1910 = addi %arg5, %c8 : index + %1911 = cmpi "slt", %1910, %c0 : index + %1912 = subi %c-1, %1910 : index + %1913 = select %1911, %1912, %1910 : index + %1914 = divi_signed %1913, %c16 : index + %1915 = subi %c-1, %1914 : index + %1916 = select %1911, %1915, %1914 : index + %1917 = muli %1916, %c-2 : 
index + %1918 = addi %1909, %1917 : index + %1919 = cmpi "slt", %arg5, %c0 : index + %1920 = subi %c-1, %arg5 : index + %1921 = select %1919, %1920, %arg5 : index + %1922 = divi_signed %1921, %c8 : index + %1923 = subi %c-1, %1922 : index + %1924 = select %1919, %1923, %1922 : index + %1925 = addi %arg5, %c8 : index + %1926 = cmpi "slt", %1925, %c0 : index + %1927 = subi %c-1, %1925 : index + %1928 = select %1926, %1927, %1925 : index + %1929 = divi_signed %1928, %c16 : index + %1930 = subi %c-1, %1929 : index + %1931 = select %1926, %1930, %1929 : index + %1932 = muli %1931, %c-2 : index + %1933 = addi %1924, %1932 : index + %1934 = addi %1933, %c1 : index + %1935 = cmpi "slt", %1934, %c0 : index + %1936 = subi %c-1, %1934 : index + %1937 = select %1935, %1936, %1934 : index + %1938 = divi_signed %1937, %c2 : index + %1939 = subi %c-1, %1938 : index + %1940 = select %1935, %1939, %1938 : index + %1941 = muli %1940, %c-2 : index + %1942 = addi %1918, %1941 : index + %1943 = addi %1942, %c1 : index + %1944 = load %3[%1898, %1903, %1943] : memref<16x128x2xvector<8xf32>> + %1945 = vector.extractelement %1944[%c6_i64 : i64] : vector<8xf32> + %1946 = addi %arg5, %c8 : index + %1947 = cmpi "slt", %1946, %c0 : index + %1948 = subi %c-1, %1946 : index + %1949 = select %1947, %1948, %1946 : index + %1950 = divi_signed %1949, %c16 : index + %1951 = subi %c-1, %1950 : index + %1952 = select %1947, %1951, %1950 : index + %1953 = remi_signed %1952, %c16 : index + %1954 = cmpi "slt", %1953, %c0 : index + %1955 = addi %1953, %c16 : index + %1956 = select %1954, %1955, %1953 : index + %1957 = addi %arg6, %arg8 : index + %1958 = remi_signed %1957, %c128 : index + %1959 = cmpi "slt", %1958, %c0 : index + %1960 = addi %1958, %c128 : index + %1961 = select %1959, %1960, %1958 : index + %1962 = cmpi "slt", %arg5, %c0 : index + %1963 = subi %c-1, %arg5 : index + %1964 = select %1962, %1963, %arg5 : index + %1965 = divi_signed %1964, %c8 : index + %1966 = subi %c-1, %1965 : index + %1967 = select %1962, %1966, %1965 : index + %1968 = addi %arg5, %c8 : index + %1969 = cmpi "slt", %1968, %c0 : index + %1970 = subi %c-1, %1968 : index + %1971 = select %1969, %1970, %1968 : index + %1972 = divi_signed %1971, %c16 : index + %1973 = subi %c-1, %1972 : index + %1974 = select %1969, %1973, %1972 : index + %1975 = muli %1974, %c-2 : index + %1976 = addi %1967, %1975 : index + %1977 = cmpi "slt", %arg5, %c0 : index + %1978 = subi %c-1, %arg5 : index + %1979 = select %1977, %1978, %arg5 : index + %1980 = divi_signed %1979, %c8 : index + %1981 = subi %c-1, %1980 : index + %1982 = select %1977, %1981, %1980 : index + %1983 = addi %arg5, %c8 : index + %1984 = cmpi "slt", %1983, %c0 : index + %1985 = subi %c-1, %1983 : index + %1986 = select %1984, %1985, %1983 : index + %1987 = divi_signed %1986, %c16 : index + %1988 = subi %c-1, %1987 : index + %1989 = select %1984, %1988, %1987 : index + %1990 = muli %1989, %c-2 : index + %1991 = addi %1982, %1990 : index + %1992 = addi %1991, %c1 : index + %1993 = cmpi "slt", %1992, %c0 : index + %1994 = subi %c-1, %1992 : index + %1995 = select %1993, %1994, %1992 : index + %1996 = divi_signed %1995, %c2 : index + %1997 = subi %c-1, %1996 : index + %1998 = select %1993, %1997, %1996 : index + %1999 = muli %1998, %c-2 : index + %2000 = addi %1976, %1999 : index + %2001 = addi %2000, %c1 : index + %2002 = load %3[%1956, %1961, %2001] : memref<16x128x2xvector<8xf32>> + %2003 = vector.extractelement %2002[%c7_i64 : i64] : vector<8xf32> + %2004 = mulf %1532, %1597 {RelaxedPrecision} : f32 + 
%2005 = mulf %1533, %1655 {RelaxedPrecision} : f32 + %2006 = mulf %1534, %1713 {RelaxedPrecision} : f32 + %2007 = mulf %1535, %1771 {RelaxedPrecision} : f32 + %2008 = mulf %1536, %1829 {RelaxedPrecision} : f32 + %2009 = mulf %1537, %1887 {RelaxedPrecision} : f32 + %2010 = mulf %1538, %1945 {RelaxedPrecision} : f32 + %2011 = mulf %1539, %2003 {RelaxedPrecision} : f32 + %2012 = addi %arg5, %c8 : index + %2013 = cmpi "slt", %2012, %c0 : index + %2014 = subi %c-1, %2012 : index + %2015 = select %2013, %2014, %2012 : index + %2016 = divi_signed %2015, %c16 : index + %2017 = subi %c-1, %2016 : index + %2018 = select %2013, %2017, %2016 : index + %2019 = remi_signed %2018, %c16 : index + %2020 = cmpi "slt", %2019, %c0 : index + %2021 = addi %2019, %c16 : index + %2022 = select %2020, %2021, %2019 : index + %2023 = addi %arg7, %arg9 : index + %2024 = remi_signed %2023, %c6 : index + %2025 = cmpi "slt", %2024, %c0 : index + %2026 = addi %2024, %c6 : index + %2027 = select %2025, %2026, %2024 : index + %2028 = cmpi "slt", %arg5, %c0 : index + %2029 = subi %c-1, %arg5 : index + %2030 = select %2028, %2029, %arg5 : index + %2031 = divi_signed %2030, %c8 : index + %2032 = subi %c-1, %2031 : index + %2033 = select %2028, %2032, %2031 : index + %2034 = addi %arg5, %c8 : index + %2035 = cmpi "slt", %2034, %c0 : index + %2036 = subi %c-1, %2034 : index + %2037 = select %2035, %2036, %2034 : index + %2038 = divi_signed %2037, %c16 : index + %2039 = subi %c-1, %2038 : index + %2040 = select %2035, %2039, %2038 : index + %2041 = muli %2040, %c-2 : index + %2042 = addi %2033, %2041 : index + %2043 = cmpi "slt", %arg5, %c0 : index + %2044 = subi %c-1, %arg5 : index + %2045 = select %2043, %2044, %arg5 : index + %2046 = divi_signed %2045, %c8 : index + %2047 = subi %c-1, %2046 : index + %2048 = select %2043, %2047, %2046 : index + %2049 = addi %arg5, %c8 : index + %2050 = cmpi "slt", %2049, %c0 : index + %2051 = subi %c-1, %2049 : index + %2052 = select %2050, %2051, %2049 : index + %2053 = divi_signed %2052, %c16 : index + %2054 = subi %c-1, %2053 : index + %2055 = select %2050, %2054, %2053 : index + %2056 = muli %2055, %c-2 : index + %2057 = addi %2048, %2056 : index + %2058 = addi %2057, %c1 : index + %2059 = cmpi "slt", %2058, %c0 : index + %2060 = subi %c-1, %2058 : index + %2061 = select %2059, %2060, %2058 : index + %2062 = divi_signed %2061, %c2 : index + %2063 = subi %c-1, %2062 : index + %2064 = select %2059, %2063, %2062 : index + %2065 = muli %2064, %c-2 : index + %2066 = addi %2042, %2065 : index + %2067 = addi %2066, %c1 : index + %2068 = load %2[%2022, %2027, %2067] : memref<16x6x2xvector<8xf32>> + %2069 = vector.extractelement %2068[%c0_i64 : i64] : vector<8xf32> + %2070 = addi %arg5, %c8 : index + %2071 = cmpi "slt", %2070, %c0 : index + %2072 = subi %c-1, %2070 : index + %2073 = select %2071, %2072, %2070 : index + %2074 = divi_signed %2073, %c16 : index + %2075 = subi %c-1, %2074 : index + %2076 = select %2071, %2075, %2074 : index + %2077 = remi_signed %2076, %c16 : index + %2078 = cmpi "slt", %2077, %c0 : index + %2079 = addi %2077, %c16 : index + %2080 = select %2078, %2079, %2077 : index + %2081 = addi %arg7, %arg9 : index + %2082 = remi_signed %2081, %c6 : index + %2083 = cmpi "slt", %2082, %c0 : index + %2084 = addi %2082, %c6 : index + %2085 = select %2083, %2084, %2082 : index + %2086 = cmpi "slt", %arg5, %c0 : index + %2087 = subi %c-1, %arg5 : index + %2088 = select %2086, %2087, %arg5 : index + %2089 = divi_signed %2088, %c8 : index + %2090 = subi %c-1, %2089 : index + %2091 = 
select %2086, %2090, %2089 : index + %2092 = addi %arg5, %c8 : index + %2093 = cmpi "slt", %2092, %c0 : index + %2094 = subi %c-1, %2092 : index + %2095 = select %2093, %2094, %2092 : index + %2096 = divi_signed %2095, %c16 : index + %2097 = subi %c-1, %2096 : index + %2098 = select %2093, %2097, %2096 : index + %2099 = muli %2098, %c-2 : index + %2100 = addi %2091, %2099 : index + %2101 = cmpi "slt", %arg5, %c0 : index + %2102 = subi %c-1, %arg5 : index + %2103 = select %2101, %2102, %arg5 : index + %2104 = divi_signed %2103, %c8 : index + %2105 = subi %c-1, %2104 : index + %2106 = select %2101, %2105, %2104 : index + %2107 = addi %arg5, %c8 : index + %2108 = cmpi "slt", %2107, %c0 : index + %2109 = subi %c-1, %2107 : index + %2110 = select %2108, %2109, %2107 : index + %2111 = divi_signed %2110, %c16 : index + %2112 = subi %c-1, %2111 : index + %2113 = select %2108, %2112, %2111 : index + %2114 = muli %2113, %c-2 : index + %2115 = addi %2106, %2114 : index + %2116 = addi %2115, %c1 : index + %2117 = cmpi "slt", %2116, %c0 : index + %2118 = subi %c-1, %2116 : index + %2119 = select %2117, %2118, %2116 : index + %2120 = divi_signed %2119, %c2 : index + %2121 = subi %c-1, %2120 : index + %2122 = select %2117, %2121, %2120 : index + %2123 = muli %2122, %c-2 : index + %2124 = addi %2100, %2123 : index + %2125 = addi %2124, %c1 : index + %2126 = load %2[%2080, %2085, %2125] : memref<16x6x2xvector<8xf32>> + %2127 = vector.extractelement %2126[%c1_i64 : i64] : vector<8xf32> + %2128 = addi %arg5, %c8 : index + %2129 = cmpi "slt", %2128, %c0 : index + %2130 = subi %c-1, %2128 : index + %2131 = select %2129, %2130, %2128 : index + %2132 = divi_signed %2131, %c16 : index + %2133 = subi %c-1, %2132 : index + %2134 = select %2129, %2133, %2132 : index + %2135 = remi_signed %2134, %c16 : index + %2136 = cmpi "slt", %2135, %c0 : index + %2137 = addi %2135, %c16 : index + %2138 = select %2136, %2137, %2135 : index + %2139 = addi %arg7, %arg9 : index + %2140 = remi_signed %2139, %c6 : index + %2141 = cmpi "slt", %2140, %c0 : index + %2142 = addi %2140, %c6 : index + %2143 = select %2141, %2142, %2140 : index + %2144 = cmpi "slt", %arg5, %c0 : index + %2145 = subi %c-1, %arg5 : index + %2146 = select %2144, %2145, %arg5 : index + %2147 = divi_signed %2146, %c8 : index + %2148 = subi %c-1, %2147 : index + %2149 = select %2144, %2148, %2147 : index + %2150 = addi %arg5, %c8 : index + %2151 = cmpi "slt", %2150, %c0 : index + %2152 = subi %c-1, %2150 : index + %2153 = select %2151, %2152, %2150 : index + %2154 = divi_signed %2153, %c16 : index + %2155 = subi %c-1, %2154 : index + %2156 = select %2151, %2155, %2154 : index + %2157 = muli %2156, %c-2 : index + %2158 = addi %2149, %2157 : index + %2159 = cmpi "slt", %arg5, %c0 : index + %2160 = subi %c-1, %arg5 : index + %2161 = select %2159, %2160, %arg5 : index + %2162 = divi_signed %2161, %c8 : index + %2163 = subi %c-1, %2162 : index + %2164 = select %2159, %2163, %2162 : index + %2165 = addi %arg5, %c8 : index + %2166 = cmpi "slt", %2165, %c0 : index + %2167 = subi %c-1, %2165 : index + %2168 = select %2166, %2167, %2165 : index + %2169 = divi_signed %2168, %c16 : index + %2170 = subi %c-1, %2169 : index + %2171 = select %2166, %2170, %2169 : index + %2172 = muli %2171, %c-2 : index + %2173 = addi %2164, %2172 : index + %2174 = addi %2173, %c1 : index + %2175 = cmpi "slt", %2174, %c0 : index + %2176 = subi %c-1, %2174 : index + %2177 = select %2175, %2176, %2174 : index + %2178 = divi_signed %2177, %c2 : index + %2179 = subi %c-1, %2178 : index + %2180 = 
select %2175, %2179, %2178 : index + %2181 = muli %2180, %c-2 : index + %2182 = addi %2158, %2181 : index + %2183 = addi %2182, %c1 : index + %2184 = load %2[%2138, %2143, %2183] : memref<16x6x2xvector<8xf32>> + %2185 = vector.extractelement %2184[%c2_i64 : i64] : vector<8xf32> + %2186 = addi %arg5, %c8 : index + %2187 = cmpi "slt", %2186, %c0 : index + %2188 = subi %c-1, %2186 : index + %2189 = select %2187, %2188, %2186 : index + %2190 = divi_signed %2189, %c16 : index + %2191 = subi %c-1, %2190 : index + %2192 = select %2187, %2191, %2190 : index + %2193 = remi_signed %2192, %c16 : index + %2194 = cmpi "slt", %2193, %c0 : index + %2195 = addi %2193, %c16 : index + %2196 = select %2194, %2195, %2193 : index + %2197 = addi %arg7, %arg9 : index + %2198 = remi_signed %2197, %c6 : index + %2199 = cmpi "slt", %2198, %c0 : index + %2200 = addi %2198, %c6 : index + %2201 = select %2199, %2200, %2198 : index + %2202 = cmpi "slt", %arg5, %c0 : index + %2203 = subi %c-1, %arg5 : index + %2204 = select %2202, %2203, %arg5 : index + %2205 = divi_signed %2204, %c8 : index + %2206 = subi %c-1, %2205 : index + %2207 = select %2202, %2206, %2205 : index + %2208 = addi %arg5, %c8 : index + %2209 = cmpi "slt", %2208, %c0 : index + %2210 = subi %c-1, %2208 : index + %2211 = select %2209, %2210, %2208 : index + %2212 = divi_signed %2211, %c16 : index + %2213 = subi %c-1, %2212 : index + %2214 = select %2209, %2213, %2212 : index + %2215 = muli %2214, %c-2 : index + %2216 = addi %2207, %2215 : index + %2217 = cmpi "slt", %arg5, %c0 : index + %2218 = subi %c-1, %arg5 : index + %2219 = select %2217, %2218, %arg5 : index + %2220 = divi_signed %2219, %c8 : index + %2221 = subi %c-1, %2220 : index + %2222 = select %2217, %2221, %2220 : index + %2223 = addi %arg5, %c8 : index + %2224 = cmpi "slt", %2223, %c0 : index + %2225 = subi %c-1, %2223 : index + %2226 = select %2224, %2225, %2223 : index + %2227 = divi_signed %2226, %c16 : index + %2228 = subi %c-1, %2227 : index + %2229 = select %2224, %2228, %2227 : index + %2230 = muli %2229, %c-2 : index + %2231 = addi %2222, %2230 : index + %2232 = addi %2231, %c1 : index + %2233 = cmpi "slt", %2232, %c0 : index + %2234 = subi %c-1, %2232 : index + %2235 = select %2233, %2234, %2232 : index + %2236 = divi_signed %2235, %c2 : index + %2237 = subi %c-1, %2236 : index + %2238 = select %2233, %2237, %2236 : index + %2239 = muli %2238, %c-2 : index + %2240 = addi %2216, %2239 : index + %2241 = addi %2240, %c1 : index + %2242 = load %2[%2196, %2201, %2241] : memref<16x6x2xvector<8xf32>> + %2243 = vector.extractelement %2242[%c3_i64 : i64] : vector<8xf32> + %2244 = addi %arg5, %c8 : index + %2245 = cmpi "slt", %2244, %c0 : index + %2246 = subi %c-1, %2244 : index + %2247 = select %2245, %2246, %2244 : index + %2248 = divi_signed %2247, %c16 : index + %2249 = subi %c-1, %2248 : index + %2250 = select %2245, %2249, %2248 : index + %2251 = remi_signed %2250, %c16 : index + %2252 = cmpi "slt", %2251, %c0 : index + %2253 = addi %2251, %c16 : index + %2254 = select %2252, %2253, %2251 : index + %2255 = addi %arg7, %arg9 : index + %2256 = remi_signed %2255, %c6 : index + %2257 = cmpi "slt", %2256, %c0 : index + %2258 = addi %2256, %c6 : index + %2259 = select %2257, %2258, %2256 : index + %2260 = cmpi "slt", %arg5, %c0 : index + %2261 = subi %c-1, %arg5 : index + %2262 = select %2260, %2261, %arg5 : index + %2263 = divi_signed %2262, %c8 : index + %2264 = subi %c-1, %2263 : index + %2265 = select %2260, %2264, %2263 : index + %2266 = addi %arg5, %c8 : index + %2267 = cmpi "slt", 
%2266, %c0 : index + %2268 = subi %c-1, %2266 : index + %2269 = select %2267, %2268, %2266 : index + %2270 = divi_signed %2269, %c16 : index + %2271 = subi %c-1, %2270 : index + %2272 = select %2267, %2271, %2270 : index + %2273 = muli %2272, %c-2 : index + %2274 = addi %2265, %2273 : index + %2275 = cmpi "slt", %arg5, %c0 : index + %2276 = subi %c-1, %arg5 : index + %2277 = select %2275, %2276, %arg5 : index + %2278 = divi_signed %2277, %c8 : index + %2279 = subi %c-1, %2278 : index + %2280 = select %2275, %2279, %2278 : index + %2281 = addi %arg5, %c8 : index + %2282 = cmpi "slt", %2281, %c0 : index + %2283 = subi %c-1, %2281 : index + %2284 = select %2282, %2283, %2281 : index + %2285 = divi_signed %2284, %c16 : index + %2286 = subi %c-1, %2285 : index + %2287 = select %2282, %2286, %2285 : index + %2288 = muli %2287, %c-2 : index + %2289 = addi %2280, %2288 : index + %2290 = addi %2289, %c1 : index + %2291 = cmpi "slt", %2290, %c0 : index + %2292 = subi %c-1, %2290 : index + %2293 = select %2291, %2292, %2290 : index + %2294 = divi_signed %2293, %c2 : index + %2295 = subi %c-1, %2294 : index + %2296 = select %2291, %2295, %2294 : index + %2297 = muli %2296, %c-2 : index + %2298 = addi %2274, %2297 : index + %2299 = addi %2298, %c1 : index + %2300 = load %2[%2254, %2259, %2299] : memref<16x6x2xvector<8xf32>> + %2301 = vector.extractelement %2300[%c4_i64 : i64] : vector<8xf32> + %2302 = addi %arg5, %c8 : index + %2303 = cmpi "slt", %2302, %c0 : index + %2304 = subi %c-1, %2302 : index + %2305 = select %2303, %2304, %2302 : index + %2306 = divi_signed %2305, %c16 : index + %2307 = subi %c-1, %2306 : index + %2308 = select %2303, %2307, %2306 : index + %2309 = remi_signed %2308, %c16 : index + %2310 = cmpi "slt", %2309, %c0 : index + %2311 = addi %2309, %c16 : index + %2312 = select %2310, %2311, %2309 : index + %2313 = addi %arg7, %arg9 : index + %2314 = remi_signed %2313, %c6 : index + %2315 = cmpi "slt", %2314, %c0 : index + %2316 = addi %2314, %c6 : index + %2317 = select %2315, %2316, %2314 : index + %2318 = cmpi "slt", %arg5, %c0 : index + %2319 = subi %c-1, %arg5 : index + %2320 = select %2318, %2319, %arg5 : index + %2321 = divi_signed %2320, %c8 : index + %2322 = subi %c-1, %2321 : index + %2323 = select %2318, %2322, %2321 : index + %2324 = addi %arg5, %c8 : index + %2325 = cmpi "slt", %2324, %c0 : index + %2326 = subi %c-1, %2324 : index + %2327 = select %2325, %2326, %2324 : index + %2328 = divi_signed %2327, %c16 : index + %2329 = subi %c-1, %2328 : index + %2330 = select %2325, %2329, %2328 : index + %2331 = muli %2330, %c-2 : index + %2332 = addi %2323, %2331 : index + %2333 = cmpi "slt", %arg5, %c0 : index + %2334 = subi %c-1, %arg5 : index + %2335 = select %2333, %2334, %arg5 : index + %2336 = divi_signed %2335, %c8 : index + %2337 = subi %c-1, %2336 : index + %2338 = select %2333, %2337, %2336 : index + %2339 = addi %arg5, %c8 : index + %2340 = cmpi "slt", %2339, %c0 : index + %2341 = subi %c-1, %2339 : index + %2342 = select %2340, %2341, %2339 : index + %2343 = divi_signed %2342, %c16 : index + %2344 = subi %c-1, %2343 : index + %2345 = select %2340, %2344, %2343 : index + %2346 = muli %2345, %c-2 : index + %2347 = addi %2338, %2346 : index + %2348 = addi %2347, %c1 : index + %2349 = cmpi "slt", %2348, %c0 : index + %2350 = subi %c-1, %2348 : index + %2351 = select %2349, %2350, %2348 : index + %2352 = divi_signed %2351, %c2 : index + %2353 = subi %c-1, %2352 : index + %2354 = select %2349, %2353, %2352 : index + %2355 = muli %2354, %c-2 : index + %2356 = addi %2332, 
%2355 : index + %2357 = addi %2356, %c1 : index + %2358 = load %2[%2312, %2317, %2357] : memref<16x6x2xvector<8xf32>> + %2359 = vector.extractelement %2358[%c5_i64 : i64] : vector<8xf32> + %2360 = addi %arg5, %c8 : index + %2361 = cmpi "slt", %2360, %c0 : index + %2362 = subi %c-1, %2360 : index + %2363 = select %2361, %2362, %2360 : index + %2364 = divi_signed %2363, %c16 : index + %2365 = subi %c-1, %2364 : index + %2366 = select %2361, %2365, %2364 : index + %2367 = remi_signed %2366, %c16 : index + %2368 = cmpi "slt", %2367, %c0 : index + %2369 = addi %2367, %c16 : index + %2370 = select %2368, %2369, %2367 : index + %2371 = addi %arg7, %arg9 : index + %2372 = remi_signed %2371, %c6 : index + %2373 = cmpi "slt", %2372, %c0 : index + %2374 = addi %2372, %c6 : index + %2375 = select %2373, %2374, %2372 : index + %2376 = cmpi "slt", %arg5, %c0 : index + %2377 = subi %c-1, %arg5 : index + %2378 = select %2376, %2377, %arg5 : index + %2379 = divi_signed %2378, %c8 : index + %2380 = subi %c-1, %2379 : index + %2381 = select %2376, %2380, %2379 : index + %2382 = addi %arg5, %c8 : index + %2383 = cmpi "slt", %2382, %c0 : index + %2384 = subi %c-1, %2382 : index + %2385 = select %2383, %2384, %2382 : index + %2386 = divi_signed %2385, %c16 : index + %2387 = subi %c-1, %2386 : index + %2388 = select %2383, %2387, %2386 : index + %2389 = muli %2388, %c-2 : index + %2390 = addi %2381, %2389 : index + %2391 = cmpi "slt", %arg5, %c0 : index + %2392 = subi %c-1, %arg5 : index + %2393 = select %2391, %2392, %arg5 : index + %2394 = divi_signed %2393, %c8 : index + %2395 = subi %c-1, %2394 : index + %2396 = select %2391, %2395, %2394 : index + %2397 = addi %arg5, %c8 : index + %2398 = cmpi "slt", %2397, %c0 : index + %2399 = subi %c-1, %2397 : index + %2400 = select %2398, %2399, %2397 : index + %2401 = divi_signed %2400, %c16 : index + %2402 = subi %c-1, %2401 : index + %2403 = select %2398, %2402, %2401 : index + %2404 = muli %2403, %c-2 : index + %2405 = addi %2396, %2404 : index + %2406 = addi %2405, %c1 : index + %2407 = cmpi "slt", %2406, %c0 : index + %2408 = subi %c-1, %2406 : index + %2409 = select %2407, %2408, %2406 : index + %2410 = divi_signed %2409, %c2 : index + %2411 = subi %c-1, %2410 : index + %2412 = select %2407, %2411, %2410 : index + %2413 = muli %2412, %c-2 : index + %2414 = addi %2390, %2413 : index + %2415 = addi %2414, %c1 : index + %2416 = load %2[%2370, %2375, %2415] : memref<16x6x2xvector<8xf32>> + %2417 = vector.extractelement %2416[%c6_i64 : i64] : vector<8xf32> + %2418 = addi %arg5, %c8 : index + %2419 = cmpi "slt", %2418, %c0 : index + %2420 = subi %c-1, %2418 : index + %2421 = select %2419, %2420, %2418 : index + %2422 = divi_signed %2421, %c16 : index + %2423 = subi %c-1, %2422 : index + %2424 = select %2419, %2423, %2422 : index + %2425 = remi_signed %2424, %c16 : index + %2426 = cmpi "slt", %2425, %c0 : index + %2427 = addi %2425, %c16 : index + %2428 = select %2426, %2427, %2425 : index + %2429 = addi %arg7, %arg9 : index + %2430 = remi_signed %2429, %c6 : index + %2431 = cmpi "slt", %2430, %c0 : index + %2432 = addi %2430, %c6 : index + %2433 = select %2431, %2432, %2430 : index + %2434 = cmpi "slt", %arg5, %c0 : index + %2435 = subi %c-1, %arg5 : index + %2436 = select %2434, %2435, %arg5 : index + %2437 = divi_signed %2436, %c8 : index + %2438 = subi %c-1, %2437 : index + %2439 = select %2434, %2438, %2437 : index + %2440 = addi %arg5, %c8 : index + %2441 = cmpi "slt", %2440, %c0 : index + %2442 = subi %c-1, %2440 : index + %2443 = select %2441, %2442, %2440 : 
index + %2444 = divi_signed %2443, %c16 : index + %2445 = subi %c-1, %2444 : index + %2446 = select %2441, %2445, %2444 : index + %2447 = muli %2446, %c-2 : index + %2448 = addi %2439, %2447 : index + %2449 = cmpi "slt", %arg5, %c0 : index + %2450 = subi %c-1, %arg5 : index + %2451 = select %2449, %2450, %arg5 : index + %2452 = divi_signed %2451, %c8 : index + %2453 = subi %c-1, %2452 : index + %2454 = select %2449, %2453, %2452 : index + %2455 = addi %arg5, %c8 : index + %2456 = cmpi "slt", %2455, %c0 : index + %2457 = subi %c-1, %2455 : index + %2458 = select %2456, %2457, %2455 : index + %2459 = divi_signed %2458, %c16 : index + %2460 = subi %c-1, %2459 : index + %2461 = select %2456, %2460, %2459 : index + %2462 = muli %2461, %c-2 : index + %2463 = addi %2454, %2462 : index + %2464 = addi %2463, %c1 : index + %2465 = cmpi "slt", %2464, %c0 : index + %2466 = subi %c-1, %2464 : index + %2467 = select %2465, %2466, %2464 : index + %2468 = divi_signed %2467, %c2 : index + %2469 = subi %c-1, %2468 : index + %2470 = select %2465, %2469, %2468 : index + %2471 = muli %2470, %c-2 : index + %2472 = addi %2448, %2471 : index + %2473 = addi %2472, %c1 : index + %2474 = load %2[%2428, %2433, %2473] : memref<16x6x2xvector<8xf32>> + %2475 = vector.extractelement %2474[%c7_i64 : i64] : vector<8xf32> + %2476 = addf %2069, %2004 {RelaxedPrecision} : f32 + %2477 = addf %2127, %2005 {RelaxedPrecision} : f32 + %2478 = addf %2185, %2006 {RelaxedPrecision} : f32 + %2479 = addf %2243, %2007 {RelaxedPrecision} : f32 + %2480 = addf %2301, %2008 {RelaxedPrecision} : f32 + %2481 = addf %2359, %2009 {RelaxedPrecision} : f32 + %2482 = addf %2417, %2010 {RelaxedPrecision} : f32 + %2483 = addf %2475, %2011 {RelaxedPrecision} : f32 + %2484 = addi %arg5, %c8 : index + %2485 = cmpi "slt", %2484, %c0 : index + %2486 = subi %c-1, %2484 : index + %2487 = select %2485, %2486, %2484 : index + %2488 = divi_signed %2487, %c16 : index + %2489 = subi %c-1, %2488 : index + %2490 = select %2485, %2489, %2488 : index + %2491 = remi_signed %2490, %c16 : index + %2492 = cmpi "slt", %2491, %c0 : index + %2493 = addi %2491, %c16 : index + %2494 = select %2492, %2493, %2491 : index + %2495 = addi %arg7, %arg9 : index + %2496 = remi_signed %2495, %c6 : index + %2497 = cmpi "slt", %2496, %c0 : index + %2498 = addi %2496, %c6 : index + %2499 = select %2497, %2498, %2496 : index + %2500 = cmpi "slt", %arg5, %c0 : index + %2501 = subi %c-1, %arg5 : index + %2502 = select %2500, %2501, %arg5 : index + %2503 = divi_signed %2502, %c8 : index + %2504 = subi %c-1, %2503 : index + %2505 = select %2500, %2504, %2503 : index + %2506 = addi %arg5, %c8 : index + %2507 = cmpi "slt", %2506, %c0 : index + %2508 = subi %c-1, %2506 : index + %2509 = select %2507, %2508, %2506 : index + %2510 = divi_signed %2509, %c16 : index + %2511 = subi %c-1, %2510 : index + %2512 = select %2507, %2511, %2510 : index + %2513 = muli %2512, %c-2 : index + %2514 = addi %2505, %2513 : index + %2515 = cmpi "slt", %arg5, %c0 : index + %2516 = subi %c-1, %arg5 : index + %2517 = select %2515, %2516, %arg5 : index + %2518 = divi_signed %2517, %c8 : index + %2519 = subi %c-1, %2518 : index + %2520 = select %2515, %2519, %2518 : index + %2521 = addi %arg5, %c8 : index + %2522 = cmpi "slt", %2521, %c0 : index + %2523 = subi %c-1, %2521 : index + %2524 = select %2522, %2523, %2521 : index + %2525 = divi_signed %2524, %c16 : index + %2526 = subi %c-1, %2525 : index + %2527 = select %2522, %2526, %2525 : index + %2528 = muli %2527, %c-2 : index + %2529 = addi %2520, %2528 : index + 
%2530 = addi %2529, %c1 : index + %2531 = cmpi "slt", %2530, %c0 : index + %2532 = subi %c-1, %2530 : index + %2533 = select %2531, %2532, %2530 : index + %2534 = divi_signed %2533, %c2 : index + %2535 = subi %c-1, %2534 : index + %2536 = select %2531, %2535, %2534 : index + %2537 = muli %2536, %c-2 : index + %2538 = addi %2514, %2537 : index + %2539 = addi %2538, %c1 : index + %2540 = load %2[%2494, %2499, %2539] : memref<16x6x2xvector<8xf32>> + %2541 = vector.insertelement %2476, %2540[%c0_i64 : i64] : vector<8xf32> + %2542 = addi %arg5, %c8 : index + %2543 = cmpi "slt", %2542, %c0 : index + %2544 = subi %c-1, %2542 : index + %2545 = select %2543, %2544, %2542 : index + %2546 = divi_signed %2545, %c16 : index + %2547 = subi %c-1, %2546 : index + %2548 = select %2543, %2547, %2546 : index + %2549 = remi_signed %2548, %c16 : index + %2550 = cmpi "slt", %2549, %c0 : index + %2551 = addi %2549, %c16 : index + %2552 = select %2550, %2551, %2549 : index + %2553 = addi %arg7, %arg9 : index + %2554 = remi_signed %2553, %c6 : index + %2555 = cmpi "slt", %2554, %c0 : index + %2556 = addi %2554, %c6 : index + %2557 = select %2555, %2556, %2554 : index + %2558 = cmpi "slt", %arg5, %c0 : index + %2559 = subi %c-1, %arg5 : index + %2560 = select %2558, %2559, %arg5 : index + %2561 = divi_signed %2560, %c8 : index + %2562 = subi %c-1, %2561 : index + %2563 = select %2558, %2562, %2561 : index + %2564 = addi %arg5, %c8 : index + %2565 = cmpi "slt", %2564, %c0 : index + %2566 = subi %c-1, %2564 : index + %2567 = select %2565, %2566, %2564 : index + %2568 = divi_signed %2567, %c16 : index + %2569 = subi %c-1, %2568 : index + %2570 = select %2565, %2569, %2568 : index + %2571 = muli %2570, %c-2 : index + %2572 = addi %2563, %2571 : index + %2573 = cmpi "slt", %arg5, %c0 : index + %2574 = subi %c-1, %arg5 : index + %2575 = select %2573, %2574, %arg5 : index + %2576 = divi_signed %2575, %c8 : index + %2577 = subi %c-1, %2576 : index + %2578 = select %2573, %2577, %2576 : index + %2579 = addi %arg5, %c8 : index + %2580 = cmpi "slt", %2579, %c0 : index + %2581 = subi %c-1, %2579 : index + %2582 = select %2580, %2581, %2579 : index + %2583 = divi_signed %2582, %c16 : index + %2584 = subi %c-1, %2583 : index + %2585 = select %2580, %2584, %2583 : index + %2586 = muli %2585, %c-2 : index + %2587 = addi %2578, %2586 : index + %2588 = addi %2587, %c1 : index + %2589 = cmpi "slt", %2588, %c0 : index + %2590 = subi %c-1, %2588 : index + %2591 = select %2589, %2590, %2588 : index + %2592 = divi_signed %2591, %c2 : index + %2593 = subi %c-1, %2592 : index + %2594 = select %2589, %2593, %2592 : index + %2595 = muli %2594, %c-2 : index + %2596 = addi %2572, %2595 : index + %2597 = addi %2596, %c1 : index + store %2541, %2[%2552, %2557, %2597] : memref<16x6x2xvector<8xf32>> + %2598 = addi %arg5, %c8 : index + %2599 = cmpi "slt", %2598, %c0 : index + %2600 = subi %c-1, %2598 : index + %2601 = select %2599, %2600, %2598 : index + %2602 = divi_signed %2601, %c16 : index + %2603 = subi %c-1, %2602 : index + %2604 = select %2599, %2603, %2602 : index + %2605 = remi_signed %2604, %c16 : index + %2606 = cmpi "slt", %2605, %c0 : index + %2607 = addi %2605, %c16 : index + %2608 = select %2606, %2607, %2605 : index + %2609 = addi %arg7, %arg9 : index + %2610 = remi_signed %2609, %c6 : index + %2611 = cmpi "slt", %2610, %c0 : index + %2612 = addi %2610, %c6 : index + %2613 = select %2611, %2612, %2610 : index + %2614 = cmpi "slt", %arg5, %c0 : index + %2615 = subi %c-1, %arg5 : index + %2616 = select %2614, %2615, %arg5 : index + 
%2617 = divi_signed %2616, %c8 : index + %2618 = subi %c-1, %2617 : index + %2619 = select %2614, %2618, %2617 : index + %2620 = addi %arg5, %c8 : index + %2621 = cmpi "slt", %2620, %c0 : index + %2622 = subi %c-1, %2620 : index + %2623 = select %2621, %2622, %2620 : index + %2624 = divi_signed %2623, %c16 : index + %2625 = subi %c-1, %2624 : index + %2626 = select %2621, %2625, %2624 : index + %2627 = muli %2626, %c-2 : index + %2628 = addi %2619, %2627 : index + %2629 = cmpi "slt", %arg5, %c0 : index + %2630 = subi %c-1, %arg5 : index + %2631 = select %2629, %2630, %arg5 : index + %2632 = divi_signed %2631, %c8 : index + %2633 = subi %c-1, %2632 : index + %2634 = select %2629, %2633, %2632 : index + %2635 = addi %arg5, %c8 : index + %2636 = cmpi "slt", %2635, %c0 : index + %2637 = subi %c-1, %2635 : index + %2638 = select %2636, %2637, %2635 : index + %2639 = divi_signed %2638, %c16 : index + %2640 = subi %c-1, %2639 : index + %2641 = select %2636, %2640, %2639 : index + %2642 = muli %2641, %c-2 : index + %2643 = addi %2634, %2642 : index + %2644 = addi %2643, %c1 : index + %2645 = cmpi "slt", %2644, %c0 : index + %2646 = subi %c-1, %2644 : index + %2647 = select %2645, %2646, %2644 : index + %2648 = divi_signed %2647, %c2 : index + %2649 = subi %c-1, %2648 : index + %2650 = select %2645, %2649, %2648 : index + %2651 = muli %2650, %c-2 : index + %2652 = addi %2628, %2651 : index + %2653 = addi %2652, %c1 : index + %2654 = load %2[%2608, %2613, %2653] : memref<16x6x2xvector<8xf32>> + %2655 = vector.insertelement %2477, %2654[%c1_i64 : i64] : vector<8xf32> + %2656 = addi %arg5, %c8 : index + %2657 = cmpi "slt", %2656, %c0 : index + %2658 = subi %c-1, %2656 : index + %2659 = select %2657, %2658, %2656 : index + %2660 = divi_signed %2659, %c16 : index + %2661 = subi %c-1, %2660 : index + %2662 = select %2657, %2661, %2660 : index + %2663 = remi_signed %2662, %c16 : index + %2664 = cmpi "slt", %2663, %c0 : index + %2665 = addi %2663, %c16 : index + %2666 = select %2664, %2665, %2663 : index + %2667 = addi %arg7, %arg9 : index + %2668 = remi_signed %2667, %c6 : index + %2669 = cmpi "slt", %2668, %c0 : index + %2670 = addi %2668, %c6 : index + %2671 = select %2669, %2670, %2668 : index + %2672 = cmpi "slt", %arg5, %c0 : index + %2673 = subi %c-1, %arg5 : index + %2674 = select %2672, %2673, %arg5 : index + %2675 = divi_signed %2674, %c8 : index + %2676 = subi %c-1, %2675 : index + %2677 = select %2672, %2676, %2675 : index + %2678 = addi %arg5, %c8 : index + %2679 = cmpi "slt", %2678, %c0 : index + %2680 = subi %c-1, %2678 : index + %2681 = select %2679, %2680, %2678 : index + %2682 = divi_signed %2681, %c16 : index + %2683 = subi %c-1, %2682 : index + %2684 = select %2679, %2683, %2682 : index + %2685 = muli %2684, %c-2 : index + %2686 = addi %2677, %2685 : index + %2687 = cmpi "slt", %arg5, %c0 : index + %2688 = subi %c-1, %arg5 : index + %2689 = select %2687, %2688, %arg5 : index + %2690 = divi_signed %2689, %c8 : index + %2691 = subi %c-1, %2690 : index + %2692 = select %2687, %2691, %2690 : index + %2693 = addi %arg5, %c8 : index + %2694 = cmpi "slt", %2693, %c0 : index + %2695 = subi %c-1, %2693 : index + %2696 = select %2694, %2695, %2693 : index + %2697 = divi_signed %2696, %c16 : index + %2698 = subi %c-1, %2697 : index + %2699 = select %2694, %2698, %2697 : index + %2700 = muli %2699, %c-2 : index + %2701 = addi %2692, %2700 : index + %2702 = addi %2701, %c1 : index + %2703 = cmpi "slt", %2702, %c0 : index + %2704 = subi %c-1, %2702 : index + %2705 = select %2703, %2704, %2702 : index 
+ %2706 = divi_signed %2705, %c2 : index + %2707 = subi %c-1, %2706 : index + %2708 = select %2703, %2707, %2706 : index + %2709 = muli %2708, %c-2 : index + %2710 = addi %2686, %2709 : index + %2711 = addi %2710, %c1 : index + store %2655, %2[%2666, %2671, %2711] : memref<16x6x2xvector<8xf32>> + %2712 = addi %arg5, %c8 : index + %2713 = cmpi "slt", %2712, %c0 : index + %2714 = subi %c-1, %2712 : index + %2715 = select %2713, %2714, %2712 : index + %2716 = divi_signed %2715, %c16 : index + %2717 = subi %c-1, %2716 : index + %2718 = select %2713, %2717, %2716 : index + %2719 = remi_signed %2718, %c16 : index + %2720 = cmpi "slt", %2719, %c0 : index + %2721 = addi %2719, %c16 : index + %2722 = select %2720, %2721, %2719 : index + %2723 = addi %arg7, %arg9 : index + %2724 = remi_signed %2723, %c6 : index + %2725 = cmpi "slt", %2724, %c0 : index + %2726 = addi %2724, %c6 : index + %2727 = select %2725, %2726, %2724 : index + %2728 = cmpi "slt", %arg5, %c0 : index + %2729 = subi %c-1, %arg5 : index + %2730 = select %2728, %2729, %arg5 : index + %2731 = divi_signed %2730, %c8 : index + %2732 = subi %c-1, %2731 : index + %2733 = select %2728, %2732, %2731 : index + %2734 = addi %arg5, %c8 : index + %2735 = cmpi "slt", %2734, %c0 : index + %2736 = subi %c-1, %2734 : index + %2737 = select %2735, %2736, %2734 : index + %2738 = divi_signed %2737, %c16 : index + %2739 = subi %c-1, %2738 : index + %2740 = select %2735, %2739, %2738 : index + %2741 = muli %2740, %c-2 : index + %2742 = addi %2733, %2741 : index + %2743 = cmpi "slt", %arg5, %c0 : index + %2744 = subi %c-1, %arg5 : index + %2745 = select %2743, %2744, %arg5 : index + %2746 = divi_signed %2745, %c8 : index + %2747 = subi %c-1, %2746 : index + %2748 = select %2743, %2747, %2746 : index + %2749 = addi %arg5, %c8 : index + %2750 = cmpi "slt", %2749, %c0 : index + %2751 = subi %c-1, %2749 : index + %2752 = select %2750, %2751, %2749 : index + %2753 = divi_signed %2752, %c16 : index + %2754 = subi %c-1, %2753 : index + %2755 = select %2750, %2754, %2753 : index + %2756 = muli %2755, %c-2 : index + %2757 = addi %2748, %2756 : index + %2758 = addi %2757, %c1 : index + %2759 = cmpi "slt", %2758, %c0 : index + %2760 = subi %c-1, %2758 : index + %2761 = select %2759, %2760, %2758 : index + %2762 = divi_signed %2761, %c2 : index + %2763 = subi %c-1, %2762 : index + %2764 = select %2759, %2763, %2762 : index + %2765 = muli %2764, %c-2 : index + %2766 = addi %2742, %2765 : index + %2767 = addi %2766, %c1 : index + %2768 = load %2[%2722, %2727, %2767] : memref<16x6x2xvector<8xf32>> + %2769 = vector.insertelement %2478, %2768[%c2_i64 : i64] : vector<8xf32> + %2770 = addi %arg5, %c8 : index + %2771 = cmpi "slt", %2770, %c0 : index + %2772 = subi %c-1, %2770 : index + %2773 = select %2771, %2772, %2770 : index + %2774 = divi_signed %2773, %c16 : index + %2775 = subi %c-1, %2774 : index + %2776 = select %2771, %2775, %2774 : index + %2777 = remi_signed %2776, %c16 : index + %2778 = cmpi "slt", %2777, %c0 : index + %2779 = addi %2777, %c16 : index + %2780 = select %2778, %2779, %2777 : index + %2781 = addi %arg7, %arg9 : index + %2782 = remi_signed %2781, %c6 : index + %2783 = cmpi "slt", %2782, %c0 : index + %2784 = addi %2782, %c6 : index + %2785 = select %2783, %2784, %2782 : index + %2786 = cmpi "slt", %arg5, %c0 : index + %2787 = subi %c-1, %arg5 : index + %2788 = select %2786, %2787, %arg5 : index + %2789 = divi_signed %2788, %c8 : index + %2790 = subi %c-1, %2789 : index + %2791 = select %2786, %2790, %2789 : index + %2792 = addi %arg5, %c8 : index + 
%2793 = cmpi "slt", %2792, %c0 : index + %2794 = subi %c-1, %2792 : index + %2795 = select %2793, %2794, %2792 : index + %2796 = divi_signed %2795, %c16 : index + %2797 = subi %c-1, %2796 : index + %2798 = select %2793, %2797, %2796 : index + %2799 = muli %2798, %c-2 : index + %2800 = addi %2791, %2799 : index + %2801 = cmpi "slt", %arg5, %c0 : index + %2802 = subi %c-1, %arg5 : index + %2803 = select %2801, %2802, %arg5 : index + %2804 = divi_signed %2803, %c8 : index + %2805 = subi %c-1, %2804 : index + %2806 = select %2801, %2805, %2804 : index + %2807 = addi %arg5, %c8 : index + %2808 = cmpi "slt", %2807, %c0 : index + %2809 = subi %c-1, %2807 : index + %2810 = select %2808, %2809, %2807 : index + %2811 = divi_signed %2810, %c16 : index + %2812 = subi %c-1, %2811 : index + %2813 = select %2808, %2812, %2811 : index + %2814 = muli %2813, %c-2 : index + %2815 = addi %2806, %2814 : index + %2816 = addi %2815, %c1 : index + %2817 = cmpi "slt", %2816, %c0 : index + %2818 = subi %c-1, %2816 : index + %2819 = select %2817, %2818, %2816 : index + %2820 = divi_signed %2819, %c2 : index + %2821 = subi %c-1, %2820 : index + %2822 = select %2817, %2821, %2820 : index + %2823 = muli %2822, %c-2 : index + %2824 = addi %2800, %2823 : index + %2825 = addi %2824, %c1 : index + store %2769, %2[%2780, %2785, %2825] : memref<16x6x2xvector<8xf32>> + %2826 = addi %arg5, %c8 : index + %2827 = cmpi "slt", %2826, %c0 : index + %2828 = subi %c-1, %2826 : index + %2829 = select %2827, %2828, %2826 : index + %2830 = divi_signed %2829, %c16 : index + %2831 = subi %c-1, %2830 : index + %2832 = select %2827, %2831, %2830 : index + %2833 = remi_signed %2832, %c16 : index + %2834 = cmpi "slt", %2833, %c0 : index + %2835 = addi %2833, %c16 : index + %2836 = select %2834, %2835, %2833 : index + %2837 = addi %arg7, %arg9 : index + %2838 = remi_signed %2837, %c6 : index + %2839 = cmpi "slt", %2838, %c0 : index + %2840 = addi %2838, %c6 : index + %2841 = select %2839, %2840, %2838 : index + %2842 = cmpi "slt", %arg5, %c0 : index + %2843 = subi %c-1, %arg5 : index + %2844 = select %2842, %2843, %arg5 : index + %2845 = divi_signed %2844, %c8 : index + %2846 = subi %c-1, %2845 : index + %2847 = select %2842, %2846, %2845 : index + %2848 = addi %arg5, %c8 : index + %2849 = cmpi "slt", %2848, %c0 : index + %2850 = subi %c-1, %2848 : index + %2851 = select %2849, %2850, %2848 : index + %2852 = divi_signed %2851, %c16 : index + %2853 = subi %c-1, %2852 : index + %2854 = select %2849, %2853, %2852 : index + %2855 = muli %2854, %c-2 : index + %2856 = addi %2847, %2855 : index + %2857 = cmpi "slt", %arg5, %c0 : index + %2858 = subi %c-1, %arg5 : index + %2859 = select %2857, %2858, %arg5 : index + %2860 = divi_signed %2859, %c8 : index + %2861 = subi %c-1, %2860 : index + %2862 = select %2857, %2861, %2860 : index + %2863 = addi %arg5, %c8 : index + %2864 = cmpi "slt", %2863, %c0 : index + %2865 = subi %c-1, %2863 : index + %2866 = select %2864, %2865, %2863 : index + %2867 = divi_signed %2866, %c16 : index + %2868 = subi %c-1, %2867 : index + %2869 = select %2864, %2868, %2867 : index + %2870 = muli %2869, %c-2 : index + %2871 = addi %2862, %2870 : index + %2872 = addi %2871, %c1 : index + %2873 = cmpi "slt", %2872, %c0 : index + %2874 = subi %c-1, %2872 : index + %2875 = select %2873, %2874, %2872 : index + %2876 = divi_signed %2875, %c2 : index + %2877 = subi %c-1, %2876 : index + %2878 = select %2873, %2877, %2876 : index + %2879 = muli %2878, %c-2 : index + %2880 = addi %2856, %2879 : index + %2881 = addi %2880, %c1 : index + 
%2882 = load %2[%2836, %2841, %2881] : memref<16x6x2xvector<8xf32>> + %2883 = vector.insertelement %2479, %2882[%c3_i64 : i64] : vector<8xf32> + %2884 = addi %arg5, %c8 : index + %2885 = cmpi "slt", %2884, %c0 : index + %2886 = subi %c-1, %2884 : index + %2887 = select %2885, %2886, %2884 : index + %2888 = divi_signed %2887, %c16 : index + %2889 = subi %c-1, %2888 : index + %2890 = select %2885, %2889, %2888 : index + %2891 = remi_signed %2890, %c16 : index + %2892 = cmpi "slt", %2891, %c0 : index + %2893 = addi %2891, %c16 : index + %2894 = select %2892, %2893, %2891 : index + %2895 = addi %arg7, %arg9 : index + %2896 = remi_signed %2895, %c6 : index + %2897 = cmpi "slt", %2896, %c0 : index + %2898 = addi %2896, %c6 : index + %2899 = select %2897, %2898, %2896 : index + %2900 = cmpi "slt", %arg5, %c0 : index + %2901 = subi %c-1, %arg5 : index + %2902 = select %2900, %2901, %arg5 : index + %2903 = divi_signed %2902, %c8 : index + %2904 = subi %c-1, %2903 : index + %2905 = select %2900, %2904, %2903 : index + %2906 = addi %arg5, %c8 : index + %2907 = cmpi "slt", %2906, %c0 : index + %2908 = subi %c-1, %2906 : index + %2909 = select %2907, %2908, %2906 : index + %2910 = divi_signed %2909, %c16 : index + %2911 = subi %c-1, %2910 : index + %2912 = select %2907, %2911, %2910 : index + %2913 = muli %2912, %c-2 : index + %2914 = addi %2905, %2913 : index + %2915 = cmpi "slt", %arg5, %c0 : index + %2916 = subi %c-1, %arg5 : index + %2917 = select %2915, %2916, %arg5 : index + %2918 = divi_signed %2917, %c8 : index + %2919 = subi %c-1, %2918 : index + %2920 = select %2915, %2919, %2918 : index + %2921 = addi %arg5, %c8 : index + %2922 = cmpi "slt", %2921, %c0 : index + %2923 = subi %c-1, %2921 : index + %2924 = select %2922, %2923, %2921 : index + %2925 = divi_signed %2924, %c16 : index + %2926 = subi %c-1, %2925 : index + %2927 = select %2922, %2926, %2925 : index + %2928 = muli %2927, %c-2 : index + %2929 = addi %2920, %2928 : index + %2930 = addi %2929, %c1 : index + %2931 = cmpi "slt", %2930, %c0 : index + %2932 = subi %c-1, %2930 : index + %2933 = select %2931, %2932, %2930 : index + %2934 = divi_signed %2933, %c2 : index + %2935 = subi %c-1, %2934 : index + %2936 = select %2931, %2935, %2934 : index + %2937 = muli %2936, %c-2 : index + %2938 = addi %2914, %2937 : index + %2939 = addi %2938, %c1 : index + store %2883, %2[%2894, %2899, %2939] : memref<16x6x2xvector<8xf32>> + %2940 = addi %arg5, %c8 : index + %2941 = cmpi "slt", %2940, %c0 : index + %2942 = subi %c-1, %2940 : index + %2943 = select %2941, %2942, %2940 : index + %2944 = divi_signed %2943, %c16 : index + %2945 = subi %c-1, %2944 : index + %2946 = select %2941, %2945, %2944 : index + %2947 = remi_signed %2946, %c16 : index + %2948 = cmpi "slt", %2947, %c0 : index + %2949 = addi %2947, %c16 : index + %2950 = select %2948, %2949, %2947 : index + %2951 = addi %arg7, %arg9 : index + %2952 = remi_signed %2951, %c6 : index + %2953 = cmpi "slt", %2952, %c0 : index + %2954 = addi %2952, %c6 : index + %2955 = select %2953, %2954, %2952 : index + %2956 = cmpi "slt", %arg5, %c0 : index + %2957 = subi %c-1, %arg5 : index + %2958 = select %2956, %2957, %arg5 : index + %2959 = divi_signed %2958, %c8 : index + %2960 = subi %c-1, %2959 : index + %2961 = select %2956, %2960, %2959 : index + %2962 = addi %arg5, %c8 : index + %2963 = cmpi "slt", %2962, %c0 : index + %2964 = subi %c-1, %2962 : index + %2965 = select %2963, %2964, %2962 : index + %2966 = divi_signed %2965, %c16 : index + %2967 = subi %c-1, %2966 : index + %2968 = select %2963, %2967, 
%2966 : index + %2969 = muli %2968, %c-2 : index + %2970 = addi %2961, %2969 : index + %2971 = cmpi "slt", %arg5, %c0 : index + %2972 = subi %c-1, %arg5 : index + %2973 = select %2971, %2972, %arg5 : index + %2974 = divi_signed %2973, %c8 : index + %2975 = subi %c-1, %2974 : index + %2976 = select %2971, %2975, %2974 : index + %2977 = addi %arg5, %c8 : index + %2978 = cmpi "slt", %2977, %c0 : index + %2979 = subi %c-1, %2977 : index + %2980 = select %2978, %2979, %2977 : index + %2981 = divi_signed %2980, %c16 : index + %2982 = subi %c-1, %2981 : index + %2983 = select %2978, %2982, %2981 : index + %2984 = muli %2983, %c-2 : index + %2985 = addi %2976, %2984 : index + %2986 = addi %2985, %c1 : index + %2987 = cmpi "slt", %2986, %c0 : index + %2988 = subi %c-1, %2986 : index + %2989 = select %2987, %2988, %2986 : index + %2990 = divi_signed %2989, %c2 : index + %2991 = subi %c-1, %2990 : index + %2992 = select %2987, %2991, %2990 : index + %2993 = muli %2992, %c-2 : index + %2994 = addi %2970, %2993 : index + %2995 = addi %2994, %c1 : index + %2996 = load %2[%2950, %2955, %2995] : memref<16x6x2xvector<8xf32>> + %2997 = vector.insertelement %2480, %2996[%c4_i64 : i64] : vector<8xf32> + %2998 = addi %arg5, %c8 : index + %2999 = cmpi "slt", %2998, %c0 : index + %3000 = subi %c-1, %2998 : index + %3001 = select %2999, %3000, %2998 : index + %3002 = divi_signed %3001, %c16 : index + %3003 = subi %c-1, %3002 : index + %3004 = select %2999, %3003, %3002 : index + %3005 = remi_signed %3004, %c16 : index + %3006 = cmpi "slt", %3005, %c0 : index + %3007 = addi %3005, %c16 : index + %3008 = select %3006, %3007, %3005 : index + %3009 = addi %arg7, %arg9 : index + %3010 = remi_signed %3009, %c6 : index + %3011 = cmpi "slt", %3010, %c0 : index + %3012 = addi %3010, %c6 : index + %3013 = select %3011, %3012, %3010 : index + %3014 = cmpi "slt", %arg5, %c0 : index + %3015 = subi %c-1, %arg5 : index + %3016 = select %3014, %3015, %arg5 : index + %3017 = divi_signed %3016, %c8 : index + %3018 = subi %c-1, %3017 : index + %3019 = select %3014, %3018, %3017 : index + %3020 = addi %arg5, %c8 : index + %3021 = cmpi "slt", %3020, %c0 : index + %3022 = subi %c-1, %3020 : index + %3023 = select %3021, %3022, %3020 : index + %3024 = divi_signed %3023, %c16 : index + %3025 = subi %c-1, %3024 : index + %3026 = select %3021, %3025, %3024 : index + %3027 = muli %3026, %c-2 : index + %3028 = addi %3019, %3027 : index + %3029 = cmpi "slt", %arg5, %c0 : index + %3030 = subi %c-1, %arg5 : index + %3031 = select %3029, %3030, %arg5 : index + %3032 = divi_signed %3031, %c8 : index + %3033 = subi %c-1, %3032 : index + %3034 = select %3029, %3033, %3032 : index + %3035 = addi %arg5, %c8 : index + %3036 = cmpi "slt", %3035, %c0 : index + %3037 = subi %c-1, %3035 : index + %3038 = select %3036, %3037, %3035 : index + %3039 = divi_signed %3038, %c16 : index + %3040 = subi %c-1, %3039 : index + %3041 = select %3036, %3040, %3039 : index + %3042 = muli %3041, %c-2 : index + %3043 = addi %3034, %3042 : index + %3044 = addi %3043, %c1 : index + %3045 = cmpi "slt", %3044, %c0 : index + %3046 = subi %c-1, %3044 : index + %3047 = select %3045, %3046, %3044 : index + %3048 = divi_signed %3047, %c2 : index + %3049 = subi %c-1, %3048 : index + %3050 = select %3045, %3049, %3048 : index + %3051 = muli %3050, %c-2 : index + %3052 = addi %3028, %3051 : index + %3053 = addi %3052, %c1 : index + store %2997, %2[%3008, %3013, %3053] : memref<16x6x2xvector<8xf32>> + %3054 = addi %arg5, %c8 : index + %3055 = cmpi "slt", %3054, %c0 : index + %3056 = 
subi %c-1, %3054 : index + %3057 = select %3055, %3056, %3054 : index + %3058 = divi_signed %3057, %c16 : index + %3059 = subi %c-1, %3058 : index + %3060 = select %3055, %3059, %3058 : index + %3061 = remi_signed %3060, %c16 : index + %3062 = cmpi "slt", %3061, %c0 : index + %3063 = addi %3061, %c16 : index + %3064 = select %3062, %3063, %3061 : index + %3065 = addi %arg7, %arg9 : index + %3066 = remi_signed %3065, %c6 : index + %3067 = cmpi "slt", %3066, %c0 : index + %3068 = addi %3066, %c6 : index + %3069 = select %3067, %3068, %3066 : index + %3070 = cmpi "slt", %arg5, %c0 : index + %3071 = subi %c-1, %arg5 : index + %3072 = select %3070, %3071, %arg5 : index + %3073 = divi_signed %3072, %c8 : index + %3074 = subi %c-1, %3073 : index + %3075 = select %3070, %3074, %3073 : index + %3076 = addi %arg5, %c8 : index + %3077 = cmpi "slt", %3076, %c0 : index + %3078 = subi %c-1, %3076 : index + %3079 = select %3077, %3078, %3076 : index + %3080 = divi_signed %3079, %c16 : index + %3081 = subi %c-1, %3080 : index + %3082 = select %3077, %3081, %3080 : index + %3083 = muli %3082, %c-2 : index + %3084 = addi %3075, %3083 : index + %3085 = cmpi "slt", %arg5, %c0 : index + %3086 = subi %c-1, %arg5 : index + %3087 = select %3085, %3086, %arg5 : index + %3088 = divi_signed %3087, %c8 : index + %3089 = subi %c-1, %3088 : index + %3090 = select %3085, %3089, %3088 : index + %3091 = addi %arg5, %c8 : index + %3092 = cmpi "slt", %3091, %c0 : index + %3093 = subi %c-1, %3091 : index + %3094 = select %3092, %3093, %3091 : index + %3095 = divi_signed %3094, %c16 : index + %3096 = subi %c-1, %3095 : index + %3097 = select %3092, %3096, %3095 : index + %3098 = muli %3097, %c-2 : index + %3099 = addi %3090, %3098 : index + %3100 = addi %3099, %c1 : index + %3101 = cmpi "slt", %3100, %c0 : index + %3102 = subi %c-1, %3100 : index + %3103 = select %3101, %3102, %3100 : index + %3104 = divi_signed %3103, %c2 : index + %3105 = subi %c-1, %3104 : index + %3106 = select %3101, %3105, %3104 : index + %3107 = muli %3106, %c-2 : index + %3108 = addi %3084, %3107 : index + %3109 = addi %3108, %c1 : index + %3110 = load %2[%3064, %3069, %3109] : memref<16x6x2xvector<8xf32>> + %3111 = vector.insertelement %2481, %3110[%c5_i64 : i64] : vector<8xf32> + %3112 = addi %arg5, %c8 : index + %3113 = cmpi "slt", %3112, %c0 : index + %3114 = subi %c-1, %3112 : index + %3115 = select %3113, %3114, %3112 : index + %3116 = divi_signed %3115, %c16 : index + %3117 = subi %c-1, %3116 : index + %3118 = select %3113, %3117, %3116 : index + %3119 = remi_signed %3118, %c16 : index + %3120 = cmpi "slt", %3119, %c0 : index + %3121 = addi %3119, %c16 : index + %3122 = select %3120, %3121, %3119 : index + %3123 = addi %arg7, %arg9 : index + %3124 = remi_signed %3123, %c6 : index + %3125 = cmpi "slt", %3124, %c0 : index + %3126 = addi %3124, %c6 : index + %3127 = select %3125, %3126, %3124 : index + %3128 = cmpi "slt", %arg5, %c0 : index + %3129 = subi %c-1, %arg5 : index + %3130 = select %3128, %3129, %arg5 : index + %3131 = divi_signed %3130, %c8 : index + %3132 = subi %c-1, %3131 : index + %3133 = select %3128, %3132, %3131 : index + %3134 = addi %arg5, %c8 : index + %3135 = cmpi "slt", %3134, %c0 : index + %3136 = subi %c-1, %3134 : index + %3137 = select %3135, %3136, %3134 : index + %3138 = divi_signed %3137, %c16 : index + %3139 = subi %c-1, %3138 : index + %3140 = select %3135, %3139, %3138 : index + %3141 = muli %3140, %c-2 : index + %3142 = addi %3133, %3141 : index + %3143 = cmpi "slt", %arg5, %c0 : index + %3144 = subi %c-1, %arg5 : 
index + %3145 = select %3143, %3144, %arg5 : index + %3146 = divi_signed %3145, %c8 : index + %3147 = subi %c-1, %3146 : index + %3148 = select %3143, %3147, %3146 : index + %3149 = addi %arg5, %c8 : index + %3150 = cmpi "slt", %3149, %c0 : index + %3151 = subi %c-1, %3149 : index + %3152 = select %3150, %3151, %3149 : index + %3153 = divi_signed %3152, %c16 : index + %3154 = subi %c-1, %3153 : index + %3155 = select %3150, %3154, %3153 : index + %3156 = muli %3155, %c-2 : index + %3157 = addi %3148, %3156 : index + %3158 = addi %3157, %c1 : index + %3159 = cmpi "slt", %3158, %c0 : index + %3160 = subi %c-1, %3158 : index + %3161 = select %3159, %3160, %3158 : index + %3162 = divi_signed %3161, %c2 : index + %3163 = subi %c-1, %3162 : index + %3164 = select %3159, %3163, %3162 : index + %3165 = muli %3164, %c-2 : index + %3166 = addi %3142, %3165 : index + %3167 = addi %3166, %c1 : index + store %3111, %2[%3122, %3127, %3167] : memref<16x6x2xvector<8xf32>> + %3168 = addi %arg5, %c8 : index + %3169 = cmpi "slt", %3168, %c0 : index + %3170 = subi %c-1, %3168 : index + %3171 = select %3169, %3170, %3168 : index + %3172 = divi_signed %3171, %c16 : index + %3173 = subi %c-1, %3172 : index + %3174 = select %3169, %3173, %3172 : index + %3175 = remi_signed %3174, %c16 : index + %3176 = cmpi "slt", %3175, %c0 : index + %3177 = addi %3175, %c16 : index + %3178 = select %3176, %3177, %3175 : index + %3179 = addi %arg7, %arg9 : index + %3180 = remi_signed %3179, %c6 : index + %3181 = cmpi "slt", %3180, %c0 : index + %3182 = addi %3180, %c6 : index + %3183 = select %3181, %3182, %3180 : index + %3184 = cmpi "slt", %arg5, %c0 : index + %3185 = subi %c-1, %arg5 : index + %3186 = select %3184, %3185, %arg5 : index + %3187 = divi_signed %3186, %c8 : index + %3188 = subi %c-1, %3187 : index + %3189 = select %3184, %3188, %3187 : index + %3190 = addi %arg5, %c8 : index + %3191 = cmpi "slt", %3190, %c0 : index + %3192 = subi %c-1, %3190 : index + %3193 = select %3191, %3192, %3190 : index + %3194 = divi_signed %3193, %c16 : index + %3195 = subi %c-1, %3194 : index + %3196 = select %3191, %3195, %3194 : index + %3197 = muli %3196, %c-2 : index + %3198 = addi %3189, %3197 : index + %3199 = cmpi "slt", %arg5, %c0 : index + %3200 = subi %c-1, %arg5 : index + %3201 = select %3199, %3200, %arg5 : index + %3202 = divi_signed %3201, %c8 : index + %3203 = subi %c-1, %3202 : index + %3204 = select %3199, %3203, %3202 : index + %3205 = addi %arg5, %c8 : index + %3206 = cmpi "slt", %3205, %c0 : index + %3207 = subi %c-1, %3205 : index + %3208 = select %3206, %3207, %3205 : index + %3209 = divi_signed %3208, %c16 : index + %3210 = subi %c-1, %3209 : index + %3211 = select %3206, %3210, %3209 : index + %3212 = muli %3211, %c-2 : index + %3213 = addi %3204, %3212 : index + %3214 = addi %3213, %c1 : index + %3215 = cmpi "slt", %3214, %c0 : index + %3216 = subi %c-1, %3214 : index + %3217 = select %3215, %3216, %3214 : index + %3218 = divi_signed %3217, %c2 : index + %3219 = subi %c-1, %3218 : index + %3220 = select %3215, %3219, %3218 : index + %3221 = muli %3220, %c-2 : index + %3222 = addi %3198, %3221 : index + %3223 = addi %3222, %c1 : index + %3224 = load %2[%3178, %3183, %3223] : memref<16x6x2xvector<8xf32>> + %3225 = vector.insertelement %2482, %3224[%c6_i64 : i64] : vector<8xf32> + %3226 = addi %arg5, %c8 : index + %3227 = cmpi "slt", %3226, %c0 : index + %3228 = subi %c-1, %3226 : index + %3229 = select %3227, %3228, %3226 : index + %3230 = divi_signed %3229, %c16 : index + %3231 = subi %c-1, %3230 : index + %3232 
= select %3227, %3231, %3230 : index + %3233 = remi_signed %3232, %c16 : index + %3234 = cmpi "slt", %3233, %c0 : index + %3235 = addi %3233, %c16 : index + %3236 = select %3234, %3235, %3233 : index + %3237 = addi %arg7, %arg9 : index + %3238 = remi_signed %3237, %c6 : index + %3239 = cmpi "slt", %3238, %c0 : index + %3240 = addi %3238, %c6 : index + %3241 = select %3239, %3240, %3238 : index + %3242 = cmpi "slt", %arg5, %c0 : index + %3243 = subi %c-1, %arg5 : index + %3244 = select %3242, %3243, %arg5 : index + %3245 = divi_signed %3244, %c8 : index + %3246 = subi %c-1, %3245 : index + %3247 = select %3242, %3246, %3245 : index + %3248 = addi %arg5, %c8 : index + %3249 = cmpi "slt", %3248, %c0 : index + %3250 = subi %c-1, %3248 : index + %3251 = select %3249, %3250, %3248 : index + %3252 = divi_signed %3251, %c16 : index + %3253 = subi %c-1, %3252 : index + %3254 = select %3249, %3253, %3252 : index + %3255 = muli %3254, %c-2 : index + %3256 = addi %3247, %3255 : index + %3257 = cmpi "slt", %arg5, %c0 : index + %3258 = subi %c-1, %arg5 : index + %3259 = select %3257, %3258, %arg5 : index + %3260 = divi_signed %3259, %c8 : index + %3261 = subi %c-1, %3260 : index + %3262 = select %3257, %3261, %3260 : index + %3263 = addi %arg5, %c8 : index + %3264 = cmpi "slt", %3263, %c0 : index + %3265 = subi %c-1, %3263 : index + %3266 = select %3264, %3265, %3263 : index + %3267 = divi_signed %3266, %c16 : index + %3268 = subi %c-1, %3267 : index + %3269 = select %3264, %3268, %3267 : index + %3270 = muli %3269, %c-2 : index + %3271 = addi %3262, %3270 : index + %3272 = addi %3271, %c1 : index + %3273 = cmpi "slt", %3272, %c0 : index + %3274 = subi %c-1, %3272 : index + %3275 = select %3273, %3274, %3272 : index + %3276 = divi_signed %3275, %c2 : index + %3277 = subi %c-1, %3276 : index + %3278 = select %3273, %3277, %3276 : index + %3279 = muli %3278, %c-2 : index + %3280 = addi %3256, %3279 : index + %3281 = addi %3280, %c1 : index + store %3225, %2[%3236, %3241, %3281] : memref<16x6x2xvector<8xf32>> + %3282 = addi %arg5, %c8 : index + %3283 = cmpi "slt", %3282, %c0 : index + %3284 = subi %c-1, %3282 : index + %3285 = select %3283, %3284, %3282 : index + %3286 = divi_signed %3285, %c16 : index + %3287 = subi %c-1, %3286 : index + %3288 = select %3283, %3287, %3286 : index + %3289 = remi_signed %3288, %c16 : index + %3290 = cmpi "slt", %3289, %c0 : index + %3291 = addi %3289, %c16 : index + %3292 = select %3290, %3291, %3289 : index + %3293 = addi %arg7, %arg9 : index + %3294 = remi_signed %3293, %c6 : index + %3295 = cmpi "slt", %3294, %c0 : index + %3296 = addi %3294, %c6 : index + %3297 = select %3295, %3296, %3294 : index + %3298 = cmpi "slt", %arg5, %c0 : index + %3299 = subi %c-1, %arg5 : index + %3300 = select %3298, %3299, %arg5 : index + %3301 = divi_signed %3300, %c8 : index + %3302 = subi %c-1, %3301 : index + %3303 = select %3298, %3302, %3301 : index + %3304 = addi %arg5, %c8 : index + %3305 = cmpi "slt", %3304, %c0 : index + %3306 = subi %c-1, %3304 : index + %3307 = select %3305, %3306, %3304 : index + %3308 = divi_signed %3307, %c16 : index + %3309 = subi %c-1, %3308 : index + %3310 = select %3305, %3309, %3308 : index + %3311 = muli %3310, %c-2 : index + %3312 = addi %3303, %3311 : index + %3313 = cmpi "slt", %arg5, %c0 : index + %3314 = subi %c-1, %arg5 : index + %3315 = select %3313, %3314, %arg5 : index + %3316 = divi_signed %3315, %c8 : index + %3317 = subi %c-1, %3316 : index + %3318 = select %3313, %3317, %3316 : index + %3319 = addi %arg5, %c8 : index + %3320 = cmpi "slt", 
%3319, %c0 : index + %3321 = subi %c-1, %3319 : index + %3322 = select %3320, %3321, %3319 : index + %3323 = divi_signed %3322, %c16 : index + %3324 = subi %c-1, %3323 : index + %3325 = select %3320, %3324, %3323 : index + %3326 = muli %3325, %c-2 : index + %3327 = addi %3318, %3326 : index + %3328 = addi %3327, %c1 : index + %3329 = cmpi "slt", %3328, %c0 : index + %3330 = subi %c-1, %3328 : index + %3331 = select %3329, %3330, %3328 : index + %3332 = divi_signed %3331, %c2 : index + %3333 = subi %c-1, %3332 : index + %3334 = select %3329, %3333, %3332 : index + %3335 = muli %3334, %c-2 : index + %3336 = addi %3312, %3335 : index + %3337 = addi %3336, %c1 : index + %3338 = load %2[%3292, %3297, %3337] : memref<16x6x2xvector<8xf32>> + %3339 = vector.insertelement %2483, %3338[%c7_i64 : i64] : vector<8xf32> + %3340 = addi %arg5, %c8 : index + %3341 = cmpi "slt", %3340, %c0 : index + %3342 = subi %c-1, %3340 : index + %3343 = select %3341, %3342, %3340 : index + %3344 = divi_signed %3343, %c16 : index + %3345 = subi %c-1, %3344 : index + %3346 = select %3341, %3345, %3344 : index + %3347 = remi_signed %3346, %c16 : index + %3348 = cmpi "slt", %3347, %c0 : index + %3349 = addi %3347, %c16 : index + %3350 = select %3348, %3349, %3347 : index + %3351 = addi %arg7, %arg9 : index + %3352 = remi_signed %3351, %c6 : index + %3353 = cmpi "slt", %3352, %c0 : index + %3354 = addi %3352, %c6 : index + %3355 = select %3353, %3354, %3352 : index + %3356 = cmpi "slt", %arg5, %c0 : index + %3357 = subi %c-1, %arg5 : index + %3358 = select %3356, %3357, %arg5 : index + %3359 = divi_signed %3358, %c8 : index + %3360 = subi %c-1, %3359 : index + %3361 = select %3356, %3360, %3359 : index + %3362 = addi %arg5, %c8 : index + %3363 = cmpi "slt", %3362, %c0 : index + %3364 = subi %c-1, %3362 : index + %3365 = select %3363, %3364, %3362 : index + %3366 = divi_signed %3365, %c16 : index + %3367 = subi %c-1, %3366 : index + %3368 = select %3363, %3367, %3366 : index + %3369 = muli %3368, %c-2 : index + %3370 = addi %3361, %3369 : index + %3371 = cmpi "slt", %arg5, %c0 : index + %3372 = subi %c-1, %arg5 : index + %3373 = select %3371, %3372, %arg5 : index + %3374 = divi_signed %3373, %c8 : index + %3375 = subi %c-1, %3374 : index + %3376 = select %3371, %3375, %3374 : index + %3377 = addi %arg5, %c8 : index + %3378 = cmpi "slt", %3377, %c0 : index + %3379 = subi %c-1, %3377 : index + %3380 = select %3378, %3379, %3377 : index + %3381 = divi_signed %3380, %c16 : index + %3382 = subi %c-1, %3381 : index + %3383 = select %3378, %3382, %3381 : index + %3384 = muli %3383, %c-2 : index + %3385 = addi %3376, %3384 : index + %3386 = addi %3385, %c1 : index + %3387 = cmpi "slt", %3386, %c0 : index + %3388 = subi %c-1, %3386 : index + %3389 = select %3387, %3388, %3386 : index + %3390 = divi_signed %3389, %c2 : index + %3391 = subi %c-1, %3390 : index + %3392 = select %3387, %3391, %3390 : index + %3393 = muli %3392, %c-2 : index + %3394 = addi %3370, %3393 : index + %3395 = addi %3394, %c1 : index + store %3339, %2[%3350, %3355, %3395] : memref<16x6x2xvector<8xf32>> + %3396 = addi %arg5, %c8 : index + %3397 = cmpi "slt", %3396, %c0 : index + %3398 = subi %c-1, %3396 : index + %3399 = select %3397, %3398, %3396 : index + %3400 = divi_signed %3399, %c16 : index + %3401 = subi %c-1, %3400 : index + %3402 = select %3397, %3401, %3400 : index + %3403 = remi_signed %3402, %c16 : index + %3404 = cmpi "slt", %3403, %c0 : index + %3405 = addi %3403, %c16 : index + %3406 = select %3404, %3405, %3403 : index + %3407 = addi %arg7, %arg9 
: index + %3408 = remi_signed %3407, %c6 : index + %3409 = cmpi "slt", %3408, %c0 : index + %3410 = addi %3408, %c6 : index + %3411 = select %3409, %3410, %3408 : index + %3412 = cmpi "slt", %arg5, %c0 : index + %3413 = subi %c-1, %arg5 : index + %3414 = select %3412, %3413, %arg5 : index + %3415 = divi_signed %3414, %c8 : index + %3416 = subi %c-1, %3415 : index + %3417 = select %3412, %3416, %3415 : index + %3418 = addi %arg5, %c8 : index + %3419 = cmpi "slt", %3418, %c0 : index + %3420 = subi %c-1, %3418 : index + %3421 = select %3419, %3420, %3418 : index + %3422 = divi_signed %3421, %c16 : index + %3423 = subi %c-1, %3422 : index + %3424 = select %3419, %3423, %3422 : index + %3425 = muli %3424, %c-2 : index + %3426 = addi %3417, %3425 : index + %3427 = cmpi "slt", %arg5, %c0 : index + %3428 = subi %c-1, %arg5 : index + %3429 = select %3427, %3428, %arg5 : index + %3430 = divi_signed %3429, %c8 : index + %3431 = subi %c-1, %3430 : index + %3432 = select %3427, %3431, %3430 : index + %3433 = addi %arg5, %c8 : index + %3434 = cmpi "slt", %3433, %c0 : index + %3435 = subi %c-1, %3433 : index + %3436 = select %3434, %3435, %3433 : index + %3437 = divi_signed %3436, %c16 : index + %3438 = subi %c-1, %3437 : index + %3439 = select %3434, %3438, %3437 : index + %3440 = muli %3439, %c-2 : index + %3441 = addi %3432, %3440 : index + %3442 = addi %3441, %c1 : index + %3443 = cmpi "slt", %3442, %c0 : index + %3444 = subi %c-1, %3442 : index + %3445 = select %3443, %3444, %3442 : index + %3446 = divi_signed %3445, %c2 : index + %3447 = subi %c-1, %3446 : index + %3448 = select %3443, %3447, %3446 : index + %3449 = muli %3448, %c-2 : index + %3450 = addi %3426, %3449 : index + %3451 = addi %3450, %c1 : index + %3452 = load %2[%3406, %3411, %3451] : memref<16x6x2xvector<8xf32>> + %3453 = vector.insertelement %2476, %3452[%c0_i64 : i64] : vector<8xf32> + %3454 = addi %arg5, %c8 : index + %3455 = cmpi "slt", %3454, %c0 : index + %3456 = subi %c-1, %3454 : index + %3457 = select %3455, %3456, %3454 : index + %3458 = divi_signed %3457, %c16 : index + %3459 = subi %c-1, %3458 : index + %3460 = select %3455, %3459, %3458 : index + %3461 = remi_signed %3460, %c16 : index + %3462 = cmpi "slt", %3461, %c0 : index + %3463 = addi %3461, %c16 : index + %3464 = select %3462, %3463, %3461 : index + %3465 = addi %arg7, %arg9 : index + %3466 = remi_signed %3465, %c6 : index + %3467 = cmpi "slt", %3466, %c0 : index + %3468 = addi %3466, %c6 : index + %3469 = select %3467, %3468, %3466 : index + %3470 = cmpi "slt", %arg5, %c0 : index + %3471 = subi %c-1, %arg5 : index + %3472 = select %3470, %3471, %arg5 : index + %3473 = divi_signed %3472, %c8 : index + %3474 = subi %c-1, %3473 : index + %3475 = select %3470, %3474, %3473 : index + %3476 = addi %arg5, %c8 : index + %3477 = cmpi "slt", %3476, %c0 : index + %3478 = subi %c-1, %3476 : index + %3479 = select %3477, %3478, %3476 : index + %3480 = divi_signed %3479, %c16 : index + %3481 = subi %c-1, %3480 : index + %3482 = select %3477, %3481, %3480 : index + %3483 = muli %3482, %c-2 : index + %3484 = addi %3475, %3483 : index + %3485 = cmpi "slt", %arg5, %c0 : index + %3486 = subi %c-1, %arg5 : index + %3487 = select %3485, %3486, %arg5 : index + %3488 = divi_signed %3487, %c8 : index + %3489 = subi %c-1, %3488 : index + %3490 = select %3485, %3489, %3488 : index + %3491 = addi %arg5, %c8 : index + %3492 = cmpi "slt", %3491, %c0 : index + %3493 = subi %c-1, %3491 : index + %3494 = select %3492, %3493, %3491 : index + %3495 = divi_signed %3494, %c16 : index + %3496 = 
subi %c-1, %3495 : index + %3497 = select %3492, %3496, %3495 : index + %3498 = muli %3497, %c-2 : index + %3499 = addi %3490, %3498 : index + %3500 = addi %3499, %c1 : index + %3501 = cmpi "slt", %3500, %c0 : index + %3502 = subi %c-1, %3500 : index + %3503 = select %3501, %3502, %3500 : index + %3504 = divi_signed %3503, %c2 : index + %3505 = subi %c-1, %3504 : index + %3506 = select %3501, %3505, %3504 : index + %3507 = muli %3506, %c-2 : index + %3508 = addi %3484, %3507 : index + %3509 = addi %3508, %c1 : index + store %3453, %2[%3464, %3469, %3509] : memref<16x6x2xvector<8xf32>> + %3510 = addi %arg5, %c8 : index + %3511 = cmpi "slt", %3510, %c0 : index + %3512 = subi %c-1, %3510 : index + %3513 = select %3511, %3512, %3510 : index + %3514 = divi_signed %3513, %c16 : index + %3515 = subi %c-1, %3514 : index + %3516 = select %3511, %3515, %3514 : index + %3517 = remi_signed %3516, %c16 : index + %3518 = cmpi "slt", %3517, %c0 : index + %3519 = addi %3517, %c16 : index + %3520 = select %3518, %3519, %3517 : index + %3521 = addi %arg7, %arg9 : index + %3522 = remi_signed %3521, %c6 : index + %3523 = cmpi "slt", %3522, %c0 : index + %3524 = addi %3522, %c6 : index + %3525 = select %3523, %3524, %3522 : index + %3526 = cmpi "slt", %arg5, %c0 : index + %3527 = subi %c-1, %arg5 : index + %3528 = select %3526, %3527, %arg5 : index + %3529 = divi_signed %3528, %c8 : index + %3530 = subi %c-1, %3529 : index + %3531 = select %3526, %3530, %3529 : index + %3532 = addi %arg5, %c8 : index + %3533 = cmpi "slt", %3532, %c0 : index + %3534 = subi %c-1, %3532 : index + %3535 = select %3533, %3534, %3532 : index + %3536 = divi_signed %3535, %c16 : index + %3537 = subi %c-1, %3536 : index + %3538 = select %3533, %3537, %3536 : index + %3539 = muli %3538, %c-2 : index + %3540 = addi %3531, %3539 : index + %3541 = cmpi "slt", %arg5, %c0 : index + %3542 = subi %c-1, %arg5 : index + %3543 = select %3541, %3542, %arg5 : index + %3544 = divi_signed %3543, %c8 : index + %3545 = subi %c-1, %3544 : index + %3546 = select %3541, %3545, %3544 : index + %3547 = addi %arg5, %c8 : index + %3548 = cmpi "slt", %3547, %c0 : index + %3549 = subi %c-1, %3547 : index + %3550 = select %3548, %3549, %3547 : index + %3551 = divi_signed %3550, %c16 : index + %3552 = subi %c-1, %3551 : index + %3553 = select %3548, %3552, %3551 : index + %3554 = muli %3553, %c-2 : index + %3555 = addi %3546, %3554 : index + %3556 = addi %3555, %c1 : index + %3557 = cmpi "slt", %3556, %c0 : index + %3558 = subi %c-1, %3556 : index + %3559 = select %3557, %3558, %3556 : index + %3560 = divi_signed %3559, %c2 : index + %3561 = subi %c-1, %3560 : index + %3562 = select %3557, %3561, %3560 : index + %3563 = muli %3562, %c-2 : index + %3564 = addi %3540, %3563 : index + %3565 = addi %3564, %c1 : index + %3566 = load %2[%3520, %3525, %3565] : memref<16x6x2xvector<8xf32>> + %3567 = vector.insertelement %2477, %3566[%c1_i64 : i64] : vector<8xf32> + %3568 = addi %arg5, %c8 : index + %3569 = cmpi "slt", %3568, %c0 : index + %3570 = subi %c-1, %3568 : index + %3571 = select %3569, %3570, %3568 : index + %3572 = divi_signed %3571, %c16 : index + %3573 = subi %c-1, %3572 : index + %3574 = select %3569, %3573, %3572 : index + %3575 = remi_signed %3574, %c16 : index + %3576 = cmpi "slt", %3575, %c0 : index + %3577 = addi %3575, %c16 : index + %3578 = select %3576, %3577, %3575 : index + %3579 = addi %arg7, %arg9 : index + %3580 = remi_signed %3579, %c6 : index + %3581 = cmpi "slt", %3580, %c0 : index + %3582 = addi %3580, %c6 : index + %3583 = select %3581, 
%3582, %3580 : index + %3584 = cmpi "slt", %arg5, %c0 : index + %3585 = subi %c-1, %arg5 : index + %3586 = select %3584, %3585, %arg5 : index + %3587 = divi_signed %3586, %c8 : index + %3588 = subi %c-1, %3587 : index + %3589 = select %3584, %3588, %3587 : index + %3590 = addi %arg5, %c8 : index + %3591 = cmpi "slt", %3590, %c0 : index + %3592 = subi %c-1, %3590 : index + %3593 = select %3591, %3592, %3590 : index + %3594 = divi_signed %3593, %c16 : index + %3595 = subi %c-1, %3594 : index + %3596 = select %3591, %3595, %3594 : index + %3597 = muli %3596, %c-2 : index + %3598 = addi %3589, %3597 : index + %3599 = cmpi "slt", %arg5, %c0 : index + %3600 = subi %c-1, %arg5 : index + %3601 = select %3599, %3600, %arg5 : index + %3602 = divi_signed %3601, %c8 : index + %3603 = subi %c-1, %3602 : index + %3604 = select %3599, %3603, %3602 : index + %3605 = addi %arg5, %c8 : index + %3606 = cmpi "slt", %3605, %c0 : index + %3607 = subi %c-1, %3605 : index + %3608 = select %3606, %3607, %3605 : index + %3609 = divi_signed %3608, %c16 : index + %3610 = subi %c-1, %3609 : index + %3611 = select %3606, %3610, %3609 : index + %3612 = muli %3611, %c-2 : index + %3613 = addi %3604, %3612 : index + %3614 = addi %3613, %c1 : index + %3615 = cmpi "slt", %3614, %c0 : index + %3616 = subi %c-1, %3614 : index + %3617 = select %3615, %3616, %3614 : index + %3618 = divi_signed %3617, %c2 : index + %3619 = subi %c-1, %3618 : index + %3620 = select %3615, %3619, %3618 : index + %3621 = muli %3620, %c-2 : index + %3622 = addi %3598, %3621 : index + %3623 = addi %3622, %c1 : index + store %3567, %2[%3578, %3583, %3623] : memref<16x6x2xvector<8xf32>> + %3624 = addi %arg5, %c8 : index + %3625 = cmpi "slt", %3624, %c0 : index + %3626 = subi %c-1, %3624 : index + %3627 = select %3625, %3626, %3624 : index + %3628 = divi_signed %3627, %c16 : index + %3629 = subi %c-1, %3628 : index + %3630 = select %3625, %3629, %3628 : index + %3631 = remi_signed %3630, %c16 : index + %3632 = cmpi "slt", %3631, %c0 : index + %3633 = addi %3631, %c16 : index + %3634 = select %3632, %3633, %3631 : index + %3635 = addi %arg7, %arg9 : index + %3636 = remi_signed %3635, %c6 : index + %3637 = cmpi "slt", %3636, %c0 : index + %3638 = addi %3636, %c6 : index + %3639 = select %3637, %3638, %3636 : index + %3640 = cmpi "slt", %arg5, %c0 : index + %3641 = subi %c-1, %arg5 : index + %3642 = select %3640, %3641, %arg5 : index + %3643 = divi_signed %3642, %c8 : index + %3644 = subi %c-1, %3643 : index + %3645 = select %3640, %3644, %3643 : index + %3646 = addi %arg5, %c8 : index + %3647 = cmpi "slt", %3646, %c0 : index + %3648 = subi %c-1, %3646 : index + %3649 = select %3647, %3648, %3646 : index + %3650 = divi_signed %3649, %c16 : index + %3651 = subi %c-1, %3650 : index + %3652 = select %3647, %3651, %3650 : index + %3653 = muli %3652, %c-2 : index + %3654 = addi %3645, %3653 : index + %3655 = cmpi "slt", %arg5, %c0 : index + %3656 = subi %c-1, %arg5 : index + %3657 = select %3655, %3656, %arg5 : index + %3658 = divi_signed %3657, %c8 : index + %3659 = subi %c-1, %3658 : index + %3660 = select %3655, %3659, %3658 : index + %3661 = addi %arg5, %c8 : index + %3662 = cmpi "slt", %3661, %c0 : index + %3663 = subi %c-1, %3661 : index + %3664 = select %3662, %3663, %3661 : index + %3665 = divi_signed %3664, %c16 : index + %3666 = subi %c-1, %3665 : index + %3667 = select %3662, %3666, %3665 : index + %3668 = muli %3667, %c-2 : index + %3669 = addi %3660, %3668 : index + %3670 = addi %3669, %c1 : index + %3671 = cmpi "slt", %3670, %c0 : index + %3672 = 
subi %c-1, %3670 : index + %3673 = select %3671, %3672, %3670 : index + %3674 = divi_signed %3673, %c2 : index + %3675 = subi %c-1, %3674 : index + %3676 = select %3671, %3675, %3674 : index + %3677 = muli %3676, %c-2 : index + %3678 = addi %3654, %3677 : index + %3679 = addi %3678, %c1 : index + %3680 = load %2[%3634, %3639, %3679] : memref<16x6x2xvector<8xf32>> + %3681 = vector.insertelement %2478, %3680[%c2_i64 : i64] : vector<8xf32> + %3682 = addi %arg5, %c8 : index + %3683 = cmpi "slt", %3682, %c0 : index + %3684 = subi %c-1, %3682 : index + %3685 = select %3683, %3684, %3682 : index + %3686 = divi_signed %3685, %c16 : index + %3687 = subi %c-1, %3686 : index + %3688 = select %3683, %3687, %3686 : index + %3689 = remi_signed %3688, %c16 : index + %3690 = cmpi "slt", %3689, %c0 : index + %3691 = addi %3689, %c16 : index + %3692 = select %3690, %3691, %3689 : index + %3693 = addi %arg7, %arg9 : index + %3694 = remi_signed %3693, %c6 : index + %3695 = cmpi "slt", %3694, %c0 : index + %3696 = addi %3694, %c6 : index + %3697 = select %3695, %3696, %3694 : index + %3698 = cmpi "slt", %arg5, %c0 : index + %3699 = subi %c-1, %arg5 : index + %3700 = select %3698, %3699, %arg5 : index + %3701 = divi_signed %3700, %c8 : index + %3702 = subi %c-1, %3701 : index + %3703 = select %3698, %3702, %3701 : index + %3704 = addi %arg5, %c8 : index + %3705 = cmpi "slt", %3704, %c0 : index + %3706 = subi %c-1, %3704 : index + %3707 = select %3705, %3706, %3704 : index + %3708 = divi_signed %3707, %c16 : index + %3709 = subi %c-1, %3708 : index + %3710 = select %3705, %3709, %3708 : index + %3711 = muli %3710, %c-2 : index + %3712 = addi %3703, %3711 : index + %3713 = cmpi "slt", %arg5, %c0 : index + %3714 = subi %c-1, %arg5 : index + %3715 = select %3713, %3714, %arg5 : index + %3716 = divi_signed %3715, %c8 : index + %3717 = subi %c-1, %3716 : index + %3718 = select %3713, %3717, %3716 : index + %3719 = addi %arg5, %c8 : index + %3720 = cmpi "slt", %3719, %c0 : index + %3721 = subi %c-1, %3719 : index + %3722 = select %3720, %3721, %3719 : index + %3723 = divi_signed %3722, %c16 : index + %3724 = subi %c-1, %3723 : index + %3725 = select %3720, %3724, %3723 : index + %3726 = muli %3725, %c-2 : index + %3727 = addi %3718, %3726 : index + %3728 = addi %3727, %c1 : index + %3729 = cmpi "slt", %3728, %c0 : index + %3730 = subi %c-1, %3728 : index + %3731 = select %3729, %3730, %3728 : index + %3732 = divi_signed %3731, %c2 : index + %3733 = subi %c-1, %3732 : index + %3734 = select %3729, %3733, %3732 : index + %3735 = muli %3734, %c-2 : index + %3736 = addi %3712, %3735 : index + %3737 = addi %3736, %c1 : index + store %3681, %2[%3692, %3697, %3737] : memref<16x6x2xvector<8xf32>> + %3738 = addi %arg5, %c8 : index + %3739 = cmpi "slt", %3738, %c0 : index + %3740 = subi %c-1, %3738 : index + %3741 = select %3739, %3740, %3738 : index + %3742 = divi_signed %3741, %c16 : index + %3743 = subi %c-1, %3742 : index + %3744 = select %3739, %3743, %3742 : index + %3745 = remi_signed %3744, %c16 : index + %3746 = cmpi "slt", %3745, %c0 : index + %3747 = addi %3745, %c16 : index + %3748 = select %3746, %3747, %3745 : index + %3749 = addi %arg7, %arg9 : index + %3750 = remi_signed %3749, %c6 : index + %3751 = cmpi "slt", %3750, %c0 : index + %3752 = addi %3750, %c6 : index + %3753 = select %3751, %3752, %3750 : index + %3754 = cmpi "slt", %arg5, %c0 : index + %3755 = subi %c-1, %arg5 : index + %3756 = select %3754, %3755, %arg5 : index + %3757 = divi_signed %3756, %c8 : index + %3758 = subi %c-1, %3757 : index + %3759 = 
select %3754, %3758, %3757 : index + %3760 = addi %arg5, %c8 : index + %3761 = cmpi "slt", %3760, %c0 : index + %3762 = subi %c-1, %3760 : index + %3763 = select %3761, %3762, %3760 : index + %3764 = divi_signed %3763, %c16 : index + %3765 = subi %c-1, %3764 : index + %3766 = select %3761, %3765, %3764 : index + %3767 = muli %3766, %c-2 : index + %3768 = addi %3759, %3767 : index + %3769 = cmpi "slt", %arg5, %c0 : index + %3770 = subi %c-1, %arg5 : index + %3771 = select %3769, %3770, %arg5 : index + %3772 = divi_signed %3771, %c8 : index + %3773 = subi %c-1, %3772 : index + %3774 = select %3769, %3773, %3772 : index + %3775 = addi %arg5, %c8 : index + %3776 = cmpi "slt", %3775, %c0 : index + %3777 = subi %c-1, %3775 : index + %3778 = select %3776, %3777, %3775 : index + %3779 = divi_signed %3778, %c16 : index + %3780 = subi %c-1, %3779 : index + %3781 = select %3776, %3780, %3779 : index + %3782 = muli %3781, %c-2 : index + %3783 = addi %3774, %3782 : index + %3784 = addi %3783, %c1 : index + %3785 = cmpi "slt", %3784, %c0 : index + %3786 = subi %c-1, %3784 : index + %3787 = select %3785, %3786, %3784 : index + %3788 = divi_signed %3787, %c2 : index + %3789 = subi %c-1, %3788 : index + %3790 = select %3785, %3789, %3788 : index + %3791 = muli %3790, %c-2 : index + %3792 = addi %3768, %3791 : index + %3793 = addi %3792, %c1 : index + %3794 = load %2[%3748, %3753, %3793] : memref<16x6x2xvector<8xf32>> + %3795 = vector.insertelement %2479, %3794[%c3_i64 : i64] : vector<8xf32> + %3796 = addi %arg5, %c8 : index + %3797 = cmpi "slt", %3796, %c0 : index + %3798 = subi %c-1, %3796 : index + %3799 = select %3797, %3798, %3796 : index + %3800 = divi_signed %3799, %c16 : index + %3801 = subi %c-1, %3800 : index + %3802 = select %3797, %3801, %3800 : index + %3803 = remi_signed %3802, %c16 : index + %3804 = cmpi "slt", %3803, %c0 : index + %3805 = addi %3803, %c16 : index + %3806 = select %3804, %3805, %3803 : index + %3807 = addi %arg7, %arg9 : index + %3808 = remi_signed %3807, %c6 : index + %3809 = cmpi "slt", %3808, %c0 : index + %3810 = addi %3808, %c6 : index + %3811 = select %3809, %3810, %3808 : index + %3812 = cmpi "slt", %arg5, %c0 : index + %3813 = subi %c-1, %arg5 : index + %3814 = select %3812, %3813, %arg5 : index + %3815 = divi_signed %3814, %c8 : index + %3816 = subi %c-1, %3815 : index + %3817 = select %3812, %3816, %3815 : index + %3818 = addi %arg5, %c8 : index + %3819 = cmpi "slt", %3818, %c0 : index + %3820 = subi %c-1, %3818 : index + %3821 = select %3819, %3820, %3818 : index + %3822 = divi_signed %3821, %c16 : index + %3823 = subi %c-1, %3822 : index + %3824 = select %3819, %3823, %3822 : index + %3825 = muli %3824, %c-2 : index + %3826 = addi %3817, %3825 : index + %3827 = cmpi "slt", %arg5, %c0 : index + %3828 = subi %c-1, %arg5 : index + %3829 = select %3827, %3828, %arg5 : index + %3830 = divi_signed %3829, %c8 : index + %3831 = subi %c-1, %3830 : index + %3832 = select %3827, %3831, %3830 : index + %3833 = addi %arg5, %c8 : index + %3834 = cmpi "slt", %3833, %c0 : index + %3835 = subi %c-1, %3833 : index + %3836 = select %3834, %3835, %3833 : index + %3837 = divi_signed %3836, %c16 : index + %3838 = subi %c-1, %3837 : index + %3839 = select %3834, %3838, %3837 : index + %3840 = muli %3839, %c-2 : index + %3841 = addi %3832, %3840 : index + %3842 = addi %3841, %c1 : index + %3843 = cmpi "slt", %3842, %c0 : index + %3844 = subi %c-1, %3842 : index + %3845 = select %3843, %3844, %3842 : index + %3846 = divi_signed %3845, %c2 : index + %3847 = subi %c-1, %3846 : index + %3848 
= select %3843, %3847, %3846 : index + %3849 = muli %3848, %c-2 : index + %3850 = addi %3826, %3849 : index + %3851 = addi %3850, %c1 : index + store %3795, %2[%3806, %3811, %3851] : memref<16x6x2xvector<8xf32>> + %3852 = addi %arg5, %c8 : index + %3853 = cmpi "slt", %3852, %c0 : index + %3854 = subi %c-1, %3852 : index + %3855 = select %3853, %3854, %3852 : index + %3856 = divi_signed %3855, %c16 : index + %3857 = subi %c-1, %3856 : index + %3858 = select %3853, %3857, %3856 : index + %3859 = remi_signed %3858, %c16 : index + %3860 = cmpi "slt", %3859, %c0 : index + %3861 = addi %3859, %c16 : index + %3862 = select %3860, %3861, %3859 : index + %3863 = addi %arg7, %arg9 : index + %3864 = remi_signed %3863, %c6 : index + %3865 = cmpi "slt", %3864, %c0 : index + %3866 = addi %3864, %c6 : index + %3867 = select %3865, %3866, %3864 : index + %3868 = cmpi "slt", %arg5, %c0 : index + %3869 = subi %c-1, %arg5 : index + %3870 = select %3868, %3869, %arg5 : index + %3871 = divi_signed %3870, %c8 : index + %3872 = subi %c-1, %3871 : index + %3873 = select %3868, %3872, %3871 : index + %3874 = addi %arg5, %c8 : index + %3875 = cmpi "slt", %3874, %c0 : index + %3876 = subi %c-1, %3874 : index + %3877 = select %3875, %3876, %3874 : index + %3878 = divi_signed %3877, %c16 : index + %3879 = subi %c-1, %3878 : index + %3880 = select %3875, %3879, %3878 : index + %3881 = muli %3880, %c-2 : index + %3882 = addi %3873, %3881 : index + %3883 = cmpi "slt", %arg5, %c0 : index + %3884 = subi %c-1, %arg5 : index + %3885 = select %3883, %3884, %arg5 : index + %3886 = divi_signed %3885, %c8 : index + %3887 = subi %c-1, %3886 : index + %3888 = select %3883, %3887, %3886 : index + %3889 = addi %arg5, %c8 : index + %3890 = cmpi "slt", %3889, %c0 : index + %3891 = subi %c-1, %3889 : index + %3892 = select %3890, %3891, %3889 : index + %3893 = divi_signed %3892, %c16 : index + %3894 = subi %c-1, %3893 : index + %3895 = select %3890, %3894, %3893 : index + %3896 = muli %3895, %c-2 : index + %3897 = addi %3888, %3896 : index + %3898 = addi %3897, %c1 : index + %3899 = cmpi "slt", %3898, %c0 : index + %3900 = subi %c-1, %3898 : index + %3901 = select %3899, %3900, %3898 : index + %3902 = divi_signed %3901, %c2 : index + %3903 = subi %c-1, %3902 : index + %3904 = select %3899, %3903, %3902 : index + %3905 = muli %3904, %c-2 : index + %3906 = addi %3882, %3905 : index + %3907 = addi %3906, %c1 : index + %3908 = load %2[%3862, %3867, %3907] : memref<16x6x2xvector<8xf32>> + %3909 = vector.insertelement %2480, %3908[%c4_i64 : i64] : vector<8xf32> + %3910 = addi %arg5, %c8 : index + %3911 = cmpi "slt", %3910, %c0 : index + %3912 = subi %c-1, %3910 : index + %3913 = select %3911, %3912, %3910 : index + %3914 = divi_signed %3913, %c16 : index + %3915 = subi %c-1, %3914 : index + %3916 = select %3911, %3915, %3914 : index + %3917 = remi_signed %3916, %c16 : index + %3918 = cmpi "slt", %3917, %c0 : index + %3919 = addi %3917, %c16 : index + %3920 = select %3918, %3919, %3917 : index + %3921 = addi %arg7, %arg9 : index + %3922 = remi_signed %3921, %c6 : index + %3923 = cmpi "slt", %3922, %c0 : index + %3924 = addi %3922, %c6 : index + %3925 = select %3923, %3924, %3922 : index + %3926 = cmpi "slt", %arg5, %c0 : index + %3927 = subi %c-1, %arg5 : index + %3928 = select %3926, %3927, %arg5 : index + %3929 = divi_signed %3928, %c8 : index + %3930 = subi %c-1, %3929 : index + %3931 = select %3926, %3930, %3929 : index + %3932 = addi %arg5, %c8 : index + %3933 = cmpi "slt", %3932, %c0 : index + %3934 = subi %c-1, %3932 : index + %3935 = 
select %3933, %3934, %3932 : index + %3936 = divi_signed %3935, %c16 : index + %3937 = subi %c-1, %3936 : index + %3938 = select %3933, %3937, %3936 : index + %3939 = muli %3938, %c-2 : index + %3940 = addi %3931, %3939 : index + %3941 = cmpi "slt", %arg5, %c0 : index + %3942 = subi %c-1, %arg5 : index + %3943 = select %3941, %3942, %arg5 : index + %3944 = divi_signed %3943, %c8 : index + %3945 = subi %c-1, %3944 : index + %3946 = select %3941, %3945, %3944 : index + %3947 = addi %arg5, %c8 : index + %3948 = cmpi "slt", %3947, %c0 : index + %3949 = subi %c-1, %3947 : index + %3950 = select %3948, %3949, %3947 : index + %3951 = divi_signed %3950, %c16 : index + %3952 = subi %c-1, %3951 : index + %3953 = select %3948, %3952, %3951 : index + %3954 = muli %3953, %c-2 : index + %3955 = addi %3946, %3954 : index + %3956 = addi %3955, %c1 : index + %3957 = cmpi "slt", %3956, %c0 : index + %3958 = subi %c-1, %3956 : index + %3959 = select %3957, %3958, %3956 : index + %3960 = divi_signed %3959, %c2 : index + %3961 = subi %c-1, %3960 : index + %3962 = select %3957, %3961, %3960 : index + %3963 = muli %3962, %c-2 : index + %3964 = addi %3940, %3963 : index + %3965 = addi %3964, %c1 : index + store %3909, %2[%3920, %3925, %3965] : memref<16x6x2xvector<8xf32>> + %3966 = addi %arg5, %c8 : index + %3967 = cmpi "slt", %3966, %c0 : index + %3968 = subi %c-1, %3966 : index + %3969 = select %3967, %3968, %3966 : index + %3970 = divi_signed %3969, %c16 : index + %3971 = subi %c-1, %3970 : index + %3972 = select %3967, %3971, %3970 : index + %3973 = remi_signed %3972, %c16 : index + %3974 = cmpi "slt", %3973, %c0 : index + %3975 = addi %3973, %c16 : index + %3976 = select %3974, %3975, %3973 : index + %3977 = addi %arg7, %arg9 : index + %3978 = remi_signed %3977, %c6 : index + %3979 = cmpi "slt", %3978, %c0 : index + %3980 = addi %3978, %c6 : index + %3981 = select %3979, %3980, %3978 : index + %3982 = cmpi "slt", %arg5, %c0 : index + %3983 = subi %c-1, %arg5 : index + %3984 = select %3982, %3983, %arg5 : index + %3985 = divi_signed %3984, %c8 : index + %3986 = subi %c-1, %3985 : index + %3987 = select %3982, %3986, %3985 : index + %3988 = addi %arg5, %c8 : index + %3989 = cmpi "slt", %3988, %c0 : index + %3990 = subi %c-1, %3988 : index + %3991 = select %3989, %3990, %3988 : index + %3992 = divi_signed %3991, %c16 : index + %3993 = subi %c-1, %3992 : index + %3994 = select %3989, %3993, %3992 : index + %3995 = muli %3994, %c-2 : index + %3996 = addi %3987, %3995 : index + %3997 = cmpi "slt", %arg5, %c0 : index + %3998 = subi %c-1, %arg5 : index + %3999 = select %3997, %3998, %arg5 : index + %4000 = divi_signed %3999, %c8 : index + %4001 = subi %c-1, %4000 : index + %4002 = select %3997, %4001, %4000 : index + %4003 = addi %arg5, %c8 : index + %4004 = cmpi "slt", %4003, %c0 : index + %4005 = subi %c-1, %4003 : index + %4006 = select %4004, %4005, %4003 : index + %4007 = divi_signed %4006, %c16 : index + %4008 = subi %c-1, %4007 : index + %4009 = select %4004, %4008, %4007 : index + %4010 = muli %4009, %c-2 : index + %4011 = addi %4002, %4010 : index + %4012 = addi %4011, %c1 : index + %4013 = cmpi "slt", %4012, %c0 : index + %4014 = subi %c-1, %4012 : index + %4015 = select %4013, %4014, %4012 : index + %4016 = divi_signed %4015, %c2 : index + %4017 = subi %c-1, %4016 : index + %4018 = select %4013, %4017, %4016 : index + %4019 = muli %4018, %c-2 : index + %4020 = addi %3996, %4019 : index + %4021 = addi %4020, %c1 : index + %4022 = load %2[%3976, %3981, %4021] : memref<16x6x2xvector<8xf32>> + %4023 = 
vector.insertelement %2481, %4022[%c5_i64 : i64] : vector<8xf32> + %4024 = addi %arg5, %c8 : index + %4025 = cmpi "slt", %4024, %c0 : index + %4026 = subi %c-1, %4024 : index + %4027 = select %4025, %4026, %4024 : index + %4028 = divi_signed %4027, %c16 : index + %4029 = subi %c-1, %4028 : index + %4030 = select %4025, %4029, %4028 : index + %4031 = remi_signed %4030, %c16 : index + %4032 = cmpi "slt", %4031, %c0 : index + %4033 = addi %4031, %c16 : index + %4034 = select %4032, %4033, %4031 : index + %4035 = addi %arg7, %arg9 : index + %4036 = remi_signed %4035, %c6 : index + %4037 = cmpi "slt", %4036, %c0 : index + %4038 = addi %4036, %c6 : index + %4039 = select %4037, %4038, %4036 : index + %4040 = cmpi "slt", %arg5, %c0 : index + %4041 = subi %c-1, %arg5 : index + %4042 = select %4040, %4041, %arg5 : index + %4043 = divi_signed %4042, %c8 : index + %4044 = subi %c-1, %4043 : index + %4045 = select %4040, %4044, %4043 : index + %4046 = addi %arg5, %c8 : index + %4047 = cmpi "slt", %4046, %c0 : index + %4048 = subi %c-1, %4046 : index + %4049 = select %4047, %4048, %4046 : index + %4050 = divi_signed %4049, %c16 : index + %4051 = subi %c-1, %4050 : index + %4052 = select %4047, %4051, %4050 : index + %4053 = muli %4052, %c-2 : index + %4054 = addi %4045, %4053 : index + %4055 = cmpi "slt", %arg5, %c0 : index + %4056 = subi %c-1, %arg5 : index + %4057 = select %4055, %4056, %arg5 : index + %4058 = divi_signed %4057, %c8 : index + %4059 = subi %c-1, %4058 : index + %4060 = select %4055, %4059, %4058 : index + %4061 = addi %arg5, %c8 : index + %4062 = cmpi "slt", %4061, %c0 : index + %4063 = subi %c-1, %4061 : index + %4064 = select %4062, %4063, %4061 : index + %4065 = divi_signed %4064, %c16 : index + %4066 = subi %c-1, %4065 : index + %4067 = select %4062, %4066, %4065 : index + %4068 = muli %4067, %c-2 : index + %4069 = addi %4060, %4068 : index + %4070 = addi %4069, %c1 : index + %4071 = cmpi "slt", %4070, %c0 : index + %4072 = subi %c-1, %4070 : index + %4073 = select %4071, %4072, %4070 : index + %4074 = divi_signed %4073, %c2 : index + %4075 = subi %c-1, %4074 : index + %4076 = select %4071, %4075, %4074 : index + %4077 = muli %4076, %c-2 : index + %4078 = addi %4054, %4077 : index + %4079 = addi %4078, %c1 : index + store %4023, %2[%4034, %4039, %4079] : memref<16x6x2xvector<8xf32>> + %4080 = addi %arg5, %c8 : index + %4081 = cmpi "slt", %4080, %c0 : index + %4082 = subi %c-1, %4080 : index + %4083 = select %4081, %4082, %4080 : index + %4084 = divi_signed %4083, %c16 : index + %4085 = subi %c-1, %4084 : index + %4086 = select %4081, %4085, %4084 : index + %4087 = remi_signed %4086, %c16 : index + %4088 = cmpi "slt", %4087, %c0 : index + %4089 = addi %4087, %c16 : index + %4090 = select %4088, %4089, %4087 : index + %4091 = addi %arg7, %arg9 : index + %4092 = remi_signed %4091, %c6 : index + %4093 = cmpi "slt", %4092, %c0 : index + %4094 = addi %4092, %c6 : index + %4095 = select %4093, %4094, %4092 : index + %4096 = cmpi "slt", %arg5, %c0 : index + %4097 = subi %c-1, %arg5 : index + %4098 = select %4096, %4097, %arg5 : index + %4099 = divi_signed %4098, %c8 : index + %4100 = subi %c-1, %4099 : index + %4101 = select %4096, %4100, %4099 : index + %4102 = addi %arg5, %c8 : index + %4103 = cmpi "slt", %4102, %c0 : index + %4104 = subi %c-1, %4102 : index + %4105 = select %4103, %4104, %4102 : index + %4106 = divi_signed %4105, %c16 : index + %4107 = subi %c-1, %4106 : index + %4108 = select %4103, %4107, %4106 : index + %4109 = muli %4108, %c-2 : index + %4110 = addi %4101, %4109 : 
index + %4111 = cmpi "slt", %arg5, %c0 : index + %4112 = subi %c-1, %arg5 : index + %4113 = select %4111, %4112, %arg5 : index + %4114 = divi_signed %4113, %c8 : index + %4115 = subi %c-1, %4114 : index + %4116 = select %4111, %4115, %4114 : index + %4117 = addi %arg5, %c8 : index + %4118 = cmpi "slt", %4117, %c0 : index + %4119 = subi %c-1, %4117 : index + %4120 = select %4118, %4119, %4117 : index + %4121 = divi_signed %4120, %c16 : index + %4122 = subi %c-1, %4121 : index + %4123 = select %4118, %4122, %4121 : index + %4124 = muli %4123, %c-2 : index + %4125 = addi %4116, %4124 : index + %4126 = addi %4125, %c1 : index + %4127 = cmpi "slt", %4126, %c0 : index + %4128 = subi %c-1, %4126 : index + %4129 = select %4127, %4128, %4126 : index + %4130 = divi_signed %4129, %c2 : index + %4131 = subi %c-1, %4130 : index + %4132 = select %4127, %4131, %4130 : index + %4133 = muli %4132, %c-2 : index + %4134 = addi %4110, %4133 : index + %4135 = addi %4134, %c1 : index + %4136 = load %2[%4090, %4095, %4135] : memref<16x6x2xvector<8xf32>> + %4137 = vector.insertelement %2482, %4136[%c6_i64 : i64] : vector<8xf32> + %4138 = addi %arg5, %c8 : index + %4139 = cmpi "slt", %4138, %c0 : index + %4140 = subi %c-1, %4138 : index + %4141 = select %4139, %4140, %4138 : index + %4142 = divi_signed %4141, %c16 : index + %4143 = subi %c-1, %4142 : index + %4144 = select %4139, %4143, %4142 : index + %4145 = remi_signed %4144, %c16 : index + %4146 = cmpi "slt", %4145, %c0 : index + %4147 = addi %4145, %c16 : index + %4148 = select %4146, %4147, %4145 : index + %4149 = addi %arg7, %arg9 : index + %4150 = remi_signed %4149, %c6 : index + %4151 = cmpi "slt", %4150, %c0 : index + %4152 = addi %4150, %c6 : index + %4153 = select %4151, %4152, %4150 : index + %4154 = cmpi "slt", %arg5, %c0 : index + %4155 = subi %c-1, %arg5 : index + %4156 = select %4154, %4155, %arg5 : index + %4157 = divi_signed %4156, %c8 : index + %4158 = subi %c-1, %4157 : index + %4159 = select %4154, %4158, %4157 : index + %4160 = addi %arg5, %c8 : index + %4161 = cmpi "slt", %4160, %c0 : index + %4162 = subi %c-1, %4160 : index + %4163 = select %4161, %4162, %4160 : index + %4164 = divi_signed %4163, %c16 : index + %4165 = subi %c-1, %4164 : index + %4166 = select %4161, %4165, %4164 : index + %4167 = muli %4166, %c-2 : index + %4168 = addi %4159, %4167 : index + %4169 = cmpi "slt", %arg5, %c0 : index + %4170 = subi %c-1, %arg5 : index + %4171 = select %4169, %4170, %arg5 : index + %4172 = divi_signed %4171, %c8 : index + %4173 = subi %c-1, %4172 : index + %4174 = select %4169, %4173, %4172 : index + %4175 = addi %arg5, %c8 : index + %4176 = cmpi "slt", %4175, %c0 : index + %4177 = subi %c-1, %4175 : index + %4178 = select %4176, %4177, %4175 : index + %4179 = divi_signed %4178, %c16 : index + %4180 = subi %c-1, %4179 : index + %4181 = select %4176, %4180, %4179 : index + %4182 = muli %4181, %c-2 : index + %4183 = addi %4174, %4182 : index + %4184 = addi %4183, %c1 : index + %4185 = cmpi "slt", %4184, %c0 : index + %4186 = subi %c-1, %4184 : index + %4187 = select %4185, %4186, %4184 : index + %4188 = divi_signed %4187, %c2 : index + %4189 = subi %c-1, %4188 : index + %4190 = select %4185, %4189, %4188 : index + %4191 = muli %4190, %c-2 : index + %4192 = addi %4168, %4191 : index + %4193 = addi %4192, %c1 : index + store %4137, %2[%4148, %4153, %4193] : memref<16x6x2xvector<8xf32>> + %4194 = addi %arg5, %c8 : index + %4195 = cmpi "slt", %4194, %c0 : index + %4196 = subi %c-1, %4194 : index + %4197 = select %4195, %4196, %4194 : index + %4198 = 
divi_signed %4197, %c16 : index + %4199 = subi %c-1, %4198 : index + %4200 = select %4195, %4199, %4198 : index + %4201 = remi_signed %4200, %c16 : index + %4202 = cmpi "slt", %4201, %c0 : index + %4203 = addi %4201, %c16 : index + %4204 = select %4202, %4203, %4201 : index + %4205 = addi %arg7, %arg9 : index + %4206 = remi_signed %4205, %c6 : index + %4207 = cmpi "slt", %4206, %c0 : index + %4208 = addi %4206, %c6 : index + %4209 = select %4207, %4208, %4206 : index + %4210 = cmpi "slt", %arg5, %c0 : index + %4211 = subi %c-1, %arg5 : index + %4212 = select %4210, %4211, %arg5 : index + %4213 = divi_signed %4212, %c8 : index + %4214 = subi %c-1, %4213 : index + %4215 = select %4210, %4214, %4213 : index + %4216 = addi %arg5, %c8 : index + %4217 = cmpi "slt", %4216, %c0 : index + %4218 = subi %c-1, %4216 : index + %4219 = select %4217, %4218, %4216 : index + %4220 = divi_signed %4219, %c16 : index + %4221 = subi %c-1, %4220 : index + %4222 = select %4217, %4221, %4220 : index + %4223 = muli %4222, %c-2 : index + %4224 = addi %4215, %4223 : index + %4225 = cmpi "slt", %arg5, %c0 : index + %4226 = subi %c-1, %arg5 : index + %4227 = select %4225, %4226, %arg5 : index + %4228 = divi_signed %4227, %c8 : index + %4229 = subi %c-1, %4228 : index + %4230 = select %4225, %4229, %4228 : index + %4231 = addi %arg5, %c8 : index + %4232 = cmpi "slt", %4231, %c0 : index + %4233 = subi %c-1, %4231 : index + %4234 = select %4232, %4233, %4231 : index + %4235 = divi_signed %4234, %c16 : index + %4236 = subi %c-1, %4235 : index + %4237 = select %4232, %4236, %4235 : index + %4238 = muli %4237, %c-2 : index + %4239 = addi %4230, %4238 : index + %4240 = addi %4239, %c1 : index + %4241 = cmpi "slt", %4240, %c0 : index + %4242 = subi %c-1, %4240 : index + %4243 = select %4241, %4242, %4240 : index + %4244 = divi_signed %4243, %c2 : index + %4245 = subi %c-1, %4244 : index + %4246 = select %4241, %4245, %4244 : index + %4247 = muli %4246, %c-2 : index + %4248 = addi %4224, %4247 : index + %4249 = addi %4248, %c1 : index + %4250 = load %2[%4204, %4209, %4249] : memref<16x6x2xvector<8xf32>> + %4251 = vector.insertelement %2483, %4250[%c7_i64 : i64] : vector<8xf32> + %4252 = addi %arg5, %c8 : index + %4253 = cmpi "slt", %4252, %c0 : index + %4254 = subi %c-1, %4252 : index + %4255 = select %4253, %4254, %4252 : index + %4256 = divi_signed %4255, %c16 : index + %4257 = subi %c-1, %4256 : index + %4258 = select %4253, %4257, %4256 : index + %4259 = remi_signed %4258, %c16 : index + %4260 = cmpi "slt", %4259, %c0 : index + %4261 = addi %4259, %c16 : index + %4262 = select %4260, %4261, %4259 : index + %4263 = addi %arg7, %arg9 : index + %4264 = remi_signed %4263, %c6 : index + %4265 = cmpi "slt", %4264, %c0 : index + %4266 = addi %4264, %c6 : index + %4267 = select %4265, %4266, %4264 : index + %4268 = cmpi "slt", %arg5, %c0 : index + %4269 = subi %c-1, %arg5 : index + %4270 = select %4268, %4269, %arg5 : index + %4271 = divi_signed %4270, %c8 : index + %4272 = subi %c-1, %4271 : index + %4273 = select %4268, %4272, %4271 : index + %4274 = addi %arg5, %c8 : index + %4275 = cmpi "slt", %4274, %c0 : index + %4276 = subi %c-1, %4274 : index + %4277 = select %4275, %4276, %4274 : index + %4278 = divi_signed %4277, %c16 : index + %4279 = subi %c-1, %4278 : index + %4280 = select %4275, %4279, %4278 : index + %4281 = muli %4280, %c-2 : index + %4282 = addi %4273, %4281 : index + %4283 = cmpi "slt", %arg5, %c0 : index + %4284 = subi %c-1, %arg5 : index + %4285 = select %4283, %4284, %arg5 : index + %4286 = divi_signed %4285, 
%c8 : index + %4287 = subi %c-1, %4286 : index + %4288 = select %4283, %4287, %4286 : index + %4289 = addi %arg5, %c8 : index + %4290 = cmpi "slt", %4289, %c0 : index + %4291 = subi %c-1, %4289 : index + %4292 = select %4290, %4291, %4289 : index + %4293 = divi_signed %4292, %c16 : index + %4294 = subi %c-1, %4293 : index + %4295 = select %4290, %4294, %4293 : index + %4296 = muli %4295, %c-2 : index + %4297 = addi %4288, %4296 : index + %4298 = addi %4297, %c1 : index + %4299 = cmpi "slt", %4298, %c0 : index + %4300 = subi %c-1, %4298 : index + %4301 = select %4299, %4300, %4298 : index + %4302 = divi_signed %4301, %c2 : index + %4303 = subi %c-1, %4302 : index + %4304 = select %4299, %4303, %4302 : index + %4305 = muli %4304, %c-2 : index + %4306 = addi %4282, %4305 : index + %4307 = addi %4306, %c1 : index + store %4251, %2[%4262, %4267, %4307] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = addi %arg6, %arg7 : index + %6 = addi %arg6, %arg7 : index + %7 = addi %arg6, %arg7 : index + %8 = addi %arg6, %arg7 : index + %9 = addi %arg6, %arg7 : index + %10 = addi %arg6, %arg7 : index + %11 = addi %arg6, %arg7 : index + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%arg4, %5] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%arg4, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = load %arg0[%arg4, %7] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %16 = load %arg0[%arg4, %8] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %18 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg0[%arg4, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %20 = cmpi "slt", %arg5, %c0 : index + %21 = subi %c-1, %arg5 : index + %22 = select %20, %21, %arg5 : index + %23 = divi_signed %22, %c16 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c16 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c16 : index + %29 = select %27, %28, %26 : index + %30 = addi %arg6, %arg7 : index + %31 = remi_signed %30, %c128 : index + %32 = cmpi "slt", %31, %c0 : index + %33 = addi %31, %c128 : index + %34 = select %32, %33, %31 : index + %35 = remi_signed %arg5, %c16 : index + %36 = cmpi "slt", %35, %c0 : index + %37 = addi %35, %c16 : index + %38 = select %36, %37, %35 : index + %39 = cmpi "slt", %38, %c0 : index + %40 = subi %c-1, %38 : index + %41 = select %39, %40, %38 : index + %42 = divi_signed %41, %c8 : index + %43 = subi %c-1, %42 : index + %44 = select %39, %43, %42 : index + %45 = remi_signed %44, %c2 : index + %46 = cmpi "slt", %45, %c0 : index + %47 = addi %45, %c2 : index + %48 = select %46, %47, %45 : index + %49 = load %3[%29, %34, %48] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c0_i64 : i64] : vector<8xf32> + %51 = cmpi "slt", %arg5, %c0 : index + %52 = subi %c-1, %arg5 : index + %53 = select %51, %52, %arg5 : index + %54 = divi_signed %53, %c16 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = remi_signed %56, %c16 : index + %58 = cmpi "slt", %57, %c0 : index + %59 = addi %57, %c16 : index + %60 = select %58, %59, %57 : index + %61 = addi %arg6, %arg7 : index + %62 = remi_signed %61, %c128 : index + 
%63 = cmpi "slt", %62, %c0 : index + %64 = addi %62, %c128 : index + %65 = select %63, %64, %62 : index + %66 = remi_signed %arg5, %c16 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = addi %66, %c16 : index + %69 = select %67, %68, %66 : index + %70 = cmpi "slt", %69, %c0 : index + %71 = subi %c-1, %69 : index + %72 = select %70, %71, %69 : index + %73 = divi_signed %72, %c8 : index + %74 = subi %c-1, %73 : index + %75 = select %70, %74, %73 : index + %76 = remi_signed %75, %c2 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = addi %76, %c2 : index + %79 = select %77, %78, %76 : index + %80 = load %3[%60, %65, %79] : memref<16x128x2xvector<8xf32>> + %81 = vector.extractelement %80[%c1_i64 : i64] : vector<8xf32> + %82 = cmpi "slt", %arg5, %c0 : index + %83 = subi %c-1, %arg5 : index + %84 = select %82, %83, %arg5 : index + %85 = divi_signed %84, %c16 : index + %86 = subi %c-1, %85 : index + %87 = select %82, %86, %85 : index + %88 = remi_signed %87, %c16 : index + %89 = cmpi "slt", %88, %c0 : index + %90 = addi %88, %c16 : index + %91 = select %89, %90, %88 : index + %92 = addi %arg6, %arg7 : index + %93 = remi_signed %92, %c128 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = addi %93, %c128 : index + %96 = select %94, %95, %93 : index + %97 = remi_signed %arg5, %c16 : index + %98 = cmpi "slt", %97, %c0 : index + %99 = addi %97, %c16 : index + %100 = select %98, %99, %97 : index + %101 = cmpi "slt", %100, %c0 : index + %102 = subi %c-1, %100 : index + %103 = select %101, %102, %100 : index + %104 = divi_signed %103, %c8 : index + %105 = subi %c-1, %104 : index + %106 = select %101, %105, %104 : index + %107 = remi_signed %106, %c2 : index + %108 = cmpi "slt", %107, %c0 : index + %109 = addi %107, %c2 : index + %110 = select %108, %109, %107 : index + %111 = load %3[%91, %96, %110] : memref<16x128x2xvector<8xf32>> + %112 = vector.extractelement %111[%c2_i64 : i64] : vector<8xf32> + %113 = cmpi "slt", %arg5, %c0 : index + %114 = subi %c-1, %arg5 : index + %115 = select %113, %114, %arg5 : index + %116 = divi_signed %115, %c16 : index + %117 = subi %c-1, %116 : index + %118 = select %113, %117, %116 : index + %119 = remi_signed %118, %c16 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = addi %119, %c16 : index + %122 = select %120, %121, %119 : index + %123 = addi %arg6, %arg7 : index + %124 = remi_signed %123, %c128 : index + %125 = cmpi "slt", %124, %c0 : index + %126 = addi %124, %c128 : index + %127 = select %125, %126, %124 : index + %128 = remi_signed %arg5, %c16 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = addi %128, %c16 : index + %131 = select %129, %130, %128 : index + %132 = cmpi "slt", %131, %c0 : index + %133 = subi %c-1, %131 : index + %134 = select %132, %133, %131 : index + %135 = divi_signed %134, %c8 : index + %136 = subi %c-1, %135 : index + %137 = select %132, %136, %135 : index + %138 = remi_signed %137, %c2 : index + %139 = cmpi "slt", %138, %c0 : index + %140 = addi %138, %c2 : index + %141 = select %139, %140, %138 : index + %142 = load %3[%122, %127, %141] : memref<16x128x2xvector<8xf32>> + %143 = vector.extractelement %142[%c3_i64 : i64] : vector<8xf32> + %144 = cmpi "slt", %arg5, %c0 : index + %145 = subi %c-1, %arg5 : index + %146 = select %144, %145, %arg5 : index + %147 = divi_signed %146, %c16 : index + %148 = subi %c-1, %147 : index + %149 = select %144, %148, %147 : index + %150 = remi_signed %149, %c16 : index + %151 = cmpi "slt", %150, %c0 : index + %152 = addi %150, %c16 : index + %153 = select %151, %152, %150 : index + %154 = 
addi %arg6, %arg7 : index + %155 = remi_signed %154, %c128 : index + %156 = cmpi "slt", %155, %c0 : index + %157 = addi %155, %c128 : index + %158 = select %156, %157, %155 : index + %159 = remi_signed %arg5, %c16 : index + %160 = cmpi "slt", %159, %c0 : index + %161 = addi %159, %c16 : index + %162 = select %160, %161, %159 : index + %163 = cmpi "slt", %162, %c0 : index + %164 = subi %c-1, %162 : index + %165 = select %163, %164, %162 : index + %166 = divi_signed %165, %c8 : index + %167 = subi %c-1, %166 : index + %168 = select %163, %167, %166 : index + %169 = remi_signed %168, %c2 : index + %170 = cmpi "slt", %169, %c0 : index + %171 = addi %169, %c2 : index + %172 = select %170, %171, %169 : index + %173 = load %3[%153, %158, %172] : memref<16x128x2xvector<8xf32>> + %174 = vector.extractelement %173[%c4_i64 : i64] : vector<8xf32> + %175 = cmpi "slt", %arg5, %c0 : index + %176 = subi %c-1, %arg5 : index + %177 = select %175, %176, %arg5 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = addi %arg6, %arg7 : index + %186 = remi_signed %185, %c128 : index + %187 = cmpi "slt", %186, %c0 : index + %188 = addi %186, %c128 : index + %189 = select %187, %188, %186 : index + %190 = remi_signed %arg5, %c16 : index + %191 = cmpi "slt", %190, %c0 : index + %192 = addi %190, %c16 : index + %193 = select %191, %192, %190 : index + %194 = cmpi "slt", %193, %c0 : index + %195 = subi %c-1, %193 : index + %196 = select %194, %195, %193 : index + %197 = divi_signed %196, %c8 : index + %198 = subi %c-1, %197 : index + %199 = select %194, %198, %197 : index + %200 = remi_signed %199, %c2 : index + %201 = cmpi "slt", %200, %c0 : index + %202 = addi %200, %c2 : index + %203 = select %201, %202, %200 : index + %204 = load %3[%184, %189, %203] : memref<16x128x2xvector<8xf32>> + %205 = vector.extractelement %204[%c5_i64 : i64] : vector<8xf32> + %206 = cmpi "slt", %arg5, %c0 : index + %207 = subi %c-1, %arg5 : index + %208 = select %206, %207, %arg5 : index + %209 = divi_signed %208, %c16 : index + %210 = subi %c-1, %209 : index + %211 = select %206, %210, %209 : index + %212 = remi_signed %211, %c16 : index + %213 = cmpi "slt", %212, %c0 : index + %214 = addi %212, %c16 : index + %215 = select %213, %214, %212 : index + %216 = addi %arg6, %arg7 : index + %217 = remi_signed %216, %c128 : index + %218 = cmpi "slt", %217, %c0 : index + %219 = addi %217, %c128 : index + %220 = select %218, %219, %217 : index + %221 = remi_signed %arg5, %c16 : index + %222 = cmpi "slt", %221, %c0 : index + %223 = addi %221, %c16 : index + %224 = select %222, %223, %221 : index + %225 = cmpi "slt", %224, %c0 : index + %226 = subi %c-1, %224 : index + %227 = select %225, %226, %224 : index + %228 = divi_signed %227, %c8 : index + %229 = subi %c-1, %228 : index + %230 = select %225, %229, %228 : index + %231 = remi_signed %230, %c2 : index + %232 = cmpi "slt", %231, %c0 : index + %233 = addi %231, %c2 : index + %234 = select %232, %233, %231 : index + %235 = load %3[%215, %220, %234] : memref<16x128x2xvector<8xf32>> + %236 = vector.extractelement %235[%c6_i64 : i64] : vector<8xf32> + %237 = cmpi "slt", %arg5, %c0 : index + %238 = subi %c-1, %arg5 : index + %239 = select %237, %238, %arg5 : index + %240 = divi_signed %239, %c16 : index + %241 = subi %c-1, %240 : index + %242 = select %237, %241, %240 : index 
+ %243 = remi_signed %242, %c16 : index + %244 = cmpi "slt", %243, %c0 : index + %245 = addi %243, %c16 : index + %246 = select %244, %245, %243 : index + %247 = addi %arg6, %arg7 : index + %248 = remi_signed %247, %c128 : index + %249 = cmpi "slt", %248, %c0 : index + %250 = addi %248, %c128 : index + %251 = select %249, %250, %248 : index + %252 = remi_signed %arg5, %c16 : index + %253 = cmpi "slt", %252, %c0 : index + %254 = addi %252, %c16 : index + %255 = select %253, %254, %252 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c8 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = remi_signed %261, %c2 : index + %263 = cmpi "slt", %262, %c0 : index + %264 = addi %262, %c2 : index + %265 = select %263, %264, %262 : index + %266 = load %3[%246, %251, %265] : memref<16x128x2xvector<8xf32>> + %267 = vector.extractelement %266[%c7_i64 : i64] : vector<8xf32> + %268 = mulf %12, %50 {RelaxedPrecision} : f32 + %269 = mulf %13, %81 {RelaxedPrecision} : f32 + %270 = mulf %14, %112 {RelaxedPrecision} : f32 + %271 = mulf %15, %143 {RelaxedPrecision} : f32 + %272 = mulf %16, %174 {RelaxedPrecision} : f32 + %273 = mulf %17, %205 {RelaxedPrecision} : f32 + %274 = mulf %18, %236 {RelaxedPrecision} : f32 + %275 = mulf %19, %267 {RelaxedPrecision} : f32 + %276 = cmpi "slt", %arg5, %c0 : index + %277 = subi %c-1, %arg5 : index + %278 = select %276, %277, %arg5 : index + %279 = divi_signed %278, %c16 : index + %280 = subi %c-1, %279 : index + %281 = select %276, %280, %279 : index + %282 = remi_signed %281, %c16 : index + %283 = cmpi "slt", %282, %c0 : index + %284 = addi %282, %c16 : index + %285 = select %283, %284, %282 : index + %286 = remi_signed %arg5, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = cmpi "slt", %289, %c0 : index + %291 = subi %c-1, %289 : index + %292 = select %290, %291, %289 : index + %293 = divi_signed %292, %c8 : index + %294 = subi %c-1, %293 : index + %295 = select %290, %294, %293 : index + %296 = remi_signed %295, %c2 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = addi %296, %c2 : index + %299 = select %297, %298, %296 : index + %300 = load %2[%285, %c0, %299] : memref<16x6x2xvector<8xf32>> + %301 = vector.extractelement %300[%c0_i64 : i64] : vector<8xf32> + %302 = cmpi "slt", %arg5, %c0 : index + %303 = subi %c-1, %arg5 : index + %304 = select %302, %303, %arg5 : index + %305 = divi_signed %304, %c16 : index + %306 = subi %c-1, %305 : index + %307 = select %302, %306, %305 : index + %308 = remi_signed %307, %c16 : index + %309 = cmpi "slt", %308, %c0 : index + %310 = addi %308, %c16 : index + %311 = select %309, %310, %308 : index + %312 = remi_signed %arg5, %c16 : index + %313 = cmpi "slt", %312, %c0 : index + %314 = addi %312, %c16 : index + %315 = select %313, %314, %312 : index + %316 = cmpi "slt", %315, %c0 : index + %317 = subi %c-1, %315 : index + %318 = select %316, %317, %315 : index + %319 = divi_signed %318, %c8 : index + %320 = subi %c-1, %319 : index + %321 = select %316, %320, %319 : index + %322 = remi_signed %321, %c2 : index + %323 = cmpi "slt", %322, %c0 : index + %324 = addi %322, %c2 : index + %325 = select %323, %324, %322 : index + %326 = load %2[%311, %c0, %325] : memref<16x6x2xvector<8xf32>> + %327 = vector.extractelement %326[%c1_i64 : i64] : vector<8xf32> + %328 = cmpi "slt", %arg5, %c0 : index + %329 = subi %c-1, 
%arg5 : index + %330 = select %328, %329, %arg5 : index + %331 = divi_signed %330, %c16 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = remi_signed %333, %c16 : index + %335 = cmpi "slt", %334, %c0 : index + %336 = addi %334, %c16 : index + %337 = select %335, %336, %334 : index + %338 = remi_signed %arg5, %c16 : index + %339 = cmpi "slt", %338, %c0 : index + %340 = addi %338, %c16 : index + %341 = select %339, %340, %338 : index + %342 = cmpi "slt", %341, %c0 : index + %343 = subi %c-1, %341 : index + %344 = select %342, %343, %341 : index + %345 = divi_signed %344, %c8 : index + %346 = subi %c-1, %345 : index + %347 = select %342, %346, %345 : index + %348 = remi_signed %347, %c2 : index + %349 = cmpi "slt", %348, %c0 : index + %350 = addi %348, %c2 : index + %351 = select %349, %350, %348 : index + %352 = load %2[%337, %c0, %351] : memref<16x6x2xvector<8xf32>> + %353 = vector.extractelement %352[%c2_i64 : i64] : vector<8xf32> + %354 = cmpi "slt", %arg5, %c0 : index + %355 = subi %c-1, %arg5 : index + %356 = select %354, %355, %arg5 : index + %357 = divi_signed %356, %c16 : index + %358 = subi %c-1, %357 : index + %359 = select %354, %358, %357 : index + %360 = remi_signed %359, %c16 : index + %361 = cmpi "slt", %360, %c0 : index + %362 = addi %360, %c16 : index + %363 = select %361, %362, %360 : index + %364 = remi_signed %arg5, %c16 : index + %365 = cmpi "slt", %364, %c0 : index + %366 = addi %364, %c16 : index + %367 = select %365, %366, %364 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = subi %c-1, %367 : index + %370 = select %368, %369, %367 : index + %371 = divi_signed %370, %c8 : index + %372 = subi %c-1, %371 : index + %373 = select %368, %372, %371 : index + %374 = remi_signed %373, %c2 : index + %375 = cmpi "slt", %374, %c0 : index + %376 = addi %374, %c2 : index + %377 = select %375, %376, %374 : index + %378 = load %2[%363, %c0, %377] : memref<16x6x2xvector<8xf32>> + %379 = vector.extractelement %378[%c3_i64 : i64] : vector<8xf32> + %380 = cmpi "slt", %arg5, %c0 : index + %381 = subi %c-1, %arg5 : index + %382 = select %380, %381, %arg5 : index + %383 = divi_signed %382, %c16 : index + %384 = subi %c-1, %383 : index + %385 = select %380, %384, %383 : index + %386 = remi_signed %385, %c16 : index + %387 = cmpi "slt", %386, %c0 : index + %388 = addi %386, %c16 : index + %389 = select %387, %388, %386 : index + %390 = remi_signed %arg5, %c16 : index + %391 = cmpi "slt", %390, %c0 : index + %392 = addi %390, %c16 : index + %393 = select %391, %392, %390 : index + %394 = cmpi "slt", %393, %c0 : index + %395 = subi %c-1, %393 : index + %396 = select %394, %395, %393 : index + %397 = divi_signed %396, %c8 : index + %398 = subi %c-1, %397 : index + %399 = select %394, %398, %397 : index + %400 = remi_signed %399, %c2 : index + %401 = cmpi "slt", %400, %c0 : index + %402 = addi %400, %c2 : index + %403 = select %401, %402, %400 : index + %404 = load %2[%389, %c0, %403] : memref<16x6x2xvector<8xf32>> + %405 = vector.extractelement %404[%c4_i64 : i64] : vector<8xf32> + %406 = cmpi "slt", %arg5, %c0 : index + %407 = subi %c-1, %arg5 : index + %408 = select %406, %407, %arg5 : index + %409 = divi_signed %408, %c16 : index + %410 = subi %c-1, %409 : index + %411 = select %406, %410, %409 : index + %412 = remi_signed %411, %c16 : index + %413 = cmpi "slt", %412, %c0 : index + %414 = addi %412, %c16 : index + %415 = select %413, %414, %412 : index + %416 = remi_signed %arg5, %c16 : index + %417 = cmpi "slt", %416, %c0 : index + %418 = addi 
%416, %c16 : index + %419 = select %417, %418, %416 : index + %420 = cmpi "slt", %419, %c0 : index + %421 = subi %c-1, %419 : index + %422 = select %420, %421, %419 : index + %423 = divi_signed %422, %c8 : index + %424 = subi %c-1, %423 : index + %425 = select %420, %424, %423 : index + %426 = remi_signed %425, %c2 : index + %427 = cmpi "slt", %426, %c0 : index + %428 = addi %426, %c2 : index + %429 = select %427, %428, %426 : index + %430 = load %2[%415, %c0, %429] : memref<16x6x2xvector<8xf32>> + %431 = vector.extractelement %430[%c5_i64 : i64] : vector<8xf32> + %432 = cmpi "slt", %arg5, %c0 : index + %433 = subi %c-1, %arg5 : index + %434 = select %432, %433, %arg5 : index + %435 = divi_signed %434, %c16 : index + %436 = subi %c-1, %435 : index + %437 = select %432, %436, %435 : index + %438 = remi_signed %437, %c16 : index + %439 = cmpi "slt", %438, %c0 : index + %440 = addi %438, %c16 : index + %441 = select %439, %440, %438 : index + %442 = remi_signed %arg5, %c16 : index + %443 = cmpi "slt", %442, %c0 : index + %444 = addi %442, %c16 : index + %445 = select %443, %444, %442 : index + %446 = cmpi "slt", %445, %c0 : index + %447 = subi %c-1, %445 : index + %448 = select %446, %447, %445 : index + %449 = divi_signed %448, %c8 : index + %450 = subi %c-1, %449 : index + %451 = select %446, %450, %449 : index + %452 = remi_signed %451, %c2 : index + %453 = cmpi "slt", %452, %c0 : index + %454 = addi %452, %c2 : index + %455 = select %453, %454, %452 : index + %456 = load %2[%441, %c0, %455] : memref<16x6x2xvector<8xf32>> + %457 = vector.extractelement %456[%c6_i64 : i64] : vector<8xf32> + %458 = cmpi "slt", %arg5, %c0 : index + %459 = subi %c-1, %arg5 : index + %460 = select %458, %459, %arg5 : index + %461 = divi_signed %460, %c16 : index + %462 = subi %c-1, %461 : index + %463 = select %458, %462, %461 : index + %464 = remi_signed %463, %c16 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = addi %464, %c16 : index + %467 = select %465, %466, %464 : index + %468 = remi_signed %arg5, %c16 : index + %469 = cmpi "slt", %468, %c0 : index + %470 = addi %468, %c16 : index + %471 = select %469, %470, %468 : index + %472 = cmpi "slt", %471, %c0 : index + %473 = subi %c-1, %471 : index + %474 = select %472, %473, %471 : index + %475 = divi_signed %474, %c8 : index + %476 = subi %c-1, %475 : index + %477 = select %472, %476, %475 : index + %478 = remi_signed %477, %c2 : index + %479 = cmpi "slt", %478, %c0 : index + %480 = addi %478, %c2 : index + %481 = select %479, %480, %478 : index + %482 = load %2[%467, %c0, %481] : memref<16x6x2xvector<8xf32>> + %483 = vector.extractelement %482[%c7_i64 : i64] : vector<8xf32> + %484 = addf %301, %268 {RelaxedPrecision} : f32 + %485 = addf %327, %269 {RelaxedPrecision} : f32 + %486 = addf %353, %270 {RelaxedPrecision} : f32 + %487 = addf %379, %271 {RelaxedPrecision} : f32 + %488 = addf %405, %272 {RelaxedPrecision} : f32 + %489 = addf %431, %273 {RelaxedPrecision} : f32 + %490 = addf %457, %274 {RelaxedPrecision} : f32 + %491 = addf %483, %275 {RelaxedPrecision} : f32 + %492 = cmpi "slt", %arg5, %c0 : index + %493 = subi %c-1, %arg5 : index + %494 = select %492, %493, %arg5 : index + %495 = divi_signed %494, %c16 : index + %496 = subi %c-1, %495 : index + %497 = select %492, %496, %495 : index + %498 = remi_signed %497, %c16 : index + %499 = cmpi "slt", %498, %c0 : index + %500 = addi %498, %c16 : index + %501 = select %499, %500, %498 : index + %502 = remi_signed %arg5, %c16 : index + %503 = cmpi "slt", %502, %c0 : index + %504 = addi %502, %c16 : 
index + %505 = select %503, %504, %502 : index + %506 = cmpi "slt", %505, %c0 : index + %507 = subi %c-1, %505 : index + %508 = select %506, %507, %505 : index + %509 = divi_signed %508, %c8 : index + %510 = subi %c-1, %509 : index + %511 = select %506, %510, %509 : index + %512 = remi_signed %511, %c2 : index + %513 = cmpi "slt", %512, %c0 : index + %514 = addi %512, %c2 : index + %515 = select %513, %514, %512 : index + %516 = load %2[%501, %c0, %515] : memref<16x6x2xvector<8xf32>> + %517 = vector.insertelement %484, %516[%c0_i64 : i64] : vector<8xf32> + %518 = cmpi "slt", %arg5, %c0 : index + %519 = subi %c-1, %arg5 : index + %520 = select %518, %519, %arg5 : index + %521 = divi_signed %520, %c16 : index + %522 = subi %c-1, %521 : index + %523 = select %518, %522, %521 : index + %524 = remi_signed %523, %c16 : index + %525 = cmpi "slt", %524, %c0 : index + %526 = addi %524, %c16 : index + %527 = select %525, %526, %524 : index + %528 = remi_signed %arg5, %c16 : index + %529 = cmpi "slt", %528, %c0 : index + %530 = addi %528, %c16 : index + %531 = select %529, %530, %528 : index + %532 = cmpi "slt", %531, %c0 : index + %533 = subi %c-1, %531 : index + %534 = select %532, %533, %531 : index + %535 = divi_signed %534, %c8 : index + %536 = subi %c-1, %535 : index + %537 = select %532, %536, %535 : index + %538 = remi_signed %537, %c2 : index + %539 = cmpi "slt", %538, %c0 : index + %540 = addi %538, %c2 : index + %541 = select %539, %540, %538 : index + store %517, %2[%527, %c0, %541] : memref<16x6x2xvector<8xf32>> + %542 = cmpi "slt", %arg5, %c0 : index + %543 = subi %c-1, %arg5 : index + %544 = select %542, %543, %arg5 : index + %545 = divi_signed %544, %c16 : index + %546 = subi %c-1, %545 : index + %547 = select %542, %546, %545 : index + %548 = remi_signed %547, %c16 : index + %549 = cmpi "slt", %548, %c0 : index + %550 = addi %548, %c16 : index + %551 = select %549, %550, %548 : index + %552 = remi_signed %arg5, %c16 : index + %553 = cmpi "slt", %552, %c0 : index + %554 = addi %552, %c16 : index + %555 = select %553, %554, %552 : index + %556 = cmpi "slt", %555, %c0 : index + %557 = subi %c-1, %555 : index + %558 = select %556, %557, %555 : index + %559 = divi_signed %558, %c8 : index + %560 = subi %c-1, %559 : index + %561 = select %556, %560, %559 : index + %562 = remi_signed %561, %c2 : index + %563 = cmpi "slt", %562, %c0 : index + %564 = addi %562, %c2 : index + %565 = select %563, %564, %562 : index + %566 = load %2[%551, %c0, %565] : memref<16x6x2xvector<8xf32>> + %567 = vector.insertelement %485, %566[%c1_i64 : i64] : vector<8xf32> + %568 = cmpi "slt", %arg5, %c0 : index + %569 = subi %c-1, %arg5 : index + %570 = select %568, %569, %arg5 : index + %571 = divi_signed %570, %c16 : index + %572 = subi %c-1, %571 : index + %573 = select %568, %572, %571 : index + %574 = remi_signed %573, %c16 : index + %575 = cmpi "slt", %574, %c0 : index + %576 = addi %574, %c16 : index + %577 = select %575, %576, %574 : index + %578 = remi_signed %arg5, %c16 : index + %579 = cmpi "slt", %578, %c0 : index + %580 = addi %578, %c16 : index + %581 = select %579, %580, %578 : index + %582 = cmpi "slt", %581, %c0 : index + %583 = subi %c-1, %581 : index + %584 = select %582, %583, %581 : index + %585 = divi_signed %584, %c8 : index + %586 = subi %c-1, %585 : index + %587 = select %582, %586, %585 : index + %588 = remi_signed %587, %c2 : index + %589 = cmpi "slt", %588, %c0 : index + %590 = addi %588, %c2 : index + %591 = select %589, %590, %588 : index + store %567, %2[%577, %c0, %591] : 
memref<16x6x2xvector<8xf32>> + %592 = cmpi "slt", %arg5, %c0 : index + %593 = subi %c-1, %arg5 : index + %594 = select %592, %593, %arg5 : index + %595 = divi_signed %594, %c16 : index + %596 = subi %c-1, %595 : index + %597 = select %592, %596, %595 : index + %598 = remi_signed %597, %c16 : index + %599 = cmpi "slt", %598, %c0 : index + %600 = addi %598, %c16 : index + %601 = select %599, %600, %598 : index + %602 = remi_signed %arg5, %c16 : index + %603 = cmpi "slt", %602, %c0 : index + %604 = addi %602, %c16 : index + %605 = select %603, %604, %602 : index + %606 = cmpi "slt", %605, %c0 : index + %607 = subi %c-1, %605 : index + %608 = select %606, %607, %605 : index + %609 = divi_signed %608, %c8 : index + %610 = subi %c-1, %609 : index + %611 = select %606, %610, %609 : index + %612 = remi_signed %611, %c2 : index + %613 = cmpi "slt", %612, %c0 : index + %614 = addi %612, %c2 : index + %615 = select %613, %614, %612 : index + %616 = load %2[%601, %c0, %615] : memref<16x6x2xvector<8xf32>> + %617 = vector.insertelement %486, %616[%c2_i64 : i64] : vector<8xf32> + %618 = cmpi "slt", %arg5, %c0 : index + %619 = subi %c-1, %arg5 : index + %620 = select %618, %619, %arg5 : index + %621 = divi_signed %620, %c16 : index + %622 = subi %c-1, %621 : index + %623 = select %618, %622, %621 : index + %624 = remi_signed %623, %c16 : index + %625 = cmpi "slt", %624, %c0 : index + %626 = addi %624, %c16 : index + %627 = select %625, %626, %624 : index + %628 = remi_signed %arg5, %c16 : index + %629 = cmpi "slt", %628, %c0 : index + %630 = addi %628, %c16 : index + %631 = select %629, %630, %628 : index + %632 = cmpi "slt", %631, %c0 : index + %633 = subi %c-1, %631 : index + %634 = select %632, %633, %631 : index + %635 = divi_signed %634, %c8 : index + %636 = subi %c-1, %635 : index + %637 = select %632, %636, %635 : index + %638 = remi_signed %637, %c2 : index + %639 = cmpi "slt", %638, %c0 : index + %640 = addi %638, %c2 : index + %641 = select %639, %640, %638 : index + store %617, %2[%627, %c0, %641] : memref<16x6x2xvector<8xf32>> + %642 = cmpi "slt", %arg5, %c0 : index + %643 = subi %c-1, %arg5 : index + %644 = select %642, %643, %arg5 : index + %645 = divi_signed %644, %c16 : index + %646 = subi %c-1, %645 : index + %647 = select %642, %646, %645 : index + %648 = remi_signed %647, %c16 : index + %649 = cmpi "slt", %648, %c0 : index + %650 = addi %648, %c16 : index + %651 = select %649, %650, %648 : index + %652 = remi_signed %arg5, %c16 : index + %653 = cmpi "slt", %652, %c0 : index + %654 = addi %652, %c16 : index + %655 = select %653, %654, %652 : index + %656 = cmpi "slt", %655, %c0 : index + %657 = subi %c-1, %655 : index + %658 = select %656, %657, %655 : index + %659 = divi_signed %658, %c8 : index + %660 = subi %c-1, %659 : index + %661 = select %656, %660, %659 : index + %662 = remi_signed %661, %c2 : index + %663 = cmpi "slt", %662, %c0 : index + %664 = addi %662, %c2 : index + %665 = select %663, %664, %662 : index + %666 = load %2[%651, %c0, %665] : memref<16x6x2xvector<8xf32>> + %667 = vector.insertelement %487, %666[%c3_i64 : i64] : vector<8xf32> + %668 = cmpi "slt", %arg5, %c0 : index + %669 = subi %c-1, %arg5 : index + %670 = select %668, %669, %arg5 : index + %671 = divi_signed %670, %c16 : index + %672 = subi %c-1, %671 : index + %673 = select %668, %672, %671 : index + %674 = remi_signed %673, %c16 : index + %675 = cmpi "slt", %674, %c0 : index + %676 = addi %674, %c16 : index + %677 = select %675, %676, %674 : index + %678 = remi_signed %arg5, %c16 : index + %679 = cmpi "slt", 
%678, %c0 : index + %680 = addi %678, %c16 : index + %681 = select %679, %680, %678 : index + %682 = cmpi "slt", %681, %c0 : index + %683 = subi %c-1, %681 : index + %684 = select %682, %683, %681 : index + %685 = divi_signed %684, %c8 : index + %686 = subi %c-1, %685 : index + %687 = select %682, %686, %685 : index + %688 = remi_signed %687, %c2 : index + %689 = cmpi "slt", %688, %c0 : index + %690 = addi %688, %c2 : index + %691 = select %689, %690, %688 : index + store %667, %2[%677, %c0, %691] : memref<16x6x2xvector<8xf32>> + %692 = cmpi "slt", %arg5, %c0 : index + %693 = subi %c-1, %arg5 : index + %694 = select %692, %693, %arg5 : index + %695 = divi_signed %694, %c16 : index + %696 = subi %c-1, %695 : index + %697 = select %692, %696, %695 : index + %698 = remi_signed %697, %c16 : index + %699 = cmpi "slt", %698, %c0 : index + %700 = addi %698, %c16 : index + %701 = select %699, %700, %698 : index + %702 = remi_signed %arg5, %c16 : index + %703 = cmpi "slt", %702, %c0 : index + %704 = addi %702, %c16 : index + %705 = select %703, %704, %702 : index + %706 = cmpi "slt", %705, %c0 : index + %707 = subi %c-1, %705 : index + %708 = select %706, %707, %705 : index + %709 = divi_signed %708, %c8 : index + %710 = subi %c-1, %709 : index + %711 = select %706, %710, %709 : index + %712 = remi_signed %711, %c2 : index + %713 = cmpi "slt", %712, %c0 : index + %714 = addi %712, %c2 : index + %715 = select %713, %714, %712 : index + %716 = load %2[%701, %c0, %715] : memref<16x6x2xvector<8xf32>> + %717 = vector.insertelement %488, %716[%c4_i64 : i64] : vector<8xf32> + %718 = cmpi "slt", %arg5, %c0 : index + %719 = subi %c-1, %arg5 : index + %720 = select %718, %719, %arg5 : index + %721 = divi_signed %720, %c16 : index + %722 = subi %c-1, %721 : index + %723 = select %718, %722, %721 : index + %724 = remi_signed %723, %c16 : index + %725 = cmpi "slt", %724, %c0 : index + %726 = addi %724, %c16 : index + %727 = select %725, %726, %724 : index + %728 = remi_signed %arg5, %c16 : index + %729 = cmpi "slt", %728, %c0 : index + %730 = addi %728, %c16 : index + %731 = select %729, %730, %728 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c8 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = remi_signed %737, %c2 : index + %739 = cmpi "slt", %738, %c0 : index + %740 = addi %738, %c2 : index + %741 = select %739, %740, %738 : index + store %717, %2[%727, %c0, %741] : memref<16x6x2xvector<8xf32>> + %742 = cmpi "slt", %arg5, %c0 : index + %743 = subi %c-1, %arg5 : index + %744 = select %742, %743, %arg5 : index + %745 = divi_signed %744, %c16 : index + %746 = subi %c-1, %745 : index + %747 = select %742, %746, %745 : index + %748 = remi_signed %747, %c16 : index + %749 = cmpi "slt", %748, %c0 : index + %750 = addi %748, %c16 : index + %751 = select %749, %750, %748 : index + %752 = remi_signed %arg5, %c16 : index + %753 = cmpi "slt", %752, %c0 : index + %754 = addi %752, %c16 : index + %755 = select %753, %754, %752 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = subi %c-1, %755 : index + %758 = select %756, %757, %755 : index + %759 = divi_signed %758, %c8 : index + %760 = subi %c-1, %759 : index + %761 = select %756, %760, %759 : index + %762 = remi_signed %761, %c2 : index + %763 = cmpi "slt", %762, %c0 : index + %764 = addi %762, %c2 : index + %765 = select %763, %764, %762 : index + %766 = load %2[%751, %c0, %765] : memref<16x6x2xvector<8xf32>> + %767 = 
vector.insertelement %489, %766[%c5_i64 : i64] : vector<8xf32> + %768 = cmpi "slt", %arg5, %c0 : index + %769 = subi %c-1, %arg5 : index + %770 = select %768, %769, %arg5 : index + %771 = divi_signed %770, %c16 : index + %772 = subi %c-1, %771 : index + %773 = select %768, %772, %771 : index + %774 = remi_signed %773, %c16 : index + %775 = cmpi "slt", %774, %c0 : index + %776 = addi %774, %c16 : index + %777 = select %775, %776, %774 : index + %778 = remi_signed %arg5, %c16 : index + %779 = cmpi "slt", %778, %c0 : index + %780 = addi %778, %c16 : index + %781 = select %779, %780, %778 : index + %782 = cmpi "slt", %781, %c0 : index + %783 = subi %c-1, %781 : index + %784 = select %782, %783, %781 : index + %785 = divi_signed %784, %c8 : index + %786 = subi %c-1, %785 : index + %787 = select %782, %786, %785 : index + %788 = remi_signed %787, %c2 : index + %789 = cmpi "slt", %788, %c0 : index + %790 = addi %788, %c2 : index + %791 = select %789, %790, %788 : index + store %767, %2[%777, %c0, %791] : memref<16x6x2xvector<8xf32>> + %792 = cmpi "slt", %arg5, %c0 : index + %793 = subi %c-1, %arg5 : index + %794 = select %792, %793, %arg5 : index + %795 = divi_signed %794, %c16 : index + %796 = subi %c-1, %795 : index + %797 = select %792, %796, %795 : index + %798 = remi_signed %797, %c16 : index + %799 = cmpi "slt", %798, %c0 : index + %800 = addi %798, %c16 : index + %801 = select %799, %800, %798 : index + %802 = remi_signed %arg5, %c16 : index + %803 = cmpi "slt", %802, %c0 : index + %804 = addi %802, %c16 : index + %805 = select %803, %804, %802 : index + %806 = cmpi "slt", %805, %c0 : index + %807 = subi %c-1, %805 : index + %808 = select %806, %807, %805 : index + %809 = divi_signed %808, %c8 : index + %810 = subi %c-1, %809 : index + %811 = select %806, %810, %809 : index + %812 = remi_signed %811, %c2 : index + %813 = cmpi "slt", %812, %c0 : index + %814 = addi %812, %c2 : index + %815 = select %813, %814, %812 : index + %816 = load %2[%801, %c0, %815] : memref<16x6x2xvector<8xf32>> + %817 = vector.insertelement %490, %816[%c6_i64 : i64] : vector<8xf32> + %818 = cmpi "slt", %arg5, %c0 : index + %819 = subi %c-1, %arg5 : index + %820 = select %818, %819, %arg5 : index + %821 = divi_signed %820, %c16 : index + %822 = subi %c-1, %821 : index + %823 = select %818, %822, %821 : index + %824 = remi_signed %823, %c16 : index + %825 = cmpi "slt", %824, %c0 : index + %826 = addi %824, %c16 : index + %827 = select %825, %826, %824 : index + %828 = remi_signed %arg5, %c16 : index + %829 = cmpi "slt", %828, %c0 : index + %830 = addi %828, %c16 : index + %831 = select %829, %830, %828 : index + %832 = cmpi "slt", %831, %c0 : index + %833 = subi %c-1, %831 : index + %834 = select %832, %833, %831 : index + %835 = divi_signed %834, %c8 : index + %836 = subi %c-1, %835 : index + %837 = select %832, %836, %835 : index + %838 = remi_signed %837, %c2 : index + %839 = cmpi "slt", %838, %c0 : index + %840 = addi %838, %c2 : index + %841 = select %839, %840, %838 : index + store %817, %2[%827, %c0, %841] : memref<16x6x2xvector<8xf32>> + %842 = cmpi "slt", %arg5, %c0 : index + %843 = subi %c-1, %arg5 : index + %844 = select %842, %843, %arg5 : index + %845 = divi_signed %844, %c16 : index + %846 = subi %c-1, %845 : index + %847 = select %842, %846, %845 : index + %848 = remi_signed %847, %c16 : index + %849 = cmpi "slt", %848, %c0 : index + %850 = addi %848, %c16 : index + %851 = select %849, %850, %848 : index + %852 = remi_signed %arg5, %c16 : index + %853 = cmpi "slt", %852, %c0 : index + %854 = addi %852, 
%c16 : index + %855 = select %853, %854, %852 : index + %856 = cmpi "slt", %855, %c0 : index + %857 = subi %c-1, %855 : index + %858 = select %856, %857, %855 : index + %859 = divi_signed %858, %c8 : index + %860 = subi %c-1, %859 : index + %861 = select %856, %860, %859 : index + %862 = remi_signed %861, %c2 : index + %863 = cmpi "slt", %862, %c0 : index + %864 = addi %862, %c2 : index + %865 = select %863, %864, %862 : index + %866 = load %2[%851, %c0, %865] : memref<16x6x2xvector<8xf32>> + %867 = vector.insertelement %491, %866[%c7_i64 : i64] : vector<8xf32> + %868 = cmpi "slt", %arg5, %c0 : index + %869 = subi %c-1, %arg5 : index + %870 = select %868, %869, %arg5 : index + %871 = divi_signed %870, %c16 : index + %872 = subi %c-1, %871 : index + %873 = select %868, %872, %871 : index + %874 = remi_signed %873, %c16 : index + %875 = cmpi "slt", %874, %c0 : index + %876 = addi %874, %c16 : index + %877 = select %875, %876, %874 : index + %878 = remi_signed %arg5, %c16 : index + %879 = cmpi "slt", %878, %c0 : index + %880 = addi %878, %c16 : index + %881 = select %879, %880, %878 : index + %882 = cmpi "slt", %881, %c0 : index + %883 = subi %c-1, %881 : index + %884 = select %882, %883, %881 : index + %885 = divi_signed %884, %c8 : index + %886 = subi %c-1, %885 : index + %887 = select %882, %886, %885 : index + %888 = remi_signed %887, %c2 : index + %889 = cmpi "slt", %888, %c0 : index + %890 = addi %888, %c2 : index + %891 = select %889, %890, %888 : index + store %867, %2[%877, %c0, %891] : memref<16x6x2xvector<8xf32>> + %892 = cmpi "slt", %arg5, %c0 : index + %893 = subi %c-1, %arg5 : index + %894 = select %892, %893, %arg5 : index + %895 = divi_signed %894, %c16 : index + %896 = subi %c-1, %895 : index + %897 = select %892, %896, %895 : index + %898 = remi_signed %897, %c16 : index + %899 = cmpi "slt", %898, %c0 : index + %900 = addi %898, %c16 : index + %901 = select %899, %900, %898 : index + %902 = remi_signed %arg5, %c16 : index + %903 = cmpi "slt", %902, %c0 : index + %904 = addi %902, %c16 : index + %905 = select %903, %904, %902 : index + %906 = cmpi "slt", %905, %c0 : index + %907 = subi %c-1, %905 : index + %908 = select %906, %907, %905 : index + %909 = divi_signed %908, %c8 : index + %910 = subi %c-1, %909 : index + %911 = select %906, %910, %909 : index + %912 = remi_signed %911, %c2 : index + %913 = cmpi "slt", %912, %c0 : index + %914 = addi %912, %c2 : index + %915 = select %913, %914, %912 : index + %916 = load %2[%901, %c0, %915] : memref<16x6x2xvector<8xf32>> + %917 = vector.insertelement %484, %916[%c0_i64 : i64] : vector<8xf32> + %918 = cmpi "slt", %arg5, %c0 : index + %919 = subi %c-1, %arg5 : index + %920 = select %918, %919, %arg5 : index + %921 = divi_signed %920, %c16 : index + %922 = subi %c-1, %921 : index + %923 = select %918, %922, %921 : index + %924 = remi_signed %923, %c16 : index + %925 = cmpi "slt", %924, %c0 : index + %926 = addi %924, %c16 : index + %927 = select %925, %926, %924 : index + %928 = remi_signed %arg5, %c16 : index + %929 = cmpi "slt", %928, %c0 : index + %930 = addi %928, %c16 : index + %931 = select %929, %930, %928 : index + %932 = cmpi "slt", %931, %c0 : index + %933 = subi %c-1, %931 : index + %934 = select %932, %933, %931 : index + %935 = divi_signed %934, %c8 : index + %936 = subi %c-1, %935 : index + %937 = select %932, %936, %935 : index + %938 = remi_signed %937, %c2 : index + %939 = cmpi "slt", %938, %c0 : index + %940 = addi %938, %c2 : index + %941 = select %939, %940, %938 : index + store %917, %2[%927, %c0, %941] : 
memref<16x6x2xvector<8xf32>> + %942 = cmpi "slt", %arg5, %c0 : index + %943 = subi %c-1, %arg5 : index + %944 = select %942, %943, %arg5 : index + %945 = divi_signed %944, %c16 : index + %946 = subi %c-1, %945 : index + %947 = select %942, %946, %945 : index + %948 = remi_signed %947, %c16 : index + %949 = cmpi "slt", %948, %c0 : index + %950 = addi %948, %c16 : index + %951 = select %949, %950, %948 : index + %952 = remi_signed %arg5, %c16 : index + %953 = cmpi "slt", %952, %c0 : index + %954 = addi %952, %c16 : index + %955 = select %953, %954, %952 : index + %956 = cmpi "slt", %955, %c0 : index + %957 = subi %c-1, %955 : index + %958 = select %956, %957, %955 : index + %959 = divi_signed %958, %c8 : index + %960 = subi %c-1, %959 : index + %961 = select %956, %960, %959 : index + %962 = remi_signed %961, %c2 : index + %963 = cmpi "slt", %962, %c0 : index + %964 = addi %962, %c2 : index + %965 = select %963, %964, %962 : index + %966 = load %2[%951, %c0, %965] : memref<16x6x2xvector<8xf32>> + %967 = vector.insertelement %485, %966[%c1_i64 : i64] : vector<8xf32> + %968 = cmpi "slt", %arg5, %c0 : index + %969 = subi %c-1, %arg5 : index + %970 = select %968, %969, %arg5 : index + %971 = divi_signed %970, %c16 : index + %972 = subi %c-1, %971 : index + %973 = select %968, %972, %971 : index + %974 = remi_signed %973, %c16 : index + %975 = cmpi "slt", %974, %c0 : index + %976 = addi %974, %c16 : index + %977 = select %975, %976, %974 : index + %978 = remi_signed %arg5, %c16 : index + %979 = cmpi "slt", %978, %c0 : index + %980 = addi %978, %c16 : index + %981 = select %979, %980, %978 : index + %982 = cmpi "slt", %981, %c0 : index + %983 = subi %c-1, %981 : index + %984 = select %982, %983, %981 : index + %985 = divi_signed %984, %c8 : index + %986 = subi %c-1, %985 : index + %987 = select %982, %986, %985 : index + %988 = remi_signed %987, %c2 : index + %989 = cmpi "slt", %988, %c0 : index + %990 = addi %988, %c2 : index + %991 = select %989, %990, %988 : index + store %967, %2[%977, %c0, %991] : memref<16x6x2xvector<8xf32>> + %992 = cmpi "slt", %arg5, %c0 : index + %993 = subi %c-1, %arg5 : index + %994 = select %992, %993, %arg5 : index + %995 = divi_signed %994, %c16 : index + %996 = subi %c-1, %995 : index + %997 = select %992, %996, %995 : index + %998 = remi_signed %997, %c16 : index + %999 = cmpi "slt", %998, %c0 : index + %1000 = addi %998, %c16 : index + %1001 = select %999, %1000, %998 : index + %1002 = remi_signed %arg5, %c16 : index + %1003 = cmpi "slt", %1002, %c0 : index + %1004 = addi %1002, %c16 : index + %1005 = select %1003, %1004, %1002 : index + %1006 = cmpi "slt", %1005, %c0 : index + %1007 = subi %c-1, %1005 : index + %1008 = select %1006, %1007, %1005 : index + %1009 = divi_signed %1008, %c8 : index + %1010 = subi %c-1, %1009 : index + %1011 = select %1006, %1010, %1009 : index + %1012 = remi_signed %1011, %c2 : index + %1013 = cmpi "slt", %1012, %c0 : index + %1014 = addi %1012, %c2 : index + %1015 = select %1013, %1014, %1012 : index + %1016 = load %2[%1001, %c0, %1015] : memref<16x6x2xvector<8xf32>> + %1017 = vector.insertelement %486, %1016[%c2_i64 : i64] : vector<8xf32> + %1018 = cmpi "slt", %arg5, %c0 : index + %1019 = subi %c-1, %arg5 : index + %1020 = select %1018, %1019, %arg5 : index + %1021 = divi_signed %1020, %c16 : index + %1022 = subi %c-1, %1021 : index + %1023 = select %1018, %1022, %1021 : index + %1024 = remi_signed %1023, %c16 : index + %1025 = cmpi "slt", %1024, %c0 : index + %1026 = addi %1024, %c16 : index + %1027 = select %1025, %1026, %1024 : 
index + %1028 = remi_signed %arg5, %c16 : index + %1029 = cmpi "slt", %1028, %c0 : index + %1030 = addi %1028, %c16 : index + %1031 = select %1029, %1030, %1028 : index + %1032 = cmpi "slt", %1031, %c0 : index + %1033 = subi %c-1, %1031 : index + %1034 = select %1032, %1033, %1031 : index + %1035 = divi_signed %1034, %c8 : index + %1036 = subi %c-1, %1035 : index + %1037 = select %1032, %1036, %1035 : index + %1038 = remi_signed %1037, %c2 : index + %1039 = cmpi "slt", %1038, %c0 : index + %1040 = addi %1038, %c2 : index + %1041 = select %1039, %1040, %1038 : index + store %1017, %2[%1027, %c0, %1041] : memref<16x6x2xvector<8xf32>> + %1042 = cmpi "slt", %arg5, %c0 : index + %1043 = subi %c-1, %arg5 : index + %1044 = select %1042, %1043, %arg5 : index + %1045 = divi_signed %1044, %c16 : index + %1046 = subi %c-1, %1045 : index + %1047 = select %1042, %1046, %1045 : index + %1048 = remi_signed %1047, %c16 : index + %1049 = cmpi "slt", %1048, %c0 : index + %1050 = addi %1048, %c16 : index + %1051 = select %1049, %1050, %1048 : index + %1052 = remi_signed %arg5, %c16 : index + %1053 = cmpi "slt", %1052, %c0 : index + %1054 = addi %1052, %c16 : index + %1055 = select %1053, %1054, %1052 : index + %1056 = cmpi "slt", %1055, %c0 : index + %1057 = subi %c-1, %1055 : index + %1058 = select %1056, %1057, %1055 : index + %1059 = divi_signed %1058, %c8 : index + %1060 = subi %c-1, %1059 : index + %1061 = select %1056, %1060, %1059 : index + %1062 = remi_signed %1061, %c2 : index + %1063 = cmpi "slt", %1062, %c0 : index + %1064 = addi %1062, %c2 : index + %1065 = select %1063, %1064, %1062 : index + %1066 = load %2[%1051, %c0, %1065] : memref<16x6x2xvector<8xf32>> + %1067 = vector.insertelement %487, %1066[%c3_i64 : i64] : vector<8xf32> + %1068 = cmpi "slt", %arg5, %c0 : index + %1069 = subi %c-1, %arg5 : index + %1070 = select %1068, %1069, %arg5 : index + %1071 = divi_signed %1070, %c16 : index + %1072 = subi %c-1, %1071 : index + %1073 = select %1068, %1072, %1071 : index + %1074 = remi_signed %1073, %c16 : index + %1075 = cmpi "slt", %1074, %c0 : index + %1076 = addi %1074, %c16 : index + %1077 = select %1075, %1076, %1074 : index + %1078 = remi_signed %arg5, %c16 : index + %1079 = cmpi "slt", %1078, %c0 : index + %1080 = addi %1078, %c16 : index + %1081 = select %1079, %1080, %1078 : index + %1082 = cmpi "slt", %1081, %c0 : index + %1083 = subi %c-1, %1081 : index + %1084 = select %1082, %1083, %1081 : index + %1085 = divi_signed %1084, %c8 : index + %1086 = subi %c-1, %1085 : index + %1087 = select %1082, %1086, %1085 : index + %1088 = remi_signed %1087, %c2 : index + %1089 = cmpi "slt", %1088, %c0 : index + %1090 = addi %1088, %c2 : index + %1091 = select %1089, %1090, %1088 : index + store %1067, %2[%1077, %c0, %1091] : memref<16x6x2xvector<8xf32>> + %1092 = cmpi "slt", %arg5, %c0 : index + %1093 = subi %c-1, %arg5 : index + %1094 = select %1092, %1093, %arg5 : index + %1095 = divi_signed %1094, %c16 : index + %1096 = subi %c-1, %1095 : index + %1097 = select %1092, %1096, %1095 : index + %1098 = remi_signed %1097, %c16 : index + %1099 = cmpi "slt", %1098, %c0 : index + %1100 = addi %1098, %c16 : index + %1101 = select %1099, %1100, %1098 : index + %1102 = remi_signed %arg5, %c16 : index + %1103 = cmpi "slt", %1102, %c0 : index + %1104 = addi %1102, %c16 : index + %1105 = select %1103, %1104, %1102 : index + %1106 = cmpi "slt", %1105, %c0 : index + %1107 = subi %c-1, %1105 : index + %1108 = select %1106, %1107, %1105 : index + %1109 = divi_signed %1108, %c8 : index + %1110 = subi %c-1, %1109 : 
index + %1111 = select %1106, %1110, %1109 : index + %1112 = remi_signed %1111, %c2 : index + %1113 = cmpi "slt", %1112, %c0 : index + %1114 = addi %1112, %c2 : index + %1115 = select %1113, %1114, %1112 : index + %1116 = load %2[%1101, %c0, %1115] : memref<16x6x2xvector<8xf32>> + %1117 = vector.insertelement %488, %1116[%c4_i64 : i64] : vector<8xf32> + %1118 = cmpi "slt", %arg5, %c0 : index + %1119 = subi %c-1, %arg5 : index + %1120 = select %1118, %1119, %arg5 : index + %1121 = divi_signed %1120, %c16 : index + %1122 = subi %c-1, %1121 : index + %1123 = select %1118, %1122, %1121 : index + %1124 = remi_signed %1123, %c16 : index + %1125 = cmpi "slt", %1124, %c0 : index + %1126 = addi %1124, %c16 : index + %1127 = select %1125, %1126, %1124 : index + %1128 = remi_signed %arg5, %c16 : index + %1129 = cmpi "slt", %1128, %c0 : index + %1130 = addi %1128, %c16 : index + %1131 = select %1129, %1130, %1128 : index + %1132 = cmpi "slt", %1131, %c0 : index + %1133 = subi %c-1, %1131 : index + %1134 = select %1132, %1133, %1131 : index + %1135 = divi_signed %1134, %c8 : index + %1136 = subi %c-1, %1135 : index + %1137 = select %1132, %1136, %1135 : index + %1138 = remi_signed %1137, %c2 : index + %1139 = cmpi "slt", %1138, %c0 : index + %1140 = addi %1138, %c2 : index + %1141 = select %1139, %1140, %1138 : index + store %1117, %2[%1127, %c0, %1141] : memref<16x6x2xvector<8xf32>> + %1142 = cmpi "slt", %arg5, %c0 : index + %1143 = subi %c-1, %arg5 : index + %1144 = select %1142, %1143, %arg5 : index + %1145 = divi_signed %1144, %c16 : index + %1146 = subi %c-1, %1145 : index + %1147 = select %1142, %1146, %1145 : index + %1148 = remi_signed %1147, %c16 : index + %1149 = cmpi "slt", %1148, %c0 : index + %1150 = addi %1148, %c16 : index + %1151 = select %1149, %1150, %1148 : index + %1152 = remi_signed %arg5, %c16 : index + %1153 = cmpi "slt", %1152, %c0 : index + %1154 = addi %1152, %c16 : index + %1155 = select %1153, %1154, %1152 : index + %1156 = cmpi "slt", %1155, %c0 : index + %1157 = subi %c-1, %1155 : index + %1158 = select %1156, %1157, %1155 : index + %1159 = divi_signed %1158, %c8 : index + %1160 = subi %c-1, %1159 : index + %1161 = select %1156, %1160, %1159 : index + %1162 = remi_signed %1161, %c2 : index + %1163 = cmpi "slt", %1162, %c0 : index + %1164 = addi %1162, %c2 : index + %1165 = select %1163, %1164, %1162 : index + %1166 = load %2[%1151, %c0, %1165] : memref<16x6x2xvector<8xf32>> + %1167 = vector.insertelement %489, %1166[%c5_i64 : i64] : vector<8xf32> + %1168 = cmpi "slt", %arg5, %c0 : index + %1169 = subi %c-1, %arg5 : index + %1170 = select %1168, %1169, %arg5 : index + %1171 = divi_signed %1170, %c16 : index + %1172 = subi %c-1, %1171 : index + %1173 = select %1168, %1172, %1171 : index + %1174 = remi_signed %1173, %c16 : index + %1175 = cmpi "slt", %1174, %c0 : index + %1176 = addi %1174, %c16 : index + %1177 = select %1175, %1176, %1174 : index + %1178 = remi_signed %arg5, %c16 : index + %1179 = cmpi "slt", %1178, %c0 : index + %1180 = addi %1178, %c16 : index + %1181 = select %1179, %1180, %1178 : index + %1182 = cmpi "slt", %1181, %c0 : index + %1183 = subi %c-1, %1181 : index + %1184 = select %1182, %1183, %1181 : index + %1185 = divi_signed %1184, %c8 : index + %1186 = subi %c-1, %1185 : index + %1187 = select %1182, %1186, %1185 : index + %1188 = remi_signed %1187, %c2 : index + %1189 = cmpi "slt", %1188, %c0 : index + %1190 = addi %1188, %c2 : index + %1191 = select %1189, %1190, %1188 : index + store %1167, %2[%1177, %c0, %1191] : memref<16x6x2xvector<8xf32>> + 
%1192 = cmpi "slt", %arg5, %c0 : index + %1193 = subi %c-1, %arg5 : index + %1194 = select %1192, %1193, %arg5 : index + %1195 = divi_signed %1194, %c16 : index + %1196 = subi %c-1, %1195 : index + %1197 = select %1192, %1196, %1195 : index + %1198 = remi_signed %1197, %c16 : index + %1199 = cmpi "slt", %1198, %c0 : index + %1200 = addi %1198, %c16 : index + %1201 = select %1199, %1200, %1198 : index + %1202 = remi_signed %arg5, %c16 : index + %1203 = cmpi "slt", %1202, %c0 : index + %1204 = addi %1202, %c16 : index + %1205 = select %1203, %1204, %1202 : index + %1206 = cmpi "slt", %1205, %c0 : index + %1207 = subi %c-1, %1205 : index + %1208 = select %1206, %1207, %1205 : index + %1209 = divi_signed %1208, %c8 : index + %1210 = subi %c-1, %1209 : index + %1211 = select %1206, %1210, %1209 : index + %1212 = remi_signed %1211, %c2 : index + %1213 = cmpi "slt", %1212, %c0 : index + %1214 = addi %1212, %c2 : index + %1215 = select %1213, %1214, %1212 : index + %1216 = load %2[%1201, %c0, %1215] : memref<16x6x2xvector<8xf32>> + %1217 = vector.insertelement %490, %1216[%c6_i64 : i64] : vector<8xf32> + %1218 = cmpi "slt", %arg5, %c0 : index + %1219 = subi %c-1, %arg5 : index + %1220 = select %1218, %1219, %arg5 : index + %1221 = divi_signed %1220, %c16 : index + %1222 = subi %c-1, %1221 : index + %1223 = select %1218, %1222, %1221 : index + %1224 = remi_signed %1223, %c16 : index + %1225 = cmpi "slt", %1224, %c0 : index + %1226 = addi %1224, %c16 : index + %1227 = select %1225, %1226, %1224 : index + %1228 = remi_signed %arg5, %c16 : index + %1229 = cmpi "slt", %1228, %c0 : index + %1230 = addi %1228, %c16 : index + %1231 = select %1229, %1230, %1228 : index + %1232 = cmpi "slt", %1231, %c0 : index + %1233 = subi %c-1, %1231 : index + %1234 = select %1232, %1233, %1231 : index + %1235 = divi_signed %1234, %c8 : index + %1236 = subi %c-1, %1235 : index + %1237 = select %1232, %1236, %1235 : index + %1238 = remi_signed %1237, %c2 : index + %1239 = cmpi "slt", %1238, %c0 : index + %1240 = addi %1238, %c2 : index + %1241 = select %1239, %1240, %1238 : index + store %1217, %2[%1227, %c0, %1241] : memref<16x6x2xvector<8xf32>> + %1242 = cmpi "slt", %arg5, %c0 : index + %1243 = subi %c-1, %arg5 : index + %1244 = select %1242, %1243, %arg5 : index + %1245 = divi_signed %1244, %c16 : index + %1246 = subi %c-1, %1245 : index + %1247 = select %1242, %1246, %1245 : index + %1248 = remi_signed %1247, %c16 : index + %1249 = cmpi "slt", %1248, %c0 : index + %1250 = addi %1248, %c16 : index + %1251 = select %1249, %1250, %1248 : index + %1252 = remi_signed %arg5, %c16 : index + %1253 = cmpi "slt", %1252, %c0 : index + %1254 = addi %1252, %c16 : index + %1255 = select %1253, %1254, %1252 : index + %1256 = cmpi "slt", %1255, %c0 : index + %1257 = subi %c-1, %1255 : index + %1258 = select %1256, %1257, %1255 : index + %1259 = divi_signed %1258, %c8 : index + %1260 = subi %c-1, %1259 : index + %1261 = select %1256, %1260, %1259 : index + %1262 = remi_signed %1261, %c2 : index + %1263 = cmpi "slt", %1262, %c0 : index + %1264 = addi %1262, %c2 : index + %1265 = select %1263, %1264, %1262 : index + %1266 = load %2[%1251, %c0, %1265] : memref<16x6x2xvector<8xf32>> + %1267 = vector.insertelement %491, %1266[%c7_i64 : i64] : vector<8xf32> + %1268 = cmpi "slt", %arg5, %c0 : index + %1269 = subi %c-1, %arg5 : index + %1270 = select %1268, %1269, %arg5 : index + %1271 = divi_signed %1270, %c16 : index + %1272 = subi %c-1, %1271 : index + %1273 = select %1268, %1272, %1271 : index + %1274 = remi_signed %1273, %c16 : index + 
%1275 = cmpi "slt", %1274, %c0 : index + %1276 = addi %1274, %c16 : index + %1277 = select %1275, %1276, %1274 : index + %1278 = remi_signed %arg5, %c16 : index + %1279 = cmpi "slt", %1278, %c0 : index + %1280 = addi %1278, %c16 : index + %1281 = select %1279, %1280, %1278 : index + %1282 = cmpi "slt", %1281, %c0 : index + %1283 = subi %c-1, %1281 : index + %1284 = select %1282, %1283, %1281 : index + %1285 = divi_signed %1284, %c8 : index + %1286 = subi %c-1, %1285 : index + %1287 = select %1282, %1286, %1285 : index + %1288 = remi_signed %1287, %c2 : index + %1289 = cmpi "slt", %1288, %c0 : index + %1290 = addi %1288, %c2 : index + %1291 = select %1289, %1290, %1288 : index + store %1267, %2[%1277, %c0, %1291] : memref<16x6x2xvector<8xf32>> + %1292 = addi %arg6, %arg7 : index + %1293 = addi %arg6, %arg7 : index + %1294 = addi %arg6, %arg7 : index + %1295 = addi %arg6, %arg7 : index + %1296 = addi %arg6, %arg7 : index + %1297 = addi %arg6, %arg7 : index + %1298 = addi %arg6, %arg7 : index + %1299 = addi %arg6, %arg7 : index + %1300 = load %arg0[%arg4, %1292] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1301 = load %arg0[%arg4, %1293] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1302 = load %arg0[%arg4, %1294] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1303 = load %arg0[%arg4, %1295] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1304 = load %arg0[%arg4, %1296] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1305 = load %arg0[%arg4, %1297] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1306 = load %arg0[%arg4, %1298] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1307 = load %arg0[%arg4, %1299] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1308 = addi %arg5, %c8 : index + %1309 = cmpi "slt", %1308, %c0 : index + %1310 = subi %c-1, %1308 : index + %1311 = select %1309, %1310, %1308 : index + %1312 = divi_signed %1311, %c16 : index + %1313 = subi %c-1, %1312 : index + %1314 = select %1309, %1313, %1312 : index + %1315 = remi_signed %1314, %c16 : index + %1316 = cmpi "slt", %1315, %c0 : index + %1317 = addi %1315, %c16 : index + %1318 = select %1316, %1317, %1315 : index + %1319 = addi %arg6, %arg7 : index + %1320 = remi_signed %1319, %c128 : index + %1321 = cmpi "slt", %1320, %c0 : index + %1322 = addi %1320, %c128 : index + %1323 = select %1321, %1322, %1320 : index + %1324 = cmpi "slt", %arg5, %c0 : index + %1325 = subi %c-1, %arg5 : index + %1326 = select %1324, %1325, %arg5 : index + %1327 = divi_signed %1326, %c8 : index + %1328 = subi %c-1, %1327 : index + %1329 = select %1324, %1328, %1327 : index + %1330 = addi %arg5, %c8 : index + %1331 = cmpi "slt", %1330, %c0 : index + %1332 = subi %c-1, %1330 : index + %1333 = select %1331, %1332, %1330 : index + %1334 = divi_signed %1333, %c16 : index + %1335 = subi %c-1, %1334 : index + %1336 = select %1331, %1335, %1334 : index + %1337 = muli %1336, %c-2 : index + %1338 = addi %1329, %1337 : index + %1339 = cmpi "slt", %arg5, %c0 : index + %1340 = subi %c-1, %arg5 : index + %1341 = select %1339, %1340, %arg5 : index + %1342 = divi_signed %1341, %c8 : index + %1343 = subi %c-1, %1342 : index + %1344 = select %1339, %1343, %1342 : index + %1345 = addi %arg5, %c8 : index + %1346 = cmpi "slt", %1345, %c0 : index + %1347 = subi %c-1, %1345 : index + %1348 = select %1346, %1347, %1345 : index + %1349 = divi_signed %1348, %c16 : index + %1350 = subi %c-1, %1349 : index + %1351 = select %1346, 
%1350, %1349 : index + %1352 = muli %1351, %c-2 : index + %1353 = addi %1344, %1352 : index + %1354 = addi %1353, %c1 : index + %1355 = cmpi "slt", %1354, %c0 : index + %1356 = subi %c-1, %1354 : index + %1357 = select %1355, %1356, %1354 : index + %1358 = divi_signed %1357, %c2 : index + %1359 = subi %c-1, %1358 : index + %1360 = select %1355, %1359, %1358 : index + %1361 = muli %1360, %c-2 : index + %1362 = addi %1338, %1361 : index + %1363 = addi %1362, %c1 : index + %1364 = load %3[%1318, %1323, %1363] : memref<16x128x2xvector<8xf32>> + %1365 = vector.extractelement %1364[%c0_i64 : i64] : vector<8xf32> + %1366 = addi %arg5, %c8 : index + %1367 = cmpi "slt", %1366, %c0 : index + %1368 = subi %c-1, %1366 : index + %1369 = select %1367, %1368, %1366 : index + %1370 = divi_signed %1369, %c16 : index + %1371 = subi %c-1, %1370 : index + %1372 = select %1367, %1371, %1370 : index + %1373 = remi_signed %1372, %c16 : index + %1374 = cmpi "slt", %1373, %c0 : index + %1375 = addi %1373, %c16 : index + %1376 = select %1374, %1375, %1373 : index + %1377 = addi %arg6, %arg7 : index + %1378 = remi_signed %1377, %c128 : index + %1379 = cmpi "slt", %1378, %c0 : index + %1380 = addi %1378, %c128 : index + %1381 = select %1379, %1380, %1378 : index + %1382 = cmpi "slt", %arg5, %c0 : index + %1383 = subi %c-1, %arg5 : index + %1384 = select %1382, %1383, %arg5 : index + %1385 = divi_signed %1384, %c8 : index + %1386 = subi %c-1, %1385 : index + %1387 = select %1382, %1386, %1385 : index + %1388 = addi %arg5, %c8 : index + %1389 = cmpi "slt", %1388, %c0 : index + %1390 = subi %c-1, %1388 : index + %1391 = select %1389, %1390, %1388 : index + %1392 = divi_signed %1391, %c16 : index + %1393 = subi %c-1, %1392 : index + %1394 = select %1389, %1393, %1392 : index + %1395 = muli %1394, %c-2 : index + %1396 = addi %1387, %1395 : index + %1397 = cmpi "slt", %arg5, %c0 : index + %1398 = subi %c-1, %arg5 : index + %1399 = select %1397, %1398, %arg5 : index + %1400 = divi_signed %1399, %c8 : index + %1401 = subi %c-1, %1400 : index + %1402 = select %1397, %1401, %1400 : index + %1403 = addi %arg5, %c8 : index + %1404 = cmpi "slt", %1403, %c0 : index + %1405 = subi %c-1, %1403 : index + %1406 = select %1404, %1405, %1403 : index + %1407 = divi_signed %1406, %c16 : index + %1408 = subi %c-1, %1407 : index + %1409 = select %1404, %1408, %1407 : index + %1410 = muli %1409, %c-2 : index + %1411 = addi %1402, %1410 : index + %1412 = addi %1411, %c1 : index + %1413 = cmpi "slt", %1412, %c0 : index + %1414 = subi %c-1, %1412 : index + %1415 = select %1413, %1414, %1412 : index + %1416 = divi_signed %1415, %c2 : index + %1417 = subi %c-1, %1416 : index + %1418 = select %1413, %1417, %1416 : index + %1419 = muli %1418, %c-2 : index + %1420 = addi %1396, %1419 : index + %1421 = addi %1420, %c1 : index + %1422 = load %3[%1376, %1381, %1421] : memref<16x128x2xvector<8xf32>> + %1423 = vector.extractelement %1422[%c1_i64 : i64] : vector<8xf32> + %1424 = addi %arg5, %c8 : index + %1425 = cmpi "slt", %1424, %c0 : index + %1426 = subi %c-1, %1424 : index + %1427 = select %1425, %1426, %1424 : index + %1428 = divi_signed %1427, %c16 : index + %1429 = subi %c-1, %1428 : index + %1430 = select %1425, %1429, %1428 : index + %1431 = remi_signed %1430, %c16 : index + %1432 = cmpi "slt", %1431, %c0 : index + %1433 = addi %1431, %c16 : index + %1434 = select %1432, %1433, %1431 : index + %1435 = addi %arg6, %arg7 : index + %1436 = remi_signed %1435, %c128 : index + %1437 = cmpi "slt", %1436, %c0 : index + %1438 = addi %1436, %c128 : index + 
%1439 = select %1437, %1438, %1436 : index + %1440 = cmpi "slt", %arg5, %c0 : index + %1441 = subi %c-1, %arg5 : index + %1442 = select %1440, %1441, %arg5 : index + %1443 = divi_signed %1442, %c8 : index + %1444 = subi %c-1, %1443 : index + %1445 = select %1440, %1444, %1443 : index + %1446 = addi %arg5, %c8 : index + %1447 = cmpi "slt", %1446, %c0 : index + %1448 = subi %c-1, %1446 : index + %1449 = select %1447, %1448, %1446 : index + %1450 = divi_signed %1449, %c16 : index + %1451 = subi %c-1, %1450 : index + %1452 = select %1447, %1451, %1450 : index + %1453 = muli %1452, %c-2 : index + %1454 = addi %1445, %1453 : index + %1455 = cmpi "slt", %arg5, %c0 : index + %1456 = subi %c-1, %arg5 : index + %1457 = select %1455, %1456, %arg5 : index + %1458 = divi_signed %1457, %c8 : index + %1459 = subi %c-1, %1458 : index + %1460 = select %1455, %1459, %1458 : index + %1461 = addi %arg5, %c8 : index + %1462 = cmpi "slt", %1461, %c0 : index + %1463 = subi %c-1, %1461 : index + %1464 = select %1462, %1463, %1461 : index + %1465 = divi_signed %1464, %c16 : index + %1466 = subi %c-1, %1465 : index + %1467 = select %1462, %1466, %1465 : index + %1468 = muli %1467, %c-2 : index + %1469 = addi %1460, %1468 : index + %1470 = addi %1469, %c1 : index + %1471 = cmpi "slt", %1470, %c0 : index + %1472 = subi %c-1, %1470 : index + %1473 = select %1471, %1472, %1470 : index + %1474 = divi_signed %1473, %c2 : index + %1475 = subi %c-1, %1474 : index + %1476 = select %1471, %1475, %1474 : index + %1477 = muli %1476, %c-2 : index + %1478 = addi %1454, %1477 : index + %1479 = addi %1478, %c1 : index + %1480 = load %3[%1434, %1439, %1479] : memref<16x128x2xvector<8xf32>> + %1481 = vector.extractelement %1480[%c2_i64 : i64] : vector<8xf32> + %1482 = addi %arg5, %c8 : index + %1483 = cmpi "slt", %1482, %c0 : index + %1484 = subi %c-1, %1482 : index + %1485 = select %1483, %1484, %1482 : index + %1486 = divi_signed %1485, %c16 : index + %1487 = subi %c-1, %1486 : index + %1488 = select %1483, %1487, %1486 : index + %1489 = remi_signed %1488, %c16 : index + %1490 = cmpi "slt", %1489, %c0 : index + %1491 = addi %1489, %c16 : index + %1492 = select %1490, %1491, %1489 : index + %1493 = addi %arg6, %arg7 : index + %1494 = remi_signed %1493, %c128 : index + %1495 = cmpi "slt", %1494, %c0 : index + %1496 = addi %1494, %c128 : index + %1497 = select %1495, %1496, %1494 : index + %1498 = cmpi "slt", %arg5, %c0 : index + %1499 = subi %c-1, %arg5 : index + %1500 = select %1498, %1499, %arg5 : index + %1501 = divi_signed %1500, %c8 : index + %1502 = subi %c-1, %1501 : index + %1503 = select %1498, %1502, %1501 : index + %1504 = addi %arg5, %c8 : index + %1505 = cmpi "slt", %1504, %c0 : index + %1506 = subi %c-1, %1504 : index + %1507 = select %1505, %1506, %1504 : index + %1508 = divi_signed %1507, %c16 : index + %1509 = subi %c-1, %1508 : index + %1510 = select %1505, %1509, %1508 : index + %1511 = muli %1510, %c-2 : index + %1512 = addi %1503, %1511 : index + %1513 = cmpi "slt", %arg5, %c0 : index + %1514 = subi %c-1, %arg5 : index + %1515 = select %1513, %1514, %arg5 : index + %1516 = divi_signed %1515, %c8 : index + %1517 = subi %c-1, %1516 : index + %1518 = select %1513, %1517, %1516 : index + %1519 = addi %arg5, %c8 : index + %1520 = cmpi "slt", %1519, %c0 : index + %1521 = subi %c-1, %1519 : index + %1522 = select %1520, %1521, %1519 : index + %1523 = divi_signed %1522, %c16 : index + %1524 = subi %c-1, %1523 : index + %1525 = select %1520, %1524, %1523 : index + %1526 = muli %1525, %c-2 : index + %1527 = addi %1518, 
%1526 : index + %1528 = addi %1527, %c1 : index + %1529 = cmpi "slt", %1528, %c0 : index + %1530 = subi %c-1, %1528 : index + %1531 = select %1529, %1530, %1528 : index + %1532 = divi_signed %1531, %c2 : index + %1533 = subi %c-1, %1532 : index + %1534 = select %1529, %1533, %1532 : index + %1535 = muli %1534, %c-2 : index + %1536 = addi %1512, %1535 : index + %1537 = addi %1536, %c1 : index + %1538 = load %3[%1492, %1497, %1537] : memref<16x128x2xvector<8xf32>> + %1539 = vector.extractelement %1538[%c3_i64 : i64] : vector<8xf32> + %1540 = addi %arg5, %c8 : index + %1541 = cmpi "slt", %1540, %c0 : index + %1542 = subi %c-1, %1540 : index + %1543 = select %1541, %1542, %1540 : index + %1544 = divi_signed %1543, %c16 : index + %1545 = subi %c-1, %1544 : index + %1546 = select %1541, %1545, %1544 : index + %1547 = remi_signed %1546, %c16 : index + %1548 = cmpi "slt", %1547, %c0 : index + %1549 = addi %1547, %c16 : index + %1550 = select %1548, %1549, %1547 : index + %1551 = addi %arg6, %arg7 : index + %1552 = remi_signed %1551, %c128 : index + %1553 = cmpi "slt", %1552, %c0 : index + %1554 = addi %1552, %c128 : index + %1555 = select %1553, %1554, %1552 : index + %1556 = cmpi "slt", %arg5, %c0 : index + %1557 = subi %c-1, %arg5 : index + %1558 = select %1556, %1557, %arg5 : index + %1559 = divi_signed %1558, %c8 : index + %1560 = subi %c-1, %1559 : index + %1561 = select %1556, %1560, %1559 : index + %1562 = addi %arg5, %c8 : index + %1563 = cmpi "slt", %1562, %c0 : index + %1564 = subi %c-1, %1562 : index + %1565 = select %1563, %1564, %1562 : index + %1566 = divi_signed %1565, %c16 : index + %1567 = subi %c-1, %1566 : index + %1568 = select %1563, %1567, %1566 : index + %1569 = muli %1568, %c-2 : index + %1570 = addi %1561, %1569 : index + %1571 = cmpi "slt", %arg5, %c0 : index + %1572 = subi %c-1, %arg5 : index + %1573 = select %1571, %1572, %arg5 : index + %1574 = divi_signed %1573, %c8 : index + %1575 = subi %c-1, %1574 : index + %1576 = select %1571, %1575, %1574 : index + %1577 = addi %arg5, %c8 : index + %1578 = cmpi "slt", %1577, %c0 : index + %1579 = subi %c-1, %1577 : index + %1580 = select %1578, %1579, %1577 : index + %1581 = divi_signed %1580, %c16 : index + %1582 = subi %c-1, %1581 : index + %1583 = select %1578, %1582, %1581 : index + %1584 = muli %1583, %c-2 : index + %1585 = addi %1576, %1584 : index + %1586 = addi %1585, %c1 : index + %1587 = cmpi "slt", %1586, %c0 : index + %1588 = subi %c-1, %1586 : index + %1589 = select %1587, %1588, %1586 : index + %1590 = divi_signed %1589, %c2 : index + %1591 = subi %c-1, %1590 : index + %1592 = select %1587, %1591, %1590 : index + %1593 = muli %1592, %c-2 : index + %1594 = addi %1570, %1593 : index + %1595 = addi %1594, %c1 : index + %1596 = load %3[%1550, %1555, %1595] : memref<16x128x2xvector<8xf32>> + %1597 = vector.extractelement %1596[%c4_i64 : i64] : vector<8xf32> + %1598 = addi %arg5, %c8 : index + %1599 = cmpi "slt", %1598, %c0 : index + %1600 = subi %c-1, %1598 : index + %1601 = select %1599, %1600, %1598 : index + %1602 = divi_signed %1601, %c16 : index + %1603 = subi %c-1, %1602 : index + %1604 = select %1599, %1603, %1602 : index + %1605 = remi_signed %1604, %c16 : index + %1606 = cmpi "slt", %1605, %c0 : index + %1607 = addi %1605, %c16 : index + %1608 = select %1606, %1607, %1605 : index + %1609 = addi %arg6, %arg7 : index + %1610 = remi_signed %1609, %c128 : index + %1611 = cmpi "slt", %1610, %c0 : index + %1612 = addi %1610, %c128 : index + %1613 = select %1611, %1612, %1610 : index + %1614 = cmpi "slt", %arg5, %c0 : 
index + %1615 = subi %c-1, %arg5 : index + %1616 = select %1614, %1615, %arg5 : index + %1617 = divi_signed %1616, %c8 : index + %1618 = subi %c-1, %1617 : index + %1619 = select %1614, %1618, %1617 : index + %1620 = addi %arg5, %c8 : index + %1621 = cmpi "slt", %1620, %c0 : index + %1622 = subi %c-1, %1620 : index + %1623 = select %1621, %1622, %1620 : index + %1624 = divi_signed %1623, %c16 : index + %1625 = subi %c-1, %1624 : index + %1626 = select %1621, %1625, %1624 : index + %1627 = muli %1626, %c-2 : index + %1628 = addi %1619, %1627 : index + %1629 = cmpi "slt", %arg5, %c0 : index + %1630 = subi %c-1, %arg5 : index + %1631 = select %1629, %1630, %arg5 : index + %1632 = divi_signed %1631, %c8 : index + %1633 = subi %c-1, %1632 : index + %1634 = select %1629, %1633, %1632 : index + %1635 = addi %arg5, %c8 : index + %1636 = cmpi "slt", %1635, %c0 : index + %1637 = subi %c-1, %1635 : index + %1638 = select %1636, %1637, %1635 : index + %1639 = divi_signed %1638, %c16 : index + %1640 = subi %c-1, %1639 : index + %1641 = select %1636, %1640, %1639 : index + %1642 = muli %1641, %c-2 : index + %1643 = addi %1634, %1642 : index + %1644 = addi %1643, %c1 : index + %1645 = cmpi "slt", %1644, %c0 : index + %1646 = subi %c-1, %1644 : index + %1647 = select %1645, %1646, %1644 : index + %1648 = divi_signed %1647, %c2 : index + %1649 = subi %c-1, %1648 : index + %1650 = select %1645, %1649, %1648 : index + %1651 = muli %1650, %c-2 : index + %1652 = addi %1628, %1651 : index + %1653 = addi %1652, %c1 : index + %1654 = load %3[%1608, %1613, %1653] : memref<16x128x2xvector<8xf32>> + %1655 = vector.extractelement %1654[%c5_i64 : i64] : vector<8xf32> + %1656 = addi %arg5, %c8 : index + %1657 = cmpi "slt", %1656, %c0 : index + %1658 = subi %c-1, %1656 : index + %1659 = select %1657, %1658, %1656 : index + %1660 = divi_signed %1659, %c16 : index + %1661 = subi %c-1, %1660 : index + %1662 = select %1657, %1661, %1660 : index + %1663 = remi_signed %1662, %c16 : index + %1664 = cmpi "slt", %1663, %c0 : index + %1665 = addi %1663, %c16 : index + %1666 = select %1664, %1665, %1663 : index + %1667 = addi %arg6, %arg7 : index + %1668 = remi_signed %1667, %c128 : index + %1669 = cmpi "slt", %1668, %c0 : index + %1670 = addi %1668, %c128 : index + %1671 = select %1669, %1670, %1668 : index + %1672 = cmpi "slt", %arg5, %c0 : index + %1673 = subi %c-1, %arg5 : index + %1674 = select %1672, %1673, %arg5 : index + %1675 = divi_signed %1674, %c8 : index + %1676 = subi %c-1, %1675 : index + %1677 = select %1672, %1676, %1675 : index + %1678 = addi %arg5, %c8 : index + %1679 = cmpi "slt", %1678, %c0 : index + %1680 = subi %c-1, %1678 : index + %1681 = select %1679, %1680, %1678 : index + %1682 = divi_signed %1681, %c16 : index + %1683 = subi %c-1, %1682 : index + %1684 = select %1679, %1683, %1682 : index + %1685 = muli %1684, %c-2 : index + %1686 = addi %1677, %1685 : index + %1687 = cmpi "slt", %arg5, %c0 : index + %1688 = subi %c-1, %arg5 : index + %1689 = select %1687, %1688, %arg5 : index + %1690 = divi_signed %1689, %c8 : index + %1691 = subi %c-1, %1690 : index + %1692 = select %1687, %1691, %1690 : index + %1693 = addi %arg5, %c8 : index + %1694 = cmpi "slt", %1693, %c0 : index + %1695 = subi %c-1, %1693 : index + %1696 = select %1694, %1695, %1693 : index + %1697 = divi_signed %1696, %c16 : index + %1698 = subi %c-1, %1697 : index + %1699 = select %1694, %1698, %1697 : index + %1700 = muli %1699, %c-2 : index + %1701 = addi %1692, %1700 : index + %1702 = addi %1701, %c1 : index + %1703 = cmpi "slt", %1702, %c0 
: index + %1704 = subi %c-1, %1702 : index + %1705 = select %1703, %1704, %1702 : index + %1706 = divi_signed %1705, %c2 : index + %1707 = subi %c-1, %1706 : index + %1708 = select %1703, %1707, %1706 : index + %1709 = muli %1708, %c-2 : index + %1710 = addi %1686, %1709 : index + %1711 = addi %1710, %c1 : index + %1712 = load %3[%1666, %1671, %1711] : memref<16x128x2xvector<8xf32>> + %1713 = vector.extractelement %1712[%c6_i64 : i64] : vector<8xf32> + %1714 = addi %arg5, %c8 : index + %1715 = cmpi "slt", %1714, %c0 : index + %1716 = subi %c-1, %1714 : index + %1717 = select %1715, %1716, %1714 : index + %1718 = divi_signed %1717, %c16 : index + %1719 = subi %c-1, %1718 : index + %1720 = select %1715, %1719, %1718 : index + %1721 = remi_signed %1720, %c16 : index + %1722 = cmpi "slt", %1721, %c0 : index + %1723 = addi %1721, %c16 : index + %1724 = select %1722, %1723, %1721 : index + %1725 = addi %arg6, %arg7 : index + %1726 = remi_signed %1725, %c128 : index + %1727 = cmpi "slt", %1726, %c0 : index + %1728 = addi %1726, %c128 : index + %1729 = select %1727, %1728, %1726 : index + %1730 = cmpi "slt", %arg5, %c0 : index + %1731 = subi %c-1, %arg5 : index + %1732 = select %1730, %1731, %arg5 : index + %1733 = divi_signed %1732, %c8 : index + %1734 = subi %c-1, %1733 : index + %1735 = select %1730, %1734, %1733 : index + %1736 = addi %arg5, %c8 : index + %1737 = cmpi "slt", %1736, %c0 : index + %1738 = subi %c-1, %1736 : index + %1739 = select %1737, %1738, %1736 : index + %1740 = divi_signed %1739, %c16 : index + %1741 = subi %c-1, %1740 : index + %1742 = select %1737, %1741, %1740 : index + %1743 = muli %1742, %c-2 : index + %1744 = addi %1735, %1743 : index + %1745 = cmpi "slt", %arg5, %c0 : index + %1746 = subi %c-1, %arg5 : index + %1747 = select %1745, %1746, %arg5 : index + %1748 = divi_signed %1747, %c8 : index + %1749 = subi %c-1, %1748 : index + %1750 = select %1745, %1749, %1748 : index + %1751 = addi %arg5, %c8 : index + %1752 = cmpi "slt", %1751, %c0 : index + %1753 = subi %c-1, %1751 : index + %1754 = select %1752, %1753, %1751 : index + %1755 = divi_signed %1754, %c16 : index + %1756 = subi %c-1, %1755 : index + %1757 = select %1752, %1756, %1755 : index + %1758 = muli %1757, %c-2 : index + %1759 = addi %1750, %1758 : index + %1760 = addi %1759, %c1 : index + %1761 = cmpi "slt", %1760, %c0 : index + %1762 = subi %c-1, %1760 : index + %1763 = select %1761, %1762, %1760 : index + %1764 = divi_signed %1763, %c2 : index + %1765 = subi %c-1, %1764 : index + %1766 = select %1761, %1765, %1764 : index + %1767 = muli %1766, %c-2 : index + %1768 = addi %1744, %1767 : index + %1769 = addi %1768, %c1 : index + %1770 = load %3[%1724, %1729, %1769] : memref<16x128x2xvector<8xf32>> + %1771 = vector.extractelement %1770[%c7_i64 : i64] : vector<8xf32> + %1772 = mulf %1300, %1365 {RelaxedPrecision} : f32 + %1773 = mulf %1301, %1423 {RelaxedPrecision} : f32 + %1774 = mulf %1302, %1481 {RelaxedPrecision} : f32 + %1775 = mulf %1303, %1539 {RelaxedPrecision} : f32 + %1776 = mulf %1304, %1597 {RelaxedPrecision} : f32 + %1777 = mulf %1305, %1655 {RelaxedPrecision} : f32 + %1778 = mulf %1306, %1713 {RelaxedPrecision} : f32 + %1779 = mulf %1307, %1771 {RelaxedPrecision} : f32 + %1780 = addi %arg5, %c8 : index + %1781 = cmpi "slt", %1780, %c0 : index + %1782 = subi %c-1, %1780 : index + %1783 = select %1781, %1782, %1780 : index + %1784 = divi_signed %1783, %c16 : index + %1785 = subi %c-1, %1784 : index + %1786 = select %1781, %1785, %1784 : index + %1787 = remi_signed %1786, %c16 : index + %1788 = 
cmpi "slt", %1787, %c0 : index + %1789 = addi %1787, %c16 : index + %1790 = select %1788, %1789, %1787 : index + %1791 = cmpi "slt", %arg5, %c0 : index + %1792 = subi %c-1, %arg5 : index + %1793 = select %1791, %1792, %arg5 : index + %1794 = divi_signed %1793, %c8 : index + %1795 = subi %c-1, %1794 : index + %1796 = select %1791, %1795, %1794 : index + %1797 = addi %arg5, %c8 : index + %1798 = cmpi "slt", %1797, %c0 : index + %1799 = subi %c-1, %1797 : index + %1800 = select %1798, %1799, %1797 : index + %1801 = divi_signed %1800, %c16 : index + %1802 = subi %c-1, %1801 : index + %1803 = select %1798, %1802, %1801 : index + %1804 = muli %1803, %c-2 : index + %1805 = addi %1796, %1804 : index + %1806 = cmpi "slt", %arg5, %c0 : index + %1807 = subi %c-1, %arg5 : index + %1808 = select %1806, %1807, %arg5 : index + %1809 = divi_signed %1808, %c8 : index + %1810 = subi %c-1, %1809 : index + %1811 = select %1806, %1810, %1809 : index + %1812 = addi %arg5, %c8 : index + %1813 = cmpi "slt", %1812, %c0 : index + %1814 = subi %c-1, %1812 : index + %1815 = select %1813, %1814, %1812 : index + %1816 = divi_signed %1815, %c16 : index + %1817 = subi %c-1, %1816 : index + %1818 = select %1813, %1817, %1816 : index + %1819 = muli %1818, %c-2 : index + %1820 = addi %1811, %1819 : index + %1821 = addi %1820, %c1 : index + %1822 = cmpi "slt", %1821, %c0 : index + %1823 = subi %c-1, %1821 : index + %1824 = select %1822, %1823, %1821 : index + %1825 = divi_signed %1824, %c2 : index + %1826 = subi %c-1, %1825 : index + %1827 = select %1822, %1826, %1825 : index + %1828 = muli %1827, %c-2 : index + %1829 = addi %1805, %1828 : index + %1830 = addi %1829, %c1 : index + %1831 = load %2[%1790, %c0, %1830] : memref<16x6x2xvector<8xf32>> + %1832 = vector.extractelement %1831[%c0_i64 : i64] : vector<8xf32> + %1833 = addi %arg5, %c8 : index + %1834 = cmpi "slt", %1833, %c0 : index + %1835 = subi %c-1, %1833 : index + %1836 = select %1834, %1835, %1833 : index + %1837 = divi_signed %1836, %c16 : index + %1838 = subi %c-1, %1837 : index + %1839 = select %1834, %1838, %1837 : index + %1840 = remi_signed %1839, %c16 : index + %1841 = cmpi "slt", %1840, %c0 : index + %1842 = addi %1840, %c16 : index + %1843 = select %1841, %1842, %1840 : index + %1844 = cmpi "slt", %arg5, %c0 : index + %1845 = subi %c-1, %arg5 : index + %1846 = select %1844, %1845, %arg5 : index + %1847 = divi_signed %1846, %c8 : index + %1848 = subi %c-1, %1847 : index + %1849 = select %1844, %1848, %1847 : index + %1850 = addi %arg5, %c8 : index + %1851 = cmpi "slt", %1850, %c0 : index + %1852 = subi %c-1, %1850 : index + %1853 = select %1851, %1852, %1850 : index + %1854 = divi_signed %1853, %c16 : index + %1855 = subi %c-1, %1854 : index + %1856 = select %1851, %1855, %1854 : index + %1857 = muli %1856, %c-2 : index + %1858 = addi %1849, %1857 : index + %1859 = cmpi "slt", %arg5, %c0 : index + %1860 = subi %c-1, %arg5 : index + %1861 = select %1859, %1860, %arg5 : index + %1862 = divi_signed %1861, %c8 : index + %1863 = subi %c-1, %1862 : index + %1864 = select %1859, %1863, %1862 : index + %1865 = addi %arg5, %c8 : index + %1866 = cmpi "slt", %1865, %c0 : index + %1867 = subi %c-1, %1865 : index + %1868 = select %1866, %1867, %1865 : index + %1869 = divi_signed %1868, %c16 : index + %1870 = subi %c-1, %1869 : index + %1871 = select %1866, %1870, %1869 : index + %1872 = muli %1871, %c-2 : index + %1873 = addi %1864, %1872 : index + %1874 = addi %1873, %c1 : index + %1875 = cmpi "slt", %1874, %c0 : index + %1876 = subi %c-1, %1874 : index + %1877 = 
select %1875, %1876, %1874 : index + %1878 = divi_signed %1877, %c2 : index + %1879 = subi %c-1, %1878 : index + %1880 = select %1875, %1879, %1878 : index + %1881 = muli %1880, %c-2 : index + %1882 = addi %1858, %1881 : index + %1883 = addi %1882, %c1 : index + %1884 = load %2[%1843, %c0, %1883] : memref<16x6x2xvector<8xf32>> + %1885 = vector.extractelement %1884[%c1_i64 : i64] : vector<8xf32> + %1886 = addi %arg5, %c8 : index + %1887 = cmpi "slt", %1886, %c0 : index + %1888 = subi %c-1, %1886 : index + %1889 = select %1887, %1888, %1886 : index + %1890 = divi_signed %1889, %c16 : index + %1891 = subi %c-1, %1890 : index + %1892 = select %1887, %1891, %1890 : index + %1893 = remi_signed %1892, %c16 : index + %1894 = cmpi "slt", %1893, %c0 : index + %1895 = addi %1893, %c16 : index + %1896 = select %1894, %1895, %1893 : index + %1897 = cmpi "slt", %arg5, %c0 : index + %1898 = subi %c-1, %arg5 : index + %1899 = select %1897, %1898, %arg5 : index + %1900 = divi_signed %1899, %c8 : index + %1901 = subi %c-1, %1900 : index + %1902 = select %1897, %1901, %1900 : index + %1903 = addi %arg5, %c8 : index + %1904 = cmpi "slt", %1903, %c0 : index + %1905 = subi %c-1, %1903 : index + %1906 = select %1904, %1905, %1903 : index + %1907 = divi_signed %1906, %c16 : index + %1908 = subi %c-1, %1907 : index + %1909 = select %1904, %1908, %1907 : index + %1910 = muli %1909, %c-2 : index + %1911 = addi %1902, %1910 : index + %1912 = cmpi "slt", %arg5, %c0 : index + %1913 = subi %c-1, %arg5 : index + %1914 = select %1912, %1913, %arg5 : index + %1915 = divi_signed %1914, %c8 : index + %1916 = subi %c-1, %1915 : index + %1917 = select %1912, %1916, %1915 : index + %1918 = addi %arg5, %c8 : index + %1919 = cmpi "slt", %1918, %c0 : index + %1920 = subi %c-1, %1918 : index + %1921 = select %1919, %1920, %1918 : index + %1922 = divi_signed %1921, %c16 : index + %1923 = subi %c-1, %1922 : index + %1924 = select %1919, %1923, %1922 : index + %1925 = muli %1924, %c-2 : index + %1926 = addi %1917, %1925 : index + %1927 = addi %1926, %c1 : index + %1928 = cmpi "slt", %1927, %c0 : index + %1929 = subi %c-1, %1927 : index + %1930 = select %1928, %1929, %1927 : index + %1931 = divi_signed %1930, %c2 : index + %1932 = subi %c-1, %1931 : index + %1933 = select %1928, %1932, %1931 : index + %1934 = muli %1933, %c-2 : index + %1935 = addi %1911, %1934 : index + %1936 = addi %1935, %c1 : index + %1937 = load %2[%1896, %c0, %1936] : memref<16x6x2xvector<8xf32>> + %1938 = vector.extractelement %1937[%c2_i64 : i64] : vector<8xf32> + %1939 = addi %arg5, %c8 : index + %1940 = cmpi "slt", %1939, %c0 : index + %1941 = subi %c-1, %1939 : index + %1942 = select %1940, %1941, %1939 : index + %1943 = divi_signed %1942, %c16 : index + %1944 = subi %c-1, %1943 : index + %1945 = select %1940, %1944, %1943 : index + %1946 = remi_signed %1945, %c16 : index + %1947 = cmpi "slt", %1946, %c0 : index + %1948 = addi %1946, %c16 : index + %1949 = select %1947, %1948, %1946 : index + %1950 = cmpi "slt", %arg5, %c0 : index + %1951 = subi %c-1, %arg5 : index + %1952 = select %1950, %1951, %arg5 : index + %1953 = divi_signed %1952, %c8 : index + %1954 = subi %c-1, %1953 : index + %1955 = select %1950, %1954, %1953 : index + %1956 = addi %arg5, %c8 : index + %1957 = cmpi "slt", %1956, %c0 : index + %1958 = subi %c-1, %1956 : index + %1959 = select %1957, %1958, %1956 : index + %1960 = divi_signed %1959, %c16 : index + %1961 = subi %c-1, %1960 : index + %1962 = select %1957, %1961, %1960 : index + %1963 = muli %1962, %c-2 : index + %1964 = addi %1955, 
%1963 : index + %1965 = cmpi "slt", %arg5, %c0 : index + %1966 = subi %c-1, %arg5 : index + %1967 = select %1965, %1966, %arg5 : index + %1968 = divi_signed %1967, %c8 : index + %1969 = subi %c-1, %1968 : index + %1970 = select %1965, %1969, %1968 : index + %1971 = addi %arg5, %c8 : index + %1972 = cmpi "slt", %1971, %c0 : index + %1973 = subi %c-1, %1971 : index + %1974 = select %1972, %1973, %1971 : index + %1975 = divi_signed %1974, %c16 : index + %1976 = subi %c-1, %1975 : index + %1977 = select %1972, %1976, %1975 : index + %1978 = muli %1977, %c-2 : index + %1979 = addi %1970, %1978 : index + %1980 = addi %1979, %c1 : index + %1981 = cmpi "slt", %1980, %c0 : index + %1982 = subi %c-1, %1980 : index + %1983 = select %1981, %1982, %1980 : index + %1984 = divi_signed %1983, %c2 : index + %1985 = subi %c-1, %1984 : index + %1986 = select %1981, %1985, %1984 : index + %1987 = muli %1986, %c-2 : index + %1988 = addi %1964, %1987 : index + %1989 = addi %1988, %c1 : index + %1990 = load %2[%1949, %c0, %1989] : memref<16x6x2xvector<8xf32>> + %1991 = vector.extractelement %1990[%c3_i64 : i64] : vector<8xf32> + %1992 = addi %arg5, %c8 : index + %1993 = cmpi "slt", %1992, %c0 : index + %1994 = subi %c-1, %1992 : index + %1995 = select %1993, %1994, %1992 : index + %1996 = divi_signed %1995, %c16 : index + %1997 = subi %c-1, %1996 : index + %1998 = select %1993, %1997, %1996 : index + %1999 = remi_signed %1998, %c16 : index + %2000 = cmpi "slt", %1999, %c0 : index + %2001 = addi %1999, %c16 : index + %2002 = select %2000, %2001, %1999 : index + %2003 = cmpi "slt", %arg5, %c0 : index + %2004 = subi %c-1, %arg5 : index + %2005 = select %2003, %2004, %arg5 : index + %2006 = divi_signed %2005, %c8 : index + %2007 = subi %c-1, %2006 : index + %2008 = select %2003, %2007, %2006 : index + %2009 = addi %arg5, %c8 : index + %2010 = cmpi "slt", %2009, %c0 : index + %2011 = subi %c-1, %2009 : index + %2012 = select %2010, %2011, %2009 : index + %2013 = divi_signed %2012, %c16 : index + %2014 = subi %c-1, %2013 : index + %2015 = select %2010, %2014, %2013 : index + %2016 = muli %2015, %c-2 : index + %2017 = addi %2008, %2016 : index + %2018 = cmpi "slt", %arg5, %c0 : index + %2019 = subi %c-1, %arg5 : index + %2020 = select %2018, %2019, %arg5 : index + %2021 = divi_signed %2020, %c8 : index + %2022 = subi %c-1, %2021 : index + %2023 = select %2018, %2022, %2021 : index + %2024 = addi %arg5, %c8 : index + %2025 = cmpi "slt", %2024, %c0 : index + %2026 = subi %c-1, %2024 : index + %2027 = select %2025, %2026, %2024 : index + %2028 = divi_signed %2027, %c16 : index + %2029 = subi %c-1, %2028 : index + %2030 = select %2025, %2029, %2028 : index + %2031 = muli %2030, %c-2 : index + %2032 = addi %2023, %2031 : index + %2033 = addi %2032, %c1 : index + %2034 = cmpi "slt", %2033, %c0 : index + %2035 = subi %c-1, %2033 : index + %2036 = select %2034, %2035, %2033 : index + %2037 = divi_signed %2036, %c2 : index + %2038 = subi %c-1, %2037 : index + %2039 = select %2034, %2038, %2037 : index + %2040 = muli %2039, %c-2 : index + %2041 = addi %2017, %2040 : index + %2042 = addi %2041, %c1 : index + %2043 = load %2[%2002, %c0, %2042] : memref<16x6x2xvector<8xf32>> + %2044 = vector.extractelement %2043[%c4_i64 : i64] : vector<8xf32> + %2045 = addi %arg5, %c8 : index + %2046 = cmpi "slt", %2045, %c0 : index + %2047 = subi %c-1, %2045 : index + %2048 = select %2046, %2047, %2045 : index + %2049 = divi_signed %2048, %c16 : index + %2050 = subi %c-1, %2049 : index + %2051 = select %2046, %2050, %2049 : index + %2052 = 
remi_signed %2051, %c16 : index + %2053 = cmpi "slt", %2052, %c0 : index + %2054 = addi %2052, %c16 : index + %2055 = select %2053, %2054, %2052 : index + %2056 = cmpi "slt", %arg5, %c0 : index + %2057 = subi %c-1, %arg5 : index + %2058 = select %2056, %2057, %arg5 : index + %2059 = divi_signed %2058, %c8 : index + %2060 = subi %c-1, %2059 : index + %2061 = select %2056, %2060, %2059 : index + %2062 = addi %arg5, %c8 : index + %2063 = cmpi "slt", %2062, %c0 : index + %2064 = subi %c-1, %2062 : index + %2065 = select %2063, %2064, %2062 : index + %2066 = divi_signed %2065, %c16 : index + %2067 = subi %c-1, %2066 : index + %2068 = select %2063, %2067, %2066 : index + %2069 = muli %2068, %c-2 : index + %2070 = addi %2061, %2069 : index + %2071 = cmpi "slt", %arg5, %c0 : index + %2072 = subi %c-1, %arg5 : index + %2073 = select %2071, %2072, %arg5 : index + %2074 = divi_signed %2073, %c8 : index + %2075 = subi %c-1, %2074 : index + %2076 = select %2071, %2075, %2074 : index + %2077 = addi %arg5, %c8 : index + %2078 = cmpi "slt", %2077, %c0 : index + %2079 = subi %c-1, %2077 : index + %2080 = select %2078, %2079, %2077 : index + %2081 = divi_signed %2080, %c16 : index + %2082 = subi %c-1, %2081 : index + %2083 = select %2078, %2082, %2081 : index + %2084 = muli %2083, %c-2 : index + %2085 = addi %2076, %2084 : index + %2086 = addi %2085, %c1 : index + %2087 = cmpi "slt", %2086, %c0 : index + %2088 = subi %c-1, %2086 : index + %2089 = select %2087, %2088, %2086 : index + %2090 = divi_signed %2089, %c2 : index + %2091 = subi %c-1, %2090 : index + %2092 = select %2087, %2091, %2090 : index + %2093 = muli %2092, %c-2 : index + %2094 = addi %2070, %2093 : index + %2095 = addi %2094, %c1 : index + %2096 = load %2[%2055, %c0, %2095] : memref<16x6x2xvector<8xf32>> + %2097 = vector.extractelement %2096[%c5_i64 : i64] : vector<8xf32> + %2098 = addi %arg5, %c8 : index + %2099 = cmpi "slt", %2098, %c0 : index + %2100 = subi %c-1, %2098 : index + %2101 = select %2099, %2100, %2098 : index + %2102 = divi_signed %2101, %c16 : index + %2103 = subi %c-1, %2102 : index + %2104 = select %2099, %2103, %2102 : index + %2105 = remi_signed %2104, %c16 : index + %2106 = cmpi "slt", %2105, %c0 : index + %2107 = addi %2105, %c16 : index + %2108 = select %2106, %2107, %2105 : index + %2109 = cmpi "slt", %arg5, %c0 : index + %2110 = subi %c-1, %arg5 : index + %2111 = select %2109, %2110, %arg5 : index + %2112 = divi_signed %2111, %c8 : index + %2113 = subi %c-1, %2112 : index + %2114 = select %2109, %2113, %2112 : index + %2115 = addi %arg5, %c8 : index + %2116 = cmpi "slt", %2115, %c0 : index + %2117 = subi %c-1, %2115 : index + %2118 = select %2116, %2117, %2115 : index + %2119 = divi_signed %2118, %c16 : index + %2120 = subi %c-1, %2119 : index + %2121 = select %2116, %2120, %2119 : index + %2122 = muli %2121, %c-2 : index + %2123 = addi %2114, %2122 : index + %2124 = cmpi "slt", %arg5, %c0 : index + %2125 = subi %c-1, %arg5 : index + %2126 = select %2124, %2125, %arg5 : index + %2127 = divi_signed %2126, %c8 : index + %2128 = subi %c-1, %2127 : index + %2129 = select %2124, %2128, %2127 : index + %2130 = addi %arg5, %c8 : index + %2131 = cmpi "slt", %2130, %c0 : index + %2132 = subi %c-1, %2130 : index + %2133 = select %2131, %2132, %2130 : index + %2134 = divi_signed %2133, %c16 : index + %2135 = subi %c-1, %2134 : index + %2136 = select %2131, %2135, %2134 : index + %2137 = muli %2136, %c-2 : index + %2138 = addi %2129, %2137 : index + %2139 = addi %2138, %c1 : index + %2140 = cmpi "slt", %2139, %c0 : index + %2141 
= subi %c-1, %2139 : index + %2142 = select %2140, %2141, %2139 : index + %2143 = divi_signed %2142, %c2 : index + %2144 = subi %c-1, %2143 : index + %2145 = select %2140, %2144, %2143 : index + %2146 = muli %2145, %c-2 : index + %2147 = addi %2123, %2146 : index + %2148 = addi %2147, %c1 : index + %2149 = load %2[%2108, %c0, %2148] : memref<16x6x2xvector<8xf32>> + %2150 = vector.extractelement %2149[%c6_i64 : i64] : vector<8xf32> + %2151 = addi %arg5, %c8 : index + %2152 = cmpi "slt", %2151, %c0 : index + %2153 = subi %c-1, %2151 : index + %2154 = select %2152, %2153, %2151 : index + %2155 = divi_signed %2154, %c16 : index + %2156 = subi %c-1, %2155 : index + %2157 = select %2152, %2156, %2155 : index + %2158 = remi_signed %2157, %c16 : index + %2159 = cmpi "slt", %2158, %c0 : index + %2160 = addi %2158, %c16 : index + %2161 = select %2159, %2160, %2158 : index + %2162 = cmpi "slt", %arg5, %c0 : index + %2163 = subi %c-1, %arg5 : index + %2164 = select %2162, %2163, %arg5 : index + %2165 = divi_signed %2164, %c8 : index + %2166 = subi %c-1, %2165 : index + %2167 = select %2162, %2166, %2165 : index + %2168 = addi %arg5, %c8 : index + %2169 = cmpi "slt", %2168, %c0 : index + %2170 = subi %c-1, %2168 : index + %2171 = select %2169, %2170, %2168 : index + %2172 = divi_signed %2171, %c16 : index + %2173 = subi %c-1, %2172 : index + %2174 = select %2169, %2173, %2172 : index + %2175 = muli %2174, %c-2 : index + %2176 = addi %2167, %2175 : index + %2177 = cmpi "slt", %arg5, %c0 : index + %2178 = subi %c-1, %arg5 : index + %2179 = select %2177, %2178, %arg5 : index + %2180 = divi_signed %2179, %c8 : index + %2181 = subi %c-1, %2180 : index + %2182 = select %2177, %2181, %2180 : index + %2183 = addi %arg5, %c8 : index + %2184 = cmpi "slt", %2183, %c0 : index + %2185 = subi %c-1, %2183 : index + %2186 = select %2184, %2185, %2183 : index + %2187 = divi_signed %2186, %c16 : index + %2188 = subi %c-1, %2187 : index + %2189 = select %2184, %2188, %2187 : index + %2190 = muli %2189, %c-2 : index + %2191 = addi %2182, %2190 : index + %2192 = addi %2191, %c1 : index + %2193 = cmpi "slt", %2192, %c0 : index + %2194 = subi %c-1, %2192 : index + %2195 = select %2193, %2194, %2192 : index + %2196 = divi_signed %2195, %c2 : index + %2197 = subi %c-1, %2196 : index + %2198 = select %2193, %2197, %2196 : index + %2199 = muli %2198, %c-2 : index + %2200 = addi %2176, %2199 : index + %2201 = addi %2200, %c1 : index + %2202 = load %2[%2161, %c0, %2201] : memref<16x6x2xvector<8xf32>> + %2203 = vector.extractelement %2202[%c7_i64 : i64] : vector<8xf32> + %2204 = addf %1832, %1772 {RelaxedPrecision} : f32 + %2205 = addf %1885, %1773 {RelaxedPrecision} : f32 + %2206 = addf %1938, %1774 {RelaxedPrecision} : f32 + %2207 = addf %1991, %1775 {RelaxedPrecision} : f32 + %2208 = addf %2044, %1776 {RelaxedPrecision} : f32 + %2209 = addf %2097, %1777 {RelaxedPrecision} : f32 + %2210 = addf %2150, %1778 {RelaxedPrecision} : f32 + %2211 = addf %2203, %1779 {RelaxedPrecision} : f32 + %2212 = addi %arg5, %c8 : index + %2213 = cmpi "slt", %2212, %c0 : index + %2214 = subi %c-1, %2212 : index + %2215 = select %2213, %2214, %2212 : index + %2216 = divi_signed %2215, %c16 : index + %2217 = subi %c-1, %2216 : index + %2218 = select %2213, %2217, %2216 : index + %2219 = remi_signed %2218, %c16 : index + %2220 = cmpi "slt", %2219, %c0 : index + %2221 = addi %2219, %c16 : index + %2222 = select %2220, %2221, %2219 : index + %2223 = cmpi "slt", %arg5, %c0 : index + %2224 = subi %c-1, %arg5 : index + %2225 = select %2223, %2224, %arg5 : 
index + %2226 = divi_signed %2225, %c8 : index + %2227 = subi %c-1, %2226 : index + %2228 = select %2223, %2227, %2226 : index + %2229 = addi %arg5, %c8 : index + %2230 = cmpi "slt", %2229, %c0 : index + %2231 = subi %c-1, %2229 : index + %2232 = select %2230, %2231, %2229 : index + %2233 = divi_signed %2232, %c16 : index + %2234 = subi %c-1, %2233 : index + %2235 = select %2230, %2234, %2233 : index + %2236 = muli %2235, %c-2 : index + %2237 = addi %2228, %2236 : index + %2238 = cmpi "slt", %arg5, %c0 : index + %2239 = subi %c-1, %arg5 : index + %2240 = select %2238, %2239, %arg5 : index + %2241 = divi_signed %2240, %c8 : index + %2242 = subi %c-1, %2241 : index + %2243 = select %2238, %2242, %2241 : index + %2244 = addi %arg5, %c8 : index + %2245 = cmpi "slt", %2244, %c0 : index + %2246 = subi %c-1, %2244 : index + %2247 = select %2245, %2246, %2244 : index + %2248 = divi_signed %2247, %c16 : index + %2249 = subi %c-1, %2248 : index + %2250 = select %2245, %2249, %2248 : index + %2251 = muli %2250, %c-2 : index + %2252 = addi %2243, %2251 : index + %2253 = addi %2252, %c1 : index + %2254 = cmpi "slt", %2253, %c0 : index + %2255 = subi %c-1, %2253 : index + %2256 = select %2254, %2255, %2253 : index + %2257 = divi_signed %2256, %c2 : index + %2258 = subi %c-1, %2257 : index + %2259 = select %2254, %2258, %2257 : index + %2260 = muli %2259, %c-2 : index + %2261 = addi %2237, %2260 : index + %2262 = addi %2261, %c1 : index + %2263 = load %2[%2222, %c0, %2262] : memref<16x6x2xvector<8xf32>> + %2264 = vector.insertelement %2204, %2263[%c0_i64 : i64] : vector<8xf32> + %2265 = addi %arg5, %c8 : index + %2266 = cmpi "slt", %2265, %c0 : index + %2267 = subi %c-1, %2265 : index + %2268 = select %2266, %2267, %2265 : index + %2269 = divi_signed %2268, %c16 : index + %2270 = subi %c-1, %2269 : index + %2271 = select %2266, %2270, %2269 : index + %2272 = remi_signed %2271, %c16 : index + %2273 = cmpi "slt", %2272, %c0 : index + %2274 = addi %2272, %c16 : index + %2275 = select %2273, %2274, %2272 : index + %2276 = cmpi "slt", %arg5, %c0 : index + %2277 = subi %c-1, %arg5 : index + %2278 = select %2276, %2277, %arg5 : index + %2279 = divi_signed %2278, %c8 : index + %2280 = subi %c-1, %2279 : index + %2281 = select %2276, %2280, %2279 : index + %2282 = addi %arg5, %c8 : index + %2283 = cmpi "slt", %2282, %c0 : index + %2284 = subi %c-1, %2282 : index + %2285 = select %2283, %2284, %2282 : index + %2286 = divi_signed %2285, %c16 : index + %2287 = subi %c-1, %2286 : index + %2288 = select %2283, %2287, %2286 : index + %2289 = muli %2288, %c-2 : index + %2290 = addi %2281, %2289 : index + %2291 = cmpi "slt", %arg5, %c0 : index + %2292 = subi %c-1, %arg5 : index + %2293 = select %2291, %2292, %arg5 : index + %2294 = divi_signed %2293, %c8 : index + %2295 = subi %c-1, %2294 : index + %2296 = select %2291, %2295, %2294 : index + %2297 = addi %arg5, %c8 : index + %2298 = cmpi "slt", %2297, %c0 : index + %2299 = subi %c-1, %2297 : index + %2300 = select %2298, %2299, %2297 : index + %2301 = divi_signed %2300, %c16 : index + %2302 = subi %c-1, %2301 : index + %2303 = select %2298, %2302, %2301 : index + %2304 = muli %2303, %c-2 : index + %2305 = addi %2296, %2304 : index + %2306 = addi %2305, %c1 : index + %2307 = cmpi "slt", %2306, %c0 : index + %2308 = subi %c-1, %2306 : index + %2309 = select %2307, %2308, %2306 : index + %2310 = divi_signed %2309, %c2 : index + %2311 = subi %c-1, %2310 : index + %2312 = select %2307, %2311, %2310 : index + %2313 = muli %2312, %c-2 : index + %2314 = addi %2290, %2313 : 
index + %2315 = addi %2314, %c1 : index + store %2264, %2[%2275, %c0, %2315] : memref<16x6x2xvector<8xf32>> + %2316 = addi %arg5, %c8 : index + %2317 = cmpi "slt", %2316, %c0 : index + %2318 = subi %c-1, %2316 : index + %2319 = select %2317, %2318, %2316 : index + %2320 = divi_signed %2319, %c16 : index + %2321 = subi %c-1, %2320 : index + %2322 = select %2317, %2321, %2320 : index + %2323 = remi_signed %2322, %c16 : index + %2324 = cmpi "slt", %2323, %c0 : index + %2325 = addi %2323, %c16 : index + %2326 = select %2324, %2325, %2323 : index + %2327 = cmpi "slt", %arg5, %c0 : index + %2328 = subi %c-1, %arg5 : index + %2329 = select %2327, %2328, %arg5 : index + %2330 = divi_signed %2329, %c8 : index + %2331 = subi %c-1, %2330 : index + %2332 = select %2327, %2331, %2330 : index + %2333 = addi %arg5, %c8 : index + %2334 = cmpi "slt", %2333, %c0 : index + %2335 = subi %c-1, %2333 : index + %2336 = select %2334, %2335, %2333 : index + %2337 = divi_signed %2336, %c16 : index + %2338 = subi %c-1, %2337 : index + %2339 = select %2334, %2338, %2337 : index + %2340 = muli %2339, %c-2 : index + %2341 = addi %2332, %2340 : index + %2342 = cmpi "slt", %arg5, %c0 : index + %2343 = subi %c-1, %arg5 : index + %2344 = select %2342, %2343, %arg5 : index + %2345 = divi_signed %2344, %c8 : index + %2346 = subi %c-1, %2345 : index + %2347 = select %2342, %2346, %2345 : index + %2348 = addi %arg5, %c8 : index + %2349 = cmpi "slt", %2348, %c0 : index + %2350 = subi %c-1, %2348 : index + %2351 = select %2349, %2350, %2348 : index + %2352 = divi_signed %2351, %c16 : index + %2353 = subi %c-1, %2352 : index + %2354 = select %2349, %2353, %2352 : index + %2355 = muli %2354, %c-2 : index + %2356 = addi %2347, %2355 : index + %2357 = addi %2356, %c1 : index + %2358 = cmpi "slt", %2357, %c0 : index + %2359 = subi %c-1, %2357 : index + %2360 = select %2358, %2359, %2357 : index + %2361 = divi_signed %2360, %c2 : index + %2362 = subi %c-1, %2361 : index + %2363 = select %2358, %2362, %2361 : index + %2364 = muli %2363, %c-2 : index + %2365 = addi %2341, %2364 : index + %2366 = addi %2365, %c1 : index + %2367 = load %2[%2326, %c0, %2366] : memref<16x6x2xvector<8xf32>> + %2368 = vector.insertelement %2205, %2367[%c1_i64 : i64] : vector<8xf32> + %2369 = addi %arg5, %c8 : index + %2370 = cmpi "slt", %2369, %c0 : index + %2371 = subi %c-1, %2369 : index + %2372 = select %2370, %2371, %2369 : index + %2373 = divi_signed %2372, %c16 : index + %2374 = subi %c-1, %2373 : index + %2375 = select %2370, %2374, %2373 : index + %2376 = remi_signed %2375, %c16 : index + %2377 = cmpi "slt", %2376, %c0 : index + %2378 = addi %2376, %c16 : index + %2379 = select %2377, %2378, %2376 : index + %2380 = cmpi "slt", %arg5, %c0 : index + %2381 = subi %c-1, %arg5 : index + %2382 = select %2380, %2381, %arg5 : index + %2383 = divi_signed %2382, %c8 : index + %2384 = subi %c-1, %2383 : index + %2385 = select %2380, %2384, %2383 : index + %2386 = addi %arg5, %c8 : index + %2387 = cmpi "slt", %2386, %c0 : index + %2388 = subi %c-1, %2386 : index + %2389 = select %2387, %2388, %2386 : index + %2390 = divi_signed %2389, %c16 : index + %2391 = subi %c-1, %2390 : index + %2392 = select %2387, %2391, %2390 : index + %2393 = muli %2392, %c-2 : index + %2394 = addi %2385, %2393 : index + %2395 = cmpi "slt", %arg5, %c0 : index + %2396 = subi %c-1, %arg5 : index + %2397 = select %2395, %2396, %arg5 : index + %2398 = divi_signed %2397, %c8 : index + %2399 = subi %c-1, %2398 : index + %2400 = select %2395, %2399, %2398 : index + %2401 = addi %arg5, %c8 : 
index + %2402 = cmpi "slt", %2401, %c0 : index + %2403 = subi %c-1, %2401 : index + %2404 = select %2402, %2403, %2401 : index + %2405 = divi_signed %2404, %c16 : index + %2406 = subi %c-1, %2405 : index + %2407 = select %2402, %2406, %2405 : index + %2408 = muli %2407, %c-2 : index + %2409 = addi %2400, %2408 : index + %2410 = addi %2409, %c1 : index + %2411 = cmpi "slt", %2410, %c0 : index + %2412 = subi %c-1, %2410 : index + %2413 = select %2411, %2412, %2410 : index + %2414 = divi_signed %2413, %c2 : index + %2415 = subi %c-1, %2414 : index + %2416 = select %2411, %2415, %2414 : index + %2417 = muli %2416, %c-2 : index + %2418 = addi %2394, %2417 : index + %2419 = addi %2418, %c1 : index + store %2368, %2[%2379, %c0, %2419] : memref<16x6x2xvector<8xf32>> + %2420 = addi %arg5, %c8 : index + %2421 = cmpi "slt", %2420, %c0 : index + %2422 = subi %c-1, %2420 : index + %2423 = select %2421, %2422, %2420 : index + %2424 = divi_signed %2423, %c16 : index + %2425 = subi %c-1, %2424 : index + %2426 = select %2421, %2425, %2424 : index + %2427 = remi_signed %2426, %c16 : index + %2428 = cmpi "slt", %2427, %c0 : index + %2429 = addi %2427, %c16 : index + %2430 = select %2428, %2429, %2427 : index + %2431 = cmpi "slt", %arg5, %c0 : index + %2432 = subi %c-1, %arg5 : index + %2433 = select %2431, %2432, %arg5 : index + %2434 = divi_signed %2433, %c8 : index + %2435 = subi %c-1, %2434 : index + %2436 = select %2431, %2435, %2434 : index + %2437 = addi %arg5, %c8 : index + %2438 = cmpi "slt", %2437, %c0 : index + %2439 = subi %c-1, %2437 : index + %2440 = select %2438, %2439, %2437 : index + %2441 = divi_signed %2440, %c16 : index + %2442 = subi %c-1, %2441 : index + %2443 = select %2438, %2442, %2441 : index + %2444 = muli %2443, %c-2 : index + %2445 = addi %2436, %2444 : index + %2446 = cmpi "slt", %arg5, %c0 : index + %2447 = subi %c-1, %arg5 : index + %2448 = select %2446, %2447, %arg5 : index + %2449 = divi_signed %2448, %c8 : index + %2450 = subi %c-1, %2449 : index + %2451 = select %2446, %2450, %2449 : index + %2452 = addi %arg5, %c8 : index + %2453 = cmpi "slt", %2452, %c0 : index + %2454 = subi %c-1, %2452 : index + %2455 = select %2453, %2454, %2452 : index + %2456 = divi_signed %2455, %c16 : index + %2457 = subi %c-1, %2456 : index + %2458 = select %2453, %2457, %2456 : index + %2459 = muli %2458, %c-2 : index + %2460 = addi %2451, %2459 : index + %2461 = addi %2460, %c1 : index + %2462 = cmpi "slt", %2461, %c0 : index + %2463 = subi %c-1, %2461 : index + %2464 = select %2462, %2463, %2461 : index + %2465 = divi_signed %2464, %c2 : index + %2466 = subi %c-1, %2465 : index + %2467 = select %2462, %2466, %2465 : index + %2468 = muli %2467, %c-2 : index + %2469 = addi %2445, %2468 : index + %2470 = addi %2469, %c1 : index + %2471 = load %2[%2430, %c0, %2470] : memref<16x6x2xvector<8xf32>> + %2472 = vector.insertelement %2206, %2471[%c2_i64 : i64] : vector<8xf32> + %2473 = addi %arg5, %c8 : index + %2474 = cmpi "slt", %2473, %c0 : index + %2475 = subi %c-1, %2473 : index + %2476 = select %2474, %2475, %2473 : index + %2477 = divi_signed %2476, %c16 : index + %2478 = subi %c-1, %2477 : index + %2479 = select %2474, %2478, %2477 : index + %2480 = remi_signed %2479, %c16 : index + %2481 = cmpi "slt", %2480, %c0 : index + %2482 = addi %2480, %c16 : index + %2483 = select %2481, %2482, %2480 : index + %2484 = cmpi "slt", %arg5, %c0 : index + %2485 = subi %c-1, %arg5 : index + %2486 = select %2484, %2485, %arg5 : index + %2487 = divi_signed %2486, %c8 : index + %2488 = subi %c-1, %2487 : index + 
%2489 = select %2484, %2488, %2487 : index + %2490 = addi %arg5, %c8 : index + %2491 = cmpi "slt", %2490, %c0 : index + %2492 = subi %c-1, %2490 : index + %2493 = select %2491, %2492, %2490 : index + %2494 = divi_signed %2493, %c16 : index + %2495 = subi %c-1, %2494 : index + %2496 = select %2491, %2495, %2494 : index + %2497 = muli %2496, %c-2 : index + %2498 = addi %2489, %2497 : index + %2499 = cmpi "slt", %arg5, %c0 : index + %2500 = subi %c-1, %arg5 : index + %2501 = select %2499, %2500, %arg5 : index + %2502 = divi_signed %2501, %c8 : index + %2503 = subi %c-1, %2502 : index + %2504 = select %2499, %2503, %2502 : index + %2505 = addi %arg5, %c8 : index + %2506 = cmpi "slt", %2505, %c0 : index + %2507 = subi %c-1, %2505 : index + %2508 = select %2506, %2507, %2505 : index + %2509 = divi_signed %2508, %c16 : index + %2510 = subi %c-1, %2509 : index + %2511 = select %2506, %2510, %2509 : index + %2512 = muli %2511, %c-2 : index + %2513 = addi %2504, %2512 : index + %2514 = addi %2513, %c1 : index + %2515 = cmpi "slt", %2514, %c0 : index + %2516 = subi %c-1, %2514 : index + %2517 = select %2515, %2516, %2514 : index + %2518 = divi_signed %2517, %c2 : index + %2519 = subi %c-1, %2518 : index + %2520 = select %2515, %2519, %2518 : index + %2521 = muli %2520, %c-2 : index + %2522 = addi %2498, %2521 : index + %2523 = addi %2522, %c1 : index + store %2472, %2[%2483, %c0, %2523] : memref<16x6x2xvector<8xf32>> + %2524 = addi %arg5, %c8 : index + %2525 = cmpi "slt", %2524, %c0 : index + %2526 = subi %c-1, %2524 : index + %2527 = select %2525, %2526, %2524 : index + %2528 = divi_signed %2527, %c16 : index + %2529 = subi %c-1, %2528 : index + %2530 = select %2525, %2529, %2528 : index + %2531 = remi_signed %2530, %c16 : index + %2532 = cmpi "slt", %2531, %c0 : index + %2533 = addi %2531, %c16 : index + %2534 = select %2532, %2533, %2531 : index + %2535 = cmpi "slt", %arg5, %c0 : index + %2536 = subi %c-1, %arg5 : index + %2537 = select %2535, %2536, %arg5 : index + %2538 = divi_signed %2537, %c8 : index + %2539 = subi %c-1, %2538 : index + %2540 = select %2535, %2539, %2538 : index + %2541 = addi %arg5, %c8 : index + %2542 = cmpi "slt", %2541, %c0 : index + %2543 = subi %c-1, %2541 : index + %2544 = select %2542, %2543, %2541 : index + %2545 = divi_signed %2544, %c16 : index + %2546 = subi %c-1, %2545 : index + %2547 = select %2542, %2546, %2545 : index + %2548 = muli %2547, %c-2 : index + %2549 = addi %2540, %2548 : index + %2550 = cmpi "slt", %arg5, %c0 : index + %2551 = subi %c-1, %arg5 : index + %2552 = select %2550, %2551, %arg5 : index + %2553 = divi_signed %2552, %c8 : index + %2554 = subi %c-1, %2553 : index + %2555 = select %2550, %2554, %2553 : index + %2556 = addi %arg5, %c8 : index + %2557 = cmpi "slt", %2556, %c0 : index + %2558 = subi %c-1, %2556 : index + %2559 = select %2557, %2558, %2556 : index + %2560 = divi_signed %2559, %c16 : index + %2561 = subi %c-1, %2560 : index + %2562 = select %2557, %2561, %2560 : index + %2563 = muli %2562, %c-2 : index + %2564 = addi %2555, %2563 : index + %2565 = addi %2564, %c1 : index + %2566 = cmpi "slt", %2565, %c0 : index + %2567 = subi %c-1, %2565 : index + %2568 = select %2566, %2567, %2565 : index + %2569 = divi_signed %2568, %c2 : index + %2570 = subi %c-1, %2569 : index + %2571 = select %2566, %2570, %2569 : index + %2572 = muli %2571, %c-2 : index + %2573 = addi %2549, %2572 : index + %2574 = addi %2573, %c1 : index + %2575 = load %2[%2534, %c0, %2574] : memref<16x6x2xvector<8xf32>> + %2576 = vector.insertelement %2207, %2575[%c3_i64 : 
i64] : vector<8xf32> + %2577 = addi %arg5, %c8 : index + %2578 = cmpi "slt", %2577, %c0 : index + %2579 = subi %c-1, %2577 : index + %2580 = select %2578, %2579, %2577 : index + %2581 = divi_signed %2580, %c16 : index + %2582 = subi %c-1, %2581 : index + %2583 = select %2578, %2582, %2581 : index + %2584 = remi_signed %2583, %c16 : index + %2585 = cmpi "slt", %2584, %c0 : index + %2586 = addi %2584, %c16 : index + %2587 = select %2585, %2586, %2584 : index + %2588 = cmpi "slt", %arg5, %c0 : index + %2589 = subi %c-1, %arg5 : index + %2590 = select %2588, %2589, %arg5 : index + %2591 = divi_signed %2590, %c8 : index + %2592 = subi %c-1, %2591 : index + %2593 = select %2588, %2592, %2591 : index + %2594 = addi %arg5, %c8 : index + %2595 = cmpi "slt", %2594, %c0 : index + %2596 = subi %c-1, %2594 : index + %2597 = select %2595, %2596, %2594 : index + %2598 = divi_signed %2597, %c16 : index + %2599 = subi %c-1, %2598 : index + %2600 = select %2595, %2599, %2598 : index + %2601 = muli %2600, %c-2 : index + %2602 = addi %2593, %2601 : index + %2603 = cmpi "slt", %arg5, %c0 : index + %2604 = subi %c-1, %arg5 : index + %2605 = select %2603, %2604, %arg5 : index + %2606 = divi_signed %2605, %c8 : index + %2607 = subi %c-1, %2606 : index + %2608 = select %2603, %2607, %2606 : index + %2609 = addi %arg5, %c8 : index + %2610 = cmpi "slt", %2609, %c0 : index + %2611 = subi %c-1, %2609 : index + %2612 = select %2610, %2611, %2609 : index + %2613 = divi_signed %2612, %c16 : index + %2614 = subi %c-1, %2613 : index + %2615 = select %2610, %2614, %2613 : index + %2616 = muli %2615, %c-2 : index + %2617 = addi %2608, %2616 : index + %2618 = addi %2617, %c1 : index + %2619 = cmpi "slt", %2618, %c0 : index + %2620 = subi %c-1, %2618 : index + %2621 = select %2619, %2620, %2618 : index + %2622 = divi_signed %2621, %c2 : index + %2623 = subi %c-1, %2622 : index + %2624 = select %2619, %2623, %2622 : index + %2625 = muli %2624, %c-2 : index + %2626 = addi %2602, %2625 : index + %2627 = addi %2626, %c1 : index + store %2576, %2[%2587, %c0, %2627] : memref<16x6x2xvector<8xf32>> + %2628 = addi %arg5, %c8 : index + %2629 = cmpi "slt", %2628, %c0 : index + %2630 = subi %c-1, %2628 : index + %2631 = select %2629, %2630, %2628 : index + %2632 = divi_signed %2631, %c16 : index + %2633 = subi %c-1, %2632 : index + %2634 = select %2629, %2633, %2632 : index + %2635 = remi_signed %2634, %c16 : index + %2636 = cmpi "slt", %2635, %c0 : index + %2637 = addi %2635, %c16 : index + %2638 = select %2636, %2637, %2635 : index + %2639 = cmpi "slt", %arg5, %c0 : index + %2640 = subi %c-1, %arg5 : index + %2641 = select %2639, %2640, %arg5 : index + %2642 = divi_signed %2641, %c8 : index + %2643 = subi %c-1, %2642 : index + %2644 = select %2639, %2643, %2642 : index + %2645 = addi %arg5, %c8 : index + %2646 = cmpi "slt", %2645, %c0 : index + %2647 = subi %c-1, %2645 : index + %2648 = select %2646, %2647, %2645 : index + %2649 = divi_signed %2648, %c16 : index + %2650 = subi %c-1, %2649 : index + %2651 = select %2646, %2650, %2649 : index + %2652 = muli %2651, %c-2 : index + %2653 = addi %2644, %2652 : index + %2654 = cmpi "slt", %arg5, %c0 : index + %2655 = subi %c-1, %arg5 : index + %2656 = select %2654, %2655, %arg5 : index + %2657 = divi_signed %2656, %c8 : index + %2658 = subi %c-1, %2657 : index + %2659 = select %2654, %2658, %2657 : index + %2660 = addi %arg5, %c8 : index + %2661 = cmpi "slt", %2660, %c0 : index + %2662 = subi %c-1, %2660 : index + %2663 = select %2661, %2662, %2660 : index + %2664 = divi_signed %2663, %c16 : 
index + %2665 = subi %c-1, %2664 : index + %2666 = select %2661, %2665, %2664 : index + %2667 = muli %2666, %c-2 : index + %2668 = addi %2659, %2667 : index + %2669 = addi %2668, %c1 : index + %2670 = cmpi "slt", %2669, %c0 : index + %2671 = subi %c-1, %2669 : index + %2672 = select %2670, %2671, %2669 : index + %2673 = divi_signed %2672, %c2 : index + %2674 = subi %c-1, %2673 : index + %2675 = select %2670, %2674, %2673 : index + %2676 = muli %2675, %c-2 : index + %2677 = addi %2653, %2676 : index + %2678 = addi %2677, %c1 : index + %2679 = load %2[%2638, %c0, %2678] : memref<16x6x2xvector<8xf32>> + %2680 = vector.insertelement %2208, %2679[%c4_i64 : i64] : vector<8xf32> + %2681 = addi %arg5, %c8 : index + %2682 = cmpi "slt", %2681, %c0 : index + %2683 = subi %c-1, %2681 : index + %2684 = select %2682, %2683, %2681 : index + %2685 = divi_signed %2684, %c16 : index + %2686 = subi %c-1, %2685 : index + %2687 = select %2682, %2686, %2685 : index + %2688 = remi_signed %2687, %c16 : index + %2689 = cmpi "slt", %2688, %c0 : index + %2690 = addi %2688, %c16 : index + %2691 = select %2689, %2690, %2688 : index + %2692 = cmpi "slt", %arg5, %c0 : index + %2693 = subi %c-1, %arg5 : index + %2694 = select %2692, %2693, %arg5 : index + %2695 = divi_signed %2694, %c8 : index + %2696 = subi %c-1, %2695 : index + %2697 = select %2692, %2696, %2695 : index + %2698 = addi %arg5, %c8 : index + %2699 = cmpi "slt", %2698, %c0 : index + %2700 = subi %c-1, %2698 : index + %2701 = select %2699, %2700, %2698 : index + %2702 = divi_signed %2701, %c16 : index + %2703 = subi %c-1, %2702 : index + %2704 = select %2699, %2703, %2702 : index + %2705 = muli %2704, %c-2 : index + %2706 = addi %2697, %2705 : index + %2707 = cmpi "slt", %arg5, %c0 : index + %2708 = subi %c-1, %arg5 : index + %2709 = select %2707, %2708, %arg5 : index + %2710 = divi_signed %2709, %c8 : index + %2711 = subi %c-1, %2710 : index + %2712 = select %2707, %2711, %2710 : index + %2713 = addi %arg5, %c8 : index + %2714 = cmpi "slt", %2713, %c0 : index + %2715 = subi %c-1, %2713 : index + %2716 = select %2714, %2715, %2713 : index + %2717 = divi_signed %2716, %c16 : index + %2718 = subi %c-1, %2717 : index + %2719 = select %2714, %2718, %2717 : index + %2720 = muli %2719, %c-2 : index + %2721 = addi %2712, %2720 : index + %2722 = addi %2721, %c1 : index + %2723 = cmpi "slt", %2722, %c0 : index + %2724 = subi %c-1, %2722 : index + %2725 = select %2723, %2724, %2722 : index + %2726 = divi_signed %2725, %c2 : index + %2727 = subi %c-1, %2726 : index + %2728 = select %2723, %2727, %2726 : index + %2729 = muli %2728, %c-2 : index + %2730 = addi %2706, %2729 : index + %2731 = addi %2730, %c1 : index + store %2680, %2[%2691, %c0, %2731] : memref<16x6x2xvector<8xf32>> + %2732 = addi %arg5, %c8 : index + %2733 = cmpi "slt", %2732, %c0 : index + %2734 = subi %c-1, %2732 : index + %2735 = select %2733, %2734, %2732 : index + %2736 = divi_signed %2735, %c16 : index + %2737 = subi %c-1, %2736 : index + %2738 = select %2733, %2737, %2736 : index + %2739 = remi_signed %2738, %c16 : index + %2740 = cmpi "slt", %2739, %c0 : index + %2741 = addi %2739, %c16 : index + %2742 = select %2740, %2741, %2739 : index + %2743 = cmpi "slt", %arg5, %c0 : index + %2744 = subi %c-1, %arg5 : index + %2745 = select %2743, %2744, %arg5 : index + %2746 = divi_signed %2745, %c8 : index + %2747 = subi %c-1, %2746 : index + %2748 = select %2743, %2747, %2746 : index + %2749 = addi %arg5, %c8 : index + %2750 = cmpi "slt", %2749, %c0 : index + %2751 = subi %c-1, %2749 : index + %2752 = 
select %2750, %2751, %2749 : index + %2753 = divi_signed %2752, %c16 : index + %2754 = subi %c-1, %2753 : index + %2755 = select %2750, %2754, %2753 : index + %2756 = muli %2755, %c-2 : index + %2757 = addi %2748, %2756 : index + %2758 = cmpi "slt", %arg5, %c0 : index + %2759 = subi %c-1, %arg5 : index + %2760 = select %2758, %2759, %arg5 : index + %2761 = divi_signed %2760, %c8 : index + %2762 = subi %c-1, %2761 : index + %2763 = select %2758, %2762, %2761 : index + %2764 = addi %arg5, %c8 : index + %2765 = cmpi "slt", %2764, %c0 : index + %2766 = subi %c-1, %2764 : index + %2767 = select %2765, %2766, %2764 : index + %2768 = divi_signed %2767, %c16 : index + %2769 = subi %c-1, %2768 : index + %2770 = select %2765, %2769, %2768 : index + %2771 = muli %2770, %c-2 : index + %2772 = addi %2763, %2771 : index + %2773 = addi %2772, %c1 : index + %2774 = cmpi "slt", %2773, %c0 : index + %2775 = subi %c-1, %2773 : index + %2776 = select %2774, %2775, %2773 : index + %2777 = divi_signed %2776, %c2 : index + %2778 = subi %c-1, %2777 : index + %2779 = select %2774, %2778, %2777 : index + %2780 = muli %2779, %c-2 : index + %2781 = addi %2757, %2780 : index + %2782 = addi %2781, %c1 : index + %2783 = load %2[%2742, %c0, %2782] : memref<16x6x2xvector<8xf32>> + %2784 = vector.insertelement %2209, %2783[%c5_i64 : i64] : vector<8xf32> + %2785 = addi %arg5, %c8 : index + %2786 = cmpi "slt", %2785, %c0 : index + %2787 = subi %c-1, %2785 : index + %2788 = select %2786, %2787, %2785 : index + %2789 = divi_signed %2788, %c16 : index + %2790 = subi %c-1, %2789 : index + %2791 = select %2786, %2790, %2789 : index + %2792 = remi_signed %2791, %c16 : index + %2793 = cmpi "slt", %2792, %c0 : index + %2794 = addi %2792, %c16 : index + %2795 = select %2793, %2794, %2792 : index + %2796 = cmpi "slt", %arg5, %c0 : index + %2797 = subi %c-1, %arg5 : index + %2798 = select %2796, %2797, %arg5 : index + %2799 = divi_signed %2798, %c8 : index + %2800 = subi %c-1, %2799 : index + %2801 = select %2796, %2800, %2799 : index + %2802 = addi %arg5, %c8 : index + %2803 = cmpi "slt", %2802, %c0 : index + %2804 = subi %c-1, %2802 : index + %2805 = select %2803, %2804, %2802 : index + %2806 = divi_signed %2805, %c16 : index + %2807 = subi %c-1, %2806 : index + %2808 = select %2803, %2807, %2806 : index + %2809 = muli %2808, %c-2 : index + %2810 = addi %2801, %2809 : index + %2811 = cmpi "slt", %arg5, %c0 : index + %2812 = subi %c-1, %arg5 : index + %2813 = select %2811, %2812, %arg5 : index + %2814 = divi_signed %2813, %c8 : index + %2815 = subi %c-1, %2814 : index + %2816 = select %2811, %2815, %2814 : index + %2817 = addi %arg5, %c8 : index + %2818 = cmpi "slt", %2817, %c0 : index + %2819 = subi %c-1, %2817 : index + %2820 = select %2818, %2819, %2817 : index + %2821 = divi_signed %2820, %c16 : index + %2822 = subi %c-1, %2821 : index + %2823 = select %2818, %2822, %2821 : index + %2824 = muli %2823, %c-2 : index + %2825 = addi %2816, %2824 : index + %2826 = addi %2825, %c1 : index + %2827 = cmpi "slt", %2826, %c0 : index + %2828 = subi %c-1, %2826 : index + %2829 = select %2827, %2828, %2826 : index + %2830 = divi_signed %2829, %c2 : index + %2831 = subi %c-1, %2830 : index + %2832 = select %2827, %2831, %2830 : index + %2833 = muli %2832, %c-2 : index + %2834 = addi %2810, %2833 : index + %2835 = addi %2834, %c1 : index + store %2784, %2[%2795, %c0, %2835] : memref<16x6x2xvector<8xf32>> + %2836 = addi %arg5, %c8 : index + %2837 = cmpi "slt", %2836, %c0 : index + %2838 = subi %c-1, %2836 : index + %2839 = select %2837, %2838, 
%2836 : index + %2840 = divi_signed %2839, %c16 : index + %2841 = subi %c-1, %2840 : index + %2842 = select %2837, %2841, %2840 : index + %2843 = remi_signed %2842, %c16 : index + %2844 = cmpi "slt", %2843, %c0 : index + %2845 = addi %2843, %c16 : index + %2846 = select %2844, %2845, %2843 : index + %2847 = cmpi "slt", %arg5, %c0 : index + %2848 = subi %c-1, %arg5 : index + %2849 = select %2847, %2848, %arg5 : index + %2850 = divi_signed %2849, %c8 : index + %2851 = subi %c-1, %2850 : index + %2852 = select %2847, %2851, %2850 : index + %2853 = addi %arg5, %c8 : index + %2854 = cmpi "slt", %2853, %c0 : index + %2855 = subi %c-1, %2853 : index + %2856 = select %2854, %2855, %2853 : index + %2857 = divi_signed %2856, %c16 : index + %2858 = subi %c-1, %2857 : index + %2859 = select %2854, %2858, %2857 : index + %2860 = muli %2859, %c-2 : index + %2861 = addi %2852, %2860 : index + %2862 = cmpi "slt", %arg5, %c0 : index + %2863 = subi %c-1, %arg5 : index + %2864 = select %2862, %2863, %arg5 : index + %2865 = divi_signed %2864, %c8 : index + %2866 = subi %c-1, %2865 : index + %2867 = select %2862, %2866, %2865 : index + %2868 = addi %arg5, %c8 : index + %2869 = cmpi "slt", %2868, %c0 : index + %2870 = subi %c-1, %2868 : index + %2871 = select %2869, %2870, %2868 : index + %2872 = divi_signed %2871, %c16 : index + %2873 = subi %c-1, %2872 : index + %2874 = select %2869, %2873, %2872 : index + %2875 = muli %2874, %c-2 : index + %2876 = addi %2867, %2875 : index + %2877 = addi %2876, %c1 : index + %2878 = cmpi "slt", %2877, %c0 : index + %2879 = subi %c-1, %2877 : index + %2880 = select %2878, %2879, %2877 : index + %2881 = divi_signed %2880, %c2 : index + %2882 = subi %c-1, %2881 : index + %2883 = select %2878, %2882, %2881 : index + %2884 = muli %2883, %c-2 : index + %2885 = addi %2861, %2884 : index + %2886 = addi %2885, %c1 : index + %2887 = load %2[%2846, %c0, %2886] : memref<16x6x2xvector<8xf32>> + %2888 = vector.insertelement %2210, %2887[%c6_i64 : i64] : vector<8xf32> + %2889 = addi %arg5, %c8 : index + %2890 = cmpi "slt", %2889, %c0 : index + %2891 = subi %c-1, %2889 : index + %2892 = select %2890, %2891, %2889 : index + %2893 = divi_signed %2892, %c16 : index + %2894 = subi %c-1, %2893 : index + %2895 = select %2890, %2894, %2893 : index + %2896 = remi_signed %2895, %c16 : index + %2897 = cmpi "slt", %2896, %c0 : index + %2898 = addi %2896, %c16 : index + %2899 = select %2897, %2898, %2896 : index + %2900 = cmpi "slt", %arg5, %c0 : index + %2901 = subi %c-1, %arg5 : index + %2902 = select %2900, %2901, %arg5 : index + %2903 = divi_signed %2902, %c8 : index + %2904 = subi %c-1, %2903 : index + %2905 = select %2900, %2904, %2903 : index + %2906 = addi %arg5, %c8 : index + %2907 = cmpi "slt", %2906, %c0 : index + %2908 = subi %c-1, %2906 : index + %2909 = select %2907, %2908, %2906 : index + %2910 = divi_signed %2909, %c16 : index + %2911 = subi %c-1, %2910 : index + %2912 = select %2907, %2911, %2910 : index + %2913 = muli %2912, %c-2 : index + %2914 = addi %2905, %2913 : index + %2915 = cmpi "slt", %arg5, %c0 : index + %2916 = subi %c-1, %arg5 : index + %2917 = select %2915, %2916, %arg5 : index + %2918 = divi_signed %2917, %c8 : index + %2919 = subi %c-1, %2918 : index + %2920 = select %2915, %2919, %2918 : index + %2921 = addi %arg5, %c8 : index + %2922 = cmpi "slt", %2921, %c0 : index + %2923 = subi %c-1, %2921 : index + %2924 = select %2922, %2923, %2921 : index + %2925 = divi_signed %2924, %c16 : index + %2926 = subi %c-1, %2925 : index + %2927 = select %2922, %2926, %2925 : index + 
%2928 = muli %2927, %c-2 : index + %2929 = addi %2920, %2928 : index + %2930 = addi %2929, %c1 : index + %2931 = cmpi "slt", %2930, %c0 : index + %2932 = subi %c-1, %2930 : index + %2933 = select %2931, %2932, %2930 : index + %2934 = divi_signed %2933, %c2 : index + %2935 = subi %c-1, %2934 : index + %2936 = select %2931, %2935, %2934 : index + %2937 = muli %2936, %c-2 : index + %2938 = addi %2914, %2937 : index + %2939 = addi %2938, %c1 : index + store %2888, %2[%2899, %c0, %2939] : memref<16x6x2xvector<8xf32>> + %2940 = addi %arg5, %c8 : index + %2941 = cmpi "slt", %2940, %c0 : index + %2942 = subi %c-1, %2940 : index + %2943 = select %2941, %2942, %2940 : index + %2944 = divi_signed %2943, %c16 : index + %2945 = subi %c-1, %2944 : index + %2946 = select %2941, %2945, %2944 : index + %2947 = remi_signed %2946, %c16 : index + %2948 = cmpi "slt", %2947, %c0 : index + %2949 = addi %2947, %c16 : index + %2950 = select %2948, %2949, %2947 : index + %2951 = cmpi "slt", %arg5, %c0 : index + %2952 = subi %c-1, %arg5 : index + %2953 = select %2951, %2952, %arg5 : index + %2954 = divi_signed %2953, %c8 : index + %2955 = subi %c-1, %2954 : index + %2956 = select %2951, %2955, %2954 : index + %2957 = addi %arg5, %c8 : index + %2958 = cmpi "slt", %2957, %c0 : index + %2959 = subi %c-1, %2957 : index + %2960 = select %2958, %2959, %2957 : index + %2961 = divi_signed %2960, %c16 : index + %2962 = subi %c-1, %2961 : index + %2963 = select %2958, %2962, %2961 : index + %2964 = muli %2963, %c-2 : index + %2965 = addi %2956, %2964 : index + %2966 = cmpi "slt", %arg5, %c0 : index + %2967 = subi %c-1, %arg5 : index + %2968 = select %2966, %2967, %arg5 : index + %2969 = divi_signed %2968, %c8 : index + %2970 = subi %c-1, %2969 : index + %2971 = select %2966, %2970, %2969 : index + %2972 = addi %arg5, %c8 : index + %2973 = cmpi "slt", %2972, %c0 : index + %2974 = subi %c-1, %2972 : index + %2975 = select %2973, %2974, %2972 : index + %2976 = divi_signed %2975, %c16 : index + %2977 = subi %c-1, %2976 : index + %2978 = select %2973, %2977, %2976 : index + %2979 = muli %2978, %c-2 : index + %2980 = addi %2971, %2979 : index + %2981 = addi %2980, %c1 : index + %2982 = cmpi "slt", %2981, %c0 : index + %2983 = subi %c-1, %2981 : index + %2984 = select %2982, %2983, %2981 : index + %2985 = divi_signed %2984, %c2 : index + %2986 = subi %c-1, %2985 : index + %2987 = select %2982, %2986, %2985 : index + %2988 = muli %2987, %c-2 : index + %2989 = addi %2965, %2988 : index + %2990 = addi %2989, %c1 : index + %2991 = load %2[%2950, %c0, %2990] : memref<16x6x2xvector<8xf32>> + %2992 = vector.insertelement %2211, %2991[%c7_i64 : i64] : vector<8xf32> + %2993 = addi %arg5, %c8 : index + %2994 = cmpi "slt", %2993, %c0 : index + %2995 = subi %c-1, %2993 : index + %2996 = select %2994, %2995, %2993 : index + %2997 = divi_signed %2996, %c16 : index + %2998 = subi %c-1, %2997 : index + %2999 = select %2994, %2998, %2997 : index + %3000 = remi_signed %2999, %c16 : index + %3001 = cmpi "slt", %3000, %c0 : index + %3002 = addi %3000, %c16 : index + %3003 = select %3001, %3002, %3000 : index + %3004 = cmpi "slt", %arg5, %c0 : index + %3005 = subi %c-1, %arg5 : index + %3006 = select %3004, %3005, %arg5 : index + %3007 = divi_signed %3006, %c8 : index + %3008 = subi %c-1, %3007 : index + %3009 = select %3004, %3008, %3007 : index + %3010 = addi %arg5, %c8 : index + %3011 = cmpi "slt", %3010, %c0 : index + %3012 = subi %c-1, %3010 : index + %3013 = select %3011, %3012, %3010 : index + %3014 = divi_signed %3013, %c16 : index + %3015 = 
subi %c-1, %3014 : index + %3016 = select %3011, %3015, %3014 : index + %3017 = muli %3016, %c-2 : index + %3018 = addi %3009, %3017 : index + %3019 = cmpi "slt", %arg5, %c0 : index + %3020 = subi %c-1, %arg5 : index + %3021 = select %3019, %3020, %arg5 : index + %3022 = divi_signed %3021, %c8 : index + %3023 = subi %c-1, %3022 : index + %3024 = select %3019, %3023, %3022 : index + %3025 = addi %arg5, %c8 : index + %3026 = cmpi "slt", %3025, %c0 : index + %3027 = subi %c-1, %3025 : index + %3028 = select %3026, %3027, %3025 : index + %3029 = divi_signed %3028, %c16 : index + %3030 = subi %c-1, %3029 : index + %3031 = select %3026, %3030, %3029 : index + %3032 = muli %3031, %c-2 : index + %3033 = addi %3024, %3032 : index + %3034 = addi %3033, %c1 : index + %3035 = cmpi "slt", %3034, %c0 : index + %3036 = subi %c-1, %3034 : index + %3037 = select %3035, %3036, %3034 : index + %3038 = divi_signed %3037, %c2 : index + %3039 = subi %c-1, %3038 : index + %3040 = select %3035, %3039, %3038 : index + %3041 = muli %3040, %c-2 : index + %3042 = addi %3018, %3041 : index + %3043 = addi %3042, %c1 : index + store %2992, %2[%3003, %c0, %3043] : memref<16x6x2xvector<8xf32>> + %3044 = addi %arg5, %c8 : index + %3045 = cmpi "slt", %3044, %c0 : index + %3046 = subi %c-1, %3044 : index + %3047 = select %3045, %3046, %3044 : index + %3048 = divi_signed %3047, %c16 : index + %3049 = subi %c-1, %3048 : index + %3050 = select %3045, %3049, %3048 : index + %3051 = remi_signed %3050, %c16 : index + %3052 = cmpi "slt", %3051, %c0 : index + %3053 = addi %3051, %c16 : index + %3054 = select %3052, %3053, %3051 : index + %3055 = cmpi "slt", %arg5, %c0 : index + %3056 = subi %c-1, %arg5 : index + %3057 = select %3055, %3056, %arg5 : index + %3058 = divi_signed %3057, %c8 : index + %3059 = subi %c-1, %3058 : index + %3060 = select %3055, %3059, %3058 : index + %3061 = addi %arg5, %c8 : index + %3062 = cmpi "slt", %3061, %c0 : index + %3063 = subi %c-1, %3061 : index + %3064 = select %3062, %3063, %3061 : index + %3065 = divi_signed %3064, %c16 : index + %3066 = subi %c-1, %3065 : index + %3067 = select %3062, %3066, %3065 : index + %3068 = muli %3067, %c-2 : index + %3069 = addi %3060, %3068 : index + %3070 = cmpi "slt", %arg5, %c0 : index + %3071 = subi %c-1, %arg5 : index + %3072 = select %3070, %3071, %arg5 : index + %3073 = divi_signed %3072, %c8 : index + %3074 = subi %c-1, %3073 : index + %3075 = select %3070, %3074, %3073 : index + %3076 = addi %arg5, %c8 : index + %3077 = cmpi "slt", %3076, %c0 : index + %3078 = subi %c-1, %3076 : index + %3079 = select %3077, %3078, %3076 : index + %3080 = divi_signed %3079, %c16 : index + %3081 = subi %c-1, %3080 : index + %3082 = select %3077, %3081, %3080 : index + %3083 = muli %3082, %c-2 : index + %3084 = addi %3075, %3083 : index + %3085 = addi %3084, %c1 : index + %3086 = cmpi "slt", %3085, %c0 : index + %3087 = subi %c-1, %3085 : index + %3088 = select %3086, %3087, %3085 : index + %3089 = divi_signed %3088, %c2 : index + %3090 = subi %c-1, %3089 : index + %3091 = select %3086, %3090, %3089 : index + %3092 = muli %3091, %c-2 : index + %3093 = addi %3069, %3092 : index + %3094 = addi %3093, %c1 : index + %3095 = load %2[%3054, %c0, %3094] : memref<16x6x2xvector<8xf32>> + %3096 = vector.insertelement %2204, %3095[%c0_i64 : i64] : vector<8xf32> + %3097 = addi %arg5, %c8 : index + %3098 = cmpi "slt", %3097, %c0 : index + %3099 = subi %c-1, %3097 : index + %3100 = select %3098, %3099, %3097 : index + %3101 = divi_signed %3100, %c16 : index + %3102 = subi %c-1, %3101 : 
index + %3103 = select %3098, %3102, %3101 : index + %3104 = remi_signed %3103, %c16 : index + %3105 = cmpi "slt", %3104, %c0 : index + %3106 = addi %3104, %c16 : index + %3107 = select %3105, %3106, %3104 : index + %3108 = cmpi "slt", %arg5, %c0 : index + %3109 = subi %c-1, %arg5 : index + %3110 = select %3108, %3109, %arg5 : index + %3111 = divi_signed %3110, %c8 : index + %3112 = subi %c-1, %3111 : index + %3113 = select %3108, %3112, %3111 : index + %3114 = addi %arg5, %c8 : index + %3115 = cmpi "slt", %3114, %c0 : index + %3116 = subi %c-1, %3114 : index + %3117 = select %3115, %3116, %3114 : index + %3118 = divi_signed %3117, %c16 : index + %3119 = subi %c-1, %3118 : index + %3120 = select %3115, %3119, %3118 : index + %3121 = muli %3120, %c-2 : index + %3122 = addi %3113, %3121 : index + %3123 = cmpi "slt", %arg5, %c0 : index + %3124 = subi %c-1, %arg5 : index + %3125 = select %3123, %3124, %arg5 : index + %3126 = divi_signed %3125, %c8 : index + %3127 = subi %c-1, %3126 : index + %3128 = select %3123, %3127, %3126 : index + %3129 = addi %arg5, %c8 : index + %3130 = cmpi "slt", %3129, %c0 : index + %3131 = subi %c-1, %3129 : index + %3132 = select %3130, %3131, %3129 : index + %3133 = divi_signed %3132, %c16 : index + %3134 = subi %c-1, %3133 : index + %3135 = select %3130, %3134, %3133 : index + %3136 = muli %3135, %c-2 : index + %3137 = addi %3128, %3136 : index + %3138 = addi %3137, %c1 : index + %3139 = cmpi "slt", %3138, %c0 : index + %3140 = subi %c-1, %3138 : index + %3141 = select %3139, %3140, %3138 : index + %3142 = divi_signed %3141, %c2 : index + %3143 = subi %c-1, %3142 : index + %3144 = select %3139, %3143, %3142 : index + %3145 = muli %3144, %c-2 : index + %3146 = addi %3122, %3145 : index + %3147 = addi %3146, %c1 : index + store %3096, %2[%3107, %c0, %3147] : memref<16x6x2xvector<8xf32>> + %3148 = addi %arg5, %c8 : index + %3149 = cmpi "slt", %3148, %c0 : index + %3150 = subi %c-1, %3148 : index + %3151 = select %3149, %3150, %3148 : index + %3152 = divi_signed %3151, %c16 : index + %3153 = subi %c-1, %3152 : index + %3154 = select %3149, %3153, %3152 : index + %3155 = remi_signed %3154, %c16 : index + %3156 = cmpi "slt", %3155, %c0 : index + %3157 = addi %3155, %c16 : index + %3158 = select %3156, %3157, %3155 : index + %3159 = cmpi "slt", %arg5, %c0 : index + %3160 = subi %c-1, %arg5 : index + %3161 = select %3159, %3160, %arg5 : index + %3162 = divi_signed %3161, %c8 : index + %3163 = subi %c-1, %3162 : index + %3164 = select %3159, %3163, %3162 : index + %3165 = addi %arg5, %c8 : index + %3166 = cmpi "slt", %3165, %c0 : index + %3167 = subi %c-1, %3165 : index + %3168 = select %3166, %3167, %3165 : index + %3169 = divi_signed %3168, %c16 : index + %3170 = subi %c-1, %3169 : index + %3171 = select %3166, %3170, %3169 : index + %3172 = muli %3171, %c-2 : index + %3173 = addi %3164, %3172 : index + %3174 = cmpi "slt", %arg5, %c0 : index + %3175 = subi %c-1, %arg5 : index + %3176 = select %3174, %3175, %arg5 : index + %3177 = divi_signed %3176, %c8 : index + %3178 = subi %c-1, %3177 : index + %3179 = select %3174, %3178, %3177 : index + %3180 = addi %arg5, %c8 : index + %3181 = cmpi "slt", %3180, %c0 : index + %3182 = subi %c-1, %3180 : index + %3183 = select %3181, %3182, %3180 : index + %3184 = divi_signed %3183, %c16 : index + %3185 = subi %c-1, %3184 : index + %3186 = select %3181, %3185, %3184 : index + %3187 = muli %3186, %c-2 : index + %3188 = addi %3179, %3187 : index + %3189 = addi %3188, %c1 : index + %3190 = cmpi "slt", %3189, %c0 : index + %3191 = subi 
%c-1, %3189 : index + %3192 = select %3190, %3191, %3189 : index + %3193 = divi_signed %3192, %c2 : index + %3194 = subi %c-1, %3193 : index + %3195 = select %3190, %3194, %3193 : index + %3196 = muli %3195, %c-2 : index + %3197 = addi %3173, %3196 : index + %3198 = addi %3197, %c1 : index + %3199 = load %2[%3158, %c0, %3198] : memref<16x6x2xvector<8xf32>> + %3200 = vector.insertelement %2205, %3199[%c1_i64 : i64] : vector<8xf32> + %3201 = addi %arg5, %c8 : index + %3202 = cmpi "slt", %3201, %c0 : index + %3203 = subi %c-1, %3201 : index + %3204 = select %3202, %3203, %3201 : index + %3205 = divi_signed %3204, %c16 : index + %3206 = subi %c-1, %3205 : index + %3207 = select %3202, %3206, %3205 : index + %3208 = remi_signed %3207, %c16 : index + %3209 = cmpi "slt", %3208, %c0 : index + %3210 = addi %3208, %c16 : index + %3211 = select %3209, %3210, %3208 : index + %3212 = cmpi "slt", %arg5, %c0 : index + %3213 = subi %c-1, %arg5 : index + %3214 = select %3212, %3213, %arg5 : index + %3215 = divi_signed %3214, %c8 : index + %3216 = subi %c-1, %3215 : index + %3217 = select %3212, %3216, %3215 : index + %3218 = addi %arg5, %c8 : index + %3219 = cmpi "slt", %3218, %c0 : index + %3220 = subi %c-1, %3218 : index + %3221 = select %3219, %3220, %3218 : index + %3222 = divi_signed %3221, %c16 : index + %3223 = subi %c-1, %3222 : index + %3224 = select %3219, %3223, %3222 : index + %3225 = muli %3224, %c-2 : index + %3226 = addi %3217, %3225 : index + %3227 = cmpi "slt", %arg5, %c0 : index + %3228 = subi %c-1, %arg5 : index + %3229 = select %3227, %3228, %arg5 : index + %3230 = divi_signed %3229, %c8 : index + %3231 = subi %c-1, %3230 : index + %3232 = select %3227, %3231, %3230 : index + %3233 = addi %arg5, %c8 : index + %3234 = cmpi "slt", %3233, %c0 : index + %3235 = subi %c-1, %3233 : index + %3236 = select %3234, %3235, %3233 : index + %3237 = divi_signed %3236, %c16 : index + %3238 = subi %c-1, %3237 : index + %3239 = select %3234, %3238, %3237 : index + %3240 = muli %3239, %c-2 : index + %3241 = addi %3232, %3240 : index + %3242 = addi %3241, %c1 : index + %3243 = cmpi "slt", %3242, %c0 : index + %3244 = subi %c-1, %3242 : index + %3245 = select %3243, %3244, %3242 : index + %3246 = divi_signed %3245, %c2 : index + %3247 = subi %c-1, %3246 : index + %3248 = select %3243, %3247, %3246 : index + %3249 = muli %3248, %c-2 : index + %3250 = addi %3226, %3249 : index + %3251 = addi %3250, %c1 : index + store %3200, %2[%3211, %c0, %3251] : memref<16x6x2xvector<8xf32>> + %3252 = addi %arg5, %c8 : index + %3253 = cmpi "slt", %3252, %c0 : index + %3254 = subi %c-1, %3252 : index + %3255 = select %3253, %3254, %3252 : index + %3256 = divi_signed %3255, %c16 : index + %3257 = subi %c-1, %3256 : index + %3258 = select %3253, %3257, %3256 : index + %3259 = remi_signed %3258, %c16 : index + %3260 = cmpi "slt", %3259, %c0 : index + %3261 = addi %3259, %c16 : index + %3262 = select %3260, %3261, %3259 : index + %3263 = cmpi "slt", %arg5, %c0 : index + %3264 = subi %c-1, %arg5 : index + %3265 = select %3263, %3264, %arg5 : index + %3266 = divi_signed %3265, %c8 : index + %3267 = subi %c-1, %3266 : index + %3268 = select %3263, %3267, %3266 : index + %3269 = addi %arg5, %c8 : index + %3270 = cmpi "slt", %3269, %c0 : index + %3271 = subi %c-1, %3269 : index + %3272 = select %3270, %3271, %3269 : index + %3273 = divi_signed %3272, %c16 : index + %3274 = subi %c-1, %3273 : index + %3275 = select %3270, %3274, %3273 : index + %3276 = muli %3275, %c-2 : index + %3277 = addi %3268, %3276 : index + %3278 = cmpi "slt", 
%arg5, %c0 : index + %3279 = subi %c-1, %arg5 : index + %3280 = select %3278, %3279, %arg5 : index + %3281 = divi_signed %3280, %c8 : index + %3282 = subi %c-1, %3281 : index + %3283 = select %3278, %3282, %3281 : index + %3284 = addi %arg5, %c8 : index + %3285 = cmpi "slt", %3284, %c0 : index + %3286 = subi %c-1, %3284 : index + %3287 = select %3285, %3286, %3284 : index + %3288 = divi_signed %3287, %c16 : index + %3289 = subi %c-1, %3288 : index + %3290 = select %3285, %3289, %3288 : index + %3291 = muli %3290, %c-2 : index + %3292 = addi %3283, %3291 : index + %3293 = addi %3292, %c1 : index + %3294 = cmpi "slt", %3293, %c0 : index + %3295 = subi %c-1, %3293 : index + %3296 = select %3294, %3295, %3293 : index + %3297 = divi_signed %3296, %c2 : index + %3298 = subi %c-1, %3297 : index + %3299 = select %3294, %3298, %3297 : index + %3300 = muli %3299, %c-2 : index + %3301 = addi %3277, %3300 : index + %3302 = addi %3301, %c1 : index + %3303 = load %2[%3262, %c0, %3302] : memref<16x6x2xvector<8xf32>> + %3304 = vector.insertelement %2206, %3303[%c2_i64 : i64] : vector<8xf32> + %3305 = addi %arg5, %c8 : index + %3306 = cmpi "slt", %3305, %c0 : index + %3307 = subi %c-1, %3305 : index + %3308 = select %3306, %3307, %3305 : index + %3309 = divi_signed %3308, %c16 : index + %3310 = subi %c-1, %3309 : index + %3311 = select %3306, %3310, %3309 : index + %3312 = remi_signed %3311, %c16 : index + %3313 = cmpi "slt", %3312, %c0 : index + %3314 = addi %3312, %c16 : index + %3315 = select %3313, %3314, %3312 : index + %3316 = cmpi "slt", %arg5, %c0 : index + %3317 = subi %c-1, %arg5 : index + %3318 = select %3316, %3317, %arg5 : index + %3319 = divi_signed %3318, %c8 : index + %3320 = subi %c-1, %3319 : index + %3321 = select %3316, %3320, %3319 : index + %3322 = addi %arg5, %c8 : index + %3323 = cmpi "slt", %3322, %c0 : index + %3324 = subi %c-1, %3322 : index + %3325 = select %3323, %3324, %3322 : index + %3326 = divi_signed %3325, %c16 : index + %3327 = subi %c-1, %3326 : index + %3328 = select %3323, %3327, %3326 : index + %3329 = muli %3328, %c-2 : index + %3330 = addi %3321, %3329 : index + %3331 = cmpi "slt", %arg5, %c0 : index + %3332 = subi %c-1, %arg5 : index + %3333 = select %3331, %3332, %arg5 : index + %3334 = divi_signed %3333, %c8 : index + %3335 = subi %c-1, %3334 : index + %3336 = select %3331, %3335, %3334 : index + %3337 = addi %arg5, %c8 : index + %3338 = cmpi "slt", %3337, %c0 : index + %3339 = subi %c-1, %3337 : index + %3340 = select %3338, %3339, %3337 : index + %3341 = divi_signed %3340, %c16 : index + %3342 = subi %c-1, %3341 : index + %3343 = select %3338, %3342, %3341 : index + %3344 = muli %3343, %c-2 : index + %3345 = addi %3336, %3344 : index + %3346 = addi %3345, %c1 : index + %3347 = cmpi "slt", %3346, %c0 : index + %3348 = subi %c-1, %3346 : index + %3349 = select %3347, %3348, %3346 : index + %3350 = divi_signed %3349, %c2 : index + %3351 = subi %c-1, %3350 : index + %3352 = select %3347, %3351, %3350 : index + %3353 = muli %3352, %c-2 : index + %3354 = addi %3330, %3353 : index + %3355 = addi %3354, %c1 : index + store %3304, %2[%3315, %c0, %3355] : memref<16x6x2xvector<8xf32>> + %3356 = addi %arg5, %c8 : index + %3357 = cmpi "slt", %3356, %c0 : index + %3358 = subi %c-1, %3356 : index + %3359 = select %3357, %3358, %3356 : index + %3360 = divi_signed %3359, %c16 : index + %3361 = subi %c-1, %3360 : index + %3362 = select %3357, %3361, %3360 : index + %3363 = remi_signed %3362, %c16 : index + %3364 = cmpi "slt", %3363, %c0 : index + %3365 = addi %3363, %c16 : 
index + %3366 = select %3364, %3365, %3363 : index + %3367 = cmpi "slt", %arg5, %c0 : index + %3368 = subi %c-1, %arg5 : index + %3369 = select %3367, %3368, %arg5 : index + %3370 = divi_signed %3369, %c8 : index + %3371 = subi %c-1, %3370 : index + %3372 = select %3367, %3371, %3370 : index + %3373 = addi %arg5, %c8 : index + %3374 = cmpi "slt", %3373, %c0 : index + %3375 = subi %c-1, %3373 : index + %3376 = select %3374, %3375, %3373 : index + %3377 = divi_signed %3376, %c16 : index + %3378 = subi %c-1, %3377 : index + %3379 = select %3374, %3378, %3377 : index + %3380 = muli %3379, %c-2 : index + %3381 = addi %3372, %3380 : index + %3382 = cmpi "slt", %arg5, %c0 : index + %3383 = subi %c-1, %arg5 : index + %3384 = select %3382, %3383, %arg5 : index + %3385 = divi_signed %3384, %c8 : index + %3386 = subi %c-1, %3385 : index + %3387 = select %3382, %3386, %3385 : index + %3388 = addi %arg5, %c8 : index + %3389 = cmpi "slt", %3388, %c0 : index + %3390 = subi %c-1, %3388 : index + %3391 = select %3389, %3390, %3388 : index + %3392 = divi_signed %3391, %c16 : index + %3393 = subi %c-1, %3392 : index + %3394 = select %3389, %3393, %3392 : index + %3395 = muli %3394, %c-2 : index + %3396 = addi %3387, %3395 : index + %3397 = addi %3396, %c1 : index + %3398 = cmpi "slt", %3397, %c0 : index + %3399 = subi %c-1, %3397 : index + %3400 = select %3398, %3399, %3397 : index + %3401 = divi_signed %3400, %c2 : index + %3402 = subi %c-1, %3401 : index + %3403 = select %3398, %3402, %3401 : index + %3404 = muli %3403, %c-2 : index + %3405 = addi %3381, %3404 : index + %3406 = addi %3405, %c1 : index + %3407 = load %2[%3366, %c0, %3406] : memref<16x6x2xvector<8xf32>> + %3408 = vector.insertelement %2207, %3407[%c3_i64 : i64] : vector<8xf32> + %3409 = addi %arg5, %c8 : index + %3410 = cmpi "slt", %3409, %c0 : index + %3411 = subi %c-1, %3409 : index + %3412 = select %3410, %3411, %3409 : index + %3413 = divi_signed %3412, %c16 : index + %3414 = subi %c-1, %3413 : index + %3415 = select %3410, %3414, %3413 : index + %3416 = remi_signed %3415, %c16 : index + %3417 = cmpi "slt", %3416, %c0 : index + %3418 = addi %3416, %c16 : index + %3419 = select %3417, %3418, %3416 : index + %3420 = cmpi "slt", %arg5, %c0 : index + %3421 = subi %c-1, %arg5 : index + %3422 = select %3420, %3421, %arg5 : index + %3423 = divi_signed %3422, %c8 : index + %3424 = subi %c-1, %3423 : index + %3425 = select %3420, %3424, %3423 : index + %3426 = addi %arg5, %c8 : index + %3427 = cmpi "slt", %3426, %c0 : index + %3428 = subi %c-1, %3426 : index + %3429 = select %3427, %3428, %3426 : index + %3430 = divi_signed %3429, %c16 : index + %3431 = subi %c-1, %3430 : index + %3432 = select %3427, %3431, %3430 : index + %3433 = muli %3432, %c-2 : index + %3434 = addi %3425, %3433 : index + %3435 = cmpi "slt", %arg5, %c0 : index + %3436 = subi %c-1, %arg5 : index + %3437 = select %3435, %3436, %arg5 : index + %3438 = divi_signed %3437, %c8 : index + %3439 = subi %c-1, %3438 : index + %3440 = select %3435, %3439, %3438 : index + %3441 = addi %arg5, %c8 : index + %3442 = cmpi "slt", %3441, %c0 : index + %3443 = subi %c-1, %3441 : index + %3444 = select %3442, %3443, %3441 : index + %3445 = divi_signed %3444, %c16 : index + %3446 = subi %c-1, %3445 : index + %3447 = select %3442, %3446, %3445 : index + %3448 = muli %3447, %c-2 : index + %3449 = addi %3440, %3448 : index + %3450 = addi %3449, %c1 : index + %3451 = cmpi "slt", %3450, %c0 : index + %3452 = subi %c-1, %3450 : index + %3453 = select %3451, %3452, %3450 : index + %3454 = divi_signed 
%3453, %c2 : index + %3455 = subi %c-1, %3454 : index + %3456 = select %3451, %3455, %3454 : index + %3457 = muli %3456, %c-2 : index + %3458 = addi %3434, %3457 : index + %3459 = addi %3458, %c1 : index + store %3408, %2[%3419, %c0, %3459] : memref<16x6x2xvector<8xf32>> + %3460 = addi %arg5, %c8 : index + %3461 = cmpi "slt", %3460, %c0 : index + %3462 = subi %c-1, %3460 : index + %3463 = select %3461, %3462, %3460 : index + %3464 = divi_signed %3463, %c16 : index + %3465 = subi %c-1, %3464 : index + %3466 = select %3461, %3465, %3464 : index + %3467 = remi_signed %3466, %c16 : index + %3468 = cmpi "slt", %3467, %c0 : index + %3469 = addi %3467, %c16 : index + %3470 = select %3468, %3469, %3467 : index + %3471 = cmpi "slt", %arg5, %c0 : index + %3472 = subi %c-1, %arg5 : index + %3473 = select %3471, %3472, %arg5 : index + %3474 = divi_signed %3473, %c8 : index + %3475 = subi %c-1, %3474 : index + %3476 = select %3471, %3475, %3474 : index + %3477 = addi %arg5, %c8 : index + %3478 = cmpi "slt", %3477, %c0 : index + %3479 = subi %c-1, %3477 : index + %3480 = select %3478, %3479, %3477 : index + %3481 = divi_signed %3480, %c16 : index + %3482 = subi %c-1, %3481 : index + %3483 = select %3478, %3482, %3481 : index + %3484 = muli %3483, %c-2 : index + %3485 = addi %3476, %3484 : index + %3486 = cmpi "slt", %arg5, %c0 : index + %3487 = subi %c-1, %arg5 : index + %3488 = select %3486, %3487, %arg5 : index + %3489 = divi_signed %3488, %c8 : index + %3490 = subi %c-1, %3489 : index + %3491 = select %3486, %3490, %3489 : index + %3492 = addi %arg5, %c8 : index + %3493 = cmpi "slt", %3492, %c0 : index + %3494 = subi %c-1, %3492 : index + %3495 = select %3493, %3494, %3492 : index + %3496 = divi_signed %3495, %c16 : index + %3497 = subi %c-1, %3496 : index + %3498 = select %3493, %3497, %3496 : index + %3499 = muli %3498, %c-2 : index + %3500 = addi %3491, %3499 : index + %3501 = addi %3500, %c1 : index + %3502 = cmpi "slt", %3501, %c0 : index + %3503 = subi %c-1, %3501 : index + %3504 = select %3502, %3503, %3501 : index + %3505 = divi_signed %3504, %c2 : index + %3506 = subi %c-1, %3505 : index + %3507 = select %3502, %3506, %3505 : index + %3508 = muli %3507, %c-2 : index + %3509 = addi %3485, %3508 : index + %3510 = addi %3509, %c1 : index + %3511 = load %2[%3470, %c0, %3510] : memref<16x6x2xvector<8xf32>> + %3512 = vector.insertelement %2208, %3511[%c4_i64 : i64] : vector<8xf32> + %3513 = addi %arg5, %c8 : index + %3514 = cmpi "slt", %3513, %c0 : index + %3515 = subi %c-1, %3513 : index + %3516 = select %3514, %3515, %3513 : index + %3517 = divi_signed %3516, %c16 : index + %3518 = subi %c-1, %3517 : index + %3519 = select %3514, %3518, %3517 : index + %3520 = remi_signed %3519, %c16 : index + %3521 = cmpi "slt", %3520, %c0 : index + %3522 = addi %3520, %c16 : index + %3523 = select %3521, %3522, %3520 : index + %3524 = cmpi "slt", %arg5, %c0 : index + %3525 = subi %c-1, %arg5 : index + %3526 = select %3524, %3525, %arg5 : index + %3527 = divi_signed %3526, %c8 : index + %3528 = subi %c-1, %3527 : index + %3529 = select %3524, %3528, %3527 : index + %3530 = addi %arg5, %c8 : index + %3531 = cmpi "slt", %3530, %c0 : index + %3532 = subi %c-1, %3530 : index + %3533 = select %3531, %3532, %3530 : index + %3534 = divi_signed %3533, %c16 : index + %3535 = subi %c-1, %3534 : index + %3536 = select %3531, %3535, %3534 : index + %3537 = muli %3536, %c-2 : index + %3538 = addi %3529, %3537 : index + %3539 = cmpi "slt", %arg5, %c0 : index + %3540 = subi %c-1, %arg5 : index + %3541 = select %3539, %3540, 
%arg5 : index + %3542 = divi_signed %3541, %c8 : index + %3543 = subi %c-1, %3542 : index + %3544 = select %3539, %3543, %3542 : index + %3545 = addi %arg5, %c8 : index + %3546 = cmpi "slt", %3545, %c0 : index + %3547 = subi %c-1, %3545 : index + %3548 = select %3546, %3547, %3545 : index + %3549 = divi_signed %3548, %c16 : index + %3550 = subi %c-1, %3549 : index + %3551 = select %3546, %3550, %3549 : index + %3552 = muli %3551, %c-2 : index + %3553 = addi %3544, %3552 : index + %3554 = addi %3553, %c1 : index + %3555 = cmpi "slt", %3554, %c0 : index + %3556 = subi %c-1, %3554 : index + %3557 = select %3555, %3556, %3554 : index + %3558 = divi_signed %3557, %c2 : index + %3559 = subi %c-1, %3558 : index + %3560 = select %3555, %3559, %3558 : index + %3561 = muli %3560, %c-2 : index + %3562 = addi %3538, %3561 : index + %3563 = addi %3562, %c1 : index + store %3512, %2[%3523, %c0, %3563] : memref<16x6x2xvector<8xf32>> + %3564 = addi %arg5, %c8 : index + %3565 = cmpi "slt", %3564, %c0 : index + %3566 = subi %c-1, %3564 : index + %3567 = select %3565, %3566, %3564 : index + %3568 = divi_signed %3567, %c16 : index + %3569 = subi %c-1, %3568 : index + %3570 = select %3565, %3569, %3568 : index + %3571 = remi_signed %3570, %c16 : index + %3572 = cmpi "slt", %3571, %c0 : index + %3573 = addi %3571, %c16 : index + %3574 = select %3572, %3573, %3571 : index + %3575 = cmpi "slt", %arg5, %c0 : index + %3576 = subi %c-1, %arg5 : index + %3577 = select %3575, %3576, %arg5 : index + %3578 = divi_signed %3577, %c8 : index + %3579 = subi %c-1, %3578 : index + %3580 = select %3575, %3579, %3578 : index + %3581 = addi %arg5, %c8 : index + %3582 = cmpi "slt", %3581, %c0 : index + %3583 = subi %c-1, %3581 : index + %3584 = select %3582, %3583, %3581 : index + %3585 = divi_signed %3584, %c16 : index + %3586 = subi %c-1, %3585 : index + %3587 = select %3582, %3586, %3585 : index + %3588 = muli %3587, %c-2 : index + %3589 = addi %3580, %3588 : index + %3590 = cmpi "slt", %arg5, %c0 : index + %3591 = subi %c-1, %arg5 : index + %3592 = select %3590, %3591, %arg5 : index + %3593 = divi_signed %3592, %c8 : index + %3594 = subi %c-1, %3593 : index + %3595 = select %3590, %3594, %3593 : index + %3596 = addi %arg5, %c8 : index + %3597 = cmpi "slt", %3596, %c0 : index + %3598 = subi %c-1, %3596 : index + %3599 = select %3597, %3598, %3596 : index + %3600 = divi_signed %3599, %c16 : index + %3601 = subi %c-1, %3600 : index + %3602 = select %3597, %3601, %3600 : index + %3603 = muli %3602, %c-2 : index + %3604 = addi %3595, %3603 : index + %3605 = addi %3604, %c1 : index + %3606 = cmpi "slt", %3605, %c0 : index + %3607 = subi %c-1, %3605 : index + %3608 = select %3606, %3607, %3605 : index + %3609 = divi_signed %3608, %c2 : index + %3610 = subi %c-1, %3609 : index + %3611 = select %3606, %3610, %3609 : index + %3612 = muli %3611, %c-2 : index + %3613 = addi %3589, %3612 : index + %3614 = addi %3613, %c1 : index + %3615 = load %2[%3574, %c0, %3614] : memref<16x6x2xvector<8xf32>> + %3616 = vector.insertelement %2209, %3615[%c5_i64 : i64] : vector<8xf32> + %3617 = addi %arg5, %c8 : index + %3618 = cmpi "slt", %3617, %c0 : index + %3619 = subi %c-1, %3617 : index + %3620 = select %3618, %3619, %3617 : index + %3621 = divi_signed %3620, %c16 : index + %3622 = subi %c-1, %3621 : index + %3623 = select %3618, %3622, %3621 : index + %3624 = remi_signed %3623, %c16 : index + %3625 = cmpi "slt", %3624, %c0 : index + %3626 = addi %3624, %c16 : index + %3627 = select %3625, %3626, %3624 : index + %3628 = cmpi "slt", %arg5, %c0 : 
index + %3629 = subi %c-1, %arg5 : index + %3630 = select %3628, %3629, %arg5 : index + %3631 = divi_signed %3630, %c8 : index + %3632 = subi %c-1, %3631 : index + %3633 = select %3628, %3632, %3631 : index + %3634 = addi %arg5, %c8 : index + %3635 = cmpi "slt", %3634, %c0 : index + %3636 = subi %c-1, %3634 : index + %3637 = select %3635, %3636, %3634 : index + %3638 = divi_signed %3637, %c16 : index + %3639 = subi %c-1, %3638 : index + %3640 = select %3635, %3639, %3638 : index + %3641 = muli %3640, %c-2 : index + %3642 = addi %3633, %3641 : index + %3643 = cmpi "slt", %arg5, %c0 : index + %3644 = subi %c-1, %arg5 : index + %3645 = select %3643, %3644, %arg5 : index + %3646 = divi_signed %3645, %c8 : index + %3647 = subi %c-1, %3646 : index + %3648 = select %3643, %3647, %3646 : index + %3649 = addi %arg5, %c8 : index + %3650 = cmpi "slt", %3649, %c0 : index + %3651 = subi %c-1, %3649 : index + %3652 = select %3650, %3651, %3649 : index + %3653 = divi_signed %3652, %c16 : index + %3654 = subi %c-1, %3653 : index + %3655 = select %3650, %3654, %3653 : index + %3656 = muli %3655, %c-2 : index + %3657 = addi %3648, %3656 : index + %3658 = addi %3657, %c1 : index + %3659 = cmpi "slt", %3658, %c0 : index + %3660 = subi %c-1, %3658 : index + %3661 = select %3659, %3660, %3658 : index + %3662 = divi_signed %3661, %c2 : index + %3663 = subi %c-1, %3662 : index + %3664 = select %3659, %3663, %3662 : index + %3665 = muli %3664, %c-2 : index + %3666 = addi %3642, %3665 : index + %3667 = addi %3666, %c1 : index + store %3616, %2[%3627, %c0, %3667] : memref<16x6x2xvector<8xf32>> + %3668 = addi %arg5, %c8 : index + %3669 = cmpi "slt", %3668, %c0 : index + %3670 = subi %c-1, %3668 : index + %3671 = select %3669, %3670, %3668 : index + %3672 = divi_signed %3671, %c16 : index + %3673 = subi %c-1, %3672 : index + %3674 = select %3669, %3673, %3672 : index + %3675 = remi_signed %3674, %c16 : index + %3676 = cmpi "slt", %3675, %c0 : index + %3677 = addi %3675, %c16 : index + %3678 = select %3676, %3677, %3675 : index + %3679 = cmpi "slt", %arg5, %c0 : index + %3680 = subi %c-1, %arg5 : index + %3681 = select %3679, %3680, %arg5 : index + %3682 = divi_signed %3681, %c8 : index + %3683 = subi %c-1, %3682 : index + %3684 = select %3679, %3683, %3682 : index + %3685 = addi %arg5, %c8 : index + %3686 = cmpi "slt", %3685, %c0 : index + %3687 = subi %c-1, %3685 : index + %3688 = select %3686, %3687, %3685 : index + %3689 = divi_signed %3688, %c16 : index + %3690 = subi %c-1, %3689 : index + %3691 = select %3686, %3690, %3689 : index + %3692 = muli %3691, %c-2 : index + %3693 = addi %3684, %3692 : index + %3694 = cmpi "slt", %arg5, %c0 : index + %3695 = subi %c-1, %arg5 : index + %3696 = select %3694, %3695, %arg5 : index + %3697 = divi_signed %3696, %c8 : index + %3698 = subi %c-1, %3697 : index + %3699 = select %3694, %3698, %3697 : index + %3700 = addi %arg5, %c8 : index + %3701 = cmpi "slt", %3700, %c0 : index + %3702 = subi %c-1, %3700 : index + %3703 = select %3701, %3702, %3700 : index + %3704 = divi_signed %3703, %c16 : index + %3705 = subi %c-1, %3704 : index + %3706 = select %3701, %3705, %3704 : index + %3707 = muli %3706, %c-2 : index + %3708 = addi %3699, %3707 : index + %3709 = addi %3708, %c1 : index + %3710 = cmpi "slt", %3709, %c0 : index + %3711 = subi %c-1, %3709 : index + %3712 = select %3710, %3711, %3709 : index + %3713 = divi_signed %3712, %c2 : index + %3714 = subi %c-1, %3713 : index + %3715 = select %3710, %3714, %3713 : index + %3716 = muli %3715, %c-2 : index + %3717 = addi %3693, %3716 : 
index + %3718 = addi %3717, %c1 : index + %3719 = load %2[%3678, %c0, %3718] : memref<16x6x2xvector<8xf32>> + %3720 = vector.insertelement %2210, %3719[%c6_i64 : i64] : vector<8xf32> + %3721 = addi %arg5, %c8 : index + %3722 = cmpi "slt", %3721, %c0 : index + %3723 = subi %c-1, %3721 : index + %3724 = select %3722, %3723, %3721 : index + %3725 = divi_signed %3724, %c16 : index + %3726 = subi %c-1, %3725 : index + %3727 = select %3722, %3726, %3725 : index + %3728 = remi_signed %3727, %c16 : index + %3729 = cmpi "slt", %3728, %c0 : index + %3730 = addi %3728, %c16 : index + %3731 = select %3729, %3730, %3728 : index + %3732 = cmpi "slt", %arg5, %c0 : index + %3733 = subi %c-1, %arg5 : index + %3734 = select %3732, %3733, %arg5 : index + %3735 = divi_signed %3734, %c8 : index + %3736 = subi %c-1, %3735 : index + %3737 = select %3732, %3736, %3735 : index + %3738 = addi %arg5, %c8 : index + %3739 = cmpi "slt", %3738, %c0 : index + %3740 = subi %c-1, %3738 : index + %3741 = select %3739, %3740, %3738 : index + %3742 = divi_signed %3741, %c16 : index + %3743 = subi %c-1, %3742 : index + %3744 = select %3739, %3743, %3742 : index + %3745 = muli %3744, %c-2 : index + %3746 = addi %3737, %3745 : index + %3747 = cmpi "slt", %arg5, %c0 : index + %3748 = subi %c-1, %arg5 : index + %3749 = select %3747, %3748, %arg5 : index + %3750 = divi_signed %3749, %c8 : index + %3751 = subi %c-1, %3750 : index + %3752 = select %3747, %3751, %3750 : index + %3753 = addi %arg5, %c8 : index + %3754 = cmpi "slt", %3753, %c0 : index + %3755 = subi %c-1, %3753 : index + %3756 = select %3754, %3755, %3753 : index + %3757 = divi_signed %3756, %c16 : index + %3758 = subi %c-1, %3757 : index + %3759 = select %3754, %3758, %3757 : index + %3760 = muli %3759, %c-2 : index + %3761 = addi %3752, %3760 : index + %3762 = addi %3761, %c1 : index + %3763 = cmpi "slt", %3762, %c0 : index + %3764 = subi %c-1, %3762 : index + %3765 = select %3763, %3764, %3762 : index + %3766 = divi_signed %3765, %c2 : index + %3767 = subi %c-1, %3766 : index + %3768 = select %3763, %3767, %3766 : index + %3769 = muli %3768, %c-2 : index + %3770 = addi %3746, %3769 : index + %3771 = addi %3770, %c1 : index + store %3720, %2[%3731, %c0, %3771] : memref<16x6x2xvector<8xf32>> + %3772 = addi %arg5, %c8 : index + %3773 = cmpi "slt", %3772, %c0 : index + %3774 = subi %c-1, %3772 : index + %3775 = select %3773, %3774, %3772 : index + %3776 = divi_signed %3775, %c16 : index + %3777 = subi %c-1, %3776 : index + %3778 = select %3773, %3777, %3776 : index + %3779 = remi_signed %3778, %c16 : index + %3780 = cmpi "slt", %3779, %c0 : index + %3781 = addi %3779, %c16 : index + %3782 = select %3780, %3781, %3779 : index + %3783 = cmpi "slt", %arg5, %c0 : index + %3784 = subi %c-1, %arg5 : index + %3785 = select %3783, %3784, %arg5 : index + %3786 = divi_signed %3785, %c8 : index + %3787 = subi %c-1, %3786 : index + %3788 = select %3783, %3787, %3786 : index + %3789 = addi %arg5, %c8 : index + %3790 = cmpi "slt", %3789, %c0 : index + %3791 = subi %c-1, %3789 : index + %3792 = select %3790, %3791, %3789 : index + %3793 = divi_signed %3792, %c16 : index + %3794 = subi %c-1, %3793 : index + %3795 = select %3790, %3794, %3793 : index + %3796 = muli %3795, %c-2 : index + %3797 = addi %3788, %3796 : index + %3798 = cmpi "slt", %arg5, %c0 : index + %3799 = subi %c-1, %arg5 : index + %3800 = select %3798, %3799, %arg5 : index + %3801 = divi_signed %3800, %c8 : index + %3802 = subi %c-1, %3801 : index + %3803 = select %3798, %3802, %3801 : index + %3804 = addi %arg5, %c8 : 
index + %3805 = cmpi "slt", %3804, %c0 : index + %3806 = subi %c-1, %3804 : index + %3807 = select %3805, %3806, %3804 : index + %3808 = divi_signed %3807, %c16 : index + %3809 = subi %c-1, %3808 : index + %3810 = select %3805, %3809, %3808 : index + %3811 = muli %3810, %c-2 : index + %3812 = addi %3803, %3811 : index + %3813 = addi %3812, %c1 : index + %3814 = cmpi "slt", %3813, %c0 : index + %3815 = subi %c-1, %3813 : index + %3816 = select %3814, %3815, %3813 : index + %3817 = divi_signed %3816, %c2 : index + %3818 = subi %c-1, %3817 : index + %3819 = select %3814, %3818, %3817 : index + %3820 = muli %3819, %c-2 : index + %3821 = addi %3797, %3820 : index + %3822 = addi %3821, %c1 : index + %3823 = load %2[%3782, %c0, %3822] : memref<16x6x2xvector<8xf32>> + %3824 = vector.insertelement %2211, %3823[%c7_i64 : i64] : vector<8xf32> + %3825 = addi %arg5, %c8 : index + %3826 = cmpi "slt", %3825, %c0 : index + %3827 = subi %c-1, %3825 : index + %3828 = select %3826, %3827, %3825 : index + %3829 = divi_signed %3828, %c16 : index + %3830 = subi %c-1, %3829 : index + %3831 = select %3826, %3830, %3829 : index + %3832 = remi_signed %3831, %c16 : index + %3833 = cmpi "slt", %3832, %c0 : index + %3834 = addi %3832, %c16 : index + %3835 = select %3833, %3834, %3832 : index + %3836 = cmpi "slt", %arg5, %c0 : index + %3837 = subi %c-1, %arg5 : index + %3838 = select %3836, %3837, %arg5 : index + %3839 = divi_signed %3838, %c8 : index + %3840 = subi %c-1, %3839 : index + %3841 = select %3836, %3840, %3839 : index + %3842 = addi %arg5, %c8 : index + %3843 = cmpi "slt", %3842, %c0 : index + %3844 = subi %c-1, %3842 : index + %3845 = select %3843, %3844, %3842 : index + %3846 = divi_signed %3845, %c16 : index + %3847 = subi %c-1, %3846 : index + %3848 = select %3843, %3847, %3846 : index + %3849 = muli %3848, %c-2 : index + %3850 = addi %3841, %3849 : index + %3851 = cmpi "slt", %arg5, %c0 : index + %3852 = subi %c-1, %arg5 : index + %3853 = select %3851, %3852, %arg5 : index + %3854 = divi_signed %3853, %c8 : index + %3855 = subi %c-1, %3854 : index + %3856 = select %3851, %3855, %3854 : index + %3857 = addi %arg5, %c8 : index + %3858 = cmpi "slt", %3857, %c0 : index + %3859 = subi %c-1, %3857 : index + %3860 = select %3858, %3859, %3857 : index + %3861 = divi_signed %3860, %c16 : index + %3862 = subi %c-1, %3861 : index + %3863 = select %3858, %3862, %3861 : index + %3864 = muli %3863, %c-2 : index + %3865 = addi %3856, %3864 : index + %3866 = addi %3865, %c1 : index + %3867 = cmpi "slt", %3866, %c0 : index + %3868 = subi %c-1, %3866 : index + %3869 = select %3867, %3868, %3866 : index + %3870 = divi_signed %3869, %c2 : index + %3871 = subi %c-1, %3870 : index + %3872 = select %3867, %3871, %3870 : index + %3873 = muli %3872, %c-2 : index + %3874 = addi %3850, %3873 : index + %3875 = addi %3874, %c1 : index + store %3824, %2[%3835, %c0, %3875] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : 
index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %arg3, %arg5 : index + %33 = addi %32, %c8 : index + %34 = vector.transfer_read %arg2[%arg4, %33], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %35 = addi %arg5, %c8 : index + %36 = cmpi "slt", %35, %c0 : index + %37 = subi %c-1, %35 : index + %38 = select %36, %37, %35 : index + %39 = divi_signed %38, %c16 : index + %40 = subi %c-1, %39 : index + %41 = select %36, %40, %39 : index + %42 = remi_signed %41, %c16 : index + %43 = cmpi "slt", %42, %c0 : index + %44 = addi %42, %c16 : index + %45 = select %43, %44, %42 : index + %46 = cmpi "slt", %arg5, %c0 : index + %47 = subi %c-1, %arg5 : index + %48 = select %46, %47, %arg5 : index + %49 = divi_signed %48, %c8 : index + %50 = subi %c-1, %49 : index + %51 = select %46, %50, %49 : index + %52 = addi %arg5, %c8 : index + %53 = cmpi "slt", %52, %c0 : index + %54 = subi %c-1, %52 : index + %55 = select %53, %54, %52 : index + %56 = divi_signed %55, %c16 : index + %57 = subi %c-1, %56 : index + %58 = select %53, %57, %56 : index + %59 = muli %58, %c-2 : index + %60 = addi %51, %59 : index + %61 = cmpi "slt", %arg5, %c0 : index + %62 = subi %c-1, %arg5 : index + %63 = select %61, %62, %arg5 : index + %64 = divi_signed %63, %c8 : index + %65 = subi %c-1, %64 : index + %66 = select %61, %65, %64 : index + %67 = addi %arg5, %c8 : index + %68 = cmpi "slt", %67, %c0 : index + %69 = subi %c-1, %67 : index + %70 = select %68, %69, %67 : index + %71 = divi_signed %70, %c16 : index + %72 = subi %c-1, %71 : index + %73 = select %68, %72, %71 : index + %74 = muli %73, %c-2 : index + %75 = addi %66, %74 : index + %76 = addi %75, %c1 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = subi %c-1, %76 : index + %79 = select %77, %78, %76 : index + %80 = divi_signed %79, %c2 : index + %81 = subi %c-1, %80 : index + %82 = select %77, %81, %80 : index + %83 = muli %82, %c-2 : index + %84 = addi %60, %83 : index + %85 = addi %84, %c1 : index + %86 = load %2[%45, %c0, %85] : memref<16x6x2xvector<8xf32>> + %87 = addf %34, %86 : vector<8xf32> + store %87, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %88 = addi %arg3, %arg5 : index + %89 = addi %88, %c16 : index + %90 = vector.transfer_read %arg2[%arg4, %89], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %91 = cmpi "slt", %arg5, %c0 : index + %92 = subi %c-1, %arg5 : index + %93 = select %91, %92, %arg5 : index + %94 = divi_signed %93, %c16 : index + %95 = subi %c-1, %94 : index + %96 = select %91, %95, %94 : index + %97 = cmpi "slt", %arg5, %c0 : index + %98 = subi %c-1, %arg5 : index + %99 = select %97, %98, %arg5 : index + %100 = divi_signed %99, %c16 : index + %101 = subi %c-1, %100 : index + %102 = select %97, %101, %100 : index + %103 = addi %102, %c1 : index + %104 = cmpi "slt", %103, %c0 : index + %105 = subi %c-1, %103 : index + %106 = select %104, %105, %103 : index + %107 = 
divi_signed %106, %c16 : index + %108 = subi %c-1, %107 : index + %109 = select %104, %108, %107 : index + %110 = muli %109, %c-16 : index + %111 = addi %96, %110 : index + %112 = addi %111, %c1 : index + %113 = remi_signed %arg5, %c16 : index + %114 = cmpi "slt", %113, %c0 : index + %115 = addi %113, %c16 : index + %116 = select %114, %115, %113 : index + %117 = cmpi "slt", %116, %c0 : index + %118 = subi %c-1, %116 : index + %119 = select %117, %118, %116 : index + %120 = divi_signed %119, %c8 : index + %121 = subi %c-1, %120 : index + %122 = select %117, %121, %120 : index + %123 = remi_signed %122, %c2 : index + %124 = cmpi "slt", %123, %c0 : index + %125 = addi %123, %c2 : index + %126 = select %124, %125, %123 : index + %127 = load %2[%112, %c0, %126] : memref<16x6x2xvector<8xf32>> + %128 = addf %90, %127 : vector<8xf32> + store %128, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %129 = addi %arg3, %arg5 : index + %130 = addi %129, %c24 : index + %131 = vector.transfer_read %arg2[%arg4, %130], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %132 = addi %arg5, %c24 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c16 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = remi_signed %138, %c16 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = addi %139, %c16 : index + %142 = select %140, %141, %139 : index + %143 = cmpi "slt", %arg5, %c0 : index + %144 = subi %c-1, %arg5 : index + %145 = select %143, %144, %arg5 : index + %146 = divi_signed %145, %c8 : index + %147 = subi %c-1, %146 : index + %148 = select %143, %147, %146 : index + %149 = addi %arg5, %c24 : index + %150 = cmpi "slt", %149, %c0 : index + %151 = subi %c-1, %149 : index + %152 = select %150, %151, %149 : index + %153 = divi_signed %152, %c16 : index + %154 = subi %c-1, %153 : index + %155 = select %150, %154, %153 : index + %156 = muli %155, %c-2 : index + %157 = addi %148, %156 : index + %158 = cmpi "slt", %arg5, %c0 : index + %159 = subi %c-1, %arg5 : index + %160 = select %158, %159, %arg5 : index + %161 = divi_signed %160, %c8 : index + %162 = subi %c-1, %161 : index + %163 = select %158, %162, %161 : index + %164 = addi %arg5, %c24 : index + %165 = cmpi "slt", %164, %c0 : index + %166 = subi %c-1, %164 : index + %167 = select %165, %166, %164 : index + %168 = divi_signed %167, %c16 : index + %169 = subi %c-1, %168 : index + %170 = select %165, %169, %168 : index + %171 = muli %170, %c-2 : index + %172 = addi %163, %171 : index + %173 = addi %172, %c3 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %157, %180 : index + %182 = addi %181, %c3 : index + %183 = load %2[%142, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %131, %183 : vector<8xf32> + store %184, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %185 = addi %arg3, %arg5 : index + %186 = addi %185, %c32 : index + %187 = vector.transfer_read %arg2[%arg4, %186], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %188 = cmpi "slt", %arg5, %c0 : index + %189 = subi %c-1, %arg5 : index + %190 = select %188, %189, %arg5 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, 
%191 : index + %193 = select %188, %192, %191 : index + %194 = cmpi "slt", %arg5, %c0 : index + %195 = subi %c-1, %arg5 : index + %196 = select %194, %195, %arg5 : index + %197 = divi_signed %196, %c16 : index + %198 = subi %c-1, %197 : index + %199 = select %194, %198, %197 : index + %200 = addi %199, %c2 : index + %201 = cmpi "slt", %200, %c0 : index + %202 = subi %c-1, %200 : index + %203 = select %201, %202, %200 : index + %204 = divi_signed %203, %c16 : index + %205 = subi %c-1, %204 : index + %206 = select %201, %205, %204 : index + %207 = muli %206, %c-16 : index + %208 = addi %193, %207 : index + %209 = addi %208, %c2 : index + %210 = remi_signed %arg5, %c16 : index + %211 = cmpi "slt", %210, %c0 : index + %212 = addi %210, %c16 : index + %213 = select %211, %212, %210 : index + %214 = cmpi "slt", %213, %c0 : index + %215 = subi %c-1, %213 : index + %216 = select %214, %215, %213 : index + %217 = divi_signed %216, %c8 : index + %218 = subi %c-1, %217 : index + %219 = select %214, %218, %217 : index + %220 = remi_signed %219, %c2 : index + %221 = cmpi "slt", %220, %c0 : index + %222 = addi %220, %c2 : index + %223 = select %221, %222, %220 : index + %224 = load %2[%209, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %187, %224 : vector<8xf32> + store %225, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %226 = addi %arg3, %arg5 : index + %227 = addi %226, %c40 : index + %228 = vector.transfer_read %arg2[%arg4, %227], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %229 = addi %arg5, %c40 : index + %230 = cmpi "slt", %229, %c0 : index + %231 = subi %c-1, %229 : index + %232 = select %230, %231, %229 : index + %233 = divi_signed %232, %c16 : index + %234 = subi %c-1, %233 : index + %235 = select %230, %234, %233 : index + %236 = remi_signed %235, %c16 : index + %237 = cmpi "slt", %236, %c0 : index + %238 = addi %236, %c16 : index + %239 = select %237, %238, %236 : index + %240 = cmpi "slt", %arg5, %c0 : index + %241 = subi %c-1, %arg5 : index + %242 = select %240, %241, %arg5 : index + %243 = divi_signed %242, %c8 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = addi %arg5, %c40 : index + %247 = cmpi "slt", %246, %c0 : index + %248 = subi %c-1, %246 : index + %249 = select %247, %248, %246 : index + %250 = divi_signed %249, %c16 : index + %251 = subi %c-1, %250 : index + %252 = select %247, %251, %250 : index + %253 = muli %252, %c-2 : index + %254 = addi %245, %253 : index + %255 = cmpi "slt", %arg5, %c0 : index + %256 = subi %c-1, %arg5 : index + %257 = select %255, %256, %arg5 : index + %258 = divi_signed %257, %c8 : index + %259 = subi %c-1, %258 : index + %260 = select %255, %259, %258 : index + %261 = addi %arg5, %c40 : index + %262 = cmpi "slt", %261, %c0 : index + %263 = subi %c-1, %261 : index + %264 = select %262, %263, %261 : index + %265 = divi_signed %264, %c16 : index + %266 = subi %c-1, %265 : index + %267 = select %262, %266, %265 : index + %268 = muli %267, %c-2 : index + %269 = addi %260, %268 : index + %270 = addi %269, %c5 : index + %271 = cmpi "slt", %270, %c0 : index + %272 = subi %c-1, %270 : index + %273 = select %271, %272, %270 : index + %274 = divi_signed %273, %c2 : index + %275 = subi %c-1, %274 : index + %276 = select %271, %275, %274 : index + %277 = muli %276, %c-2 : index + %278 = addi %254, %277 : index + %279 = addi %278, %c5 : index + %280 = load %2[%239, %c0, %279] : memref<16x6x2xvector<8xf32>> + %281 = addf %228, %280 : vector<8xf32> + store 
%281, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %282 = addi %arg3, %arg5 : index + %283 = addi %282, %c48 : index + %284 = vector.transfer_read %arg2[%arg4, %283], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %285 = cmpi "slt", %arg5, %c0 : index + %286 = subi %c-1, %arg5 : index + %287 = select %285, %286, %arg5 : index + %288 = divi_signed %287, %c16 : index + %289 = subi %c-1, %288 : index + %290 = select %285, %289, %288 : index + %291 = cmpi "slt", %arg5, %c0 : index + %292 = subi %c-1, %arg5 : index + %293 = select %291, %292, %arg5 : index + %294 = divi_signed %293, %c16 : index + %295 = subi %c-1, %294 : index + %296 = select %291, %295, %294 : index + %297 = addi %296, %c3 : index + %298 = cmpi "slt", %297, %c0 : index + %299 = subi %c-1, %297 : index + %300 = select %298, %299, %297 : index + %301 = divi_signed %300, %c16 : index + %302 = subi %c-1, %301 : index + %303 = select %298, %302, %301 : index + %304 = muli %303, %c-16 : index + %305 = addi %290, %304 : index + %306 = addi %305, %c3 : index + %307 = remi_signed %arg5, %c16 : index + %308 = cmpi "slt", %307, %c0 : index + %309 = addi %307, %c16 : index + %310 = select %308, %309, %307 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c8 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = remi_signed %316, %c2 : index + %318 = cmpi "slt", %317, %c0 : index + %319 = addi %317, %c2 : index + %320 = select %318, %319, %317 : index + %321 = load %2[%306, %c0, %320] : memref<16x6x2xvector<8xf32>> + %322 = addf %284, %321 : vector<8xf32> + store %322, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %323 = addi %arg3, %arg5 : index + %324 = addi %323, %c56 : index + %325 = vector.transfer_read %arg2[%arg4, %324], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %326 = addi %arg5, %c56 : index + %327 = cmpi "slt", %326, %c0 : index + %328 = subi %c-1, %326 : index + %329 = select %327, %328, %326 : index + %330 = divi_signed %329, %c16 : index + %331 = subi %c-1, %330 : index + %332 = select %327, %331, %330 : index + %333 = remi_signed %332, %c16 : index + %334 = cmpi "slt", %333, %c0 : index + %335 = addi %333, %c16 : index + %336 = select %334, %335, %333 : index + %337 = cmpi "slt", %arg5, %c0 : index + %338 = subi %c-1, %arg5 : index + %339 = select %337, %338, %arg5 : index + %340 = divi_signed %339, %c8 : index + %341 = subi %c-1, %340 : index + %342 = select %337, %341, %340 : index + %343 = addi %arg5, %c56 : index + %344 = cmpi "slt", %343, %c0 : index + %345 = subi %c-1, %343 : index + %346 = select %344, %345, %343 : index + %347 = divi_signed %346, %c16 : index + %348 = subi %c-1, %347 : index + %349 = select %344, %348, %347 : index + %350 = muli %349, %c-2 : index + %351 = addi %342, %350 : index + %352 = cmpi "slt", %arg5, %c0 : index + %353 = subi %c-1, %arg5 : index + %354 = select %352, %353, %arg5 : index + %355 = divi_signed %354, %c8 : index + %356 = subi %c-1, %355 : index + %357 = select %352, %356, %355 : index + %358 = addi %arg5, %c56 : index + %359 = cmpi "slt", %358, %c0 : index + %360 = subi %c-1, %358 : index + %361 = select %359, %360, %358 : index + %362 = divi_signed %361, %c16 : index + %363 = subi %c-1, %362 : index + %364 = select %359, %363, %362 : index + %365 = muli %364, %c-2 : index + %366 = addi %357, %365 : index + %367 = addi %366, %c7 : 
index + %368 = cmpi "slt", %367, %c0 : index + %369 = subi %c-1, %367 : index + %370 = select %368, %369, %367 : index + %371 = divi_signed %370, %c2 : index + %372 = subi %c-1, %371 : index + %373 = select %368, %372, %371 : index + %374 = muli %373, %c-2 : index + %375 = addi %351, %374 : index + %376 = addi %375, %c7 : index + %377 = load %2[%336, %c0, %376] : memref<16x6x2xvector<8xf32>> + %378 = addf %325, %377 : vector<8xf32> + store %378, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %379 = addi %arg3, %arg5 : index + %380 = addi %379, %c64 : index + %381 = vector.transfer_read %arg2[%arg4, %380], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %382 = cmpi "slt", %arg5, %c0 : index + %383 = subi %c-1, %arg5 : index + %384 = select %382, %383, %arg5 : index + %385 = divi_signed %384, %c16 : index + %386 = subi %c-1, %385 : index + %387 = select %382, %386, %385 : index + %388 = cmpi "slt", %arg5, %c0 : index + %389 = subi %c-1, %arg5 : index + %390 = select %388, %389, %arg5 : index + %391 = divi_signed %390, %c16 : index + %392 = subi %c-1, %391 : index + %393 = select %388, %392, %391 : index + %394 = addi %393, %c4 : index + %395 = cmpi "slt", %394, %c0 : index + %396 = subi %c-1, %394 : index + %397 = select %395, %396, %394 : index + %398 = divi_signed %397, %c16 : index + %399 = subi %c-1, %398 : index + %400 = select %395, %399, %398 : index + %401 = muli %400, %c-16 : index + %402 = addi %387, %401 : index + %403 = addi %402, %c4 : index + %404 = remi_signed %arg5, %c16 : index + %405 = cmpi "slt", %404, %c0 : index + %406 = addi %404, %c16 : index + %407 = select %405, %406, %404 : index + %408 = cmpi "slt", %407, %c0 : index + %409 = subi %c-1, %407 : index + %410 = select %408, %409, %407 : index + %411 = divi_signed %410, %c8 : index + %412 = subi %c-1, %411 : index + %413 = select %408, %412, %411 : index + %414 = remi_signed %413, %c2 : index + %415 = cmpi "slt", %414, %c0 : index + %416 = addi %414, %c2 : index + %417 = select %415, %416, %414 : index + %418 = load %2[%403, %c0, %417] : memref<16x6x2xvector<8xf32>> + %419 = addf %381, %418 : vector<8xf32> + store %419, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %420 = addi %arg3, %arg5 : index + %421 = addi %420, %c72 : index + %422 = vector.transfer_read %arg2[%arg4, %421], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %423 = addi %arg5, %c72 : index + %424 = cmpi "slt", %423, %c0 : index + %425 = subi %c-1, %423 : index + %426 = select %424, %425, %423 : index + %427 = divi_signed %426, %c16 : index + %428 = subi %c-1, %427 : index + %429 = select %424, %428, %427 : index + %430 = remi_signed %429, %c16 : index + %431 = cmpi "slt", %430, %c0 : index + %432 = addi %430, %c16 : index + %433 = select %431, %432, %430 : index + %434 = cmpi "slt", %arg5, %c0 : index + %435 = subi %c-1, %arg5 : index + %436 = select %434, %435, %arg5 : index + %437 = divi_signed %436, %c8 : index + %438 = subi %c-1, %437 : index + %439 = select %434, %438, %437 : index + %440 = addi %arg5, %c72 : index + %441 = cmpi "slt", %440, %c0 : index + %442 = subi %c-1, %440 : index + %443 = select %441, %442, %440 : index + %444 = divi_signed %443, %c16 : index + %445 = subi %c-1, %444 : index + %446 = select %441, %445, %444 : index + %447 = muli %446, %c-2 : index + %448 = addi %439, %447 : index + %449 = cmpi "slt", %arg5, %c0 : index + %450 = subi %c-1, %arg5 : index + %451 = select %449, %450, %arg5 : index + %452 = divi_signed 
%451, %c8 : index + %453 = subi %c-1, %452 : index + %454 = select %449, %453, %452 : index + %455 = addi %arg5, %c72 : index + %456 = cmpi "slt", %455, %c0 : index + %457 = subi %c-1, %455 : index + %458 = select %456, %457, %455 : index + %459 = divi_signed %458, %c16 : index + %460 = subi %c-1, %459 : index + %461 = select %456, %460, %459 : index + %462 = muli %461, %c-2 : index + %463 = addi %454, %462 : index + %464 = addi %463, %c9 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = subi %c-1, %464 : index + %467 = select %465, %466, %464 : index + %468 = divi_signed %467, %c2 : index + %469 = subi %c-1, %468 : index + %470 = select %465, %469, %468 : index + %471 = muli %470, %c-2 : index + %472 = addi %448, %471 : index + %473 = addi %472, %c9 : index + %474 = load %2[%433, %c0, %473] : memref<16x6x2xvector<8xf32>> + %475 = addf %422, %474 : vector<8xf32> + store %475, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %476 = addi %arg3, %arg5 : index + %477 = addi %476, %c80 : index + %478 = vector.transfer_read %arg2[%arg4, %477], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %479 = cmpi "slt", %arg5, %c0 : index + %480 = subi %c-1, %arg5 : index + %481 = select %479, %480, %arg5 : index + %482 = divi_signed %481, %c16 : index + %483 = subi %c-1, %482 : index + %484 = select %479, %483, %482 : index + %485 = cmpi "slt", %arg5, %c0 : index + %486 = subi %c-1, %arg5 : index + %487 = select %485, %486, %arg5 : index + %488 = divi_signed %487, %c16 : index + %489 = subi %c-1, %488 : index + %490 = select %485, %489, %488 : index + %491 = addi %490, %c5 : index + %492 = cmpi "slt", %491, %c0 : index + %493 = subi %c-1, %491 : index + %494 = select %492, %493, %491 : index + %495 = divi_signed %494, %c16 : index + %496 = subi %c-1, %495 : index + %497 = select %492, %496, %495 : index + %498 = muli %497, %c-16 : index + %499 = addi %484, %498 : index + %500 = addi %499, %c5 : index + %501 = remi_signed %arg5, %c16 : index + %502 = cmpi "slt", %501, %c0 : index + %503 = addi %501, %c16 : index + %504 = select %502, %503, %501 : index + %505 = cmpi "slt", %504, %c0 : index + %506 = subi %c-1, %504 : index + %507 = select %505, %506, %504 : index + %508 = divi_signed %507, %c8 : index + %509 = subi %c-1, %508 : index + %510 = select %505, %509, %508 : index + %511 = remi_signed %510, %c2 : index + %512 = cmpi "slt", %511, %c0 : index + %513 = addi %511, %c2 : index + %514 = select %512, %513, %511 : index + %515 = load %2[%500, %c0, %514] : memref<16x6x2xvector<8xf32>> + %516 = addf %478, %515 : vector<8xf32> + store %516, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %517 = addi %arg3, %arg5 : index + %518 = addi %517, %c88 : index + %519 = vector.transfer_read %arg2[%arg4, %518], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %520 = addi %arg5, %c88 : index + %521 = cmpi "slt", %520, %c0 : index + %522 = subi %c-1, %520 : index + %523 = select %521, %522, %520 : index + %524 = divi_signed %523, %c16 : index + %525 = subi %c-1, %524 : index + %526 = select %521, %525, %524 : index + %527 = remi_signed %526, %c16 : index + %528 = cmpi "slt", %527, %c0 : index + %529 = addi %527, %c16 : index + %530 = select %528, %529, %527 : index + %531 = cmpi "slt", %arg5, %c0 : index + %532 = subi %c-1, %arg5 : index + %533 = select %531, %532, %arg5 : index + %534 = divi_signed %533, %c8 : index + %535 = subi %c-1, %534 : index + %536 = select %531, %535, %534 : index + %537 = addi %arg5, 
%c88 : index + %538 = cmpi "slt", %537, %c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-2 : index + %545 = addi %536, %544 : index + %546 = cmpi "slt", %arg5, %c0 : index + %547 = subi %c-1, %arg5 : index + %548 = select %546, %547, %arg5 : index + %549 = divi_signed %548, %c8 : index + %550 = subi %c-1, %549 : index + %551 = select %546, %550, %549 : index + %552 = addi %arg5, %c88 : index + %553 = cmpi "slt", %552, %c0 : index + %554 = subi %c-1, %552 : index + %555 = select %553, %554, %552 : index + %556 = divi_signed %555, %c16 : index + %557 = subi %c-1, %556 : index + %558 = select %553, %557, %556 : index + %559 = muli %558, %c-2 : index + %560 = addi %551, %559 : index + %561 = addi %560, %c11 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = subi %c-1, %561 : index + %564 = select %562, %563, %561 : index + %565 = divi_signed %564, %c2 : index + %566 = subi %c-1, %565 : index + %567 = select %562, %566, %565 : index + %568 = muli %567, %c-2 : index + %569 = addi %545, %568 : index + %570 = addi %569, %c11 : index + %571 = load %2[%530, %c0, %570] : memref<16x6x2xvector<8xf32>> + %572 = addf %519, %571 : vector<8xf32> + store %572, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %573 = addi %arg3, %arg5 : index + %574 = addi %573, %c96 : index + %575 = vector.transfer_read %arg2[%arg4, %574], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %576 = cmpi "slt", %arg5, %c0 : index + %577 = subi %c-1, %arg5 : index + %578 = select %576, %577, %arg5 : index + %579 = divi_signed %578, %c16 : index + %580 = subi %c-1, %579 : index + %581 = select %576, %580, %579 : index + %582 = cmpi "slt", %arg5, %c0 : index + %583 = subi %c-1, %arg5 : index + %584 = select %582, %583, %arg5 : index + %585 = divi_signed %584, %c16 : index + %586 = subi %c-1, %585 : index + %587 = select %582, %586, %585 : index + %588 = addi %587, %c6 : index + %589 = cmpi "slt", %588, %c0 : index + %590 = subi %c-1, %588 : index + %591 = select %589, %590, %588 : index + %592 = divi_signed %591, %c16 : index + %593 = subi %c-1, %592 : index + %594 = select %589, %593, %592 : index + %595 = muli %594, %c-16 : index + %596 = addi %581, %595 : index + %597 = addi %596, %c6 : index + %598 = remi_signed %arg5, %c16 : index + %599 = cmpi "slt", %598, %c0 : index + %600 = addi %598, %c16 : index + %601 = select %599, %600, %598 : index + %602 = cmpi "slt", %601, %c0 : index + %603 = subi %c-1, %601 : index + %604 = select %602, %603, %601 : index + %605 = divi_signed %604, %c8 : index + %606 = subi %c-1, %605 : index + %607 = select %602, %606, %605 : index + %608 = remi_signed %607, %c2 : index + %609 = cmpi "slt", %608, %c0 : index + %610 = addi %608, %c2 : index + %611 = select %609, %610, %608 : index + %612 = load %2[%597, %c0, %611] : memref<16x6x2xvector<8xf32>> + %613 = addf %575, %612 : vector<8xf32> + store %613, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %614 = addi %arg3, %arg5 : index + %615 = addi %614, %c104 : index + %616 = vector.transfer_read %arg2[%arg4, %615], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %617 = addi %arg5, %c104 : index + %618 = cmpi "slt", %617, %c0 : index + %619 = subi %c-1, %617 : index + %620 = select %618, %619, %617 : index + %621 = divi_signed %620, %c16 : index + %622 = subi %c-1, %621 : 
index + %623 = select %618, %622, %621 : index + %624 = remi_signed %623, %c16 : index + %625 = cmpi "slt", %624, %c0 : index + %626 = addi %624, %c16 : index + %627 = select %625, %626, %624 : index + %628 = cmpi "slt", %arg5, %c0 : index + %629 = subi %c-1, %arg5 : index + %630 = select %628, %629, %arg5 : index + %631 = divi_signed %630, %c8 : index + %632 = subi %c-1, %631 : index + %633 = select %628, %632, %631 : index + %634 = addi %arg5, %c104 : index + %635 = cmpi "slt", %634, %c0 : index + %636 = subi %c-1, %634 : index + %637 = select %635, %636, %634 : index + %638 = divi_signed %637, %c16 : index + %639 = subi %c-1, %638 : index + %640 = select %635, %639, %638 : index + %641 = muli %640, %c-2 : index + %642 = addi %633, %641 : index + %643 = cmpi "slt", %arg5, %c0 : index + %644 = subi %c-1, %arg5 : index + %645 = select %643, %644, %arg5 : index + %646 = divi_signed %645, %c8 : index + %647 = subi %c-1, %646 : index + %648 = select %643, %647, %646 : index + %649 = addi %arg5, %c104 : index + %650 = cmpi "slt", %649, %c0 : index + %651 = subi %c-1, %649 : index + %652 = select %650, %651, %649 : index + %653 = divi_signed %652, %c16 : index + %654 = subi %c-1, %653 : index + %655 = select %650, %654, %653 : index + %656 = muli %655, %c-2 : index + %657 = addi %648, %656 : index + %658 = addi %657, %c13 : index + %659 = cmpi "slt", %658, %c0 : index + %660 = subi %c-1, %658 : index + %661 = select %659, %660, %658 : index + %662 = divi_signed %661, %c2 : index + %663 = subi %c-1, %662 : index + %664 = select %659, %663, %662 : index + %665 = muli %664, %c-2 : index + %666 = addi %642, %665 : index + %667 = addi %666, %c13 : index + %668 = load %2[%627, %c0, %667] : memref<16x6x2xvector<8xf32>> + %669 = addf %616, %668 : vector<8xf32> + store %669, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %670 = addi %arg3, %arg5 : index + %671 = addi %670, %c112 : index + %672 = vector.transfer_read %arg2[%arg4, %671], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %673 = cmpi "slt", %arg5, %c0 : index + %674 = subi %c-1, %arg5 : index + %675 = select %673, %674, %arg5 : index + %676 = divi_signed %675, %c16 : index + %677 = subi %c-1, %676 : index + %678 = select %673, %677, %676 : index + %679 = cmpi "slt", %arg5, %c0 : index + %680 = subi %c-1, %arg5 : index + %681 = select %679, %680, %arg5 : index + %682 = divi_signed %681, %c16 : index + %683 = subi %c-1, %682 : index + %684 = select %679, %683, %682 : index + %685 = addi %684, %c7 : index + %686 = cmpi "slt", %685, %c0 : index + %687 = subi %c-1, %685 : index + %688 = select %686, %687, %685 : index + %689 = divi_signed %688, %c16 : index + %690 = subi %c-1, %689 : index + %691 = select %686, %690, %689 : index + %692 = muli %691, %c-16 : index + %693 = addi %678, %692 : index + %694 = addi %693, %c7 : index + %695 = remi_signed %arg5, %c16 : index + %696 = cmpi "slt", %695, %c0 : index + %697 = addi %695, %c16 : index + %698 = select %696, %697, %695 : index + %699 = cmpi "slt", %698, %c0 : index + %700 = subi %c-1, %698 : index + %701 = select %699, %700, %698 : index + %702 = divi_signed %701, %c8 : index + %703 = subi %c-1, %702 : index + %704 = select %699, %703, %702 : index + %705 = remi_signed %704, %c2 : index + %706 = cmpi "slt", %705, %c0 : index + %707 = addi %705, %c2 : index + %708 = select %706, %707, %705 : index + %709 = load %2[%694, %c0, %708] : memref<16x6x2xvector<8xf32>> + %710 = addf %672, %709 : vector<8xf32> + store %710, %1[%c0, %c14] : 
memref<1x16xvector<8xf32>> + %711 = addi %arg3, %arg5 : index + %712 = addi %711, %c120 : index + %713 = vector.transfer_read %arg2[%arg4, %712], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %714 = addi %arg5, %c120 : index + %715 = cmpi "slt", %714, %c0 : index + %716 = subi %c-1, %714 : index + %717 = select %715, %716, %714 : index + %718 = divi_signed %717, %c16 : index + %719 = subi %c-1, %718 : index + %720 = select %715, %719, %718 : index + %721 = remi_signed %720, %c16 : index + %722 = cmpi "slt", %721, %c0 : index + %723 = addi %721, %c16 : index + %724 = select %722, %723, %721 : index + %725 = cmpi "slt", %arg5, %c0 : index + %726 = subi %c-1, %arg5 : index + %727 = select %725, %726, %arg5 : index + %728 = divi_signed %727, %c8 : index + %729 = subi %c-1, %728 : index + %730 = select %725, %729, %728 : index + %731 = addi %arg5, %c120 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c16 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = muli %737, %c-2 : index + %739 = addi %730, %738 : index + %740 = cmpi "slt", %arg5, %c0 : index + %741 = subi %c-1, %arg5 : index + %742 = select %740, %741, %arg5 : index + %743 = divi_signed %742, %c8 : index + %744 = subi %c-1, %743 : index + %745 = select %740, %744, %743 : index + %746 = addi %arg5, %c120 : index + %747 = cmpi "slt", %746, %c0 : index + %748 = subi %c-1, %746 : index + %749 = select %747, %748, %746 : index + %750 = divi_signed %749, %c16 : index + %751 = subi %c-1, %750 : index + %752 = select %747, %751, %750 : index + %753 = muli %752, %c-2 : index + %754 = addi %745, %753 : index + %755 = addi %754, %c15 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = subi %c-1, %755 : index + %758 = select %756, %757, %755 : index + %759 = divi_signed %758, %c2 : index + %760 = subi %c-1, %759 : index + %761 = select %756, %760, %759 : index + %762 = muli %761, %c-2 : index + %763 = addi %739, %762 : index + %764 = addi %763, %c15 : index + %765 = load %2[%724, %c0, %764] : memref<16x6x2xvector<8xf32>> + %766 = addf %713, %765 : vector<8xf32> + store %766, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %767 = addi %arg3, %arg5 : index + %768 = muli %arg6, %c8 : index + %769 = addi %767, %768 : index + %770 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %770, %arg2[%arg4, %769] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 
: index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %arg3, %arg5 : index + %33 = addi %32, %c8 : index + %34 = vector.transfer_read %arg2[%arg4, %33], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %35 = addi %arg5, %c8 : index + %36 = cmpi "slt", %35, %c0 : index + %37 = subi %c-1, %35 : index + %38 = select %36, %37, %35 : index + %39 = divi_signed %38, %c16 : index + %40 = subi %c-1, %39 : index + %41 = select %36, %40, %39 : index + %42 = remi_signed %41, %c16 : index + %43 = cmpi "slt", %42, %c0 : index + %44 = addi %42, %c16 : index + %45 = select %43, %44, %42 : index + %46 = cmpi "slt", %arg5, %c0 : index + %47 = subi %c-1, %arg5 : index + %48 = select %46, %47, %arg5 : index + %49 = divi_signed %48, %c8 : index + %50 = subi %c-1, %49 : index + %51 = select %46, %50, %49 : index + %52 = addi %arg5, %c8 : index + %53 = cmpi "slt", %52, %c0 : index + %54 = subi %c-1, %52 : index + %55 = select %53, %54, %52 : index + %56 = divi_signed %55, %c16 : index + %57 = subi %c-1, %56 : index + %58 = select %53, %57, %56 : index + %59 = muli %58, %c-2 : index + %60 = addi %51, %59 : index + %61 = cmpi "slt", %arg5, %c0 : index + %62 = subi %c-1, %arg5 : index + %63 = select %61, %62, %arg5 : index + %64 = divi_signed %63, %c8 : index + %65 = subi %c-1, %64 : index + %66 = select %61, %65, %64 : index + %67 = addi %arg5, %c8 : index + %68 = cmpi "slt", %67, %c0 : index + %69 = subi %c-1, %67 : index + %70 = select %68, %69, %67 : index + %71 = divi_signed %70, %c16 : index + %72 = subi %c-1, %71 : index + %73 = select %68, %72, %71 : index + %74 = muli %73, %c-2 : index + %75 = addi %66, %74 : index + %76 = addi %75, %c1 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = subi %c-1, %76 : index + %79 = select %77, %78, %76 : index + %80 = divi_signed %79, %c2 : index + %81 = subi %c-1, %80 : index + %82 = select %77, %81, %80 : index + %83 = muli %82, %c-2 : index + %84 = addi %60, %83 : index + %85 = addi %84, %c1 : index + %86 = load %2[%45, %c0, %85] : memref<16x6x2xvector<8xf32>> + %87 = addf %34, %86 : vector<8xf32> + store %87, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %88 = addi %arg3, %arg5 : index + %89 = addi %88, %c16 : index + %90 = vector.transfer_read %arg2[%arg4, %89], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %91 = cmpi "slt", %arg5, %c0 : index + %92 = subi %c-1, %arg5 : index + %93 = select %91, %92, %arg5 : index + %94 = divi_signed %93, %c16 : index + %95 = subi %c-1, %94 : index + %96 = select %91, %95, %94 : index + %97 = cmpi "slt", %arg5, %c0 : index + %98 = subi %c-1, %arg5 : index + %99 = select %97, %98, %arg5 : index + %100 = divi_signed %99, %c16 : index + %101 = subi %c-1, %100 : index + %102 = select %97, %101, %100 : index + %103 = addi %102, %c1 : index + %104 = cmpi "slt", %103, %c0 : index + %105 = subi %c-1, %103 : index + %106 = select %104, %105, %103 : index + %107 = divi_signed %106, %c16 : index + %108 = subi %c-1, %107 : index + %109 = select %104, %108, %107 : index + %110 = muli %109, %c-16 : index + %111 = addi %96, %110 : index + %112 = addi %111, %c1 : index + %113 = remi_signed %arg5, %c16 : index + %114 = cmpi "slt", %113, %c0 : index + %115 = addi %113, %c16 : index + %116 = select %114, %115, %113 : index + %117 = cmpi "slt", %116, %c0 : 
index + %118 = subi %c-1, %116 : index + %119 = select %117, %118, %116 : index + %120 = divi_signed %119, %c8 : index + %121 = subi %c-1, %120 : index + %122 = select %117, %121, %120 : index + %123 = remi_signed %122, %c2 : index + %124 = cmpi "slt", %123, %c0 : index + %125 = addi %123, %c2 : index + %126 = select %124, %125, %123 : index + %127 = load %2[%112, %c0, %126] : memref<16x6x2xvector<8xf32>> + %128 = addf %90, %127 : vector<8xf32> + store %128, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %129 = addi %arg3, %arg5 : index + %130 = addi %129, %c24 : index + %131 = vector.transfer_read %arg2[%arg4, %130], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %132 = addi %arg5, %c24 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c16 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = remi_signed %138, %c16 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = addi %139, %c16 : index + %142 = select %140, %141, %139 : index + %143 = cmpi "slt", %arg5, %c0 : index + %144 = subi %c-1, %arg5 : index + %145 = select %143, %144, %arg5 : index + %146 = divi_signed %145, %c8 : index + %147 = subi %c-1, %146 : index + %148 = select %143, %147, %146 : index + %149 = addi %arg5, %c24 : index + %150 = cmpi "slt", %149, %c0 : index + %151 = subi %c-1, %149 : index + %152 = select %150, %151, %149 : index + %153 = divi_signed %152, %c16 : index + %154 = subi %c-1, %153 : index + %155 = select %150, %154, %153 : index + %156 = muli %155, %c-2 : index + %157 = addi %148, %156 : index + %158 = cmpi "slt", %arg5, %c0 : index + %159 = subi %c-1, %arg5 : index + %160 = select %158, %159, %arg5 : index + %161 = divi_signed %160, %c8 : index + %162 = subi %c-1, %161 : index + %163 = select %158, %162, %161 : index + %164 = addi %arg5, %c24 : index + %165 = cmpi "slt", %164, %c0 : index + %166 = subi %c-1, %164 : index + %167 = select %165, %166, %164 : index + %168 = divi_signed %167, %c16 : index + %169 = subi %c-1, %168 : index + %170 = select %165, %169, %168 : index + %171 = muli %170, %c-2 : index + %172 = addi %163, %171 : index + %173 = addi %172, %c3 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %157, %180 : index + %182 = addi %181, %c3 : index + %183 = load %2[%142, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %131, %183 : vector<8xf32> + store %184, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %185 = addi %arg3, %arg5 : index + %186 = addi %185, %c32 : index + %187 = vector.transfer_read %arg2[%arg4, %186], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %188 = cmpi "slt", %arg5, %c0 : index + %189 = subi %c-1, %arg5 : index + %190 = select %188, %189, %arg5 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = cmpi "slt", %arg5, %c0 : index + %195 = subi %c-1, %arg5 : index + %196 = select %194, %195, %arg5 : index + %197 = divi_signed %196, %c16 : index + %198 = subi %c-1, %197 : index + %199 = select %194, %198, %197 : index + %200 = addi %199, %c2 : index + %201 = cmpi "slt", %200, %c0 : index + %202 = subi %c-1, %200 : index + %203 = select %201, %202, %200 : 
index + %204 = divi_signed %203, %c16 : index + %205 = subi %c-1, %204 : index + %206 = select %201, %205, %204 : index + %207 = muli %206, %c-16 : index + %208 = addi %193, %207 : index + %209 = addi %208, %c2 : index + %210 = remi_signed %arg5, %c16 : index + %211 = cmpi "slt", %210, %c0 : index + %212 = addi %210, %c16 : index + %213 = select %211, %212, %210 : index + %214 = cmpi "slt", %213, %c0 : index + %215 = subi %c-1, %213 : index + %216 = select %214, %215, %213 : index + %217 = divi_signed %216, %c8 : index + %218 = subi %c-1, %217 : index + %219 = select %214, %218, %217 : index + %220 = remi_signed %219, %c2 : index + %221 = cmpi "slt", %220, %c0 : index + %222 = addi %220, %c2 : index + %223 = select %221, %222, %220 : index + %224 = load %2[%209, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %187, %224 : vector<8xf32> + store %225, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %226 = addi %arg3, %arg5 : index + %227 = addi %226, %c40 : index + %228 = vector.transfer_read %arg2[%arg4, %227], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %229 = addi %arg5, %c40 : index + %230 = cmpi "slt", %229, %c0 : index + %231 = subi %c-1, %229 : index + %232 = select %230, %231, %229 : index + %233 = divi_signed %232, %c16 : index + %234 = subi %c-1, %233 : index + %235 = select %230, %234, %233 : index + %236 = remi_signed %235, %c16 : index + %237 = cmpi "slt", %236, %c0 : index + %238 = addi %236, %c16 : index + %239 = select %237, %238, %236 : index + %240 = cmpi "slt", %arg5, %c0 : index + %241 = subi %c-1, %arg5 : index + %242 = select %240, %241, %arg5 : index + %243 = divi_signed %242, %c8 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = addi %arg5, %c40 : index + %247 = cmpi "slt", %246, %c0 : index + %248 = subi %c-1, %246 : index + %249 = select %247, %248, %246 : index + %250 = divi_signed %249, %c16 : index + %251 = subi %c-1, %250 : index + %252 = select %247, %251, %250 : index + %253 = muli %252, %c-2 : index + %254 = addi %245, %253 : index + %255 = cmpi "slt", %arg5, %c0 : index + %256 = subi %c-1, %arg5 : index + %257 = select %255, %256, %arg5 : index + %258 = divi_signed %257, %c8 : index + %259 = subi %c-1, %258 : index + %260 = select %255, %259, %258 : index + %261 = addi %arg5, %c40 : index + %262 = cmpi "slt", %261, %c0 : index + %263 = subi %c-1, %261 : index + %264 = select %262, %263, %261 : index + %265 = divi_signed %264, %c16 : index + %266 = subi %c-1, %265 : index + %267 = select %262, %266, %265 : index + %268 = muli %267, %c-2 : index + %269 = addi %260, %268 : index + %270 = addi %269, %c5 : index + %271 = cmpi "slt", %270, %c0 : index + %272 = subi %c-1, %270 : index + %273 = select %271, %272, %270 : index + %274 = divi_signed %273, %c2 : index + %275 = subi %c-1, %274 : index + %276 = select %271, %275, %274 : index + %277 = muli %276, %c-2 : index + %278 = addi %254, %277 : index + %279 = addi %278, %c5 : index + %280 = load %2[%239, %c0, %279] : memref<16x6x2xvector<8xf32>> + %281 = addf %228, %280 : vector<8xf32> + store %281, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %282 = addi %arg3, %arg5 : index + %283 = addi %282, %c48 : index + %284 = vector.transfer_read %arg2[%arg4, %283], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %285 = cmpi "slt", %arg5, %c0 : index + %286 = subi %c-1, %arg5 : index + %287 = select %285, %286, %arg5 : index + %288 = divi_signed %287, %c16 : index + %289 = subi %c-1, %288 : index + %290 
= select %285, %289, %288 : index + %291 = cmpi "slt", %arg5, %c0 : index + %292 = subi %c-1, %arg5 : index + %293 = select %291, %292, %arg5 : index + %294 = divi_signed %293, %c16 : index + %295 = subi %c-1, %294 : index + %296 = select %291, %295, %294 : index + %297 = addi %296, %c3 : index + %298 = cmpi "slt", %297, %c0 : index + %299 = subi %c-1, %297 : index + %300 = select %298, %299, %297 : index + %301 = divi_signed %300, %c16 : index + %302 = subi %c-1, %301 : index + %303 = select %298, %302, %301 : index + %304 = muli %303, %c-16 : index + %305 = addi %290, %304 : index + %306 = addi %305, %c3 : index + %307 = remi_signed %arg5, %c16 : index + %308 = cmpi "slt", %307, %c0 : index + %309 = addi %307, %c16 : index + %310 = select %308, %309, %307 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c8 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = remi_signed %316, %c2 : index + %318 = cmpi "slt", %317, %c0 : index + %319 = addi %317, %c2 : index + %320 = select %318, %319, %317 : index + %321 = load %2[%306, %c0, %320] : memref<16x6x2xvector<8xf32>> + %322 = addf %284, %321 : vector<8xf32> + store %322, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %323 = addi %arg3, %arg5 : index + %324 = addi %323, %c56 : index + %325 = vector.transfer_read %arg2[%arg4, %324], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %326 = addi %arg5, %c56 : index + %327 = cmpi "slt", %326, %c0 : index + %328 = subi %c-1, %326 : index + %329 = select %327, %328, %326 : index + %330 = divi_signed %329, %c16 : index + %331 = subi %c-1, %330 : index + %332 = select %327, %331, %330 : index + %333 = remi_signed %332, %c16 : index + %334 = cmpi "slt", %333, %c0 : index + %335 = addi %333, %c16 : index + %336 = select %334, %335, %333 : index + %337 = cmpi "slt", %arg5, %c0 : index + %338 = subi %c-1, %arg5 : index + %339 = select %337, %338, %arg5 : index + %340 = divi_signed %339, %c8 : index + %341 = subi %c-1, %340 : index + %342 = select %337, %341, %340 : index + %343 = addi %arg5, %c56 : index + %344 = cmpi "slt", %343, %c0 : index + %345 = subi %c-1, %343 : index + %346 = select %344, %345, %343 : index + %347 = divi_signed %346, %c16 : index + %348 = subi %c-1, %347 : index + %349 = select %344, %348, %347 : index + %350 = muli %349, %c-2 : index + %351 = addi %342, %350 : index + %352 = cmpi "slt", %arg5, %c0 : index + %353 = subi %c-1, %arg5 : index + %354 = select %352, %353, %arg5 : index + %355 = divi_signed %354, %c8 : index + %356 = subi %c-1, %355 : index + %357 = select %352, %356, %355 : index + %358 = addi %arg5, %c56 : index + %359 = cmpi "slt", %358, %c0 : index + %360 = subi %c-1, %358 : index + %361 = select %359, %360, %358 : index + %362 = divi_signed %361, %c16 : index + %363 = subi %c-1, %362 : index + %364 = select %359, %363, %362 : index + %365 = muli %364, %c-2 : index + %366 = addi %357, %365 : index + %367 = addi %366, %c7 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = subi %c-1, %367 : index + %370 = select %368, %369, %367 : index + %371 = divi_signed %370, %c2 : index + %372 = subi %c-1, %371 : index + %373 = select %368, %372, %371 : index + %374 = muli %373, %c-2 : index + %375 = addi %351, %374 : index + %376 = addi %375, %c7 : index + %377 = load %2[%336, %c0, %376] : memref<16x6x2xvector<8xf32>> + %378 = addf %325, %377 : vector<8xf32> + store %378, %1[%c0, %c7] : 
memref<1x16xvector<8xf32>> + %379 = addi %arg3, %arg5 : index + %380 = addi %379, %c64 : index + %381 = vector.transfer_read %arg2[%arg4, %380], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %382 = cmpi "slt", %arg5, %c0 : index + %383 = subi %c-1, %arg5 : index + %384 = select %382, %383, %arg5 : index + %385 = divi_signed %384, %c16 : index + %386 = subi %c-1, %385 : index + %387 = select %382, %386, %385 : index + %388 = cmpi "slt", %arg5, %c0 : index + %389 = subi %c-1, %arg5 : index + %390 = select %388, %389, %arg5 : index + %391 = divi_signed %390, %c16 : index + %392 = subi %c-1, %391 : index + %393 = select %388, %392, %391 : index + %394 = addi %393, %c4 : index + %395 = cmpi "slt", %394, %c0 : index + %396 = subi %c-1, %394 : index + %397 = select %395, %396, %394 : index + %398 = divi_signed %397, %c16 : index + %399 = subi %c-1, %398 : index + %400 = select %395, %399, %398 : index + %401 = muli %400, %c-16 : index + %402 = addi %387, %401 : index + %403 = addi %402, %c4 : index + %404 = remi_signed %arg5, %c16 : index + %405 = cmpi "slt", %404, %c0 : index + %406 = addi %404, %c16 : index + %407 = select %405, %406, %404 : index + %408 = cmpi "slt", %407, %c0 : index + %409 = subi %c-1, %407 : index + %410 = select %408, %409, %407 : index + %411 = divi_signed %410, %c8 : index + %412 = subi %c-1, %411 : index + %413 = select %408, %412, %411 : index + %414 = remi_signed %413, %c2 : index + %415 = cmpi "slt", %414, %c0 : index + %416 = addi %414, %c2 : index + %417 = select %415, %416, %414 : index + %418 = load %2[%403, %c0, %417] : memref<16x6x2xvector<8xf32>> + %419 = addf %381, %418 : vector<8xf32> + store %419, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %420 = addi %arg3, %arg5 : index + %421 = addi %420, %c72 : index + %422 = vector.transfer_read %arg2[%arg4, %421], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %423 = addi %arg5, %c72 : index + %424 = cmpi "slt", %423, %c0 : index + %425 = subi %c-1, %423 : index + %426 = select %424, %425, %423 : index + %427 = divi_signed %426, %c16 : index + %428 = subi %c-1, %427 : index + %429 = select %424, %428, %427 : index + %430 = remi_signed %429, %c16 : index + %431 = cmpi "slt", %430, %c0 : index + %432 = addi %430, %c16 : index + %433 = select %431, %432, %430 : index + %434 = cmpi "slt", %arg5, %c0 : index + %435 = subi %c-1, %arg5 : index + %436 = select %434, %435, %arg5 : index + %437 = divi_signed %436, %c8 : index + %438 = subi %c-1, %437 : index + %439 = select %434, %438, %437 : index + %440 = addi %arg5, %c72 : index + %441 = cmpi "slt", %440, %c0 : index + %442 = subi %c-1, %440 : index + %443 = select %441, %442, %440 : index + %444 = divi_signed %443, %c16 : index + %445 = subi %c-1, %444 : index + %446 = select %441, %445, %444 : index + %447 = muli %446, %c-2 : index + %448 = addi %439, %447 : index + %449 = cmpi "slt", %arg5, %c0 : index + %450 = subi %c-1, %arg5 : index + %451 = select %449, %450, %arg5 : index + %452 = divi_signed %451, %c8 : index + %453 = subi %c-1, %452 : index + %454 = select %449, %453, %452 : index + %455 = addi %arg5, %c72 : index + %456 = cmpi "slt", %455, %c0 : index + %457 = subi %c-1, %455 : index + %458 = select %456, %457, %455 : index + %459 = divi_signed %458, %c16 : index + %460 = subi %c-1, %459 : index + %461 = select %456, %460, %459 : index + %462 = muli %461, %c-2 : index + %463 = addi %454, %462 : index + %464 = addi %463, %c9 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = subi 
%c-1, %464 : index + %467 = select %465, %466, %464 : index + %468 = divi_signed %467, %c2 : index + %469 = subi %c-1, %468 : index + %470 = select %465, %469, %468 : index + %471 = muli %470, %c-2 : index + %472 = addi %448, %471 : index + %473 = addi %472, %c9 : index + %474 = load %2[%433, %c0, %473] : memref<16x6x2xvector<8xf32>> + %475 = addf %422, %474 : vector<8xf32> + store %475, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %476 = addi %arg3, %arg5 : index + %477 = addi %476, %c80 : index + %478 = vector.transfer_read %arg2[%arg4, %477], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %479 = cmpi "slt", %arg5, %c0 : index + %480 = subi %c-1, %arg5 : index + %481 = select %479, %480, %arg5 : index + %482 = divi_signed %481, %c16 : index + %483 = subi %c-1, %482 : index + %484 = select %479, %483, %482 : index + %485 = cmpi "slt", %arg5, %c0 : index + %486 = subi %c-1, %arg5 : index + %487 = select %485, %486, %arg5 : index + %488 = divi_signed %487, %c16 : index + %489 = subi %c-1, %488 : index + %490 = select %485, %489, %488 : index + %491 = addi %490, %c5 : index + %492 = cmpi "slt", %491, %c0 : index + %493 = subi %c-1, %491 : index + %494 = select %492, %493, %491 : index + %495 = divi_signed %494, %c16 : index + %496 = subi %c-1, %495 : index + %497 = select %492, %496, %495 : index + %498 = muli %497, %c-16 : index + %499 = addi %484, %498 : index + %500 = addi %499, %c5 : index + %501 = remi_signed %arg5, %c16 : index + %502 = cmpi "slt", %501, %c0 : index + %503 = addi %501, %c16 : index + %504 = select %502, %503, %501 : index + %505 = cmpi "slt", %504, %c0 : index + %506 = subi %c-1, %504 : index + %507 = select %505, %506, %504 : index + %508 = divi_signed %507, %c8 : index + %509 = subi %c-1, %508 : index + %510 = select %505, %509, %508 : index + %511 = remi_signed %510, %c2 : index + %512 = cmpi "slt", %511, %c0 : index + %513 = addi %511, %c2 : index + %514 = select %512, %513, %511 : index + %515 = load %2[%500, %c0, %514] : memref<16x6x2xvector<8xf32>> + %516 = addf %478, %515 : vector<8xf32> + store %516, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %517 = addi %arg3, %arg5 : index + %518 = addi %517, %c88 : index + %519 = vector.transfer_read %arg2[%arg4, %518], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %520 = addi %arg5, %c88 : index + %521 = cmpi "slt", %520, %c0 : index + %522 = subi %c-1, %520 : index + %523 = select %521, %522, %520 : index + %524 = divi_signed %523, %c16 : index + %525 = subi %c-1, %524 : index + %526 = select %521, %525, %524 : index + %527 = remi_signed %526, %c16 : index + %528 = cmpi "slt", %527, %c0 : index + %529 = addi %527, %c16 : index + %530 = select %528, %529, %527 : index + %531 = cmpi "slt", %arg5, %c0 : index + %532 = subi %c-1, %arg5 : index + %533 = select %531, %532, %arg5 : index + %534 = divi_signed %533, %c8 : index + %535 = subi %c-1, %534 : index + %536 = select %531, %535, %534 : index + %537 = addi %arg5, %c88 : index + %538 = cmpi "slt", %537, %c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-2 : index + %545 = addi %536, %544 : index + %546 = cmpi "slt", %arg5, %c0 : index + %547 = subi %c-1, %arg5 : index + %548 = select %546, %547, %arg5 : index + %549 = divi_signed %548, %c8 : index + %550 = subi %c-1, %549 : index + %551 = select %546, %550, %549 : index + %552 = 
addi %arg5, %c88 : index + %553 = cmpi "slt", %552, %c0 : index + %554 = subi %c-1, %552 : index + %555 = select %553, %554, %552 : index + %556 = divi_signed %555, %c16 : index + %557 = subi %c-1, %556 : index + %558 = select %553, %557, %556 : index + %559 = muli %558, %c-2 : index + %560 = addi %551, %559 : index + %561 = addi %560, %c11 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = subi %c-1, %561 : index + %564 = select %562, %563, %561 : index + %565 = divi_signed %564, %c2 : index + %566 = subi %c-1, %565 : index + %567 = select %562, %566, %565 : index + %568 = muli %567, %c-2 : index + %569 = addi %545, %568 : index + %570 = addi %569, %c11 : index + %571 = load %2[%530, %c0, %570] : memref<16x6x2xvector<8xf32>> + %572 = addf %519, %571 : vector<8xf32> + store %572, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %573 = addi %arg3, %arg5 : index + %574 = addi %573, %c96 : index + %575 = vector.transfer_read %arg2[%arg4, %574], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %576 = cmpi "slt", %arg5, %c0 : index + %577 = subi %c-1, %arg5 : index + %578 = select %576, %577, %arg5 : index + %579 = divi_signed %578, %c16 : index + %580 = subi %c-1, %579 : index + %581 = select %576, %580, %579 : index + %582 = cmpi "slt", %arg5, %c0 : index + %583 = subi %c-1, %arg5 : index + %584 = select %582, %583, %arg5 : index + %585 = divi_signed %584, %c16 : index + %586 = subi %c-1, %585 : index + %587 = select %582, %586, %585 : index + %588 = addi %587, %c6 : index + %589 = cmpi "slt", %588, %c0 : index + %590 = subi %c-1, %588 : index + %591 = select %589, %590, %588 : index + %592 = divi_signed %591, %c16 : index + %593 = subi %c-1, %592 : index + %594 = select %589, %593, %592 : index + %595 = muli %594, %c-16 : index + %596 = addi %581, %595 : index + %597 = addi %596, %c6 : index + %598 = remi_signed %arg5, %c16 : index + %599 = cmpi "slt", %598, %c0 : index + %600 = addi %598, %c16 : index + %601 = select %599, %600, %598 : index + %602 = cmpi "slt", %601, %c0 : index + %603 = subi %c-1, %601 : index + %604 = select %602, %603, %601 : index + %605 = divi_signed %604, %c8 : index + %606 = subi %c-1, %605 : index + %607 = select %602, %606, %605 : index + %608 = remi_signed %607, %c2 : index + %609 = cmpi "slt", %608, %c0 : index + %610 = addi %608, %c2 : index + %611 = select %609, %610, %608 : index + %612 = load %2[%597, %c0, %611] : memref<16x6x2xvector<8xf32>> + %613 = addf %575, %612 : vector<8xf32> + store %613, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %614 = addi %arg3, %arg5 : index + %615 = addi %614, %c104 : index + %616 = vector.transfer_read %arg2[%arg4, %615], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %617 = addi %arg5, %c104 : index + %618 = cmpi "slt", %617, %c0 : index + %619 = subi %c-1, %617 : index + %620 = select %618, %619, %617 : index + %621 = divi_signed %620, %c16 : index + %622 = subi %c-1, %621 : index + %623 = select %618, %622, %621 : index + %624 = remi_signed %623, %c16 : index + %625 = cmpi "slt", %624, %c0 : index + %626 = addi %624, %c16 : index + %627 = select %625, %626, %624 : index + %628 = cmpi "slt", %arg5, %c0 : index + %629 = subi %c-1, %arg5 : index + %630 = select %628, %629, %arg5 : index + %631 = divi_signed %630, %c8 : index + %632 = subi %c-1, %631 : index + %633 = select %628, %632, %631 : index + %634 = addi %arg5, %c104 : index + %635 = cmpi "slt", %634, %c0 : index + %636 = subi %c-1, %634 : index + %637 = select %635, %636, %634 : index + %638 = 
divi_signed %637, %c16 : index + %639 = subi %c-1, %638 : index + %640 = select %635, %639, %638 : index + %641 = muli %640, %c-2 : index + %642 = addi %633, %641 : index + %643 = cmpi "slt", %arg5, %c0 : index + %644 = subi %c-1, %arg5 : index + %645 = select %643, %644, %arg5 : index + %646 = divi_signed %645, %c8 : index + %647 = subi %c-1, %646 : index + %648 = select %643, %647, %646 : index + %649 = addi %arg5, %c104 : index + %650 = cmpi "slt", %649, %c0 : index + %651 = subi %c-1, %649 : index + %652 = select %650, %651, %649 : index + %653 = divi_signed %652, %c16 : index + %654 = subi %c-1, %653 : index + %655 = select %650, %654, %653 : index + %656 = muli %655, %c-2 : index + %657 = addi %648, %656 : index + %658 = addi %657, %c13 : index + %659 = cmpi "slt", %658, %c0 : index + %660 = subi %c-1, %658 : index + %661 = select %659, %660, %658 : index + %662 = divi_signed %661, %c2 : index + %663 = subi %c-1, %662 : index + %664 = select %659, %663, %662 : index + %665 = muli %664, %c-2 : index + %666 = addi %642, %665 : index + %667 = addi %666, %c13 : index + %668 = load %2[%627, %c0, %667] : memref<16x6x2xvector<8xf32>> + %669 = addf %616, %668 : vector<8xf32> + store %669, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %670 = addi %arg3, %arg5 : index + %671 = addi %670, %c112 : index + %672 = vector.transfer_read %arg2[%arg4, %671], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %673 = cmpi "slt", %arg5, %c0 : index + %674 = subi %c-1, %arg5 : index + %675 = select %673, %674, %arg5 : index + %676 = divi_signed %675, %c16 : index + %677 = subi %c-1, %676 : index + %678 = select %673, %677, %676 : index + %679 = cmpi "slt", %arg5, %c0 : index + %680 = subi %c-1, %arg5 : index + %681 = select %679, %680, %arg5 : index + %682 = divi_signed %681, %c16 : index + %683 = subi %c-1, %682 : index + %684 = select %679, %683, %682 : index + %685 = addi %684, %c7 : index + %686 = cmpi "slt", %685, %c0 : index + %687 = subi %c-1, %685 : index + %688 = select %686, %687, %685 : index + %689 = divi_signed %688, %c16 : index + %690 = subi %c-1, %689 : index + %691 = select %686, %690, %689 : index + %692 = muli %691, %c-16 : index + %693 = addi %678, %692 : index + %694 = addi %693, %c7 : index + %695 = remi_signed %arg5, %c16 : index + %696 = cmpi "slt", %695, %c0 : index + %697 = addi %695, %c16 : index + %698 = select %696, %697, %695 : index + %699 = cmpi "slt", %698, %c0 : index + %700 = subi %c-1, %698 : index + %701 = select %699, %700, %698 : index + %702 = divi_signed %701, %c8 : index + %703 = subi %c-1, %702 : index + %704 = select %699, %703, %702 : index + %705 = remi_signed %704, %c2 : index + %706 = cmpi "slt", %705, %c0 : index + %707 = addi %705, %c2 : index + %708 = select %706, %707, %705 : index + %709 = load %2[%694, %c0, %708] : memref<16x6x2xvector<8xf32>> + %710 = addf %672, %709 : vector<8xf32> + store %710, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %711 = addi %arg3, %arg5 : index + %712 = addi %711, %c120 : index + %713 = vector.transfer_read %arg2[%arg4, %712], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %714 = addi %arg5, %c120 : index + %715 = cmpi "slt", %714, %c0 : index + %716 = subi %c-1, %714 : index + %717 = select %715, %716, %714 : index + %718 = divi_signed %717, %c16 : index + %719 = subi %c-1, %718 : index + %720 = select %715, %719, %718 : index + %721 = remi_signed %720, %c16 : index + %722 = cmpi "slt", %721, %c0 : index + %723 = addi %721, %c16 : index + %724 = select 
%722, %723, %721 : index + %725 = cmpi "slt", %arg5, %c0 : index + %726 = subi %c-1, %arg5 : index + %727 = select %725, %726, %arg5 : index + %728 = divi_signed %727, %c8 : index + %729 = subi %c-1, %728 : index + %730 = select %725, %729, %728 : index + %731 = addi %arg5, %c120 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c16 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = muli %737, %c-2 : index + %739 = addi %730, %738 : index + %740 = cmpi "slt", %arg5, %c0 : index + %741 = subi %c-1, %arg5 : index + %742 = select %740, %741, %arg5 : index + %743 = divi_signed %742, %c8 : index + %744 = subi %c-1, %743 : index + %745 = select %740, %744, %743 : index + %746 = addi %arg5, %c120 : index + %747 = cmpi "slt", %746, %c0 : index + %748 = subi %c-1, %746 : index + %749 = select %747, %748, %746 : index + %750 = divi_signed %749, %c16 : index + %751 = subi %c-1, %750 : index + %752 = select %747, %751, %750 : index + %753 = muli %752, %c-2 : index + %754 = addi %745, %753 : index + %755 = addi %754, %c15 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = subi %c-1, %755 : index + %758 = select %756, %757, %755 : index + %759 = divi_signed %758, %c2 : index + %760 = subi %c-1, %759 : index + %761 = select %756, %760, %759 : index + %762 = muli %761, %c-2 : index + %763 = addi %739, %762 : index + %764 = addi %763, %c15 : index + %765 = load %2[%724, %c0, %764] : memref<16x6x2xvector<8xf32>> + %766 = addf %713, %765 : vector<8xf32> + store %766, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %767 = addi %arg3, %arg5 : index + %768 = muli %arg6, %c8 : index + %769 = addi %767, %768 : index + %770 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %770, %arg2[%arg4, %769] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/11_CSE.mlir b/Tutorials/optimized_matmul/mlir/11_CSE.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/11_CSE.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = 
"nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : 
index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : 
index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : 
memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, 
%c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store 
%313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : 
memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : 
memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, 
%c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store 
%278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index + %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + %16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + %26 = cmpi "slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, 
%c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] : vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement %57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, %58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + %76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> + %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 {RelaxedPrecision} : f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : 
memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 
= cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> + %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 {RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : vector<8xf32> + %198 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, 
%180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, %234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 
step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 = mulf %12, %56 {RelaxedPrecision} : f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : 
vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf %74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %111 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : 
memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf 
%124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement %187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} : f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store %206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : 
memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, 
%54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + 
%135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 
= select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed 
%286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index 
+ %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, 
%1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, 
%c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : 
index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : 
index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/11_GpuKernelOutlining.mlir b/Tutorials/optimized_matmul/mlir/11_GpuKernelOutlining.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/11_GpuKernelOutlining.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi 
%0, %c6 : index + %44 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] 
: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 
{RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %149 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, 
affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %224 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf %229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + 
%278 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 {RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %312 = addf %311, %310 {RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + store %343, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} : f32 + store %354, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} : f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf %401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 
{RelaxedPrecision} : f32 + store %409, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = mulf %417, %418 {RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%473 = mulf %471, %472 {RelaxedPrecision} : f32 + %474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} : f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store %499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, 
%arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 {RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 {RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %538 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + 
%571 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 {RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 
+ d1)>> + %113 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 
{RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf %170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %209 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 
{RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %305 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 
{RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf %362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/12_GpuKernelOutlining.mlir b/Tutorials/optimized_matmul/mlir/12_GpuKernelOutlining.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/12_GpuKernelOutlining.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = 
[false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + 
%126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi 
"slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, 
%297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : 
memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 
= addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi 
"slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, 
%261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : 
index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index + %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + %16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + %26 = cmpi "slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, %c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] : vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement %57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, %58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> 
+ %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + %76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> + %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 {RelaxedPrecision} : f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : 
memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> + %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : 
vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 {RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : vector<8xf32> + %198 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, %180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + 
%227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, %234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : 
memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 = mulf %12, %56 {RelaxedPrecision} : f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf %74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, 
%2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %111 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : 
index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf %124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement %187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} : f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : 
memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store %206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : 
index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load 
%2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed 
%176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 
: index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index 
+ %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = 
vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 
= addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index 
+ %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/12_LegalizeStandardForSPIRV.mlir b/Tutorials/optimized_matmul/mlir/12_LegalizeStandardForSPIRV.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/12_LegalizeStandardForSPIRV.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, 
%arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> 
+ %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 {RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, 
%arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %224 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf 
%229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %278 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 {RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addf %311, %310 {RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load 
%arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} : f32 + store %354, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, 
%57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} : f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf %401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = mulf %417, %418 {RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = mulf %471, %472 {RelaxedPrecision} : f32 + %474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} : f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store %499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 
{RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 {RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + %571 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 {RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> 
+ %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 {RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf %170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, 
%arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, 
%arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf %362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, 
%arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/13_ConvertAcceraToSPIRV.mlir b/Tutorials/optimized_matmul/mlir/13_ConvertAcceraToSPIRV.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/13_ConvertAcceraToSPIRV.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = 
constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + 
%72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] 
: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 {RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = 
addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, 
%arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %224 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf %229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store 
%239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %278 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 
{RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addf %311, %310 {RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} : f32 + store %354, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load 
%arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} : f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, 
%106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf %401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = mulf %417, %418 {RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = 
load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, 
d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = mulf %471, %472 {RelaxedPrecision} : f32 + %474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} : f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 
* 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store %499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 {RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 {RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load 
%arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + %571 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 
{RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 
= addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 {RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf 
%170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf 
%266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf 
%362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/13_LegalizeStandardForSPIRV.mlir b/Tutorials/optimized_matmul/mlir/13_LegalizeStandardForSPIRV.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/13_LegalizeStandardForSPIRV.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = 
constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read 
%arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : 
index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = 
select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + 
%278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, 
%14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : 
index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, 
%157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load 
%0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + 
%331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index + %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + %16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + %26 = cmpi "slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, %c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] : vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement 
%57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, %58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + %76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> + %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 {RelaxedPrecision} : f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = 
load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : 
memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> + %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 {RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : vector<8xf32> + %198 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, %180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : 
vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, %234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : 
index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 = mulf %12, %56 {RelaxedPrecision} : f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf %74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : 
memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %111 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load 
%arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf %124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement %187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : 
memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} : f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store %206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, 
%154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : 
index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = 
[false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : 
memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, 
%c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, 
%41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : 
index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = 
divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, 
%c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call 
@optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/14_Canonicalizer.mlir b/Tutorials/optimized_matmul/mlir/14_Canonicalizer.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/14_Canonicalizer.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, 
%18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load 
%arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 {RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 
= addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, 
%arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %224 = load %arg1[%1, %15] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf %229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} 
: f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %278 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, %arg2[%210, %85] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 {RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addf %311, %310 {RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} : f32 + store %354, %arg2[%307, %50] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} : f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> 
+ %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf %401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = mulf %417, %418 
{RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 
* 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = mulf %471, %472 {RelaxedPrecision} : f32 + %474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} : f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = load %arg0[%404, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store %499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load 
%arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 {RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 {RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%501, 
%50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + %571 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 {RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load 
%arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load 
%arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store 
%90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 {RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf %170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = load %arg2[%c782, %8] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%c783, %8] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf %362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/14_ConvertAcceraToSPIRV.mlir b/Tutorials/optimized_matmul/mlir/14_ConvertAcceraToSPIRV.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/14_ConvertAcceraToSPIRV.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = 
memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, 
%64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi 
%c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : 
index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index 
+ %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi 
%c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : 
index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + 
%292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index + %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + %16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + %26 = cmpi 
"slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, %c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] : vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement %57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, %58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + %76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> + %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 {RelaxedPrecision} 
: f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg0[%5, %6] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> + %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 {RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : vector<8xf32> + %198 = 
load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, %180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, %234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %239 
= vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 = mulf %12, %56 {RelaxedPrecision} : 
f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf %74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %111 = load 
%2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %168 = 
vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf %124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement %187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} : f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store %206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %218 = 
vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 
: index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + 
%125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi 
%arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : 
memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index 
+ %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : 
index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = 
cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed 
%327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/15_Canonicalizer.mlir b/Tutorials/optimized_matmul/mlir/15_Canonicalizer.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/15_Canonicalizer.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 
= constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = 
select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, 
%c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi 
"slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi 
%c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + 
%148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : 
index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : 
index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index + %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + %16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + %26 = cmpi "slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, %c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] 
: vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement %57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, %58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + %76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> + %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 {RelaxedPrecision} : f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : 
memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = 
addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> + %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 {RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : vector<8xf32> + %198 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, %180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, 
%161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, %234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : 
index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 = mulf %12, %56 {RelaxedPrecision} : f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf 
%74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %111 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, 
affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf %124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement 
%187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} : f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store %206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement 
%200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> 
+ %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, 
%c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + 
%230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 
= vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 
: index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi 
%4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : 
memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> 
(d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/15_SPIRVLowerABIAttributes.mlir b/Tutorials/optimized_matmul/mlir/15_SPIRVLowerABIAttributes.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/15_SPIRVLowerABIAttributes.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load 
%arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 {RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, 
%arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : 
f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %224 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf %229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %278 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 {RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addf %311, %310 {RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load 
%arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> 
(d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} : f32 + store %354, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} : f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf %401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = mulf %417, %418 {RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = mulf %471, %472 {RelaxedPrecision} : f32 + %474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} 
: f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store %499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, 
%8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 {RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 {RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + %571 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 {RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, 
%43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 {RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf %170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, 
%43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, 
%43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf %362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/16_SPIRVLowerABIAttributes.mlir b/Tutorials/optimized_matmul/mlir/16_SPIRVLowerABIAttributes.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ 
b/Tutorials/optimized_matmul/mlir/16_SPIRVLowerABIAttributes.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 
= addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, 
%c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 
: index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : 
memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index 
+ %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> 
+ store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, 
%108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : 
memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + 
%282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index + %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + 
%16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + %26 = cmpi "slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, %c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] : vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement %57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, %58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + %76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> 
+ %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 {RelaxedPrecision} : f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> + %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 {RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : 
memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : vector<8xf32> + %198 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, %180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, 
%234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : 
vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 = mulf %12, %56 {RelaxedPrecision} : f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf %74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, 
%40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %111 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement 
%161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf %124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement %187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} : f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store %206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : 
i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi 
%arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = 
vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : 
index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : 
index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : 
index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed 
%81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = 
subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, 
%c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/16_SPIRVUpdateVCE.mlir b/Tutorials/optimized_matmul/mlir/16_SPIRVUpdateVCE.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/16_SPIRVUpdateVCE.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for 
%arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, 
d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 {RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, 
%29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 
= load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %224 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf %229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 
+ d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, 
affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %278 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 {RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addf %311, %310 {RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} : f32 + store %354, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + 
%369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} : f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf 
%401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = mulf %417, %418 {RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> 
+ store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = mulf %471, %472 {RelaxedPrecision} : f32 + %474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} : f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store 
%499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 {RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 {RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + %571 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 {RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = 
addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, 
%138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 {RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf %170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf %362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/17_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir b/Tutorials/optimized_matmul/mlir/17_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/17_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%arg4, 
%1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 {RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg2[%113, %36] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + 
%181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %224 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf %229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %278 = load %arg1[%1, %78] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 {RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addf %311, %310 
{RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%307, %36] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} : f32 + store %354, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} : f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf %401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, 
%arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = mulf %417, %418 {RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = mulf %471, %472 {RelaxedPrecision} : f32 + 
%474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} : f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store %499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 {RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 {RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = load %arg0[%501, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + %571 = load %arg2[%501, %78] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 {RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = load %arg0[%c781, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 {RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf %170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load 
%arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg0[%c782, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load 
%arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%c783, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf %362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load 
%arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func 
@optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/17_SPIRVUpdateVCE.mlir b/Tutorials/optimized_matmul/mlir/17_SPIRVUpdateVCE.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/17_SPIRVUpdateVCE.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + 
store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, 
%126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + 
%212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = 
addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = 
vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store 
%65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 
= select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, 
%264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index 
+ %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + %16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + %26 = cmpi "slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, %c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] : vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement %57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, %58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + 
%76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> + %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 {RelaxedPrecision} : f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> 
+ %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> + %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 
{RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : vector<8xf32> + %198 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, %180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, 
%2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, %234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = 
load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 = mulf %12, %56 {RelaxedPrecision} : f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf %74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : 
memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %111 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = 
subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf %124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement %187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} : f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store 
%206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, 
%c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, 
%c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, 
%c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index 
+ %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 
: index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi 
"slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + 
%146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + 
%227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : 
memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/18_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir b/Tutorials/optimized_matmul/mlir/18_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/18_ConvertGpuLaunchFuncToVulkanLaunchFunc.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name 
= "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + 
store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, 
%c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + 
%240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 
= addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, 
%32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + 
%119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli 
%204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, 
%290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index + %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + %16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + 
%26 = cmpi "slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, %c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] : vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement %57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, %58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + %76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> + %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 
{RelaxedPrecision} : f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load 
%arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> + %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 {RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : 
vector<8xf32> + %198 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, %180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, %234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : 
memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 
= mulf %12, %56 {RelaxedPrecision} : f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf %74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : 
memref<16x6x2xvector<8xf32>> + %111 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : 
memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf %124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement %187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} : f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store %206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : 
memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, 
%c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : 
index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + 
store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : 
index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, 
%169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + 
%249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = 
select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/18_EmitVulkanWrapper.mlir b/Tutorials/optimized_matmul/mlir/18_EmitVulkanWrapper.mlir new file mode 100644 index 00000000..2b20194d --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/18_EmitVulkanWrapper.mlir @@ -0,0 +1,1368 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%arg4, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%arg4, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%arg4, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : index + %37 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load 
%arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%arg4, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%arg4, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store 
%76, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%arg4, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%arg4, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%arg4, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%arg4, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 = load %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load 
%arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%arg4, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = addi %arg4, %c1 : index + %114 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = mulf %114, %115 {RelaxedPrecision} : f32 + %117 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = addf %117, %116 {RelaxedPrecision} : f32 + store %118, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%113, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %121 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = mulf %120, %121 {RelaxedPrecision} : f32 + %123 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = addf %123, %122 {RelaxedPrecision} : f32 + store %124, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %125, %arg2[%113, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = mulf %126, %127 {RelaxedPrecision} : f32 + %129 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = addf %129, %128 {RelaxedPrecision} : f32 + store %130, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %131, %arg2[%113, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = mulf %132, %133 {RelaxedPrecision} : f32 + %135 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = addf %135, %134 {RelaxedPrecision} : f32 + store %136, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %137, %arg2[%113, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = mulf %138, %139 {RelaxedPrecision} : f32 + %141 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = addf %141, %140 {RelaxedPrecision} : f32 + store %142, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%113, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = load %arg0[%113, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = mulf %144, %145 {RelaxedPrecision} : f32 + %147 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = addf %147, %146 {RelaxedPrecision} : f32 + store %148, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %149, %arg2[%113, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %151 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = mulf %150, %151 {RelaxedPrecision} : f32 + %153 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = addf %153, %152 {RelaxedPrecision} : f32 + store %154, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %155, %arg2[%113, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%113, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = mulf %162, %163 {RelaxedPrecision} : f32 + %165 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = addf %165, %164 {RelaxedPrecision} : f32 + store %166, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%113, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %169 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = mulf %168, %169 {RelaxedPrecision} : f32 + %171 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addf %171, %170 {RelaxedPrecision} : f32 + store %172, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %173, %arg2[%113, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %175 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = mulf %174, %175 {RelaxedPrecision} : f32 + %177 = load %arg2[%113, %71] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addf %177, %176 {RelaxedPrecision} : f32 + store %178, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %179, %arg2[%113, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %181 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = mulf %180, %181 {RelaxedPrecision} : f32 + %183 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = addf %183, %182 {RelaxedPrecision} : f32 + store %184, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %185, %arg2[%113, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%113, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %193 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = mulf %192, %193 {RelaxedPrecision} : f32 + %195 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addf %195, %194 {RelaxedPrecision} : f32 + store %196, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %197, %arg2[%113, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %199 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = mulf %198, %199 {RelaxedPrecision} : f32 + %201 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addf %201, %200 {RelaxedPrecision} : f32 + store %202, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %203, %arg2[%113, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg0[%113, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %205 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = mulf %204, %205 {RelaxedPrecision} : f32 + %207 = load %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = addf %207, %206 {RelaxedPrecision} : f32 + store %208, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg2[%113, %106] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %209, %arg2[%113, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addi %arg4, %c2 : index + %211 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %212 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = mulf %211, %212 {RelaxedPrecision} : f32 + %214 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = addf %214, %213 {RelaxedPrecision} : f32 + store %215, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = load %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %216, %arg2[%210, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%210, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %224 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = mulf %223, %224 {RelaxedPrecision} : f32 + %226 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = addf %226, %225 {RelaxedPrecision} : f32 + store %227, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = load %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %228, %arg2[%210, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %230 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = mulf %229, %230 {RelaxedPrecision} : f32 + %232 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = addf %232, %231 {RelaxedPrecision} : f32 + store %233, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = load %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %234, %arg2[%210, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%210, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, 
d1) -> (d0 * 128 + d1)>> + %242 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = mulf %241, %242 {RelaxedPrecision} : f32 + %244 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = addf %244, %243 {RelaxedPrecision} : f32 + store %245, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = load %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %246, %arg2[%210, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %248 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = mulf %247, %248 {RelaxedPrecision} : f32 + %250 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = addf %250, %249 {RelaxedPrecision} : f32 + store %251, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = load %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %252, %arg2[%210, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%210, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %260 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = mulf %259, %260 {RelaxedPrecision} : f32 + %262 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = addf %262, %261 {RelaxedPrecision} : f32 + store %263, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = load %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %264, %arg2[%210, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %266 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = mulf %265, %266 {RelaxedPrecision} : f32 + %268 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = addf %268, %267 {RelaxedPrecision} : f32 + store %269, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = load %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %270, %arg2[%210, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%210, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %278 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = mulf %277, %278 {RelaxedPrecision} : f32 + %280 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = addf %280, %279 {RelaxedPrecision} : f32 + store %281, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = load %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %282, %arg2[%210, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %284 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = mulf %283, %284 {RelaxedPrecision} : f32 + %286 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = addf %286, %285 {RelaxedPrecision} : f32 + store %287, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = load %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %288, %arg2[%210, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %290 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = mulf %289, %290 {RelaxedPrecision} : f32 + %292 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = addf %292, %291 {RelaxedPrecision} : f32 + store %293, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = load %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %294, %arg2[%210, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %296 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = mulf %295, %296 {RelaxedPrecision} : f32 + %298 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = addf %298, %297 {RelaxedPrecision} : f32 + store %299, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = load %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %300, %arg2[%210, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg0[%210, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %302 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = mulf %301, %302 {RelaxedPrecision} : f32 + %304 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = addf %304, %303 {RelaxedPrecision} : f32 + store %305, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = load %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + store %306, %arg2[%210, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = addi %arg4, %c3 : index + %308 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %309 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = mulf %308, %309 {RelaxedPrecision} : f32 + %311 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addf %311, %310 {RelaxedPrecision} : f32 + store %312, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = load %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %313, %arg2[%307, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %314 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = mulf %314, %315 {RelaxedPrecision} : f32 + %317 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = addf %317, %316 {RelaxedPrecision} : f32 + store %318, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%307, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = mulf %320, %321 {RelaxedPrecision} : f32 + %323 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = addf %323, %322 {RelaxedPrecision} : f32 + store %324, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%307, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %327 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = mulf %326, %327 {RelaxedPrecision} : f32 + %329 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addf %329, %328 {RelaxedPrecision} : f32 + store %330, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = load %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %331, %arg2[%307, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %333 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = mulf %332, %333 {RelaxedPrecision} : f32 + %335 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = addf %335, %334 {RelaxedPrecision} : f32 + store %336, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = load %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %337, %arg2[%307, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%1, %36] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = mulf %338, %339 {RelaxedPrecision} : f32 + %341 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = addf %341, %340 {RelaxedPrecision} : f32 + store %342, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%307, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %345 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = mulf %344, %345 {RelaxedPrecision} : f32 + %347 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addf %347, %346 {RelaxedPrecision} : f32 + store %348, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = load %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %349, %arg2[%307, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %351 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = mulf %350, %351 {RelaxedPrecision} : f32 + %353 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = addf %353, %352 {RelaxedPrecision} : f32 + store %354, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = load %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %355, %arg2[%307, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = mulf %356, %357 {RelaxedPrecision} : f32 + %359 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = addf %359, %358 {RelaxedPrecision} : f32 + store %360, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%307, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = mulf %362, %363 {RelaxedPrecision} : f32 + %365 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addf %365, %364 {RelaxedPrecision} : f32 + store %366, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%307, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %369 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = mulf %368, %369 {RelaxedPrecision} : f32 + %371 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = addf %371, %370 {RelaxedPrecision} 
: f32 + store %372, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = load %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %373, %arg2[%307, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = mulf %374, %375 {RelaxedPrecision} : f32 + %377 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = addf %377, %376 {RelaxedPrecision} : f32 + store %378, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%307, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %381 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = mulf %380, %381 {RelaxedPrecision} : f32 + %383 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addf %383, %382 {RelaxedPrecision} : f32 + store %384, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = load %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %385, %arg2[%307, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = mulf %386, %387 {RelaxedPrecision} : f32 + %389 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = addf %389, %388 {RelaxedPrecision} : f32 + store %390, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%307, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = mulf %392, %393 {RelaxedPrecision} : f32 + %395 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = addf %395, %394 {RelaxedPrecision} : f32 + store %396, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%307, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = load %arg0[%307, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %399 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = mulf %398, %399 {RelaxedPrecision} : f32 + %401 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addf %401, %400 {RelaxedPrecision} : f32 + store %402, %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = load %arg2[%307, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %403, %arg2[%307, %106] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = addi %arg4, %c4 : index + %405 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%404, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %412 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = mulf %411, %412 {RelaxedPrecision} : f32 + %414 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = addf %414, %413 {RelaxedPrecision} : f32 + store %415, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = load %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %416, %arg2[%404, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %418 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = mulf %417, %418 {RelaxedPrecision} : f32 + %420 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addf %420, %419 {RelaxedPrecision} : f32 + store %421, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = load %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %422, %arg2[%404, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%404, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %430 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = mulf %429, %430 {RelaxedPrecision} : f32 + %432 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = addf %432, %431 {RelaxedPrecision} : f32 + store %433, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = load %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %434, %arg2[%404, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%404, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %442 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = mulf %441, %442 {RelaxedPrecision} : f32 + %444 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = addf %444, %443 {RelaxedPrecision} : f32 + store %445, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = load %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %446, %arg2[%404, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %448 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = mulf %447, %448 {RelaxedPrecision} : f32 + %450 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addf %450, %449 {RelaxedPrecision} : f32 + store %451, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = load %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %452, %arg2[%404, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %454 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = mulf %453, %454 {RelaxedPrecision} : f32 + %456 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = addf %456, %455 {RelaxedPrecision} : f32 + store %457, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %458 = load %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %458, %arg2[%404, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %460 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = mulf %459, %460 {RelaxedPrecision} : f32 + %462 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = addf %462, %461 {RelaxedPrecision} : f32 + store %463, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = load %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %464, %arg2[%404, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %466 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = mulf %465, %466 {RelaxedPrecision} : f32 + %468 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = addf %468, %467 {RelaxedPrecision} : f32 + store %469, %arg2[%404, %71] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = load %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %470, %arg2[%404, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %472 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = mulf %471, %472 {RelaxedPrecision} : f32 + %474 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = addf %474, %473 {RelaxedPrecision} : f32 + store %475, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = load %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %476, %arg2[%404, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %478 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = mulf %477, %478 {RelaxedPrecision} : f32 + %480 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = addf %480, %479 {RelaxedPrecision} : f32 + store %481, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = load %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %482, %arg2[%404, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %484 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = mulf %483, %484 {RelaxedPrecision} : f32 + %486 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = addf %486, %485 {RelaxedPrecision} : f32 + store %487, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = load %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %488, %arg2[%404, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %490 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = mulf %489, %490 {RelaxedPrecision} : f32 + %492 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = addf %492, %491 {RelaxedPrecision} : f32 + store %493, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = load %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %494, %arg2[%404, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg0[%404, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %496 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = mulf %495, %496 {RelaxedPrecision} : f32 + %498 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = addf %498, %497 {RelaxedPrecision} : f32 + store %499, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = load %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %500, %arg2[%404, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %501 = addi %arg4, %c5 : index + %502 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %503 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = mulf %502, %503 {RelaxedPrecision} : f32 + %505 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = addf %505, %504 {RelaxedPrecision} : f32 + store %506, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = load %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %507, %arg2[%501, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %509 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = mulf %508, %509 {RelaxedPrecision} : f32 + %511 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %512 = addf %511, %510 {RelaxedPrecision} : f32 + store %512, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = load %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %513, %arg2[%501, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%501, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %521 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = mulf %520, %521 {RelaxedPrecision} : f32 + %523 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = addf %523, %522 {RelaxedPrecision} : f32 + store %524, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = load %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %525, %arg2[%501, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %527 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = mulf %526, %527 {RelaxedPrecision} : f32 + %529 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addf %529, %528 {RelaxedPrecision} : f32 + store %530, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = load %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %531, %arg2[%501, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %533 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = mulf %532, %533 
{RelaxedPrecision} : f32 + %535 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addf %535, %534 {RelaxedPrecision} : f32 + store %536, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %537 = load %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %537, %arg2[%501, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %539 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = mulf %538, %539 {RelaxedPrecision} : f32 + %541 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = addf %541, %540 {RelaxedPrecision} : f32 + store %542, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = load %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %543, %arg2[%501, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%501, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %551 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = mulf %550, %551 {RelaxedPrecision} : f32 + %553 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addf %553, %552 {RelaxedPrecision} : f32 + store %554, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %555 = load %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %555, %arg2[%501, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %557 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = mulf %556, %557 {RelaxedPrecision} : f32 + %559 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addf %559, %558 {RelaxedPrecision} : f32 + store %560, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = load %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %561, %arg2[%501, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %563 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = mulf %562, %563 {RelaxedPrecision} : f32 + %565 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = addf %565, %564 {RelaxedPrecision} : f32 + store %566, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 
* 512 + d1)>> + %567 = load %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %567, %arg2[%501, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %569 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = mulf %568, %569 {RelaxedPrecision} : f32 + %571 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = addf %571, %570 {RelaxedPrecision} : f32 + store %572, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %573 = load %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %573, %arg2[%501, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%501, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %581 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = mulf %580, %581 {RelaxedPrecision} : f32 + %583 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = addf %583, %582 {RelaxedPrecision} : f32 + store %584, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = load %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %585, %arg2[%501, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %587 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = mulf %586, %587 {RelaxedPrecision} : f32 + %589 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addf %589, %588 {RelaxedPrecision} : f32 + store %590, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %591 = load %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %591, %arg2[%501, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = load %arg0[%501, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %593 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = mulf %592, %593 {RelaxedPrecision} : f32 + %595 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = addf %595, %594 {RelaxedPrecision} : f32 + store %596, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %597 = load %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %597, %arg2[%501, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to 
%c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %0, %c1 : index + %9 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %11 = mulf %9, %10 {RelaxedPrecision} : f32 + %12 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = addf %12, %11 {RelaxedPrecision} : f32 + store %13, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addi %0, %c2 : index + %16 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = mulf %16, %17 {RelaxedPrecision} : f32 + %19 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = addf %19, %18 {RelaxedPrecision} : f32 + store %20, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = load %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %21, %arg2[%c780, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = addi %0, %c3 : index + %23 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = mulf %23, %24 {RelaxedPrecision} : f32 + %26 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %27 = addf %26, %25 {RelaxedPrecision} : f32 + store %27, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = load %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %28, %arg2[%c780, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %29 = addi %0, %c4 : index + %30 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = mulf %30, %31 {RelaxedPrecision} : f32 + %33 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = addf %33, %32 {RelaxedPrecision} : f32 + store %34, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = load %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %35, %arg2[%c780, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = addi %0, %c5 : 
index + %37 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %38 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = mulf %37, %38 {RelaxedPrecision} : f32 + %40 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %41 = addf %40, %39 {RelaxedPrecision} : f32 + store %41, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = addi %0, %c6 : index + %44 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %45 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = mulf %44, %45 {RelaxedPrecision} : f32 + %47 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = addf %47, %46 {RelaxedPrecision} : f32 + store %48, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = load %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %49, %arg2[%c780, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %50 = addi %0, %c7 : index + %51 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %52 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = mulf %51, %52 {RelaxedPrecision} : f32 + %54 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = addf %54, %53 {RelaxedPrecision} : f32 + store %55, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = load %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %56, %arg2[%c780, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %57 = addi %0, %c8 : index + %58 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = mulf %58, %59 {RelaxedPrecision} : f32 + %61 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addf %61, %60 {RelaxedPrecision} : f32 + store %62, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = addi %0, %c9 : index + %65 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %0, %c10 : index + %72 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %73 = load %arg1[%1, %71] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = mulf %72, %73 {RelaxedPrecision} : f32 + %75 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = addf %75, %74 {RelaxedPrecision} : f32 + store %76, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = load %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %77, %arg2[%c780, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addi %0, %c11 : index + %79 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %80 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = mulf %79, %80 {RelaxedPrecision} : f32 + %82 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %83 = addf %82, %81 {RelaxedPrecision} : f32 + store %83, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = load %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %84, %arg2[%c780, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = addi %0, %c12 : index + %86 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %87 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = mulf %86, %87 {RelaxedPrecision} : f32 + %89 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %90 = addf %89, %88 {RelaxedPrecision} : f32 + store %90, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = load %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %91, %arg2[%c780, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = addi %0, %c13 : index + %93 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %94 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = mulf %93, %94 {RelaxedPrecision} : f32 + %96 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = addf %96, %95 {RelaxedPrecision} : f32 + store %97, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = load %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %98, %arg2[%c780, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %99 = addi %0, %c14 : index + %100 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %101 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = mulf %100, %101 {RelaxedPrecision} : f32 + %103 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = addf %103, %102 {RelaxedPrecision} : f32 + store %104, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = addi %0, %c15 : index + %107 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %108 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = mulf %107, %108 {RelaxedPrecision} : f32 + %110 
= load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = addf %110, %109 {RelaxedPrecision} : f32 + store %111, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = load %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %112, %arg2[%c780, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %113 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = mulf %113, %114 {RelaxedPrecision} : f32 + %116 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %117 = addf %116, %115 {RelaxedPrecision} : f32 + store %117, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = load %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %118, %arg2[%c781, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c781, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = mulf %125, %126 {RelaxedPrecision} : f32 + %128 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = addf %128, %127 {RelaxedPrecision} : f32 + store %129, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %130, %arg2[%c781, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %131 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = mulf %131, %132 {RelaxedPrecision} : f32 + %134 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = addf %134, %133 {RelaxedPrecision} : f32 + store %135, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%c781, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %142 = load %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c781, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = mulf %143, %144 {RelaxedPrecision} : f32 + %146 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = addf %146, %145 {RelaxedPrecision} : f32 + store %147, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %148, %arg2[%c781, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = mulf %149, %150 {RelaxedPrecision} : f32 + %152 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = addf %152, %151 {RelaxedPrecision} : f32 + store %153, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%c781, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %157 = mulf %155, %156 {RelaxedPrecision} : f32 + %158 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = addf %158, %157 {RelaxedPrecision} : f32 + store %159, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %160, %arg2[%c781, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = mulf %161, %162 {RelaxedPrecision} : f32 + %164 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = addf %164, %163 {RelaxedPrecision} : f32 + store %165, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %166, %arg2[%c781, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = mulf %167, %168 {RelaxedPrecision} : f32 + %170 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = addf %170, %169 {RelaxedPrecision} : f32 + store %171, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%c781, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = load %arg0[%c781, 
%1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %174 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = mulf %173, %174 {RelaxedPrecision} : f32 + %176 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = addf %176, %175 {RelaxedPrecision} : f32 + store %177, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %178, %arg2[%c781, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %179 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %180 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = mulf %179, %180 {RelaxedPrecision} : f32 + %182 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = addf %182, %181 {RelaxedPrecision} : f32 + store %183, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %184, %arg2[%c781, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = mulf %185, %186 {RelaxedPrecision} : f32 + %188 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = addf %188, %187 {RelaxedPrecision} : f32 + store %189, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%c781, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %192 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %193 = mulf %191, %192 {RelaxedPrecision} : f32 + %194 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = addf %194, %193 {RelaxedPrecision} : f32 + store %195, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %196, %arg2[%c781, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %197 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %198 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = mulf %197, %198 {RelaxedPrecision} : f32 + %200 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = addf %200, %199 {RelaxedPrecision} : f32 + store %201, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %202, %arg2[%c781, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = load %arg0[%c781, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = mulf %203, %204 {RelaxedPrecision} : f32 + %206 = load 
%arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = addf %206, %205 {RelaxedPrecision} : f32 + store %207, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%c781, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %210 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = mulf %209, %210 {RelaxedPrecision} : f32 + %212 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = addf %212, %211 {RelaxedPrecision} : f32 + store %213, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %214, %arg2[%c782, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %216 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = mulf %215, %216 {RelaxedPrecision} : f32 + %218 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = addf %218, %217 {RelaxedPrecision} : f32 + store %219, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = load %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %220, %arg2[%c782, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%c782, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%c782, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %234 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%238 = load %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%c782, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%c782, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%c782, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%c782, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%c782, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%c782, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%c782, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%c782, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%c782, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%c782, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%c782, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%c782, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%c782, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load 
%arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%c782, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%c783, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%1, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%c783, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %318 = load %arg1[%1, %15] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = mulf %317, %318 {RelaxedPrecision} : f32 + %320 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addf %320, %319 {RelaxedPrecision} : f32 + store %321, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = load %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %322, %arg2[%c783, %15] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %324 = load %arg1[%1, %22] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = mulf %323, %324 {RelaxedPrecision} : f32 + %326 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = addf %326, %325 {RelaxedPrecision} : f32 + store %327, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = load %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %328, %arg2[%c783, %22] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%1, %29] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = mulf %329, %330 {RelaxedPrecision} : f32 + %332 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = addf %332, %331 {RelaxedPrecision} : f32 + store %333, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%334 = load %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%c783, %29] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%1, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%c783, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %342 = load %arg1[%1, %43] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = mulf %341, %342 {RelaxedPrecision} : f32 + %344 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %345 = addf %344, %343 {RelaxedPrecision} : f32 + store %345, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = load %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %346, %arg2[%c783, %43] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%1, %50] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = mulf %347, %348 {RelaxedPrecision} : f32 + %350 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addf %350, %349 {RelaxedPrecision} : f32 + store %351, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%c783, %50] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %354 = load %arg1[%1, %57] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = mulf %353, %354 {RelaxedPrecision} : f32 + %356 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addf %356, %355 {RelaxedPrecision} : f32 + store %357, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = load %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %358, %arg2[%c783, %57] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %360 = load %arg1[%1, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = mulf %359, %360 {RelaxedPrecision} : f32 + %362 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %363 = addf %362, %361 {RelaxedPrecision} : f32 + store %363, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = load %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %364, %arg2[%c783, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg0[%c783, %1] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%1, %71] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%c783, %71] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %372 = load %arg1[%1, %78] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = mulf %371, %372 {RelaxedPrecision} : f32 + %374 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addf %374, %373 {RelaxedPrecision} : f32 + store %375, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = load %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %376, %arg2[%c783, %78] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %378 = load %arg1[%1, %85] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = mulf %377, %378 {RelaxedPrecision} : f32 + %380 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addf %380, %379 {RelaxedPrecision} : f32 + store %381, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = load %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %382, %arg2[%c783, %85] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%1, %92] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = mulf %383, %384 {RelaxedPrecision} : f32 + %386 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = addf %386, %385 {RelaxedPrecision} : f32 + store %387, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%c783, %92] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %390 = load %arg1[%1, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = mulf %389, %390 {RelaxedPrecision} : f32 + %392 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addf %392, %391 {RelaxedPrecision} : f32 + store %393, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = load %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %394, %arg2[%c783, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg0[%c783, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%1, %106] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load 
%arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%c783, %106] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/19_EmitVulkanWrapper.mlir b/Tutorials/optimized_matmul/mlir/19_EmitVulkanWrapper.mlir new file mode 100644 index 00000000..aa07fd1b --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/19_EmitVulkanWrapper.mlir @@ -0,0 +1,2095 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = 
"accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + 
store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select 
%112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, %163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = 
select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = 
cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %4, %c8 : index + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = addi %4, %c16 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = addi %4, %c24 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = addi %4, %c32 : index + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = addi %4, %c40 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = addi %4, %c48 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 
= addi %4, %c56 : index + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = addi %4, %c64 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = addi %4, %c72 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = addi %4, %c80 : index + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = addi %4, %c88 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = addi %4, %c96 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = addi %4, %c104 : index + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c112 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = addi %4, %c120 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %37 = cmpi "slt", %arg5, %c0 : index + %38 = subi %c-1, %arg5 : index + %39 = select %37, %38, %arg5 : index + %40 = divi_signed %39, %c16 : index + %41 = subi %c-1, %40 : index + %42 = select %37, %41, %40 : index + %43 = remi_signed %42, %c16 : index + %44 = cmpi "slt", %43, %c0 : index + %45 = addi %43, %c16 : index + %46 = select %44, %45, %43 : index + %47 = remi_signed %arg4, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + store %36, %3[%46, %50, %64] : memref<16x128x2xvector<8xf32>> + %65 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %66 = addi %arg5, %c8 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = subi %c-1, %66 : index + %69 = select %67, %68, %66 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = divi_signed %39, %c8 : index + %78 = 
subi %c-1, %77 : index + %79 = select %37, %78, %77 : index + %80 = muli %72, %c-2 : index + %81 = addi %79, %80 : index + %82 = addi %81, %c1 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = subi %c-1, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c2 : index + %87 = subi %c-1, %86 : index + %88 = select %83, %87, %86 : index + %89 = muli %88, %c-2 : index + %90 = addi %81, %89 : index + %91 = addi %90, %c1 : index + store %65, %3[%76, %50, %91] : memref<16x128x2xvector<8xf32>> + %92 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %93 = addi %42, %c1 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = subi %c-1, %93 : index + %96 = select %94, %95, %93 : index + %97 = divi_signed %96, %c16 : index + %98 = subi %c-1, %97 : index + %99 = select %94, %98, %97 : index + %100 = muli %99, %c-16 : index + %101 = addi %42, %100 : index + %102 = addi %101, %c1 : index + store %92, %3[%102, %50, %64] : memref<16x128x2xvector<8xf32>> + %103 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %104 = addi %arg5, %c24 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = subi %c-1, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16 : index + %109 = subi %c-1, %108 : index + %110 = select %105, %109, %108 : index + %111 = remi_signed %110, %c16 : index + %112 = cmpi "slt", %111, %c0 : index + %113 = addi %111, %c16 : index + %114 = select %112, %113, %111 : index + %115 = muli %110, %c-2 : index + %116 = addi %79, %115 : index + %117 = addi %116, %c3 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c2 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c3 : index + store %103, %3[%114, %50, %126] : memref<16x128x2xvector<8xf32>> + %127 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %128 = addi %42, %c2 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = subi %c-1, %128 : index + %131 = select %129, %130, %128 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = muli %134, %c-16 : index + %136 = addi %42, %135 : index + %137 = addi %136, %c2 : index + store %127, %3[%137, %50, %64] : memref<16x128x2xvector<8xf32>> + %138 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %139 = addi %arg5, %c40 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = subi %c-1, %139 : index + %142 = select %140, %141, %139 : index + %143 = divi_signed %142, %c16 : index + %144 = subi %c-1, %143 : index + %145 = select %140, %144, %143 : index + %146 = remi_signed %145, %c16 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = addi %146, %c16 : index + %149 = select %147, %148, %146 : index + %150 = muli %145, %c-2 : index + %151 = addi %79, %150 : index + %152 = addi %151, %c5 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c5 : index + store %138, %3[%149, %50, %161] : memref<16x128x2xvector<8xf32>> + %162 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %163 = addi %42, %c3 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = subi %c-1, %163 : index + %166 = select %164, %165, 
%163 : index + %167 = divi_signed %166, %c16 : index + %168 = subi %c-1, %167 : index + %169 = select %164, %168, %167 : index + %170 = muli %169, %c-16 : index + %171 = addi %42, %170 : index + %172 = addi %171, %c3 : index + store %162, %3[%172, %50, %64] : memref<16x128x2xvector<8xf32>> + %173 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %174 = addi %arg5, %c56 : index + %175 = cmpi "slt", %174, %c0 : index + %176 = subi %c-1, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = muli %180, %c-2 : index + %186 = addi %79, %185 : index + %187 = addi %186, %c7 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c2 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-2 : index + %195 = addi %186, %194 : index + %196 = addi %195, %c7 : index + store %173, %3[%184, %50, %196] : memref<16x128x2xvector<8xf32>> + %197 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %198 = addi %42, %c4 : index + %199 = cmpi "slt", %198, %c0 : index + %200 = subi %c-1, %198 : index + %201 = select %199, %200, %198 : index + %202 = divi_signed %201, %c16 : index + %203 = subi %c-1, %202 : index + %204 = select %199, %203, %202 : index + %205 = muli %204, %c-16 : index + %206 = addi %42, %205 : index + %207 = addi %206, %c4 : index + store %197, %3[%207, %50, %64] : memref<16x128x2xvector<8xf32>> + %208 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %209 = addi %arg5, %c72 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c16 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c16 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c16 : index + %219 = select %217, %218, %216 : index + %220 = muli %215, %c-2 : index + %221 = addi %79, %220 : index + %222 = addi %221, %c9 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = subi %c-1, %222 : index + %225 = select %223, %224, %222 : index + %226 = divi_signed %225, %c2 : index + %227 = subi %c-1, %226 : index + %228 = select %223, %227, %226 : index + %229 = muli %228, %c-2 : index + %230 = addi %221, %229 : index + %231 = addi %230, %c9 : index + store %208, %3[%219, %50, %231] : memref<16x128x2xvector<8xf32>> + %232 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %233 = addi %42, %c5 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = subi %c-1, %233 : index + %236 = select %234, %235, %233 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = muli %239, %c-16 : index + %241 = addi %42, %240 : index + %242 = addi %241, %c5 : index + store %232, %3[%242, %50, %64] : memref<16x128x2xvector<8xf32>> + %243 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %244 = addi %arg5, %c88 : index + %245 = cmpi "slt", %244, %c0 : index + %246 = subi %c-1, %244 : index + %247 = select %245, %246, %244 : index + %248 = divi_signed %247, %c16 : index + %249 = subi %c-1, %248 : index + %250 = select %245, %249, %248 : index + %251 = remi_signed %250, %c16 : index + %252 = cmpi "slt", %251, %c0 : 
index + %253 = addi %251, %c16 : index + %254 = select %252, %253, %251 : index + %255 = muli %250, %c-2 : index + %256 = addi %79, %255 : index + %257 = addi %256, %c11 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = subi %c-1, %257 : index + %260 = select %258, %259, %257 : index + %261 = divi_signed %260, %c2 : index + %262 = subi %c-1, %261 : index + %263 = select %258, %262, %261 : index + %264 = muli %263, %c-2 : index + %265 = addi %256, %264 : index + %266 = addi %265, %c11 : index + store %243, %3[%254, %50, %266] : memref<16x128x2xvector<8xf32>> + %267 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %268 = addi %42, %c6 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = subi %c-1, %268 : index + %271 = select %269, %270, %268 : index + %272 = divi_signed %271, %c16 : index + %273 = subi %c-1, %272 : index + %274 = select %269, %273, %272 : index + %275 = muli %274, %c-16 : index + %276 = addi %42, %275 : index + %277 = addi %276, %c6 : index + store %267, %3[%277, %50, %64] : memref<16x128x2xvector<8xf32>> + %278 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %279 = addi %arg5, %c104 : index + %280 = cmpi "slt", %279, %c0 : index + %281 = subi %c-1, %279 : index + %282 = select %280, %281, %279 : index + %283 = divi_signed %282, %c16 : index + %284 = subi %c-1, %283 : index + %285 = select %280, %284, %283 : index + %286 = remi_signed %285, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = muli %285, %c-2 : index + %291 = addi %79, %290 : index + %292 = addi %291, %c13 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = subi %c-1, %292 : index + %295 = select %293, %294, %292 : index + %296 = divi_signed %295, %c2 : index + %297 = subi %c-1, %296 : index + %298 = select %293, %297, %296 : index + %299 = muli %298, %c-2 : index + %300 = addi %291, %299 : index + %301 = addi %300, %c13 : index + store %278, %3[%289, %50, %301] : memref<16x128x2xvector<8xf32>> + %302 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %303 = addi %42, %c7 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = subi %c-1, %303 : index + %306 = select %304, %305, %303 : index + %307 = divi_signed %306, %c16 : index + %308 = subi %c-1, %307 : index + %309 = select %304, %308, %307 : index + %310 = muli %309, %c-16 : index + %311 = addi %42, %310 : index + %312 = addi %311, %c7 : index + store %302, %3[%312, %50, %64] : memref<16x128x2xvector<8xf32>> + %313 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %314 = addi %arg5, %c120 : index + %315 = cmpi "slt", %314, %c0 : index + %316 = subi %c-1, %314 : index + %317 = select %315, %316, %314 : index + %318 = divi_signed %317, %c16 : index + %319 = subi %c-1, %318 : index + %320 = select %315, %319, %318 : index + %321 = remi_signed %320, %c16 : index + %322 = cmpi "slt", %321, %c0 : index + %323 = addi %321, %c16 : index + %324 = select %322, %323, %321 : index + %325 = muli %320, %c-2 : index + %326 = addi %79, %325 : index + %327 = addi %326, %c15 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = subi %c-1, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c2 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = muli %333, %c-2 : index + %335 = addi %326, %334 : index + %336 = addi %335, %c15 : index + store %313, %3[%324, %50, %336] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + 
scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for %arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg6, %arg8 : index + %7 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = cmpi "slt", %arg5, %c0 : index + %16 = subi %c-1, %arg5 : index + %17 = select %15, %16, %arg5 : index + %18 = divi_signed %17, %c16 : index + %19 = subi %c-1, %18 : index + %20 = select %15, %19, %18 : index + %21 = remi_signed %20, %c16 : index + %22 = cmpi "slt", %21, %c0 : index + %23 = addi %21, %c16 : index + %24 = select %22, %23, %21 : index + %25 = remi_signed %6, %c128 : index + %26 = cmpi "slt", %25, %c0 : index + %27 = addi %25, %c128 : index + %28 = select %26, %27, %25 : index + %29 = remi_signed %arg5, %c16 : index + %30 = cmpi "slt", %29, %c0 : index + %31 = addi %29, %c16 : index + %32 = select %30, %31, %29 : index + %33 = cmpi "slt", %32, %c0 : index + %34 = subi %c-1, %32 : index + %35 = select %33, %34, %32 : index + %36 = divi_signed %35, %c8 : index + %37 = subi %c-1, %36 : index + %38 = select %33, %37, %36 : index + %39 = remi_signed %38, %c2 : index + %40 = cmpi "slt", %39, %c0 : index + %41 = addi %39, %c2 : index + %42 = select %40, %41, %39 : index + %43 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c0_i64 : i64] : vector<8xf32> + %45 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c1_i64 : i64] : vector<8xf32> + %47 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c2_i64 : i64] : vector<8xf32> + %49 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c3_i64 : i64] : vector<8xf32> + %51 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c4_i64 : i64] : vector<8xf32> + %53 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c5_i64 : i64] : vector<8xf32> + %55 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c6_i64 : i64] : vector<8xf32> + %57 = load %3[%24, %28, %42] : memref<16x128x2xvector<8xf32>> + %58 = vector.extractelement %57[%c7_i64 : i64] : vector<8xf32> + %59 = mulf %7, %44 {RelaxedPrecision} : f32 + %60 = mulf %8, %46 {RelaxedPrecision} : f32 + %61 = mulf %9, %48 {RelaxedPrecision} : f32 + %62 = mulf %10, %50 {RelaxedPrecision} : f32 + %63 = mulf %11, %52 {RelaxedPrecision} : f32 + %64 = mulf %12, %54 {RelaxedPrecision} : f32 + %65 = mulf %13, %56 {RelaxedPrecision} : f32 + %66 = mulf %14, 
%58 {RelaxedPrecision} : f32 + %67 = addi %arg7, %arg9 : index + %68 = remi_signed %67, %c6 : index + %69 = cmpi "slt", %68, %c0 : index + %70 = addi %68, %c6 : index + %71 = select %69, %70, %68 : index + %72 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c0_i64 : i64] : vector<8xf32> + %74 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %75 = vector.extractelement %74[%c1_i64 : i64] : vector<8xf32> + %76 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c2_i64 : i64] : vector<8xf32> + %78 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c3_i64 : i64] : vector<8xf32> + %80 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %81 = vector.extractelement %80[%c4_i64 : i64] : vector<8xf32> + %82 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %83 = vector.extractelement %82[%c5_i64 : i64] : vector<8xf32> + %84 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %85 = vector.extractelement %84[%c6_i64 : i64] : vector<8xf32> + %86 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %87 = vector.extractelement %86[%c7_i64 : i64] : vector<8xf32> + %88 = addf %73, %59 {RelaxedPrecision} : f32 + %89 = addf %75, %60 {RelaxedPrecision} : f32 + %90 = addf %77, %61 {RelaxedPrecision} : f32 + %91 = addf %79, %62 {RelaxedPrecision} : f32 + %92 = addf %81, %63 {RelaxedPrecision} : f32 + %93 = addf %83, %64 {RelaxedPrecision} : f32 + %94 = addf %85, %65 {RelaxedPrecision} : f32 + %95 = addf %87, %66 {RelaxedPrecision} : f32 + %96 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %88, %96[%c0_i64 : i64] : vector<8xf32> + store %97, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %98 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %89, %98[%c1_i64 : i64] : vector<8xf32> + store %99, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %100 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %90, %100[%c2_i64 : i64] : vector<8xf32> + store %101, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %102 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %91, %102[%c3_i64 : i64] : vector<8xf32> + store %103, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %104 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %92, %104[%c4_i64 : i64] : vector<8xf32> + store %105, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %106 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %93, %106[%c5_i64 : i64] : vector<8xf32> + store %107, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %108 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %109 = vector.insertelement %94, %108[%c6_i64 : i64] : vector<8xf32> + store %109, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %110 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %95, %110[%c7_i64 : i64] : vector<8xf32> + store %111, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %112 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %88, %112[%c0_i64 : i64] : vector<8xf32> + store %113, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %114 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %89, %114[%c1_i64 : i64] : vector<8xf32> + store %115, %2[%24, %71, %42] : 
memref<16x6x2xvector<8xf32>> + %116 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %117 = vector.insertelement %90, %116[%c2_i64 : i64] : vector<8xf32> + store %117, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %118 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %91, %118[%c3_i64 : i64] : vector<8xf32> + store %119, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %120 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %92, %120[%c4_i64 : i64] : vector<8xf32> + store %121, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %122 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %123 = vector.insertelement %93, %122[%c5_i64 : i64] : vector<8xf32> + store %123, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %124 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %94, %124[%c6_i64 : i64] : vector<8xf32> + store %125, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %126 = load %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %95, %126[%c7_i64 : i64] : vector<8xf32> + store %127, %2[%24, %71, %42] : memref<16x6x2xvector<8xf32>> + %128 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %130 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %133 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %134 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %135 = load %arg0[%5, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %136 = addi %arg5, %c8 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = subi %c-1, %136 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = remi_signed %142, %c16 : index + %144 = cmpi "slt", %143, %c0 : index + %145 = addi %143, %c16 : index + %146 = select %144, %145, %143 : index + %147 = divi_signed %17, %c8 : index + %148 = subi %c-1, %147 : index + %149 = select %15, %148, %147 : index + %150 = muli %142, %c-2 : index + %151 = addi %149, %150 : index + %152 = addi %151, %c1 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = subi %c-1, %152 : index + %155 = select %153, %154, %152 : index + %156 = divi_signed %155, %c2 : index + %157 = subi %c-1, %156 : index + %158 = select %153, %157, %156 : index + %159 = muli %158, %c-2 : index + %160 = addi %151, %159 : index + %161 = addi %160, %c1 : index + %162 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c2_i64 : i64] : vector<8xf32> + %168 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c3_i64 : i64] : vector<8xf32> + %170 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c4_i64 : i64] : vector<8xf32> 
+ %172 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c5_i64 : i64] : vector<8xf32> + %174 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c6_i64 : i64] : vector<8xf32> + %176 = load %3[%146, %28, %161] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c7_i64 : i64] : vector<8xf32> + %178 = mulf %128, %163 {RelaxedPrecision} : f32 + %179 = mulf %129, %165 {RelaxedPrecision} : f32 + %180 = mulf %130, %167 {RelaxedPrecision} : f32 + %181 = mulf %131, %169 {RelaxedPrecision} : f32 + %182 = mulf %132, %171 {RelaxedPrecision} : f32 + %183 = mulf %133, %173 {RelaxedPrecision} : f32 + %184 = mulf %134, %175 {RelaxedPrecision} : f32 + %185 = mulf %135, %177 {RelaxedPrecision} : f32 + %186 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c0_i64 : i64] : vector<8xf32> + %188 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %189 = vector.extractelement %188[%c1_i64 : i64] : vector<8xf32> + %190 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %191 = vector.extractelement %190[%c2_i64 : i64] : vector<8xf32> + %192 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c3_i64 : i64] : vector<8xf32> + %194 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c4_i64 : i64] : vector<8xf32> + %196 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c5_i64 : i64] : vector<8xf32> + %198 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c6_i64 : i64] : vector<8xf32> + %200 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c7_i64 : i64] : vector<8xf32> + %202 = addf %187, %178 {RelaxedPrecision} : f32 + %203 = addf %189, %179 {RelaxedPrecision} : f32 + %204 = addf %191, %180 {RelaxedPrecision} : f32 + %205 = addf %193, %181 {RelaxedPrecision} : f32 + %206 = addf %195, %182 {RelaxedPrecision} : f32 + %207 = addf %197, %183 {RelaxedPrecision} : f32 + %208 = addf %199, %184 {RelaxedPrecision} : f32 + %209 = addf %201, %185 {RelaxedPrecision} : f32 + %210 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %202, %210[%c0_i64 : i64] : vector<8xf32> + store %211, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %212 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %213 = vector.insertelement %203, %212[%c1_i64 : i64] : vector<8xf32> + store %213, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %214 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %204, %214[%c2_i64 : i64] : vector<8xf32> + store %215, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %216 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %217 = vector.insertelement %205, %216[%c3_i64 : i64] : vector<8xf32> + store %217, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %218 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %219 = vector.insertelement %206, %218[%c4_i64 : i64] : vector<8xf32> + store %219, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %220 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %207, %220[%c5_i64 : i64] : vector<8xf32> + store %221, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %222 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %223 = 
vector.insertelement %208, %222[%c6_i64 : i64] : vector<8xf32> + store %223, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %224 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %209, %224[%c7_i64 : i64] : vector<8xf32> + store %225, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %226 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %202, %226[%c0_i64 : i64] : vector<8xf32> + store %227, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %228 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %203, %228[%c1_i64 : i64] : vector<8xf32> + store %229, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %230 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %204, %230[%c2_i64 : i64] : vector<8xf32> + store %231, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %232 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %205, %232[%c3_i64 : i64] : vector<8xf32> + store %233, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %234 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %206, %234[%c4_i64 : i64] : vector<8xf32> + store %235, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %236 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %207, %236[%c5_i64 : i64] : vector<8xf32> + store %237, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %238 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %208, %238[%c6_i64 : i64] : vector<8xf32> + store %239, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %240 = load %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %209, %240[%c7_i64 : i64] : vector<8xf32> + store %241, %2[%146, %71, %161] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %6 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %7 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %8 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %9 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %10 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = cmpi "slt", %arg5, %c0 : index + %14 = subi %c-1, %arg5 : index + %15 = select %13, %14, %arg5 : index + %16 = divi_signed %15, %c16 : index + %17 = subi %c-1, %16 : index + %18 = select %13, %17, %16 : index + %19 = remi_signed %18, %c16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = addi %19, %c16 : index + %22 = select %20, %21, %19 : index + %23 = remi_signed %4, %c128 : index + %24 = cmpi "slt", %23, %c0 : index + %25 = addi %23, %c128 : index + %26 = select %24, %25, %23 : index + %27 = remi_signed %arg5, %c16 : index + %28 = cmpi "slt", %27, %c0 : index + %29 = addi %27, %c16 : index + %30 = select %28, %29, %27 : index + %31 = cmpi "slt", %30, %c0 : index + %32 = subi %c-1, %30 : index + %33 = select %31, %32, %30 : index + %34 = divi_signed %33, %c8 : 
index + %35 = subi %c-1, %34 : index + %36 = select %31, %35, %34 : index + %37 = remi_signed %36, %c2 : index + %38 = cmpi "slt", %37, %c0 : index + %39 = addi %37, %c2 : index + %40 = select %38, %39, %37 : index + %41 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c0_i64 : i64] : vector<8xf32> + %43 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %44 = vector.extractelement %43[%c1_i64 : i64] : vector<8xf32> + %45 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c2_i64 : i64] : vector<8xf32> + %47 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c3_i64 : i64] : vector<8xf32> + %49 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c4_i64 : i64] : vector<8xf32> + %51 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %52 = vector.extractelement %51[%c5_i64 : i64] : vector<8xf32> + %53 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %54 = vector.extractelement %53[%c6_i64 : i64] : vector<8xf32> + %55 = load %3[%22, %26, %40] : memref<16x128x2xvector<8xf32>> + %56 = vector.extractelement %55[%c7_i64 : i64] : vector<8xf32> + %57 = mulf %5, %42 {RelaxedPrecision} : f32 + %58 = mulf %6, %44 {RelaxedPrecision} : f32 + %59 = mulf %7, %46 {RelaxedPrecision} : f32 + %60 = mulf %8, %48 {RelaxedPrecision} : f32 + %61 = mulf %9, %50 {RelaxedPrecision} : f32 + %62 = mulf %10, %52 {RelaxedPrecision} : f32 + %63 = mulf %11, %54 {RelaxedPrecision} : f32 + %64 = mulf %12, %56 {RelaxedPrecision} : f32 + %65 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c1_i64 : i64] : vector<8xf32> + %69 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c2_i64 : i64] : vector<8xf32> + %71 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %72 = vector.extractelement %71[%c3_i64 : i64] : vector<8xf32> + %73 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c4_i64 : i64] : vector<8xf32> + %75 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %78 = vector.extractelement %77[%c6_i64 : i64] : vector<8xf32> + %79 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = addf %66, %57 {RelaxedPrecision} : f32 + %82 = addf %68, %58 {RelaxedPrecision} : f32 + %83 = addf %70, %59 {RelaxedPrecision} : f32 + %84 = addf %72, %60 {RelaxedPrecision} : f32 + %85 = addf %74, %61 {RelaxedPrecision} : f32 + %86 = addf %76, %62 {RelaxedPrecision} : f32 + %87 = addf %78, %63 {RelaxedPrecision} : f32 + %88 = addf %80, %64 {RelaxedPrecision} : f32 + %89 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + store %90, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %91 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %82, %91[%c1_i64 : i64] : vector<8xf32> + store %92, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %93 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %94 = vector.insertelement %83, %93[%c2_i64 : i64] : vector<8xf32> + store %94, %2[%22, %c0, 
%40] : memref<16x6x2xvector<8xf32>> + %95 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %84, %95[%c3_i64 : i64] : vector<8xf32> + store %96, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %97 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c4_i64 : i64] : vector<8xf32> + store %98, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %99 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %100 = vector.insertelement %86, %99[%c5_i64 : i64] : vector<8xf32> + store %100, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %101 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %87, %101[%c6_i64 : i64] : vector<8xf32> + store %102, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %103 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %88, %103[%c7_i64 : i64] : vector<8xf32> + store %104, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %105 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %106 = vector.insertelement %81, %105[%c0_i64 : i64] : vector<8xf32> + store %106, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %107 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %82, %107[%c1_i64 : i64] : vector<8xf32> + store %108, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %109 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %83, %109[%c2_i64 : i64] : vector<8xf32> + store %110, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %111 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %112 = vector.insertelement %84, %111[%c3_i64 : i64] : vector<8xf32> + store %112, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %113 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %114 = vector.insertelement %85, %113[%c4_i64 : i64] : vector<8xf32> + store %114, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %115 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %86, %115[%c5_i64 : i64] : vector<8xf32> + store %116, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %117 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %87, %117[%c6_i64 : i64] : vector<8xf32> + store %118, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %119 = load %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %120 = vector.insertelement %88, %119[%c7_i64 : i64] : vector<8xf32> + store %120, %2[%22, %c0, %40] : memref<16x6x2xvector<8xf32>> + %121 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %122 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %124 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = addi %arg5, %c8 : index + %130 = cmpi "slt", %129, %c0 : index + %131 = subi %c-1, %129 : index + %132 = select %130, %131, %129 : index + %133 = divi_signed %132, %c16 : index + %134 = subi %c-1, %133 : 
index + %135 = select %130, %134, %133 : index + %136 = remi_signed %135, %c16 : index + %137 = cmpi "slt", %136, %c0 : index + %138 = addi %136, %c16 : index + %139 = select %137, %138, %136 : index + %140 = divi_signed %15, %c8 : index + %141 = subi %c-1, %140 : index + %142 = select %13, %141, %140 : index + %143 = muli %135, %c-2 : index + %144 = addi %142, %143 : index + %145 = addi %144, %c1 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c2 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = muli %151, %c-2 : index + %153 = addi %144, %152 : index + %154 = addi %153, %c1 : index + %155 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c0_i64 : i64] : vector<8xf32> + %157 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %158 = vector.extractelement %157[%c1_i64 : i64] : vector<8xf32> + %159 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %160 = vector.extractelement %159[%c2_i64 : i64] : vector<8xf32> + %161 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c3_i64 : i64] : vector<8xf32> + %163 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c4_i64 : i64] : vector<8xf32> + %165 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c5_i64 : i64] : vector<8xf32> + %167 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c6_i64 : i64] : vector<8xf32> + %169 = load %3[%139, %26, %154] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c7_i64 : i64] : vector<8xf32> + %171 = mulf %121, %156 {RelaxedPrecision} : f32 + %172 = mulf %122, %158 {RelaxedPrecision} : f32 + %173 = mulf %123, %160 {RelaxedPrecision} : f32 + %174 = mulf %124, %162 {RelaxedPrecision} : f32 + %175 = mulf %125, %164 {RelaxedPrecision} : f32 + %176 = mulf %126, %166 {RelaxedPrecision} : f32 + %177 = mulf %127, %168 {RelaxedPrecision} : f32 + %178 = mulf %128, %170 {RelaxedPrecision} : f32 + %179 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %180 = vector.extractelement %179[%c0_i64 : i64] : vector<8xf32> + %181 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %182 = vector.extractelement %181[%c1_i64 : i64] : vector<8xf32> + %183 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %186 = vector.extractelement %185[%c3_i64 : i64] : vector<8xf32> + %187 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %188 = vector.extractelement %187[%c4_i64 : i64] : vector<8xf32> + %189 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c5_i64 : i64] : vector<8xf32> + %191 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %192 = vector.extractelement %191[%c6_i64 : i64] : vector<8xf32> + %193 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c7_i64 : i64] : vector<8xf32> + %195 = addf %180, %171 {RelaxedPrecision} : f32 + %196 = addf %182, %172 {RelaxedPrecision} : f32 + %197 = addf %184, %173 {RelaxedPrecision} : f32 + %198 = addf %186, %174 {RelaxedPrecision} : f32 + %199 = addf %188, %175 {RelaxedPrecision} : f32 + %200 = addf %190, %176 {RelaxedPrecision} 
: f32 + %201 = addf %192, %177 {RelaxedPrecision} : f32 + %202 = addf %194, %178 {RelaxedPrecision} : f32 + %203 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %204 = vector.insertelement %195, %203[%c0_i64 : i64] : vector<8xf32> + store %204, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %205 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %206 = vector.insertelement %196, %205[%c1_i64 : i64] : vector<8xf32> + store %206, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %207 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %208 = vector.insertelement %197, %207[%c2_i64 : i64] : vector<8xf32> + store %208, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %209 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %210 = vector.insertelement %198, %209[%c3_i64 : i64] : vector<8xf32> + store %210, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %211 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %199, %211[%c4_i64 : i64] : vector<8xf32> + store %212, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %213 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %214 = vector.insertelement %200, %213[%c5_i64 : i64] : vector<8xf32> + store %214, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %215 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %216 = vector.insertelement %201, %215[%c6_i64 : i64] : vector<8xf32> + store %216, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %217 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %202, %217[%c7_i64 : i64] : vector<8xf32> + store %218, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %219 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %220 = vector.insertelement %195, %219[%c0_i64 : i64] : vector<8xf32> + store %220, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %221 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %222 = vector.insertelement %196, %221[%c1_i64 : i64] : vector<8xf32> + store %222, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %223 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %197, %223[%c2_i64 : i64] : vector<8xf32> + store %224, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %225 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %198, %225[%c3_i64 : i64] : vector<8xf32> + store %226, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %227 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %199, %227[%c4_i64 : i64] : vector<8xf32> + store %228, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %229 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %200, %229[%c5_i64 : i64] : vector<8xf32> + store %230, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %231 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %201, %231[%c6_i64 : i64] : vector<8xf32> + store %232, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %233 = load %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %202, %233[%c7_i64 : i64] : vector<8xf32> + store %234, %2[%139, %c0, %154] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + 
%91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index 
+ %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, 
%242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + %294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = 
vector.transfer_read %arg2[%arg4, %322], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %4, %c8 : index + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = addi %arg5, %c8 : index + %35 = cmpi "slt", %34, %c0 : index + %36 = subi %c-1, %34 : index + %37 = select %35, %36, %34 : index + %38 = divi_signed %37, %c16 : index + %39 = subi %c-1, %38 : index + %40 = select %35, %39, %38 : index + %41 = remi_signed %40, %c16 : index + %42 = cmpi "slt", %41, %c0 : index + %43 = addi %41, %c16 : index + %44 = select %42, %43, %41 : index + %45 = divi_signed %8, %c8 : index + %46 = subi %c-1, %45 : index + %47 = select %6, %46, %45 : index + %48 = muli %40, %c-2 : index + %49 = addi %47, %48 : index + %50 = addi %49, %c1 : index + %51 = cmpi "slt", %50, %c0 : index + %52 = subi %c-1, %50 : index + %53 = select %51, %52, %50 : index + %54 = divi_signed %53, %c2 : index + %55 = subi %c-1, %54 : index + %56 = 
select %51, %55, %54 : index + %57 = muli %56, %c-2 : index + %58 = addi %49, %57 : index + %59 = addi %58, %c1 : index + %60 = load %2[%44, %c0, %59] : memref<16x6x2xvector<8xf32>> + %61 = addf %33, %60 : vector<8xf32> + store %61, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %62 = addi %4, %c16 : index + %63 = vector.transfer_read %arg2[%arg4, %62], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %64 = addi %11, %c1 : index + %65 = cmpi "slt", %64, %c0 : index + %66 = subi %c-1, %64 : index + %67 = select %65, %66, %64 : index + %68 = divi_signed %67, %c16 : index + %69 = subi %c-1, %68 : index + %70 = select %65, %69, %68 : index + %71 = muli %70, %c-16 : index + %72 = addi %11, %71 : index + %73 = addi %72, %c1 : index + %74 = load %2[%73, %c0, %29] : memref<16x6x2xvector<8xf32>> + %75 = addf %63, %74 : vector<8xf32> + store %75, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %76 = addi %4, %c24 : index + %77 = vector.transfer_read %arg2[%arg4, %76], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %78 = addi %arg5, %c24 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = subi %c-1, %78 : index + %81 = select %79, %80, %78 : index + %82 = divi_signed %81, %c16 : index + %83 = subi %c-1, %82 : index + %84 = select %79, %83, %82 : index + %85 = remi_signed %84, %c16 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = addi %85, %c16 : index + %88 = select %86, %87, %85 : index + %89 = muli %84, %c-2 : index + %90 = addi %47, %89 : index + %91 = addi %90, %c3 : index + %92 = cmpi "slt", %91, %c0 : index + %93 = subi %c-1, %91 : index + %94 = select %92, %93, %91 : index + %95 = divi_signed %94, %c2 : index + %96 = subi %c-1, %95 : index + %97 = select %92, %96, %95 : index + %98 = muli %97, %c-2 : index + %99 = addi %90, %98 : index + %100 = addi %99, %c3 : index + %101 = load %2[%88, %c0, %100] : memref<16x6x2xvector<8xf32>> + %102 = addf %77, %101 : vector<8xf32> + store %102, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %103 = addi %4, %c32 : index + %104 = vector.transfer_read %arg2[%arg4, %103], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %105 = addi %11, %c2 : index + %106 = cmpi "slt", %105, %c0 : index + %107 = subi %c-1, %105 : index + %108 = select %106, %107, %105 : index + %109 = divi_signed %108, %c16 : index + %110 = subi %c-1, %109 : index + %111 = select %106, %110, %109 : index + %112 = muli %111, %c-16 : index + %113 = addi %11, %112 : index + %114 = addi %113, %c2 : index + %115 = load %2[%114, %c0, %29] : memref<16x6x2xvector<8xf32>> + %116 = addf %104, %115 : vector<8xf32> + store %116, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %117 = addi %4, %c40 : index + %118 = vector.transfer_read %arg2[%arg4, %117], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %119 = addi %arg5, %c40 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = subi %c-1, %119 : index + %122 = select %120, %121, %119 : index + %123 = divi_signed %122, %c16 : index + %124 = subi %c-1, %123 : index + %125 = select %120, %124, %123 : index + %126 = remi_signed %125, %c16 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = addi %126, %c16 : index + %129 = select %127, %128, %126 : index + %130 = muli %125, %c-2 : index + %131 = addi %47, %130 : index + %132 = addi %131, %c5 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c2 : index + %137 = subi %c-1, 
%136 : index + %138 = select %133, %137, %136 : index + %139 = muli %138, %c-2 : index + %140 = addi %131, %139 : index + %141 = addi %140, %c5 : index + %142 = load %2[%129, %c0, %141] : memref<16x6x2xvector<8xf32>> + %143 = addf %118, %142 : vector<8xf32> + store %143, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %144 = addi %4, %c48 : index + %145 = vector.transfer_read %arg2[%arg4, %144], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %146 = addi %11, %c3 : index + %147 = cmpi "slt", %146, %c0 : index + %148 = subi %c-1, %146 : index + %149 = select %147, %148, %146 : index + %150 = divi_signed %149, %c16 : index + %151 = subi %c-1, %150 : index + %152 = select %147, %151, %150 : index + %153 = muli %152, %c-16 : index + %154 = addi %11, %153 : index + %155 = addi %154, %c3 : index + %156 = load %2[%155, %c0, %29] : memref<16x6x2xvector<8xf32>> + %157 = addf %145, %156 : vector<8xf32> + store %157, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %158 = addi %4, %c56 : index + %159 = vector.transfer_read %arg2[%arg4, %158], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %160 = addi %arg5, %c56 : index + %161 = cmpi "slt", %160, %c0 : index + %162 = subi %c-1, %160 : index + %163 = select %161, %162, %160 : index + %164 = divi_signed %163, %c16 : index + %165 = subi %c-1, %164 : index + %166 = select %161, %165, %164 : index + %167 = remi_signed %166, %c16 : index + %168 = cmpi "slt", %167, %c0 : index + %169 = addi %167, %c16 : index + %170 = select %168, %169, %167 : index + %171 = muli %166, %c-2 : index + %172 = addi %47, %171 : index + %173 = addi %172, %c7 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %172, %180 : index + %182 = addi %181, %c7 : index + %183 = load %2[%170, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %159, %183 : vector<8xf32> + store %184, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %185 = addi %4, %c64 : index + %186 = vector.transfer_read %arg2[%arg4, %185], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %187 = addi %11, %c4 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = subi %c-1, %187 : index + %190 = select %188, %189, %187 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = muli %193, %c-16 : index + %195 = addi %11, %194 : index + %196 = addi %195, %c4 : index + %197 = load %2[%196, %c0, %29] : memref<16x6x2xvector<8xf32>> + %198 = addf %186, %197 : vector<8xf32> + store %198, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %199 = addi %4, %c72 : index + %200 = vector.transfer_read %arg2[%arg4, %199], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %201 = addi %arg5, %c72 : index + %202 = cmpi "slt", %201, %c0 : index + %203 = subi %c-1, %201 : index + %204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16 : index + %206 = subi %c-1, %205 : index + %207 = select %202, %206, %205 : index + %208 = remi_signed %207, %c16 : index + %209 = cmpi "slt", %208, %c0 : index + %210 = addi %208, %c16 : index + %211 = select %209, %210, %208 : index + %212 = muli %207, %c-2 : index + %213 = addi %47, %212 : index + %214 = addi %213, %c9 : index + %215 = cmpi "slt", %214, %c0 : index + 
%216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c2 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c9 : index + %224 = load %2[%211, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %200, %224 : vector<8xf32> + store %225, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %226 = addi %4, %c80 : index + %227 = vector.transfer_read %arg2[%arg4, %226], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %228 = addi %11, %c5 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c16 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-16 : index + %236 = addi %11, %235 : index + %237 = addi %236, %c5 : index + %238 = load %2[%237, %c0, %29] : memref<16x6x2xvector<8xf32>> + %239 = addf %227, %238 : vector<8xf32> + store %239, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %240 = addi %4, %c88 : index + %241 = vector.transfer_read %arg2[%arg4, %240], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %242 = addi %arg5, %c88 : index + %243 = cmpi "slt", %242, %c0 : index + %244 = subi %c-1, %242 : index + %245 = select %243, %244, %242 : index + %246 = divi_signed %245, %c16 : index + %247 = subi %c-1, %246 : index + %248 = select %243, %247, %246 : index + %249 = remi_signed %248, %c16 : index + %250 = cmpi "slt", %249, %c0 : index + %251 = addi %249, %c16 : index + %252 = select %250, %251, %249 : index + %253 = muli %248, %c-2 : index + %254 = addi %47, %253 : index + %255 = addi %254, %c11 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c2 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = muli %261, %c-2 : index + %263 = addi %254, %262 : index + %264 = addi %263, %c11 : index + %265 = load %2[%252, %c0, %264] : memref<16x6x2xvector<8xf32>> + %266 = addf %241, %265 : vector<8xf32> + store %266, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %267 = addi %4, %c96 : index + %268 = vector.transfer_read %arg2[%arg4, %267], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %269 = addi %11, %c6 : index + %270 = cmpi "slt", %269, %c0 : index + %271 = subi %c-1, %269 : index + %272 = select %270, %271, %269 : index + %273 = divi_signed %272, %c16 : index + %274 = subi %c-1, %273 : index + %275 = select %270, %274, %273 : index + %276 = muli %275, %c-16 : index + %277 = addi %11, %276 : index + %278 = addi %277, %c6 : index + %279 = load %2[%278, %c0, %29] : memref<16x6x2xvector<8xf32>> + %280 = addf %268, %279 : vector<8xf32> + store %280, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %281 = addi %4, %c104 : index + %282 = vector.transfer_read %arg2[%arg4, %281], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %283 = addi %arg5, %c104 : index + %284 = cmpi "slt", %283, %c0 : index + %285 = subi %c-1, %283 : index + %286 = select %284, %285, %283 : index + %287 = divi_signed %286, %c16 : index + %288 = subi %c-1, %287 : index + %289 = select %284, %288, %287 : index + %290 = remi_signed %289, %c16 : index + %291 = cmpi "slt", %290, %c0 : index + %292 = addi %290, %c16 : index + %293 = select %291, %292, %290 : index + 
%294 = muli %289, %c-2 : index + %295 = addi %47, %294 : index + %296 = addi %295, %c13 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c2 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = addi %304, %c13 : index + %306 = load %2[%293, %c0, %305] : memref<16x6x2xvector<8xf32>> + %307 = addf %282, %306 : vector<8xf32> + store %307, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %308 = addi %4, %c112 : index + %309 = vector.transfer_read %arg2[%arg4, %308], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %310 = addi %11, %c7 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c16 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = muli %316, %c-16 : index + %318 = addi %11, %317 : index + %319 = addi %318, %c7 : index + %320 = load %2[%319, %c0, %29] : memref<16x6x2xvector<8xf32>> + %321 = addf %309, %320 : vector<8xf32> + store %321, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %322 = addi %4, %c120 : index + %323 = vector.transfer_read %arg2[%arg4, %322], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %324 = addi %arg5, %c120 : index + %325 = cmpi "slt", %324, %c0 : index + %326 = subi %c-1, %324 : index + %327 = select %325, %326, %324 : index + %328 = divi_signed %327, %c16 : index + %329 = subi %c-1, %328 : index + %330 = select %325, %329, %328 : index + %331 = remi_signed %330, %c16 : index + %332 = cmpi "slt", %331, %c0 : index + %333 = addi %331, %c16 : index + %334 = select %332, %333, %331 : index + %335 = muli %330, %c-2 : index + %336 = addi %47, %335 : index + %337 = addi %336, %c15 : index + %338 = cmpi "slt", %337, %c0 : index + %339 = subi %c-1, %337 : index + %340 = select %338, %339, %337 : index + %341 = divi_signed %340, %c2 : index + %342 = subi %c-1, %341 : index + %343 = select %338, %342, %341 : index + %344 = muli %343, %c-2 : index + %345 = addi %336, %344 : index + %346 = addi %345, %c15 : index + %347 = load %2[%334, %c0, %346] : memref<16x6x2xvector<8xf32>> + %348 = addf %323, %347 : vector<8xf32> + store %348, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %349 = muli %arg6, %c8 : index + %350 = addi %4, %349 : index + %351 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %351, %arg2[%arg4, %350] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/19_SCFToStandard.mlir b/Tutorials/optimized_matmul/mlir/19_SCFToStandard.mlir new file 
mode 100644 index 00000000..e0805722 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/19_SCFToStandard.mlir @@ -0,0 +1,1416 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + br ^bb1(%c0 : index) + ^bb1(%0: index): // 2 preds: ^bb0, ^bb23 + %1 = cmpi "slt", %0, %c512 : index + cond_br %1, ^bb2, ^bb24 + ^bb2: // pred: ^bb1 + br ^bb3(%c0 : index) + ^bb3(%2: index): // 2 preds: ^bb2, ^bb13 + %3 = cmpi "slt", %2, %c780 : index + cond_br %3, ^bb4, ^bb14 + ^bb4: // pred: ^bb3 + br ^bb5(%c0 : index) + ^bb5(%4: index): // 2 preds: ^bb4, ^bb12 + %5 = cmpi "slt", %4, %c256 : index + cond_br %5, ^bb6, ^bb13 + ^bb6: // pred: ^bb5 + br ^bb7(%c0 : index) + ^bb7(%6: index): // 2 preds: ^bb6, ^bb11 + %7 = cmpi "slt", %6, %c128 : index + cond_br %7, ^bb8, ^bb12 + ^bb8: // pred: ^bb7 + br ^bb9(%c0 : index) + ^bb9(%8: index): // 2 preds: ^bb8, ^bb10 + %9 = cmpi "slt", %8, %c4 : index + cond_br %9, ^bb10, ^bb11 + ^bb10: // pred: ^bb9 + %10 = addi %0, %4 : index + %11 = addi %6, %8 : index + %12 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg1[%11, %10] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = mulf %12, %13 {RelaxedPrecision} : f32 + %15 = load %arg2[%2, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = addf %15, %14 {RelaxedPrecision} : f32 + store %16, %arg2[%2, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %17 = load %arg2[%2, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %17, %arg2[%2, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %18 = addi %10, %c1 : index + %19 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %20 = load %arg1[%11, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %21 = mulf %19, %20 {RelaxedPrecision} : f32 + %22 = load %arg2[%2, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %23 = addf %22, %21 {RelaxedPrecision} : f32 + store %23, %arg2[%2, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = load %arg2[%2, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %24, %arg2[%2, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = addi %10, %c2 : index + %26 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg1[%11, %25] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = mulf %26, %27 {RelaxedPrecision} : 
f32 + %29 = load %arg2[%2, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %30 = addf %29, %28 {RelaxedPrecision} : f32 + store %30, %arg2[%2, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = load %arg2[%2, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %31, %arg2[%2, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = addi %10, %c3 : index + %33 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %34 = load %arg1[%11, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = mulf %33, %34 {RelaxedPrecision} : f32 + %36 = load %arg2[%2, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %37 = addf %36, %35 {RelaxedPrecision} : f32 + store %37, %arg2[%2, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %38 = load %arg2[%2, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %38, %arg2[%2, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = addi %10, %c4 : index + %40 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %41 = load %arg1[%11, %39] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = mulf %40, %41 {RelaxedPrecision} : f32 + %43 = load %arg2[%2, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = addf %43, %42 {RelaxedPrecision} : f32 + store %44, %arg2[%2, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %45 = load %arg2[%2, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %45, %arg2[%2, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = addi %10, %c5 : index + %47 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %48 = load %arg1[%11, %46] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = mulf %47, %48 {RelaxedPrecision} : f32 + %50 = load %arg2[%2, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %51 = addf %50, %49 {RelaxedPrecision} : f32 + store %51, %arg2[%2, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = load %arg2[%2, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %52, %arg2[%2, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = addi %10, %c6 : index + %54 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %55 = load %arg1[%11, %53] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = mulf %54, %55 {RelaxedPrecision} : f32 + %57 = load %arg2[%2, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %58 = addf %57, %56 {RelaxedPrecision} : f32 + store %58, %arg2[%2, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %59 = load %arg2[%2, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %59, %arg2[%2, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = addi %10, %c7 : index + %61 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %62 = load %arg1[%11, %60] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = mulf %61, %62 {RelaxedPrecision} : f32 + %64 = load %arg2[%2, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %65 = addf %64, %63 {RelaxedPrecision} : f32 + store %65, %arg2[%2, %60] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %66 = load %arg2[%2, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %66, %arg2[%2, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = addi %10, %c8 : index + %68 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %69 = load %arg1[%11, %67] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = mulf %68, %69 {RelaxedPrecision} : f32 + %71 = load %arg2[%2, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %72 = addf %71, %70 {RelaxedPrecision} : f32 + store %72, %arg2[%2, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %73 = load %arg2[%2, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %73, %arg2[%2, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %74 = addi %10, %c9 : index + %75 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %76 = load %arg1[%11, %74] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %77 = mulf %75, %76 {RelaxedPrecision} : f32 + %78 = load %arg2[%2, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = addf %78, %77 {RelaxedPrecision} : f32 + store %79, %arg2[%2, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = load %arg2[%2, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %80, %arg2[%2, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %81 = addi %10, %c10 : index + %82 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %83 = load %arg1[%11, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = mulf %82, %83 {RelaxedPrecision} : f32 + %85 = load %arg2[%2, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %86 = addf %85, %84 {RelaxedPrecision} : f32 + store %86, %arg2[%2, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = load %arg2[%2, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %87, %arg2[%2, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = addi %10, %c11 : index + %89 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %90 = load %arg1[%11, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %91 = mulf %89, %90 {RelaxedPrecision} : f32 + %92 = load %arg2[%2, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %93 = addf %92, %91 {RelaxedPrecision} : f32 + store %93, %arg2[%2, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = load %arg2[%2, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %94, %arg2[%2, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = addi %10, %c12 : index + %96 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %97 = load %arg1[%11, %95] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = mulf %96, %97 {RelaxedPrecision} : f32 + %99 = load %arg2[%2, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %100 = addf %99, %98 {RelaxedPrecision} : f32 + store %100, %arg2[%2, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %101 = load %arg2[%2, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %101, %arg2[%2, %95] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = addi %10, %c13 : index + %103 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %104 = load %arg1[%11, %102] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = mulf %103, %104 {RelaxedPrecision} : f32 + %106 = load %arg2[%2, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %107 = addf %106, %105 {RelaxedPrecision} : f32 + store %107, %arg2[%2, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %108 = load %arg2[%2, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %108, %arg2[%2, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %109 = addi %10, %c14 : index + %110 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %111 = load %arg1[%11, %109] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = mulf %110, %111 {RelaxedPrecision} : f32 + %113 = load %arg2[%2, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %114 = addf %113, %112 {RelaxedPrecision} : f32 + store %114, %arg2[%2, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = load %arg2[%2, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %115, %arg2[%2, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = addi %10, %c15 : index + %117 = load %arg0[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %118 = load %arg1[%11, %116] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = mulf %117, %118 {RelaxedPrecision} : f32 + %120 = load %arg2[%2, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = addf %120, %119 {RelaxedPrecision} : f32 + store %121, %arg2[%2, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %122 = load %arg2[%2, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %122, %arg2[%2, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addi %2, %c1 : index + %124 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg1[%11, %10] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = mulf %124, %125 {RelaxedPrecision} : f32 + %127 = load %arg2[%123, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = addf %127, %126 {RelaxedPrecision} : f32 + store %128, %arg2[%123, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %129 = load %arg2[%123, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %129, %arg2[%123, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg1[%11, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = mulf %130, %131 {RelaxedPrecision} : f32 + %133 = load %arg2[%123, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = addf %133, %132 {RelaxedPrecision} : f32 + store %134, %arg2[%123, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = load %arg2[%123, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %135, %arg2[%123, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %137 = load %arg1[%11, %25] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %138 = mulf %136, %137 {RelaxedPrecision} : f32 + %139 = load %arg2[%123, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = addf %139, %138 {RelaxedPrecision} : f32 + store %140, %arg2[%123, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = load %arg2[%123, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %141, %arg2[%123, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %143 = load %arg1[%11, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = mulf %142, %143 {RelaxedPrecision} : f32 + %145 = load %arg2[%123, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = addf %145, %144 {RelaxedPrecision} : f32 + store %146, %arg2[%123, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = load %arg2[%123, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %147, %arg2[%123, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %149 = load %arg1[%11, %39] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = mulf %148, %149 {RelaxedPrecision} : f32 + %151 = load %arg2[%123, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = addf %151, %150 {RelaxedPrecision} : f32 + store %152, %arg2[%123, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = load %arg2[%123, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %153, %arg2[%123, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg1[%11, %46] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = mulf %154, %155 {RelaxedPrecision} : f32 + %157 = load %arg2[%123, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = addf %157, %156 {RelaxedPrecision} : f32 + store %158, %arg2[%123, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = load %arg2[%123, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %159, %arg2[%123, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = load %arg1[%11, %53] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = mulf %160, %161 {RelaxedPrecision} : f32 + %163 = load %arg2[%123, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = addf %163, %162 {RelaxedPrecision} : f32 + store %164, %arg2[%123, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = load %arg2[%123, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %165, %arg2[%123, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %167 = load %arg1[%11, %60] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = mulf %166, %167 {RelaxedPrecision} : f32 + %169 = load %arg2[%123, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %170 = addf %169, %168 {RelaxedPrecision} : f32 + store %170, %arg2[%123, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = load %arg2[%123, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %171, %arg2[%123, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %173 = load %arg1[%11, %67] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = mulf %172, %173 {RelaxedPrecision} : f32 + %175 = load %arg2[%123, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = addf %175, %174 {RelaxedPrecision} : f32 + store %176, %arg2[%123, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = load %arg2[%123, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %177, %arg2[%123, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %179 = load %arg1[%11, %74] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = mulf %178, %179 {RelaxedPrecision} : f32 + %181 = load %arg2[%123, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = addf %181, %180 {RelaxedPrecision} : f32 + store %182, %arg2[%123, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = load %arg2[%123, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %183, %arg2[%123, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %185 = load %arg1[%11, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = mulf %184, %185 {RelaxedPrecision} : f32 + %187 = load %arg2[%123, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = addf %187, %186 {RelaxedPrecision} : f32 + store %188, %arg2[%123, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = load %arg2[%123, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %189, %arg2[%123, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %191 = load %arg1[%11, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = mulf %190, %191 {RelaxedPrecision} : f32 + %193 = load %arg2[%123, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = addf %193, %192 {RelaxedPrecision} : f32 + store %194, %arg2[%123, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = load %arg2[%123, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %195, %arg2[%123, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %197 = load %arg1[%11, %95] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = mulf %196, %197 {RelaxedPrecision} : f32 + %199 = load %arg2[%123, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = addf %199, %198 {RelaxedPrecision} : f32 + store %200, %arg2[%123, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = load %arg2[%123, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + store %201, %arg2[%123, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %203 = load %arg1[%11, %102] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = mulf %202, %203 {RelaxedPrecision} : f32 + %205 = load %arg2[%123, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = addf %205, %204 {RelaxedPrecision} : f32 + store %206, %arg2[%123, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = load %arg2[%123, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %207, %arg2[%123, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %209 = load %arg1[%11, %109] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = mulf %208, %209 {RelaxedPrecision} : f32 + %211 = load %arg2[%123, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = addf %211, %210 {RelaxedPrecision} : f32 + store %212, %arg2[%123, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = load %arg2[%123, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %213, %arg2[%123, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = load %arg0[%123, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %215 = load %arg1[%11, %116] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = mulf %214, %215 {RelaxedPrecision} : f32 + %217 = load %arg2[%123, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %218 = addf %217, %216 {RelaxedPrecision} : f32 + store %218, %arg2[%123, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = load %arg2[%123, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %219, %arg2[%123, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = addi %2, %c2 : index + %221 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%11, %10] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = mulf %221, %222 {RelaxedPrecision} : f32 + %224 = load %arg2[%220, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = addf %224, %223 {RelaxedPrecision} : f32 + store %225, %arg2[%220, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%220, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%220, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %228 = load %arg1[%11, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %229 = mulf %227, %228 {RelaxedPrecision} : f32 + %230 = load %arg2[%220, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = addf %230, %229 {RelaxedPrecision} : f32 + store %231, %arg2[%220, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = load %arg2[%220, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %232, %arg2[%220, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %233 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> 
+ %234 = load %arg1[%11, %25] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = mulf %233, %234 {RelaxedPrecision} : f32 + %236 = load %arg2[%220, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = addf %236, %235 {RelaxedPrecision} : f32 + store %237, %arg2[%220, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = load %arg2[%220, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %238, %arg2[%220, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%11, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = mulf %239, %240 {RelaxedPrecision} : f32 + %242 = load %arg2[%220, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = addf %242, %241 {RelaxedPrecision} : f32 + store %243, %arg2[%220, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%220, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%220, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %246 = load %arg1[%11, %39] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = mulf %245, %246 {RelaxedPrecision} : f32 + %248 = load %arg2[%220, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = addf %248, %247 {RelaxedPrecision} : f32 + store %249, %arg2[%220, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = load %arg2[%220, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %250, %arg2[%220, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %252 = load %arg1[%11, %46] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = mulf %251, %252 {RelaxedPrecision} : f32 + %254 = load %arg2[%220, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = addf %254, %253 {RelaxedPrecision} : f32 + store %255, %arg2[%220, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = load %arg2[%220, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %256, %arg2[%220, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%11, %53] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = mulf %257, %258 {RelaxedPrecision} : f32 + %260 = load %arg2[%220, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = addf %260, %259 {RelaxedPrecision} : f32 + store %261, %arg2[%220, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%220, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%220, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %264 = load %arg1[%11, %60] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %265 = mulf %263, %264 {RelaxedPrecision} : f32 + %266 = load %arg2[%220, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%267 = addf %266, %265 {RelaxedPrecision} : f32 + store %267, %arg2[%220, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = load %arg2[%220, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %268, %arg2[%220, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %269 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %270 = load %arg1[%11, %67] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = mulf %269, %270 {RelaxedPrecision} : f32 + %272 = load %arg2[%220, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = addf %272, %271 {RelaxedPrecision} : f32 + store %273, %arg2[%220, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %274 = load %arg2[%220, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %274, %arg2[%220, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%11, %74] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = mulf %275, %276 {RelaxedPrecision} : f32 + %278 = load %arg2[%220, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = addf %278, %277 {RelaxedPrecision} : f32 + store %279, %arg2[%220, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%220, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%220, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %282 = load %arg1[%11, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %283 = mulf %281, %282 {RelaxedPrecision} : f32 + %284 = load %arg2[%220, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = addf %284, %283 {RelaxedPrecision} : f32 + store %285, %arg2[%220, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = load %arg2[%220, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %286, %arg2[%220, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %288 = load %arg1[%11, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = mulf %287, %288 {RelaxedPrecision} : f32 + %290 = load %arg2[%220, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = addf %290, %289 {RelaxedPrecision} : f32 + store %291, %arg2[%220, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = load %arg2[%220, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %292, %arg2[%220, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%11, %95] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = mulf %293, %294 {RelaxedPrecision} : f32 + %296 = load %arg2[%220, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = addf %296, %295 {RelaxedPrecision} : f32 + store %297, %arg2[%220, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%220, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
store %298, %arg2[%220, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %300 = load %arg1[%11, %102] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = mulf %299, %300 {RelaxedPrecision} : f32 + %302 = load %arg2[%220, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addf %302, %301 {RelaxedPrecision} : f32 + store %303, %arg2[%220, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = load %arg2[%220, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %304, %arg2[%220, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %305 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%11, %109] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%220, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%220, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%220, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%220, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg0[%220, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%11, %116] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = mulf %311, %312 {RelaxedPrecision} : f32 + %314 = load %arg2[%220, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = addf %314, %313 {RelaxedPrecision} : f32 + store %315, %arg2[%220, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%220, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%220, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = addi %2, %c3 : index + %318 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %319 = load %arg1[%11, %10] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = mulf %318, %319 {RelaxedPrecision} : f32 + %321 = load %arg2[%317, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = addf %321, %320 {RelaxedPrecision} : f32 + store %322, %arg2[%317, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %323 = load %arg2[%317, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %323, %arg2[%317, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %325 = load %arg1[%11, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = mulf %324, %325 {RelaxedPrecision} : f32 + %327 = load %arg2[%317, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = addf %327, %326 {RelaxedPrecision} : f32 + store %328, %arg2[%317, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg2[%317, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %329, %arg2[%317, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %331 = 
load %arg1[%11, %25] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = mulf %330, %331 {RelaxedPrecision} : f32 + %333 = load %arg2[%317, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = addf %333, %332 {RelaxedPrecision} : f32 + store %334, %arg2[%317, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg2[%317, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %335, %arg2[%317, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %337 = load %arg1[%11, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = mulf %336, %337 {RelaxedPrecision} : f32 + %339 = load %arg2[%317, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = addf %339, %338 {RelaxedPrecision} : f32 + store %340, %arg2[%317, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = load %arg2[%317, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %341, %arg2[%317, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %343 = load %arg1[%11, %39] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = mulf %342, %343 {RelaxedPrecision} : f32 + %345 = load %arg2[%317, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = addf %345, %344 {RelaxedPrecision} : f32 + store %346, %arg2[%317, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg2[%317, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %347, %arg2[%317, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %349 = load %arg1[%11, %46] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = mulf %348, %349 {RelaxedPrecision} : f32 + %351 = load %arg2[%317, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = addf %351, %350 {RelaxedPrecision} : f32 + store %352, %arg2[%317, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = load %arg2[%317, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %353, %arg2[%317, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %354 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %355 = load %arg1[%11, %53] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = mulf %354, %355 {RelaxedPrecision} : f32 + %357 = load %arg2[%317, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = addf %357, %356 {RelaxedPrecision} : f32 + store %358, %arg2[%317, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg2[%317, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %359, %arg2[%317, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %361 = load %arg1[%11, %60] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = mulf %360, %361 {RelaxedPrecision} : f32 + %363 = load %arg2[%317, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = 
addf %363, %362 {RelaxedPrecision} : f32 + store %364, %arg2[%317, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg2[%317, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %365, %arg2[%317, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %367 = load %arg1[%11, %67] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = mulf %366, %367 {RelaxedPrecision} : f32 + %369 = load %arg2[%317, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = addf %369, %368 {RelaxedPrecision} : f32 + store %370, %arg2[%317, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = load %arg2[%317, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %371, %arg2[%317, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %373 = load %arg1[%11, %74] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = mulf %372, %373 {RelaxedPrecision} : f32 + %375 = load %arg2[%317, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = addf %375, %374 {RelaxedPrecision} : f32 + store %376, %arg2[%317, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = load %arg2[%317, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %377, %arg2[%317, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %379 = load %arg1[%11, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = mulf %378, %379 {RelaxedPrecision} : f32 + %381 = load %arg2[%317, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = addf %381, %380 {RelaxedPrecision} : f32 + store %382, %arg2[%317, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg2[%317, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %383, %arg2[%317, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %385 = load %arg1[%11, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %386 = mulf %384, %385 {RelaxedPrecision} : f32 + %387 = load %arg2[%317, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = addf %387, %386 {RelaxedPrecision} : f32 + store %388, %arg2[%317, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = load %arg2[%317, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %389, %arg2[%317, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %391 = load %arg1[%11, %95] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = mulf %390, %391 {RelaxedPrecision} : f32 + %393 = load %arg2[%317, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = addf %393, %392 {RelaxedPrecision} : f32 + store %394, %arg2[%317, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %395 = load %arg2[%317, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store 
%395, %arg2[%317, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %397 = load %arg1[%11, %102] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = mulf %396, %397 {RelaxedPrecision} : f32 + %399 = load %arg2[%317, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = addf %399, %398 {RelaxedPrecision} : f32 + store %400, %arg2[%317, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %401 = load %arg2[%317, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %401, %arg2[%317, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %403 = load %arg1[%11, %109] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = mulf %402, %403 {RelaxedPrecision} : f32 + %405 = load %arg2[%317, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %406 = addf %405, %404 {RelaxedPrecision} : f32 + store %406, %arg2[%317, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = load %arg2[%317, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %407, %arg2[%317, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %408 = load %arg0[%317, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %409 = load %arg1[%11, %116] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = mulf %408, %409 {RelaxedPrecision} : f32 + %411 = load %arg2[%317, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %412 = addf %411, %410 {RelaxedPrecision} : f32 + store %412, %arg2[%317, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %413 = load %arg2[%317, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %413, %arg2[%317, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %414 = addi %2, %c4 : index + %415 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %416 = load %arg1[%11, %10] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = mulf %415, %416 {RelaxedPrecision} : f32 + %418 = load %arg2[%414, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = addf %418, %417 {RelaxedPrecision} : f32 + store %419, %arg2[%414, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = load %arg2[%414, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %420, %arg2[%414, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %422 = load %arg1[%11, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = mulf %421, %422 {RelaxedPrecision} : f32 + %424 = load %arg2[%414, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = addf %424, %423 {RelaxedPrecision} : f32 + store %425, %arg2[%414, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %426 = load %arg2[%414, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %426, %arg2[%414, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %428 = load 
%arg1[%11, %25] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = mulf %427, %428 {RelaxedPrecision} : f32 + %430 = load %arg2[%414, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = addf %430, %429 {RelaxedPrecision} : f32 + store %431, %arg2[%414, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %432 = load %arg2[%414, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %432, %arg2[%414, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %434 = load %arg1[%11, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = mulf %433, %434 {RelaxedPrecision} : f32 + %436 = load %arg2[%414, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = addf %436, %435 {RelaxedPrecision} : f32 + store %437, %arg2[%414, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %438 = load %arg2[%414, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %438, %arg2[%414, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %440 = load %arg1[%11, %39] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = mulf %439, %440 {RelaxedPrecision} : f32 + %442 = load %arg2[%414, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = addf %442, %441 {RelaxedPrecision} : f32 + store %443, %arg2[%414, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %444 = load %arg2[%414, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %444, %arg2[%414, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %446 = load %arg1[%11, %46] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = mulf %445, %446 {RelaxedPrecision} : f32 + %448 = load %arg2[%414, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = addf %448, %447 {RelaxedPrecision} : f32 + store %449, %arg2[%414, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %450 = load %arg2[%414, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %450, %arg2[%414, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %452 = load %arg1[%11, %53] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = mulf %451, %452 {RelaxedPrecision} : f32 + %454 = load %arg2[%414, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = addf %454, %453 {RelaxedPrecision} : f32 + store %455, %arg2[%414, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %456 = load %arg2[%414, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %456, %arg2[%414, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %458 = load %arg1[%11, %60] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = mulf %457, %458 {RelaxedPrecision} : f32 + %460 = load %arg2[%414, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = addf 
%460, %459 {RelaxedPrecision} : f32 + store %461, %arg2[%414, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %462 = load %arg2[%414, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %462, %arg2[%414, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %464 = load %arg1[%11, %67] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %465 = mulf %463, %464 {RelaxedPrecision} : f32 + %466 = load %arg2[%414, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %467 = addf %466, %465 {RelaxedPrecision} : f32 + store %467, %arg2[%414, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = load %arg2[%414, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %468, %arg2[%414, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %470 = load %arg1[%11, %74] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = mulf %469, %470 {RelaxedPrecision} : f32 + %472 = load %arg2[%414, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = addf %472, %471 {RelaxedPrecision} : f32 + store %473, %arg2[%414, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %474 = load %arg2[%414, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %474, %arg2[%414, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %476 = load %arg1[%11, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = mulf %475, %476 {RelaxedPrecision} : f32 + %478 = load %arg2[%414, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = addf %478, %477 {RelaxedPrecision} : f32 + store %479, %arg2[%414, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = load %arg2[%414, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %480, %arg2[%414, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %482 = load %arg1[%11, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %483 = mulf %481, %482 {RelaxedPrecision} : f32 + %484 = load %arg2[%414, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %485 = addf %484, %483 {RelaxedPrecision} : f32 + store %485, %arg2[%414, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = load %arg2[%414, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %486, %arg2[%414, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %488 = load %arg1[%11, %95] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = mulf %487, %488 {RelaxedPrecision} : f32 + %490 = load %arg2[%414, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = addf %490, %489 {RelaxedPrecision} : f32 + store %491, %arg2[%414, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %492 = load %arg2[%414, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %492, 
%arg2[%414, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %494 = load %arg1[%11, %102] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = mulf %493, %494 {RelaxedPrecision} : f32 + %496 = load %arg2[%414, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = addf %496, %495 {RelaxedPrecision} : f32 + store %497, %arg2[%414, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %498 = load %arg2[%414, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %498, %arg2[%414, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %500 = load %arg1[%11, %109] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %501 = mulf %499, %500 {RelaxedPrecision} : f32 + %502 = load %arg2[%414, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %503 = addf %502, %501 {RelaxedPrecision} : f32 + store %503, %arg2[%414, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = load %arg2[%414, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %504, %arg2[%414, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %505 = load %arg0[%414, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %506 = load %arg1[%11, %116] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = mulf %505, %506 {RelaxedPrecision} : f32 + %508 = load %arg2[%414, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %509 = addf %508, %507 {RelaxedPrecision} : f32 + store %509, %arg2[%414, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = load %arg2[%414, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %510, %arg2[%414, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %511 = addi %2, %c5 : index + %512 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %513 = load %arg1[%11, %10] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = mulf %512, %513 {RelaxedPrecision} : f32 + %515 = load %arg2[%511, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = addf %515, %514 {RelaxedPrecision} : f32 + store %516, %arg2[%511, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %517 = load %arg2[%511, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %517, %arg2[%511, %10] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %519 = load %arg1[%11, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = mulf %518, %519 {RelaxedPrecision} : f32 + %521 = load %arg2[%511, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = addf %521, %520 {RelaxedPrecision} : f32 + store %522, %arg2[%511, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %523 = load %arg2[%511, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %523, %arg2[%511, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %525 = load 
%arg1[%11, %25] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = mulf %524, %525 {RelaxedPrecision} : f32 + %527 = load %arg2[%511, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = addf %527, %526 {RelaxedPrecision} : f32 + store %528, %arg2[%511, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %529 = load %arg2[%511, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %529, %arg2[%511, %25] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %531 = load %arg1[%11, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = mulf %530, %531 {RelaxedPrecision} : f32 + %533 = load %arg2[%511, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = addf %533, %532 {RelaxedPrecision} : f32 + store %534, %arg2[%511, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %535 = load %arg2[%511, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %535, %arg2[%511, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %537 = load %arg1[%11, %39] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = mulf %536, %537 {RelaxedPrecision} : f32 + %539 = load %arg2[%511, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = addf %539, %538 {RelaxedPrecision} : f32 + store %540, %arg2[%511, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %541 = load %arg2[%511, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %541, %arg2[%511, %39] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %543 = load %arg1[%11, %46] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = mulf %542, %543 {RelaxedPrecision} : f32 + %545 = load %arg2[%511, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = addf %545, %544 {RelaxedPrecision} : f32 + store %546, %arg2[%511, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %547 = load %arg2[%511, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %547, %arg2[%511, %46] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %549 = load %arg1[%11, %53] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = mulf %548, %549 {RelaxedPrecision} : f32 + %551 = load %arg2[%511, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = addf %551, %550 {RelaxedPrecision} : f32 + store %552, %arg2[%511, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %553 = load %arg2[%511, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %553, %arg2[%511, %53] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %555 = load %arg1[%11, %60] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = mulf %554, %555 {RelaxedPrecision} : f32 + %557 = load %arg2[%511, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = addf 
%557, %556 {RelaxedPrecision} : f32 + store %558, %arg2[%511, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = load %arg2[%511, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %559, %arg2[%511, %60] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %561 = load %arg1[%11, %67] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = mulf %560, %561 {RelaxedPrecision} : f32 + %563 = load %arg2[%511, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %564 = addf %563, %562 {RelaxedPrecision} : f32 + store %564, %arg2[%511, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %565 = load %arg2[%511, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %565, %arg2[%511, %67] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %567 = load %arg1[%11, %74] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = mulf %566, %567 {RelaxedPrecision} : f32 + %569 = load %arg2[%511, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = addf %569, %568 {RelaxedPrecision} : f32 + store %570, %arg2[%511, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %571 = load %arg2[%511, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %571, %arg2[%511, %74] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %572 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %573 = load %arg1[%11, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = mulf %572, %573 {RelaxedPrecision} : f32 + %575 = load %arg2[%511, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = addf %575, %574 {RelaxedPrecision} : f32 + store %576, %arg2[%511, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %577 = load %arg2[%511, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %577, %arg2[%511, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %579 = load %arg1[%11, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = mulf %578, %579 {RelaxedPrecision} : f32 + %581 = load %arg2[%511, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %582 = addf %581, %580 {RelaxedPrecision} : f32 + store %582, %arg2[%511, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %583 = load %arg2[%511, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %583, %arg2[%511, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %584 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %585 = load %arg1[%11, %95] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = mulf %584, %585 {RelaxedPrecision} : f32 + %587 = load %arg2[%511, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = addf %587, %586 {RelaxedPrecision} : f32 + store %588, %arg2[%511, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %589 = load %arg2[%511, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %589, 
%arg2[%511, %95] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %591 = load %arg1[%11, %102] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = mulf %590, %591 {RelaxedPrecision} : f32 + %593 = load %arg2[%511, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = addf %593, %592 {RelaxedPrecision} : f32 + store %594, %arg2[%511, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %595 = load %arg2[%511, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %595, %arg2[%511, %102] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %597 = load %arg1[%11, %109] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %598 = mulf %596, %597 {RelaxedPrecision} : f32 + %599 = load %arg2[%511, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %600 = addf %599, %598 {RelaxedPrecision} : f32 + store %600, %arg2[%511, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %601 = load %arg2[%511, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %601, %arg2[%511, %109] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %602 = load %arg0[%511, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %603 = load %arg1[%11, %116] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %604 = mulf %602, %603 {RelaxedPrecision} : f32 + %605 = load %arg2[%511, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %606 = addf %605, %604 {RelaxedPrecision} : f32 + store %606, %arg2[%511, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %607 = load %arg2[%511, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %607, %arg2[%511, %116] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %608 = addi %8, %c1 : index + br ^bb9(%608 : index) + ^bb11: // pred: ^bb9 + %609 = addi %6, %c4 : index + br ^bb7(%609 : index) + ^bb12: // pred: ^bb7 + %610 = addi %4, %c16 : index + br ^bb5(%610 : index) + ^bb13: // pred: ^bb5 + %611 = addi %2, %c6 : index + br ^bb3(%611 : index) + ^bb14: // pred: ^bb3 + br ^bb15(%c0 : index) + ^bb15(%612: index): // 2 preds: ^bb14, ^bb22 + %613 = cmpi "slt", %612, %c256 : index + cond_br %613, ^bb16, ^bb23 + ^bb16: // pred: ^bb15 + br ^bb17(%c0 : index) + ^bb17(%614: index): // 2 preds: ^bb16, ^bb21 + %615 = cmpi "slt", %614, %c128 : index + cond_br %615, ^bb18, ^bb22 + ^bb18: // pred: ^bb17 + br ^bb19(%c0 : index) + ^bb19(%616: index): // 2 preds: ^bb18, ^bb20 + %617 = cmpi "slt", %616, %c4 : index + cond_br %617, ^bb20, ^bb21 + ^bb20: // pred: ^bb19 + %618 = addi %0, %612 : index + %619 = addi %614, %616 : index + %620 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %621 = load %arg1[%619, %618] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %622 = mulf %620, %621 {RelaxedPrecision} : f32 + %623 = load %arg2[%c780, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %624 = addf %623, %622 {RelaxedPrecision} : f32 + store %624, %arg2[%c780, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %625 = load %arg2[%c780, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %625, 
%arg2[%c780, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %626 = addi %618, %c1 : index + %627 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %628 = load %arg1[%619, %626] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %629 = mulf %627, %628 {RelaxedPrecision} : f32 + %630 = load %arg2[%c780, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %631 = addf %630, %629 {RelaxedPrecision} : f32 + store %631, %arg2[%c780, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %632 = load %arg2[%c780, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %632, %arg2[%c780, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %633 = addi %618, %c2 : index + %634 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %635 = load %arg1[%619, %633] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %636 = mulf %634, %635 {RelaxedPrecision} : f32 + %637 = load %arg2[%c780, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %638 = addf %637, %636 {RelaxedPrecision} : f32 + store %638, %arg2[%c780, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %639 = load %arg2[%c780, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %639, %arg2[%c780, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %640 = addi %618, %c3 : index + %641 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %642 = load %arg1[%619, %640] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %643 = mulf %641, %642 {RelaxedPrecision} : f32 + %644 = load %arg2[%c780, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %645 = addf %644, %643 {RelaxedPrecision} : f32 + store %645, %arg2[%c780, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %646 = load %arg2[%c780, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %646, %arg2[%c780, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %647 = addi %618, %c4 : index + %648 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %649 = load %arg1[%619, %647] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %650 = mulf %648, %649 {RelaxedPrecision} : f32 + %651 = load %arg2[%c780, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %652 = addf %651, %650 {RelaxedPrecision} : f32 + store %652, %arg2[%c780, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %653 = load %arg2[%c780, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %653, %arg2[%c780, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %654 = addi %618, %c5 : index + %655 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %656 = load %arg1[%619, %654] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %657 = mulf %655, %656 {RelaxedPrecision} : f32 + %658 = load %arg2[%c780, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %659 = addf %658, %657 {RelaxedPrecision} : f32 + store %659, %arg2[%c780, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %660 = load %arg2[%c780, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %660, %arg2[%c780, %654] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %661 = addi %618, %c6 : index + %662 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %663 = load %arg1[%619, %661] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %664 = mulf %662, %663 {RelaxedPrecision} : f32 + %665 = load %arg2[%c780, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %666 = addf %665, %664 {RelaxedPrecision} : f32 + store %666, %arg2[%c780, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %667 = load %arg2[%c780, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %667, %arg2[%c780, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %668 = addi %618, %c7 : index + %669 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %670 = load %arg1[%619, %668] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %671 = mulf %669, %670 {RelaxedPrecision} : f32 + %672 = load %arg2[%c780, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %673 = addf %672, %671 {RelaxedPrecision} : f32 + store %673, %arg2[%c780, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %674 = load %arg2[%c780, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %674, %arg2[%c780, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %675 = addi %618, %c8 : index + %676 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %677 = load %arg1[%619, %675] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %678 = mulf %676, %677 {RelaxedPrecision} : f32 + %679 = load %arg2[%c780, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %680 = addf %679, %678 {RelaxedPrecision} : f32 + store %680, %arg2[%c780, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %681 = load %arg2[%c780, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %681, %arg2[%c780, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %682 = addi %618, %c9 : index + %683 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %684 = load %arg1[%619, %682] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %685 = mulf %683, %684 {RelaxedPrecision} : f32 + %686 = load %arg2[%c780, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %687 = addf %686, %685 {RelaxedPrecision} : f32 + store %687, %arg2[%c780, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %688 = load %arg2[%c780, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %688, %arg2[%c780, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %689 = addi %618, %c10 : index + %690 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %691 = load %arg1[%619, %689] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %692 = mulf %690, %691 {RelaxedPrecision} : f32 + %693 = load %arg2[%c780, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %694 = addf %693, %692 {RelaxedPrecision} : f32 + store %694, %arg2[%c780, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %695 = load %arg2[%c780, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %695, %arg2[%c780, %689] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %696 = addi %618, %c11 : index + %697 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %698 = load %arg1[%619, %696] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %699 = mulf %697, %698 {RelaxedPrecision} : f32 + %700 = load %arg2[%c780, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %701 = addf %700, %699 {RelaxedPrecision} : f32 + store %701, %arg2[%c780, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %702 = load %arg2[%c780, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %702, %arg2[%c780, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %703 = addi %618, %c12 : index + %704 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %705 = load %arg1[%619, %703] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %706 = mulf %704, %705 {RelaxedPrecision} : f32 + %707 = load %arg2[%c780, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %708 = addf %707, %706 {RelaxedPrecision} : f32 + store %708, %arg2[%c780, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %709 = load %arg2[%c780, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %709, %arg2[%c780, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %710 = addi %618, %c13 : index + %711 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %712 = load %arg1[%619, %710] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %713 = mulf %711, %712 {RelaxedPrecision} : f32 + %714 = load %arg2[%c780, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %715 = addf %714, %713 {RelaxedPrecision} : f32 + store %715, %arg2[%c780, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %716 = load %arg2[%c780, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %716, %arg2[%c780, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %717 = addi %618, %c14 : index + %718 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %719 = load %arg1[%619, %717] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %720 = mulf %718, %719 {RelaxedPrecision} : f32 + %721 = load %arg2[%c780, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %722 = addf %721, %720 {RelaxedPrecision} : f32 + store %722, %arg2[%c780, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %723 = load %arg2[%c780, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %723, %arg2[%c780, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %724 = addi %618, %c15 : index + %725 = load %arg0[%c780, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %726 = load %arg1[%619, %724] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %727 = mulf %725, %726 {RelaxedPrecision} : f32 + %728 = load %arg2[%c780, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %729 = addf %728, %727 {RelaxedPrecision} : f32 + store %729, %arg2[%c780, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %730 = load %arg2[%c780, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %730, %arg2[%c780, %724] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %731 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %732 = load %arg1[%619, %618] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %733 = mulf %731, %732 {RelaxedPrecision} : f32 + %734 = load %arg2[%c781, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %735 = addf %734, %733 {RelaxedPrecision} : f32 + store %735, %arg2[%c781, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %736 = load %arg2[%c781, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %736, %arg2[%c781, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %737 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %738 = load %arg1[%619, %626] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %739 = mulf %737, %738 {RelaxedPrecision} : f32 + %740 = load %arg2[%c781, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %741 = addf %740, %739 {RelaxedPrecision} : f32 + store %741, %arg2[%c781, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %742 = load %arg2[%c781, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %742, %arg2[%c781, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %743 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %744 = load %arg1[%619, %633] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %745 = mulf %743, %744 {RelaxedPrecision} : f32 + %746 = load %arg2[%c781, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %747 = addf %746, %745 {RelaxedPrecision} : f32 + store %747, %arg2[%c781, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %748 = load %arg2[%c781, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %748, %arg2[%c781, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %749 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %750 = load %arg1[%619, %640] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %751 = mulf %749, %750 {RelaxedPrecision} : f32 + %752 = load %arg2[%c781, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %753 = addf %752, %751 {RelaxedPrecision} : f32 + store %753, %arg2[%c781, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %754 = load %arg2[%c781, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %754, %arg2[%c781, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %755 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %756 = load %arg1[%619, %647] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %757 = mulf %755, %756 {RelaxedPrecision} : f32 + %758 = load %arg2[%c781, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %759 = addf %758, %757 {RelaxedPrecision} : f32 + store %759, %arg2[%c781, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %760 = load %arg2[%c781, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %760, %arg2[%c781, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %761 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %762 = load 
%arg1[%619, %654] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %763 = mulf %761, %762 {RelaxedPrecision} : f32 + %764 = load %arg2[%c781, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %765 = addf %764, %763 {RelaxedPrecision} : f32 + store %765, %arg2[%c781, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %766 = load %arg2[%c781, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %766, %arg2[%c781, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %767 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %768 = load %arg1[%619, %661] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %769 = mulf %767, %768 {RelaxedPrecision} : f32 + %770 = load %arg2[%c781, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %771 = addf %770, %769 {RelaxedPrecision} : f32 + store %771, %arg2[%c781, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %772 = load %arg2[%c781, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %772, %arg2[%c781, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %773 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %774 = load %arg1[%619, %668] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %775 = mulf %773, %774 {RelaxedPrecision} : f32 + %776 = load %arg2[%c781, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %777 = addf %776, %775 {RelaxedPrecision} : f32 + store %777, %arg2[%c781, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %778 = load %arg2[%c781, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %778, %arg2[%c781, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %779 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %780 = load %arg1[%619, %675] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %781 = mulf %779, %780 {RelaxedPrecision} : f32 + %782 = load %arg2[%c781, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %783 = addf %782, %781 {RelaxedPrecision} : f32 + store %783, %arg2[%c781, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %784 = load %arg2[%c781, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %784, %arg2[%c781, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %785 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %786 = load %arg1[%619, %682] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %787 = mulf %785, %786 {RelaxedPrecision} : f32 + %788 = load %arg2[%c781, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %789 = addf %788, %787 {RelaxedPrecision} : f32 + store %789, %arg2[%c781, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %790 = load %arg2[%c781, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %790, %arg2[%c781, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %791 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %792 = load %arg1[%619, %689] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %793 = mulf %791, %792 {RelaxedPrecision} : f32 + %794 = load %arg2[%c781, %689] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %795 = addf %794, %793 {RelaxedPrecision} : f32 + store %795, %arg2[%c781, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %796 = load %arg2[%c781, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %796, %arg2[%c781, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %797 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %798 = load %arg1[%619, %696] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %799 = mulf %797, %798 {RelaxedPrecision} : f32 + %800 = load %arg2[%c781, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %801 = addf %800, %799 {RelaxedPrecision} : f32 + store %801, %arg2[%c781, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %802 = load %arg2[%c781, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %802, %arg2[%c781, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %803 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %804 = load %arg1[%619, %703] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %805 = mulf %803, %804 {RelaxedPrecision} : f32 + %806 = load %arg2[%c781, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %807 = addf %806, %805 {RelaxedPrecision} : f32 + store %807, %arg2[%c781, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %808 = load %arg2[%c781, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %808, %arg2[%c781, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %809 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %810 = load %arg1[%619, %710] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %811 = mulf %809, %810 {RelaxedPrecision} : f32 + %812 = load %arg2[%c781, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %813 = addf %812, %811 {RelaxedPrecision} : f32 + store %813, %arg2[%c781, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %814 = load %arg2[%c781, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %814, %arg2[%c781, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %815 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %816 = load %arg1[%619, %717] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %817 = mulf %815, %816 {RelaxedPrecision} : f32 + %818 = load %arg2[%c781, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %819 = addf %818, %817 {RelaxedPrecision} : f32 + store %819, %arg2[%c781, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %820 = load %arg2[%c781, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %820, %arg2[%c781, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %821 = load %arg0[%c781, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %822 = load %arg1[%619, %724] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %823 = mulf %821, %822 {RelaxedPrecision} : f32 + %824 = load %arg2[%c781, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %825 = addf %824, %823 {RelaxedPrecision} : f32 + store %825, %arg2[%c781, %724] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %826 = load %arg2[%c781, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %826, %arg2[%c781, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %827 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %828 = load %arg1[%619, %618] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %829 = mulf %827, %828 {RelaxedPrecision} : f32 + %830 = load %arg2[%c782, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %831 = addf %830, %829 {RelaxedPrecision} : f32 + store %831, %arg2[%c782, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %832 = load %arg2[%c782, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %832, %arg2[%c782, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %833 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %834 = load %arg1[%619, %626] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %835 = mulf %833, %834 {RelaxedPrecision} : f32 + %836 = load %arg2[%c782, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %837 = addf %836, %835 {RelaxedPrecision} : f32 + store %837, %arg2[%c782, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %838 = load %arg2[%c782, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %838, %arg2[%c782, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %839 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %840 = load %arg1[%619, %633] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %841 = mulf %839, %840 {RelaxedPrecision} : f32 + %842 = load %arg2[%c782, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %843 = addf %842, %841 {RelaxedPrecision} : f32 + store %843, %arg2[%c782, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %844 = load %arg2[%c782, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %844, %arg2[%c782, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %845 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %846 = load %arg1[%619, %640] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %847 = mulf %845, %846 {RelaxedPrecision} : f32 + %848 = load %arg2[%c782, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %849 = addf %848, %847 {RelaxedPrecision} : f32 + store %849, %arg2[%c782, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %850 = load %arg2[%c782, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %850, %arg2[%c782, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %851 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %852 = load %arg1[%619, %647] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %853 = mulf %851, %852 {RelaxedPrecision} : f32 + %854 = load %arg2[%c782, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %855 = addf %854, %853 {RelaxedPrecision} : f32 + store %855, %arg2[%c782, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %856 = load %arg2[%c782, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %856, %arg2[%c782, %647] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %857 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %858 = load %arg1[%619, %654] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %859 = mulf %857, %858 {RelaxedPrecision} : f32 + %860 = load %arg2[%c782, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %861 = addf %860, %859 {RelaxedPrecision} : f32 + store %861, %arg2[%c782, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %862 = load %arg2[%c782, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %862, %arg2[%c782, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %863 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %864 = load %arg1[%619, %661] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %865 = mulf %863, %864 {RelaxedPrecision} : f32 + %866 = load %arg2[%c782, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %867 = addf %866, %865 {RelaxedPrecision} : f32 + store %867, %arg2[%c782, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %868 = load %arg2[%c782, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %868, %arg2[%c782, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %869 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %870 = load %arg1[%619, %668] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %871 = mulf %869, %870 {RelaxedPrecision} : f32 + %872 = load %arg2[%c782, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %873 = addf %872, %871 {RelaxedPrecision} : f32 + store %873, %arg2[%c782, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %874 = load %arg2[%c782, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %874, %arg2[%c782, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %875 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %876 = load %arg1[%619, %675] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %877 = mulf %875, %876 {RelaxedPrecision} : f32 + %878 = load %arg2[%c782, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %879 = addf %878, %877 {RelaxedPrecision} : f32 + store %879, %arg2[%c782, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %880 = load %arg2[%c782, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %880, %arg2[%c782, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %881 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %882 = load %arg1[%619, %682] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %883 = mulf %881, %882 {RelaxedPrecision} : f32 + %884 = load %arg2[%c782, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %885 = addf %884, %883 {RelaxedPrecision} : f32 + store %885, %arg2[%c782, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %886 = load %arg2[%c782, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %886, %arg2[%c782, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %887 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %888 = load 
%arg1[%619, %689] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %889 = mulf %887, %888 {RelaxedPrecision} : f32 + %890 = load %arg2[%c782, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %891 = addf %890, %889 {RelaxedPrecision} : f32 + store %891, %arg2[%c782, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %892 = load %arg2[%c782, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %892, %arg2[%c782, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %893 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %894 = load %arg1[%619, %696] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %895 = mulf %893, %894 {RelaxedPrecision} : f32 + %896 = load %arg2[%c782, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %897 = addf %896, %895 {RelaxedPrecision} : f32 + store %897, %arg2[%c782, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %898 = load %arg2[%c782, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %898, %arg2[%c782, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %899 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %900 = load %arg1[%619, %703] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %901 = mulf %899, %900 {RelaxedPrecision} : f32 + %902 = load %arg2[%c782, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %903 = addf %902, %901 {RelaxedPrecision} : f32 + store %903, %arg2[%c782, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %904 = load %arg2[%c782, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %904, %arg2[%c782, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %905 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %906 = load %arg1[%619, %710] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %907 = mulf %905, %906 {RelaxedPrecision} : f32 + %908 = load %arg2[%c782, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %909 = addf %908, %907 {RelaxedPrecision} : f32 + store %909, %arg2[%c782, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %910 = load %arg2[%c782, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %910, %arg2[%c782, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %911 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %912 = load %arg1[%619, %717] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %913 = mulf %911, %912 {RelaxedPrecision} : f32 + %914 = load %arg2[%c782, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %915 = addf %914, %913 {RelaxedPrecision} : f32 + store %915, %arg2[%c782, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %916 = load %arg2[%c782, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %916, %arg2[%c782, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %917 = load %arg0[%c782, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %918 = load %arg1[%619, %724] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %919 = mulf %917, %918 {RelaxedPrecision} : f32 + %920 = load %arg2[%c782, %724] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %921 = addf %920, %919 {RelaxedPrecision} : f32 + store %921, %arg2[%c782, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %922 = load %arg2[%c782, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %922, %arg2[%c782, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %923 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %924 = load %arg1[%619, %618] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %925 = mulf %923, %924 {RelaxedPrecision} : f32 + %926 = load %arg2[%c783, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %927 = addf %926, %925 {RelaxedPrecision} : f32 + store %927, %arg2[%c783, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %928 = load %arg2[%c783, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %928, %arg2[%c783, %618] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %929 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %930 = load %arg1[%619, %626] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %931 = mulf %929, %930 {RelaxedPrecision} : f32 + %932 = load %arg2[%c783, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %933 = addf %932, %931 {RelaxedPrecision} : f32 + store %933, %arg2[%c783, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %934 = load %arg2[%c783, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %934, %arg2[%c783, %626] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %935 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %936 = load %arg1[%619, %633] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %937 = mulf %935, %936 {RelaxedPrecision} : f32 + %938 = load %arg2[%c783, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %939 = addf %938, %937 {RelaxedPrecision} : f32 + store %939, %arg2[%c783, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %940 = load %arg2[%c783, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %940, %arg2[%c783, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %941 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %942 = load %arg1[%619, %640] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %943 = mulf %941, %942 {RelaxedPrecision} : f32 + %944 = load %arg2[%c783, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %945 = addf %944, %943 {RelaxedPrecision} : f32 + store %945, %arg2[%c783, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %946 = load %arg2[%c783, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %946, %arg2[%c783, %640] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %947 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %948 = load %arg1[%619, %647] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %949 = mulf %947, %948 {RelaxedPrecision} : f32 + %950 = load %arg2[%c783, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %951 = addf %950, %949 {RelaxedPrecision} : f32 + store %951, %arg2[%c783, %647] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %952 = load %arg2[%c783, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %952, %arg2[%c783, %647] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %953 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %954 = load %arg1[%619, %654] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %955 = mulf %953, %954 {RelaxedPrecision} : f32 + %956 = load %arg2[%c783, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %957 = addf %956, %955 {RelaxedPrecision} : f32 + store %957, %arg2[%c783, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %958 = load %arg2[%c783, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %958, %arg2[%c783, %654] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %959 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %960 = load %arg1[%619, %661] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %961 = mulf %959, %960 {RelaxedPrecision} : f32 + %962 = load %arg2[%c783, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %963 = addf %962, %961 {RelaxedPrecision} : f32 + store %963, %arg2[%c783, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %964 = load %arg2[%c783, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %964, %arg2[%c783, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %965 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %966 = load %arg1[%619, %668] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %967 = mulf %965, %966 {RelaxedPrecision} : f32 + %968 = load %arg2[%c783, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %969 = addf %968, %967 {RelaxedPrecision} : f32 + store %969, %arg2[%c783, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %970 = load %arg2[%c783, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %970, %arg2[%c783, %668] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %971 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %972 = load %arg1[%619, %675] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %973 = mulf %971, %972 {RelaxedPrecision} : f32 + %974 = load %arg2[%c783, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %975 = addf %974, %973 {RelaxedPrecision} : f32 + store %975, %arg2[%c783, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %976 = load %arg2[%c783, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %976, %arg2[%c783, %675] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %977 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %978 = load %arg1[%619, %682] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %979 = mulf %977, %978 {RelaxedPrecision} : f32 + %980 = load %arg2[%c783, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %981 = addf %980, %979 {RelaxedPrecision} : f32 + store %981, %arg2[%c783, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %982 = load %arg2[%c783, %682] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %982, %arg2[%c783, %682] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %983 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %984 = load %arg1[%619, %689] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %985 = mulf %983, %984 {RelaxedPrecision} : f32 + %986 = load %arg2[%c783, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %987 = addf %986, %985 {RelaxedPrecision} : f32 + store %987, %arg2[%c783, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %988 = load %arg2[%c783, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %988, %arg2[%c783, %689] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %989 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %990 = load %arg1[%619, %696] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %991 = mulf %989, %990 {RelaxedPrecision} : f32 + %992 = load %arg2[%c783, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %993 = addf %992, %991 {RelaxedPrecision} : f32 + store %993, %arg2[%c783, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %994 = load %arg2[%c783, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %994, %arg2[%c783, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %995 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %996 = load %arg1[%619, %703] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %997 = mulf %995, %996 {RelaxedPrecision} : f32 + %998 = load %arg2[%c783, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %999 = addf %998, %997 {RelaxedPrecision} : f32 + store %999, %arg2[%c783, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1000 = load %arg2[%c783, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %1000, %arg2[%c783, %703] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1001 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1002 = load %arg1[%619, %710] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1003 = mulf %1001, %1002 {RelaxedPrecision} : f32 + %1004 = load %arg2[%c783, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1005 = addf %1004, %1003 {RelaxedPrecision} : f32 + store %1005, %arg2[%c783, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1006 = load %arg2[%c783, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %1006, %arg2[%c783, %710] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1007 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1008 = load %arg1[%619, %717] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1009 = mulf %1007, %1008 {RelaxedPrecision} : f32 + %1010 = load %arg2[%c783, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1011 = addf %1010, %1009 {RelaxedPrecision} : f32 + store %1011, %arg2[%c783, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1012 = load %arg2[%c783, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %1012, %arg2[%c783, %717] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1013 = load %arg0[%c783, %619] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %1014 = load %arg1[%619, %724] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1015 = mulf %1013, %1014 {RelaxedPrecision} : f32 + %1016 = load %arg2[%c783, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1017 = addf %1016, %1015 {RelaxedPrecision} : f32 + store %1017, %arg2[%c783, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1018 = load %arg2[%c783, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %1018, %arg2[%c783, %724] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1019 = addi %616, %c1 : index + br ^bb19(%1019 : index) + ^bb21: // pred: ^bb19 + %1020 = addi %614, %c4 : index + br ^bb17(%1020 : index) + ^bb22: // pred: ^bb17 + %1021 = addi %612, %c16 : index + br ^bb15(%1021 : index) + ^bb23: // pred: ^bb15 + %1022 = addi %0, %c256 : index + br ^bb1(%1022 : index) + ^bb24: // pred: ^bb1 + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/1_Canonicalizer.mlir b/Tutorials/optimized_matmul/mlir/1_Canonicalizer.mlir new file mode 100644 index 00000000..44024330 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/1_Canonicalizer.mlir @@ -0,0 +1,45 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + accv.func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = accln.sym_index {name = "i_o"} #accln<"index{i_o,7}"> + %1 = accln.sym_index {name = "k_o"} #accln<"index{k_o,5}"> + %2 = accln.sym_index {name = "j_o"} #accln<"index{j_o,3}"> + %3 = accln.sym_index {name = "i"} #accln<"index{i,0}"> + %4 = accln.sym_index {name = "j"} #accln<"index{j,1}"> + %5 = accln.sym_index {name = "k"} #accln<"index{k,2}"> + "accln.nest"() ( { + "accln.kernel"() ( { + %11 = load %arg0[%3, %5] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg1[%5, %4] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = "accv.bin_op"(%11, %12) {predicate = 2 : i64} : (f32, f32) -> f32 + %14 = load %arg2[%3, %4] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = "accv.bin_op"(%14, %13) {predicate = 0 : i64} : (f32, f32) -> f32 + store %15, %arg2[%3, %4] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = load %arg2[%3, %4] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %16, %arg2[%3, %4] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + accln.terminator + }) {sym_name = "_"} : () -> () + %6 = "accln.null_pred"() : () -> i1 + "accln.scheduled_kernel"(%6) {kernel = @_, sym_name = 
"scheduled__"} : (i1) -> () + %7 = "accxp.make_cache"() {memorySpace = 0 : i64} : () -> memref<16x128x16xf32, 3> + %8 = "accxp.cache_region"(%arg1, %7, %2, %1) ( { + accxp.cache_region_terminator + }) {cacheAccessIndexing = 0 : i64, cacheAccessMaps = {globalInputToLogicalCache = affine_map<(d0, d1, d2, d3) -> (d2 - d1, d3 - d0)>, globalInputToPhysicalCache = affine_map<(d0, d1, d2, d3) -> (((d3 - d0) floordiv 16) mod 16, (d2 - d1) mod 128, (d3 - d0) mod 16)>, logicalCacheToGlobalInput = affine_map<(d0, d1, d2, d3) -> (d2 + d1, d3 + d0)>, logicalCacheToPhysicalCache = affine_map<(d0, d1) -> ((d1 floordiv 16) mod 16, d0 mod 128, d1 mod 16)>}, cacheDimGlobalIndices = [#accln<"index{k,2}">, #accln<"index{j,1}">], cacheGlobalDimensionSizes = [128, 512], id = 0 : i64, injectionIndex = #accln<"index{k_o,5}">, inputAccessIndexing = 0 : i64, inputAccessMaps = {globalInputToPhysicalCache = affine_map<(d0, d1) -> (d0, d1)>}} : (memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<16x128x16xf32, 3>, index, index) -> index + %9 = "accxp.make_cache"() {memorySpace = 0 : i64} : () -> memref<16x6x16xf32, 3> + %10 = "accxp.cache_region"(%arg2, %9, %2, %1, %0) ( { + accxp.cache_region_terminator + }) {cacheAccessIndexing = 0 : i64, cacheAccessMaps = {globalInputToLogicalCache = affine_map<(d0, d1, d2, d3, d4) -> (d3 - d2, d4 - d0)>, globalInputToPhysicalCache = affine_map<(d0, d1, d2, d3, d4) -> (((d4 - d0) floordiv 16) mod 16, (d3 - d2) mod 6, (d4 - d0) mod 16)>, logicalCacheToGlobalInput = affine_map<(d0, d1, d2, d3, d4) -> (d3 + d2, d4 + d0)>, logicalCacheToPhysicalCache = affine_map<(d0, d1) -> ((d1 floordiv 16) mod 16, d0 mod 6, d1 mod 16)>}, cacheDimGlobalIndices = [#accln<"index{i,0}">, #accln<"index{j,1}">], cacheGlobalDimensionSizes = [784, 512], id = 1 : i64, injectionIndex = #accln<"index{i_o,7}">, inputAccessIndexing = 0 : i64, inputAccessMaps = {globalInputToPhysicalCache = affine_map<(d0, d1) -> (d0, d1)>}} : (memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<16x6x16xf32, 3>, index, index, index) -> index + "accln.schedule"(%8, %10) ( { + "accln.exec_plan"() {exec_target = 0 : i64} : () -> () + accln.terminator + }) {domain = #accln<"xfdomain{dims: {{i,0}, {j,1}, {k,2}}, indices: {{{i,0} : {0:784:1} = {(d0, d1) -> (d0 + d1), {{i_o,7}, {i_i,8}}}}, {{j,1} : {0:512:1} = {(d0, d1) -> (d0 + d1), {{j_o,3}, {j_i,4}}}}, {{k,2} : {0:128:1} = {(d0, d1) -> (d0 + d1), {{k_o,5}, {k_i,6}}}}, {{j_o,3} : {0:512:256}}, {{j_i,4} : {0:256:1} = {(d0, d1) -> (d0 + d1), {{j_i_o,13}, {j_i_i,14}}}}, {{k_o,5} : {0:128:128}}, {{k_i,6} : {0:128:1} = {(d0, d1) -> (d0 + d1), {{k_i_o,9}, {k_i_i,10}}}}, {{i_o,7} : {0:784:1}}, {{i_i,8} : {0:1:1} = {(d0, d1) -> (d0 + d1), {{i_i_o,11}, {i_i_i,12}}}}, {{k_i_o,9} : {0:128:4}}, {{k_i_i,10} : {0:4:1}}, {{i_i_o,11} : {0:1:6}}, {{i_i_i,12} : {0:6:1}}, {{j_i_o,13} : {0:256:16}}, {{j_i_i,14} : {0:16:1} = {(d0, d1) -> (d0 + d1), {{j_i_i_o,15}, {j_i_i_i,16}}}}, {{j_i_i_o,15} : {0:16:8}}, {{j_i_i_i,16} : {0:8:1}}}}">, kernels = [@scheduled__], loopattrs = [{accxp_vectorizationInfo = #accxp<"vectorizationinfo{8,16,1}">, scheduledIndex = #accln<"index{j_i_i_i,16}">}], order = [#accln<"index{j_o,3}">, #accln<"index{k_o,5}">, #accln<"index{i_o,7}">, #accln<"index{j_i_o,13}">, #accln<"index{k_i_o,9}">, #accln<"index{i_i_o,11}">, #accln<"index{k_i_i,10}">, #accln<"index{i_i_i,12}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], parallel = [], unroll_and_jammed = {}, unrolled = [15 : index, 11 : index]} : (index, index) -> () + accln.terminator + 
}) {domain = #accln<"idomain{{i,0}={0:784:1}, {j,1}={0:512:1}, {k,2}={0:128:1}}">, exec_target = 0 : i64, kernels = []} : () -> () + accv.return + } + accv.func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + accv.return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/20_ConvertValueToLLVM.mlir b/Tutorials/optimized_matmul/mlir/20_ConvertValueToLLVM.mlir new file mode 100644 index 00000000..be537b05 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/20_ConvertValueToLLVM.mlir @@ -0,0 +1,10140 @@ +module @optimized_matmul { + llvm.func @optimized_matmul_py_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(781 : index) : !llvm.i64 + %25 = llvm.mlir.constant(782 : index) : !llvm.i64 + %26 = llvm.mlir.constant(783 : index) : !llvm.i64 + %27 = llvm.mlir.constant(512 : index) : !llvm.i64 + %28 = llvm.mlir.constant(780 : index) : !llvm.i64 + %29 = llvm.mlir.constant(256 : index) : !llvm.i64 + %30 = llvm.mlir.constant(16 : index) : !llvm.i64 + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.mlir.constant(0 : index) : !llvm.i64 + %33 = llvm.mlir.constant(1 : index) : !llvm.i64 + %34 = llvm.mlir.constant(2 : index) : !llvm.i64 + %35 = llvm.mlir.constant(3 : index) : !llvm.i64 + %36 = llvm.mlir.constant(4 : index) : !llvm.i64 + %37 = llvm.mlir.constant(5 : index) : !llvm.i64 + %38 = llvm.mlir.constant(6 : index) : !llvm.i64 + %39 = llvm.mlir.constant(7 : index) : !llvm.i64 + %40 = llvm.mlir.constant(8 : index) : !llvm.i64 + %41 = llvm.mlir.constant(9 : index) : !llvm.i64 + %42 = llvm.mlir.constant(10 : index) : !llvm.i64 + %43 = llvm.mlir.constant(11 : index) : !llvm.i64 + %44 = llvm.mlir.constant(12 : index) : !llvm.i64 + %45 = llvm.mlir.constant(13 : index) : !llvm.i64 + %46 = llvm.mlir.constant(14 : index) : !llvm.i64 + %47 = llvm.mlir.constant(15 : index) : !llvm.i64 + llvm.br ^bb1(%32 : !llvm.i64) + ^bb1(%48: !llvm.i64): // 2 preds: ^bb0, ^bb23 + %49 = llvm.icmp "slt" %48, %27 : !llvm.i64 + llvm.cond_br %49, ^bb2, ^bb24 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%32 : !llvm.i64) + ^bb3(%50: !llvm.i64): // 2 preds: ^bb2, ^bb13 + %51 = llvm.icmp "slt" %50, %28 : !llvm.i64 + llvm.cond_br %51, ^bb4, ^bb14 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%32 : !llvm.i64) + ^bb5(%52: !llvm.i64): // 2 preds: ^bb4, ^bb12 + %53 = llvm.icmp "slt" %52, %29 : !llvm.i64 + llvm.cond_br %53, ^bb6, ^bb13 + ^bb6: // pred: ^bb5 + llvm.br ^bb7(%32 : !llvm.i64) + ^bb7(%54: !llvm.i64): // 2 preds: ^bb6, ^bb11 + %55 = llvm.icmp "slt" %54, %31 : !llvm.i64 + llvm.cond_br %55, ^bb8, ^bb12 + ^bb8: // pred: ^bb7 + llvm.br ^bb9(%32 : !llvm.i64) + ^bb9(%56: !llvm.i64): // 2 preds: ^bb8, ^bb10 + %57 = llvm.icmp "slt" %56, %36 : !llvm.i64 + llvm.cond_br %57, ^bb10, ^bb11 + ^bb10: // pred: ^bb9 + %58 = llvm.add %48, %52 : !llvm.i64 + %59 = llvm.add %54, %56 : !llvm.i64 + %60 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %61 = llvm.mlir.constant(0 : index) : !llvm.i64 + %62 = llvm.mlir.constant(128 : index) : !llvm.i64 + %63 = llvm.mul %50, %62 : !llvm.i64 + %64 = llvm.add %61, %63 : !llvm.i64 + %65 = llvm.mlir.constant(1 : index) : !llvm.i64 + %66 = llvm.mul %59, %65 : !llvm.i64 + %67 = llvm.add %64, %66 : !llvm.i64 + %68 = llvm.getelementptr %60[%67] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %69 = llvm.load %68 : !llvm.ptr + %70 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %71 = llvm.mlir.constant(0 : index) : !llvm.i64 + %72 = llvm.mlir.constant(512 : index) : !llvm.i64 + %73 = llvm.mul %59, %72 : !llvm.i64 + %74 = llvm.add %71, 
%73 : !llvm.i64 + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.mul %58, %75 : !llvm.i64 + %77 = llvm.add %74, %76 : !llvm.i64 + %78 = llvm.getelementptr %70[%77] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %79 = llvm.load %78 : !llvm.ptr + %80 = llvm.fmul %69, %79 {RelaxedPrecision} : !llvm.float + %81 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.mlir.constant(0 : index) : !llvm.i64 + %83 = llvm.mlir.constant(512 : index) : !llvm.i64 + %84 = llvm.mul %50, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.mlir.constant(1 : index) : !llvm.i64 + %87 = llvm.mul %58, %86 : !llvm.i64 + %88 = llvm.add %85, %87 : !llvm.i64 + %89 = llvm.getelementptr %81[%88] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %90 = llvm.load %89 : !llvm.ptr + %91 = llvm.fadd %90, %80 {RelaxedPrecision} : !llvm.float + %92 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %93 = llvm.mlir.constant(0 : index) : !llvm.i64 + %94 = llvm.mlir.constant(512 : index) : !llvm.i64 + %95 = llvm.mul %50, %94 : !llvm.i64 + %96 = llvm.add %93, %95 : !llvm.i64 + %97 = llvm.mlir.constant(1 : index) : !llvm.i64 + %98 = llvm.mul %58, %97 : !llvm.i64 + %99 = llvm.add %96, %98 : !llvm.i64 + %100 = llvm.getelementptr %92[%99] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %91, %100 : !llvm.ptr + %101 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.mlir.constant(0 : index) : !llvm.i64 + %103 = llvm.mlir.constant(512 : index) : !llvm.i64 + %104 = llvm.mul %50, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %107 = llvm.mul %58, %106 : !llvm.i64 + %108 = llvm.add %105, %107 : !llvm.i64 + %109 = llvm.getelementptr %101[%108] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %110 = llvm.load %109 : !llvm.ptr + %111 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %112 = llvm.mlir.constant(0 : index) : !llvm.i64 + %113 = llvm.mlir.constant(512 : index) : !llvm.i64 + %114 = llvm.mul %50, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.mlir.constant(1 : index) : !llvm.i64 + %117 = llvm.mul %58, %116 : !llvm.i64 + %118 = llvm.add %115, %117 : !llvm.i64 + %119 = llvm.getelementptr %111[%118] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %110, %119 : !llvm.ptr + %120 = llvm.add %58, %33 : !llvm.i64 + %121 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %123 = llvm.mlir.constant(128 : index) : !llvm.i64 + %124 = llvm.mul %50, %123 : !llvm.i64 + %125 = llvm.add %122, %124 : !llvm.i64 + %126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %127 = llvm.mul %59, %126 : !llvm.i64 + %128 = llvm.add %125, %127 : !llvm.i64 + %129 = llvm.getelementptr %121[%128] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %130 = llvm.load %129 : !llvm.ptr + %131 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %134 = llvm.mul %59, %133 : !llvm.i64 + %135 = llvm.add %132, %134 : !llvm.i64 + %136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %137 = llvm.mul %120, %136 : !llvm.i64 + %138 = llvm.add %135, %137 : !llvm.i64 + %139 = llvm.getelementptr %131[%138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %140 = llvm.load %139 : !llvm.ptr + %141 = 
llvm.fmul %130, %140 {RelaxedPrecision} : !llvm.float + %142 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %143 = llvm.mlir.constant(0 : index) : !llvm.i64 + %144 = llvm.mlir.constant(512 : index) : !llvm.i64 + %145 = llvm.mul %50, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.mlir.constant(1 : index) : !llvm.i64 + %148 = llvm.mul %120, %147 : !llvm.i64 + %149 = llvm.add %146, %148 : !llvm.i64 + %150 = llvm.getelementptr %142[%149] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %151 = llvm.load %150 : !llvm.ptr + %152 = llvm.fadd %151, %141 {RelaxedPrecision} : !llvm.float + %153 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %154 = llvm.mlir.constant(0 : index) : !llvm.i64 + %155 = llvm.mlir.constant(512 : index) : !llvm.i64 + %156 = llvm.mul %50, %155 : !llvm.i64 + %157 = llvm.add %154, %156 : !llvm.i64 + %158 = llvm.mlir.constant(1 : index) : !llvm.i64 + %159 = llvm.mul %120, %158 : !llvm.i64 + %160 = llvm.add %157, %159 : !llvm.i64 + %161 = llvm.getelementptr %153[%160] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %152, %161 : !llvm.ptr + %162 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %163 = llvm.mlir.constant(0 : index) : !llvm.i64 + %164 = llvm.mlir.constant(512 : index) : !llvm.i64 + %165 = llvm.mul %50, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.mlir.constant(1 : index) : !llvm.i64 + %168 = llvm.mul %120, %167 : !llvm.i64 + %169 = llvm.add %166, %168 : !llvm.i64 + %170 = llvm.getelementptr %162[%169] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %171 = llvm.load %170 : !llvm.ptr + %172 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %173 = llvm.mlir.constant(0 : index) : !llvm.i64 + %174 = llvm.mlir.constant(512 : index) : !llvm.i64 + %175 = llvm.mul %50, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.mlir.constant(1 : index) : !llvm.i64 + %178 = llvm.mul %120, %177 : !llvm.i64 + %179 = llvm.add %176, %178 : !llvm.i64 + %180 = llvm.getelementptr %172[%179] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %171, %180 : !llvm.ptr + %181 = llvm.add %58, %34 : !llvm.i64 + %182 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %183 = llvm.mlir.constant(0 : index) : !llvm.i64 + %184 = llvm.mlir.constant(128 : index) : !llvm.i64 + %185 = llvm.mul %50, %184 : !llvm.i64 + %186 = llvm.add %183, %185 : !llvm.i64 + %187 = llvm.mlir.constant(1 : index) : !llvm.i64 + %188 = llvm.mul %59, %187 : !llvm.i64 + %189 = llvm.add %186, %188 : !llvm.i64 + %190 = llvm.getelementptr %182[%189] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %191 = llvm.load %190 : !llvm.ptr + %192 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %193 = llvm.mlir.constant(0 : index) : !llvm.i64 + %194 = llvm.mlir.constant(512 : index) : !llvm.i64 + %195 = llvm.mul %59, %194 : !llvm.i64 + %196 = llvm.add %193, %195 : !llvm.i64 + %197 = llvm.mlir.constant(1 : index) : !llvm.i64 + %198 = llvm.mul %181, %197 : !llvm.i64 + %199 = llvm.add %196, %198 : !llvm.i64 + %200 = llvm.getelementptr %192[%199] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %201 = llvm.load %200 : !llvm.ptr + %202 = llvm.fmul %191, %201 {RelaxedPrecision} : !llvm.float + %203 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %204 = llvm.mlir.constant(0 : index) : !llvm.i64 + %205 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %206 = llvm.mul %50, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.mlir.constant(1 : index) : !llvm.i64 + %209 = llvm.mul %181, %208 : !llvm.i64 + %210 = llvm.add %207, %209 : !llvm.i64 + %211 = llvm.getelementptr %203[%210] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %212 = llvm.load %211 : !llvm.ptr + %213 = llvm.fadd %212, %202 {RelaxedPrecision} : !llvm.float + %214 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %215 = llvm.mlir.constant(0 : index) : !llvm.i64 + %216 = llvm.mlir.constant(512 : index) : !llvm.i64 + %217 = llvm.mul %50, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.mlir.constant(1 : index) : !llvm.i64 + %220 = llvm.mul %181, %219 : !llvm.i64 + %221 = llvm.add %218, %220 : !llvm.i64 + %222 = llvm.getelementptr %214[%221] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %213, %222 : !llvm.ptr + %223 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %224 = llvm.mlir.constant(0 : index) : !llvm.i64 + %225 = llvm.mlir.constant(512 : index) : !llvm.i64 + %226 = llvm.mul %50, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.mlir.constant(1 : index) : !llvm.i64 + %229 = llvm.mul %181, %228 : !llvm.i64 + %230 = llvm.add %227, %229 : !llvm.i64 + %231 = llvm.getelementptr %223[%230] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %232 = llvm.load %231 : !llvm.ptr + %233 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %234 = llvm.mlir.constant(0 : index) : !llvm.i64 + %235 = llvm.mlir.constant(512 : index) : !llvm.i64 + %236 = llvm.mul %50, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %239 = llvm.mul %181, %238 : !llvm.i64 + %240 = llvm.add %237, %239 : !llvm.i64 + %241 = llvm.getelementptr %233[%240] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %232, %241 : !llvm.ptr + %242 = llvm.add %58, %35 : !llvm.i64 + %243 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %244 = llvm.mlir.constant(0 : index) : !llvm.i64 + %245 = llvm.mlir.constant(128 : index) : !llvm.i64 + %246 = llvm.mul %50, %245 : !llvm.i64 + %247 = llvm.add %244, %246 : !llvm.i64 + %248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %249 = llvm.mul %59, %248 : !llvm.i64 + %250 = llvm.add %247, %249 : !llvm.i64 + %251 = llvm.getelementptr %243[%250] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %252 = llvm.load %251 : !llvm.ptr + %253 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = llvm.mlir.constant(512 : index) : !llvm.i64 + %256 = llvm.mul %59, %255 : !llvm.i64 + %257 = llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = llvm.mul %242, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %262 = llvm.load %261 : !llvm.ptr + %263 = llvm.fmul %252, %262 {RelaxedPrecision} : !llvm.float + %264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %267 = llvm.mul %50, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %270 = llvm.mul %242, %269 : !llvm.i64 
+ %271 = llvm.add %268, %270 : !llvm.i64 + %272 = llvm.getelementptr %264[%271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %273 = llvm.load %272 : !llvm.ptr + %274 = llvm.fadd %273, %263 {RelaxedPrecision} : !llvm.float + %275 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %276 = llvm.mlir.constant(0 : index) : !llvm.i64 + %277 = llvm.mlir.constant(512 : index) : !llvm.i64 + %278 = llvm.mul %50, %277 : !llvm.i64 + %279 = llvm.add %276, %278 : !llvm.i64 + %280 = llvm.mlir.constant(1 : index) : !llvm.i64 + %281 = llvm.mul %242, %280 : !llvm.i64 + %282 = llvm.add %279, %281 : !llvm.i64 + %283 = llvm.getelementptr %275[%282] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %274, %283 : !llvm.ptr + %284 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %50, %286 : !llvm.i64 + %288 = llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %242, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.load %292 : !llvm.ptr + %294 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %295 = llvm.mlir.constant(0 : index) : !llvm.i64 + %296 = llvm.mlir.constant(512 : index) : !llvm.i64 + %297 = llvm.mul %50, %296 : !llvm.i64 + %298 = llvm.add %295, %297 : !llvm.i64 + %299 = llvm.mlir.constant(1 : index) : !llvm.i64 + %300 = llvm.mul %242, %299 : !llvm.i64 + %301 = llvm.add %298, %300 : !llvm.i64 + %302 = llvm.getelementptr %294[%301] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %293, %302 : !llvm.ptr + %303 = llvm.add %58, %36 : !llvm.i64 + %304 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %305 = llvm.mlir.constant(0 : index) : !llvm.i64 + %306 = llvm.mlir.constant(128 : index) : !llvm.i64 + %307 = llvm.mul %50, %306 : !llvm.i64 + %308 = llvm.add %305, %307 : !llvm.i64 + %309 = llvm.mlir.constant(1 : index) : !llvm.i64 + %310 = llvm.mul %59, %309 : !llvm.i64 + %311 = llvm.add %308, %310 : !llvm.i64 + %312 = llvm.getelementptr %304[%311] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %313 = llvm.load %312 : !llvm.ptr + %314 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %316 = llvm.mlir.constant(512 : index) : !llvm.i64 + %317 = llvm.mul %59, %316 : !llvm.i64 + %318 = llvm.add %315, %317 : !llvm.i64 + %319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %320 = llvm.mul %303, %319 : !llvm.i64 + %321 = llvm.add %318, %320 : !llvm.i64 + %322 = llvm.getelementptr %314[%321] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %323 = llvm.load %322 : !llvm.ptr + %324 = llvm.fmul %313, %323 {RelaxedPrecision} : !llvm.float + %325 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %326 = llvm.mlir.constant(0 : index) : !llvm.i64 + %327 = llvm.mlir.constant(512 : index) : !llvm.i64 + %328 = llvm.mul %50, %327 : !llvm.i64 + %329 = llvm.add %326, %328 : !llvm.i64 + %330 = llvm.mlir.constant(1 : index) : !llvm.i64 + %331 = llvm.mul %303, %330 : !llvm.i64 + %332 = llvm.add %329, %331 : !llvm.i64 + %333 = llvm.getelementptr %325[%332] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %334 = llvm.load %333 : !llvm.ptr + %335 = llvm.fadd %334, %324 {RelaxedPrecision} : !llvm.float 
+ %336 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %337 = llvm.mlir.constant(0 : index) : !llvm.i64 + %338 = llvm.mlir.constant(512 : index) : !llvm.i64 + %339 = llvm.mul %50, %338 : !llvm.i64 + %340 = llvm.add %337, %339 : !llvm.i64 + %341 = llvm.mlir.constant(1 : index) : !llvm.i64 + %342 = llvm.mul %303, %341 : !llvm.i64 + %343 = llvm.add %340, %342 : !llvm.i64 + %344 = llvm.getelementptr %336[%343] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %335, %344 : !llvm.ptr + %345 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %346 = llvm.mlir.constant(0 : index) : !llvm.i64 + %347 = llvm.mlir.constant(512 : index) : !llvm.i64 + %348 = llvm.mul %50, %347 : !llvm.i64 + %349 = llvm.add %346, %348 : !llvm.i64 + %350 = llvm.mlir.constant(1 : index) : !llvm.i64 + %351 = llvm.mul %303, %350 : !llvm.i64 + %352 = llvm.add %349, %351 : !llvm.i64 + %353 = llvm.getelementptr %345[%352] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %354 = llvm.load %353 : !llvm.ptr + %355 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %356 = llvm.mlir.constant(0 : index) : !llvm.i64 + %357 = llvm.mlir.constant(512 : index) : !llvm.i64 + %358 = llvm.mul %50, %357 : !llvm.i64 + %359 = llvm.add %356, %358 : !llvm.i64 + %360 = llvm.mlir.constant(1 : index) : !llvm.i64 + %361 = llvm.mul %303, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.getelementptr %355[%362] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %354, %363 : !llvm.ptr + %364 = llvm.add %58, %37 : !llvm.i64 + %365 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %366 = llvm.mlir.constant(0 : index) : !llvm.i64 + %367 = llvm.mlir.constant(128 : index) : !llvm.i64 + %368 = llvm.mul %50, %367 : !llvm.i64 + %369 = llvm.add %366, %368 : !llvm.i64 + %370 = llvm.mlir.constant(1 : index) : !llvm.i64 + %371 = llvm.mul %59, %370 : !llvm.i64 + %372 = llvm.add %369, %371 : !llvm.i64 + %373 = llvm.getelementptr %365[%372] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %374 = llvm.load %373 : !llvm.ptr + %375 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %376 = llvm.mlir.constant(0 : index) : !llvm.i64 + %377 = llvm.mlir.constant(512 : index) : !llvm.i64 + %378 = llvm.mul %59, %377 : !llvm.i64 + %379 = llvm.add %376, %378 : !llvm.i64 + %380 = llvm.mlir.constant(1 : index) : !llvm.i64 + %381 = llvm.mul %364, %380 : !llvm.i64 + %382 = llvm.add %379, %381 : !llvm.i64 + %383 = llvm.getelementptr %375[%382] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %384 = llvm.load %383 : !llvm.ptr + %385 = llvm.fmul %374, %384 {RelaxedPrecision} : !llvm.float + %386 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %387 = llvm.mlir.constant(0 : index) : !llvm.i64 + %388 = llvm.mlir.constant(512 : index) : !llvm.i64 + %389 = llvm.mul %50, %388 : !llvm.i64 + %390 = llvm.add %387, %389 : !llvm.i64 + %391 = llvm.mlir.constant(1 : index) : !llvm.i64 + %392 = llvm.mul %364, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.getelementptr %386[%393] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %395 = llvm.load %394 : !llvm.ptr + %396 = llvm.fadd %395, %385 {RelaxedPrecision} : !llvm.float + %397 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %398 = llvm.mlir.constant(0 : index) : !llvm.i64 + %399 = llvm.mlir.constant(512 : index) : !llvm.i64 + %400 = 
llvm.mul %50, %399 : !llvm.i64 + %401 = llvm.add %398, %400 : !llvm.i64 + %402 = llvm.mlir.constant(1 : index) : !llvm.i64 + %403 = llvm.mul %364, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.getelementptr %397[%404] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %396, %405 : !llvm.ptr + %406 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %408 = llvm.mlir.constant(512 : index) : !llvm.i64 + %409 = llvm.mul %50, %408 : !llvm.i64 + %410 = llvm.add %407, %409 : !llvm.i64 + %411 = llvm.mlir.constant(1 : index) : !llvm.i64 + %412 = llvm.mul %364, %411 : !llvm.i64 + %413 = llvm.add %410, %412 : !llvm.i64 + %414 = llvm.getelementptr %406[%413] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %415 = llvm.load %414 : !llvm.ptr + %416 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %417 = llvm.mlir.constant(0 : index) : !llvm.i64 + %418 = llvm.mlir.constant(512 : index) : !llvm.i64 + %419 = llvm.mul %50, %418 : !llvm.i64 + %420 = llvm.add %417, %419 : !llvm.i64 + %421 = llvm.mlir.constant(1 : index) : !llvm.i64 + %422 = llvm.mul %364, %421 : !llvm.i64 + %423 = llvm.add %420, %422 : !llvm.i64 + %424 = llvm.getelementptr %416[%423] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %415, %424 : !llvm.ptr + %425 = llvm.add %58, %38 : !llvm.i64 + %426 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %427 = llvm.mlir.constant(0 : index) : !llvm.i64 + %428 = llvm.mlir.constant(128 : index) : !llvm.i64 + %429 = llvm.mul %50, %428 : !llvm.i64 + %430 = llvm.add %427, %429 : !llvm.i64 + %431 = llvm.mlir.constant(1 : index) : !llvm.i64 + %432 = llvm.mul %59, %431 : !llvm.i64 + %433 = llvm.add %430, %432 : !llvm.i64 + %434 = llvm.getelementptr %426[%433] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %435 = llvm.load %434 : !llvm.ptr + %436 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %437 = llvm.mlir.constant(0 : index) : !llvm.i64 + %438 = llvm.mlir.constant(512 : index) : !llvm.i64 + %439 = llvm.mul %59, %438 : !llvm.i64 + %440 = llvm.add %437, %439 : !llvm.i64 + %441 = llvm.mlir.constant(1 : index) : !llvm.i64 + %442 = llvm.mul %425, %441 : !llvm.i64 + %443 = llvm.add %440, %442 : !llvm.i64 + %444 = llvm.getelementptr %436[%443] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %445 = llvm.load %444 : !llvm.ptr + %446 = llvm.fmul %435, %445 {RelaxedPrecision} : !llvm.float + %447 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %449 = llvm.mlir.constant(512 : index) : !llvm.i64 + %450 = llvm.mul %50, %449 : !llvm.i64 + %451 = llvm.add %448, %450 : !llvm.i64 + %452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %453 = llvm.mul %425, %452 : !llvm.i64 + %454 = llvm.add %451, %453 : !llvm.i64 + %455 = llvm.getelementptr %447[%454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %456 = llvm.load %455 : !llvm.ptr + %457 = llvm.fadd %456, %446 {RelaxedPrecision} : !llvm.float + %458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %461 = llvm.mul %50, %460 : !llvm.i64 + %462 = llvm.add %459, %461 : !llvm.i64 + %463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %464 = llvm.mul %425, %463 : !llvm.i64 + %465 = llvm.add %462, %464 : !llvm.i64 + %466 = 
llvm.getelementptr %458[%465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %457, %466 : !llvm.ptr + %467 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %468 = llvm.mlir.constant(0 : index) : !llvm.i64 + %469 = llvm.mlir.constant(512 : index) : !llvm.i64 + %470 = llvm.mul %50, %469 : !llvm.i64 + %471 = llvm.add %468, %470 : !llvm.i64 + %472 = llvm.mlir.constant(1 : index) : !llvm.i64 + %473 = llvm.mul %425, %472 : !llvm.i64 + %474 = llvm.add %471, %473 : !llvm.i64 + %475 = llvm.getelementptr %467[%474] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %476 = llvm.load %475 : !llvm.ptr + %477 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %480 = llvm.mul %50, %479 : !llvm.i64 + %481 = llvm.add %478, %480 : !llvm.i64 + %482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %483 = llvm.mul %425, %482 : !llvm.i64 + %484 = llvm.add %481, %483 : !llvm.i64 + %485 = llvm.getelementptr %477[%484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %476, %485 : !llvm.ptr + %486 = llvm.add %58, %39 : !llvm.i64 + %487 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %489 = llvm.mlir.constant(128 : index) : !llvm.i64 + %490 = llvm.mul %50, %489 : !llvm.i64 + %491 = llvm.add %488, %490 : !llvm.i64 + %492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %493 = llvm.mul %59, %492 : !llvm.i64 + %494 = llvm.add %491, %493 : !llvm.i64 + %495 = llvm.getelementptr %487[%494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %496 = llvm.load %495 : !llvm.ptr + %497 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %500 = llvm.mul %59, %499 : !llvm.i64 + %501 = llvm.add %498, %500 : !llvm.i64 + %502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %503 = llvm.mul %486, %502 : !llvm.i64 + %504 = llvm.add %501, %503 : !llvm.i64 + %505 = llvm.getelementptr %497[%504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %506 = llvm.load %505 : !llvm.ptr + %507 = llvm.fmul %496, %506 {RelaxedPrecision} : !llvm.float + %508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %511 = llvm.mul %50, %510 : !llvm.i64 + %512 = llvm.add %509, %511 : !llvm.i64 + %513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %514 = llvm.mul %486, %513 : !llvm.i64 + %515 = llvm.add %512, %514 : !llvm.i64 + %516 = llvm.getelementptr %508[%515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %517 = llvm.load %516 : !llvm.ptr + %518 = llvm.fadd %517, %507 {RelaxedPrecision} : !llvm.float + %519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %522 = llvm.mul %50, %521 : !llvm.i64 + %523 = llvm.add %520, %522 : !llvm.i64 + %524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %525 = llvm.mul %486, %524 : !llvm.i64 + %526 = llvm.add %523, %525 : !llvm.i64 + %527 = llvm.getelementptr %519[%526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %518, %527 : !llvm.ptr + %528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %529 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %531 = llvm.mul %50, %530 : !llvm.i64 + %532 = llvm.add %529, %531 : !llvm.i64 + %533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %534 = llvm.mul %486, %533 : !llvm.i64 + %535 = llvm.add %532, %534 : !llvm.i64 + %536 = llvm.getelementptr %528[%535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %537 = llvm.load %536 : !llvm.ptr + %538 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %539 = llvm.mlir.constant(0 : index) : !llvm.i64 + %540 = llvm.mlir.constant(512 : index) : !llvm.i64 + %541 = llvm.mul %50, %540 : !llvm.i64 + %542 = llvm.add %539, %541 : !llvm.i64 + %543 = llvm.mlir.constant(1 : index) : !llvm.i64 + %544 = llvm.mul %486, %543 : !llvm.i64 + %545 = llvm.add %542, %544 : !llvm.i64 + %546 = llvm.getelementptr %538[%545] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %537, %546 : !llvm.ptr + %547 = llvm.add %58, %40 : !llvm.i64 + %548 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %550 = llvm.mlir.constant(128 : index) : !llvm.i64 + %551 = llvm.mul %50, %550 : !llvm.i64 + %552 = llvm.add %549, %551 : !llvm.i64 + %553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %554 = llvm.mul %59, %553 : !llvm.i64 + %555 = llvm.add %552, %554 : !llvm.i64 + %556 = llvm.getelementptr %548[%555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %557 = llvm.load %556 : !llvm.ptr + %558 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %561 = llvm.mul %59, %560 : !llvm.i64 + %562 = llvm.add %559, %561 : !llvm.i64 + %563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %564 = llvm.mul %547, %563 : !llvm.i64 + %565 = llvm.add %562, %564 : !llvm.i64 + %566 = llvm.getelementptr %558[%565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %567 = llvm.load %566 : !llvm.ptr + %568 = llvm.fmul %557, %567 {RelaxedPrecision} : !llvm.float + %569 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %570 = llvm.mlir.constant(0 : index) : !llvm.i64 + %571 = llvm.mlir.constant(512 : index) : !llvm.i64 + %572 = llvm.mul %50, %571 : !llvm.i64 + %573 = llvm.add %570, %572 : !llvm.i64 + %574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %575 = llvm.mul %547, %574 : !llvm.i64 + %576 = llvm.add %573, %575 : !llvm.i64 + %577 = llvm.getelementptr %569[%576] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %578 = llvm.load %577 : !llvm.ptr + %579 = llvm.fadd %578, %568 {RelaxedPrecision} : !llvm.float + %580 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %581 = llvm.mlir.constant(0 : index) : !llvm.i64 + %582 = llvm.mlir.constant(512 : index) : !llvm.i64 + %583 = llvm.mul %50, %582 : !llvm.i64 + %584 = llvm.add %581, %583 : !llvm.i64 + %585 = llvm.mlir.constant(1 : index) : !llvm.i64 + %586 = llvm.mul %547, %585 : !llvm.i64 + %587 = llvm.add %584, %586 : !llvm.i64 + %588 = llvm.getelementptr %580[%587] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %579, %588 : !llvm.ptr + %589 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %591 = llvm.mlir.constant(512 : index) : !llvm.i64 + %592 = llvm.mul %50, %591 : !llvm.i64 + %593 = llvm.add %590, %592 : !llvm.i64 + %594 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %595 = llvm.mul %547, %594 : !llvm.i64 + %596 = llvm.add %593, %595 : !llvm.i64 + %597 = llvm.getelementptr %589[%596] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %598 = llvm.load %597 : !llvm.ptr + %599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %602 = llvm.mul %50, %601 : !llvm.i64 + %603 = llvm.add %600, %602 : !llvm.i64 + %604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %605 = llvm.mul %547, %604 : !llvm.i64 + %606 = llvm.add %603, %605 : !llvm.i64 + %607 = llvm.getelementptr %599[%606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %598, %607 : !llvm.ptr + %608 = llvm.add %58, %41 : !llvm.i64 + %609 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %610 = llvm.mlir.constant(0 : index) : !llvm.i64 + %611 = llvm.mlir.constant(128 : index) : !llvm.i64 + %612 = llvm.mul %50, %611 : !llvm.i64 + %613 = llvm.add %610, %612 : !llvm.i64 + %614 = llvm.mlir.constant(1 : index) : !llvm.i64 + %615 = llvm.mul %59, %614 : !llvm.i64 + %616 = llvm.add %613, %615 : !llvm.i64 + %617 = llvm.getelementptr %609[%616] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %618 = llvm.load %617 : !llvm.ptr + %619 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %620 = llvm.mlir.constant(0 : index) : !llvm.i64 + %621 = llvm.mlir.constant(512 : index) : !llvm.i64 + %622 = llvm.mul %59, %621 : !llvm.i64 + %623 = llvm.add %620, %622 : !llvm.i64 + %624 = llvm.mlir.constant(1 : index) : !llvm.i64 + %625 = llvm.mul %608, %624 : !llvm.i64 + %626 = llvm.add %623, %625 : !llvm.i64 + %627 = llvm.getelementptr %619[%626] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %628 = llvm.load %627 : !llvm.ptr + %629 = llvm.fmul %618, %628 {RelaxedPrecision} : !llvm.float + %630 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %632 = llvm.mlir.constant(512 : index) : !llvm.i64 + %633 = llvm.mul %50, %632 : !llvm.i64 + %634 = llvm.add %631, %633 : !llvm.i64 + %635 = llvm.mlir.constant(1 : index) : !llvm.i64 + %636 = llvm.mul %608, %635 : !llvm.i64 + %637 = llvm.add %634, %636 : !llvm.i64 + %638 = llvm.getelementptr %630[%637] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %639 = llvm.load %638 : !llvm.ptr + %640 = llvm.fadd %639, %629 {RelaxedPrecision} : !llvm.float + %641 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %642 = llvm.mlir.constant(0 : index) : !llvm.i64 + %643 = llvm.mlir.constant(512 : index) : !llvm.i64 + %644 = llvm.mul %50, %643 : !llvm.i64 + %645 = llvm.add %642, %644 : !llvm.i64 + %646 = llvm.mlir.constant(1 : index) : !llvm.i64 + %647 = llvm.mul %608, %646 : !llvm.i64 + %648 = llvm.add %645, %647 : !llvm.i64 + %649 = llvm.getelementptr %641[%648] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %640, %649 : !llvm.ptr + %650 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %651 = llvm.mlir.constant(0 : index) : !llvm.i64 + %652 = llvm.mlir.constant(512 : index) : !llvm.i64 + %653 = llvm.mul %50, %652 : !llvm.i64 + %654 = llvm.add %651, %653 : !llvm.i64 + %655 = llvm.mlir.constant(1 : index) : !llvm.i64 + %656 = llvm.mul %608, %655 : !llvm.i64 + %657 = llvm.add %654, %656 : !llvm.i64 + %658 = llvm.getelementptr %650[%657] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %659 = llvm.load %658 : !llvm.ptr + %660 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %662 = llvm.mlir.constant(512 : index) : !llvm.i64 + %663 = llvm.mul %50, %662 : !llvm.i64 + %664 = llvm.add %661, %663 : !llvm.i64 + %665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %666 = llvm.mul %608, %665 : !llvm.i64 + %667 = llvm.add %664, %666 : !llvm.i64 + %668 = llvm.getelementptr %660[%667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %659, %668 : !llvm.ptr + %669 = llvm.add %58, %42 : !llvm.i64 + %670 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %672 = llvm.mlir.constant(128 : index) : !llvm.i64 + %673 = llvm.mul %50, %672 : !llvm.i64 + %674 = llvm.add %671, %673 : !llvm.i64 + %675 = llvm.mlir.constant(1 : index) : !llvm.i64 + %676 = llvm.mul %59, %675 : !llvm.i64 + %677 = llvm.add %674, %676 : !llvm.i64 + %678 = llvm.getelementptr %670[%677] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %679 = llvm.load %678 : !llvm.ptr + %680 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %681 = llvm.mlir.constant(0 : index) : !llvm.i64 + %682 = llvm.mlir.constant(512 : index) : !llvm.i64 + %683 = llvm.mul %59, %682 : !llvm.i64 + %684 = llvm.add %681, %683 : !llvm.i64 + %685 = llvm.mlir.constant(1 : index) : !llvm.i64 + %686 = llvm.mul %669, %685 : !llvm.i64 + %687 = llvm.add %684, %686 : !llvm.i64 + %688 = llvm.getelementptr %680[%687] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %689 = llvm.load %688 : !llvm.ptr + %690 = llvm.fmul %679, %689 {RelaxedPrecision} : !llvm.float + %691 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %692 = llvm.mlir.constant(0 : index) : !llvm.i64 + %693 = llvm.mlir.constant(512 : index) : !llvm.i64 + %694 = llvm.mul %50, %693 : !llvm.i64 + %695 = llvm.add %692, %694 : !llvm.i64 + %696 = llvm.mlir.constant(1 : index) : !llvm.i64 + %697 = llvm.mul %669, %696 : !llvm.i64 + %698 = llvm.add %695, %697 : !llvm.i64 + %699 = llvm.getelementptr %691[%698] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %700 = llvm.load %699 : !llvm.ptr + %701 = llvm.fadd %700, %690 {RelaxedPrecision} : !llvm.float + %702 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %703 = llvm.mlir.constant(0 : index) : !llvm.i64 + %704 = llvm.mlir.constant(512 : index) : !llvm.i64 + %705 = llvm.mul %50, %704 : !llvm.i64 + %706 = llvm.add %703, %705 : !llvm.i64 + %707 = llvm.mlir.constant(1 : index) : !llvm.i64 + %708 = llvm.mul %669, %707 : !llvm.i64 + %709 = llvm.add %706, %708 : !llvm.i64 + %710 = llvm.getelementptr %702[%709] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %701, %710 : !llvm.ptr + %711 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %712 = llvm.mlir.constant(0 : index) : !llvm.i64 + %713 = llvm.mlir.constant(512 : index) : !llvm.i64 + %714 = llvm.mul %50, %713 : !llvm.i64 + %715 = llvm.add %712, %714 : !llvm.i64 + %716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %717 = llvm.mul %669, %716 : !llvm.i64 + %718 = llvm.add %715, %717 : !llvm.i64 + %719 = llvm.getelementptr %711[%718] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %720 = llvm.load %719 : !llvm.ptr + %721 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %722 = llvm.mlir.constant(0 : index) : !llvm.i64 + %723 = llvm.mlir.constant(512 : index) : !llvm.i64 + %724 = llvm.mul %50, 
%723 : !llvm.i64 + %725 = llvm.add %722, %724 : !llvm.i64 + %726 = llvm.mlir.constant(1 : index) : !llvm.i64 + %727 = llvm.mul %669, %726 : !llvm.i64 + %728 = llvm.add %725, %727 : !llvm.i64 + %729 = llvm.getelementptr %721[%728] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %720, %729 : !llvm.ptr + %730 = llvm.add %58, %43 : !llvm.i64 + %731 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %732 = llvm.mlir.constant(0 : index) : !llvm.i64 + %733 = llvm.mlir.constant(128 : index) : !llvm.i64 + %734 = llvm.mul %50, %733 : !llvm.i64 + %735 = llvm.add %732, %734 : !llvm.i64 + %736 = llvm.mlir.constant(1 : index) : !llvm.i64 + %737 = llvm.mul %59, %736 : !llvm.i64 + %738 = llvm.add %735, %737 : !llvm.i64 + %739 = llvm.getelementptr %731[%738] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %740 = llvm.load %739 : !llvm.ptr + %741 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %742 = llvm.mlir.constant(0 : index) : !llvm.i64 + %743 = llvm.mlir.constant(512 : index) : !llvm.i64 + %744 = llvm.mul %59, %743 : !llvm.i64 + %745 = llvm.add %742, %744 : !llvm.i64 + %746 = llvm.mlir.constant(1 : index) : !llvm.i64 + %747 = llvm.mul %730, %746 : !llvm.i64 + %748 = llvm.add %745, %747 : !llvm.i64 + %749 = llvm.getelementptr %741[%748] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %750 = llvm.load %749 : !llvm.ptr + %751 = llvm.fmul %740, %750 {RelaxedPrecision} : !llvm.float + %752 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %754 = llvm.mlir.constant(512 : index) : !llvm.i64 + %755 = llvm.mul %50, %754 : !llvm.i64 + %756 = llvm.add %753, %755 : !llvm.i64 + %757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %758 = llvm.mul %730, %757 : !llvm.i64 + %759 = llvm.add %756, %758 : !llvm.i64 + %760 = llvm.getelementptr %752[%759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %761 = llvm.load %760 : !llvm.ptr + %762 = llvm.fadd %761, %751 {RelaxedPrecision} : !llvm.float + %763 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %764 = llvm.mlir.constant(0 : index) : !llvm.i64 + %765 = llvm.mlir.constant(512 : index) : !llvm.i64 + %766 = llvm.mul %50, %765 : !llvm.i64 + %767 = llvm.add %764, %766 : !llvm.i64 + %768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %769 = llvm.mul %730, %768 : !llvm.i64 + %770 = llvm.add %767, %769 : !llvm.i64 + %771 = llvm.getelementptr %763[%770] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %762, %771 : !llvm.ptr + %772 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %773 = llvm.mlir.constant(0 : index) : !llvm.i64 + %774 = llvm.mlir.constant(512 : index) : !llvm.i64 + %775 = llvm.mul %50, %774 : !llvm.i64 + %776 = llvm.add %773, %775 : !llvm.i64 + %777 = llvm.mlir.constant(1 : index) : !llvm.i64 + %778 = llvm.mul %730, %777 : !llvm.i64 + %779 = llvm.add %776, %778 : !llvm.i64 + %780 = llvm.getelementptr %772[%779] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %781 = llvm.load %780 : !llvm.ptr + %782 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %784 = llvm.mlir.constant(512 : index) : !llvm.i64 + %785 = llvm.mul %50, %784 : !llvm.i64 + %786 = llvm.add %783, %785 : !llvm.i64 + %787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %788 = llvm.mul %730, %787 : !llvm.i64 + %789 = llvm.add %786, %788 : !llvm.i64 + %790 = 
llvm.getelementptr %782[%789] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %781, %790 : !llvm.ptr + %791 = llvm.add %58, %44 : !llvm.i64 + %792 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %794 = llvm.mlir.constant(128 : index) : !llvm.i64 + %795 = llvm.mul %50, %794 : !llvm.i64 + %796 = llvm.add %793, %795 : !llvm.i64 + %797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %798 = llvm.mul %59, %797 : !llvm.i64 + %799 = llvm.add %796, %798 : !llvm.i64 + %800 = llvm.getelementptr %792[%799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %801 = llvm.load %800 : !llvm.ptr + %802 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %803 = llvm.mlir.constant(0 : index) : !llvm.i64 + %804 = llvm.mlir.constant(512 : index) : !llvm.i64 + %805 = llvm.mul %59, %804 : !llvm.i64 + %806 = llvm.add %803, %805 : !llvm.i64 + %807 = llvm.mlir.constant(1 : index) : !llvm.i64 + %808 = llvm.mul %791, %807 : !llvm.i64 + %809 = llvm.add %806, %808 : !llvm.i64 + %810 = llvm.getelementptr %802[%809] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %811 = llvm.load %810 : !llvm.ptr + %812 = llvm.fmul %801, %811 {RelaxedPrecision} : !llvm.float + %813 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %814 = llvm.mlir.constant(0 : index) : !llvm.i64 + %815 = llvm.mlir.constant(512 : index) : !llvm.i64 + %816 = llvm.mul %50, %815 : !llvm.i64 + %817 = llvm.add %814, %816 : !llvm.i64 + %818 = llvm.mlir.constant(1 : index) : !llvm.i64 + %819 = llvm.mul %791, %818 : !llvm.i64 + %820 = llvm.add %817, %819 : !llvm.i64 + %821 = llvm.getelementptr %813[%820] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %822 = llvm.load %821 : !llvm.ptr + %823 = llvm.fadd %822, %812 {RelaxedPrecision} : !llvm.float + %824 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %826 = llvm.mlir.constant(512 : index) : !llvm.i64 + %827 = llvm.mul %50, %826 : !llvm.i64 + %828 = llvm.add %825, %827 : !llvm.i64 + %829 = llvm.mlir.constant(1 : index) : !llvm.i64 + %830 = llvm.mul %791, %829 : !llvm.i64 + %831 = llvm.add %828, %830 : !llvm.i64 + %832 = llvm.getelementptr %824[%831] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %823, %832 : !llvm.ptr + %833 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %834 = llvm.mlir.constant(0 : index) : !llvm.i64 + %835 = llvm.mlir.constant(512 : index) : !llvm.i64 + %836 = llvm.mul %50, %835 : !llvm.i64 + %837 = llvm.add %834, %836 : !llvm.i64 + %838 = llvm.mlir.constant(1 : index) : !llvm.i64 + %839 = llvm.mul %791, %838 : !llvm.i64 + %840 = llvm.add %837, %839 : !llvm.i64 + %841 = llvm.getelementptr %833[%840] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %842 = llvm.load %841 : !llvm.ptr + %843 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %844 = llvm.mlir.constant(0 : index) : !llvm.i64 + %845 = llvm.mlir.constant(512 : index) : !llvm.i64 + %846 = llvm.mul %50, %845 : !llvm.i64 + %847 = llvm.add %844, %846 : !llvm.i64 + %848 = llvm.mlir.constant(1 : index) : !llvm.i64 + %849 = llvm.mul %791, %848 : !llvm.i64 + %850 = llvm.add %847, %849 : !llvm.i64 + %851 = llvm.getelementptr %843[%850] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %842, %851 : !llvm.ptr + %852 = llvm.add %58, %45 : !llvm.i64 + %853 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %854 = llvm.mlir.constant(0 : index) : !llvm.i64 + %855 = llvm.mlir.constant(128 : index) : !llvm.i64 + %856 = llvm.mul %50, %855 : !llvm.i64 + %857 = llvm.add %854, %856 : !llvm.i64 + %858 = llvm.mlir.constant(1 : index) : !llvm.i64 + %859 = llvm.mul %59, %858 : !llvm.i64 + %860 = llvm.add %857, %859 : !llvm.i64 + %861 = llvm.getelementptr %853[%860] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %862 = llvm.load %861 : !llvm.ptr + %863 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %864 = llvm.mlir.constant(0 : index) : !llvm.i64 + %865 = llvm.mlir.constant(512 : index) : !llvm.i64 + %866 = llvm.mul %59, %865 : !llvm.i64 + %867 = llvm.add %864, %866 : !llvm.i64 + %868 = llvm.mlir.constant(1 : index) : !llvm.i64 + %869 = llvm.mul %852, %868 : !llvm.i64 + %870 = llvm.add %867, %869 : !llvm.i64 + %871 = llvm.getelementptr %863[%870] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %872 = llvm.load %871 : !llvm.ptr + %873 = llvm.fmul %862, %872 {RelaxedPrecision} : !llvm.float + %874 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %875 = llvm.mlir.constant(0 : index) : !llvm.i64 + %876 = llvm.mlir.constant(512 : index) : !llvm.i64 + %877 = llvm.mul %50, %876 : !llvm.i64 + %878 = llvm.add %875, %877 : !llvm.i64 + %879 = llvm.mlir.constant(1 : index) : !llvm.i64 + %880 = llvm.mul %852, %879 : !llvm.i64 + %881 = llvm.add %878, %880 : !llvm.i64 + %882 = llvm.getelementptr %874[%881] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %883 = llvm.load %882 : !llvm.ptr + %884 = llvm.fadd %883, %873 {RelaxedPrecision} : !llvm.float + %885 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %886 = llvm.mlir.constant(0 : index) : !llvm.i64 + %887 = llvm.mlir.constant(512 : index) : !llvm.i64 + %888 = llvm.mul %50, %887 : !llvm.i64 + %889 = llvm.add %886, %888 : !llvm.i64 + %890 = llvm.mlir.constant(1 : index) : !llvm.i64 + %891 = llvm.mul %852, %890 : !llvm.i64 + %892 = llvm.add %889, %891 : !llvm.i64 + %893 = llvm.getelementptr %885[%892] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %884, %893 : !llvm.ptr + %894 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %895 = llvm.mlir.constant(0 : index) : !llvm.i64 + %896 = llvm.mlir.constant(512 : index) : !llvm.i64 + %897 = llvm.mul %50, %896 : !llvm.i64 + %898 = llvm.add %895, %897 : !llvm.i64 + %899 = llvm.mlir.constant(1 : index) : !llvm.i64 + %900 = llvm.mul %852, %899 : !llvm.i64 + %901 = llvm.add %898, %900 : !llvm.i64 + %902 = llvm.getelementptr %894[%901] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %903 = llvm.load %902 : !llvm.ptr + %904 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %905 = llvm.mlir.constant(0 : index) : !llvm.i64 + %906 = llvm.mlir.constant(512 : index) : !llvm.i64 + %907 = llvm.mul %50, %906 : !llvm.i64 + %908 = llvm.add %905, %907 : !llvm.i64 + %909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %910 = llvm.mul %852, %909 : !llvm.i64 + %911 = llvm.add %908, %910 : !llvm.i64 + %912 = llvm.getelementptr %904[%911] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %903, %912 : !llvm.ptr + %913 = llvm.add %58, %46 : !llvm.i64 + %914 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %915 = llvm.mlir.constant(0 : index) : !llvm.i64 + %916 = llvm.mlir.constant(128 : index) : !llvm.i64 + %917 = llvm.mul %50, %916 : !llvm.i64 + %918 = llvm.add %915, %917 : !llvm.i64 + %919 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %920 = llvm.mul %59, %919 : !llvm.i64 + %921 = llvm.add %918, %920 : !llvm.i64 + %922 = llvm.getelementptr %914[%921] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %923 = llvm.load %922 : !llvm.ptr + %924 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %925 = llvm.mlir.constant(0 : index) : !llvm.i64 + %926 = llvm.mlir.constant(512 : index) : !llvm.i64 + %927 = llvm.mul %59, %926 : !llvm.i64 + %928 = llvm.add %925, %927 : !llvm.i64 + %929 = llvm.mlir.constant(1 : index) : !llvm.i64 + %930 = llvm.mul %913, %929 : !llvm.i64 + %931 = llvm.add %928, %930 : !llvm.i64 + %932 = llvm.getelementptr %924[%931] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %933 = llvm.load %932 : !llvm.ptr + %934 = llvm.fmul %923, %933 {RelaxedPrecision} : !llvm.float + %935 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %936 = llvm.mlir.constant(0 : index) : !llvm.i64 + %937 = llvm.mlir.constant(512 : index) : !llvm.i64 + %938 = llvm.mul %50, %937 : !llvm.i64 + %939 = llvm.add %936, %938 : !llvm.i64 + %940 = llvm.mlir.constant(1 : index) : !llvm.i64 + %941 = llvm.mul %913, %940 : !llvm.i64 + %942 = llvm.add %939, %941 : !llvm.i64 + %943 = llvm.getelementptr %935[%942] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %944 = llvm.load %943 : !llvm.ptr + %945 = llvm.fadd %944, %934 {RelaxedPrecision} : !llvm.float + %946 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %948 = llvm.mlir.constant(512 : index) : !llvm.i64 + %949 = llvm.mul %50, %948 : !llvm.i64 + %950 = llvm.add %947, %949 : !llvm.i64 + %951 = llvm.mlir.constant(1 : index) : !llvm.i64 + %952 = llvm.mul %913, %951 : !llvm.i64 + %953 = llvm.add %950, %952 : !llvm.i64 + %954 = llvm.getelementptr %946[%953] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %945, %954 : !llvm.ptr + %955 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %956 = llvm.mlir.constant(0 : index) : !llvm.i64 + %957 = llvm.mlir.constant(512 : index) : !llvm.i64 + %958 = llvm.mul %50, %957 : !llvm.i64 + %959 = llvm.add %956, %958 : !llvm.i64 + %960 = llvm.mlir.constant(1 : index) : !llvm.i64 + %961 = llvm.mul %913, %960 : !llvm.i64 + %962 = llvm.add %959, %961 : !llvm.i64 + %963 = llvm.getelementptr %955[%962] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %964 = llvm.load %963 : !llvm.ptr + %965 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %966 = llvm.mlir.constant(0 : index) : !llvm.i64 + %967 = llvm.mlir.constant(512 : index) : !llvm.i64 + %968 = llvm.mul %50, %967 : !llvm.i64 + %969 = llvm.add %966, %968 : !llvm.i64 + %970 = llvm.mlir.constant(1 : index) : !llvm.i64 + %971 = llvm.mul %913, %970 : !llvm.i64 + %972 = llvm.add %969, %971 : !llvm.i64 + %973 = llvm.getelementptr %965[%972] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %964, %973 : !llvm.ptr + %974 = llvm.add %58, %47 : !llvm.i64 + %975 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %976 = llvm.mlir.constant(0 : index) : !llvm.i64 + %977 = llvm.mlir.constant(128 : index) : !llvm.i64 + %978 = llvm.mul %50, %977 : !llvm.i64 + %979 = llvm.add %976, %978 : !llvm.i64 + %980 = llvm.mlir.constant(1 : index) : !llvm.i64 + %981 = llvm.mul %59, %980 : !llvm.i64 + %982 = llvm.add %979, %981 : !llvm.i64 + %983 = llvm.getelementptr %975[%982] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %984 = 
llvm.load %983 : !llvm.ptr + %985 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %986 = llvm.mlir.constant(0 : index) : !llvm.i64 + %987 = llvm.mlir.constant(512 : index) : !llvm.i64 + %988 = llvm.mul %59, %987 : !llvm.i64 + %989 = llvm.add %986, %988 : !llvm.i64 + %990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %991 = llvm.mul %974, %990 : !llvm.i64 + %992 = llvm.add %989, %991 : !llvm.i64 + %993 = llvm.getelementptr %985[%992] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %994 = llvm.load %993 : !llvm.ptr + %995 = llvm.fmul %984, %994 {RelaxedPrecision} : !llvm.float + %996 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %997 = llvm.mlir.constant(0 : index) : !llvm.i64 + %998 = llvm.mlir.constant(512 : index) : !llvm.i64 + %999 = llvm.mul %50, %998 : !llvm.i64 + %1000 = llvm.add %997, %999 : !llvm.i64 + %1001 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1002 = llvm.mul %974, %1001 : !llvm.i64 + %1003 = llvm.add %1000, %1002 : !llvm.i64 + %1004 = llvm.getelementptr %996[%1003] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1005 = llvm.load %1004 : !llvm.ptr + %1006 = llvm.fadd %1005, %995 {RelaxedPrecision} : !llvm.float + %1007 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1009 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1010 = llvm.mul %50, %1009 : !llvm.i64 + %1011 = llvm.add %1008, %1010 : !llvm.i64 + %1012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1013 = llvm.mul %974, %1012 : !llvm.i64 + %1014 = llvm.add %1011, %1013 : !llvm.i64 + %1015 = llvm.getelementptr %1007[%1014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1006, %1015 : !llvm.ptr + %1016 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1017 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1018 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1019 = llvm.mul %50, %1018 : !llvm.i64 + %1020 = llvm.add %1017, %1019 : !llvm.i64 + %1021 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1022 = llvm.mul %974, %1021 : !llvm.i64 + %1023 = llvm.add %1020, %1022 : !llvm.i64 + %1024 = llvm.getelementptr %1016[%1023] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1025 = llvm.load %1024 : !llvm.ptr + %1026 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1027 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1028 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1029 = llvm.mul %50, %1028 : !llvm.i64 + %1030 = llvm.add %1027, %1029 : !llvm.i64 + %1031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1032 = llvm.mul %974, %1031 : !llvm.i64 + %1033 = llvm.add %1030, %1032 : !llvm.i64 + %1034 = llvm.getelementptr %1026[%1033] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1025, %1034 : !llvm.ptr + %1035 = llvm.add %50, %33 : !llvm.i64 + %1036 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1037 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1038 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1039 = llvm.mul %1035, %1038 : !llvm.i64 + %1040 = llvm.add %1037, %1039 : !llvm.i64 + %1041 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1042 = llvm.mul %59, %1041 : !llvm.i64 + %1043 = llvm.add %1040, %1042 : !llvm.i64 + %1044 = llvm.getelementptr %1036[%1043] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1045 = llvm.load %1044 : !llvm.ptr + %1046 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> 
+ %1047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1048 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1049 = llvm.mul %59, %1048 : !llvm.i64 + %1050 = llvm.add %1047, %1049 : !llvm.i64 + %1051 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1052 = llvm.mul %58, %1051 : !llvm.i64 + %1053 = llvm.add %1050, %1052 : !llvm.i64 + %1054 = llvm.getelementptr %1046[%1053] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1055 = llvm.load %1054 : !llvm.ptr + %1056 = llvm.fmul %1045, %1055 {RelaxedPrecision} : !llvm.float + %1057 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1059 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1060 = llvm.mul %1035, %1059 : !llvm.i64 + %1061 = llvm.add %1058, %1060 : !llvm.i64 + %1062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1063 = llvm.mul %58, %1062 : !llvm.i64 + %1064 = llvm.add %1061, %1063 : !llvm.i64 + %1065 = llvm.getelementptr %1057[%1064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1066 = llvm.load %1065 : !llvm.ptr + %1067 = llvm.fadd %1066, %1056 {RelaxedPrecision} : !llvm.float + %1068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1071 = llvm.mul %1035, %1070 : !llvm.i64 + %1072 = llvm.add %1069, %1071 : !llvm.i64 + %1073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1074 = llvm.mul %58, %1073 : !llvm.i64 + %1075 = llvm.add %1072, %1074 : !llvm.i64 + %1076 = llvm.getelementptr %1068[%1075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1067, %1076 : !llvm.ptr + %1077 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1078 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1079 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1080 = llvm.mul %1035, %1079 : !llvm.i64 + %1081 = llvm.add %1078, %1080 : !llvm.i64 + %1082 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1083 = llvm.mul %58, %1082 : !llvm.i64 + %1084 = llvm.add %1081, %1083 : !llvm.i64 + %1085 = llvm.getelementptr %1077[%1084] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1086 = llvm.load %1085 : !llvm.ptr + %1087 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1088 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1089 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1090 = llvm.mul %1035, %1089 : !llvm.i64 + %1091 = llvm.add %1088, %1090 : !llvm.i64 + %1092 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1093 = llvm.mul %58, %1092 : !llvm.i64 + %1094 = llvm.add %1091, %1093 : !llvm.i64 + %1095 = llvm.getelementptr %1087[%1094] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1086, %1095 : !llvm.ptr + %1096 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1098 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1099 = llvm.mul %1035, %1098 : !llvm.i64 + %1100 = llvm.add %1097, %1099 : !llvm.i64 + %1101 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1102 = llvm.mul %59, %1101 : !llvm.i64 + %1103 = llvm.add %1100, %1102 : !llvm.i64 + %1104 = llvm.getelementptr %1096[%1103] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1105 = llvm.load %1104 : !llvm.ptr + %1106 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1108 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1109 = llvm.mul %59, 
%1108 : !llvm.i64 + %1110 = llvm.add %1107, %1109 : !llvm.i64 + %1111 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1112 = llvm.mul %120, %1111 : !llvm.i64 + %1113 = llvm.add %1110, %1112 : !llvm.i64 + %1114 = llvm.getelementptr %1106[%1113] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1115 = llvm.load %1114 : !llvm.ptr + %1116 = llvm.fmul %1105, %1115 {RelaxedPrecision} : !llvm.float + %1117 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1120 = llvm.mul %1035, %1119 : !llvm.i64 + %1121 = llvm.add %1118, %1120 : !llvm.i64 + %1122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1123 = llvm.mul %120, %1122 : !llvm.i64 + %1124 = llvm.add %1121, %1123 : !llvm.i64 + %1125 = llvm.getelementptr %1117[%1124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1126 = llvm.load %1125 : !llvm.ptr + %1127 = llvm.fadd %1126, %1116 {RelaxedPrecision} : !llvm.float + %1128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1131 = llvm.mul %1035, %1130 : !llvm.i64 + %1132 = llvm.add %1129, %1131 : !llvm.i64 + %1133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1134 = llvm.mul %120, %1133 : !llvm.i64 + %1135 = llvm.add %1132, %1134 : !llvm.i64 + %1136 = llvm.getelementptr %1128[%1135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1127, %1136 : !llvm.ptr + %1137 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1140 = llvm.mul %1035, %1139 : !llvm.i64 + %1141 = llvm.add %1138, %1140 : !llvm.i64 + %1142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1143 = llvm.mul %120, %1142 : !llvm.i64 + %1144 = llvm.add %1141, %1143 : !llvm.i64 + %1145 = llvm.getelementptr %1137[%1144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1146 = llvm.load %1145 : !llvm.ptr + %1147 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1149 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1150 = llvm.mul %1035, %1149 : !llvm.i64 + %1151 = llvm.add %1148, %1150 : !llvm.i64 + %1152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1153 = llvm.mul %120, %1152 : !llvm.i64 + %1154 = llvm.add %1151, %1153 : !llvm.i64 + %1155 = llvm.getelementptr %1147[%1154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1146, %1155 : !llvm.ptr + %1156 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1157 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1158 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1159 = llvm.mul %1035, %1158 : !llvm.i64 + %1160 = llvm.add %1157, %1159 : !llvm.i64 + %1161 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1162 = llvm.mul %59, %1161 : !llvm.i64 + %1163 = llvm.add %1160, %1162 : !llvm.i64 + %1164 = llvm.getelementptr %1156[%1163] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1165 = llvm.load %1164 : !llvm.ptr + %1166 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1167 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1168 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1169 = llvm.mul %59, %1168 : !llvm.i64 + %1170 = llvm.add %1167, %1169 : !llvm.i64 + %1171 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1172 = 
llvm.mul %181, %1171 : !llvm.i64 + %1173 = llvm.add %1170, %1172 : !llvm.i64 + %1174 = llvm.getelementptr %1166[%1173] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1175 = llvm.load %1174 : !llvm.ptr + %1176 = llvm.fmul %1165, %1175 {RelaxedPrecision} : !llvm.float + %1177 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1179 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1180 = llvm.mul %1035, %1179 : !llvm.i64 + %1181 = llvm.add %1178, %1180 : !llvm.i64 + %1182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1183 = llvm.mul %181, %1182 : !llvm.i64 + %1184 = llvm.add %1181, %1183 : !llvm.i64 + %1185 = llvm.getelementptr %1177[%1184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1186 = llvm.load %1185 : !llvm.ptr + %1187 = llvm.fadd %1186, %1176 {RelaxedPrecision} : !llvm.float + %1188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1191 = llvm.mul %1035, %1190 : !llvm.i64 + %1192 = llvm.add %1189, %1191 : !llvm.i64 + %1193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1194 = llvm.mul %181, %1193 : !llvm.i64 + %1195 = llvm.add %1192, %1194 : !llvm.i64 + %1196 = llvm.getelementptr %1188[%1195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1187, %1196 : !llvm.ptr + %1197 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1198 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1199 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1200 = llvm.mul %1035, %1199 : !llvm.i64 + %1201 = llvm.add %1198, %1200 : !llvm.i64 + %1202 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1203 = llvm.mul %181, %1202 : !llvm.i64 + %1204 = llvm.add %1201, %1203 : !llvm.i64 + %1205 = llvm.getelementptr %1197[%1204] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1206 = llvm.load %1205 : !llvm.ptr + %1207 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1208 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1209 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1210 = llvm.mul %1035, %1209 : !llvm.i64 + %1211 = llvm.add %1208, %1210 : !llvm.i64 + %1212 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1213 = llvm.mul %181, %1212 : !llvm.i64 + %1214 = llvm.add %1211, %1213 : !llvm.i64 + %1215 = llvm.getelementptr %1207[%1214] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1206, %1215 : !llvm.ptr + %1216 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1217 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1218 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1219 = llvm.mul %1035, %1218 : !llvm.i64 + %1220 = llvm.add %1217, %1219 : !llvm.i64 + %1221 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1222 = llvm.mul %59, %1221 : !llvm.i64 + %1223 = llvm.add %1220, %1222 : !llvm.i64 + %1224 = llvm.getelementptr %1216[%1223] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1225 = llvm.load %1224 : !llvm.ptr + %1226 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1227 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1228 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1229 = llvm.mul %59, %1228 : !llvm.i64 + %1230 = llvm.add %1227, %1229 : !llvm.i64 + %1231 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1232 = llvm.mul %242, %1231 : !llvm.i64 + %1233 = llvm.add %1230, %1232 : !llvm.i64 + %1234 = llvm.getelementptr %1226[%1233] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1235 = llvm.load %1234 : !llvm.ptr + %1236 = llvm.fmul %1225, %1235 {RelaxedPrecision} : !llvm.float + %1237 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1239 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1240 = llvm.mul %1035, %1239 : !llvm.i64 + %1241 = llvm.add %1238, %1240 : !llvm.i64 + %1242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1243 = llvm.mul %242, %1242 : !llvm.i64 + %1244 = llvm.add %1241, %1243 : !llvm.i64 + %1245 = llvm.getelementptr %1237[%1244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1246 = llvm.load %1245 : !llvm.ptr + %1247 = llvm.fadd %1246, %1236 {RelaxedPrecision} : !llvm.float + %1248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1251 = llvm.mul %1035, %1250 : !llvm.i64 + %1252 = llvm.add %1249, %1251 : !llvm.i64 + %1253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1254 = llvm.mul %242, %1253 : !llvm.i64 + %1255 = llvm.add %1252, %1254 : !llvm.i64 + %1256 = llvm.getelementptr %1248[%1255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1247, %1256 : !llvm.ptr + %1257 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1258 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1259 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1260 = llvm.mul %1035, %1259 : !llvm.i64 + %1261 = llvm.add %1258, %1260 : !llvm.i64 + %1262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1263 = llvm.mul %242, %1262 : !llvm.i64 + %1264 = llvm.add %1261, %1263 : !llvm.i64 + %1265 = llvm.getelementptr %1257[%1264] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1266 = llvm.load %1265 : !llvm.ptr + %1267 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1269 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1270 = llvm.mul %1035, %1269 : !llvm.i64 + %1271 = llvm.add %1268, %1270 : !llvm.i64 + %1272 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1273 = llvm.mul %242, %1272 : !llvm.i64 + %1274 = llvm.add %1271, %1273 : !llvm.i64 + %1275 = llvm.getelementptr %1267[%1274] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1266, %1275 : !llvm.ptr + %1276 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1277 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1278 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1279 = llvm.mul %1035, %1278 : !llvm.i64 + %1280 = llvm.add %1277, %1279 : !llvm.i64 + %1281 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1282 = llvm.mul %59, %1281 : !llvm.i64 + %1283 = llvm.add %1280, %1282 : !llvm.i64 + %1284 = llvm.getelementptr %1276[%1283] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1285 = llvm.load %1284 : !llvm.ptr + %1286 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1288 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1289 = llvm.mul %59, %1288 : !llvm.i64 + %1290 = llvm.add %1287, %1289 : !llvm.i64 + %1291 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1292 = llvm.mul %303, %1291 : !llvm.i64 + %1293 = llvm.add %1290, %1292 : !llvm.i64 + %1294 = llvm.getelementptr %1286[%1293] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1295 = llvm.load %1294 : !llvm.ptr + %1296 = llvm.fmul %1285, %1295 {RelaxedPrecision} 
: !llvm.float + %1297 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1299 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1300 = llvm.mul %1035, %1299 : !llvm.i64 + %1301 = llvm.add %1298, %1300 : !llvm.i64 + %1302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1303 = llvm.mul %303, %1302 : !llvm.i64 + %1304 = llvm.add %1301, %1303 : !llvm.i64 + %1305 = llvm.getelementptr %1297[%1304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1306 = llvm.load %1305 : !llvm.ptr + %1307 = llvm.fadd %1306, %1296 {RelaxedPrecision} : !llvm.float + %1308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1311 = llvm.mul %1035, %1310 : !llvm.i64 + %1312 = llvm.add %1309, %1311 : !llvm.i64 + %1313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1314 = llvm.mul %303, %1313 : !llvm.i64 + %1315 = llvm.add %1312, %1314 : !llvm.i64 + %1316 = llvm.getelementptr %1308[%1315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1307, %1316 : !llvm.ptr + %1317 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1318 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1319 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1320 = llvm.mul %1035, %1319 : !llvm.i64 + %1321 = llvm.add %1318, %1320 : !llvm.i64 + %1322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1323 = llvm.mul %303, %1322 : !llvm.i64 + %1324 = llvm.add %1321, %1323 : !llvm.i64 + %1325 = llvm.getelementptr %1317[%1324] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1326 = llvm.load %1325 : !llvm.ptr + %1327 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1328 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1329 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1330 = llvm.mul %1035, %1329 : !llvm.i64 + %1331 = llvm.add %1328, %1330 : !llvm.i64 + %1332 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1333 = llvm.mul %303, %1332 : !llvm.i64 + %1334 = llvm.add %1331, %1333 : !llvm.i64 + %1335 = llvm.getelementptr %1327[%1334] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1326, %1335 : !llvm.ptr + %1336 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1337 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1338 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1339 = llvm.mul %1035, %1338 : !llvm.i64 + %1340 = llvm.add %1337, %1339 : !llvm.i64 + %1341 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1342 = llvm.mul %59, %1341 : !llvm.i64 + %1343 = llvm.add %1340, %1342 : !llvm.i64 + %1344 = llvm.getelementptr %1336[%1343] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1345 = llvm.load %1344 : !llvm.ptr + %1346 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1347 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1348 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1349 = llvm.mul %59, %1348 : !llvm.i64 + %1350 = llvm.add %1347, %1349 : !llvm.i64 + %1351 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1352 = llvm.mul %364, %1351 : !llvm.i64 + %1353 = llvm.add %1350, %1352 : !llvm.i64 + %1354 = llvm.getelementptr %1346[%1353] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1355 = llvm.load %1354 : !llvm.ptr + %1356 = llvm.fmul %1345, %1355 {RelaxedPrecision} : !llvm.float + %1357 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1358 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %1359 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1360 = llvm.mul %1035, %1359 : !llvm.i64 + %1361 = llvm.add %1358, %1360 : !llvm.i64 + %1362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1363 = llvm.mul %364, %1362 : !llvm.i64 + %1364 = llvm.add %1361, %1363 : !llvm.i64 + %1365 = llvm.getelementptr %1357[%1364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1366 = llvm.load %1365 : !llvm.ptr + %1367 = llvm.fadd %1366, %1356 {RelaxedPrecision} : !llvm.float + %1368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1371 = llvm.mul %1035, %1370 : !llvm.i64 + %1372 = llvm.add %1369, %1371 : !llvm.i64 + %1373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1374 = llvm.mul %364, %1373 : !llvm.i64 + %1375 = llvm.add %1372, %1374 : !llvm.i64 + %1376 = llvm.getelementptr %1368[%1375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1367, %1376 : !llvm.ptr + %1377 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1378 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1379 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1380 = llvm.mul %1035, %1379 : !llvm.i64 + %1381 = llvm.add %1378, %1380 : !llvm.i64 + %1382 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1383 = llvm.mul %364, %1382 : !llvm.i64 + %1384 = llvm.add %1381, %1383 : !llvm.i64 + %1385 = llvm.getelementptr %1377[%1384] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1386 = llvm.load %1385 : !llvm.ptr + %1387 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1389 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1390 = llvm.mul %1035, %1389 : !llvm.i64 + %1391 = llvm.add %1388, %1390 : !llvm.i64 + %1392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1393 = llvm.mul %364, %1392 : !llvm.i64 + %1394 = llvm.add %1391, %1393 : !llvm.i64 + %1395 = llvm.getelementptr %1387[%1394] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1386, %1395 : !llvm.ptr + %1396 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1397 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1398 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1399 = llvm.mul %1035, %1398 : !llvm.i64 + %1400 = llvm.add %1397, %1399 : !llvm.i64 + %1401 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1402 = llvm.mul %59, %1401 : !llvm.i64 + %1403 = llvm.add %1400, %1402 : !llvm.i64 + %1404 = llvm.getelementptr %1396[%1403] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1405 = llvm.load %1404 : !llvm.ptr + %1406 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1408 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1409 = llvm.mul %59, %1408 : !llvm.i64 + %1410 = llvm.add %1407, %1409 : !llvm.i64 + %1411 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1412 = llvm.mul %425, %1411 : !llvm.i64 + %1413 = llvm.add %1410, %1412 : !llvm.i64 + %1414 = llvm.getelementptr %1406[%1413] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1415 = llvm.load %1414 : !llvm.ptr + %1416 = llvm.fmul %1405, %1415 {RelaxedPrecision} : !llvm.float + %1417 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1419 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1420 = llvm.mul %1035, 
%1419 : !llvm.i64 + %1421 = llvm.add %1418, %1420 : !llvm.i64 + %1422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1423 = llvm.mul %425, %1422 : !llvm.i64 + %1424 = llvm.add %1421, %1423 : !llvm.i64 + %1425 = llvm.getelementptr %1417[%1424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1426 = llvm.load %1425 : !llvm.ptr + %1427 = llvm.fadd %1426, %1416 {RelaxedPrecision} : !llvm.float + %1428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1431 = llvm.mul %1035, %1430 : !llvm.i64 + %1432 = llvm.add %1429, %1431 : !llvm.i64 + %1433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1434 = llvm.mul %425, %1433 : !llvm.i64 + %1435 = llvm.add %1432, %1434 : !llvm.i64 + %1436 = llvm.getelementptr %1428[%1435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1427, %1436 : !llvm.ptr + %1437 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1438 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1439 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1440 = llvm.mul %1035, %1439 : !llvm.i64 + %1441 = llvm.add %1438, %1440 : !llvm.i64 + %1442 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1443 = llvm.mul %425, %1442 : !llvm.i64 + %1444 = llvm.add %1441, %1443 : !llvm.i64 + %1445 = llvm.getelementptr %1437[%1444] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1446 = llvm.load %1445 : !llvm.ptr + %1447 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1449 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1450 = llvm.mul %1035, %1449 : !llvm.i64 + %1451 = llvm.add %1448, %1450 : !llvm.i64 + %1452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1453 = llvm.mul %425, %1452 : !llvm.i64 + %1454 = llvm.add %1451, %1453 : !llvm.i64 + %1455 = llvm.getelementptr %1447[%1454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1446, %1455 : !llvm.ptr + %1456 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1457 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1458 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1459 = llvm.mul %1035, %1458 : !llvm.i64 + %1460 = llvm.add %1457, %1459 : !llvm.i64 + %1461 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1462 = llvm.mul %59, %1461 : !llvm.i64 + %1463 = llvm.add %1460, %1462 : !llvm.i64 + %1464 = llvm.getelementptr %1456[%1463] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1465 = llvm.load %1464 : !llvm.ptr + %1466 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1467 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1468 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1469 = llvm.mul %59, %1468 : !llvm.i64 + %1470 = llvm.add %1467, %1469 : !llvm.i64 + %1471 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1472 = llvm.mul %486, %1471 : !llvm.i64 + %1473 = llvm.add %1470, %1472 : !llvm.i64 + %1474 = llvm.getelementptr %1466[%1473] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1475 = llvm.load %1474 : !llvm.ptr + %1476 = llvm.fmul %1465, %1475 {RelaxedPrecision} : !llvm.float + %1477 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1480 = llvm.mul %1035, %1479 : !llvm.i64 + %1481 = llvm.add %1478, %1480 : !llvm.i64 + %1482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1483 = 
llvm.mul %486, %1482 : !llvm.i64 + %1484 = llvm.add %1481, %1483 : !llvm.i64 + %1485 = llvm.getelementptr %1477[%1484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1486 = llvm.load %1485 : !llvm.ptr + %1487 = llvm.fadd %1486, %1476 {RelaxedPrecision} : !llvm.float + %1488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1491 = llvm.mul %1035, %1490 : !llvm.i64 + %1492 = llvm.add %1489, %1491 : !llvm.i64 + %1493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1494 = llvm.mul %486, %1493 : !llvm.i64 + %1495 = llvm.add %1492, %1494 : !llvm.i64 + %1496 = llvm.getelementptr %1488[%1495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1487, %1496 : !llvm.ptr + %1497 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1500 = llvm.mul %1035, %1499 : !llvm.i64 + %1501 = llvm.add %1498, %1500 : !llvm.i64 + %1502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1503 = llvm.mul %486, %1502 : !llvm.i64 + %1504 = llvm.add %1501, %1503 : !llvm.i64 + %1505 = llvm.getelementptr %1497[%1504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1506 = llvm.load %1505 : !llvm.ptr + %1507 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1508 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1509 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1510 = llvm.mul %1035, %1509 : !llvm.i64 + %1511 = llvm.add %1508, %1510 : !llvm.i64 + %1512 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1513 = llvm.mul %486, %1512 : !llvm.i64 + %1514 = llvm.add %1511, %1513 : !llvm.i64 + %1515 = llvm.getelementptr %1507[%1514] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1506, %1515 : !llvm.ptr + %1516 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1518 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1519 = llvm.mul %1035, %1518 : !llvm.i64 + %1520 = llvm.add %1517, %1519 : !llvm.i64 + %1521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1522 = llvm.mul %59, %1521 : !llvm.i64 + %1523 = llvm.add %1520, %1522 : !llvm.i64 + %1524 = llvm.getelementptr %1516[%1523] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1525 = llvm.load %1524 : !llvm.ptr + %1526 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1529 = llvm.mul %59, %1528 : !llvm.i64 + %1530 = llvm.add %1527, %1529 : !llvm.i64 + %1531 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1532 = llvm.mul %547, %1531 : !llvm.i64 + %1533 = llvm.add %1530, %1532 : !llvm.i64 + %1534 = llvm.getelementptr %1526[%1533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1535 = llvm.load %1534 : !llvm.ptr + %1536 = llvm.fmul %1525, %1535 {RelaxedPrecision} : !llvm.float + %1537 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1539 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1540 = llvm.mul %1035, %1539 : !llvm.i64 + %1541 = llvm.add %1538, %1540 : !llvm.i64 + %1542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1543 = llvm.mul %547, %1542 : !llvm.i64 + %1544 = llvm.add %1541, %1543 : !llvm.i64 + %1545 = llvm.getelementptr %1537[%1544] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1546 = llvm.load %1545 : !llvm.ptr + %1547 = llvm.fadd %1546, %1536 {RelaxedPrecision} : !llvm.float + %1548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1551 = llvm.mul %1035, %1550 : !llvm.i64 + %1552 = llvm.add %1549, %1551 : !llvm.i64 + %1553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1554 = llvm.mul %547, %1553 : !llvm.i64 + %1555 = llvm.add %1552, %1554 : !llvm.i64 + %1556 = llvm.getelementptr %1548[%1555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1547, %1556 : !llvm.ptr + %1557 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1558 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1559 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1560 = llvm.mul %1035, %1559 : !llvm.i64 + %1561 = llvm.add %1558, %1560 : !llvm.i64 + %1562 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1563 = llvm.mul %547, %1562 : !llvm.i64 + %1564 = llvm.add %1561, %1563 : !llvm.i64 + %1565 = llvm.getelementptr %1557[%1564] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1566 = llvm.load %1565 : !llvm.ptr + %1567 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1568 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1569 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1570 = llvm.mul %1035, %1569 : !llvm.i64 + %1571 = llvm.add %1568, %1570 : !llvm.i64 + %1572 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1573 = llvm.mul %547, %1572 : !llvm.i64 + %1574 = llvm.add %1571, %1573 : !llvm.i64 + %1575 = llvm.getelementptr %1567[%1574] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1566, %1575 : !llvm.ptr + %1576 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1577 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1578 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1579 = llvm.mul %1035, %1578 : !llvm.i64 + %1580 = llvm.add %1577, %1579 : !llvm.i64 + %1581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1582 = llvm.mul %59, %1581 : !llvm.i64 + %1583 = llvm.add %1580, %1582 : !llvm.i64 + %1584 = llvm.getelementptr %1576[%1583] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1585 = llvm.load %1584 : !llvm.ptr + %1586 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1587 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1588 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1589 = llvm.mul %59, %1588 : !llvm.i64 + %1590 = llvm.add %1587, %1589 : !llvm.i64 + %1591 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1592 = llvm.mul %608, %1591 : !llvm.i64 + %1593 = llvm.add %1590, %1592 : !llvm.i64 + %1594 = llvm.getelementptr %1586[%1593] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1595 = llvm.load %1594 : !llvm.ptr + %1596 = llvm.fmul %1585, %1595 {RelaxedPrecision} : !llvm.float + %1597 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1599 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1600 = llvm.mul %1035, %1599 : !llvm.i64 + %1601 = llvm.add %1598, %1600 : !llvm.i64 + %1602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1603 = llvm.mul %608, %1602 : !llvm.i64 + %1604 = llvm.add %1601, %1603 : !llvm.i64 + %1605 = llvm.getelementptr %1597[%1604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1606 = llvm.load %1605 : !llvm.ptr + %1607 = llvm.fadd %1606, %1596 {RelaxedPrecision} 
: !llvm.float + %1608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1611 = llvm.mul %1035, %1610 : !llvm.i64 + %1612 = llvm.add %1609, %1611 : !llvm.i64 + %1613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1614 = llvm.mul %608, %1613 : !llvm.i64 + %1615 = llvm.add %1612, %1614 : !llvm.i64 + %1616 = llvm.getelementptr %1608[%1615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1607, %1616 : !llvm.ptr + %1617 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1618 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1619 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1620 = llvm.mul %1035, %1619 : !llvm.i64 + %1621 = llvm.add %1618, %1620 : !llvm.i64 + %1622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1623 = llvm.mul %608, %1622 : !llvm.i64 + %1624 = llvm.add %1621, %1623 : !llvm.i64 + %1625 = llvm.getelementptr %1617[%1624] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1626 = llvm.load %1625 : !llvm.ptr + %1627 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1628 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1629 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1630 = llvm.mul %1035, %1629 : !llvm.i64 + %1631 = llvm.add %1628, %1630 : !llvm.i64 + %1632 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1633 = llvm.mul %608, %1632 : !llvm.i64 + %1634 = llvm.add %1631, %1633 : !llvm.i64 + %1635 = llvm.getelementptr %1627[%1634] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1626, %1635 : !llvm.ptr + %1636 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1637 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1638 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1639 = llvm.mul %1035, %1638 : !llvm.i64 + %1640 = llvm.add %1637, %1639 : !llvm.i64 + %1641 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1642 = llvm.mul %59, %1641 : !llvm.i64 + %1643 = llvm.add %1640, %1642 : !llvm.i64 + %1644 = llvm.getelementptr %1636[%1643] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1645 = llvm.load %1644 : !llvm.ptr + %1646 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1647 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1648 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1649 = llvm.mul %59, %1648 : !llvm.i64 + %1650 = llvm.add %1647, %1649 : !llvm.i64 + %1651 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1652 = llvm.mul %669, %1651 : !llvm.i64 + %1653 = llvm.add %1650, %1652 : !llvm.i64 + %1654 = llvm.getelementptr %1646[%1653] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1655 = llvm.load %1654 : !llvm.ptr + %1656 = llvm.fmul %1645, %1655 {RelaxedPrecision} : !llvm.float + %1657 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1659 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1660 = llvm.mul %1035, %1659 : !llvm.i64 + %1661 = llvm.add %1658, %1660 : !llvm.i64 + %1662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1663 = llvm.mul %669, %1662 : !llvm.i64 + %1664 = llvm.add %1661, %1663 : !llvm.i64 + %1665 = llvm.getelementptr %1657[%1664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1666 = llvm.load %1665 : !llvm.ptr + %1667 = llvm.fadd %1666, %1656 {RelaxedPrecision} : !llvm.float + %1668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1669 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %1670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1671 = llvm.mul %1035, %1670 : !llvm.i64 + %1672 = llvm.add %1669, %1671 : !llvm.i64 + %1673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1674 = llvm.mul %669, %1673 : !llvm.i64 + %1675 = llvm.add %1672, %1674 : !llvm.i64 + %1676 = llvm.getelementptr %1668[%1675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1667, %1676 : !llvm.ptr + %1677 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1680 = llvm.mul %1035, %1679 : !llvm.i64 + %1681 = llvm.add %1678, %1680 : !llvm.i64 + %1682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1683 = llvm.mul %669, %1682 : !llvm.i64 + %1684 = llvm.add %1681, %1683 : !llvm.i64 + %1685 = llvm.getelementptr %1677[%1684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1686 = llvm.load %1685 : !llvm.ptr + %1687 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1688 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1689 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1690 = llvm.mul %1035, %1689 : !llvm.i64 + %1691 = llvm.add %1688, %1690 : !llvm.i64 + %1692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1693 = llvm.mul %669, %1692 : !llvm.i64 + %1694 = llvm.add %1691, %1693 : !llvm.i64 + %1695 = llvm.getelementptr %1687[%1694] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1686, %1695 : !llvm.ptr + %1696 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1697 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1698 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1699 = llvm.mul %1035, %1698 : !llvm.i64 + %1700 = llvm.add %1697, %1699 : !llvm.i64 + %1701 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1702 = llvm.mul %59, %1701 : !llvm.i64 + %1703 = llvm.add %1700, %1702 : !llvm.i64 + %1704 = llvm.getelementptr %1696[%1703] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1705 = llvm.load %1704 : !llvm.ptr + %1706 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1708 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1709 = llvm.mul %59, %1708 : !llvm.i64 + %1710 = llvm.add %1707, %1709 : !llvm.i64 + %1711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1712 = llvm.mul %730, %1711 : !llvm.i64 + %1713 = llvm.add %1710, %1712 : !llvm.i64 + %1714 = llvm.getelementptr %1706[%1713] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1715 = llvm.load %1714 : !llvm.ptr + %1716 = llvm.fmul %1705, %1715 {RelaxedPrecision} : !llvm.float + %1717 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1718 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1719 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1720 = llvm.mul %1035, %1719 : !llvm.i64 + %1721 = llvm.add %1718, %1720 : !llvm.i64 + %1722 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1723 = llvm.mul %730, %1722 : !llvm.i64 + %1724 = llvm.add %1721, %1723 : !llvm.i64 + %1725 = llvm.getelementptr %1717[%1724] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1726 = llvm.load %1725 : !llvm.ptr + %1727 = llvm.fadd %1726, %1716 {RelaxedPrecision} : !llvm.float + %1728 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1729 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1730 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1731 = llvm.mul %1035, 
%1730 : !llvm.i64 + %1732 = llvm.add %1729, %1731 : !llvm.i64 + %1733 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1734 = llvm.mul %730, %1733 : !llvm.i64 + %1735 = llvm.add %1732, %1734 : !llvm.i64 + %1736 = llvm.getelementptr %1728[%1735] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1727, %1736 : !llvm.ptr + %1737 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1738 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1739 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1740 = llvm.mul %1035, %1739 : !llvm.i64 + %1741 = llvm.add %1738, %1740 : !llvm.i64 + %1742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1743 = llvm.mul %730, %1742 : !llvm.i64 + %1744 = llvm.add %1741, %1743 : !llvm.i64 + %1745 = llvm.getelementptr %1737[%1744] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1746 = llvm.load %1745 : !llvm.ptr + %1747 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1749 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1750 = llvm.mul %1035, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %730, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1747[%1754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1746, %1755 : !llvm.ptr + %1756 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1759 = llvm.mul %1035, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %59, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1765 = llvm.load %1764 : !llvm.ptr + %1766 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1767 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1768 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1769 = llvm.mul %59, %1768 : !llvm.i64 + %1770 = llvm.add %1767, %1769 : !llvm.i64 + %1771 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1772 = llvm.mul %791, %1771 : !llvm.i64 + %1773 = llvm.add %1770, %1772 : !llvm.i64 + %1774 = llvm.getelementptr %1766[%1773] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1775 = llvm.load %1774 : !llvm.ptr + %1776 = llvm.fmul %1765, %1775 {RelaxedPrecision} : !llvm.float + %1777 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1779 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1780 = llvm.mul %1035, %1779 : !llvm.i64 + %1781 = llvm.add %1778, %1780 : !llvm.i64 + %1782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1783 = llvm.mul %791, %1782 : !llvm.i64 + %1784 = llvm.add %1781, %1783 : !llvm.i64 + %1785 = llvm.getelementptr %1777[%1784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1786 = llvm.load %1785 : !llvm.ptr + %1787 = llvm.fadd %1786, %1776 {RelaxedPrecision} : !llvm.float + %1788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1791 = llvm.mul %1035, %1790 : !llvm.i64 + %1792 = llvm.add %1789, %1791 : !llvm.i64 + %1793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1794 = 
llvm.mul %791, %1793 : !llvm.i64 + %1795 = llvm.add %1792, %1794 : !llvm.i64 + %1796 = llvm.getelementptr %1788[%1795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1787, %1796 : !llvm.ptr + %1797 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1799 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1800 = llvm.mul %1035, %1799 : !llvm.i64 + %1801 = llvm.add %1798, %1800 : !llvm.i64 + %1802 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1803 = llvm.mul %791, %1802 : !llvm.i64 + %1804 = llvm.add %1801, %1803 : !llvm.i64 + %1805 = llvm.getelementptr %1797[%1804] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1806 = llvm.load %1805 : !llvm.ptr + %1807 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1808 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1809 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1810 = llvm.mul %1035, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1813 = llvm.mul %791, %1812 : !llvm.i64 + %1814 = llvm.add %1811, %1813 : !llvm.i64 + %1815 = llvm.getelementptr %1807[%1814] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1806, %1815 : !llvm.ptr + %1816 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1817 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1818 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1819 = llvm.mul %1035, %1818 : !llvm.i64 + %1820 = llvm.add %1817, %1819 : !llvm.i64 + %1821 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1822 = llvm.mul %59, %1821 : !llvm.i64 + %1823 = llvm.add %1820, %1822 : !llvm.i64 + %1824 = llvm.getelementptr %1816[%1823] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1825 = llvm.load %1824 : !llvm.ptr + %1826 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1828 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1829 = llvm.mul %59, %1828 : !llvm.i64 + %1830 = llvm.add %1827, %1829 : !llvm.i64 + %1831 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1832 = llvm.mul %852, %1831 : !llvm.i64 + %1833 = llvm.add %1830, %1832 : !llvm.i64 + %1834 = llvm.getelementptr %1826[%1833] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1835 = llvm.load %1834 : !llvm.ptr + %1836 = llvm.fmul %1825, %1835 {RelaxedPrecision} : !llvm.float + %1837 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1839 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1840 = llvm.mul %1035, %1839 : !llvm.i64 + %1841 = llvm.add %1838, %1840 : !llvm.i64 + %1842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1843 = llvm.mul %852, %1842 : !llvm.i64 + %1844 = llvm.add %1841, %1843 : !llvm.i64 + %1845 = llvm.getelementptr %1837[%1844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1846 = llvm.load %1845 : !llvm.ptr + %1847 = llvm.fadd %1846, %1836 {RelaxedPrecision} : !llvm.float + %1848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1851 = llvm.mul %1035, %1850 : !llvm.i64 + %1852 = llvm.add %1849, %1851 : !llvm.i64 + %1853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1854 = llvm.mul %852, %1853 : !llvm.i64 + %1855 = llvm.add %1852, %1854 : !llvm.i64 + %1856 = llvm.getelementptr %1848[%1855] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1847, %1856 : !llvm.ptr + %1857 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1858 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1859 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1860 = llvm.mul %1035, %1859 : !llvm.i64 + %1861 = llvm.add %1858, %1860 : !llvm.i64 + %1862 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1863 = llvm.mul %852, %1862 : !llvm.i64 + %1864 = llvm.add %1861, %1863 : !llvm.i64 + %1865 = llvm.getelementptr %1857[%1864] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1866 = llvm.load %1865 : !llvm.ptr + %1867 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1868 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1869 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1870 = llvm.mul %1035, %1869 : !llvm.i64 + %1871 = llvm.add %1868, %1870 : !llvm.i64 + %1872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1873 = llvm.mul %852, %1872 : !llvm.i64 + %1874 = llvm.add %1871, %1873 : !llvm.i64 + %1875 = llvm.getelementptr %1867[%1874] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1866, %1875 : !llvm.ptr + %1876 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1877 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1878 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1879 = llvm.mul %1035, %1878 : !llvm.i64 + %1880 = llvm.add %1877, %1879 : !llvm.i64 + %1881 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1882 = llvm.mul %59, %1881 : !llvm.i64 + %1883 = llvm.add %1880, %1882 : !llvm.i64 + %1884 = llvm.getelementptr %1876[%1883] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1885 = llvm.load %1884 : !llvm.ptr + %1886 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1888 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1889 = llvm.mul %59, %1888 : !llvm.i64 + %1890 = llvm.add %1887, %1889 : !llvm.i64 + %1891 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1892 = llvm.mul %913, %1891 : !llvm.i64 + %1893 = llvm.add %1890, %1892 : !llvm.i64 + %1894 = llvm.getelementptr %1886[%1893] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1895 = llvm.load %1894 : !llvm.ptr + %1896 = llvm.fmul %1885, %1895 {RelaxedPrecision} : !llvm.float + %1897 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1899 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1900 = llvm.mul %1035, %1899 : !llvm.i64 + %1901 = llvm.add %1898, %1900 : !llvm.i64 + %1902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1903 = llvm.mul %913, %1902 : !llvm.i64 + %1904 = llvm.add %1901, %1903 : !llvm.i64 + %1905 = llvm.getelementptr %1897[%1904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1906 = llvm.load %1905 : !llvm.ptr + %1907 = llvm.fadd %1906, %1896 {RelaxedPrecision} : !llvm.float + %1908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1911 = llvm.mul %1035, %1910 : !llvm.i64 + %1912 = llvm.add %1909, %1911 : !llvm.i64 + %1913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1914 = llvm.mul %913, %1913 : !llvm.i64 + %1915 = llvm.add %1912, %1914 : !llvm.i64 + %1916 = llvm.getelementptr %1908[%1915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1907, %1916 : !llvm.ptr + %1917 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1918 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1919 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1920 = llvm.mul %1035, %1919 : !llvm.i64 + %1921 = llvm.add %1918, %1920 : !llvm.i64 + %1922 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1923 = llvm.mul %913, %1922 : !llvm.i64 + %1924 = llvm.add %1921, %1923 : !llvm.i64 + %1925 = llvm.getelementptr %1917[%1924] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1926 = llvm.load %1925 : !llvm.ptr + %1927 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1928 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1929 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1930 = llvm.mul %1035, %1929 : !llvm.i64 + %1931 = llvm.add %1928, %1930 : !llvm.i64 + %1932 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1933 = llvm.mul %913, %1932 : !llvm.i64 + %1934 = llvm.add %1931, %1933 : !llvm.i64 + %1935 = llvm.getelementptr %1927[%1934] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1926, %1935 : !llvm.ptr + %1936 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1937 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1938 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1939 = llvm.mul %1035, %1938 : !llvm.i64 + %1940 = llvm.add %1937, %1939 : !llvm.i64 + %1941 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1942 = llvm.mul %59, %1941 : !llvm.i64 + %1943 = llvm.add %1940, %1942 : !llvm.i64 + %1944 = llvm.getelementptr %1936[%1943] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1945 = llvm.load %1944 : !llvm.ptr + %1946 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1949 = llvm.mul %59, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1952 = llvm.mul %974, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.getelementptr %1946[%1953] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1955 = llvm.load %1954 : !llvm.ptr + %1956 = llvm.fmul %1945, %1955 {RelaxedPrecision} : !llvm.float + %1957 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1958 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1959 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1960 = llvm.mul %1035, %1959 : !llvm.i64 + %1961 = llvm.add %1958, %1960 : !llvm.i64 + %1962 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1963 = llvm.mul %974, %1962 : !llvm.i64 + %1964 = llvm.add %1961, %1963 : !llvm.i64 + %1965 = llvm.getelementptr %1957[%1964] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1966 = llvm.load %1965 : !llvm.ptr + %1967 = llvm.fadd %1966, %1956 {RelaxedPrecision} : !llvm.float + %1968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1971 = llvm.mul %1035, %1970 : !llvm.i64 + %1972 = llvm.add %1969, %1971 : !llvm.i64 + %1973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1974 = llvm.mul %974, %1973 : !llvm.i64 + %1975 = llvm.add %1972, %1974 : !llvm.i64 + %1976 = llvm.getelementptr %1968[%1975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1967, %1976 : !llvm.ptr + %1977 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1978 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1979 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %1980 = llvm.mul %1035, %1979 : !llvm.i64 + %1981 = llvm.add %1978, %1980 : !llvm.i64 + %1982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1983 = llvm.mul %974, %1982 : !llvm.i64 + %1984 = llvm.add %1981, %1983 : !llvm.i64 + %1985 = llvm.getelementptr %1977[%1984] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1986 = llvm.load %1985 : !llvm.ptr + %1987 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1988 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1989 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1990 = llvm.mul %1035, %1989 : !llvm.i64 + %1991 = llvm.add %1988, %1990 : !llvm.i64 + %1992 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1993 = llvm.mul %974, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.getelementptr %1987[%1994] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1986, %1995 : !llvm.ptr + %1996 = llvm.add %50, %34 : !llvm.i64 + %1997 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1998 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1999 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2000 = llvm.mul %1996, %1999 : !llvm.i64 + %2001 = llvm.add %1998, %2000 : !llvm.i64 + %2002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2003 = llvm.mul %59, %2002 : !llvm.i64 + %2004 = llvm.add %2001, %2003 : !llvm.i64 + %2005 = llvm.getelementptr %1997[%2004] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2006 = llvm.load %2005 : !llvm.ptr + %2007 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2009 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2010 = llvm.mul %59, %2009 : !llvm.i64 + %2011 = llvm.add %2008, %2010 : !llvm.i64 + %2012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2013 = llvm.mul %58, %2012 : !llvm.i64 + %2014 = llvm.add %2011, %2013 : !llvm.i64 + %2015 = llvm.getelementptr %2007[%2014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2016 = llvm.load %2015 : !llvm.ptr + %2017 = llvm.fmul %2006, %2016 {RelaxedPrecision} : !llvm.float + %2018 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2020 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2021 = llvm.mul %1996, %2020 : !llvm.i64 + %2022 = llvm.add %2019, %2021 : !llvm.i64 + %2023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2024 = llvm.mul %58, %2023 : !llvm.i64 + %2025 = llvm.add %2022, %2024 : !llvm.i64 + %2026 = llvm.getelementptr %2018[%2025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2027 = llvm.load %2026 : !llvm.ptr + %2028 = llvm.fadd %2027, %2017 {RelaxedPrecision} : !llvm.float + %2029 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2030 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2031 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2032 = llvm.mul %1996, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2035 = llvm.mul %58, %2034 : !llvm.i64 + %2036 = llvm.add %2033, %2035 : !llvm.i64 + %2037 = llvm.getelementptr %2029[%2036] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2028, %2037 : !llvm.ptr + %2038 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2039 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2040 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2041 = llvm.mul %1996, %2040 : !llvm.i64 + 
%2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2044 = llvm.mul %58, %2043 : !llvm.i64 + %2045 = llvm.add %2042, %2044 : !llvm.i64 + %2046 = llvm.getelementptr %2038[%2045] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2047 = llvm.load %2046 : !llvm.ptr + %2048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2051 = llvm.mul %1996, %2050 : !llvm.i64 + %2052 = llvm.add %2049, %2051 : !llvm.i64 + %2053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2054 = llvm.mul %58, %2053 : !llvm.i64 + %2055 = llvm.add %2052, %2054 : !llvm.i64 + %2056 = llvm.getelementptr %2048[%2055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2047, %2056 : !llvm.ptr + %2057 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2059 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2060 = llvm.mul %1996, %2059 : !llvm.i64 + %2061 = llvm.add %2058, %2060 : !llvm.i64 + %2062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2063 = llvm.mul %59, %2062 : !llvm.i64 + %2064 = llvm.add %2061, %2063 : !llvm.i64 + %2065 = llvm.getelementptr %2057[%2064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2066 = llvm.load %2065 : !llvm.ptr + %2067 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2070 = llvm.mul %59, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %120, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2076 = llvm.load %2075 : !llvm.ptr + %2077 = llvm.fmul %2066, %2076 {RelaxedPrecision} : !llvm.float + %2078 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2080 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2081 = llvm.mul %1996, %2080 : !llvm.i64 + %2082 = llvm.add %2079, %2081 : !llvm.i64 + %2083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2084 = llvm.mul %120, %2083 : !llvm.i64 + %2085 = llvm.add %2082, %2084 : !llvm.i64 + %2086 = llvm.getelementptr %2078[%2085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2087 = llvm.load %2086 : !llvm.ptr + %2088 = llvm.fadd %2087, %2077 {RelaxedPrecision} : !llvm.float + %2089 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2090 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2091 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2092 = llvm.mul %1996, %2091 : !llvm.i64 + %2093 = llvm.add %2090, %2092 : !llvm.i64 + %2094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2095 = llvm.mul %120, %2094 : !llvm.i64 + %2096 = llvm.add %2093, %2095 : !llvm.i64 + %2097 = llvm.getelementptr %2089[%2096] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2088, %2097 : !llvm.ptr + %2098 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2100 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2101 = llvm.mul %1996, %2100 : !llvm.i64 + %2102 = llvm.add %2099, %2101 : !llvm.i64 + %2103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2104 = llvm.mul %120, %2103 : 
!llvm.i64 + %2105 = llvm.add %2102, %2104 : !llvm.i64 + %2106 = llvm.getelementptr %2098[%2105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2107 = llvm.load %2106 : !llvm.ptr + %2108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2111 = llvm.mul %1996, %2110 : !llvm.i64 + %2112 = llvm.add %2109, %2111 : !llvm.i64 + %2113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2114 = llvm.mul %120, %2113 : !llvm.i64 + %2115 = llvm.add %2112, %2114 : !llvm.i64 + %2116 = llvm.getelementptr %2108[%2115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2107, %2116 : !llvm.ptr + %2117 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2119 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2120 = llvm.mul %1996, %2119 : !llvm.i64 + %2121 = llvm.add %2118, %2120 : !llvm.i64 + %2122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2123 = llvm.mul %59, %2122 : !llvm.i64 + %2124 = llvm.add %2121, %2123 : !llvm.i64 + %2125 = llvm.getelementptr %2117[%2124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2126 = llvm.load %2125 : !llvm.ptr + %2127 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2128 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2129 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2130 = llvm.mul %59, %2129 : !llvm.i64 + %2131 = llvm.add %2128, %2130 : !llvm.i64 + %2132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2133 = llvm.mul %181, %2132 : !llvm.i64 + %2134 = llvm.add %2131, %2133 : !llvm.i64 + %2135 = llvm.getelementptr %2127[%2134] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2136 = llvm.load %2135 : !llvm.ptr + %2137 = llvm.fmul %2126, %2136 {RelaxedPrecision} : !llvm.float + %2138 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2140 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2141 = llvm.mul %1996, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2144 = llvm.mul %181, %2143 : !llvm.i64 + %2145 = llvm.add %2142, %2144 : !llvm.i64 + %2146 = llvm.getelementptr %2138[%2145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2147 = llvm.load %2146 : !llvm.ptr + %2148 = llvm.fadd %2147, %2137 {RelaxedPrecision} : !llvm.float + %2149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2151 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2152 = llvm.mul %1996, %2151 : !llvm.i64 + %2153 = llvm.add %2150, %2152 : !llvm.i64 + %2154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2155 = llvm.mul %181, %2154 : !llvm.i64 + %2156 = llvm.add %2153, %2155 : !llvm.i64 + %2157 = llvm.getelementptr %2149[%2156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2148, %2157 : !llvm.ptr + %2158 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2161 = llvm.mul %1996, %2160 : !llvm.i64 + %2162 = llvm.add %2159, %2161 : !llvm.i64 + %2163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2164 = llvm.mul %181, %2163 : !llvm.i64 + %2165 = llvm.add %2162, %2164 : !llvm.i64 + %2166 = llvm.getelementptr %2158[%2165] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %2167 = llvm.load %2166 : !llvm.ptr + %2168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2171 = llvm.mul %1996, %2170 : !llvm.i64 + %2172 = llvm.add %2169, %2171 : !llvm.i64 + %2173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2174 = llvm.mul %181, %2173 : !llvm.i64 + %2175 = llvm.add %2172, %2174 : !llvm.i64 + %2176 = llvm.getelementptr %2168[%2175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2167, %2176 : !llvm.ptr + %2177 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2180 = llvm.mul %1996, %2179 : !llvm.i64 + %2181 = llvm.add %2178, %2180 : !llvm.i64 + %2182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2183 = llvm.mul %59, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.getelementptr %2177[%2184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2186 = llvm.load %2185 : !llvm.ptr + %2187 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2188 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2189 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2190 = llvm.mul %59, %2189 : !llvm.i64 + %2191 = llvm.add %2188, %2190 : !llvm.i64 + %2192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2193 = llvm.mul %242, %2192 : !llvm.i64 + %2194 = llvm.add %2191, %2193 : !llvm.i64 + %2195 = llvm.getelementptr %2187[%2194] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2196 = llvm.load %2195 : !llvm.ptr + %2197 = llvm.fmul %2186, %2196 {RelaxedPrecision} : !llvm.float + %2198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2201 = llvm.mul %1996, %2200 : !llvm.i64 + %2202 = llvm.add %2199, %2201 : !llvm.i64 + %2203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2204 = llvm.mul %242, %2203 : !llvm.i64 + %2205 = llvm.add %2202, %2204 : !llvm.i64 + %2206 = llvm.getelementptr %2198[%2205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2207 = llvm.load %2206 : !llvm.ptr + %2208 = llvm.fadd %2207, %2197 {RelaxedPrecision} : !llvm.float + %2209 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2211 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2212 = llvm.mul %1996, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2215 = llvm.mul %242, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : !llvm.i64 + %2217 = llvm.getelementptr %2209[%2216] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2208, %2217 : !llvm.ptr + %2218 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2220 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2221 = llvm.mul %1996, %2220 : !llvm.i64 + %2222 = llvm.add %2219, %2221 : !llvm.i64 + %2223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2224 = llvm.mul %242, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.getelementptr %2218[%2225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2227 = llvm.load %2226 : !llvm.ptr + %2228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %2229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2231 = llvm.mul %1996, %2230 : !llvm.i64 + %2232 = llvm.add %2229, %2231 : !llvm.i64 + %2233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2234 = llvm.mul %242, %2233 : !llvm.i64 + %2235 = llvm.add %2232, %2234 : !llvm.i64 + %2236 = llvm.getelementptr %2228[%2235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2227, %2236 : !llvm.ptr + %2237 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2239 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2240 = llvm.mul %1996, %2239 : !llvm.i64 + %2241 = llvm.add %2238, %2240 : !llvm.i64 + %2242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2243 = llvm.mul %59, %2242 : !llvm.i64 + %2244 = llvm.add %2241, %2243 : !llvm.i64 + %2245 = llvm.getelementptr %2237[%2244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2246 = llvm.load %2245 : !llvm.ptr + %2247 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2248 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2249 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2250 = llvm.mul %59, %2249 : !llvm.i64 + %2251 = llvm.add %2248, %2250 : !llvm.i64 + %2252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2253 = llvm.mul %303, %2252 : !llvm.i64 + %2254 = llvm.add %2251, %2253 : !llvm.i64 + %2255 = llvm.getelementptr %2247[%2254] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2256 = llvm.load %2255 : !llvm.ptr + %2257 = llvm.fmul %2246, %2256 {RelaxedPrecision} : !llvm.float + %2258 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2260 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2261 = llvm.mul %1996, %2260 : !llvm.i64 + %2262 = llvm.add %2259, %2261 : !llvm.i64 + %2263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2264 = llvm.mul %303, %2263 : !llvm.i64 + %2265 = llvm.add %2262, %2264 : !llvm.i64 + %2266 = llvm.getelementptr %2258[%2265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2267 = llvm.load %2266 : !llvm.ptr + %2268 = llvm.fadd %2267, %2257 {RelaxedPrecision} : !llvm.float + %2269 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2270 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2271 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2272 = llvm.mul %1996, %2271 : !llvm.i64 + %2273 = llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2275 = llvm.mul %303, %2274 : !llvm.i64 + %2276 = llvm.add %2273, %2275 : !llvm.i64 + %2277 = llvm.getelementptr %2269[%2276] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2268, %2277 : !llvm.ptr + %2278 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2279 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2280 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2281 = llvm.mul %1996, %2280 : !llvm.i64 + %2282 = llvm.add %2279, %2281 : !llvm.i64 + %2283 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2284 = llvm.mul %303, %2283 : !llvm.i64 + %2285 = llvm.add %2282, %2284 : !llvm.i64 + %2286 = llvm.getelementptr %2278[%2285] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2287 = llvm.load %2286 : !llvm.ptr + %2288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2290 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %2291 = llvm.mul %1996, %2290 : !llvm.i64 + %2292 = llvm.add %2289, %2291 : !llvm.i64 + %2293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2294 = llvm.mul %303, %2293 : !llvm.i64 + %2295 = llvm.add %2292, %2294 : !llvm.i64 + %2296 = llvm.getelementptr %2288[%2295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2287, %2296 : !llvm.ptr + %2297 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2299 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2300 = llvm.mul %1996, %2299 : !llvm.i64 + %2301 = llvm.add %2298, %2300 : !llvm.i64 + %2302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2303 = llvm.mul %59, %2302 : !llvm.i64 + %2304 = llvm.add %2301, %2303 : !llvm.i64 + %2305 = llvm.getelementptr %2297[%2304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2306 = llvm.load %2305 : !llvm.ptr + %2307 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2308 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2309 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2310 = llvm.mul %59, %2309 : !llvm.i64 + %2311 = llvm.add %2308, %2310 : !llvm.i64 + %2312 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2313 = llvm.mul %364, %2312 : !llvm.i64 + %2314 = llvm.add %2311, %2313 : !llvm.i64 + %2315 = llvm.getelementptr %2307[%2314] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2316 = llvm.load %2315 : !llvm.ptr + %2317 = llvm.fmul %2306, %2316 {RelaxedPrecision} : !llvm.float + %2318 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2320 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2321 = llvm.mul %1996, %2320 : !llvm.i64 + %2322 = llvm.add %2319, %2321 : !llvm.i64 + %2323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2324 = llvm.mul %364, %2323 : !llvm.i64 + %2325 = llvm.add %2322, %2324 : !llvm.i64 + %2326 = llvm.getelementptr %2318[%2325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2327 = llvm.load %2326 : !llvm.ptr + %2328 = llvm.fadd %2327, %2317 {RelaxedPrecision} : !llvm.float + %2329 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2330 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2331 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2332 = llvm.mul %1996, %2331 : !llvm.i64 + %2333 = llvm.add %2330, %2332 : !llvm.i64 + %2334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2335 = llvm.mul %364, %2334 : !llvm.i64 + %2336 = llvm.add %2333, %2335 : !llvm.i64 + %2337 = llvm.getelementptr %2329[%2336] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2328, %2337 : !llvm.ptr + %2338 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2339 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2340 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2341 = llvm.mul %1996, %2340 : !llvm.i64 + %2342 = llvm.add %2339, %2341 : !llvm.i64 + %2343 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2344 = llvm.mul %364, %2343 : !llvm.i64 + %2345 = llvm.add %2342, %2344 : !llvm.i64 + %2346 = llvm.getelementptr %2338[%2345] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2347 = llvm.load %2346 : !llvm.ptr + %2348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2351 = llvm.mul %1996, %2350 : !llvm.i64 + %2352 = llvm.add %2349, %2351 : !llvm.i64 + %2353 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %2354 = llvm.mul %364, %2353 : !llvm.i64 + %2355 = llvm.add %2352, %2354 : !llvm.i64 + %2356 = llvm.getelementptr %2348[%2355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2347, %2356 : !llvm.ptr + %2357 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2359 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2360 = llvm.mul %1996, %2359 : !llvm.i64 + %2361 = llvm.add %2358, %2360 : !llvm.i64 + %2362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2363 = llvm.mul %59, %2362 : !llvm.i64 + %2364 = llvm.add %2361, %2363 : !llvm.i64 + %2365 = llvm.getelementptr %2357[%2364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2366 = llvm.load %2365 : !llvm.ptr + %2367 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2368 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2369 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2370 = llvm.mul %59, %2369 : !llvm.i64 + %2371 = llvm.add %2368, %2370 : !llvm.i64 + %2372 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2373 = llvm.mul %425, %2372 : !llvm.i64 + %2374 = llvm.add %2371, %2373 : !llvm.i64 + %2375 = llvm.getelementptr %2367[%2374] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2376 = llvm.load %2375 : !llvm.ptr + %2377 = llvm.fmul %2366, %2376 {RelaxedPrecision} : !llvm.float + %2378 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2380 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2381 = llvm.mul %1996, %2380 : !llvm.i64 + %2382 = llvm.add %2379, %2381 : !llvm.i64 + %2383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2384 = llvm.mul %425, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.getelementptr %2378[%2385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2387 = llvm.load %2386 : !llvm.ptr + %2388 = llvm.fadd %2387, %2377 {RelaxedPrecision} : !llvm.float + %2389 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2392 = llvm.mul %1996, %2391 : !llvm.i64 + %2393 = llvm.add %2390, %2392 : !llvm.i64 + %2394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2395 = llvm.mul %425, %2394 : !llvm.i64 + %2396 = llvm.add %2393, %2395 : !llvm.i64 + %2397 = llvm.getelementptr %2389[%2396] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2388, %2397 : !llvm.ptr + %2398 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2399 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2401 = llvm.mul %1996, %2400 : !llvm.i64 + %2402 = llvm.add %2399, %2401 : !llvm.i64 + %2403 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2404 = llvm.mul %425, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.getelementptr %2398[%2405] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2407 = llvm.load %2406 : !llvm.ptr + %2408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2411 = llvm.mul %1996, %2410 : !llvm.i64 + %2412 = llvm.add %2409, %2411 : !llvm.i64 + %2413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2414 = llvm.mul %425, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : 
!llvm.i64 + %2416 = llvm.getelementptr %2408[%2415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2407, %2416 : !llvm.ptr + %2417 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2419 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2420 = llvm.mul %1996, %2419 : !llvm.i64 + %2421 = llvm.add %2418, %2420 : !llvm.i64 + %2422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2423 = llvm.mul %59, %2422 : !llvm.i64 + %2424 = llvm.add %2421, %2423 : !llvm.i64 + %2425 = llvm.getelementptr %2417[%2424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2426 = llvm.load %2425 : !llvm.ptr + %2427 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2428 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2429 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2430 = llvm.mul %59, %2429 : !llvm.i64 + %2431 = llvm.add %2428, %2430 : !llvm.i64 + %2432 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2433 = llvm.mul %486, %2432 : !llvm.i64 + %2434 = llvm.add %2431, %2433 : !llvm.i64 + %2435 = llvm.getelementptr %2427[%2434] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2436 = llvm.load %2435 : !llvm.ptr + %2437 = llvm.fmul %2426, %2436 {RelaxedPrecision} : !llvm.float + %2438 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2440 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2441 = llvm.mul %1996, %2440 : !llvm.i64 + %2442 = llvm.add %2439, %2441 : !llvm.i64 + %2443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2444 = llvm.mul %486, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.getelementptr %2438[%2445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2447 = llvm.load %2446 : !llvm.ptr + %2448 = llvm.fadd %2447, %2437 {RelaxedPrecision} : !llvm.float + %2449 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2450 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2451 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2452 = llvm.mul %1996, %2451 : !llvm.i64 + %2453 = llvm.add %2450, %2452 : !llvm.i64 + %2454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2455 = llvm.mul %486, %2454 : !llvm.i64 + %2456 = llvm.add %2453, %2455 : !llvm.i64 + %2457 = llvm.getelementptr %2449[%2456] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2448, %2457 : !llvm.ptr + %2458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2461 = llvm.mul %1996, %2460 : !llvm.i64 + %2462 = llvm.add %2459, %2461 : !llvm.i64 + %2463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2464 = llvm.mul %486, %2463 : !llvm.i64 + %2465 = llvm.add %2462, %2464 : !llvm.i64 + %2466 = llvm.getelementptr %2458[%2465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2467 = llvm.load %2466 : !llvm.ptr + %2468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2471 = llvm.mul %1996, %2470 : !llvm.i64 + %2472 = llvm.add %2469, %2471 : !llvm.i64 + %2473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2474 = llvm.mul %486, %2473 : !llvm.i64 + %2475 = llvm.add %2472, %2474 : !llvm.i64 + %2476 = llvm.getelementptr %2468[%2475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2467, %2476 : 
!llvm.ptr + %2477 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2479 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2480 = llvm.mul %1996, %2479 : !llvm.i64 + %2481 = llvm.add %2478, %2480 : !llvm.i64 + %2482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2483 = llvm.mul %59, %2482 : !llvm.i64 + %2484 = llvm.add %2481, %2483 : !llvm.i64 + %2485 = llvm.getelementptr %2477[%2484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2486 = llvm.load %2485 : !llvm.ptr + %2487 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2489 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2490 = llvm.mul %59, %2489 : !llvm.i64 + %2491 = llvm.add %2488, %2490 : !llvm.i64 + %2492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2493 = llvm.mul %547, %2492 : !llvm.i64 + %2494 = llvm.add %2491, %2493 : !llvm.i64 + %2495 = llvm.getelementptr %2487[%2494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2496 = llvm.load %2495 : !llvm.ptr + %2497 = llvm.fmul %2486, %2496 {RelaxedPrecision} : !llvm.float + %2498 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2500 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2501 = llvm.mul %1996, %2500 : !llvm.i64 + %2502 = llvm.add %2499, %2501 : !llvm.i64 + %2503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2504 = llvm.mul %547, %2503 : !llvm.i64 + %2505 = llvm.add %2502, %2504 : !llvm.i64 + %2506 = llvm.getelementptr %2498[%2505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2507 = llvm.load %2506 : !llvm.ptr + %2508 = llvm.fadd %2507, %2497 {RelaxedPrecision} : !llvm.float + %2509 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2510 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2511 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2512 = llvm.mul %1996, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2515 = llvm.mul %547, %2514 : !llvm.i64 + %2516 = llvm.add %2513, %2515 : !llvm.i64 + %2517 = llvm.getelementptr %2509[%2516] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2508, %2517 : !llvm.ptr + %2518 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2519 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2520 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2521 = llvm.mul %1996, %2520 : !llvm.i64 + %2522 = llvm.add %2519, %2521 : !llvm.i64 + %2523 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2524 = llvm.mul %547, %2523 : !llvm.i64 + %2525 = llvm.add %2522, %2524 : !llvm.i64 + %2526 = llvm.getelementptr %2518[%2525] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2527 = llvm.load %2526 : !llvm.ptr + %2528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2531 = llvm.mul %1996, %2530 : !llvm.i64 + %2532 = llvm.add %2529, %2531 : !llvm.i64 + %2533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2534 = llvm.mul %547, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.getelementptr %2528[%2535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2527, %2536 : !llvm.ptr + %2537 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2538 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %2539 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2540 = llvm.mul %1996, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2543 = llvm.mul %59, %2542 : !llvm.i64 + %2544 = llvm.add %2541, %2543 : !llvm.i64 + %2545 = llvm.getelementptr %2537[%2544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2546 = llvm.load %2545 : !llvm.ptr + %2547 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2548 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2549 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2550 = llvm.mul %59, %2549 : !llvm.i64 + %2551 = llvm.add %2548, %2550 : !llvm.i64 + %2552 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2553 = llvm.mul %608, %2552 : !llvm.i64 + %2554 = llvm.add %2551, %2553 : !llvm.i64 + %2555 = llvm.getelementptr %2547[%2554] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2556 = llvm.load %2555 : !llvm.ptr + %2557 = llvm.fmul %2546, %2556 {RelaxedPrecision} : !llvm.float + %2558 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2561 = llvm.mul %1996, %2560 : !llvm.i64 + %2562 = llvm.add %2559, %2561 : !llvm.i64 + %2563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2564 = llvm.mul %608, %2563 : !llvm.i64 + %2565 = llvm.add %2562, %2564 : !llvm.i64 + %2566 = llvm.getelementptr %2558[%2565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2567 = llvm.load %2566 : !llvm.ptr + %2568 = llvm.fadd %2567, %2557 {RelaxedPrecision} : !llvm.float + %2569 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2570 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2571 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2572 = llvm.mul %1996, %2571 : !llvm.i64 + %2573 = llvm.add %2570, %2572 : !llvm.i64 + %2574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2575 = llvm.mul %608, %2574 : !llvm.i64 + %2576 = llvm.add %2573, %2575 : !llvm.i64 + %2577 = llvm.getelementptr %2569[%2576] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2568, %2577 : !llvm.ptr + %2578 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2580 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2581 = llvm.mul %1996, %2580 : !llvm.i64 + %2582 = llvm.add %2579, %2581 : !llvm.i64 + %2583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2584 = llvm.mul %608, %2583 : !llvm.i64 + %2585 = llvm.add %2582, %2584 : !llvm.i64 + %2586 = llvm.getelementptr %2578[%2585] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2587 = llvm.load %2586 : !llvm.ptr + %2588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2591 = llvm.mul %1996, %2590 : !llvm.i64 + %2592 = llvm.add %2589, %2591 : !llvm.i64 + %2593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2594 = llvm.mul %608, %2593 : !llvm.i64 + %2595 = llvm.add %2592, %2594 : !llvm.i64 + %2596 = llvm.getelementptr %2588[%2595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2587, %2596 : !llvm.ptr + %2597 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2599 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2600 = llvm.mul %1996, 
%2599 : !llvm.i64 + %2601 = llvm.add %2598, %2600 : !llvm.i64 + %2602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2603 = llvm.mul %59, %2602 : !llvm.i64 + %2604 = llvm.add %2601, %2603 : !llvm.i64 + %2605 = llvm.getelementptr %2597[%2604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2606 = llvm.load %2605 : !llvm.ptr + %2607 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2608 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2610 = llvm.mul %59, %2609 : !llvm.i64 + %2611 = llvm.add %2608, %2610 : !llvm.i64 + %2612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2613 = llvm.mul %669, %2612 : !llvm.i64 + %2614 = llvm.add %2611, %2613 : !llvm.i64 + %2615 = llvm.getelementptr %2607[%2614] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2616 = llvm.load %2615 : !llvm.ptr + %2617 = llvm.fmul %2606, %2616 {RelaxedPrecision} : !llvm.float + %2618 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2620 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2621 = llvm.mul %1996, %2620 : !llvm.i64 + %2622 = llvm.add %2619, %2621 : !llvm.i64 + %2623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2624 = llvm.mul %669, %2623 : !llvm.i64 + %2625 = llvm.add %2622, %2624 : !llvm.i64 + %2626 = llvm.getelementptr %2618[%2625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2627 = llvm.load %2626 : !llvm.ptr + %2628 = llvm.fadd %2627, %2617 {RelaxedPrecision} : !llvm.float + %2629 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2630 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2631 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2632 = llvm.mul %1996, %2631 : !llvm.i64 + %2633 = llvm.add %2630, %2632 : !llvm.i64 + %2634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2635 = llvm.mul %669, %2634 : !llvm.i64 + %2636 = llvm.add %2633, %2635 : !llvm.i64 + %2637 = llvm.getelementptr %2629[%2636] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2628, %2637 : !llvm.ptr + %2638 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2639 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2640 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2641 = llvm.mul %1996, %2640 : !llvm.i64 + %2642 = llvm.add %2639, %2641 : !llvm.i64 + %2643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2644 = llvm.mul %669, %2643 : !llvm.i64 + %2645 = llvm.add %2642, %2644 : !llvm.i64 + %2646 = llvm.getelementptr %2638[%2645] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2647 = llvm.load %2646 : !llvm.ptr + %2648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2651 = llvm.mul %1996, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2654 = llvm.mul %669, %2653 : !llvm.i64 + %2655 = llvm.add %2652, %2654 : !llvm.i64 + %2656 = llvm.getelementptr %2648[%2655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2647, %2656 : !llvm.ptr + %2657 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2659 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2660 = llvm.mul %1996, %2659 : !llvm.i64 + %2661 = llvm.add %2658, %2660 : !llvm.i64 + %2662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2663 = 
llvm.mul %59, %2662 : !llvm.i64 + %2664 = llvm.add %2661, %2663 : !llvm.i64 + %2665 = llvm.getelementptr %2657[%2664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2666 = llvm.load %2665 : !llvm.ptr + %2667 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2668 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2670 = llvm.mul %59, %2669 : !llvm.i64 + %2671 = llvm.add %2668, %2670 : !llvm.i64 + %2672 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2673 = llvm.mul %730, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.getelementptr %2667[%2674] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2676 = llvm.load %2675 : !llvm.ptr + %2677 = llvm.fmul %2666, %2676 {RelaxedPrecision} : !llvm.float + %2678 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2680 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2681 = llvm.mul %1996, %2680 : !llvm.i64 + %2682 = llvm.add %2679, %2681 : !llvm.i64 + %2683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2684 = llvm.mul %730, %2683 : !llvm.i64 + %2685 = llvm.add %2682, %2684 : !llvm.i64 + %2686 = llvm.getelementptr %2678[%2685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2687 = llvm.load %2686 : !llvm.ptr + %2688 = llvm.fadd %2687, %2677 {RelaxedPrecision} : !llvm.float + %2689 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2690 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2691 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2692 = llvm.mul %1996, %2691 : !llvm.i64 + %2693 = llvm.add %2690, %2692 : !llvm.i64 + %2694 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2695 = llvm.mul %730, %2694 : !llvm.i64 + %2696 = llvm.add %2693, %2695 : !llvm.i64 + %2697 = llvm.getelementptr %2689[%2696] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2688, %2697 : !llvm.ptr + %2698 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2701 = llvm.mul %1996, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2704 = llvm.mul %730, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.getelementptr %2698[%2705] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2707 = llvm.load %2706 : !llvm.ptr + %2708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2711 = llvm.mul %1996, %2710 : !llvm.i64 + %2712 = llvm.add %2709, %2711 : !llvm.i64 + %2713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2714 = llvm.mul %730, %2713 : !llvm.i64 + %2715 = llvm.add %2712, %2714 : !llvm.i64 + %2716 = llvm.getelementptr %2708[%2715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2707, %2716 : !llvm.ptr + %2717 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2718 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2719 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2720 = llvm.mul %1996, %2719 : !llvm.i64 + %2721 = llvm.add %2718, %2720 : !llvm.i64 + %2722 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2723 = llvm.mul %59, %2722 : !llvm.i64 + %2724 = llvm.add %2721, %2723 : !llvm.i64 + %2725 = llvm.getelementptr %2717[%2724] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2726 = llvm.load %2725 : !llvm.ptr + %2727 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2729 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2730 = llvm.mul %59, %2729 : !llvm.i64 + %2731 = llvm.add %2728, %2730 : !llvm.i64 + %2732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2733 = llvm.mul %791, %2732 : !llvm.i64 + %2734 = llvm.add %2731, %2733 : !llvm.i64 + %2735 = llvm.getelementptr %2727[%2734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2736 = llvm.load %2735 : !llvm.ptr + %2737 = llvm.fmul %2726, %2736 {RelaxedPrecision} : !llvm.float + %2738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2741 = llvm.mul %1996, %2740 : !llvm.i64 + %2742 = llvm.add %2739, %2741 : !llvm.i64 + %2743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2744 = llvm.mul %791, %2743 : !llvm.i64 + %2745 = llvm.add %2742, %2744 : !llvm.i64 + %2746 = llvm.getelementptr %2738[%2745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2747 = llvm.load %2746 : !llvm.ptr + %2748 = llvm.fadd %2747, %2737 {RelaxedPrecision} : !llvm.float + %2749 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2750 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2751 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2752 = llvm.mul %1996, %2751 : !llvm.i64 + %2753 = llvm.add %2750, %2752 : !llvm.i64 + %2754 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2755 = llvm.mul %791, %2754 : !llvm.i64 + %2756 = llvm.add %2753, %2755 : !llvm.i64 + %2757 = llvm.getelementptr %2749[%2756] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2748, %2757 : !llvm.ptr + %2758 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2759 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2760 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2761 = llvm.mul %1996, %2760 : !llvm.i64 + %2762 = llvm.add %2759, %2761 : !llvm.i64 + %2763 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2764 = llvm.mul %791, %2763 : !llvm.i64 + %2765 = llvm.add %2762, %2764 : !llvm.i64 + %2766 = llvm.getelementptr %2758[%2765] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2767 = llvm.load %2766 : !llvm.ptr + %2768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2771 = llvm.mul %1996, %2770 : !llvm.i64 + %2772 = llvm.add %2769, %2771 : !llvm.i64 + %2773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2774 = llvm.mul %791, %2773 : !llvm.i64 + %2775 = llvm.add %2772, %2774 : !llvm.i64 + %2776 = llvm.getelementptr %2768[%2775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2767, %2776 : !llvm.ptr + %2777 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2779 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2780 = llvm.mul %1996, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = llvm.mul %59, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2777[%2784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2786 = llvm.load %2785 : !llvm.ptr + %2787 = llvm.extractvalue %15[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2788 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2789 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2790 = llvm.mul %59, %2789 : !llvm.i64 + %2791 = llvm.add %2788, %2790 : !llvm.i64 + %2792 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2793 = llvm.mul %852, %2792 : !llvm.i64 + %2794 = llvm.add %2791, %2793 : !llvm.i64 + %2795 = llvm.getelementptr %2787[%2794] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2796 = llvm.load %2795 : !llvm.ptr + %2797 = llvm.fmul %2786, %2796 {RelaxedPrecision} : !llvm.float + %2798 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2800 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2801 = llvm.mul %1996, %2800 : !llvm.i64 + %2802 = llvm.add %2799, %2801 : !llvm.i64 + %2803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2804 = llvm.mul %852, %2803 : !llvm.i64 + %2805 = llvm.add %2802, %2804 : !llvm.i64 + %2806 = llvm.getelementptr %2798[%2805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2807 = llvm.load %2806 : !llvm.ptr + %2808 = llvm.fadd %2807, %2797 {RelaxedPrecision} : !llvm.float + %2809 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2810 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2811 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2812 = llvm.mul %1996, %2811 : !llvm.i64 + %2813 = llvm.add %2810, %2812 : !llvm.i64 + %2814 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2815 = llvm.mul %852, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = llvm.getelementptr %2809[%2816] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2808, %2817 : !llvm.ptr + %2818 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2819 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2820 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2821 = llvm.mul %1996, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2824 = llvm.mul %852, %2823 : !llvm.i64 + %2825 = llvm.add %2822, %2824 : !llvm.i64 + %2826 = llvm.getelementptr %2818[%2825] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2827 = llvm.load %2826 : !llvm.ptr + %2828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2831 = llvm.mul %1996, %2830 : !llvm.i64 + %2832 = llvm.add %2829, %2831 : !llvm.i64 + %2833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2834 = llvm.mul %852, %2833 : !llvm.i64 + %2835 = llvm.add %2832, %2834 : !llvm.i64 + %2836 = llvm.getelementptr %2828[%2835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2827, %2836 : !llvm.ptr + %2837 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2839 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2840 = llvm.mul %1996, %2839 : !llvm.i64 + %2841 = llvm.add %2838, %2840 : !llvm.i64 + %2842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2843 = llvm.mul %59, %2842 : !llvm.i64 + %2844 = llvm.add %2841, %2843 : !llvm.i64 + %2845 = llvm.getelementptr %2837[%2844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2846 = llvm.load %2845 : !llvm.ptr + %2847 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2848 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2849 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %2850 = llvm.mul %59, %2849 : !llvm.i64 + %2851 = llvm.add %2848, %2850 : !llvm.i64 + %2852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2853 = llvm.mul %913, %2852 : !llvm.i64 + %2854 = llvm.add %2851, %2853 : !llvm.i64 + %2855 = llvm.getelementptr %2847[%2854] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2856 = llvm.load %2855 : !llvm.ptr + %2857 = llvm.fmul %2846, %2856 {RelaxedPrecision} : !llvm.float + %2858 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2860 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2861 = llvm.mul %1996, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2864 = llvm.mul %913, %2863 : !llvm.i64 + %2865 = llvm.add %2862, %2864 : !llvm.i64 + %2866 = llvm.getelementptr %2858[%2865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2867 = llvm.load %2866 : !llvm.ptr + %2868 = llvm.fadd %2867, %2857 {RelaxedPrecision} : !llvm.float + %2869 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2871 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2872 = llvm.mul %1996, %2871 : !llvm.i64 + %2873 = llvm.add %2870, %2872 : !llvm.i64 + %2874 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2875 = llvm.mul %913, %2874 : !llvm.i64 + %2876 = llvm.add %2873, %2875 : !llvm.i64 + %2877 = llvm.getelementptr %2869[%2876] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2868, %2877 : !llvm.ptr + %2878 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2880 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2881 = llvm.mul %1996, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2884 = llvm.mul %913, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.getelementptr %2878[%2885] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2887 = llvm.load %2886 : !llvm.ptr + %2888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2891 = llvm.mul %1996, %2890 : !llvm.i64 + %2892 = llvm.add %2889, %2891 : !llvm.i64 + %2893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2894 = llvm.mul %913, %2893 : !llvm.i64 + %2895 = llvm.add %2892, %2894 : !llvm.i64 + %2896 = llvm.getelementptr %2888[%2895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2887, %2896 : !llvm.ptr + %2897 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2899 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2900 = llvm.mul %1996, %2899 : !llvm.i64 + %2901 = llvm.add %2898, %2900 : !llvm.i64 + %2902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2903 = llvm.mul %59, %2902 : !llvm.i64 + %2904 = llvm.add %2901, %2903 : !llvm.i64 + %2905 = llvm.getelementptr %2897[%2904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2906 = llvm.load %2905 : !llvm.ptr + %2907 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2908 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2909 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2910 = llvm.mul %59, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : 
!llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %974, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2907[%2914] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2916 = llvm.load %2915 : !llvm.ptr + %2917 = llvm.fmul %2906, %2916 {RelaxedPrecision} : !llvm.float + %2918 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2920 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2921 = llvm.mul %1996, %2920 : !llvm.i64 + %2922 = llvm.add %2919, %2921 : !llvm.i64 + %2923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2924 = llvm.mul %974, %2923 : !llvm.i64 + %2925 = llvm.add %2922, %2924 : !llvm.i64 + %2926 = llvm.getelementptr %2918[%2925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2927 = llvm.load %2926 : !llvm.ptr + %2928 = llvm.fadd %2927, %2917 {RelaxedPrecision} : !llvm.float + %2929 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2930 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2931 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2932 = llvm.mul %1996, %2931 : !llvm.i64 + %2933 = llvm.add %2930, %2932 : !llvm.i64 + %2934 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2935 = llvm.mul %974, %2934 : !llvm.i64 + %2936 = llvm.add %2933, %2935 : !llvm.i64 + %2937 = llvm.getelementptr %2929[%2936] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2928, %2937 : !llvm.ptr + %2938 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2940 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2941 = llvm.mul %1996, %2940 : !llvm.i64 + %2942 = llvm.add %2939, %2941 : !llvm.i64 + %2943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2944 = llvm.mul %974, %2943 : !llvm.i64 + %2945 = llvm.add %2942, %2944 : !llvm.i64 + %2946 = llvm.getelementptr %2938[%2945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2947 = llvm.load %2946 : !llvm.ptr + %2948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2951 = llvm.mul %1996, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2954 = llvm.mul %974, %2953 : !llvm.i64 + %2955 = llvm.add %2952, %2954 : !llvm.i64 + %2956 = llvm.getelementptr %2948[%2955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2947, %2956 : !llvm.ptr + %2957 = llvm.add %50, %35 : !llvm.i64 + %2958 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2960 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2961 = llvm.mul %2957, %2960 : !llvm.i64 + %2962 = llvm.add %2959, %2961 : !llvm.i64 + %2963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2964 = llvm.mul %59, %2963 : !llvm.i64 + %2965 = llvm.add %2962, %2964 : !llvm.i64 + %2966 = llvm.getelementptr %2958[%2965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2967 = llvm.load %2966 : !llvm.ptr + %2968 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2971 = llvm.mul %59, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2974 = llvm.mul %58, 
%2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.getelementptr %2968[%2975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2977 = llvm.load %2976 : !llvm.ptr + %2978 = llvm.fmul %2967, %2977 {RelaxedPrecision} : !llvm.float + %2979 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2981 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2982 = llvm.mul %2957, %2981 : !llvm.i64 + %2983 = llvm.add %2980, %2982 : !llvm.i64 + %2984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2985 = llvm.mul %58, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.getelementptr %2979[%2986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2988 = llvm.load %2987 : !llvm.ptr + %2989 = llvm.fadd %2988, %2978 {RelaxedPrecision} : !llvm.float + %2990 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2992 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2993 = llvm.mul %2957, %2992 : !llvm.i64 + %2994 = llvm.add %2991, %2993 : !llvm.i64 + %2995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2996 = llvm.mul %58, %2995 : !llvm.i64 + %2997 = llvm.add %2994, %2996 : !llvm.i64 + %2998 = llvm.getelementptr %2990[%2997] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2989, %2998 : !llvm.ptr + %2999 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3000 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3001 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3002 = llvm.mul %2957, %3001 : !llvm.i64 + %3003 = llvm.add %3000, %3002 : !llvm.i64 + %3004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3005 = llvm.mul %58, %3004 : !llvm.i64 + %3006 = llvm.add %3003, %3005 : !llvm.i64 + %3007 = llvm.getelementptr %2999[%3006] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3008 = llvm.load %3007 : !llvm.ptr + %3009 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3010 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3011 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3012 = llvm.mul %2957, %3011 : !llvm.i64 + %3013 = llvm.add %3010, %3012 : !llvm.i64 + %3014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3015 = llvm.mul %58, %3014 : !llvm.i64 + %3016 = llvm.add %3013, %3015 : !llvm.i64 + %3017 = llvm.getelementptr %3009[%3016] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3008, %3017 : !llvm.ptr + %3018 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3020 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3021 = llvm.mul %2957, %3020 : !llvm.i64 + %3022 = llvm.add %3019, %3021 : !llvm.i64 + %3023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3024 = llvm.mul %59, %3023 : !llvm.i64 + %3025 = llvm.add %3022, %3024 : !llvm.i64 + %3026 = llvm.getelementptr %3018[%3025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3027 = llvm.load %3026 : !llvm.ptr + %3028 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3031 = llvm.mul %59, %3030 : !llvm.i64 + %3032 = llvm.add %3029, %3031 : !llvm.i64 + %3033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3034 = llvm.mul %120, %3033 : !llvm.i64 + %3035 = llvm.add %3032, %3034 : !llvm.i64 + %3036 = llvm.getelementptr %3028[%3035] : (!llvm.ptr, !llvm.i64) 
-> !llvm.ptr + %3037 = llvm.load %3036 : !llvm.ptr + %3038 = llvm.fmul %3027, %3037 {RelaxedPrecision} : !llvm.float + %3039 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3041 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3042 = llvm.mul %2957, %3041 : !llvm.i64 + %3043 = llvm.add %3040, %3042 : !llvm.i64 + %3044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3045 = llvm.mul %120, %3044 : !llvm.i64 + %3046 = llvm.add %3043, %3045 : !llvm.i64 + %3047 = llvm.getelementptr %3039[%3046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3048 = llvm.load %3047 : !llvm.ptr + %3049 = llvm.fadd %3048, %3038 {RelaxedPrecision} : !llvm.float + %3050 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3051 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3052 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3053 = llvm.mul %2957, %3052 : !llvm.i64 + %3054 = llvm.add %3051, %3053 : !llvm.i64 + %3055 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3056 = llvm.mul %120, %3055 : !llvm.i64 + %3057 = llvm.add %3054, %3056 : !llvm.i64 + %3058 = llvm.getelementptr %3050[%3057] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3049, %3058 : !llvm.ptr + %3059 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3060 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3061 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3062 = llvm.mul %2957, %3061 : !llvm.i64 + %3063 = llvm.add %3060, %3062 : !llvm.i64 + %3064 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3065 = llvm.mul %120, %3064 : !llvm.i64 + %3066 = llvm.add %3063, %3065 : !llvm.i64 + %3067 = llvm.getelementptr %3059[%3066] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3068 = llvm.load %3067 : !llvm.ptr + %3069 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3070 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3071 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3072 = llvm.mul %2957, %3071 : !llvm.i64 + %3073 = llvm.add %3070, %3072 : !llvm.i64 + %3074 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3075 = llvm.mul %120, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.getelementptr %3069[%3076] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3068, %3077 : !llvm.ptr + %3078 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3080 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3081 = llvm.mul %2957, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3084 = llvm.mul %59, %3083 : !llvm.i64 + %3085 = llvm.add %3082, %3084 : !llvm.i64 + %3086 = llvm.getelementptr %3078[%3085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3087 = llvm.load %3086 : !llvm.ptr + %3088 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3091 = llvm.mul %59, %3090 : !llvm.i64 + %3092 = llvm.add %3089, %3091 : !llvm.i64 + %3093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3094 = llvm.mul %181, %3093 : !llvm.i64 + %3095 = llvm.add %3092, %3094 : !llvm.i64 + %3096 = llvm.getelementptr %3088[%3095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3097 = llvm.load %3096 : !llvm.ptr + %3098 = llvm.fmul %3087, %3097 {RelaxedPrecision} : !llvm.float + %3099 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3101 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3102 = llvm.mul %2957, %3101 : !llvm.i64 + %3103 = llvm.add %3100, %3102 : !llvm.i64 + %3104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3105 = llvm.mul %181, %3104 : !llvm.i64 + %3106 = llvm.add %3103, %3105 : !llvm.i64 + %3107 = llvm.getelementptr %3099[%3106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3108 = llvm.load %3107 : !llvm.ptr + %3109 = llvm.fadd %3108, %3098 {RelaxedPrecision} : !llvm.float + %3110 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3111 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3112 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3113 = llvm.mul %2957, %3112 : !llvm.i64 + %3114 = llvm.add %3111, %3113 : !llvm.i64 + %3115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3116 = llvm.mul %181, %3115 : !llvm.i64 + %3117 = llvm.add %3114, %3116 : !llvm.i64 + %3118 = llvm.getelementptr %3110[%3117] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3109, %3118 : !llvm.ptr + %3119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3121 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3122 = llvm.mul %2957, %3121 : !llvm.i64 + %3123 = llvm.add %3120, %3122 : !llvm.i64 + %3124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3125 = llvm.mul %181, %3124 : !llvm.i64 + %3126 = llvm.add %3123, %3125 : !llvm.i64 + %3127 = llvm.getelementptr %3119[%3126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3128 = llvm.load %3127 : !llvm.ptr + %3129 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3130 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3131 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3132 = llvm.mul %2957, %3131 : !llvm.i64 + %3133 = llvm.add %3130, %3132 : !llvm.i64 + %3134 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3135 = llvm.mul %181, %3134 : !llvm.i64 + %3136 = llvm.add %3133, %3135 : !llvm.i64 + %3137 = llvm.getelementptr %3129[%3136] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3128, %3137 : !llvm.ptr + %3138 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3141 = llvm.mul %2957, %3140 : !llvm.i64 + %3142 = llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3144 = llvm.mul %59, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.getelementptr %3138[%3145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3147 = llvm.load %3146 : !llvm.ptr + %3148 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3151 = llvm.mul %59, %3150 : !llvm.i64 + %3152 = llvm.add %3149, %3151 : !llvm.i64 + %3153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3154 = llvm.mul %242, %3153 : !llvm.i64 + %3155 = llvm.add %3152, %3154 : !llvm.i64 + %3156 = llvm.getelementptr %3148[%3155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3157 = llvm.load %3156 : !llvm.ptr + %3158 = llvm.fmul %3147, %3157 {RelaxedPrecision} : !llvm.float + %3159 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3160 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %3161 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3162 = llvm.mul %2957, %3161 : !llvm.i64 + %3163 = llvm.add %3160, %3162 : !llvm.i64 + %3164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3165 = llvm.mul %242, %3164 : !llvm.i64 + %3166 = llvm.add %3163, %3165 : !llvm.i64 + %3167 = llvm.getelementptr %3159[%3166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3168 = llvm.load %3167 : !llvm.ptr + %3169 = llvm.fadd %3168, %3158 {RelaxedPrecision} : !llvm.float + %3170 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3173 = llvm.mul %2957, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %242, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3169, %3178 : !llvm.ptr + %3179 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3182 = llvm.mul %2957, %3181 : !llvm.i64 + %3183 = llvm.add %3180, %3182 : !llvm.i64 + %3184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3185 = llvm.mul %242, %3184 : !llvm.i64 + %3186 = llvm.add %3183, %3185 : !llvm.i64 + %3187 = llvm.getelementptr %3179[%3186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3188 = llvm.load %3187 : !llvm.ptr + %3189 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3191 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3192 = llvm.mul %2957, %3191 : !llvm.i64 + %3193 = llvm.add %3190, %3192 : !llvm.i64 + %3194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3195 = llvm.mul %242, %3194 : !llvm.i64 + %3196 = llvm.add %3193, %3195 : !llvm.i64 + %3197 = llvm.getelementptr %3189[%3196] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3188, %3197 : !llvm.ptr + %3198 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3200 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3201 = llvm.mul %2957, %3200 : !llvm.i64 + %3202 = llvm.add %3199, %3201 : !llvm.i64 + %3203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3204 = llvm.mul %59, %3203 : !llvm.i64 + %3205 = llvm.add %3202, %3204 : !llvm.i64 + %3206 = llvm.getelementptr %3198[%3205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3207 = llvm.load %3206 : !llvm.ptr + %3208 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3211 = llvm.mul %59, %3210 : !llvm.i64 + %3212 = llvm.add %3209, %3211 : !llvm.i64 + %3213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3214 = llvm.mul %303, %3213 : !llvm.i64 + %3215 = llvm.add %3212, %3214 : !llvm.i64 + %3216 = llvm.getelementptr %3208[%3215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3217 = llvm.load %3216 : !llvm.ptr + %3218 = llvm.fmul %3207, %3217 {RelaxedPrecision} : !llvm.float + %3219 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3221 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3222 = llvm.mul %2957, %3221 : !llvm.i64 + %3223 = 
llvm.add %3220, %3222 : !llvm.i64 + %3224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3225 = llvm.mul %303, %3224 : !llvm.i64 + %3226 = llvm.add %3223, %3225 : !llvm.i64 + %3227 = llvm.getelementptr %3219[%3226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3228 = llvm.load %3227 : !llvm.ptr + %3229 = llvm.fadd %3228, %3218 {RelaxedPrecision} : !llvm.float + %3230 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3232 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3233 = llvm.mul %2957, %3232 : !llvm.i64 + %3234 = llvm.add %3231, %3233 : !llvm.i64 + %3235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3236 = llvm.mul %303, %3235 : !llvm.i64 + %3237 = llvm.add %3234, %3236 : !llvm.i64 + %3238 = llvm.getelementptr %3230[%3237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3229, %3238 : !llvm.ptr + %3239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3242 = llvm.mul %2957, %3241 : !llvm.i64 + %3243 = llvm.add %3240, %3242 : !llvm.i64 + %3244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3245 = llvm.mul %303, %3244 : !llvm.i64 + %3246 = llvm.add %3243, %3245 : !llvm.i64 + %3247 = llvm.getelementptr %3239[%3246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3248 = llvm.load %3247 : !llvm.ptr + %3249 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3250 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3251 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3252 = llvm.mul %2957, %3251 : !llvm.i64 + %3253 = llvm.add %3250, %3252 : !llvm.i64 + %3254 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3255 = llvm.mul %303, %3254 : !llvm.i64 + %3256 = llvm.add %3253, %3255 : !llvm.i64 + %3257 = llvm.getelementptr %3249[%3256] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3248, %3257 : !llvm.ptr + %3258 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3260 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3261 = llvm.mul %2957, %3260 : !llvm.i64 + %3262 = llvm.add %3259, %3261 : !llvm.i64 + %3263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3264 = llvm.mul %59, %3263 : !llvm.i64 + %3265 = llvm.add %3262, %3264 : !llvm.i64 + %3266 = llvm.getelementptr %3258[%3265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3267 = llvm.load %3266 : !llvm.ptr + %3268 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3271 = llvm.mul %59, %3270 : !llvm.i64 + %3272 = llvm.add %3269, %3271 : !llvm.i64 + %3273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3274 = llvm.mul %364, %3273 : !llvm.i64 + %3275 = llvm.add %3272, %3274 : !llvm.i64 + %3276 = llvm.getelementptr %3268[%3275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3277 = llvm.load %3276 : !llvm.ptr + %3278 = llvm.fmul %3267, %3277 {RelaxedPrecision} : !llvm.float + %3279 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3281 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3282 = llvm.mul %2957, %3281 : !llvm.i64 + %3283 = llvm.add %3280, %3282 : !llvm.i64 + %3284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3285 = llvm.mul %364, %3284 : 
!llvm.i64 + %3286 = llvm.add %3283, %3285 : !llvm.i64 + %3287 = llvm.getelementptr %3279[%3286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3288 = llvm.load %3287 : !llvm.ptr + %3289 = llvm.fadd %3288, %3278 {RelaxedPrecision} : !llvm.float + %3290 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3291 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3292 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3293 = llvm.mul %2957, %3292 : !llvm.i64 + %3294 = llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3296 = llvm.mul %364, %3295 : !llvm.i64 + %3297 = llvm.add %3294, %3296 : !llvm.i64 + %3298 = llvm.getelementptr %3290[%3297] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3289, %3298 : !llvm.ptr + %3299 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3300 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3301 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3302 = llvm.mul %2957, %3301 : !llvm.i64 + %3303 = llvm.add %3300, %3302 : !llvm.i64 + %3304 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3305 = llvm.mul %364, %3304 : !llvm.i64 + %3306 = llvm.add %3303, %3305 : !llvm.i64 + %3307 = llvm.getelementptr %3299[%3306] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3308 = llvm.load %3307 : !llvm.ptr + %3309 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3310 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3311 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3312 = llvm.mul %2957, %3311 : !llvm.i64 + %3313 = llvm.add %3310, %3312 : !llvm.i64 + %3314 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3315 = llvm.mul %364, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.getelementptr %3309[%3316] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3308, %3317 : !llvm.ptr + %3318 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3320 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3321 = llvm.mul %2957, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3324 = llvm.mul %59, %3323 : !llvm.i64 + %3325 = llvm.add %3322, %3324 : !llvm.i64 + %3326 = llvm.getelementptr %3318[%3325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3327 = llvm.load %3326 : !llvm.ptr + %3328 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3331 = llvm.mul %59, %3330 : !llvm.i64 + %3332 = llvm.add %3329, %3331 : !llvm.i64 + %3333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3334 = llvm.mul %425, %3333 : !llvm.i64 + %3335 = llvm.add %3332, %3334 : !llvm.i64 + %3336 = llvm.getelementptr %3328[%3335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3337 = llvm.load %3336 : !llvm.ptr + %3338 = llvm.fmul %3327, %3337 {RelaxedPrecision} : !llvm.float + %3339 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3341 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3342 = llvm.mul %2957, %3341 : !llvm.i64 + %3343 = llvm.add %3340, %3342 : !llvm.i64 + %3344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3345 = llvm.mul %425, %3344 : !llvm.i64 + %3346 = llvm.add %3343, %3345 : !llvm.i64 + %3347 = llvm.getelementptr %3339[%3346] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %3348 = llvm.load %3347 : !llvm.ptr + %3349 = llvm.fadd %3348, %3338 {RelaxedPrecision} : !llvm.float + %3350 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3351 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3352 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3353 = llvm.mul %2957, %3352 : !llvm.i64 + %3354 = llvm.add %3351, %3353 : !llvm.i64 + %3355 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3356 = llvm.mul %425, %3355 : !llvm.i64 + %3357 = llvm.add %3354, %3356 : !llvm.i64 + %3358 = llvm.getelementptr %3350[%3357] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3349, %3358 : !llvm.ptr + %3359 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3360 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3361 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3362 = llvm.mul %2957, %3361 : !llvm.i64 + %3363 = llvm.add %3360, %3362 : !llvm.i64 + %3364 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3365 = llvm.mul %425, %3364 : !llvm.i64 + %3366 = llvm.add %3363, %3365 : !llvm.i64 + %3367 = llvm.getelementptr %3359[%3366] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3368 = llvm.load %3367 : !llvm.ptr + %3369 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3370 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3371 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3372 = llvm.mul %2957, %3371 : !llvm.i64 + %3373 = llvm.add %3370, %3372 : !llvm.i64 + %3374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3375 = llvm.mul %425, %3374 : !llvm.i64 + %3376 = llvm.add %3373, %3375 : !llvm.i64 + %3377 = llvm.getelementptr %3369[%3376] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3368, %3377 : !llvm.ptr + %3378 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3380 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3381 = llvm.mul %2957, %3380 : !llvm.i64 + %3382 = llvm.add %3379, %3381 : !llvm.i64 + %3383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3384 = llvm.mul %59, %3383 : !llvm.i64 + %3385 = llvm.add %3382, %3384 : !llvm.i64 + %3386 = llvm.getelementptr %3378[%3385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3387 = llvm.load %3386 : !llvm.ptr + %3388 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3391 = llvm.mul %59, %3390 : !llvm.i64 + %3392 = llvm.add %3389, %3391 : !llvm.i64 + %3393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3394 = llvm.mul %486, %3393 : !llvm.i64 + %3395 = llvm.add %3392, %3394 : !llvm.i64 + %3396 = llvm.getelementptr %3388[%3395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3397 = llvm.load %3396 : !llvm.ptr + %3398 = llvm.fmul %3387, %3397 {RelaxedPrecision} : !llvm.float + %3399 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3401 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3402 = llvm.mul %2957, %3401 : !llvm.i64 + %3403 = llvm.add %3400, %3402 : !llvm.i64 + %3404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3405 = llvm.mul %486, %3404 : !llvm.i64 + %3406 = llvm.add %3403, %3405 : !llvm.i64 + %3407 = llvm.getelementptr %3399[%3406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3408 = llvm.load %3407 : !llvm.ptr + %3409 = llvm.fadd %3408, %3398 {RelaxedPrecision} : !llvm.float + %3410 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3413 = llvm.mul %2957, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3416 = llvm.mul %486, %3415 : !llvm.i64 + %3417 = llvm.add %3414, %3416 : !llvm.i64 + %3418 = llvm.getelementptr %3410[%3417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3409, %3418 : !llvm.ptr + %3419 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3422 = llvm.mul %2957, %3421 : !llvm.i64 + %3423 = llvm.add %3420, %3422 : !llvm.i64 + %3424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3425 = llvm.mul %486, %3424 : !llvm.i64 + %3426 = llvm.add %3423, %3425 : !llvm.i64 + %3427 = llvm.getelementptr %3419[%3426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3428 = llvm.load %3427 : !llvm.ptr + %3429 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3430 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3431 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3432 = llvm.mul %2957, %3431 : !llvm.i64 + %3433 = llvm.add %3430, %3432 : !llvm.i64 + %3434 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3435 = llvm.mul %486, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.getelementptr %3429[%3436] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3428, %3437 : !llvm.ptr + %3438 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3440 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3441 = llvm.mul %2957, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3444 = llvm.mul %59, %3443 : !llvm.i64 + %3445 = llvm.add %3442, %3444 : !llvm.i64 + %3446 = llvm.getelementptr %3438[%3445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3447 = llvm.load %3446 : !llvm.ptr + %3448 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3451 = llvm.mul %59, %3450 : !llvm.i64 + %3452 = llvm.add %3449, %3451 : !llvm.i64 + %3453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3454 = llvm.mul %547, %3453 : !llvm.i64 + %3455 = llvm.add %3452, %3454 : !llvm.i64 + %3456 = llvm.getelementptr %3448[%3455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3457 = llvm.load %3456 : !llvm.ptr + %3458 = llvm.fmul %3447, %3457 {RelaxedPrecision} : !llvm.float + %3459 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3461 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3462 = llvm.mul %2957, %3461 : !llvm.i64 + %3463 = llvm.add %3460, %3462 : !llvm.i64 + %3464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3465 = llvm.mul %547, %3464 : !llvm.i64 + %3466 = llvm.add %3463, %3465 : !llvm.i64 + %3467 = llvm.getelementptr %3459[%3466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3468 = llvm.load %3467 : !llvm.ptr + %3469 = llvm.fadd %3468, %3458 {RelaxedPrecision} : !llvm.float + %3470 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3471 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %3472 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3473 = llvm.mul %2957, %3472 : !llvm.i64 + %3474 = llvm.add %3471, %3473 : !llvm.i64 + %3475 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3476 = llvm.mul %547, %3475 : !llvm.i64 + %3477 = llvm.add %3474, %3476 : !llvm.i64 + %3478 = llvm.getelementptr %3470[%3477] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3469, %3478 : !llvm.ptr + %3479 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3480 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3481 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3482 = llvm.mul %2957, %3481 : !llvm.i64 + %3483 = llvm.add %3480, %3482 : !llvm.i64 + %3484 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3485 = llvm.mul %547, %3484 : !llvm.i64 + %3486 = llvm.add %3483, %3485 : !llvm.i64 + %3487 = llvm.getelementptr %3479[%3486] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3488 = llvm.load %3487 : !llvm.ptr + %3489 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3491 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3492 = llvm.mul %2957, %3491 : !llvm.i64 + %3493 = llvm.add %3490, %3492 : !llvm.i64 + %3494 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3495 = llvm.mul %547, %3494 : !llvm.i64 + %3496 = llvm.add %3493, %3495 : !llvm.i64 + %3497 = llvm.getelementptr %3489[%3496] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3488, %3497 : !llvm.ptr + %3498 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3500 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3501 = llvm.mul %2957, %3500 : !llvm.i64 + %3502 = llvm.add %3499, %3501 : !llvm.i64 + %3503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3504 = llvm.mul %59, %3503 : !llvm.i64 + %3505 = llvm.add %3502, %3504 : !llvm.i64 + %3506 = llvm.getelementptr %3498[%3505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3507 = llvm.load %3506 : !llvm.ptr + %3508 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3511 = llvm.mul %59, %3510 : !llvm.i64 + %3512 = llvm.add %3509, %3511 : !llvm.i64 + %3513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3514 = llvm.mul %608, %3513 : !llvm.i64 + %3515 = llvm.add %3512, %3514 : !llvm.i64 + %3516 = llvm.getelementptr %3508[%3515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3517 = llvm.load %3516 : !llvm.ptr + %3518 = llvm.fmul %3507, %3517 {RelaxedPrecision} : !llvm.float + %3519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3522 = llvm.mul %2957, %3521 : !llvm.i64 + %3523 = llvm.add %3520, %3522 : !llvm.i64 + %3524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3525 = llvm.mul %608, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.getelementptr %3519[%3526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3528 = llvm.load %3527 : !llvm.ptr + %3529 = llvm.fadd %3528, %3518 {RelaxedPrecision} : !llvm.float + %3530 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3531 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3532 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3533 = llvm.mul %2957, %3532 : !llvm.i64 + %3534 = 
llvm.add %3531, %3533 : !llvm.i64 + %3535 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3536 = llvm.mul %608, %3535 : !llvm.i64 + %3537 = llvm.add %3534, %3536 : !llvm.i64 + %3538 = llvm.getelementptr %3530[%3537] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3529, %3538 : !llvm.ptr + %3539 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3540 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3541 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3542 = llvm.mul %2957, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : !llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %608, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3539[%3546] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3548 = llvm.load %3547 : !llvm.ptr + %3549 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3550 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3551 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3552 = llvm.mul %2957, %3551 : !llvm.i64 + %3553 = llvm.add %3550, %3552 : !llvm.i64 + %3554 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3555 = llvm.mul %608, %3554 : !llvm.i64 + %3556 = llvm.add %3553, %3555 : !llvm.i64 + %3557 = llvm.getelementptr %3549[%3556] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3548, %3557 : !llvm.ptr + %3558 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3560 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3561 = llvm.mul %2957, %3560 : !llvm.i64 + %3562 = llvm.add %3559, %3561 : !llvm.i64 + %3563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3564 = llvm.mul %59, %3563 : !llvm.i64 + %3565 = llvm.add %3562, %3564 : !llvm.i64 + %3566 = llvm.getelementptr %3558[%3565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3567 = llvm.load %3566 : !llvm.ptr + %3568 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3571 = llvm.mul %59, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3574 = llvm.mul %669, %3573 : !llvm.i64 + %3575 = llvm.add %3572, %3574 : !llvm.i64 + %3576 = llvm.getelementptr %3568[%3575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3577 = llvm.load %3576 : !llvm.ptr + %3578 = llvm.fmul %3567, %3577 {RelaxedPrecision} : !llvm.float + %3579 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3581 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3582 = llvm.mul %2957, %3581 : !llvm.i64 + %3583 = llvm.add %3580, %3582 : !llvm.i64 + %3584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3585 = llvm.mul %669, %3584 : !llvm.i64 + %3586 = llvm.add %3583, %3585 : !llvm.i64 + %3587 = llvm.getelementptr %3579[%3586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3588 = llvm.load %3587 : !llvm.ptr + %3589 = llvm.fadd %3588, %3578 {RelaxedPrecision} : !llvm.float + %3590 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3591 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3592 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3593 = llvm.mul %2957, %3592 : !llvm.i64 + %3594 = llvm.add %3591, %3593 : !llvm.i64 + %3595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3596 = llvm.mul %669, %3595 : 
!llvm.i64 + %3597 = llvm.add %3594, %3596 : !llvm.i64 + %3598 = llvm.getelementptr %3590[%3597] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3589, %3598 : !llvm.ptr + %3599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3602 = llvm.mul %2957, %3601 : !llvm.i64 + %3603 = llvm.add %3600, %3602 : !llvm.i64 + %3604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3605 = llvm.mul %669, %3604 : !llvm.i64 + %3606 = llvm.add %3603, %3605 : !llvm.i64 + %3607 = llvm.getelementptr %3599[%3606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3608 = llvm.load %3607 : !llvm.ptr + %3609 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3610 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3611 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3612 = llvm.mul %2957, %3611 : !llvm.i64 + %3613 = llvm.add %3610, %3612 : !llvm.i64 + %3614 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3615 = llvm.mul %669, %3614 : !llvm.i64 + %3616 = llvm.add %3613, %3615 : !llvm.i64 + %3617 = llvm.getelementptr %3609[%3616] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3608, %3617 : !llvm.ptr + %3618 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3620 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3621 = llvm.mul %2957, %3620 : !llvm.i64 + %3622 = llvm.add %3619, %3621 : !llvm.i64 + %3623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3624 = llvm.mul %59, %3623 : !llvm.i64 + %3625 = llvm.add %3622, %3624 : !llvm.i64 + %3626 = llvm.getelementptr %3618[%3625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3627 = llvm.load %3626 : !llvm.ptr + %3628 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3631 = llvm.mul %59, %3630 : !llvm.i64 + %3632 = llvm.add %3629, %3631 : !llvm.i64 + %3633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3634 = llvm.mul %730, %3633 : !llvm.i64 + %3635 = llvm.add %3632, %3634 : !llvm.i64 + %3636 = llvm.getelementptr %3628[%3635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3637 = llvm.load %3636 : !llvm.ptr + %3638 = llvm.fmul %3627, %3637 {RelaxedPrecision} : !llvm.float + %3639 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3641 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3642 = llvm.mul %2957, %3641 : !llvm.i64 + %3643 = llvm.add %3640, %3642 : !llvm.i64 + %3644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3645 = llvm.mul %730, %3644 : !llvm.i64 + %3646 = llvm.add %3643, %3645 : !llvm.i64 + %3647 = llvm.getelementptr %3639[%3646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3648 = llvm.load %3647 : !llvm.ptr + %3649 = llvm.fadd %3648, %3638 {RelaxedPrecision} : !llvm.float + %3650 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3651 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3652 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3653 = llvm.mul %2957, %3652 : !llvm.i64 + %3654 = llvm.add %3651, %3653 : !llvm.i64 + %3655 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3656 = llvm.mul %730, %3655 : !llvm.i64 + %3657 = llvm.add %3654, %3656 : !llvm.i64 + %3658 = llvm.getelementptr %3650[%3657] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + llvm.store %3649, %3658 : !llvm.ptr + %3659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3662 = llvm.mul %2957, %3661 : !llvm.i64 + %3663 = llvm.add %3660, %3662 : !llvm.i64 + %3664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3665 = llvm.mul %730, %3664 : !llvm.i64 + %3666 = llvm.add %3663, %3665 : !llvm.i64 + %3667 = llvm.getelementptr %3659[%3666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3668 = llvm.load %3667 : !llvm.ptr + %3669 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3670 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3671 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3672 = llvm.mul %2957, %3671 : !llvm.i64 + %3673 = llvm.add %3670, %3672 : !llvm.i64 + %3674 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3675 = llvm.mul %730, %3674 : !llvm.i64 + %3676 = llvm.add %3673, %3675 : !llvm.i64 + %3677 = llvm.getelementptr %3669[%3676] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3668, %3677 : !llvm.ptr + %3678 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3680 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3681 = llvm.mul %2957, %3680 : !llvm.i64 + %3682 = llvm.add %3679, %3681 : !llvm.i64 + %3683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3684 = llvm.mul %59, %3683 : !llvm.i64 + %3685 = llvm.add %3682, %3684 : !llvm.i64 + %3686 = llvm.getelementptr %3678[%3685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3687 = llvm.load %3686 : !llvm.ptr + %3688 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3691 = llvm.mul %59, %3690 : !llvm.i64 + %3692 = llvm.add %3689, %3691 : !llvm.i64 + %3693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3694 = llvm.mul %791, %3693 : !llvm.i64 + %3695 = llvm.add %3692, %3694 : !llvm.i64 + %3696 = llvm.getelementptr %3688[%3695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3697 = llvm.load %3696 : !llvm.ptr + %3698 = llvm.fmul %3687, %3697 {RelaxedPrecision} : !llvm.float + %3699 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3701 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3702 = llvm.mul %2957, %3701 : !llvm.i64 + %3703 = llvm.add %3700, %3702 : !llvm.i64 + %3704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3705 = llvm.mul %791, %3704 : !llvm.i64 + %3706 = llvm.add %3703, %3705 : !llvm.i64 + %3707 = llvm.getelementptr %3699[%3706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3708 = llvm.load %3707 : !llvm.ptr + %3709 = llvm.fadd %3708, %3698 {RelaxedPrecision} : !llvm.float + %3710 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3711 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3712 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3713 = llvm.mul %2957, %3712 : !llvm.i64 + %3714 = llvm.add %3711, %3713 : !llvm.i64 + %3715 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3716 = llvm.mul %791, %3715 : !llvm.i64 + %3717 = llvm.add %3714, %3716 : !llvm.i64 + %3718 = llvm.getelementptr %3710[%3717] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3709, %3718 : !llvm.ptr + %3719 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %3720 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3721 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3722 = llvm.mul %2957, %3721 : !llvm.i64 + %3723 = llvm.add %3720, %3722 : !llvm.i64 + %3724 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3725 = llvm.mul %791, %3724 : !llvm.i64 + %3726 = llvm.add %3723, %3725 : !llvm.i64 + %3727 = llvm.getelementptr %3719[%3726] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3728 = llvm.load %3727 : !llvm.ptr + %3729 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3730 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3731 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3732 = llvm.mul %2957, %3731 : !llvm.i64 + %3733 = llvm.add %3730, %3732 : !llvm.i64 + %3734 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3735 = llvm.mul %791, %3734 : !llvm.i64 + %3736 = llvm.add %3733, %3735 : !llvm.i64 + %3737 = llvm.getelementptr %3729[%3736] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3728, %3737 : !llvm.ptr + %3738 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3740 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3741 = llvm.mul %2957, %3740 : !llvm.i64 + %3742 = llvm.add %3739, %3741 : !llvm.i64 + %3743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3744 = llvm.mul %59, %3743 : !llvm.i64 + %3745 = llvm.add %3742, %3744 : !llvm.i64 + %3746 = llvm.getelementptr %3738[%3745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3747 = llvm.load %3746 : !llvm.ptr + %3748 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3750 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3751 = llvm.mul %59, %3750 : !llvm.i64 + %3752 = llvm.add %3749, %3751 : !llvm.i64 + %3753 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3754 = llvm.mul %852, %3753 : !llvm.i64 + %3755 = llvm.add %3752, %3754 : !llvm.i64 + %3756 = llvm.getelementptr %3748[%3755] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3757 = llvm.load %3756 : !llvm.ptr + %3758 = llvm.fmul %3747, %3757 {RelaxedPrecision} : !llvm.float + %3759 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3761 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3762 = llvm.mul %2957, %3761 : !llvm.i64 + %3763 = llvm.add %3760, %3762 : !llvm.i64 + %3764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3765 = llvm.mul %852, %3764 : !llvm.i64 + %3766 = llvm.add %3763, %3765 : !llvm.i64 + %3767 = llvm.getelementptr %3759[%3766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3768 = llvm.load %3767 : !llvm.ptr + %3769 = llvm.fadd %3768, %3758 {RelaxedPrecision} : !llvm.float + %3770 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3771 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3772 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3773 = llvm.mul %2957, %3772 : !llvm.i64 + %3774 = llvm.add %3771, %3773 : !llvm.i64 + %3775 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3776 = llvm.mul %852, %3775 : !llvm.i64 + %3777 = llvm.add %3774, %3776 : !llvm.i64 + %3778 = llvm.getelementptr %3770[%3777] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3769, %3778 : !llvm.ptr + %3779 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3780 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3781 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %3782 = llvm.mul %2957, %3781 : !llvm.i64 + %3783 = llvm.add %3780, %3782 : !llvm.i64 + %3784 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3785 = llvm.mul %852, %3784 : !llvm.i64 + %3786 = llvm.add %3783, %3785 : !llvm.i64 + %3787 = llvm.getelementptr %3779[%3786] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3788 = llvm.load %3787 : !llvm.ptr + %3789 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3790 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3791 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3792 = llvm.mul %2957, %3791 : !llvm.i64 + %3793 = llvm.add %3790, %3792 : !llvm.i64 + %3794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3795 = llvm.mul %852, %3794 : !llvm.i64 + %3796 = llvm.add %3793, %3795 : !llvm.i64 + %3797 = llvm.getelementptr %3789[%3796] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3788, %3797 : !llvm.ptr + %3798 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3800 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3801 = llvm.mul %2957, %3800 : !llvm.i64 + %3802 = llvm.add %3799, %3801 : !llvm.i64 + %3803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3804 = llvm.mul %59, %3803 : !llvm.i64 + %3805 = llvm.add %3802, %3804 : !llvm.i64 + %3806 = llvm.getelementptr %3798[%3805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3807 = llvm.load %3806 : !llvm.ptr + %3808 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3811 = llvm.mul %59, %3810 : !llvm.i64 + %3812 = llvm.add %3809, %3811 : !llvm.i64 + %3813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3814 = llvm.mul %913, %3813 : !llvm.i64 + %3815 = llvm.add %3812, %3814 : !llvm.i64 + %3816 = llvm.getelementptr %3808[%3815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3817 = llvm.load %3816 : !llvm.ptr + %3818 = llvm.fmul %3807, %3817 {RelaxedPrecision} : !llvm.float + %3819 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3821 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3822 = llvm.mul %2957, %3821 : !llvm.i64 + %3823 = llvm.add %3820, %3822 : !llvm.i64 + %3824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3825 = llvm.mul %913, %3824 : !llvm.i64 + %3826 = llvm.add %3823, %3825 : !llvm.i64 + %3827 = llvm.getelementptr %3819[%3826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3828 = llvm.load %3827 : !llvm.ptr + %3829 = llvm.fadd %3828, %3818 {RelaxedPrecision} : !llvm.float + %3830 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3831 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3832 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3833 = llvm.mul %2957, %3832 : !llvm.i64 + %3834 = llvm.add %3831, %3833 : !llvm.i64 + %3835 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3836 = llvm.mul %913, %3835 : !llvm.i64 + %3837 = llvm.add %3834, %3836 : !llvm.i64 + %3838 = llvm.getelementptr %3830[%3837] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3829, %3838 : !llvm.ptr + %3839 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3840 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3841 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3842 = llvm.mul %2957, %3841 : !llvm.i64 + %3843 = llvm.add %3840, %3842 : !llvm.i64 + %3844 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %3845 = llvm.mul %913, %3844 : !llvm.i64 + %3846 = llvm.add %3843, %3845 : !llvm.i64 + %3847 = llvm.getelementptr %3839[%3846] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3848 = llvm.load %3847 : !llvm.ptr + %3849 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3850 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3851 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3852 = llvm.mul %2957, %3851 : !llvm.i64 + %3853 = llvm.add %3850, %3852 : !llvm.i64 + %3854 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3855 = llvm.mul %913, %3854 : !llvm.i64 + %3856 = llvm.add %3853, %3855 : !llvm.i64 + %3857 = llvm.getelementptr %3849[%3856] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3848, %3857 : !llvm.ptr + %3858 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3860 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3861 = llvm.mul %2957, %3860 : !llvm.i64 + %3862 = llvm.add %3859, %3861 : !llvm.i64 + %3863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3864 = llvm.mul %59, %3863 : !llvm.i64 + %3865 = llvm.add %3862, %3864 : !llvm.i64 + %3866 = llvm.getelementptr %3858[%3865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3867 = llvm.load %3866 : !llvm.ptr + %3868 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3871 = llvm.mul %59, %3870 : !llvm.i64 + %3872 = llvm.add %3869, %3871 : !llvm.i64 + %3873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3874 = llvm.mul %974, %3873 : !llvm.i64 + %3875 = llvm.add %3872, %3874 : !llvm.i64 + %3876 = llvm.getelementptr %3868[%3875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3877 = llvm.load %3876 : !llvm.ptr + %3878 = llvm.fmul %3867, %3877 {RelaxedPrecision} : !llvm.float + %3879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3882 = llvm.mul %2957, %3881 : !llvm.i64 + %3883 = llvm.add %3880, %3882 : !llvm.i64 + %3884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3885 = llvm.mul %974, %3884 : !llvm.i64 + %3886 = llvm.add %3883, %3885 : !llvm.i64 + %3887 = llvm.getelementptr %3879[%3886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3888 = llvm.load %3887 : !llvm.ptr + %3889 = llvm.fadd %3888, %3878 {RelaxedPrecision} : !llvm.float + %3890 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3892 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3893 = llvm.mul %2957, %3892 : !llvm.i64 + %3894 = llvm.add %3891, %3893 : !llvm.i64 + %3895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3896 = llvm.mul %974, %3895 : !llvm.i64 + %3897 = llvm.add %3894, %3896 : !llvm.i64 + %3898 = llvm.getelementptr %3890[%3897] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3889, %3898 : !llvm.ptr + %3899 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3901 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3902 = llvm.mul %2957, %3901 : !llvm.i64 + %3903 = llvm.add %3900, %3902 : !llvm.i64 + %3904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3905 = llvm.mul %974, %3904 : !llvm.i64 + %3906 = llvm.add %3903, %3905 : 
!llvm.i64 + %3907 = llvm.getelementptr %3899[%3906] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3908 = llvm.load %3907 : !llvm.ptr + %3909 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3910 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3911 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3912 = llvm.mul %2957, %3911 : !llvm.i64 + %3913 = llvm.add %3910, %3912 : !llvm.i64 + %3914 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3915 = llvm.mul %974, %3914 : !llvm.i64 + %3916 = llvm.add %3913, %3915 : !llvm.i64 + %3917 = llvm.getelementptr %3909[%3916] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3908, %3917 : !llvm.ptr + %3918 = llvm.add %50, %36 : !llvm.i64 + %3919 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3921 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3922 = llvm.mul %3918, %3921 : !llvm.i64 + %3923 = llvm.add %3920, %3922 : !llvm.i64 + %3924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3925 = llvm.mul %59, %3924 : !llvm.i64 + %3926 = llvm.add %3923, %3925 : !llvm.i64 + %3927 = llvm.getelementptr %3919[%3926] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3928 = llvm.load %3927 : !llvm.ptr + %3929 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3930 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3931 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3932 = llvm.mul %59, %3931 : !llvm.i64 + %3933 = llvm.add %3930, %3932 : !llvm.i64 + %3934 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3935 = llvm.mul %58, %3934 : !llvm.i64 + %3936 = llvm.add %3933, %3935 : !llvm.i64 + %3937 = llvm.getelementptr %3929[%3936] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3938 = llvm.load %3937 : !llvm.ptr + %3939 = llvm.fmul %3928, %3938 {RelaxedPrecision} : !llvm.float + %3940 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3941 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3942 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3943 = llvm.mul %3918, %3942 : !llvm.i64 + %3944 = llvm.add %3941, %3943 : !llvm.i64 + %3945 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3946 = llvm.mul %58, %3945 : !llvm.i64 + %3947 = llvm.add %3944, %3946 : !llvm.i64 + %3948 = llvm.getelementptr %3940[%3947] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3949 = llvm.load %3948 : !llvm.ptr + %3950 = llvm.fadd %3949, %3939 {RelaxedPrecision} : !llvm.float + %3951 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3952 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3953 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3954 = llvm.mul %3918, %3953 : !llvm.i64 + %3955 = llvm.add %3952, %3954 : !llvm.i64 + %3956 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3957 = llvm.mul %58, %3956 : !llvm.i64 + %3958 = llvm.add %3955, %3957 : !llvm.i64 + %3959 = llvm.getelementptr %3951[%3958] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3950, %3959 : !llvm.ptr + %3960 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3961 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3962 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3963 = llvm.mul %3918, %3962 : !llvm.i64 + %3964 = llvm.add %3961, %3963 : !llvm.i64 + %3965 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3966 = llvm.mul %58, %3965 : !llvm.i64 + %3967 = llvm.add %3964, %3966 : !llvm.i64 + %3968 = llvm.getelementptr %3960[%3967] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr 
+ %3969 = llvm.load %3968 : !llvm.ptr + %3970 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3971 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3972 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3973 = llvm.mul %3918, %3972 : !llvm.i64 + %3974 = llvm.add %3971, %3973 : !llvm.i64 + %3975 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3976 = llvm.mul %58, %3975 : !llvm.i64 + %3977 = llvm.add %3974, %3976 : !llvm.i64 + %3978 = llvm.getelementptr %3970[%3977] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3969, %3978 : !llvm.ptr + %3979 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3981 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3982 = llvm.mul %3918, %3981 : !llvm.i64 + %3983 = llvm.add %3980, %3982 : !llvm.i64 + %3984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3985 = llvm.mul %59, %3984 : !llvm.i64 + %3986 = llvm.add %3983, %3985 : !llvm.i64 + %3987 = llvm.getelementptr %3979[%3986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3988 = llvm.load %3987 : !llvm.ptr + %3989 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3990 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3991 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3992 = llvm.mul %59, %3991 : !llvm.i64 + %3993 = llvm.add %3990, %3992 : !llvm.i64 + %3994 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3995 = llvm.mul %120, %3994 : !llvm.i64 + %3996 = llvm.add %3993, %3995 : !llvm.i64 + %3997 = llvm.getelementptr %3989[%3996] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3998 = llvm.load %3997 : !llvm.ptr + %3999 = llvm.fmul %3988, %3998 {RelaxedPrecision} : !llvm.float + %4000 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4001 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4002 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4003 = llvm.mul %3918, %4002 : !llvm.i64 + %4004 = llvm.add %4001, %4003 : !llvm.i64 + %4005 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4006 = llvm.mul %120, %4005 : !llvm.i64 + %4007 = llvm.add %4004, %4006 : !llvm.i64 + %4008 = llvm.getelementptr %4000[%4007] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4009 = llvm.load %4008 : !llvm.ptr + %4010 = llvm.fadd %4009, %3999 {RelaxedPrecision} : !llvm.float + %4011 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4012 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4013 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4014 = llvm.mul %3918, %4013 : !llvm.i64 + %4015 = llvm.add %4012, %4014 : !llvm.i64 + %4016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4017 = llvm.mul %120, %4016 : !llvm.i64 + %4018 = llvm.add %4015, %4017 : !llvm.i64 + %4019 = llvm.getelementptr %4011[%4018] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4010, %4019 : !llvm.ptr + %4020 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4022 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4023 = llvm.mul %3918, %4022 : !llvm.i64 + %4024 = llvm.add %4021, %4023 : !llvm.i64 + %4025 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4026 = llvm.mul %120, %4025 : !llvm.i64 + %4027 = llvm.add %4024, %4026 : !llvm.i64 + %4028 = llvm.getelementptr %4020[%4027] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4029 = llvm.load %4028 : !llvm.ptr + %4030 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 
x i64>)> + %4031 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4032 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4033 = llvm.mul %3918, %4032 : !llvm.i64 + %4034 = llvm.add %4031, %4033 : !llvm.i64 + %4035 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4036 = llvm.mul %120, %4035 : !llvm.i64 + %4037 = llvm.add %4034, %4036 : !llvm.i64 + %4038 = llvm.getelementptr %4030[%4037] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4029, %4038 : !llvm.ptr + %4039 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4041 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4042 = llvm.mul %3918, %4041 : !llvm.i64 + %4043 = llvm.add %4040, %4042 : !llvm.i64 + %4044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4045 = llvm.mul %59, %4044 : !llvm.i64 + %4046 = llvm.add %4043, %4045 : !llvm.i64 + %4047 = llvm.getelementptr %4039[%4046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4048 = llvm.load %4047 : !llvm.ptr + %4049 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4050 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4051 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4052 = llvm.mul %59, %4051 : !llvm.i64 + %4053 = llvm.add %4050, %4052 : !llvm.i64 + %4054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4055 = llvm.mul %181, %4054 : !llvm.i64 + %4056 = llvm.add %4053, %4055 : !llvm.i64 + %4057 = llvm.getelementptr %4049[%4056] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4058 = llvm.load %4057 : !llvm.ptr + %4059 = llvm.fmul %4048, %4058 {RelaxedPrecision} : !llvm.float + %4060 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4062 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4063 = llvm.mul %3918, %4062 : !llvm.i64 + %4064 = llvm.add %4061, %4063 : !llvm.i64 + %4065 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4066 = llvm.mul %181, %4065 : !llvm.i64 + %4067 = llvm.add %4064, %4066 : !llvm.i64 + %4068 = llvm.getelementptr %4060[%4067] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4069 = llvm.load %4068 : !llvm.ptr + %4070 = llvm.fadd %4069, %4059 {RelaxedPrecision} : !llvm.float + %4071 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4072 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4073 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4074 = llvm.mul %3918, %4073 : !llvm.i64 + %4075 = llvm.add %4072, %4074 : !llvm.i64 + %4076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4077 = llvm.mul %181, %4076 : !llvm.i64 + %4078 = llvm.add %4075, %4077 : !llvm.i64 + %4079 = llvm.getelementptr %4071[%4078] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4070, %4079 : !llvm.ptr + %4080 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4082 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4083 = llvm.mul %3918, %4082 : !llvm.i64 + %4084 = llvm.add %4081, %4083 : !llvm.i64 + %4085 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4086 = llvm.mul %181, %4085 : !llvm.i64 + %4087 = llvm.add %4084, %4086 : !llvm.i64 + %4088 = llvm.getelementptr %4080[%4087] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4089 = llvm.load %4088 : !llvm.ptr + %4090 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4091 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4092 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4093 = 
llvm.mul %3918, %4092 : !llvm.i64 + %4094 = llvm.add %4091, %4093 : !llvm.i64 + %4095 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4096 = llvm.mul %181, %4095 : !llvm.i64 + %4097 = llvm.add %4094, %4096 : !llvm.i64 + %4098 = llvm.getelementptr %4090[%4097] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4089, %4098 : !llvm.ptr + %4099 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4101 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4102 = llvm.mul %3918, %4101 : !llvm.i64 + %4103 = llvm.add %4100, %4102 : !llvm.i64 + %4104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4105 = llvm.mul %59, %4104 : !llvm.i64 + %4106 = llvm.add %4103, %4105 : !llvm.i64 + %4107 = llvm.getelementptr %4099[%4106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4108 = llvm.load %4107 : !llvm.ptr + %4109 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4110 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4111 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4112 = llvm.mul %59, %4111 : !llvm.i64 + %4113 = llvm.add %4110, %4112 : !llvm.i64 + %4114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4115 = llvm.mul %242, %4114 : !llvm.i64 + %4116 = llvm.add %4113, %4115 : !llvm.i64 + %4117 = llvm.getelementptr %4109[%4116] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4118 = llvm.load %4117 : !llvm.ptr + %4119 = llvm.fmul %4108, %4118 {RelaxedPrecision} : !llvm.float + %4120 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4122 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4123 = llvm.mul %3918, %4122 : !llvm.i64 + %4124 = llvm.add %4121, %4123 : !llvm.i64 + %4125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4126 = llvm.mul %242, %4125 : !llvm.i64 + %4127 = llvm.add %4124, %4126 : !llvm.i64 + %4128 = llvm.getelementptr %4120[%4127] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4129 = llvm.load %4128 : !llvm.ptr + %4130 = llvm.fadd %4129, %4119 {RelaxedPrecision} : !llvm.float + %4131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4134 = llvm.mul %3918, %4133 : !llvm.i64 + %4135 = llvm.add %4132, %4134 : !llvm.i64 + %4136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4137 = llvm.mul %242, %4136 : !llvm.i64 + %4138 = llvm.add %4135, %4137 : !llvm.i64 + %4139 = llvm.getelementptr %4131[%4138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4130, %4139 : !llvm.ptr + %4140 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4141 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4142 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4143 = llvm.mul %3918, %4142 : !llvm.i64 + %4144 = llvm.add %4141, %4143 : !llvm.i64 + %4145 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4146 = llvm.mul %242, %4145 : !llvm.i64 + %4147 = llvm.add %4144, %4146 : !llvm.i64 + %4148 = llvm.getelementptr %4140[%4147] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4149 = llvm.load %4148 : !llvm.ptr + %4150 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4152 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4153 = llvm.mul %3918, %4152 : !llvm.i64 + %4154 = llvm.add %4151, %4153 : !llvm.i64 + %4155 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %4156 = llvm.mul %242, %4155 : !llvm.i64 + %4157 = llvm.add %4154, %4156 : !llvm.i64 + %4158 = llvm.getelementptr %4150[%4157] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4149, %4158 : !llvm.ptr + %4159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4161 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4162 = llvm.mul %3918, %4161 : !llvm.i64 + %4163 = llvm.add %4160, %4162 : !llvm.i64 + %4164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4165 = llvm.mul %59, %4164 : !llvm.i64 + %4166 = llvm.add %4163, %4165 : !llvm.i64 + %4167 = llvm.getelementptr %4159[%4166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4168 = llvm.load %4167 : !llvm.ptr + %4169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4171 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4172 = llvm.mul %59, %4171 : !llvm.i64 + %4173 = llvm.add %4170, %4172 : !llvm.i64 + %4174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4175 = llvm.mul %303, %4174 : !llvm.i64 + %4176 = llvm.add %4173, %4175 : !llvm.i64 + %4177 = llvm.getelementptr %4169[%4176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4178 = llvm.load %4177 : !llvm.ptr + %4179 = llvm.fmul %4168, %4178 {RelaxedPrecision} : !llvm.float + %4180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4182 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4183 = llvm.mul %3918, %4182 : !llvm.i64 + %4184 = llvm.add %4181, %4183 : !llvm.i64 + %4185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4186 = llvm.mul %303, %4185 : !llvm.i64 + %4187 = llvm.add %4184, %4186 : !llvm.i64 + %4188 = llvm.getelementptr %4180[%4187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4189 = llvm.load %4188 : !llvm.ptr + %4190 = llvm.fadd %4189, %4179 {RelaxedPrecision} : !llvm.float + %4191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4194 = llvm.mul %3918, %4193 : !llvm.i64 + %4195 = llvm.add %4192, %4194 : !llvm.i64 + %4196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4197 = llvm.mul %303, %4196 : !llvm.i64 + %4198 = llvm.add %4195, %4197 : !llvm.i64 + %4199 = llvm.getelementptr %4191[%4198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4190, %4199 : !llvm.ptr + %4200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4203 = llvm.mul %3918, %4202 : !llvm.i64 + %4204 = llvm.add %4201, %4203 : !llvm.i64 + %4205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4206 = llvm.mul %303, %4205 : !llvm.i64 + %4207 = llvm.add %4204, %4206 : !llvm.i64 + %4208 = llvm.getelementptr %4200[%4207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4209 = llvm.load %4208 : !llvm.ptr + %4210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4212 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4213 = llvm.mul %3918, %4212 : !llvm.i64 + %4214 = llvm.add %4211, %4213 : !llvm.i64 + %4215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4216 = llvm.mul %303, %4215 : !llvm.i64 + %4217 = llvm.add %4214, %4216 : !llvm.i64 + %4218 = llvm.getelementptr 
%4210[%4217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4209, %4218 : !llvm.ptr + %4219 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4221 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4222 = llvm.mul %3918, %4221 : !llvm.i64 + %4223 = llvm.add %4220, %4222 : !llvm.i64 + %4224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4225 = llvm.mul %59, %4224 : !llvm.i64 + %4226 = llvm.add %4223, %4225 : !llvm.i64 + %4227 = llvm.getelementptr %4219[%4226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4228 = llvm.load %4227 : !llvm.ptr + %4229 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4230 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4231 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4232 = llvm.mul %59, %4231 : !llvm.i64 + %4233 = llvm.add %4230, %4232 : !llvm.i64 + %4234 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4235 = llvm.mul %364, %4234 : !llvm.i64 + %4236 = llvm.add %4233, %4235 : !llvm.i64 + %4237 = llvm.getelementptr %4229[%4236] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4238 = llvm.load %4237 : !llvm.ptr + %4239 = llvm.fmul %4228, %4238 {RelaxedPrecision} : !llvm.float + %4240 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4242 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4243 = llvm.mul %3918, %4242 : !llvm.i64 + %4244 = llvm.add %4241, %4243 : !llvm.i64 + %4245 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4246 = llvm.mul %364, %4245 : !llvm.i64 + %4247 = llvm.add %4244, %4246 : !llvm.i64 + %4248 = llvm.getelementptr %4240[%4247] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4249 = llvm.load %4248 : !llvm.ptr + %4250 = llvm.fadd %4249, %4239 {RelaxedPrecision} : !llvm.float + %4251 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4252 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4253 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4254 = llvm.mul %3918, %4253 : !llvm.i64 + %4255 = llvm.add %4252, %4254 : !llvm.i64 + %4256 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4257 = llvm.mul %364, %4256 : !llvm.i64 + %4258 = llvm.add %4255, %4257 : !llvm.i64 + %4259 = llvm.getelementptr %4251[%4258] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4250, %4259 : !llvm.ptr + %4260 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4261 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4262 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4263 = llvm.mul %3918, %4262 : !llvm.i64 + %4264 = llvm.add %4261, %4263 : !llvm.i64 + %4265 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4266 = llvm.mul %364, %4265 : !llvm.i64 + %4267 = llvm.add %4264, %4266 : !llvm.i64 + %4268 = llvm.getelementptr %4260[%4267] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4269 = llvm.load %4268 : !llvm.ptr + %4270 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4272 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4273 = llvm.mul %3918, %4272 : !llvm.i64 + %4274 = llvm.add %4271, %4273 : !llvm.i64 + %4275 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4276 = llvm.mul %364, %4275 : !llvm.i64 + %4277 = llvm.add %4274, %4276 : !llvm.i64 + %4278 = llvm.getelementptr %4270[%4277] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4269, %4278 : !llvm.ptr + %4279 = llvm.extractvalue %7[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4281 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4282 = llvm.mul %3918, %4281 : !llvm.i64 + %4283 = llvm.add %4280, %4282 : !llvm.i64 + %4284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4285 = llvm.mul %59, %4284 : !llvm.i64 + %4286 = llvm.add %4283, %4285 : !llvm.i64 + %4287 = llvm.getelementptr %4279[%4286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4288 = llvm.load %4287 : !llvm.ptr + %4289 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4290 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4291 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4292 = llvm.mul %59, %4291 : !llvm.i64 + %4293 = llvm.add %4290, %4292 : !llvm.i64 + %4294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4295 = llvm.mul %425, %4294 : !llvm.i64 + %4296 = llvm.add %4293, %4295 : !llvm.i64 + %4297 = llvm.getelementptr %4289[%4296] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4298 = llvm.load %4297 : !llvm.ptr + %4299 = llvm.fmul %4288, %4298 {RelaxedPrecision} : !llvm.float + %4300 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4302 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4303 = llvm.mul %3918, %4302 : !llvm.i64 + %4304 = llvm.add %4301, %4303 : !llvm.i64 + %4305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4306 = llvm.mul %425, %4305 : !llvm.i64 + %4307 = llvm.add %4304, %4306 : !llvm.i64 + %4308 = llvm.getelementptr %4300[%4307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4309 = llvm.load %4308 : !llvm.ptr + %4310 = llvm.fadd %4309, %4299 {RelaxedPrecision} : !llvm.float + %4311 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4314 = llvm.mul %3918, %4313 : !llvm.i64 + %4315 = llvm.add %4312, %4314 : !llvm.i64 + %4316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4317 = llvm.mul %425, %4316 : !llvm.i64 + %4318 = llvm.add %4315, %4317 : !llvm.i64 + %4319 = llvm.getelementptr %4311[%4318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4310, %4319 : !llvm.ptr + %4320 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4321 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4322 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4323 = llvm.mul %3918, %4322 : !llvm.i64 + %4324 = llvm.add %4321, %4323 : !llvm.i64 + %4325 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4326 = llvm.mul %425, %4325 : !llvm.i64 + %4327 = llvm.add %4324, %4326 : !llvm.i64 + %4328 = llvm.getelementptr %4320[%4327] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4329 = llvm.load %4328 : !llvm.ptr + %4330 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4331 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4332 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4333 = llvm.mul %3918, %4332 : !llvm.i64 + %4334 = llvm.add %4331, %4333 : !llvm.i64 + %4335 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4336 = llvm.mul %425, %4335 : !llvm.i64 + %4337 = llvm.add %4334, %4336 : !llvm.i64 + %4338 = llvm.getelementptr %4330[%4337] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4329, %4338 : !llvm.ptr + %4339 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4341 = 
llvm.mlir.constant(128 : index) : !llvm.i64 + %4342 = llvm.mul %3918, %4341 : !llvm.i64 + %4343 = llvm.add %4340, %4342 : !llvm.i64 + %4344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4345 = llvm.mul %59, %4344 : !llvm.i64 + %4346 = llvm.add %4343, %4345 : !llvm.i64 + %4347 = llvm.getelementptr %4339[%4346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4348 = llvm.load %4347 : !llvm.ptr + %4349 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4350 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4351 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4352 = llvm.mul %59, %4351 : !llvm.i64 + %4353 = llvm.add %4350, %4352 : !llvm.i64 + %4354 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4355 = llvm.mul %486, %4354 : !llvm.i64 + %4356 = llvm.add %4353, %4355 : !llvm.i64 + %4357 = llvm.getelementptr %4349[%4356] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4358 = llvm.load %4357 : !llvm.ptr + %4359 = llvm.fmul %4348, %4358 {RelaxedPrecision} : !llvm.float + %4360 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4361 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4362 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4363 = llvm.mul %3918, %4362 : !llvm.i64 + %4364 = llvm.add %4361, %4363 : !llvm.i64 + %4365 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4366 = llvm.mul %486, %4365 : !llvm.i64 + %4367 = llvm.add %4364, %4366 : !llvm.i64 + %4368 = llvm.getelementptr %4360[%4367] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4369 = llvm.load %4368 : !llvm.ptr + %4370 = llvm.fadd %4369, %4359 {RelaxedPrecision} : !llvm.float + %4371 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4372 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4373 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4374 = llvm.mul %3918, %4373 : !llvm.i64 + %4375 = llvm.add %4372, %4374 : !llvm.i64 + %4376 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4377 = llvm.mul %486, %4376 : !llvm.i64 + %4378 = llvm.add %4375, %4377 : !llvm.i64 + %4379 = llvm.getelementptr %4371[%4378] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4370, %4379 : !llvm.ptr + %4380 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4382 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4383 = llvm.mul %3918, %4382 : !llvm.i64 + %4384 = llvm.add %4381, %4383 : !llvm.i64 + %4385 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4386 = llvm.mul %486, %4385 : !llvm.i64 + %4387 = llvm.add %4384, %4386 : !llvm.i64 + %4388 = llvm.getelementptr %4380[%4387] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4389 = llvm.load %4388 : !llvm.ptr + %4390 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4392 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4393 = llvm.mul %3918, %4392 : !llvm.i64 + %4394 = llvm.add %4391, %4393 : !llvm.i64 + %4395 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4396 = llvm.mul %486, %4395 : !llvm.i64 + %4397 = llvm.add %4394, %4396 : !llvm.i64 + %4398 = llvm.getelementptr %4390[%4397] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4389, %4398 : !llvm.ptr + %4399 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4401 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4402 = llvm.mul %3918, %4401 : !llvm.i64 + %4403 = llvm.add %4400, %4402 : 
!llvm.i64 + %4404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4405 = llvm.mul %59, %4404 : !llvm.i64 + %4406 = llvm.add %4403, %4405 : !llvm.i64 + %4407 = llvm.getelementptr %4399[%4406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4408 = llvm.load %4407 : !llvm.ptr + %4409 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4410 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4411 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4412 = llvm.mul %59, %4411 : !llvm.i64 + %4413 = llvm.add %4410, %4412 : !llvm.i64 + %4414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4415 = llvm.mul %547, %4414 : !llvm.i64 + %4416 = llvm.add %4413, %4415 : !llvm.i64 + %4417 = llvm.getelementptr %4409[%4416] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4418 = llvm.load %4417 : !llvm.ptr + %4419 = llvm.fmul %4408, %4418 {RelaxedPrecision} : !llvm.float + %4420 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4421 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4422 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4423 = llvm.mul %3918, %4422 : !llvm.i64 + %4424 = llvm.add %4421, %4423 : !llvm.i64 + %4425 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4426 = llvm.mul %547, %4425 : !llvm.i64 + %4427 = llvm.add %4424, %4426 : !llvm.i64 + %4428 = llvm.getelementptr %4420[%4427] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4429 = llvm.load %4428 : !llvm.ptr + %4430 = llvm.fadd %4429, %4419 {RelaxedPrecision} : !llvm.float + %4431 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4434 = llvm.mul %3918, %4433 : !llvm.i64 + %4435 = llvm.add %4432, %4434 : !llvm.i64 + %4436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4437 = llvm.mul %547, %4436 : !llvm.i64 + %4438 = llvm.add %4435, %4437 : !llvm.i64 + %4439 = llvm.getelementptr %4431[%4438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4430, %4439 : !llvm.ptr + %4440 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4441 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4442 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4443 = llvm.mul %3918, %4442 : !llvm.i64 + %4444 = llvm.add %4441, %4443 : !llvm.i64 + %4445 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4446 = llvm.mul %547, %4445 : !llvm.i64 + %4447 = llvm.add %4444, %4446 : !llvm.i64 + %4448 = llvm.getelementptr %4440[%4447] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4449 = llvm.load %4448 : !llvm.ptr + %4450 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4451 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4452 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4453 = llvm.mul %3918, %4452 : !llvm.i64 + %4454 = llvm.add %4451, %4453 : !llvm.i64 + %4455 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4456 = llvm.mul %547, %4455 : !llvm.i64 + %4457 = llvm.add %4454, %4456 : !llvm.i64 + %4458 = llvm.getelementptr %4450[%4457] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4449, %4458 : !llvm.ptr + %4459 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4461 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4462 = llvm.mul %3918, %4461 : !llvm.i64 + %4463 = llvm.add %4460, %4462 : !llvm.i64 + %4464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4465 = llvm.mul %59, %4464 : !llvm.i64 + %4466 = llvm.add 
%4463, %4465 : !llvm.i64 + %4467 = llvm.getelementptr %4459[%4466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4468 = llvm.load %4467 : !llvm.ptr + %4469 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4470 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4471 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4472 = llvm.mul %59, %4471 : !llvm.i64 + %4473 = llvm.add %4470, %4472 : !llvm.i64 + %4474 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4475 = llvm.mul %608, %4474 : !llvm.i64 + %4476 = llvm.add %4473, %4475 : !llvm.i64 + %4477 = llvm.getelementptr %4469[%4476] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4478 = llvm.load %4477 : !llvm.ptr + %4479 = llvm.fmul %4468, %4478 {RelaxedPrecision} : !llvm.float + %4480 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4481 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4482 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4483 = llvm.mul %3918, %4482 : !llvm.i64 + %4484 = llvm.add %4481, %4483 : !llvm.i64 + %4485 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4486 = llvm.mul %608, %4485 : !llvm.i64 + %4487 = llvm.add %4484, %4486 : !llvm.i64 + %4488 = llvm.getelementptr %4480[%4487] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4489 = llvm.load %4488 : !llvm.ptr + %4490 = llvm.fadd %4489, %4479 {RelaxedPrecision} : !llvm.float + %4491 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4492 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4493 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4494 = llvm.mul %3918, %4493 : !llvm.i64 + %4495 = llvm.add %4492, %4494 : !llvm.i64 + %4496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4497 = llvm.mul %608, %4496 : !llvm.i64 + %4498 = llvm.add %4495, %4497 : !llvm.i64 + %4499 = llvm.getelementptr %4491[%4498] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4490, %4499 : !llvm.ptr + %4500 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4501 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4502 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4503 = llvm.mul %3918, %4502 : !llvm.i64 + %4504 = llvm.add %4501, %4503 : !llvm.i64 + %4505 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4506 = llvm.mul %608, %4505 : !llvm.i64 + %4507 = llvm.add %4504, %4506 : !llvm.i64 + %4508 = llvm.getelementptr %4500[%4507] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4509 = llvm.load %4508 : !llvm.ptr + %4510 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4512 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4513 = llvm.mul %3918, %4512 : !llvm.i64 + %4514 = llvm.add %4511, %4513 : !llvm.i64 + %4515 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4516 = llvm.mul %608, %4515 : !llvm.i64 + %4517 = llvm.add %4514, %4516 : !llvm.i64 + %4518 = llvm.getelementptr %4510[%4517] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4509, %4518 : !llvm.ptr + %4519 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4521 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4522 = llvm.mul %3918, %4521 : !llvm.i64 + %4523 = llvm.add %4520, %4522 : !llvm.i64 + %4524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4525 = llvm.mul %59, %4524 : !llvm.i64 + %4526 = llvm.add %4523, %4525 : !llvm.i64 + %4527 = llvm.getelementptr %4519[%4526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4528 = llvm.load 
%4527 : !llvm.ptr + %4529 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4530 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4531 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4532 = llvm.mul %59, %4531 : !llvm.i64 + %4533 = llvm.add %4530, %4532 : !llvm.i64 + %4534 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4535 = llvm.mul %669, %4534 : !llvm.i64 + %4536 = llvm.add %4533, %4535 : !llvm.i64 + %4537 = llvm.getelementptr %4529[%4536] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4538 = llvm.load %4537 : !llvm.ptr + %4539 = llvm.fmul %4528, %4538 {RelaxedPrecision} : !llvm.float + %4540 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4542 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4543 = llvm.mul %3918, %4542 : !llvm.i64 + %4544 = llvm.add %4541, %4543 : !llvm.i64 + %4545 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4546 = llvm.mul %669, %4545 : !llvm.i64 + %4547 = llvm.add %4544, %4546 : !llvm.i64 + %4548 = llvm.getelementptr %4540[%4547] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4549 = llvm.load %4548 : !llvm.ptr + %4550 = llvm.fadd %4549, %4539 {RelaxedPrecision} : !llvm.float + %4551 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4552 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4553 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4554 = llvm.mul %3918, %4553 : !llvm.i64 + %4555 = llvm.add %4552, %4554 : !llvm.i64 + %4556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4557 = llvm.mul %669, %4556 : !llvm.i64 + %4558 = llvm.add %4555, %4557 : !llvm.i64 + %4559 = llvm.getelementptr %4551[%4558] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4550, %4559 : !llvm.ptr + %4560 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4561 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4562 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4563 = llvm.mul %3918, %4562 : !llvm.i64 + %4564 = llvm.add %4561, %4563 : !llvm.i64 + %4565 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4566 = llvm.mul %669, %4565 : !llvm.i64 + %4567 = llvm.add %4564, %4566 : !llvm.i64 + %4568 = llvm.getelementptr %4560[%4567] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4569 = llvm.load %4568 : !llvm.ptr + %4570 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4571 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4572 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4573 = llvm.mul %3918, %4572 : !llvm.i64 + %4574 = llvm.add %4571, %4573 : !llvm.i64 + %4575 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4576 = llvm.mul %669, %4575 : !llvm.i64 + %4577 = llvm.add %4574, %4576 : !llvm.i64 + %4578 = llvm.getelementptr %4570[%4577] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4569, %4578 : !llvm.ptr + %4579 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4581 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4582 = llvm.mul %3918, %4581 : !llvm.i64 + %4583 = llvm.add %4580, %4582 : !llvm.i64 + %4584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4585 = llvm.mul %59, %4584 : !llvm.i64 + %4586 = llvm.add %4583, %4585 : !llvm.i64 + %4587 = llvm.getelementptr %4579[%4586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4588 = llvm.load %4587 : !llvm.ptr + %4589 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4590 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %4591 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4592 = llvm.mul %59, %4591 : !llvm.i64 + %4593 = llvm.add %4590, %4592 : !llvm.i64 + %4594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4595 = llvm.mul %730, %4594 : !llvm.i64 + %4596 = llvm.add %4593, %4595 : !llvm.i64 + %4597 = llvm.getelementptr %4589[%4596] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4598 = llvm.load %4597 : !llvm.ptr + %4599 = llvm.fmul %4588, %4598 {RelaxedPrecision} : !llvm.float + %4600 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4602 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4603 = llvm.mul %3918, %4602 : !llvm.i64 + %4604 = llvm.add %4601, %4603 : !llvm.i64 + %4605 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4606 = llvm.mul %730, %4605 : !llvm.i64 + %4607 = llvm.add %4604, %4606 : !llvm.i64 + %4608 = llvm.getelementptr %4600[%4607] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4609 = llvm.load %4608 : !llvm.ptr + %4610 = llvm.fadd %4609, %4599 {RelaxedPrecision} : !llvm.float + %4611 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4612 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4613 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4614 = llvm.mul %3918, %4613 : !llvm.i64 + %4615 = llvm.add %4612, %4614 : !llvm.i64 + %4616 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4617 = llvm.mul %730, %4616 : !llvm.i64 + %4618 = llvm.add %4615, %4617 : !llvm.i64 + %4619 = llvm.getelementptr %4611[%4618] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4610, %4619 : !llvm.ptr + %4620 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4621 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4622 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4623 = llvm.mul %3918, %4622 : !llvm.i64 + %4624 = llvm.add %4621, %4623 : !llvm.i64 + %4625 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4626 = llvm.mul %730, %4625 : !llvm.i64 + %4627 = llvm.add %4624, %4626 : !llvm.i64 + %4628 = llvm.getelementptr %4620[%4627] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4629 = llvm.load %4628 : !llvm.ptr + %4630 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4632 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4633 = llvm.mul %3918, %4632 : !llvm.i64 + %4634 = llvm.add %4631, %4633 : !llvm.i64 + %4635 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4636 = llvm.mul %730, %4635 : !llvm.i64 + %4637 = llvm.add %4634, %4636 : !llvm.i64 + %4638 = llvm.getelementptr %4630[%4637] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4629, %4638 : !llvm.ptr + %4639 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4641 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4642 = llvm.mul %3918, %4641 : !llvm.i64 + %4643 = llvm.add %4640, %4642 : !llvm.i64 + %4644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4645 = llvm.mul %59, %4644 : !llvm.i64 + %4646 = llvm.add %4643, %4645 : !llvm.i64 + %4647 = llvm.getelementptr %4639[%4646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4648 = llvm.load %4647 : !llvm.ptr + %4649 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4651 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4652 = llvm.mul %59, %4651 
: !llvm.i64 + %4653 = llvm.add %4650, %4652 : !llvm.i64 + %4654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4655 = llvm.mul %791, %4654 : !llvm.i64 + %4656 = llvm.add %4653, %4655 : !llvm.i64 + %4657 = llvm.getelementptr %4649[%4656] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4658 = llvm.load %4657 : !llvm.ptr + %4659 = llvm.fmul %4648, %4658 {RelaxedPrecision} : !llvm.float + %4660 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4662 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4663 = llvm.mul %3918, %4662 : !llvm.i64 + %4664 = llvm.add %4661, %4663 : !llvm.i64 + %4665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4666 = llvm.mul %791, %4665 : !llvm.i64 + %4667 = llvm.add %4664, %4666 : !llvm.i64 + %4668 = llvm.getelementptr %4660[%4667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4669 = llvm.load %4668 : !llvm.ptr + %4670 = llvm.fadd %4669, %4659 {RelaxedPrecision} : !llvm.float + %4671 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4672 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4673 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4674 = llvm.mul %3918, %4673 : !llvm.i64 + %4675 = llvm.add %4672, %4674 : !llvm.i64 + %4676 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4677 = llvm.mul %791, %4676 : !llvm.i64 + %4678 = llvm.add %4675, %4677 : !llvm.i64 + %4679 = llvm.getelementptr %4671[%4678] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4670, %4679 : !llvm.ptr + %4680 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4681 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4682 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4683 = llvm.mul %3918, %4682 : !llvm.i64 + %4684 = llvm.add %4681, %4683 : !llvm.i64 + %4685 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4686 = llvm.mul %791, %4685 : !llvm.i64 + %4687 = llvm.add %4684, %4686 : !llvm.i64 + %4688 = llvm.getelementptr %4680[%4687] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4689 = llvm.load %4688 : !llvm.ptr + %4690 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4691 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4692 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4693 = llvm.mul %3918, %4692 : !llvm.i64 + %4694 = llvm.add %4691, %4693 : !llvm.i64 + %4695 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4696 = llvm.mul %791, %4695 : !llvm.i64 + %4697 = llvm.add %4694, %4696 : !llvm.i64 + %4698 = llvm.getelementptr %4690[%4697] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4689, %4698 : !llvm.ptr + %4699 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4701 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4702 = llvm.mul %3918, %4701 : !llvm.i64 + %4703 = llvm.add %4700, %4702 : !llvm.i64 + %4704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4705 = llvm.mul %59, %4704 : !llvm.i64 + %4706 = llvm.add %4703, %4705 : !llvm.i64 + %4707 = llvm.getelementptr %4699[%4706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4708 = llvm.load %4707 : !llvm.ptr + %4709 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4710 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4711 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4712 = llvm.mul %59, %4711 : !llvm.i64 + %4713 = llvm.add %4710, %4712 : !llvm.i64 + %4714 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4715 = llvm.mul 
%852, %4714 : !llvm.i64 + %4716 = llvm.add %4713, %4715 : !llvm.i64 + %4717 = llvm.getelementptr %4709[%4716] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4718 = llvm.load %4717 : !llvm.ptr + %4719 = llvm.fmul %4708, %4718 {RelaxedPrecision} : !llvm.float + %4720 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4721 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4722 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4723 = llvm.mul %3918, %4722 : !llvm.i64 + %4724 = llvm.add %4721, %4723 : !llvm.i64 + %4725 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4726 = llvm.mul %852, %4725 : !llvm.i64 + %4727 = llvm.add %4724, %4726 : !llvm.i64 + %4728 = llvm.getelementptr %4720[%4727] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4729 = llvm.load %4728 : !llvm.ptr + %4730 = llvm.fadd %4729, %4719 {RelaxedPrecision} : !llvm.float + %4731 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4732 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4733 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4734 = llvm.mul %3918, %4733 : !llvm.i64 + %4735 = llvm.add %4732, %4734 : !llvm.i64 + %4736 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4737 = llvm.mul %852, %4736 : !llvm.i64 + %4738 = llvm.add %4735, %4737 : !llvm.i64 + %4739 = llvm.getelementptr %4731[%4738] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4730, %4739 : !llvm.ptr + %4740 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4741 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4743 = llvm.mul %3918, %4742 : !llvm.i64 + %4744 = llvm.add %4741, %4743 : !llvm.i64 + %4745 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4746 = llvm.mul %852, %4745 : !llvm.i64 + %4747 = llvm.add %4744, %4746 : !llvm.i64 + %4748 = llvm.getelementptr %4740[%4747] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4749 = llvm.load %4748 : !llvm.ptr + %4750 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4751 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4752 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4753 = llvm.mul %3918, %4752 : !llvm.i64 + %4754 = llvm.add %4751, %4753 : !llvm.i64 + %4755 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4756 = llvm.mul %852, %4755 : !llvm.i64 + %4757 = llvm.add %4754, %4756 : !llvm.i64 + %4758 = llvm.getelementptr %4750[%4757] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4749, %4758 : !llvm.ptr + %4759 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4761 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4762 = llvm.mul %3918, %4761 : !llvm.i64 + %4763 = llvm.add %4760, %4762 : !llvm.i64 + %4764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4765 = llvm.mul %59, %4764 : !llvm.i64 + %4766 = llvm.add %4763, %4765 : !llvm.i64 + %4767 = llvm.getelementptr %4759[%4766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4768 = llvm.load %4767 : !llvm.ptr + %4769 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4770 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4771 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4772 = llvm.mul %59, %4771 : !llvm.i64 + %4773 = llvm.add %4770, %4772 : !llvm.i64 + %4774 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4775 = llvm.mul %913, %4774 : !llvm.i64 + %4776 = llvm.add %4773, %4775 : !llvm.i64 + %4777 = llvm.getelementptr %4769[%4776] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %4778 = llvm.load %4777 : !llvm.ptr + %4779 = llvm.fmul %4768, %4778 {RelaxedPrecision} : !llvm.float + %4780 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4781 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4782 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4783 = llvm.mul %3918, %4782 : !llvm.i64 + %4784 = llvm.add %4781, %4783 : !llvm.i64 + %4785 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4786 = llvm.mul %913, %4785 : !llvm.i64 + %4787 = llvm.add %4784, %4786 : !llvm.i64 + %4788 = llvm.getelementptr %4780[%4787] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4789 = llvm.load %4788 : !llvm.ptr + %4790 = llvm.fadd %4789, %4779 {RelaxedPrecision} : !llvm.float + %4791 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4792 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4793 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4794 = llvm.mul %3918, %4793 : !llvm.i64 + %4795 = llvm.add %4792, %4794 : !llvm.i64 + %4796 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4797 = llvm.mul %913, %4796 : !llvm.i64 + %4798 = llvm.add %4795, %4797 : !llvm.i64 + %4799 = llvm.getelementptr %4791[%4798] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4790, %4799 : !llvm.ptr + %4800 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4803 = llvm.mul %3918, %4802 : !llvm.i64 + %4804 = llvm.add %4801, %4803 : !llvm.i64 + %4805 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4806 = llvm.mul %913, %4805 : !llvm.i64 + %4807 = llvm.add %4804, %4806 : !llvm.i64 + %4808 = llvm.getelementptr %4800[%4807] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4809 = llvm.load %4808 : !llvm.ptr + %4810 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4811 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4812 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4813 = llvm.mul %3918, %4812 : !llvm.i64 + %4814 = llvm.add %4811, %4813 : !llvm.i64 + %4815 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4816 = llvm.mul %913, %4815 : !llvm.i64 + %4817 = llvm.add %4814, %4816 : !llvm.i64 + %4818 = llvm.getelementptr %4810[%4817] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4809, %4818 : !llvm.ptr + %4819 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4821 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4822 = llvm.mul %3918, %4821 : !llvm.i64 + %4823 = llvm.add %4820, %4822 : !llvm.i64 + %4824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4825 = llvm.mul %59, %4824 : !llvm.i64 + %4826 = llvm.add %4823, %4825 : !llvm.i64 + %4827 = llvm.getelementptr %4819[%4826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4828 = llvm.load %4827 : !llvm.ptr + %4829 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4830 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4831 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4832 = llvm.mul %59, %4831 : !llvm.i64 + %4833 = llvm.add %4830, %4832 : !llvm.i64 + %4834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4835 = llvm.mul %974, %4834 : !llvm.i64 + %4836 = llvm.add %4833, %4835 : !llvm.i64 + %4837 = llvm.getelementptr %4829[%4836] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4838 = llvm.load %4837 : !llvm.ptr + %4839 = llvm.fmul %4828, %4838 {RelaxedPrecision} : 
!llvm.float + %4840 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4841 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4842 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4843 = llvm.mul %3918, %4842 : !llvm.i64 + %4844 = llvm.add %4841, %4843 : !llvm.i64 + %4845 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4846 = llvm.mul %974, %4845 : !llvm.i64 + %4847 = llvm.add %4844, %4846 : !llvm.i64 + %4848 = llvm.getelementptr %4840[%4847] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4849 = llvm.load %4848 : !llvm.ptr + %4850 = llvm.fadd %4849, %4839 {RelaxedPrecision} : !llvm.float + %4851 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4852 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4853 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4854 = llvm.mul %3918, %4853 : !llvm.i64 + %4855 = llvm.add %4852, %4854 : !llvm.i64 + %4856 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4857 = llvm.mul %974, %4856 : !llvm.i64 + %4858 = llvm.add %4855, %4857 : !llvm.i64 + %4859 = llvm.getelementptr %4851[%4858] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4850, %4859 : !llvm.ptr + %4860 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4861 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4862 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4863 = llvm.mul %3918, %4862 : !llvm.i64 + %4864 = llvm.add %4861, %4863 : !llvm.i64 + %4865 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4866 = llvm.mul %974, %4865 : !llvm.i64 + %4867 = llvm.add %4864, %4866 : !llvm.i64 + %4868 = llvm.getelementptr %4860[%4867] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4869 = llvm.load %4868 : !llvm.ptr + %4870 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4871 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4872 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4873 = llvm.mul %3918, %4872 : !llvm.i64 + %4874 = llvm.add %4871, %4873 : !llvm.i64 + %4875 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4876 = llvm.mul %974, %4875 : !llvm.i64 + %4877 = llvm.add %4874, %4876 : !llvm.i64 + %4878 = llvm.getelementptr %4870[%4877] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4869, %4878 : !llvm.ptr + %4879 = llvm.add %50, %37 : !llvm.i64 + %4880 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4881 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4882 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4883 = llvm.mul %4879, %4882 : !llvm.i64 + %4884 = llvm.add %4881, %4883 : !llvm.i64 + %4885 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4886 = llvm.mul %59, %4885 : !llvm.i64 + %4887 = llvm.add %4884, %4886 : !llvm.i64 + %4888 = llvm.getelementptr %4880[%4887] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4889 = llvm.load %4888 : !llvm.ptr + %4890 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4892 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4893 = llvm.mul %59, %4892 : !llvm.i64 + %4894 = llvm.add %4891, %4893 : !llvm.i64 + %4895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4896 = llvm.mul %58, %4895 : !llvm.i64 + %4897 = llvm.add %4894, %4896 : !llvm.i64 + %4898 = llvm.getelementptr %4890[%4897] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4899 = llvm.load %4898 : !llvm.ptr + %4900 = llvm.fmul %4889, %4899 {RelaxedPrecision} : !llvm.float + %4901 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %4902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4903 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4904 = llvm.mul %4879, %4903 : !llvm.i64 + %4905 = llvm.add %4902, %4904 : !llvm.i64 + %4906 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4907 = llvm.mul %58, %4906 : !llvm.i64 + %4908 = llvm.add %4905, %4907 : !llvm.i64 + %4909 = llvm.getelementptr %4901[%4908] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4910 = llvm.load %4909 : !llvm.ptr + %4911 = llvm.fadd %4910, %4900 {RelaxedPrecision} : !llvm.float + %4912 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4913 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4914 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4915 = llvm.mul %4879, %4914 : !llvm.i64 + %4916 = llvm.add %4913, %4915 : !llvm.i64 + %4917 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4918 = llvm.mul %58, %4917 : !llvm.i64 + %4919 = llvm.add %4916, %4918 : !llvm.i64 + %4920 = llvm.getelementptr %4912[%4919] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4911, %4920 : !llvm.ptr + %4921 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4922 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4923 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4924 = llvm.mul %4879, %4923 : !llvm.i64 + %4925 = llvm.add %4922, %4924 : !llvm.i64 + %4926 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4927 = llvm.mul %58, %4926 : !llvm.i64 + %4928 = llvm.add %4925, %4927 : !llvm.i64 + %4929 = llvm.getelementptr %4921[%4928] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4930 = llvm.load %4929 : !llvm.ptr + %4931 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4932 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4933 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4934 = llvm.mul %4879, %4933 : !llvm.i64 + %4935 = llvm.add %4932, %4934 : !llvm.i64 + %4936 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4937 = llvm.mul %58, %4936 : !llvm.i64 + %4938 = llvm.add %4935, %4937 : !llvm.i64 + %4939 = llvm.getelementptr %4931[%4938] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4930, %4939 : !llvm.ptr + %4940 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4941 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4942 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4943 = llvm.mul %4879, %4942 : !llvm.i64 + %4944 = llvm.add %4941, %4943 : !llvm.i64 + %4945 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4946 = llvm.mul %59, %4945 : !llvm.i64 + %4947 = llvm.add %4944, %4946 : !llvm.i64 + %4948 = llvm.getelementptr %4940[%4947] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4949 = llvm.load %4948 : !llvm.ptr + %4950 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4951 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4952 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4953 = llvm.mul %59, %4952 : !llvm.i64 + %4954 = llvm.add %4951, %4953 : !llvm.i64 + %4955 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4956 = llvm.mul %120, %4955 : !llvm.i64 + %4957 = llvm.add %4954, %4956 : !llvm.i64 + %4958 = llvm.getelementptr %4950[%4957] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4959 = llvm.load %4958 : !llvm.ptr + %4960 = llvm.fmul %4949, %4959 {RelaxedPrecision} : !llvm.float + %4961 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4962 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4963 = llvm.mlir.constant(512 : index) : !llvm.i64 + 
%4964 = llvm.mul %4879, %4963 : !llvm.i64 + %4965 = llvm.add %4962, %4964 : !llvm.i64 + %4966 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4967 = llvm.mul %120, %4966 : !llvm.i64 + %4968 = llvm.add %4965, %4967 : !llvm.i64 + %4969 = llvm.getelementptr %4961[%4968] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4970 = llvm.load %4969 : !llvm.ptr + %4971 = llvm.fadd %4970, %4960 {RelaxedPrecision} : !llvm.float + %4972 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4973 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4974 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4975 = llvm.mul %4879, %4974 : !llvm.i64 + %4976 = llvm.add %4973, %4975 : !llvm.i64 + %4977 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4978 = llvm.mul %120, %4977 : !llvm.i64 + %4979 = llvm.add %4976, %4978 : !llvm.i64 + %4980 = llvm.getelementptr %4972[%4979] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4971, %4980 : !llvm.ptr + %4981 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4982 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4983 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4984 = llvm.mul %4879, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %120, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4981[%4988] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4990 = llvm.load %4989 : !llvm.ptr + %4991 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4992 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4993 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4994 = llvm.mul %4879, %4993 : !llvm.i64 + %4995 = llvm.add %4992, %4994 : !llvm.i64 + %4996 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4997 = llvm.mul %120, %4996 : !llvm.i64 + %4998 = llvm.add %4995, %4997 : !llvm.i64 + %4999 = llvm.getelementptr %4991[%4998] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4990, %4999 : !llvm.ptr + %5000 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5001 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5002 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5003 = llvm.mul %4879, %5002 : !llvm.i64 + %5004 = llvm.add %5001, %5003 : !llvm.i64 + %5005 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5006 = llvm.mul %59, %5005 : !llvm.i64 + %5007 = llvm.add %5004, %5006 : !llvm.i64 + %5008 = llvm.getelementptr %5000[%5007] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5009 = llvm.load %5008 : !llvm.ptr + %5010 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5011 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5012 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5013 = llvm.mul %59, %5012 : !llvm.i64 + %5014 = llvm.add %5011, %5013 : !llvm.i64 + %5015 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5016 = llvm.mul %181, %5015 : !llvm.i64 + %5017 = llvm.add %5014, %5016 : !llvm.i64 + %5018 = llvm.getelementptr %5010[%5017] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5019 = llvm.load %5018 : !llvm.ptr + %5020 = llvm.fmul %5009, %5019 {RelaxedPrecision} : !llvm.float + %5021 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5022 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5023 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5024 = llvm.mul %4879, %5023 : !llvm.i64 + %5025 = llvm.add %5022, %5024 : !llvm.i64 + %5026 = llvm.mlir.constant(1 : index) 
: !llvm.i64 + %5027 = llvm.mul %181, %5026 : !llvm.i64 + %5028 = llvm.add %5025, %5027 : !llvm.i64 + %5029 = llvm.getelementptr %5021[%5028] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5030 = llvm.load %5029 : !llvm.ptr + %5031 = llvm.fadd %5030, %5020 {RelaxedPrecision} : !llvm.float + %5032 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5033 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5034 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5035 = llvm.mul %4879, %5034 : !llvm.i64 + %5036 = llvm.add %5033, %5035 : !llvm.i64 + %5037 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5038 = llvm.mul %181, %5037 : !llvm.i64 + %5039 = llvm.add %5036, %5038 : !llvm.i64 + %5040 = llvm.getelementptr %5032[%5039] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5031, %5040 : !llvm.ptr + %5041 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5042 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5043 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5044 = llvm.mul %4879, %5043 : !llvm.i64 + %5045 = llvm.add %5042, %5044 : !llvm.i64 + %5046 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5047 = llvm.mul %181, %5046 : !llvm.i64 + %5048 = llvm.add %5045, %5047 : !llvm.i64 + %5049 = llvm.getelementptr %5041[%5048] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5050 = llvm.load %5049 : !llvm.ptr + %5051 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5052 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5053 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5054 = llvm.mul %4879, %5053 : !llvm.i64 + %5055 = llvm.add %5052, %5054 : !llvm.i64 + %5056 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5057 = llvm.mul %181, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.getelementptr %5051[%5058] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5050, %5059 : !llvm.ptr + %5060 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5062 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5063 = llvm.mul %4879, %5062 : !llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5066 = llvm.mul %59, %5065 : !llvm.i64 + %5067 = llvm.add %5064, %5066 : !llvm.i64 + %5068 = llvm.getelementptr %5060[%5067] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5069 = llvm.load %5068 : !llvm.ptr + %5070 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5071 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5072 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5073 = llvm.mul %59, %5072 : !llvm.i64 + %5074 = llvm.add %5071, %5073 : !llvm.i64 + %5075 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5076 = llvm.mul %242, %5075 : !llvm.i64 + %5077 = llvm.add %5074, %5076 : !llvm.i64 + %5078 = llvm.getelementptr %5070[%5077] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5079 = llvm.load %5078 : !llvm.ptr + %5080 = llvm.fmul %5069, %5079 {RelaxedPrecision} : !llvm.float + %5081 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5082 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5083 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5084 = llvm.mul %4879, %5083 : !llvm.i64 + %5085 = llvm.add %5082, %5084 : !llvm.i64 + %5086 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5087 = llvm.mul %242, %5086 : !llvm.i64 + %5088 = llvm.add %5085, %5087 : !llvm.i64 + %5089 = 
llvm.getelementptr %5081[%5088] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5090 = llvm.load %5089 : !llvm.ptr + %5091 = llvm.fadd %5090, %5080 {RelaxedPrecision} : !llvm.float + %5092 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5093 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5094 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5095 = llvm.mul %4879, %5094 : !llvm.i64 + %5096 = llvm.add %5093, %5095 : !llvm.i64 + %5097 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5098 = llvm.mul %242, %5097 : !llvm.i64 + %5099 = llvm.add %5096, %5098 : !llvm.i64 + %5100 = llvm.getelementptr %5092[%5099] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5091, %5100 : !llvm.ptr + %5101 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5102 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5103 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5104 = llvm.mul %4879, %5103 : !llvm.i64 + %5105 = llvm.add %5102, %5104 : !llvm.i64 + %5106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5107 = llvm.mul %242, %5106 : !llvm.i64 + %5108 = llvm.add %5105, %5107 : !llvm.i64 + %5109 = llvm.getelementptr %5101[%5108] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5110 = llvm.load %5109 : !llvm.ptr + %5111 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5112 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5113 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5114 = llvm.mul %4879, %5113 : !llvm.i64 + %5115 = llvm.add %5112, %5114 : !llvm.i64 + %5116 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5117 = llvm.mul %242, %5116 : !llvm.i64 + %5118 = llvm.add %5115, %5117 : !llvm.i64 + %5119 = llvm.getelementptr %5111[%5118] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5110, %5119 : !llvm.ptr + %5120 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5123 = llvm.mul %4879, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5126 = llvm.mul %59, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.getelementptr %5120[%5127] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5129 = llvm.load %5128 : !llvm.ptr + %5130 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5132 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5133 = llvm.mul %59, %5132 : !llvm.i64 + %5134 = llvm.add %5131, %5133 : !llvm.i64 + %5135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5136 = llvm.mul %303, %5135 : !llvm.i64 + %5137 = llvm.add %5134, %5136 : !llvm.i64 + %5138 = llvm.getelementptr %5130[%5137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5139 = llvm.load %5138 : !llvm.ptr + %5140 = llvm.fmul %5129, %5139 {RelaxedPrecision} : !llvm.float + %5141 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5142 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5143 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5144 = llvm.mul %4879, %5143 : !llvm.i64 + %5145 = llvm.add %5142, %5144 : !llvm.i64 + %5146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5147 = llvm.mul %303, %5146 : !llvm.i64 + %5148 = llvm.add %5145, %5147 : !llvm.i64 + %5149 = llvm.getelementptr %5141[%5148] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5150 = llvm.load %5149 : !llvm.ptr + %5151 = 
llvm.fadd %5150, %5140 {RelaxedPrecision} : !llvm.float + %5152 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5153 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5154 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5155 = llvm.mul %4879, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 : !llvm.i64 + %5157 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5158 = llvm.mul %303, %5157 : !llvm.i64 + %5159 = llvm.add %5156, %5158 : !llvm.i64 + %5160 = llvm.getelementptr %5152[%5159] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5151, %5160 : !llvm.ptr + %5161 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5162 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5163 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5164 = llvm.mul %4879, %5163 : !llvm.i64 + %5165 = llvm.add %5162, %5164 : !llvm.i64 + %5166 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5167 = llvm.mul %303, %5166 : !llvm.i64 + %5168 = llvm.add %5165, %5167 : !llvm.i64 + %5169 = llvm.getelementptr %5161[%5168] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5170 = llvm.load %5169 : !llvm.ptr + %5171 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5172 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5173 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5174 = llvm.mul %4879, %5173 : !llvm.i64 + %5175 = llvm.add %5172, %5174 : !llvm.i64 + %5176 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5177 = llvm.mul %303, %5176 : !llvm.i64 + %5178 = llvm.add %5175, %5177 : !llvm.i64 + %5179 = llvm.getelementptr %5171[%5178] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5170, %5179 : !llvm.ptr + %5180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5183 = llvm.mul %4879, %5182 : !llvm.i64 + %5184 = llvm.add %5181, %5183 : !llvm.i64 + %5185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5186 = llvm.mul %59, %5185 : !llvm.i64 + %5187 = llvm.add %5184, %5186 : !llvm.i64 + %5188 = llvm.getelementptr %5180[%5187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5189 = llvm.load %5188 : !llvm.ptr + %5190 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5192 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5193 = llvm.mul %59, %5192 : !llvm.i64 + %5194 = llvm.add %5191, %5193 : !llvm.i64 + %5195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5196 = llvm.mul %364, %5195 : !llvm.i64 + %5197 = llvm.add %5194, %5196 : !llvm.i64 + %5198 = llvm.getelementptr %5190[%5197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5199 = llvm.load %5198 : !llvm.ptr + %5200 = llvm.fmul %5189, %5199 {RelaxedPrecision} : !llvm.float + %5201 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5202 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5203 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5204 = llvm.mul %4879, %5203 : !llvm.i64 + %5205 = llvm.add %5202, %5204 : !llvm.i64 + %5206 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5207 = llvm.mul %364, %5206 : !llvm.i64 + %5208 = llvm.add %5205, %5207 : !llvm.i64 + %5209 = llvm.getelementptr %5201[%5208] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5210 = llvm.load %5209 : !llvm.ptr + %5211 = llvm.fadd %5210, %5200 {RelaxedPrecision} : !llvm.float + %5212 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %5213 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5214 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5215 = llvm.mul %4879, %5214 : !llvm.i64 + %5216 = llvm.add %5213, %5215 : !llvm.i64 + %5217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5218 = llvm.mul %364, %5217 : !llvm.i64 + %5219 = llvm.add %5216, %5218 : !llvm.i64 + %5220 = llvm.getelementptr %5212[%5219] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5211, %5220 : !llvm.ptr + %5221 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5224 = llvm.mul %4879, %5223 : !llvm.i64 + %5225 = llvm.add %5222, %5224 : !llvm.i64 + %5226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5227 = llvm.mul %364, %5226 : !llvm.i64 + %5228 = llvm.add %5225, %5227 : !llvm.i64 + %5229 = llvm.getelementptr %5221[%5228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5230 = llvm.load %5229 : !llvm.ptr + %5231 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5232 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5233 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5234 = llvm.mul %4879, %5233 : !llvm.i64 + %5235 = llvm.add %5232, %5234 : !llvm.i64 + %5236 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5237 = llvm.mul %364, %5236 : !llvm.i64 + %5238 = llvm.add %5235, %5237 : !llvm.i64 + %5239 = llvm.getelementptr %5231[%5238] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5230, %5239 : !llvm.ptr + %5240 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5242 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5243 = llvm.mul %4879, %5242 : !llvm.i64 + %5244 = llvm.add %5241, %5243 : !llvm.i64 + %5245 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5246 = llvm.mul %59, %5245 : !llvm.i64 + %5247 = llvm.add %5244, %5246 : !llvm.i64 + %5248 = llvm.getelementptr %5240[%5247] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5249 = llvm.load %5248 : !llvm.ptr + %5250 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5252 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5253 = llvm.mul %59, %5252 : !llvm.i64 + %5254 = llvm.add %5251, %5253 : !llvm.i64 + %5255 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5256 = llvm.mul %425, %5255 : !llvm.i64 + %5257 = llvm.add %5254, %5256 : !llvm.i64 + %5258 = llvm.getelementptr %5250[%5257] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5259 = llvm.load %5258 : !llvm.ptr + %5260 = llvm.fmul %5249, %5259 {RelaxedPrecision} : !llvm.float + %5261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5263 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5264 = llvm.mul %4879, %5263 : !llvm.i64 + %5265 = llvm.add %5262, %5264 : !llvm.i64 + %5266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5267 = llvm.mul %425, %5266 : !llvm.i64 + %5268 = llvm.add %5265, %5267 : !llvm.i64 + %5269 = llvm.getelementptr %5261[%5268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5270 = llvm.load %5269 : !llvm.ptr + %5271 = llvm.fadd %5270, %5260 {RelaxedPrecision} : !llvm.float + %5272 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5273 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5274 = llvm.mlir.constant(512 : index) 
: !llvm.i64 + %5275 = llvm.mul %4879, %5274 : !llvm.i64 + %5276 = llvm.add %5273, %5275 : !llvm.i64 + %5277 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5278 = llvm.mul %425, %5277 : !llvm.i64 + %5279 = llvm.add %5276, %5278 : !llvm.i64 + %5280 = llvm.getelementptr %5272[%5279] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5271, %5280 : !llvm.ptr + %5281 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5282 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5283 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5284 = llvm.mul %4879, %5283 : !llvm.i64 + %5285 = llvm.add %5282, %5284 : !llvm.i64 + %5286 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5287 = llvm.mul %425, %5286 : !llvm.i64 + %5288 = llvm.add %5285, %5287 : !llvm.i64 + %5289 = llvm.getelementptr %5281[%5288] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5290 = llvm.load %5289 : !llvm.ptr + %5291 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5292 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5293 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5294 = llvm.mul %4879, %5293 : !llvm.i64 + %5295 = llvm.add %5292, %5294 : !llvm.i64 + %5296 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5297 = llvm.mul %425, %5296 : !llvm.i64 + %5298 = llvm.add %5295, %5297 : !llvm.i64 + %5299 = llvm.getelementptr %5291[%5298] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5290, %5299 : !llvm.ptr + %5300 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5302 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5303 = llvm.mul %4879, %5302 : !llvm.i64 + %5304 = llvm.add %5301, %5303 : !llvm.i64 + %5305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5306 = llvm.mul %59, %5305 : !llvm.i64 + %5307 = llvm.add %5304, %5306 : !llvm.i64 + %5308 = llvm.getelementptr %5300[%5307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5309 = llvm.load %5308 : !llvm.ptr + %5310 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5311 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5312 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5313 = llvm.mul %59, %5312 : !llvm.i64 + %5314 = llvm.add %5311, %5313 : !llvm.i64 + %5315 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5316 = llvm.mul %486, %5315 : !llvm.i64 + %5317 = llvm.add %5314, %5316 : !llvm.i64 + %5318 = llvm.getelementptr %5310[%5317] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5319 = llvm.load %5318 : !llvm.ptr + %5320 = llvm.fmul %5309, %5319 {RelaxedPrecision} : !llvm.float + %5321 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5322 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5323 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5324 = llvm.mul %4879, %5323 : !llvm.i64 + %5325 = llvm.add %5322, %5324 : !llvm.i64 + %5326 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5327 = llvm.mul %486, %5326 : !llvm.i64 + %5328 = llvm.add %5325, %5327 : !llvm.i64 + %5329 = llvm.getelementptr %5321[%5328] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5330 = llvm.load %5329 : !llvm.ptr + %5331 = llvm.fadd %5330, %5320 {RelaxedPrecision} : !llvm.float + %5332 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5333 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5334 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5335 = llvm.mul %4879, %5334 : !llvm.i64 + %5336 = llvm.add %5333, %5335 : !llvm.i64 + %5337 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %5338 = llvm.mul %486, %5337 : !llvm.i64 + %5339 = llvm.add %5336, %5338 : !llvm.i64 + %5340 = llvm.getelementptr %5332[%5339] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5331, %5340 : !llvm.ptr + %5341 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5342 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5343 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5344 = llvm.mul %4879, %5343 : !llvm.i64 + %5345 = llvm.add %5342, %5344 : !llvm.i64 + %5346 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5347 = llvm.mul %486, %5346 : !llvm.i64 + %5348 = llvm.add %5345, %5347 : !llvm.i64 + %5349 = llvm.getelementptr %5341[%5348] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5350 = llvm.load %5349 : !llvm.ptr + %5351 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5352 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5353 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5354 = llvm.mul %4879, %5353 : !llvm.i64 + %5355 = llvm.add %5352, %5354 : !llvm.i64 + %5356 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5357 = llvm.mul %486, %5356 : !llvm.i64 + %5358 = llvm.add %5355, %5357 : !llvm.i64 + %5359 = llvm.getelementptr %5351[%5358] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5350, %5359 : !llvm.ptr + %5360 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5361 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5362 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5363 = llvm.mul %4879, %5362 : !llvm.i64 + %5364 = llvm.add %5361, %5363 : !llvm.i64 + %5365 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5366 = llvm.mul %59, %5365 : !llvm.i64 + %5367 = llvm.add %5364, %5366 : !llvm.i64 + %5368 = llvm.getelementptr %5360[%5367] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5369 = llvm.load %5368 : !llvm.ptr + %5370 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5371 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5372 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5373 = llvm.mul %59, %5372 : !llvm.i64 + %5374 = llvm.add %5371, %5373 : !llvm.i64 + %5375 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5376 = llvm.mul %547, %5375 : !llvm.i64 + %5377 = llvm.add %5374, %5376 : !llvm.i64 + %5378 = llvm.getelementptr %5370[%5377] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5379 = llvm.load %5378 : !llvm.ptr + %5380 = llvm.fmul %5369, %5379 {RelaxedPrecision} : !llvm.float + %5381 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5383 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5384 = llvm.mul %4879, %5383 : !llvm.i64 + %5385 = llvm.add %5382, %5384 : !llvm.i64 + %5386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5387 = llvm.mul %547, %5386 : !llvm.i64 + %5388 = llvm.add %5385, %5387 : !llvm.i64 + %5389 = llvm.getelementptr %5381[%5388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5390 = llvm.load %5389 : !llvm.ptr + %5391 = llvm.fadd %5390, %5380 {RelaxedPrecision} : !llvm.float + %5392 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5393 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5394 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5395 = llvm.mul %4879, %5394 : !llvm.i64 + %5396 = llvm.add %5393, %5395 : !llvm.i64 + %5397 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5398 = llvm.mul %547, %5397 : !llvm.i64 + %5399 = llvm.add %5396, %5398 : 
!llvm.i64 + %5400 = llvm.getelementptr %5392[%5399] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5391, %5400 : !llvm.ptr + %5401 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5403 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5404 = llvm.mul %4879, %5403 : !llvm.i64 + %5405 = llvm.add %5402, %5404 : !llvm.i64 + %5406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5407 = llvm.mul %547, %5406 : !llvm.i64 + %5408 = llvm.add %5405, %5407 : !llvm.i64 + %5409 = llvm.getelementptr %5401[%5408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5410 = llvm.load %5409 : !llvm.ptr + %5411 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5413 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5414 = llvm.mul %4879, %5413 : !llvm.i64 + %5415 = llvm.add %5412, %5414 : !llvm.i64 + %5416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5417 = llvm.mul %547, %5416 : !llvm.i64 + %5418 = llvm.add %5415, %5417 : !llvm.i64 + %5419 = llvm.getelementptr %5411[%5418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5410, %5419 : !llvm.ptr + %5420 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5421 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5422 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5423 = llvm.mul %4879, %5422 : !llvm.i64 + %5424 = llvm.add %5421, %5423 : !llvm.i64 + %5425 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5426 = llvm.mul %59, %5425 : !llvm.i64 + %5427 = llvm.add %5424, %5426 : !llvm.i64 + %5428 = llvm.getelementptr %5420[%5427] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5429 = llvm.load %5428 : !llvm.ptr + %5430 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5431 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5432 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5433 = llvm.mul %59, %5432 : !llvm.i64 + %5434 = llvm.add %5431, %5433 : !llvm.i64 + %5435 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5436 = llvm.mul %608, %5435 : !llvm.i64 + %5437 = llvm.add %5434, %5436 : !llvm.i64 + %5438 = llvm.getelementptr %5430[%5437] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5439 = llvm.load %5438 : !llvm.ptr + %5440 = llvm.fmul %5429, %5439 {RelaxedPrecision} : !llvm.float + %5441 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5443 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5444 = llvm.mul %4879, %5443 : !llvm.i64 + %5445 = llvm.add %5442, %5444 : !llvm.i64 + %5446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5447 = llvm.mul %608, %5446 : !llvm.i64 + %5448 = llvm.add %5445, %5447 : !llvm.i64 + %5449 = llvm.getelementptr %5441[%5448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5450 = llvm.load %5449 : !llvm.ptr + %5451 = llvm.fadd %5450, %5440 {RelaxedPrecision} : !llvm.float + %5452 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5455 = llvm.mul %4879, %5454 : !llvm.i64 + %5456 = llvm.add %5453, %5455 : !llvm.i64 + %5457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5458 = llvm.mul %608, %5457 : !llvm.i64 + %5459 = llvm.add %5456, %5458 : !llvm.i64 + %5460 = llvm.getelementptr %5452[%5459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5451, %5460 : 
!llvm.ptr + %5461 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5462 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5463 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5464 = llvm.mul %4879, %5463 : !llvm.i64 + %5465 = llvm.add %5462, %5464 : !llvm.i64 + %5466 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5467 = llvm.mul %608, %5466 : !llvm.i64 + %5468 = llvm.add %5465, %5467 : !llvm.i64 + %5469 = llvm.getelementptr %5461[%5468] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5470 = llvm.load %5469 : !llvm.ptr + %5471 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5472 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5473 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5474 = llvm.mul %4879, %5473 : !llvm.i64 + %5475 = llvm.add %5472, %5474 : !llvm.i64 + %5476 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5477 = llvm.mul %608, %5476 : !llvm.i64 + %5478 = llvm.add %5475, %5477 : !llvm.i64 + %5479 = llvm.getelementptr %5471[%5478] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5470, %5479 : !llvm.ptr + %5480 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5481 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5482 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5483 = llvm.mul %4879, %5482 : !llvm.i64 + %5484 = llvm.add %5481, %5483 : !llvm.i64 + %5485 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5486 = llvm.mul %59, %5485 : !llvm.i64 + %5487 = llvm.add %5484, %5486 : !llvm.i64 + %5488 = llvm.getelementptr %5480[%5487] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5489 = llvm.load %5488 : !llvm.ptr + %5490 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5491 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5492 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5493 = llvm.mul %59, %5492 : !llvm.i64 + %5494 = llvm.add %5491, %5493 : !llvm.i64 + %5495 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5496 = llvm.mul %669, %5495 : !llvm.i64 + %5497 = llvm.add %5494, %5496 : !llvm.i64 + %5498 = llvm.getelementptr %5490[%5497] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5499 = llvm.load %5498 : !llvm.ptr + %5500 = llvm.fmul %5489, %5499 {RelaxedPrecision} : !llvm.float + %5501 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5502 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5503 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5504 = llvm.mul %4879, %5503 : !llvm.i64 + %5505 = llvm.add %5502, %5504 : !llvm.i64 + %5506 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5507 = llvm.mul %669, %5506 : !llvm.i64 + %5508 = llvm.add %5505, %5507 : !llvm.i64 + %5509 = llvm.getelementptr %5501[%5508] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5510 = llvm.load %5509 : !llvm.ptr + %5511 = llvm.fadd %5510, %5500 {RelaxedPrecision} : !llvm.float + %5512 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5513 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5514 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5515 = llvm.mul %4879, %5514 : !llvm.i64 + %5516 = llvm.add %5513, %5515 : !llvm.i64 + %5517 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5518 = llvm.mul %669, %5517 : !llvm.i64 + %5519 = llvm.add %5516, %5518 : !llvm.i64 + %5520 = llvm.getelementptr %5512[%5519] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5511, %5520 : !llvm.ptr + %5521 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5522 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %5523 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5524 = llvm.mul %4879, %5523 : !llvm.i64 + %5525 = llvm.add %5522, %5524 : !llvm.i64 + %5526 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5527 = llvm.mul %669, %5526 : !llvm.i64 + %5528 = llvm.add %5525, %5527 : !llvm.i64 + %5529 = llvm.getelementptr %5521[%5528] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5530 = llvm.load %5529 : !llvm.ptr + %5531 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5533 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5534 = llvm.mul %4879, %5533 : !llvm.i64 + %5535 = llvm.add %5532, %5534 : !llvm.i64 + %5536 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5537 = llvm.mul %669, %5536 : !llvm.i64 + %5538 = llvm.add %5535, %5537 : !llvm.i64 + %5539 = llvm.getelementptr %5531[%5538] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5530, %5539 : !llvm.ptr + %5540 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5542 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5543 = llvm.mul %4879, %5542 : !llvm.i64 + %5544 = llvm.add %5541, %5543 : !llvm.i64 + %5545 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5546 = llvm.mul %59, %5545 : !llvm.i64 + %5547 = llvm.add %5544, %5546 : !llvm.i64 + %5548 = llvm.getelementptr %5540[%5547] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5549 = llvm.load %5548 : !llvm.ptr + %5550 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5551 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5552 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5553 = llvm.mul %59, %5552 : !llvm.i64 + %5554 = llvm.add %5551, %5553 : !llvm.i64 + %5555 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5556 = llvm.mul %730, %5555 : !llvm.i64 + %5557 = llvm.add %5554, %5556 : !llvm.i64 + %5558 = llvm.getelementptr %5550[%5557] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5559 = llvm.load %5558 : !llvm.ptr + %5560 = llvm.fmul %5549, %5559 {RelaxedPrecision} : !llvm.float + %5561 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5562 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5563 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5564 = llvm.mul %4879, %5563 : !llvm.i64 + %5565 = llvm.add %5562, %5564 : !llvm.i64 + %5566 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5567 = llvm.mul %730, %5566 : !llvm.i64 + %5568 = llvm.add %5565, %5567 : !llvm.i64 + %5569 = llvm.getelementptr %5561[%5568] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5570 = llvm.load %5569 : !llvm.ptr + %5571 = llvm.fadd %5570, %5560 {RelaxedPrecision} : !llvm.float + %5572 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5573 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5574 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5575 = llvm.mul %4879, %5574 : !llvm.i64 + %5576 = llvm.add %5573, %5575 : !llvm.i64 + %5577 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5578 = llvm.mul %730, %5577 : !llvm.i64 + %5579 = llvm.add %5576, %5578 : !llvm.i64 + %5580 = llvm.getelementptr %5572[%5579] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5571, %5580 : !llvm.ptr + %5581 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5582 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5583 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5584 = llvm.mul %4879, 
%5583 : !llvm.i64 + %5585 = llvm.add %5582, %5584 : !llvm.i64 + %5586 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5587 = llvm.mul %730, %5586 : !llvm.i64 + %5588 = llvm.add %5585, %5587 : !llvm.i64 + %5589 = llvm.getelementptr %5581[%5588] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5590 = llvm.load %5589 : !llvm.ptr + %5591 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5593 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5594 = llvm.mul %4879, %5593 : !llvm.i64 + %5595 = llvm.add %5592, %5594 : !llvm.i64 + %5596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5597 = llvm.mul %730, %5596 : !llvm.i64 + %5598 = llvm.add %5595, %5597 : !llvm.i64 + %5599 = llvm.getelementptr %5591[%5598] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5590, %5599 : !llvm.ptr + %5600 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5602 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5603 = llvm.mul %4879, %5602 : !llvm.i64 + %5604 = llvm.add %5601, %5603 : !llvm.i64 + %5605 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5606 = llvm.mul %59, %5605 : !llvm.i64 + %5607 = llvm.add %5604, %5606 : !llvm.i64 + %5608 = llvm.getelementptr %5600[%5607] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5609 = llvm.load %5608 : !llvm.ptr + %5610 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5611 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5612 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5613 = llvm.mul %59, %5612 : !llvm.i64 + %5614 = llvm.add %5611, %5613 : !llvm.i64 + %5615 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5616 = llvm.mul %791, %5615 : !llvm.i64 + %5617 = llvm.add %5614, %5616 : !llvm.i64 + %5618 = llvm.getelementptr %5610[%5617] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5619 = llvm.load %5618 : !llvm.ptr + %5620 = llvm.fmul %5609, %5619 {RelaxedPrecision} : !llvm.float + %5621 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5622 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5623 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5624 = llvm.mul %4879, %5623 : !llvm.i64 + %5625 = llvm.add %5622, %5624 : !llvm.i64 + %5626 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5627 = llvm.mul %791, %5626 : !llvm.i64 + %5628 = llvm.add %5625, %5627 : !llvm.i64 + %5629 = llvm.getelementptr %5621[%5628] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5630 = llvm.load %5629 : !llvm.ptr + %5631 = llvm.fadd %5630, %5620 {RelaxedPrecision} : !llvm.float + %5632 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5633 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5634 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5635 = llvm.mul %4879, %5634 : !llvm.i64 + %5636 = llvm.add %5633, %5635 : !llvm.i64 + %5637 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5638 = llvm.mul %791, %5637 : !llvm.i64 + %5639 = llvm.add %5636, %5638 : !llvm.i64 + %5640 = llvm.getelementptr %5632[%5639] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5631, %5640 : !llvm.ptr + %5641 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5642 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5643 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5644 = llvm.mul %4879, %5643 : !llvm.i64 + %5645 = llvm.add %5642, %5644 : !llvm.i64 + %5646 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5647 = 
llvm.mul %791, %5646 : !llvm.i64 + %5648 = llvm.add %5645, %5647 : !llvm.i64 + %5649 = llvm.getelementptr %5641[%5648] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5650 = llvm.load %5649 : !llvm.ptr + %5651 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5652 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5653 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5654 = llvm.mul %4879, %5653 : !llvm.i64 + %5655 = llvm.add %5652, %5654 : !llvm.i64 + %5656 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5657 = llvm.mul %791, %5656 : !llvm.i64 + %5658 = llvm.add %5655, %5657 : !llvm.i64 + %5659 = llvm.getelementptr %5651[%5658] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5650, %5659 : !llvm.ptr + %5660 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5662 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5663 = llvm.mul %4879, %5662 : !llvm.i64 + %5664 = llvm.add %5661, %5663 : !llvm.i64 + %5665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5666 = llvm.mul %59, %5665 : !llvm.i64 + %5667 = llvm.add %5664, %5666 : !llvm.i64 + %5668 = llvm.getelementptr %5660[%5667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5669 = llvm.load %5668 : !llvm.ptr + %5670 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5672 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5673 = llvm.mul %59, %5672 : !llvm.i64 + %5674 = llvm.add %5671, %5673 : !llvm.i64 + %5675 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5676 = llvm.mul %852, %5675 : !llvm.i64 + %5677 = llvm.add %5674, %5676 : !llvm.i64 + %5678 = llvm.getelementptr %5670[%5677] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5679 = llvm.load %5678 : !llvm.ptr + %5680 = llvm.fmul %5669, %5679 {RelaxedPrecision} : !llvm.float + %5681 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5682 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5683 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5684 = llvm.mul %4879, %5683 : !llvm.i64 + %5685 = llvm.add %5682, %5684 : !llvm.i64 + %5686 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5687 = llvm.mul %852, %5686 : !llvm.i64 + %5688 = llvm.add %5685, %5687 : !llvm.i64 + %5689 = llvm.getelementptr %5681[%5688] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5690 = llvm.load %5689 : !llvm.ptr + %5691 = llvm.fadd %5690, %5680 {RelaxedPrecision} : !llvm.float + %5692 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5694 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5695 = llvm.mul %4879, %5694 : !llvm.i64 + %5696 = llvm.add %5693, %5695 : !llvm.i64 + %5697 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5698 = llvm.mul %852, %5697 : !llvm.i64 + %5699 = llvm.add %5696, %5698 : !llvm.i64 + %5700 = llvm.getelementptr %5692[%5699] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5691, %5700 : !llvm.ptr + %5701 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5702 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5703 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5704 = llvm.mul %4879, %5703 : !llvm.i64 + %5705 = llvm.add %5702, %5704 : !llvm.i64 + %5706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5707 = llvm.mul %852, %5706 : !llvm.i64 + %5708 = llvm.add %5705, %5707 : !llvm.i64 + %5709 = llvm.getelementptr %5701[%5708] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5710 = llvm.load %5709 : !llvm.ptr + %5711 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5712 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5713 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5714 = llvm.mul %4879, %5713 : !llvm.i64 + %5715 = llvm.add %5712, %5714 : !llvm.i64 + %5716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5717 = llvm.mul %852, %5716 : !llvm.i64 + %5718 = llvm.add %5715, %5717 : !llvm.i64 + %5719 = llvm.getelementptr %5711[%5718] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5710, %5719 : !llvm.ptr + %5720 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5721 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5722 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5723 = llvm.mul %4879, %5722 : !llvm.i64 + %5724 = llvm.add %5721, %5723 : !llvm.i64 + %5725 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5726 = llvm.mul %59, %5725 : !llvm.i64 + %5727 = llvm.add %5724, %5726 : !llvm.i64 + %5728 = llvm.getelementptr %5720[%5727] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5729 = llvm.load %5728 : !llvm.ptr + %5730 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5731 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5732 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5733 = llvm.mul %59, %5732 : !llvm.i64 + %5734 = llvm.add %5731, %5733 : !llvm.i64 + %5735 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5736 = llvm.mul %913, %5735 : !llvm.i64 + %5737 = llvm.add %5734, %5736 : !llvm.i64 + %5738 = llvm.getelementptr %5730[%5737] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5739 = llvm.load %5738 : !llvm.ptr + %5740 = llvm.fmul %5729, %5739 {RelaxedPrecision} : !llvm.float + %5741 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5742 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5743 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5744 = llvm.mul %4879, %5743 : !llvm.i64 + %5745 = llvm.add %5742, %5744 : !llvm.i64 + %5746 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5747 = llvm.mul %913, %5746 : !llvm.i64 + %5748 = llvm.add %5745, %5747 : !llvm.i64 + %5749 = llvm.getelementptr %5741[%5748] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5750 = llvm.load %5749 : !llvm.ptr + %5751 = llvm.fadd %5750, %5740 {RelaxedPrecision} : !llvm.float + %5752 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5754 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5755 = llvm.mul %4879, %5754 : !llvm.i64 + %5756 = llvm.add %5753, %5755 : !llvm.i64 + %5757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5758 = llvm.mul %913, %5757 : !llvm.i64 + %5759 = llvm.add %5756, %5758 : !llvm.i64 + %5760 = llvm.getelementptr %5752[%5759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5751, %5760 : !llvm.ptr + %5761 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5762 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5763 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5764 = llvm.mul %4879, %5763 : !llvm.i64 + %5765 = llvm.add %5762, %5764 : !llvm.i64 + %5766 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5767 = llvm.mul %913, %5766 : !llvm.i64 + %5768 = llvm.add %5765, %5767 : !llvm.i64 + %5769 = llvm.getelementptr %5761[%5768] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5770 = llvm.load %5769 : !llvm.ptr + %5771 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5772 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5773 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5774 = llvm.mul %4879, %5773 : !llvm.i64 + %5775 = llvm.add %5772, %5774 : !llvm.i64 + %5776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5777 = llvm.mul %913, %5776 : !llvm.i64 + %5778 = llvm.add %5775, %5777 : !llvm.i64 + %5779 = llvm.getelementptr %5771[%5778] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5770, %5779 : !llvm.ptr + %5780 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5781 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5782 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5783 = llvm.mul %4879, %5782 : !llvm.i64 + %5784 = llvm.add %5781, %5783 : !llvm.i64 + %5785 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5786 = llvm.mul %59, %5785 : !llvm.i64 + %5787 = llvm.add %5784, %5786 : !llvm.i64 + %5788 = llvm.getelementptr %5780[%5787] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5789 = llvm.load %5788 : !llvm.ptr + %5790 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5791 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5792 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5793 = llvm.mul %59, %5792 : !llvm.i64 + %5794 = llvm.add %5791, %5793 : !llvm.i64 + %5795 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5796 = llvm.mul %974, %5795 : !llvm.i64 + %5797 = llvm.add %5794, %5796 : !llvm.i64 + %5798 = llvm.getelementptr %5790[%5797] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5799 = llvm.load %5798 : !llvm.ptr + %5800 = llvm.fmul %5789, %5799 {RelaxedPrecision} : !llvm.float + %5801 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5802 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5803 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5804 = llvm.mul %4879, %5803 : !llvm.i64 + %5805 = llvm.add %5802, %5804 : !llvm.i64 + %5806 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5807 = llvm.mul %974, %5806 : !llvm.i64 + %5808 = llvm.add %5805, %5807 : !llvm.i64 + %5809 = llvm.getelementptr %5801[%5808] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5810 = llvm.load %5809 : !llvm.ptr + %5811 = llvm.fadd %5810, %5800 {RelaxedPrecision} : !llvm.float + %5812 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5814 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5815 = llvm.mul %4879, %5814 : !llvm.i64 + %5816 = llvm.add %5813, %5815 : !llvm.i64 + %5817 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5818 = llvm.mul %974, %5817 : !llvm.i64 + %5819 = llvm.add %5816, %5818 : !llvm.i64 + %5820 = llvm.getelementptr %5812[%5819] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5811, %5820 : !llvm.ptr + %5821 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5822 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5823 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5824 = llvm.mul %4879, %5823 : !llvm.i64 + %5825 = llvm.add %5822, %5824 : !llvm.i64 + %5826 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5827 = llvm.mul %974, %5826 : !llvm.i64 + %5828 = llvm.add %5825, %5827 : !llvm.i64 + %5829 = llvm.getelementptr %5821[%5828] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5830 = llvm.load %5829 : !llvm.ptr + %5831 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5832 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5833 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %5834 = llvm.mul %4879, %5833 : !llvm.i64 + %5835 = llvm.add %5832, %5834 : !llvm.i64 + %5836 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5837 = llvm.mul %974, %5836 : !llvm.i64 + %5838 = llvm.add %5835, %5837 : !llvm.i64 + %5839 = llvm.getelementptr %5831[%5838] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5830, %5839 : !llvm.ptr + %5840 = llvm.add %56, %33 : !llvm.i64 + llvm.br ^bb9(%5840 : !llvm.i64) + ^bb11: // pred: ^bb9 + %5841 = llvm.add %54, %36 : !llvm.i64 + llvm.br ^bb7(%5841 : !llvm.i64) + ^bb12: // pred: ^bb7 + %5842 = llvm.add %52, %30 : !llvm.i64 + llvm.br ^bb5(%5842 : !llvm.i64) + ^bb13: // pred: ^bb5 + %5843 = llvm.add %50, %38 : !llvm.i64 + llvm.br ^bb3(%5843 : !llvm.i64) + ^bb14: // pred: ^bb3 + llvm.br ^bb15(%32 : !llvm.i64) + ^bb15(%5844: !llvm.i64): // 2 preds: ^bb14, ^bb22 + %5845 = llvm.icmp "slt" %5844, %29 : !llvm.i64 + llvm.cond_br %5845, ^bb16, ^bb23 + ^bb16: // pred: ^bb15 + llvm.br ^bb17(%32 : !llvm.i64) + ^bb17(%5846: !llvm.i64): // 2 preds: ^bb16, ^bb21 + %5847 = llvm.icmp "slt" %5846, %31 : !llvm.i64 + llvm.cond_br %5847, ^bb18, ^bb22 + ^bb18: // pred: ^bb17 + llvm.br ^bb19(%32 : !llvm.i64) + ^bb19(%5848: !llvm.i64): // 2 preds: ^bb18, ^bb20 + %5849 = llvm.icmp "slt" %5848, %36 : !llvm.i64 + llvm.cond_br %5849, ^bb20, ^bb21 + ^bb20: // pred: ^bb19 + %5850 = llvm.add %48, %5844 : !llvm.i64 + %5851 = llvm.add %5846, %5848 : !llvm.i64 + %5852 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5854 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5855 = llvm.mul %28, %5854 : !llvm.i64 + %5856 = llvm.add %5853, %5855 : !llvm.i64 + %5857 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5858 = llvm.mul %5851, %5857 : !llvm.i64 + %5859 = llvm.add %5856, %5858 : !llvm.i64 + %5860 = llvm.getelementptr %5852[%5859] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5861 = llvm.load %5860 : !llvm.ptr + %5862 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5863 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5864 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5865 = llvm.mul %5851, %5864 : !llvm.i64 + %5866 = llvm.add %5863, %5865 : !llvm.i64 + %5867 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5868 = llvm.mul %5850, %5867 : !llvm.i64 + %5869 = llvm.add %5866, %5868 : !llvm.i64 + %5870 = llvm.getelementptr %5862[%5869] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5871 = llvm.load %5870 : !llvm.ptr + %5872 = llvm.fmul %5861, %5871 {RelaxedPrecision} : !llvm.float + %5873 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5874 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5876 = llvm.mul %28, %5875 : !llvm.i64 + %5877 = llvm.add %5874, %5876 : !llvm.i64 + %5878 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5879 = llvm.mul %5850, %5878 : !llvm.i64 + %5880 = llvm.add %5877, %5879 : !llvm.i64 + %5881 = llvm.getelementptr %5873[%5880] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5882 = llvm.load %5881 : !llvm.ptr + %5883 = llvm.fadd %5882, %5872 {RelaxedPrecision} : !llvm.float + %5884 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5885 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5886 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5887 = llvm.mul %28, %5886 : !llvm.i64 + %5888 = llvm.add %5885, %5887 : !llvm.i64 + %5889 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %5890 = llvm.mul %5850, %5889 : !llvm.i64 + %5891 = llvm.add %5888, %5890 : !llvm.i64 + %5892 = llvm.getelementptr %5884[%5891] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5883, %5892 : !llvm.ptr + %5893 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5894 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5895 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5896 = llvm.mul %28, %5895 : !llvm.i64 + %5897 = llvm.add %5894, %5896 : !llvm.i64 + %5898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5899 = llvm.mul %5850, %5898 : !llvm.i64 + %5900 = llvm.add %5897, %5899 : !llvm.i64 + %5901 = llvm.getelementptr %5893[%5900] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5902 = llvm.load %5901 : !llvm.ptr + %5903 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5904 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5905 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5906 = llvm.mul %28, %5905 : !llvm.i64 + %5907 = llvm.add %5904, %5906 : !llvm.i64 + %5908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5909 = llvm.mul %5850, %5908 : !llvm.i64 + %5910 = llvm.add %5907, %5909 : !llvm.i64 + %5911 = llvm.getelementptr %5903[%5910] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5902, %5911 : !llvm.ptr + %5912 = llvm.add %5850, %33 : !llvm.i64 + %5913 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5915 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5916 = llvm.mul %28, %5915 : !llvm.i64 + %5917 = llvm.add %5914, %5916 : !llvm.i64 + %5918 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5919 = llvm.mul %5851, %5918 : !llvm.i64 + %5920 = llvm.add %5917, %5919 : !llvm.i64 + %5921 = llvm.getelementptr %5913[%5920] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5922 = llvm.load %5921 : !llvm.ptr + %5923 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5924 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5925 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5926 = llvm.mul %5851, %5925 : !llvm.i64 + %5927 = llvm.add %5924, %5926 : !llvm.i64 + %5928 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5929 = llvm.mul %5912, %5928 : !llvm.i64 + %5930 = llvm.add %5927, %5929 : !llvm.i64 + %5931 = llvm.getelementptr %5923[%5930] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5932 = llvm.load %5931 : !llvm.ptr + %5933 = llvm.fmul %5922, %5932 {RelaxedPrecision} : !llvm.float + %5934 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5935 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5936 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5937 = llvm.mul %28, %5936 : !llvm.i64 + %5938 = llvm.add %5935, %5937 : !llvm.i64 + %5939 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5940 = llvm.mul %5912, %5939 : !llvm.i64 + %5941 = llvm.add %5938, %5940 : !llvm.i64 + %5942 = llvm.getelementptr %5934[%5941] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5943 = llvm.load %5942 : !llvm.ptr + %5944 = llvm.fadd %5943, %5933 {RelaxedPrecision} : !llvm.float + %5945 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5946 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5947 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5948 = llvm.mul %28, %5947 : !llvm.i64 + %5949 = llvm.add %5946, %5948 : !llvm.i64 + %5950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5951 = llvm.mul %5912, %5950 : !llvm.i64 + %5952 = llvm.add %5949, %5951 
: !llvm.i64 + %5953 = llvm.getelementptr %5945[%5952] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5944, %5953 : !llvm.ptr + %5954 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5955 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5956 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5957 = llvm.mul %28, %5956 : !llvm.i64 + %5958 = llvm.add %5955, %5957 : !llvm.i64 + %5959 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5960 = llvm.mul %5912, %5959 : !llvm.i64 + %5961 = llvm.add %5958, %5960 : !llvm.i64 + %5962 = llvm.getelementptr %5954[%5961] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5963 = llvm.load %5962 : !llvm.ptr + %5964 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5966 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5967 = llvm.mul %28, %5966 : !llvm.i64 + %5968 = llvm.add %5965, %5967 : !llvm.i64 + %5969 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5970 = llvm.mul %5912, %5969 : !llvm.i64 + %5971 = llvm.add %5968, %5970 : !llvm.i64 + %5972 = llvm.getelementptr %5964[%5971] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5963, %5972 : !llvm.ptr + %5973 = llvm.add %5850, %34 : !llvm.i64 + %5974 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5976 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5977 = llvm.mul %28, %5976 : !llvm.i64 + %5978 = llvm.add %5975, %5977 : !llvm.i64 + %5979 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5980 = llvm.mul %5851, %5979 : !llvm.i64 + %5981 = llvm.add %5978, %5980 : !llvm.i64 + %5982 = llvm.getelementptr %5974[%5981] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5983 = llvm.load %5982 : !llvm.ptr + %5984 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5985 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5986 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5987 = llvm.mul %5851, %5986 : !llvm.i64 + %5988 = llvm.add %5985, %5987 : !llvm.i64 + %5989 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5990 = llvm.mul %5973, %5989 : !llvm.i64 + %5991 = llvm.add %5988, %5990 : !llvm.i64 + %5992 = llvm.getelementptr %5984[%5991] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5993 = llvm.load %5992 : !llvm.ptr + %5994 = llvm.fmul %5983, %5993 {RelaxedPrecision} : !llvm.float + %5995 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5996 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5997 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5998 = llvm.mul %28, %5997 : !llvm.i64 + %5999 = llvm.add %5996, %5998 : !llvm.i64 + %6000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6001 = llvm.mul %5973, %6000 : !llvm.i64 + %6002 = llvm.add %5999, %6001 : !llvm.i64 + %6003 = llvm.getelementptr %5995[%6002] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6004 = llvm.load %6003 : !llvm.ptr + %6005 = llvm.fadd %6004, %5994 {RelaxedPrecision} : !llvm.float + %6006 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6009 = llvm.mul %28, %6008 : !llvm.i64 + %6010 = llvm.add %6007, %6009 : !llvm.i64 + %6011 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6012 = llvm.mul %5973, %6011 : !llvm.i64 + %6013 = llvm.add %6010, %6012 : !llvm.i64 + %6014 = llvm.getelementptr %6006[%6013] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + llvm.store %6005, %6014 : !llvm.ptr + %6015 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6016 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6017 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6018 = llvm.mul %28, %6017 : !llvm.i64 + %6019 = llvm.add %6016, %6018 : !llvm.i64 + %6020 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6021 = llvm.mul %5973, %6020 : !llvm.i64 + %6022 = llvm.add %6019, %6021 : !llvm.i64 + %6023 = llvm.getelementptr %6015[%6022] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6024 = llvm.load %6023 : !llvm.ptr + %6025 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6026 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6027 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6028 = llvm.mul %28, %6027 : !llvm.i64 + %6029 = llvm.add %6026, %6028 : !llvm.i64 + %6030 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6031 = llvm.mul %5973, %6030 : !llvm.i64 + %6032 = llvm.add %6029, %6031 : !llvm.i64 + %6033 = llvm.getelementptr %6025[%6032] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6024, %6033 : !llvm.ptr + %6034 = llvm.add %5850, %35 : !llvm.i64 + %6035 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6036 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6037 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6038 = llvm.mul %28, %6037 : !llvm.i64 + %6039 = llvm.add %6036, %6038 : !llvm.i64 + %6040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6041 = llvm.mul %5851, %6040 : !llvm.i64 + %6042 = llvm.add %6039, %6041 : !llvm.i64 + %6043 = llvm.getelementptr %6035[%6042] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6044 = llvm.load %6043 : !llvm.ptr + %6045 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6046 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6047 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6048 = llvm.mul %5851, %6047 : !llvm.i64 + %6049 = llvm.add %6046, %6048 : !llvm.i64 + %6050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6051 = llvm.mul %6034, %6050 : !llvm.i64 + %6052 = llvm.add %6049, %6051 : !llvm.i64 + %6053 = llvm.getelementptr %6045[%6052] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6054 = llvm.load %6053 : !llvm.ptr + %6055 = llvm.fmul %6044, %6054 {RelaxedPrecision} : !llvm.float + %6056 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6057 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6058 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6059 = llvm.mul %28, %6058 : !llvm.i64 + %6060 = llvm.add %6057, %6059 : !llvm.i64 + %6061 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6062 = llvm.mul %6034, %6061 : !llvm.i64 + %6063 = llvm.add %6060, %6062 : !llvm.i64 + %6064 = llvm.getelementptr %6056[%6063] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6065 = llvm.load %6064 : !llvm.ptr + %6066 = llvm.fadd %6065, %6055 {RelaxedPrecision} : !llvm.float + %6067 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6069 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6070 = llvm.mul %28, %6069 : !llvm.i64 + %6071 = llvm.add %6068, %6070 : !llvm.i64 + %6072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6073 = llvm.mul %6034, %6072 : !llvm.i64 + %6074 = llvm.add %6071, %6073 : !llvm.i64 + %6075 = llvm.getelementptr %6067[%6074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6066, %6075 : !llvm.ptr + %6076 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6077 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6078 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6079 = llvm.mul %28, %6078 : !llvm.i64 + %6080 = llvm.add %6077, %6079 : !llvm.i64 + %6081 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6082 = llvm.mul %6034, %6081 : !llvm.i64 + %6083 = llvm.add %6080, %6082 : !llvm.i64 + %6084 = llvm.getelementptr %6076[%6083] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6085 = llvm.load %6084 : !llvm.ptr + %6086 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6088 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6089 = llvm.mul %28, %6088 : !llvm.i64 + %6090 = llvm.add %6087, %6089 : !llvm.i64 + %6091 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6092 = llvm.mul %6034, %6091 : !llvm.i64 + %6093 = llvm.add %6090, %6092 : !llvm.i64 + %6094 = llvm.getelementptr %6086[%6093] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6085, %6094 : !llvm.ptr + %6095 = llvm.add %5850, %36 : !llvm.i64 + %6096 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6098 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6099 = llvm.mul %28, %6098 : !llvm.i64 + %6100 = llvm.add %6097, %6099 : !llvm.i64 + %6101 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6102 = llvm.mul %5851, %6101 : !llvm.i64 + %6103 = llvm.add %6100, %6102 : !llvm.i64 + %6104 = llvm.getelementptr %6096[%6103] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6105 = llvm.load %6104 : !llvm.ptr + %6106 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6108 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6109 = llvm.mul %5851, %6108 : !llvm.i64 + %6110 = llvm.add %6107, %6109 : !llvm.i64 + %6111 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6112 = llvm.mul %6095, %6111 : !llvm.i64 + %6113 = llvm.add %6110, %6112 : !llvm.i64 + %6114 = llvm.getelementptr %6106[%6113] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6115 = llvm.load %6114 : !llvm.ptr + %6116 = llvm.fmul %6105, %6115 {RelaxedPrecision} : !llvm.float + %6117 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6120 = llvm.mul %28, %6119 : !llvm.i64 + %6121 = llvm.add %6118, %6120 : !llvm.i64 + %6122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6123 = llvm.mul %6095, %6122 : !llvm.i64 + %6124 = llvm.add %6121, %6123 : !llvm.i64 + %6125 = llvm.getelementptr %6117[%6124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6126 = llvm.load %6125 : !llvm.ptr + %6127 = llvm.fadd %6126, %6116 {RelaxedPrecision} : !llvm.float + %6128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6131 = llvm.mul %28, %6130 : !llvm.i64 + %6132 = llvm.add %6129, %6131 : !llvm.i64 + %6133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6134 = llvm.mul %6095, %6133 : !llvm.i64 + %6135 = llvm.add %6132, %6134 : !llvm.i64 + %6136 = llvm.getelementptr %6128[%6135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6127, %6136 : !llvm.ptr + %6137 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6138 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %6139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6140 = llvm.mul %28, %6139 : !llvm.i64 + %6141 = llvm.add %6138, %6140 : !llvm.i64 + %6142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6143 = llvm.mul %6095, %6142 : !llvm.i64 + %6144 = llvm.add %6141, %6143 : !llvm.i64 + %6145 = llvm.getelementptr %6137[%6144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6146 = llvm.load %6145 : !llvm.ptr + %6147 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6149 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6150 = llvm.mul %28, %6149 : !llvm.i64 + %6151 = llvm.add %6148, %6150 : !llvm.i64 + %6152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6153 = llvm.mul %6095, %6152 : !llvm.i64 + %6154 = llvm.add %6151, %6153 : !llvm.i64 + %6155 = llvm.getelementptr %6147[%6154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6146, %6155 : !llvm.ptr + %6156 = llvm.add %5850, %37 : !llvm.i64 + %6157 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6159 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6160 = llvm.mul %28, %6159 : !llvm.i64 + %6161 = llvm.add %6158, %6160 : !llvm.i64 + %6162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6163 = llvm.mul %5851, %6162 : !llvm.i64 + %6164 = llvm.add %6161, %6163 : !llvm.i64 + %6165 = llvm.getelementptr %6157[%6164] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6166 = llvm.load %6165 : !llvm.ptr + %6167 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6169 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6170 = llvm.mul %5851, %6169 : !llvm.i64 + %6171 = llvm.add %6168, %6170 : !llvm.i64 + %6172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6173 = llvm.mul %6156, %6172 : !llvm.i64 + %6174 = llvm.add %6171, %6173 : !llvm.i64 + %6175 = llvm.getelementptr %6167[%6174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6176 = llvm.load %6175 : !llvm.ptr + %6177 = llvm.fmul %6166, %6176 {RelaxedPrecision} : !llvm.float + %6178 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6180 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6181 = llvm.mul %28, %6180 : !llvm.i64 + %6182 = llvm.add %6179, %6181 : !llvm.i64 + %6183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6184 = llvm.mul %6156, %6183 : !llvm.i64 + %6185 = llvm.add %6182, %6184 : !llvm.i64 + %6186 = llvm.getelementptr %6178[%6185] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6187 = llvm.load %6186 : !llvm.ptr + %6188 = llvm.fadd %6187, %6177 {RelaxedPrecision} : !llvm.float + %6189 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6191 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6192 = llvm.mul %28, %6191 : !llvm.i64 + %6193 = llvm.add %6190, %6192 : !llvm.i64 + %6194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6195 = llvm.mul %6156, %6194 : !llvm.i64 + %6196 = llvm.add %6193, %6195 : !llvm.i64 + %6197 = llvm.getelementptr %6189[%6196] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6188, %6197 : !llvm.ptr + %6198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6200 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %6201 = llvm.mul %28, %6200 : !llvm.i64 + %6202 = llvm.add %6199, %6201 : !llvm.i64 + %6203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6204 = llvm.mul %6156, %6203 : !llvm.i64 + %6205 = llvm.add %6202, %6204 : !llvm.i64 + %6206 = llvm.getelementptr %6198[%6205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6207 = llvm.load %6206 : !llvm.ptr + %6208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6211 = llvm.mul %28, %6210 : !llvm.i64 + %6212 = llvm.add %6209, %6211 : !llvm.i64 + %6213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6214 = llvm.mul %6156, %6213 : !llvm.i64 + %6215 = llvm.add %6212, %6214 : !llvm.i64 + %6216 = llvm.getelementptr %6208[%6215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6207, %6216 : !llvm.ptr + %6217 = llvm.add %5850, %38 : !llvm.i64 + %6218 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6220 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6221 = llvm.mul %28, %6220 : !llvm.i64 + %6222 = llvm.add %6219, %6221 : !llvm.i64 + %6223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6224 = llvm.mul %5851, %6223 : !llvm.i64 + %6225 = llvm.add %6222, %6224 : !llvm.i64 + %6226 = llvm.getelementptr %6218[%6225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6227 = llvm.load %6226 : !llvm.ptr + %6228 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6231 = llvm.mul %5851, %6230 : !llvm.i64 + %6232 = llvm.add %6229, %6231 : !llvm.i64 + %6233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6234 = llvm.mul %6217, %6233 : !llvm.i64 + %6235 = llvm.add %6232, %6234 : !llvm.i64 + %6236 = llvm.getelementptr %6228[%6235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6237 = llvm.load %6236 : !llvm.ptr + %6238 = llvm.fmul %6227, %6237 {RelaxedPrecision} : !llvm.float + %6239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6242 = llvm.mul %28, %6241 : !llvm.i64 + %6243 = llvm.add %6240, %6242 : !llvm.i64 + %6244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6245 = llvm.mul %6217, %6244 : !llvm.i64 + %6246 = llvm.add %6243, %6245 : !llvm.i64 + %6247 = llvm.getelementptr %6239[%6246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6248 = llvm.load %6247 : !llvm.ptr + %6249 = llvm.fadd %6248, %6238 {RelaxedPrecision} : !llvm.float + %6250 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6252 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6253 = llvm.mul %28, %6252 : !llvm.i64 + %6254 = llvm.add %6251, %6253 : !llvm.i64 + %6255 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6256 = llvm.mul %6217, %6255 : !llvm.i64 + %6257 = llvm.add %6254, %6256 : !llvm.i64 + %6258 = llvm.getelementptr %6250[%6257] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6249, %6258 : !llvm.ptr + %6259 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6260 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6261 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6262 = llvm.mul %28, %6261 : !llvm.i64 + %6263 = llvm.add %6260, %6262 : 
!llvm.i64 + %6264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6265 = llvm.mul %6217, %6264 : !llvm.i64 + %6266 = llvm.add %6263, %6265 : !llvm.i64 + %6267 = llvm.getelementptr %6259[%6266] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6268 = llvm.load %6267 : !llvm.ptr + %6269 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6270 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6271 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6272 = llvm.mul %28, %6271 : !llvm.i64 + %6273 = llvm.add %6270, %6272 : !llvm.i64 + %6274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6275 = llvm.mul %6217, %6274 : !llvm.i64 + %6276 = llvm.add %6273, %6275 : !llvm.i64 + %6277 = llvm.getelementptr %6269[%6276] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6268, %6277 : !llvm.ptr + %6278 = llvm.add %5850, %39 : !llvm.i64 + %6279 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6281 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6282 = llvm.mul %28, %6281 : !llvm.i64 + %6283 = llvm.add %6280, %6282 : !llvm.i64 + %6284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6285 = llvm.mul %5851, %6284 : !llvm.i64 + %6286 = llvm.add %6283, %6285 : !llvm.i64 + %6287 = llvm.getelementptr %6279[%6286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6288 = llvm.load %6287 : !llvm.ptr + %6289 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6290 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6291 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6292 = llvm.mul %5851, %6291 : !llvm.i64 + %6293 = llvm.add %6290, %6292 : !llvm.i64 + %6294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6295 = llvm.mul %6278, %6294 : !llvm.i64 + %6296 = llvm.add %6293, %6295 : !llvm.i64 + %6297 = llvm.getelementptr %6289[%6296] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6298 = llvm.load %6297 : !llvm.ptr + %6299 = llvm.fmul %6288, %6298 {RelaxedPrecision} : !llvm.float + %6300 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6302 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6303 = llvm.mul %28, %6302 : !llvm.i64 + %6304 = llvm.add %6301, %6303 : !llvm.i64 + %6305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6306 = llvm.mul %6278, %6305 : !llvm.i64 + %6307 = llvm.add %6304, %6306 : !llvm.i64 + %6308 = llvm.getelementptr %6300[%6307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6309 = llvm.load %6308 : !llvm.ptr + %6310 = llvm.fadd %6309, %6299 {RelaxedPrecision} : !llvm.float + %6311 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6314 = llvm.mul %28, %6313 : !llvm.i64 + %6315 = llvm.add %6312, %6314 : !llvm.i64 + %6316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6317 = llvm.mul %6278, %6316 : !llvm.i64 + %6318 = llvm.add %6315, %6317 : !llvm.i64 + %6319 = llvm.getelementptr %6311[%6318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6310, %6319 : !llvm.ptr + %6320 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6321 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6322 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6323 = llvm.mul %28, %6322 : !llvm.i64 + %6324 = llvm.add %6321, %6323 : !llvm.i64 + %6325 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6326 = llvm.mul 
%6278, %6325 : !llvm.i64 + %6327 = llvm.add %6324, %6326 : !llvm.i64 + %6328 = llvm.getelementptr %6320[%6327] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6329 = llvm.load %6328 : !llvm.ptr + %6330 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6331 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6332 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6333 = llvm.mul %28, %6332 : !llvm.i64 + %6334 = llvm.add %6331, %6333 : !llvm.i64 + %6335 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6336 = llvm.mul %6278, %6335 : !llvm.i64 + %6337 = llvm.add %6334, %6336 : !llvm.i64 + %6338 = llvm.getelementptr %6330[%6337] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6329, %6338 : !llvm.ptr + %6339 = llvm.add %5850, %40 : !llvm.i64 + %6340 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6342 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6343 = llvm.mul %28, %6342 : !llvm.i64 + %6344 = llvm.add %6341, %6343 : !llvm.i64 + %6345 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6346 = llvm.mul %5851, %6345 : !llvm.i64 + %6347 = llvm.add %6344, %6346 : !llvm.i64 + %6348 = llvm.getelementptr %6340[%6347] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6349 = llvm.load %6348 : !llvm.ptr + %6350 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6351 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6352 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6353 = llvm.mul %5851, %6352 : !llvm.i64 + %6354 = llvm.add %6351, %6353 : !llvm.i64 + %6355 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6356 = llvm.mul %6339, %6355 : !llvm.i64 + %6357 = llvm.add %6354, %6356 : !llvm.i64 + %6358 = llvm.getelementptr %6350[%6357] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6359 = llvm.load %6358 : !llvm.ptr + %6360 = llvm.fmul %6349, %6359 {RelaxedPrecision} : !llvm.float + %6361 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6362 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6363 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6364 = llvm.mul %28, %6363 : !llvm.i64 + %6365 = llvm.add %6362, %6364 : !llvm.i64 + %6366 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6367 = llvm.mul %6339, %6366 : !llvm.i64 + %6368 = llvm.add %6365, %6367 : !llvm.i64 + %6369 = llvm.getelementptr %6361[%6368] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6370 = llvm.load %6369 : !llvm.ptr + %6371 = llvm.fadd %6370, %6360 {RelaxedPrecision} : !llvm.float + %6372 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6373 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6374 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6375 = llvm.mul %28, %6374 : !llvm.i64 + %6376 = llvm.add %6373, %6375 : !llvm.i64 + %6377 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6378 = llvm.mul %6339, %6377 : !llvm.i64 + %6379 = llvm.add %6376, %6378 : !llvm.i64 + %6380 = llvm.getelementptr %6372[%6379] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6371, %6380 : !llvm.ptr + %6381 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6383 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6384 = llvm.mul %28, %6383 : !llvm.i64 + %6385 = llvm.add %6382, %6384 : !llvm.i64 + %6386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6387 = llvm.mul %6339, %6386 : !llvm.i64 + %6388 = llvm.add %6385, %6387 : !llvm.i64 + %6389 = 
llvm.getelementptr %6381[%6388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6390 = llvm.load %6389 : !llvm.ptr + %6391 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6393 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6394 = llvm.mul %28, %6393 : !llvm.i64 + %6395 = llvm.add %6392, %6394 : !llvm.i64 + %6396 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6397 = llvm.mul %6339, %6396 : !llvm.i64 + %6398 = llvm.add %6395, %6397 : !llvm.i64 + %6399 = llvm.getelementptr %6391[%6398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6390, %6399 : !llvm.ptr + %6400 = llvm.add %5850, %41 : !llvm.i64 + %6401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6404 = llvm.mul %28, %6403 : !llvm.i64 + %6405 = llvm.add %6402, %6404 : !llvm.i64 + %6406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6407 = llvm.mul %5851, %6406 : !llvm.i64 + %6408 = llvm.add %6405, %6407 : !llvm.i64 + %6409 = llvm.getelementptr %6401[%6408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6410 = llvm.load %6409 : !llvm.ptr + %6411 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6413 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6414 = llvm.mul %5851, %6413 : !llvm.i64 + %6415 = llvm.add %6412, %6414 : !llvm.i64 + %6416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6417 = llvm.mul %6400, %6416 : !llvm.i64 + %6418 = llvm.add %6415, %6417 : !llvm.i64 + %6419 = llvm.getelementptr %6411[%6418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6420 = llvm.load %6419 : !llvm.ptr + %6421 = llvm.fmul %6410, %6420 {RelaxedPrecision} : !llvm.float + %6422 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6423 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6424 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6425 = llvm.mul %28, %6424 : !llvm.i64 + %6426 = llvm.add %6423, %6425 : !llvm.i64 + %6427 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6428 = llvm.mul %6400, %6427 : !llvm.i64 + %6429 = llvm.add %6426, %6428 : !llvm.i64 + %6430 = llvm.getelementptr %6422[%6429] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6431 = llvm.load %6430 : !llvm.ptr + %6432 = llvm.fadd %6431, %6421 {RelaxedPrecision} : !llvm.float + %6433 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6434 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6435 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6436 = llvm.mul %28, %6435 : !llvm.i64 + %6437 = llvm.add %6434, %6436 : !llvm.i64 + %6438 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6439 = llvm.mul %6400, %6438 : !llvm.i64 + %6440 = llvm.add %6437, %6439 : !llvm.i64 + %6441 = llvm.getelementptr %6433[%6440] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6432, %6441 : !llvm.ptr + %6442 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6444 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6445 = llvm.mul %28, %6444 : !llvm.i64 + %6446 = llvm.add %6443, %6445 : !llvm.i64 + %6447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6448 = llvm.mul %6400, %6447 : !llvm.i64 + %6449 = llvm.add %6446, %6448 : !llvm.i64 + %6450 = llvm.getelementptr %6442[%6449] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6451 = 
llvm.load %6450 : !llvm.ptr + %6452 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6455 = llvm.mul %28, %6454 : !llvm.i64 + %6456 = llvm.add %6453, %6455 : !llvm.i64 + %6457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6458 = llvm.mul %6400, %6457 : !llvm.i64 + %6459 = llvm.add %6456, %6458 : !llvm.i64 + %6460 = llvm.getelementptr %6452[%6459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6451, %6460 : !llvm.ptr + %6461 = llvm.add %5850, %42 : !llvm.i64 + %6462 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6463 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6464 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6465 = llvm.mul %28, %6464 : !llvm.i64 + %6466 = llvm.add %6463, %6465 : !llvm.i64 + %6467 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6468 = llvm.mul %5851, %6467 : !llvm.i64 + %6469 = llvm.add %6466, %6468 : !llvm.i64 + %6470 = llvm.getelementptr %6462[%6469] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6471 = llvm.load %6470 : !llvm.ptr + %6472 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6473 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6474 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6475 = llvm.mul %5851, %6474 : !llvm.i64 + %6476 = llvm.add %6473, %6475 : !llvm.i64 + %6477 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6478 = llvm.mul %6461, %6477 : !llvm.i64 + %6479 = llvm.add %6476, %6478 : !llvm.i64 + %6480 = llvm.getelementptr %6472[%6479] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6481 = llvm.load %6480 : !llvm.ptr + %6482 = llvm.fmul %6471, %6481 {RelaxedPrecision} : !llvm.float + %6483 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6484 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6485 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6486 = llvm.mul %28, %6485 : !llvm.i64 + %6487 = llvm.add %6484, %6486 : !llvm.i64 + %6488 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6489 = llvm.mul %6461, %6488 : !llvm.i64 + %6490 = llvm.add %6487, %6489 : !llvm.i64 + %6491 = llvm.getelementptr %6483[%6490] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6492 = llvm.load %6491 : !llvm.ptr + %6493 = llvm.fadd %6492, %6482 {RelaxedPrecision} : !llvm.float + %6494 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6495 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6496 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6497 = llvm.mul %28, %6496 : !llvm.i64 + %6498 = llvm.add %6495, %6497 : !llvm.i64 + %6499 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6500 = llvm.mul %6461, %6499 : !llvm.i64 + %6501 = llvm.add %6498, %6500 : !llvm.i64 + %6502 = llvm.getelementptr %6494[%6501] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6493, %6502 : !llvm.ptr + %6503 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6505 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6506 = llvm.mul %28, %6505 : !llvm.i64 + %6507 = llvm.add %6504, %6506 : !llvm.i64 + %6508 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6509 = llvm.mul %6461, %6508 : !llvm.i64 + %6510 = llvm.add %6507, %6509 : !llvm.i64 + %6511 = llvm.getelementptr %6503[%6510] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6512 = llvm.load %6511 : !llvm.ptr + %6513 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, 
ptr, i64, array<2 x i64>, array<2 x i64>)> + %6514 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6515 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6516 = llvm.mul %28, %6515 : !llvm.i64 + %6517 = llvm.add %6514, %6516 : !llvm.i64 + %6518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6519 = llvm.mul %6461, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.getelementptr %6513[%6520] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6512, %6521 : !llvm.ptr + %6522 = llvm.add %5850, %43 : !llvm.i64 + %6523 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6524 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6525 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6526 = llvm.mul %28, %6525 : !llvm.i64 + %6527 = llvm.add %6524, %6526 : !llvm.i64 + %6528 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6529 = llvm.mul %5851, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.getelementptr %6523[%6530] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6532 = llvm.load %6531 : !llvm.ptr + %6533 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6534 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6535 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6536 = llvm.mul %5851, %6535 : !llvm.i64 + %6537 = llvm.add %6534, %6536 : !llvm.i64 + %6538 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6539 = llvm.mul %6522, %6538 : !llvm.i64 + %6540 = llvm.add %6537, %6539 : !llvm.i64 + %6541 = llvm.getelementptr %6533[%6540] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6542 = llvm.load %6541 : !llvm.ptr + %6543 = llvm.fmul %6532, %6542 {RelaxedPrecision} : !llvm.float + %6544 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6545 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6546 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6547 = llvm.mul %28, %6546 : !llvm.i64 + %6548 = llvm.add %6545, %6547 : !llvm.i64 + %6549 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6550 = llvm.mul %6522, %6549 : !llvm.i64 + %6551 = llvm.add %6548, %6550 : !llvm.i64 + %6552 = llvm.getelementptr %6544[%6551] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6553 = llvm.load %6552 : !llvm.ptr + %6554 = llvm.fadd %6553, %6543 {RelaxedPrecision} : !llvm.float + %6555 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6556 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6557 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6558 = llvm.mul %28, %6557 : !llvm.i64 + %6559 = llvm.add %6556, %6558 : !llvm.i64 + %6560 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6561 = llvm.mul %6522, %6560 : !llvm.i64 + %6562 = llvm.add %6559, %6561 : !llvm.i64 + %6563 = llvm.getelementptr %6555[%6562] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6554, %6563 : !llvm.ptr + %6564 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6565 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6566 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6567 = llvm.mul %28, %6566 : !llvm.i64 + %6568 = llvm.add %6565, %6567 : !llvm.i64 + %6569 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6570 = llvm.mul %6522, %6569 : !llvm.i64 + %6571 = llvm.add %6568, %6570 : !llvm.i64 + %6572 = llvm.getelementptr %6564[%6571] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6573 = llvm.load %6572 : !llvm.ptr + %6574 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6575 = llvm.mlir.constant(0 : index) : 
!llvm.i64 + %6576 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6577 = llvm.mul %28, %6576 : !llvm.i64 + %6578 = llvm.add %6575, %6577 : !llvm.i64 + %6579 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6580 = llvm.mul %6522, %6579 : !llvm.i64 + %6581 = llvm.add %6578, %6580 : !llvm.i64 + %6582 = llvm.getelementptr %6574[%6581] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6573, %6582 : !llvm.ptr + %6583 = llvm.add %5850, %44 : !llvm.i64 + %6584 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6585 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6586 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6587 = llvm.mul %28, %6586 : !llvm.i64 + %6588 = llvm.add %6585, %6587 : !llvm.i64 + %6589 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6590 = llvm.mul %5851, %6589 : !llvm.i64 + %6591 = llvm.add %6588, %6590 : !llvm.i64 + %6592 = llvm.getelementptr %6584[%6591] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6593 = llvm.load %6592 : !llvm.ptr + %6594 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6595 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6596 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6597 = llvm.mul %5851, %6596 : !llvm.i64 + %6598 = llvm.add %6595, %6597 : !llvm.i64 + %6599 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6600 = llvm.mul %6583, %6599 : !llvm.i64 + %6601 = llvm.add %6598, %6600 : !llvm.i64 + %6602 = llvm.getelementptr %6594[%6601] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6603 = llvm.load %6602 : !llvm.ptr + %6604 = llvm.fmul %6593, %6603 {RelaxedPrecision} : !llvm.float + %6605 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6606 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6607 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6608 = llvm.mul %28, %6607 : !llvm.i64 + %6609 = llvm.add %6606, %6608 : !llvm.i64 + %6610 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6611 = llvm.mul %6583, %6610 : !llvm.i64 + %6612 = llvm.add %6609, %6611 : !llvm.i64 + %6613 = llvm.getelementptr %6605[%6612] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6614 = llvm.load %6613 : !llvm.ptr + %6615 = llvm.fadd %6614, %6604 {RelaxedPrecision} : !llvm.float + %6616 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6617 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6618 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6619 = llvm.mul %28, %6618 : !llvm.i64 + %6620 = llvm.add %6617, %6619 : !llvm.i64 + %6621 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6622 = llvm.mul %6583, %6621 : !llvm.i64 + %6623 = llvm.add %6620, %6622 : !llvm.i64 + %6624 = llvm.getelementptr %6616[%6623] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6615, %6624 : !llvm.ptr + %6625 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6626 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6627 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6628 = llvm.mul %28, %6627 : !llvm.i64 + %6629 = llvm.add %6626, %6628 : !llvm.i64 + %6630 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6631 = llvm.mul %6583, %6630 : !llvm.i64 + %6632 = llvm.add %6629, %6631 : !llvm.i64 + %6633 = llvm.getelementptr %6625[%6632] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6634 = llvm.load %6633 : !llvm.ptr + %6635 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6638 = llvm.mul 
%28, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6641 = llvm.mul %6583, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.getelementptr %6635[%6642] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6634, %6643 : !llvm.ptr + %6644 = llvm.add %5850, %45 : !llvm.i64 + %6645 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6646 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6647 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6648 = llvm.mul %28, %6647 : !llvm.i64 + %6649 = llvm.add %6646, %6648 : !llvm.i64 + %6650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6651 = llvm.mul %5851, %6650 : !llvm.i64 + %6652 = llvm.add %6649, %6651 : !llvm.i64 + %6653 = llvm.getelementptr %6645[%6652] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6654 = llvm.load %6653 : !llvm.ptr + %6655 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6656 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6657 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6658 = llvm.mul %5851, %6657 : !llvm.i64 + %6659 = llvm.add %6656, %6658 : !llvm.i64 + %6660 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6661 = llvm.mul %6644, %6660 : !llvm.i64 + %6662 = llvm.add %6659, %6661 : !llvm.i64 + %6663 = llvm.getelementptr %6655[%6662] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6664 = llvm.load %6663 : !llvm.ptr + %6665 = llvm.fmul %6654, %6664 {RelaxedPrecision} : !llvm.float + %6666 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6667 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6668 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6669 = llvm.mul %28, %6668 : !llvm.i64 + %6670 = llvm.add %6667, %6669 : !llvm.i64 + %6671 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6672 = llvm.mul %6644, %6671 : !llvm.i64 + %6673 = llvm.add %6670, %6672 : !llvm.i64 + %6674 = llvm.getelementptr %6666[%6673] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6675 = llvm.load %6674 : !llvm.ptr + %6676 = llvm.fadd %6675, %6665 {RelaxedPrecision} : !llvm.float + %6677 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6680 = llvm.mul %28, %6679 : !llvm.i64 + %6681 = llvm.add %6678, %6680 : !llvm.i64 + %6682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6683 = llvm.mul %6644, %6682 : !llvm.i64 + %6684 = llvm.add %6681, %6683 : !llvm.i64 + %6685 = llvm.getelementptr %6677[%6684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6676, %6685 : !llvm.ptr + %6686 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6687 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6688 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6689 = llvm.mul %28, %6688 : !llvm.i64 + %6690 = llvm.add %6687, %6689 : !llvm.i64 + %6691 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6692 = llvm.mul %6644, %6691 : !llvm.i64 + %6693 = llvm.add %6690, %6692 : !llvm.i64 + %6694 = llvm.getelementptr %6686[%6693] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6695 = llvm.load %6694 : !llvm.ptr + %6696 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6697 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6698 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6699 = llvm.mul %28, %6698 : !llvm.i64 + %6700 = llvm.add %6697, %6699 : !llvm.i64 + %6701 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %6702 = llvm.mul %6644, %6701 : !llvm.i64 + %6703 = llvm.add %6700, %6702 : !llvm.i64 + %6704 = llvm.getelementptr %6696[%6703] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6695, %6704 : !llvm.ptr + %6705 = llvm.add %5850, %46 : !llvm.i64 + %6706 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6708 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6709 = llvm.mul %28, %6708 : !llvm.i64 + %6710 = llvm.add %6707, %6709 : !llvm.i64 + %6711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6712 = llvm.mul %5851, %6711 : !llvm.i64 + %6713 = llvm.add %6710, %6712 : !llvm.i64 + %6714 = llvm.getelementptr %6706[%6713] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6715 = llvm.load %6714 : !llvm.ptr + %6716 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6717 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6718 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6719 = llvm.mul %5851, %6718 : !llvm.i64 + %6720 = llvm.add %6717, %6719 : !llvm.i64 + %6721 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6722 = llvm.mul %6705, %6721 : !llvm.i64 + %6723 = llvm.add %6720, %6722 : !llvm.i64 + %6724 = llvm.getelementptr %6716[%6723] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6725 = llvm.load %6724 : !llvm.ptr + %6726 = llvm.fmul %6715, %6725 {RelaxedPrecision} : !llvm.float + %6727 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6729 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6730 = llvm.mul %28, %6729 : !llvm.i64 + %6731 = llvm.add %6728, %6730 : !llvm.i64 + %6732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6733 = llvm.mul %6705, %6732 : !llvm.i64 + %6734 = llvm.add %6731, %6733 : !llvm.i64 + %6735 = llvm.getelementptr %6727[%6734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6736 = llvm.load %6735 : !llvm.ptr + %6737 = llvm.fadd %6736, %6726 {RelaxedPrecision} : !llvm.float + %6738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6741 = llvm.mul %28, %6740 : !llvm.i64 + %6742 = llvm.add %6739, %6741 : !llvm.i64 + %6743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6744 = llvm.mul %6705, %6743 : !llvm.i64 + %6745 = llvm.add %6742, %6744 : !llvm.i64 + %6746 = llvm.getelementptr %6738[%6745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6737, %6746 : !llvm.ptr + %6747 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6749 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6750 = llvm.mul %28, %6749 : !llvm.i64 + %6751 = llvm.add %6748, %6750 : !llvm.i64 + %6752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6753 = llvm.mul %6705, %6752 : !llvm.i64 + %6754 = llvm.add %6751, %6753 : !llvm.i64 + %6755 = llvm.getelementptr %6747[%6754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6756 = llvm.load %6755 : !llvm.ptr + %6757 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6758 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6759 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6760 = llvm.mul %28, %6759 : !llvm.i64 + %6761 = llvm.add %6758, %6760 : !llvm.i64 + %6762 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6763 = llvm.mul %6705, %6762 : !llvm.i64 
+ %6764 = llvm.add %6761, %6763 : !llvm.i64 + %6765 = llvm.getelementptr %6757[%6764] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6756, %6765 : !llvm.ptr + %6766 = llvm.add %5850, %47 : !llvm.i64 + %6767 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6768 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6769 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6770 = llvm.mul %28, %6769 : !llvm.i64 + %6771 = llvm.add %6768, %6770 : !llvm.i64 + %6772 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6773 = llvm.mul %5851, %6772 : !llvm.i64 + %6774 = llvm.add %6771, %6773 : !llvm.i64 + %6775 = llvm.getelementptr %6767[%6774] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6776 = llvm.load %6775 : !llvm.ptr + %6777 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6779 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6780 = llvm.mul %5851, %6779 : !llvm.i64 + %6781 = llvm.add %6778, %6780 : !llvm.i64 + %6782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6783 = llvm.mul %6766, %6782 : !llvm.i64 + %6784 = llvm.add %6781, %6783 : !llvm.i64 + %6785 = llvm.getelementptr %6777[%6784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6786 = llvm.load %6785 : !llvm.ptr + %6787 = llvm.fmul %6776, %6786 {RelaxedPrecision} : !llvm.float + %6788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6791 = llvm.mul %28, %6790 : !llvm.i64 + %6792 = llvm.add %6789, %6791 : !llvm.i64 + %6793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6794 = llvm.mul %6766, %6793 : !llvm.i64 + %6795 = llvm.add %6792, %6794 : !llvm.i64 + %6796 = llvm.getelementptr %6788[%6795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6797 = llvm.load %6796 : !llvm.ptr + %6798 = llvm.fadd %6797, %6787 {RelaxedPrecision} : !llvm.float + %6799 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6800 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6801 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6802 = llvm.mul %28, %6801 : !llvm.i64 + %6803 = llvm.add %6800, %6802 : !llvm.i64 + %6804 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6805 = llvm.mul %6766, %6804 : !llvm.i64 + %6806 = llvm.add %6803, %6805 : !llvm.i64 + %6807 = llvm.getelementptr %6799[%6806] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6798, %6807 : !llvm.ptr + %6808 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6811 = llvm.mul %28, %6810 : !llvm.i64 + %6812 = llvm.add %6809, %6811 : !llvm.i64 + %6813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6814 = llvm.mul %6766, %6813 : !llvm.i64 + %6815 = llvm.add %6812, %6814 : !llvm.i64 + %6816 = llvm.getelementptr %6808[%6815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6817 = llvm.load %6816 : !llvm.ptr + %6818 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6819 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6820 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6821 = llvm.mul %28, %6820 : !llvm.i64 + %6822 = llvm.add %6819, %6821 : !llvm.i64 + %6823 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6824 = llvm.mul %6766, %6823 : !llvm.i64 + %6825 = llvm.add %6822, %6824 : !llvm.i64 + %6826 = llvm.getelementptr %6818[%6825] 
: (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6817, %6826 : !llvm.ptr + %6827 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6828 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6829 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6830 = llvm.mul %24, %6829 : !llvm.i64 + %6831 = llvm.add %6828, %6830 : !llvm.i64 + %6832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6833 = llvm.mul %5851, %6832 : !llvm.i64 + %6834 = llvm.add %6831, %6833 : !llvm.i64 + %6835 = llvm.getelementptr %6827[%6834] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6836 = llvm.load %6835 : !llvm.ptr + %6837 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6839 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6840 = llvm.mul %5851, %6839 : !llvm.i64 + %6841 = llvm.add %6838, %6840 : !llvm.i64 + %6842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6843 = llvm.mul %5850, %6842 : !llvm.i64 + %6844 = llvm.add %6841, %6843 : !llvm.i64 + %6845 = llvm.getelementptr %6837[%6844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6846 = llvm.load %6845 : !llvm.ptr + %6847 = llvm.fmul %6836, %6846 {RelaxedPrecision} : !llvm.float + %6848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6851 = llvm.mul %24, %6850 : !llvm.i64 + %6852 = llvm.add %6849, %6851 : !llvm.i64 + %6853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6854 = llvm.mul %5850, %6853 : !llvm.i64 + %6855 = llvm.add %6852, %6854 : !llvm.i64 + %6856 = llvm.getelementptr %6848[%6855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6857 = llvm.load %6856 : !llvm.ptr + %6858 = llvm.fadd %6857, %6847 {RelaxedPrecision} : !llvm.float + %6859 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6860 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6861 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6862 = llvm.mul %24, %6861 : !llvm.i64 + %6863 = llvm.add %6860, %6862 : !llvm.i64 + %6864 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6865 = llvm.mul %5850, %6864 : !llvm.i64 + %6866 = llvm.add %6863, %6865 : !llvm.i64 + %6867 = llvm.getelementptr %6859[%6866] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6858, %6867 : !llvm.ptr + %6868 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6871 = llvm.mul %24, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6874 = llvm.mul %5850, %6873 : !llvm.i64 + %6875 = llvm.add %6872, %6874 : !llvm.i64 + %6876 = llvm.getelementptr %6868[%6875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6877 = llvm.load %6876 : !llvm.ptr + %6878 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6880 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6881 = llvm.mul %24, %6880 : !llvm.i64 + %6882 = llvm.add %6879, %6881 : !llvm.i64 + %6883 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6884 = llvm.mul %5850, %6883 : !llvm.i64 + %6885 = llvm.add %6882, %6884 : !llvm.i64 + %6886 = llvm.getelementptr %6878[%6885] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6877, %6886 : !llvm.ptr + %6887 = llvm.extractvalue %7[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6888 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6889 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6890 = llvm.mul %24, %6889 : !llvm.i64 + %6891 = llvm.add %6888, %6890 : !llvm.i64 + %6892 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6893 = llvm.mul %5851, %6892 : !llvm.i64 + %6894 = llvm.add %6891, %6893 : !llvm.i64 + %6895 = llvm.getelementptr %6887[%6894] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6896 = llvm.load %6895 : !llvm.ptr + %6897 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6899 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6900 = llvm.mul %5851, %6899 : !llvm.i64 + %6901 = llvm.add %6898, %6900 : !llvm.i64 + %6902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6903 = llvm.mul %5912, %6902 : !llvm.i64 + %6904 = llvm.add %6901, %6903 : !llvm.i64 + %6905 = llvm.getelementptr %6897[%6904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6906 = llvm.load %6905 : !llvm.ptr + %6907 = llvm.fmul %6896, %6906 {RelaxedPrecision} : !llvm.float + %6908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6911 = llvm.mul %24, %6910 : !llvm.i64 + %6912 = llvm.add %6909, %6911 : !llvm.i64 + %6913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6914 = llvm.mul %5912, %6913 : !llvm.i64 + %6915 = llvm.add %6912, %6914 : !llvm.i64 + %6916 = llvm.getelementptr %6908[%6915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6917 = llvm.load %6916 : !llvm.ptr + %6918 = llvm.fadd %6917, %6907 {RelaxedPrecision} : !llvm.float + %6919 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6921 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6922 = llvm.mul %24, %6921 : !llvm.i64 + %6923 = llvm.add %6920, %6922 : !llvm.i64 + %6924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6925 = llvm.mul %5912, %6924 : !llvm.i64 + %6926 = llvm.add %6923, %6925 : !llvm.i64 + %6927 = llvm.getelementptr %6919[%6926] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6918, %6927 : !llvm.ptr + %6928 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6930 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6931 = llvm.mul %24, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6934 = llvm.mul %5912, %6933 : !llvm.i64 + %6935 = llvm.add %6932, %6934 : !llvm.i64 + %6936 = llvm.getelementptr %6928[%6935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6937 = llvm.load %6936 : !llvm.ptr + %6938 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6940 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6941 = llvm.mul %24, %6940 : !llvm.i64 + %6942 = llvm.add %6939, %6941 : !llvm.i64 + %6943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6944 = llvm.mul %5912, %6943 : !llvm.i64 + %6945 = llvm.add %6942, %6944 : !llvm.i64 + %6946 = llvm.getelementptr %6938[%6945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6937, %6946 : !llvm.ptr + %6947 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6948 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6949 = 
llvm.mlir.constant(128 : index) : !llvm.i64 + %6950 = llvm.mul %24, %6949 : !llvm.i64 + %6951 = llvm.add %6948, %6950 : !llvm.i64 + %6952 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6953 = llvm.mul %5851, %6952 : !llvm.i64 + %6954 = llvm.add %6951, %6953 : !llvm.i64 + %6955 = llvm.getelementptr %6947[%6954] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6956 = llvm.load %6955 : !llvm.ptr + %6957 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6958 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6959 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6960 = llvm.mul %5851, %6959 : !llvm.i64 + %6961 = llvm.add %6958, %6960 : !llvm.i64 + %6962 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6963 = llvm.mul %5973, %6962 : !llvm.i64 + %6964 = llvm.add %6961, %6963 : !llvm.i64 + %6965 = llvm.getelementptr %6957[%6964] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6966 = llvm.load %6965 : !llvm.ptr + %6967 = llvm.fmul %6956, %6966 {RelaxedPrecision} : !llvm.float + %6968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6971 = llvm.mul %24, %6970 : !llvm.i64 + %6972 = llvm.add %6969, %6971 : !llvm.i64 + %6973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6974 = llvm.mul %5973, %6973 : !llvm.i64 + %6975 = llvm.add %6972, %6974 : !llvm.i64 + %6976 = llvm.getelementptr %6968[%6975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6977 = llvm.load %6976 : !llvm.ptr + %6978 = llvm.fadd %6977, %6967 {RelaxedPrecision} : !llvm.float + %6979 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6981 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6982 = llvm.mul %24, %6981 : !llvm.i64 + %6983 = llvm.add %6980, %6982 : !llvm.i64 + %6984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6985 = llvm.mul %5973, %6984 : !llvm.i64 + %6986 = llvm.add %6983, %6985 : !llvm.i64 + %6987 = llvm.getelementptr %6979[%6986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6978, %6987 : !llvm.ptr + %6988 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6991 = llvm.mul %24, %6990 : !llvm.i64 + %6992 = llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %5973, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6997 = llvm.load %6996 : !llvm.ptr + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %24, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %5973, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6997, %7006 : !llvm.ptr + %7007 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7010 = llvm.mul %24, %7009 : !llvm.i64 + %7011 = llvm.add %7008, %7010 : 
!llvm.i64 + %7012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7013 = llvm.mul %5851, %7012 : !llvm.i64 + %7014 = llvm.add %7011, %7013 : !llvm.i64 + %7015 = llvm.getelementptr %7007[%7014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7016 = llvm.load %7015 : !llvm.ptr + %7017 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7018 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7019 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7020 = llvm.mul %5851, %7019 : !llvm.i64 + %7021 = llvm.add %7018, %7020 : !llvm.i64 + %7022 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7023 = llvm.mul %6034, %7022 : !llvm.i64 + %7024 = llvm.add %7021, %7023 : !llvm.i64 + %7025 = llvm.getelementptr %7017[%7024] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7026 = llvm.load %7025 : !llvm.ptr + %7027 = llvm.fmul %7016, %7026 {RelaxedPrecision} : !llvm.float + %7028 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7031 = llvm.mul %24, %7030 : !llvm.i64 + %7032 = llvm.add %7029, %7031 : !llvm.i64 + %7033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7034 = llvm.mul %6034, %7033 : !llvm.i64 + %7035 = llvm.add %7032, %7034 : !llvm.i64 + %7036 = llvm.getelementptr %7028[%7035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7037 = llvm.load %7036 : !llvm.ptr + %7038 = llvm.fadd %7037, %7027 {RelaxedPrecision} : !llvm.float + %7039 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7041 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7042 = llvm.mul %24, %7041 : !llvm.i64 + %7043 = llvm.add %7040, %7042 : !llvm.i64 + %7044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7045 = llvm.mul %6034, %7044 : !llvm.i64 + %7046 = llvm.add %7043, %7045 : !llvm.i64 + %7047 = llvm.getelementptr %7039[%7046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7038, %7047 : !llvm.ptr + %7048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7051 = llvm.mul %24, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %6034, %7053 : !llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7057 = llvm.load %7056 : !llvm.ptr + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %24, %7060 : !llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %6034, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7057, %7066 : !llvm.ptr + %7067 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7070 = llvm.mul %24, %7069 : !llvm.i64 + %7071 = llvm.add %7068, %7070 : !llvm.i64 + %7072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7073 = llvm.mul %5851, %7072 : !llvm.i64 + %7074 = llvm.add 
%7071, %7073 : !llvm.i64 + %7075 = llvm.getelementptr %7067[%7074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7076 = llvm.load %7075 : !llvm.ptr + %7077 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7078 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7079 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7080 = llvm.mul %5851, %7079 : !llvm.i64 + %7081 = llvm.add %7078, %7080 : !llvm.i64 + %7082 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7083 = llvm.mul %6095, %7082 : !llvm.i64 + %7084 = llvm.add %7081, %7083 : !llvm.i64 + %7085 = llvm.getelementptr %7077[%7084] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7086 = llvm.load %7085 : !llvm.ptr + %7087 = llvm.fmul %7076, %7086 {RelaxedPrecision} : !llvm.float + %7088 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7091 = llvm.mul %24, %7090 : !llvm.i64 + %7092 = llvm.add %7089, %7091 : !llvm.i64 + %7093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7094 = llvm.mul %6095, %7093 : !llvm.i64 + %7095 = llvm.add %7092, %7094 : !llvm.i64 + %7096 = llvm.getelementptr %7088[%7095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7097 = llvm.load %7096 : !llvm.ptr + %7098 = llvm.fadd %7097, %7087 {RelaxedPrecision} : !llvm.float + %7099 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7101 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7102 = llvm.mul %24, %7101 : !llvm.i64 + %7103 = llvm.add %7100, %7102 : !llvm.i64 + %7104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7105 = llvm.mul %6095, %7104 : !llvm.i64 + %7106 = llvm.add %7103, %7105 : !llvm.i64 + %7107 = llvm.getelementptr %7099[%7106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7098, %7107 : !llvm.ptr + %7108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7111 = llvm.mul %24, %7110 : !llvm.i64 + %7112 = llvm.add %7109, %7111 : !llvm.i64 + %7113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7114 = llvm.mul %6095, %7113 : !llvm.i64 + %7115 = llvm.add %7112, %7114 : !llvm.i64 + %7116 = llvm.getelementptr %7108[%7115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7117 = llvm.load %7116 : !llvm.ptr + %7118 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7119 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7120 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7121 = llvm.mul %24, %7120 : !llvm.i64 + %7122 = llvm.add %7119, %7121 : !llvm.i64 + %7123 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7124 = llvm.mul %6095, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 + %7126 = llvm.getelementptr %7118[%7125] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7117, %7126 : !llvm.ptr + %7127 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7128 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7129 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7130 = llvm.mul %24, %7129 : !llvm.i64 + %7131 = llvm.add %7128, %7130 : !llvm.i64 + %7132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7133 = llvm.mul %5851, %7132 : !llvm.i64 + %7134 = llvm.add %7131, %7133 : !llvm.i64 + %7135 = llvm.getelementptr %7127[%7134] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7136 = llvm.load 
%7135 : !llvm.ptr + %7137 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7140 = llvm.mul %5851, %7139 : !llvm.i64 + %7141 = llvm.add %7138, %7140 : !llvm.i64 + %7142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7143 = llvm.mul %6156, %7142 : !llvm.i64 + %7144 = llvm.add %7141, %7143 : !llvm.i64 + %7145 = llvm.getelementptr %7137[%7144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7146 = llvm.load %7145 : !llvm.ptr + %7147 = llvm.fmul %7136, %7146 {RelaxedPrecision} : !llvm.float + %7148 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7151 = llvm.mul %24, %7150 : !llvm.i64 + %7152 = llvm.add %7149, %7151 : !llvm.i64 + %7153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7154 = llvm.mul %6156, %7153 : !llvm.i64 + %7155 = llvm.add %7152, %7154 : !llvm.i64 + %7156 = llvm.getelementptr %7148[%7155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7157 = llvm.load %7156 : !llvm.ptr + %7158 = llvm.fadd %7157, %7147 {RelaxedPrecision} : !llvm.float + %7159 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7161 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7162 = llvm.mul %24, %7161 : !llvm.i64 + %7163 = llvm.add %7160, %7162 : !llvm.i64 + %7164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7165 = llvm.mul %6156, %7164 : !llvm.i64 + %7166 = llvm.add %7163, %7165 : !llvm.i64 + %7167 = llvm.getelementptr %7159[%7166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7158, %7167 : !llvm.ptr + %7168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7171 = llvm.mul %24, %7170 : !llvm.i64 + %7172 = llvm.add %7169, %7171 : !llvm.i64 + %7173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7174 = llvm.mul %6156, %7173 : !llvm.i64 + %7175 = llvm.add %7172, %7174 : !llvm.i64 + %7176 = llvm.getelementptr %7168[%7175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7177 = llvm.load %7176 : !llvm.ptr + %7178 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7180 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7181 = llvm.mul %24, %7180 : !llvm.i64 + %7182 = llvm.add %7179, %7181 : !llvm.i64 + %7183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7184 = llvm.mul %6156, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.getelementptr %7178[%7185] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7177, %7186 : !llvm.ptr + %7187 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7188 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7189 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7190 = llvm.mul %24, %7189 : !llvm.i64 + %7191 = llvm.add %7188, %7190 : !llvm.i64 + %7192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7193 = llvm.mul %5851, %7192 : !llvm.i64 + %7194 = llvm.add %7191, %7193 : !llvm.i64 + %7195 = llvm.getelementptr %7187[%7194] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7196 = llvm.load %7195 : !llvm.ptr + %7197 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7198 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %7199 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7200 = llvm.mul %5851, %7199 : !llvm.i64 + %7201 = llvm.add %7198, %7200 : !llvm.i64 + %7202 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7203 = llvm.mul %6217, %7202 : !llvm.i64 + %7204 = llvm.add %7201, %7203 : !llvm.i64 + %7205 = llvm.getelementptr %7197[%7204] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7206 = llvm.load %7205 : !llvm.ptr + %7207 = llvm.fmul %7196, %7206 {RelaxedPrecision} : !llvm.float + %7208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7211 = llvm.mul %24, %7210 : !llvm.i64 + %7212 = llvm.add %7209, %7211 : !llvm.i64 + %7213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7214 = llvm.mul %6217, %7213 : !llvm.i64 + %7215 = llvm.add %7212, %7214 : !llvm.i64 + %7216 = llvm.getelementptr %7208[%7215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7217 = llvm.load %7216 : !llvm.ptr + %7218 = llvm.fadd %7217, %7207 {RelaxedPrecision} : !llvm.float + %7219 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7221 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7222 = llvm.mul %24, %7221 : !llvm.i64 + %7223 = llvm.add %7220, %7222 : !llvm.i64 + %7224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7225 = llvm.mul %6217, %7224 : !llvm.i64 + %7226 = llvm.add %7223, %7225 : !llvm.i64 + %7227 = llvm.getelementptr %7219[%7226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7218, %7227 : !llvm.ptr + %7228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7231 = llvm.mul %24, %7230 : !llvm.i64 + %7232 = llvm.add %7229, %7231 : !llvm.i64 + %7233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7234 = llvm.mul %6217, %7233 : !llvm.i64 + %7235 = llvm.add %7232, %7234 : !llvm.i64 + %7236 = llvm.getelementptr %7228[%7235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7237 = llvm.load %7236 : !llvm.ptr + %7238 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7239 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7240 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7241 = llvm.mul %24, %7240 : !llvm.i64 + %7242 = llvm.add %7239, %7241 : !llvm.i64 + %7243 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7244 = llvm.mul %6217, %7243 : !llvm.i64 + %7245 = llvm.add %7242, %7244 : !llvm.i64 + %7246 = llvm.getelementptr %7238[%7245] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7237, %7246 : !llvm.ptr + %7247 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7248 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7249 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7250 = llvm.mul %24, %7249 : !llvm.i64 + %7251 = llvm.add %7248, %7250 : !llvm.i64 + %7252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7253 = llvm.mul %5851, %7252 : !llvm.i64 + %7254 = llvm.add %7251, %7253 : !llvm.i64 + %7255 = llvm.getelementptr %7247[%7254] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7256 = llvm.load %7255 : !llvm.ptr + %7257 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7258 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7259 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7260 = llvm.mul %5851, %7259 
: !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7263 = llvm.mul %6278, %7262 : !llvm.i64 + %7264 = llvm.add %7261, %7263 : !llvm.i64 + %7265 = llvm.getelementptr %7257[%7264] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7266 = llvm.load %7265 : !llvm.ptr + %7267 = llvm.fmul %7256, %7266 {RelaxedPrecision} : !llvm.float + %7268 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7271 = llvm.mul %24, %7270 : !llvm.i64 + %7272 = llvm.add %7269, %7271 : !llvm.i64 + %7273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7274 = llvm.mul %6278, %7273 : !llvm.i64 + %7275 = llvm.add %7272, %7274 : !llvm.i64 + %7276 = llvm.getelementptr %7268[%7275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7277 = llvm.load %7276 : !llvm.ptr + %7278 = llvm.fadd %7277, %7267 {RelaxedPrecision} : !llvm.float + %7279 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7281 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7282 = llvm.mul %24, %7281 : !llvm.i64 + %7283 = llvm.add %7280, %7282 : !llvm.i64 + %7284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7285 = llvm.mul %6278, %7284 : !llvm.i64 + %7286 = llvm.add %7283, %7285 : !llvm.i64 + %7287 = llvm.getelementptr %7279[%7286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7278, %7287 : !llvm.ptr + %7288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7291 = llvm.mul %24, %7290 : !llvm.i64 + %7292 = llvm.add %7289, %7291 : !llvm.i64 + %7293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7294 = llvm.mul %6278, %7293 : !llvm.i64 + %7295 = llvm.add %7292, %7294 : !llvm.i64 + %7296 = llvm.getelementptr %7288[%7295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7297 = llvm.load %7296 : !llvm.ptr + %7298 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7300 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7301 = llvm.mul %24, %7300 : !llvm.i64 + %7302 = llvm.add %7299, %7301 : !llvm.i64 + %7303 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7304 = llvm.mul %6278, %7303 : !llvm.i64 + %7305 = llvm.add %7302, %7304 : !llvm.i64 + %7306 = llvm.getelementptr %7298[%7305] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7297, %7306 : !llvm.ptr + %7307 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7308 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7309 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7310 = llvm.mul %24, %7309 : !llvm.i64 + %7311 = llvm.add %7308, %7310 : !llvm.i64 + %7312 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7313 = llvm.mul %5851, %7312 : !llvm.i64 + %7314 = llvm.add %7311, %7313 : !llvm.i64 + %7315 = llvm.getelementptr %7307[%7314] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7316 = llvm.load %7315 : !llvm.ptr + %7317 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7318 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7319 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7320 = llvm.mul %5851, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7323 = llvm.mul 
%6339, %7322 : !llvm.i64 + %7324 = llvm.add %7321, %7323 : !llvm.i64 + %7325 = llvm.getelementptr %7317[%7324] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7326 = llvm.load %7325 : !llvm.ptr + %7327 = llvm.fmul %7316, %7326 {RelaxedPrecision} : !llvm.float + %7328 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7331 = llvm.mul %24, %7330 : !llvm.i64 + %7332 = llvm.add %7329, %7331 : !llvm.i64 + %7333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7334 = llvm.mul %6339, %7333 : !llvm.i64 + %7335 = llvm.add %7332, %7334 : !llvm.i64 + %7336 = llvm.getelementptr %7328[%7335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7337 = llvm.load %7336 : !llvm.ptr + %7338 = llvm.fadd %7337, %7327 {RelaxedPrecision} : !llvm.float + %7339 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7341 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7342 = llvm.mul %24, %7341 : !llvm.i64 + %7343 = llvm.add %7340, %7342 : !llvm.i64 + %7344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7345 = llvm.mul %6339, %7344 : !llvm.i64 + %7346 = llvm.add %7343, %7345 : !llvm.i64 + %7347 = llvm.getelementptr %7339[%7346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7338, %7347 : !llvm.ptr + %7348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7351 = llvm.mul %24, %7350 : !llvm.i64 + %7352 = llvm.add %7349, %7351 : !llvm.i64 + %7353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7354 = llvm.mul %6339, %7353 : !llvm.i64 + %7355 = llvm.add %7352, %7354 : !llvm.i64 + %7356 = llvm.getelementptr %7348[%7355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7357 = llvm.load %7356 : !llvm.ptr + %7358 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7360 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7361 = llvm.mul %24, %7360 : !llvm.i64 + %7362 = llvm.add %7359, %7361 : !llvm.i64 + %7363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7364 = llvm.mul %6339, %7363 : !llvm.i64 + %7365 = llvm.add %7362, %7364 : !llvm.i64 + %7366 = llvm.getelementptr %7358[%7365] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7357, %7366 : !llvm.ptr + %7367 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7368 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7369 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7370 = llvm.mul %24, %7369 : !llvm.i64 + %7371 = llvm.add %7368, %7370 : !llvm.i64 + %7372 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7373 = llvm.mul %5851, %7372 : !llvm.i64 + %7374 = llvm.add %7371, %7373 : !llvm.i64 + %7375 = llvm.getelementptr %7367[%7374] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7376 = llvm.load %7375 : !llvm.ptr + %7377 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7378 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7379 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7380 = llvm.mul %5851, %7379 : !llvm.i64 + %7381 = llvm.add %7378, %7380 : !llvm.i64 + %7382 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7383 = llvm.mul %6400, %7382 : !llvm.i64 + %7384 = llvm.add %7381, %7383 : !llvm.i64 + %7385 = llvm.getelementptr %7377[%7384] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %7386 = llvm.load %7385 : !llvm.ptr + %7387 = llvm.fmul %7376, %7386 {RelaxedPrecision} : !llvm.float + %7388 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7391 = llvm.mul %24, %7390 : !llvm.i64 + %7392 = llvm.add %7389, %7391 : !llvm.i64 + %7393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7394 = llvm.mul %6400, %7393 : !llvm.i64 + %7395 = llvm.add %7392, %7394 : !llvm.i64 + %7396 = llvm.getelementptr %7388[%7395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7397 = llvm.load %7396 : !llvm.ptr + %7398 = llvm.fadd %7397, %7387 {RelaxedPrecision} : !llvm.float + %7399 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7401 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7402 = llvm.mul %24, %7401 : !llvm.i64 + %7403 = llvm.add %7400, %7402 : !llvm.i64 + %7404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7405 = llvm.mul %6400, %7404 : !llvm.i64 + %7406 = llvm.add %7403, %7405 : !llvm.i64 + %7407 = llvm.getelementptr %7399[%7406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7398, %7407 : !llvm.ptr + %7408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7411 = llvm.mul %24, %7410 : !llvm.i64 + %7412 = llvm.add %7409, %7411 : !llvm.i64 + %7413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7414 = llvm.mul %6400, %7413 : !llvm.i64 + %7415 = llvm.add %7412, %7414 : !llvm.i64 + %7416 = llvm.getelementptr %7408[%7415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7417 = llvm.load %7416 : !llvm.ptr + %7418 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mul %24, %7420 : !llvm.i64 + %7422 = llvm.add %7419, %7421 : !llvm.i64 + %7423 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7424 = llvm.mul %6400, %7423 : !llvm.i64 + %7425 = llvm.add %7422, %7424 : !llvm.i64 + %7426 = llvm.getelementptr %7418[%7425] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7417, %7426 : !llvm.ptr + %7427 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7428 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7429 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7430 = llvm.mul %24, %7429 : !llvm.i64 + %7431 = llvm.add %7428, %7430 : !llvm.i64 + %7432 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7433 = llvm.mul %5851, %7432 : !llvm.i64 + %7434 = llvm.add %7431, %7433 : !llvm.i64 + %7435 = llvm.getelementptr %7427[%7434] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7436 = llvm.load %7435 : !llvm.ptr + %7437 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7438 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7439 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7440 = llvm.mul %5851, %7439 : !llvm.i64 + %7441 = llvm.add %7438, %7440 : !llvm.i64 + %7442 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7443 = llvm.mul %6461, %7442 : !llvm.i64 + %7444 = llvm.add %7441, %7443 : !llvm.i64 + %7445 = llvm.getelementptr %7437[%7444] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7446 = llvm.load %7445 : !llvm.ptr + %7447 = llvm.fmul %7436, %7446 {RelaxedPrecision} : !llvm.float 
+ %7448 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7451 = llvm.mul %24, %7450 : !llvm.i64 + %7452 = llvm.add %7449, %7451 : !llvm.i64 + %7453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7454 = llvm.mul %6461, %7453 : !llvm.i64 + %7455 = llvm.add %7452, %7454 : !llvm.i64 + %7456 = llvm.getelementptr %7448[%7455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7457 = llvm.load %7456 : !llvm.ptr + %7458 = llvm.fadd %7457, %7447 {RelaxedPrecision} : !llvm.float + %7459 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7461 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7462 = llvm.mul %24, %7461 : !llvm.i64 + %7463 = llvm.add %7460, %7462 : !llvm.i64 + %7464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7465 = llvm.mul %6461, %7464 : !llvm.i64 + %7466 = llvm.add %7463, %7465 : !llvm.i64 + %7467 = llvm.getelementptr %7459[%7466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7458, %7467 : !llvm.ptr + %7468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7471 = llvm.mul %24, %7470 : !llvm.i64 + %7472 = llvm.add %7469, %7471 : !llvm.i64 + %7473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7474 = llvm.mul %6461, %7473 : !llvm.i64 + %7475 = llvm.add %7472, %7474 : !llvm.i64 + %7476 = llvm.getelementptr %7468[%7475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7477 = llvm.load %7476 : !llvm.ptr + %7478 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7479 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7480 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7481 = llvm.mul %24, %7480 : !llvm.i64 + %7482 = llvm.add %7479, %7481 : !llvm.i64 + %7483 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7484 = llvm.mul %6461, %7483 : !llvm.i64 + %7485 = llvm.add %7482, %7484 : !llvm.i64 + %7486 = llvm.getelementptr %7478[%7485] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7477, %7486 : !llvm.ptr + %7487 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7489 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7490 = llvm.mul %24, %7489 : !llvm.i64 + %7491 = llvm.add %7488, %7490 : !llvm.i64 + %7492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7493 = llvm.mul %5851, %7492 : !llvm.i64 + %7494 = llvm.add %7491, %7493 : !llvm.i64 + %7495 = llvm.getelementptr %7487[%7494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7496 = llvm.load %7495 : !llvm.ptr + %7497 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7500 = llvm.mul %5851, %7499 : !llvm.i64 + %7501 = llvm.add %7498, %7500 : !llvm.i64 + %7502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7503 = llvm.mul %6522, %7502 : !llvm.i64 + %7504 = llvm.add %7501, %7503 : !llvm.i64 + %7505 = llvm.getelementptr %7497[%7504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7506 = llvm.load %7505 : !llvm.ptr + %7507 = llvm.fmul %7496, %7506 {RelaxedPrecision} : !llvm.float + %7508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7509 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %7510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7511 = llvm.mul %24, %7510 : !llvm.i64 + %7512 = llvm.add %7509, %7511 : !llvm.i64 + %7513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7514 = llvm.mul %6522, %7513 : !llvm.i64 + %7515 = llvm.add %7512, %7514 : !llvm.i64 + %7516 = llvm.getelementptr %7508[%7515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7517 = llvm.load %7516 : !llvm.ptr + %7518 = llvm.fadd %7517, %7507 {RelaxedPrecision} : !llvm.float + %7519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7522 = llvm.mul %24, %7521 : !llvm.i64 + %7523 = llvm.add %7520, %7522 : !llvm.i64 + %7524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7525 = llvm.mul %6522, %7524 : !llvm.i64 + %7526 = llvm.add %7523, %7525 : !llvm.i64 + %7527 = llvm.getelementptr %7519[%7526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7518, %7527 : !llvm.ptr + %7528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7531 = llvm.mul %24, %7530 : !llvm.i64 + %7532 = llvm.add %7529, %7531 : !llvm.i64 + %7533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7534 = llvm.mul %6522, %7533 : !llvm.i64 + %7535 = llvm.add %7532, %7534 : !llvm.i64 + %7536 = llvm.getelementptr %7528[%7535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7537 = llvm.load %7536 : !llvm.ptr + %7538 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7539 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7540 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7541 = llvm.mul %24, %7540 : !llvm.i64 + %7542 = llvm.add %7539, %7541 : !llvm.i64 + %7543 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7544 = llvm.mul %6522, %7543 : !llvm.i64 + %7545 = llvm.add %7542, %7544 : !llvm.i64 + %7546 = llvm.getelementptr %7538[%7545] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7537, %7546 : !llvm.ptr + %7547 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7548 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7549 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7550 = llvm.mul %24, %7549 : !llvm.i64 + %7551 = llvm.add %7548, %7550 : !llvm.i64 + %7552 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7553 = llvm.mul %5851, %7552 : !llvm.i64 + %7554 = llvm.add %7551, %7553 : !llvm.i64 + %7555 = llvm.getelementptr %7547[%7554] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7556 = llvm.load %7555 : !llvm.ptr + %7557 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7558 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7559 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7560 = llvm.mul %5851, %7559 : !llvm.i64 + %7561 = llvm.add %7558, %7560 : !llvm.i64 + %7562 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7563 = llvm.mul %6583, %7562 : !llvm.i64 + %7564 = llvm.add %7561, %7563 : !llvm.i64 + %7565 = llvm.getelementptr %7557[%7564] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7566 = llvm.load %7565 : !llvm.ptr + %7567 = llvm.fmul %7556, %7566 {RelaxedPrecision} : !llvm.float + %7568 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7571 = llvm.mul %24, %7570 : 
!llvm.i64 + %7572 = llvm.add %7569, %7571 : !llvm.i64 + %7573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7574 = llvm.mul %6583, %7573 : !llvm.i64 + %7575 = llvm.add %7572, %7574 : !llvm.i64 + %7576 = llvm.getelementptr %7568[%7575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7577 = llvm.load %7576 : !llvm.ptr + %7578 = llvm.fadd %7577, %7567 {RelaxedPrecision} : !llvm.float + %7579 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7581 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7582 = llvm.mul %24, %7581 : !llvm.i64 + %7583 = llvm.add %7580, %7582 : !llvm.i64 + %7584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7585 = llvm.mul %6583, %7584 : !llvm.i64 + %7586 = llvm.add %7583, %7585 : !llvm.i64 + %7587 = llvm.getelementptr %7579[%7586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7578, %7587 : !llvm.ptr + %7588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7591 = llvm.mul %24, %7590 : !llvm.i64 + %7592 = llvm.add %7589, %7591 : !llvm.i64 + %7593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7594 = llvm.mul %6583, %7593 : !llvm.i64 + %7595 = llvm.add %7592, %7594 : !llvm.i64 + %7596 = llvm.getelementptr %7588[%7595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7597 = llvm.load %7596 : !llvm.ptr + %7598 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7599 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7600 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7601 = llvm.mul %24, %7600 : !llvm.i64 + %7602 = llvm.add %7599, %7601 : !llvm.i64 + %7603 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7604 = llvm.mul %6583, %7603 : !llvm.i64 + %7605 = llvm.add %7602, %7604 : !llvm.i64 + %7606 = llvm.getelementptr %7598[%7605] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7597, %7606 : !llvm.ptr + %7607 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7608 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7609 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7610 = llvm.mul %24, %7609 : !llvm.i64 + %7611 = llvm.add %7608, %7610 : !llvm.i64 + %7612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7613 = llvm.mul %5851, %7612 : !llvm.i64 + %7614 = llvm.add %7611, %7613 : !llvm.i64 + %7615 = llvm.getelementptr %7607[%7614] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7616 = llvm.load %7615 : !llvm.ptr + %7617 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7618 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7619 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7620 = llvm.mul %5851, %7619 : !llvm.i64 + %7621 = llvm.add %7618, %7620 : !llvm.i64 + %7622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7623 = llvm.mul %6644, %7622 : !llvm.i64 + %7624 = llvm.add %7621, %7623 : !llvm.i64 + %7625 = llvm.getelementptr %7617[%7624] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7626 = llvm.load %7625 : !llvm.ptr + %7627 = llvm.fmul %7616, %7626 {RelaxedPrecision} : !llvm.float + %7628 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7631 = llvm.mul %24, %7630 : !llvm.i64 + %7632 = llvm.add %7629, %7631 : !llvm.i64 + %7633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7634 = llvm.mul 
%6644, %7633 : !llvm.i64 + %7635 = llvm.add %7632, %7634 : !llvm.i64 + %7636 = llvm.getelementptr %7628[%7635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7637 = llvm.load %7636 : !llvm.ptr + %7638 = llvm.fadd %7637, %7627 {RelaxedPrecision} : !llvm.float + %7639 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7641 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7642 = llvm.mul %24, %7641 : !llvm.i64 + %7643 = llvm.add %7640, %7642 : !llvm.i64 + %7644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7645 = llvm.mul %6644, %7644 : !llvm.i64 + %7646 = llvm.add %7643, %7645 : !llvm.i64 + %7647 = llvm.getelementptr %7639[%7646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7638, %7647 : !llvm.ptr + %7648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7651 = llvm.mul %24, %7650 : !llvm.i64 + %7652 = llvm.add %7649, %7651 : !llvm.i64 + %7653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7654 = llvm.mul %6644, %7653 : !llvm.i64 + %7655 = llvm.add %7652, %7654 : !llvm.i64 + %7656 = llvm.getelementptr %7648[%7655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7657 = llvm.load %7656 : !llvm.ptr + %7658 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7659 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7660 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7661 = llvm.mul %24, %7660 : !llvm.i64 + %7662 = llvm.add %7659, %7661 : !llvm.i64 + %7663 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7664 = llvm.mul %6644, %7663 : !llvm.i64 + %7665 = llvm.add %7662, %7664 : !llvm.i64 + %7666 = llvm.getelementptr %7658[%7665] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7657, %7666 : !llvm.ptr + %7667 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7668 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7669 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7670 = llvm.mul %24, %7669 : !llvm.i64 + %7671 = llvm.add %7668, %7670 : !llvm.i64 + %7672 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7673 = llvm.mul %5851, %7672 : !llvm.i64 + %7674 = llvm.add %7671, %7673 : !llvm.i64 + %7675 = llvm.getelementptr %7667[%7674] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7676 = llvm.load %7675 : !llvm.ptr + %7677 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7680 = llvm.mul %5851, %7679 : !llvm.i64 + %7681 = llvm.add %7678, %7680 : !llvm.i64 + %7682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7683 = llvm.mul %6705, %7682 : !llvm.i64 + %7684 = llvm.add %7681, %7683 : !llvm.i64 + %7685 = llvm.getelementptr %7677[%7684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7686 = llvm.load %7685 : !llvm.ptr + %7687 = llvm.fmul %7676, %7686 {RelaxedPrecision} : !llvm.float + %7688 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7691 = llvm.mul %24, %7690 : !llvm.i64 + %7692 = llvm.add %7689, %7691 : !llvm.i64 + %7693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7694 = llvm.mul %6705, %7693 : !llvm.i64 + %7695 = llvm.add %7692, %7694 : !llvm.i64 + %7696 = llvm.getelementptr %7688[%7695] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %7697 = llvm.load %7696 : !llvm.ptr + %7698 = llvm.fadd %7697, %7687 {RelaxedPrecision} : !llvm.float + %7699 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7701 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7702 = llvm.mul %24, %7701 : !llvm.i64 + %7703 = llvm.add %7700, %7702 : !llvm.i64 + %7704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7705 = llvm.mul %6705, %7704 : !llvm.i64 + %7706 = llvm.add %7703, %7705 : !llvm.i64 + %7707 = llvm.getelementptr %7699[%7706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7698, %7707 : !llvm.ptr + %7708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7711 = llvm.mul %24, %7710 : !llvm.i64 + %7712 = llvm.add %7709, %7711 : !llvm.i64 + %7713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7714 = llvm.mul %6705, %7713 : !llvm.i64 + %7715 = llvm.add %7712, %7714 : !llvm.i64 + %7716 = llvm.getelementptr %7708[%7715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7717 = llvm.load %7716 : !llvm.ptr + %7718 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7720 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7721 = llvm.mul %24, %7720 : !llvm.i64 + %7722 = llvm.add %7719, %7721 : !llvm.i64 + %7723 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7724 = llvm.mul %6705, %7723 : !llvm.i64 + %7725 = llvm.add %7722, %7724 : !llvm.i64 + %7726 = llvm.getelementptr %7718[%7725] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7717, %7726 : !llvm.ptr + %7727 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7729 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7730 = llvm.mul %24, %7729 : !llvm.i64 + %7731 = llvm.add %7728, %7730 : !llvm.i64 + %7732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7733 = llvm.mul %5851, %7732 : !llvm.i64 + %7734 = llvm.add %7731, %7733 : !llvm.i64 + %7735 = llvm.getelementptr %7727[%7734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7736 = llvm.load %7735 : !llvm.ptr + %7737 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7738 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7739 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7740 = llvm.mul %5851, %7739 : !llvm.i64 + %7741 = llvm.add %7738, %7740 : !llvm.i64 + %7742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7743 = llvm.mul %6766, %7742 : !llvm.i64 + %7744 = llvm.add %7741, %7743 : !llvm.i64 + %7745 = llvm.getelementptr %7737[%7744] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7746 = llvm.load %7745 : !llvm.ptr + %7747 = llvm.fmul %7736, %7746 {RelaxedPrecision} : !llvm.float + %7748 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7750 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7751 = llvm.mul %24, %7750 : !llvm.i64 + %7752 = llvm.add %7749, %7751 : !llvm.i64 + %7753 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7754 = llvm.mul %6766, %7753 : !llvm.i64 + %7755 = llvm.add %7752, %7754 : !llvm.i64 + %7756 = llvm.getelementptr %7748[%7755] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7757 = llvm.load %7756 : !llvm.ptr + %7758 = llvm.fadd %7757, %7747 {RelaxedPrecision} : !llvm.float 
+ %7759 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7761 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7762 = llvm.mul %24, %7761 : !llvm.i64 + %7763 = llvm.add %7760, %7762 : !llvm.i64 + %7764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7765 = llvm.mul %6766, %7764 : !llvm.i64 + %7766 = llvm.add %7763, %7765 : !llvm.i64 + %7767 = llvm.getelementptr %7759[%7766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7758, %7767 : !llvm.ptr + %7768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7771 = llvm.mul %24, %7770 : !llvm.i64 + %7772 = llvm.add %7769, %7771 : !llvm.i64 + %7773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7774 = llvm.mul %6766, %7773 : !llvm.i64 + %7775 = llvm.add %7772, %7774 : !llvm.i64 + %7776 = llvm.getelementptr %7768[%7775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7777 = llvm.load %7776 : !llvm.ptr + %7778 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7779 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7780 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7781 = llvm.mul %24, %7780 : !llvm.i64 + %7782 = llvm.add %7779, %7781 : !llvm.i64 + %7783 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7784 = llvm.mul %6766, %7783 : !llvm.i64 + %7785 = llvm.add %7782, %7784 : !llvm.i64 + %7786 = llvm.getelementptr %7778[%7785] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7777, %7786 : !llvm.ptr + %7787 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7788 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7789 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7790 = llvm.mul %25, %7789 : !llvm.i64 + %7791 = llvm.add %7788, %7790 : !llvm.i64 + %7792 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7793 = llvm.mul %5851, %7792 : !llvm.i64 + %7794 = llvm.add %7791, %7793 : !llvm.i64 + %7795 = llvm.getelementptr %7787[%7794] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7796 = llvm.load %7795 : !llvm.ptr + %7797 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7799 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7800 = llvm.mul %5851, %7799 : !llvm.i64 + %7801 = llvm.add %7798, %7800 : !llvm.i64 + %7802 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7803 = llvm.mul %5850, %7802 : !llvm.i64 + %7804 = llvm.add %7801, %7803 : !llvm.i64 + %7805 = llvm.getelementptr %7797[%7804] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7806 = llvm.load %7805 : !llvm.ptr + %7807 = llvm.fmul %7796, %7806 {RelaxedPrecision} : !llvm.float + %7808 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7811 = llvm.mul %25, %7810 : !llvm.i64 + %7812 = llvm.add %7809, %7811 : !llvm.i64 + %7813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7814 = llvm.mul %5850, %7813 : !llvm.i64 + %7815 = llvm.add %7812, %7814 : !llvm.i64 + %7816 = llvm.getelementptr %7808[%7815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7817 = llvm.load %7816 : !llvm.ptr + %7818 = llvm.fadd %7817, %7807 {RelaxedPrecision} : !llvm.float + %7819 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7820 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %7821 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7822 = llvm.mul %25, %7821 : !llvm.i64 + %7823 = llvm.add %7820, %7822 : !llvm.i64 + %7824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7825 = llvm.mul %5850, %7824 : !llvm.i64 + %7826 = llvm.add %7823, %7825 : !llvm.i64 + %7827 = llvm.getelementptr %7819[%7826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7818, %7827 : !llvm.ptr + %7828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7831 = llvm.mul %25, %7830 : !llvm.i64 + %7832 = llvm.add %7829, %7831 : !llvm.i64 + %7833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7834 = llvm.mul %5850, %7833 : !llvm.i64 + %7835 = llvm.add %7832, %7834 : !llvm.i64 + %7836 = llvm.getelementptr %7828[%7835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7837 = llvm.load %7836 : !llvm.ptr + %7838 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7840 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7841 = llvm.mul %25, %7840 : !llvm.i64 + %7842 = llvm.add %7839, %7841 : !llvm.i64 + %7843 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7844 = llvm.mul %5850, %7843 : !llvm.i64 + %7845 = llvm.add %7842, %7844 : !llvm.i64 + %7846 = llvm.getelementptr %7838[%7845] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7837, %7846 : !llvm.ptr + %7847 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7848 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7849 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7850 = llvm.mul %25, %7849 : !llvm.i64 + %7851 = llvm.add %7848, %7850 : !llvm.i64 + %7852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7853 = llvm.mul %5851, %7852 : !llvm.i64 + %7854 = llvm.add %7851, %7853 : !llvm.i64 + %7855 = llvm.getelementptr %7847[%7854] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7856 = llvm.load %7855 : !llvm.ptr + %7857 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7858 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7859 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7860 = llvm.mul %5851, %7859 : !llvm.i64 + %7861 = llvm.add %7858, %7860 : !llvm.i64 + %7862 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7863 = llvm.mul %5912, %7862 : !llvm.i64 + %7864 = llvm.add %7861, %7863 : !llvm.i64 + %7865 = llvm.getelementptr %7857[%7864] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7866 = llvm.load %7865 : !llvm.ptr + %7867 = llvm.fmul %7856, %7866 {RelaxedPrecision} : !llvm.float + %7868 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7871 = llvm.mul %25, %7870 : !llvm.i64 + %7872 = llvm.add %7869, %7871 : !llvm.i64 + %7873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7874 = llvm.mul %5912, %7873 : !llvm.i64 + %7875 = llvm.add %7872, %7874 : !llvm.i64 + %7876 = llvm.getelementptr %7868[%7875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7877 = llvm.load %7876 : !llvm.ptr + %7878 = llvm.fadd %7877, %7867 {RelaxedPrecision} : !llvm.float + %7879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7882 = llvm.mul %25, %7881 : 
!llvm.i64 + %7883 = llvm.add %7880, %7882 : !llvm.i64 + %7884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7885 = llvm.mul %5912, %7884 : !llvm.i64 + %7886 = llvm.add %7883, %7885 : !llvm.i64 + %7887 = llvm.getelementptr %7879[%7886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7878, %7887 : !llvm.ptr + %7888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7891 = llvm.mul %25, %7890 : !llvm.i64 + %7892 = llvm.add %7889, %7891 : !llvm.i64 + %7893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7894 = llvm.mul %5912, %7893 : !llvm.i64 + %7895 = llvm.add %7892, %7894 : !llvm.i64 + %7896 = llvm.getelementptr %7888[%7895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7897 = llvm.load %7896 : !llvm.ptr + %7898 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7899 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7900 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7901 = llvm.mul %25, %7900 : !llvm.i64 + %7902 = llvm.add %7899, %7901 : !llvm.i64 + %7903 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7904 = llvm.mul %5912, %7903 : !llvm.i64 + %7905 = llvm.add %7902, %7904 : !llvm.i64 + %7906 = llvm.getelementptr %7898[%7905] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7897, %7906 : !llvm.ptr + %7907 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7908 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7909 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7910 = llvm.mul %25, %7909 : !llvm.i64 + %7911 = llvm.add %7908, %7910 : !llvm.i64 + %7912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7913 = llvm.mul %5851, %7912 : !llvm.i64 + %7914 = llvm.add %7911, %7913 : !llvm.i64 + %7915 = llvm.getelementptr %7907[%7914] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7916 = llvm.load %7915 : !llvm.ptr + %7917 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7918 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7919 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7920 = llvm.mul %5851, %7919 : !llvm.i64 + %7921 = llvm.add %7918, %7920 : !llvm.i64 + %7922 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7923 = llvm.mul %5973, %7922 : !llvm.i64 + %7924 = llvm.add %7921, %7923 : !llvm.i64 + %7925 = llvm.getelementptr %7917[%7924] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7926 = llvm.load %7925 : !llvm.ptr + %7927 = llvm.fmul %7916, %7926 {RelaxedPrecision} : !llvm.float + %7928 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7930 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7931 = llvm.mul %25, %7930 : !llvm.i64 + %7932 = llvm.add %7929, %7931 : !llvm.i64 + %7933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7934 = llvm.mul %5973, %7933 : !llvm.i64 + %7935 = llvm.add %7932, %7934 : !llvm.i64 + %7936 = llvm.getelementptr %7928[%7935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7937 = llvm.load %7936 : !llvm.ptr + %7938 = llvm.fadd %7937, %7927 {RelaxedPrecision} : !llvm.float + %7939 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7940 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7941 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7942 = llvm.mul %25, %7941 : !llvm.i64 + %7943 = llvm.add %7940, %7942 : !llvm.i64 + %7944 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7945 = llvm.mul 
%5973, %7944 : !llvm.i64 + %7946 = llvm.add %7943, %7945 : !llvm.i64 + %7947 = llvm.getelementptr %7939[%7946] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7938, %7947 : !llvm.ptr + %7948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7951 = llvm.mul %25, %7950 : !llvm.i64 + %7952 = llvm.add %7949, %7951 : !llvm.i64 + %7953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7954 = llvm.mul %5973, %7953 : !llvm.i64 + %7955 = llvm.add %7952, %7954 : !llvm.i64 + %7956 = llvm.getelementptr %7948[%7955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7957 = llvm.load %7956 : !llvm.ptr + %7958 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7960 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7961 = llvm.mul %25, %7960 : !llvm.i64 + %7962 = llvm.add %7959, %7961 : !llvm.i64 + %7963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7964 = llvm.mul %5973, %7963 : !llvm.i64 + %7965 = llvm.add %7962, %7964 : !llvm.i64 + %7966 = llvm.getelementptr %7958[%7965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7957, %7966 : !llvm.ptr + %7967 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7968 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7969 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7970 = llvm.mul %25, %7969 : !llvm.i64 + %7971 = llvm.add %7968, %7970 : !llvm.i64 + %7972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7973 = llvm.mul %5851, %7972 : !llvm.i64 + %7974 = llvm.add %7971, %7973 : !llvm.i64 + %7975 = llvm.getelementptr %7967[%7974] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7976 = llvm.load %7975 : !llvm.ptr + %7977 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7978 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7979 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7980 = llvm.mul %5851, %7979 : !llvm.i64 + %7981 = llvm.add %7978, %7980 : !llvm.i64 + %7982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7983 = llvm.mul %6034, %7982 : !llvm.i64 + %7984 = llvm.add %7981, %7983 : !llvm.i64 + %7985 = llvm.getelementptr %7977[%7984] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7986 = llvm.load %7985 : !llvm.ptr + %7987 = llvm.fmul %7976, %7986 {RelaxedPrecision} : !llvm.float + %7988 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7990 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7991 = llvm.mul %25, %7990 : !llvm.i64 + %7992 = llvm.add %7989, %7991 : !llvm.i64 + %7993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7994 = llvm.mul %6034, %7993 : !llvm.i64 + %7995 = llvm.add %7992, %7994 : !llvm.i64 + %7996 = llvm.getelementptr %7988[%7995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7997 = llvm.load %7996 : !llvm.ptr + %7998 = llvm.fadd %7997, %7987 {RelaxedPrecision} : !llvm.float + %7999 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8000 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8001 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8002 = llvm.mul %25, %8001 : !llvm.i64 + %8003 = llvm.add %8000, %8002 : !llvm.i64 + %8004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8005 = llvm.mul %6034, %8004 : !llvm.i64 + %8006 = llvm.add %8003, %8005 : !llvm.i64 + %8007 = llvm.getelementptr %7999[%8006] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + llvm.store %7998, %8007 : !llvm.ptr + %8008 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8010 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8011 = llvm.mul %25, %8010 : !llvm.i64 + %8012 = llvm.add %8009, %8011 : !llvm.i64 + %8013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8014 = llvm.mul %6034, %8013 : !llvm.i64 + %8015 = llvm.add %8012, %8014 : !llvm.i64 + %8016 = llvm.getelementptr %8008[%8015] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8017 = llvm.load %8016 : !llvm.ptr + %8018 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8020 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8021 = llvm.mul %25, %8020 : !llvm.i64 + %8022 = llvm.add %8019, %8021 : !llvm.i64 + %8023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8024 = llvm.mul %6034, %8023 : !llvm.i64 + %8025 = llvm.add %8022, %8024 : !llvm.i64 + %8026 = llvm.getelementptr %8018[%8025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8017, %8026 : !llvm.ptr + %8027 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8028 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8029 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8030 = llvm.mul %25, %8029 : !llvm.i64 + %8031 = llvm.add %8028, %8030 : !llvm.i64 + %8032 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8033 = llvm.mul %5851, %8032 : !llvm.i64 + %8034 = llvm.add %8031, %8033 : !llvm.i64 + %8035 = llvm.getelementptr %8027[%8034] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8036 = llvm.load %8035 : !llvm.ptr + %8037 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8038 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8039 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8040 = llvm.mul %5851, %8039 : !llvm.i64 + %8041 = llvm.add %8038, %8040 : !llvm.i64 + %8042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8043 = llvm.mul %6095, %8042 : !llvm.i64 + %8044 = llvm.add %8041, %8043 : !llvm.i64 + %8045 = llvm.getelementptr %8037[%8044] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8046 = llvm.load %8045 : !llvm.ptr + %8047 = llvm.fmul %8036, %8046 {RelaxedPrecision} : !llvm.float + %8048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8051 = llvm.mul %25, %8050 : !llvm.i64 + %8052 = llvm.add %8049, %8051 : !llvm.i64 + %8053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8054 = llvm.mul %6095, %8053 : !llvm.i64 + %8055 = llvm.add %8052, %8054 : !llvm.i64 + %8056 = llvm.getelementptr %8048[%8055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8057 = llvm.load %8056 : !llvm.ptr + %8058 = llvm.fadd %8057, %8047 {RelaxedPrecision} : !llvm.float + %8059 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8060 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8061 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8062 = llvm.mul %25, %8061 : !llvm.i64 + %8063 = llvm.add %8060, %8062 : !llvm.i64 + %8064 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8065 = llvm.mul %6095, %8064 : !llvm.i64 + %8066 = llvm.add %8063, %8065 : !llvm.i64 + %8067 = llvm.getelementptr %8059[%8066] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8058, %8067 : !llvm.ptr + %8068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %8069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8071 = llvm.mul %25, %8070 : !llvm.i64 + %8072 = llvm.add %8069, %8071 : !llvm.i64 + %8073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8074 = llvm.mul %6095, %8073 : !llvm.i64 + %8075 = llvm.add %8072, %8074 : !llvm.i64 + %8076 = llvm.getelementptr %8068[%8075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8077 = llvm.load %8076 : !llvm.ptr + %8078 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8080 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8081 = llvm.mul %25, %8080 : !llvm.i64 + %8082 = llvm.add %8079, %8081 : !llvm.i64 + %8083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8084 = llvm.mul %6095, %8083 : !llvm.i64 + %8085 = llvm.add %8082, %8084 : !llvm.i64 + %8086 = llvm.getelementptr %8078[%8085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8077, %8086 : !llvm.ptr + %8087 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8088 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8089 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8090 = llvm.mul %25, %8089 : !llvm.i64 + %8091 = llvm.add %8088, %8090 : !llvm.i64 + %8092 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8093 = llvm.mul %5851, %8092 : !llvm.i64 + %8094 = llvm.add %8091, %8093 : !llvm.i64 + %8095 = llvm.getelementptr %8087[%8094] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8096 = llvm.load %8095 : !llvm.ptr + %8097 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8098 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8099 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8100 = llvm.mul %5851, %8099 : !llvm.i64 + %8101 = llvm.add %8098, %8100 : !llvm.i64 + %8102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8103 = llvm.mul %6156, %8102 : !llvm.i64 + %8104 = llvm.add %8101, %8103 : !llvm.i64 + %8105 = llvm.getelementptr %8097[%8104] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8106 = llvm.load %8105 : !llvm.ptr + %8107 = llvm.fmul %8096, %8106 {RelaxedPrecision} : !llvm.float + %8108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8111 = llvm.mul %25, %8110 : !llvm.i64 + %8112 = llvm.add %8109, %8111 : !llvm.i64 + %8113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8114 = llvm.mul %6156, %8113 : !llvm.i64 + %8115 = llvm.add %8112, %8114 : !llvm.i64 + %8116 = llvm.getelementptr %8108[%8115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8117 = llvm.load %8116 : !llvm.ptr + %8118 = llvm.fadd %8117, %8107 {RelaxedPrecision} : !llvm.float + %8119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8121 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8122 = llvm.mul %25, %8121 : !llvm.i64 + %8123 = llvm.add %8120, %8122 : !llvm.i64 + %8124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8125 = llvm.mul %6156, %8124 : !llvm.i64 + %8126 = llvm.add %8123, %8125 : !llvm.i64 + %8127 = llvm.getelementptr %8119[%8126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8118, %8127 : !llvm.ptr + %8128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8130 = llvm.mlir.constant(512 : index) 
: !llvm.i64 + %8131 = llvm.mul %25, %8130 : !llvm.i64 + %8132 = llvm.add %8129, %8131 : !llvm.i64 + %8133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8134 = llvm.mul %6156, %8133 : !llvm.i64 + %8135 = llvm.add %8132, %8134 : !llvm.i64 + %8136 = llvm.getelementptr %8128[%8135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8137 = llvm.load %8136 : !llvm.ptr + %8138 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8140 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8141 = llvm.mul %25, %8140 : !llvm.i64 + %8142 = llvm.add %8139, %8141 : !llvm.i64 + %8143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8144 = llvm.mul %6156, %8143 : !llvm.i64 + %8145 = llvm.add %8142, %8144 : !llvm.i64 + %8146 = llvm.getelementptr %8138[%8145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8137, %8146 : !llvm.ptr + %8147 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8149 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8150 = llvm.mul %25, %8149 : !llvm.i64 + %8151 = llvm.add %8148, %8150 : !llvm.i64 + %8152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8153 = llvm.mul %5851, %8152 : !llvm.i64 + %8154 = llvm.add %8151, %8153 : !llvm.i64 + %8155 = llvm.getelementptr %8147[%8154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8156 = llvm.load %8155 : !llvm.ptr + %8157 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8159 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8160 = llvm.mul %5851, %8159 : !llvm.i64 + %8161 = llvm.add %8158, %8160 : !llvm.i64 + %8162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8163 = llvm.mul %6217, %8162 : !llvm.i64 + %8164 = llvm.add %8161, %8163 : !llvm.i64 + %8165 = llvm.getelementptr %8157[%8164] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8166 = llvm.load %8165 : !llvm.ptr + %8167 = llvm.fmul %8156, %8166 {RelaxedPrecision} : !llvm.float + %8168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8171 = llvm.mul %25, %8170 : !llvm.i64 + %8172 = llvm.add %8169, %8171 : !llvm.i64 + %8173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8174 = llvm.mul %6217, %8173 : !llvm.i64 + %8175 = llvm.add %8172, %8174 : !llvm.i64 + %8176 = llvm.getelementptr %8168[%8175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8177 = llvm.load %8176 : !llvm.ptr + %8178 = llvm.fadd %8177, %8167 {RelaxedPrecision} : !llvm.float + %8179 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8182 = llvm.mul %25, %8181 : !llvm.i64 + %8183 = llvm.add %8180, %8182 : !llvm.i64 + %8184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8185 = llvm.mul %6217, %8184 : !llvm.i64 + %8186 = llvm.add %8183, %8185 : !llvm.i64 + %8187 = llvm.getelementptr %8179[%8186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8178, %8187 : !llvm.ptr + %8188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8191 = llvm.mul %25, %8190 : !llvm.i64 + %8192 = llvm.add %8189, %8191 : !llvm.i64 + %8193 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %8194 = llvm.mul %6217, %8193 : !llvm.i64 + %8195 = llvm.add %8192, %8194 : !llvm.i64 + %8196 = llvm.getelementptr %8188[%8195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8197 = llvm.load %8196 : !llvm.ptr + %8198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8201 = llvm.mul %25, %8200 : !llvm.i64 + %8202 = llvm.add %8199, %8201 : !llvm.i64 + %8203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8204 = llvm.mul %6217, %8203 : !llvm.i64 + %8205 = llvm.add %8202, %8204 : !llvm.i64 + %8206 = llvm.getelementptr %8198[%8205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8197, %8206 : !llvm.ptr + %8207 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8208 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8209 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8210 = llvm.mul %25, %8209 : !llvm.i64 + %8211 = llvm.add %8208, %8210 : !llvm.i64 + %8212 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8213 = llvm.mul %5851, %8212 : !llvm.i64 + %8214 = llvm.add %8211, %8213 : !llvm.i64 + %8215 = llvm.getelementptr %8207[%8214] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8216 = llvm.load %8215 : !llvm.ptr + %8217 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8218 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8219 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8220 = llvm.mul %5851, %8219 : !llvm.i64 + %8221 = llvm.add %8218, %8220 : !llvm.i64 + %8222 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8223 = llvm.mul %6278, %8222 : !llvm.i64 + %8224 = llvm.add %8221, %8223 : !llvm.i64 + %8225 = llvm.getelementptr %8217[%8224] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8226 = llvm.load %8225 : !llvm.ptr + %8227 = llvm.fmul %8216, %8226 {RelaxedPrecision} : !llvm.float + %8228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8231 = llvm.mul %25, %8230 : !llvm.i64 + %8232 = llvm.add %8229, %8231 : !llvm.i64 + %8233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8234 = llvm.mul %6278, %8233 : !llvm.i64 + %8235 = llvm.add %8232, %8234 : !llvm.i64 + %8236 = llvm.getelementptr %8228[%8235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8237 = llvm.load %8236 : !llvm.ptr + %8238 = llvm.fadd %8237, %8227 {RelaxedPrecision} : !llvm.float + %8239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8242 = llvm.mul %25, %8241 : !llvm.i64 + %8243 = llvm.add %8240, %8242 : !llvm.i64 + %8244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8245 = llvm.mul %6278, %8244 : !llvm.i64 + %8246 = llvm.add %8243, %8245 : !llvm.i64 + %8247 = llvm.getelementptr %8239[%8246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8238, %8247 : !llvm.ptr + %8248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8251 = llvm.mul %25, %8250 : !llvm.i64 + %8252 = llvm.add %8249, %8251 : !llvm.i64 + %8253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8254 = llvm.mul %6278, %8253 : !llvm.i64 + %8255 = llvm.add %8252, %8254 : 
!llvm.i64 + %8256 = llvm.getelementptr %8248[%8255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8257 = llvm.load %8256 : !llvm.ptr + %8258 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8260 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8261 = llvm.mul %25, %8260 : !llvm.i64 + %8262 = llvm.add %8259, %8261 : !llvm.i64 + %8263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8264 = llvm.mul %6278, %8263 : !llvm.i64 + %8265 = llvm.add %8262, %8264 : !llvm.i64 + %8266 = llvm.getelementptr %8258[%8265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8257, %8266 : !llvm.ptr + %8267 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8269 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8270 = llvm.mul %25, %8269 : !llvm.i64 + %8271 = llvm.add %8268, %8270 : !llvm.i64 + %8272 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8273 = llvm.mul %5851, %8272 : !llvm.i64 + %8274 = llvm.add %8271, %8273 : !llvm.i64 + %8275 = llvm.getelementptr %8267[%8274] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8276 = llvm.load %8275 : !llvm.ptr + %8277 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8278 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8279 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8280 = llvm.mul %5851, %8279 : !llvm.i64 + %8281 = llvm.add %8278, %8280 : !llvm.i64 + %8282 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8283 = llvm.mul %6339, %8282 : !llvm.i64 + %8284 = llvm.add %8281, %8283 : !llvm.i64 + %8285 = llvm.getelementptr %8277[%8284] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8286 = llvm.load %8285 : !llvm.ptr + %8287 = llvm.fmul %8276, %8286 {RelaxedPrecision} : !llvm.float + %8288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8291 = llvm.mul %25, %8290 : !llvm.i64 + %8292 = llvm.add %8289, %8291 : !llvm.i64 + %8293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8294 = llvm.mul %6339, %8293 : !llvm.i64 + %8295 = llvm.add %8292, %8294 : !llvm.i64 + %8296 = llvm.getelementptr %8288[%8295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8297 = llvm.load %8296 : !llvm.ptr + %8298 = llvm.fadd %8297, %8287 {RelaxedPrecision} : !llvm.float + %8299 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8300 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8301 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8302 = llvm.mul %25, %8301 : !llvm.i64 + %8303 = llvm.add %8300, %8302 : !llvm.i64 + %8304 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8305 = llvm.mul %6339, %8304 : !llvm.i64 + %8306 = llvm.add %8303, %8305 : !llvm.i64 + %8307 = llvm.getelementptr %8299[%8306] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8298, %8307 : !llvm.ptr + %8308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8311 = llvm.mul %25, %8310 : !llvm.i64 + %8312 = llvm.add %8309, %8311 : !llvm.i64 + %8313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8314 = llvm.mul %6339, %8313 : !llvm.i64 + %8315 = llvm.add %8312, %8314 : !llvm.i64 + %8316 = llvm.getelementptr %8308[%8315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8317 = llvm.load %8316 : !llvm.ptr 
+ %8318 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8320 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8321 = llvm.mul %25, %8320 : !llvm.i64 + %8322 = llvm.add %8319, %8321 : !llvm.i64 + %8323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8324 = llvm.mul %6339, %8323 : !llvm.i64 + %8325 = llvm.add %8322, %8324 : !llvm.i64 + %8326 = llvm.getelementptr %8318[%8325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8317, %8326 : !llvm.ptr + %8327 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8328 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8329 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8330 = llvm.mul %25, %8329 : !llvm.i64 + %8331 = llvm.add %8328, %8330 : !llvm.i64 + %8332 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8333 = llvm.mul %5851, %8332 : !llvm.i64 + %8334 = llvm.add %8331, %8333 : !llvm.i64 + %8335 = llvm.getelementptr %8327[%8334] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8336 = llvm.load %8335 : !llvm.ptr + %8337 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8339 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8340 = llvm.mul %5851, %8339 : !llvm.i64 + %8341 = llvm.add %8338, %8340 : !llvm.i64 + %8342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8343 = llvm.mul %6400, %8342 : !llvm.i64 + %8344 = llvm.add %8341, %8343 : !llvm.i64 + %8345 = llvm.getelementptr %8337[%8344] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8346 = llvm.load %8345 : !llvm.ptr + %8347 = llvm.fmul %8336, %8346 {RelaxedPrecision} : !llvm.float + %8348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8351 = llvm.mul %25, %8350 : !llvm.i64 + %8352 = llvm.add %8349, %8351 : !llvm.i64 + %8353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8354 = llvm.mul %6400, %8353 : !llvm.i64 + %8355 = llvm.add %8352, %8354 : !llvm.i64 + %8356 = llvm.getelementptr %8348[%8355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8357 = llvm.load %8356 : !llvm.ptr + %8358 = llvm.fadd %8357, %8347 {RelaxedPrecision} : !llvm.float + %8359 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8360 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8361 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8362 = llvm.mul %25, %8361 : !llvm.i64 + %8363 = llvm.add %8360, %8362 : !llvm.i64 + %8364 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8365 = llvm.mul %6400, %8364 : !llvm.i64 + %8366 = llvm.add %8363, %8365 : !llvm.i64 + %8367 = llvm.getelementptr %8359[%8366] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8358, %8367 : !llvm.ptr + %8368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8371 = llvm.mul %25, %8370 : !llvm.i64 + %8372 = llvm.add %8369, %8371 : !llvm.i64 + %8373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8374 = llvm.mul %6400, %8373 : !llvm.i64 + %8375 = llvm.add %8372, %8374 : !llvm.i64 + %8376 = llvm.getelementptr %8368[%8375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8377 = llvm.load %8376 : !llvm.ptr + %8378 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8379 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %8380 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8381 = llvm.mul %25, %8380 : !llvm.i64 + %8382 = llvm.add %8379, %8381 : !llvm.i64 + %8383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8384 = llvm.mul %6400, %8383 : !llvm.i64 + %8385 = llvm.add %8382, %8384 : !llvm.i64 + %8386 = llvm.getelementptr %8378[%8385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8377, %8386 : !llvm.ptr + %8387 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8389 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8390 = llvm.mul %25, %8389 : !llvm.i64 + %8391 = llvm.add %8388, %8390 : !llvm.i64 + %8392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8393 = llvm.mul %5851, %8392 : !llvm.i64 + %8394 = llvm.add %8391, %8393 : !llvm.i64 + %8395 = llvm.getelementptr %8387[%8394] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8396 = llvm.load %8395 : !llvm.ptr + %8397 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8398 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8399 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8400 = llvm.mul %5851, %8399 : !llvm.i64 + %8401 = llvm.add %8398, %8400 : !llvm.i64 + %8402 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8403 = llvm.mul %6461, %8402 : !llvm.i64 + %8404 = llvm.add %8401, %8403 : !llvm.i64 + %8405 = llvm.getelementptr %8397[%8404] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8406 = llvm.load %8405 : !llvm.ptr + %8407 = llvm.fmul %8396, %8406 {RelaxedPrecision} : !llvm.float + %8408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8411 = llvm.mul %25, %8410 : !llvm.i64 + %8412 = llvm.add %8409, %8411 : !llvm.i64 + %8413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8414 = llvm.mul %6461, %8413 : !llvm.i64 + %8415 = llvm.add %8412, %8414 : !llvm.i64 + %8416 = llvm.getelementptr %8408[%8415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8417 = llvm.load %8416 : !llvm.ptr + %8418 = llvm.fadd %8417, %8407 {RelaxedPrecision} : !llvm.float + %8419 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8422 = llvm.mul %25, %8421 : !llvm.i64 + %8423 = llvm.add %8420, %8422 : !llvm.i64 + %8424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8425 = llvm.mul %6461, %8424 : !llvm.i64 + %8426 = llvm.add %8423, %8425 : !llvm.i64 + %8427 = llvm.getelementptr %8419[%8426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8418, %8427 : !llvm.ptr + %8428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8431 = llvm.mul %25, %8430 : !llvm.i64 + %8432 = llvm.add %8429, %8431 : !llvm.i64 + %8433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8434 = llvm.mul %6461, %8433 : !llvm.i64 + %8435 = llvm.add %8432, %8434 : !llvm.i64 + %8436 = llvm.getelementptr %8428[%8435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8437 = llvm.load %8436 : !llvm.ptr + %8438 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8440 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8441 = llvm.mul %25, %8440 : 
!llvm.i64 + %8442 = llvm.add %8439, %8441 : !llvm.i64 + %8443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8444 = llvm.mul %6461, %8443 : !llvm.i64 + %8445 = llvm.add %8442, %8444 : !llvm.i64 + %8446 = llvm.getelementptr %8438[%8445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8437, %8446 : !llvm.ptr + %8447 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8449 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8450 = llvm.mul %25, %8449 : !llvm.i64 + %8451 = llvm.add %8448, %8450 : !llvm.i64 + %8452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8453 = llvm.mul %5851, %8452 : !llvm.i64 + %8454 = llvm.add %8451, %8453 : !llvm.i64 + %8455 = llvm.getelementptr %8447[%8454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8456 = llvm.load %8455 : !llvm.ptr + %8457 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8458 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8459 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8460 = llvm.mul %5851, %8459 : !llvm.i64 + %8461 = llvm.add %8458, %8460 : !llvm.i64 + %8462 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8463 = llvm.mul %6522, %8462 : !llvm.i64 + %8464 = llvm.add %8461, %8463 : !llvm.i64 + %8465 = llvm.getelementptr %8457[%8464] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8466 = llvm.load %8465 : !llvm.ptr + %8467 = llvm.fmul %8456, %8466 {RelaxedPrecision} : !llvm.float + %8468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8471 = llvm.mul %25, %8470 : !llvm.i64 + %8472 = llvm.add %8469, %8471 : !llvm.i64 + %8473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8474 = llvm.mul %6522, %8473 : !llvm.i64 + %8475 = llvm.add %8472, %8474 : !llvm.i64 + %8476 = llvm.getelementptr %8468[%8475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8477 = llvm.load %8476 : !llvm.ptr + %8478 = llvm.fadd %8477, %8467 {RelaxedPrecision} : !llvm.float + %8479 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8480 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8481 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8482 = llvm.mul %25, %8481 : !llvm.i64 + %8483 = llvm.add %8480, %8482 : !llvm.i64 + %8484 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8485 = llvm.mul %6522, %8484 : !llvm.i64 + %8486 = llvm.add %8483, %8485 : !llvm.i64 + %8487 = llvm.getelementptr %8479[%8486] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8478, %8487 : !llvm.ptr + %8488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8491 = llvm.mul %25, %8490 : !llvm.i64 + %8492 = llvm.add %8489, %8491 : !llvm.i64 + %8493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8494 = llvm.mul %6522, %8493 : !llvm.i64 + %8495 = llvm.add %8492, %8494 : !llvm.i64 + %8496 = llvm.getelementptr %8488[%8495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8497 = llvm.load %8496 : !llvm.ptr + %8498 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8500 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8501 = llvm.mul %25, %8500 : !llvm.i64 + %8502 = llvm.add %8499, %8501 : !llvm.i64 + %8503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8504 = llvm.mul 
%6522, %8503 : !llvm.i64 + %8505 = llvm.add %8502, %8504 : !llvm.i64 + %8506 = llvm.getelementptr %8498[%8505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8497, %8506 : !llvm.ptr + %8507 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8508 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8509 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8510 = llvm.mul %25, %8509 : !llvm.i64 + %8511 = llvm.add %8508, %8510 : !llvm.i64 + %8512 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8513 = llvm.mul %5851, %8512 : !llvm.i64 + %8514 = llvm.add %8511, %8513 : !llvm.i64 + %8515 = llvm.getelementptr %8507[%8514] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8516 = llvm.load %8515 : !llvm.ptr + %8517 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8519 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8520 = llvm.mul %5851, %8519 : !llvm.i64 + %8521 = llvm.add %8518, %8520 : !llvm.i64 + %8522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8523 = llvm.mul %6583, %8522 : !llvm.i64 + %8524 = llvm.add %8521, %8523 : !llvm.i64 + %8525 = llvm.getelementptr %8517[%8524] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8526 = llvm.load %8525 : !llvm.ptr + %8527 = llvm.fmul %8516, %8526 {RelaxedPrecision} : !llvm.float + %8528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8531 = llvm.mul %25, %8530 : !llvm.i64 + %8532 = llvm.add %8529, %8531 : !llvm.i64 + %8533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8534 = llvm.mul %6583, %8533 : !llvm.i64 + %8535 = llvm.add %8532, %8534 : !llvm.i64 + %8536 = llvm.getelementptr %8528[%8535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8537 = llvm.load %8536 : !llvm.ptr + %8538 = llvm.fadd %8537, %8527 {RelaxedPrecision} : !llvm.float + %8539 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8540 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8541 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8542 = llvm.mul %25, %8541 : !llvm.i64 + %8543 = llvm.add %8540, %8542 : !llvm.i64 + %8544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8545 = llvm.mul %6583, %8544 : !llvm.i64 + %8546 = llvm.add %8543, %8545 : !llvm.i64 + %8547 = llvm.getelementptr %8539[%8546] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8538, %8547 : !llvm.ptr + %8548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8551 = llvm.mul %25, %8550 : !llvm.i64 + %8552 = llvm.add %8549, %8551 : !llvm.i64 + %8553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8554 = llvm.mul %6583, %8553 : !llvm.i64 + %8555 = llvm.add %8552, %8554 : !llvm.i64 + %8556 = llvm.getelementptr %8548[%8555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8557 = llvm.load %8556 : !llvm.ptr + %8558 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8561 = llvm.mul %25, %8560 : !llvm.i64 + %8562 = llvm.add %8559, %8561 : !llvm.i64 + %8563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8564 = llvm.mul %6583, %8563 : !llvm.i64 + %8565 = llvm.add %8562, %8564 : !llvm.i64 + %8566 = llvm.getelementptr %8558[%8565] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + llvm.store %8557, %8566 : !llvm.ptr + %8567 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8568 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8569 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8570 = llvm.mul %25, %8569 : !llvm.i64 + %8571 = llvm.add %8568, %8570 : !llvm.i64 + %8572 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8573 = llvm.mul %5851, %8572 : !llvm.i64 + %8574 = llvm.add %8571, %8573 : !llvm.i64 + %8575 = llvm.getelementptr %8567[%8574] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8576 = llvm.load %8575 : !llvm.ptr + %8577 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8578 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8579 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8580 = llvm.mul %5851, %8579 : !llvm.i64 + %8581 = llvm.add %8578, %8580 : !llvm.i64 + %8582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8583 = llvm.mul %6644, %8582 : !llvm.i64 + %8584 = llvm.add %8581, %8583 : !llvm.i64 + %8585 = llvm.getelementptr %8577[%8584] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8586 = llvm.load %8585 : !llvm.ptr + %8587 = llvm.fmul %8576, %8586 {RelaxedPrecision} : !llvm.float + %8588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8591 = llvm.mul %25, %8590 : !llvm.i64 + %8592 = llvm.add %8589, %8591 : !llvm.i64 + %8593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8594 = llvm.mul %6644, %8593 : !llvm.i64 + %8595 = llvm.add %8592, %8594 : !llvm.i64 + %8596 = llvm.getelementptr %8588[%8595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8597 = llvm.load %8596 : !llvm.ptr + %8598 = llvm.fadd %8597, %8587 {RelaxedPrecision} : !llvm.float + %8599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8602 = llvm.mul %25, %8601 : !llvm.i64 + %8603 = llvm.add %8600, %8602 : !llvm.i64 + %8604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8605 = llvm.mul %6644, %8604 : !llvm.i64 + %8606 = llvm.add %8603, %8605 : !llvm.i64 + %8607 = llvm.getelementptr %8599[%8606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8598, %8607 : !llvm.ptr + %8608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8611 = llvm.mul %25, %8610 : !llvm.i64 + %8612 = llvm.add %8609, %8611 : !llvm.i64 + %8613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8614 = llvm.mul %6644, %8613 : !llvm.i64 + %8615 = llvm.add %8612, %8614 : !llvm.i64 + %8616 = llvm.getelementptr %8608[%8615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8617 = llvm.load %8616 : !llvm.ptr + %8618 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8620 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8621 = llvm.mul %25, %8620 : !llvm.i64 + %8622 = llvm.add %8619, %8621 : !llvm.i64 + %8623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8624 = llvm.mul %6644, %8623 : !llvm.i64 + %8625 = llvm.add %8622, %8624 : !llvm.i64 + %8626 = llvm.getelementptr %8618[%8625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8617, %8626 : !llvm.ptr + %8627 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %8628 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8629 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8630 = llvm.mul %25, %8629 : !llvm.i64 + %8631 = llvm.add %8628, %8630 : !llvm.i64 + %8632 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8633 = llvm.mul %5851, %8632 : !llvm.i64 + %8634 = llvm.add %8631, %8633 : !llvm.i64 + %8635 = llvm.getelementptr %8627[%8634] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8636 = llvm.load %8635 : !llvm.ptr + %8637 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8638 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8639 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8640 = llvm.mul %5851, %8639 : !llvm.i64 + %8641 = llvm.add %8638, %8640 : !llvm.i64 + %8642 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8643 = llvm.mul %6705, %8642 : !llvm.i64 + %8644 = llvm.add %8641, %8643 : !llvm.i64 + %8645 = llvm.getelementptr %8637[%8644] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8646 = llvm.load %8645 : !llvm.ptr + %8647 = llvm.fmul %8636, %8646 {RelaxedPrecision} : !llvm.float + %8648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8651 = llvm.mul %25, %8650 : !llvm.i64 + %8652 = llvm.add %8649, %8651 : !llvm.i64 + %8653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8654 = llvm.mul %6705, %8653 : !llvm.i64 + %8655 = llvm.add %8652, %8654 : !llvm.i64 + %8656 = llvm.getelementptr %8648[%8655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8657 = llvm.load %8656 : !llvm.ptr + %8658 = llvm.fadd %8657, %8647 {RelaxedPrecision} : !llvm.float + %8659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8662 = llvm.mul %25, %8661 : !llvm.i64 + %8663 = llvm.add %8660, %8662 : !llvm.i64 + %8664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8665 = llvm.mul %6705, %8664 : !llvm.i64 + %8666 = llvm.add %8663, %8665 : !llvm.i64 + %8667 = llvm.getelementptr %8659[%8666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8658, %8667 : !llvm.ptr + %8668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8671 = llvm.mul %25, %8670 : !llvm.i64 + %8672 = llvm.add %8669, %8671 : !llvm.i64 + %8673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8674 = llvm.mul %6705, %8673 : !llvm.i64 + %8675 = llvm.add %8672, %8674 : !llvm.i64 + %8676 = llvm.getelementptr %8668[%8675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8677 = llvm.load %8676 : !llvm.ptr + %8678 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8680 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8681 = llvm.mul %25, %8680 : !llvm.i64 + %8682 = llvm.add %8679, %8681 : !llvm.i64 + %8683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8684 = llvm.mul %6705, %8683 : !llvm.i64 + %8685 = llvm.add %8682, %8684 : !llvm.i64 + %8686 = llvm.getelementptr %8678[%8685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8677, %8686 : !llvm.ptr + %8687 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8688 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8689 = llvm.mlir.constant(128 : index) 
: !llvm.i64 + %8690 = llvm.mul %25, %8689 : !llvm.i64 + %8691 = llvm.add %8688, %8690 : !llvm.i64 + %8692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8693 = llvm.mul %5851, %8692 : !llvm.i64 + %8694 = llvm.add %8691, %8693 : !llvm.i64 + %8695 = llvm.getelementptr %8687[%8694] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8696 = llvm.load %8695 : !llvm.ptr + %8697 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8698 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8699 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8700 = llvm.mul %5851, %8699 : !llvm.i64 + %8701 = llvm.add %8698, %8700 : !llvm.i64 + %8702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8703 = llvm.mul %6766, %8702 : !llvm.i64 + %8704 = llvm.add %8701, %8703 : !llvm.i64 + %8705 = llvm.getelementptr %8697[%8704] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8706 = llvm.load %8705 : !llvm.ptr + %8707 = llvm.fmul %8696, %8706 {RelaxedPrecision} : !llvm.float + %8708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8711 = llvm.mul %25, %8710 : !llvm.i64 + %8712 = llvm.add %8709, %8711 : !llvm.i64 + %8713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8714 = llvm.mul %6766, %8713 : !llvm.i64 + %8715 = llvm.add %8712, %8714 : !llvm.i64 + %8716 = llvm.getelementptr %8708[%8715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8717 = llvm.load %8716 : !llvm.ptr + %8718 = llvm.fadd %8717, %8707 {RelaxedPrecision} : !llvm.float + %8719 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8720 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8721 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8722 = llvm.mul %25, %8721 : !llvm.i64 + %8723 = llvm.add %8720, %8722 : !llvm.i64 + %8724 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8725 = llvm.mul %6766, %8724 : !llvm.i64 + %8726 = llvm.add %8723, %8725 : !llvm.i64 + %8727 = llvm.getelementptr %8719[%8726] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8718, %8727 : !llvm.ptr + %8728 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8729 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8730 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8731 = llvm.mul %25, %8730 : !llvm.i64 + %8732 = llvm.add %8729, %8731 : !llvm.i64 + %8733 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8734 = llvm.mul %6766, %8733 : !llvm.i64 + %8735 = llvm.add %8732, %8734 : !llvm.i64 + %8736 = llvm.getelementptr %8728[%8735] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8737 = llvm.load %8736 : !llvm.ptr + %8738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8741 = llvm.mul %25, %8740 : !llvm.i64 + %8742 = llvm.add %8739, %8741 : !llvm.i64 + %8743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8744 = llvm.mul %6766, %8743 : !llvm.i64 + %8745 = llvm.add %8742, %8744 : !llvm.i64 + %8746 = llvm.getelementptr %8738[%8745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8737, %8746 : !llvm.ptr + %8747 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8749 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8750 = llvm.mul %26, %8749 : !llvm.i64 + %8751 = llvm.add %8748, %8750 : !llvm.i64 + %8752 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %8753 = llvm.mul %5851, %8752 : !llvm.i64 + %8754 = llvm.add %8751, %8753 : !llvm.i64 + %8755 = llvm.getelementptr %8747[%8754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8756 = llvm.load %8755 : !llvm.ptr + %8757 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8758 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8759 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8760 = llvm.mul %5851, %8759 : !llvm.i64 + %8761 = llvm.add %8758, %8760 : !llvm.i64 + %8762 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8763 = llvm.mul %5850, %8762 : !llvm.i64 + %8764 = llvm.add %8761, %8763 : !llvm.i64 + %8765 = llvm.getelementptr %8757[%8764] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8766 = llvm.load %8765 : !llvm.ptr + %8767 = llvm.fmul %8756, %8766 {RelaxedPrecision} : !llvm.float + %8768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8771 = llvm.mul %26, %8770 : !llvm.i64 + %8772 = llvm.add %8769, %8771 : !llvm.i64 + %8773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8774 = llvm.mul %5850, %8773 : !llvm.i64 + %8775 = llvm.add %8772, %8774 : !llvm.i64 + %8776 = llvm.getelementptr %8768[%8775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8777 = llvm.load %8776 : !llvm.ptr + %8778 = llvm.fadd %8777, %8767 {RelaxedPrecision} : !llvm.float + %8779 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8780 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8781 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8782 = llvm.mul %26, %8781 : !llvm.i64 + %8783 = llvm.add %8780, %8782 : !llvm.i64 + %8784 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8785 = llvm.mul %5850, %8784 : !llvm.i64 + %8786 = llvm.add %8783, %8785 : !llvm.i64 + %8787 = llvm.getelementptr %8779[%8786] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8778, %8787 : !llvm.ptr + %8788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8791 = llvm.mul %26, %8790 : !llvm.i64 + %8792 = llvm.add %8789, %8791 : !llvm.i64 + %8793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8794 = llvm.mul %5850, %8793 : !llvm.i64 + %8795 = llvm.add %8792, %8794 : !llvm.i64 + %8796 = llvm.getelementptr %8788[%8795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8797 = llvm.load %8796 : !llvm.ptr + %8798 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8800 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8801 = llvm.mul %26, %8800 : !llvm.i64 + %8802 = llvm.add %8799, %8801 : !llvm.i64 + %8803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8804 = llvm.mul %5850, %8803 : !llvm.i64 + %8805 = llvm.add %8802, %8804 : !llvm.i64 + %8806 = llvm.getelementptr %8798[%8805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8797, %8806 : !llvm.ptr + %8807 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8808 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8809 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8810 = llvm.mul %26, %8809 : !llvm.i64 + %8811 = llvm.add %8808, %8810 : !llvm.i64 + %8812 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8813 = llvm.mul %5851, %8812 : !llvm.i64 + %8814 = llvm.add %8811, %8813 : 
!llvm.i64 + %8815 = llvm.getelementptr %8807[%8814] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8816 = llvm.load %8815 : !llvm.ptr + %8817 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8818 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8819 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8820 = llvm.mul %5851, %8819 : !llvm.i64 + %8821 = llvm.add %8818, %8820 : !llvm.i64 + %8822 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8823 = llvm.mul %5912, %8822 : !llvm.i64 + %8824 = llvm.add %8821, %8823 : !llvm.i64 + %8825 = llvm.getelementptr %8817[%8824] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8826 = llvm.load %8825 : !llvm.ptr + %8827 = llvm.fmul %8816, %8826 {RelaxedPrecision} : !llvm.float + %8828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8831 = llvm.mul %26, %8830 : !llvm.i64 + %8832 = llvm.add %8829, %8831 : !llvm.i64 + %8833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8834 = llvm.mul %5912, %8833 : !llvm.i64 + %8835 = llvm.add %8832, %8834 : !llvm.i64 + %8836 = llvm.getelementptr %8828[%8835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8837 = llvm.load %8836 : !llvm.ptr + %8838 = llvm.fadd %8837, %8827 {RelaxedPrecision} : !llvm.float + %8839 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8840 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8841 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8842 = llvm.mul %26, %8841 : !llvm.i64 + %8843 = llvm.add %8840, %8842 : !llvm.i64 + %8844 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8845 = llvm.mul %5912, %8844 : !llvm.i64 + %8846 = llvm.add %8843, %8845 : !llvm.i64 + %8847 = llvm.getelementptr %8839[%8846] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8838, %8847 : !llvm.ptr + %8848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8851 = llvm.mul %26, %8850 : !llvm.i64 + %8852 = llvm.add %8849, %8851 : !llvm.i64 + %8853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8854 = llvm.mul %5912, %8853 : !llvm.i64 + %8855 = llvm.add %8852, %8854 : !llvm.i64 + %8856 = llvm.getelementptr %8848[%8855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8857 = llvm.load %8856 : !llvm.ptr + %8858 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8860 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8861 = llvm.mul %26, %8860 : !llvm.i64 + %8862 = llvm.add %8859, %8861 : !llvm.i64 + %8863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8864 = llvm.mul %5912, %8863 : !llvm.i64 + %8865 = llvm.add %8862, %8864 : !llvm.i64 + %8866 = llvm.getelementptr %8858[%8865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8857, %8866 : !llvm.ptr + %8867 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8868 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8869 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8870 = llvm.mul %26, %8869 : !llvm.i64 + %8871 = llvm.add %8868, %8870 : !llvm.i64 + %8872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8873 = llvm.mul %5851, %8872 : !llvm.i64 + %8874 = llvm.add %8871, %8873 : !llvm.i64 + %8875 = llvm.getelementptr %8867[%8874] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8876 = llvm.load %8875 : !llvm.ptr 
+ %8877 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8878 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8879 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8880 = llvm.mul %5851, %8879 : !llvm.i64 + %8881 = llvm.add %8878, %8880 : !llvm.i64 + %8882 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8883 = llvm.mul %5973, %8882 : !llvm.i64 + %8884 = llvm.add %8881, %8883 : !llvm.i64 + %8885 = llvm.getelementptr %8877[%8884] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8886 = llvm.load %8885 : !llvm.ptr + %8887 = llvm.fmul %8876, %8886 {RelaxedPrecision} : !llvm.float + %8888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8891 = llvm.mul %26, %8890 : !llvm.i64 + %8892 = llvm.add %8889, %8891 : !llvm.i64 + %8893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8894 = llvm.mul %5973, %8893 : !llvm.i64 + %8895 = llvm.add %8892, %8894 : !llvm.i64 + %8896 = llvm.getelementptr %8888[%8895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8897 = llvm.load %8896 : !llvm.ptr + %8898 = llvm.fadd %8897, %8887 {RelaxedPrecision} : !llvm.float + %8899 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8901 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8902 = llvm.mul %26, %8901 : !llvm.i64 + %8903 = llvm.add %8900, %8902 : !llvm.i64 + %8904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8905 = llvm.mul %5973, %8904 : !llvm.i64 + %8906 = llvm.add %8903, %8905 : !llvm.i64 + %8907 = llvm.getelementptr %8899[%8906] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8898, %8907 : !llvm.ptr + %8908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8911 = llvm.mul %26, %8910 : !llvm.i64 + %8912 = llvm.add %8909, %8911 : !llvm.i64 + %8913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8914 = llvm.mul %5973, %8913 : !llvm.i64 + %8915 = llvm.add %8912, %8914 : !llvm.i64 + %8916 = llvm.getelementptr %8908[%8915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8917 = llvm.load %8916 : !llvm.ptr + %8918 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8920 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8921 = llvm.mul %26, %8920 : !llvm.i64 + %8922 = llvm.add %8919, %8921 : !llvm.i64 + %8923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8924 = llvm.mul %5973, %8923 : !llvm.i64 + %8925 = llvm.add %8922, %8924 : !llvm.i64 + %8926 = llvm.getelementptr %8918[%8925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8917, %8926 : !llvm.ptr + %8927 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8928 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8929 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8930 = llvm.mul %26, %8929 : !llvm.i64 + %8931 = llvm.add %8928, %8930 : !llvm.i64 + %8932 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8933 = llvm.mul %5851, %8932 : !llvm.i64 + %8934 = llvm.add %8931, %8933 : !llvm.i64 + %8935 = llvm.getelementptr %8927[%8934] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8936 = llvm.load %8935 : !llvm.ptr + %8937 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8938 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %8939 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8940 = llvm.mul %5851, %8939 : !llvm.i64 + %8941 = llvm.add %8938, %8940 : !llvm.i64 + %8942 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8943 = llvm.mul %6034, %8942 : !llvm.i64 + %8944 = llvm.add %8941, %8943 : !llvm.i64 + %8945 = llvm.getelementptr %8937[%8944] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8946 = llvm.load %8945 : !llvm.ptr + %8947 = llvm.fmul %8936, %8946 {RelaxedPrecision} : !llvm.float + %8948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8951 = llvm.mul %26, %8950 : !llvm.i64 + %8952 = llvm.add %8949, %8951 : !llvm.i64 + %8953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8954 = llvm.mul %6034, %8953 : !llvm.i64 + %8955 = llvm.add %8952, %8954 : !llvm.i64 + %8956 = llvm.getelementptr %8948[%8955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8957 = llvm.load %8956 : !llvm.ptr + %8958 = llvm.fadd %8957, %8947 {RelaxedPrecision} : !llvm.float + %8959 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8960 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8961 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8962 = llvm.mul %26, %8961 : !llvm.i64 + %8963 = llvm.add %8960, %8962 : !llvm.i64 + %8964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8965 = llvm.mul %6034, %8964 : !llvm.i64 + %8966 = llvm.add %8963, %8965 : !llvm.i64 + %8967 = llvm.getelementptr %8959[%8966] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8958, %8967 : !llvm.ptr + %8968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8971 = llvm.mul %26, %8970 : !llvm.i64 + %8972 = llvm.add %8969, %8971 : !llvm.i64 + %8973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8974 = llvm.mul %6034, %8973 : !llvm.i64 + %8975 = llvm.add %8972, %8974 : !llvm.i64 + %8976 = llvm.getelementptr %8968[%8975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8977 = llvm.load %8976 : !llvm.ptr + %8978 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8980 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8981 = llvm.mul %26, %8980 : !llvm.i64 + %8982 = llvm.add %8979, %8981 : !llvm.i64 + %8983 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8984 = llvm.mul %6034, %8983 : !llvm.i64 + %8985 = llvm.add %8982, %8984 : !llvm.i64 + %8986 = llvm.getelementptr %8978[%8985] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8977, %8986 : !llvm.ptr + %8987 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8988 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8989 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8990 = llvm.mul %26, %8989 : !llvm.i64 + %8991 = llvm.add %8988, %8990 : !llvm.i64 + %8992 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8993 = llvm.mul %5851, %8992 : !llvm.i64 + %8994 = llvm.add %8991, %8993 : !llvm.i64 + %8995 = llvm.getelementptr %8987[%8994] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8996 = llvm.load %8995 : !llvm.ptr + %8997 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8998 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8999 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9000 = llvm.mul %5851, %8999 
: !llvm.i64 + %9001 = llvm.add %8998, %9000 : !llvm.i64 + %9002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9003 = llvm.mul %6095, %9002 : !llvm.i64 + %9004 = llvm.add %9001, %9003 : !llvm.i64 + %9005 = llvm.getelementptr %8997[%9004] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9006 = llvm.load %9005 : !llvm.ptr + %9007 = llvm.fmul %8996, %9006 {RelaxedPrecision} : !llvm.float + %9008 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9010 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9011 = llvm.mul %26, %9010 : !llvm.i64 + %9012 = llvm.add %9009, %9011 : !llvm.i64 + %9013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9014 = llvm.mul %6095, %9013 : !llvm.i64 + %9015 = llvm.add %9012, %9014 : !llvm.i64 + %9016 = llvm.getelementptr %9008[%9015] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9017 = llvm.load %9016 : !llvm.ptr + %9018 = llvm.fadd %9017, %9007 {RelaxedPrecision} : !llvm.float + %9019 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9020 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9021 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9022 = llvm.mul %26, %9021 : !llvm.i64 + %9023 = llvm.add %9020, %9022 : !llvm.i64 + %9024 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9025 = llvm.mul %6095, %9024 : !llvm.i64 + %9026 = llvm.add %9023, %9025 : !llvm.i64 + %9027 = llvm.getelementptr %9019[%9026] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9018, %9027 : !llvm.ptr + %9028 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9031 = llvm.mul %26, %9030 : !llvm.i64 + %9032 = llvm.add %9029, %9031 : !llvm.i64 + %9033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9034 = llvm.mul %6095, %9033 : !llvm.i64 + %9035 = llvm.add %9032, %9034 : !llvm.i64 + %9036 = llvm.getelementptr %9028[%9035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9037 = llvm.load %9036 : !llvm.ptr + %9038 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9039 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9040 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9041 = llvm.mul %26, %9040 : !llvm.i64 + %9042 = llvm.add %9039, %9041 : !llvm.i64 + %9043 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9044 = llvm.mul %6095, %9043 : !llvm.i64 + %9045 = llvm.add %9042, %9044 : !llvm.i64 + %9046 = llvm.getelementptr %9038[%9045] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9037, %9046 : !llvm.ptr + %9047 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9048 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9049 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9050 = llvm.mul %26, %9049 : !llvm.i64 + %9051 = llvm.add %9048, %9050 : !llvm.i64 + %9052 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9053 = llvm.mul %5851, %9052 : !llvm.i64 + %9054 = llvm.add %9051, %9053 : !llvm.i64 + %9055 = llvm.getelementptr %9047[%9054] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9056 = llvm.load %9055 : !llvm.ptr + %9057 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9059 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9060 = llvm.mul %5851, %9059 : !llvm.i64 + %9061 = llvm.add %9058, %9060 : !llvm.i64 + %9062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9063 = llvm.mul 
%6156, %9062 : !llvm.i64 + %9064 = llvm.add %9061, %9063 : !llvm.i64 + %9065 = llvm.getelementptr %9057[%9064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9066 = llvm.load %9065 : !llvm.ptr + %9067 = llvm.fmul %9056, %9066 {RelaxedPrecision} : !llvm.float + %9068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9071 = llvm.mul %26, %9070 : !llvm.i64 + %9072 = llvm.add %9069, %9071 : !llvm.i64 + %9073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9074 = llvm.mul %6156, %9073 : !llvm.i64 + %9075 = llvm.add %9072, %9074 : !llvm.i64 + %9076 = llvm.getelementptr %9068[%9075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9077 = llvm.load %9076 : !llvm.ptr + %9078 = llvm.fadd %9077, %9067 {RelaxedPrecision} : !llvm.float + %9079 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9080 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9081 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9082 = llvm.mul %26, %9081 : !llvm.i64 + %9083 = llvm.add %9080, %9082 : !llvm.i64 + %9084 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9085 = llvm.mul %6156, %9084 : !llvm.i64 + %9086 = llvm.add %9083, %9085 : !llvm.i64 + %9087 = llvm.getelementptr %9079[%9086] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9078, %9087 : !llvm.ptr + %9088 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9091 = llvm.mul %26, %9090 : !llvm.i64 + %9092 = llvm.add %9089, %9091 : !llvm.i64 + %9093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9094 = llvm.mul %6156, %9093 : !llvm.i64 + %9095 = llvm.add %9092, %9094 : !llvm.i64 + %9096 = llvm.getelementptr %9088[%9095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9097 = llvm.load %9096 : !llvm.ptr + %9098 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9100 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9101 = llvm.mul %26, %9100 : !llvm.i64 + %9102 = llvm.add %9099, %9101 : !llvm.i64 + %9103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9104 = llvm.mul %6156, %9103 : !llvm.i64 + %9105 = llvm.add %9102, %9104 : !llvm.i64 + %9106 = llvm.getelementptr %9098[%9105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9097, %9106 : !llvm.ptr + %9107 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9109 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9110 = llvm.mul %26, %9109 : !llvm.i64 + %9111 = llvm.add %9108, %9110 : !llvm.i64 + %9112 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9113 = llvm.mul %5851, %9112 : !llvm.i64 + %9114 = llvm.add %9111, %9113 : !llvm.i64 + %9115 = llvm.getelementptr %9107[%9114] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9116 = llvm.load %9115 : !llvm.ptr + %9117 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9120 = llvm.mul %5851, %9119 : !llvm.i64 + %9121 = llvm.add %9118, %9120 : !llvm.i64 + %9122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9123 = llvm.mul %6217, %9122 : !llvm.i64 + %9124 = llvm.add %9121, %9123 : !llvm.i64 + %9125 = llvm.getelementptr %9117[%9124] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %9126 = llvm.load %9125 : !llvm.ptr + %9127 = llvm.fmul %9116, %9126 {RelaxedPrecision} : !llvm.float + %9128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9131 = llvm.mul %26, %9130 : !llvm.i64 + %9132 = llvm.add %9129, %9131 : !llvm.i64 + %9133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9134 = llvm.mul %6217, %9133 : !llvm.i64 + %9135 = llvm.add %9132, %9134 : !llvm.i64 + %9136 = llvm.getelementptr %9128[%9135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9137 = llvm.load %9136 : !llvm.ptr + %9138 = llvm.fadd %9137, %9127 {RelaxedPrecision} : !llvm.float + %9139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9142 = llvm.mul %26, %9141 : !llvm.i64 + %9143 = llvm.add %9140, %9142 : !llvm.i64 + %9144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9145 = llvm.mul %6217, %9144 : !llvm.i64 + %9146 = llvm.add %9143, %9145 : !llvm.i64 + %9147 = llvm.getelementptr %9139[%9146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9138, %9147 : !llvm.ptr + %9148 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9151 = llvm.mul %26, %9150 : !llvm.i64 + %9152 = llvm.add %9149, %9151 : !llvm.i64 + %9153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9154 = llvm.mul %6217, %9153 : !llvm.i64 + %9155 = llvm.add %9152, %9154 : !llvm.i64 + %9156 = llvm.getelementptr %9148[%9155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9157 = llvm.load %9156 : !llvm.ptr + %9158 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9161 = llvm.mul %26, %9160 : !llvm.i64 + %9162 = llvm.add %9159, %9161 : !llvm.i64 + %9163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9164 = llvm.mul %6217, %9163 : !llvm.i64 + %9165 = llvm.add %9162, %9164 : !llvm.i64 + %9166 = llvm.getelementptr %9158[%9165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9157, %9166 : !llvm.ptr + %9167 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9169 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9170 = llvm.mul %26, %9169 : !llvm.i64 + %9171 = llvm.add %9168, %9170 : !llvm.i64 + %9172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9173 = llvm.mul %5851, %9172 : !llvm.i64 + %9174 = llvm.add %9171, %9173 : !llvm.i64 + %9175 = llvm.getelementptr %9167[%9174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9176 = llvm.load %9175 : !llvm.ptr + %9177 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9179 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9180 = llvm.mul %5851, %9179 : !llvm.i64 + %9181 = llvm.add %9178, %9180 : !llvm.i64 + %9182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9183 = llvm.mul %6278, %9182 : !llvm.i64 + %9184 = llvm.add %9181, %9183 : !llvm.i64 + %9185 = llvm.getelementptr %9177[%9184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9186 = llvm.load %9185 : !llvm.ptr + %9187 = llvm.fmul %9176, %9186 {RelaxedPrecision} : !llvm.float 
+ %9188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9191 = llvm.mul %26, %9190 : !llvm.i64 + %9192 = llvm.add %9189, %9191 : !llvm.i64 + %9193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9194 = llvm.mul %6278, %9193 : !llvm.i64 + %9195 = llvm.add %9192, %9194 : !llvm.i64 + %9196 = llvm.getelementptr %9188[%9195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9197 = llvm.load %9196 : !llvm.ptr + %9198 = llvm.fadd %9197, %9187 {RelaxedPrecision} : !llvm.float + %9199 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9200 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9202 = llvm.mul %26, %9201 : !llvm.i64 + %9203 = llvm.add %9200, %9202 : !llvm.i64 + %9204 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9205 = llvm.mul %6278, %9204 : !llvm.i64 + %9206 = llvm.add %9203, %9205 : !llvm.i64 + %9207 = llvm.getelementptr %9199[%9206] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9198, %9207 : !llvm.ptr + %9208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9211 = llvm.mul %26, %9210 : !llvm.i64 + %9212 = llvm.add %9209, %9211 : !llvm.i64 + %9213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9214 = llvm.mul %6278, %9213 : !llvm.i64 + %9215 = llvm.add %9212, %9214 : !llvm.i64 + %9216 = llvm.getelementptr %9208[%9215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9217 = llvm.load %9216 : !llvm.ptr + %9218 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9220 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9221 = llvm.mul %26, %9220 : !llvm.i64 + %9222 = llvm.add %9219, %9221 : !llvm.i64 + %9223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9224 = llvm.mul %6278, %9223 : !llvm.i64 + %9225 = llvm.add %9222, %9224 : !llvm.i64 + %9226 = llvm.getelementptr %9218[%9225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9217, %9226 : !llvm.ptr + %9227 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9228 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9229 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9230 = llvm.mul %26, %9229 : !llvm.i64 + %9231 = llvm.add %9228, %9230 : !llvm.i64 + %9232 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9233 = llvm.mul %5851, %9232 : !llvm.i64 + %9234 = llvm.add %9231, %9233 : !llvm.i64 + %9235 = llvm.getelementptr %9227[%9234] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9236 = llvm.load %9235 : !llvm.ptr + %9237 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9239 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9240 = llvm.mul %5851, %9239 : !llvm.i64 + %9241 = llvm.add %9238, %9240 : !llvm.i64 + %9242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9243 = llvm.mul %6339, %9242 : !llvm.i64 + %9244 = llvm.add %9241, %9243 : !llvm.i64 + %9245 = llvm.getelementptr %9237[%9244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9246 = llvm.load %9245 : !llvm.ptr + %9247 = llvm.fmul %9236, %9246 {RelaxedPrecision} : !llvm.float + %9248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9249 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %9250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9251 = llvm.mul %26, %9250 : !llvm.i64 + %9252 = llvm.add %9249, %9251 : !llvm.i64 + %9253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9254 = llvm.mul %6339, %9253 : !llvm.i64 + %9255 = llvm.add %9252, %9254 : !llvm.i64 + %9256 = llvm.getelementptr %9248[%9255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9257 = llvm.load %9256 : !llvm.ptr + %9258 = llvm.fadd %9257, %9247 {RelaxedPrecision} : !llvm.float + %9259 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9260 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9261 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9262 = llvm.mul %26, %9261 : !llvm.i64 + %9263 = llvm.add %9260, %9262 : !llvm.i64 + %9264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9265 = llvm.mul %6339, %9264 : !llvm.i64 + %9266 = llvm.add %9263, %9265 : !llvm.i64 + %9267 = llvm.getelementptr %9259[%9266] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9258, %9267 : !llvm.ptr + %9268 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9271 = llvm.mul %26, %9270 : !llvm.i64 + %9272 = llvm.add %9269, %9271 : !llvm.i64 + %9273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9274 = llvm.mul %6339, %9273 : !llvm.i64 + %9275 = llvm.add %9272, %9274 : !llvm.i64 + %9276 = llvm.getelementptr %9268[%9275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9277 = llvm.load %9276 : !llvm.ptr + %9278 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9279 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9280 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9281 = llvm.mul %26, %9280 : !llvm.i64 + %9282 = llvm.add %9279, %9281 : !llvm.i64 + %9283 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9284 = llvm.mul %6339, %9283 : !llvm.i64 + %9285 = llvm.add %9282, %9284 : !llvm.i64 + %9286 = llvm.getelementptr %9278[%9285] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9277, %9286 : !llvm.ptr + %9287 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9288 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9289 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9290 = llvm.mul %26, %9289 : !llvm.i64 + %9291 = llvm.add %9288, %9290 : !llvm.i64 + %9292 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9293 = llvm.mul %5851, %9292 : !llvm.i64 + %9294 = llvm.add %9291, %9293 : !llvm.i64 + %9295 = llvm.getelementptr %9287[%9294] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9296 = llvm.load %9295 : !llvm.ptr + %9297 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9299 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9300 = llvm.mul %5851, %9299 : !llvm.i64 + %9301 = llvm.add %9298, %9300 : !llvm.i64 + %9302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9303 = llvm.mul %6400, %9302 : !llvm.i64 + %9304 = llvm.add %9301, %9303 : !llvm.i64 + %9305 = llvm.getelementptr %9297[%9304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9306 = llvm.load %9305 : !llvm.ptr + %9307 = llvm.fmul %9296, %9306 {RelaxedPrecision} : !llvm.float + %9308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9311 = llvm.mul %26, %9310 : 
!llvm.i64 + %9312 = llvm.add %9309, %9311 : !llvm.i64 + %9313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9314 = llvm.mul %6400, %9313 : !llvm.i64 + %9315 = llvm.add %9312, %9314 : !llvm.i64 + %9316 = llvm.getelementptr %9308[%9315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9317 = llvm.load %9316 : !llvm.ptr + %9318 = llvm.fadd %9317, %9307 {RelaxedPrecision} : !llvm.float + %9319 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9320 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9321 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9322 = llvm.mul %26, %9321 : !llvm.i64 + %9323 = llvm.add %9320, %9322 : !llvm.i64 + %9324 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9325 = llvm.mul %6400, %9324 : !llvm.i64 + %9326 = llvm.add %9323, %9325 : !llvm.i64 + %9327 = llvm.getelementptr %9319[%9326] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9318, %9327 : !llvm.ptr + %9328 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9331 = llvm.mul %26, %9330 : !llvm.i64 + %9332 = llvm.add %9329, %9331 : !llvm.i64 + %9333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9334 = llvm.mul %6400, %9333 : !llvm.i64 + %9335 = llvm.add %9332, %9334 : !llvm.i64 + %9336 = llvm.getelementptr %9328[%9335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9337 = llvm.load %9336 : !llvm.ptr + %9338 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9339 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9340 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9341 = llvm.mul %26, %9340 : !llvm.i64 + %9342 = llvm.add %9339, %9341 : !llvm.i64 + %9343 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9344 = llvm.mul %6400, %9343 : !llvm.i64 + %9345 = llvm.add %9342, %9344 : !llvm.i64 + %9346 = llvm.getelementptr %9338[%9345] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9337, %9346 : !llvm.ptr + %9347 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9349 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9350 = llvm.mul %26, %9349 : !llvm.i64 + %9351 = llvm.add %9348, %9350 : !llvm.i64 + %9352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9353 = llvm.mul %5851, %9352 : !llvm.i64 + %9354 = llvm.add %9351, %9353 : !llvm.i64 + %9355 = llvm.getelementptr %9347[%9354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9356 = llvm.load %9355 : !llvm.ptr + %9357 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9359 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9360 = llvm.mul %5851, %9359 : !llvm.i64 + %9361 = llvm.add %9358, %9360 : !llvm.i64 + %9362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9363 = llvm.mul %6461, %9362 : !llvm.i64 + %9364 = llvm.add %9361, %9363 : !llvm.i64 + %9365 = llvm.getelementptr %9357[%9364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9366 = llvm.load %9365 : !llvm.ptr + %9367 = llvm.fmul %9356, %9366 {RelaxedPrecision} : !llvm.float + %9368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9371 = llvm.mul %26, %9370 : !llvm.i64 + %9372 = llvm.add %9369, %9371 : !llvm.i64 + %9373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9374 = llvm.mul 
%6461, %9373 : !llvm.i64 + %9375 = llvm.add %9372, %9374 : !llvm.i64 + %9376 = llvm.getelementptr %9368[%9375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9377 = llvm.load %9376 : !llvm.ptr + %9378 = llvm.fadd %9377, %9367 {RelaxedPrecision} : !llvm.float + %9379 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9381 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9382 = llvm.mul %26, %9381 : !llvm.i64 + %9383 = llvm.add %9380, %9382 : !llvm.i64 + %9384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9385 = llvm.mul %6461, %9384 : !llvm.i64 + %9386 = llvm.add %9383, %9385 : !llvm.i64 + %9387 = llvm.getelementptr %9379[%9386] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9378, %9387 : !llvm.ptr + %9388 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9391 = llvm.mul %26, %9390 : !llvm.i64 + %9392 = llvm.add %9389, %9391 : !llvm.i64 + %9393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9394 = llvm.mul %6461, %9393 : !llvm.i64 + %9395 = llvm.add %9392, %9394 : !llvm.i64 + %9396 = llvm.getelementptr %9388[%9395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9397 = llvm.load %9396 : !llvm.ptr + %9398 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9399 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9401 = llvm.mul %26, %9400 : !llvm.i64 + %9402 = llvm.add %9399, %9401 : !llvm.i64 + %9403 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9404 = llvm.mul %6461, %9403 : !llvm.i64 + %9405 = llvm.add %9402, %9404 : !llvm.i64 + %9406 = llvm.getelementptr %9398[%9405] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9397, %9406 : !llvm.ptr + %9407 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9408 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9409 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9410 = llvm.mul %26, %9409 : !llvm.i64 + %9411 = llvm.add %9408, %9410 : !llvm.i64 + %9412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9413 = llvm.mul %5851, %9412 : !llvm.i64 + %9414 = llvm.add %9411, %9413 : !llvm.i64 + %9415 = llvm.getelementptr %9407[%9414] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9416 = llvm.load %9415 : !llvm.ptr + %9417 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9419 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9420 = llvm.mul %5851, %9419 : !llvm.i64 + %9421 = llvm.add %9418, %9420 : !llvm.i64 + %9422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9423 = llvm.mul %6522, %9422 : !llvm.i64 + %9424 = llvm.add %9421, %9423 : !llvm.i64 + %9425 = llvm.getelementptr %9417[%9424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9426 = llvm.load %9425 : !llvm.ptr + %9427 = llvm.fmul %9416, %9426 {RelaxedPrecision} : !llvm.float + %9428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9431 = llvm.mul %26, %9430 : !llvm.i64 + %9432 = llvm.add %9429, %9431 : !llvm.i64 + %9433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9434 = llvm.mul %6522, %9433 : !llvm.i64 + %9435 = llvm.add %9432, %9434 : !llvm.i64 + %9436 = llvm.getelementptr %9428[%9435] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %9437 = llvm.load %9436 : !llvm.ptr + %9438 = llvm.fadd %9437, %9427 {RelaxedPrecision} : !llvm.float + %9439 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9440 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9441 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9442 = llvm.mul %26, %9441 : !llvm.i64 + %9443 = llvm.add %9440, %9442 : !llvm.i64 + %9444 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9445 = llvm.mul %6522, %9444 : !llvm.i64 + %9446 = llvm.add %9443, %9445 : !llvm.i64 + %9447 = llvm.getelementptr %9439[%9446] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9438, %9447 : !llvm.ptr + %9448 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9451 = llvm.mul %26, %9450 : !llvm.i64 + %9452 = llvm.add %9449, %9451 : !llvm.i64 + %9453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9454 = llvm.mul %6522, %9453 : !llvm.i64 + %9455 = llvm.add %9452, %9454 : !llvm.i64 + %9456 = llvm.getelementptr %9448[%9455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9457 = llvm.load %9456 : !llvm.ptr + %9458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9461 = llvm.mul %26, %9460 : !llvm.i64 + %9462 = llvm.add %9459, %9461 : !llvm.i64 + %9463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9464 = llvm.mul %6522, %9463 : !llvm.i64 + %9465 = llvm.add %9462, %9464 : !llvm.i64 + %9466 = llvm.getelementptr %9458[%9465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9457, %9466 : !llvm.ptr + %9467 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9468 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9469 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9470 = llvm.mul %26, %9469 : !llvm.i64 + %9471 = llvm.add %9468, %9470 : !llvm.i64 + %9472 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9473 = llvm.mul %5851, %9472 : !llvm.i64 + %9474 = llvm.add %9471, %9473 : !llvm.i64 + %9475 = llvm.getelementptr %9467[%9474] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9476 = llvm.load %9475 : !llvm.ptr + %9477 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9480 = llvm.mul %5851, %9479 : !llvm.i64 + %9481 = llvm.add %9478, %9480 : !llvm.i64 + %9482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9483 = llvm.mul %6583, %9482 : !llvm.i64 + %9484 = llvm.add %9481, %9483 : !llvm.i64 + %9485 = llvm.getelementptr %9477[%9484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9486 = llvm.load %9485 : !llvm.ptr + %9487 = llvm.fmul %9476, %9486 {RelaxedPrecision} : !llvm.float + %9488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9491 = llvm.mul %26, %9490 : !llvm.i64 + %9492 = llvm.add %9489, %9491 : !llvm.i64 + %9493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9494 = llvm.mul %6583, %9493 : !llvm.i64 + %9495 = llvm.add %9492, %9494 : !llvm.i64 + %9496 = llvm.getelementptr %9488[%9495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9497 = llvm.load %9496 : !llvm.ptr + %9498 = llvm.fadd %9497, %9487 {RelaxedPrecision} : !llvm.float 
+ %9499 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9500 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9501 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9502 = llvm.mul %26, %9501 : !llvm.i64 + %9503 = llvm.add %9500, %9502 : !llvm.i64 + %9504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9505 = llvm.mul %6583, %9504 : !llvm.i64 + %9506 = llvm.add %9503, %9505 : !llvm.i64 + %9507 = llvm.getelementptr %9499[%9506] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9498, %9507 : !llvm.ptr + %9508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9511 = llvm.mul %26, %9510 : !llvm.i64 + %9512 = llvm.add %9509, %9511 : !llvm.i64 + %9513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9514 = llvm.mul %6583, %9513 : !llvm.i64 + %9515 = llvm.add %9512, %9514 : !llvm.i64 + %9516 = llvm.getelementptr %9508[%9515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9517 = llvm.load %9516 : !llvm.ptr + %9518 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9519 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9520 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9521 = llvm.mul %26, %9520 : !llvm.i64 + %9522 = llvm.add %9519, %9521 : !llvm.i64 + %9523 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9524 = llvm.mul %6583, %9523 : !llvm.i64 + %9525 = llvm.add %9522, %9524 : !llvm.i64 + %9526 = llvm.getelementptr %9518[%9525] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9517, %9526 : !llvm.ptr + %9527 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9528 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9529 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9530 = llvm.mul %26, %9529 : !llvm.i64 + %9531 = llvm.add %9528, %9530 : !llvm.i64 + %9532 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9533 = llvm.mul %5851, %9532 : !llvm.i64 + %9534 = llvm.add %9531, %9533 : !llvm.i64 + %9535 = llvm.getelementptr %9527[%9534] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9536 = llvm.load %9535 : !llvm.ptr + %9537 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9539 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9540 = llvm.mul %5851, %9539 : !llvm.i64 + %9541 = llvm.add %9538, %9540 : !llvm.i64 + %9542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9543 = llvm.mul %6644, %9542 : !llvm.i64 + %9544 = llvm.add %9541, %9543 : !llvm.i64 + %9545 = llvm.getelementptr %9537[%9544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9546 = llvm.load %9545 : !llvm.ptr + %9547 = llvm.fmul %9536, %9546 {RelaxedPrecision} : !llvm.float + %9548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9551 = llvm.mul %26, %9550 : !llvm.i64 + %9552 = llvm.add %9549, %9551 : !llvm.i64 + %9553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9554 = llvm.mul %6644, %9553 : !llvm.i64 + %9555 = llvm.add %9552, %9554 : !llvm.i64 + %9556 = llvm.getelementptr %9548[%9555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9557 = llvm.load %9556 : !llvm.ptr + %9558 = llvm.fadd %9557, %9547 {RelaxedPrecision} : !llvm.float + %9559 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9560 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %9561 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9562 = llvm.mul %26, %9561 : !llvm.i64 + %9563 = llvm.add %9560, %9562 : !llvm.i64 + %9564 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9565 = llvm.mul %6644, %9564 : !llvm.i64 + %9566 = llvm.add %9563, %9565 : !llvm.i64 + %9567 = llvm.getelementptr %9559[%9566] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9558, %9567 : !llvm.ptr + %9568 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9571 = llvm.mul %26, %9570 : !llvm.i64 + %9572 = llvm.add %9569, %9571 : !llvm.i64 + %9573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9574 = llvm.mul %6644, %9573 : !llvm.i64 + %9575 = llvm.add %9572, %9574 : !llvm.i64 + %9576 = llvm.getelementptr %9568[%9575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9577 = llvm.load %9576 : !llvm.ptr + %9578 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9580 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9581 = llvm.mul %26, %9580 : !llvm.i64 + %9582 = llvm.add %9579, %9581 : !llvm.i64 + %9583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9584 = llvm.mul %6644, %9583 : !llvm.i64 + %9585 = llvm.add %9582, %9584 : !llvm.i64 + %9586 = llvm.getelementptr %9578[%9585] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9577, %9586 : !llvm.ptr + %9587 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9589 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9590 = llvm.mul %26, %9589 : !llvm.i64 + %9591 = llvm.add %9588, %9590 : !llvm.i64 + %9592 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9593 = llvm.mul %5851, %9592 : !llvm.i64 + %9594 = llvm.add %9591, %9593 : !llvm.i64 + %9595 = llvm.getelementptr %9587[%9594] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9596 = llvm.load %9595 : !llvm.ptr + %9597 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9599 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9600 = llvm.mul %5851, %9599 : !llvm.i64 + %9601 = llvm.add %9598, %9600 : !llvm.i64 + %9602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9603 = llvm.mul %6705, %9602 : !llvm.i64 + %9604 = llvm.add %9601, %9603 : !llvm.i64 + %9605 = llvm.getelementptr %9597[%9604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9606 = llvm.load %9605 : !llvm.ptr + %9607 = llvm.fmul %9596, %9606 {RelaxedPrecision} : !llvm.float + %9608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9611 = llvm.mul %26, %9610 : !llvm.i64 + %9612 = llvm.add %9609, %9611 : !llvm.i64 + %9613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9614 = llvm.mul %6705, %9613 : !llvm.i64 + %9615 = llvm.add %9612, %9614 : !llvm.i64 + %9616 = llvm.getelementptr %9608[%9615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9617 = llvm.load %9616 : !llvm.ptr + %9618 = llvm.fadd %9617, %9607 {RelaxedPrecision} : !llvm.float + %9619 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9620 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9621 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9622 = llvm.mul %26, %9621 : 
!llvm.i64 + %9623 = llvm.add %9620, %9622 : !llvm.i64 + %9624 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9625 = llvm.mul %6705, %9624 : !llvm.i64 + %9626 = llvm.add %9623, %9625 : !llvm.i64 + %9627 = llvm.getelementptr %9619[%9626] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9618, %9627 : !llvm.ptr + %9628 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9631 = llvm.mul %26, %9630 : !llvm.i64 + %9632 = llvm.add %9629, %9631 : !llvm.i64 + %9633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9634 = llvm.mul %6705, %9633 : !llvm.i64 + %9635 = llvm.add %9632, %9634 : !llvm.i64 + %9636 = llvm.getelementptr %9628[%9635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9637 = llvm.load %9636 : !llvm.ptr + %9638 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9639 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9640 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9641 = llvm.mul %26, %9640 : !llvm.i64 + %9642 = llvm.add %9639, %9641 : !llvm.i64 + %9643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9644 = llvm.mul %6705, %9643 : !llvm.i64 + %9645 = llvm.add %9642, %9644 : !llvm.i64 + %9646 = llvm.getelementptr %9638[%9645] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9637, %9646 : !llvm.ptr + %9647 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9648 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9649 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9650 = llvm.mul %26, %9649 : !llvm.i64 + %9651 = llvm.add %9648, %9650 : !llvm.i64 + %9652 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9653 = llvm.mul %5851, %9652 : !llvm.i64 + %9654 = llvm.add %9651, %9653 : !llvm.i64 + %9655 = llvm.getelementptr %9647[%9654] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9656 = llvm.load %9655 : !llvm.ptr + %9657 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9659 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9660 = llvm.mul %5851, %9659 : !llvm.i64 + %9661 = llvm.add %9658, %9660 : !llvm.i64 + %9662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9663 = llvm.mul %6766, %9662 : !llvm.i64 + %9664 = llvm.add %9661, %9663 : !llvm.i64 + %9665 = llvm.getelementptr %9657[%9664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9666 = llvm.load %9665 : !llvm.ptr + %9667 = llvm.fmul %9656, %9666 {RelaxedPrecision} : !llvm.float + %9668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9671 = llvm.mul %26, %9670 : !llvm.i64 + %9672 = llvm.add %9669, %9671 : !llvm.i64 + %9673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9674 = llvm.mul %6766, %9673 : !llvm.i64 + %9675 = llvm.add %9672, %9674 : !llvm.i64 + %9676 = llvm.getelementptr %9668[%9675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9677 = llvm.load %9676 : !llvm.ptr + %9678 = llvm.fadd %9677, %9667 {RelaxedPrecision} : !llvm.float + %9679 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9680 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9681 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9682 = llvm.mul %26, %9681 : !llvm.i64 + %9683 = llvm.add %9680, %9682 : !llvm.i64 + %9684 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9685 = llvm.mul 
%6766, %9684 : !llvm.i64 + %9686 = llvm.add %9683, %9685 : !llvm.i64 + %9687 = llvm.getelementptr %9679[%9686] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9678, %9687 : !llvm.ptr + %9688 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9691 = llvm.mul %26, %9690 : !llvm.i64 + %9692 = llvm.add %9689, %9691 : !llvm.i64 + %9693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9694 = llvm.mul %6766, %9693 : !llvm.i64 + %9695 = llvm.add %9692, %9694 : !llvm.i64 + %9696 = llvm.getelementptr %9688[%9695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9697 = llvm.load %9696 : !llvm.ptr + %9698 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9700 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9701 = llvm.mul %26, %9700 : !llvm.i64 + %9702 = llvm.add %9699, %9701 : !llvm.i64 + %9703 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9704 = llvm.mul %6766, %9703 : !llvm.i64 + %9705 = llvm.add %9702, %9704 : !llvm.i64 + %9706 = llvm.getelementptr %9698[%9705] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9697, %9706 : !llvm.ptr + %9707 = llvm.add %5848, %33 : !llvm.i64 + llvm.br ^bb19(%9707 : !llvm.i64) + ^bb21: // pred: ^bb19 + %9708 = llvm.add %5846, %36 : !llvm.i64 + llvm.br ^bb17(%9708 : !llvm.i64) + ^bb22: // pred: ^bb17 + %9709 = llvm.add %5844, %30 : !llvm.i64 + llvm.br ^bb15(%9709 : !llvm.i64) + ^bb23: // pred: ^bb15 + %9710 = llvm.add %48, %29 : !llvm.i64 + llvm.br ^bb1(%9710 : !llvm.i64) + ^bb24: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, 
ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/optimized_matmul/mlir/20_SCFToStandard.mlir b/Tutorials/optimized_matmul/mlir/20_SCFToStandard.mlir new file mode 100644 index 00000000..a3a1edd2 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/20_SCFToStandard.mlir @@ -0,0 +1,2197 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + br ^bb1(%c0 : index) + ^bb1(%4: index): // 2 preds: ^bb0, ^bb53 + %5 = cmpi "slt", %4, %c512 : index + cond_br %5, ^bb2, ^bb54 + ^bb2: // pred: ^bb1 + br ^bb3(%c0 : index) + ^bb3(%6: index): // 2 preds: ^bb2, ^bb10 + %7 = cmpi "slt", %6, %c128 : index + cond_br %7, ^bb4, ^bb11 + ^bb4: // pred: ^bb3 + br ^bb5(%c0 : index) + ^bb5(%8: index): // 2 preds: ^bb4, ^bb9 + %9 = cmpi "slt", %8, %c256 : index + cond_br %9, ^bb6, ^bb10 + ^bb6: 
// pred: ^bb5 + cond_br %true, ^bb7, ^bb8 + ^bb7: // pred: ^bb6 + %10 = addi %4, %8 : index + %11 = vector.transfer_read %arg1[%6, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %12 = addi %10, %c8 : index + %13 = vector.transfer_read %arg1[%6, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %14 = addi %10, %c16 : index + %15 = vector.transfer_read %arg1[%6, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %16 = addi %10, %c24 : index + %17 = vector.transfer_read %arg1[%6, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %18 = addi %10, %c32 : index + %19 = vector.transfer_read %arg1[%6, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %20 = addi %10, %c40 : index + %21 = vector.transfer_read %arg1[%6, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %22 = addi %10, %c48 : index + %23 = vector.transfer_read %arg1[%6, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %24 = addi %10, %c56 : index + %25 = vector.transfer_read %arg1[%6, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %26 = addi %10, %c64 : index + %27 = vector.transfer_read %arg1[%6, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %28 = addi %10, %c72 : index + %29 = vector.transfer_read %arg1[%6, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %30 = addi %10, %c80 : index + %31 = vector.transfer_read %arg1[%6, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %32 = addi %10, %c88 : index + %33 = vector.transfer_read %arg1[%6, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %34 = addi %10, %c96 : index + %35 = vector.transfer_read %arg1[%6, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %36 = addi %10, %c104 : index + %37 = vector.transfer_read %arg1[%6, %36], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %37, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %38 = addi %10, %c112 : index + %39 = vector.transfer_read %arg1[%6, %38], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %39, %0[%c0, %c14] : 
memref<1x16xvector<8xf32>> + %40 = addi %10, %c120 : index + %41 = vector.transfer_read %arg1[%6, %40], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %41, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %42 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %43 = cmpi "slt", %8, %c0 : index + %44 = subi %c-1, %8 : index + %45 = select %43, %44, %8 : index + %46 = divi_signed %45, %c16 : index + %47 = subi %c-1, %46 : index + %48 = select %43, %47, %46 : index + %49 = remi_signed %48, %c16 : index + %50 = cmpi "slt", %49, %c0 : index + %51 = addi %49, %c16 : index + %52 = select %50, %51, %49 : index + %53 = remi_signed %6, %c128 : index + %54 = cmpi "slt", %53, %c0 : index + %55 = addi %53, %c128 : index + %56 = select %54, %55, %53 : index + %57 = remi_signed %8, %c16 : index + %58 = cmpi "slt", %57, %c0 : index + %59 = addi %57, %c16 : index + %60 = select %58, %59, %57 : index + %61 = cmpi "slt", %60, %c0 : index + %62 = subi %c-1, %60 : index + %63 = select %61, %62, %60 : index + %64 = divi_signed %63, %c8 : index + %65 = subi %c-1, %64 : index + %66 = select %61, %65, %64 : index + %67 = remi_signed %66, %c2 : index + %68 = cmpi "slt", %67, %c0 : index + %69 = addi %67, %c2 : index + %70 = select %68, %69, %67 : index + store %42, %3[%52, %56, %70] : memref<16x128x2xvector<8xf32>> + %71 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %72 = addi %8, %c8 : index + %73 = cmpi "slt", %72, %c0 : index + %74 = subi %c-1, %72 : index + %75 = select %73, %74, %72 : index + %76 = divi_signed %75, %c16 : index + %77 = subi %c-1, %76 : index + %78 = select %73, %77, %76 : index + %79 = remi_signed %78, %c16 : index + %80 = cmpi "slt", %79, %c0 : index + %81 = addi %79, %c16 : index + %82 = select %80, %81, %79 : index + %83 = divi_signed %45, %c8 : index + %84 = subi %c-1, %83 : index + %85 = select %43, %84, %83 : index + %86 = muli %78, %c-2 : index + %87 = addi %85, %86 : index + %88 = addi %87, %c1 : index + %89 = cmpi "slt", %88, %c0 : index + %90 = subi %c-1, %88 : index + %91 = select %89, %90, %88 : index + %92 = divi_signed %91, %c2 : index + %93 = subi %c-1, %92 : index + %94 = select %89, %93, %92 : index + %95 = muli %94, %c-2 : index + %96 = addi %87, %95 : index + %97 = addi %96, %c1 : index + store %71, %3[%82, %56, %97] : memref<16x128x2xvector<8xf32>> + %98 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %99 = addi %48, %c1 : index + %100 = cmpi "slt", %99, %c0 : index + %101 = subi %c-1, %99 : index + %102 = select %100, %101, %99 : index + %103 = divi_signed %102, %c16 : index + %104 = subi %c-1, %103 : index + %105 = select %100, %104, %103 : index + %106 = muli %105, %c-16 : index + %107 = addi %48, %106 : index + %108 = addi %107, %c1 : index + store %98, %3[%108, %56, %70] : memref<16x128x2xvector<8xf32>> + %109 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %110 = addi %8, %c24 : index + %111 = cmpi "slt", %110, %c0 : index + %112 = subi %c-1, %110 : index + %113 = select %111, %112, %110 : index + %114 = divi_signed %113, %c16 : index + %115 = subi %c-1, %114 : index + %116 = select %111, %115, %114 : index + %117 = remi_signed %116, %c16 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = addi %117, %c16 : index + %120 = select %118, %119, %117 : index + %121 = muli %116, %c-2 : index + %122 = addi %85, %121 : index + %123 = addi %122, %c3 : index + %124 = cmpi "slt", %123, %c0 : index + %125 = subi %c-1, %123 : index + %126 = select %124, %125, %123 : index + %127 = divi_signed %126, %c2 : 
index + %128 = subi %c-1, %127 : index + %129 = select %124, %128, %127 : index + %130 = muli %129, %c-2 : index + %131 = addi %122, %130 : index + %132 = addi %131, %c3 : index + store %109, %3[%120, %56, %132] : memref<16x128x2xvector<8xf32>> + %133 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %134 = addi %48, %c2 : index + %135 = cmpi "slt", %134, %c0 : index + %136 = subi %c-1, %134 : index + %137 = select %135, %136, %134 : index + %138 = divi_signed %137, %c16 : index + %139 = subi %c-1, %138 : index + %140 = select %135, %139, %138 : index + %141 = muli %140, %c-16 : index + %142 = addi %48, %141 : index + %143 = addi %142, %c2 : index + store %133, %3[%143, %56, %70] : memref<16x128x2xvector<8xf32>> + %144 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %145 = addi %8, %c40 : index + %146 = cmpi "slt", %145, %c0 : index + %147 = subi %c-1, %145 : index + %148 = select %146, %147, %145 : index + %149 = divi_signed %148, %c16 : index + %150 = subi %c-1, %149 : index + %151 = select %146, %150, %149 : index + %152 = remi_signed %151, %c16 : index + %153 = cmpi "slt", %152, %c0 : index + %154 = addi %152, %c16 : index + %155 = select %153, %154, %152 : index + %156 = muli %151, %c-2 : index + %157 = addi %85, %156 : index + %158 = addi %157, %c5 : index + %159 = cmpi "slt", %158, %c0 : index + %160 = subi %c-1, %158 : index + %161 = select %159, %160, %158 : index + %162 = divi_signed %161, %c2 : index + %163 = subi %c-1, %162 : index + %164 = select %159, %163, %162 : index + %165 = muli %164, %c-2 : index + %166 = addi %157, %165 : index + %167 = addi %166, %c5 : index + store %144, %3[%155, %56, %167] : memref<16x128x2xvector<8xf32>> + %168 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %169 = addi %48, %c3 : index + %170 = cmpi "slt", %169, %c0 : index + %171 = subi %c-1, %169 : index + %172 = select %170, %171, %169 : index + %173 = divi_signed %172, %c16 : index + %174 = subi %c-1, %173 : index + %175 = select %170, %174, %173 : index + %176 = muli %175, %c-16 : index + %177 = addi %48, %176 : index + %178 = addi %177, %c3 : index + store %168, %3[%178, %56, %70] : memref<16x128x2xvector<8xf32>> + %179 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %180 = addi %8, %c56 : index + %181 = cmpi "slt", %180, %c0 : index + %182 = subi %c-1, %180 : index + %183 = select %181, %182, %180 : index + %184 = divi_signed %183, %c16 : index + %185 = subi %c-1, %184 : index + %186 = select %181, %185, %184 : index + %187 = remi_signed %186, %c16 : index + %188 = cmpi "slt", %187, %c0 : index + %189 = addi %187, %c16 : index + %190 = select %188, %189, %187 : index + %191 = muli %186, %c-2 : index + %192 = addi %85, %191 : index + %193 = addi %192, %c7 : index + %194 = cmpi "slt", %193, %c0 : index + %195 = subi %c-1, %193 : index + %196 = select %194, %195, %193 : index + %197 = divi_signed %196, %c2 : index + %198 = subi %c-1, %197 : index + %199 = select %194, %198, %197 : index + %200 = muli %199, %c-2 : index + %201 = addi %192, %200 : index + %202 = addi %201, %c7 : index + store %179, %3[%190, %56, %202] : memref<16x128x2xvector<8xf32>> + %203 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %204 = addi %48, %c4 : index + %205 = cmpi "slt", %204, %c0 : index + %206 = subi %c-1, %204 : index + %207 = select %205, %206, %204 : index + %208 = divi_signed %207, %c16 : index + %209 = subi %c-1, %208 : index + %210 = select %205, %209, %208 : index + %211 = muli %210, %c-16 : index + %212 = addi %48, %211 : index + %213 = addi %212, %c4 : index + store %203, %3[%213, %56, 
%70] : memref<16x128x2xvector<8xf32>> + %214 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %215 = addi %8, %c72 : index + %216 = cmpi "slt", %215, %c0 : index + %217 = subi %c-1, %215 : index + %218 = select %216, %217, %215 : index + %219 = divi_signed %218, %c16 : index + %220 = subi %c-1, %219 : index + %221 = select %216, %220, %219 : index + %222 = remi_signed %221, %c16 : index + %223 = cmpi "slt", %222, %c0 : index + %224 = addi %222, %c16 : index + %225 = select %223, %224, %222 : index + %226 = muli %221, %c-2 : index + %227 = addi %85, %226 : index + %228 = addi %227, %c9 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = subi %c-1, %228 : index + %231 = select %229, %230, %228 : index + %232 = divi_signed %231, %c2 : index + %233 = subi %c-1, %232 : index + %234 = select %229, %233, %232 : index + %235 = muli %234, %c-2 : index + %236 = addi %227, %235 : index + %237 = addi %236, %c9 : index + store %214, %3[%225, %56, %237] : memref<16x128x2xvector<8xf32>> + %238 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %239 = addi %48, %c5 : index + %240 = cmpi "slt", %239, %c0 : index + %241 = subi %c-1, %239 : index + %242 = select %240, %241, %239 : index + %243 = divi_signed %242, %c16 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = muli %245, %c-16 : index + %247 = addi %48, %246 : index + %248 = addi %247, %c5 : index + store %238, %3[%248, %56, %70] : memref<16x128x2xvector<8xf32>> + %249 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %250 = addi %8, %c88 : index + %251 = cmpi "slt", %250, %c0 : index + %252 = subi %c-1, %250 : index + %253 = select %251, %252, %250 : index + %254 = divi_signed %253, %c16 : index + %255 = subi %c-1, %254 : index + %256 = select %251, %255, %254 : index + %257 = remi_signed %256, %c16 : index + %258 = cmpi "slt", %257, %c0 : index + %259 = addi %257, %c16 : index + %260 = select %258, %259, %257 : index + %261 = muli %256, %c-2 : index + %262 = addi %85, %261 : index + %263 = addi %262, %c11 : index + %264 = cmpi "slt", %263, %c0 : index + %265 = subi %c-1, %263 : index + %266 = select %264, %265, %263 : index + %267 = divi_signed %266, %c2 : index + %268 = subi %c-1, %267 : index + %269 = select %264, %268, %267 : index + %270 = muli %269, %c-2 : index + %271 = addi %262, %270 : index + %272 = addi %271, %c11 : index + store %249, %3[%260, %56, %272] : memref<16x128x2xvector<8xf32>> + %273 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %274 = addi %48, %c6 : index + %275 = cmpi "slt", %274, %c0 : index + %276 = subi %c-1, %274 : index + %277 = select %275, %276, %274 : index + %278 = divi_signed %277, %c16 : index + %279 = subi %c-1, %278 : index + %280 = select %275, %279, %278 : index + %281 = muli %280, %c-16 : index + %282 = addi %48, %281 : index + %283 = addi %282, %c6 : index + store %273, %3[%283, %56, %70] : memref<16x128x2xvector<8xf32>> + %284 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %285 = addi %8, %c104 : index + %286 = cmpi "slt", %285, %c0 : index + %287 = subi %c-1, %285 : index + %288 = select %286, %287, %285 : index + %289 = divi_signed %288, %c16 : index + %290 = subi %c-1, %289 : index + %291 = select %286, %290, %289 : index + %292 = remi_signed %291, %c16 : index + %293 = cmpi "slt", %292, %c0 : index + %294 = addi %292, %c16 : index + %295 = select %293, %294, %292 : index + %296 = muli %291, %c-2 : index + %297 = addi %85, %296 : index + %298 = addi %297, %c13 : index + %299 = cmpi "slt", %298, %c0 : index + %300 = subi %c-1, %298 : index + %301 
= select %299, %300, %298 : index + %302 = divi_signed %301, %c2 : index + %303 = subi %c-1, %302 : index + %304 = select %299, %303, %302 : index + %305 = muli %304, %c-2 : index + %306 = addi %297, %305 : index + %307 = addi %306, %c13 : index + store %284, %3[%295, %56, %307] : memref<16x128x2xvector<8xf32>> + %308 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %309 = addi %48, %c7 : index + %310 = cmpi "slt", %309, %c0 : index + %311 = subi %c-1, %309 : index + %312 = select %310, %311, %309 : index + %313 = divi_signed %312, %c16 : index + %314 = subi %c-1, %313 : index + %315 = select %310, %314, %313 : index + %316 = muli %315, %c-16 : index + %317 = addi %48, %316 : index + %318 = addi %317, %c7 : index + store %308, %3[%318, %56, %70] : memref<16x128x2xvector<8xf32>> + %319 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %320 = addi %8, %c120 : index + %321 = cmpi "slt", %320, %c0 : index + %322 = subi %c-1, %320 : index + %323 = select %321, %322, %320 : index + %324 = divi_signed %323, %c16 : index + %325 = subi %c-1, %324 : index + %326 = select %321, %325, %324 : index + %327 = remi_signed %326, %c16 : index + %328 = cmpi "slt", %327, %c0 : index + %329 = addi %327, %c16 : index + %330 = select %328, %329, %327 : index + %331 = muli %326, %c-2 : index + %332 = addi %85, %331 : index + %333 = addi %332, %c15 : index + %334 = cmpi "slt", %333, %c0 : index + %335 = subi %c-1, %333 : index + %336 = select %334, %335, %333 : index + %337 = divi_signed %336, %c2 : index + %338 = subi %c-1, %337 : index + %339 = select %334, %338, %337 : index + %340 = muli %339, %c-2 : index + %341 = addi %332, %340 : index + %342 = addi %341, %c15 : index + store %319, %3[%330, %56, %342] : memref<16x128x2xvector<8xf32>> + br ^bb9 + ^bb8: // pred: ^bb6 + %343 = addi %4, %8 : index + %344 = vector.transfer_read %arg1[%6, %343], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %344, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %345 = addi %343, %c8 : index + %346 = vector.transfer_read %arg1[%6, %345], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %346, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %347 = addi %343, %c16 : index + %348 = vector.transfer_read %arg1[%6, %347], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %348, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %349 = addi %343, %c24 : index + %350 = vector.transfer_read %arg1[%6, %349], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %350, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %351 = addi %343, %c32 : index + %352 = vector.transfer_read %arg1[%6, %351], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %352, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %353 = addi %343, %c40 : index + %354 = vector.transfer_read %arg1[%6, %353], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %354, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %355 = addi %343, %c48 : index + %356 = vector.transfer_read %arg1[%6, %355], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %356, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %357 = addi %343, %c56 : index + %358 = vector.transfer_read %arg1[%6, %357], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %358, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %359 = addi %343, %c64 
: index + %360 = vector.transfer_read %arg1[%6, %359], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %360, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %361 = addi %343, %c72 : index + %362 = vector.transfer_read %arg1[%6, %361], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %362, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %363 = addi %343, %c80 : index + %364 = vector.transfer_read %arg1[%6, %363], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %364, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %365 = addi %343, %c88 : index + %366 = vector.transfer_read %arg1[%6, %365], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %366, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %367 = addi %343, %c96 : index + %368 = vector.transfer_read %arg1[%6, %367], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %368, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %369 = addi %343, %c104 : index + %370 = vector.transfer_read %arg1[%6, %369], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %370, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %371 = addi %343, %c112 : index + %372 = vector.transfer_read %arg1[%6, %371], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %372, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %373 = addi %343, %c120 : index + %374 = vector.transfer_read %arg1[%6, %373], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %374, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %375 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %376 = cmpi "slt", %8, %c0 : index + %377 = subi %c-1, %8 : index + %378 = select %376, %377, %8 : index + %379 = divi_signed %378, %c16 : index + %380 = subi %c-1, %379 : index + %381 = select %376, %380, %379 : index + %382 = remi_signed %381, %c16 : index + %383 = cmpi "slt", %382, %c0 : index + %384 = addi %382, %c16 : index + %385 = select %383, %384, %382 : index + %386 = remi_signed %6, %c128 : index + %387 = cmpi "slt", %386, %c0 : index + %388 = addi %386, %c128 : index + %389 = select %387, %388, %386 : index + %390 = remi_signed %8, %c16 : index + %391 = cmpi "slt", %390, %c0 : index + %392 = addi %390, %c16 : index + %393 = select %391, %392, %390 : index + %394 = cmpi "slt", %393, %c0 : index + %395 = subi %c-1, %393 : index + %396 = select %394, %395, %393 : index + %397 = divi_signed %396, %c8 : index + %398 = subi %c-1, %397 : index + %399 = select %394, %398, %397 : index + %400 = remi_signed %399, %c2 : index + %401 = cmpi "slt", %400, %c0 : index + %402 = addi %400, %c2 : index + %403 = select %401, %402, %400 : index + store %375, %3[%385, %389, %403] : memref<16x128x2xvector<8xf32>> + %404 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %405 = addi %8, %c8 : index + %406 = cmpi "slt", %405, %c0 : index + %407 = subi %c-1, %405 : index + %408 = select %406, %407, %405 : index + %409 = divi_signed %408, %c16 : index + %410 = subi %c-1, %409 : index + %411 = select %406, %410, %409 : index + %412 = remi_signed %411, %c16 : index + %413 = cmpi "slt", %412, %c0 : index + %414 = addi %412, %c16 : index + %415 = select %413, %414, %412 : index + %416 = divi_signed %378, %c8 : index + %417 = subi %c-1, %416 : index + %418 = select %376, %417, %416 : index + %419 = muli %411, %c-2 : index + %420 = addi %418, %419 : 
index + %421 = addi %420, %c1 : index + %422 = cmpi "slt", %421, %c0 : index + %423 = subi %c-1, %421 : index + %424 = select %422, %423, %421 : index + %425 = divi_signed %424, %c2 : index + %426 = subi %c-1, %425 : index + %427 = select %422, %426, %425 : index + %428 = muli %427, %c-2 : index + %429 = addi %420, %428 : index + %430 = addi %429, %c1 : index + store %404, %3[%415, %389, %430] : memref<16x128x2xvector<8xf32>> + %431 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %432 = addi %381, %c1 : index + %433 = cmpi "slt", %432, %c0 : index + %434 = subi %c-1, %432 : index + %435 = select %433, %434, %432 : index + %436 = divi_signed %435, %c16 : index + %437 = subi %c-1, %436 : index + %438 = select %433, %437, %436 : index + %439 = muli %438, %c-16 : index + %440 = addi %381, %439 : index + %441 = addi %440, %c1 : index + store %431, %3[%441, %389, %403] : memref<16x128x2xvector<8xf32>> + %442 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %443 = addi %8, %c24 : index + %444 = cmpi "slt", %443, %c0 : index + %445 = subi %c-1, %443 : index + %446 = select %444, %445, %443 : index + %447 = divi_signed %446, %c16 : index + %448 = subi %c-1, %447 : index + %449 = select %444, %448, %447 : index + %450 = remi_signed %449, %c16 : index + %451 = cmpi "slt", %450, %c0 : index + %452 = addi %450, %c16 : index + %453 = select %451, %452, %450 : index + %454 = muli %449, %c-2 : index + %455 = addi %418, %454 : index + %456 = addi %455, %c3 : index + %457 = cmpi "slt", %456, %c0 : index + %458 = subi %c-1, %456 : index + %459 = select %457, %458, %456 : index + %460 = divi_signed %459, %c2 : index + %461 = subi %c-1, %460 : index + %462 = select %457, %461, %460 : index + %463 = muli %462, %c-2 : index + %464 = addi %455, %463 : index + %465 = addi %464, %c3 : index + store %442, %3[%453, %389, %465] : memref<16x128x2xvector<8xf32>> + %466 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %467 = addi %381, %c2 : index + %468 = cmpi "slt", %467, %c0 : index + %469 = subi %c-1, %467 : index + %470 = select %468, %469, %467 : index + %471 = divi_signed %470, %c16 : index + %472 = subi %c-1, %471 : index + %473 = select %468, %472, %471 : index + %474 = muli %473, %c-16 : index + %475 = addi %381, %474 : index + %476 = addi %475, %c2 : index + store %466, %3[%476, %389, %403] : memref<16x128x2xvector<8xf32>> + %477 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %478 = addi %8, %c40 : index + %479 = cmpi "slt", %478, %c0 : index + %480 = subi %c-1, %478 : index + %481 = select %479, %480, %478 : index + %482 = divi_signed %481, %c16 : index + %483 = subi %c-1, %482 : index + %484 = select %479, %483, %482 : index + %485 = remi_signed %484, %c16 : index + %486 = cmpi "slt", %485, %c0 : index + %487 = addi %485, %c16 : index + %488 = select %486, %487, %485 : index + %489 = muli %484, %c-2 : index + %490 = addi %418, %489 : index + %491 = addi %490, %c5 : index + %492 = cmpi "slt", %491, %c0 : index + %493 = subi %c-1, %491 : index + %494 = select %492, %493, %491 : index + %495 = divi_signed %494, %c2 : index + %496 = subi %c-1, %495 : index + %497 = select %492, %496, %495 : index + %498 = muli %497, %c-2 : index + %499 = addi %490, %498 : index + %500 = addi %499, %c5 : index + store %477, %3[%488, %389, %500] : memref<16x128x2xvector<8xf32>> + %501 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %502 = addi %381, %c3 : index + %503 = cmpi "slt", %502, %c0 : index + %504 = subi %c-1, %502 : index + %505 = select %503, %504, %502 : index + %506 = divi_signed %505, %c16 : index + 
%507 = subi %c-1, %506 : index + %508 = select %503, %507, %506 : index + %509 = muli %508, %c-16 : index + %510 = addi %381, %509 : index + %511 = addi %510, %c3 : index + store %501, %3[%511, %389, %403] : memref<16x128x2xvector<8xf32>> + %512 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %513 = addi %8, %c56 : index + %514 = cmpi "slt", %513, %c0 : index + %515 = subi %c-1, %513 : index + %516 = select %514, %515, %513 : index + %517 = divi_signed %516, %c16 : index + %518 = subi %c-1, %517 : index + %519 = select %514, %518, %517 : index + %520 = remi_signed %519, %c16 : index + %521 = cmpi "slt", %520, %c0 : index + %522 = addi %520, %c16 : index + %523 = select %521, %522, %520 : index + %524 = muli %519, %c-2 : index + %525 = addi %418, %524 : index + %526 = addi %525, %c7 : index + %527 = cmpi "slt", %526, %c0 : index + %528 = subi %c-1, %526 : index + %529 = select %527, %528, %526 : index + %530 = divi_signed %529, %c2 : index + %531 = subi %c-1, %530 : index + %532 = select %527, %531, %530 : index + %533 = muli %532, %c-2 : index + %534 = addi %525, %533 : index + %535 = addi %534, %c7 : index + store %512, %3[%523, %389, %535] : memref<16x128x2xvector<8xf32>> + %536 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %537 = addi %381, %c4 : index + %538 = cmpi "slt", %537, %c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-16 : index + %545 = addi %381, %544 : index + %546 = addi %545, %c4 : index + store %536, %3[%546, %389, %403] : memref<16x128x2xvector<8xf32>> + %547 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %548 = addi %8, %c72 : index + %549 = cmpi "slt", %548, %c0 : index + %550 = subi %c-1, %548 : index + %551 = select %549, %550, %548 : index + %552 = divi_signed %551, %c16 : index + %553 = subi %c-1, %552 : index + %554 = select %549, %553, %552 : index + %555 = remi_signed %554, %c16 : index + %556 = cmpi "slt", %555, %c0 : index + %557 = addi %555, %c16 : index + %558 = select %556, %557, %555 : index + %559 = muli %554, %c-2 : index + %560 = addi %418, %559 : index + %561 = addi %560, %c9 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = subi %c-1, %561 : index + %564 = select %562, %563, %561 : index + %565 = divi_signed %564, %c2 : index + %566 = subi %c-1, %565 : index + %567 = select %562, %566, %565 : index + %568 = muli %567, %c-2 : index + %569 = addi %560, %568 : index + %570 = addi %569, %c9 : index + store %547, %3[%558, %389, %570] : memref<16x128x2xvector<8xf32>> + %571 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %572 = addi %381, %c5 : index + %573 = cmpi "slt", %572, %c0 : index + %574 = subi %c-1, %572 : index + %575 = select %573, %574, %572 : index + %576 = divi_signed %575, %c16 : index + %577 = subi %c-1, %576 : index + %578 = select %573, %577, %576 : index + %579 = muli %578, %c-16 : index + %580 = addi %381, %579 : index + %581 = addi %580, %c5 : index + store %571, %3[%581, %389, %403] : memref<16x128x2xvector<8xf32>> + %582 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %583 = addi %8, %c88 : index + %584 = cmpi "slt", %583, %c0 : index + %585 = subi %c-1, %583 : index + %586 = select %584, %585, %583 : index + %587 = divi_signed %586, %c16 : index + %588 = subi %c-1, %587 : index + %589 = select %584, %588, %587 : index + %590 = remi_signed %589, %c16 : index + %591 = cmpi "slt", %590, %c0 : index + %592 = addi %590, %c16 : index + %593 = 
select %591, %592, %590 : index + %594 = muli %589, %c-2 : index + %595 = addi %418, %594 : index + %596 = addi %595, %c11 : index + %597 = cmpi "slt", %596, %c0 : index + %598 = subi %c-1, %596 : index + %599 = select %597, %598, %596 : index + %600 = divi_signed %599, %c2 : index + %601 = subi %c-1, %600 : index + %602 = select %597, %601, %600 : index + %603 = muli %602, %c-2 : index + %604 = addi %595, %603 : index + %605 = addi %604, %c11 : index + store %582, %3[%593, %389, %605] : memref<16x128x2xvector<8xf32>> + %606 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %607 = addi %381, %c6 : index + %608 = cmpi "slt", %607, %c0 : index + %609 = subi %c-1, %607 : index + %610 = select %608, %609, %607 : index + %611 = divi_signed %610, %c16 : index + %612 = subi %c-1, %611 : index + %613 = select %608, %612, %611 : index + %614 = muli %613, %c-16 : index + %615 = addi %381, %614 : index + %616 = addi %615, %c6 : index + store %606, %3[%616, %389, %403] : memref<16x128x2xvector<8xf32>> + %617 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %618 = addi %8, %c104 : index + %619 = cmpi "slt", %618, %c0 : index + %620 = subi %c-1, %618 : index + %621 = select %619, %620, %618 : index + %622 = divi_signed %621, %c16 : index + %623 = subi %c-1, %622 : index + %624 = select %619, %623, %622 : index + %625 = remi_signed %624, %c16 : index + %626 = cmpi "slt", %625, %c0 : index + %627 = addi %625, %c16 : index + %628 = select %626, %627, %625 : index + %629 = muli %624, %c-2 : index + %630 = addi %418, %629 : index + %631 = addi %630, %c13 : index + %632 = cmpi "slt", %631, %c0 : index + %633 = subi %c-1, %631 : index + %634 = select %632, %633, %631 : index + %635 = divi_signed %634, %c2 : index + %636 = subi %c-1, %635 : index + %637 = select %632, %636, %635 : index + %638 = muli %637, %c-2 : index + %639 = addi %630, %638 : index + %640 = addi %639, %c13 : index + store %617, %3[%628, %389, %640] : memref<16x128x2xvector<8xf32>> + %641 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %642 = addi %381, %c7 : index + %643 = cmpi "slt", %642, %c0 : index + %644 = subi %c-1, %642 : index + %645 = select %643, %644, %642 : index + %646 = divi_signed %645, %c16 : index + %647 = subi %c-1, %646 : index + %648 = select %643, %647, %646 : index + %649 = muli %648, %c-16 : index + %650 = addi %381, %649 : index + %651 = addi %650, %c7 : index + store %641, %3[%651, %389, %403] : memref<16x128x2xvector<8xf32>> + %652 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %653 = addi %8, %c120 : index + %654 = cmpi "slt", %653, %c0 : index + %655 = subi %c-1, %653 : index + %656 = select %654, %655, %653 : index + %657 = divi_signed %656, %c16 : index + %658 = subi %c-1, %657 : index + %659 = select %654, %658, %657 : index + %660 = remi_signed %659, %c16 : index + %661 = cmpi "slt", %660, %c0 : index + %662 = addi %660, %c16 : index + %663 = select %661, %662, %660 : index + %664 = muli %659, %c-2 : index + %665 = addi %418, %664 : index + %666 = addi %665, %c15 : index + %667 = cmpi "slt", %666, %c0 : index + %668 = subi %c-1, %666 : index + %669 = select %667, %668, %666 : index + %670 = divi_signed %669, %c2 : index + %671 = subi %c-1, %670 : index + %672 = select %667, %671, %670 : index + %673 = muli %672, %c-2 : index + %674 = addi %665, %673 : index + %675 = addi %674, %c15 : index + store %652, %3[%663, %389, %675] : memref<16x128x2xvector<8xf32>> + br ^bb9 + ^bb9: // 2 preds: ^bb7, ^bb8 + %676 = addi %8, %c128 : index + br ^bb5(%676 : index) + ^bb10: // pred: ^bb5 + %677 = addi %6, %c1 
: index + br ^bb3(%677 : index) + ^bb11: // pred: ^bb3 + br ^bb12(%c0 : index) + ^bb12(%678: index): // 2 preds: ^bb11, ^bb52 + %679 = cmpi "slt", %678, %c784 : index + cond_br %679, ^bb13, ^bb53 + ^bb13: // pred: ^bb12 + br ^bb14(%c0 : index) + ^bb14(%680: index): // 2 preds: ^bb13, ^bb21 + %681 = cmpi "slt", %680, %c16 : index + cond_br %681, ^bb15, ^bb22 + ^bb15: // pred: ^bb14 + br ^bb16(%c0 : index) + ^bb16(%682: index): // 2 preds: ^bb15, ^bb20 + %683 = cmpi "slt", %682, %c6 : index + cond_br %683, ^bb17, ^bb21 + ^bb17: // pred: ^bb16 + br ^bb18(%c0 : index) + ^bb18(%684: index): // 2 preds: ^bb17, ^bb19 + %685 = cmpi "slt", %684, %c2 : index + cond_br %685, ^bb19, ^bb20 + ^bb19: // pred: ^bb18 + store %cst_0, %2[%680, %682, %684] : memref<16x6x2xvector<8xf32>> + %686 = addi %684, %c1 : index + br ^bb18(%686 : index) + ^bb20: // pred: ^bb18 + %687 = addi %682, %c1 : index + br ^bb16(%687 : index) + ^bb21: // pred: ^bb16 + %688 = addi %680, %c1 : index + br ^bb14(%688 : index) + ^bb22: // pred: ^bb14 + br ^bb23(%c0 : index) + ^bb23(%689: index): // 2 preds: ^bb22, ^bb39 + %690 = cmpi "slt", %689, %c256 : index + cond_br %690, ^bb24, ^bb40 + ^bb24: // pred: ^bb23 + br ^bb25(%c0 : index) + ^bb25(%691: index): // 2 preds: ^bb24, ^bb38 + %692 = cmpi "slt", %691, %c128 : index + cond_br %692, ^bb26, ^bb39 + ^bb26: // pred: ^bb25 + br ^bb27(%c0 : index) + ^bb27(%693: index): // 2 preds: ^bb26, ^bb34 + %694 = cmpi "slt", %693, %c0 : index + cond_br %694, ^bb28, ^bb35 + ^bb28: // pred: ^bb27 + br ^bb29(%c0 : index) + ^bb29(%695: index): // 2 preds: ^bb28, ^bb33 + %696 = cmpi "slt", %695, %c4 : index + cond_br %696, ^bb30, ^bb34 + ^bb30: // pred: ^bb29 + br ^bb31(%c0 : index) + ^bb31(%697: index): // 2 preds: ^bb30, ^bb32 + %698 = cmpi "slt", %697, %c0 : index + cond_br %698, ^bb32, ^bb33 + ^bb32: // pred: ^bb31 + %699 = addi %678, %693 : index + %700 = addi %699, %697 : index + %701 = addi %691, %695 : index + %702 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %703 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %704 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %705 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %706 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %707 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %708 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %709 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %710 = cmpi "slt", %689, %c0 : index + %711 = subi %c-1, %689 : index + %712 = select %710, %711, %689 : index + %713 = divi_signed %712, %c16 : index + %714 = subi %c-1, %713 : index + %715 = select %710, %714, %713 : index + %716 = remi_signed %715, %c16 : index + %717 = cmpi "slt", %716, %c0 : index + %718 = addi %716, %c16 : index + %719 = select %717, %718, %716 : index + %720 = remi_signed %701, %c128 : index + %721 = cmpi "slt", %720, %c0 : index + %722 = addi %720, %c128 : index + %723 = select %721, %722, %720 : index + %724 = remi_signed %689, %c16 : index + %725 = cmpi "slt", %724, %c0 : index + %726 = addi %724, %c16 : index + %727 = select %725, %726, %724 : index + %728 = cmpi "slt", %727, %c0 : index + %729 = subi %c-1, %727 : index + %730 = select %728, %729, %727 : index + %731 = divi_signed %730, %c8 : index + %732 = subi 
%c-1, %731 : index + %733 = select %728, %732, %731 : index + %734 = remi_signed %733, %c2 : index + %735 = cmpi "slt", %734, %c0 : index + %736 = addi %734, %c2 : index + %737 = select %735, %736, %734 : index + %738 = load %3[%719, %723, %737] : memref<16x128x2xvector<8xf32>> + %739 = vector.extractelement %738[%c0_i64 : i64] : vector<8xf32> + %740 = load %3[%719, %723, %737] : memref<16x128x2xvector<8xf32>> + %741 = vector.extractelement %740[%c1_i64 : i64] : vector<8xf32> + %742 = load %3[%719, %723, %737] : memref<16x128x2xvector<8xf32>> + %743 = vector.extractelement %742[%c2_i64 : i64] : vector<8xf32> + %744 = load %3[%719, %723, %737] : memref<16x128x2xvector<8xf32>> + %745 = vector.extractelement %744[%c3_i64 : i64] : vector<8xf32> + %746 = load %3[%719, %723, %737] : memref<16x128x2xvector<8xf32>> + %747 = vector.extractelement %746[%c4_i64 : i64] : vector<8xf32> + %748 = load %3[%719, %723, %737] : memref<16x128x2xvector<8xf32>> + %749 = vector.extractelement %748[%c5_i64 : i64] : vector<8xf32> + %750 = load %3[%719, %723, %737] : memref<16x128x2xvector<8xf32>> + %751 = vector.extractelement %750[%c6_i64 : i64] : vector<8xf32> + %752 = load %3[%719, %723, %737] : memref<16x128x2xvector<8xf32>> + %753 = vector.extractelement %752[%c7_i64 : i64] : vector<8xf32> + %754 = mulf %702, %739 {RelaxedPrecision} : f32 + %755 = mulf %703, %741 {RelaxedPrecision} : f32 + %756 = mulf %704, %743 {RelaxedPrecision} : f32 + %757 = mulf %705, %745 {RelaxedPrecision} : f32 + %758 = mulf %706, %747 {RelaxedPrecision} : f32 + %759 = mulf %707, %749 {RelaxedPrecision} : f32 + %760 = mulf %708, %751 {RelaxedPrecision} : f32 + %761 = mulf %709, %753 {RelaxedPrecision} : f32 + %762 = addi %693, %697 : index + %763 = remi_signed %762, %c6 : index + %764 = cmpi "slt", %763, %c0 : index + %765 = addi %763, %c6 : index + %766 = select %764, %765, %763 : index + %767 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %768 = vector.extractelement %767[%c0_i64 : i64] : vector<8xf32> + %769 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %770 = vector.extractelement %769[%c1_i64 : i64] : vector<8xf32> + %771 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %772 = vector.extractelement %771[%c2_i64 : i64] : vector<8xf32> + %773 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %774 = vector.extractelement %773[%c3_i64 : i64] : vector<8xf32> + %775 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %776 = vector.extractelement %775[%c4_i64 : i64] : vector<8xf32> + %777 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %778 = vector.extractelement %777[%c5_i64 : i64] : vector<8xf32> + %779 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %780 = vector.extractelement %779[%c6_i64 : i64] : vector<8xf32> + %781 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %782 = vector.extractelement %781[%c7_i64 : i64] : vector<8xf32> + %783 = addf %768, %754 {RelaxedPrecision} : f32 + %784 = addf %770, %755 {RelaxedPrecision} : f32 + %785 = addf %772, %756 {RelaxedPrecision} : f32 + %786 = addf %774, %757 {RelaxedPrecision} : f32 + %787 = addf %776, %758 {RelaxedPrecision} : f32 + %788 = addf %778, %759 {RelaxedPrecision} : f32 + %789 = addf %780, %760 {RelaxedPrecision} : f32 + %790 = addf %782, %761 {RelaxedPrecision} : f32 + %791 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %792 = vector.insertelement %783, %791[%c0_i64 : i64] : vector<8xf32> + store %792, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %793 = 
load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %794 = vector.insertelement %784, %793[%c1_i64 : i64] : vector<8xf32> + store %794, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %795 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %796 = vector.insertelement %785, %795[%c2_i64 : i64] : vector<8xf32> + store %796, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %797 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %798 = vector.insertelement %786, %797[%c3_i64 : i64] : vector<8xf32> + store %798, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %799 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %800 = vector.insertelement %787, %799[%c4_i64 : i64] : vector<8xf32> + store %800, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %801 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %802 = vector.insertelement %788, %801[%c5_i64 : i64] : vector<8xf32> + store %802, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %803 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %804 = vector.insertelement %789, %803[%c6_i64 : i64] : vector<8xf32> + store %804, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %805 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %806 = vector.insertelement %790, %805[%c7_i64 : i64] : vector<8xf32> + store %806, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %807 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %808 = vector.insertelement %783, %807[%c0_i64 : i64] : vector<8xf32> + store %808, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %809 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %810 = vector.insertelement %784, %809[%c1_i64 : i64] : vector<8xf32> + store %810, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %811 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %812 = vector.insertelement %785, %811[%c2_i64 : i64] : vector<8xf32> + store %812, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %813 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %814 = vector.insertelement %786, %813[%c3_i64 : i64] : vector<8xf32> + store %814, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %815 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %816 = vector.insertelement %787, %815[%c4_i64 : i64] : vector<8xf32> + store %816, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %817 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %818 = vector.insertelement %788, %817[%c5_i64 : i64] : vector<8xf32> + store %818, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %819 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %820 = vector.insertelement %789, %819[%c6_i64 : i64] : vector<8xf32> + store %820, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %821 = load %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %822 = vector.insertelement %790, %821[%c7_i64 : i64] : vector<8xf32> + store %822, %2[%719, %766, %737] : memref<16x6x2xvector<8xf32>> + %823 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %824 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %825 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %826 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %827 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %828 = load %arg0[%700, 
%701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %829 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %830 = load %arg0[%700, %701] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %831 = addi %689, %c8 : index + %832 = cmpi "slt", %831, %c0 : index + %833 = subi %c-1, %831 : index + %834 = select %832, %833, %831 : index + %835 = divi_signed %834, %c16 : index + %836 = subi %c-1, %835 : index + %837 = select %832, %836, %835 : index + %838 = remi_signed %837, %c16 : index + %839 = cmpi "slt", %838, %c0 : index + %840 = addi %838, %c16 : index + %841 = select %839, %840, %838 : index + %842 = divi_signed %712, %c8 : index + %843 = subi %c-1, %842 : index + %844 = select %710, %843, %842 : index + %845 = muli %837, %c-2 : index + %846 = addi %844, %845 : index + %847 = addi %846, %c1 : index + %848 = cmpi "slt", %847, %c0 : index + %849 = subi %c-1, %847 : index + %850 = select %848, %849, %847 : index + %851 = divi_signed %850, %c2 : index + %852 = subi %c-1, %851 : index + %853 = select %848, %852, %851 : index + %854 = muli %853, %c-2 : index + %855 = addi %846, %854 : index + %856 = addi %855, %c1 : index + %857 = load %3[%841, %723, %856] : memref<16x128x2xvector<8xf32>> + %858 = vector.extractelement %857[%c0_i64 : i64] : vector<8xf32> + %859 = load %3[%841, %723, %856] : memref<16x128x2xvector<8xf32>> + %860 = vector.extractelement %859[%c1_i64 : i64] : vector<8xf32> + %861 = load %3[%841, %723, %856] : memref<16x128x2xvector<8xf32>> + %862 = vector.extractelement %861[%c2_i64 : i64] : vector<8xf32> + %863 = load %3[%841, %723, %856] : memref<16x128x2xvector<8xf32>> + %864 = vector.extractelement %863[%c3_i64 : i64] : vector<8xf32> + %865 = load %3[%841, %723, %856] : memref<16x128x2xvector<8xf32>> + %866 = vector.extractelement %865[%c4_i64 : i64] : vector<8xf32> + %867 = load %3[%841, %723, %856] : memref<16x128x2xvector<8xf32>> + %868 = vector.extractelement %867[%c5_i64 : i64] : vector<8xf32> + %869 = load %3[%841, %723, %856] : memref<16x128x2xvector<8xf32>> + %870 = vector.extractelement %869[%c6_i64 : i64] : vector<8xf32> + %871 = load %3[%841, %723, %856] : memref<16x128x2xvector<8xf32>> + %872 = vector.extractelement %871[%c7_i64 : i64] : vector<8xf32> + %873 = mulf %823, %858 {RelaxedPrecision} : f32 + %874 = mulf %824, %860 {RelaxedPrecision} : f32 + %875 = mulf %825, %862 {RelaxedPrecision} : f32 + %876 = mulf %826, %864 {RelaxedPrecision} : f32 + %877 = mulf %827, %866 {RelaxedPrecision} : f32 + %878 = mulf %828, %868 {RelaxedPrecision} : f32 + %879 = mulf %829, %870 {RelaxedPrecision} : f32 + %880 = mulf %830, %872 {RelaxedPrecision} : f32 + %881 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %882 = vector.extractelement %881[%c0_i64 : i64] : vector<8xf32> + %883 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %884 = vector.extractelement %883[%c1_i64 : i64] : vector<8xf32> + %885 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %886 = vector.extractelement %885[%c2_i64 : i64] : vector<8xf32> + %887 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %888 = vector.extractelement %887[%c3_i64 : i64] : vector<8xf32> + %889 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %890 = vector.extractelement %889[%c4_i64 : i64] : vector<8xf32> + %891 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %892 = vector.extractelement %891[%c5_i64 : i64] : vector<8xf32> + %893 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + 
%894 = vector.extractelement %893[%c6_i64 : i64] : vector<8xf32> + %895 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %896 = vector.extractelement %895[%c7_i64 : i64] : vector<8xf32> + %897 = addf %882, %873 {RelaxedPrecision} : f32 + %898 = addf %884, %874 {RelaxedPrecision} : f32 + %899 = addf %886, %875 {RelaxedPrecision} : f32 + %900 = addf %888, %876 {RelaxedPrecision} : f32 + %901 = addf %890, %877 {RelaxedPrecision} : f32 + %902 = addf %892, %878 {RelaxedPrecision} : f32 + %903 = addf %894, %879 {RelaxedPrecision} : f32 + %904 = addf %896, %880 {RelaxedPrecision} : f32 + %905 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %906 = vector.insertelement %897, %905[%c0_i64 : i64] : vector<8xf32> + store %906, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %907 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %908 = vector.insertelement %898, %907[%c1_i64 : i64] : vector<8xf32> + store %908, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %909 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %910 = vector.insertelement %899, %909[%c2_i64 : i64] : vector<8xf32> + store %910, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %911 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %912 = vector.insertelement %900, %911[%c3_i64 : i64] : vector<8xf32> + store %912, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %913 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %914 = vector.insertelement %901, %913[%c4_i64 : i64] : vector<8xf32> + store %914, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %915 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %916 = vector.insertelement %902, %915[%c5_i64 : i64] : vector<8xf32> + store %916, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %917 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %918 = vector.insertelement %903, %917[%c6_i64 : i64] : vector<8xf32> + store %918, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %919 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %920 = vector.insertelement %904, %919[%c7_i64 : i64] : vector<8xf32> + store %920, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %921 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %922 = vector.insertelement %897, %921[%c0_i64 : i64] : vector<8xf32> + store %922, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %923 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %924 = vector.insertelement %898, %923[%c1_i64 : i64] : vector<8xf32> + store %924, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %925 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %926 = vector.insertelement %899, %925[%c2_i64 : i64] : vector<8xf32> + store %926, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %927 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %928 = vector.insertelement %900, %927[%c3_i64 : i64] : vector<8xf32> + store %928, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %929 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %930 = vector.insertelement %901, %929[%c4_i64 : i64] : vector<8xf32> + store %930, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %931 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %932 = vector.insertelement %902, %931[%c5_i64 : i64] : vector<8xf32> + store %932, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %933 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %934 = vector.insertelement 
%903, %933[%c6_i64 : i64] : vector<8xf32> + store %934, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %935 = load %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %936 = vector.insertelement %904, %935[%c7_i64 : i64] : vector<8xf32> + store %936, %2[%841, %766, %856] : memref<16x6x2xvector<8xf32>> + %937 = addi %697, %c1 : index + br ^bb31(%937 : index) + ^bb33: // pred: ^bb31 + %938 = addi %695, %c1 : index + br ^bb29(%938 : index) + ^bb34: // pred: ^bb29 + %939 = addi %693, %c6 : index + br ^bb27(%939 : index) + ^bb35: // pred: ^bb27 + br ^bb36(%c0 : index) + ^bb36(%940: index): // 2 preds: ^bb35, ^bb37 + %941 = cmpi "slt", %940, %c4 : index + cond_br %941, ^bb37, ^bb38 + ^bb37: // pred: ^bb36 + %942 = addi %691, %940 : index + %943 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %944 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %945 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %946 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %947 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %948 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %949 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %950 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %951 = cmpi "slt", %689, %c0 : index + %952 = subi %c-1, %689 : index + %953 = select %951, %952, %689 : index + %954 = divi_signed %953, %c16 : index + %955 = subi %c-1, %954 : index + %956 = select %951, %955, %954 : index + %957 = remi_signed %956, %c16 : index + %958 = cmpi "slt", %957, %c0 : index + %959 = addi %957, %c16 : index + %960 = select %958, %959, %957 : index + %961 = remi_signed %942, %c128 : index + %962 = cmpi "slt", %961, %c0 : index + %963 = addi %961, %c128 : index + %964 = select %962, %963, %961 : index + %965 = remi_signed %689, %c16 : index + %966 = cmpi "slt", %965, %c0 : index + %967 = addi %965, %c16 : index + %968 = select %966, %967, %965 : index + %969 = cmpi "slt", %968, %c0 : index + %970 = subi %c-1, %968 : index + %971 = select %969, %970, %968 : index + %972 = divi_signed %971, %c8 : index + %973 = subi %c-1, %972 : index + %974 = select %969, %973, %972 : index + %975 = remi_signed %974, %c2 : index + %976 = cmpi "slt", %975, %c0 : index + %977 = addi %975, %c2 : index + %978 = select %976, %977, %975 : index + %979 = load %3[%960, %964, %978] : memref<16x128x2xvector<8xf32>> + %980 = vector.extractelement %979[%c0_i64 : i64] : vector<8xf32> + %981 = load %3[%960, %964, %978] : memref<16x128x2xvector<8xf32>> + %982 = vector.extractelement %981[%c1_i64 : i64] : vector<8xf32> + %983 = load %3[%960, %964, %978] : memref<16x128x2xvector<8xf32>> + %984 = vector.extractelement %983[%c2_i64 : i64] : vector<8xf32> + %985 = load %3[%960, %964, %978] : memref<16x128x2xvector<8xf32>> + %986 = vector.extractelement %985[%c3_i64 : i64] : vector<8xf32> + %987 = load %3[%960, %964, %978] : memref<16x128x2xvector<8xf32>> + %988 = vector.extractelement %987[%c4_i64 : i64] : vector<8xf32> + %989 = load %3[%960, %964, %978] : memref<16x128x2xvector<8xf32>> + %990 = vector.extractelement %989[%c5_i64 : i64] : vector<8xf32> + %991 = load %3[%960, %964, %978] : memref<16x128x2xvector<8xf32>> + %992 = vector.extractelement %991[%c6_i64 : i64] : vector<8xf32> + %993 = load %3[%960, %964, 
%978] : memref<16x128x2xvector<8xf32>> + %994 = vector.extractelement %993[%c7_i64 : i64] : vector<8xf32> + %995 = mulf %943, %980 {RelaxedPrecision} : f32 + %996 = mulf %944, %982 {RelaxedPrecision} : f32 + %997 = mulf %945, %984 {RelaxedPrecision} : f32 + %998 = mulf %946, %986 {RelaxedPrecision} : f32 + %999 = mulf %947, %988 {RelaxedPrecision} : f32 + %1000 = mulf %948, %990 {RelaxedPrecision} : f32 + %1001 = mulf %949, %992 {RelaxedPrecision} : f32 + %1002 = mulf %950, %994 {RelaxedPrecision} : f32 + %1003 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1004 = vector.extractelement %1003[%c0_i64 : i64] : vector<8xf32> + %1005 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1006 = vector.extractelement %1005[%c1_i64 : i64] : vector<8xf32> + %1007 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1008 = vector.extractelement %1007[%c2_i64 : i64] : vector<8xf32> + %1009 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1010 = vector.extractelement %1009[%c3_i64 : i64] : vector<8xf32> + %1011 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1012 = vector.extractelement %1011[%c4_i64 : i64] : vector<8xf32> + %1013 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1014 = vector.extractelement %1013[%c5_i64 : i64] : vector<8xf32> + %1015 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1016 = vector.extractelement %1015[%c6_i64 : i64] : vector<8xf32> + %1017 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1018 = vector.extractelement %1017[%c7_i64 : i64] : vector<8xf32> + %1019 = addf %1004, %995 {RelaxedPrecision} : f32 + %1020 = addf %1006, %996 {RelaxedPrecision} : f32 + %1021 = addf %1008, %997 {RelaxedPrecision} : f32 + %1022 = addf %1010, %998 {RelaxedPrecision} : f32 + %1023 = addf %1012, %999 {RelaxedPrecision} : f32 + %1024 = addf %1014, %1000 {RelaxedPrecision} : f32 + %1025 = addf %1016, %1001 {RelaxedPrecision} : f32 + %1026 = addf %1018, %1002 {RelaxedPrecision} : f32 + %1027 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1028 = vector.insertelement %1019, %1027[%c0_i64 : i64] : vector<8xf32> + store %1028, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1029 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1030 = vector.insertelement %1020, %1029[%c1_i64 : i64] : vector<8xf32> + store %1030, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1031 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1032 = vector.insertelement %1021, %1031[%c2_i64 : i64] : vector<8xf32> + store %1032, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1033 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1034 = vector.insertelement %1022, %1033[%c3_i64 : i64] : vector<8xf32> + store %1034, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1035 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1036 = vector.insertelement %1023, %1035[%c4_i64 : i64] : vector<8xf32> + store %1036, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1037 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1038 = vector.insertelement %1024, %1037[%c5_i64 : i64] : vector<8xf32> + store %1038, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1039 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1040 = vector.insertelement %1025, %1039[%c6_i64 : i64] : vector<8xf32> + store %1040, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1041 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1042 = vector.insertelement 
%1026, %1041[%c7_i64 : i64] : vector<8xf32> + store %1042, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1043 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1044 = vector.insertelement %1019, %1043[%c0_i64 : i64] : vector<8xf32> + store %1044, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1045 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1046 = vector.insertelement %1020, %1045[%c1_i64 : i64] : vector<8xf32> + store %1046, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1047 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1048 = vector.insertelement %1021, %1047[%c2_i64 : i64] : vector<8xf32> + store %1048, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1049 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1050 = vector.insertelement %1022, %1049[%c3_i64 : i64] : vector<8xf32> + store %1050, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1051 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1052 = vector.insertelement %1023, %1051[%c4_i64 : i64] : vector<8xf32> + store %1052, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1053 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1054 = vector.insertelement %1024, %1053[%c5_i64 : i64] : vector<8xf32> + store %1054, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1055 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1056 = vector.insertelement %1025, %1055[%c6_i64 : i64] : vector<8xf32> + store %1056, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1057 = load %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1058 = vector.insertelement %1026, %1057[%c7_i64 : i64] : vector<8xf32> + store %1058, %2[%960, %c0, %978] : memref<16x6x2xvector<8xf32>> + %1059 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1060 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1061 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1062 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1063 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1064 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1065 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1066 = load %arg0[%678, %942] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1067 = addi %689, %c8 : index + %1068 = cmpi "slt", %1067, %c0 : index + %1069 = subi %c-1, %1067 : index + %1070 = select %1068, %1069, %1067 : index + %1071 = divi_signed %1070, %c16 : index + %1072 = subi %c-1, %1071 : index + %1073 = select %1068, %1072, %1071 : index + %1074 = remi_signed %1073, %c16 : index + %1075 = cmpi "slt", %1074, %c0 : index + %1076 = addi %1074, %c16 : index + %1077 = select %1075, %1076, %1074 : index + %1078 = divi_signed %953, %c8 : index + %1079 = subi %c-1, %1078 : index + %1080 = select %951, %1079, %1078 : index + %1081 = muli %1073, %c-2 : index + %1082 = addi %1080, %1081 : index + %1083 = addi %1082, %c1 : index + %1084 = cmpi "slt", %1083, %c0 : index + %1085 = subi %c-1, %1083 : index + %1086 = select %1084, %1085, %1083 : index + %1087 = divi_signed %1086, %c2 : index + %1088 = subi %c-1, %1087 : index + %1089 = select %1084, %1088, %1087 : index + %1090 = muli %1089, %c-2 : index + %1091 = addi %1082, %1090 : index + %1092 = addi %1091, %c1 : index + %1093 = 
load %3[%1077, %964, %1092] : memref<16x128x2xvector<8xf32>> + %1094 = vector.extractelement %1093[%c0_i64 : i64] : vector<8xf32> + %1095 = load %3[%1077, %964, %1092] : memref<16x128x2xvector<8xf32>> + %1096 = vector.extractelement %1095[%c1_i64 : i64] : vector<8xf32> + %1097 = load %3[%1077, %964, %1092] : memref<16x128x2xvector<8xf32>> + %1098 = vector.extractelement %1097[%c2_i64 : i64] : vector<8xf32> + %1099 = load %3[%1077, %964, %1092] : memref<16x128x2xvector<8xf32>> + %1100 = vector.extractelement %1099[%c3_i64 : i64] : vector<8xf32> + %1101 = load %3[%1077, %964, %1092] : memref<16x128x2xvector<8xf32>> + %1102 = vector.extractelement %1101[%c4_i64 : i64] : vector<8xf32> + %1103 = load %3[%1077, %964, %1092] : memref<16x128x2xvector<8xf32>> + %1104 = vector.extractelement %1103[%c5_i64 : i64] : vector<8xf32> + %1105 = load %3[%1077, %964, %1092] : memref<16x128x2xvector<8xf32>> + %1106 = vector.extractelement %1105[%c6_i64 : i64] : vector<8xf32> + %1107 = load %3[%1077, %964, %1092] : memref<16x128x2xvector<8xf32>> + %1108 = vector.extractelement %1107[%c7_i64 : i64] : vector<8xf32> + %1109 = mulf %1059, %1094 {RelaxedPrecision} : f32 + %1110 = mulf %1060, %1096 {RelaxedPrecision} : f32 + %1111 = mulf %1061, %1098 {RelaxedPrecision} : f32 + %1112 = mulf %1062, %1100 {RelaxedPrecision} : f32 + %1113 = mulf %1063, %1102 {RelaxedPrecision} : f32 + %1114 = mulf %1064, %1104 {RelaxedPrecision} : f32 + %1115 = mulf %1065, %1106 {RelaxedPrecision} : f32 + %1116 = mulf %1066, %1108 {RelaxedPrecision} : f32 + %1117 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1118 = vector.extractelement %1117[%c0_i64 : i64] : vector<8xf32> + %1119 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1120 = vector.extractelement %1119[%c1_i64 : i64] : vector<8xf32> + %1121 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1122 = vector.extractelement %1121[%c2_i64 : i64] : vector<8xf32> + %1123 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1124 = vector.extractelement %1123[%c3_i64 : i64] : vector<8xf32> + %1125 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1126 = vector.extractelement %1125[%c4_i64 : i64] : vector<8xf32> + %1127 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1128 = vector.extractelement %1127[%c5_i64 : i64] : vector<8xf32> + %1129 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1130 = vector.extractelement %1129[%c6_i64 : i64] : vector<8xf32> + %1131 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1132 = vector.extractelement %1131[%c7_i64 : i64] : vector<8xf32> + %1133 = addf %1118, %1109 {RelaxedPrecision} : f32 + %1134 = addf %1120, %1110 {RelaxedPrecision} : f32 + %1135 = addf %1122, %1111 {RelaxedPrecision} : f32 + %1136 = addf %1124, %1112 {RelaxedPrecision} : f32 + %1137 = addf %1126, %1113 {RelaxedPrecision} : f32 + %1138 = addf %1128, %1114 {RelaxedPrecision} : f32 + %1139 = addf %1130, %1115 {RelaxedPrecision} : f32 + %1140 = addf %1132, %1116 {RelaxedPrecision} : f32 + %1141 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1142 = vector.insertelement %1133, %1141[%c0_i64 : i64] : vector<8xf32> + store %1142, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1143 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1144 = vector.insertelement %1134, %1143[%c1_i64 : i64] : vector<8xf32> + store %1144, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1145 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1146 = 
vector.insertelement %1135, %1145[%c2_i64 : i64] : vector<8xf32> + store %1146, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1147 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1148 = vector.insertelement %1136, %1147[%c3_i64 : i64] : vector<8xf32> + store %1148, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1149 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1150 = vector.insertelement %1137, %1149[%c4_i64 : i64] : vector<8xf32> + store %1150, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1151 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1152 = vector.insertelement %1138, %1151[%c5_i64 : i64] : vector<8xf32> + store %1152, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1153 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1154 = vector.insertelement %1139, %1153[%c6_i64 : i64] : vector<8xf32> + store %1154, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1155 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1156 = vector.insertelement %1140, %1155[%c7_i64 : i64] : vector<8xf32> + store %1156, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1157 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1158 = vector.insertelement %1133, %1157[%c0_i64 : i64] : vector<8xf32> + store %1158, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1159 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1160 = vector.insertelement %1134, %1159[%c1_i64 : i64] : vector<8xf32> + store %1160, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1161 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1162 = vector.insertelement %1135, %1161[%c2_i64 : i64] : vector<8xf32> + store %1162, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1163 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1164 = vector.insertelement %1136, %1163[%c3_i64 : i64] : vector<8xf32> + store %1164, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1165 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1166 = vector.insertelement %1137, %1165[%c4_i64 : i64] : vector<8xf32> + store %1166, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1167 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1168 = vector.insertelement %1138, %1167[%c5_i64 : i64] : vector<8xf32> + store %1168, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1169 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1170 = vector.insertelement %1139, %1169[%c6_i64 : i64] : vector<8xf32> + store %1170, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1171 = load %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1172 = vector.insertelement %1140, %1171[%c7_i64 : i64] : vector<8xf32> + store %1172, %2[%1077, %c0, %1092] : memref<16x6x2xvector<8xf32>> + %1173 = addi %940, %c1 : index + br ^bb36(%1173 : index) + ^bb38: // pred: ^bb36 + %1174 = addi %691, %c4 : index + br ^bb25(%1174 : index) + ^bb39: // pred: ^bb25 + %1175 = addi %689, %c16 : index + br ^bb23(%1175 : index) + ^bb40: // pred: ^bb23 + br ^bb41(%c0 : index) + ^bb41(%1176: index): // 2 preds: ^bb40, ^bb51 + %1177 = cmpi "slt", %1176, %c256 : index + cond_br %1177, ^bb42, ^bb52 + ^bb42: // pred: ^bb41 + cond_br %true, ^bb43, ^bb47 + ^bb43: // pred: ^bb42 + %1178 = addi %4, %1176 : index + %1179 = vector.transfer_read %arg2[%678, %1178], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1180 = cmpi 
"slt", %1176, %c0 : index + %1181 = subi %c-1, %1176 : index + %1182 = select %1180, %1181, %1176 : index + %1183 = divi_signed %1182, %c16 : index + %1184 = subi %c-1, %1183 : index + %1185 = select %1180, %1184, %1183 : index + %1186 = remi_signed %1185, %c16 : index + %1187 = cmpi "slt", %1186, %c0 : index + %1188 = addi %1186, %c16 : index + %1189 = select %1187, %1188, %1186 : index + %1190 = remi_signed %1176, %c16 : index + %1191 = cmpi "slt", %1190, %c0 : index + %1192 = addi %1190, %c16 : index + %1193 = select %1191, %1192, %1190 : index + %1194 = cmpi "slt", %1193, %c0 : index + %1195 = subi %c-1, %1193 : index + %1196 = select %1194, %1195, %1193 : index + %1197 = divi_signed %1196, %c8 : index + %1198 = subi %c-1, %1197 : index + %1199 = select %1194, %1198, %1197 : index + %1200 = remi_signed %1199, %c2 : index + %1201 = cmpi "slt", %1200, %c0 : index + %1202 = addi %1200, %c2 : index + %1203 = select %1201, %1202, %1200 : index + %1204 = load %2[%1189, %c0, %1203] : memref<16x6x2xvector<8xf32>> + %1205 = addf %1179, %1204 : vector<8xf32> + store %1205, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %1206 = addi %1178, %c8 : index + %1207 = vector.transfer_read %arg2[%678, %1206], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1208 = addi %1176, %c8 : index + %1209 = cmpi "slt", %1208, %c0 : index + %1210 = subi %c-1, %1208 : index + %1211 = select %1209, %1210, %1208 : index + %1212 = divi_signed %1211, %c16 : index + %1213 = subi %c-1, %1212 : index + %1214 = select %1209, %1213, %1212 : index + %1215 = remi_signed %1214, %c16 : index + %1216 = cmpi "slt", %1215, %c0 : index + %1217 = addi %1215, %c16 : index + %1218 = select %1216, %1217, %1215 : index + %1219 = divi_signed %1182, %c8 : index + %1220 = subi %c-1, %1219 : index + %1221 = select %1180, %1220, %1219 : index + %1222 = muli %1214, %c-2 : index + %1223 = addi %1221, %1222 : index + %1224 = addi %1223, %c1 : index + %1225 = cmpi "slt", %1224, %c0 : index + %1226 = subi %c-1, %1224 : index + %1227 = select %1225, %1226, %1224 : index + %1228 = divi_signed %1227, %c2 : index + %1229 = subi %c-1, %1228 : index + %1230 = select %1225, %1229, %1228 : index + %1231 = muli %1230, %c-2 : index + %1232 = addi %1223, %1231 : index + %1233 = addi %1232, %c1 : index + %1234 = load %2[%1218, %c0, %1233] : memref<16x6x2xvector<8xf32>> + %1235 = addf %1207, %1234 : vector<8xf32> + store %1235, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %1236 = addi %1178, %c16 : index + %1237 = vector.transfer_read %arg2[%678, %1236], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1238 = addi %1185, %c1 : index + %1239 = cmpi "slt", %1238, %c0 : index + %1240 = subi %c-1, %1238 : index + %1241 = select %1239, %1240, %1238 : index + %1242 = divi_signed %1241, %c16 : index + %1243 = subi %c-1, %1242 : index + %1244 = select %1239, %1243, %1242 : index + %1245 = muli %1244, %c-16 : index + %1246 = addi %1185, %1245 : index + %1247 = addi %1246, %c1 : index + %1248 = load %2[%1247, %c0, %1203] : memref<16x6x2xvector<8xf32>> + %1249 = addf %1237, %1248 : vector<8xf32> + store %1249, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %1250 = addi %1178, %c24 : index + %1251 = vector.transfer_read %arg2[%678, %1250], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1252 = addi %1176, %c24 : index + %1253 = cmpi "slt", %1252, %c0 : index + %1254 = subi %c-1, %1252 : index + %1255 = select 
%1253, %1254, %1252 : index + %1256 = divi_signed %1255, %c16 : index + %1257 = subi %c-1, %1256 : index + %1258 = select %1253, %1257, %1256 : index + %1259 = remi_signed %1258, %c16 : index + %1260 = cmpi "slt", %1259, %c0 : index + %1261 = addi %1259, %c16 : index + %1262 = select %1260, %1261, %1259 : index + %1263 = muli %1258, %c-2 : index + %1264 = addi %1221, %1263 : index + %1265 = addi %1264, %c3 : index + %1266 = cmpi "slt", %1265, %c0 : index + %1267 = subi %c-1, %1265 : index + %1268 = select %1266, %1267, %1265 : index + %1269 = divi_signed %1268, %c2 : index + %1270 = subi %c-1, %1269 : index + %1271 = select %1266, %1270, %1269 : index + %1272 = muli %1271, %c-2 : index + %1273 = addi %1264, %1272 : index + %1274 = addi %1273, %c3 : index + %1275 = load %2[%1262, %c0, %1274] : memref<16x6x2xvector<8xf32>> + %1276 = addf %1251, %1275 : vector<8xf32> + store %1276, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %1277 = addi %1178, %c32 : index + %1278 = vector.transfer_read %arg2[%678, %1277], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1279 = addi %1185, %c2 : index + %1280 = cmpi "slt", %1279, %c0 : index + %1281 = subi %c-1, %1279 : index + %1282 = select %1280, %1281, %1279 : index + %1283 = divi_signed %1282, %c16 : index + %1284 = subi %c-1, %1283 : index + %1285 = select %1280, %1284, %1283 : index + %1286 = muli %1285, %c-16 : index + %1287 = addi %1185, %1286 : index + %1288 = addi %1287, %c2 : index + %1289 = load %2[%1288, %c0, %1203] : memref<16x6x2xvector<8xf32>> + %1290 = addf %1278, %1289 : vector<8xf32> + store %1290, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %1291 = addi %1178, %c40 : index + %1292 = vector.transfer_read %arg2[%678, %1291], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1293 = addi %1176, %c40 : index + %1294 = cmpi "slt", %1293, %c0 : index + %1295 = subi %c-1, %1293 : index + %1296 = select %1294, %1295, %1293 : index + %1297 = divi_signed %1296, %c16 : index + %1298 = subi %c-1, %1297 : index + %1299 = select %1294, %1298, %1297 : index + %1300 = remi_signed %1299, %c16 : index + %1301 = cmpi "slt", %1300, %c0 : index + %1302 = addi %1300, %c16 : index + %1303 = select %1301, %1302, %1300 : index + %1304 = muli %1299, %c-2 : index + %1305 = addi %1221, %1304 : index + %1306 = addi %1305, %c5 : index + %1307 = cmpi "slt", %1306, %c0 : index + %1308 = subi %c-1, %1306 : index + %1309 = select %1307, %1308, %1306 : index + %1310 = divi_signed %1309, %c2 : index + %1311 = subi %c-1, %1310 : index + %1312 = select %1307, %1311, %1310 : index + %1313 = muli %1312, %c-2 : index + %1314 = addi %1305, %1313 : index + %1315 = addi %1314, %c5 : index + %1316 = load %2[%1303, %c0, %1315] : memref<16x6x2xvector<8xf32>> + %1317 = addf %1292, %1316 : vector<8xf32> + store %1317, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %1318 = addi %1178, %c48 : index + %1319 = vector.transfer_read %arg2[%678, %1318], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1320 = addi %1185, %c3 : index + %1321 = cmpi "slt", %1320, %c0 : index + %1322 = subi %c-1, %1320 : index + %1323 = select %1321, %1322, %1320 : index + %1324 = divi_signed %1323, %c16 : index + %1325 = subi %c-1, %1324 : index + %1326 = select %1321, %1325, %1324 : index + %1327 = muli %1326, %c-16 : index + %1328 = addi %1185, %1327 : index + %1329 = addi %1328, %c3 : index + %1330 = load %2[%1329, %c0, %1203] : 
memref<16x6x2xvector<8xf32>> + %1331 = addf %1319, %1330 : vector<8xf32> + store %1331, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %1332 = addi %1178, %c56 : index + %1333 = vector.transfer_read %arg2[%678, %1332], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1334 = addi %1176, %c56 : index + %1335 = cmpi "slt", %1334, %c0 : index + %1336 = subi %c-1, %1334 : index + %1337 = select %1335, %1336, %1334 : index + %1338 = divi_signed %1337, %c16 : index + %1339 = subi %c-1, %1338 : index + %1340 = select %1335, %1339, %1338 : index + %1341 = remi_signed %1340, %c16 : index + %1342 = cmpi "slt", %1341, %c0 : index + %1343 = addi %1341, %c16 : index + %1344 = select %1342, %1343, %1341 : index + %1345 = muli %1340, %c-2 : index + %1346 = addi %1221, %1345 : index + %1347 = addi %1346, %c7 : index + %1348 = cmpi "slt", %1347, %c0 : index + %1349 = subi %c-1, %1347 : index + %1350 = select %1348, %1349, %1347 : index + %1351 = divi_signed %1350, %c2 : index + %1352 = subi %c-1, %1351 : index + %1353 = select %1348, %1352, %1351 : index + %1354 = muli %1353, %c-2 : index + %1355 = addi %1346, %1354 : index + %1356 = addi %1355, %c7 : index + %1357 = load %2[%1344, %c0, %1356] : memref<16x6x2xvector<8xf32>> + %1358 = addf %1333, %1357 : vector<8xf32> + store %1358, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %1359 = addi %1178, %c64 : index + %1360 = vector.transfer_read %arg2[%678, %1359], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1361 = addi %1185, %c4 : index + %1362 = cmpi "slt", %1361, %c0 : index + %1363 = subi %c-1, %1361 : index + %1364 = select %1362, %1363, %1361 : index + %1365 = divi_signed %1364, %c16 : index + %1366 = subi %c-1, %1365 : index + %1367 = select %1362, %1366, %1365 : index + %1368 = muli %1367, %c-16 : index + %1369 = addi %1185, %1368 : index + %1370 = addi %1369, %c4 : index + %1371 = load %2[%1370, %c0, %1203] : memref<16x6x2xvector<8xf32>> + %1372 = addf %1360, %1371 : vector<8xf32> + store %1372, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %1373 = addi %1178, %c72 : index + %1374 = vector.transfer_read %arg2[%678, %1373], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1375 = addi %1176, %c72 : index + %1376 = cmpi "slt", %1375, %c0 : index + %1377 = subi %c-1, %1375 : index + %1378 = select %1376, %1377, %1375 : index + %1379 = divi_signed %1378, %c16 : index + %1380 = subi %c-1, %1379 : index + %1381 = select %1376, %1380, %1379 : index + %1382 = remi_signed %1381, %c16 : index + %1383 = cmpi "slt", %1382, %c0 : index + %1384 = addi %1382, %c16 : index + %1385 = select %1383, %1384, %1382 : index + %1386 = muli %1381, %c-2 : index + %1387 = addi %1221, %1386 : index + %1388 = addi %1387, %c9 : index + %1389 = cmpi "slt", %1388, %c0 : index + %1390 = subi %c-1, %1388 : index + %1391 = select %1389, %1390, %1388 : index + %1392 = divi_signed %1391, %c2 : index + %1393 = subi %c-1, %1392 : index + %1394 = select %1389, %1393, %1392 : index + %1395 = muli %1394, %c-2 : index + %1396 = addi %1387, %1395 : index + %1397 = addi %1396, %c9 : index + %1398 = load %2[%1385, %c0, %1397] : memref<16x6x2xvector<8xf32>> + %1399 = addf %1374, %1398 : vector<8xf32> + store %1399, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %1400 = addi %1178, %c80 : index + %1401 = vector.transfer_read %arg2[%678, %1400], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, 
vector<8xf32> + %1402 = addi %1185, %c5 : index + %1403 = cmpi "slt", %1402, %c0 : index + %1404 = subi %c-1, %1402 : index + %1405 = select %1403, %1404, %1402 : index + %1406 = divi_signed %1405, %c16 : index + %1407 = subi %c-1, %1406 : index + %1408 = select %1403, %1407, %1406 : index + %1409 = muli %1408, %c-16 : index + %1410 = addi %1185, %1409 : index + %1411 = addi %1410, %c5 : index + %1412 = load %2[%1411, %c0, %1203] : memref<16x6x2xvector<8xf32>> + %1413 = addf %1401, %1412 : vector<8xf32> + store %1413, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %1414 = addi %1178, %c88 : index + %1415 = vector.transfer_read %arg2[%678, %1414], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1416 = addi %1176, %c88 : index + %1417 = cmpi "slt", %1416, %c0 : index + %1418 = subi %c-1, %1416 : index + %1419 = select %1417, %1418, %1416 : index + %1420 = divi_signed %1419, %c16 : index + %1421 = subi %c-1, %1420 : index + %1422 = select %1417, %1421, %1420 : index + %1423 = remi_signed %1422, %c16 : index + %1424 = cmpi "slt", %1423, %c0 : index + %1425 = addi %1423, %c16 : index + %1426 = select %1424, %1425, %1423 : index + %1427 = muli %1422, %c-2 : index + %1428 = addi %1221, %1427 : index + %1429 = addi %1428, %c11 : index + %1430 = cmpi "slt", %1429, %c0 : index + %1431 = subi %c-1, %1429 : index + %1432 = select %1430, %1431, %1429 : index + %1433 = divi_signed %1432, %c2 : index + %1434 = subi %c-1, %1433 : index + %1435 = select %1430, %1434, %1433 : index + %1436 = muli %1435, %c-2 : index + %1437 = addi %1428, %1436 : index + %1438 = addi %1437, %c11 : index + %1439 = load %2[%1426, %c0, %1438] : memref<16x6x2xvector<8xf32>> + %1440 = addf %1415, %1439 : vector<8xf32> + store %1440, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %1441 = addi %1178, %c96 : index + %1442 = vector.transfer_read %arg2[%678, %1441], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1443 = addi %1185, %c6 : index + %1444 = cmpi "slt", %1443, %c0 : index + %1445 = subi %c-1, %1443 : index + %1446 = select %1444, %1445, %1443 : index + %1447 = divi_signed %1446, %c16 : index + %1448 = subi %c-1, %1447 : index + %1449 = select %1444, %1448, %1447 : index + %1450 = muli %1449, %c-16 : index + %1451 = addi %1185, %1450 : index + %1452 = addi %1451, %c6 : index + %1453 = load %2[%1452, %c0, %1203] : memref<16x6x2xvector<8xf32>> + %1454 = addf %1442, %1453 : vector<8xf32> + store %1454, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %1455 = addi %1178, %c104 : index + %1456 = vector.transfer_read %arg2[%678, %1455], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1457 = addi %1176, %c104 : index + %1458 = cmpi "slt", %1457, %c0 : index + %1459 = subi %c-1, %1457 : index + %1460 = select %1458, %1459, %1457 : index + %1461 = divi_signed %1460, %c16 : index + %1462 = subi %c-1, %1461 : index + %1463 = select %1458, %1462, %1461 : index + %1464 = remi_signed %1463, %c16 : index + %1465 = cmpi "slt", %1464, %c0 : index + %1466 = addi %1464, %c16 : index + %1467 = select %1465, %1466, %1464 : index + %1468 = muli %1463, %c-2 : index + %1469 = addi %1221, %1468 : index + %1470 = addi %1469, %c13 : index + %1471 = cmpi "slt", %1470, %c0 : index + %1472 = subi %c-1, %1470 : index + %1473 = select %1471, %1472, %1470 : index + %1474 = divi_signed %1473, %c2 : index + %1475 = subi %c-1, %1474 : index + %1476 = select %1471, %1475, %1474 : index + %1477 = muli 
%1476, %c-2 : index + %1478 = addi %1469, %1477 : index + %1479 = addi %1478, %c13 : index + %1480 = load %2[%1467, %c0, %1479] : memref<16x6x2xvector<8xf32>> + %1481 = addf %1456, %1480 : vector<8xf32> + store %1481, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %1482 = addi %1178, %c112 : index + %1483 = vector.transfer_read %arg2[%678, %1482], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1484 = addi %1185, %c7 : index + %1485 = cmpi "slt", %1484, %c0 : index + %1486 = subi %c-1, %1484 : index + %1487 = select %1485, %1486, %1484 : index + %1488 = divi_signed %1487, %c16 : index + %1489 = subi %c-1, %1488 : index + %1490 = select %1485, %1489, %1488 : index + %1491 = muli %1490, %c-16 : index + %1492 = addi %1185, %1491 : index + %1493 = addi %1492, %c7 : index + %1494 = load %2[%1493, %c0, %1203] : memref<16x6x2xvector<8xf32>> + %1495 = addf %1483, %1494 : vector<8xf32> + store %1495, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %1496 = addi %1178, %c120 : index + %1497 = vector.transfer_read %arg2[%678, %1496], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1498 = addi %1176, %c120 : index + %1499 = cmpi "slt", %1498, %c0 : index + %1500 = subi %c-1, %1498 : index + %1501 = select %1499, %1500, %1498 : index + %1502 = divi_signed %1501, %c16 : index + %1503 = subi %c-1, %1502 : index + %1504 = select %1499, %1503, %1502 : index + %1505 = remi_signed %1504, %c16 : index + %1506 = cmpi "slt", %1505, %c0 : index + %1507 = addi %1505, %c16 : index + %1508 = select %1506, %1507, %1505 : index + %1509 = muli %1504, %c-2 : index + %1510 = addi %1221, %1509 : index + %1511 = addi %1510, %c15 : index + %1512 = cmpi "slt", %1511, %c0 : index + %1513 = subi %c-1, %1511 : index + %1514 = select %1512, %1513, %1511 : index + %1515 = divi_signed %1514, %c2 : index + %1516 = subi %c-1, %1515 : index + %1517 = select %1512, %1516, %1515 : index + %1518 = muli %1517, %c-2 : index + %1519 = addi %1510, %1518 : index + %1520 = addi %1519, %c15 : index + %1521 = load %2[%1508, %c0, %1520] : memref<16x6x2xvector<8xf32>> + %1522 = addf %1497, %1521 : vector<8xf32> + store %1522, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + br ^bb44(%c0 : index) + ^bb44(%1523: index): // 2 preds: ^bb43, ^bb45 + %1524 = cmpi "slt", %1523, %c16 : index + cond_br %1524, ^bb45, ^bb46 + ^bb45: // pred: ^bb44 + %1525 = muli %1523, %c8 : index + %1526 = addi %1178, %1525 : index + %1527 = load %1[%c0, %1523] : memref<1x16xvector<8xf32>> + vector.transfer_write %1527, %arg2[%678, %1526] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1528 = addi %1523, %c1 : index + br ^bb44(%1528 : index) + ^bb46: // pred: ^bb44 + br ^bb51 + ^bb47: // pred: ^bb42 + %1529 = addi %4, %1176 : index + %1530 = vector.transfer_read %arg2[%678, %1529], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1531 = cmpi "slt", %1176, %c0 : index + %1532 = subi %c-1, %1176 : index + %1533 = select %1531, %1532, %1176 : index + %1534 = divi_signed %1533, %c16 : index + %1535 = subi %c-1, %1534 : index + %1536 = select %1531, %1535, %1534 : index + %1537 = remi_signed %1536, %c16 : index + %1538 = cmpi "slt", %1537, %c0 : index + %1539 = addi %1537, %c16 : index + %1540 = select %1538, %1539, %1537 : index + %1541 = remi_signed %1176, %c16 : index + %1542 = cmpi "slt", %1541, %c0 : index + %1543 = addi %1541, %c16 : index + %1544 = select %1542, %1543, %1541 : 
index + %1545 = cmpi "slt", %1544, %c0 : index + %1546 = subi %c-1, %1544 : index + %1547 = select %1545, %1546, %1544 : index + %1548 = divi_signed %1547, %c8 : index + %1549 = subi %c-1, %1548 : index + %1550 = select %1545, %1549, %1548 : index + %1551 = remi_signed %1550, %c2 : index + %1552 = cmpi "slt", %1551, %c0 : index + %1553 = addi %1551, %c2 : index + %1554 = select %1552, %1553, %1551 : index + %1555 = load %2[%1540, %c0, %1554] : memref<16x6x2xvector<8xf32>> + %1556 = addf %1530, %1555 : vector<8xf32> + store %1556, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %1557 = addi %1529, %c8 : index + %1558 = vector.transfer_read %arg2[%678, %1557], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1559 = addi %1176, %c8 : index + %1560 = cmpi "slt", %1559, %c0 : index + %1561 = subi %c-1, %1559 : index + %1562 = select %1560, %1561, %1559 : index + %1563 = divi_signed %1562, %c16 : index + %1564 = subi %c-1, %1563 : index + %1565 = select %1560, %1564, %1563 : index + %1566 = remi_signed %1565, %c16 : index + %1567 = cmpi "slt", %1566, %c0 : index + %1568 = addi %1566, %c16 : index + %1569 = select %1567, %1568, %1566 : index + %1570 = divi_signed %1533, %c8 : index + %1571 = subi %c-1, %1570 : index + %1572 = select %1531, %1571, %1570 : index + %1573 = muli %1565, %c-2 : index + %1574 = addi %1572, %1573 : index + %1575 = addi %1574, %c1 : index + %1576 = cmpi "slt", %1575, %c0 : index + %1577 = subi %c-1, %1575 : index + %1578 = select %1576, %1577, %1575 : index + %1579 = divi_signed %1578, %c2 : index + %1580 = subi %c-1, %1579 : index + %1581 = select %1576, %1580, %1579 : index + %1582 = muli %1581, %c-2 : index + %1583 = addi %1574, %1582 : index + %1584 = addi %1583, %c1 : index + %1585 = load %2[%1569, %c0, %1584] : memref<16x6x2xvector<8xf32>> + %1586 = addf %1558, %1585 : vector<8xf32> + store %1586, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %1587 = addi %1529, %c16 : index + %1588 = vector.transfer_read %arg2[%678, %1587], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1589 = addi %1536, %c1 : index + %1590 = cmpi "slt", %1589, %c0 : index + %1591 = subi %c-1, %1589 : index + %1592 = select %1590, %1591, %1589 : index + %1593 = divi_signed %1592, %c16 : index + %1594 = subi %c-1, %1593 : index + %1595 = select %1590, %1594, %1593 : index + %1596 = muli %1595, %c-16 : index + %1597 = addi %1536, %1596 : index + %1598 = addi %1597, %c1 : index + %1599 = load %2[%1598, %c0, %1554] : memref<16x6x2xvector<8xf32>> + %1600 = addf %1588, %1599 : vector<8xf32> + store %1600, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %1601 = addi %1529, %c24 : index + %1602 = vector.transfer_read %arg2[%678, %1601], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1603 = addi %1176, %c24 : index + %1604 = cmpi "slt", %1603, %c0 : index + %1605 = subi %c-1, %1603 : index + %1606 = select %1604, %1605, %1603 : index + %1607 = divi_signed %1606, %c16 : index + %1608 = subi %c-1, %1607 : index + %1609 = select %1604, %1608, %1607 : index + %1610 = remi_signed %1609, %c16 : index + %1611 = cmpi "slt", %1610, %c0 : index + %1612 = addi %1610, %c16 : index + %1613 = select %1611, %1612, %1610 : index + %1614 = muli %1609, %c-2 : index + %1615 = addi %1572, %1614 : index + %1616 = addi %1615, %c3 : index + %1617 = cmpi "slt", %1616, %c0 : index + %1618 = subi %c-1, %1616 : index + %1619 = select %1617, %1618, %1616 : index + %1620 = divi_signed %1619, %c2 : index + %1621 = subi %c-1, %1620 : 
index + %1622 = select %1617, %1621, %1620 : index + %1623 = muli %1622, %c-2 : index + %1624 = addi %1615, %1623 : index + %1625 = addi %1624, %c3 : index + %1626 = load %2[%1613, %c0, %1625] : memref<16x6x2xvector<8xf32>> + %1627 = addf %1602, %1626 : vector<8xf32> + store %1627, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %1628 = addi %1529, %c32 : index + %1629 = vector.transfer_read %arg2[%678, %1628], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1630 = addi %1536, %c2 : index + %1631 = cmpi "slt", %1630, %c0 : index + %1632 = subi %c-1, %1630 : index + %1633 = select %1631, %1632, %1630 : index + %1634 = divi_signed %1633, %c16 : index + %1635 = subi %c-1, %1634 : index + %1636 = select %1631, %1635, %1634 : index + %1637 = muli %1636, %c-16 : index + %1638 = addi %1536, %1637 : index + %1639 = addi %1638, %c2 : index + %1640 = load %2[%1639, %c0, %1554] : memref<16x6x2xvector<8xf32>> + %1641 = addf %1629, %1640 : vector<8xf32> + store %1641, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %1642 = addi %1529, %c40 : index + %1643 = vector.transfer_read %arg2[%678, %1642], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1644 = addi %1176, %c40 : index + %1645 = cmpi "slt", %1644, %c0 : index + %1646 = subi %c-1, %1644 : index + %1647 = select %1645, %1646, %1644 : index + %1648 = divi_signed %1647, %c16 : index + %1649 = subi %c-1, %1648 : index + %1650 = select %1645, %1649, %1648 : index + %1651 = remi_signed %1650, %c16 : index + %1652 = cmpi "slt", %1651, %c0 : index + %1653 = addi %1651, %c16 : index + %1654 = select %1652, %1653, %1651 : index + %1655 = muli %1650, %c-2 : index + %1656 = addi %1572, %1655 : index + %1657 = addi %1656, %c5 : index + %1658 = cmpi "slt", %1657, %c0 : index + %1659 = subi %c-1, %1657 : index + %1660 = select %1658, %1659, %1657 : index + %1661 = divi_signed %1660, %c2 : index + %1662 = subi %c-1, %1661 : index + %1663 = select %1658, %1662, %1661 : index + %1664 = muli %1663, %c-2 : index + %1665 = addi %1656, %1664 : index + %1666 = addi %1665, %c5 : index + %1667 = load %2[%1654, %c0, %1666] : memref<16x6x2xvector<8xf32>> + %1668 = addf %1643, %1667 : vector<8xf32> + store %1668, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %1669 = addi %1529, %c48 : index + %1670 = vector.transfer_read %arg2[%678, %1669], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1671 = addi %1536, %c3 : index + %1672 = cmpi "slt", %1671, %c0 : index + %1673 = subi %c-1, %1671 : index + %1674 = select %1672, %1673, %1671 : index + %1675 = divi_signed %1674, %c16 : index + %1676 = subi %c-1, %1675 : index + %1677 = select %1672, %1676, %1675 : index + %1678 = muli %1677, %c-16 : index + %1679 = addi %1536, %1678 : index + %1680 = addi %1679, %c3 : index + %1681 = load %2[%1680, %c0, %1554] : memref<16x6x2xvector<8xf32>> + %1682 = addf %1670, %1681 : vector<8xf32> + store %1682, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %1683 = addi %1529, %c56 : index + %1684 = vector.transfer_read %arg2[%678, %1683], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1685 = addi %1176, %c56 : index + %1686 = cmpi "slt", %1685, %c0 : index + %1687 = subi %c-1, %1685 : index + %1688 = select %1686, %1687, %1685 : index + %1689 = divi_signed %1688, %c16 : index + %1690 = subi %c-1, %1689 : index + %1691 = select %1686, %1690, %1689 : index + %1692 = remi_signed %1691, %c16 : index + %1693 = cmpi "slt", %1692, %c0 : index + %1694 = addi %1692, %c16 
: index + %1695 = select %1693, %1694, %1692 : index + %1696 = muli %1691, %c-2 : index + %1697 = addi %1572, %1696 : index + %1698 = addi %1697, %c7 : index + %1699 = cmpi "slt", %1698, %c0 : index + %1700 = subi %c-1, %1698 : index + %1701 = select %1699, %1700, %1698 : index + %1702 = divi_signed %1701, %c2 : index + %1703 = subi %c-1, %1702 : index + %1704 = select %1699, %1703, %1702 : index + %1705 = muli %1704, %c-2 : index + %1706 = addi %1697, %1705 : index + %1707 = addi %1706, %c7 : index + %1708 = load %2[%1695, %c0, %1707] : memref<16x6x2xvector<8xf32>> + %1709 = addf %1684, %1708 : vector<8xf32> + store %1709, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %1710 = addi %1529, %c64 : index + %1711 = vector.transfer_read %arg2[%678, %1710], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1712 = addi %1536, %c4 : index + %1713 = cmpi "slt", %1712, %c0 : index + %1714 = subi %c-1, %1712 : index + %1715 = select %1713, %1714, %1712 : index + %1716 = divi_signed %1715, %c16 : index + %1717 = subi %c-1, %1716 : index + %1718 = select %1713, %1717, %1716 : index + %1719 = muli %1718, %c-16 : index + %1720 = addi %1536, %1719 : index + %1721 = addi %1720, %c4 : index + %1722 = load %2[%1721, %c0, %1554] : memref<16x6x2xvector<8xf32>> + %1723 = addf %1711, %1722 : vector<8xf32> + store %1723, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %1724 = addi %1529, %c72 : index + %1725 = vector.transfer_read %arg2[%678, %1724], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1726 = addi %1176, %c72 : index + %1727 = cmpi "slt", %1726, %c0 : index + %1728 = subi %c-1, %1726 : index + %1729 = select %1727, %1728, %1726 : index + %1730 = divi_signed %1729, %c16 : index + %1731 = subi %c-1, %1730 : index + %1732 = select %1727, %1731, %1730 : index + %1733 = remi_signed %1732, %c16 : index + %1734 = cmpi "slt", %1733, %c0 : index + %1735 = addi %1733, %c16 : index + %1736 = select %1734, %1735, %1733 : index + %1737 = muli %1732, %c-2 : index + %1738 = addi %1572, %1737 : index + %1739 = addi %1738, %c9 : index + %1740 = cmpi "slt", %1739, %c0 : index + %1741 = subi %c-1, %1739 : index + %1742 = select %1740, %1741, %1739 : index + %1743 = divi_signed %1742, %c2 : index + %1744 = subi %c-1, %1743 : index + %1745 = select %1740, %1744, %1743 : index + %1746 = muli %1745, %c-2 : index + %1747 = addi %1738, %1746 : index + %1748 = addi %1747, %c9 : index + %1749 = load %2[%1736, %c0, %1748] : memref<16x6x2xvector<8xf32>> + %1750 = addf %1725, %1749 : vector<8xf32> + store %1750, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %1751 = addi %1529, %c80 : index + %1752 = vector.transfer_read %arg2[%678, %1751], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1753 = addi %1536, %c5 : index + %1754 = cmpi "slt", %1753, %c0 : index + %1755 = subi %c-1, %1753 : index + %1756 = select %1754, %1755, %1753 : index + %1757 = divi_signed %1756, %c16 : index + %1758 = subi %c-1, %1757 : index + %1759 = select %1754, %1758, %1757 : index + %1760 = muli %1759, %c-16 : index + %1761 = addi %1536, %1760 : index + %1762 = addi %1761, %c5 : index + %1763 = load %2[%1762, %c0, %1554] : memref<16x6x2xvector<8xf32>> + %1764 = addf %1752, %1763 : vector<8xf32> + store %1764, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %1765 = addi %1529, %c88 : index + %1766 = vector.transfer_read %arg2[%678, %1765], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1767 = addi %1176, %c88 : index + 
%1768 = cmpi "slt", %1767, %c0 : index + %1769 = subi %c-1, %1767 : index + %1770 = select %1768, %1769, %1767 : index + %1771 = divi_signed %1770, %c16 : index + %1772 = subi %c-1, %1771 : index + %1773 = select %1768, %1772, %1771 : index + %1774 = remi_signed %1773, %c16 : index + %1775 = cmpi "slt", %1774, %c0 : index + %1776 = addi %1774, %c16 : index + %1777 = select %1775, %1776, %1774 : index + %1778 = muli %1773, %c-2 : index + %1779 = addi %1572, %1778 : index + %1780 = addi %1779, %c11 : index + %1781 = cmpi "slt", %1780, %c0 : index + %1782 = subi %c-1, %1780 : index + %1783 = select %1781, %1782, %1780 : index + %1784 = divi_signed %1783, %c2 : index + %1785 = subi %c-1, %1784 : index + %1786 = select %1781, %1785, %1784 : index + %1787 = muli %1786, %c-2 : index + %1788 = addi %1779, %1787 : index + %1789 = addi %1788, %c11 : index + %1790 = load %2[%1777, %c0, %1789] : memref<16x6x2xvector<8xf32>> + %1791 = addf %1766, %1790 : vector<8xf32> + store %1791, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %1792 = addi %1529, %c96 : index + %1793 = vector.transfer_read %arg2[%678, %1792], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1794 = addi %1536, %c6 : index + %1795 = cmpi "slt", %1794, %c0 : index + %1796 = subi %c-1, %1794 : index + %1797 = select %1795, %1796, %1794 : index + %1798 = divi_signed %1797, %c16 : index + %1799 = subi %c-1, %1798 : index + %1800 = select %1795, %1799, %1798 : index + %1801 = muli %1800, %c-16 : index + %1802 = addi %1536, %1801 : index + %1803 = addi %1802, %c6 : index + %1804 = load %2[%1803, %c0, %1554] : memref<16x6x2xvector<8xf32>> + %1805 = addf %1793, %1804 : vector<8xf32> + store %1805, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %1806 = addi %1529, %c104 : index + %1807 = vector.transfer_read %arg2[%678, %1806], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1808 = addi %1176, %c104 : index + %1809 = cmpi "slt", %1808, %c0 : index + %1810 = subi %c-1, %1808 : index + %1811 = select %1809, %1810, %1808 : index + %1812 = divi_signed %1811, %c16 : index + %1813 = subi %c-1, %1812 : index + %1814 = select %1809, %1813, %1812 : index + %1815 = remi_signed %1814, %c16 : index + %1816 = cmpi "slt", %1815, %c0 : index + %1817 = addi %1815, %c16 : index + %1818 = select %1816, %1817, %1815 : index + %1819 = muli %1814, %c-2 : index + %1820 = addi %1572, %1819 : index + %1821 = addi %1820, %c13 : index + %1822 = cmpi "slt", %1821, %c0 : index + %1823 = subi %c-1, %1821 : index + %1824 = select %1822, %1823, %1821 : index + %1825 = divi_signed %1824, %c2 : index + %1826 = subi %c-1, %1825 : index + %1827 = select %1822, %1826, %1825 : index + %1828 = muli %1827, %c-2 : index + %1829 = addi %1820, %1828 : index + %1830 = addi %1829, %c13 : index + %1831 = load %2[%1818, %c0, %1830] : memref<16x6x2xvector<8xf32>> + %1832 = addf %1807, %1831 : vector<8xf32> + store %1832, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %1833 = addi %1529, %c112 : index + %1834 = vector.transfer_read %arg2[%678, %1833], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1835 = addi %1536, %c7 : index + %1836 = cmpi "slt", %1835, %c0 : index + %1837 = subi %c-1, %1835 : index + %1838 = select %1836, %1837, %1835 : index + %1839 = divi_signed %1838, %c16 : index + %1840 = subi %c-1, %1839 : index + %1841 = select %1836, %1840, %1839 : index + %1842 = muli %1841, %c-16 : index + %1843 = addi %1536, %1842 : index + %1844 = addi %1843, %c7 : index + %1845 = load 
%2[%1844, %c0, %1554] : memref<16x6x2xvector<8xf32>> + %1846 = addf %1834, %1845 : vector<8xf32> + store %1846, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %1847 = addi %1529, %c120 : index + %1848 = vector.transfer_read %arg2[%678, %1847], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %1849 = addi %1176, %c120 : index + %1850 = cmpi "slt", %1849, %c0 : index + %1851 = subi %c-1, %1849 : index + %1852 = select %1850, %1851, %1849 : index + %1853 = divi_signed %1852, %c16 : index + %1854 = subi %c-1, %1853 : index + %1855 = select %1850, %1854, %1853 : index + %1856 = remi_signed %1855, %c16 : index + %1857 = cmpi "slt", %1856, %c0 : index + %1858 = addi %1856, %c16 : index + %1859 = select %1857, %1858, %1856 : index + %1860 = muli %1855, %c-2 : index + %1861 = addi %1572, %1860 : index + %1862 = addi %1861, %c15 : index + %1863 = cmpi "slt", %1862, %c0 : index + %1864 = subi %c-1, %1862 : index + %1865 = select %1863, %1864, %1862 : index + %1866 = divi_signed %1865, %c2 : index + %1867 = subi %c-1, %1866 : index + %1868 = select %1863, %1867, %1866 : index + %1869 = muli %1868, %c-2 : index + %1870 = addi %1861, %1869 : index + %1871 = addi %1870, %c15 : index + %1872 = load %2[%1859, %c0, %1871] : memref<16x6x2xvector<8xf32>> + %1873 = addf %1848, %1872 : vector<8xf32> + store %1873, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + br ^bb48(%c0 : index) + ^bb48(%1874: index): // 2 preds: ^bb47, ^bb49 + %1875 = cmpi "slt", %1874, %c16 : index + cond_br %1875, ^bb49, ^bb50 + ^bb49: // pred: ^bb48 + %1876 = muli %1874, %c8 : index + %1877 = addi %1529, %1876 : index + %1878 = load %1[%c0, %1874] : memref<1x16xvector<8xf32>> + vector.transfer_write %1878, %arg2[%678, %1877] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %1879 = addi %1874, %c1 : index + br ^bb48(%1879 : index) + ^bb50: // pred: ^bb48 + br ^bb51 + ^bb51: // 2 preds: ^bb46, ^bb50 + %1880 = addi %1176, %c128 : index + br ^bb41(%1880 : index) + ^bb52: // pred: ^bb41 + %1881 = addi %678, %c1 : index + br ^bb12(%1881 : index) + ^bb53: // pred: ^bb12 + %1882 = addi %4, %c256 : index + br ^bb1(%1882 : index) + ^bb54: // pred: ^bb1 + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/21_ConvertValueToLLVM.mlir b/Tutorials/optimized_matmul/mlir/21_ConvertValueToLLVM.mlir new file mode 100644 index 00000000..0531ddef --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/21_ConvertValueToLLVM.mlir @@ -0,0 +1,7762 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.mlir.global internal @cache_17() : !llvm.array<4096 x vec<8 x float>> + llvm.mlir.global internal @cache_16() : !llvm.array<192 x vec<8 x float>> + llvm.func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: 
!llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(0 : i64) : !llvm.i64 + %25 = llvm.mlir.constant(1 : i64) : !llvm.i64 + %26 = llvm.mlir.constant(2 : i64) : !llvm.i64 + %27 = llvm.mlir.constant(3 : i64) : !llvm.i64 + %28 = llvm.mlir.constant(4 : i64) : !llvm.i64 + %29 = llvm.mlir.constant(5 : i64) : !llvm.i64 + %30 = llvm.mlir.constant(6 : i64) : !llvm.i64 + %31 = llvm.mlir.constant(7 : i64) : !llvm.i64 + %32 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %33 = llvm.mlir.constant(10 : index) : !llvm.i64 + %34 = llvm.mlir.constant(12 : index) : !llvm.i64 + %35 = llvm.mlir.constant(14 : index) : !llvm.i64 + %36 = llvm.mlir.constant(512 : index) : !llvm.i64 + %37 = 
llvm.mlir.constant(784 : index) : !llvm.i64 + %38 = llvm.mlir.constant(256 : index) : !llvm.i64 + %39 = llvm.mlir.constant(128 : index) : !llvm.i64 + %40 = llvm.mlir.constant(true) : !llvm.i1 + %41 = llvm.mlir.constant(24 : index) : !llvm.i64 + %42 = llvm.mlir.constant(32 : index) : !llvm.i64 + %43 = llvm.mlir.constant(40 : index) : !llvm.i64 + %44 = llvm.mlir.constant(48 : index) : !llvm.i64 + %45 = llvm.mlir.constant(3 : index) : !llvm.i64 + %46 = llvm.mlir.constant(56 : index) : !llvm.i64 + %47 = llvm.mlir.constant(64 : index) : !llvm.i64 + %48 = llvm.mlir.constant(4 : index) : !llvm.i64 + %49 = llvm.mlir.constant(72 : index) : !llvm.i64 + %50 = llvm.mlir.constant(9 : index) : !llvm.i64 + %51 = llvm.mlir.constant(80 : index) : !llvm.i64 + %52 = llvm.mlir.constant(5 : index) : !llvm.i64 + %53 = llvm.mlir.constant(88 : index) : !llvm.i64 + %54 = llvm.mlir.constant(11 : index) : !llvm.i64 + %55 = llvm.mlir.constant(96 : index) : !llvm.i64 + %56 = llvm.mlir.constant(6 : index) : !llvm.i64 + %57 = llvm.mlir.constant(104 : index) : !llvm.i64 + %58 = llvm.mlir.constant(13 : index) : !llvm.i64 + %59 = llvm.mlir.constant(112 : index) : !llvm.i64 + %60 = llvm.mlir.constant(-16 : index) : !llvm.i64 + %61 = llvm.mlir.constant(7 : index) : !llvm.i64 + %62 = llvm.mlir.constant(120 : index) : !llvm.i64 + %63 = llvm.mlir.constant(2 : index) : !llvm.i64 + %64 = llvm.mlir.constant(-1 : index) : !llvm.i64 + %65 = llvm.mlir.constant(-2 : index) : !llvm.i64 + %66 = llvm.mlir.constant(15 : index) : !llvm.i64 + %67 = llvm.mlir.constant(0 : index) : !llvm.i64 + %68 = llvm.mlir.constant(16 : index) : !llvm.i64 + %69 = llvm.mlir.constant(1 : index) : !llvm.i64 + %70 = llvm.mlir.constant(8 : index) : !llvm.i64 + %71 = llvm.mlir.constant(1 : index) : !llvm.i64 + %72 = llvm.mlir.constant(16 : index) : !llvm.i64 + %73 = llvm.mul %71, %72 : !llvm.i64 + %74 = llvm.mlir.null : !llvm.ptr> + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.getelementptr %74[%75] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %77 = llvm.ptrtoint %76 : !llvm.ptr> to !llvm.i64 + %78 = llvm.mul %73, %77 : !llvm.i64 + %79 = llvm.alloca %78 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %80 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %81 = llvm.insertvalue %79, %80[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.insertvalue %79, %81[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %83 = llvm.mlir.constant(0 : index) : !llvm.i64 + %84 = llvm.insertvalue %83, %82[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %85 = llvm.mlir.constant(1 : index) : !llvm.i64 + %86 = llvm.mlir.constant(16 : index) : !llvm.i64 + %87 = llvm.insertvalue %71, %84[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %88 = llvm.insertvalue %86, %87[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.insertvalue %72, %88[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %90 = llvm.insertvalue %85, %89[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %91 = llvm.mlir.constant(1 : index) : !llvm.i64 + %92 = llvm.mlir.constant(16 : index) : !llvm.i64 + %93 = llvm.mul %91, %92 : !llvm.i64 + %94 = llvm.mlir.null : !llvm.ptr> + %95 = llvm.mlir.constant(1 : index) : !llvm.i64 + %96 = llvm.getelementptr %94[%95] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %97 = llvm.ptrtoint %96 : !llvm.ptr> to !llvm.i64 + %98 = llvm.mul %93, 
%97 : !llvm.i64 + %99 = llvm.alloca %98 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %100 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %101 = llvm.insertvalue %99, %100[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.insertvalue %99, %101[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %103 = llvm.mlir.constant(0 : index) : !llvm.i64 + %104 = llvm.insertvalue %103, %102[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %105 = llvm.mlir.constant(1 : index) : !llvm.i64 + %106 = llvm.mlir.constant(16 : index) : !llvm.i64 + %107 = llvm.insertvalue %91, %104[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %108 = llvm.insertvalue %106, %107[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.insertvalue %92, %108[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %110 = llvm.insertvalue %105, %109[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %111 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %112 = llvm.mlir.addressof @cache_16 : !llvm.ptr>> + %113 = llvm.getelementptr %112[%111, %111] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %114 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %115 = llvm.insertvalue %113, %114[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %116 = llvm.insertvalue %113, %115[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %117 = llvm.mlir.constant(0 : index) : !llvm.i64 + %118 = llvm.insertvalue %117, %116[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %119 = llvm.mlir.constant(16 : index) : !llvm.i64 + %120 = llvm.insertvalue %119, %118[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %121 = llvm.mlir.constant(12 : index) : !llvm.i64 + %122 = llvm.insertvalue %121, %120[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %123 = llvm.mlir.constant(6 : index) : !llvm.i64 + %124 = llvm.insertvalue %123, %122[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %126 = llvm.insertvalue %125, %124[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %127 = llvm.mlir.constant(2 : index) : !llvm.i64 + %128 = llvm.insertvalue %127, %126[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %129 = llvm.mlir.constant(1 : index) : !llvm.i64 + %130 = llvm.insertvalue %129, %128[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %131 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %132 = llvm.mlir.addressof @cache_17 : !llvm.ptr>> + %133 = llvm.getelementptr %132[%131, %131] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %134 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %135 = llvm.insertvalue %133, %134[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %136 = llvm.insertvalue %133, %135[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %137 = llvm.mlir.constant(0 : index) : !llvm.i64 + %138 = llvm.insertvalue %137, %136[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %139 = llvm.mlir.constant(16 : index) : !llvm.i64 + %140 = llvm.insertvalue %139, %138[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + 
%141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.insertvalue %141, %140[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %143 = llvm.mlir.constant(128 : index) : !llvm.i64 + %144 = llvm.insertvalue %143, %142[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %145 = llvm.mlir.constant(2 : index) : !llvm.i64 + %146 = llvm.insertvalue %145, %144[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %147 = llvm.mlir.constant(2 : index) : !llvm.i64 + %148 = llvm.insertvalue %147, %146[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %150 = llvm.insertvalue %149, %148[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + llvm.br ^bb1(%67 : !llvm.i64) + ^bb1(%151: !llvm.i64): // 2 preds: ^bb0, ^bb52 + %152 = llvm.icmp "slt" %151, %36 : !llvm.i64 + llvm.cond_br %152, ^bb2, ^bb53 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%67 : !llvm.i64) + ^bb3(%153: !llvm.i64): // 2 preds: ^bb2, ^bb10 + %154 = llvm.icmp "slt" %153, %39 : !llvm.i64 + llvm.cond_br %154, ^bb4, ^bb11 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%67 : !llvm.i64) + ^bb5(%155: !llvm.i64): // 2 preds: ^bb4, ^bb9 + %156 = llvm.icmp "slt" %155, %38 : !llvm.i64 + llvm.cond_br %156, ^bb6, ^bb10 + ^bb6: // pred: ^bb5 + llvm.cond_br %40, ^bb7, ^bb8 + ^bb7: // pred: ^bb6 + %157 = llvm.add %151, %155 : !llvm.i64 + %158 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %161 = llvm.mul %153, %160 : !llvm.i64 + %162 = llvm.add %159, %161 : !llvm.i64 + %163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %164 = llvm.mul %157, %163 : !llvm.i64 + %165 = llvm.add %162, %164 : !llvm.i64 + %166 = llvm.getelementptr %158[%165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %167 = llvm.bitcast %166 : !llvm.ptr to !llvm.ptr> + %168 = llvm.load %167 {alignment = 4 : i64} : !llvm.ptr> + %169 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(16 : index) : !llvm.i64 + %172 = llvm.mul %67, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %67, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %168, %177 : !llvm.ptr> + %178 = llvm.add %157, %70 : !llvm.i64 + %179 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %182 = llvm.mul %153, %181 : !llvm.i64 + %183 = llvm.add %180, %182 : !llvm.i64 + %184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %185 = llvm.mul %178, %184 : !llvm.i64 + %186 = llvm.add %183, %185 : !llvm.i64 + %187 = llvm.getelementptr %179[%186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %188 = llvm.bitcast %187 : !llvm.ptr to !llvm.ptr> + %189 = llvm.load %188 {alignment = 4 : i64} : !llvm.ptr> + %190 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %192 = llvm.mlir.constant(16 : index) : !llvm.i64 + %193 = llvm.mul %67, %192 : !llvm.i64 + %194 = llvm.add %191, %193 : !llvm.i64 + %195 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %196 = llvm.mul %69, %195 : !llvm.i64 + %197 = llvm.add %194, %196 : !llvm.i64 + %198 = llvm.getelementptr %190[%197] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %189, %198 : !llvm.ptr> + %199 = llvm.add %157, %68 : !llvm.i64 + %200 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %203 = llvm.mul %153, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %199, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.bitcast %208 : !llvm.ptr to !llvm.ptr> + %210 = llvm.load %209 {alignment = 4 : i64} : !llvm.ptr> + %211 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %212 = llvm.mlir.constant(0 : index) : !llvm.i64 + %213 = llvm.mlir.constant(16 : index) : !llvm.i64 + %214 = llvm.mul %67, %213 : !llvm.i64 + %215 = llvm.add %212, %214 : !llvm.i64 + %216 = llvm.mlir.constant(1 : index) : !llvm.i64 + %217 = llvm.mul %63, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.getelementptr %211[%218] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %210, %219 : !llvm.ptr> + %220 = llvm.add %157, %41 : !llvm.i64 + %221 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %224 = llvm.mul %153, %223 : !llvm.i64 + %225 = llvm.add %222, %224 : !llvm.i64 + %226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %227 = llvm.mul %220, %226 : !llvm.i64 + %228 = llvm.add %225, %227 : !llvm.i64 + %229 = llvm.getelementptr %221[%228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %230 = llvm.bitcast %229 : !llvm.ptr to !llvm.ptr> + %231 = llvm.load %230 {alignment = 4 : i64} : !llvm.ptr> + %232 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %233 = llvm.mlir.constant(0 : index) : !llvm.i64 + %234 = llvm.mlir.constant(16 : index) : !llvm.i64 + %235 = llvm.mul %67, %234 : !llvm.i64 + %236 = llvm.add %233, %235 : !llvm.i64 + %237 = llvm.mlir.constant(1 : index) : !llvm.i64 + %238 = llvm.mul %45, %237 : !llvm.i64 + %239 = llvm.add %236, %238 : !llvm.i64 + %240 = llvm.getelementptr %232[%239] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %231, %240 : !llvm.ptr> + %241 = llvm.add %157, %42 : !llvm.i64 + %242 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %243 = llvm.mlir.constant(0 : index) : !llvm.i64 + %244 = llvm.mlir.constant(512 : index) : !llvm.i64 + %245 = llvm.mul %153, %244 : !llvm.i64 + %246 = llvm.add %243, %245 : !llvm.i64 + %247 = llvm.mlir.constant(1 : index) : !llvm.i64 + %248 = llvm.mul %241, %247 : !llvm.i64 + %249 = llvm.add %246, %248 : !llvm.i64 + %250 = llvm.getelementptr %242[%249] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %251 = llvm.bitcast %250 : !llvm.ptr to !llvm.ptr> + %252 = llvm.load %251 {alignment = 4 : i64} : !llvm.ptr> + %253 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = llvm.mlir.constant(16 : index) : !llvm.i64 + %256 = llvm.mul %67, %255 : !llvm.i64 + %257 = llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = 
llvm.mul %48, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %252, %261 : !llvm.ptr> + %262 = llvm.add %157, %43 : !llvm.i64 + %263 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %264 = llvm.mlir.constant(0 : index) : !llvm.i64 + %265 = llvm.mlir.constant(512 : index) : !llvm.i64 + %266 = llvm.mul %153, %265 : !llvm.i64 + %267 = llvm.add %264, %266 : !llvm.i64 + %268 = llvm.mlir.constant(1 : index) : !llvm.i64 + %269 = llvm.mul %262, %268 : !llvm.i64 + %270 = llvm.add %267, %269 : !llvm.i64 + %271 = llvm.getelementptr %263[%270] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %272 = llvm.bitcast %271 : !llvm.ptr to !llvm.ptr> + %273 = llvm.load %272 {alignment = 4 : i64} : !llvm.ptr> + %274 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %275 = llvm.mlir.constant(0 : index) : !llvm.i64 + %276 = llvm.mlir.constant(16 : index) : !llvm.i64 + %277 = llvm.mul %67, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.mlir.constant(1 : index) : !llvm.i64 + %280 = llvm.mul %52, %279 : !llvm.i64 + %281 = llvm.add %278, %280 : !llvm.i64 + %282 = llvm.getelementptr %274[%281] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %273, %282 : !llvm.ptr> + %283 = llvm.add %157, %44 : !llvm.i64 + %284 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %153, %286 : !llvm.i64 + %288 = llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %283, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.bitcast %292 : !llvm.ptr to !llvm.ptr> + %294 = llvm.load %293 {alignment = 4 : i64} : !llvm.ptr> + %295 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %296 = llvm.mlir.constant(0 : index) : !llvm.i64 + %297 = llvm.mlir.constant(16 : index) : !llvm.i64 + %298 = llvm.mul %67, %297 : !llvm.i64 + %299 = llvm.add %296, %298 : !llvm.i64 + %300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %301 = llvm.mul %56, %300 : !llvm.i64 + %302 = llvm.add %299, %301 : !llvm.i64 + %303 = llvm.getelementptr %295[%302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %294, %303 : !llvm.ptr> + %304 = llvm.add %157, %46 : !llvm.i64 + %305 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %306 = llvm.mlir.constant(0 : index) : !llvm.i64 + %307 = llvm.mlir.constant(512 : index) : !llvm.i64 + %308 = llvm.mul %153, %307 : !llvm.i64 + %309 = llvm.add %306, %308 : !llvm.i64 + %310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %311 = llvm.mul %304, %310 : !llvm.i64 + %312 = llvm.add %309, %311 : !llvm.i64 + %313 = llvm.getelementptr %305[%312] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %314 = llvm.bitcast %313 : !llvm.ptr to !llvm.ptr> + %315 = llvm.load %314 {alignment = 4 : i64} : !llvm.ptr> + %316 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %318 = llvm.mlir.constant(16 : index) : !llvm.i64 + %319 = llvm.mul %67, %318 : !llvm.i64 + %320 = llvm.add %317, %319 : !llvm.i64 + %321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %322 = llvm.mul %61, %321 : 
!llvm.i64 + %323 = llvm.add %320, %322 : !llvm.i64 + %324 = llvm.getelementptr %316[%323] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %315, %324 : !llvm.ptr> + %325 = llvm.add %157, %47 : !llvm.i64 + %326 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %328 = llvm.mlir.constant(512 : index) : !llvm.i64 + %329 = llvm.mul %153, %328 : !llvm.i64 + %330 = llvm.add %327, %329 : !llvm.i64 + %331 = llvm.mlir.constant(1 : index) : !llvm.i64 + %332 = llvm.mul %325, %331 : !llvm.i64 + %333 = llvm.add %330, %332 : !llvm.i64 + %334 = llvm.getelementptr %326[%333] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %335 = llvm.bitcast %334 : !llvm.ptr to !llvm.ptr> + %336 = llvm.load %335 {alignment = 4 : i64} : !llvm.ptr> + %337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %340 = llvm.mul %67, %339 : !llvm.i64 + %341 = llvm.add %338, %340 : !llvm.i64 + %342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %343 = llvm.mul %70, %342 : !llvm.i64 + %344 = llvm.add %341, %343 : !llvm.i64 + %345 = llvm.getelementptr %337[%344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %336, %345 : !llvm.ptr> + %346 = llvm.add %157, %49 : !llvm.i64 + %347 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %350 = llvm.mul %153, %349 : !llvm.i64 + %351 = llvm.add %348, %350 : !llvm.i64 + %352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %353 = llvm.mul %346, %352 : !llvm.i64 + %354 = llvm.add %351, %353 : !llvm.i64 + %355 = llvm.getelementptr %347[%354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %356 = llvm.bitcast %355 : !llvm.ptr to !llvm.ptr> + %357 = llvm.load %356 {alignment = 4 : i64} : !llvm.ptr> + %358 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %360 = llvm.mlir.constant(16 : index) : !llvm.i64 + %361 = llvm.mul %67, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %364 = llvm.mul %50, %363 : !llvm.i64 + %365 = llvm.add %362, %364 : !llvm.i64 + %366 = llvm.getelementptr %358[%365] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %357, %366 : !llvm.ptr> + %367 = llvm.add %157, %51 : !llvm.i64 + %368 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %371 = llvm.mul %153, %370 : !llvm.i64 + %372 = llvm.add %369, %371 : !llvm.i64 + %373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %374 = llvm.mul %367, %373 : !llvm.i64 + %375 = llvm.add %372, %374 : !llvm.i64 + %376 = llvm.getelementptr %368[%375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %377 = llvm.bitcast %376 : !llvm.ptr to !llvm.ptr> + %378 = llvm.load %377 {alignment = 4 : i64} : !llvm.ptr> + %379 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %381 = llvm.mlir.constant(16 : index) : !llvm.i64 + %382 = llvm.mul %67, %381 : !llvm.i64 + %383 = llvm.add %380, %382 : !llvm.i64 + %384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %385 = llvm.mul %33, %384 : !llvm.i64 + %386 = 
llvm.add %383, %385 : !llvm.i64 + %387 = llvm.getelementptr %379[%386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %378, %387 : !llvm.ptr> + %388 = llvm.add %157, %53 : !llvm.i64 + %389 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %392 = llvm.mul %153, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %395 = llvm.mul %388, %394 : !llvm.i64 + %396 = llvm.add %393, %395 : !llvm.i64 + %397 = llvm.getelementptr %389[%396] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %398 = llvm.bitcast %397 : !llvm.ptr to !llvm.ptr> + %399 = llvm.load %398 {alignment = 4 : i64} : !llvm.ptr> + %400 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %403 = llvm.mul %67, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %406 = llvm.mul %54, %405 : !llvm.i64 + %407 = llvm.add %404, %406 : !llvm.i64 + %408 = llvm.getelementptr %400[%407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %399, %408 : !llvm.ptr> + %409 = llvm.add %157, %55 : !llvm.i64 + %410 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %413 = llvm.mul %153, %412 : !llvm.i64 + %414 = llvm.add %411, %413 : !llvm.i64 + %415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %416 = llvm.mul %409, %415 : !llvm.i64 + %417 = llvm.add %414, %416 : !llvm.i64 + %418 = llvm.getelementptr %410[%417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %419 = llvm.bitcast %418 : !llvm.ptr to !llvm.ptr> + %420 = llvm.load %419 {alignment = 4 : i64} : !llvm.ptr> + %421 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %423 = llvm.mlir.constant(16 : index) : !llvm.i64 + %424 = llvm.mul %67, %423 : !llvm.i64 + %425 = llvm.add %422, %424 : !llvm.i64 + %426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %427 = llvm.mul %34, %426 : !llvm.i64 + %428 = llvm.add %425, %427 : !llvm.i64 + %429 = llvm.getelementptr %421[%428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %420, %429 : !llvm.ptr> + %430 = llvm.add %157, %57 : !llvm.i64 + %431 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %434 = llvm.mul %153, %433 : !llvm.i64 + %435 = llvm.add %432, %434 : !llvm.i64 + %436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %437 = llvm.mul %430, %436 : !llvm.i64 + %438 = llvm.add %435, %437 : !llvm.i64 + %439 = llvm.getelementptr %431[%438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %440 = llvm.bitcast %439 : !llvm.ptr to !llvm.ptr> + %441 = llvm.load %440 {alignment = 4 : i64} : !llvm.ptr> + %442 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %444 = llvm.mlir.constant(16 : index) : !llvm.i64 + %445 = llvm.mul %67, %444 : !llvm.i64 + %446 = llvm.add %443, %445 : !llvm.i64 + %447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %448 = llvm.mul %58, %447 : !llvm.i64 + %449 = llvm.add %446, %448 
: !llvm.i64 + %450 = llvm.getelementptr %442[%449] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %441, %450 : !llvm.ptr> + %451 = llvm.add %157, %59 : !llvm.i64 + %452 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %455 = llvm.mul %153, %454 : !llvm.i64 + %456 = llvm.add %453, %455 : !llvm.i64 + %457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %458 = llvm.mul %451, %457 : !llvm.i64 + %459 = llvm.add %456, %458 : !llvm.i64 + %460 = llvm.getelementptr %452[%459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %461 = llvm.bitcast %460 : !llvm.ptr to !llvm.ptr> + %462 = llvm.load %461 {alignment = 4 : i64} : !llvm.ptr> + %463 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %464 = llvm.mlir.constant(0 : index) : !llvm.i64 + %465 = llvm.mlir.constant(16 : index) : !llvm.i64 + %466 = llvm.mul %67, %465 : !llvm.i64 + %467 = llvm.add %464, %466 : !llvm.i64 + %468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %469 = llvm.mul %35, %468 : !llvm.i64 + %470 = llvm.add %467, %469 : !llvm.i64 + %471 = llvm.getelementptr %463[%470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %462, %471 : !llvm.ptr> + %472 = llvm.add %157, %62 : !llvm.i64 + %473 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %474 = llvm.mlir.constant(0 : index) : !llvm.i64 + %475 = llvm.mlir.constant(512 : index) : !llvm.i64 + %476 = llvm.mul %153, %475 : !llvm.i64 + %477 = llvm.add %474, %476 : !llvm.i64 + %478 = llvm.mlir.constant(1 : index) : !llvm.i64 + %479 = llvm.mul %472, %478 : !llvm.i64 + %480 = llvm.add %477, %479 : !llvm.i64 + %481 = llvm.getelementptr %473[%480] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %482 = llvm.bitcast %481 : !llvm.ptr to !llvm.ptr> + %483 = llvm.load %482 {alignment = 4 : i64} : !llvm.ptr> + %484 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %485 = llvm.mlir.constant(0 : index) : !llvm.i64 + %486 = llvm.mlir.constant(16 : index) : !llvm.i64 + %487 = llvm.mul %67, %486 : !llvm.i64 + %488 = llvm.add %485, %487 : !llvm.i64 + %489 = llvm.mlir.constant(1 : index) : !llvm.i64 + %490 = llvm.mul %66, %489 : !llvm.i64 + %491 = llvm.add %488, %490 : !llvm.i64 + %492 = llvm.getelementptr %484[%491] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %483, %492 : !llvm.ptr> + %493 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %494 = llvm.mlir.constant(0 : index) : !llvm.i64 + %495 = llvm.mlir.constant(16 : index) : !llvm.i64 + %496 = llvm.mul %67, %495 : !llvm.i64 + %497 = llvm.add %494, %496 : !llvm.i64 + %498 = llvm.mlir.constant(1 : index) : !llvm.i64 + %499 = llvm.mul %67, %498 : !llvm.i64 + %500 = llvm.add %497, %499 : !llvm.i64 + %501 = llvm.getelementptr %493[%500] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %502 = llvm.load %501 : !llvm.ptr> + %503 = llvm.icmp "slt" %155, %67 : !llvm.i64 + %504 = llvm.sub %64, %155 : !llvm.i64 + %505 = llvm.select %503, %504, %155 : !llvm.i1, !llvm.i64 + %506 = llvm.sdiv %505, %68 : !llvm.i64 + %507 = llvm.sub %64, %506 : !llvm.i64 + %508 = llvm.select %503, %507, %506 : !llvm.i1, !llvm.i64 + %509 = llvm.srem %508, %68 : !llvm.i64 + %510 = llvm.icmp "slt" %509, %67 : !llvm.i64 + %511 = llvm.add %509, %68 : !llvm.i64 + %512 = llvm.select %510, %511, %509 : !llvm.i1, !llvm.i64 + %513 = llvm.srem %153, %39 : !llvm.i64 + %514 = llvm.icmp 
"slt" %513, %67 : !llvm.i64 + %515 = llvm.add %513, %39 : !llvm.i64 + %516 = llvm.select %514, %515, %513 : !llvm.i1, !llvm.i64 + %517 = llvm.srem %155, %68 : !llvm.i64 + %518 = llvm.icmp "slt" %517, %67 : !llvm.i64 + %519 = llvm.add %517, %68 : !llvm.i64 + %520 = llvm.select %518, %519, %517 : !llvm.i1, !llvm.i64 + %521 = llvm.icmp "slt" %520, %67 : !llvm.i64 + %522 = llvm.sub %64, %520 : !llvm.i64 + %523 = llvm.select %521, %522, %520 : !llvm.i1, !llvm.i64 + %524 = llvm.sdiv %523, %70 : !llvm.i64 + %525 = llvm.sub %64, %524 : !llvm.i64 + %526 = llvm.select %521, %525, %524 : !llvm.i1, !llvm.i64 + %527 = llvm.srem %526, %63 : !llvm.i64 + %528 = llvm.icmp "slt" %527, %67 : !llvm.i64 + %529 = llvm.add %527, %63 : !llvm.i64 + %530 = llvm.select %528, %529, %527 : !llvm.i1, !llvm.i64 + %531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %534 = llvm.mul %512, %533 : !llvm.i64 + %535 = llvm.add %532, %534 : !llvm.i64 + %536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %537 = llvm.mul %516, %536 : !llvm.i64 + %538 = llvm.add %535, %537 : !llvm.i64 + %539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %540 = llvm.mul %530, %539 : !llvm.i64 + %541 = llvm.add %538, %540 : !llvm.i64 + %542 = llvm.getelementptr %531[%541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %502, %542 : !llvm.ptr> + %543 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %544 = llvm.mlir.constant(0 : index) : !llvm.i64 + %545 = llvm.mlir.constant(16 : index) : !llvm.i64 + %546 = llvm.mul %67, %545 : !llvm.i64 + %547 = llvm.add %544, %546 : !llvm.i64 + %548 = llvm.mlir.constant(1 : index) : !llvm.i64 + %549 = llvm.mul %69, %548 : !llvm.i64 + %550 = llvm.add %547, %549 : !llvm.i64 + %551 = llvm.getelementptr %543[%550] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %552 = llvm.load %551 : !llvm.ptr> + %553 = llvm.add %155, %70 : !llvm.i64 + %554 = llvm.icmp "slt" %553, %67 : !llvm.i64 + %555 = llvm.sub %64, %553 : !llvm.i64 + %556 = llvm.select %554, %555, %553 : !llvm.i1, !llvm.i64 + %557 = llvm.sdiv %556, %68 : !llvm.i64 + %558 = llvm.sub %64, %557 : !llvm.i64 + %559 = llvm.select %554, %558, %557 : !llvm.i1, !llvm.i64 + %560 = llvm.srem %559, %68 : !llvm.i64 + %561 = llvm.icmp "slt" %560, %67 : !llvm.i64 + %562 = llvm.add %560, %68 : !llvm.i64 + %563 = llvm.select %561, %562, %560 : !llvm.i1, !llvm.i64 + %564 = llvm.sdiv %505, %70 : !llvm.i64 + %565 = llvm.sub %64, %564 : !llvm.i64 + %566 = llvm.select %503, %565, %564 : !llvm.i1, !llvm.i64 + %567 = llvm.mul %559, %65 : !llvm.i64 + %568 = llvm.add %566, %567 : !llvm.i64 + %569 = llvm.add %568, %69 : !llvm.i64 + %570 = llvm.icmp "slt" %569, %67 : !llvm.i64 + %571 = llvm.sub %64, %569 : !llvm.i64 + %572 = llvm.select %570, %571, %569 : !llvm.i1, !llvm.i64 + %573 = llvm.sdiv %572, %63 : !llvm.i64 + %574 = llvm.sub %64, %573 : !llvm.i64 + %575 = llvm.select %570, %574, %573 : !llvm.i1, !llvm.i64 + %576 = llvm.mul %575, %65 : !llvm.i64 + %577 = llvm.add %568, %576 : !llvm.i64 + %578 = llvm.add %577, %69 : !llvm.i64 + %579 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %581 = llvm.mlir.constant(256 : index) : !llvm.i64 + %582 = llvm.mul %563, %581 : !llvm.i64 + %583 = llvm.add %580, %582 : !llvm.i64 + %584 = llvm.mlir.constant(2 : index) : !llvm.i64 + %585 = llvm.mul %516, %584 : 
!llvm.i64 + %586 = llvm.add %583, %585 : !llvm.i64 + %587 = llvm.mlir.constant(1 : index) : !llvm.i64 + %588 = llvm.mul %578, %587 : !llvm.i64 + %589 = llvm.add %586, %588 : !llvm.i64 + %590 = llvm.getelementptr %579[%589] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %552, %590 : !llvm.ptr> + %591 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %593 = llvm.mlir.constant(16 : index) : !llvm.i64 + %594 = llvm.mul %67, %593 : !llvm.i64 + %595 = llvm.add %592, %594 : !llvm.i64 + %596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %597 = llvm.mul %63, %596 : !llvm.i64 + %598 = llvm.add %595, %597 : !llvm.i64 + %599 = llvm.getelementptr %591[%598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %600 = llvm.load %599 : !llvm.ptr> + %601 = llvm.add %508, %69 : !llvm.i64 + %602 = llvm.icmp "slt" %601, %67 : !llvm.i64 + %603 = llvm.sub %64, %601 : !llvm.i64 + %604 = llvm.select %602, %603, %601 : !llvm.i1, !llvm.i64 + %605 = llvm.sdiv %604, %68 : !llvm.i64 + %606 = llvm.sub %64, %605 : !llvm.i64 + %607 = llvm.select %602, %606, %605 : !llvm.i1, !llvm.i64 + %608 = llvm.mul %607, %60 : !llvm.i64 + %609 = llvm.add %508, %608 : !llvm.i64 + %610 = llvm.add %609, %69 : !llvm.i64 + %611 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %612 = llvm.mlir.constant(0 : index) : !llvm.i64 + %613 = llvm.mlir.constant(256 : index) : !llvm.i64 + %614 = llvm.mul %610, %613 : !llvm.i64 + %615 = llvm.add %612, %614 : !llvm.i64 + %616 = llvm.mlir.constant(2 : index) : !llvm.i64 + %617 = llvm.mul %516, %616 : !llvm.i64 + %618 = llvm.add %615, %617 : !llvm.i64 + %619 = llvm.mlir.constant(1 : index) : !llvm.i64 + %620 = llvm.mul %530, %619 : !llvm.i64 + %621 = llvm.add %618, %620 : !llvm.i64 + %622 = llvm.getelementptr %611[%621] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %600, %622 : !llvm.ptr> + %623 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %624 = llvm.mlir.constant(0 : index) : !llvm.i64 + %625 = llvm.mlir.constant(16 : index) : !llvm.i64 + %626 = llvm.mul %67, %625 : !llvm.i64 + %627 = llvm.add %624, %626 : !llvm.i64 + %628 = llvm.mlir.constant(1 : index) : !llvm.i64 + %629 = llvm.mul %45, %628 : !llvm.i64 + %630 = llvm.add %627, %629 : !llvm.i64 + %631 = llvm.getelementptr %623[%630] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %632 = llvm.load %631 : !llvm.ptr> + %633 = llvm.add %155, %41 : !llvm.i64 + %634 = llvm.icmp "slt" %633, %67 : !llvm.i64 + %635 = llvm.sub %64, %633 : !llvm.i64 + %636 = llvm.select %634, %635, %633 : !llvm.i1, !llvm.i64 + %637 = llvm.sdiv %636, %68 : !llvm.i64 + %638 = llvm.sub %64, %637 : !llvm.i64 + %639 = llvm.select %634, %638, %637 : !llvm.i1, !llvm.i64 + %640 = llvm.srem %639, %68 : !llvm.i64 + %641 = llvm.icmp "slt" %640, %67 : !llvm.i64 + %642 = llvm.add %640, %68 : !llvm.i64 + %643 = llvm.select %641, %642, %640 : !llvm.i1, !llvm.i64 + %644 = llvm.mul %639, %65 : !llvm.i64 + %645 = llvm.add %566, %644 : !llvm.i64 + %646 = llvm.add %645, %45 : !llvm.i64 + %647 = llvm.icmp "slt" %646, %67 : !llvm.i64 + %648 = llvm.sub %64, %646 : !llvm.i64 + %649 = llvm.select %647, %648, %646 : !llvm.i1, !llvm.i64 + %650 = llvm.sdiv %649, %63 : !llvm.i64 + %651 = llvm.sub %64, %650 : !llvm.i64 + %652 = llvm.select %647, %651, %650 : !llvm.i1, !llvm.i64 + %653 = llvm.mul %652, %65 : !llvm.i64 + %654 = llvm.add %645, %653 : !llvm.i64 + %655 = llvm.add %654, %45 : !llvm.i64 + %656 = llvm.extractvalue 
%150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %658 = llvm.mlir.constant(256 : index) : !llvm.i64 + %659 = llvm.mul %643, %658 : !llvm.i64 + %660 = llvm.add %657, %659 : !llvm.i64 + %661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %662 = llvm.mul %516, %661 : !llvm.i64 + %663 = llvm.add %660, %662 : !llvm.i64 + %664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %665 = llvm.mul %655, %664 : !llvm.i64 + %666 = llvm.add %663, %665 : !llvm.i64 + %667 = llvm.getelementptr %656[%666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %632, %667 : !llvm.ptr> + %668 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %670 = llvm.mlir.constant(16 : index) : !llvm.i64 + %671 = llvm.mul %67, %670 : !llvm.i64 + %672 = llvm.add %669, %671 : !llvm.i64 + %673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %674 = llvm.mul %48, %673 : !llvm.i64 + %675 = llvm.add %672, %674 : !llvm.i64 + %676 = llvm.getelementptr %668[%675] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %677 = llvm.load %676 : !llvm.ptr> + %678 = llvm.add %508, %63 : !llvm.i64 + %679 = llvm.icmp "slt" %678, %67 : !llvm.i64 + %680 = llvm.sub %64, %678 : !llvm.i64 + %681 = llvm.select %679, %680, %678 : !llvm.i1, !llvm.i64 + %682 = llvm.sdiv %681, %68 : !llvm.i64 + %683 = llvm.sub %64, %682 : !llvm.i64 + %684 = llvm.select %679, %683, %682 : !llvm.i1, !llvm.i64 + %685 = llvm.mul %684, %60 : !llvm.i64 + %686 = llvm.add %508, %685 : !llvm.i64 + %687 = llvm.add %686, %63 : !llvm.i64 + %688 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %690 = llvm.mlir.constant(256 : index) : !llvm.i64 + %691 = llvm.mul %687, %690 : !llvm.i64 + %692 = llvm.add %689, %691 : !llvm.i64 + %693 = llvm.mlir.constant(2 : index) : !llvm.i64 + %694 = llvm.mul %516, %693 : !llvm.i64 + %695 = llvm.add %692, %694 : !llvm.i64 + %696 = llvm.mlir.constant(1 : index) : !llvm.i64 + %697 = llvm.mul %530, %696 : !llvm.i64 + %698 = llvm.add %695, %697 : !llvm.i64 + %699 = llvm.getelementptr %688[%698] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %677, %699 : !llvm.ptr> + %700 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %701 = llvm.mlir.constant(0 : index) : !llvm.i64 + %702 = llvm.mlir.constant(16 : index) : !llvm.i64 + %703 = llvm.mul %67, %702 : !llvm.i64 + %704 = llvm.add %701, %703 : !llvm.i64 + %705 = llvm.mlir.constant(1 : index) : !llvm.i64 + %706 = llvm.mul %52, %705 : !llvm.i64 + %707 = llvm.add %704, %706 : !llvm.i64 + %708 = llvm.getelementptr %700[%707] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %709 = llvm.load %708 : !llvm.ptr> + %710 = llvm.add %155, %43 : !llvm.i64 + %711 = llvm.icmp "slt" %710, %67 : !llvm.i64 + %712 = llvm.sub %64, %710 : !llvm.i64 + %713 = llvm.select %711, %712, %710 : !llvm.i1, !llvm.i64 + %714 = llvm.sdiv %713, %68 : !llvm.i64 + %715 = llvm.sub %64, %714 : !llvm.i64 + %716 = llvm.select %711, %715, %714 : !llvm.i1, !llvm.i64 + %717 = llvm.srem %716, %68 : !llvm.i64 + %718 = llvm.icmp "slt" %717, %67 : !llvm.i64 + %719 = llvm.add %717, %68 : !llvm.i64 + %720 = llvm.select %718, %719, %717 : !llvm.i1, !llvm.i64 + %721 = llvm.mul %716, %65 : !llvm.i64 + %722 = llvm.add %566, %721 : !llvm.i64 + %723 = llvm.add %722, %52 : !llvm.i64 + %724 = llvm.icmp "slt" %723, %67 : !llvm.i64 + %725 = llvm.sub %64, %723 : !llvm.i64 + %726 
= llvm.select %724, %725, %723 : !llvm.i1, !llvm.i64 + %727 = llvm.sdiv %726, %63 : !llvm.i64 + %728 = llvm.sub %64, %727 : !llvm.i64 + %729 = llvm.select %724, %728, %727 : !llvm.i1, !llvm.i64 + %730 = llvm.mul %729, %65 : !llvm.i64 + %731 = llvm.add %722, %730 : !llvm.i64 + %732 = llvm.add %731, %52 : !llvm.i64 + %733 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %734 = llvm.mlir.constant(0 : index) : !llvm.i64 + %735 = llvm.mlir.constant(256 : index) : !llvm.i64 + %736 = llvm.mul %720, %735 : !llvm.i64 + %737 = llvm.add %734, %736 : !llvm.i64 + %738 = llvm.mlir.constant(2 : index) : !llvm.i64 + %739 = llvm.mul %516, %738 : !llvm.i64 + %740 = llvm.add %737, %739 : !llvm.i64 + %741 = llvm.mlir.constant(1 : index) : !llvm.i64 + %742 = llvm.mul %732, %741 : !llvm.i64 + %743 = llvm.add %740, %742 : !llvm.i64 + %744 = llvm.getelementptr %733[%743] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %709, %744 : !llvm.ptr> + %745 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %746 = llvm.mlir.constant(0 : index) : !llvm.i64 + %747 = llvm.mlir.constant(16 : index) : !llvm.i64 + %748 = llvm.mul %67, %747 : !llvm.i64 + %749 = llvm.add %746, %748 : !llvm.i64 + %750 = llvm.mlir.constant(1 : index) : !llvm.i64 + %751 = llvm.mul %56, %750 : !llvm.i64 + %752 = llvm.add %749, %751 : !llvm.i64 + %753 = llvm.getelementptr %745[%752] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %754 = llvm.load %753 : !llvm.ptr> + %755 = llvm.add %508, %45 : !llvm.i64 + %756 = llvm.icmp "slt" %755, %67 : !llvm.i64 + %757 = llvm.sub %64, %755 : !llvm.i64 + %758 = llvm.select %756, %757, %755 : !llvm.i1, !llvm.i64 + %759 = llvm.sdiv %758, %68 : !llvm.i64 + %760 = llvm.sub %64, %759 : !llvm.i64 + %761 = llvm.select %756, %760, %759 : !llvm.i1, !llvm.i64 + %762 = llvm.mul %761, %60 : !llvm.i64 + %763 = llvm.add %508, %762 : !llvm.i64 + %764 = llvm.add %763, %45 : !llvm.i64 + %765 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %766 = llvm.mlir.constant(0 : index) : !llvm.i64 + %767 = llvm.mlir.constant(256 : index) : !llvm.i64 + %768 = llvm.mul %764, %767 : !llvm.i64 + %769 = llvm.add %766, %768 : !llvm.i64 + %770 = llvm.mlir.constant(2 : index) : !llvm.i64 + %771 = llvm.mul %516, %770 : !llvm.i64 + %772 = llvm.add %769, %771 : !llvm.i64 + %773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %774 = llvm.mul %530, %773 : !llvm.i64 + %775 = llvm.add %772, %774 : !llvm.i64 + %776 = llvm.getelementptr %765[%775] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %754, %776 : !llvm.ptr> + %777 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %779 = llvm.mlir.constant(16 : index) : !llvm.i64 + %780 = llvm.mul %67, %779 : !llvm.i64 + %781 = llvm.add %778, %780 : !llvm.i64 + %782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %783 = llvm.mul %61, %782 : !llvm.i64 + %784 = llvm.add %781, %783 : !llvm.i64 + %785 = llvm.getelementptr %777[%784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %786 = llvm.load %785 : !llvm.ptr> + %787 = llvm.add %155, %46 : !llvm.i64 + %788 = llvm.icmp "slt" %787, %67 : !llvm.i64 + %789 = llvm.sub %64, %787 : !llvm.i64 + %790 = llvm.select %788, %789, %787 : !llvm.i1, !llvm.i64 + %791 = llvm.sdiv %790, %68 : !llvm.i64 + %792 = llvm.sub %64, %791 : !llvm.i64 + %793 = llvm.select %788, %792, %791 : !llvm.i1, !llvm.i64 + %794 = llvm.srem %793, %68 : !llvm.i64 + %795 = llvm.icmp 
"slt" %794, %67 : !llvm.i64 + %796 = llvm.add %794, %68 : !llvm.i64 + %797 = llvm.select %795, %796, %794 : !llvm.i1, !llvm.i64 + %798 = llvm.mul %793, %65 : !llvm.i64 + %799 = llvm.add %566, %798 : !llvm.i64 + %800 = llvm.add %799, %61 : !llvm.i64 + %801 = llvm.icmp "slt" %800, %67 : !llvm.i64 + %802 = llvm.sub %64, %800 : !llvm.i64 + %803 = llvm.select %801, %802, %800 : !llvm.i1, !llvm.i64 + %804 = llvm.sdiv %803, %63 : !llvm.i64 + %805 = llvm.sub %64, %804 : !llvm.i64 + %806 = llvm.select %801, %805, %804 : !llvm.i1, !llvm.i64 + %807 = llvm.mul %806, %65 : !llvm.i64 + %808 = llvm.add %799, %807 : !llvm.i64 + %809 = llvm.add %808, %61 : !llvm.i64 + %810 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %811 = llvm.mlir.constant(0 : index) : !llvm.i64 + %812 = llvm.mlir.constant(256 : index) : !llvm.i64 + %813 = llvm.mul %797, %812 : !llvm.i64 + %814 = llvm.add %811, %813 : !llvm.i64 + %815 = llvm.mlir.constant(2 : index) : !llvm.i64 + %816 = llvm.mul %516, %815 : !llvm.i64 + %817 = llvm.add %814, %816 : !llvm.i64 + %818 = llvm.mlir.constant(1 : index) : !llvm.i64 + %819 = llvm.mul %809, %818 : !llvm.i64 + %820 = llvm.add %817, %819 : !llvm.i64 + %821 = llvm.getelementptr %810[%820] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %786, %821 : !llvm.ptr> + %822 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %823 = llvm.mlir.constant(0 : index) : !llvm.i64 + %824 = llvm.mlir.constant(16 : index) : !llvm.i64 + %825 = llvm.mul %67, %824 : !llvm.i64 + %826 = llvm.add %823, %825 : !llvm.i64 + %827 = llvm.mlir.constant(1 : index) : !llvm.i64 + %828 = llvm.mul %70, %827 : !llvm.i64 + %829 = llvm.add %826, %828 : !llvm.i64 + %830 = llvm.getelementptr %822[%829] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %831 = llvm.load %830 : !llvm.ptr> + %832 = llvm.add %508, %48 : !llvm.i64 + %833 = llvm.icmp "slt" %832, %67 : !llvm.i64 + %834 = llvm.sub %64, %832 : !llvm.i64 + %835 = llvm.select %833, %834, %832 : !llvm.i1, !llvm.i64 + %836 = llvm.sdiv %835, %68 : !llvm.i64 + %837 = llvm.sub %64, %836 : !llvm.i64 + %838 = llvm.select %833, %837, %836 : !llvm.i1, !llvm.i64 + %839 = llvm.mul %838, %60 : !llvm.i64 + %840 = llvm.add %508, %839 : !llvm.i64 + %841 = llvm.add %840, %48 : !llvm.i64 + %842 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %843 = llvm.mlir.constant(0 : index) : !llvm.i64 + %844 = llvm.mlir.constant(256 : index) : !llvm.i64 + %845 = llvm.mul %841, %844 : !llvm.i64 + %846 = llvm.add %843, %845 : !llvm.i64 + %847 = llvm.mlir.constant(2 : index) : !llvm.i64 + %848 = llvm.mul %516, %847 : !llvm.i64 + %849 = llvm.add %846, %848 : !llvm.i64 + %850 = llvm.mlir.constant(1 : index) : !llvm.i64 + %851 = llvm.mul %530, %850 : !llvm.i64 + %852 = llvm.add %849, %851 : !llvm.i64 + %853 = llvm.getelementptr %842[%852] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %831, %853 : !llvm.ptr> + %854 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %855 = llvm.mlir.constant(0 : index) : !llvm.i64 + %856 = llvm.mlir.constant(16 : index) : !llvm.i64 + %857 = llvm.mul %67, %856 : !llvm.i64 + %858 = llvm.add %855, %857 : !llvm.i64 + %859 = llvm.mlir.constant(1 : index) : !llvm.i64 + %860 = llvm.mul %50, %859 : !llvm.i64 + %861 = llvm.add %858, %860 : !llvm.i64 + %862 = llvm.getelementptr %854[%861] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %863 = llvm.load %862 : !llvm.ptr> + %864 = llvm.add %155, %49 : !llvm.i64 + 
%865 = llvm.icmp "slt" %864, %67 : !llvm.i64 + %866 = llvm.sub %64, %864 : !llvm.i64 + %867 = llvm.select %865, %866, %864 : !llvm.i1, !llvm.i64 + %868 = llvm.sdiv %867, %68 : !llvm.i64 + %869 = llvm.sub %64, %868 : !llvm.i64 + %870 = llvm.select %865, %869, %868 : !llvm.i1, !llvm.i64 + %871 = llvm.srem %870, %68 : !llvm.i64 + %872 = llvm.icmp "slt" %871, %67 : !llvm.i64 + %873 = llvm.add %871, %68 : !llvm.i64 + %874 = llvm.select %872, %873, %871 : !llvm.i1, !llvm.i64 + %875 = llvm.mul %870, %65 : !llvm.i64 + %876 = llvm.add %566, %875 : !llvm.i64 + %877 = llvm.add %876, %50 : !llvm.i64 + %878 = llvm.icmp "slt" %877, %67 : !llvm.i64 + %879 = llvm.sub %64, %877 : !llvm.i64 + %880 = llvm.select %878, %879, %877 : !llvm.i1, !llvm.i64 + %881 = llvm.sdiv %880, %63 : !llvm.i64 + %882 = llvm.sub %64, %881 : !llvm.i64 + %883 = llvm.select %878, %882, %881 : !llvm.i1, !llvm.i64 + %884 = llvm.mul %883, %65 : !llvm.i64 + %885 = llvm.add %876, %884 : !llvm.i64 + %886 = llvm.add %885, %50 : !llvm.i64 + %887 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %888 = llvm.mlir.constant(0 : index) : !llvm.i64 + %889 = llvm.mlir.constant(256 : index) : !llvm.i64 + %890 = llvm.mul %874, %889 : !llvm.i64 + %891 = llvm.add %888, %890 : !llvm.i64 + %892 = llvm.mlir.constant(2 : index) : !llvm.i64 + %893 = llvm.mul %516, %892 : !llvm.i64 + %894 = llvm.add %891, %893 : !llvm.i64 + %895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %896 = llvm.mul %886, %895 : !llvm.i64 + %897 = llvm.add %894, %896 : !llvm.i64 + %898 = llvm.getelementptr %887[%897] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %863, %898 : !llvm.ptr> + %899 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %901 = llvm.mlir.constant(16 : index) : !llvm.i64 + %902 = llvm.mul %67, %901 : !llvm.i64 + %903 = llvm.add %900, %902 : !llvm.i64 + %904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %905 = llvm.mul %33, %904 : !llvm.i64 + %906 = llvm.add %903, %905 : !llvm.i64 + %907 = llvm.getelementptr %899[%906] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %908 = llvm.load %907 : !llvm.ptr> + %909 = llvm.add %508, %52 : !llvm.i64 + %910 = llvm.icmp "slt" %909, %67 : !llvm.i64 + %911 = llvm.sub %64, %909 : !llvm.i64 + %912 = llvm.select %910, %911, %909 : !llvm.i1, !llvm.i64 + %913 = llvm.sdiv %912, %68 : !llvm.i64 + %914 = llvm.sub %64, %913 : !llvm.i64 + %915 = llvm.select %910, %914, %913 : !llvm.i1, !llvm.i64 + %916 = llvm.mul %915, %60 : !llvm.i64 + %917 = llvm.add %508, %916 : !llvm.i64 + %918 = llvm.add %917, %52 : !llvm.i64 + %919 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %921 = llvm.mlir.constant(256 : index) : !llvm.i64 + %922 = llvm.mul %918, %921 : !llvm.i64 + %923 = llvm.add %920, %922 : !llvm.i64 + %924 = llvm.mlir.constant(2 : index) : !llvm.i64 + %925 = llvm.mul %516, %924 : !llvm.i64 + %926 = llvm.add %923, %925 : !llvm.i64 + %927 = llvm.mlir.constant(1 : index) : !llvm.i64 + %928 = llvm.mul %530, %927 : !llvm.i64 + %929 = llvm.add %926, %928 : !llvm.i64 + %930 = llvm.getelementptr %919[%929] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %908, %930 : !llvm.ptr> + %931 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %932 = llvm.mlir.constant(0 : index) : !llvm.i64 + %933 = llvm.mlir.constant(16 : index) : !llvm.i64 + %934 = llvm.mul %67, %933 
: !llvm.i64 + %935 = llvm.add %932, %934 : !llvm.i64 + %936 = llvm.mlir.constant(1 : index) : !llvm.i64 + %937 = llvm.mul %54, %936 : !llvm.i64 + %938 = llvm.add %935, %937 : !llvm.i64 + %939 = llvm.getelementptr %931[%938] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %940 = llvm.load %939 : !llvm.ptr> + %941 = llvm.add %155, %53 : !llvm.i64 + %942 = llvm.icmp "slt" %941, %67 : !llvm.i64 + %943 = llvm.sub %64, %941 : !llvm.i64 + %944 = llvm.select %942, %943, %941 : !llvm.i1, !llvm.i64 + %945 = llvm.sdiv %944, %68 : !llvm.i64 + %946 = llvm.sub %64, %945 : !llvm.i64 + %947 = llvm.select %942, %946, %945 : !llvm.i1, !llvm.i64 + %948 = llvm.srem %947, %68 : !llvm.i64 + %949 = llvm.icmp "slt" %948, %67 : !llvm.i64 + %950 = llvm.add %948, %68 : !llvm.i64 + %951 = llvm.select %949, %950, %948 : !llvm.i1, !llvm.i64 + %952 = llvm.mul %947, %65 : !llvm.i64 + %953 = llvm.add %566, %952 : !llvm.i64 + %954 = llvm.add %953, %54 : !llvm.i64 + %955 = llvm.icmp "slt" %954, %67 : !llvm.i64 + %956 = llvm.sub %64, %954 : !llvm.i64 + %957 = llvm.select %955, %956, %954 : !llvm.i1, !llvm.i64 + %958 = llvm.sdiv %957, %63 : !llvm.i64 + %959 = llvm.sub %64, %958 : !llvm.i64 + %960 = llvm.select %955, %959, %958 : !llvm.i1, !llvm.i64 + %961 = llvm.mul %960, %65 : !llvm.i64 + %962 = llvm.add %953, %961 : !llvm.i64 + %963 = llvm.add %962, %54 : !llvm.i64 + %964 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %966 = llvm.mlir.constant(256 : index) : !llvm.i64 + %967 = llvm.mul %951, %966 : !llvm.i64 + %968 = llvm.add %965, %967 : !llvm.i64 + %969 = llvm.mlir.constant(2 : index) : !llvm.i64 + %970 = llvm.mul %516, %969 : !llvm.i64 + %971 = llvm.add %968, %970 : !llvm.i64 + %972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %973 = llvm.mul %963, %972 : !llvm.i64 + %974 = llvm.add %971, %973 : !llvm.i64 + %975 = llvm.getelementptr %964[%974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %940, %975 : !llvm.ptr> + %976 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %977 = llvm.mlir.constant(0 : index) : !llvm.i64 + %978 = llvm.mlir.constant(16 : index) : !llvm.i64 + %979 = llvm.mul %67, %978 : !llvm.i64 + %980 = llvm.add %977, %979 : !llvm.i64 + %981 = llvm.mlir.constant(1 : index) : !llvm.i64 + %982 = llvm.mul %34, %981 : !llvm.i64 + %983 = llvm.add %980, %982 : !llvm.i64 + %984 = llvm.getelementptr %976[%983] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %985 = llvm.load %984 : !llvm.ptr> + %986 = llvm.add %508, %56 : !llvm.i64 + %987 = llvm.icmp "slt" %986, %67 : !llvm.i64 + %988 = llvm.sub %64, %986 : !llvm.i64 + %989 = llvm.select %987, %988, %986 : !llvm.i1, !llvm.i64 + %990 = llvm.sdiv %989, %68 : !llvm.i64 + %991 = llvm.sub %64, %990 : !llvm.i64 + %992 = llvm.select %987, %991, %990 : !llvm.i1, !llvm.i64 + %993 = llvm.mul %992, %60 : !llvm.i64 + %994 = llvm.add %508, %993 : !llvm.i64 + %995 = llvm.add %994, %56 : !llvm.i64 + %996 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %997 = llvm.mlir.constant(0 : index) : !llvm.i64 + %998 = llvm.mlir.constant(256 : index) : !llvm.i64 + %999 = llvm.mul %995, %998 : !llvm.i64 + %1000 = llvm.add %997, %999 : !llvm.i64 + %1001 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1002 = llvm.mul %516, %1001 : !llvm.i64 + %1003 = llvm.add %1000, %1002 : !llvm.i64 + %1004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1005 = llvm.mul %530, %1004 : !llvm.i64 + %1006 = llvm.add %1003, %1005 : 
!llvm.i64 + %1007 = llvm.getelementptr %996[%1006] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %985, %1007 : !llvm.ptr> + %1008 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1010 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1011 = llvm.mul %67, %1010 : !llvm.i64 + %1012 = llvm.add %1009, %1011 : !llvm.i64 + %1013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1014 = llvm.mul %58, %1013 : !llvm.i64 + %1015 = llvm.add %1012, %1014 : !llvm.i64 + %1016 = llvm.getelementptr %1008[%1015] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1017 = llvm.load %1016 : !llvm.ptr> + %1018 = llvm.add %155, %57 : !llvm.i64 + %1019 = llvm.icmp "slt" %1018, %67 : !llvm.i64 + %1020 = llvm.sub %64, %1018 : !llvm.i64 + %1021 = llvm.select %1019, %1020, %1018 : !llvm.i1, !llvm.i64 + %1022 = llvm.sdiv %1021, %68 : !llvm.i64 + %1023 = llvm.sub %64, %1022 : !llvm.i64 + %1024 = llvm.select %1019, %1023, %1022 : !llvm.i1, !llvm.i64 + %1025 = llvm.srem %1024, %68 : !llvm.i64 + %1026 = llvm.icmp "slt" %1025, %67 : !llvm.i64 + %1027 = llvm.add %1025, %68 : !llvm.i64 + %1028 = llvm.select %1026, %1027, %1025 : !llvm.i1, !llvm.i64 + %1029 = llvm.mul %1024, %65 : !llvm.i64 + %1030 = llvm.add %566, %1029 : !llvm.i64 + %1031 = llvm.add %1030, %58 : !llvm.i64 + %1032 = llvm.icmp "slt" %1031, %67 : !llvm.i64 + %1033 = llvm.sub %64, %1031 : !llvm.i64 + %1034 = llvm.select %1032, %1033, %1031 : !llvm.i1, !llvm.i64 + %1035 = llvm.sdiv %1034, %63 : !llvm.i64 + %1036 = llvm.sub %64, %1035 : !llvm.i64 + %1037 = llvm.select %1032, %1036, %1035 : !llvm.i1, !llvm.i64 + %1038 = llvm.mul %1037, %65 : !llvm.i64 + %1039 = llvm.add %1030, %1038 : !llvm.i64 + %1040 = llvm.add %1039, %58 : !llvm.i64 + %1041 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1042 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1043 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1044 = llvm.mul %1028, %1043 : !llvm.i64 + %1045 = llvm.add %1042, %1044 : !llvm.i64 + %1046 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1047 = llvm.mul %516, %1046 : !llvm.i64 + %1048 = llvm.add %1045, %1047 : !llvm.i64 + %1049 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1050 = llvm.mul %1040, %1049 : !llvm.i64 + %1051 = llvm.add %1048, %1050 : !llvm.i64 + %1052 = llvm.getelementptr %1041[%1051] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1017, %1052 : !llvm.ptr> + %1053 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1054 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1055 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1056 = llvm.mul %67, %1055 : !llvm.i64 + %1057 = llvm.add %1054, %1056 : !llvm.i64 + %1058 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1059 = llvm.mul %35, %1058 : !llvm.i64 + %1060 = llvm.add %1057, %1059 : !llvm.i64 + %1061 = llvm.getelementptr %1053[%1060] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1062 = llvm.load %1061 : !llvm.ptr> + %1063 = llvm.add %508, %61 : !llvm.i64 + %1064 = llvm.icmp "slt" %1063, %67 : !llvm.i64 + %1065 = llvm.sub %64, %1063 : !llvm.i64 + %1066 = llvm.select %1064, %1065, %1063 : !llvm.i1, !llvm.i64 + %1067 = llvm.sdiv %1066, %68 : !llvm.i64 + %1068 = llvm.sub %64, %1067 : !llvm.i64 + %1069 = llvm.select %1064, %1068, %1067 : !llvm.i1, !llvm.i64 + %1070 = llvm.mul %1069, %60 : !llvm.i64 + %1071 = llvm.add %508, %1070 : !llvm.i64 + %1072 = llvm.add %1071, %61 : !llvm.i64 + %1073 = llvm.extractvalue %150[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1074 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1075 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1076 = llvm.mul %1072, %1075 : !llvm.i64 + %1077 = llvm.add %1074, %1076 : !llvm.i64 + %1078 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1079 = llvm.mul %516, %1078 : !llvm.i64 + %1080 = llvm.add %1077, %1079 : !llvm.i64 + %1081 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1082 = llvm.mul %530, %1081 : !llvm.i64 + %1083 = llvm.add %1080, %1082 : !llvm.i64 + %1084 = llvm.getelementptr %1073[%1083] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1062, %1084 : !llvm.ptr> + %1085 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1086 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1087 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1088 = llvm.mul %67, %1087 : !llvm.i64 + %1089 = llvm.add %1086, %1088 : !llvm.i64 + %1090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1091 = llvm.mul %66, %1090 : !llvm.i64 + %1092 = llvm.add %1089, %1091 : !llvm.i64 + %1093 = llvm.getelementptr %1085[%1092] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1094 = llvm.load %1093 : !llvm.ptr> + %1095 = llvm.add %155, %62 : !llvm.i64 + %1096 = llvm.icmp "slt" %1095, %67 : !llvm.i64 + %1097 = llvm.sub %64, %1095 : !llvm.i64 + %1098 = llvm.select %1096, %1097, %1095 : !llvm.i1, !llvm.i64 + %1099 = llvm.sdiv %1098, %68 : !llvm.i64 + %1100 = llvm.sub %64, %1099 : !llvm.i64 + %1101 = llvm.select %1096, %1100, %1099 : !llvm.i1, !llvm.i64 + %1102 = llvm.srem %1101, %68 : !llvm.i64 + %1103 = llvm.icmp "slt" %1102, %67 : !llvm.i64 + %1104 = llvm.add %1102, %68 : !llvm.i64 + %1105 = llvm.select %1103, %1104, %1102 : !llvm.i1, !llvm.i64 + %1106 = llvm.mul %1101, %65 : !llvm.i64 + %1107 = llvm.add %566, %1106 : !llvm.i64 + %1108 = llvm.add %1107, %66 : !llvm.i64 + %1109 = llvm.icmp "slt" %1108, %67 : !llvm.i64 + %1110 = llvm.sub %64, %1108 : !llvm.i64 + %1111 = llvm.select %1109, %1110, %1108 : !llvm.i1, !llvm.i64 + %1112 = llvm.sdiv %1111, %63 : !llvm.i64 + %1113 = llvm.sub %64, %1112 : !llvm.i64 + %1114 = llvm.select %1109, %1113, %1112 : !llvm.i1, !llvm.i64 + %1115 = llvm.mul %1114, %65 : !llvm.i64 + %1116 = llvm.add %1107, %1115 : !llvm.i64 + %1117 = llvm.add %1116, %66 : !llvm.i64 + %1118 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1119 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1120 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1121 = llvm.mul %1105, %1120 : !llvm.i64 + %1122 = llvm.add %1119, %1121 : !llvm.i64 + %1123 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1124 = llvm.mul %516, %1123 : !llvm.i64 + %1125 = llvm.add %1122, %1124 : !llvm.i64 + %1126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1127 = llvm.mul %1117, %1126 : !llvm.i64 + %1128 = llvm.add %1125, %1127 : !llvm.i64 + %1129 = llvm.getelementptr %1118[%1128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1094, %1129 : !llvm.ptr> + llvm.br ^bb9 + ^bb8: // pred: ^bb6 + %1130 = llvm.add %151, %155 : !llvm.i64 + %1131 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1134 = llvm.mul %153, %1133 : !llvm.i64 + %1135 = llvm.add %1132, %1134 : !llvm.i64 + %1136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1137 = llvm.mul %1130, %1136 : !llvm.i64 + %1138 = llvm.add %1135, %1137 : !llvm.i64 + %1139 = llvm.getelementptr 
%1131[%1138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1140 = llvm.bitcast %1139 : !llvm.ptr to !llvm.ptr> + %1141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1143 = llvm.trunc %1130 : !llvm.i64 to !llvm.i32 + %1144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1145 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1146 = llvm.insertelement %1143, %1144[%1145 : !llvm.i32] : !llvm.vec<8 x i32> + %1147 = llvm.shufflevector %1146, %1144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1148 = llvm.add %1147, %1142 : !llvm.vec<8 x i32> + %1149 = llvm.trunc %1141 : !llvm.i64 to !llvm.i32 + %1150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1152 = llvm.insertelement %1149, %1150[%1151 : !llvm.i32] : !llvm.vec<8 x i32> + %1153 = llvm.shufflevector %1152, %1150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1154 = llvm.icmp "slt" %1148, %1153 : !llvm.vec<8 x i32> + %1155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1156 = llvm.intr.masked.load %1140, %1154, %1155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1157 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1159 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1160 = llvm.mul %67, %1159 : !llvm.i64 + %1161 = llvm.add %1158, %1160 : !llvm.i64 + %1162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1163 = llvm.mul %67, %1162 : !llvm.i64 + %1164 = llvm.add %1161, %1163 : !llvm.i64 + %1165 = llvm.getelementptr %1157[%1164] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1156, %1165 : !llvm.ptr> + %1166 = llvm.add %1130, %70 : !llvm.i64 + %1167 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1169 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1170 = llvm.mul %153, %1169 : !llvm.i64 + %1171 = llvm.add %1168, %1170 : !llvm.i64 + %1172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1173 = llvm.mul %1166, %1172 : !llvm.i64 + %1174 = llvm.add %1171, %1173 : !llvm.i64 + %1175 = llvm.getelementptr %1167[%1174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1176 = llvm.bitcast %1175 : !llvm.ptr to !llvm.ptr> + %1177 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1178 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1179 = llvm.trunc %1166 : !llvm.i64 to !llvm.i32 + %1180 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1181 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1182 = llvm.insertelement %1179, %1180[%1181 : !llvm.i32] : !llvm.vec<8 x i32> + %1183 = llvm.shufflevector %1182, %1180 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1184 = llvm.add %1183, %1178 : !llvm.vec<8 x i32> + %1185 = llvm.trunc %1177 : !llvm.i64 to !llvm.i32 + %1186 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1187 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1188 = llvm.insertelement %1185, %1186[%1187 : !llvm.i32] : !llvm.vec<8 x i32> + %1189 = llvm.shufflevector %1188, %1186 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1190 = llvm.icmp "slt" 
%1184, %1189 : !llvm.vec<8 x i32> + %1191 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1192 = llvm.intr.masked.load %1176, %1190, %1191 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1193 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1194 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1195 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1196 = llvm.mul %67, %1195 : !llvm.i64 + %1197 = llvm.add %1194, %1196 : !llvm.i64 + %1198 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1199 = llvm.mul %69, %1198 : !llvm.i64 + %1200 = llvm.add %1197, %1199 : !llvm.i64 + %1201 = llvm.getelementptr %1193[%1200] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1192, %1201 : !llvm.ptr> + %1202 = llvm.add %1130, %68 : !llvm.i64 + %1203 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1204 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1205 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1206 = llvm.mul %153, %1205 : !llvm.i64 + %1207 = llvm.add %1204, %1206 : !llvm.i64 + %1208 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1209 = llvm.mul %1202, %1208 : !llvm.i64 + %1210 = llvm.add %1207, %1209 : !llvm.i64 + %1211 = llvm.getelementptr %1203[%1210] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1212 = llvm.bitcast %1211 : !llvm.ptr to !llvm.ptr> + %1213 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1214 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1215 = llvm.trunc %1202 : !llvm.i64 to !llvm.i32 + %1216 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1217 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1218 = llvm.insertelement %1215, %1216[%1217 : !llvm.i32] : !llvm.vec<8 x i32> + %1219 = llvm.shufflevector %1218, %1216 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1220 = llvm.add %1219, %1214 : !llvm.vec<8 x i32> + %1221 = llvm.trunc %1213 : !llvm.i64 to !llvm.i32 + %1222 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1223 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1224 = llvm.insertelement %1221, %1222[%1223 : !llvm.i32] : !llvm.vec<8 x i32> + %1225 = llvm.shufflevector %1224, %1222 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1226 = llvm.icmp "slt" %1220, %1225 : !llvm.vec<8 x i32> + %1227 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1228 = llvm.intr.masked.load %1212, %1226, %1227 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1229 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1230 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1231 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1232 = llvm.mul %67, %1231 : !llvm.i64 + %1233 = llvm.add %1230, %1232 : !llvm.i64 + %1234 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1235 = llvm.mul %63, %1234 : !llvm.i64 + %1236 = llvm.add %1233, %1235 : !llvm.i64 + %1237 = llvm.getelementptr %1229[%1236] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1228, %1237 : !llvm.ptr> + %1238 = llvm.add %1130, %41 : !llvm.i64 + %1239 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1242 = llvm.mul %153, %1241 : 
!llvm.i64 + %1243 = llvm.add %1240, %1242 : !llvm.i64 + %1244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1245 = llvm.mul %1238, %1244 : !llvm.i64 + %1246 = llvm.add %1243, %1245 : !llvm.i64 + %1247 = llvm.getelementptr %1239[%1246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1248 = llvm.bitcast %1247 : !llvm.ptr to !llvm.ptr> + %1249 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1250 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1251 = llvm.trunc %1238 : !llvm.i64 to !llvm.i32 + %1252 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1253 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1254 = llvm.insertelement %1251, %1252[%1253 : !llvm.i32] : !llvm.vec<8 x i32> + %1255 = llvm.shufflevector %1254, %1252 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1256 = llvm.add %1255, %1250 : !llvm.vec<8 x i32> + %1257 = llvm.trunc %1249 : !llvm.i64 to !llvm.i32 + %1258 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1259 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1260 = llvm.insertelement %1257, %1258[%1259 : !llvm.i32] : !llvm.vec<8 x i32> + %1261 = llvm.shufflevector %1260, %1258 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1262 = llvm.icmp "slt" %1256, %1261 : !llvm.vec<8 x i32> + %1263 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1264 = llvm.intr.masked.load %1248, %1262, %1263 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1265 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1266 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1267 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1268 = llvm.mul %67, %1267 : !llvm.i64 + %1269 = llvm.add %1266, %1268 : !llvm.i64 + %1270 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1271 = llvm.mul %45, %1270 : !llvm.i64 + %1272 = llvm.add %1269, %1271 : !llvm.i64 + %1273 = llvm.getelementptr %1265[%1272] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1264, %1273 : !llvm.ptr> + %1274 = llvm.add %1130, %42 : !llvm.i64 + %1275 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1276 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1277 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1278 = llvm.mul %153, %1277 : !llvm.i64 + %1279 = llvm.add %1276, %1278 : !llvm.i64 + %1280 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1281 = llvm.mul %1274, %1280 : !llvm.i64 + %1282 = llvm.add %1279, %1281 : !llvm.i64 + %1283 = llvm.getelementptr %1275[%1282] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1284 = llvm.bitcast %1283 : !llvm.ptr to !llvm.ptr> + %1285 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1286 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1287 = llvm.trunc %1274 : !llvm.i64 to !llvm.i32 + %1288 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1289 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1290 = llvm.insertelement %1287, %1288[%1289 : !llvm.i32] : !llvm.vec<8 x i32> + %1291 = llvm.shufflevector %1290, %1288 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1292 = llvm.add %1291, %1286 : !llvm.vec<8 x i32> + %1293 = llvm.trunc %1285 : !llvm.i64 to !llvm.i32 + %1294 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1295 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1296 = llvm.insertelement %1293, 
%1294[%1295 : !llvm.i32] : !llvm.vec<8 x i32> + %1297 = llvm.shufflevector %1296, %1294 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1298 = llvm.icmp "slt" %1292, %1297 : !llvm.vec<8 x i32> + %1299 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1300 = llvm.intr.masked.load %1284, %1298, %1299 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1301 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1302 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1303 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1304 = llvm.mul %67, %1303 : !llvm.i64 + %1305 = llvm.add %1302, %1304 : !llvm.i64 + %1306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1307 = llvm.mul %48, %1306 : !llvm.i64 + %1308 = llvm.add %1305, %1307 : !llvm.i64 + %1309 = llvm.getelementptr %1301[%1308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1300, %1309 : !llvm.ptr> + %1310 = llvm.add %1130, %43 : !llvm.i64 + %1311 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1314 = llvm.mul %153, %1313 : !llvm.i64 + %1315 = llvm.add %1312, %1314 : !llvm.i64 + %1316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1317 = llvm.mul %1310, %1316 : !llvm.i64 + %1318 = llvm.add %1315, %1317 : !llvm.i64 + %1319 = llvm.getelementptr %1311[%1318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1320 = llvm.bitcast %1319 : !llvm.ptr to !llvm.ptr> + %1321 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1322 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1323 = llvm.trunc %1310 : !llvm.i64 to !llvm.i32 + %1324 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1325 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1326 = llvm.insertelement %1323, %1324[%1325 : !llvm.i32] : !llvm.vec<8 x i32> + %1327 = llvm.shufflevector %1326, %1324 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1328 = llvm.add %1327, %1322 : !llvm.vec<8 x i32> + %1329 = llvm.trunc %1321 : !llvm.i64 to !llvm.i32 + %1330 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1331 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1332 = llvm.insertelement %1329, %1330[%1331 : !llvm.i32] : !llvm.vec<8 x i32> + %1333 = llvm.shufflevector %1332, %1330 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1334 = llvm.icmp "slt" %1328, %1333 : !llvm.vec<8 x i32> + %1335 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1336 = llvm.intr.masked.load %1320, %1334, %1335 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1340 = llvm.mul %67, %1339 : !llvm.i64 + %1341 = llvm.add %1338, %1340 : !llvm.i64 + %1342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1343 = llvm.mul %52, %1342 : !llvm.i64 + %1344 = llvm.add %1341, %1343 : !llvm.i64 + %1345 = llvm.getelementptr %1337[%1344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1336, %1345 : !llvm.ptr> + %1346 = llvm.add %1130, %44 : !llvm.i64 + %1347 = 
llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1350 = llvm.mul %153, %1349 : !llvm.i64 + %1351 = llvm.add %1348, %1350 : !llvm.i64 + %1352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1353 = llvm.mul %1346, %1352 : !llvm.i64 + %1354 = llvm.add %1351, %1353 : !llvm.i64 + %1355 = llvm.getelementptr %1347[%1354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1356 = llvm.bitcast %1355 : !llvm.ptr to !llvm.ptr> + %1357 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1358 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1359 = llvm.trunc %1346 : !llvm.i64 to !llvm.i32 + %1360 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1361 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1362 = llvm.insertelement %1359, %1360[%1361 : !llvm.i32] : !llvm.vec<8 x i32> + %1363 = llvm.shufflevector %1362, %1360 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1364 = llvm.add %1363, %1358 : !llvm.vec<8 x i32> + %1365 = llvm.trunc %1357 : !llvm.i64 to !llvm.i32 + %1366 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1367 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1368 = llvm.insertelement %1365, %1366[%1367 : !llvm.i32] : !llvm.vec<8 x i32> + %1369 = llvm.shufflevector %1368, %1366 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1370 = llvm.icmp "slt" %1364, %1369 : !llvm.vec<8 x i32> + %1371 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1372 = llvm.intr.masked.load %1356, %1370, %1371 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1373 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1375 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1376 = llvm.mul %67, %1375 : !llvm.i64 + %1377 = llvm.add %1374, %1376 : !llvm.i64 + %1378 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1379 = llvm.mul %56, %1378 : !llvm.i64 + %1380 = llvm.add %1377, %1379 : !llvm.i64 + %1381 = llvm.getelementptr %1373[%1380] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1372, %1381 : !llvm.ptr> + %1382 = llvm.add %1130, %46 : !llvm.i64 + %1383 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1384 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1385 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1386 = llvm.mul %153, %1385 : !llvm.i64 + %1387 = llvm.add %1384, %1386 : !llvm.i64 + %1388 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1389 = llvm.mul %1382, %1388 : !llvm.i64 + %1390 = llvm.add %1387, %1389 : !llvm.i64 + %1391 = llvm.getelementptr %1383[%1390] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1392 = llvm.bitcast %1391 : !llvm.ptr to !llvm.ptr> + %1393 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1394 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1395 = llvm.trunc %1382 : !llvm.i64 to !llvm.i32 + %1396 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1397 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1398 = llvm.insertelement %1395, %1396[%1397 : !llvm.i32] : !llvm.vec<8 x i32> + %1399 = llvm.shufflevector %1398, %1396 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1400 = 
llvm.add %1399, %1394 : !llvm.vec<8 x i32> + %1401 = llvm.trunc %1393 : !llvm.i64 to !llvm.i32 + %1402 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1403 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1404 = llvm.insertelement %1401, %1402[%1403 : !llvm.i32] : !llvm.vec<8 x i32> + %1405 = llvm.shufflevector %1404, %1402 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1406 = llvm.icmp "slt" %1400, %1405 : !llvm.vec<8 x i32> + %1407 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1408 = llvm.intr.masked.load %1392, %1406, %1407 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1409 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1410 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1411 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1412 = llvm.mul %67, %1411 : !llvm.i64 + %1413 = llvm.add %1410, %1412 : !llvm.i64 + %1414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1415 = llvm.mul %61, %1414 : !llvm.i64 + %1416 = llvm.add %1413, %1415 : !llvm.i64 + %1417 = llvm.getelementptr %1409[%1416] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1408, %1417 : !llvm.ptr> + %1418 = llvm.add %1130, %47 : !llvm.i64 + %1419 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1422 = llvm.mul %153, %1421 : !llvm.i64 + %1423 = llvm.add %1420, %1422 : !llvm.i64 + %1424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1425 = llvm.mul %1418, %1424 : !llvm.i64 + %1426 = llvm.add %1423, %1425 : !llvm.i64 + %1427 = llvm.getelementptr %1419[%1426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1428 = llvm.bitcast %1427 : !llvm.ptr to !llvm.ptr> + %1429 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1430 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1431 = llvm.trunc %1418 : !llvm.i64 to !llvm.i32 + %1432 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1433 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1434 = llvm.insertelement %1431, %1432[%1433 : !llvm.i32] : !llvm.vec<8 x i32> + %1435 = llvm.shufflevector %1434, %1432 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1436 = llvm.add %1435, %1430 : !llvm.vec<8 x i32> + %1437 = llvm.trunc %1429 : !llvm.i64 to !llvm.i32 + %1438 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1439 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1440 = llvm.insertelement %1437, %1438[%1439 : !llvm.i32] : !llvm.vec<8 x i32> + %1441 = llvm.shufflevector %1440, %1438 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1442 = llvm.icmp "slt" %1436, %1441 : !llvm.vec<8 x i32> + %1443 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1444 = llvm.intr.masked.load %1428, %1442, %1443 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1445 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1446 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1447 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1448 = llvm.mul %67, %1447 : !llvm.i64 + %1449 = llvm.add %1446, %1448 : !llvm.i64 + %1450 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1451 = llvm.mul %70, %1450 : 
!llvm.i64 + %1452 = llvm.add %1449, %1451 : !llvm.i64 + %1453 = llvm.getelementptr %1445[%1452] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1444, %1453 : !llvm.ptr> + %1454 = llvm.add %1130, %49 : !llvm.i64 + %1455 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1456 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1457 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1458 = llvm.mul %153, %1457 : !llvm.i64 + %1459 = llvm.add %1456, %1458 : !llvm.i64 + %1460 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1461 = llvm.mul %1454, %1460 : !llvm.i64 + %1462 = llvm.add %1459, %1461 : !llvm.i64 + %1463 = llvm.getelementptr %1455[%1462] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1464 = llvm.bitcast %1463 : !llvm.ptr to !llvm.ptr> + %1465 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1466 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1467 = llvm.trunc %1454 : !llvm.i64 to !llvm.i32 + %1468 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1469 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1470 = llvm.insertelement %1467, %1468[%1469 : !llvm.i32] : !llvm.vec<8 x i32> + %1471 = llvm.shufflevector %1470, %1468 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1472 = llvm.add %1471, %1466 : !llvm.vec<8 x i32> + %1473 = llvm.trunc %1465 : !llvm.i64 to !llvm.i32 + %1474 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1475 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1476 = llvm.insertelement %1473, %1474[%1475 : !llvm.i32] : !llvm.vec<8 x i32> + %1477 = llvm.shufflevector %1476, %1474 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1478 = llvm.icmp "slt" %1472, %1477 : !llvm.vec<8 x i32> + %1479 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1480 = llvm.intr.masked.load %1464, %1478, %1479 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1481 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1482 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1483 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1484 = llvm.mul %67, %1483 : !llvm.i64 + %1485 = llvm.add %1482, %1484 : !llvm.i64 + %1486 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1487 = llvm.mul %50, %1486 : !llvm.i64 + %1488 = llvm.add %1485, %1487 : !llvm.i64 + %1489 = llvm.getelementptr %1481[%1488] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1480, %1489 : !llvm.ptr> + %1490 = llvm.add %1130, %51 : !llvm.i64 + %1491 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1492 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1493 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1494 = llvm.mul %153, %1493 : !llvm.i64 + %1495 = llvm.add %1492, %1494 : !llvm.i64 + %1496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1497 = llvm.mul %1490, %1496 : !llvm.i64 + %1498 = llvm.add %1495, %1497 : !llvm.i64 + %1499 = llvm.getelementptr %1491[%1498] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1500 = llvm.bitcast %1499 : !llvm.ptr to !llvm.ptr> + %1501 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1502 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1503 = llvm.trunc %1490 : !llvm.i64 to !llvm.i32 + %1504 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1505 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1506 = 
llvm.insertelement %1503, %1504[%1505 : !llvm.i32] : !llvm.vec<8 x i32> + %1507 = llvm.shufflevector %1506, %1504 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1508 = llvm.add %1507, %1502 : !llvm.vec<8 x i32> + %1509 = llvm.trunc %1501 : !llvm.i64 to !llvm.i32 + %1510 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1511 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1512 = llvm.insertelement %1509, %1510[%1511 : !llvm.i32] : !llvm.vec<8 x i32> + %1513 = llvm.shufflevector %1512, %1510 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1514 = llvm.icmp "slt" %1508, %1513 : !llvm.vec<8 x i32> + %1515 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1516 = llvm.intr.masked.load %1500, %1514, %1515 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1517 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1519 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1520 = llvm.mul %67, %1519 : !llvm.i64 + %1521 = llvm.add %1518, %1520 : !llvm.i64 + %1522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1523 = llvm.mul %33, %1522 : !llvm.i64 + %1524 = llvm.add %1521, %1523 : !llvm.i64 + %1525 = llvm.getelementptr %1517[%1524] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1516, %1525 : !llvm.ptr> + %1526 = llvm.add %1130, %53 : !llvm.i64 + %1527 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1528 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1529 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1530 = llvm.mul %153, %1529 : !llvm.i64 + %1531 = llvm.add %1528, %1530 : !llvm.i64 + %1532 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1533 = llvm.mul %1526, %1532 : !llvm.i64 + %1534 = llvm.add %1531, %1533 : !llvm.i64 + %1535 = llvm.getelementptr %1527[%1534] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1536 = llvm.bitcast %1535 : !llvm.ptr to !llvm.ptr> + %1537 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1538 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1539 = llvm.trunc %1526 : !llvm.i64 to !llvm.i32 + %1540 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1541 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1542 = llvm.insertelement %1539, %1540[%1541 : !llvm.i32] : !llvm.vec<8 x i32> + %1543 = llvm.shufflevector %1542, %1540 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1544 = llvm.add %1543, %1538 : !llvm.vec<8 x i32> + %1545 = llvm.trunc %1537 : !llvm.i64 to !llvm.i32 + %1546 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1547 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1548 = llvm.insertelement %1545, %1546[%1547 : !llvm.i32] : !llvm.vec<8 x i32> + %1549 = llvm.shufflevector %1548, %1546 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1550 = llvm.icmp "slt" %1544, %1549 : !llvm.vec<8 x i32> + %1551 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1552 = llvm.intr.masked.load %1536, %1550, %1551 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1553 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1554 = llvm.mlir.constant(0 : index) 
: !llvm.i64 + %1555 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1556 = llvm.mul %67, %1555 : !llvm.i64 + %1557 = llvm.add %1554, %1556 : !llvm.i64 + %1558 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1559 = llvm.mul %54, %1558 : !llvm.i64 + %1560 = llvm.add %1557, %1559 : !llvm.i64 + %1561 = llvm.getelementptr %1553[%1560] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1552, %1561 : !llvm.ptr> + %1562 = llvm.add %1130, %55 : !llvm.i64 + %1563 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1564 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1565 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1566 = llvm.mul %153, %1565 : !llvm.i64 + %1567 = llvm.add %1564, %1566 : !llvm.i64 + %1568 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1569 = llvm.mul %1562, %1568 : !llvm.i64 + %1570 = llvm.add %1567, %1569 : !llvm.i64 + %1571 = llvm.getelementptr %1563[%1570] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1572 = llvm.bitcast %1571 : !llvm.ptr to !llvm.ptr> + %1573 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1574 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1575 = llvm.trunc %1562 : !llvm.i64 to !llvm.i32 + %1576 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1577 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1578 = llvm.insertelement %1575, %1576[%1577 : !llvm.i32] : !llvm.vec<8 x i32> + %1579 = llvm.shufflevector %1578, %1576 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1580 = llvm.add %1579, %1574 : !llvm.vec<8 x i32> + %1581 = llvm.trunc %1573 : !llvm.i64 to !llvm.i32 + %1582 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1583 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1584 = llvm.insertelement %1581, %1582[%1583 : !llvm.i32] : !llvm.vec<8 x i32> + %1585 = llvm.shufflevector %1584, %1582 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1586 = llvm.icmp "slt" %1580, %1585 : !llvm.vec<8 x i32> + %1587 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1588 = llvm.intr.masked.load %1572, %1586, %1587 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1589 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1591 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1592 = llvm.mul %67, %1591 : !llvm.i64 + %1593 = llvm.add %1590, %1592 : !llvm.i64 + %1594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1595 = llvm.mul %34, %1594 : !llvm.i64 + %1596 = llvm.add %1593, %1595 : !llvm.i64 + %1597 = llvm.getelementptr %1589[%1596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1588, %1597 : !llvm.ptr> + %1598 = llvm.add %1130, %57 : !llvm.i64 + %1599 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1602 = llvm.mul %153, %1601 : !llvm.i64 + %1603 = llvm.add %1600, %1602 : !llvm.i64 + %1604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1605 = llvm.mul %1598, %1604 : !llvm.i64 + %1606 = llvm.add %1603, %1605 : !llvm.i64 + %1607 = llvm.getelementptr %1599[%1606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1608 = llvm.bitcast %1607 : !llvm.ptr to !llvm.ptr> + %1609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1610 = llvm.mlir.constant(dense<[0, 
1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1611 = llvm.trunc %1598 : !llvm.i64 to !llvm.i32 + %1612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1614 = llvm.insertelement %1611, %1612[%1613 : !llvm.i32] : !llvm.vec<8 x i32> + %1615 = llvm.shufflevector %1614, %1612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1616 = llvm.add %1615, %1610 : !llvm.vec<8 x i32> + %1617 = llvm.trunc %1609 : !llvm.i64 to !llvm.i32 + %1618 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1620 = llvm.insertelement %1617, %1618[%1619 : !llvm.i32] : !llvm.vec<8 x i32> + %1621 = llvm.shufflevector %1620, %1618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1622 = llvm.icmp "slt" %1616, %1621 : !llvm.vec<8 x i32> + %1623 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1624 = llvm.intr.masked.load %1608, %1622, %1623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1625 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1626 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1627 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1628 = llvm.mul %67, %1627 : !llvm.i64 + %1629 = llvm.add %1626, %1628 : !llvm.i64 + %1630 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1631 = llvm.mul %58, %1630 : !llvm.i64 + %1632 = llvm.add %1629, %1631 : !llvm.i64 + %1633 = llvm.getelementptr %1625[%1632] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1624, %1633 : !llvm.ptr> + %1634 = llvm.add %1130, %59 : !llvm.i64 + %1635 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1637 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1638 = llvm.mul %153, %1637 : !llvm.i64 + %1639 = llvm.add %1636, %1638 : !llvm.i64 + %1640 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1641 = llvm.mul %1634, %1640 : !llvm.i64 + %1642 = llvm.add %1639, %1641 : !llvm.i64 + %1643 = llvm.getelementptr %1635[%1642] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1644 = llvm.bitcast %1643 : !llvm.ptr to !llvm.ptr> + %1645 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1646 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1647 = llvm.trunc %1634 : !llvm.i64 to !llvm.i32 + %1648 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1649 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1650 = llvm.insertelement %1647, %1648[%1649 : !llvm.i32] : !llvm.vec<8 x i32> + %1651 = llvm.shufflevector %1650, %1648 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1652 = llvm.add %1651, %1646 : !llvm.vec<8 x i32> + %1653 = llvm.trunc %1645 : !llvm.i64 to !llvm.i32 + %1654 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1655 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1656 = llvm.insertelement %1653, %1654[%1655 : !llvm.i32] : !llvm.vec<8 x i32> + %1657 = llvm.shufflevector %1656, %1654 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1658 = llvm.icmp "slt" %1652, %1657 : !llvm.vec<8 x i32> + %1659 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1660 = llvm.intr.masked.load %1644, %1658, %1659 {alignment = 4 : i32} : 
(!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1661 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1662 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1663 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1664 = llvm.mul %67, %1663 : !llvm.i64 + %1665 = llvm.add %1662, %1664 : !llvm.i64 + %1666 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1667 = llvm.mul %35, %1666 : !llvm.i64 + %1668 = llvm.add %1665, %1667 : !llvm.i64 + %1669 = llvm.getelementptr %1661[%1668] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1660, %1669 : !llvm.ptr> + %1670 = llvm.add %1130, %62 : !llvm.i64 + %1671 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1672 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1673 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1674 = llvm.mul %153, %1673 : !llvm.i64 + %1675 = llvm.add %1672, %1674 : !llvm.i64 + %1676 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1677 = llvm.mul %1670, %1676 : !llvm.i64 + %1678 = llvm.add %1675, %1677 : !llvm.i64 + %1679 = llvm.getelementptr %1671[%1678] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1680 = llvm.bitcast %1679 : !llvm.ptr to !llvm.ptr> + %1681 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1682 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1683 = llvm.trunc %1670 : !llvm.i64 to !llvm.i32 + %1684 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1685 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1686 = llvm.insertelement %1683, %1684[%1685 : !llvm.i32] : !llvm.vec<8 x i32> + %1687 = llvm.shufflevector %1686, %1684 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1688 = llvm.add %1687, %1682 : !llvm.vec<8 x i32> + %1689 = llvm.trunc %1681 : !llvm.i64 to !llvm.i32 + %1690 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1691 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1692 = llvm.insertelement %1689, %1690[%1691 : !llvm.i32] : !llvm.vec<8 x i32> + %1693 = llvm.shufflevector %1692, %1690 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1694 = llvm.icmp "slt" %1688, %1693 : !llvm.vec<8 x i32> + %1695 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1696 = llvm.intr.masked.load %1680, %1694, %1695 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1697 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1698 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1699 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1700 = llvm.mul %67, %1699 : !llvm.i64 + %1701 = llvm.add %1698, %1700 : !llvm.i64 + %1702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1703 = llvm.mul %66, %1702 : !llvm.i64 + %1704 = llvm.add %1701, %1703 : !llvm.i64 + %1705 = llvm.getelementptr %1697[%1704] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1696, %1705 : !llvm.ptr> + %1706 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1708 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1709 = llvm.mul %67, %1708 : !llvm.i64 + %1710 = llvm.add %1707, %1709 : !llvm.i64 + %1711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1712 = llvm.mul %67, %1711 : !llvm.i64 + %1713 = llvm.add %1710, %1712 : !llvm.i64 + %1714 = llvm.getelementptr %1706[%1713] : 
(!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1715 = llvm.load %1714 : !llvm.ptr> + %1716 = llvm.icmp "slt" %155, %67 : !llvm.i64 + %1717 = llvm.sub %64, %155 : !llvm.i64 + %1718 = llvm.select %1716, %1717, %155 : !llvm.i1, !llvm.i64 + %1719 = llvm.sdiv %1718, %68 : !llvm.i64 + %1720 = llvm.sub %64, %1719 : !llvm.i64 + %1721 = llvm.select %1716, %1720, %1719 : !llvm.i1, !llvm.i64 + %1722 = llvm.srem %1721, %68 : !llvm.i64 + %1723 = llvm.icmp "slt" %1722, %67 : !llvm.i64 + %1724 = llvm.add %1722, %68 : !llvm.i64 + %1725 = llvm.select %1723, %1724, %1722 : !llvm.i1, !llvm.i64 + %1726 = llvm.srem %153, %39 : !llvm.i64 + %1727 = llvm.icmp "slt" %1726, %67 : !llvm.i64 + %1728 = llvm.add %1726, %39 : !llvm.i64 + %1729 = llvm.select %1727, %1728, %1726 : !llvm.i1, !llvm.i64 + %1730 = llvm.srem %155, %68 : !llvm.i64 + %1731 = llvm.icmp "slt" %1730, %67 : !llvm.i64 + %1732 = llvm.add %1730, %68 : !llvm.i64 + %1733 = llvm.select %1731, %1732, %1730 : !llvm.i1, !llvm.i64 + %1734 = llvm.icmp "slt" %1733, %67 : !llvm.i64 + %1735 = llvm.sub %64, %1733 : !llvm.i64 + %1736 = llvm.select %1734, %1735, %1733 : !llvm.i1, !llvm.i64 + %1737 = llvm.sdiv %1736, %70 : !llvm.i64 + %1738 = llvm.sub %64, %1737 : !llvm.i64 + %1739 = llvm.select %1734, %1738, %1737 : !llvm.i1, !llvm.i64 + %1740 = llvm.srem %1739, %63 : !llvm.i64 + %1741 = llvm.icmp "slt" %1740, %67 : !llvm.i64 + %1742 = llvm.add %1740, %63 : !llvm.i64 + %1743 = llvm.select %1741, %1742, %1740 : !llvm.i1, !llvm.i64 + %1744 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1746 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1747 = llvm.mul %1725, %1746 : !llvm.i64 + %1748 = llvm.add %1745, %1747 : !llvm.i64 + %1749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1750 = llvm.mul %1729, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %1743, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1744[%1754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1715, %1755 : !llvm.ptr> + %1756 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1759 = llvm.mul %67, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %69, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1765 = llvm.load %1764 : !llvm.ptr> + %1766 = llvm.add %155, %70 : !llvm.i64 + %1767 = llvm.icmp "slt" %1766, %67 : !llvm.i64 + %1768 = llvm.sub %64, %1766 : !llvm.i64 + %1769 = llvm.select %1767, %1768, %1766 : !llvm.i1, !llvm.i64 + %1770 = llvm.sdiv %1769, %68 : !llvm.i64 + %1771 = llvm.sub %64, %1770 : !llvm.i64 + %1772 = llvm.select %1767, %1771, %1770 : !llvm.i1, !llvm.i64 + %1773 = llvm.srem %1772, %68 : !llvm.i64 + %1774 = llvm.icmp "slt" %1773, %67 : !llvm.i64 + %1775 = llvm.add %1773, %68 : !llvm.i64 + %1776 = llvm.select %1774, %1775, %1773 : !llvm.i1, !llvm.i64 + %1777 = llvm.sdiv %1718, %70 : !llvm.i64 + %1778 = llvm.sub %64, %1777 : !llvm.i64 + %1779 = llvm.select %1716, %1778, %1777 : !llvm.i1, !llvm.i64 + %1780 = llvm.mul %1772, %65 : !llvm.i64 + %1781 = llvm.add %1779, %1780 : !llvm.i64 + %1782 = llvm.add %1781, %69 : !llvm.i64 + %1783 = 
llvm.icmp "slt" %1782, %67 : !llvm.i64 + %1784 = llvm.sub %64, %1782 : !llvm.i64 + %1785 = llvm.select %1783, %1784, %1782 : !llvm.i1, !llvm.i64 + %1786 = llvm.sdiv %1785, %63 : !llvm.i64 + %1787 = llvm.sub %64, %1786 : !llvm.i64 + %1788 = llvm.select %1783, %1787, %1786 : !llvm.i1, !llvm.i64 + %1789 = llvm.mul %1788, %65 : !llvm.i64 + %1790 = llvm.add %1781, %1789 : !llvm.i64 + %1791 = llvm.add %1790, %69 : !llvm.i64 + %1792 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1794 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1795 = llvm.mul %1776, %1794 : !llvm.i64 + %1796 = llvm.add %1793, %1795 : !llvm.i64 + %1797 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1798 = llvm.mul %1729, %1797 : !llvm.i64 + %1799 = llvm.add %1796, %1798 : !llvm.i64 + %1800 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1801 = llvm.mul %1791, %1800 : !llvm.i64 + %1802 = llvm.add %1799, %1801 : !llvm.i64 + %1803 = llvm.getelementptr %1792[%1802] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1765, %1803 : !llvm.ptr> + %1804 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1805 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1806 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1807 = llvm.mul %67, %1806 : !llvm.i64 + %1808 = llvm.add %1805, %1807 : !llvm.i64 + %1809 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1810 = llvm.mul %63, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.getelementptr %1804[%1811] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1813 = llvm.load %1812 : !llvm.ptr> + %1814 = llvm.add %1721, %69 : !llvm.i64 + %1815 = llvm.icmp "slt" %1814, %67 : !llvm.i64 + %1816 = llvm.sub %64, %1814 : !llvm.i64 + %1817 = llvm.select %1815, %1816, %1814 : !llvm.i1, !llvm.i64 + %1818 = llvm.sdiv %1817, %68 : !llvm.i64 + %1819 = llvm.sub %64, %1818 : !llvm.i64 + %1820 = llvm.select %1815, %1819, %1818 : !llvm.i1, !llvm.i64 + %1821 = llvm.mul %1820, %60 : !llvm.i64 + %1822 = llvm.add %1721, %1821 : !llvm.i64 + %1823 = llvm.add %1822, %69 : !llvm.i64 + %1824 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1826 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1827 = llvm.mul %1823, %1826 : !llvm.i64 + %1828 = llvm.add %1825, %1827 : !llvm.i64 + %1829 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1830 = llvm.mul %1729, %1829 : !llvm.i64 + %1831 = llvm.add %1828, %1830 : !llvm.i64 + %1832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1833 = llvm.mul %1743, %1832 : !llvm.i64 + %1834 = llvm.add %1831, %1833 : !llvm.i64 + %1835 = llvm.getelementptr %1824[%1834] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1813, %1835 : !llvm.ptr> + %1836 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1837 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1838 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1839 = llvm.mul %67, %1838 : !llvm.i64 + %1840 = llvm.add %1837, %1839 : !llvm.i64 + %1841 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1842 = llvm.mul %45, %1841 : !llvm.i64 + %1843 = llvm.add %1840, %1842 : !llvm.i64 + %1844 = llvm.getelementptr %1836[%1843] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1845 = llvm.load %1844 : !llvm.ptr> + %1846 = llvm.add %155, %41 : !llvm.i64 + %1847 = llvm.icmp "slt" %1846, %67 : !llvm.i64 + %1848 = llvm.sub %64, %1846 : !llvm.i64 + %1849 = llvm.select %1847, 
%1848, %1846 : !llvm.i1, !llvm.i64 + %1850 = llvm.sdiv %1849, %68 : !llvm.i64 + %1851 = llvm.sub %64, %1850 : !llvm.i64 + %1852 = llvm.select %1847, %1851, %1850 : !llvm.i1, !llvm.i64 + %1853 = llvm.srem %1852, %68 : !llvm.i64 + %1854 = llvm.icmp "slt" %1853, %67 : !llvm.i64 + %1855 = llvm.add %1853, %68 : !llvm.i64 + %1856 = llvm.select %1854, %1855, %1853 : !llvm.i1, !llvm.i64 + %1857 = llvm.mul %1852, %65 : !llvm.i64 + %1858 = llvm.add %1779, %1857 : !llvm.i64 + %1859 = llvm.add %1858, %45 : !llvm.i64 + %1860 = llvm.icmp "slt" %1859, %67 : !llvm.i64 + %1861 = llvm.sub %64, %1859 : !llvm.i64 + %1862 = llvm.select %1860, %1861, %1859 : !llvm.i1, !llvm.i64 + %1863 = llvm.sdiv %1862, %63 : !llvm.i64 + %1864 = llvm.sub %64, %1863 : !llvm.i64 + %1865 = llvm.select %1860, %1864, %1863 : !llvm.i1, !llvm.i64 + %1866 = llvm.mul %1865, %65 : !llvm.i64 + %1867 = llvm.add %1858, %1866 : !llvm.i64 + %1868 = llvm.add %1867, %45 : !llvm.i64 + %1869 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1871 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1872 = llvm.mul %1856, %1871 : !llvm.i64 + %1873 = llvm.add %1870, %1872 : !llvm.i64 + %1874 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1875 = llvm.mul %1729, %1874 : !llvm.i64 + %1876 = llvm.add %1873, %1875 : !llvm.i64 + %1877 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1878 = llvm.mul %1868, %1877 : !llvm.i64 + %1879 = llvm.add %1876, %1878 : !llvm.i64 + %1880 = llvm.getelementptr %1869[%1879] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1845, %1880 : !llvm.ptr> + %1881 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1882 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1883 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1884 = llvm.mul %67, %1883 : !llvm.i64 + %1885 = llvm.add %1882, %1884 : !llvm.i64 + %1886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1887 = llvm.mul %48, %1886 : !llvm.i64 + %1888 = llvm.add %1885, %1887 : !llvm.i64 + %1889 = llvm.getelementptr %1881[%1888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1890 = llvm.load %1889 : !llvm.ptr> + %1891 = llvm.add %1721, %63 : !llvm.i64 + %1892 = llvm.icmp "slt" %1891, %67 : !llvm.i64 + %1893 = llvm.sub %64, %1891 : !llvm.i64 + %1894 = llvm.select %1892, %1893, %1891 : !llvm.i1, !llvm.i64 + %1895 = llvm.sdiv %1894, %68 : !llvm.i64 + %1896 = llvm.sub %64, %1895 : !llvm.i64 + %1897 = llvm.select %1892, %1896, %1895 : !llvm.i1, !llvm.i64 + %1898 = llvm.mul %1897, %60 : !llvm.i64 + %1899 = llvm.add %1721, %1898 : !llvm.i64 + %1900 = llvm.add %1899, %63 : !llvm.i64 + %1901 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1903 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1904 = llvm.mul %1900, %1903 : !llvm.i64 + %1905 = llvm.add %1902, %1904 : !llvm.i64 + %1906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1907 = llvm.mul %1729, %1906 : !llvm.i64 + %1908 = llvm.add %1905, %1907 : !llvm.i64 + %1909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1910 = llvm.mul %1743, %1909 : !llvm.i64 + %1911 = llvm.add %1908, %1910 : !llvm.i64 + %1912 = llvm.getelementptr %1901[%1911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1890, %1912 : !llvm.ptr> + %1913 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1915 = llvm.mlir.constant(16 : index) 
: !llvm.i64 + %1916 = llvm.mul %67, %1915 : !llvm.i64 + %1917 = llvm.add %1914, %1916 : !llvm.i64 + %1918 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1919 = llvm.mul %52, %1918 : !llvm.i64 + %1920 = llvm.add %1917, %1919 : !llvm.i64 + %1921 = llvm.getelementptr %1913[%1920] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1922 = llvm.load %1921 : !llvm.ptr> + %1923 = llvm.add %155, %43 : !llvm.i64 + %1924 = llvm.icmp "slt" %1923, %67 : !llvm.i64 + %1925 = llvm.sub %64, %1923 : !llvm.i64 + %1926 = llvm.select %1924, %1925, %1923 : !llvm.i1, !llvm.i64 + %1927 = llvm.sdiv %1926, %68 : !llvm.i64 + %1928 = llvm.sub %64, %1927 : !llvm.i64 + %1929 = llvm.select %1924, %1928, %1927 : !llvm.i1, !llvm.i64 + %1930 = llvm.srem %1929, %68 : !llvm.i64 + %1931 = llvm.icmp "slt" %1930, %67 : !llvm.i64 + %1932 = llvm.add %1930, %68 : !llvm.i64 + %1933 = llvm.select %1931, %1932, %1930 : !llvm.i1, !llvm.i64 + %1934 = llvm.mul %1929, %65 : !llvm.i64 + %1935 = llvm.add %1779, %1934 : !llvm.i64 + %1936 = llvm.add %1935, %52 : !llvm.i64 + %1937 = llvm.icmp "slt" %1936, %67 : !llvm.i64 + %1938 = llvm.sub %64, %1936 : !llvm.i64 + %1939 = llvm.select %1937, %1938, %1936 : !llvm.i1, !llvm.i64 + %1940 = llvm.sdiv %1939, %63 : !llvm.i64 + %1941 = llvm.sub %64, %1940 : !llvm.i64 + %1942 = llvm.select %1937, %1941, %1940 : !llvm.i1, !llvm.i64 + %1943 = llvm.mul %1942, %65 : !llvm.i64 + %1944 = llvm.add %1935, %1943 : !llvm.i64 + %1945 = llvm.add %1944, %52 : !llvm.i64 + %1946 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1949 = llvm.mul %1933, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1952 = llvm.mul %1729, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1955 = llvm.mul %1945, %1954 : !llvm.i64 + %1956 = llvm.add %1953, %1955 : !llvm.i64 + %1957 = llvm.getelementptr %1946[%1956] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1922, %1957 : !llvm.ptr> + %1958 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1960 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1961 = llvm.mul %67, %1960 : !llvm.i64 + %1962 = llvm.add %1959, %1961 : !llvm.i64 + %1963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1964 = llvm.mul %56, %1963 : !llvm.i64 + %1965 = llvm.add %1962, %1964 : !llvm.i64 + %1966 = llvm.getelementptr %1958[%1965] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1967 = llvm.load %1966 : !llvm.ptr> + %1968 = llvm.add %1721, %45 : !llvm.i64 + %1969 = llvm.icmp "slt" %1968, %67 : !llvm.i64 + %1970 = llvm.sub %64, %1968 : !llvm.i64 + %1971 = llvm.select %1969, %1970, %1968 : !llvm.i1, !llvm.i64 + %1972 = llvm.sdiv %1971, %68 : !llvm.i64 + %1973 = llvm.sub %64, %1972 : !llvm.i64 + %1974 = llvm.select %1969, %1973, %1972 : !llvm.i1, !llvm.i64 + %1975 = llvm.mul %1974, %60 : !llvm.i64 + %1976 = llvm.add %1721, %1975 : !llvm.i64 + %1977 = llvm.add %1976, %45 : !llvm.i64 + %1978 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1980 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1981 = llvm.mul %1977, %1980 : !llvm.i64 + %1982 = llvm.add %1979, %1981 : !llvm.i64 + %1983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1984 = llvm.mul 
%1729, %1983 : !llvm.i64 + %1985 = llvm.add %1982, %1984 : !llvm.i64 + %1986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1987 = llvm.mul %1743, %1986 : !llvm.i64 + %1988 = llvm.add %1985, %1987 : !llvm.i64 + %1989 = llvm.getelementptr %1978[%1988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1967, %1989 : !llvm.ptr> + %1990 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1992 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1993 = llvm.mul %67, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1996 = llvm.mul %61, %1995 : !llvm.i64 + %1997 = llvm.add %1994, %1996 : !llvm.i64 + %1998 = llvm.getelementptr %1990[%1997] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1999 = llvm.load %1998 : !llvm.ptr> + %2000 = llvm.add %155, %46 : !llvm.i64 + %2001 = llvm.icmp "slt" %2000, %67 : !llvm.i64 + %2002 = llvm.sub %64, %2000 : !llvm.i64 + %2003 = llvm.select %2001, %2002, %2000 : !llvm.i1, !llvm.i64 + %2004 = llvm.sdiv %2003, %68 : !llvm.i64 + %2005 = llvm.sub %64, %2004 : !llvm.i64 + %2006 = llvm.select %2001, %2005, %2004 : !llvm.i1, !llvm.i64 + %2007 = llvm.srem %2006, %68 : !llvm.i64 + %2008 = llvm.icmp "slt" %2007, %67 : !llvm.i64 + %2009 = llvm.add %2007, %68 : !llvm.i64 + %2010 = llvm.select %2008, %2009, %2007 : !llvm.i1, !llvm.i64 + %2011 = llvm.mul %2006, %65 : !llvm.i64 + %2012 = llvm.add %1779, %2011 : !llvm.i64 + %2013 = llvm.add %2012, %61 : !llvm.i64 + %2014 = llvm.icmp "slt" %2013, %67 : !llvm.i64 + %2015 = llvm.sub %64, %2013 : !llvm.i64 + %2016 = llvm.select %2014, %2015, %2013 : !llvm.i1, !llvm.i64 + %2017 = llvm.sdiv %2016, %63 : !llvm.i64 + %2018 = llvm.sub %64, %2017 : !llvm.i64 + %2019 = llvm.select %2014, %2018, %2017 : !llvm.i1, !llvm.i64 + %2020 = llvm.mul %2019, %65 : !llvm.i64 + %2021 = llvm.add %2012, %2020 : !llvm.i64 + %2022 = llvm.add %2021, %61 : !llvm.i64 + %2023 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2024 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2025 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2026 = llvm.mul %2010, %2025 : !llvm.i64 + %2027 = llvm.add %2024, %2026 : !llvm.i64 + %2028 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2029 = llvm.mul %1729, %2028 : !llvm.i64 + %2030 = llvm.add %2027, %2029 : !llvm.i64 + %2031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2032 = llvm.mul %2022, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.getelementptr %2023[%2033] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1999, %2034 : !llvm.ptr> + %2035 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2036 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2037 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2038 = llvm.mul %67, %2037 : !llvm.i64 + %2039 = llvm.add %2036, %2038 : !llvm.i64 + %2040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2041 = llvm.mul %70, %2040 : !llvm.i64 + %2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.getelementptr %2035[%2042] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2044 = llvm.load %2043 : !llvm.ptr> + %2045 = llvm.add %1721, %48 : !llvm.i64 + %2046 = llvm.icmp "slt" %2045, %67 : !llvm.i64 + %2047 = llvm.sub %64, %2045 : !llvm.i64 + %2048 = llvm.select %2046, %2047, %2045 : !llvm.i1, !llvm.i64 + %2049 = llvm.sdiv %2048, %68 : !llvm.i64 + %2050 = llvm.sub %64, %2049 : !llvm.i64 + %2051 = llvm.select %2046, %2050, 
%2049 : !llvm.i1, !llvm.i64 + %2052 = llvm.mul %2051, %60 : !llvm.i64 + %2053 = llvm.add %1721, %2052 : !llvm.i64 + %2054 = llvm.add %2053, %48 : !llvm.i64 + %2055 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2056 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2057 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2058 = llvm.mul %2054, %2057 : !llvm.i64 + %2059 = llvm.add %2056, %2058 : !llvm.i64 + %2060 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2061 = llvm.mul %1729, %2060 : !llvm.i64 + %2062 = llvm.add %2059, %2061 : !llvm.i64 + %2063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2064 = llvm.mul %1743, %2063 : !llvm.i64 + %2065 = llvm.add %2062, %2064 : !llvm.i64 + %2066 = llvm.getelementptr %2055[%2065] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2044, %2066 : !llvm.ptr> + %2067 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2070 = llvm.mul %67, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %50, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2076 = llvm.load %2075 : !llvm.ptr> + %2077 = llvm.add %155, %49 : !llvm.i64 + %2078 = llvm.icmp "slt" %2077, %67 : !llvm.i64 + %2079 = llvm.sub %64, %2077 : !llvm.i64 + %2080 = llvm.select %2078, %2079, %2077 : !llvm.i1, !llvm.i64 + %2081 = llvm.sdiv %2080, %68 : !llvm.i64 + %2082 = llvm.sub %64, %2081 : !llvm.i64 + %2083 = llvm.select %2078, %2082, %2081 : !llvm.i1, !llvm.i64 + %2084 = llvm.srem %2083, %68 : !llvm.i64 + %2085 = llvm.icmp "slt" %2084, %67 : !llvm.i64 + %2086 = llvm.add %2084, %68 : !llvm.i64 + %2087 = llvm.select %2085, %2086, %2084 : !llvm.i1, !llvm.i64 + %2088 = llvm.mul %2083, %65 : !llvm.i64 + %2089 = llvm.add %1779, %2088 : !llvm.i64 + %2090 = llvm.add %2089, %50 : !llvm.i64 + %2091 = llvm.icmp "slt" %2090, %67 : !llvm.i64 + %2092 = llvm.sub %64, %2090 : !llvm.i64 + %2093 = llvm.select %2091, %2092, %2090 : !llvm.i1, !llvm.i64 + %2094 = llvm.sdiv %2093, %63 : !llvm.i64 + %2095 = llvm.sub %64, %2094 : !llvm.i64 + %2096 = llvm.select %2091, %2095, %2094 : !llvm.i1, !llvm.i64 + %2097 = llvm.mul %2096, %65 : !llvm.i64 + %2098 = llvm.add %2089, %2097 : !llvm.i64 + %2099 = llvm.add %2098, %50 : !llvm.i64 + %2100 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2101 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2102 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2103 = llvm.mul %2087, %2102 : !llvm.i64 + %2104 = llvm.add %2101, %2103 : !llvm.i64 + %2105 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2106 = llvm.mul %1729, %2105 : !llvm.i64 + %2107 = llvm.add %2104, %2106 : !llvm.i64 + %2108 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2109 = llvm.mul %2099, %2108 : !llvm.i64 + %2110 = llvm.add %2107, %2109 : !llvm.i64 + %2111 = llvm.getelementptr %2100[%2110] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2076, %2111 : !llvm.ptr> + %2112 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2114 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2115 = llvm.mul %67, %2114 : !llvm.i64 + %2116 = llvm.add %2113, %2115 : !llvm.i64 + %2117 = llvm.mlir.constant(1 : index) : !llvm.i64 + 
%2118 = llvm.mul %33, %2117 : !llvm.i64 + %2119 = llvm.add %2116, %2118 : !llvm.i64 + %2120 = llvm.getelementptr %2112[%2119] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2121 = llvm.load %2120 : !llvm.ptr> + %2122 = llvm.add %1721, %52 : !llvm.i64 + %2123 = llvm.icmp "slt" %2122, %67 : !llvm.i64 + %2124 = llvm.sub %64, %2122 : !llvm.i64 + %2125 = llvm.select %2123, %2124, %2122 : !llvm.i1, !llvm.i64 + %2126 = llvm.sdiv %2125, %68 : !llvm.i64 + %2127 = llvm.sub %64, %2126 : !llvm.i64 + %2128 = llvm.select %2123, %2127, %2126 : !llvm.i1, !llvm.i64 + %2129 = llvm.mul %2128, %60 : !llvm.i64 + %2130 = llvm.add %1721, %2129 : !llvm.i64 + %2131 = llvm.add %2130, %52 : !llvm.i64 + %2132 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2134 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2135 = llvm.mul %2131, %2134 : !llvm.i64 + %2136 = llvm.add %2133, %2135 : !llvm.i64 + %2137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2138 = llvm.mul %1729, %2137 : !llvm.i64 + %2139 = llvm.add %2136, %2138 : !llvm.i64 + %2140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2141 = llvm.mul %1743, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.getelementptr %2132[%2142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2121, %2143 : !llvm.ptr> + %2144 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2145 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2146 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2147 = llvm.mul %67, %2146 : !llvm.i64 + %2148 = llvm.add %2145, %2147 : !llvm.i64 + %2149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2150 = llvm.mul %54, %2149 : !llvm.i64 + %2151 = llvm.add %2148, %2150 : !llvm.i64 + %2152 = llvm.getelementptr %2144[%2151] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2153 = llvm.load %2152 : !llvm.ptr> + %2154 = llvm.add %155, %53 : !llvm.i64 + %2155 = llvm.icmp "slt" %2154, %67 : !llvm.i64 + %2156 = llvm.sub %64, %2154 : !llvm.i64 + %2157 = llvm.select %2155, %2156, %2154 : !llvm.i1, !llvm.i64 + %2158 = llvm.sdiv %2157, %68 : !llvm.i64 + %2159 = llvm.sub %64, %2158 : !llvm.i64 + %2160 = llvm.select %2155, %2159, %2158 : !llvm.i1, !llvm.i64 + %2161 = llvm.srem %2160, %68 : !llvm.i64 + %2162 = llvm.icmp "slt" %2161, %67 : !llvm.i64 + %2163 = llvm.add %2161, %68 : !llvm.i64 + %2164 = llvm.select %2162, %2163, %2161 : !llvm.i1, !llvm.i64 + %2165 = llvm.mul %2160, %65 : !llvm.i64 + %2166 = llvm.add %1779, %2165 : !llvm.i64 + %2167 = llvm.add %2166, %54 : !llvm.i64 + %2168 = llvm.icmp "slt" %2167, %67 : !llvm.i64 + %2169 = llvm.sub %64, %2167 : !llvm.i64 + %2170 = llvm.select %2168, %2169, %2167 : !llvm.i1, !llvm.i64 + %2171 = llvm.sdiv %2170, %63 : !llvm.i64 + %2172 = llvm.sub %64, %2171 : !llvm.i64 + %2173 = llvm.select %2168, %2172, %2171 : !llvm.i1, !llvm.i64 + %2174 = llvm.mul %2173, %65 : !llvm.i64 + %2175 = llvm.add %2166, %2174 : !llvm.i64 + %2176 = llvm.add %2175, %54 : !llvm.i64 + %2177 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2180 = llvm.mul %2164, %2179 : !llvm.i64 + %2181 = llvm.add %2178, %2180 : !llvm.i64 + %2182 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2183 = llvm.mul %1729, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2186 = llvm.mul %2176, %2185 : 
!llvm.i64 + %2187 = llvm.add %2184, %2186 : !llvm.i64 + %2188 = llvm.getelementptr %2177[%2187] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2153, %2188 : !llvm.ptr> + %2189 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2191 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2192 = llvm.mul %67, %2191 : !llvm.i64 + %2193 = llvm.add %2190, %2192 : !llvm.i64 + %2194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2195 = llvm.mul %34, %2194 : !llvm.i64 + %2196 = llvm.add %2193, %2195 : !llvm.i64 + %2197 = llvm.getelementptr %2189[%2196] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2198 = llvm.load %2197 : !llvm.ptr> + %2199 = llvm.add %1721, %56 : !llvm.i64 + %2200 = llvm.icmp "slt" %2199, %67 : !llvm.i64 + %2201 = llvm.sub %64, %2199 : !llvm.i64 + %2202 = llvm.select %2200, %2201, %2199 : !llvm.i1, !llvm.i64 + %2203 = llvm.sdiv %2202, %68 : !llvm.i64 + %2204 = llvm.sub %64, %2203 : !llvm.i64 + %2205 = llvm.select %2200, %2204, %2203 : !llvm.i1, !llvm.i64 + %2206 = llvm.mul %2205, %60 : !llvm.i64 + %2207 = llvm.add %1721, %2206 : !llvm.i64 + %2208 = llvm.add %2207, %56 : !llvm.i64 + %2209 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2211 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2212 = llvm.mul %2208, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2215 = llvm.mul %1729, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : !llvm.i64 + %2217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2218 = llvm.mul %1743, %2217 : !llvm.i64 + %2219 = llvm.add %2216, %2218 : !llvm.i64 + %2220 = llvm.getelementptr %2209[%2219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2198, %2220 : !llvm.ptr> + %2221 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2223 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2224 = llvm.mul %67, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2227 = llvm.mul %58, %2226 : !llvm.i64 + %2228 = llvm.add %2225, %2227 : !llvm.i64 + %2229 = llvm.getelementptr %2221[%2228] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2230 = llvm.load %2229 : !llvm.ptr> + %2231 = llvm.add %155, %57 : !llvm.i64 + %2232 = llvm.icmp "slt" %2231, %67 : !llvm.i64 + %2233 = llvm.sub %64, %2231 : !llvm.i64 + %2234 = llvm.select %2232, %2233, %2231 : !llvm.i1, !llvm.i64 + %2235 = llvm.sdiv %2234, %68 : !llvm.i64 + %2236 = llvm.sub %64, %2235 : !llvm.i64 + %2237 = llvm.select %2232, %2236, %2235 : !llvm.i1, !llvm.i64 + %2238 = llvm.srem %2237, %68 : !llvm.i64 + %2239 = llvm.icmp "slt" %2238, %67 : !llvm.i64 + %2240 = llvm.add %2238, %68 : !llvm.i64 + %2241 = llvm.select %2239, %2240, %2238 : !llvm.i1, !llvm.i64 + %2242 = llvm.mul %2237, %65 : !llvm.i64 + %2243 = llvm.add %1779, %2242 : !llvm.i64 + %2244 = llvm.add %2243, %58 : !llvm.i64 + %2245 = llvm.icmp "slt" %2244, %67 : !llvm.i64 + %2246 = llvm.sub %64, %2244 : !llvm.i64 + %2247 = llvm.select %2245, %2246, %2244 : !llvm.i1, !llvm.i64 + %2248 = llvm.sdiv %2247, %63 : !llvm.i64 + %2249 = llvm.sub %64, %2248 : !llvm.i64 + %2250 = llvm.select %2245, %2249, %2248 : !llvm.i1, !llvm.i64 + %2251 = llvm.mul %2250, %65 : !llvm.i64 + %2252 = llvm.add %2243, %2251 : !llvm.i64 + %2253 = llvm.add %2252, %58 : !llvm.i64 + 
%2254 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2256 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2257 = llvm.mul %2241, %2256 : !llvm.i64 + %2258 = llvm.add %2255, %2257 : !llvm.i64 + %2259 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2260 = llvm.mul %1729, %2259 : !llvm.i64 + %2261 = llvm.add %2258, %2260 : !llvm.i64 + %2262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2263 = llvm.mul %2253, %2262 : !llvm.i64 + %2264 = llvm.add %2261, %2263 : !llvm.i64 + %2265 = llvm.getelementptr %2254[%2264] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2230, %2265 : !llvm.ptr> + %2266 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2268 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2269 = llvm.mul %67, %2268 : !llvm.i64 + %2270 = llvm.add %2267, %2269 : !llvm.i64 + %2271 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2272 = llvm.mul %35, %2271 : !llvm.i64 + %2273 = llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.getelementptr %2266[%2273] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2275 = llvm.load %2274 : !llvm.ptr> + %2276 = llvm.add %1721, %61 : !llvm.i64 + %2277 = llvm.icmp "slt" %2276, %67 : !llvm.i64 + %2278 = llvm.sub %64, %2276 : !llvm.i64 + %2279 = llvm.select %2277, %2278, %2276 : !llvm.i1, !llvm.i64 + %2280 = llvm.sdiv %2279, %68 : !llvm.i64 + %2281 = llvm.sub %64, %2280 : !llvm.i64 + %2282 = llvm.select %2277, %2281, %2280 : !llvm.i1, !llvm.i64 + %2283 = llvm.mul %2282, %60 : !llvm.i64 + %2284 = llvm.add %1721, %2283 : !llvm.i64 + %2285 = llvm.add %2284, %61 : !llvm.i64 + %2286 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2288 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2289 = llvm.mul %2285, %2288 : !llvm.i64 + %2290 = llvm.add %2287, %2289 : !llvm.i64 + %2291 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2292 = llvm.mul %1729, %2291 : !llvm.i64 + %2293 = llvm.add %2290, %2292 : !llvm.i64 + %2294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2295 = llvm.mul %1743, %2294 : !llvm.i64 + %2296 = llvm.add %2293, %2295 : !llvm.i64 + %2297 = llvm.getelementptr %2286[%2296] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2275, %2297 : !llvm.ptr> + %2298 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2300 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2301 = llvm.mul %67, %2300 : !llvm.i64 + %2302 = llvm.add %2299, %2301 : !llvm.i64 + %2303 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2304 = llvm.mul %66, %2303 : !llvm.i64 + %2305 = llvm.add %2302, %2304 : !llvm.i64 + %2306 = llvm.getelementptr %2298[%2305] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2307 = llvm.load %2306 : !llvm.ptr> + %2308 = llvm.add %155, %62 : !llvm.i64 + %2309 = llvm.icmp "slt" %2308, %67 : !llvm.i64 + %2310 = llvm.sub %64, %2308 : !llvm.i64 + %2311 = llvm.select %2309, %2310, %2308 : !llvm.i1, !llvm.i64 + %2312 = llvm.sdiv %2311, %68 : !llvm.i64 + %2313 = llvm.sub %64, %2312 : !llvm.i64 + %2314 = llvm.select %2309, %2313, %2312 : !llvm.i1, !llvm.i64 + %2315 = llvm.srem %2314, %68 : !llvm.i64 + %2316 = llvm.icmp "slt" %2315, %67 : !llvm.i64 + %2317 = llvm.add %2315, %68 : !llvm.i64 + %2318 = llvm.select %2316, %2317, %2315 : !llvm.i1, !llvm.i64 + %2319 = llvm.mul %2314, %65 : !llvm.i64 + 
%2320 = llvm.add %1779, %2319 : !llvm.i64 + %2321 = llvm.add %2320, %66 : !llvm.i64 + %2322 = llvm.icmp "slt" %2321, %67 : !llvm.i64 + %2323 = llvm.sub %64, %2321 : !llvm.i64 + %2324 = llvm.select %2322, %2323, %2321 : !llvm.i1, !llvm.i64 + %2325 = llvm.sdiv %2324, %63 : !llvm.i64 + %2326 = llvm.sub %64, %2325 : !llvm.i64 + %2327 = llvm.select %2322, %2326, %2325 : !llvm.i1, !llvm.i64 + %2328 = llvm.mul %2327, %65 : !llvm.i64 + %2329 = llvm.add %2320, %2328 : !llvm.i64 + %2330 = llvm.add %2329, %66 : !llvm.i64 + %2331 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2332 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2333 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2334 = llvm.mul %2318, %2333 : !llvm.i64 + %2335 = llvm.add %2332, %2334 : !llvm.i64 + %2336 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2337 = llvm.mul %1729, %2336 : !llvm.i64 + %2338 = llvm.add %2335, %2337 : !llvm.i64 + %2339 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2340 = llvm.mul %2330, %2339 : !llvm.i64 + %2341 = llvm.add %2338, %2340 : !llvm.i64 + %2342 = llvm.getelementptr %2331[%2341] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2307, %2342 : !llvm.ptr> + llvm.br ^bb9 + ^bb9: // 2 preds: ^bb7, ^bb8 + %2343 = llvm.add %155, %39 : !llvm.i64 + llvm.br ^bb5(%2343 : !llvm.i64) + ^bb10: // pred: ^bb5 + %2344 = llvm.add %153, %69 : !llvm.i64 + llvm.br ^bb3(%2344 : !llvm.i64) + ^bb11: // pred: ^bb3 + llvm.br ^bb12(%67 : !llvm.i64) + ^bb12(%2345: !llvm.i64): // 2 preds: ^bb11, ^bb51 + %2346 = llvm.icmp "slt" %2345, %37 : !llvm.i64 + llvm.cond_br %2346, ^bb13, ^bb52 + ^bb13: // pred: ^bb12 + llvm.br ^bb14(%67 : !llvm.i64) + ^bb14(%2347: !llvm.i64): // 2 preds: ^bb13, ^bb21 + %2348 = llvm.icmp "slt" %2347, %68 : !llvm.i64 + llvm.cond_br %2348, ^bb15, ^bb22 + ^bb15: // pred: ^bb14 + llvm.br ^bb16(%67 : !llvm.i64) + ^bb16(%2349: !llvm.i64): // 2 preds: ^bb15, ^bb20 + %2350 = llvm.icmp "slt" %2349, %56 : !llvm.i64 + llvm.cond_br %2350, ^bb17, ^bb21 + ^bb17: // pred: ^bb16 + llvm.br ^bb18(%67 : !llvm.i64) + ^bb18(%2351: !llvm.i64): // 2 preds: ^bb17, ^bb19 + %2352 = llvm.icmp "slt" %2351, %63 : !llvm.i64 + llvm.cond_br %2352, ^bb19, ^bb20 + ^bb19: // pred: ^bb18 + %2353 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2354 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2355 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2356 = llvm.mul %2347, %2355 : !llvm.i64 + %2357 = llvm.add %2354, %2356 : !llvm.i64 + %2358 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2359 = llvm.mul %2349, %2358 : !llvm.i64 + %2360 = llvm.add %2357, %2359 : !llvm.i64 + %2361 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2362 = llvm.mul %2351, %2361 : !llvm.i64 + %2363 = llvm.add %2360, %2362 : !llvm.i64 + %2364 = llvm.getelementptr %2353[%2363] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %32, %2364 : !llvm.ptr> + %2365 = llvm.add %2351, %69 : !llvm.i64 + llvm.br ^bb18(%2365 : !llvm.i64) + ^bb20: // pred: ^bb18 + %2366 = llvm.add %2349, %69 : !llvm.i64 + llvm.br ^bb16(%2366 : !llvm.i64) + ^bb21: // pred: ^bb16 + %2367 = llvm.add %2347, %69 : !llvm.i64 + llvm.br ^bb14(%2367 : !llvm.i64) + ^bb22: // pred: ^bb14 + llvm.br ^bb23(%67 : !llvm.i64) + ^bb23(%2368: !llvm.i64): // 2 preds: ^bb22, ^bb39 + %2369 = llvm.icmp "slt" %2368, %38 : !llvm.i64 + llvm.cond_br %2369, ^bb24, ^bb40 + ^bb24: // pred: ^bb23 + llvm.br ^bb25(%67 : !llvm.i64) + ^bb25(%2370: !llvm.i64): // 2 preds: ^bb24, ^bb38 + %2371 = llvm.icmp "slt" %2370, %39 : !llvm.i64 + 
llvm.cond_br %2371, ^bb26, ^bb39 + ^bb26: // pred: ^bb25 + llvm.br ^bb27(%67 : !llvm.i64) + ^bb27(%2372: !llvm.i64): // 2 preds: ^bb26, ^bb34 + %2373 = llvm.icmp "slt" %2372, %67 : !llvm.i64 + llvm.cond_br %2373, ^bb28, ^bb35 + ^bb28: // pred: ^bb27 + llvm.br ^bb29(%67 : !llvm.i64) + ^bb29(%2374: !llvm.i64): // 2 preds: ^bb28, ^bb33 + %2375 = llvm.icmp "slt" %2374, %48 : !llvm.i64 + llvm.cond_br %2375, ^bb30, ^bb34 + ^bb30: // pred: ^bb29 + llvm.br ^bb31(%67 : !llvm.i64) + ^bb31(%2376: !llvm.i64): // 2 preds: ^bb30, ^bb32 + %2377 = llvm.icmp "slt" %2376, %67 : !llvm.i64 + llvm.cond_br %2377, ^bb32, ^bb33 + ^bb32: // pred: ^bb31 + %2378 = llvm.add %2345, %2372 : !llvm.i64 + %2379 = llvm.add %2378, %2376 : !llvm.i64 + %2380 = llvm.add %2370, %2374 : !llvm.i64 + %2381 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2383 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2384 = llvm.mul %2379, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2387 = llvm.mul %2380, %2386 : !llvm.i64 + %2388 = llvm.add %2385, %2387 : !llvm.i64 + %2389 = llvm.getelementptr %2381[%2388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2390 = llvm.load %2389 : !llvm.ptr + %2391 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2393 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2394 = llvm.mul %2379, %2393 : !llvm.i64 + %2395 = llvm.add %2392, %2394 : !llvm.i64 + %2396 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2397 = llvm.mul %2380, %2396 : !llvm.i64 + %2398 = llvm.add %2395, %2397 : !llvm.i64 + %2399 = llvm.getelementptr %2391[%2398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2400 = llvm.load %2399 : !llvm.ptr + %2401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2404 = llvm.mul %2379, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2407 = llvm.mul %2380, %2406 : !llvm.i64 + %2408 = llvm.add %2405, %2407 : !llvm.i64 + %2409 = llvm.getelementptr %2401[%2408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2410 = llvm.load %2409 : !llvm.ptr + %2411 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2413 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2414 = llvm.mul %2379, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : !llvm.i64 + %2416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2417 = llvm.mul %2380, %2416 : !llvm.i64 + %2418 = llvm.add %2415, %2417 : !llvm.i64 + %2419 = llvm.getelementptr %2411[%2418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2420 = llvm.load %2419 : !llvm.ptr + %2421 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2423 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2424 = llvm.mul %2379, %2423 : !llvm.i64 + %2425 = llvm.add %2422, %2424 : !llvm.i64 + %2426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2427 = llvm.mul %2380, %2426 : !llvm.i64 + %2428 = llvm.add %2425, %2427 : !llvm.i64 + %2429 = llvm.getelementptr %2421[%2428] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2430 = llvm.load %2429 : !llvm.ptr + %2431 = llvm.extractvalue %7[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2433 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2434 = llvm.mul %2379, %2433 : !llvm.i64 + %2435 = llvm.add %2432, %2434 : !llvm.i64 + %2436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2437 = llvm.mul %2380, %2436 : !llvm.i64 + %2438 = llvm.add %2435, %2437 : !llvm.i64 + %2439 = llvm.getelementptr %2431[%2438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2440 = llvm.load %2439 : !llvm.ptr + %2441 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2443 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2444 = llvm.mul %2379, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2447 = llvm.mul %2380, %2446 : !llvm.i64 + %2448 = llvm.add %2445, %2447 : !llvm.i64 + %2449 = llvm.getelementptr %2441[%2448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2450 = llvm.load %2449 : !llvm.ptr + %2451 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2452 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2453 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2454 = llvm.mul %2379, %2453 : !llvm.i64 + %2455 = llvm.add %2452, %2454 : !llvm.i64 + %2456 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2457 = llvm.mul %2380, %2456 : !llvm.i64 + %2458 = llvm.add %2455, %2457 : !llvm.i64 + %2459 = llvm.getelementptr %2451[%2458] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2460 = llvm.load %2459 : !llvm.ptr + %2461 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %2462 = llvm.sub %64, %2368 : !llvm.i64 + %2463 = llvm.select %2461, %2462, %2368 : !llvm.i1, !llvm.i64 + %2464 = llvm.sdiv %2463, %68 : !llvm.i64 + %2465 = llvm.sub %64, %2464 : !llvm.i64 + %2466 = llvm.select %2461, %2465, %2464 : !llvm.i1, !llvm.i64 + %2467 = llvm.srem %2466, %68 : !llvm.i64 + %2468 = llvm.icmp "slt" %2467, %67 : !llvm.i64 + %2469 = llvm.add %2467, %68 : !llvm.i64 + %2470 = llvm.select %2468, %2469, %2467 : !llvm.i1, !llvm.i64 + %2471 = llvm.srem %2380, %39 : !llvm.i64 + %2472 = llvm.icmp "slt" %2471, %67 : !llvm.i64 + %2473 = llvm.add %2471, %39 : !llvm.i64 + %2474 = llvm.select %2472, %2473, %2471 : !llvm.i1, !llvm.i64 + %2475 = llvm.srem %2368, %68 : !llvm.i64 + %2476 = llvm.icmp "slt" %2475, %67 : !llvm.i64 + %2477 = llvm.add %2475, %68 : !llvm.i64 + %2478 = llvm.select %2476, %2477, %2475 : !llvm.i1, !llvm.i64 + %2479 = llvm.icmp "slt" %2478, %67 : !llvm.i64 + %2480 = llvm.sub %64, %2478 : !llvm.i64 + %2481 = llvm.select %2479, %2480, %2478 : !llvm.i1, !llvm.i64 + %2482 = llvm.sdiv %2481, %70 : !llvm.i64 + %2483 = llvm.sub %64, %2482 : !llvm.i64 + %2484 = llvm.select %2479, %2483, %2482 : !llvm.i1, !llvm.i64 + %2485 = llvm.srem %2484, %63 : !llvm.i64 + %2486 = llvm.icmp "slt" %2485, %67 : !llvm.i64 + %2487 = llvm.add %2485, %63 : !llvm.i64 + %2488 = llvm.select %2486, %2487, %2485 : !llvm.i1, !llvm.i64 + %2489 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2491 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2492 = llvm.mul %2470, %2491 : !llvm.i64 + %2493 = llvm.add %2490, %2492 : !llvm.i64 + %2494 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2495 = llvm.mul %2474, %2494 : !llvm.i64 + %2496 = llvm.add %2493, %2495 : !llvm.i64 + %2497 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2498 = llvm.mul %2488, %2497 : !llvm.i64 + %2499 
= llvm.add %2496, %2498 : !llvm.i64 + %2500 = llvm.getelementptr %2489[%2499] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2501 = llvm.load %2500 : !llvm.ptr> + %2502 = llvm.extractelement %2501[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2503 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2505 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2506 = llvm.mul %2470, %2505 : !llvm.i64 + %2507 = llvm.add %2504, %2506 : !llvm.i64 + %2508 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2509 = llvm.mul %2474, %2508 : !llvm.i64 + %2510 = llvm.add %2507, %2509 : !llvm.i64 + %2511 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2512 = llvm.mul %2488, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.getelementptr %2503[%2513] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2515 = llvm.load %2514 : !llvm.ptr> + %2516 = llvm.extractelement %2515[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2517 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2519 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2520 = llvm.mul %2470, %2519 : !llvm.i64 + %2521 = llvm.add %2518, %2520 : !llvm.i64 + %2522 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2523 = llvm.mul %2474, %2522 : !llvm.i64 + %2524 = llvm.add %2521, %2523 : !llvm.i64 + %2525 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2526 = llvm.mul %2488, %2525 : !llvm.i64 + %2527 = llvm.add %2524, %2526 : !llvm.i64 + %2528 = llvm.getelementptr %2517[%2527] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2529 = llvm.load %2528 : !llvm.ptr> + %2530 = llvm.extractelement %2529[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2534 = llvm.mul %2470, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2537 = llvm.mul %2474, %2536 : !llvm.i64 + %2538 = llvm.add %2535, %2537 : !llvm.i64 + %2539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2540 = llvm.mul %2488, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.getelementptr %2531[%2541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2543 = llvm.load %2542 : !llvm.ptr> + %2544 = llvm.extractelement %2543[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2545 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2546 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2547 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2548 = llvm.mul %2470, %2547 : !llvm.i64 + %2549 = llvm.add %2546, %2548 : !llvm.i64 + %2550 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2551 = llvm.mul %2474, %2550 : !llvm.i64 + %2552 = llvm.add %2549, %2551 : !llvm.i64 + %2553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2554 = llvm.mul %2488, %2553 : !llvm.i64 + %2555 = llvm.add %2552, %2554 : !llvm.i64 + %2556 = llvm.getelementptr %2545[%2555] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2557 = llvm.load %2556 : !llvm.ptr> + %2558 = llvm.extractelement %2557[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2559 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2560 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2561 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2562 = llvm.mul 
%2470, %2561 : !llvm.i64 + %2563 = llvm.add %2560, %2562 : !llvm.i64 + %2564 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2565 = llvm.mul %2474, %2564 : !llvm.i64 + %2566 = llvm.add %2563, %2565 : !llvm.i64 + %2567 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2568 = llvm.mul %2488, %2567 : !llvm.i64 + %2569 = llvm.add %2566, %2568 : !llvm.i64 + %2570 = llvm.getelementptr %2559[%2569] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2571 = llvm.load %2570 : !llvm.ptr> + %2572 = llvm.extractelement %2571[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2573 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2574 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2575 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2576 = llvm.mul %2470, %2575 : !llvm.i64 + %2577 = llvm.add %2574, %2576 : !llvm.i64 + %2578 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2579 = llvm.mul %2474, %2578 : !llvm.i64 + %2580 = llvm.add %2577, %2579 : !llvm.i64 + %2581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2582 = llvm.mul %2488, %2581 : !llvm.i64 + %2583 = llvm.add %2580, %2582 : !llvm.i64 + %2584 = llvm.getelementptr %2573[%2583] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2585 = llvm.load %2584 : !llvm.ptr> + %2586 = llvm.extractelement %2585[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2587 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2589 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2590 = llvm.mul %2470, %2589 : !llvm.i64 + %2591 = llvm.add %2588, %2590 : !llvm.i64 + %2592 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2593 = llvm.mul %2474, %2592 : !llvm.i64 + %2594 = llvm.add %2591, %2593 : !llvm.i64 + %2595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2596 = llvm.mul %2488, %2595 : !llvm.i64 + %2597 = llvm.add %2594, %2596 : !llvm.i64 + %2598 = llvm.getelementptr %2587[%2597] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2599 = llvm.load %2598 : !llvm.ptr> + %2600 = llvm.extractelement %2599[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2601 = llvm.fmul %2390, %2502 {RelaxedPrecision} : !llvm.float + %2602 = llvm.fmul %2400, %2516 {RelaxedPrecision} : !llvm.float + %2603 = llvm.fmul %2410, %2530 {RelaxedPrecision} : !llvm.float + %2604 = llvm.fmul %2420, %2544 {RelaxedPrecision} : !llvm.float + %2605 = llvm.fmul %2430, %2558 {RelaxedPrecision} : !llvm.float + %2606 = llvm.fmul %2440, %2572 {RelaxedPrecision} : !llvm.float + %2607 = llvm.fmul %2450, %2586 {RelaxedPrecision} : !llvm.float + %2608 = llvm.fmul %2460, %2600 {RelaxedPrecision} : !llvm.float + %2609 = llvm.add %2372, %2376 : !llvm.i64 + %2610 = llvm.srem %2609, %56 : !llvm.i64 + %2611 = llvm.icmp "slt" %2610, %67 : !llvm.i64 + %2612 = llvm.add %2610, %56 : !llvm.i64 + %2613 = llvm.select %2611, %2612, %2610 : !llvm.i1, !llvm.i64 + %2614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2617 = llvm.mul %2470, %2616 : !llvm.i64 + %2618 = llvm.add %2615, %2617 : !llvm.i64 + %2619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2620 = llvm.mul %2613, %2619 : !llvm.i64 + %2621 = llvm.add %2618, %2620 : !llvm.i64 + %2622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2623 = llvm.mul %2488, %2622 : !llvm.i64 + %2624 = llvm.add %2621, %2623 : !llvm.i64 + %2625 = llvm.getelementptr %2614[%2624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2626 = llvm.load %2625 : 
!llvm.ptr> + %2627 = llvm.extractelement %2626[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2628 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2630 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2631 = llvm.mul %2470, %2630 : !llvm.i64 + %2632 = llvm.add %2629, %2631 : !llvm.i64 + %2633 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2634 = llvm.mul %2613, %2633 : !llvm.i64 + %2635 = llvm.add %2632, %2634 : !llvm.i64 + %2636 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2637 = llvm.mul %2488, %2636 : !llvm.i64 + %2638 = llvm.add %2635, %2637 : !llvm.i64 + %2639 = llvm.getelementptr %2628[%2638] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2640 = llvm.load %2639 : !llvm.ptr> + %2641 = llvm.extractelement %2640[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2642 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2643 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2644 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2645 = llvm.mul %2470, %2644 : !llvm.i64 + %2646 = llvm.add %2643, %2645 : !llvm.i64 + %2647 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2648 = llvm.mul %2613, %2647 : !llvm.i64 + %2649 = llvm.add %2646, %2648 : !llvm.i64 + %2650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2651 = llvm.mul %2488, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.getelementptr %2642[%2652] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2654 = llvm.load %2653 : !llvm.ptr> + %2655 = llvm.extractelement %2654[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2656 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2659 = llvm.mul %2470, %2658 : !llvm.i64 + %2660 = llvm.add %2657, %2659 : !llvm.i64 + %2661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2662 = llvm.mul %2613, %2661 : !llvm.i64 + %2663 = llvm.add %2660, %2662 : !llvm.i64 + %2664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2665 = llvm.mul %2488, %2664 : !llvm.i64 + %2666 = llvm.add %2663, %2665 : !llvm.i64 + %2667 = llvm.getelementptr %2656[%2666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2668 = llvm.load %2667 : !llvm.ptr> + %2669 = llvm.extractelement %2668[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2673 = llvm.mul %2470, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2676 = llvm.mul %2613, %2675 : !llvm.i64 + %2677 = llvm.add %2674, %2676 : !llvm.i64 + %2678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2679 = llvm.mul %2488, %2678 : !llvm.i64 + %2680 = llvm.add %2677, %2679 : !llvm.i64 + %2681 = llvm.getelementptr %2670[%2680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2682 = llvm.load %2681 : !llvm.ptr> + %2683 = llvm.extractelement %2682[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2684 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2685 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2686 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2687 = llvm.mul %2470, %2686 : !llvm.i64 + %2688 = llvm.add %2685, %2687 : !llvm.i64 + %2689 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2690 = llvm.mul %2613, %2689 
: !llvm.i64 + %2691 = llvm.add %2688, %2690 : !llvm.i64 + %2692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2693 = llvm.mul %2488, %2692 : !llvm.i64 + %2694 = llvm.add %2691, %2693 : !llvm.i64 + %2695 = llvm.getelementptr %2684[%2694] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2696 = llvm.load %2695 : !llvm.ptr> + %2697 = llvm.extractelement %2696[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2698 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2701 = llvm.mul %2470, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2704 = llvm.mul %2613, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2707 = llvm.mul %2488, %2706 : !llvm.i64 + %2708 = llvm.add %2705, %2707 : !llvm.i64 + %2709 = llvm.getelementptr %2698[%2708] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2710 = llvm.load %2709 : !llvm.ptr> + %2711 = llvm.extractelement %2710[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2712 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2714 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2715 = llvm.mul %2470, %2714 : !llvm.i64 + %2716 = llvm.add %2713, %2715 : !llvm.i64 + %2717 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2718 = llvm.mul %2613, %2717 : !llvm.i64 + %2719 = llvm.add %2716, %2718 : !llvm.i64 + %2720 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2721 = llvm.mul %2488, %2720 : !llvm.i64 + %2722 = llvm.add %2719, %2721 : !llvm.i64 + %2723 = llvm.getelementptr %2712[%2722] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2724 = llvm.load %2723 : !llvm.ptr> + %2725 = llvm.extractelement %2724[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2726 = llvm.fadd %2627, %2601 {RelaxedPrecision} : !llvm.float + %2727 = llvm.fadd %2641, %2602 {RelaxedPrecision} : !llvm.float + %2728 = llvm.fadd %2655, %2603 {RelaxedPrecision} : !llvm.float + %2729 = llvm.fadd %2669, %2604 {RelaxedPrecision} : !llvm.float + %2730 = llvm.fadd %2683, %2605 {RelaxedPrecision} : !llvm.float + %2731 = llvm.fadd %2697, %2606 {RelaxedPrecision} : !llvm.float + %2732 = llvm.fadd %2711, %2607 {RelaxedPrecision} : !llvm.float + %2733 = llvm.fadd %2725, %2608 {RelaxedPrecision} : !llvm.float + %2734 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2735 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2736 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2737 = llvm.mul %2470, %2736 : !llvm.i64 + %2738 = llvm.add %2735, %2737 : !llvm.i64 + %2739 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2740 = llvm.mul %2613, %2739 : !llvm.i64 + %2741 = llvm.add %2738, %2740 : !llvm.i64 + %2742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2743 = llvm.mul %2488, %2742 : !llvm.i64 + %2744 = llvm.add %2741, %2743 : !llvm.i64 + %2745 = llvm.getelementptr %2734[%2744] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2746 = llvm.load %2745 : !llvm.ptr> + %2747 = llvm.insertelement %2726, %2746[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2748 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2750 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2751 = llvm.mul %2470, %2750 : !llvm.i64 + %2752 = llvm.add %2749, %2751 : !llvm.i64 + %2753 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %2754 = llvm.mul %2613, %2753 : !llvm.i64 + %2755 = llvm.add %2752, %2754 : !llvm.i64 + %2756 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2757 = llvm.mul %2488, %2756 : !llvm.i64 + %2758 = llvm.add %2755, %2757 : !llvm.i64 + %2759 = llvm.getelementptr %2748[%2758] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2747, %2759 : !llvm.ptr> + %2760 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2761 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2762 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2763 = llvm.mul %2470, %2762 : !llvm.i64 + %2764 = llvm.add %2761, %2763 : !llvm.i64 + %2765 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2766 = llvm.mul %2613, %2765 : !llvm.i64 + %2767 = llvm.add %2764, %2766 : !llvm.i64 + %2768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2769 = llvm.mul %2488, %2768 : !llvm.i64 + %2770 = llvm.add %2767, %2769 : !llvm.i64 + %2771 = llvm.getelementptr %2760[%2770] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2772 = llvm.load %2771 : !llvm.ptr> + %2773 = llvm.insertelement %2727, %2772[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2774 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2775 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2776 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2777 = llvm.mul %2470, %2776 : !llvm.i64 + %2778 = llvm.add %2775, %2777 : !llvm.i64 + %2779 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2780 = llvm.mul %2613, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = llvm.mul %2488, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2774[%2784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2773, %2785 : !llvm.ptr> + %2786 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2787 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2788 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2789 = llvm.mul %2470, %2788 : !llvm.i64 + %2790 = llvm.add %2787, %2789 : !llvm.i64 + %2791 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2792 = llvm.mul %2613, %2791 : !llvm.i64 + %2793 = llvm.add %2790, %2792 : !llvm.i64 + %2794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2795 = llvm.mul %2488, %2794 : !llvm.i64 + %2796 = llvm.add %2793, %2795 : !llvm.i64 + %2797 = llvm.getelementptr %2786[%2796] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2798 = llvm.load %2797 : !llvm.ptr> + %2799 = llvm.insertelement %2728, %2798[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2800 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2802 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2803 = llvm.mul %2470, %2802 : !llvm.i64 + %2804 = llvm.add %2801, %2803 : !llvm.i64 + %2805 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2806 = llvm.mul %2613, %2805 : !llvm.i64 + %2807 = llvm.add %2804, %2806 : !llvm.i64 + %2808 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2809 = llvm.mul %2488, %2808 : !llvm.i64 + %2810 = llvm.add %2807, %2809 : !llvm.i64 + %2811 = llvm.getelementptr %2800[%2810] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2799, %2811 : !llvm.ptr> + %2812 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2814 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2815 
= llvm.mul %2470, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2818 = llvm.mul %2613, %2817 : !llvm.i64 + %2819 = llvm.add %2816, %2818 : !llvm.i64 + %2820 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2821 = llvm.mul %2488, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.getelementptr %2812[%2822] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2824 = llvm.load %2823 : !llvm.ptr> + %2825 = llvm.insertelement %2729, %2824[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2826 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2828 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2829 = llvm.mul %2470, %2828 : !llvm.i64 + %2830 = llvm.add %2827, %2829 : !llvm.i64 + %2831 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2832 = llvm.mul %2613, %2831 : !llvm.i64 + %2833 = llvm.add %2830, %2832 : !llvm.i64 + %2834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2835 = llvm.mul %2488, %2834 : !llvm.i64 + %2836 = llvm.add %2833, %2835 : !llvm.i64 + %2837 = llvm.getelementptr %2826[%2836] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2825, %2837 : !llvm.ptr> + %2838 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2840 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2841 = llvm.mul %2470, %2840 : !llvm.i64 + %2842 = llvm.add %2839, %2841 : !llvm.i64 + %2843 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2844 = llvm.mul %2613, %2843 : !llvm.i64 + %2845 = llvm.add %2842, %2844 : !llvm.i64 + %2846 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2847 = llvm.mul %2488, %2846 : !llvm.i64 + %2848 = llvm.add %2845, %2847 : !llvm.i64 + %2849 = llvm.getelementptr %2838[%2848] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2850 = llvm.load %2849 : !llvm.ptr> + %2851 = llvm.insertelement %2730, %2850[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2852 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2854 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2855 = llvm.mul %2470, %2854 : !llvm.i64 + %2856 = llvm.add %2853, %2855 : !llvm.i64 + %2857 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2858 = llvm.mul %2613, %2857 : !llvm.i64 + %2859 = llvm.add %2856, %2858 : !llvm.i64 + %2860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2861 = llvm.mul %2488, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.getelementptr %2852[%2862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2851, %2863 : !llvm.ptr> + %2864 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2865 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2866 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2867 = llvm.mul %2470, %2866 : !llvm.i64 + %2868 = llvm.add %2865, %2867 : !llvm.i64 + %2869 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2870 = llvm.mul %2613, %2869 : !llvm.i64 + %2871 = llvm.add %2868, %2870 : !llvm.i64 + %2872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2873 = llvm.mul %2488, %2872 : !llvm.i64 + %2874 = llvm.add %2871, %2873 : !llvm.i64 + %2875 = llvm.getelementptr %2864[%2874] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2876 = llvm.load %2875 : !llvm.ptr> + %2877 = llvm.insertelement %2731, %2876[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2878 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2880 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2881 = llvm.mul %2470, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2884 = llvm.mul %2613, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2887 = llvm.mul %2488, %2886 : !llvm.i64 + %2888 = llvm.add %2885, %2887 : !llvm.i64 + %2889 = llvm.getelementptr %2878[%2888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2877, %2889 : !llvm.ptr> + %2890 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2892 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2893 = llvm.mul %2470, %2892 : !llvm.i64 + %2894 = llvm.add %2891, %2893 : !llvm.i64 + %2895 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2896 = llvm.mul %2613, %2895 : !llvm.i64 + %2897 = llvm.add %2894, %2896 : !llvm.i64 + %2898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2899 = llvm.mul %2488, %2898 : !llvm.i64 + %2900 = llvm.add %2897, %2899 : !llvm.i64 + %2901 = llvm.getelementptr %2890[%2900] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2902 = llvm.load %2901 : !llvm.ptr> + %2903 = llvm.insertelement %2732, %2902[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2904 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2905 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2906 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2907 = llvm.mul %2470, %2906 : !llvm.i64 + %2908 = llvm.add %2905, %2907 : !llvm.i64 + %2909 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2910 = llvm.mul %2613, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : !llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %2488, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2904[%2914] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2903, %2915 : !llvm.ptr> + %2916 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2917 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2918 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2919 = llvm.mul %2470, %2918 : !llvm.i64 + %2920 = llvm.add %2917, %2919 : !llvm.i64 + %2921 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2922 = llvm.mul %2613, %2921 : !llvm.i64 + %2923 = llvm.add %2920, %2922 : !llvm.i64 + %2924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2925 = llvm.mul %2488, %2924 : !llvm.i64 + %2926 = llvm.add %2923, %2925 : !llvm.i64 + %2927 = llvm.getelementptr %2916[%2926] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2928 = llvm.load %2927 : !llvm.ptr> + %2929 = llvm.insertelement %2733, %2928[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2930 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2931 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2932 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2933 = llvm.mul %2470, %2932 : !llvm.i64 + %2934 = llvm.add %2931, %2933 : !llvm.i64 + %2935 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2936 = llvm.mul %2613, %2935 : !llvm.i64 + %2937 = llvm.add %2934, %2936 : !llvm.i64 + %2938 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2939 = llvm.mul %2488, %2938 : !llvm.i64 + %2940 = llvm.add %2937, %2939 : !llvm.i64 + %2941 = llvm.getelementptr %2930[%2940] : (!llvm.ptr>, !llvm.i64) 
-> !llvm.ptr> + llvm.store %2929, %2941 : !llvm.ptr> + %2942 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2943 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2944 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2945 = llvm.mul %2470, %2944 : !llvm.i64 + %2946 = llvm.add %2943, %2945 : !llvm.i64 + %2947 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2948 = llvm.mul %2613, %2947 : !llvm.i64 + %2949 = llvm.add %2946, %2948 : !llvm.i64 + %2950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2951 = llvm.mul %2488, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.getelementptr %2942[%2952] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2954 = llvm.load %2953 : !llvm.ptr> + %2955 = llvm.insertelement %2726, %2954[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2956 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2957 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2958 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2959 = llvm.mul %2470, %2958 : !llvm.i64 + %2960 = llvm.add %2957, %2959 : !llvm.i64 + %2961 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2962 = llvm.mul %2613, %2961 : !llvm.i64 + %2963 = llvm.add %2960, %2962 : !llvm.i64 + %2964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2965 = llvm.mul %2488, %2964 : !llvm.i64 + %2966 = llvm.add %2963, %2965 : !llvm.i64 + %2967 = llvm.getelementptr %2956[%2966] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2955, %2967 : !llvm.ptr> + %2968 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2970 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2971 = llvm.mul %2470, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2974 = llvm.mul %2613, %2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2977 = llvm.mul %2488, %2976 : !llvm.i64 + %2978 = llvm.add %2975, %2977 : !llvm.i64 + %2979 = llvm.getelementptr %2968[%2978] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2980 = llvm.load %2979 : !llvm.ptr> + %2981 = llvm.insertelement %2727, %2980[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2982 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2984 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2985 = llvm.mul %2470, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2988 = llvm.mul %2613, %2987 : !llvm.i64 + %2989 = llvm.add %2986, %2988 : !llvm.i64 + %2990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2991 = llvm.mul %2488, %2990 : !llvm.i64 + %2992 = llvm.add %2989, %2991 : !llvm.i64 + %2993 = llvm.getelementptr %2982[%2992] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2981, %2993 : !llvm.ptr> + %2994 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2995 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2996 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2997 = llvm.mul %2470, %2996 : !llvm.i64 + %2998 = llvm.add %2995, %2997 : !llvm.i64 + %2999 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3000 = llvm.mul %2613, %2999 : !llvm.i64 + %3001 = llvm.add %2998, %3000 : !llvm.i64 + %3002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3003 = llvm.mul %2488, %3002 : !llvm.i64 + %3004 = llvm.add 
%3001, %3003 : !llvm.i64 + %3005 = llvm.getelementptr %2994[%3004] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3006 = llvm.load %3005 : !llvm.ptr> + %3007 = llvm.insertelement %2728, %3006[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3008 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3010 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3011 = llvm.mul %2470, %3010 : !llvm.i64 + %3012 = llvm.add %3009, %3011 : !llvm.i64 + %3013 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3014 = llvm.mul %2613, %3013 : !llvm.i64 + %3015 = llvm.add %3012, %3014 : !llvm.i64 + %3016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3017 = llvm.mul %2488, %3016 : !llvm.i64 + %3018 = llvm.add %3015, %3017 : !llvm.i64 + %3019 = llvm.getelementptr %3008[%3018] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3007, %3019 : !llvm.ptr> + %3020 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3022 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3023 = llvm.mul %2470, %3022 : !llvm.i64 + %3024 = llvm.add %3021, %3023 : !llvm.i64 + %3025 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3026 = llvm.mul %2613, %3025 : !llvm.i64 + %3027 = llvm.add %3024, %3026 : !llvm.i64 + %3028 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3029 = llvm.mul %2488, %3028 : !llvm.i64 + %3030 = llvm.add %3027, %3029 : !llvm.i64 + %3031 = llvm.getelementptr %3020[%3030] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3032 = llvm.load %3031 : !llvm.ptr> + %3033 = llvm.insertelement %2729, %3032[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3037 = llvm.mul %2470, %3036 : !llvm.i64 + %3038 = llvm.add %3035, %3037 : !llvm.i64 + %3039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3040 = llvm.mul %2613, %3039 : !llvm.i64 + %3041 = llvm.add %3038, %3040 : !llvm.i64 + %3042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3043 = llvm.mul %2488, %3042 : !llvm.i64 + %3044 = llvm.add %3041, %3043 : !llvm.i64 + %3045 = llvm.getelementptr %3034[%3044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3033, %3045 : !llvm.ptr> + %3046 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3048 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3049 = llvm.mul %2470, %3048 : !llvm.i64 + %3050 = llvm.add %3047, %3049 : !llvm.i64 + %3051 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3052 = llvm.mul %2613, %3051 : !llvm.i64 + %3053 = llvm.add %3050, %3052 : !llvm.i64 + %3054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3055 = llvm.mul %2488, %3054 : !llvm.i64 + %3056 = llvm.add %3053, %3055 : !llvm.i64 + %3057 = llvm.getelementptr %3046[%3056] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3058 = llvm.load %3057 : !llvm.ptr> + %3059 = llvm.insertelement %2730, %3058[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3060 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3062 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3063 = llvm.mul %2470, %3062 : !llvm.i64 + %3064 = llvm.add %3061, %3063 : !llvm.i64 + %3065 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3066 = llvm.mul %2613, %3065 
: !llvm.i64 + %3067 = llvm.add %3064, %3066 : !llvm.i64 + %3068 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3069 = llvm.mul %2488, %3068 : !llvm.i64 + %3070 = llvm.add %3067, %3069 : !llvm.i64 + %3071 = llvm.getelementptr %3060[%3070] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3059, %3071 : !llvm.ptr> + %3072 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3073 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3074 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3075 = llvm.mul %2470, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3078 = llvm.mul %2613, %3077 : !llvm.i64 + %3079 = llvm.add %3076, %3078 : !llvm.i64 + %3080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3081 = llvm.mul %2488, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.getelementptr %3072[%3082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3084 = llvm.load %3083 : !llvm.ptr> + %3085 = llvm.insertelement %2731, %3084[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3086 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3088 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3089 = llvm.mul %2470, %3088 : !llvm.i64 + %3090 = llvm.add %3087, %3089 : !llvm.i64 + %3091 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3092 = llvm.mul %2613, %3091 : !llvm.i64 + %3093 = llvm.add %3090, %3092 : !llvm.i64 + %3094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3095 = llvm.mul %2488, %3094 : !llvm.i64 + %3096 = llvm.add %3093, %3095 : !llvm.i64 + %3097 = llvm.getelementptr %3086[%3096] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3085, %3097 : !llvm.ptr> + %3098 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3100 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3101 = llvm.mul %2470, %3100 : !llvm.i64 + %3102 = llvm.add %3099, %3101 : !llvm.i64 + %3103 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3104 = llvm.mul %2613, %3103 : !llvm.i64 + %3105 = llvm.add %3102, %3104 : !llvm.i64 + %3106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3107 = llvm.mul %2488, %3106 : !llvm.i64 + %3108 = llvm.add %3105, %3107 : !llvm.i64 + %3109 = llvm.getelementptr %3098[%3108] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3110 = llvm.load %3109 : !llvm.ptr> + %3111 = llvm.insertelement %2732, %3110[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3112 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3114 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3115 = llvm.mul %2470, %3114 : !llvm.i64 + %3116 = llvm.add %3113, %3115 : !llvm.i64 + %3117 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3118 = llvm.mul %2613, %3117 : !llvm.i64 + %3119 = llvm.add %3116, %3118 : !llvm.i64 + %3120 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3121 = llvm.mul %2488, %3120 : !llvm.i64 + %3122 = llvm.add %3119, %3121 : !llvm.i64 + %3123 = llvm.getelementptr %3112[%3122] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3111, %3123 : !llvm.ptr> + %3124 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3126 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3127 = llvm.mul %2470, %3126 : !llvm.i64 + %3128 = llvm.add %3125, %3127 : 
!llvm.i64 + %3129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3130 = llvm.mul %2613, %3129 : !llvm.i64 + %3131 = llvm.add %3128, %3130 : !llvm.i64 + %3132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3133 = llvm.mul %2488, %3132 : !llvm.i64 + %3134 = llvm.add %3131, %3133 : !llvm.i64 + %3135 = llvm.getelementptr %3124[%3134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3136 = llvm.load %3135 : !llvm.ptr> + %3137 = llvm.insertelement %2733, %3136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3138 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3141 = llvm.mul %2470, %3140 : !llvm.i64 + %3142 = llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3144 = llvm.mul %2613, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3147 = llvm.mul %2488, %3146 : !llvm.i64 + %3148 = llvm.add %3145, %3147 : !llvm.i64 + %3149 = llvm.getelementptr %3138[%3148] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3137, %3149 : !llvm.ptr> + %3150 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3152 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3153 = llvm.mul %2379, %3152 : !llvm.i64 + %3154 = llvm.add %3151, %3153 : !llvm.i64 + %3155 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3156 = llvm.mul %2380, %3155 : !llvm.i64 + %3157 = llvm.add %3154, %3156 : !llvm.i64 + %3158 = llvm.getelementptr %3150[%3157] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3159 = llvm.load %3158 : !llvm.ptr + %3160 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3162 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3163 = llvm.mul %2379, %3162 : !llvm.i64 + %3164 = llvm.add %3161, %3163 : !llvm.i64 + %3165 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3166 = llvm.mul %2380, %3165 : !llvm.i64 + %3167 = llvm.add %3164, %3166 : !llvm.i64 + %3168 = llvm.getelementptr %3160[%3167] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3169 = llvm.load %3168 : !llvm.ptr + %3170 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3173 = llvm.mul %2379, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %2380, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3179 = llvm.load %3178 : !llvm.ptr + %3180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3183 = llvm.mul %2379, %3182 : !llvm.i64 + %3184 = llvm.add %3181, %3183 : !llvm.i64 + %3185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3186 = llvm.mul %2380, %3185 : !llvm.i64 + %3187 = llvm.add %3184, %3186 : !llvm.i64 + %3188 = llvm.getelementptr %3180[%3187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3189 = llvm.load %3188 : !llvm.ptr + %3190 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3192 = 
llvm.mlir.constant(128 : index) : !llvm.i64 + %3193 = llvm.mul %2379, %3192 : !llvm.i64 + %3194 = llvm.add %3191, %3193 : !llvm.i64 + %3195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3196 = llvm.mul %2380, %3195 : !llvm.i64 + %3197 = llvm.add %3194, %3196 : !llvm.i64 + %3198 = llvm.getelementptr %3190[%3197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3199 = llvm.load %3198 : !llvm.ptr + %3200 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3202 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3203 = llvm.mul %2379, %3202 : !llvm.i64 + %3204 = llvm.add %3201, %3203 : !llvm.i64 + %3205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3206 = llvm.mul %2380, %3205 : !llvm.i64 + %3207 = llvm.add %3204, %3206 : !llvm.i64 + %3208 = llvm.getelementptr %3200[%3207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3209 = llvm.load %3208 : !llvm.ptr + %3210 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3212 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3213 = llvm.mul %2379, %3212 : !llvm.i64 + %3214 = llvm.add %3211, %3213 : !llvm.i64 + %3215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3216 = llvm.mul %2380, %3215 : !llvm.i64 + %3217 = llvm.add %3214, %3216 : !llvm.i64 + %3218 = llvm.getelementptr %3210[%3217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3219 = llvm.load %3218 : !llvm.ptr + %3220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3222 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3223 = llvm.mul %2379, %3222 : !llvm.i64 + %3224 = llvm.add %3221, %3223 : !llvm.i64 + %3225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3226 = llvm.mul %2380, %3225 : !llvm.i64 + %3227 = llvm.add %3224, %3226 : !llvm.i64 + %3228 = llvm.getelementptr %3220[%3227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3229 = llvm.load %3228 : !llvm.ptr + %3230 = llvm.add %2368, %70 : !llvm.i64 + %3231 = llvm.icmp "slt" %3230, %67 : !llvm.i64 + %3232 = llvm.sub %64, %3230 : !llvm.i64 + %3233 = llvm.select %3231, %3232, %3230 : !llvm.i1, !llvm.i64 + %3234 = llvm.sdiv %3233, %68 : !llvm.i64 + %3235 = llvm.sub %64, %3234 : !llvm.i64 + %3236 = llvm.select %3231, %3235, %3234 : !llvm.i1, !llvm.i64 + %3237 = llvm.srem %3236, %68 : !llvm.i64 + %3238 = llvm.icmp "slt" %3237, %67 : !llvm.i64 + %3239 = llvm.add %3237, %68 : !llvm.i64 + %3240 = llvm.select %3238, %3239, %3237 : !llvm.i1, !llvm.i64 + %3241 = llvm.sdiv %2463, %70 : !llvm.i64 + %3242 = llvm.sub %64, %3241 : !llvm.i64 + %3243 = llvm.select %2461, %3242, %3241 : !llvm.i1, !llvm.i64 + %3244 = llvm.mul %3236, %65 : !llvm.i64 + %3245 = llvm.add %3243, %3244 : !llvm.i64 + %3246 = llvm.add %3245, %69 : !llvm.i64 + %3247 = llvm.icmp "slt" %3246, %67 : !llvm.i64 + %3248 = llvm.sub %64, %3246 : !llvm.i64 + %3249 = llvm.select %3247, %3248, %3246 : !llvm.i1, !llvm.i64 + %3250 = llvm.sdiv %3249, %63 : !llvm.i64 + %3251 = llvm.sub %64, %3250 : !llvm.i64 + %3252 = llvm.select %3247, %3251, %3250 : !llvm.i1, !llvm.i64 + %3253 = llvm.mul %3252, %65 : !llvm.i64 + %3254 = llvm.add %3245, %3253 : !llvm.i64 + %3255 = llvm.add %3254, %69 : !llvm.i64 + %3256 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3257 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3258 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3259 = llvm.mul %3240, %3258 : !llvm.i64 + %3260 = 
llvm.add %3257, %3259 : !llvm.i64 + %3261 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3262 = llvm.mul %2474, %3261 : !llvm.i64 + %3263 = llvm.add %3260, %3262 : !llvm.i64 + %3264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3265 = llvm.mul %3255, %3264 : !llvm.i64 + %3266 = llvm.add %3263, %3265 : !llvm.i64 + %3267 = llvm.getelementptr %3256[%3266] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3268 = llvm.load %3267 : !llvm.ptr> + %3269 = llvm.extractelement %3268[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3270 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3272 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3273 = llvm.mul %3240, %3272 : !llvm.i64 + %3274 = llvm.add %3271, %3273 : !llvm.i64 + %3275 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3276 = llvm.mul %2474, %3275 : !llvm.i64 + %3277 = llvm.add %3274, %3276 : !llvm.i64 + %3278 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3279 = llvm.mul %3255, %3278 : !llvm.i64 + %3280 = llvm.add %3277, %3279 : !llvm.i64 + %3281 = llvm.getelementptr %3270[%3280] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3282 = llvm.load %3281 : !llvm.ptr> + %3283 = llvm.extractelement %3282[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3284 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3286 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3287 = llvm.mul %3240, %3286 : !llvm.i64 + %3288 = llvm.add %3285, %3287 : !llvm.i64 + %3289 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3290 = llvm.mul %2474, %3289 : !llvm.i64 + %3291 = llvm.add %3288, %3290 : !llvm.i64 + %3292 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3293 = llvm.mul %3255, %3292 : !llvm.i64 + %3294 = llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.getelementptr %3284[%3294] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3296 = llvm.load %3295 : !llvm.ptr> + %3297 = llvm.extractelement %3296[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3298 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3300 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3301 = llvm.mul %3240, %3300 : !llvm.i64 + %3302 = llvm.add %3299, %3301 : !llvm.i64 + %3303 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3304 = llvm.mul %2474, %3303 : !llvm.i64 + %3305 = llvm.add %3302, %3304 : !llvm.i64 + %3306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3307 = llvm.mul %3255, %3306 : !llvm.i64 + %3308 = llvm.add %3305, %3307 : !llvm.i64 + %3309 = llvm.getelementptr %3298[%3308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3310 = llvm.load %3309 : !llvm.ptr> + %3311 = llvm.extractelement %3310[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3312 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3313 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3314 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3315 = llvm.mul %3240, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3318 = llvm.mul %2474, %3317 : !llvm.i64 + %3319 = llvm.add %3316, %3318 : !llvm.i64 + %3320 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3321 = llvm.mul %3255, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.getelementptr %3312[%3322] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3324 = llvm.load %3323 : !llvm.ptr> + %3325 = 
llvm.extractelement %3324[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3326 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3328 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3329 = llvm.mul %3240, %3328 : !llvm.i64 + %3330 = llvm.add %3327, %3329 : !llvm.i64 + %3331 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3332 = llvm.mul %2474, %3331 : !llvm.i64 + %3333 = llvm.add %3330, %3332 : !llvm.i64 + %3334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3335 = llvm.mul %3255, %3334 : !llvm.i64 + %3336 = llvm.add %3333, %3335 : !llvm.i64 + %3337 = llvm.getelementptr %3326[%3336] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3338 = llvm.load %3337 : !llvm.ptr> + %3339 = llvm.extractelement %3338[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3340 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3342 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3343 = llvm.mul %3240, %3342 : !llvm.i64 + %3344 = llvm.add %3341, %3343 : !llvm.i64 + %3345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3346 = llvm.mul %2474, %3345 : !llvm.i64 + %3347 = llvm.add %3344, %3346 : !llvm.i64 + %3348 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3349 = llvm.mul %3255, %3348 : !llvm.i64 + %3350 = llvm.add %3347, %3349 : !llvm.i64 + %3351 = llvm.getelementptr %3340[%3350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3352 = llvm.load %3351 : !llvm.ptr> + %3353 = llvm.extractelement %3352[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3354 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3356 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3357 = llvm.mul %3240, %3356 : !llvm.i64 + %3358 = llvm.add %3355, %3357 : !llvm.i64 + %3359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3360 = llvm.mul %2474, %3359 : !llvm.i64 + %3361 = llvm.add %3358, %3360 : !llvm.i64 + %3362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3363 = llvm.mul %3255, %3362 : !llvm.i64 + %3364 = llvm.add %3361, %3363 : !llvm.i64 + %3365 = llvm.getelementptr %3354[%3364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3366 = llvm.load %3365 : !llvm.ptr> + %3367 = llvm.extractelement %3366[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3368 = llvm.fmul %3159, %3269 {RelaxedPrecision} : !llvm.float + %3369 = llvm.fmul %3169, %3283 {RelaxedPrecision} : !llvm.float + %3370 = llvm.fmul %3179, %3297 {RelaxedPrecision} : !llvm.float + %3371 = llvm.fmul %3189, %3311 {RelaxedPrecision} : !llvm.float + %3372 = llvm.fmul %3199, %3325 {RelaxedPrecision} : !llvm.float + %3373 = llvm.fmul %3209, %3339 {RelaxedPrecision} : !llvm.float + %3374 = llvm.fmul %3219, %3353 {RelaxedPrecision} : !llvm.float + %3375 = llvm.fmul %3229, %3367 {RelaxedPrecision} : !llvm.float + %3376 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3377 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3378 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3379 = llvm.mul %3240, %3378 : !llvm.i64 + %3380 = llvm.add %3377, %3379 : !llvm.i64 + %3381 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3382 = llvm.mul %2613, %3381 : !llvm.i64 + %3383 = llvm.add %3380, %3382 : !llvm.i64 + %3384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3385 = llvm.mul %3255, %3384 : !llvm.i64 + %3386 = llvm.add %3383, %3385 : !llvm.i64 + %3387 = llvm.getelementptr %3376[%3386] : (!llvm.ptr>, 
!llvm.i64) -> !llvm.ptr> + %3388 = llvm.load %3387 : !llvm.ptr> + %3389 = llvm.extractelement %3388[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3390 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3392 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3393 = llvm.mul %3240, %3392 : !llvm.i64 + %3394 = llvm.add %3391, %3393 : !llvm.i64 + %3395 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3396 = llvm.mul %2613, %3395 : !llvm.i64 + %3397 = llvm.add %3394, %3396 : !llvm.i64 + %3398 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3399 = llvm.mul %3255, %3398 : !llvm.i64 + %3400 = llvm.add %3397, %3399 : !llvm.i64 + %3401 = llvm.getelementptr %3390[%3400] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3402 = llvm.load %3401 : !llvm.ptr> + %3403 = llvm.extractelement %3402[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3404 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3405 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3406 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3407 = llvm.mul %3240, %3406 : !llvm.i64 + %3408 = llvm.add %3405, %3407 : !llvm.i64 + %3409 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3410 = llvm.mul %2613, %3409 : !llvm.i64 + %3411 = llvm.add %3408, %3410 : !llvm.i64 + %3412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3413 = llvm.mul %3255, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.getelementptr %3404[%3414] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3416 = llvm.load %3415 : !llvm.ptr> + %3417 = llvm.extractelement %3416[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3421 = llvm.mul %3240, %3420 : !llvm.i64 + %3422 = llvm.add %3419, %3421 : !llvm.i64 + %3423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3424 = llvm.mul %2613, %3423 : !llvm.i64 + %3425 = llvm.add %3422, %3424 : !llvm.i64 + %3426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3427 = llvm.mul %3255, %3426 : !llvm.i64 + %3428 = llvm.add %3425, %3427 : !llvm.i64 + %3429 = llvm.getelementptr %3418[%3428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3430 = llvm.load %3429 : !llvm.ptr> + %3431 = llvm.extractelement %3430[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3435 = llvm.mul %3240, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3438 = llvm.mul %2613, %3437 : !llvm.i64 + %3439 = llvm.add %3436, %3438 : !llvm.i64 + %3440 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3441 = llvm.mul %3255, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.getelementptr %3432[%3442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3444 = llvm.load %3443 : !llvm.ptr> + %3445 = llvm.extractelement %3444[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3446 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3447 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3448 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3449 = llvm.mul %3240, %3448 : !llvm.i64 + %3450 = llvm.add %3447, %3449 : !llvm.i64 + %3451 = llvm.mlir.constant(2 
: index) : !llvm.i64 + %3452 = llvm.mul %2613, %3451 : !llvm.i64 + %3453 = llvm.add %3450, %3452 : !llvm.i64 + %3454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3455 = llvm.mul %3255, %3454 : !llvm.i64 + %3456 = llvm.add %3453, %3455 : !llvm.i64 + %3457 = llvm.getelementptr %3446[%3456] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3458 = llvm.load %3457 : !llvm.ptr> + %3459 = llvm.extractelement %3458[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3460 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3461 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3462 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3463 = llvm.mul %3240, %3462 : !llvm.i64 + %3464 = llvm.add %3461, %3463 : !llvm.i64 + %3465 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3466 = llvm.mul %2613, %3465 : !llvm.i64 + %3467 = llvm.add %3464, %3466 : !llvm.i64 + %3468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3469 = llvm.mul %3255, %3468 : !llvm.i64 + %3470 = llvm.add %3467, %3469 : !llvm.i64 + %3471 = llvm.getelementptr %3460[%3470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3472 = llvm.load %3471 : !llvm.ptr> + %3473 = llvm.extractelement %3472[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3477 = llvm.mul %3240, %3476 : !llvm.i64 + %3478 = llvm.add %3475, %3477 : !llvm.i64 + %3479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3480 = llvm.mul %2613, %3479 : !llvm.i64 + %3481 = llvm.add %3478, %3480 : !llvm.i64 + %3482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3483 = llvm.mul %3255, %3482 : !llvm.i64 + %3484 = llvm.add %3481, %3483 : !llvm.i64 + %3485 = llvm.getelementptr %3474[%3484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3486 = llvm.load %3485 : !llvm.ptr> + %3487 = llvm.extractelement %3486[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3488 = llvm.fadd %3389, %3368 {RelaxedPrecision} : !llvm.float + %3489 = llvm.fadd %3403, %3369 {RelaxedPrecision} : !llvm.float + %3490 = llvm.fadd %3417, %3370 {RelaxedPrecision} : !llvm.float + %3491 = llvm.fadd %3431, %3371 {RelaxedPrecision} : !llvm.float + %3492 = llvm.fadd %3445, %3372 {RelaxedPrecision} : !llvm.float + %3493 = llvm.fadd %3459, %3373 {RelaxedPrecision} : !llvm.float + %3494 = llvm.fadd %3473, %3374 {RelaxedPrecision} : !llvm.float + %3495 = llvm.fadd %3487, %3375 {RelaxedPrecision} : !llvm.float + %3496 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3497 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3498 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3499 = llvm.mul %3240, %3498 : !llvm.i64 + %3500 = llvm.add %3497, %3499 : !llvm.i64 + %3501 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3502 = llvm.mul %2613, %3501 : !llvm.i64 + %3503 = llvm.add %3500, %3502 : !llvm.i64 + %3504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3505 = llvm.mul %3255, %3504 : !llvm.i64 + %3506 = llvm.add %3503, %3505 : !llvm.i64 + %3507 = llvm.getelementptr %3496[%3506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3508 = llvm.load %3507 : !llvm.ptr> + %3509 = llvm.insertelement %3488, %3508[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3510 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3512 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3513 = llvm.mul %3240, %3512 : !llvm.i64 + %3514 
= llvm.add %3511, %3513 : !llvm.i64 + %3515 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3516 = llvm.mul %2613, %3515 : !llvm.i64 + %3517 = llvm.add %3514, %3516 : !llvm.i64 + %3518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3519 = llvm.mul %3255, %3518 : !llvm.i64 + %3520 = llvm.add %3517, %3519 : !llvm.i64 + %3521 = llvm.getelementptr %3510[%3520] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3509, %3521 : !llvm.ptr> + %3522 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3523 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3524 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3525 = llvm.mul %3240, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3528 = llvm.mul %2613, %3527 : !llvm.i64 + %3529 = llvm.add %3526, %3528 : !llvm.i64 + %3530 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3531 = llvm.mul %3255, %3530 : !llvm.i64 + %3532 = llvm.add %3529, %3531 : !llvm.i64 + %3533 = llvm.getelementptr %3522[%3532] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3534 = llvm.load %3533 : !llvm.ptr> + %3535 = llvm.insertelement %3489, %3534[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3536 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3537 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3538 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3539 = llvm.mul %3240, %3538 : !llvm.i64 + %3540 = llvm.add %3537, %3539 : !llvm.i64 + %3541 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3542 = llvm.mul %2613, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : !llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %3255, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3536[%3546] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3535, %3547 : !llvm.ptr> + %3548 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3550 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3551 = llvm.mul %3240, %3550 : !llvm.i64 + %3552 = llvm.add %3549, %3551 : !llvm.i64 + %3553 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3554 = llvm.mul %2613, %3553 : !llvm.i64 + %3555 = llvm.add %3552, %3554 : !llvm.i64 + %3556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3557 = llvm.mul %3255, %3556 : !llvm.i64 + %3558 = llvm.add %3555, %3557 : !llvm.i64 + %3559 = llvm.getelementptr %3548[%3558] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3560 = llvm.load %3559 : !llvm.ptr> + %3561 = llvm.insertelement %3490, %3560[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3562 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3563 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3564 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3565 = llvm.mul %3240, %3564 : !llvm.i64 + %3566 = llvm.add %3563, %3565 : !llvm.i64 + %3567 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3568 = llvm.mul %2613, %3567 : !llvm.i64 + %3569 = llvm.add %3566, %3568 : !llvm.i64 + %3570 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3571 = llvm.mul %3255, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.getelementptr %3562[%3572] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3561, %3573 : !llvm.ptr> + %3574 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3575 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3576 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %3577 = llvm.mul %3240, %3576 : !llvm.i64 + %3578 = llvm.add %3575, %3577 : !llvm.i64 + %3579 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3580 = llvm.mul %2613, %3579 : !llvm.i64 + %3581 = llvm.add %3578, %3580 : !llvm.i64 + %3582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3583 = llvm.mul %3255, %3582 : !llvm.i64 + %3584 = llvm.add %3581, %3583 : !llvm.i64 + %3585 = llvm.getelementptr %3574[%3584] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3586 = llvm.load %3585 : !llvm.ptr> + %3587 = llvm.insertelement %3491, %3586[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3588 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3590 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3591 = llvm.mul %3240, %3590 : !llvm.i64 + %3592 = llvm.add %3589, %3591 : !llvm.i64 + %3593 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3594 = llvm.mul %2613, %3593 : !llvm.i64 + %3595 = llvm.add %3592, %3594 : !llvm.i64 + %3596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3597 = llvm.mul %3255, %3596 : !llvm.i64 + %3598 = llvm.add %3595, %3597 : !llvm.i64 + %3599 = llvm.getelementptr %3588[%3598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3587, %3599 : !llvm.ptr> + %3600 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3602 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3603 = llvm.mul %3240, %3602 : !llvm.i64 + %3604 = llvm.add %3601, %3603 : !llvm.i64 + %3605 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3606 = llvm.mul %2613, %3605 : !llvm.i64 + %3607 = llvm.add %3604, %3606 : !llvm.i64 + %3608 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3609 = llvm.mul %3255, %3608 : !llvm.i64 + %3610 = llvm.add %3607, %3609 : !llvm.i64 + %3611 = llvm.getelementptr %3600[%3610] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3612 = llvm.load %3611 : !llvm.ptr> + %3613 = llvm.insertelement %3492, %3612[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3617 = llvm.mul %3240, %3616 : !llvm.i64 + %3618 = llvm.add %3615, %3617 : !llvm.i64 + %3619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3620 = llvm.mul %2613, %3619 : !llvm.i64 + %3621 = llvm.add %3618, %3620 : !llvm.i64 + %3622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3623 = llvm.mul %3255, %3622 : !llvm.i64 + %3624 = llvm.add %3621, %3623 : !llvm.i64 + %3625 = llvm.getelementptr %3614[%3624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3613, %3625 : !llvm.ptr> + %3626 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3627 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3628 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3629 = llvm.mul %3240, %3628 : !llvm.i64 + %3630 = llvm.add %3627, %3629 : !llvm.i64 + %3631 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3632 = llvm.mul %2613, %3631 : !llvm.i64 + %3633 = llvm.add %3630, %3632 : !llvm.i64 + %3634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3635 = llvm.mul %3255, %3634 : !llvm.i64 + %3636 = llvm.add %3633, %3635 : !llvm.i64 + %3637 = llvm.getelementptr %3626[%3636] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3638 = llvm.load %3637 : !llvm.ptr> + %3639 = llvm.insertelement %3493, %3638[%29 : !llvm.i64] : !llvm.vec<8 x 
float> + %3640 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3641 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3642 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3643 = llvm.mul %3240, %3642 : !llvm.i64 + %3644 = llvm.add %3641, %3643 : !llvm.i64 + %3645 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3646 = llvm.mul %2613, %3645 : !llvm.i64 + %3647 = llvm.add %3644, %3646 : !llvm.i64 + %3648 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3649 = llvm.mul %3255, %3648 : !llvm.i64 + %3650 = llvm.add %3647, %3649 : !llvm.i64 + %3651 = llvm.getelementptr %3640[%3650] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3639, %3651 : !llvm.ptr> + %3652 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3653 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3654 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3655 = llvm.mul %3240, %3654 : !llvm.i64 + %3656 = llvm.add %3653, %3655 : !llvm.i64 + %3657 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3658 = llvm.mul %2613, %3657 : !llvm.i64 + %3659 = llvm.add %3656, %3658 : !llvm.i64 + %3660 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3661 = llvm.mul %3255, %3660 : !llvm.i64 + %3662 = llvm.add %3659, %3661 : !llvm.i64 + %3663 = llvm.getelementptr %3652[%3662] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3664 = llvm.load %3663 : !llvm.ptr> + %3665 = llvm.insertelement %3494, %3664[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3666 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3667 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3668 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3669 = llvm.mul %3240, %3668 : !llvm.i64 + %3670 = llvm.add %3667, %3669 : !llvm.i64 + %3671 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3672 = llvm.mul %2613, %3671 : !llvm.i64 + %3673 = llvm.add %3670, %3672 : !llvm.i64 + %3674 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3675 = llvm.mul %3255, %3674 : !llvm.i64 + %3676 = llvm.add %3673, %3675 : !llvm.i64 + %3677 = llvm.getelementptr %3666[%3676] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3665, %3677 : !llvm.ptr> + %3678 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3680 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3681 = llvm.mul %3240, %3680 : !llvm.i64 + %3682 = llvm.add %3679, %3681 : !llvm.i64 + %3683 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3684 = llvm.mul %2613, %3683 : !llvm.i64 + %3685 = llvm.add %3682, %3684 : !llvm.i64 + %3686 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3687 = llvm.mul %3255, %3686 : !llvm.i64 + %3688 = llvm.add %3685, %3687 : !llvm.i64 + %3689 = llvm.getelementptr %3678[%3688] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3690 = llvm.load %3689 : !llvm.ptr> + %3691 = llvm.insertelement %3495, %3690[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3692 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3694 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3695 = llvm.mul %3240, %3694 : !llvm.i64 + %3696 = llvm.add %3693, %3695 : !llvm.i64 + %3697 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3698 = llvm.mul %2613, %3697 : !llvm.i64 + %3699 = llvm.add %3696, %3698 : !llvm.i64 + %3700 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3701 = llvm.mul %3255, %3700 : !llvm.i64 + %3702 = llvm.add %3699, %3701 : !llvm.i64 + %3703 = 
llvm.getelementptr %3692[%3702] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3691, %3703 : !llvm.ptr> + %3704 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3705 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3706 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3707 = llvm.mul %3240, %3706 : !llvm.i64 + %3708 = llvm.add %3705, %3707 : !llvm.i64 + %3709 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3710 = llvm.mul %2613, %3709 : !llvm.i64 + %3711 = llvm.add %3708, %3710 : !llvm.i64 + %3712 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3713 = llvm.mul %3255, %3712 : !llvm.i64 + %3714 = llvm.add %3711, %3713 : !llvm.i64 + %3715 = llvm.getelementptr %3704[%3714] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3716 = llvm.load %3715 : !llvm.ptr> + %3717 = llvm.insertelement %3488, %3716[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3718 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3720 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3721 = llvm.mul %3240, %3720 : !llvm.i64 + %3722 = llvm.add %3719, %3721 : !llvm.i64 + %3723 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3724 = llvm.mul %2613, %3723 : !llvm.i64 + %3725 = llvm.add %3722, %3724 : !llvm.i64 + %3726 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3727 = llvm.mul %3255, %3726 : !llvm.i64 + %3728 = llvm.add %3725, %3727 : !llvm.i64 + %3729 = llvm.getelementptr %3718[%3728] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3717, %3729 : !llvm.ptr> + %3730 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3731 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3732 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3733 = llvm.mul %3240, %3732 : !llvm.i64 + %3734 = llvm.add %3731, %3733 : !llvm.i64 + %3735 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3736 = llvm.mul %2613, %3735 : !llvm.i64 + %3737 = llvm.add %3734, %3736 : !llvm.i64 + %3738 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3739 = llvm.mul %3255, %3738 : !llvm.i64 + %3740 = llvm.add %3737, %3739 : !llvm.i64 + %3741 = llvm.getelementptr %3730[%3740] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3742 = llvm.load %3741 : !llvm.ptr> + %3743 = llvm.insertelement %3489, %3742[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3744 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3746 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3747 = llvm.mul %3240, %3746 : !llvm.i64 + %3748 = llvm.add %3745, %3747 : !llvm.i64 + %3749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3750 = llvm.mul %2613, %3749 : !llvm.i64 + %3751 = llvm.add %3748, %3750 : !llvm.i64 + %3752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3753 = llvm.mul %3255, %3752 : !llvm.i64 + %3754 = llvm.add %3751, %3753 : !llvm.i64 + %3755 = llvm.getelementptr %3744[%3754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3743, %3755 : !llvm.ptr> + %3756 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3758 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3759 = llvm.mul %3240, %3758 : !llvm.i64 + %3760 = llvm.add %3757, %3759 : !llvm.i64 + %3761 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3762 = llvm.mul %2613, %3761 : !llvm.i64 + %3763 = llvm.add %3760, %3762 : !llvm.i64 + %3764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3765 
= llvm.mul %3255, %3764 : !llvm.i64 + %3766 = llvm.add %3763, %3765 : !llvm.i64 + %3767 = llvm.getelementptr %3756[%3766] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3768 = llvm.load %3767 : !llvm.ptr> + %3769 = llvm.insertelement %3490, %3768[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3770 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3771 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3772 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3773 = llvm.mul %3240, %3772 : !llvm.i64 + %3774 = llvm.add %3771, %3773 : !llvm.i64 + %3775 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3776 = llvm.mul %2613, %3775 : !llvm.i64 + %3777 = llvm.add %3774, %3776 : !llvm.i64 + %3778 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3779 = llvm.mul %3255, %3778 : !llvm.i64 + %3780 = llvm.add %3777, %3779 : !llvm.i64 + %3781 = llvm.getelementptr %3770[%3780] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3769, %3781 : !llvm.ptr> + %3782 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3784 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3785 = llvm.mul %3240, %3784 : !llvm.i64 + %3786 = llvm.add %3783, %3785 : !llvm.i64 + %3787 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3788 = llvm.mul %2613, %3787 : !llvm.i64 + %3789 = llvm.add %3786, %3788 : !llvm.i64 + %3790 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3791 = llvm.mul %3255, %3790 : !llvm.i64 + %3792 = llvm.add %3789, %3791 : !llvm.i64 + %3793 = llvm.getelementptr %3782[%3792] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3794 = llvm.load %3793 : !llvm.ptr> + %3795 = llvm.insertelement %3491, %3794[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3796 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3797 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3798 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3799 = llvm.mul %3240, %3798 : !llvm.i64 + %3800 = llvm.add %3797, %3799 : !llvm.i64 + %3801 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3802 = llvm.mul %2613, %3801 : !llvm.i64 + %3803 = llvm.add %3800, %3802 : !llvm.i64 + %3804 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3805 = llvm.mul %3255, %3804 : !llvm.i64 + %3806 = llvm.add %3803, %3805 : !llvm.i64 + %3807 = llvm.getelementptr %3796[%3806] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3795, %3807 : !llvm.ptr> + %3808 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3810 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3811 = llvm.mul %3240, %3810 : !llvm.i64 + %3812 = llvm.add %3809, %3811 : !llvm.i64 + %3813 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3814 = llvm.mul %2613, %3813 : !llvm.i64 + %3815 = llvm.add %3812, %3814 : !llvm.i64 + %3816 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3817 = llvm.mul %3255, %3816 : !llvm.i64 + %3818 = llvm.add %3815, %3817 : !llvm.i64 + %3819 = llvm.getelementptr %3808[%3818] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3820 = llvm.load %3819 : !llvm.ptr> + %3821 = llvm.insertelement %3492, %3820[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3822 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3823 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3824 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3825 = llvm.mul %3240, %3824 : !llvm.i64 + %3826 = llvm.add %3823, %3825 : !llvm.i64 + %3827 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %3828 = llvm.mul %2613, %3827 : !llvm.i64 + %3829 = llvm.add %3826, %3828 : !llvm.i64 + %3830 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3831 = llvm.mul %3255, %3830 : !llvm.i64 + %3832 = llvm.add %3829, %3831 : !llvm.i64 + %3833 = llvm.getelementptr %3822[%3832] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3821, %3833 : !llvm.ptr> + %3834 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3835 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3836 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3837 = llvm.mul %3240, %3836 : !llvm.i64 + %3838 = llvm.add %3835, %3837 : !llvm.i64 + %3839 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3840 = llvm.mul %2613, %3839 : !llvm.i64 + %3841 = llvm.add %3838, %3840 : !llvm.i64 + %3842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3843 = llvm.mul %3255, %3842 : !llvm.i64 + %3844 = llvm.add %3841, %3843 : !llvm.i64 + %3845 = llvm.getelementptr %3834[%3844] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3846 = llvm.load %3845 : !llvm.ptr> + %3847 = llvm.insertelement %3493, %3846[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3848 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3850 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3851 = llvm.mul %3240, %3850 : !llvm.i64 + %3852 = llvm.add %3849, %3851 : !llvm.i64 + %3853 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3854 = llvm.mul %2613, %3853 : !llvm.i64 + %3855 = llvm.add %3852, %3854 : !llvm.i64 + %3856 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3857 = llvm.mul %3255, %3856 : !llvm.i64 + %3858 = llvm.add %3855, %3857 : !llvm.i64 + %3859 = llvm.getelementptr %3848[%3858] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3847, %3859 : !llvm.ptr> + %3860 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3861 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3862 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3863 = llvm.mul %3240, %3862 : !llvm.i64 + %3864 = llvm.add %3861, %3863 : !llvm.i64 + %3865 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3866 = llvm.mul %2613, %3865 : !llvm.i64 + %3867 = llvm.add %3864, %3866 : !llvm.i64 + %3868 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3869 = llvm.mul %3255, %3868 : !llvm.i64 + %3870 = llvm.add %3867, %3869 : !llvm.i64 + %3871 = llvm.getelementptr %3860[%3870] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3872 = llvm.load %3871 : !llvm.ptr> + %3873 = llvm.insertelement %3494, %3872[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3874 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3875 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3876 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3877 = llvm.mul %3240, %3876 : !llvm.i64 + %3878 = llvm.add %3875, %3877 : !llvm.i64 + %3879 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3880 = llvm.mul %2613, %3879 : !llvm.i64 + %3881 = llvm.add %3878, %3880 : !llvm.i64 + %3882 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3883 = llvm.mul %3255, %3882 : !llvm.i64 + %3884 = llvm.add %3881, %3883 : !llvm.i64 + %3885 = llvm.getelementptr %3874[%3884] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3873, %3885 : !llvm.ptr> + %3886 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3888 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3889 
= llvm.mul %3240, %3888 : !llvm.i64 + %3890 = llvm.add %3887, %3889 : !llvm.i64 + %3891 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3892 = llvm.mul %2613, %3891 : !llvm.i64 + %3893 = llvm.add %3890, %3892 : !llvm.i64 + %3894 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3895 = llvm.mul %3255, %3894 : !llvm.i64 + %3896 = llvm.add %3893, %3895 : !llvm.i64 + %3897 = llvm.getelementptr %3886[%3896] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3898 = llvm.load %3897 : !llvm.ptr> + %3899 = llvm.insertelement %3495, %3898[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3900 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3901 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3902 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3903 = llvm.mul %3240, %3902 : !llvm.i64 + %3904 = llvm.add %3901, %3903 : !llvm.i64 + %3905 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3906 = llvm.mul %2613, %3905 : !llvm.i64 + %3907 = llvm.add %3904, %3906 : !llvm.i64 + %3908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3909 = llvm.mul %3255, %3908 : !llvm.i64 + %3910 = llvm.add %3907, %3909 : !llvm.i64 + %3911 = llvm.getelementptr %3900[%3910] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3899, %3911 : !llvm.ptr> + %3912 = llvm.add %2376, %69 : !llvm.i64 + llvm.br ^bb31(%3912 : !llvm.i64) + ^bb33: // pred: ^bb31 + %3913 = llvm.add %2374, %69 : !llvm.i64 + llvm.br ^bb29(%3913 : !llvm.i64) + ^bb34: // pred: ^bb29 + %3914 = llvm.add %2372, %56 : !llvm.i64 + llvm.br ^bb27(%3914 : !llvm.i64) + ^bb35: // pred: ^bb27 + llvm.br ^bb36(%67 : !llvm.i64) + ^bb36(%3915: !llvm.i64): // 2 preds: ^bb35, ^bb37 + %3916 = llvm.icmp "slt" %3915, %48 : !llvm.i64 + llvm.cond_br %3916, ^bb37, ^bb38 + ^bb37: // pred: ^bb36 + %3917 = llvm.add %2370, %3915 : !llvm.i64 + %3918 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3920 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3921 = llvm.mul %2345, %3920 : !llvm.i64 + %3922 = llvm.add %3919, %3921 : !llvm.i64 + %3923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3924 = llvm.mul %3917, %3923 : !llvm.i64 + %3925 = llvm.add %3922, %3924 : !llvm.i64 + %3926 = llvm.getelementptr %3918[%3925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3927 = llvm.load %3926 : !llvm.ptr + %3928 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3930 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3931 = llvm.mul %2345, %3930 : !llvm.i64 + %3932 = llvm.add %3929, %3931 : !llvm.i64 + %3933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3934 = llvm.mul %3917, %3933 : !llvm.i64 + %3935 = llvm.add %3932, %3934 : !llvm.i64 + %3936 = llvm.getelementptr %3928[%3935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3937 = llvm.load %3936 : !llvm.ptr + %3938 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3940 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3941 = llvm.mul %2345, %3940 : !llvm.i64 + %3942 = llvm.add %3939, %3941 : !llvm.i64 + %3943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3944 = llvm.mul %3917, %3943 : !llvm.i64 + %3945 = llvm.add %3942, %3944 : !llvm.i64 + %3946 = llvm.getelementptr %3938[%3945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3947 = llvm.load %3946 : !llvm.ptr + %3948 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3949 
= llvm.mlir.constant(0 : index) : !llvm.i64 + %3950 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3951 = llvm.mul %2345, %3950 : !llvm.i64 + %3952 = llvm.add %3949, %3951 : !llvm.i64 + %3953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3954 = llvm.mul %3917, %3953 : !llvm.i64 + %3955 = llvm.add %3952, %3954 : !llvm.i64 + %3956 = llvm.getelementptr %3948[%3955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3957 = llvm.load %3956 : !llvm.ptr + %3958 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3960 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3961 = llvm.mul %2345, %3960 : !llvm.i64 + %3962 = llvm.add %3959, %3961 : !llvm.i64 + %3963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3964 = llvm.mul %3917, %3963 : !llvm.i64 + %3965 = llvm.add %3962, %3964 : !llvm.i64 + %3966 = llvm.getelementptr %3958[%3965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3967 = llvm.load %3966 : !llvm.ptr + %3968 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3970 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3971 = llvm.mul %2345, %3970 : !llvm.i64 + %3972 = llvm.add %3969, %3971 : !llvm.i64 + %3973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3974 = llvm.mul %3917, %3973 : !llvm.i64 + %3975 = llvm.add %3972, %3974 : !llvm.i64 + %3976 = llvm.getelementptr %3968[%3975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3977 = llvm.load %3976 : !llvm.ptr + %3978 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3980 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3981 = llvm.mul %2345, %3980 : !llvm.i64 + %3982 = llvm.add %3979, %3981 : !llvm.i64 + %3983 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3984 = llvm.mul %3917, %3983 : !llvm.i64 + %3985 = llvm.add %3982, %3984 : !llvm.i64 + %3986 = llvm.getelementptr %3978[%3985] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3987 = llvm.load %3986 : !llvm.ptr + %3988 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3990 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3991 = llvm.mul %2345, %3990 : !llvm.i64 + %3992 = llvm.add %3989, %3991 : !llvm.i64 + %3993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3994 = llvm.mul %3917, %3993 : !llvm.i64 + %3995 = llvm.add %3992, %3994 : !llvm.i64 + %3996 = llvm.getelementptr %3988[%3995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3997 = llvm.load %3996 : !llvm.ptr + %3998 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %3999 = llvm.sub %64, %2368 : !llvm.i64 + %4000 = llvm.select %3998, %3999, %2368 : !llvm.i1, !llvm.i64 + %4001 = llvm.sdiv %4000, %68 : !llvm.i64 + %4002 = llvm.sub %64, %4001 : !llvm.i64 + %4003 = llvm.select %3998, %4002, %4001 : !llvm.i1, !llvm.i64 + %4004 = llvm.srem %4003, %68 : !llvm.i64 + %4005 = llvm.icmp "slt" %4004, %67 : !llvm.i64 + %4006 = llvm.add %4004, %68 : !llvm.i64 + %4007 = llvm.select %4005, %4006, %4004 : !llvm.i1, !llvm.i64 + %4008 = llvm.srem %3917, %39 : !llvm.i64 + %4009 = llvm.icmp "slt" %4008, %67 : !llvm.i64 + %4010 = llvm.add %4008, %39 : !llvm.i64 + %4011 = llvm.select %4009, %4010, %4008 : !llvm.i1, !llvm.i64 + %4012 = llvm.srem %2368, %68 : !llvm.i64 + %4013 = llvm.icmp "slt" %4012, %67 : !llvm.i64 + %4014 = llvm.add %4012, %68 : !llvm.i64 + %4015 = llvm.select %4013, %4014, %4012 : !llvm.i1, !llvm.i64 + %4016 = 
llvm.icmp "slt" %4015, %67 : !llvm.i64 + %4017 = llvm.sub %64, %4015 : !llvm.i64 + %4018 = llvm.select %4016, %4017, %4015 : !llvm.i1, !llvm.i64 + %4019 = llvm.sdiv %4018, %70 : !llvm.i64 + %4020 = llvm.sub %64, %4019 : !llvm.i64 + %4021 = llvm.select %4016, %4020, %4019 : !llvm.i1, !llvm.i64 + %4022 = llvm.srem %4021, %63 : !llvm.i64 + %4023 = llvm.icmp "slt" %4022, %67 : !llvm.i64 + %4024 = llvm.add %4022, %63 : !llvm.i64 + %4025 = llvm.select %4023, %4024, %4022 : !llvm.i1, !llvm.i64 + %4026 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4027 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4028 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4029 = llvm.mul %4007, %4028 : !llvm.i64 + %4030 = llvm.add %4027, %4029 : !llvm.i64 + %4031 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4032 = llvm.mul %4011, %4031 : !llvm.i64 + %4033 = llvm.add %4030, %4032 : !llvm.i64 + %4034 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4035 = llvm.mul %4025, %4034 : !llvm.i64 + %4036 = llvm.add %4033, %4035 : !llvm.i64 + %4037 = llvm.getelementptr %4026[%4036] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4038 = llvm.load %4037 : !llvm.ptr> + %4039 = llvm.extractelement %4038[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4040 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4041 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4042 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4043 = llvm.mul %4007, %4042 : !llvm.i64 + %4044 = llvm.add %4041, %4043 : !llvm.i64 + %4045 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4046 = llvm.mul %4011, %4045 : !llvm.i64 + %4047 = llvm.add %4044, %4046 : !llvm.i64 + %4048 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4049 = llvm.mul %4025, %4048 : !llvm.i64 + %4050 = llvm.add %4047, %4049 : !llvm.i64 + %4051 = llvm.getelementptr %4040[%4050] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4052 = llvm.load %4051 : !llvm.ptr> + %4053 = llvm.extractelement %4052[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4054 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4056 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4057 = llvm.mul %4007, %4056 : !llvm.i64 + %4058 = llvm.add %4055, %4057 : !llvm.i64 + %4059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4060 = llvm.mul %4011, %4059 : !llvm.i64 + %4061 = llvm.add %4058, %4060 : !llvm.i64 + %4062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4063 = llvm.mul %4025, %4062 : !llvm.i64 + %4064 = llvm.add %4061, %4063 : !llvm.i64 + %4065 = llvm.getelementptr %4054[%4064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4066 = llvm.load %4065 : !llvm.ptr> + %4067 = llvm.extractelement %4066[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4068 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4070 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4071 = llvm.mul %4007, %4070 : !llvm.i64 + %4072 = llvm.add %4069, %4071 : !llvm.i64 + %4073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4074 = llvm.mul %4011, %4073 : !llvm.i64 + %4075 = llvm.add %4072, %4074 : !llvm.i64 + %4076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4077 = llvm.mul %4025, %4076 : !llvm.i64 + %4078 = llvm.add %4075, %4077 : !llvm.i64 + %4079 = llvm.getelementptr %4068[%4078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4080 = llvm.load %4079 : !llvm.ptr> + %4081 = llvm.extractelement %4080[%27 : 
!llvm.i64] : !llvm.vec<8 x float> + %4082 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4083 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4084 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4085 = llvm.mul %4007, %4084 : !llvm.i64 + %4086 = llvm.add %4083, %4085 : !llvm.i64 + %4087 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4088 = llvm.mul %4011, %4087 : !llvm.i64 + %4089 = llvm.add %4086, %4088 : !llvm.i64 + %4090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4091 = llvm.mul %4025, %4090 : !llvm.i64 + %4092 = llvm.add %4089, %4091 : !llvm.i64 + %4093 = llvm.getelementptr %4082[%4092] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4094 = llvm.load %4093 : !llvm.ptr> + %4095 = llvm.extractelement %4094[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4096 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4098 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4099 = llvm.mul %4007, %4098 : !llvm.i64 + %4100 = llvm.add %4097, %4099 : !llvm.i64 + %4101 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4102 = llvm.mul %4011, %4101 : !llvm.i64 + %4103 = llvm.add %4100, %4102 : !llvm.i64 + %4104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4105 = llvm.mul %4025, %4104 : !llvm.i64 + %4106 = llvm.add %4103, %4105 : !llvm.i64 + %4107 = llvm.getelementptr %4096[%4106] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4108 = llvm.load %4107 : !llvm.ptr> + %4109 = llvm.extractelement %4108[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4110 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4111 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4112 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4113 = llvm.mul %4007, %4112 : !llvm.i64 + %4114 = llvm.add %4111, %4113 : !llvm.i64 + %4115 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4116 = llvm.mul %4011, %4115 : !llvm.i64 + %4117 = llvm.add %4114, %4116 : !llvm.i64 + %4118 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4119 = llvm.mul %4025, %4118 : !llvm.i64 + %4120 = llvm.add %4117, %4119 : !llvm.i64 + %4121 = llvm.getelementptr %4110[%4120] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4122 = llvm.load %4121 : !llvm.ptr> + %4123 = llvm.extractelement %4122[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4124 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4126 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4127 = llvm.mul %4007, %4126 : !llvm.i64 + %4128 = llvm.add %4125, %4127 : !llvm.i64 + %4129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4130 = llvm.mul %4011, %4129 : !llvm.i64 + %4131 = llvm.add %4128, %4130 : !llvm.i64 + %4132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4133 = llvm.mul %4025, %4132 : !llvm.i64 + %4134 = llvm.add %4131, %4133 : !llvm.i64 + %4135 = llvm.getelementptr %4124[%4134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4136 = llvm.load %4135 : !llvm.ptr> + %4137 = llvm.extractelement %4136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4138 = llvm.fmul %3927, %4039 {RelaxedPrecision} : !llvm.float + %4139 = llvm.fmul %3937, %4053 {RelaxedPrecision} : !llvm.float + %4140 = llvm.fmul %3947, %4067 {RelaxedPrecision} : !llvm.float + %4141 = llvm.fmul %3957, %4081 {RelaxedPrecision} : !llvm.float + %4142 = llvm.fmul %3967, %4095 {RelaxedPrecision} : !llvm.float + %4143 = llvm.fmul %3977, %4109 {RelaxedPrecision} : !llvm.float + %4144 = llvm.fmul %3987, %4123 
{RelaxedPrecision} : !llvm.float + %4145 = llvm.fmul %3997, %4137 {RelaxedPrecision} : !llvm.float + %4146 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4149 = llvm.mul %4007, %4148 : !llvm.i64 + %4150 = llvm.add %4147, %4149 : !llvm.i64 + %4151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4152 = llvm.mul %67, %4151 : !llvm.i64 + %4153 = llvm.add %4150, %4152 : !llvm.i64 + %4154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4155 = llvm.mul %4025, %4154 : !llvm.i64 + %4156 = llvm.add %4153, %4155 : !llvm.i64 + %4157 = llvm.getelementptr %4146[%4156] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4158 = llvm.load %4157 : !llvm.ptr> + %4159 = llvm.extractelement %4158[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4160 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4162 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4163 = llvm.mul %4007, %4162 : !llvm.i64 + %4164 = llvm.add %4161, %4163 : !llvm.i64 + %4165 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4166 = llvm.mul %67, %4165 : !llvm.i64 + %4167 = llvm.add %4164, %4166 : !llvm.i64 + %4168 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4169 = llvm.mul %4025, %4168 : !llvm.i64 + %4170 = llvm.add %4167, %4169 : !llvm.i64 + %4171 = llvm.getelementptr %4160[%4170] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4172 = llvm.load %4171 : !llvm.ptr> + %4173 = llvm.extractelement %4172[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4174 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4175 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4176 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4177 = llvm.mul %4007, %4176 : !llvm.i64 + %4178 = llvm.add %4175, %4177 : !llvm.i64 + %4179 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4180 = llvm.mul %67, %4179 : !llvm.i64 + %4181 = llvm.add %4178, %4180 : !llvm.i64 + %4182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4183 = llvm.mul %4025, %4182 : !llvm.i64 + %4184 = llvm.add %4181, %4183 : !llvm.i64 + %4185 = llvm.getelementptr %4174[%4184] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4186 = llvm.load %4185 : !llvm.ptr> + %4187 = llvm.extractelement %4186[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4188 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4190 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4191 = llvm.mul %4007, %4190 : !llvm.i64 + %4192 = llvm.add %4189, %4191 : !llvm.i64 + %4193 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4194 = llvm.mul %67, %4193 : !llvm.i64 + %4195 = llvm.add %4192, %4194 : !llvm.i64 + %4196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4197 = llvm.mul %4025, %4196 : !llvm.i64 + %4198 = llvm.add %4195, %4197 : !llvm.i64 + %4199 = llvm.getelementptr %4188[%4198] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4200 = llvm.load %4199 : !llvm.ptr> + %4201 = llvm.extractelement %4200[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4202 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4203 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4204 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4205 = llvm.mul %4007, %4204 : !llvm.i64 + %4206 = llvm.add %4203, %4205 : !llvm.i64 + %4207 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4208 = llvm.mul %67, 
%4207 : !llvm.i64 + %4209 = llvm.add %4206, %4208 : !llvm.i64 + %4210 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4211 = llvm.mul %4025, %4210 : !llvm.i64 + %4212 = llvm.add %4209, %4211 : !llvm.i64 + %4213 = llvm.getelementptr %4202[%4212] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4214 = llvm.load %4213 : !llvm.ptr> + %4215 = llvm.extractelement %4214[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4216 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4217 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4218 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4219 = llvm.mul %4007, %4218 : !llvm.i64 + %4220 = llvm.add %4217, %4219 : !llvm.i64 + %4221 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4222 = llvm.mul %67, %4221 : !llvm.i64 + %4223 = llvm.add %4220, %4222 : !llvm.i64 + %4224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4225 = llvm.mul %4025, %4224 : !llvm.i64 + %4226 = llvm.add %4223, %4225 : !llvm.i64 + %4227 = llvm.getelementptr %4216[%4226] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4228 = llvm.load %4227 : !llvm.ptr> + %4229 = llvm.extractelement %4228[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4230 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4232 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4233 = llvm.mul %4007, %4232 : !llvm.i64 + %4234 = llvm.add %4231, %4233 : !llvm.i64 + %4235 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4236 = llvm.mul %67, %4235 : !llvm.i64 + %4237 = llvm.add %4234, %4236 : !llvm.i64 + %4238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4239 = llvm.mul %4025, %4238 : !llvm.i64 + %4240 = llvm.add %4237, %4239 : !llvm.i64 + %4241 = llvm.getelementptr %4230[%4240] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4242 = llvm.load %4241 : !llvm.ptr> + %4243 = llvm.extractelement %4242[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4244 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4245 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4246 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4247 = llvm.mul %4007, %4246 : !llvm.i64 + %4248 = llvm.add %4245, %4247 : !llvm.i64 + %4249 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4250 = llvm.mul %67, %4249 : !llvm.i64 + %4251 = llvm.add %4248, %4250 : !llvm.i64 + %4252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4253 = llvm.mul %4025, %4252 : !llvm.i64 + %4254 = llvm.add %4251, %4253 : !llvm.i64 + %4255 = llvm.getelementptr %4244[%4254] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4256 = llvm.load %4255 : !llvm.ptr> + %4257 = llvm.extractelement %4256[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4258 = llvm.fadd %4159, %4138 {RelaxedPrecision} : !llvm.float + %4259 = llvm.fadd %4173, %4139 {RelaxedPrecision} : !llvm.float + %4260 = llvm.fadd %4187, %4140 {RelaxedPrecision} : !llvm.float + %4261 = llvm.fadd %4201, %4141 {RelaxedPrecision} : !llvm.float + %4262 = llvm.fadd %4215, %4142 {RelaxedPrecision} : !llvm.float + %4263 = llvm.fadd %4229, %4143 {RelaxedPrecision} : !llvm.float + %4264 = llvm.fadd %4243, %4144 {RelaxedPrecision} : !llvm.float + %4265 = llvm.fadd %4257, %4145 {RelaxedPrecision} : !llvm.float + %4266 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4268 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4269 = llvm.mul %4007, %4268 : !llvm.i64 + %4270 = llvm.add %4267, %4269 : !llvm.i64 + %4271 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %4272 = llvm.mul %67, %4271 : !llvm.i64 + %4273 = llvm.add %4270, %4272 : !llvm.i64 + %4274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4275 = llvm.mul %4025, %4274 : !llvm.i64 + %4276 = llvm.add %4273, %4275 : !llvm.i64 + %4277 = llvm.getelementptr %4266[%4276] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4278 = llvm.load %4277 : !llvm.ptr> + %4279 = llvm.insertelement %4258, %4278[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4280 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4281 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4282 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4283 = llvm.mul %4007, %4282 : !llvm.i64 + %4284 = llvm.add %4281, %4283 : !llvm.i64 + %4285 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4286 = llvm.mul %67, %4285 : !llvm.i64 + %4287 = llvm.add %4284, %4286 : !llvm.i64 + %4288 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4289 = llvm.mul %4025, %4288 : !llvm.i64 + %4290 = llvm.add %4287, %4289 : !llvm.i64 + %4291 = llvm.getelementptr %4280[%4290] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4279, %4291 : !llvm.ptr> + %4292 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4293 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4294 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4295 = llvm.mul %4007, %4294 : !llvm.i64 + %4296 = llvm.add %4293, %4295 : !llvm.i64 + %4297 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4298 = llvm.mul %67, %4297 : !llvm.i64 + %4299 = llvm.add %4296, %4298 : !llvm.i64 + %4300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4301 = llvm.mul %4025, %4300 : !llvm.i64 + %4302 = llvm.add %4299, %4301 : !llvm.i64 + %4303 = llvm.getelementptr %4292[%4302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4304 = llvm.load %4303 : !llvm.ptr> + %4305 = llvm.insertelement %4259, %4304[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4306 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4307 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4308 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4309 = llvm.mul %4007, %4308 : !llvm.i64 + %4310 = llvm.add %4307, %4309 : !llvm.i64 + %4311 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4312 = llvm.mul %67, %4311 : !llvm.i64 + %4313 = llvm.add %4310, %4312 : !llvm.i64 + %4314 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4315 = llvm.mul %4025, %4314 : !llvm.i64 + %4316 = llvm.add %4313, %4315 : !llvm.i64 + %4317 = llvm.getelementptr %4306[%4316] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4305, %4317 : !llvm.ptr> + %4318 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4320 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4321 = llvm.mul %4007, %4320 : !llvm.i64 + %4322 = llvm.add %4319, %4321 : !llvm.i64 + %4323 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4324 = llvm.mul %67, %4323 : !llvm.i64 + %4325 = llvm.add %4322, %4324 : !llvm.i64 + %4326 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4327 = llvm.mul %4025, %4326 : !llvm.i64 + %4328 = llvm.add %4325, %4327 : !llvm.i64 + %4329 = llvm.getelementptr %4318[%4328] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4330 = llvm.load %4329 : !llvm.ptr> + %4331 = llvm.insertelement %4260, %4330[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4332 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4333 = llvm.mlir.constant(0 : index) : 
!llvm.i64 + %4334 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4335 = llvm.mul %4007, %4334 : !llvm.i64 + %4336 = llvm.add %4333, %4335 : !llvm.i64 + %4337 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4338 = llvm.mul %67, %4337 : !llvm.i64 + %4339 = llvm.add %4336, %4338 : !llvm.i64 + %4340 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4341 = llvm.mul %4025, %4340 : !llvm.i64 + %4342 = llvm.add %4339, %4341 : !llvm.i64 + %4343 = llvm.getelementptr %4332[%4342] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4331, %4343 : !llvm.ptr> + %4344 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4345 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4346 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4347 = llvm.mul %4007, %4346 : !llvm.i64 + %4348 = llvm.add %4345, %4347 : !llvm.i64 + %4349 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4350 = llvm.mul %67, %4349 : !llvm.i64 + %4351 = llvm.add %4348, %4350 : !llvm.i64 + %4352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4353 = llvm.mul %4025, %4352 : !llvm.i64 + %4354 = llvm.add %4351, %4353 : !llvm.i64 + %4355 = llvm.getelementptr %4344[%4354] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4356 = llvm.load %4355 : !llvm.ptr> + %4357 = llvm.insertelement %4261, %4356[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4358 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4360 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4361 = llvm.mul %4007, %4360 : !llvm.i64 + %4362 = llvm.add %4359, %4361 : !llvm.i64 + %4363 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4364 = llvm.mul %67, %4363 : !llvm.i64 + %4365 = llvm.add %4362, %4364 : !llvm.i64 + %4366 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4367 = llvm.mul %4025, %4366 : !llvm.i64 + %4368 = llvm.add %4365, %4367 : !llvm.i64 + %4369 = llvm.getelementptr %4358[%4368] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4357, %4369 : !llvm.ptr> + %4370 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4371 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4372 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4373 = llvm.mul %4007, %4372 : !llvm.i64 + %4374 = llvm.add %4371, %4373 : !llvm.i64 + %4375 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4376 = llvm.mul %67, %4375 : !llvm.i64 + %4377 = llvm.add %4374, %4376 : !llvm.i64 + %4378 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4379 = llvm.mul %4025, %4378 : !llvm.i64 + %4380 = llvm.add %4377, %4379 : !llvm.i64 + %4381 = llvm.getelementptr %4370[%4380] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4382 = llvm.load %4381 : !llvm.ptr> + %4383 = llvm.insertelement %4262, %4382[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4384 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4385 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4386 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4387 = llvm.mul %4007, %4386 : !llvm.i64 + %4388 = llvm.add %4385, %4387 : !llvm.i64 + %4389 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4390 = llvm.mul %67, %4389 : !llvm.i64 + %4391 = llvm.add %4388, %4390 : !llvm.i64 + %4392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4393 = llvm.mul %4025, %4392 : !llvm.i64 + %4394 = llvm.add %4391, %4393 : !llvm.i64 + %4395 = llvm.getelementptr %4384[%4394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4383, %4395 : !llvm.ptr> + %4396 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, 
i64, array<3 x i64>, array<3 x i64>)> + %4397 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4398 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4399 = llvm.mul %4007, %4398 : !llvm.i64 + %4400 = llvm.add %4397, %4399 : !llvm.i64 + %4401 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4402 = llvm.mul %67, %4401 : !llvm.i64 + %4403 = llvm.add %4400, %4402 : !llvm.i64 + %4404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4405 = llvm.mul %4025, %4404 : !llvm.i64 + %4406 = llvm.add %4403, %4405 : !llvm.i64 + %4407 = llvm.getelementptr %4396[%4406] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4408 = llvm.load %4407 : !llvm.ptr> + %4409 = llvm.insertelement %4263, %4408[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4410 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4412 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4413 = llvm.mul %4007, %4412 : !llvm.i64 + %4414 = llvm.add %4411, %4413 : !llvm.i64 + %4415 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4416 = llvm.mul %67, %4415 : !llvm.i64 + %4417 = llvm.add %4414, %4416 : !llvm.i64 + %4418 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4419 = llvm.mul %4025, %4418 : !llvm.i64 + %4420 = llvm.add %4417, %4419 : !llvm.i64 + %4421 = llvm.getelementptr %4410[%4420] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4409, %4421 : !llvm.ptr> + %4422 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4423 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4424 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4425 = llvm.mul %4007, %4424 : !llvm.i64 + %4426 = llvm.add %4423, %4425 : !llvm.i64 + %4427 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4428 = llvm.mul %67, %4427 : !llvm.i64 + %4429 = llvm.add %4426, %4428 : !llvm.i64 + %4430 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4431 = llvm.mul %4025, %4430 : !llvm.i64 + %4432 = llvm.add %4429, %4431 : !llvm.i64 + %4433 = llvm.getelementptr %4422[%4432] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4434 = llvm.load %4433 : !llvm.ptr> + %4435 = llvm.insertelement %4264, %4434[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4436 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4437 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4438 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4439 = llvm.mul %4007, %4438 : !llvm.i64 + %4440 = llvm.add %4437, %4439 : !llvm.i64 + %4441 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4442 = llvm.mul %67, %4441 : !llvm.i64 + %4443 = llvm.add %4440, %4442 : !llvm.i64 + %4444 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4445 = llvm.mul %4025, %4444 : !llvm.i64 + %4446 = llvm.add %4443, %4445 : !llvm.i64 + %4447 = llvm.getelementptr %4436[%4446] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4435, %4447 : !llvm.ptr> + %4448 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4450 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4451 = llvm.mul %4007, %4450 : !llvm.i64 + %4452 = llvm.add %4449, %4451 : !llvm.i64 + %4453 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4454 = llvm.mul %67, %4453 : !llvm.i64 + %4455 = llvm.add %4452, %4454 : !llvm.i64 + %4456 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4457 = llvm.mul %4025, %4456 : !llvm.i64 + %4458 = llvm.add %4455, %4457 : !llvm.i64 + %4459 = llvm.getelementptr %4448[%4458] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4460 = llvm.load 
%4459 : !llvm.ptr> + %4461 = llvm.insertelement %4265, %4460[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4462 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4463 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4464 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4465 = llvm.mul %4007, %4464 : !llvm.i64 + %4466 = llvm.add %4463, %4465 : !llvm.i64 + %4467 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4468 = llvm.mul %67, %4467 : !llvm.i64 + %4469 = llvm.add %4466, %4468 : !llvm.i64 + %4470 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4471 = llvm.mul %4025, %4470 : !llvm.i64 + %4472 = llvm.add %4469, %4471 : !llvm.i64 + %4473 = llvm.getelementptr %4462[%4472] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4461, %4473 : !llvm.ptr> + %4474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4477 = llvm.mul %4007, %4476 : !llvm.i64 + %4478 = llvm.add %4475, %4477 : !llvm.i64 + %4479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4480 = llvm.mul %67, %4479 : !llvm.i64 + %4481 = llvm.add %4478, %4480 : !llvm.i64 + %4482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4483 = llvm.mul %4025, %4482 : !llvm.i64 + %4484 = llvm.add %4481, %4483 : !llvm.i64 + %4485 = llvm.getelementptr %4474[%4484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4486 = llvm.load %4485 : !llvm.ptr> + %4487 = llvm.insertelement %4258, %4486[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4488 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4490 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4491 = llvm.mul %4007, %4490 : !llvm.i64 + %4492 = llvm.add %4489, %4491 : !llvm.i64 + %4493 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4494 = llvm.mul %67, %4493 : !llvm.i64 + %4495 = llvm.add %4492, %4494 : !llvm.i64 + %4496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4497 = llvm.mul %4025, %4496 : !llvm.i64 + %4498 = llvm.add %4495, %4497 : !llvm.i64 + %4499 = llvm.getelementptr %4488[%4498] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4487, %4499 : !llvm.ptr> + %4500 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4501 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4502 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4503 = llvm.mul %4007, %4502 : !llvm.i64 + %4504 = llvm.add %4501, %4503 : !llvm.i64 + %4505 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4506 = llvm.mul %67, %4505 : !llvm.i64 + %4507 = llvm.add %4504, %4506 : !llvm.i64 + %4508 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4509 = llvm.mul %4025, %4508 : !llvm.i64 + %4510 = llvm.add %4507, %4509 : !llvm.i64 + %4511 = llvm.getelementptr %4500[%4510] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4512 = llvm.load %4511 : !llvm.ptr> + %4513 = llvm.insertelement %4259, %4512[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4514 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4515 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4516 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4517 = llvm.mul %4007, %4516 : !llvm.i64 + %4518 = llvm.add %4515, %4517 : !llvm.i64 + %4519 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4520 = llvm.mul %67, %4519 : !llvm.i64 + %4521 = llvm.add %4518, %4520 : !llvm.i64 + %4522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4523 = llvm.mul %4025, 
%4522 : !llvm.i64 + %4524 = llvm.add %4521, %4523 : !llvm.i64 + %4525 = llvm.getelementptr %4514[%4524] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4513, %4525 : !llvm.ptr> + %4526 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4528 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4529 = llvm.mul %4007, %4528 : !llvm.i64 + %4530 = llvm.add %4527, %4529 : !llvm.i64 + %4531 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4532 = llvm.mul %67, %4531 : !llvm.i64 + %4533 = llvm.add %4530, %4532 : !llvm.i64 + %4534 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4535 = llvm.mul %4025, %4534 : !llvm.i64 + %4536 = llvm.add %4533, %4535 : !llvm.i64 + %4537 = llvm.getelementptr %4526[%4536] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4538 = llvm.load %4537 : !llvm.ptr> + %4539 = llvm.insertelement %4260, %4538[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4540 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4542 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4543 = llvm.mul %4007, %4542 : !llvm.i64 + %4544 = llvm.add %4541, %4543 : !llvm.i64 + %4545 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4546 = llvm.mul %67, %4545 : !llvm.i64 + %4547 = llvm.add %4544, %4546 : !llvm.i64 + %4548 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4549 = llvm.mul %4025, %4548 : !llvm.i64 + %4550 = llvm.add %4547, %4549 : !llvm.i64 + %4551 = llvm.getelementptr %4540[%4550] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4539, %4551 : !llvm.ptr> + %4552 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4553 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4554 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4555 = llvm.mul %4007, %4554 : !llvm.i64 + %4556 = llvm.add %4553, %4555 : !llvm.i64 + %4557 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4558 = llvm.mul %67, %4557 : !llvm.i64 + %4559 = llvm.add %4556, %4558 : !llvm.i64 + %4560 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4561 = llvm.mul %4025, %4560 : !llvm.i64 + %4562 = llvm.add %4559, %4561 : !llvm.i64 + %4563 = llvm.getelementptr %4552[%4562] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4564 = llvm.load %4563 : !llvm.ptr> + %4565 = llvm.insertelement %4261, %4564[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4566 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4567 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4568 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4569 = llvm.mul %4007, %4568 : !llvm.i64 + %4570 = llvm.add %4567, %4569 : !llvm.i64 + %4571 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4572 = llvm.mul %67, %4571 : !llvm.i64 + %4573 = llvm.add %4570, %4572 : !llvm.i64 + %4574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4575 = llvm.mul %4025, %4574 : !llvm.i64 + %4576 = llvm.add %4573, %4575 : !llvm.i64 + %4577 = llvm.getelementptr %4566[%4576] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4565, %4577 : !llvm.ptr> + %4578 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4580 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4581 = llvm.mul %4007, %4580 : !llvm.i64 + %4582 = llvm.add %4579, %4581 : !llvm.i64 + %4583 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4584 = llvm.mul %67, %4583 : !llvm.i64 + %4585 = llvm.add %4582, %4584 : 
!llvm.i64 + %4586 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4587 = llvm.mul %4025, %4586 : !llvm.i64 + %4588 = llvm.add %4585, %4587 : !llvm.i64 + %4589 = llvm.getelementptr %4578[%4588] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4590 = llvm.load %4589 : !llvm.ptr> + %4591 = llvm.insertelement %4262, %4590[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4592 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4593 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4594 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4595 = llvm.mul %4007, %4594 : !llvm.i64 + %4596 = llvm.add %4593, %4595 : !llvm.i64 + %4597 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4598 = llvm.mul %67, %4597 : !llvm.i64 + %4599 = llvm.add %4596, %4598 : !llvm.i64 + %4600 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4601 = llvm.mul %4025, %4600 : !llvm.i64 + %4602 = llvm.add %4599, %4601 : !llvm.i64 + %4603 = llvm.getelementptr %4592[%4602] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4591, %4603 : !llvm.ptr> + %4604 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4605 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4606 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4607 = llvm.mul %4007, %4606 : !llvm.i64 + %4608 = llvm.add %4605, %4607 : !llvm.i64 + %4609 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4610 = llvm.mul %67, %4609 : !llvm.i64 + %4611 = llvm.add %4608, %4610 : !llvm.i64 + %4612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4613 = llvm.mul %4025, %4612 : !llvm.i64 + %4614 = llvm.add %4611, %4613 : !llvm.i64 + %4615 = llvm.getelementptr %4604[%4614] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4616 = llvm.load %4615 : !llvm.ptr> + %4617 = llvm.insertelement %4263, %4616[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4618 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4620 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4621 = llvm.mul %4007, %4620 : !llvm.i64 + %4622 = llvm.add %4619, %4621 : !llvm.i64 + %4623 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4624 = llvm.mul %67, %4623 : !llvm.i64 + %4625 = llvm.add %4622, %4624 : !llvm.i64 + %4626 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4627 = llvm.mul %4025, %4626 : !llvm.i64 + %4628 = llvm.add %4625, %4627 : !llvm.i64 + %4629 = llvm.getelementptr %4618[%4628] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4617, %4629 : !llvm.ptr> + %4630 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4632 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4633 = llvm.mul %4007, %4632 : !llvm.i64 + %4634 = llvm.add %4631, %4633 : !llvm.i64 + %4635 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4636 = llvm.mul %67, %4635 : !llvm.i64 + %4637 = llvm.add %4634, %4636 : !llvm.i64 + %4638 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4639 = llvm.mul %4025, %4638 : !llvm.i64 + %4640 = llvm.add %4637, %4639 : !llvm.i64 + %4641 = llvm.getelementptr %4630[%4640] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4642 = llvm.load %4641 : !llvm.ptr> + %4643 = llvm.insertelement %4264, %4642[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4644 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4645 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4646 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4647 = llvm.mul %4007, %4646 : !llvm.i64 + %4648 = 
llvm.add %4645, %4647 : !llvm.i64 + %4649 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4650 = llvm.mul %67, %4649 : !llvm.i64 + %4651 = llvm.add %4648, %4650 : !llvm.i64 + %4652 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4653 = llvm.mul %4025, %4652 : !llvm.i64 + %4654 = llvm.add %4651, %4653 : !llvm.i64 + %4655 = llvm.getelementptr %4644[%4654] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4643, %4655 : !llvm.ptr> + %4656 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4659 = llvm.mul %4007, %4658 : !llvm.i64 + %4660 = llvm.add %4657, %4659 : !llvm.i64 + %4661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4662 = llvm.mul %67, %4661 : !llvm.i64 + %4663 = llvm.add %4660, %4662 : !llvm.i64 + %4664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4665 = llvm.mul %4025, %4664 : !llvm.i64 + %4666 = llvm.add %4663, %4665 : !llvm.i64 + %4667 = llvm.getelementptr %4656[%4666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4668 = llvm.load %4667 : !llvm.ptr> + %4669 = llvm.insertelement %4265, %4668[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4673 = llvm.mul %4007, %4672 : !llvm.i64 + %4674 = llvm.add %4671, %4673 : !llvm.i64 + %4675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4676 = llvm.mul %67, %4675 : !llvm.i64 + %4677 = llvm.add %4674, %4676 : !llvm.i64 + %4678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4679 = llvm.mul %4025, %4678 : !llvm.i64 + %4680 = llvm.add %4677, %4679 : !llvm.i64 + %4681 = llvm.getelementptr %4670[%4680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4669, %4681 : !llvm.ptr> + %4682 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4683 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4684 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4685 = llvm.mul %2345, %4684 : !llvm.i64 + %4686 = llvm.add %4683, %4685 : !llvm.i64 + %4687 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4688 = llvm.mul %3917, %4687 : !llvm.i64 + %4689 = llvm.add %4686, %4688 : !llvm.i64 + %4690 = llvm.getelementptr %4682[%4689] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4691 = llvm.load %4690 : !llvm.ptr + %4692 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4694 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4695 = llvm.mul %2345, %4694 : !llvm.i64 + %4696 = llvm.add %4693, %4695 : !llvm.i64 + %4697 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4698 = llvm.mul %3917, %4697 : !llvm.i64 + %4699 = llvm.add %4696, %4698 : !llvm.i64 + %4700 = llvm.getelementptr %4692[%4699] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4701 = llvm.load %4700 : !llvm.ptr + %4702 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4703 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4704 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4705 = llvm.mul %2345, %4704 : !llvm.i64 + %4706 = llvm.add %4703, %4705 : !llvm.i64 + %4707 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4708 = llvm.mul %3917, %4707 : !llvm.i64 + %4709 = llvm.add %4706, %4708 : !llvm.i64 + %4710 = llvm.getelementptr %4702[%4709] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4711 = llvm.load %4710 : !llvm.ptr + 
%4712 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4714 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4715 = llvm.mul %2345, %4714 : !llvm.i64 + %4716 = llvm.add %4713, %4715 : !llvm.i64 + %4717 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4718 = llvm.mul %3917, %4717 : !llvm.i64 + %4719 = llvm.add %4716, %4718 : !llvm.i64 + %4720 = llvm.getelementptr %4712[%4719] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4721 = llvm.load %4720 : !llvm.ptr + %4722 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4724 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4725 = llvm.mul %2345, %4724 : !llvm.i64 + %4726 = llvm.add %4723, %4725 : !llvm.i64 + %4727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4728 = llvm.mul %3917, %4727 : !llvm.i64 + %4729 = llvm.add %4726, %4728 : !llvm.i64 + %4730 = llvm.getelementptr %4722[%4729] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4731 = llvm.load %4730 : !llvm.ptr + %4732 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4734 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4735 = llvm.mul %2345, %4734 : !llvm.i64 + %4736 = llvm.add %4733, %4735 : !llvm.i64 + %4737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4738 = llvm.mul %3917, %4737 : !llvm.i64 + %4739 = llvm.add %4736, %4738 : !llvm.i64 + %4740 = llvm.getelementptr %4732[%4739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4741 = llvm.load %4740 : !llvm.ptr + %4742 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4743 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4744 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4745 = llvm.mul %2345, %4744 : !llvm.i64 + %4746 = llvm.add %4743, %4745 : !llvm.i64 + %4747 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4748 = llvm.mul %3917, %4747 : !llvm.i64 + %4749 = llvm.add %4746, %4748 : !llvm.i64 + %4750 = llvm.getelementptr %4742[%4749] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4751 = llvm.load %4750 : !llvm.ptr + %4752 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4754 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4755 = llvm.mul %2345, %4754 : !llvm.i64 + %4756 = llvm.add %4753, %4755 : !llvm.i64 + %4757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4758 = llvm.mul %3917, %4757 : !llvm.i64 + %4759 = llvm.add %4756, %4758 : !llvm.i64 + %4760 = llvm.getelementptr %4752[%4759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4761 = llvm.load %4760 : !llvm.ptr + %4762 = llvm.add %2368, %70 : !llvm.i64 + %4763 = llvm.icmp "slt" %4762, %67 : !llvm.i64 + %4764 = llvm.sub %64, %4762 : !llvm.i64 + %4765 = llvm.select %4763, %4764, %4762 : !llvm.i1, !llvm.i64 + %4766 = llvm.sdiv %4765, %68 : !llvm.i64 + %4767 = llvm.sub %64, %4766 : !llvm.i64 + %4768 = llvm.select %4763, %4767, %4766 : !llvm.i1, !llvm.i64 + %4769 = llvm.srem %4768, %68 : !llvm.i64 + %4770 = llvm.icmp "slt" %4769, %67 : !llvm.i64 + %4771 = llvm.add %4769, %68 : !llvm.i64 + %4772 = llvm.select %4770, %4771, %4769 : !llvm.i1, !llvm.i64 + %4773 = llvm.sdiv %4000, %70 : !llvm.i64 + %4774 = llvm.sub %64, %4773 : !llvm.i64 + %4775 = llvm.select %3998, %4774, %4773 : !llvm.i1, !llvm.i64 + %4776 = llvm.mul %4768, %65 : !llvm.i64 + %4777 = llvm.add %4775, %4776 : !llvm.i64 + %4778 = llvm.add %4777, 
%69 : !llvm.i64 + %4779 = llvm.icmp "slt" %4778, %67 : !llvm.i64 + %4780 = llvm.sub %64, %4778 : !llvm.i64 + %4781 = llvm.select %4779, %4780, %4778 : !llvm.i1, !llvm.i64 + %4782 = llvm.sdiv %4781, %63 : !llvm.i64 + %4783 = llvm.sub %64, %4782 : !llvm.i64 + %4784 = llvm.select %4779, %4783, %4782 : !llvm.i1, !llvm.i64 + %4785 = llvm.mul %4784, %65 : !llvm.i64 + %4786 = llvm.add %4777, %4785 : !llvm.i64 + %4787 = llvm.add %4786, %69 : !llvm.i64 + %4788 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4790 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4791 = llvm.mul %4772, %4790 : !llvm.i64 + %4792 = llvm.add %4789, %4791 : !llvm.i64 + %4793 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4794 = llvm.mul %4011, %4793 : !llvm.i64 + %4795 = llvm.add %4792, %4794 : !llvm.i64 + %4796 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4797 = llvm.mul %4787, %4796 : !llvm.i64 + %4798 = llvm.add %4795, %4797 : !llvm.i64 + %4799 = llvm.getelementptr %4788[%4798] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4800 = llvm.load %4799 : !llvm.ptr> + %4801 = llvm.extractelement %4800[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4802 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4803 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4804 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4805 = llvm.mul %4772, %4804 : !llvm.i64 + %4806 = llvm.add %4803, %4805 : !llvm.i64 + %4807 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4808 = llvm.mul %4011, %4807 : !llvm.i64 + %4809 = llvm.add %4806, %4808 : !llvm.i64 + %4810 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4811 = llvm.mul %4787, %4810 : !llvm.i64 + %4812 = llvm.add %4809, %4811 : !llvm.i64 + %4813 = llvm.getelementptr %4802[%4812] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4814 = llvm.load %4813 : !llvm.ptr> + %4815 = llvm.extractelement %4814[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4816 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4817 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4818 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4819 = llvm.mul %4772, %4818 : !llvm.i64 + %4820 = llvm.add %4817, %4819 : !llvm.i64 + %4821 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4822 = llvm.mul %4011, %4821 : !llvm.i64 + %4823 = llvm.add %4820, %4822 : !llvm.i64 + %4824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4825 = llvm.mul %4787, %4824 : !llvm.i64 + %4826 = llvm.add %4823, %4825 : !llvm.i64 + %4827 = llvm.getelementptr %4816[%4826] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4828 = llvm.load %4827 : !llvm.ptr> + %4829 = llvm.extractelement %4828[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4830 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4831 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4832 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4833 = llvm.mul %4772, %4832 : !llvm.i64 + %4834 = llvm.add %4831, %4833 : !llvm.i64 + %4835 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4836 = llvm.mul %4011, %4835 : !llvm.i64 + %4837 = llvm.add %4834, %4836 : !llvm.i64 + %4838 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4839 = llvm.mul %4787, %4838 : !llvm.i64 + %4840 = llvm.add %4837, %4839 : !llvm.i64 + %4841 = llvm.getelementptr %4830[%4840] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4842 = llvm.load %4841 : !llvm.ptr> + %4843 = llvm.extractelement %4842[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4844 = 
llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4845 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4846 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4847 = llvm.mul %4772, %4846 : !llvm.i64 + %4848 = llvm.add %4845, %4847 : !llvm.i64 + %4849 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4850 = llvm.mul %4011, %4849 : !llvm.i64 + %4851 = llvm.add %4848, %4850 : !llvm.i64 + %4852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4853 = llvm.mul %4787, %4852 : !llvm.i64 + %4854 = llvm.add %4851, %4853 : !llvm.i64 + %4855 = llvm.getelementptr %4844[%4854] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4856 = llvm.load %4855 : !llvm.ptr> + %4857 = llvm.extractelement %4856[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4858 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4860 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4861 = llvm.mul %4772, %4860 : !llvm.i64 + %4862 = llvm.add %4859, %4861 : !llvm.i64 + %4863 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4864 = llvm.mul %4011, %4863 : !llvm.i64 + %4865 = llvm.add %4862, %4864 : !llvm.i64 + %4866 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4867 = llvm.mul %4787, %4866 : !llvm.i64 + %4868 = llvm.add %4865, %4867 : !llvm.i64 + %4869 = llvm.getelementptr %4858[%4868] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4870 = llvm.load %4869 : !llvm.ptr> + %4871 = llvm.extractelement %4870[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4872 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4873 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4874 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4875 = llvm.mul %4772, %4874 : !llvm.i64 + %4876 = llvm.add %4873, %4875 : !llvm.i64 + %4877 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4878 = llvm.mul %4011, %4877 : !llvm.i64 + %4879 = llvm.add %4876, %4878 : !llvm.i64 + %4880 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4881 = llvm.mul %4787, %4880 : !llvm.i64 + %4882 = llvm.add %4879, %4881 : !llvm.i64 + %4883 = llvm.getelementptr %4872[%4882] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4884 = llvm.load %4883 : !llvm.ptr> + %4885 = llvm.extractelement %4884[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4886 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4888 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4889 = llvm.mul %4772, %4888 : !llvm.i64 + %4890 = llvm.add %4887, %4889 : !llvm.i64 + %4891 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4892 = llvm.mul %4011, %4891 : !llvm.i64 + %4893 = llvm.add %4890, %4892 : !llvm.i64 + %4894 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4895 = llvm.mul %4787, %4894 : !llvm.i64 + %4896 = llvm.add %4893, %4895 : !llvm.i64 + %4897 = llvm.getelementptr %4886[%4896] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4898 = llvm.load %4897 : !llvm.ptr> + %4899 = llvm.extractelement %4898[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4900 = llvm.fmul %4691, %4801 {RelaxedPrecision} : !llvm.float + %4901 = llvm.fmul %4701, %4815 {RelaxedPrecision} : !llvm.float + %4902 = llvm.fmul %4711, %4829 {RelaxedPrecision} : !llvm.float + %4903 = llvm.fmul %4721, %4843 {RelaxedPrecision} : !llvm.float + %4904 = llvm.fmul %4731, %4857 {RelaxedPrecision} : !llvm.float + %4905 = llvm.fmul %4741, %4871 {RelaxedPrecision} : !llvm.float + %4906 = llvm.fmul %4751, %4885 {RelaxedPrecision} : !llvm.float + %4907 = 
llvm.fmul %4761, %4899 {RelaxedPrecision} : !llvm.float + %4908 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4910 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4911 = llvm.mul %4772, %4910 : !llvm.i64 + %4912 = llvm.add %4909, %4911 : !llvm.i64 + %4913 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4914 = llvm.mul %67, %4913 : !llvm.i64 + %4915 = llvm.add %4912, %4914 : !llvm.i64 + %4916 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4917 = llvm.mul %4787, %4916 : !llvm.i64 + %4918 = llvm.add %4915, %4917 : !llvm.i64 + %4919 = llvm.getelementptr %4908[%4918] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4920 = llvm.load %4919 : !llvm.ptr> + %4921 = llvm.extractelement %4920[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4922 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4923 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4924 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4925 = llvm.mul %4772, %4924 : !llvm.i64 + %4926 = llvm.add %4923, %4925 : !llvm.i64 + %4927 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4928 = llvm.mul %67, %4927 : !llvm.i64 + %4929 = llvm.add %4926, %4928 : !llvm.i64 + %4930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4931 = llvm.mul %4787, %4930 : !llvm.i64 + %4932 = llvm.add %4929, %4931 : !llvm.i64 + %4933 = llvm.getelementptr %4922[%4932] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4934 = llvm.load %4933 : !llvm.ptr> + %4935 = llvm.extractelement %4934[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4936 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4937 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4938 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4939 = llvm.mul %4772, %4938 : !llvm.i64 + %4940 = llvm.add %4937, %4939 : !llvm.i64 + %4941 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4942 = llvm.mul %67, %4941 : !llvm.i64 + %4943 = llvm.add %4940, %4942 : !llvm.i64 + %4944 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4945 = llvm.mul %4787, %4944 : !llvm.i64 + %4946 = llvm.add %4943, %4945 : !llvm.i64 + %4947 = llvm.getelementptr %4936[%4946] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4948 = llvm.load %4947 : !llvm.ptr> + %4949 = llvm.extractelement %4948[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4950 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4951 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4952 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4953 = llvm.mul %4772, %4952 : !llvm.i64 + %4954 = llvm.add %4951, %4953 : !llvm.i64 + %4955 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4956 = llvm.mul %67, %4955 : !llvm.i64 + %4957 = llvm.add %4954, %4956 : !llvm.i64 + %4958 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4959 = llvm.mul %4787, %4958 : !llvm.i64 + %4960 = llvm.add %4957, %4959 : !llvm.i64 + %4961 = llvm.getelementptr %4950[%4960] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4962 = llvm.load %4961 : !llvm.ptr> + %4963 = llvm.extractelement %4962[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4964 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4966 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4967 = llvm.mul %4772, %4966 : !llvm.i64 + %4968 = llvm.add %4965, %4967 : !llvm.i64 + %4969 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4970 = llvm.mul %67, %4969 : !llvm.i64 + %4971 = llvm.add %4968, 
%4970 : !llvm.i64 + %4972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4973 = llvm.mul %4787, %4972 : !llvm.i64 + %4974 = llvm.add %4971, %4973 : !llvm.i64 + %4975 = llvm.getelementptr %4964[%4974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4976 = llvm.load %4975 : !llvm.ptr> + %4977 = llvm.extractelement %4976[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4978 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4980 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4981 = llvm.mul %4772, %4980 : !llvm.i64 + %4982 = llvm.add %4979, %4981 : !llvm.i64 + %4983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4984 = llvm.mul %67, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %4787, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4978[%4988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4990 = llvm.load %4989 : !llvm.ptr> + %4991 = llvm.extractelement %4990[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4992 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4993 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4994 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4995 = llvm.mul %4772, %4994 : !llvm.i64 + %4996 = llvm.add %4993, %4995 : !llvm.i64 + %4997 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4998 = llvm.mul %67, %4997 : !llvm.i64 + %4999 = llvm.add %4996, %4998 : !llvm.i64 + %5000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5001 = llvm.mul %4787, %5000 : !llvm.i64 + %5002 = llvm.add %4999, %5001 : !llvm.i64 + %5003 = llvm.getelementptr %4992[%5002] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5004 = llvm.load %5003 : !llvm.ptr> + %5005 = llvm.extractelement %5004[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5006 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5008 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5009 = llvm.mul %4772, %5008 : !llvm.i64 + %5010 = llvm.add %5007, %5009 : !llvm.i64 + %5011 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5012 = llvm.mul %67, %5011 : !llvm.i64 + %5013 = llvm.add %5010, %5012 : !llvm.i64 + %5014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5015 = llvm.mul %4787, %5014 : !llvm.i64 + %5016 = llvm.add %5013, %5015 : !llvm.i64 + %5017 = llvm.getelementptr %5006[%5016] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5018 = llvm.load %5017 : !llvm.ptr> + %5019 = llvm.extractelement %5018[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5020 = llvm.fadd %4921, %4900 {RelaxedPrecision} : !llvm.float + %5021 = llvm.fadd %4935, %4901 {RelaxedPrecision} : !llvm.float + %5022 = llvm.fadd %4949, %4902 {RelaxedPrecision} : !llvm.float + %5023 = llvm.fadd %4963, %4903 {RelaxedPrecision} : !llvm.float + %5024 = llvm.fadd %4977, %4904 {RelaxedPrecision} : !llvm.float + %5025 = llvm.fadd %4991, %4905 {RelaxedPrecision} : !llvm.float + %5026 = llvm.fadd %5005, %4906 {RelaxedPrecision} : !llvm.float + %5027 = llvm.fadd %5019, %4907 {RelaxedPrecision} : !llvm.float + %5028 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5030 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5031 = llvm.mul %4772, %5030 : !llvm.i64 + %5032 = llvm.add %5029, %5031 : !llvm.i64 + %5033 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5034 = 
llvm.mul %67, %5033 : !llvm.i64 + %5035 = llvm.add %5032, %5034 : !llvm.i64 + %5036 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5037 = llvm.mul %4787, %5036 : !llvm.i64 + %5038 = llvm.add %5035, %5037 : !llvm.i64 + %5039 = llvm.getelementptr %5028[%5038] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5040 = llvm.load %5039 : !llvm.ptr> + %5041 = llvm.insertelement %5020, %5040[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5042 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5043 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5044 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5045 = llvm.mul %4772, %5044 : !llvm.i64 + %5046 = llvm.add %5043, %5045 : !llvm.i64 + %5047 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5048 = llvm.mul %67, %5047 : !llvm.i64 + %5049 = llvm.add %5046, %5048 : !llvm.i64 + %5050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5051 = llvm.mul %4787, %5050 : !llvm.i64 + %5052 = llvm.add %5049, %5051 : !llvm.i64 + %5053 = llvm.getelementptr %5042[%5052] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5041, %5053 : !llvm.ptr> + %5054 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5056 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5057 = llvm.mul %4772, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5060 = llvm.mul %67, %5059 : !llvm.i64 + %5061 = llvm.add %5058, %5060 : !llvm.i64 + %5062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5063 = llvm.mul %4787, %5062 : !llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.getelementptr %5054[%5064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5066 = llvm.load %5065 : !llvm.ptr> + %5067 = llvm.insertelement %5021, %5066[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5068 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5070 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5071 = llvm.mul %4772, %5070 : !llvm.i64 + %5072 = llvm.add %5069, %5071 : !llvm.i64 + %5073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5074 = llvm.mul %67, %5073 : !llvm.i64 + %5075 = llvm.add %5072, %5074 : !llvm.i64 + %5076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5077 = llvm.mul %4787, %5076 : !llvm.i64 + %5078 = llvm.add %5075, %5077 : !llvm.i64 + %5079 = llvm.getelementptr %5068[%5078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5067, %5079 : !llvm.ptr> + %5080 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5082 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5083 = llvm.mul %4772, %5082 : !llvm.i64 + %5084 = llvm.add %5081, %5083 : !llvm.i64 + %5085 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5086 = llvm.mul %67, %5085 : !llvm.i64 + %5087 = llvm.add %5084, %5086 : !llvm.i64 + %5088 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5089 = llvm.mul %4787, %5088 : !llvm.i64 + %5090 = llvm.add %5087, %5089 : !llvm.i64 + %5091 = llvm.getelementptr %5080[%5090] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5092 = llvm.load %5091 : !llvm.ptr> + %5093 = llvm.insertelement %5022, %5092[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5094 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5095 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5096 = llvm.mlir.constant(12 : index) 
: !llvm.i64 + %5097 = llvm.mul %4772, %5096 : !llvm.i64 + %5098 = llvm.add %5095, %5097 : !llvm.i64 + %5099 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5100 = llvm.mul %67, %5099 : !llvm.i64 + %5101 = llvm.add %5098, %5100 : !llvm.i64 + %5102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5103 = llvm.mul %4787, %5102 : !llvm.i64 + %5104 = llvm.add %5101, %5103 : !llvm.i64 + %5105 = llvm.getelementptr %5094[%5104] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5093, %5105 : !llvm.ptr> + %5106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5109 = llvm.mul %4772, %5108 : !llvm.i64 + %5110 = llvm.add %5107, %5109 : !llvm.i64 + %5111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5112 = llvm.mul %67, %5111 : !llvm.i64 + %5113 = llvm.add %5110, %5112 : !llvm.i64 + %5114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5115 = llvm.mul %4787, %5114 : !llvm.i64 + %5116 = llvm.add %5113, %5115 : !llvm.i64 + %5117 = llvm.getelementptr %5106[%5116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5118 = llvm.load %5117 : !llvm.ptr> + %5119 = llvm.insertelement %5023, %5118[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5120 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5123 = llvm.mul %4772, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5126 = llvm.mul %67, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5129 = llvm.mul %4787, %5128 : !llvm.i64 + %5130 = llvm.add %5127, %5129 : !llvm.i64 + %5131 = llvm.getelementptr %5120[%5130] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5119, %5131 : !llvm.ptr> + %5132 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5134 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5135 = llvm.mul %4772, %5134 : !llvm.i64 + %5136 = llvm.add %5133, %5135 : !llvm.i64 + %5137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5138 = llvm.mul %67, %5137 : !llvm.i64 + %5139 = llvm.add %5136, %5138 : !llvm.i64 + %5140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5141 = llvm.mul %4787, %5140 : !llvm.i64 + %5142 = llvm.add %5139, %5141 : !llvm.i64 + %5143 = llvm.getelementptr %5132[%5142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5144 = llvm.load %5143 : !llvm.ptr> + %5145 = llvm.insertelement %5024, %5144[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5146 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5149 = llvm.mul %4772, %5148 : !llvm.i64 + %5150 = llvm.add %5147, %5149 : !llvm.i64 + %5151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5152 = llvm.mul %67, %5151 : !llvm.i64 + %5153 = llvm.add %5150, %5152 : !llvm.i64 + %5154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5155 = llvm.mul %4787, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 : !llvm.i64 + %5157 = llvm.getelementptr %5146[%5156] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5145, %5157 : !llvm.ptr> + %5158 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5159 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %5160 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5161 = llvm.mul %4772, %5160 : !llvm.i64 + %5162 = llvm.add %5159, %5161 : !llvm.i64 + %5163 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5164 = llvm.mul %67, %5163 : !llvm.i64 + %5165 = llvm.add %5162, %5164 : !llvm.i64 + %5166 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5167 = llvm.mul %4787, %5166 : !llvm.i64 + %5168 = llvm.add %5165, %5167 : !llvm.i64 + %5169 = llvm.getelementptr %5158[%5168] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5170 = llvm.load %5169 : !llvm.ptr> + %5171 = llvm.insertelement %5025, %5170[%29 : !llvm.i64] : !llvm.vec<8 x float> + %5172 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5173 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5174 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5175 = llvm.mul %4772, %5174 : !llvm.i64 + %5176 = llvm.add %5173, %5175 : !llvm.i64 + %5177 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5178 = llvm.mul %67, %5177 : !llvm.i64 + %5179 = llvm.add %5176, %5178 : !llvm.i64 + %5180 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5181 = llvm.mul %4787, %5180 : !llvm.i64 + %5182 = llvm.add %5179, %5181 : !llvm.i64 + %5183 = llvm.getelementptr %5172[%5182] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5171, %5183 : !llvm.ptr> + %5184 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5185 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5186 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5187 = llvm.mul %4772, %5186 : !llvm.i64 + %5188 = llvm.add %5185, %5187 : !llvm.i64 + %5189 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5190 = llvm.mul %67, %5189 : !llvm.i64 + %5191 = llvm.add %5188, %5190 : !llvm.i64 + %5192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5193 = llvm.mul %4787, %5192 : !llvm.i64 + %5194 = llvm.add %5191, %5193 : !llvm.i64 + %5195 = llvm.getelementptr %5184[%5194] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5196 = llvm.load %5195 : !llvm.ptr> + %5197 = llvm.insertelement %5026, %5196[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5198 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5200 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5201 = llvm.mul %4772, %5200 : !llvm.i64 + %5202 = llvm.add %5199, %5201 : !llvm.i64 + %5203 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5204 = llvm.mul %67, %5203 : !llvm.i64 + %5205 = llvm.add %5202, %5204 : !llvm.i64 + %5206 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5207 = llvm.mul %4787, %5206 : !llvm.i64 + %5208 = llvm.add %5205, %5207 : !llvm.i64 + %5209 = llvm.getelementptr %5198[%5208] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5197, %5209 : !llvm.ptr> + %5210 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5212 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5213 = llvm.mul %4772, %5212 : !llvm.i64 + %5214 = llvm.add %5211, %5213 : !llvm.i64 + %5215 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5216 = llvm.mul %67, %5215 : !llvm.i64 + %5217 = llvm.add %5214, %5216 : !llvm.i64 + %5218 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5219 = llvm.mul %4787, %5218 : !llvm.i64 + %5220 = llvm.add %5217, %5219 : !llvm.i64 + %5221 = llvm.getelementptr %5210[%5220] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5222 = llvm.load %5221 : !llvm.ptr> + %5223 = llvm.insertelement 
%5027, %5222[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5224 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5225 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5226 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5227 = llvm.mul %4772, %5226 : !llvm.i64 + %5228 = llvm.add %5225, %5227 : !llvm.i64 + %5229 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5230 = llvm.mul %67, %5229 : !llvm.i64 + %5231 = llvm.add %5228, %5230 : !llvm.i64 + %5232 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5233 = llvm.mul %4787, %5232 : !llvm.i64 + %5234 = llvm.add %5231, %5233 : !llvm.i64 + %5235 = llvm.getelementptr %5224[%5234] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5223, %5235 : !llvm.ptr> + %5236 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5237 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5238 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5239 = llvm.mul %4772, %5238 : !llvm.i64 + %5240 = llvm.add %5237, %5239 : !llvm.i64 + %5241 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5242 = llvm.mul %67, %5241 : !llvm.i64 + %5243 = llvm.add %5240, %5242 : !llvm.i64 + %5244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5245 = llvm.mul %4787, %5244 : !llvm.i64 + %5246 = llvm.add %5243, %5245 : !llvm.i64 + %5247 = llvm.getelementptr %5236[%5246] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5248 = llvm.load %5247 : !llvm.ptr> + %5249 = llvm.insertelement %5020, %5248[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5250 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5252 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5253 = llvm.mul %4772, %5252 : !llvm.i64 + %5254 = llvm.add %5251, %5253 : !llvm.i64 + %5255 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5256 = llvm.mul %67, %5255 : !llvm.i64 + %5257 = llvm.add %5254, %5256 : !llvm.i64 + %5258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5259 = llvm.mul %4787, %5258 : !llvm.i64 + %5260 = llvm.add %5257, %5259 : !llvm.i64 + %5261 = llvm.getelementptr %5250[%5260] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5249, %5261 : !llvm.ptr> + %5262 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5263 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5264 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5265 = llvm.mul %4772, %5264 : !llvm.i64 + %5266 = llvm.add %5263, %5265 : !llvm.i64 + %5267 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5268 = llvm.mul %67, %5267 : !llvm.i64 + %5269 = llvm.add %5266, %5268 : !llvm.i64 + %5270 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5271 = llvm.mul %4787, %5270 : !llvm.i64 + %5272 = llvm.add %5269, %5271 : !llvm.i64 + %5273 = llvm.getelementptr %5262[%5272] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5274 = llvm.load %5273 : !llvm.ptr> + %5275 = llvm.insertelement %5021, %5274[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5276 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5277 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5278 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5279 = llvm.mul %4772, %5278 : !llvm.i64 + %5280 = llvm.add %5277, %5279 : !llvm.i64 + %5281 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5282 = llvm.mul %67, %5281 : !llvm.i64 + %5283 = llvm.add %5280, %5282 : !llvm.i64 + %5284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5285 = llvm.mul %4787, %5284 : !llvm.i64 + %5286 = llvm.add %5283, %5285 
: !llvm.i64 + %5287 = llvm.getelementptr %5276[%5286] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5275, %5287 : !llvm.ptr> + %5288 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5290 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5291 = llvm.mul %4772, %5290 : !llvm.i64 + %5292 = llvm.add %5289, %5291 : !llvm.i64 + %5293 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5294 = llvm.mul %67, %5293 : !llvm.i64 + %5295 = llvm.add %5292, %5294 : !llvm.i64 + %5296 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5297 = llvm.mul %4787, %5296 : !llvm.i64 + %5298 = llvm.add %5295, %5297 : !llvm.i64 + %5299 = llvm.getelementptr %5288[%5298] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5300 = llvm.load %5299 : !llvm.ptr> + %5301 = llvm.insertelement %5022, %5300[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5302 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5303 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5304 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5305 = llvm.mul %4772, %5304 : !llvm.i64 + %5306 = llvm.add %5303, %5305 : !llvm.i64 + %5307 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5308 = llvm.mul %67, %5307 : !llvm.i64 + %5309 = llvm.add %5306, %5308 : !llvm.i64 + %5310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5311 = llvm.mul %4787, %5310 : !llvm.i64 + %5312 = llvm.add %5309, %5311 : !llvm.i64 + %5313 = llvm.getelementptr %5302[%5312] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5301, %5313 : !llvm.ptr> + %5314 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5316 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5317 = llvm.mul %4772, %5316 : !llvm.i64 + %5318 = llvm.add %5315, %5317 : !llvm.i64 + %5319 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5320 = llvm.mul %67, %5319 : !llvm.i64 + %5321 = llvm.add %5318, %5320 : !llvm.i64 + %5322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5323 = llvm.mul %4787, %5322 : !llvm.i64 + %5324 = llvm.add %5321, %5323 : !llvm.i64 + %5325 = llvm.getelementptr %5314[%5324] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5326 = llvm.load %5325 : !llvm.ptr> + %5327 = llvm.insertelement %5023, %5326[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5328 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5330 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5331 = llvm.mul %4772, %5330 : !llvm.i64 + %5332 = llvm.add %5329, %5331 : !llvm.i64 + %5333 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5334 = llvm.mul %67, %5333 : !llvm.i64 + %5335 = llvm.add %5332, %5334 : !llvm.i64 + %5336 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5337 = llvm.mul %4787, %5336 : !llvm.i64 + %5338 = llvm.add %5335, %5337 : !llvm.i64 + %5339 = llvm.getelementptr %5328[%5338] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5327, %5339 : !llvm.ptr> + %5340 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5342 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5343 = llvm.mul %4772, %5342 : !llvm.i64 + %5344 = llvm.add %5341, %5343 : !llvm.i64 + %5345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5346 = llvm.mul %67, %5345 : !llvm.i64 + %5347 = llvm.add %5344, %5346 : !llvm.i64 + %5348 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %5349 = llvm.mul %4787, %5348 : !llvm.i64 + %5350 = llvm.add %5347, %5349 : !llvm.i64 + %5351 = llvm.getelementptr %5340[%5350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5352 = llvm.load %5351 : !llvm.ptr> + %5353 = llvm.insertelement %5024, %5352[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5354 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5356 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5357 = llvm.mul %4772, %5356 : !llvm.i64 + %5358 = llvm.add %5355, %5357 : !llvm.i64 + %5359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5360 = llvm.mul %67, %5359 : !llvm.i64 + %5361 = llvm.add %5358, %5360 : !llvm.i64 + %5362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5363 = llvm.mul %4787, %5362 : !llvm.i64 + %5364 = llvm.add %5361, %5363 : !llvm.i64 + %5365 = llvm.getelementptr %5354[%5364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5353, %5365 : !llvm.ptr> + %5366 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5367 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5368 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5369 = llvm.mul %4772, %5368 : !llvm.i64 + %5370 = llvm.add %5367, %5369 : !llvm.i64 + %5371 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5372 = llvm.mul %67, %5371 : !llvm.i64 + %5373 = llvm.add %5370, %5372 : !llvm.i64 + %5374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5375 = llvm.mul %4787, %5374 : !llvm.i64 + %5376 = llvm.add %5373, %5375 : !llvm.i64 + %5377 = llvm.getelementptr %5366[%5376] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5378 = llvm.load %5377 : !llvm.ptr> + %5379 = llvm.insertelement %5025, %5378[%29 : !llvm.i64] : !llvm.vec<8 x float> + %5380 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5382 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5383 = llvm.mul %4772, %5382 : !llvm.i64 + %5384 = llvm.add %5381, %5383 : !llvm.i64 + %5385 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5386 = llvm.mul %67, %5385 : !llvm.i64 + %5387 = llvm.add %5384, %5386 : !llvm.i64 + %5388 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5389 = llvm.mul %4787, %5388 : !llvm.i64 + %5390 = llvm.add %5387, %5389 : !llvm.i64 + %5391 = llvm.getelementptr %5380[%5390] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5379, %5391 : !llvm.ptr> + %5392 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5393 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5394 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5395 = llvm.mul %4772, %5394 : !llvm.i64 + %5396 = llvm.add %5393, %5395 : !llvm.i64 + %5397 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5398 = llvm.mul %67, %5397 : !llvm.i64 + %5399 = llvm.add %5396, %5398 : !llvm.i64 + %5400 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5401 = llvm.mul %4787, %5400 : !llvm.i64 + %5402 = llvm.add %5399, %5401 : !llvm.i64 + %5403 = llvm.getelementptr %5392[%5402] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5404 = llvm.load %5403 : !llvm.ptr> + %5405 = llvm.insertelement %5026, %5404[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5406 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5408 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5409 = llvm.mul %4772, %5408 : !llvm.i64 + %5410 = llvm.add %5407, %5409 : !llvm.i64 + %5411 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %5412 = llvm.mul %67, %5411 : !llvm.i64 + %5413 = llvm.add %5410, %5412 : !llvm.i64 + %5414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5415 = llvm.mul %4787, %5414 : !llvm.i64 + %5416 = llvm.add %5413, %5415 : !llvm.i64 + %5417 = llvm.getelementptr %5406[%5416] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5405, %5417 : !llvm.ptr> + %5418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5421 = llvm.mul %4772, %5420 : !llvm.i64 + %5422 = llvm.add %5419, %5421 : !llvm.i64 + %5423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5424 = llvm.mul %67, %5423 : !llvm.i64 + %5425 = llvm.add %5422, %5424 : !llvm.i64 + %5426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5427 = llvm.mul %4787, %5426 : !llvm.i64 + %5428 = llvm.add %5425, %5427 : !llvm.i64 + %5429 = llvm.getelementptr %5418[%5428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5430 = llvm.load %5429 : !llvm.ptr> + %5431 = llvm.insertelement %5027, %5430[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5435 = llvm.mul %4772, %5434 : !llvm.i64 + %5436 = llvm.add %5433, %5435 : !llvm.i64 + %5437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5438 = llvm.mul %67, %5437 : !llvm.i64 + %5439 = llvm.add %5436, %5438 : !llvm.i64 + %5440 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5441 = llvm.mul %4787, %5440 : !llvm.i64 + %5442 = llvm.add %5439, %5441 : !llvm.i64 + %5443 = llvm.getelementptr %5432[%5442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5431, %5443 : !llvm.ptr> + %5444 = llvm.add %3915, %69 : !llvm.i64 + llvm.br ^bb36(%5444 : !llvm.i64) + ^bb38: // pred: ^bb36 + %5445 = llvm.add %2370, %48 : !llvm.i64 + llvm.br ^bb25(%5445 : !llvm.i64) + ^bb39: // pred: ^bb25 + %5446 = llvm.add %2368, %68 : !llvm.i64 + llvm.br ^bb23(%5446 : !llvm.i64) + ^bb40: // pred: ^bb23 + llvm.br ^bb41(%67 : !llvm.i64) + ^bb41(%5447: !llvm.i64): // 2 preds: ^bb40, ^bb50 + %5448 = llvm.icmp "slt" %5447, %38 : !llvm.i64 + llvm.cond_br %5448, ^bb42, ^bb51 + ^bb42: // pred: ^bb41 + llvm.cond_br %40, ^bb43, ^bb47 + ^bb43: // pred: ^bb42 + %5449 = llvm.add %151, %5447 : !llvm.i64 + %5450 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5451 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5452 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5453 = llvm.mul %2345, %5452 : !llvm.i64 + %5454 = llvm.add %5451, %5453 : !llvm.i64 + %5455 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5456 = llvm.mul %5449, %5455 : !llvm.i64 + %5457 = llvm.add %5454, %5456 : !llvm.i64 + %5458 = llvm.getelementptr %5450[%5457] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5459 = llvm.bitcast %5458 : !llvm.ptr to !llvm.ptr> + %5460 = llvm.load %5459 {alignment = 4 : i64} : !llvm.ptr> + %5461 = llvm.icmp "slt" %5447, %67 : !llvm.i64 + %5462 = llvm.sub %64, %5447 : !llvm.i64 + %5463 = llvm.select %5461, %5462, %5447 : !llvm.i1, !llvm.i64 + %5464 = llvm.sdiv %5463, %68 : !llvm.i64 + %5465 = llvm.sub %64, %5464 : !llvm.i64 + %5466 = llvm.select %5461, %5465, %5464 : !llvm.i1, !llvm.i64 + %5467 = llvm.srem %5466, %68 : !llvm.i64 + %5468 = llvm.icmp "slt" %5467, %67 : !llvm.i64 + %5469 = llvm.add %5467, %68 : !llvm.i64 + %5470 = llvm.select %5468, 
%5469, %5467 : !llvm.i1, !llvm.i64 + %5471 = llvm.srem %5447, %68 : !llvm.i64 + %5472 = llvm.icmp "slt" %5471, %67 : !llvm.i64 + %5473 = llvm.add %5471, %68 : !llvm.i64 + %5474 = llvm.select %5472, %5473, %5471 : !llvm.i1, !llvm.i64 + %5475 = llvm.icmp "slt" %5474, %67 : !llvm.i64 + %5476 = llvm.sub %64, %5474 : !llvm.i64 + %5477 = llvm.select %5475, %5476, %5474 : !llvm.i1, !llvm.i64 + %5478 = llvm.sdiv %5477, %70 : !llvm.i64 + %5479 = llvm.sub %64, %5478 : !llvm.i64 + %5480 = llvm.select %5475, %5479, %5478 : !llvm.i1, !llvm.i64 + %5481 = llvm.srem %5480, %63 : !llvm.i64 + %5482 = llvm.icmp "slt" %5481, %67 : !llvm.i64 + %5483 = llvm.add %5481, %63 : !llvm.i64 + %5484 = llvm.select %5482, %5483, %5481 : !llvm.i1, !llvm.i64 + %5485 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5486 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5487 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5488 = llvm.mul %5470, %5487 : !llvm.i64 + %5489 = llvm.add %5486, %5488 : !llvm.i64 + %5490 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5491 = llvm.mul %67, %5490 : !llvm.i64 + %5492 = llvm.add %5489, %5491 : !llvm.i64 + %5493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5494 = llvm.mul %5484, %5493 : !llvm.i64 + %5495 = llvm.add %5492, %5494 : !llvm.i64 + %5496 = llvm.getelementptr %5485[%5495] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5497 = llvm.load %5496 : !llvm.ptr> + %5498 = llvm.fadd %5460, %5497 : !llvm.vec<8 x float> + %5499 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5500 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5501 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5502 = llvm.mul %67, %5501 : !llvm.i64 + %5503 = llvm.add %5500, %5502 : !llvm.i64 + %5504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5505 = llvm.mul %67, %5504 : !llvm.i64 + %5506 = llvm.add %5503, %5505 : !llvm.i64 + %5507 = llvm.getelementptr %5499[%5506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5498, %5507 : !llvm.ptr> + %5508 = llvm.add %5449, %70 : !llvm.i64 + %5509 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5510 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5511 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5512 = llvm.mul %2345, %5511 : !llvm.i64 + %5513 = llvm.add %5510, %5512 : !llvm.i64 + %5514 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5515 = llvm.mul %5508, %5514 : !llvm.i64 + %5516 = llvm.add %5513, %5515 : !llvm.i64 + %5517 = llvm.getelementptr %5509[%5516] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5518 = llvm.bitcast %5517 : !llvm.ptr to !llvm.ptr> + %5519 = llvm.load %5518 {alignment = 4 : i64} : !llvm.ptr> + %5520 = llvm.add %5447, %70 : !llvm.i64 + %5521 = llvm.icmp "slt" %5520, %67 : !llvm.i64 + %5522 = llvm.sub %64, %5520 : !llvm.i64 + %5523 = llvm.select %5521, %5522, %5520 : !llvm.i1, !llvm.i64 + %5524 = llvm.sdiv %5523, %68 : !llvm.i64 + %5525 = llvm.sub %64, %5524 : !llvm.i64 + %5526 = llvm.select %5521, %5525, %5524 : !llvm.i1, !llvm.i64 + %5527 = llvm.srem %5526, %68 : !llvm.i64 + %5528 = llvm.icmp "slt" %5527, %67 : !llvm.i64 + %5529 = llvm.add %5527, %68 : !llvm.i64 + %5530 = llvm.select %5528, %5529, %5527 : !llvm.i1, !llvm.i64 + %5531 = llvm.sdiv %5463, %70 : !llvm.i64 + %5532 = llvm.sub %64, %5531 : !llvm.i64 + %5533 = llvm.select %5461, %5532, %5531 : !llvm.i1, !llvm.i64 + %5534 = llvm.mul %5526, %65 : !llvm.i64 + %5535 = llvm.add %5533, %5534 : !llvm.i64 + %5536 = llvm.add %5535, %69 : !llvm.i64 + %5537 = llvm.icmp "slt" 
%5536, %67 : !llvm.i64 + %5538 = llvm.sub %64, %5536 : !llvm.i64 + %5539 = llvm.select %5537, %5538, %5536 : !llvm.i1, !llvm.i64 + %5540 = llvm.sdiv %5539, %63 : !llvm.i64 + %5541 = llvm.sub %64, %5540 : !llvm.i64 + %5542 = llvm.select %5537, %5541, %5540 : !llvm.i1, !llvm.i64 + %5543 = llvm.mul %5542, %65 : !llvm.i64 + %5544 = llvm.add %5535, %5543 : !llvm.i64 + %5545 = llvm.add %5544, %69 : !llvm.i64 + %5546 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5547 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5548 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5549 = llvm.mul %5530, %5548 : !llvm.i64 + %5550 = llvm.add %5547, %5549 : !llvm.i64 + %5551 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5552 = llvm.mul %67, %5551 : !llvm.i64 + %5553 = llvm.add %5550, %5552 : !llvm.i64 + %5554 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5555 = llvm.mul %5545, %5554 : !llvm.i64 + %5556 = llvm.add %5553, %5555 : !llvm.i64 + %5557 = llvm.getelementptr %5546[%5556] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5558 = llvm.load %5557 : !llvm.ptr> + %5559 = llvm.fadd %5519, %5558 : !llvm.vec<8 x float> + %5560 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5561 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5562 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5563 = llvm.mul %67, %5562 : !llvm.i64 + %5564 = llvm.add %5561, %5563 : !llvm.i64 + %5565 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5566 = llvm.mul %69, %5565 : !llvm.i64 + %5567 = llvm.add %5564, %5566 : !llvm.i64 + %5568 = llvm.getelementptr %5560[%5567] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5559, %5568 : !llvm.ptr> + %5569 = llvm.add %5449, %68 : !llvm.i64 + %5570 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5571 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5572 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5573 = llvm.mul %2345, %5572 : !llvm.i64 + %5574 = llvm.add %5571, %5573 : !llvm.i64 + %5575 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5576 = llvm.mul %5569, %5575 : !llvm.i64 + %5577 = llvm.add %5574, %5576 : !llvm.i64 + %5578 = llvm.getelementptr %5570[%5577] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5579 = llvm.bitcast %5578 : !llvm.ptr to !llvm.ptr> + %5580 = llvm.load %5579 {alignment = 4 : i64} : !llvm.ptr> + %5581 = llvm.add %5466, %69 : !llvm.i64 + %5582 = llvm.icmp "slt" %5581, %67 : !llvm.i64 + %5583 = llvm.sub %64, %5581 : !llvm.i64 + %5584 = llvm.select %5582, %5583, %5581 : !llvm.i1, !llvm.i64 + %5585 = llvm.sdiv %5584, %68 : !llvm.i64 + %5586 = llvm.sub %64, %5585 : !llvm.i64 + %5587 = llvm.select %5582, %5586, %5585 : !llvm.i1, !llvm.i64 + %5588 = llvm.mul %5587, %60 : !llvm.i64 + %5589 = llvm.add %5466, %5588 : !llvm.i64 + %5590 = llvm.add %5589, %69 : !llvm.i64 + %5591 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5593 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5594 = llvm.mul %5590, %5593 : !llvm.i64 + %5595 = llvm.add %5592, %5594 : !llvm.i64 + %5596 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5597 = llvm.mul %67, %5596 : !llvm.i64 + %5598 = llvm.add %5595, %5597 : !llvm.i64 + %5599 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5600 = llvm.mul %5484, %5599 : !llvm.i64 + %5601 = llvm.add %5598, %5600 : !llvm.i64 + %5602 = llvm.getelementptr %5591[%5601] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5603 = llvm.load %5602 : !llvm.ptr> + %5604 
= llvm.fadd %5580, %5603 : !llvm.vec<8 x float> + %5605 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5606 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5607 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5608 = llvm.mul %67, %5607 : !llvm.i64 + %5609 = llvm.add %5606, %5608 : !llvm.i64 + %5610 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5611 = llvm.mul %63, %5610 : !llvm.i64 + %5612 = llvm.add %5609, %5611 : !llvm.i64 + %5613 = llvm.getelementptr %5605[%5612] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5604, %5613 : !llvm.ptr> + %5614 = llvm.add %5449, %41 : !llvm.i64 + %5615 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5616 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5617 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5618 = llvm.mul %2345, %5617 : !llvm.i64 + %5619 = llvm.add %5616, %5618 : !llvm.i64 + %5620 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5621 = llvm.mul %5614, %5620 : !llvm.i64 + %5622 = llvm.add %5619, %5621 : !llvm.i64 + %5623 = llvm.getelementptr %5615[%5622] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5624 = llvm.bitcast %5623 : !llvm.ptr to !llvm.ptr> + %5625 = llvm.load %5624 {alignment = 4 : i64} : !llvm.ptr> + %5626 = llvm.add %5447, %41 : !llvm.i64 + %5627 = llvm.icmp "slt" %5626, %67 : !llvm.i64 + %5628 = llvm.sub %64, %5626 : !llvm.i64 + %5629 = llvm.select %5627, %5628, %5626 : !llvm.i1, !llvm.i64 + %5630 = llvm.sdiv %5629, %68 : !llvm.i64 + %5631 = llvm.sub %64, %5630 : !llvm.i64 + %5632 = llvm.select %5627, %5631, %5630 : !llvm.i1, !llvm.i64 + %5633 = llvm.srem %5632, %68 : !llvm.i64 + %5634 = llvm.icmp "slt" %5633, %67 : !llvm.i64 + %5635 = llvm.add %5633, %68 : !llvm.i64 + %5636 = llvm.select %5634, %5635, %5633 : !llvm.i1, !llvm.i64 + %5637 = llvm.mul %5632, %65 : !llvm.i64 + %5638 = llvm.add %5533, %5637 : !llvm.i64 + %5639 = llvm.add %5638, %45 : !llvm.i64 + %5640 = llvm.icmp "slt" %5639, %67 : !llvm.i64 + %5641 = llvm.sub %64, %5639 : !llvm.i64 + %5642 = llvm.select %5640, %5641, %5639 : !llvm.i1, !llvm.i64 + %5643 = llvm.sdiv %5642, %63 : !llvm.i64 + %5644 = llvm.sub %64, %5643 : !llvm.i64 + %5645 = llvm.select %5640, %5644, %5643 : !llvm.i1, !llvm.i64 + %5646 = llvm.mul %5645, %65 : !llvm.i64 + %5647 = llvm.add %5638, %5646 : !llvm.i64 + %5648 = llvm.add %5647, %45 : !llvm.i64 + %5649 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5651 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5652 = llvm.mul %5636, %5651 : !llvm.i64 + %5653 = llvm.add %5650, %5652 : !llvm.i64 + %5654 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5655 = llvm.mul %67, %5654 : !llvm.i64 + %5656 = llvm.add %5653, %5655 : !llvm.i64 + %5657 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5658 = llvm.mul %5648, %5657 : !llvm.i64 + %5659 = llvm.add %5656, %5658 : !llvm.i64 + %5660 = llvm.getelementptr %5649[%5659] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5661 = llvm.load %5660 : !llvm.ptr> + %5662 = llvm.fadd %5625, %5661 : !llvm.vec<8 x float> + %5663 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5664 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5665 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5666 = llvm.mul %67, %5665 : !llvm.i64 + %5667 = llvm.add %5664, %5666 : !llvm.i64 + %5668 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5669 = llvm.mul %45, %5668 : !llvm.i64 + %5670 = llvm.add %5667, %5669 : 
!llvm.i64 + %5671 = llvm.getelementptr %5663[%5670] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5662, %5671 : !llvm.ptr> + %5672 = llvm.add %5449, %42 : !llvm.i64 + %5673 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5674 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5675 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5676 = llvm.mul %2345, %5675 : !llvm.i64 + %5677 = llvm.add %5674, %5676 : !llvm.i64 + %5678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5679 = llvm.mul %5672, %5678 : !llvm.i64 + %5680 = llvm.add %5677, %5679 : !llvm.i64 + %5681 = llvm.getelementptr %5673[%5680] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5682 = llvm.bitcast %5681 : !llvm.ptr to !llvm.ptr> + %5683 = llvm.load %5682 {alignment = 4 : i64} : !llvm.ptr> + %5684 = llvm.add %5466, %63 : !llvm.i64 + %5685 = llvm.icmp "slt" %5684, %67 : !llvm.i64 + %5686 = llvm.sub %64, %5684 : !llvm.i64 + %5687 = llvm.select %5685, %5686, %5684 : !llvm.i1, !llvm.i64 + %5688 = llvm.sdiv %5687, %68 : !llvm.i64 + %5689 = llvm.sub %64, %5688 : !llvm.i64 + %5690 = llvm.select %5685, %5689, %5688 : !llvm.i1, !llvm.i64 + %5691 = llvm.mul %5690, %60 : !llvm.i64 + %5692 = llvm.add %5466, %5691 : !llvm.i64 + %5693 = llvm.add %5692, %63 : !llvm.i64 + %5694 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5695 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5696 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5697 = llvm.mul %5693, %5696 : !llvm.i64 + %5698 = llvm.add %5695, %5697 : !llvm.i64 + %5699 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5700 = llvm.mul %67, %5699 : !llvm.i64 + %5701 = llvm.add %5698, %5700 : !llvm.i64 + %5702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5703 = llvm.mul %5484, %5702 : !llvm.i64 + %5704 = llvm.add %5701, %5703 : !llvm.i64 + %5705 = llvm.getelementptr %5694[%5704] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5706 = llvm.load %5705 : !llvm.ptr> + %5707 = llvm.fadd %5683, %5706 : !llvm.vec<8 x float> + %5708 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5710 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5711 = llvm.mul %67, %5710 : !llvm.i64 + %5712 = llvm.add %5709, %5711 : !llvm.i64 + %5713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5714 = llvm.mul %48, %5713 : !llvm.i64 + %5715 = llvm.add %5712, %5714 : !llvm.i64 + %5716 = llvm.getelementptr %5708[%5715] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5707, %5716 : !llvm.ptr> + %5717 = llvm.add %5449, %43 : !llvm.i64 + %5718 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5720 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5721 = llvm.mul %2345, %5720 : !llvm.i64 + %5722 = llvm.add %5719, %5721 : !llvm.i64 + %5723 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5724 = llvm.mul %5717, %5723 : !llvm.i64 + %5725 = llvm.add %5722, %5724 : !llvm.i64 + %5726 = llvm.getelementptr %5718[%5725] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5727 = llvm.bitcast %5726 : !llvm.ptr to !llvm.ptr> + %5728 = llvm.load %5727 {alignment = 4 : i64} : !llvm.ptr> + %5729 = llvm.add %5447, %43 : !llvm.i64 + %5730 = llvm.icmp "slt" %5729, %67 : !llvm.i64 + %5731 = llvm.sub %64, %5729 : !llvm.i64 + %5732 = llvm.select %5730, %5731, %5729 : !llvm.i1, !llvm.i64 + %5733 = llvm.sdiv %5732, %68 : !llvm.i64 + %5734 = llvm.sub %64, %5733 : !llvm.i64 + %5735 = llvm.select 
%5730, %5734, %5733 : !llvm.i1, !llvm.i64 + %5736 = llvm.srem %5735, %68 : !llvm.i64 + %5737 = llvm.icmp "slt" %5736, %67 : !llvm.i64 + %5738 = llvm.add %5736, %68 : !llvm.i64 + %5739 = llvm.select %5737, %5738, %5736 : !llvm.i1, !llvm.i64 + %5740 = llvm.mul %5735, %65 : !llvm.i64 + %5741 = llvm.add %5533, %5740 : !llvm.i64 + %5742 = llvm.add %5741, %52 : !llvm.i64 + %5743 = llvm.icmp "slt" %5742, %67 : !llvm.i64 + %5744 = llvm.sub %64, %5742 : !llvm.i64 + %5745 = llvm.select %5743, %5744, %5742 : !llvm.i1, !llvm.i64 + %5746 = llvm.sdiv %5745, %63 : !llvm.i64 + %5747 = llvm.sub %64, %5746 : !llvm.i64 + %5748 = llvm.select %5743, %5747, %5746 : !llvm.i1, !llvm.i64 + %5749 = llvm.mul %5748, %65 : !llvm.i64 + %5750 = llvm.add %5741, %5749 : !llvm.i64 + %5751 = llvm.add %5750, %52 : !llvm.i64 + %5752 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5754 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5755 = llvm.mul %5739, %5754 : !llvm.i64 + %5756 = llvm.add %5753, %5755 : !llvm.i64 + %5757 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5758 = llvm.mul %67, %5757 : !llvm.i64 + %5759 = llvm.add %5756, %5758 : !llvm.i64 + %5760 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5761 = llvm.mul %5751, %5760 : !llvm.i64 + %5762 = llvm.add %5759, %5761 : !llvm.i64 + %5763 = llvm.getelementptr %5752[%5762] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5764 = llvm.load %5763 : !llvm.ptr> + %5765 = llvm.fadd %5728, %5764 : !llvm.vec<8 x float> + %5766 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5767 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5768 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5769 = llvm.mul %67, %5768 : !llvm.i64 + %5770 = llvm.add %5767, %5769 : !llvm.i64 + %5771 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5772 = llvm.mul %52, %5771 : !llvm.i64 + %5773 = llvm.add %5770, %5772 : !llvm.i64 + %5774 = llvm.getelementptr %5766[%5773] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5765, %5774 : !llvm.ptr> + %5775 = llvm.add %5449, %44 : !llvm.i64 + %5776 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5777 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5778 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5779 = llvm.mul %2345, %5778 : !llvm.i64 + %5780 = llvm.add %5777, %5779 : !llvm.i64 + %5781 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5782 = llvm.mul %5775, %5781 : !llvm.i64 + %5783 = llvm.add %5780, %5782 : !llvm.i64 + %5784 = llvm.getelementptr %5776[%5783] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5785 = llvm.bitcast %5784 : !llvm.ptr to !llvm.ptr> + %5786 = llvm.load %5785 {alignment = 4 : i64} : !llvm.ptr> + %5787 = llvm.add %5466, %45 : !llvm.i64 + %5788 = llvm.icmp "slt" %5787, %67 : !llvm.i64 + %5789 = llvm.sub %64, %5787 : !llvm.i64 + %5790 = llvm.select %5788, %5789, %5787 : !llvm.i1, !llvm.i64 + %5791 = llvm.sdiv %5790, %68 : !llvm.i64 + %5792 = llvm.sub %64, %5791 : !llvm.i64 + %5793 = llvm.select %5788, %5792, %5791 : !llvm.i1, !llvm.i64 + %5794 = llvm.mul %5793, %60 : !llvm.i64 + %5795 = llvm.add %5466, %5794 : !llvm.i64 + %5796 = llvm.add %5795, %45 : !llvm.i64 + %5797 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5799 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5800 = llvm.mul %5796, %5799 : !llvm.i64 + %5801 = llvm.add %5798, %5800 : !llvm.i64 + %5802 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %5803 = llvm.mul %67, %5802 : !llvm.i64 + %5804 = llvm.add %5801, %5803 : !llvm.i64 + %5805 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5806 = llvm.mul %5484, %5805 : !llvm.i64 + %5807 = llvm.add %5804, %5806 : !llvm.i64 + %5808 = llvm.getelementptr %5797[%5807] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5809 = llvm.load %5808 : !llvm.ptr> + %5810 = llvm.fadd %5786, %5809 : !llvm.vec<8 x float> + %5811 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5812 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5813 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5814 = llvm.mul %67, %5813 : !llvm.i64 + %5815 = llvm.add %5812, %5814 : !llvm.i64 + %5816 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5817 = llvm.mul %56, %5816 : !llvm.i64 + %5818 = llvm.add %5815, %5817 : !llvm.i64 + %5819 = llvm.getelementptr %5811[%5818] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5810, %5819 : !llvm.ptr> + %5820 = llvm.add %5449, %46 : !llvm.i64 + %5821 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5822 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5823 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5824 = llvm.mul %2345, %5823 : !llvm.i64 + %5825 = llvm.add %5822, %5824 : !llvm.i64 + %5826 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5827 = llvm.mul %5820, %5826 : !llvm.i64 + %5828 = llvm.add %5825, %5827 : !llvm.i64 + %5829 = llvm.getelementptr %5821[%5828] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5830 = llvm.bitcast %5829 : !llvm.ptr to !llvm.ptr> + %5831 = llvm.load %5830 {alignment = 4 : i64} : !llvm.ptr> + %5832 = llvm.add %5447, %46 : !llvm.i64 + %5833 = llvm.icmp "slt" %5832, %67 : !llvm.i64 + %5834 = llvm.sub %64, %5832 : !llvm.i64 + %5835 = llvm.select %5833, %5834, %5832 : !llvm.i1, !llvm.i64 + %5836 = llvm.sdiv %5835, %68 : !llvm.i64 + %5837 = llvm.sub %64, %5836 : !llvm.i64 + %5838 = llvm.select %5833, %5837, %5836 : !llvm.i1, !llvm.i64 + %5839 = llvm.srem %5838, %68 : !llvm.i64 + %5840 = llvm.icmp "slt" %5839, %67 : !llvm.i64 + %5841 = llvm.add %5839, %68 : !llvm.i64 + %5842 = llvm.select %5840, %5841, %5839 : !llvm.i1, !llvm.i64 + %5843 = llvm.mul %5838, %65 : !llvm.i64 + %5844 = llvm.add %5533, %5843 : !llvm.i64 + %5845 = llvm.add %5844, %61 : !llvm.i64 + %5846 = llvm.icmp "slt" %5845, %67 : !llvm.i64 + %5847 = llvm.sub %64, %5845 : !llvm.i64 + %5848 = llvm.select %5846, %5847, %5845 : !llvm.i1, !llvm.i64 + %5849 = llvm.sdiv %5848, %63 : !llvm.i64 + %5850 = llvm.sub %64, %5849 : !llvm.i64 + %5851 = llvm.select %5846, %5850, %5849 : !llvm.i1, !llvm.i64 + %5852 = llvm.mul %5851, %65 : !llvm.i64 + %5853 = llvm.add %5844, %5852 : !llvm.i64 + %5854 = llvm.add %5853, %61 : !llvm.i64 + %5855 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5857 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5858 = llvm.mul %5842, %5857 : !llvm.i64 + %5859 = llvm.add %5856, %5858 : !llvm.i64 + %5860 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5861 = llvm.mul %67, %5860 : !llvm.i64 + %5862 = llvm.add %5859, %5861 : !llvm.i64 + %5863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5864 = llvm.mul %5854, %5863 : !llvm.i64 + %5865 = llvm.add %5862, %5864 : !llvm.i64 + %5866 = llvm.getelementptr %5855[%5865] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5867 = llvm.load %5866 : !llvm.ptr> + %5868 = llvm.fadd %5831, %5867 : !llvm.vec<8 x float> + %5869 = llvm.extractvalue 
%110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5871 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5872 = llvm.mul %67, %5871 : !llvm.i64 + %5873 = llvm.add %5870, %5872 : !llvm.i64 + %5874 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5875 = llvm.mul %61, %5874 : !llvm.i64 + %5876 = llvm.add %5873, %5875 : !llvm.i64 + %5877 = llvm.getelementptr %5869[%5876] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5868, %5877 : !llvm.ptr> + %5878 = llvm.add %5449, %47 : !llvm.i64 + %5879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5882 = llvm.mul %2345, %5881 : !llvm.i64 + %5883 = llvm.add %5880, %5882 : !llvm.i64 + %5884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5885 = llvm.mul %5878, %5884 : !llvm.i64 + %5886 = llvm.add %5883, %5885 : !llvm.i64 + %5887 = llvm.getelementptr %5879[%5886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5888 = llvm.bitcast %5887 : !llvm.ptr to !llvm.ptr> + %5889 = llvm.load %5888 {alignment = 4 : i64} : !llvm.ptr> + %5890 = llvm.add %5466, %48 : !llvm.i64 + %5891 = llvm.icmp "slt" %5890, %67 : !llvm.i64 + %5892 = llvm.sub %64, %5890 : !llvm.i64 + %5893 = llvm.select %5891, %5892, %5890 : !llvm.i1, !llvm.i64 + %5894 = llvm.sdiv %5893, %68 : !llvm.i64 + %5895 = llvm.sub %64, %5894 : !llvm.i64 + %5896 = llvm.select %5891, %5895, %5894 : !llvm.i1, !llvm.i64 + %5897 = llvm.mul %5896, %60 : !llvm.i64 + %5898 = llvm.add %5466, %5897 : !llvm.i64 + %5899 = llvm.add %5898, %48 : !llvm.i64 + %5900 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5901 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5902 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5903 = llvm.mul %5899, %5902 : !llvm.i64 + %5904 = llvm.add %5901, %5903 : !llvm.i64 + %5905 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5906 = llvm.mul %67, %5905 : !llvm.i64 + %5907 = llvm.add %5904, %5906 : !llvm.i64 + %5908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5909 = llvm.mul %5484, %5908 : !llvm.i64 + %5910 = llvm.add %5907, %5909 : !llvm.i64 + %5911 = llvm.getelementptr %5900[%5910] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5912 = llvm.load %5911 : !llvm.ptr> + %5913 = llvm.fadd %5889, %5912 : !llvm.vec<8 x float> + %5914 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5915 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5916 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5917 = llvm.mul %67, %5916 : !llvm.i64 + %5918 = llvm.add %5915, %5917 : !llvm.i64 + %5919 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5920 = llvm.mul %70, %5919 : !llvm.i64 + %5921 = llvm.add %5918, %5920 : !llvm.i64 + %5922 = llvm.getelementptr %5914[%5921] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5913, %5922 : !llvm.ptr> + %5923 = llvm.add %5449, %49 : !llvm.i64 + %5924 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5925 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5926 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5927 = llvm.mul %2345, %5926 : !llvm.i64 + %5928 = llvm.add %5925, %5927 : !llvm.i64 + %5929 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5930 = llvm.mul %5923, %5929 : !llvm.i64 + %5931 = llvm.add %5928, %5930 : !llvm.i64 + %5932 = llvm.getelementptr %5924[%5931] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5933 = llvm.bitcast %5932 
: !llvm.ptr to !llvm.ptr> + %5934 = llvm.load %5933 {alignment = 4 : i64} : !llvm.ptr> + %5935 = llvm.add %5447, %49 : !llvm.i64 + %5936 = llvm.icmp "slt" %5935, %67 : !llvm.i64 + %5937 = llvm.sub %64, %5935 : !llvm.i64 + %5938 = llvm.select %5936, %5937, %5935 : !llvm.i1, !llvm.i64 + %5939 = llvm.sdiv %5938, %68 : !llvm.i64 + %5940 = llvm.sub %64, %5939 : !llvm.i64 + %5941 = llvm.select %5936, %5940, %5939 : !llvm.i1, !llvm.i64 + %5942 = llvm.srem %5941, %68 : !llvm.i64 + %5943 = llvm.icmp "slt" %5942, %67 : !llvm.i64 + %5944 = llvm.add %5942, %68 : !llvm.i64 + %5945 = llvm.select %5943, %5944, %5942 : !llvm.i1, !llvm.i64 + %5946 = llvm.mul %5941, %65 : !llvm.i64 + %5947 = llvm.add %5533, %5946 : !llvm.i64 + %5948 = llvm.add %5947, %50 : !llvm.i64 + %5949 = llvm.icmp "slt" %5948, %67 : !llvm.i64 + %5950 = llvm.sub %64, %5948 : !llvm.i64 + %5951 = llvm.select %5949, %5950, %5948 : !llvm.i1, !llvm.i64 + %5952 = llvm.sdiv %5951, %63 : !llvm.i64 + %5953 = llvm.sub %64, %5952 : !llvm.i64 + %5954 = llvm.select %5949, %5953, %5952 : !llvm.i1, !llvm.i64 + %5955 = llvm.mul %5954, %65 : !llvm.i64 + %5956 = llvm.add %5947, %5955 : !llvm.i64 + %5957 = llvm.add %5956, %50 : !llvm.i64 + %5958 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5960 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5961 = llvm.mul %5945, %5960 : !llvm.i64 + %5962 = llvm.add %5959, %5961 : !llvm.i64 + %5963 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5964 = llvm.mul %67, %5963 : !llvm.i64 + %5965 = llvm.add %5962, %5964 : !llvm.i64 + %5966 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5967 = llvm.mul %5957, %5966 : !llvm.i64 + %5968 = llvm.add %5965, %5967 : !llvm.i64 + %5969 = llvm.getelementptr %5958[%5968] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5970 = llvm.load %5969 : !llvm.ptr> + %5971 = llvm.fadd %5934, %5970 : !llvm.vec<8 x float> + %5972 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5973 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5974 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5975 = llvm.mul %67, %5974 : !llvm.i64 + %5976 = llvm.add %5973, %5975 : !llvm.i64 + %5977 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5978 = llvm.mul %50, %5977 : !llvm.i64 + %5979 = llvm.add %5976, %5978 : !llvm.i64 + %5980 = llvm.getelementptr %5972[%5979] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5971, %5980 : !llvm.ptr> + %5981 = llvm.add %5449, %51 : !llvm.i64 + %5982 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5984 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5985 = llvm.mul %2345, %5984 : !llvm.i64 + %5986 = llvm.add %5983, %5985 : !llvm.i64 + %5987 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5988 = llvm.mul %5981, %5987 : !llvm.i64 + %5989 = llvm.add %5986, %5988 : !llvm.i64 + %5990 = llvm.getelementptr %5982[%5989] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5991 = llvm.bitcast %5990 : !llvm.ptr to !llvm.ptr> + %5992 = llvm.load %5991 {alignment = 4 : i64} : !llvm.ptr> + %5993 = llvm.add %5466, %52 : !llvm.i64 + %5994 = llvm.icmp "slt" %5993, %67 : !llvm.i64 + %5995 = llvm.sub %64, %5993 : !llvm.i64 + %5996 = llvm.select %5994, %5995, %5993 : !llvm.i1, !llvm.i64 + %5997 = llvm.sdiv %5996, %68 : !llvm.i64 + %5998 = llvm.sub %64, %5997 : !llvm.i64 + %5999 = llvm.select %5994, %5998, %5997 : !llvm.i1, !llvm.i64 + %6000 = llvm.mul %5999, %60 : !llvm.i64 + 
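+ // Each unrolled step in ^bb43 appears to follow the same pattern: load 8 contiguous floats from the 512-wide output row of %23, add the matching cached vec<8 x float> accumulator from %130 (indexed via the floor-division/modulo icmp/sub/select/sdiv sequences above), and stash the sum in the vector cache %110 (innermost stride 16).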
%6001 = llvm.add %5466, %6000 : !llvm.i64 + %6002 = llvm.add %6001, %52 : !llvm.i64 + %6003 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6004 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6005 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6006 = llvm.mul %6002, %6005 : !llvm.i64 + %6007 = llvm.add %6004, %6006 : !llvm.i64 + %6008 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6009 = llvm.mul %67, %6008 : !llvm.i64 + %6010 = llvm.add %6007, %6009 : !llvm.i64 + %6011 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6012 = llvm.mul %5484, %6011 : !llvm.i64 + %6013 = llvm.add %6010, %6012 : !llvm.i64 + %6014 = llvm.getelementptr %6003[%6013] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6015 = llvm.load %6014 : !llvm.ptr> + %6016 = llvm.fadd %5992, %6015 : !llvm.vec<8 x float> + %6017 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6018 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6019 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6020 = llvm.mul %67, %6019 : !llvm.i64 + %6021 = llvm.add %6018, %6020 : !llvm.i64 + %6022 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6023 = llvm.mul %33, %6022 : !llvm.i64 + %6024 = llvm.add %6021, %6023 : !llvm.i64 + %6025 = llvm.getelementptr %6017[%6024] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6016, %6025 : !llvm.ptr> + %6026 = llvm.add %5449, %53 : !llvm.i64 + %6027 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6028 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6029 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6030 = llvm.mul %2345, %6029 : !llvm.i64 + %6031 = llvm.add %6028, %6030 : !llvm.i64 + %6032 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6033 = llvm.mul %6026, %6032 : !llvm.i64 + %6034 = llvm.add %6031, %6033 : !llvm.i64 + %6035 = llvm.getelementptr %6027[%6034] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6036 = llvm.bitcast %6035 : !llvm.ptr to !llvm.ptr> + %6037 = llvm.load %6036 {alignment = 4 : i64} : !llvm.ptr> + %6038 = llvm.add %5447, %53 : !llvm.i64 + %6039 = llvm.icmp "slt" %6038, %67 : !llvm.i64 + %6040 = llvm.sub %64, %6038 : !llvm.i64 + %6041 = llvm.select %6039, %6040, %6038 : !llvm.i1, !llvm.i64 + %6042 = llvm.sdiv %6041, %68 : !llvm.i64 + %6043 = llvm.sub %64, %6042 : !llvm.i64 + %6044 = llvm.select %6039, %6043, %6042 : !llvm.i1, !llvm.i64 + %6045 = llvm.srem %6044, %68 : !llvm.i64 + %6046 = llvm.icmp "slt" %6045, %67 : !llvm.i64 + %6047 = llvm.add %6045, %68 : !llvm.i64 + %6048 = llvm.select %6046, %6047, %6045 : !llvm.i1, !llvm.i64 + %6049 = llvm.mul %6044, %65 : !llvm.i64 + %6050 = llvm.add %5533, %6049 : !llvm.i64 + %6051 = llvm.add %6050, %54 : !llvm.i64 + %6052 = llvm.icmp "slt" %6051, %67 : !llvm.i64 + %6053 = llvm.sub %64, %6051 : !llvm.i64 + %6054 = llvm.select %6052, %6053, %6051 : !llvm.i1, !llvm.i64 + %6055 = llvm.sdiv %6054, %63 : !llvm.i64 + %6056 = llvm.sub %64, %6055 : !llvm.i64 + %6057 = llvm.select %6052, %6056, %6055 : !llvm.i1, !llvm.i64 + %6058 = llvm.mul %6057, %65 : !llvm.i64 + %6059 = llvm.add %6050, %6058 : !llvm.i64 + %6060 = llvm.add %6059, %54 : !llvm.i64 + %6061 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6062 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6063 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6064 = llvm.mul %6048, %6063 : !llvm.i64 + %6065 = llvm.add %6062, %6064 : !llvm.i64 + %6066 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6067 = llvm.mul %67, %6066 : !llvm.i64 + 
%6068 = llvm.add %6065, %6067 : !llvm.i64 + %6069 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6070 = llvm.mul %6060, %6069 : !llvm.i64 + %6071 = llvm.add %6068, %6070 : !llvm.i64 + %6072 = llvm.getelementptr %6061[%6071] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6073 = llvm.load %6072 : !llvm.ptr> + %6074 = llvm.fadd %6037, %6073 : !llvm.vec<8 x float> + %6075 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6076 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6077 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6078 = llvm.mul %67, %6077 : !llvm.i64 + %6079 = llvm.add %6076, %6078 : !llvm.i64 + %6080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6081 = llvm.mul %54, %6080 : !llvm.i64 + %6082 = llvm.add %6079, %6081 : !llvm.i64 + %6083 = llvm.getelementptr %6075[%6082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6074, %6083 : !llvm.ptr> + %6084 = llvm.add %5449, %55 : !llvm.i64 + %6085 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6086 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6087 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6088 = llvm.mul %2345, %6087 : !llvm.i64 + %6089 = llvm.add %6086, %6088 : !llvm.i64 + %6090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6091 = llvm.mul %6084, %6090 : !llvm.i64 + %6092 = llvm.add %6089, %6091 : !llvm.i64 + %6093 = llvm.getelementptr %6085[%6092] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6094 = llvm.bitcast %6093 : !llvm.ptr to !llvm.ptr> + %6095 = llvm.load %6094 {alignment = 4 : i64} : !llvm.ptr> + %6096 = llvm.add %5466, %56 : !llvm.i64 + %6097 = llvm.icmp "slt" %6096, %67 : !llvm.i64 + %6098 = llvm.sub %64, %6096 : !llvm.i64 + %6099 = llvm.select %6097, %6098, %6096 : !llvm.i1, !llvm.i64 + %6100 = llvm.sdiv %6099, %68 : !llvm.i64 + %6101 = llvm.sub %64, %6100 : !llvm.i64 + %6102 = llvm.select %6097, %6101, %6100 : !llvm.i1, !llvm.i64 + %6103 = llvm.mul %6102, %60 : !llvm.i64 + %6104 = llvm.add %5466, %6103 : !llvm.i64 + %6105 = llvm.add %6104, %56 : !llvm.i64 + %6106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6109 = llvm.mul %6105, %6108 : !llvm.i64 + %6110 = llvm.add %6107, %6109 : !llvm.i64 + %6111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6112 = llvm.mul %67, %6111 : !llvm.i64 + %6113 = llvm.add %6110, %6112 : !llvm.i64 + %6114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6115 = llvm.mul %5484, %6114 : !llvm.i64 + %6116 = llvm.add %6113, %6115 : !llvm.i64 + %6117 = llvm.getelementptr %6106[%6116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6118 = llvm.load %6117 : !llvm.ptr> + %6119 = llvm.fadd %6095, %6118 : !llvm.vec<8 x float> + %6120 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6122 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6123 = llvm.mul %67, %6122 : !llvm.i64 + %6124 = llvm.add %6121, %6123 : !llvm.i64 + %6125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6126 = llvm.mul %34, %6125 : !llvm.i64 + %6127 = llvm.add %6124, %6126 : !llvm.i64 + %6128 = llvm.getelementptr %6120[%6127] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6119, %6128 : !llvm.ptr> + %6129 = llvm.add %5449, %57 : !llvm.i64 + %6130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6131 = llvm.mlir.constant(0 : index) : !llvm.i64 + 
%6132 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6133 = llvm.mul %2345, %6132 : !llvm.i64 + %6134 = llvm.add %6131, %6133 : !llvm.i64 + %6135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6136 = llvm.mul %6129, %6135 : !llvm.i64 + %6137 = llvm.add %6134, %6136 : !llvm.i64 + %6138 = llvm.getelementptr %6130[%6137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6139 = llvm.bitcast %6138 : !llvm.ptr to !llvm.ptr> + %6140 = llvm.load %6139 {alignment = 4 : i64} : !llvm.ptr> + %6141 = llvm.add %5447, %57 : !llvm.i64 + %6142 = llvm.icmp "slt" %6141, %67 : !llvm.i64 + %6143 = llvm.sub %64, %6141 : !llvm.i64 + %6144 = llvm.select %6142, %6143, %6141 : !llvm.i1, !llvm.i64 + %6145 = llvm.sdiv %6144, %68 : !llvm.i64 + %6146 = llvm.sub %64, %6145 : !llvm.i64 + %6147 = llvm.select %6142, %6146, %6145 : !llvm.i1, !llvm.i64 + %6148 = llvm.srem %6147, %68 : !llvm.i64 + %6149 = llvm.icmp "slt" %6148, %67 : !llvm.i64 + %6150 = llvm.add %6148, %68 : !llvm.i64 + %6151 = llvm.select %6149, %6150, %6148 : !llvm.i1, !llvm.i64 + %6152 = llvm.mul %6147, %65 : !llvm.i64 + %6153 = llvm.add %5533, %6152 : !llvm.i64 + %6154 = llvm.add %6153, %58 : !llvm.i64 + %6155 = llvm.icmp "slt" %6154, %67 : !llvm.i64 + %6156 = llvm.sub %64, %6154 : !llvm.i64 + %6157 = llvm.select %6155, %6156, %6154 : !llvm.i1, !llvm.i64 + %6158 = llvm.sdiv %6157, %63 : !llvm.i64 + %6159 = llvm.sub %64, %6158 : !llvm.i64 + %6160 = llvm.select %6155, %6159, %6158 : !llvm.i1, !llvm.i64 + %6161 = llvm.mul %6160, %65 : !llvm.i64 + %6162 = llvm.add %6153, %6161 : !llvm.i64 + %6163 = llvm.add %6162, %58 : !llvm.i64 + %6164 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6165 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6166 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6167 = llvm.mul %6151, %6166 : !llvm.i64 + %6168 = llvm.add %6165, %6167 : !llvm.i64 + %6169 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6170 = llvm.mul %67, %6169 : !llvm.i64 + %6171 = llvm.add %6168, %6170 : !llvm.i64 + %6172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6173 = llvm.mul %6163, %6172 : !llvm.i64 + %6174 = llvm.add %6171, %6173 : !llvm.i64 + %6175 = llvm.getelementptr %6164[%6174] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6176 = llvm.load %6175 : !llvm.ptr> + %6177 = llvm.fadd %6140, %6176 : !llvm.vec<8 x float> + %6178 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6180 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6181 = llvm.mul %67, %6180 : !llvm.i64 + %6182 = llvm.add %6179, %6181 : !llvm.i64 + %6183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6184 = llvm.mul %58, %6183 : !llvm.i64 + %6185 = llvm.add %6182, %6184 : !llvm.i64 + %6186 = llvm.getelementptr %6178[%6185] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6177, %6186 : !llvm.ptr> + %6187 = llvm.add %5449, %59 : !llvm.i64 + %6188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6191 = llvm.mul %2345, %6190 : !llvm.i64 + %6192 = llvm.add %6189, %6191 : !llvm.i64 + %6193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6194 = llvm.mul %6187, %6193 : !llvm.i64 + %6195 = llvm.add %6192, %6194 : !llvm.i64 + %6196 = llvm.getelementptr %6188[%6195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6197 = llvm.bitcast %6196 : !llvm.ptr to !llvm.ptr> + %6198 = llvm.load %6197 {alignment = 4 : i64} : !llvm.ptr> + 
%6199 = llvm.add %5466, %61 : !llvm.i64 + %6200 = llvm.icmp "slt" %6199, %67 : !llvm.i64 + %6201 = llvm.sub %64, %6199 : !llvm.i64 + %6202 = llvm.select %6200, %6201, %6199 : !llvm.i1, !llvm.i64 + %6203 = llvm.sdiv %6202, %68 : !llvm.i64 + %6204 = llvm.sub %64, %6203 : !llvm.i64 + %6205 = llvm.select %6200, %6204, %6203 : !llvm.i1, !llvm.i64 + %6206 = llvm.mul %6205, %60 : !llvm.i64 + %6207 = llvm.add %5466, %6206 : !llvm.i64 + %6208 = llvm.add %6207, %61 : !llvm.i64 + %6209 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6211 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6212 = llvm.mul %6208, %6211 : !llvm.i64 + %6213 = llvm.add %6210, %6212 : !llvm.i64 + %6214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6215 = llvm.mul %67, %6214 : !llvm.i64 + %6216 = llvm.add %6213, %6215 : !llvm.i64 + %6217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6218 = llvm.mul %5484, %6217 : !llvm.i64 + %6219 = llvm.add %6216, %6218 : !llvm.i64 + %6220 = llvm.getelementptr %6209[%6219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6221 = llvm.load %6220 : !llvm.ptr> + %6222 = llvm.fadd %6198, %6221 : !llvm.vec<8 x float> + %6223 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6224 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6225 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6226 = llvm.mul %67, %6225 : !llvm.i64 + %6227 = llvm.add %6224, %6226 : !llvm.i64 + %6228 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6229 = llvm.mul %35, %6228 : !llvm.i64 + %6230 = llvm.add %6227, %6229 : !llvm.i64 + %6231 = llvm.getelementptr %6223[%6230] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6222, %6231 : !llvm.ptr> + %6232 = llvm.add %5449, %62 : !llvm.i64 + %6233 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6234 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6235 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6236 = llvm.mul %2345, %6235 : !llvm.i64 + %6237 = llvm.add %6234, %6236 : !llvm.i64 + %6238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6239 = llvm.mul %6232, %6238 : !llvm.i64 + %6240 = llvm.add %6237, %6239 : !llvm.i64 + %6241 = llvm.getelementptr %6233[%6240] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6242 = llvm.bitcast %6241 : !llvm.ptr to !llvm.ptr> + %6243 = llvm.load %6242 {alignment = 4 : i64} : !llvm.ptr> + %6244 = llvm.add %5447, %62 : !llvm.i64 + %6245 = llvm.icmp "slt" %6244, %67 : !llvm.i64 + %6246 = llvm.sub %64, %6244 : !llvm.i64 + %6247 = llvm.select %6245, %6246, %6244 : !llvm.i1, !llvm.i64 + %6248 = llvm.sdiv %6247, %68 : !llvm.i64 + %6249 = llvm.sub %64, %6248 : !llvm.i64 + %6250 = llvm.select %6245, %6249, %6248 : !llvm.i1, !llvm.i64 + %6251 = llvm.srem %6250, %68 : !llvm.i64 + %6252 = llvm.icmp "slt" %6251, %67 : !llvm.i64 + %6253 = llvm.add %6251, %68 : !llvm.i64 + %6254 = llvm.select %6252, %6253, %6251 : !llvm.i1, !llvm.i64 + %6255 = llvm.mul %6250, %65 : !llvm.i64 + %6256 = llvm.add %5533, %6255 : !llvm.i64 + %6257 = llvm.add %6256, %66 : !llvm.i64 + %6258 = llvm.icmp "slt" %6257, %67 : !llvm.i64 + %6259 = llvm.sub %64, %6257 : !llvm.i64 + %6260 = llvm.select %6258, %6259, %6257 : !llvm.i1, !llvm.i64 + %6261 = llvm.sdiv %6260, %63 : !llvm.i64 + %6262 = llvm.sub %64, %6261 : !llvm.i64 + %6263 = llvm.select %6258, %6262, %6261 : !llvm.i1, !llvm.i64 + %6264 = llvm.mul %6263, %65 : !llvm.i64 + %6265 = llvm.add %6256, %6264 : !llvm.i64 + %6266 = llvm.add %6265, %66 : !llvm.i64 + 
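+ // The ^bb44/^bb45 loop below appears to copy the cached vec<8 x float> results from %110 back into the corresponding positions of the output row of %23, one 8-float store per iteration. ^bb47 looks like the boundary variant of the same update: it uses llvm.intr.masked.load with a lane-index-versus-512 comparison mask so lanes past the end of the row are not read (they take the zero passthrough instead).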
%6267 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6269 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6270 = llvm.mul %6254, %6269 : !llvm.i64 + %6271 = llvm.add %6268, %6270 : !llvm.i64 + %6272 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6273 = llvm.mul %67, %6272 : !llvm.i64 + %6274 = llvm.add %6271, %6273 : !llvm.i64 + %6275 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6276 = llvm.mul %6266, %6275 : !llvm.i64 + %6277 = llvm.add %6274, %6276 : !llvm.i64 + %6278 = llvm.getelementptr %6267[%6277] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6279 = llvm.load %6278 : !llvm.ptr> + %6280 = llvm.fadd %6243, %6279 : !llvm.vec<8 x float> + %6281 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6282 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6283 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6284 = llvm.mul %67, %6283 : !llvm.i64 + %6285 = llvm.add %6282, %6284 : !llvm.i64 + %6286 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6287 = llvm.mul %66, %6286 : !llvm.i64 + %6288 = llvm.add %6285, %6287 : !llvm.i64 + %6289 = llvm.getelementptr %6281[%6288] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6280, %6289 : !llvm.ptr> + llvm.br ^bb44(%67 : !llvm.i64) + ^bb44(%6290: !llvm.i64): // 2 preds: ^bb43, ^bb45 + %6291 = llvm.icmp "slt" %6290, %68 : !llvm.i64 + llvm.cond_br %6291, ^bb45, ^bb46 + ^bb45: // pred: ^bb44 + %6292 = llvm.mul %6290, %70 : !llvm.i64 + %6293 = llvm.add %5449, %6292 : !llvm.i64 + %6294 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6295 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6296 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6297 = llvm.mul %67, %6296 : !llvm.i64 + %6298 = llvm.add %6295, %6297 : !llvm.i64 + %6299 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6300 = llvm.mul %6290, %6299 : !llvm.i64 + %6301 = llvm.add %6298, %6300 : !llvm.i64 + %6302 = llvm.getelementptr %6294[%6301] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6303 = llvm.load %6302 : !llvm.ptr> + %6304 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6305 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6306 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6307 = llvm.mul %2345, %6306 : !llvm.i64 + %6308 = llvm.add %6305, %6307 : !llvm.i64 + %6309 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6310 = llvm.mul %6293, %6309 : !llvm.i64 + %6311 = llvm.add %6308, %6310 : !llvm.i64 + %6312 = llvm.getelementptr %6304[%6311] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6313 = llvm.bitcast %6312 : !llvm.ptr to !llvm.ptr> + llvm.store %6303, %6313 {alignment = 4 : i64} : !llvm.ptr> + %6314 = llvm.add %6290, %69 : !llvm.i64 + llvm.br ^bb44(%6314 : !llvm.i64) + ^bb46: // 2 preds: ^bb44, ^bb48 + llvm.br ^bb50 + ^bb47: // pred: ^bb42 + %6315 = llvm.add %151, %5447 : !llvm.i64 + %6316 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6318 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6319 = llvm.mul %2345, %6318 : !llvm.i64 + %6320 = llvm.add %6317, %6319 : !llvm.i64 + %6321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6322 = llvm.mul %6315, %6321 : !llvm.i64 + %6323 = llvm.add %6320, %6322 : !llvm.i64 + %6324 = llvm.getelementptr %6316[%6323] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6325 = llvm.bitcast %6324 : !llvm.ptr to !llvm.ptr> + %6326 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %6327 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6328 = llvm.trunc %6315 : !llvm.i64 to !llvm.i32 + %6329 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6330 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6331 = llvm.insertelement %6328, %6329[%6330 : !llvm.i32] : !llvm.vec<8 x i32> + %6332 = llvm.shufflevector %6331, %6329 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6333 = llvm.add %6332, %6327 : !llvm.vec<8 x i32> + %6334 = llvm.trunc %6326 : !llvm.i64 to !llvm.i32 + %6335 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6336 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6337 = llvm.insertelement %6334, %6335[%6336 : !llvm.i32] : !llvm.vec<8 x i32> + %6338 = llvm.shufflevector %6337, %6335 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6339 = llvm.icmp "slt" %6333, %6338 : !llvm.vec<8 x i32> + %6340 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6341 = llvm.intr.masked.load %6325, %6339, %6340 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6342 = llvm.icmp "slt" %5447, %67 : !llvm.i64 + %6343 = llvm.sub %64, %5447 : !llvm.i64 + %6344 = llvm.select %6342, %6343, %5447 : !llvm.i1, !llvm.i64 + %6345 = llvm.sdiv %6344, %68 : !llvm.i64 + %6346 = llvm.sub %64, %6345 : !llvm.i64 + %6347 = llvm.select %6342, %6346, %6345 : !llvm.i1, !llvm.i64 + %6348 = llvm.srem %6347, %68 : !llvm.i64 + %6349 = llvm.icmp "slt" %6348, %67 : !llvm.i64 + %6350 = llvm.add %6348, %68 : !llvm.i64 + %6351 = llvm.select %6349, %6350, %6348 : !llvm.i1, !llvm.i64 + %6352 = llvm.srem %5447, %68 : !llvm.i64 + %6353 = llvm.icmp "slt" %6352, %67 : !llvm.i64 + %6354 = llvm.add %6352, %68 : !llvm.i64 + %6355 = llvm.select %6353, %6354, %6352 : !llvm.i1, !llvm.i64 + %6356 = llvm.icmp "slt" %6355, %67 : !llvm.i64 + %6357 = llvm.sub %64, %6355 : !llvm.i64 + %6358 = llvm.select %6356, %6357, %6355 : !llvm.i1, !llvm.i64 + %6359 = llvm.sdiv %6358, %70 : !llvm.i64 + %6360 = llvm.sub %64, %6359 : !llvm.i64 + %6361 = llvm.select %6356, %6360, %6359 : !llvm.i1, !llvm.i64 + %6362 = llvm.srem %6361, %63 : !llvm.i64 + %6363 = llvm.icmp "slt" %6362, %67 : !llvm.i64 + %6364 = llvm.add %6362, %63 : !llvm.i64 + %6365 = llvm.select %6363, %6364, %6362 : !llvm.i1, !llvm.i64 + %6366 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6367 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6368 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6369 = llvm.mul %6351, %6368 : !llvm.i64 + %6370 = llvm.add %6367, %6369 : !llvm.i64 + %6371 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6372 = llvm.mul %67, %6371 : !llvm.i64 + %6373 = llvm.add %6370, %6372 : !llvm.i64 + %6374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6375 = llvm.mul %6365, %6374 : !llvm.i64 + %6376 = llvm.add %6373, %6375 : !llvm.i64 + %6377 = llvm.getelementptr %6366[%6376] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6378 = llvm.load %6377 : !llvm.ptr> + %6379 = llvm.fadd %6341, %6378 : !llvm.vec<8 x float> + %6380 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6382 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6383 = llvm.mul %67, %6382 : !llvm.i64 + %6384 = llvm.add %6381, %6383 : !llvm.i64 + %6385 = llvm.mlir.constant(1 : 
index) : !llvm.i64 + %6386 = llvm.mul %67, %6385 : !llvm.i64 + %6387 = llvm.add %6384, %6386 : !llvm.i64 + %6388 = llvm.getelementptr %6380[%6387] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6379, %6388 : !llvm.ptr> + %6389 = llvm.add %6315, %70 : !llvm.i64 + %6390 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6392 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6393 = llvm.mul %2345, %6392 : !llvm.i64 + %6394 = llvm.add %6391, %6393 : !llvm.i64 + %6395 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6396 = llvm.mul %6389, %6395 : !llvm.i64 + %6397 = llvm.add %6394, %6396 : !llvm.i64 + %6398 = llvm.getelementptr %6390[%6397] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6399 = llvm.bitcast %6398 : !llvm.ptr to !llvm.ptr> + %6400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6401 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6402 = llvm.trunc %6389 : !llvm.i64 to !llvm.i32 + %6403 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6404 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6405 = llvm.insertelement %6402, %6403[%6404 : !llvm.i32] : !llvm.vec<8 x i32> + %6406 = llvm.shufflevector %6405, %6403 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6407 = llvm.add %6406, %6401 : !llvm.vec<8 x i32> + %6408 = llvm.trunc %6400 : !llvm.i64 to !llvm.i32 + %6409 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6410 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6411 = llvm.insertelement %6408, %6409[%6410 : !llvm.i32] : !llvm.vec<8 x i32> + %6412 = llvm.shufflevector %6411, %6409 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6413 = llvm.icmp "slt" %6407, %6412 : !llvm.vec<8 x i32> + %6414 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6415 = llvm.intr.masked.load %6399, %6413, %6414 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6416 = llvm.add %5447, %70 : !llvm.i64 + %6417 = llvm.icmp "slt" %6416, %67 : !llvm.i64 + %6418 = llvm.sub %64, %6416 : !llvm.i64 + %6419 = llvm.select %6417, %6418, %6416 : !llvm.i1, !llvm.i64 + %6420 = llvm.sdiv %6419, %68 : !llvm.i64 + %6421 = llvm.sub %64, %6420 : !llvm.i64 + %6422 = llvm.select %6417, %6421, %6420 : !llvm.i1, !llvm.i64 + %6423 = llvm.srem %6422, %68 : !llvm.i64 + %6424 = llvm.icmp "slt" %6423, %67 : !llvm.i64 + %6425 = llvm.add %6423, %68 : !llvm.i64 + %6426 = llvm.select %6424, %6425, %6423 : !llvm.i1, !llvm.i64 + %6427 = llvm.sdiv %6344, %70 : !llvm.i64 + %6428 = llvm.sub %64, %6427 : !llvm.i64 + %6429 = llvm.select %6342, %6428, %6427 : !llvm.i1, !llvm.i64 + %6430 = llvm.mul %6422, %65 : !llvm.i64 + %6431 = llvm.add %6429, %6430 : !llvm.i64 + %6432 = llvm.add %6431, %69 : !llvm.i64 + %6433 = llvm.icmp "slt" %6432, %67 : !llvm.i64 + %6434 = llvm.sub %64, %6432 : !llvm.i64 + %6435 = llvm.select %6433, %6434, %6432 : !llvm.i1, !llvm.i64 + %6436 = llvm.sdiv %6435, %63 : !llvm.i64 + %6437 = llvm.sub %64, %6436 : !llvm.i64 + %6438 = llvm.select %6433, %6437, %6436 : !llvm.i1, !llvm.i64 + %6439 = llvm.mul %6438, %65 : !llvm.i64 + %6440 = llvm.add %6431, %6439 : !llvm.i64 + %6441 = llvm.add %6440, %69 : !llvm.i64 + %6442 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6444 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %6445 = llvm.mul %6426, %6444 : !llvm.i64 + %6446 = llvm.add %6443, %6445 : !llvm.i64 + %6447 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6448 = llvm.mul %67, %6447 : !llvm.i64 + %6449 = llvm.add %6446, %6448 : !llvm.i64 + %6450 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6451 = llvm.mul %6441, %6450 : !llvm.i64 + %6452 = llvm.add %6449, %6451 : !llvm.i64 + %6453 = llvm.getelementptr %6442[%6452] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6454 = llvm.load %6453 : !llvm.ptr> + %6455 = llvm.fadd %6415, %6454 : !llvm.vec<8 x float> + %6456 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6457 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6458 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6459 = llvm.mul %67, %6458 : !llvm.i64 + %6460 = llvm.add %6457, %6459 : !llvm.i64 + %6461 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6462 = llvm.mul %69, %6461 : !llvm.i64 + %6463 = llvm.add %6460, %6462 : !llvm.i64 + %6464 = llvm.getelementptr %6456[%6463] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6455, %6464 : !llvm.ptr> + %6465 = llvm.add %6315, %68 : !llvm.i64 + %6466 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6467 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6468 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6469 = llvm.mul %2345, %6468 : !llvm.i64 + %6470 = llvm.add %6467, %6469 : !llvm.i64 + %6471 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6472 = llvm.mul %6465, %6471 : !llvm.i64 + %6473 = llvm.add %6470, %6472 : !llvm.i64 + %6474 = llvm.getelementptr %6466[%6473] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6475 = llvm.bitcast %6474 : !llvm.ptr to !llvm.ptr> + %6476 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6477 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6478 = llvm.trunc %6465 : !llvm.i64 to !llvm.i32 + %6479 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6480 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6481 = llvm.insertelement %6478, %6479[%6480 : !llvm.i32] : !llvm.vec<8 x i32> + %6482 = llvm.shufflevector %6481, %6479 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6483 = llvm.add %6482, %6477 : !llvm.vec<8 x i32> + %6484 = llvm.trunc %6476 : !llvm.i64 to !llvm.i32 + %6485 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6486 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6487 = llvm.insertelement %6484, %6485[%6486 : !llvm.i32] : !llvm.vec<8 x i32> + %6488 = llvm.shufflevector %6487, %6485 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6489 = llvm.icmp "slt" %6483, %6488 : !llvm.vec<8 x i32> + %6490 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6491 = llvm.intr.masked.load %6475, %6489, %6490 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6492 = llvm.add %6347, %69 : !llvm.i64 + %6493 = llvm.icmp "slt" %6492, %67 : !llvm.i64 + %6494 = llvm.sub %64, %6492 : !llvm.i64 + %6495 = llvm.select %6493, %6494, %6492 : !llvm.i1, !llvm.i64 + %6496 = llvm.sdiv %6495, %68 : !llvm.i64 + %6497 = llvm.sub %64, %6496 : !llvm.i64 + %6498 = llvm.select %6493, %6497, %6496 : !llvm.i1, !llvm.i64 + %6499 = llvm.mul %6498, %60 : !llvm.i64 + %6500 = llvm.add %6347, %6499 : !llvm.i64 + %6501 = llvm.add %6500, %69 : !llvm.i64 + %6502 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6503 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6504 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6505 = llvm.mul %6501, %6504 : !llvm.i64 + %6506 = llvm.add %6503, %6505 : !llvm.i64 + %6507 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6508 = llvm.mul %67, %6507 : !llvm.i64 + %6509 = llvm.add %6506, %6508 : !llvm.i64 + %6510 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6511 = llvm.mul %6365, %6510 : !llvm.i64 + %6512 = llvm.add %6509, %6511 : !llvm.i64 + %6513 = llvm.getelementptr %6502[%6512] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6514 = llvm.load %6513 : !llvm.ptr> + %6515 = llvm.fadd %6491, %6514 : !llvm.vec<8 x float> + %6516 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6518 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6519 = llvm.mul %67, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6522 = llvm.mul %63, %6521 : !llvm.i64 + %6523 = llvm.add %6520, %6522 : !llvm.i64 + %6524 = llvm.getelementptr %6516[%6523] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6515, %6524 : !llvm.ptr> + %6525 = llvm.add %6315, %41 : !llvm.i64 + %6526 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6529 = llvm.mul %2345, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6532 = llvm.mul %6525, %6531 : !llvm.i64 + %6533 = llvm.add %6530, %6532 : !llvm.i64 + %6534 = llvm.getelementptr %6526[%6533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6535 = llvm.bitcast %6534 : !llvm.ptr to !llvm.ptr> + %6536 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6537 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6538 = llvm.trunc %6525 : !llvm.i64 to !llvm.i32 + %6539 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6540 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6541 = llvm.insertelement %6538, %6539[%6540 : !llvm.i32] : !llvm.vec<8 x i32> + %6542 = llvm.shufflevector %6541, %6539 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6543 = llvm.add %6542, %6537 : !llvm.vec<8 x i32> + %6544 = llvm.trunc %6536 : !llvm.i64 to !llvm.i32 + %6545 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6546 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6547 = llvm.insertelement %6544, %6545[%6546 : !llvm.i32] : !llvm.vec<8 x i32> + %6548 = llvm.shufflevector %6547, %6545 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6549 = llvm.icmp "slt" %6543, %6548 : !llvm.vec<8 x i32> + %6550 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6551 = llvm.intr.masked.load %6535, %6549, %6550 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6552 = llvm.add %5447, %41 : !llvm.i64 + %6553 = llvm.icmp "slt" %6552, %67 : !llvm.i64 + %6554 = llvm.sub %64, %6552 : !llvm.i64 + %6555 = llvm.select %6553, %6554, %6552 : !llvm.i1, !llvm.i64 + %6556 = llvm.sdiv %6555, %68 : !llvm.i64 + %6557 = llvm.sub %64, %6556 : !llvm.i64 + %6558 = llvm.select %6553, %6557, %6556 : !llvm.i1, !llvm.i64 + %6559 = llvm.srem %6558, %68 : !llvm.i64 + %6560 
= llvm.icmp "slt" %6559, %67 : !llvm.i64 + %6561 = llvm.add %6559, %68 : !llvm.i64 + %6562 = llvm.select %6560, %6561, %6559 : !llvm.i1, !llvm.i64 + %6563 = llvm.mul %6558, %65 : !llvm.i64 + %6564 = llvm.add %6429, %6563 : !llvm.i64 + %6565 = llvm.add %6564, %45 : !llvm.i64 + %6566 = llvm.icmp "slt" %6565, %67 : !llvm.i64 + %6567 = llvm.sub %64, %6565 : !llvm.i64 + %6568 = llvm.select %6566, %6567, %6565 : !llvm.i1, !llvm.i64 + %6569 = llvm.sdiv %6568, %63 : !llvm.i64 + %6570 = llvm.sub %64, %6569 : !llvm.i64 + %6571 = llvm.select %6566, %6570, %6569 : !llvm.i1, !llvm.i64 + %6572 = llvm.mul %6571, %65 : !llvm.i64 + %6573 = llvm.add %6564, %6572 : !llvm.i64 + %6574 = llvm.add %6573, %45 : !llvm.i64 + %6575 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6576 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6577 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6578 = llvm.mul %6562, %6577 : !llvm.i64 + %6579 = llvm.add %6576, %6578 : !llvm.i64 + %6580 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6581 = llvm.mul %67, %6580 : !llvm.i64 + %6582 = llvm.add %6579, %6581 : !llvm.i64 + %6583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6584 = llvm.mul %6574, %6583 : !llvm.i64 + %6585 = llvm.add %6582, %6584 : !llvm.i64 + %6586 = llvm.getelementptr %6575[%6585] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6587 = llvm.load %6586 : !llvm.ptr> + %6588 = llvm.fadd %6551, %6587 : !llvm.vec<8 x float> + %6589 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6591 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6592 = llvm.mul %67, %6591 : !llvm.i64 + %6593 = llvm.add %6590, %6592 : !llvm.i64 + %6594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6595 = llvm.mul %45, %6594 : !llvm.i64 + %6596 = llvm.add %6593, %6595 : !llvm.i64 + %6597 = llvm.getelementptr %6589[%6596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6588, %6597 : !llvm.ptr> + %6598 = llvm.add %6315, %42 : !llvm.i64 + %6599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6602 = llvm.mul %2345, %6601 : !llvm.i64 + %6603 = llvm.add %6600, %6602 : !llvm.i64 + %6604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6605 = llvm.mul %6598, %6604 : !llvm.i64 + %6606 = llvm.add %6603, %6605 : !llvm.i64 + %6607 = llvm.getelementptr %6599[%6606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6608 = llvm.bitcast %6607 : !llvm.ptr to !llvm.ptr> + %6609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6610 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6611 = llvm.trunc %6598 : !llvm.i64 to !llvm.i32 + %6612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6614 = llvm.insertelement %6611, %6612[%6613 : !llvm.i32] : !llvm.vec<8 x i32> + %6615 = llvm.shufflevector %6614, %6612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6616 = llvm.add %6615, %6610 : !llvm.vec<8 x i32> + %6617 = llvm.trunc %6609 : !llvm.i64 to !llvm.i32 + %6618 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6620 = llvm.insertelement %6617, %6618[%6619 : !llvm.i32] : !llvm.vec<8 x i32> + %6621 = llvm.shufflevector %6620, %6618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : 
i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6622 = llvm.icmp "slt" %6616, %6621 : !llvm.vec<8 x i32> + %6623 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6624 = llvm.intr.masked.load %6608, %6622, %6623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6625 = llvm.add %6347, %63 : !llvm.i64 + %6626 = llvm.icmp "slt" %6625, %67 : !llvm.i64 + %6627 = llvm.sub %64, %6625 : !llvm.i64 + %6628 = llvm.select %6626, %6627, %6625 : !llvm.i1, !llvm.i64 + %6629 = llvm.sdiv %6628, %68 : !llvm.i64 + %6630 = llvm.sub %64, %6629 : !llvm.i64 + %6631 = llvm.select %6626, %6630, %6629 : !llvm.i1, !llvm.i64 + %6632 = llvm.mul %6631, %60 : !llvm.i64 + %6633 = llvm.add %6347, %6632 : !llvm.i64 + %6634 = llvm.add %6633, %63 : !llvm.i64 + %6635 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6638 = llvm.mul %6634, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6641 = llvm.mul %67, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6644 = llvm.mul %6365, %6643 : !llvm.i64 + %6645 = llvm.add %6642, %6644 : !llvm.i64 + %6646 = llvm.getelementptr %6635[%6645] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6647 = llvm.load %6646 : !llvm.ptr> + %6648 = llvm.fadd %6624, %6647 : !llvm.vec<8 x float> + %6649 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6651 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6652 = llvm.mul %67, %6651 : !llvm.i64 + %6653 = llvm.add %6650, %6652 : !llvm.i64 + %6654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6655 = llvm.mul %48, %6654 : !llvm.i64 + %6656 = llvm.add %6653, %6655 : !llvm.i64 + %6657 = llvm.getelementptr %6649[%6656] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6648, %6657 : !llvm.ptr> + %6658 = llvm.add %6315, %43 : !llvm.i64 + %6659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6662 = llvm.mul %2345, %6661 : !llvm.i64 + %6663 = llvm.add %6660, %6662 : !llvm.i64 + %6664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6665 = llvm.mul %6658, %6664 : !llvm.i64 + %6666 = llvm.add %6663, %6665 : !llvm.i64 + %6667 = llvm.getelementptr %6659[%6666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6668 = llvm.bitcast %6667 : !llvm.ptr to !llvm.ptr> + %6669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6670 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6671 = llvm.trunc %6658 : !llvm.i64 to !llvm.i32 + %6672 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6673 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6674 = llvm.insertelement %6671, %6672[%6673 : !llvm.i32] : !llvm.vec<8 x i32> + %6675 = llvm.shufflevector %6674, %6672 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6676 = llvm.add %6675, %6670 : !llvm.vec<8 x i32> + %6677 = llvm.trunc %6669 : !llvm.i64 to !llvm.i32 + %6678 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6679 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6680 = llvm.insertelement %6677, %6678[%6679 : !llvm.i32] : !llvm.vec<8 x 
i32> + %6681 = llvm.shufflevector %6680, %6678 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6682 = llvm.icmp "slt" %6676, %6681 : !llvm.vec<8 x i32> + %6683 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6684 = llvm.intr.masked.load %6668, %6682, %6683 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6685 = llvm.add %5447, %43 : !llvm.i64 + %6686 = llvm.icmp "slt" %6685, %67 : !llvm.i64 + %6687 = llvm.sub %64, %6685 : !llvm.i64 + %6688 = llvm.select %6686, %6687, %6685 : !llvm.i1, !llvm.i64 + %6689 = llvm.sdiv %6688, %68 : !llvm.i64 + %6690 = llvm.sub %64, %6689 : !llvm.i64 + %6691 = llvm.select %6686, %6690, %6689 : !llvm.i1, !llvm.i64 + %6692 = llvm.srem %6691, %68 : !llvm.i64 + %6693 = llvm.icmp "slt" %6692, %67 : !llvm.i64 + %6694 = llvm.add %6692, %68 : !llvm.i64 + %6695 = llvm.select %6693, %6694, %6692 : !llvm.i1, !llvm.i64 + %6696 = llvm.mul %6691, %65 : !llvm.i64 + %6697 = llvm.add %6429, %6696 : !llvm.i64 + %6698 = llvm.add %6697, %52 : !llvm.i64 + %6699 = llvm.icmp "slt" %6698, %67 : !llvm.i64 + %6700 = llvm.sub %64, %6698 : !llvm.i64 + %6701 = llvm.select %6699, %6700, %6698 : !llvm.i1, !llvm.i64 + %6702 = llvm.sdiv %6701, %63 : !llvm.i64 + %6703 = llvm.sub %64, %6702 : !llvm.i64 + %6704 = llvm.select %6699, %6703, %6702 : !llvm.i1, !llvm.i64 + %6705 = llvm.mul %6704, %65 : !llvm.i64 + %6706 = llvm.add %6697, %6705 : !llvm.i64 + %6707 = llvm.add %6706, %52 : !llvm.i64 + %6708 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6710 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6711 = llvm.mul %6695, %6710 : !llvm.i64 + %6712 = llvm.add %6709, %6711 : !llvm.i64 + %6713 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6714 = llvm.mul %67, %6713 : !llvm.i64 + %6715 = llvm.add %6712, %6714 : !llvm.i64 + %6716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6717 = llvm.mul %6707, %6716 : !llvm.i64 + %6718 = llvm.add %6715, %6717 : !llvm.i64 + %6719 = llvm.getelementptr %6708[%6718] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6720 = llvm.load %6719 : !llvm.ptr> + %6721 = llvm.fadd %6684, %6720 : !llvm.vec<8 x float> + %6722 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6724 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6725 = llvm.mul %67, %6724 : !llvm.i64 + %6726 = llvm.add %6723, %6725 : !llvm.i64 + %6727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6728 = llvm.mul %52, %6727 : !llvm.i64 + %6729 = llvm.add %6726, %6728 : !llvm.i64 + %6730 = llvm.getelementptr %6722[%6729] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6721, %6730 : !llvm.ptr> + %6731 = llvm.add %6315, %44 : !llvm.i64 + %6732 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6734 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6735 = llvm.mul %2345, %6734 : !llvm.i64 + %6736 = llvm.add %6733, %6735 : !llvm.i64 + %6737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6738 = llvm.mul %6731, %6737 : !llvm.i64 + %6739 = llvm.add %6736, %6738 : !llvm.i64 + %6740 = llvm.getelementptr %6732[%6739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6741 = llvm.bitcast %6740 : !llvm.ptr to !llvm.ptr> + %6742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6743 = 
llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6744 = llvm.trunc %6731 : !llvm.i64 to !llvm.i32 + %6745 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6746 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6747 = llvm.insertelement %6744, %6745[%6746 : !llvm.i32] : !llvm.vec<8 x i32> + %6748 = llvm.shufflevector %6747, %6745 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6749 = llvm.add %6748, %6743 : !llvm.vec<8 x i32> + %6750 = llvm.trunc %6742 : !llvm.i64 to !llvm.i32 + %6751 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6752 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6753 = llvm.insertelement %6750, %6751[%6752 : !llvm.i32] : !llvm.vec<8 x i32> + %6754 = llvm.shufflevector %6753, %6751 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6755 = llvm.icmp "slt" %6749, %6754 : !llvm.vec<8 x i32> + %6756 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6757 = llvm.intr.masked.load %6741, %6755, %6756 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6758 = llvm.add %6347, %45 : !llvm.i64 + %6759 = llvm.icmp "slt" %6758, %67 : !llvm.i64 + %6760 = llvm.sub %64, %6758 : !llvm.i64 + %6761 = llvm.select %6759, %6760, %6758 : !llvm.i1, !llvm.i64 + %6762 = llvm.sdiv %6761, %68 : !llvm.i64 + %6763 = llvm.sub %64, %6762 : !llvm.i64 + %6764 = llvm.select %6759, %6763, %6762 : !llvm.i1, !llvm.i64 + %6765 = llvm.mul %6764, %60 : !llvm.i64 + %6766 = llvm.add %6347, %6765 : !llvm.i64 + %6767 = llvm.add %6766, %45 : !llvm.i64 + %6768 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6770 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6771 = llvm.mul %6767, %6770 : !llvm.i64 + %6772 = llvm.add %6769, %6771 : !llvm.i64 + %6773 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6774 = llvm.mul %67, %6773 : !llvm.i64 + %6775 = llvm.add %6772, %6774 : !llvm.i64 + %6776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6777 = llvm.mul %6365, %6776 : !llvm.i64 + %6778 = llvm.add %6775, %6777 : !llvm.i64 + %6779 = llvm.getelementptr %6768[%6778] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6780 = llvm.load %6779 : !llvm.ptr> + %6781 = llvm.fadd %6757, %6780 : !llvm.vec<8 x float> + %6782 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6784 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6785 = llvm.mul %67, %6784 : !llvm.i64 + %6786 = llvm.add %6783, %6785 : !llvm.i64 + %6787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6788 = llvm.mul %56, %6787 : !llvm.i64 + %6789 = llvm.add %6786, %6788 : !llvm.i64 + %6790 = llvm.getelementptr %6782[%6789] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6781, %6790 : !llvm.ptr> + %6791 = llvm.add %6315, %46 : !llvm.i64 + %6792 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6794 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6795 = llvm.mul %2345, %6794 : !llvm.i64 + %6796 = llvm.add %6793, %6795 : !llvm.i64 + %6797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6798 = llvm.mul %6791, %6797 : !llvm.i64 + %6799 = llvm.add %6796, %6798 : !llvm.i64 + %6800 = llvm.getelementptr %6792[%6799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + 
%6801 = llvm.bitcast %6800 : !llvm.ptr to !llvm.ptr> + %6802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6803 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6804 = llvm.trunc %6791 : !llvm.i64 to !llvm.i32 + %6805 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6806 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6807 = llvm.insertelement %6804, %6805[%6806 : !llvm.i32] : !llvm.vec<8 x i32> + %6808 = llvm.shufflevector %6807, %6805 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6809 = llvm.add %6808, %6803 : !llvm.vec<8 x i32> + %6810 = llvm.trunc %6802 : !llvm.i64 to !llvm.i32 + %6811 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6812 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6813 = llvm.insertelement %6810, %6811[%6812 : !llvm.i32] : !llvm.vec<8 x i32> + %6814 = llvm.shufflevector %6813, %6811 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6815 = llvm.icmp "slt" %6809, %6814 : !llvm.vec<8 x i32> + %6816 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6817 = llvm.intr.masked.load %6801, %6815, %6816 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6818 = llvm.add %5447, %46 : !llvm.i64 + %6819 = llvm.icmp "slt" %6818, %67 : !llvm.i64 + %6820 = llvm.sub %64, %6818 : !llvm.i64 + %6821 = llvm.select %6819, %6820, %6818 : !llvm.i1, !llvm.i64 + %6822 = llvm.sdiv %6821, %68 : !llvm.i64 + %6823 = llvm.sub %64, %6822 : !llvm.i64 + %6824 = llvm.select %6819, %6823, %6822 : !llvm.i1, !llvm.i64 + %6825 = llvm.srem %6824, %68 : !llvm.i64 + %6826 = llvm.icmp "slt" %6825, %67 : !llvm.i64 + %6827 = llvm.add %6825, %68 : !llvm.i64 + %6828 = llvm.select %6826, %6827, %6825 : !llvm.i1, !llvm.i64 + %6829 = llvm.mul %6824, %65 : !llvm.i64 + %6830 = llvm.add %6429, %6829 : !llvm.i64 + %6831 = llvm.add %6830, %61 : !llvm.i64 + %6832 = llvm.icmp "slt" %6831, %67 : !llvm.i64 + %6833 = llvm.sub %64, %6831 : !llvm.i64 + %6834 = llvm.select %6832, %6833, %6831 : !llvm.i1, !llvm.i64 + %6835 = llvm.sdiv %6834, %63 : !llvm.i64 + %6836 = llvm.sub %64, %6835 : !llvm.i64 + %6837 = llvm.select %6832, %6836, %6835 : !llvm.i1, !llvm.i64 + %6838 = llvm.mul %6837, %65 : !llvm.i64 + %6839 = llvm.add %6830, %6838 : !llvm.i64 + %6840 = llvm.add %6839, %61 : !llvm.i64 + %6841 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6842 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6843 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6844 = llvm.mul %6828, %6843 : !llvm.i64 + %6845 = llvm.add %6842, %6844 : !llvm.i64 + %6846 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6847 = llvm.mul %67, %6846 : !llvm.i64 + %6848 = llvm.add %6845, %6847 : !llvm.i64 + %6849 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6850 = llvm.mul %6840, %6849 : !llvm.i64 + %6851 = llvm.add %6848, %6850 : !llvm.i64 + %6852 = llvm.getelementptr %6841[%6851] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6853 = llvm.load %6852 : !llvm.ptr> + %6854 = llvm.fadd %6817, %6853 : !llvm.vec<8 x float> + %6855 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6857 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6858 = llvm.mul %67, %6857 : !llvm.i64 + %6859 = llvm.add %6856, %6858 : !llvm.i64 + %6860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6861 = llvm.mul 
%61, %6860 : !llvm.i64 + %6862 = llvm.add %6859, %6861 : !llvm.i64 + %6863 = llvm.getelementptr %6855[%6862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6854, %6863 : !llvm.ptr> + %6864 = llvm.add %6315, %47 : !llvm.i64 + %6865 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6866 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6867 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6868 = llvm.mul %2345, %6867 : !llvm.i64 + %6869 = llvm.add %6866, %6868 : !llvm.i64 + %6870 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6871 = llvm.mul %6864, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.getelementptr %6865[%6872] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6874 = llvm.bitcast %6873 : !llvm.ptr to !llvm.ptr> + %6875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6876 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6877 = llvm.trunc %6864 : !llvm.i64 to !llvm.i32 + %6878 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6879 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6880 = llvm.insertelement %6877, %6878[%6879 : !llvm.i32] : !llvm.vec<8 x i32> + %6881 = llvm.shufflevector %6880, %6878 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6882 = llvm.add %6881, %6876 : !llvm.vec<8 x i32> + %6883 = llvm.trunc %6875 : !llvm.i64 to !llvm.i32 + %6884 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6885 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6886 = llvm.insertelement %6883, %6884[%6885 : !llvm.i32] : !llvm.vec<8 x i32> + %6887 = llvm.shufflevector %6886, %6884 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6888 = llvm.icmp "slt" %6882, %6887 : !llvm.vec<8 x i32> + %6889 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6890 = llvm.intr.masked.load %6874, %6888, %6889 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6891 = llvm.add %6347, %48 : !llvm.i64 + %6892 = llvm.icmp "slt" %6891, %67 : !llvm.i64 + %6893 = llvm.sub %64, %6891 : !llvm.i64 + %6894 = llvm.select %6892, %6893, %6891 : !llvm.i1, !llvm.i64 + %6895 = llvm.sdiv %6894, %68 : !llvm.i64 + %6896 = llvm.sub %64, %6895 : !llvm.i64 + %6897 = llvm.select %6892, %6896, %6895 : !llvm.i1, !llvm.i64 + %6898 = llvm.mul %6897, %60 : !llvm.i64 + %6899 = llvm.add %6347, %6898 : !llvm.i64 + %6900 = llvm.add %6899, %48 : !llvm.i64 + %6901 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6903 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6904 = llvm.mul %6900, %6903 : !llvm.i64 + %6905 = llvm.add %6902, %6904 : !llvm.i64 + %6906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6907 = llvm.mul %67, %6906 : !llvm.i64 + %6908 = llvm.add %6905, %6907 : !llvm.i64 + %6909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6910 = llvm.mul %6365, %6909 : !llvm.i64 + %6911 = llvm.add %6908, %6910 : !llvm.i64 + %6912 = llvm.getelementptr %6901[%6911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6913 = llvm.load %6912 : !llvm.ptr> + %6914 = llvm.fadd %6890, %6913 : !llvm.vec<8 x float> + %6915 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6916 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6917 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6918 = llvm.mul %67, %6917 : !llvm.i64 
+ %6919 = llvm.add %6916, %6918 : !llvm.i64 + %6920 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6921 = llvm.mul %70, %6920 : !llvm.i64 + %6922 = llvm.add %6919, %6921 : !llvm.i64 + %6923 = llvm.getelementptr %6915[%6922] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6914, %6923 : !llvm.ptr> + %6924 = llvm.add %6315, %49 : !llvm.i64 + %6925 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6926 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6927 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6928 = llvm.mul %2345, %6927 : !llvm.i64 + %6929 = llvm.add %6926, %6928 : !llvm.i64 + %6930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6931 = llvm.mul %6924, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.getelementptr %6925[%6932] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6934 = llvm.bitcast %6933 : !llvm.ptr to !llvm.ptr> + %6935 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6936 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6937 = llvm.trunc %6924 : !llvm.i64 to !llvm.i32 + %6938 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6939 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6940 = llvm.insertelement %6937, %6938[%6939 : !llvm.i32] : !llvm.vec<8 x i32> + %6941 = llvm.shufflevector %6940, %6938 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6942 = llvm.add %6941, %6936 : !llvm.vec<8 x i32> + %6943 = llvm.trunc %6935 : !llvm.i64 to !llvm.i32 + %6944 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6945 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6946 = llvm.insertelement %6943, %6944[%6945 : !llvm.i32] : !llvm.vec<8 x i32> + %6947 = llvm.shufflevector %6946, %6944 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6948 = llvm.icmp "slt" %6942, %6947 : !llvm.vec<8 x i32> + %6949 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6950 = llvm.intr.masked.load %6934, %6948, %6949 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6951 = llvm.add %5447, %49 : !llvm.i64 + %6952 = llvm.icmp "slt" %6951, %67 : !llvm.i64 + %6953 = llvm.sub %64, %6951 : !llvm.i64 + %6954 = llvm.select %6952, %6953, %6951 : !llvm.i1, !llvm.i64 + %6955 = llvm.sdiv %6954, %68 : !llvm.i64 + %6956 = llvm.sub %64, %6955 : !llvm.i64 + %6957 = llvm.select %6952, %6956, %6955 : !llvm.i1, !llvm.i64 + %6958 = llvm.srem %6957, %68 : !llvm.i64 + %6959 = llvm.icmp "slt" %6958, %67 : !llvm.i64 + %6960 = llvm.add %6958, %68 : !llvm.i64 + %6961 = llvm.select %6959, %6960, %6958 : !llvm.i1, !llvm.i64 + %6962 = llvm.mul %6957, %65 : !llvm.i64 + %6963 = llvm.add %6429, %6962 : !llvm.i64 + %6964 = llvm.add %6963, %50 : !llvm.i64 + %6965 = llvm.icmp "slt" %6964, %67 : !llvm.i64 + %6966 = llvm.sub %64, %6964 : !llvm.i64 + %6967 = llvm.select %6965, %6966, %6964 : !llvm.i1, !llvm.i64 + %6968 = llvm.sdiv %6967, %63 : !llvm.i64 + %6969 = llvm.sub %64, %6968 : !llvm.i64 + %6970 = llvm.select %6965, %6969, %6968 : !llvm.i1, !llvm.i64 + %6971 = llvm.mul %6970, %65 : !llvm.i64 + %6972 = llvm.add %6963, %6971 : !llvm.i64 + %6973 = llvm.add %6972, %50 : !llvm.i64 + %6974 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6976 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6977 = llvm.mul %6961, %6976 : !llvm.i64 + 
%6978 = llvm.add %6975, %6977 : !llvm.i64 + %6979 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6980 = llvm.mul %67, %6979 : !llvm.i64 + %6981 = llvm.add %6978, %6980 : !llvm.i64 + %6982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6983 = llvm.mul %6973, %6982 : !llvm.i64 + %6984 = llvm.add %6981, %6983 : !llvm.i64 + %6985 = llvm.getelementptr %6974[%6984] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6986 = llvm.load %6985 : !llvm.ptr> + %6987 = llvm.fadd %6950, %6986 : !llvm.vec<8 x float> + %6988 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6991 = llvm.mul %67, %6990 : !llvm.i64 + %6992 = llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %50, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6987, %6996 : !llvm.ptr> + %6997 = llvm.add %6315, %51 : !llvm.i64 + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %2345, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %6997, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7007 = llvm.bitcast %7006 : !llvm.ptr to !llvm.ptr> + %7008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7010 = llvm.trunc %6997 : !llvm.i64 to !llvm.i32 + %7011 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7012 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7013 = llvm.insertelement %7010, %7011[%7012 : !llvm.i32] : !llvm.vec<8 x i32> + %7014 = llvm.shufflevector %7013, %7011 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7015 = llvm.add %7014, %7009 : !llvm.vec<8 x i32> + %7016 = llvm.trunc %7008 : !llvm.i64 to !llvm.i32 + %7017 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7018 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7019 = llvm.insertelement %7016, %7017[%7018 : !llvm.i32] : !llvm.vec<8 x i32> + %7020 = llvm.shufflevector %7019, %7017 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7021 = llvm.icmp "slt" %7015, %7020 : !llvm.vec<8 x i32> + %7022 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7023 = llvm.intr.masked.load %7007, %7021, %7022 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7024 = llvm.add %6347, %52 : !llvm.i64 + %7025 = llvm.icmp "slt" %7024, %67 : !llvm.i64 + %7026 = llvm.sub %64, %7024 : !llvm.i64 + %7027 = llvm.select %7025, %7026, %7024 : !llvm.i1, !llvm.i64 + %7028 = llvm.sdiv %7027, %68 : !llvm.i64 + %7029 = llvm.sub %64, %7028 : !llvm.i64 + %7030 = llvm.select %7025, %7029, %7028 : !llvm.i1, !llvm.i64 + %7031 = llvm.mul %7030, %60 : !llvm.i64 + %7032 = llvm.add %6347, %7031 : !llvm.i64 + %7033 = llvm.add %7032, %52 : !llvm.i64 + %7034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7035 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %7036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7037 = llvm.mul %7033, %7036 : !llvm.i64 + %7038 = llvm.add %7035, %7037 : !llvm.i64 + %7039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7040 = llvm.mul %67, %7039 : !llvm.i64 + %7041 = llvm.add %7038, %7040 : !llvm.i64 + %7042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7043 = llvm.mul %6365, %7042 : !llvm.i64 + %7044 = llvm.add %7041, %7043 : !llvm.i64 + %7045 = llvm.getelementptr %7034[%7044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7046 = llvm.load %7045 : !llvm.ptr> + %7047 = llvm.fadd %7023, %7046 : !llvm.vec<8 x float> + %7048 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7051 = llvm.mul %67, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %33, %7053 : !llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7047, %7056 : !llvm.ptr> + %7057 = llvm.add %6315, %53 : !llvm.i64 + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %2345, %7060 : !llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %7057, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7067 = llvm.bitcast %7066 : !llvm.ptr to !llvm.ptr> + %7068 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7070 = llvm.trunc %7057 : !llvm.i64 to !llvm.i32 + %7071 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7072 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7073 = llvm.insertelement %7070, %7071[%7072 : !llvm.i32] : !llvm.vec<8 x i32> + %7074 = llvm.shufflevector %7073, %7071 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7075 = llvm.add %7074, %7069 : !llvm.vec<8 x i32> + %7076 = llvm.trunc %7068 : !llvm.i64 to !llvm.i32 + %7077 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7078 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7079 = llvm.insertelement %7076, %7077[%7078 : !llvm.i32] : !llvm.vec<8 x i32> + %7080 = llvm.shufflevector %7079, %7077 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7081 = llvm.icmp "slt" %7075, %7080 : !llvm.vec<8 x i32> + %7082 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7083 = llvm.intr.masked.load %7067, %7081, %7082 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7084 = llvm.add %5447, %53 : !llvm.i64 + %7085 = llvm.icmp "slt" %7084, %67 : !llvm.i64 + %7086 = llvm.sub %64, %7084 : !llvm.i64 + %7087 = llvm.select %7085, %7086, %7084 : !llvm.i1, !llvm.i64 + %7088 = llvm.sdiv %7087, %68 : !llvm.i64 + %7089 = llvm.sub %64, %7088 : !llvm.i64 + %7090 = llvm.select %7085, %7089, %7088 : !llvm.i1, !llvm.i64 + %7091 = llvm.srem %7090, %68 : !llvm.i64 + %7092 = llvm.icmp "slt" %7091, %67 : !llvm.i64 + %7093 = llvm.add %7091, %68 : !llvm.i64 + %7094 = 
llvm.select %7092, %7093, %7091 : !llvm.i1, !llvm.i64 + %7095 = llvm.mul %7090, %65 : !llvm.i64 + %7096 = llvm.add %6429, %7095 : !llvm.i64 + %7097 = llvm.add %7096, %54 : !llvm.i64 + %7098 = llvm.icmp "slt" %7097, %67 : !llvm.i64 + %7099 = llvm.sub %64, %7097 : !llvm.i64 + %7100 = llvm.select %7098, %7099, %7097 : !llvm.i1, !llvm.i64 + %7101 = llvm.sdiv %7100, %63 : !llvm.i64 + %7102 = llvm.sub %64, %7101 : !llvm.i64 + %7103 = llvm.select %7098, %7102, %7101 : !llvm.i1, !llvm.i64 + %7104 = llvm.mul %7103, %65 : !llvm.i64 + %7105 = llvm.add %7096, %7104 : !llvm.i64 + %7106 = llvm.add %7105, %54 : !llvm.i64 + %7107 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7109 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7110 = llvm.mul %7094, %7109 : !llvm.i64 + %7111 = llvm.add %7108, %7110 : !llvm.i64 + %7112 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7113 = llvm.mul %67, %7112 : !llvm.i64 + %7114 = llvm.add %7111, %7113 : !llvm.i64 + %7115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7116 = llvm.mul %7106, %7115 : !llvm.i64 + %7117 = llvm.add %7114, %7116 : !llvm.i64 + %7118 = llvm.getelementptr %7107[%7117] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7119 = llvm.load %7118 : !llvm.ptr> + %7120 = llvm.fadd %7083, %7119 : !llvm.vec<8 x float> + %7121 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7123 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7124 = llvm.mul %67, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 + %7126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7127 = llvm.mul %54, %7126 : !llvm.i64 + %7128 = llvm.add %7125, %7127 : !llvm.i64 + %7129 = llvm.getelementptr %7121[%7128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7120, %7129 : !llvm.ptr> + %7130 = llvm.add %6315, %55 : !llvm.i64 + %7131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7134 = llvm.mul %2345, %7133 : !llvm.i64 + %7135 = llvm.add %7132, %7134 : !llvm.i64 + %7136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7137 = llvm.mul %7130, %7136 : !llvm.i64 + %7138 = llvm.add %7135, %7137 : !llvm.i64 + %7139 = llvm.getelementptr %7131[%7138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7140 = llvm.bitcast %7139 : !llvm.ptr to !llvm.ptr> + %7141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7143 = llvm.trunc %7130 : !llvm.i64 to !llvm.i32 + %7144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7145 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7146 = llvm.insertelement %7143, %7144[%7145 : !llvm.i32] : !llvm.vec<8 x i32> + %7147 = llvm.shufflevector %7146, %7144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7148 = llvm.add %7147, %7142 : !llvm.vec<8 x i32> + %7149 = llvm.trunc %7141 : !llvm.i64 to !llvm.i32 + %7150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7152 = llvm.insertelement %7149, %7150[%7151 : !llvm.i32] : !llvm.vec<8 x i32> + %7153 = llvm.shufflevector %7152, %7150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7154 = llvm.icmp "slt" %7148, %7153 : 
!llvm.vec<8 x i32> + %7155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7156 = llvm.intr.masked.load %7140, %7154, %7155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7157 = llvm.add %6347, %56 : !llvm.i64 + %7158 = llvm.icmp "slt" %7157, %67 : !llvm.i64 + %7159 = llvm.sub %64, %7157 : !llvm.i64 + %7160 = llvm.select %7158, %7159, %7157 : !llvm.i1, !llvm.i64 + %7161 = llvm.sdiv %7160, %68 : !llvm.i64 + %7162 = llvm.sub %64, %7161 : !llvm.i64 + %7163 = llvm.select %7158, %7162, %7161 : !llvm.i1, !llvm.i64 + %7164 = llvm.mul %7163, %60 : !llvm.i64 + %7165 = llvm.add %6347, %7164 : !llvm.i64 + %7166 = llvm.add %7165, %56 : !llvm.i64 + %7167 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7169 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7170 = llvm.mul %7166, %7169 : !llvm.i64 + %7171 = llvm.add %7168, %7170 : !llvm.i64 + %7172 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7173 = llvm.mul %67, %7172 : !llvm.i64 + %7174 = llvm.add %7171, %7173 : !llvm.i64 + %7175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7176 = llvm.mul %6365, %7175 : !llvm.i64 + %7177 = llvm.add %7174, %7176 : !llvm.i64 + %7178 = llvm.getelementptr %7167[%7177] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7179 = llvm.load %7178 : !llvm.ptr> + %7180 = llvm.fadd %7156, %7179 : !llvm.vec<8 x float> + %7181 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7182 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7183 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7184 = llvm.mul %67, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7187 = llvm.mul %34, %7186 : !llvm.i64 + %7188 = llvm.add %7185, %7187 : !llvm.i64 + %7189 = llvm.getelementptr %7181[%7188] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7180, %7189 : !llvm.ptr> + %7190 = llvm.add %6315, %57 : !llvm.i64 + %7191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7194 = llvm.mul %2345, %7193 : !llvm.i64 + %7195 = llvm.add %7192, %7194 : !llvm.i64 + %7196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7197 = llvm.mul %7190, %7196 : !llvm.i64 + %7198 = llvm.add %7195, %7197 : !llvm.i64 + %7199 = llvm.getelementptr %7191[%7198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7200 = llvm.bitcast %7199 : !llvm.ptr to !llvm.ptr> + %7201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7202 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7203 = llvm.trunc %7190 : !llvm.i64 to !llvm.i32 + %7204 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7205 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7206 = llvm.insertelement %7203, %7204[%7205 : !llvm.i32] : !llvm.vec<8 x i32> + %7207 = llvm.shufflevector %7206, %7204 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7208 = llvm.add %7207, %7202 : !llvm.vec<8 x i32> + %7209 = llvm.trunc %7201 : !llvm.i64 to !llvm.i32 + %7210 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7211 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7212 = llvm.insertelement %7209, %7210[%7211 : !llvm.i32] : !llvm.vec<8 x i32> + %7213 = llvm.shufflevector %7212, %7210 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : 
i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7214 = llvm.icmp "slt" %7208, %7213 : !llvm.vec<8 x i32> + %7215 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7216 = llvm.intr.masked.load %7200, %7214, %7215 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7217 = llvm.add %5447, %57 : !llvm.i64 + %7218 = llvm.icmp "slt" %7217, %67 : !llvm.i64 + %7219 = llvm.sub %64, %7217 : !llvm.i64 + %7220 = llvm.select %7218, %7219, %7217 : !llvm.i1, !llvm.i64 + %7221 = llvm.sdiv %7220, %68 : !llvm.i64 + %7222 = llvm.sub %64, %7221 : !llvm.i64 + %7223 = llvm.select %7218, %7222, %7221 : !llvm.i1, !llvm.i64 + %7224 = llvm.srem %7223, %68 : !llvm.i64 + %7225 = llvm.icmp "slt" %7224, %67 : !llvm.i64 + %7226 = llvm.add %7224, %68 : !llvm.i64 + %7227 = llvm.select %7225, %7226, %7224 : !llvm.i1, !llvm.i64 + %7228 = llvm.mul %7223, %65 : !llvm.i64 + %7229 = llvm.add %6429, %7228 : !llvm.i64 + %7230 = llvm.add %7229, %58 : !llvm.i64 + %7231 = llvm.icmp "slt" %7230, %67 : !llvm.i64 + %7232 = llvm.sub %64, %7230 : !llvm.i64 + %7233 = llvm.select %7231, %7232, %7230 : !llvm.i1, !llvm.i64 + %7234 = llvm.sdiv %7233, %63 : !llvm.i64 + %7235 = llvm.sub %64, %7234 : !llvm.i64 + %7236 = llvm.select %7231, %7235, %7234 : !llvm.i1, !llvm.i64 + %7237 = llvm.mul %7236, %65 : !llvm.i64 + %7238 = llvm.add %7229, %7237 : !llvm.i64 + %7239 = llvm.add %7238, %58 : !llvm.i64 + %7240 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7242 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7243 = llvm.mul %7227, %7242 : !llvm.i64 + %7244 = llvm.add %7241, %7243 : !llvm.i64 + %7245 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7246 = llvm.mul %67, %7245 : !llvm.i64 + %7247 = llvm.add %7244, %7246 : !llvm.i64 + %7248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7249 = llvm.mul %7239, %7248 : !llvm.i64 + %7250 = llvm.add %7247, %7249 : !llvm.i64 + %7251 = llvm.getelementptr %7240[%7250] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7252 = llvm.load %7251 : !llvm.ptr> + %7253 = llvm.fadd %7216, %7252 : !llvm.vec<8 x float> + %7254 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7256 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7257 = llvm.mul %67, %7256 : !llvm.i64 + %7258 = llvm.add %7255, %7257 : !llvm.i64 + %7259 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7260 = llvm.mul %58, %7259 : !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.getelementptr %7254[%7261] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7253, %7262 : !llvm.ptr> + %7263 = llvm.add %6315, %59 : !llvm.i64 + %7264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7267 = llvm.mul %2345, %7266 : !llvm.i64 + %7268 = llvm.add %7265, %7267 : !llvm.i64 + %7269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7270 = llvm.mul %7263, %7269 : !llvm.i64 + %7271 = llvm.add %7268, %7270 : !llvm.i64 + %7272 = llvm.getelementptr %7264[%7271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7273 = llvm.bitcast %7272 : !llvm.ptr to !llvm.ptr> + %7274 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7275 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7276 = 
llvm.trunc %7263 : !llvm.i64 to !llvm.i32 + %7277 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7278 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7279 = llvm.insertelement %7276, %7277[%7278 : !llvm.i32] : !llvm.vec<8 x i32> + %7280 = llvm.shufflevector %7279, %7277 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7281 = llvm.add %7280, %7275 : !llvm.vec<8 x i32> + %7282 = llvm.trunc %7274 : !llvm.i64 to !llvm.i32 + %7283 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7284 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7285 = llvm.insertelement %7282, %7283[%7284 : !llvm.i32] : !llvm.vec<8 x i32> + %7286 = llvm.shufflevector %7285, %7283 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7287 = llvm.icmp "slt" %7281, %7286 : !llvm.vec<8 x i32> + %7288 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7289 = llvm.intr.masked.load %7273, %7287, %7288 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7290 = llvm.add %6347, %61 : !llvm.i64 + %7291 = llvm.icmp "slt" %7290, %67 : !llvm.i64 + %7292 = llvm.sub %64, %7290 : !llvm.i64 + %7293 = llvm.select %7291, %7292, %7290 : !llvm.i1, !llvm.i64 + %7294 = llvm.sdiv %7293, %68 : !llvm.i64 + %7295 = llvm.sub %64, %7294 : !llvm.i64 + %7296 = llvm.select %7291, %7295, %7294 : !llvm.i1, !llvm.i64 + %7297 = llvm.mul %7296, %60 : !llvm.i64 + %7298 = llvm.add %6347, %7297 : !llvm.i64 + %7299 = llvm.add %7298, %61 : !llvm.i64 + %7300 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7302 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7303 = llvm.mul %7299, %7302 : !llvm.i64 + %7304 = llvm.add %7301, %7303 : !llvm.i64 + %7305 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7306 = llvm.mul %67, %7305 : !llvm.i64 + %7307 = llvm.add %7304, %7306 : !llvm.i64 + %7308 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7309 = llvm.mul %6365, %7308 : !llvm.i64 + %7310 = llvm.add %7307, %7309 : !llvm.i64 + %7311 = llvm.getelementptr %7300[%7310] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7312 = llvm.load %7311 : !llvm.ptr> + %7313 = llvm.fadd %7289, %7312 : !llvm.vec<8 x float> + %7314 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7316 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7317 = llvm.mul %67, %7316 : !llvm.i64 + %7318 = llvm.add %7315, %7317 : !llvm.i64 + %7319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7320 = llvm.mul %35, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = llvm.getelementptr %7314[%7321] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7313, %7322 : !llvm.ptr> + %7323 = llvm.add %6315, %62 : !llvm.i64 + %7324 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7325 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7326 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7327 = llvm.mul %2345, %7326 : !llvm.i64 + %7328 = llvm.add %7325, %7327 : !llvm.i64 + %7329 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7330 = llvm.mul %7323, %7329 : !llvm.i64 + %7331 = llvm.add %7328, %7330 : !llvm.i64 + %7332 = llvm.getelementptr %7324[%7331] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7333 = llvm.bitcast %7332 : !llvm.ptr to !llvm.ptr> + %7334 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %7335 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7336 = llvm.trunc %7323 : !llvm.i64 to !llvm.i32 + %7337 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7338 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7339 = llvm.insertelement %7336, %7337[%7338 : !llvm.i32] : !llvm.vec<8 x i32> + %7340 = llvm.shufflevector %7339, %7337 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7341 = llvm.add %7340, %7335 : !llvm.vec<8 x i32> + %7342 = llvm.trunc %7334 : !llvm.i64 to !llvm.i32 + %7343 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7344 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7345 = llvm.insertelement %7342, %7343[%7344 : !llvm.i32] : !llvm.vec<8 x i32> + %7346 = llvm.shufflevector %7345, %7343 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7347 = llvm.icmp "slt" %7341, %7346 : !llvm.vec<8 x i32> + %7348 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7349 = llvm.intr.masked.load %7333, %7347, %7348 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7350 = llvm.add %5447, %62 : !llvm.i64 + %7351 = llvm.icmp "slt" %7350, %67 : !llvm.i64 + %7352 = llvm.sub %64, %7350 : !llvm.i64 + %7353 = llvm.select %7351, %7352, %7350 : !llvm.i1, !llvm.i64 + %7354 = llvm.sdiv %7353, %68 : !llvm.i64 + %7355 = llvm.sub %64, %7354 : !llvm.i64 + %7356 = llvm.select %7351, %7355, %7354 : !llvm.i1, !llvm.i64 + %7357 = llvm.srem %7356, %68 : !llvm.i64 + %7358 = llvm.icmp "slt" %7357, %67 : !llvm.i64 + %7359 = llvm.add %7357, %68 : !llvm.i64 + %7360 = llvm.select %7358, %7359, %7357 : !llvm.i1, !llvm.i64 + %7361 = llvm.mul %7356, %65 : !llvm.i64 + %7362 = llvm.add %6429, %7361 : !llvm.i64 + %7363 = llvm.add %7362, %66 : !llvm.i64 + %7364 = llvm.icmp "slt" %7363, %67 : !llvm.i64 + %7365 = llvm.sub %64, %7363 : !llvm.i64 + %7366 = llvm.select %7364, %7365, %7363 : !llvm.i1, !llvm.i64 + %7367 = llvm.sdiv %7366, %63 : !llvm.i64 + %7368 = llvm.sub %64, %7367 : !llvm.i64 + %7369 = llvm.select %7364, %7368, %7367 : !llvm.i1, !llvm.i64 + %7370 = llvm.mul %7369, %65 : !llvm.i64 + %7371 = llvm.add %7362, %7370 : !llvm.i64 + %7372 = llvm.add %7371, %66 : !llvm.i64 + %7373 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7375 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7376 = llvm.mul %7360, %7375 : !llvm.i64 + %7377 = llvm.add %7374, %7376 : !llvm.i64 + %7378 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7379 = llvm.mul %67, %7378 : !llvm.i64 + %7380 = llvm.add %7377, %7379 : !llvm.i64 + %7381 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7382 = llvm.mul %7372, %7381 : !llvm.i64 + %7383 = llvm.add %7380, %7382 : !llvm.i64 + %7384 = llvm.getelementptr %7373[%7383] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7385 = llvm.load %7384 : !llvm.ptr> + %7386 = llvm.fadd %7349, %7385 : !llvm.vec<8 x float> + %7387 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7389 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7390 = llvm.mul %67, %7389 : !llvm.i64 + %7391 = llvm.add %7388, %7390 : !llvm.i64 + %7392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7393 = llvm.mul %66, %7392 : !llvm.i64 + %7394 = llvm.add %7391, %7393 : !llvm.i64 + %7395 = llvm.getelementptr 
%7387[%7394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7386, %7395 : !llvm.ptr> + llvm.br ^bb48(%67 : !llvm.i64) + ^bb48(%7396: !llvm.i64): // 2 preds: ^bb47, ^bb49 + %7397 = llvm.icmp "slt" %7396, %68 : !llvm.i64 + llvm.cond_br %7397, ^bb49, ^bb46 + ^bb49: // pred: ^bb48 + %7398 = llvm.mul %7396, %70 : !llvm.i64 + %7399 = llvm.add %6315, %7398 : !llvm.i64 + %7400 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7403 = llvm.mul %67, %7402 : !llvm.i64 + %7404 = llvm.add %7401, %7403 : !llvm.i64 + %7405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7406 = llvm.mul %7396, %7405 : !llvm.i64 + %7407 = llvm.add %7404, %7406 : !llvm.i64 + %7408 = llvm.getelementptr %7400[%7407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7409 = llvm.load %7408 : !llvm.ptr> + %7410 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7413 = llvm.mul %2345, %7412 : !llvm.i64 + %7414 = llvm.add %7411, %7413 : !llvm.i64 + %7415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7416 = llvm.mul %7399, %7415 : !llvm.i64 + %7417 = llvm.add %7414, %7416 : !llvm.i64 + %7418 = llvm.getelementptr %7410[%7417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7419 = llvm.bitcast %7418 : !llvm.ptr to !llvm.ptr> + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7422 = llvm.trunc %7399 : !llvm.i64 to !llvm.i32 + %7423 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7424 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7425 = llvm.insertelement %7422, %7423[%7424 : !llvm.i32] : !llvm.vec<8 x i32> + %7426 = llvm.shufflevector %7425, %7423 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7427 = llvm.add %7426, %7421 : !llvm.vec<8 x i32> + %7428 = llvm.trunc %7420 : !llvm.i64 to !llvm.i32 + %7429 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7430 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7431 = llvm.insertelement %7428, %7429[%7430 : !llvm.i32] : !llvm.vec<8 x i32> + %7432 = llvm.shufflevector %7431, %7429 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7433 = llvm.icmp "slt" %7427, %7432 : !llvm.vec<8 x i32> + llvm.intr.masked.store %7409, %7419, %7433 {alignment = 4 : i32} : !llvm.vec<8 x float>, !llvm.vec<8 x i1> into !llvm.ptr> + %7434 = llvm.add %7396, %69 : !llvm.i64 + llvm.br ^bb48(%7434 : !llvm.i64) + ^bb50: // pred: ^bb46 + %7435 = llvm.add %5447, %39 : !llvm.i64 + llvm.br ^bb41(%7435 : !llvm.i64) + ^bb51: // pred: ^bb41 + %7436 = llvm.add %2345, %69 : !llvm.i64 + llvm.br ^bb12(%7436 : !llvm.i64) + ^bb52: // pred: ^bb12 + %7437 = llvm.add %151, %38 : !llvm.i64 + llvm.br ^bb1(%7437 : !llvm.i64) + ^bb53: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py_4a6286d9(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 
1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/optimized_matmul/mlir/21_LLVMLegalizeForExport.mlir b/Tutorials/optimized_matmul/mlir/21_LLVMLegalizeForExport.mlir new file mode 100644 index 00000000..be537b05 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/21_LLVMLegalizeForExport.mlir @@ -0,0 +1,10140 @@ +module @optimized_matmul { + llvm.func @optimized_matmul_py_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(781 : index) : !llvm.i64 + %25 = llvm.mlir.constant(782 : index) : !llvm.i64 + %26 = llvm.mlir.constant(783 : index) : !llvm.i64 + %27 = llvm.mlir.constant(512 : index) : !llvm.i64 + %28 = llvm.mlir.constant(780 : index) : !llvm.i64 + %29 = llvm.mlir.constant(256 : index) : !llvm.i64 + %30 = llvm.mlir.constant(16 : index) : !llvm.i64 + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.mlir.constant(0 : index) : !llvm.i64 + %33 = llvm.mlir.constant(1 : index) : !llvm.i64 + %34 = llvm.mlir.constant(2 : index) : !llvm.i64 + %35 = llvm.mlir.constant(3 : index) : !llvm.i64 + %36 = llvm.mlir.constant(4 : index) : !llvm.i64 + %37 = llvm.mlir.constant(5 : index) : !llvm.i64 + %38 = llvm.mlir.constant(6 : index) : !llvm.i64 + %39 = llvm.mlir.constant(7 : index) : !llvm.i64 + %40 = llvm.mlir.constant(8 : index) : !llvm.i64 + %41 = llvm.mlir.constant(9 : index) : !llvm.i64 + %42 = llvm.mlir.constant(10 : index) : !llvm.i64 + %43 = llvm.mlir.constant(11 : index) : !llvm.i64 + %44 = llvm.mlir.constant(12 : index) : !llvm.i64 + %45 = llvm.mlir.constant(13 : index) : !llvm.i64 + %46 = llvm.mlir.constant(14 : index) : !llvm.i64 + %47 = llvm.mlir.constant(15 : index) : !llvm.i64 + llvm.br ^bb1(%32 : !llvm.i64) + ^bb1(%48: !llvm.i64): // 2 preds: ^bb0, ^bb23 + %49 = llvm.icmp "slt" %48, %27 : !llvm.i64 + llvm.cond_br %49, ^bb2, ^bb24 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%32 : !llvm.i64) + ^bb3(%50: !llvm.i64): // 2 preds: ^bb2, ^bb13 + %51 = llvm.icmp "slt" %50, %28 : !llvm.i64 + llvm.cond_br %51, ^bb4, ^bb14 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%32 : !llvm.i64) + ^bb5(%52: !llvm.i64): // 2 preds: ^bb4, ^bb12 + %53 = llvm.icmp "slt" %52, %29 : !llvm.i64 + llvm.cond_br %53, ^bb6, ^bb13 + ^bb6: // pred: ^bb5 + llvm.br ^bb7(%32 : !llvm.i64) + ^bb7(%54: !llvm.i64): // 2 preds: ^bb6, ^bb11 + %55 = llvm.icmp "slt" %54, %31 : !llvm.i64 + llvm.cond_br %55, ^bb8, ^bb12 + ^bb8: // 
pred: ^bb7 + llvm.br ^bb9(%32 : !llvm.i64) + ^bb9(%56: !llvm.i64): // 2 preds: ^bb8, ^bb10 + %57 = llvm.icmp "slt" %56, %36 : !llvm.i64 + llvm.cond_br %57, ^bb10, ^bb11 + ^bb10: // pred: ^bb9 + %58 = llvm.add %48, %52 : !llvm.i64 + %59 = llvm.add %54, %56 : !llvm.i64 + %60 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %61 = llvm.mlir.constant(0 : index) : !llvm.i64 + %62 = llvm.mlir.constant(128 : index) : !llvm.i64 + %63 = llvm.mul %50, %62 : !llvm.i64 + %64 = llvm.add %61, %63 : !llvm.i64 + %65 = llvm.mlir.constant(1 : index) : !llvm.i64 + %66 = llvm.mul %59, %65 : !llvm.i64 + %67 = llvm.add %64, %66 : !llvm.i64 + %68 = llvm.getelementptr %60[%67] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %69 = llvm.load %68 : !llvm.ptr + %70 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %71 = llvm.mlir.constant(0 : index) : !llvm.i64 + %72 = llvm.mlir.constant(512 : index) : !llvm.i64 + %73 = llvm.mul %59, %72 : !llvm.i64 + %74 = llvm.add %71, %73 : !llvm.i64 + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.mul %58, %75 : !llvm.i64 + %77 = llvm.add %74, %76 : !llvm.i64 + %78 = llvm.getelementptr %70[%77] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %79 = llvm.load %78 : !llvm.ptr + %80 = llvm.fmul %69, %79 {RelaxedPrecision} : !llvm.float + %81 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.mlir.constant(0 : index) : !llvm.i64 + %83 = llvm.mlir.constant(512 : index) : !llvm.i64 + %84 = llvm.mul %50, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.mlir.constant(1 : index) : !llvm.i64 + %87 = llvm.mul %58, %86 : !llvm.i64 + %88 = llvm.add %85, %87 : !llvm.i64 + %89 = llvm.getelementptr %81[%88] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %90 = llvm.load %89 : !llvm.ptr + %91 = llvm.fadd %90, %80 {RelaxedPrecision} : !llvm.float + %92 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %93 = llvm.mlir.constant(0 : index) : !llvm.i64 + %94 = llvm.mlir.constant(512 : index) : !llvm.i64 + %95 = llvm.mul %50, %94 : !llvm.i64 + %96 = llvm.add %93, %95 : !llvm.i64 + %97 = llvm.mlir.constant(1 : index) : !llvm.i64 + %98 = llvm.mul %58, %97 : !llvm.i64 + %99 = llvm.add %96, %98 : !llvm.i64 + %100 = llvm.getelementptr %92[%99] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %91, %100 : !llvm.ptr + %101 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.mlir.constant(0 : index) : !llvm.i64 + %103 = llvm.mlir.constant(512 : index) : !llvm.i64 + %104 = llvm.mul %50, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %107 = llvm.mul %58, %106 : !llvm.i64 + %108 = llvm.add %105, %107 : !llvm.i64 + %109 = llvm.getelementptr %101[%108] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %110 = llvm.load %109 : !llvm.ptr + %111 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %112 = llvm.mlir.constant(0 : index) : !llvm.i64 + %113 = llvm.mlir.constant(512 : index) : !llvm.i64 + %114 = llvm.mul %50, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.mlir.constant(1 : index) : !llvm.i64 + %117 = llvm.mul %58, %116 : !llvm.i64 + %118 = llvm.add %115, %117 : !llvm.i64 + %119 = llvm.getelementptr %111[%118] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %110, %119 : !llvm.ptr + %120 = llvm.add %58, %33 : !llvm.i64 + %121 = llvm.extractvalue %7[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %123 = llvm.mlir.constant(128 : index) : !llvm.i64 + %124 = llvm.mul %50, %123 : !llvm.i64 + %125 = llvm.add %122, %124 : !llvm.i64 + %126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %127 = llvm.mul %59, %126 : !llvm.i64 + %128 = llvm.add %125, %127 : !llvm.i64 + %129 = llvm.getelementptr %121[%128] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %130 = llvm.load %129 : !llvm.ptr + %131 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %134 = llvm.mul %59, %133 : !llvm.i64 + %135 = llvm.add %132, %134 : !llvm.i64 + %136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %137 = llvm.mul %120, %136 : !llvm.i64 + %138 = llvm.add %135, %137 : !llvm.i64 + %139 = llvm.getelementptr %131[%138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %140 = llvm.load %139 : !llvm.ptr + %141 = llvm.fmul %130, %140 {RelaxedPrecision} : !llvm.float + %142 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %143 = llvm.mlir.constant(0 : index) : !llvm.i64 + %144 = llvm.mlir.constant(512 : index) : !llvm.i64 + %145 = llvm.mul %50, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.mlir.constant(1 : index) : !llvm.i64 + %148 = llvm.mul %120, %147 : !llvm.i64 + %149 = llvm.add %146, %148 : !llvm.i64 + %150 = llvm.getelementptr %142[%149] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %151 = llvm.load %150 : !llvm.ptr + %152 = llvm.fadd %151, %141 {RelaxedPrecision} : !llvm.float + %153 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %154 = llvm.mlir.constant(0 : index) : !llvm.i64 + %155 = llvm.mlir.constant(512 : index) : !llvm.i64 + %156 = llvm.mul %50, %155 : !llvm.i64 + %157 = llvm.add %154, %156 : !llvm.i64 + %158 = llvm.mlir.constant(1 : index) : !llvm.i64 + %159 = llvm.mul %120, %158 : !llvm.i64 + %160 = llvm.add %157, %159 : !llvm.i64 + %161 = llvm.getelementptr %153[%160] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %152, %161 : !llvm.ptr + %162 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %163 = llvm.mlir.constant(0 : index) : !llvm.i64 + %164 = llvm.mlir.constant(512 : index) : !llvm.i64 + %165 = llvm.mul %50, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.mlir.constant(1 : index) : !llvm.i64 + %168 = llvm.mul %120, %167 : !llvm.i64 + %169 = llvm.add %166, %168 : !llvm.i64 + %170 = llvm.getelementptr %162[%169] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %171 = llvm.load %170 : !llvm.ptr + %172 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %173 = llvm.mlir.constant(0 : index) : !llvm.i64 + %174 = llvm.mlir.constant(512 : index) : !llvm.i64 + %175 = llvm.mul %50, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.mlir.constant(1 : index) : !llvm.i64 + %178 = llvm.mul %120, %177 : !llvm.i64 + %179 = llvm.add %176, %178 : !llvm.i64 + %180 = llvm.getelementptr %172[%179] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %171, %180 : !llvm.ptr + %181 = llvm.add %58, %34 : !llvm.i64 + %182 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %183 = llvm.mlir.constant(0 : index) : !llvm.i64 + %184 = llvm.mlir.constant(128 : index) : !llvm.i64 + %185 = llvm.mul %50, %184 : !llvm.i64 + %186 = 
llvm.add %183, %185 : !llvm.i64 + %187 = llvm.mlir.constant(1 : index) : !llvm.i64 + %188 = llvm.mul %59, %187 : !llvm.i64 + %189 = llvm.add %186, %188 : !llvm.i64 + %190 = llvm.getelementptr %182[%189] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %191 = llvm.load %190 : !llvm.ptr + %192 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %193 = llvm.mlir.constant(0 : index) : !llvm.i64 + %194 = llvm.mlir.constant(512 : index) : !llvm.i64 + %195 = llvm.mul %59, %194 : !llvm.i64 + %196 = llvm.add %193, %195 : !llvm.i64 + %197 = llvm.mlir.constant(1 : index) : !llvm.i64 + %198 = llvm.mul %181, %197 : !llvm.i64 + %199 = llvm.add %196, %198 : !llvm.i64 + %200 = llvm.getelementptr %192[%199] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %201 = llvm.load %200 : !llvm.ptr + %202 = llvm.fmul %191, %201 {RelaxedPrecision} : !llvm.float + %203 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %204 = llvm.mlir.constant(0 : index) : !llvm.i64 + %205 = llvm.mlir.constant(512 : index) : !llvm.i64 + %206 = llvm.mul %50, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.mlir.constant(1 : index) : !llvm.i64 + %209 = llvm.mul %181, %208 : !llvm.i64 + %210 = llvm.add %207, %209 : !llvm.i64 + %211 = llvm.getelementptr %203[%210] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %212 = llvm.load %211 : !llvm.ptr + %213 = llvm.fadd %212, %202 {RelaxedPrecision} : !llvm.float + %214 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %215 = llvm.mlir.constant(0 : index) : !llvm.i64 + %216 = llvm.mlir.constant(512 : index) : !llvm.i64 + %217 = llvm.mul %50, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.mlir.constant(1 : index) : !llvm.i64 + %220 = llvm.mul %181, %219 : !llvm.i64 + %221 = llvm.add %218, %220 : !llvm.i64 + %222 = llvm.getelementptr %214[%221] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %213, %222 : !llvm.ptr + %223 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %224 = llvm.mlir.constant(0 : index) : !llvm.i64 + %225 = llvm.mlir.constant(512 : index) : !llvm.i64 + %226 = llvm.mul %50, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.mlir.constant(1 : index) : !llvm.i64 + %229 = llvm.mul %181, %228 : !llvm.i64 + %230 = llvm.add %227, %229 : !llvm.i64 + %231 = llvm.getelementptr %223[%230] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %232 = llvm.load %231 : !llvm.ptr + %233 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %234 = llvm.mlir.constant(0 : index) : !llvm.i64 + %235 = llvm.mlir.constant(512 : index) : !llvm.i64 + %236 = llvm.mul %50, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %239 = llvm.mul %181, %238 : !llvm.i64 + %240 = llvm.add %237, %239 : !llvm.i64 + %241 = llvm.getelementptr %233[%240] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %232, %241 : !llvm.ptr + %242 = llvm.add %58, %35 : !llvm.i64 + %243 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %244 = llvm.mlir.constant(0 : index) : !llvm.i64 + %245 = llvm.mlir.constant(128 : index) : !llvm.i64 + %246 = llvm.mul %50, %245 : !llvm.i64 + %247 = llvm.add %244, %246 : !llvm.i64 + %248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %249 = llvm.mul %59, %248 : !llvm.i64 + %250 = llvm.add %247, %249 : !llvm.i64 + %251 = llvm.getelementptr %243[%250] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %252 = llvm.load %251 : !llvm.ptr + %253 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = llvm.mlir.constant(512 : index) : !llvm.i64 + %256 = llvm.mul %59, %255 : !llvm.i64 + %257 = llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = llvm.mul %242, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %262 = llvm.load %261 : !llvm.ptr + %263 = llvm.fmul %252, %262 {RelaxedPrecision} : !llvm.float + %264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %267 = llvm.mul %50, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %270 = llvm.mul %242, %269 : !llvm.i64 + %271 = llvm.add %268, %270 : !llvm.i64 + %272 = llvm.getelementptr %264[%271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %273 = llvm.load %272 : !llvm.ptr + %274 = llvm.fadd %273, %263 {RelaxedPrecision} : !llvm.float + %275 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %276 = llvm.mlir.constant(0 : index) : !llvm.i64 + %277 = llvm.mlir.constant(512 : index) : !llvm.i64 + %278 = llvm.mul %50, %277 : !llvm.i64 + %279 = llvm.add %276, %278 : !llvm.i64 + %280 = llvm.mlir.constant(1 : index) : !llvm.i64 + %281 = llvm.mul %242, %280 : !llvm.i64 + %282 = llvm.add %279, %281 : !llvm.i64 + %283 = llvm.getelementptr %275[%282] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %274, %283 : !llvm.ptr + %284 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %50, %286 : !llvm.i64 + %288 = llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %242, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.load %292 : !llvm.ptr + %294 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %295 = llvm.mlir.constant(0 : index) : !llvm.i64 + %296 = llvm.mlir.constant(512 : index) : !llvm.i64 + %297 = llvm.mul %50, %296 : !llvm.i64 + %298 = llvm.add %295, %297 : !llvm.i64 + %299 = llvm.mlir.constant(1 : index) : !llvm.i64 + %300 = llvm.mul %242, %299 : !llvm.i64 + %301 = llvm.add %298, %300 : !llvm.i64 + %302 = llvm.getelementptr %294[%301] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %293, %302 : !llvm.ptr + %303 = llvm.add %58, %36 : !llvm.i64 + %304 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %305 = llvm.mlir.constant(0 : index) : !llvm.i64 + %306 = llvm.mlir.constant(128 : index) : !llvm.i64 + %307 = llvm.mul %50, %306 : !llvm.i64 + %308 = llvm.add %305, %307 : !llvm.i64 + %309 = llvm.mlir.constant(1 : index) : !llvm.i64 + %310 = llvm.mul %59, %309 : !llvm.i64 + %311 = llvm.add %308, %310 : !llvm.i64 + %312 = llvm.getelementptr %304[%311] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %313 = llvm.load %312 : !llvm.ptr + %314 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %316 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %317 = llvm.mul %59, %316 : !llvm.i64 + %318 = llvm.add %315, %317 : !llvm.i64 + %319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %320 = llvm.mul %303, %319 : !llvm.i64 + %321 = llvm.add %318, %320 : !llvm.i64 + %322 = llvm.getelementptr %314[%321] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %323 = llvm.load %322 : !llvm.ptr + %324 = llvm.fmul %313, %323 {RelaxedPrecision} : !llvm.float + %325 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %326 = llvm.mlir.constant(0 : index) : !llvm.i64 + %327 = llvm.mlir.constant(512 : index) : !llvm.i64 + %328 = llvm.mul %50, %327 : !llvm.i64 + %329 = llvm.add %326, %328 : !llvm.i64 + %330 = llvm.mlir.constant(1 : index) : !llvm.i64 + %331 = llvm.mul %303, %330 : !llvm.i64 + %332 = llvm.add %329, %331 : !llvm.i64 + %333 = llvm.getelementptr %325[%332] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %334 = llvm.load %333 : !llvm.ptr + %335 = llvm.fadd %334, %324 {RelaxedPrecision} : !llvm.float + %336 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %337 = llvm.mlir.constant(0 : index) : !llvm.i64 + %338 = llvm.mlir.constant(512 : index) : !llvm.i64 + %339 = llvm.mul %50, %338 : !llvm.i64 + %340 = llvm.add %337, %339 : !llvm.i64 + %341 = llvm.mlir.constant(1 : index) : !llvm.i64 + %342 = llvm.mul %303, %341 : !llvm.i64 + %343 = llvm.add %340, %342 : !llvm.i64 + %344 = llvm.getelementptr %336[%343] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %335, %344 : !llvm.ptr + %345 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %346 = llvm.mlir.constant(0 : index) : !llvm.i64 + %347 = llvm.mlir.constant(512 : index) : !llvm.i64 + %348 = llvm.mul %50, %347 : !llvm.i64 + %349 = llvm.add %346, %348 : !llvm.i64 + %350 = llvm.mlir.constant(1 : index) : !llvm.i64 + %351 = llvm.mul %303, %350 : !llvm.i64 + %352 = llvm.add %349, %351 : !llvm.i64 + %353 = llvm.getelementptr %345[%352] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %354 = llvm.load %353 : !llvm.ptr + %355 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %356 = llvm.mlir.constant(0 : index) : !llvm.i64 + %357 = llvm.mlir.constant(512 : index) : !llvm.i64 + %358 = llvm.mul %50, %357 : !llvm.i64 + %359 = llvm.add %356, %358 : !llvm.i64 + %360 = llvm.mlir.constant(1 : index) : !llvm.i64 + %361 = llvm.mul %303, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.getelementptr %355[%362] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %354, %363 : !llvm.ptr + %364 = llvm.add %58, %37 : !llvm.i64 + %365 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %366 = llvm.mlir.constant(0 : index) : !llvm.i64 + %367 = llvm.mlir.constant(128 : index) : !llvm.i64 + %368 = llvm.mul %50, %367 : !llvm.i64 + %369 = llvm.add %366, %368 : !llvm.i64 + %370 = llvm.mlir.constant(1 : index) : !llvm.i64 + %371 = llvm.mul %59, %370 : !llvm.i64 + %372 = llvm.add %369, %371 : !llvm.i64 + %373 = llvm.getelementptr %365[%372] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %374 = llvm.load %373 : !llvm.ptr + %375 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %376 = llvm.mlir.constant(0 : index) : !llvm.i64 + %377 = llvm.mlir.constant(512 : index) : !llvm.i64 + %378 = llvm.mul %59, %377 : !llvm.i64 + %379 = llvm.add %376, %378 : !llvm.i64 + %380 = llvm.mlir.constant(1 : index) : !llvm.i64 + %381 = llvm.mul %364, %380 : !llvm.i64 
+ %382 = llvm.add %379, %381 : !llvm.i64 + %383 = llvm.getelementptr %375[%382] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %384 = llvm.load %383 : !llvm.ptr + %385 = llvm.fmul %374, %384 {RelaxedPrecision} : !llvm.float + %386 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %387 = llvm.mlir.constant(0 : index) : !llvm.i64 + %388 = llvm.mlir.constant(512 : index) : !llvm.i64 + %389 = llvm.mul %50, %388 : !llvm.i64 + %390 = llvm.add %387, %389 : !llvm.i64 + %391 = llvm.mlir.constant(1 : index) : !llvm.i64 + %392 = llvm.mul %364, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.getelementptr %386[%393] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %395 = llvm.load %394 : !llvm.ptr + %396 = llvm.fadd %395, %385 {RelaxedPrecision} : !llvm.float + %397 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %398 = llvm.mlir.constant(0 : index) : !llvm.i64 + %399 = llvm.mlir.constant(512 : index) : !llvm.i64 + %400 = llvm.mul %50, %399 : !llvm.i64 + %401 = llvm.add %398, %400 : !llvm.i64 + %402 = llvm.mlir.constant(1 : index) : !llvm.i64 + %403 = llvm.mul %364, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.getelementptr %397[%404] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %396, %405 : !llvm.ptr + %406 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %408 = llvm.mlir.constant(512 : index) : !llvm.i64 + %409 = llvm.mul %50, %408 : !llvm.i64 + %410 = llvm.add %407, %409 : !llvm.i64 + %411 = llvm.mlir.constant(1 : index) : !llvm.i64 + %412 = llvm.mul %364, %411 : !llvm.i64 + %413 = llvm.add %410, %412 : !llvm.i64 + %414 = llvm.getelementptr %406[%413] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %415 = llvm.load %414 : !llvm.ptr + %416 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %417 = llvm.mlir.constant(0 : index) : !llvm.i64 + %418 = llvm.mlir.constant(512 : index) : !llvm.i64 + %419 = llvm.mul %50, %418 : !llvm.i64 + %420 = llvm.add %417, %419 : !llvm.i64 + %421 = llvm.mlir.constant(1 : index) : !llvm.i64 + %422 = llvm.mul %364, %421 : !llvm.i64 + %423 = llvm.add %420, %422 : !llvm.i64 + %424 = llvm.getelementptr %416[%423] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %415, %424 : !llvm.ptr + %425 = llvm.add %58, %38 : !llvm.i64 + %426 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %427 = llvm.mlir.constant(0 : index) : !llvm.i64 + %428 = llvm.mlir.constant(128 : index) : !llvm.i64 + %429 = llvm.mul %50, %428 : !llvm.i64 + %430 = llvm.add %427, %429 : !llvm.i64 + %431 = llvm.mlir.constant(1 : index) : !llvm.i64 + %432 = llvm.mul %59, %431 : !llvm.i64 + %433 = llvm.add %430, %432 : !llvm.i64 + %434 = llvm.getelementptr %426[%433] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %435 = llvm.load %434 : !llvm.ptr + %436 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %437 = llvm.mlir.constant(0 : index) : !llvm.i64 + %438 = llvm.mlir.constant(512 : index) : !llvm.i64 + %439 = llvm.mul %59, %438 : !llvm.i64 + %440 = llvm.add %437, %439 : !llvm.i64 + %441 = llvm.mlir.constant(1 : index) : !llvm.i64 + %442 = llvm.mul %425, %441 : !llvm.i64 + %443 = llvm.add %440, %442 : !llvm.i64 + %444 = llvm.getelementptr %436[%443] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %445 = llvm.load %444 : !llvm.ptr + %446 = llvm.fmul %435, %445 {RelaxedPrecision} : !llvm.float 
+ %447 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %449 = llvm.mlir.constant(512 : index) : !llvm.i64 + %450 = llvm.mul %50, %449 : !llvm.i64 + %451 = llvm.add %448, %450 : !llvm.i64 + %452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %453 = llvm.mul %425, %452 : !llvm.i64 + %454 = llvm.add %451, %453 : !llvm.i64 + %455 = llvm.getelementptr %447[%454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %456 = llvm.load %455 : !llvm.ptr + %457 = llvm.fadd %456, %446 {RelaxedPrecision} : !llvm.float + %458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %461 = llvm.mul %50, %460 : !llvm.i64 + %462 = llvm.add %459, %461 : !llvm.i64 + %463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %464 = llvm.mul %425, %463 : !llvm.i64 + %465 = llvm.add %462, %464 : !llvm.i64 + %466 = llvm.getelementptr %458[%465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %457, %466 : !llvm.ptr + %467 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %468 = llvm.mlir.constant(0 : index) : !llvm.i64 + %469 = llvm.mlir.constant(512 : index) : !llvm.i64 + %470 = llvm.mul %50, %469 : !llvm.i64 + %471 = llvm.add %468, %470 : !llvm.i64 + %472 = llvm.mlir.constant(1 : index) : !llvm.i64 + %473 = llvm.mul %425, %472 : !llvm.i64 + %474 = llvm.add %471, %473 : !llvm.i64 + %475 = llvm.getelementptr %467[%474] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %476 = llvm.load %475 : !llvm.ptr + %477 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %480 = llvm.mul %50, %479 : !llvm.i64 + %481 = llvm.add %478, %480 : !llvm.i64 + %482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %483 = llvm.mul %425, %482 : !llvm.i64 + %484 = llvm.add %481, %483 : !llvm.i64 + %485 = llvm.getelementptr %477[%484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %476, %485 : !llvm.ptr + %486 = llvm.add %58, %39 : !llvm.i64 + %487 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %489 = llvm.mlir.constant(128 : index) : !llvm.i64 + %490 = llvm.mul %50, %489 : !llvm.i64 + %491 = llvm.add %488, %490 : !llvm.i64 + %492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %493 = llvm.mul %59, %492 : !llvm.i64 + %494 = llvm.add %491, %493 : !llvm.i64 + %495 = llvm.getelementptr %487[%494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %496 = llvm.load %495 : !llvm.ptr + %497 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %500 = llvm.mul %59, %499 : !llvm.i64 + %501 = llvm.add %498, %500 : !llvm.i64 + %502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %503 = llvm.mul %486, %502 : !llvm.i64 + %504 = llvm.add %501, %503 : !llvm.i64 + %505 = llvm.getelementptr %497[%504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %506 = llvm.load %505 : !llvm.ptr + %507 = llvm.fmul %496, %506 {RelaxedPrecision} : !llvm.float + %508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %511 = 
llvm.mul %50, %510 : !llvm.i64 + %512 = llvm.add %509, %511 : !llvm.i64 + %513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %514 = llvm.mul %486, %513 : !llvm.i64 + %515 = llvm.add %512, %514 : !llvm.i64 + %516 = llvm.getelementptr %508[%515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %517 = llvm.load %516 : !llvm.ptr + %518 = llvm.fadd %517, %507 {RelaxedPrecision} : !llvm.float + %519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %522 = llvm.mul %50, %521 : !llvm.i64 + %523 = llvm.add %520, %522 : !llvm.i64 + %524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %525 = llvm.mul %486, %524 : !llvm.i64 + %526 = llvm.add %523, %525 : !llvm.i64 + %527 = llvm.getelementptr %519[%526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %518, %527 : !llvm.ptr + %528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %531 = llvm.mul %50, %530 : !llvm.i64 + %532 = llvm.add %529, %531 : !llvm.i64 + %533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %534 = llvm.mul %486, %533 : !llvm.i64 + %535 = llvm.add %532, %534 : !llvm.i64 + %536 = llvm.getelementptr %528[%535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %537 = llvm.load %536 : !llvm.ptr + %538 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %539 = llvm.mlir.constant(0 : index) : !llvm.i64 + %540 = llvm.mlir.constant(512 : index) : !llvm.i64 + %541 = llvm.mul %50, %540 : !llvm.i64 + %542 = llvm.add %539, %541 : !llvm.i64 + %543 = llvm.mlir.constant(1 : index) : !llvm.i64 + %544 = llvm.mul %486, %543 : !llvm.i64 + %545 = llvm.add %542, %544 : !llvm.i64 + %546 = llvm.getelementptr %538[%545] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %537, %546 : !llvm.ptr + %547 = llvm.add %58, %40 : !llvm.i64 + %548 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %550 = llvm.mlir.constant(128 : index) : !llvm.i64 + %551 = llvm.mul %50, %550 : !llvm.i64 + %552 = llvm.add %549, %551 : !llvm.i64 + %553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %554 = llvm.mul %59, %553 : !llvm.i64 + %555 = llvm.add %552, %554 : !llvm.i64 + %556 = llvm.getelementptr %548[%555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %557 = llvm.load %556 : !llvm.ptr + %558 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %561 = llvm.mul %59, %560 : !llvm.i64 + %562 = llvm.add %559, %561 : !llvm.i64 + %563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %564 = llvm.mul %547, %563 : !llvm.i64 + %565 = llvm.add %562, %564 : !llvm.i64 + %566 = llvm.getelementptr %558[%565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %567 = llvm.load %566 : !llvm.ptr + %568 = llvm.fmul %557, %567 {RelaxedPrecision} : !llvm.float + %569 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %570 = llvm.mlir.constant(0 : index) : !llvm.i64 + %571 = llvm.mlir.constant(512 : index) : !llvm.i64 + %572 = llvm.mul %50, %571 : !llvm.i64 + %573 = llvm.add %570, %572 : !llvm.i64 + %574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %575 = llvm.mul %547, %574 : !llvm.i64 + %576 = llvm.add %573, %575 : !llvm.i64 + %577 = 
llvm.getelementptr %569[%576] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %578 = llvm.load %577 : !llvm.ptr + %579 = llvm.fadd %578, %568 {RelaxedPrecision} : !llvm.float + %580 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %581 = llvm.mlir.constant(0 : index) : !llvm.i64 + %582 = llvm.mlir.constant(512 : index) : !llvm.i64 + %583 = llvm.mul %50, %582 : !llvm.i64 + %584 = llvm.add %581, %583 : !llvm.i64 + %585 = llvm.mlir.constant(1 : index) : !llvm.i64 + %586 = llvm.mul %547, %585 : !llvm.i64 + %587 = llvm.add %584, %586 : !llvm.i64 + %588 = llvm.getelementptr %580[%587] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %579, %588 : !llvm.ptr + %589 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %591 = llvm.mlir.constant(512 : index) : !llvm.i64 + %592 = llvm.mul %50, %591 : !llvm.i64 + %593 = llvm.add %590, %592 : !llvm.i64 + %594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %595 = llvm.mul %547, %594 : !llvm.i64 + %596 = llvm.add %593, %595 : !llvm.i64 + %597 = llvm.getelementptr %589[%596] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %598 = llvm.load %597 : !llvm.ptr + %599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %602 = llvm.mul %50, %601 : !llvm.i64 + %603 = llvm.add %600, %602 : !llvm.i64 + %604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %605 = llvm.mul %547, %604 : !llvm.i64 + %606 = llvm.add %603, %605 : !llvm.i64 + %607 = llvm.getelementptr %599[%606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %598, %607 : !llvm.ptr + %608 = llvm.add %58, %41 : !llvm.i64 + %609 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %610 = llvm.mlir.constant(0 : index) : !llvm.i64 + %611 = llvm.mlir.constant(128 : index) : !llvm.i64 + %612 = llvm.mul %50, %611 : !llvm.i64 + %613 = llvm.add %610, %612 : !llvm.i64 + %614 = llvm.mlir.constant(1 : index) : !llvm.i64 + %615 = llvm.mul %59, %614 : !llvm.i64 + %616 = llvm.add %613, %615 : !llvm.i64 + %617 = llvm.getelementptr %609[%616] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %618 = llvm.load %617 : !llvm.ptr + %619 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %620 = llvm.mlir.constant(0 : index) : !llvm.i64 + %621 = llvm.mlir.constant(512 : index) : !llvm.i64 + %622 = llvm.mul %59, %621 : !llvm.i64 + %623 = llvm.add %620, %622 : !llvm.i64 + %624 = llvm.mlir.constant(1 : index) : !llvm.i64 + %625 = llvm.mul %608, %624 : !llvm.i64 + %626 = llvm.add %623, %625 : !llvm.i64 + %627 = llvm.getelementptr %619[%626] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %628 = llvm.load %627 : !llvm.ptr + %629 = llvm.fmul %618, %628 {RelaxedPrecision} : !llvm.float + %630 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %632 = llvm.mlir.constant(512 : index) : !llvm.i64 + %633 = llvm.mul %50, %632 : !llvm.i64 + %634 = llvm.add %631, %633 : !llvm.i64 + %635 = llvm.mlir.constant(1 : index) : !llvm.i64 + %636 = llvm.mul %608, %635 : !llvm.i64 + %637 = llvm.add %634, %636 : !llvm.i64 + %638 = llvm.getelementptr %630[%637] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %639 = llvm.load %638 : !llvm.ptr + %640 = llvm.fadd %639, %629 {RelaxedPrecision} : !llvm.float + %641 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %642 = llvm.mlir.constant(0 : index) : !llvm.i64 + %643 = llvm.mlir.constant(512 : index) : !llvm.i64 + %644 = llvm.mul %50, %643 : !llvm.i64 + %645 = llvm.add %642, %644 : !llvm.i64 + %646 = llvm.mlir.constant(1 : index) : !llvm.i64 + %647 = llvm.mul %608, %646 : !llvm.i64 + %648 = llvm.add %645, %647 : !llvm.i64 + %649 = llvm.getelementptr %641[%648] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %640, %649 : !llvm.ptr + %650 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %651 = llvm.mlir.constant(0 : index) : !llvm.i64 + %652 = llvm.mlir.constant(512 : index) : !llvm.i64 + %653 = llvm.mul %50, %652 : !llvm.i64 + %654 = llvm.add %651, %653 : !llvm.i64 + %655 = llvm.mlir.constant(1 : index) : !llvm.i64 + %656 = llvm.mul %608, %655 : !llvm.i64 + %657 = llvm.add %654, %656 : !llvm.i64 + %658 = llvm.getelementptr %650[%657] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %659 = llvm.load %658 : !llvm.ptr + %660 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %662 = llvm.mlir.constant(512 : index) : !llvm.i64 + %663 = llvm.mul %50, %662 : !llvm.i64 + %664 = llvm.add %661, %663 : !llvm.i64 + %665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %666 = llvm.mul %608, %665 : !llvm.i64 + %667 = llvm.add %664, %666 : !llvm.i64 + %668 = llvm.getelementptr %660[%667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %659, %668 : !llvm.ptr + %669 = llvm.add %58, %42 : !llvm.i64 + %670 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %672 = llvm.mlir.constant(128 : index) : !llvm.i64 + %673 = llvm.mul %50, %672 : !llvm.i64 + %674 = llvm.add %671, %673 : !llvm.i64 + %675 = llvm.mlir.constant(1 : index) : !llvm.i64 + %676 = llvm.mul %59, %675 : !llvm.i64 + %677 = llvm.add %674, %676 : !llvm.i64 + %678 = llvm.getelementptr %670[%677] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %679 = llvm.load %678 : !llvm.ptr + %680 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %681 = llvm.mlir.constant(0 : index) : !llvm.i64 + %682 = llvm.mlir.constant(512 : index) : !llvm.i64 + %683 = llvm.mul %59, %682 : !llvm.i64 + %684 = llvm.add %681, %683 : !llvm.i64 + %685 = llvm.mlir.constant(1 : index) : !llvm.i64 + %686 = llvm.mul %669, %685 : !llvm.i64 + %687 = llvm.add %684, %686 : !llvm.i64 + %688 = llvm.getelementptr %680[%687] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %689 = llvm.load %688 : !llvm.ptr + %690 = llvm.fmul %679, %689 {RelaxedPrecision} : !llvm.float + %691 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %692 = llvm.mlir.constant(0 : index) : !llvm.i64 + %693 = llvm.mlir.constant(512 : index) : !llvm.i64 + %694 = llvm.mul %50, %693 : !llvm.i64 + %695 = llvm.add %692, %694 : !llvm.i64 + %696 = llvm.mlir.constant(1 : index) : !llvm.i64 + %697 = llvm.mul %669, %696 : !llvm.i64 + %698 = llvm.add %695, %697 : !llvm.i64 + %699 = llvm.getelementptr %691[%698] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %700 = llvm.load %699 : !llvm.ptr + %701 = llvm.fadd %700, %690 {RelaxedPrecision} : !llvm.float + %702 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %703 = llvm.mlir.constant(0 : index) : !llvm.i64 + %704 = llvm.mlir.constant(512 : index) : !llvm.i64 + %705 = llvm.mul %50, %704 : !llvm.i64 + %706 = 
llvm.add %703, %705 : !llvm.i64 + %707 = llvm.mlir.constant(1 : index) : !llvm.i64 + %708 = llvm.mul %669, %707 : !llvm.i64 + %709 = llvm.add %706, %708 : !llvm.i64 + %710 = llvm.getelementptr %702[%709] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %701, %710 : !llvm.ptr + %711 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %712 = llvm.mlir.constant(0 : index) : !llvm.i64 + %713 = llvm.mlir.constant(512 : index) : !llvm.i64 + %714 = llvm.mul %50, %713 : !llvm.i64 + %715 = llvm.add %712, %714 : !llvm.i64 + %716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %717 = llvm.mul %669, %716 : !llvm.i64 + %718 = llvm.add %715, %717 : !llvm.i64 + %719 = llvm.getelementptr %711[%718] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %720 = llvm.load %719 : !llvm.ptr + %721 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %722 = llvm.mlir.constant(0 : index) : !llvm.i64 + %723 = llvm.mlir.constant(512 : index) : !llvm.i64 + %724 = llvm.mul %50, %723 : !llvm.i64 + %725 = llvm.add %722, %724 : !llvm.i64 + %726 = llvm.mlir.constant(1 : index) : !llvm.i64 + %727 = llvm.mul %669, %726 : !llvm.i64 + %728 = llvm.add %725, %727 : !llvm.i64 + %729 = llvm.getelementptr %721[%728] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %720, %729 : !llvm.ptr + %730 = llvm.add %58, %43 : !llvm.i64 + %731 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %732 = llvm.mlir.constant(0 : index) : !llvm.i64 + %733 = llvm.mlir.constant(128 : index) : !llvm.i64 + %734 = llvm.mul %50, %733 : !llvm.i64 + %735 = llvm.add %732, %734 : !llvm.i64 + %736 = llvm.mlir.constant(1 : index) : !llvm.i64 + %737 = llvm.mul %59, %736 : !llvm.i64 + %738 = llvm.add %735, %737 : !llvm.i64 + %739 = llvm.getelementptr %731[%738] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %740 = llvm.load %739 : !llvm.ptr + %741 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %742 = llvm.mlir.constant(0 : index) : !llvm.i64 + %743 = llvm.mlir.constant(512 : index) : !llvm.i64 + %744 = llvm.mul %59, %743 : !llvm.i64 + %745 = llvm.add %742, %744 : !llvm.i64 + %746 = llvm.mlir.constant(1 : index) : !llvm.i64 + %747 = llvm.mul %730, %746 : !llvm.i64 + %748 = llvm.add %745, %747 : !llvm.i64 + %749 = llvm.getelementptr %741[%748] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %750 = llvm.load %749 : !llvm.ptr + %751 = llvm.fmul %740, %750 {RelaxedPrecision} : !llvm.float + %752 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %754 = llvm.mlir.constant(512 : index) : !llvm.i64 + %755 = llvm.mul %50, %754 : !llvm.i64 + %756 = llvm.add %753, %755 : !llvm.i64 + %757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %758 = llvm.mul %730, %757 : !llvm.i64 + %759 = llvm.add %756, %758 : !llvm.i64 + %760 = llvm.getelementptr %752[%759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %761 = llvm.load %760 : !llvm.ptr + %762 = llvm.fadd %761, %751 {RelaxedPrecision} : !llvm.float + %763 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %764 = llvm.mlir.constant(0 : index) : !llvm.i64 + %765 = llvm.mlir.constant(512 : index) : !llvm.i64 + %766 = llvm.mul %50, %765 : !llvm.i64 + %767 = llvm.add %764, %766 : !llvm.i64 + %768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %769 = llvm.mul %730, %768 : !llvm.i64 + %770 = llvm.add %767, %769 : !llvm.i64 + %771 = llvm.getelementptr %763[%770] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + llvm.store %762, %771 : !llvm.ptr + %772 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %773 = llvm.mlir.constant(0 : index) : !llvm.i64 + %774 = llvm.mlir.constant(512 : index) : !llvm.i64 + %775 = llvm.mul %50, %774 : !llvm.i64 + %776 = llvm.add %773, %775 : !llvm.i64 + %777 = llvm.mlir.constant(1 : index) : !llvm.i64 + %778 = llvm.mul %730, %777 : !llvm.i64 + %779 = llvm.add %776, %778 : !llvm.i64 + %780 = llvm.getelementptr %772[%779] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %781 = llvm.load %780 : !llvm.ptr + %782 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %784 = llvm.mlir.constant(512 : index) : !llvm.i64 + %785 = llvm.mul %50, %784 : !llvm.i64 + %786 = llvm.add %783, %785 : !llvm.i64 + %787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %788 = llvm.mul %730, %787 : !llvm.i64 + %789 = llvm.add %786, %788 : !llvm.i64 + %790 = llvm.getelementptr %782[%789] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %781, %790 : !llvm.ptr + %791 = llvm.add %58, %44 : !llvm.i64 + %792 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %794 = llvm.mlir.constant(128 : index) : !llvm.i64 + %795 = llvm.mul %50, %794 : !llvm.i64 + %796 = llvm.add %793, %795 : !llvm.i64 + %797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %798 = llvm.mul %59, %797 : !llvm.i64 + %799 = llvm.add %796, %798 : !llvm.i64 + %800 = llvm.getelementptr %792[%799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %801 = llvm.load %800 : !llvm.ptr + %802 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %803 = llvm.mlir.constant(0 : index) : !llvm.i64 + %804 = llvm.mlir.constant(512 : index) : !llvm.i64 + %805 = llvm.mul %59, %804 : !llvm.i64 + %806 = llvm.add %803, %805 : !llvm.i64 + %807 = llvm.mlir.constant(1 : index) : !llvm.i64 + %808 = llvm.mul %791, %807 : !llvm.i64 + %809 = llvm.add %806, %808 : !llvm.i64 + %810 = llvm.getelementptr %802[%809] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %811 = llvm.load %810 : !llvm.ptr + %812 = llvm.fmul %801, %811 {RelaxedPrecision} : !llvm.float + %813 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %814 = llvm.mlir.constant(0 : index) : !llvm.i64 + %815 = llvm.mlir.constant(512 : index) : !llvm.i64 + %816 = llvm.mul %50, %815 : !llvm.i64 + %817 = llvm.add %814, %816 : !llvm.i64 + %818 = llvm.mlir.constant(1 : index) : !llvm.i64 + %819 = llvm.mul %791, %818 : !llvm.i64 + %820 = llvm.add %817, %819 : !llvm.i64 + %821 = llvm.getelementptr %813[%820] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %822 = llvm.load %821 : !llvm.ptr + %823 = llvm.fadd %822, %812 {RelaxedPrecision} : !llvm.float + %824 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %826 = llvm.mlir.constant(512 : index) : !llvm.i64 + %827 = llvm.mul %50, %826 : !llvm.i64 + %828 = llvm.add %825, %827 : !llvm.i64 + %829 = llvm.mlir.constant(1 : index) : !llvm.i64 + %830 = llvm.mul %791, %829 : !llvm.i64 + %831 = llvm.add %828, %830 : !llvm.i64 + %832 = llvm.getelementptr %824[%831] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %823, %832 : !llvm.ptr + %833 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %834 = llvm.mlir.constant(0 : index) : !llvm.i64 + %835 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %836 = llvm.mul %50, %835 : !llvm.i64 + %837 = llvm.add %834, %836 : !llvm.i64 + %838 = llvm.mlir.constant(1 : index) : !llvm.i64 + %839 = llvm.mul %791, %838 : !llvm.i64 + %840 = llvm.add %837, %839 : !llvm.i64 + %841 = llvm.getelementptr %833[%840] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %842 = llvm.load %841 : !llvm.ptr + %843 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %844 = llvm.mlir.constant(0 : index) : !llvm.i64 + %845 = llvm.mlir.constant(512 : index) : !llvm.i64 + %846 = llvm.mul %50, %845 : !llvm.i64 + %847 = llvm.add %844, %846 : !llvm.i64 + %848 = llvm.mlir.constant(1 : index) : !llvm.i64 + %849 = llvm.mul %791, %848 : !llvm.i64 + %850 = llvm.add %847, %849 : !llvm.i64 + %851 = llvm.getelementptr %843[%850] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %842, %851 : !llvm.ptr + %852 = llvm.add %58, %45 : !llvm.i64 + %853 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %854 = llvm.mlir.constant(0 : index) : !llvm.i64 + %855 = llvm.mlir.constant(128 : index) : !llvm.i64 + %856 = llvm.mul %50, %855 : !llvm.i64 + %857 = llvm.add %854, %856 : !llvm.i64 + %858 = llvm.mlir.constant(1 : index) : !llvm.i64 + %859 = llvm.mul %59, %858 : !llvm.i64 + %860 = llvm.add %857, %859 : !llvm.i64 + %861 = llvm.getelementptr %853[%860] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %862 = llvm.load %861 : !llvm.ptr + %863 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %864 = llvm.mlir.constant(0 : index) : !llvm.i64 + %865 = llvm.mlir.constant(512 : index) : !llvm.i64 + %866 = llvm.mul %59, %865 : !llvm.i64 + %867 = llvm.add %864, %866 : !llvm.i64 + %868 = llvm.mlir.constant(1 : index) : !llvm.i64 + %869 = llvm.mul %852, %868 : !llvm.i64 + %870 = llvm.add %867, %869 : !llvm.i64 + %871 = llvm.getelementptr %863[%870] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %872 = llvm.load %871 : !llvm.ptr + %873 = llvm.fmul %862, %872 {RelaxedPrecision} : !llvm.float + %874 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %875 = llvm.mlir.constant(0 : index) : !llvm.i64 + %876 = llvm.mlir.constant(512 : index) : !llvm.i64 + %877 = llvm.mul %50, %876 : !llvm.i64 + %878 = llvm.add %875, %877 : !llvm.i64 + %879 = llvm.mlir.constant(1 : index) : !llvm.i64 + %880 = llvm.mul %852, %879 : !llvm.i64 + %881 = llvm.add %878, %880 : !llvm.i64 + %882 = llvm.getelementptr %874[%881] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %883 = llvm.load %882 : !llvm.ptr + %884 = llvm.fadd %883, %873 {RelaxedPrecision} : !llvm.float + %885 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %886 = llvm.mlir.constant(0 : index) : !llvm.i64 + %887 = llvm.mlir.constant(512 : index) : !llvm.i64 + %888 = llvm.mul %50, %887 : !llvm.i64 + %889 = llvm.add %886, %888 : !llvm.i64 + %890 = llvm.mlir.constant(1 : index) : !llvm.i64 + %891 = llvm.mul %852, %890 : !llvm.i64 + %892 = llvm.add %889, %891 : !llvm.i64 + %893 = llvm.getelementptr %885[%892] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %884, %893 : !llvm.ptr + %894 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %895 = llvm.mlir.constant(0 : index) : !llvm.i64 + %896 = llvm.mlir.constant(512 : index) : !llvm.i64 + %897 = llvm.mul %50, %896 : !llvm.i64 + %898 = llvm.add %895, %897 : !llvm.i64 + %899 = llvm.mlir.constant(1 : index) : !llvm.i64 + %900 = llvm.mul %852, %899 : !llvm.i64 
+ %901 = llvm.add %898, %900 : !llvm.i64 + %902 = llvm.getelementptr %894[%901] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %903 = llvm.load %902 : !llvm.ptr + %904 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %905 = llvm.mlir.constant(0 : index) : !llvm.i64 + %906 = llvm.mlir.constant(512 : index) : !llvm.i64 + %907 = llvm.mul %50, %906 : !llvm.i64 + %908 = llvm.add %905, %907 : !llvm.i64 + %909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %910 = llvm.mul %852, %909 : !llvm.i64 + %911 = llvm.add %908, %910 : !llvm.i64 + %912 = llvm.getelementptr %904[%911] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %903, %912 : !llvm.ptr + %913 = llvm.add %58, %46 : !llvm.i64 + %914 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %915 = llvm.mlir.constant(0 : index) : !llvm.i64 + %916 = llvm.mlir.constant(128 : index) : !llvm.i64 + %917 = llvm.mul %50, %916 : !llvm.i64 + %918 = llvm.add %915, %917 : !llvm.i64 + %919 = llvm.mlir.constant(1 : index) : !llvm.i64 + %920 = llvm.mul %59, %919 : !llvm.i64 + %921 = llvm.add %918, %920 : !llvm.i64 + %922 = llvm.getelementptr %914[%921] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %923 = llvm.load %922 : !llvm.ptr + %924 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %925 = llvm.mlir.constant(0 : index) : !llvm.i64 + %926 = llvm.mlir.constant(512 : index) : !llvm.i64 + %927 = llvm.mul %59, %926 : !llvm.i64 + %928 = llvm.add %925, %927 : !llvm.i64 + %929 = llvm.mlir.constant(1 : index) : !llvm.i64 + %930 = llvm.mul %913, %929 : !llvm.i64 + %931 = llvm.add %928, %930 : !llvm.i64 + %932 = llvm.getelementptr %924[%931] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %933 = llvm.load %932 : !llvm.ptr + %934 = llvm.fmul %923, %933 {RelaxedPrecision} : !llvm.float + %935 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %936 = llvm.mlir.constant(0 : index) : !llvm.i64 + %937 = llvm.mlir.constant(512 : index) : !llvm.i64 + %938 = llvm.mul %50, %937 : !llvm.i64 + %939 = llvm.add %936, %938 : !llvm.i64 + %940 = llvm.mlir.constant(1 : index) : !llvm.i64 + %941 = llvm.mul %913, %940 : !llvm.i64 + %942 = llvm.add %939, %941 : !llvm.i64 + %943 = llvm.getelementptr %935[%942] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %944 = llvm.load %943 : !llvm.ptr + %945 = llvm.fadd %944, %934 {RelaxedPrecision} : !llvm.float + %946 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %948 = llvm.mlir.constant(512 : index) : !llvm.i64 + %949 = llvm.mul %50, %948 : !llvm.i64 + %950 = llvm.add %947, %949 : !llvm.i64 + %951 = llvm.mlir.constant(1 : index) : !llvm.i64 + %952 = llvm.mul %913, %951 : !llvm.i64 + %953 = llvm.add %950, %952 : !llvm.i64 + %954 = llvm.getelementptr %946[%953] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %945, %954 : !llvm.ptr + %955 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %956 = llvm.mlir.constant(0 : index) : !llvm.i64 + %957 = llvm.mlir.constant(512 : index) : !llvm.i64 + %958 = llvm.mul %50, %957 : !llvm.i64 + %959 = llvm.add %956, %958 : !llvm.i64 + %960 = llvm.mlir.constant(1 : index) : !llvm.i64 + %961 = llvm.mul %913, %960 : !llvm.i64 + %962 = llvm.add %959, %961 : !llvm.i64 + %963 = llvm.getelementptr %955[%962] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %964 = llvm.load %963 : !llvm.ptr + %965 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %966 = llvm.mlir.constant(0 : index) : !llvm.i64 + %967 = llvm.mlir.constant(512 : index) : !llvm.i64 + %968 = llvm.mul %50, %967 : !llvm.i64 + %969 = llvm.add %966, %968 : !llvm.i64 + %970 = llvm.mlir.constant(1 : index) : !llvm.i64 + %971 = llvm.mul %913, %970 : !llvm.i64 + %972 = llvm.add %969, %971 : !llvm.i64 + %973 = llvm.getelementptr %965[%972] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %964, %973 : !llvm.ptr + %974 = llvm.add %58, %47 : !llvm.i64 + %975 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %976 = llvm.mlir.constant(0 : index) : !llvm.i64 + %977 = llvm.mlir.constant(128 : index) : !llvm.i64 + %978 = llvm.mul %50, %977 : !llvm.i64 + %979 = llvm.add %976, %978 : !llvm.i64 + %980 = llvm.mlir.constant(1 : index) : !llvm.i64 + %981 = llvm.mul %59, %980 : !llvm.i64 + %982 = llvm.add %979, %981 : !llvm.i64 + %983 = llvm.getelementptr %975[%982] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %984 = llvm.load %983 : !llvm.ptr + %985 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %986 = llvm.mlir.constant(0 : index) : !llvm.i64 + %987 = llvm.mlir.constant(512 : index) : !llvm.i64 + %988 = llvm.mul %59, %987 : !llvm.i64 + %989 = llvm.add %986, %988 : !llvm.i64 + %990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %991 = llvm.mul %974, %990 : !llvm.i64 + %992 = llvm.add %989, %991 : !llvm.i64 + %993 = llvm.getelementptr %985[%992] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %994 = llvm.load %993 : !llvm.ptr + %995 = llvm.fmul %984, %994 {RelaxedPrecision} : !llvm.float + %996 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %997 = llvm.mlir.constant(0 : index) : !llvm.i64 + %998 = llvm.mlir.constant(512 : index) : !llvm.i64 + %999 = llvm.mul %50, %998 : !llvm.i64 + %1000 = llvm.add %997, %999 : !llvm.i64 + %1001 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1002 = llvm.mul %974, %1001 : !llvm.i64 + %1003 = llvm.add %1000, %1002 : !llvm.i64 + %1004 = llvm.getelementptr %996[%1003] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1005 = llvm.load %1004 : !llvm.ptr + %1006 = llvm.fadd %1005, %995 {RelaxedPrecision} : !llvm.float + %1007 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1009 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1010 = llvm.mul %50, %1009 : !llvm.i64 + %1011 = llvm.add %1008, %1010 : !llvm.i64 + %1012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1013 = llvm.mul %974, %1012 : !llvm.i64 + %1014 = llvm.add %1011, %1013 : !llvm.i64 + %1015 = llvm.getelementptr %1007[%1014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1006, %1015 : !llvm.ptr + %1016 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1017 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1018 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1019 = llvm.mul %50, %1018 : !llvm.i64 + %1020 = llvm.add %1017, %1019 : !llvm.i64 + %1021 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1022 = llvm.mul %974, %1021 : !llvm.i64 + %1023 = llvm.add %1020, %1022 : !llvm.i64 + %1024 = llvm.getelementptr %1016[%1023] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1025 = llvm.load %1024 : !llvm.ptr + %1026 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1027 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1028 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1029 = llvm.mul 
%50, %1028 : !llvm.i64 + %1030 = llvm.add %1027, %1029 : !llvm.i64 + %1031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1032 = llvm.mul %974, %1031 : !llvm.i64 + %1033 = llvm.add %1030, %1032 : !llvm.i64 + %1034 = llvm.getelementptr %1026[%1033] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1025, %1034 : !llvm.ptr + %1035 = llvm.add %50, %33 : !llvm.i64 + %1036 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1037 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1038 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1039 = llvm.mul %1035, %1038 : !llvm.i64 + %1040 = llvm.add %1037, %1039 : !llvm.i64 + %1041 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1042 = llvm.mul %59, %1041 : !llvm.i64 + %1043 = llvm.add %1040, %1042 : !llvm.i64 + %1044 = llvm.getelementptr %1036[%1043] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1045 = llvm.load %1044 : !llvm.ptr + %1046 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1048 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1049 = llvm.mul %59, %1048 : !llvm.i64 + %1050 = llvm.add %1047, %1049 : !llvm.i64 + %1051 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1052 = llvm.mul %58, %1051 : !llvm.i64 + %1053 = llvm.add %1050, %1052 : !llvm.i64 + %1054 = llvm.getelementptr %1046[%1053] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1055 = llvm.load %1054 : !llvm.ptr + %1056 = llvm.fmul %1045, %1055 {RelaxedPrecision} : !llvm.float + %1057 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1059 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1060 = llvm.mul %1035, %1059 : !llvm.i64 + %1061 = llvm.add %1058, %1060 : !llvm.i64 + %1062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1063 = llvm.mul %58, %1062 : !llvm.i64 + %1064 = llvm.add %1061, %1063 : !llvm.i64 + %1065 = llvm.getelementptr %1057[%1064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1066 = llvm.load %1065 : !llvm.ptr + %1067 = llvm.fadd %1066, %1056 {RelaxedPrecision} : !llvm.float + %1068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1071 = llvm.mul %1035, %1070 : !llvm.i64 + %1072 = llvm.add %1069, %1071 : !llvm.i64 + %1073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1074 = llvm.mul %58, %1073 : !llvm.i64 + %1075 = llvm.add %1072, %1074 : !llvm.i64 + %1076 = llvm.getelementptr %1068[%1075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1067, %1076 : !llvm.ptr + %1077 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1078 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1079 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1080 = llvm.mul %1035, %1079 : !llvm.i64 + %1081 = llvm.add %1078, %1080 : !llvm.i64 + %1082 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1083 = llvm.mul %58, %1082 : !llvm.i64 + %1084 = llvm.add %1081, %1083 : !llvm.i64 + %1085 = llvm.getelementptr %1077[%1084] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1086 = llvm.load %1085 : !llvm.ptr + %1087 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1088 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1089 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1090 = llvm.mul %1035, %1089 : !llvm.i64 + %1091 = llvm.add %1088, %1090 : !llvm.i64 + %1092 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %1093 = llvm.mul %58, %1092 : !llvm.i64 + %1094 = llvm.add %1091, %1093 : !llvm.i64 + %1095 = llvm.getelementptr %1087[%1094] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1086, %1095 : !llvm.ptr + %1096 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1098 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1099 = llvm.mul %1035, %1098 : !llvm.i64 + %1100 = llvm.add %1097, %1099 : !llvm.i64 + %1101 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1102 = llvm.mul %59, %1101 : !llvm.i64 + %1103 = llvm.add %1100, %1102 : !llvm.i64 + %1104 = llvm.getelementptr %1096[%1103] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1105 = llvm.load %1104 : !llvm.ptr + %1106 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1108 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1109 = llvm.mul %59, %1108 : !llvm.i64 + %1110 = llvm.add %1107, %1109 : !llvm.i64 + %1111 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1112 = llvm.mul %120, %1111 : !llvm.i64 + %1113 = llvm.add %1110, %1112 : !llvm.i64 + %1114 = llvm.getelementptr %1106[%1113] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1115 = llvm.load %1114 : !llvm.ptr + %1116 = llvm.fmul %1105, %1115 {RelaxedPrecision} : !llvm.float + %1117 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1120 = llvm.mul %1035, %1119 : !llvm.i64 + %1121 = llvm.add %1118, %1120 : !llvm.i64 + %1122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1123 = llvm.mul %120, %1122 : !llvm.i64 + %1124 = llvm.add %1121, %1123 : !llvm.i64 + %1125 = llvm.getelementptr %1117[%1124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1126 = llvm.load %1125 : !llvm.ptr + %1127 = llvm.fadd %1126, %1116 {RelaxedPrecision} : !llvm.float + %1128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1131 = llvm.mul %1035, %1130 : !llvm.i64 + %1132 = llvm.add %1129, %1131 : !llvm.i64 + %1133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1134 = llvm.mul %120, %1133 : !llvm.i64 + %1135 = llvm.add %1132, %1134 : !llvm.i64 + %1136 = llvm.getelementptr %1128[%1135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1127, %1136 : !llvm.ptr + %1137 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1140 = llvm.mul %1035, %1139 : !llvm.i64 + %1141 = llvm.add %1138, %1140 : !llvm.i64 + %1142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1143 = llvm.mul %120, %1142 : !llvm.i64 + %1144 = llvm.add %1141, %1143 : !llvm.i64 + %1145 = llvm.getelementptr %1137[%1144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1146 = llvm.load %1145 : !llvm.ptr + %1147 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1149 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1150 = llvm.mul %1035, %1149 : !llvm.i64 + %1151 = llvm.add %1148, %1150 : !llvm.i64 + %1152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1153 = llvm.mul %120, %1152 : !llvm.i64 + %1154 = llvm.add %1151, %1153 : 
!llvm.i64 + %1155 = llvm.getelementptr %1147[%1154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1146, %1155 : !llvm.ptr + %1156 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1157 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1158 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1159 = llvm.mul %1035, %1158 : !llvm.i64 + %1160 = llvm.add %1157, %1159 : !llvm.i64 + %1161 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1162 = llvm.mul %59, %1161 : !llvm.i64 + %1163 = llvm.add %1160, %1162 : !llvm.i64 + %1164 = llvm.getelementptr %1156[%1163] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1165 = llvm.load %1164 : !llvm.ptr + %1166 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1167 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1168 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1169 = llvm.mul %59, %1168 : !llvm.i64 + %1170 = llvm.add %1167, %1169 : !llvm.i64 + %1171 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1172 = llvm.mul %181, %1171 : !llvm.i64 + %1173 = llvm.add %1170, %1172 : !llvm.i64 + %1174 = llvm.getelementptr %1166[%1173] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1175 = llvm.load %1174 : !llvm.ptr + %1176 = llvm.fmul %1165, %1175 {RelaxedPrecision} : !llvm.float + %1177 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1179 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1180 = llvm.mul %1035, %1179 : !llvm.i64 + %1181 = llvm.add %1178, %1180 : !llvm.i64 + %1182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1183 = llvm.mul %181, %1182 : !llvm.i64 + %1184 = llvm.add %1181, %1183 : !llvm.i64 + %1185 = llvm.getelementptr %1177[%1184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1186 = llvm.load %1185 : !llvm.ptr + %1187 = llvm.fadd %1186, %1176 {RelaxedPrecision} : !llvm.float + %1188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1191 = llvm.mul %1035, %1190 : !llvm.i64 + %1192 = llvm.add %1189, %1191 : !llvm.i64 + %1193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1194 = llvm.mul %181, %1193 : !llvm.i64 + %1195 = llvm.add %1192, %1194 : !llvm.i64 + %1196 = llvm.getelementptr %1188[%1195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1187, %1196 : !llvm.ptr + %1197 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1198 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1199 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1200 = llvm.mul %1035, %1199 : !llvm.i64 + %1201 = llvm.add %1198, %1200 : !llvm.i64 + %1202 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1203 = llvm.mul %181, %1202 : !llvm.i64 + %1204 = llvm.add %1201, %1203 : !llvm.i64 + %1205 = llvm.getelementptr %1197[%1204] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1206 = llvm.load %1205 : !llvm.ptr + %1207 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1208 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1209 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1210 = llvm.mul %1035, %1209 : !llvm.i64 + %1211 = llvm.add %1208, %1210 : !llvm.i64 + %1212 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1213 = llvm.mul %181, %1212 : !llvm.i64 + %1214 = llvm.add %1211, %1213 : !llvm.i64 + %1215 = llvm.getelementptr %1207[%1214] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1206, %1215 : 
!llvm.ptr + %1216 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1217 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1218 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1219 = llvm.mul %1035, %1218 : !llvm.i64 + %1220 = llvm.add %1217, %1219 : !llvm.i64 + %1221 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1222 = llvm.mul %59, %1221 : !llvm.i64 + %1223 = llvm.add %1220, %1222 : !llvm.i64 + %1224 = llvm.getelementptr %1216[%1223] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1225 = llvm.load %1224 : !llvm.ptr + %1226 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1227 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1228 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1229 = llvm.mul %59, %1228 : !llvm.i64 + %1230 = llvm.add %1227, %1229 : !llvm.i64 + %1231 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1232 = llvm.mul %242, %1231 : !llvm.i64 + %1233 = llvm.add %1230, %1232 : !llvm.i64 + %1234 = llvm.getelementptr %1226[%1233] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1235 = llvm.load %1234 : !llvm.ptr + %1236 = llvm.fmul %1225, %1235 {RelaxedPrecision} : !llvm.float + %1237 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1239 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1240 = llvm.mul %1035, %1239 : !llvm.i64 + %1241 = llvm.add %1238, %1240 : !llvm.i64 + %1242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1243 = llvm.mul %242, %1242 : !llvm.i64 + %1244 = llvm.add %1241, %1243 : !llvm.i64 + %1245 = llvm.getelementptr %1237[%1244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1246 = llvm.load %1245 : !llvm.ptr + %1247 = llvm.fadd %1246, %1236 {RelaxedPrecision} : !llvm.float + %1248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1251 = llvm.mul %1035, %1250 : !llvm.i64 + %1252 = llvm.add %1249, %1251 : !llvm.i64 + %1253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1254 = llvm.mul %242, %1253 : !llvm.i64 + %1255 = llvm.add %1252, %1254 : !llvm.i64 + %1256 = llvm.getelementptr %1248[%1255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1247, %1256 : !llvm.ptr + %1257 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1258 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1259 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1260 = llvm.mul %1035, %1259 : !llvm.i64 + %1261 = llvm.add %1258, %1260 : !llvm.i64 + %1262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1263 = llvm.mul %242, %1262 : !llvm.i64 + %1264 = llvm.add %1261, %1263 : !llvm.i64 + %1265 = llvm.getelementptr %1257[%1264] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1266 = llvm.load %1265 : !llvm.ptr + %1267 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1269 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1270 = llvm.mul %1035, %1269 : !llvm.i64 + %1271 = llvm.add %1268, %1270 : !llvm.i64 + %1272 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1273 = llvm.mul %242, %1272 : !llvm.i64 + %1274 = llvm.add %1271, %1273 : !llvm.i64 + %1275 = llvm.getelementptr %1267[%1274] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1266, %1275 : !llvm.ptr + %1276 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1277 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %1278 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1279 = llvm.mul %1035, %1278 : !llvm.i64 + %1280 = llvm.add %1277, %1279 : !llvm.i64 + %1281 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1282 = llvm.mul %59, %1281 : !llvm.i64 + %1283 = llvm.add %1280, %1282 : !llvm.i64 + %1284 = llvm.getelementptr %1276[%1283] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1285 = llvm.load %1284 : !llvm.ptr + %1286 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1288 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1289 = llvm.mul %59, %1288 : !llvm.i64 + %1290 = llvm.add %1287, %1289 : !llvm.i64 + %1291 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1292 = llvm.mul %303, %1291 : !llvm.i64 + %1293 = llvm.add %1290, %1292 : !llvm.i64 + %1294 = llvm.getelementptr %1286[%1293] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1295 = llvm.load %1294 : !llvm.ptr + %1296 = llvm.fmul %1285, %1295 {RelaxedPrecision} : !llvm.float + %1297 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1299 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1300 = llvm.mul %1035, %1299 : !llvm.i64 + %1301 = llvm.add %1298, %1300 : !llvm.i64 + %1302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1303 = llvm.mul %303, %1302 : !llvm.i64 + %1304 = llvm.add %1301, %1303 : !llvm.i64 + %1305 = llvm.getelementptr %1297[%1304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1306 = llvm.load %1305 : !llvm.ptr + %1307 = llvm.fadd %1306, %1296 {RelaxedPrecision} : !llvm.float + %1308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1311 = llvm.mul %1035, %1310 : !llvm.i64 + %1312 = llvm.add %1309, %1311 : !llvm.i64 + %1313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1314 = llvm.mul %303, %1313 : !llvm.i64 + %1315 = llvm.add %1312, %1314 : !llvm.i64 + %1316 = llvm.getelementptr %1308[%1315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1307, %1316 : !llvm.ptr + %1317 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1318 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1319 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1320 = llvm.mul %1035, %1319 : !llvm.i64 + %1321 = llvm.add %1318, %1320 : !llvm.i64 + %1322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1323 = llvm.mul %303, %1322 : !llvm.i64 + %1324 = llvm.add %1321, %1323 : !llvm.i64 + %1325 = llvm.getelementptr %1317[%1324] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1326 = llvm.load %1325 : !llvm.ptr + %1327 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1328 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1329 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1330 = llvm.mul %1035, %1329 : !llvm.i64 + %1331 = llvm.add %1328, %1330 : !llvm.i64 + %1332 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1333 = llvm.mul %303, %1332 : !llvm.i64 + %1334 = llvm.add %1331, %1333 : !llvm.i64 + %1335 = llvm.getelementptr %1327[%1334] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1326, %1335 : !llvm.ptr + %1336 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1337 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1338 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1339 = llvm.mul %1035, 
%1338 : !llvm.i64 + %1340 = llvm.add %1337, %1339 : !llvm.i64 + %1341 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1342 = llvm.mul %59, %1341 : !llvm.i64 + %1343 = llvm.add %1340, %1342 : !llvm.i64 + %1344 = llvm.getelementptr %1336[%1343] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1345 = llvm.load %1344 : !llvm.ptr + %1346 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1347 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1348 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1349 = llvm.mul %59, %1348 : !llvm.i64 + %1350 = llvm.add %1347, %1349 : !llvm.i64 + %1351 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1352 = llvm.mul %364, %1351 : !llvm.i64 + %1353 = llvm.add %1350, %1352 : !llvm.i64 + %1354 = llvm.getelementptr %1346[%1353] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1355 = llvm.load %1354 : !llvm.ptr + %1356 = llvm.fmul %1345, %1355 {RelaxedPrecision} : !llvm.float + %1357 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1359 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1360 = llvm.mul %1035, %1359 : !llvm.i64 + %1361 = llvm.add %1358, %1360 : !llvm.i64 + %1362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1363 = llvm.mul %364, %1362 : !llvm.i64 + %1364 = llvm.add %1361, %1363 : !llvm.i64 + %1365 = llvm.getelementptr %1357[%1364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1366 = llvm.load %1365 : !llvm.ptr + %1367 = llvm.fadd %1366, %1356 {RelaxedPrecision} : !llvm.float + %1368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1371 = llvm.mul %1035, %1370 : !llvm.i64 + %1372 = llvm.add %1369, %1371 : !llvm.i64 + %1373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1374 = llvm.mul %364, %1373 : !llvm.i64 + %1375 = llvm.add %1372, %1374 : !llvm.i64 + %1376 = llvm.getelementptr %1368[%1375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1367, %1376 : !llvm.ptr + %1377 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1378 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1379 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1380 = llvm.mul %1035, %1379 : !llvm.i64 + %1381 = llvm.add %1378, %1380 : !llvm.i64 + %1382 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1383 = llvm.mul %364, %1382 : !llvm.i64 + %1384 = llvm.add %1381, %1383 : !llvm.i64 + %1385 = llvm.getelementptr %1377[%1384] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1386 = llvm.load %1385 : !llvm.ptr + %1387 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1389 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1390 = llvm.mul %1035, %1389 : !llvm.i64 + %1391 = llvm.add %1388, %1390 : !llvm.i64 + %1392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1393 = llvm.mul %364, %1392 : !llvm.i64 + %1394 = llvm.add %1391, %1393 : !llvm.i64 + %1395 = llvm.getelementptr %1387[%1394] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1386, %1395 : !llvm.ptr + %1396 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1397 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1398 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1399 = llvm.mul %1035, %1398 : !llvm.i64 + %1400 = llvm.add %1397, %1399 : !llvm.i64 + %1401 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1402 = 
llvm.mul %59, %1401 : !llvm.i64 + %1403 = llvm.add %1400, %1402 : !llvm.i64 + %1404 = llvm.getelementptr %1396[%1403] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1405 = llvm.load %1404 : !llvm.ptr + %1406 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1408 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1409 = llvm.mul %59, %1408 : !llvm.i64 + %1410 = llvm.add %1407, %1409 : !llvm.i64 + %1411 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1412 = llvm.mul %425, %1411 : !llvm.i64 + %1413 = llvm.add %1410, %1412 : !llvm.i64 + %1414 = llvm.getelementptr %1406[%1413] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1415 = llvm.load %1414 : !llvm.ptr + %1416 = llvm.fmul %1405, %1415 {RelaxedPrecision} : !llvm.float + %1417 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1419 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1420 = llvm.mul %1035, %1419 : !llvm.i64 + %1421 = llvm.add %1418, %1420 : !llvm.i64 + %1422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1423 = llvm.mul %425, %1422 : !llvm.i64 + %1424 = llvm.add %1421, %1423 : !llvm.i64 + %1425 = llvm.getelementptr %1417[%1424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1426 = llvm.load %1425 : !llvm.ptr + %1427 = llvm.fadd %1426, %1416 {RelaxedPrecision} : !llvm.float + %1428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1431 = llvm.mul %1035, %1430 : !llvm.i64 + %1432 = llvm.add %1429, %1431 : !llvm.i64 + %1433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1434 = llvm.mul %425, %1433 : !llvm.i64 + %1435 = llvm.add %1432, %1434 : !llvm.i64 + %1436 = llvm.getelementptr %1428[%1435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1427, %1436 : !llvm.ptr + %1437 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1438 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1439 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1440 = llvm.mul %1035, %1439 : !llvm.i64 + %1441 = llvm.add %1438, %1440 : !llvm.i64 + %1442 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1443 = llvm.mul %425, %1442 : !llvm.i64 + %1444 = llvm.add %1441, %1443 : !llvm.i64 + %1445 = llvm.getelementptr %1437[%1444] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1446 = llvm.load %1445 : !llvm.ptr + %1447 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1449 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1450 = llvm.mul %1035, %1449 : !llvm.i64 + %1451 = llvm.add %1448, %1450 : !llvm.i64 + %1452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1453 = llvm.mul %425, %1452 : !llvm.i64 + %1454 = llvm.add %1451, %1453 : !llvm.i64 + %1455 = llvm.getelementptr %1447[%1454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1446, %1455 : !llvm.ptr + %1456 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1457 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1458 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1459 = llvm.mul %1035, %1458 : !llvm.i64 + %1460 = llvm.add %1457, %1459 : !llvm.i64 + %1461 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1462 = llvm.mul %59, %1461 : !llvm.i64 + %1463 = llvm.add %1460, %1462 : !llvm.i64 + %1464 = llvm.getelementptr %1456[%1463] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1465 = llvm.load %1464 : !llvm.ptr + %1466 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1467 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1468 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1469 = llvm.mul %59, %1468 : !llvm.i64 + %1470 = llvm.add %1467, %1469 : !llvm.i64 + %1471 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1472 = llvm.mul %486, %1471 : !llvm.i64 + %1473 = llvm.add %1470, %1472 : !llvm.i64 + %1474 = llvm.getelementptr %1466[%1473] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1475 = llvm.load %1474 : !llvm.ptr + %1476 = llvm.fmul %1465, %1475 {RelaxedPrecision} : !llvm.float + %1477 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1480 = llvm.mul %1035, %1479 : !llvm.i64 + %1481 = llvm.add %1478, %1480 : !llvm.i64 + %1482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1483 = llvm.mul %486, %1482 : !llvm.i64 + %1484 = llvm.add %1481, %1483 : !llvm.i64 + %1485 = llvm.getelementptr %1477[%1484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1486 = llvm.load %1485 : !llvm.ptr + %1487 = llvm.fadd %1486, %1476 {RelaxedPrecision} : !llvm.float + %1488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1491 = llvm.mul %1035, %1490 : !llvm.i64 + %1492 = llvm.add %1489, %1491 : !llvm.i64 + %1493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1494 = llvm.mul %486, %1493 : !llvm.i64 + %1495 = llvm.add %1492, %1494 : !llvm.i64 + %1496 = llvm.getelementptr %1488[%1495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1487, %1496 : !llvm.ptr + %1497 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1500 = llvm.mul %1035, %1499 : !llvm.i64 + %1501 = llvm.add %1498, %1500 : !llvm.i64 + %1502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1503 = llvm.mul %486, %1502 : !llvm.i64 + %1504 = llvm.add %1501, %1503 : !llvm.i64 + %1505 = llvm.getelementptr %1497[%1504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1506 = llvm.load %1505 : !llvm.ptr + %1507 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1508 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1509 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1510 = llvm.mul %1035, %1509 : !llvm.i64 + %1511 = llvm.add %1508, %1510 : !llvm.i64 + %1512 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1513 = llvm.mul %486, %1512 : !llvm.i64 + %1514 = llvm.add %1511, %1513 : !llvm.i64 + %1515 = llvm.getelementptr %1507[%1514] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1506, %1515 : !llvm.ptr + %1516 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1518 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1519 = llvm.mul %1035, %1518 : !llvm.i64 + %1520 = llvm.add %1517, %1519 : !llvm.i64 + %1521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1522 = llvm.mul %59, %1521 : !llvm.i64 + %1523 = llvm.add %1520, %1522 : !llvm.i64 + %1524 = llvm.getelementptr %1516[%1523] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1525 = llvm.load %1524 : !llvm.ptr + %1526 = llvm.extractvalue %15[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1529 = llvm.mul %59, %1528 : !llvm.i64 + %1530 = llvm.add %1527, %1529 : !llvm.i64 + %1531 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1532 = llvm.mul %547, %1531 : !llvm.i64 + %1533 = llvm.add %1530, %1532 : !llvm.i64 + %1534 = llvm.getelementptr %1526[%1533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1535 = llvm.load %1534 : !llvm.ptr + %1536 = llvm.fmul %1525, %1535 {RelaxedPrecision} : !llvm.float + %1537 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1539 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1540 = llvm.mul %1035, %1539 : !llvm.i64 + %1541 = llvm.add %1538, %1540 : !llvm.i64 + %1542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1543 = llvm.mul %547, %1542 : !llvm.i64 + %1544 = llvm.add %1541, %1543 : !llvm.i64 + %1545 = llvm.getelementptr %1537[%1544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1546 = llvm.load %1545 : !llvm.ptr + %1547 = llvm.fadd %1546, %1536 {RelaxedPrecision} : !llvm.float + %1548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1551 = llvm.mul %1035, %1550 : !llvm.i64 + %1552 = llvm.add %1549, %1551 : !llvm.i64 + %1553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1554 = llvm.mul %547, %1553 : !llvm.i64 + %1555 = llvm.add %1552, %1554 : !llvm.i64 + %1556 = llvm.getelementptr %1548[%1555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1547, %1556 : !llvm.ptr + %1557 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1558 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1559 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1560 = llvm.mul %1035, %1559 : !llvm.i64 + %1561 = llvm.add %1558, %1560 : !llvm.i64 + %1562 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1563 = llvm.mul %547, %1562 : !llvm.i64 + %1564 = llvm.add %1561, %1563 : !llvm.i64 + %1565 = llvm.getelementptr %1557[%1564] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1566 = llvm.load %1565 : !llvm.ptr + %1567 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1568 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1569 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1570 = llvm.mul %1035, %1569 : !llvm.i64 + %1571 = llvm.add %1568, %1570 : !llvm.i64 + %1572 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1573 = llvm.mul %547, %1572 : !llvm.i64 + %1574 = llvm.add %1571, %1573 : !llvm.i64 + %1575 = llvm.getelementptr %1567[%1574] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1566, %1575 : !llvm.ptr + %1576 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1577 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1578 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1579 = llvm.mul %1035, %1578 : !llvm.i64 + %1580 = llvm.add %1577, %1579 : !llvm.i64 + %1581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1582 = llvm.mul %59, %1581 : !llvm.i64 + %1583 = llvm.add %1580, %1582 : !llvm.i64 + %1584 = llvm.getelementptr %1576[%1583] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1585 = llvm.load %1584 : !llvm.ptr + %1586 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1587 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1588 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %1589 = llvm.mul %59, %1588 : !llvm.i64 + %1590 = llvm.add %1587, %1589 : !llvm.i64 + %1591 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1592 = llvm.mul %608, %1591 : !llvm.i64 + %1593 = llvm.add %1590, %1592 : !llvm.i64 + %1594 = llvm.getelementptr %1586[%1593] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1595 = llvm.load %1594 : !llvm.ptr + %1596 = llvm.fmul %1585, %1595 {RelaxedPrecision} : !llvm.float + %1597 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1599 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1600 = llvm.mul %1035, %1599 : !llvm.i64 + %1601 = llvm.add %1598, %1600 : !llvm.i64 + %1602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1603 = llvm.mul %608, %1602 : !llvm.i64 + %1604 = llvm.add %1601, %1603 : !llvm.i64 + %1605 = llvm.getelementptr %1597[%1604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1606 = llvm.load %1605 : !llvm.ptr + %1607 = llvm.fadd %1606, %1596 {RelaxedPrecision} : !llvm.float + %1608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1611 = llvm.mul %1035, %1610 : !llvm.i64 + %1612 = llvm.add %1609, %1611 : !llvm.i64 + %1613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1614 = llvm.mul %608, %1613 : !llvm.i64 + %1615 = llvm.add %1612, %1614 : !llvm.i64 + %1616 = llvm.getelementptr %1608[%1615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1607, %1616 : !llvm.ptr + %1617 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1618 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1619 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1620 = llvm.mul %1035, %1619 : !llvm.i64 + %1621 = llvm.add %1618, %1620 : !llvm.i64 + %1622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1623 = llvm.mul %608, %1622 : !llvm.i64 + %1624 = llvm.add %1621, %1623 : !llvm.i64 + %1625 = llvm.getelementptr %1617[%1624] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1626 = llvm.load %1625 : !llvm.ptr + %1627 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1628 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1629 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1630 = llvm.mul %1035, %1629 : !llvm.i64 + %1631 = llvm.add %1628, %1630 : !llvm.i64 + %1632 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1633 = llvm.mul %608, %1632 : !llvm.i64 + %1634 = llvm.add %1631, %1633 : !llvm.i64 + %1635 = llvm.getelementptr %1627[%1634] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1626, %1635 : !llvm.ptr + %1636 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1637 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1638 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1639 = llvm.mul %1035, %1638 : !llvm.i64 + %1640 = llvm.add %1637, %1639 : !llvm.i64 + %1641 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1642 = llvm.mul %59, %1641 : !llvm.i64 + %1643 = llvm.add %1640, %1642 : !llvm.i64 + %1644 = llvm.getelementptr %1636[%1643] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1645 = llvm.load %1644 : !llvm.ptr + %1646 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1647 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1648 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1649 = llvm.mul %59, %1648 : !llvm.i64 + %1650 = llvm.add %1647, %1649 : 
!llvm.i64 + %1651 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1652 = llvm.mul %669, %1651 : !llvm.i64 + %1653 = llvm.add %1650, %1652 : !llvm.i64 + %1654 = llvm.getelementptr %1646[%1653] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1655 = llvm.load %1654 : !llvm.ptr + %1656 = llvm.fmul %1645, %1655 {RelaxedPrecision} : !llvm.float + %1657 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1659 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1660 = llvm.mul %1035, %1659 : !llvm.i64 + %1661 = llvm.add %1658, %1660 : !llvm.i64 + %1662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1663 = llvm.mul %669, %1662 : !llvm.i64 + %1664 = llvm.add %1661, %1663 : !llvm.i64 + %1665 = llvm.getelementptr %1657[%1664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1666 = llvm.load %1665 : !llvm.ptr + %1667 = llvm.fadd %1666, %1656 {RelaxedPrecision} : !llvm.float + %1668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1671 = llvm.mul %1035, %1670 : !llvm.i64 + %1672 = llvm.add %1669, %1671 : !llvm.i64 + %1673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1674 = llvm.mul %669, %1673 : !llvm.i64 + %1675 = llvm.add %1672, %1674 : !llvm.i64 + %1676 = llvm.getelementptr %1668[%1675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1667, %1676 : !llvm.ptr + %1677 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1680 = llvm.mul %1035, %1679 : !llvm.i64 + %1681 = llvm.add %1678, %1680 : !llvm.i64 + %1682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1683 = llvm.mul %669, %1682 : !llvm.i64 + %1684 = llvm.add %1681, %1683 : !llvm.i64 + %1685 = llvm.getelementptr %1677[%1684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1686 = llvm.load %1685 : !llvm.ptr + %1687 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1688 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1689 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1690 = llvm.mul %1035, %1689 : !llvm.i64 + %1691 = llvm.add %1688, %1690 : !llvm.i64 + %1692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1693 = llvm.mul %669, %1692 : !llvm.i64 + %1694 = llvm.add %1691, %1693 : !llvm.i64 + %1695 = llvm.getelementptr %1687[%1694] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1686, %1695 : !llvm.ptr + %1696 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1697 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1698 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1699 = llvm.mul %1035, %1698 : !llvm.i64 + %1700 = llvm.add %1697, %1699 : !llvm.i64 + %1701 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1702 = llvm.mul %59, %1701 : !llvm.i64 + %1703 = llvm.add %1700, %1702 : !llvm.i64 + %1704 = llvm.getelementptr %1696[%1703] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1705 = llvm.load %1704 : !llvm.ptr + %1706 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1708 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1709 = llvm.mul %59, %1708 : !llvm.i64 + %1710 = llvm.add %1707, %1709 : !llvm.i64 + %1711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1712 = llvm.mul %730, %1711 : !llvm.i64 + %1713 = llvm.add 
%1710, %1712 : !llvm.i64 + %1714 = llvm.getelementptr %1706[%1713] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1715 = llvm.load %1714 : !llvm.ptr + %1716 = llvm.fmul %1705, %1715 {RelaxedPrecision} : !llvm.float + %1717 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1718 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1719 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1720 = llvm.mul %1035, %1719 : !llvm.i64 + %1721 = llvm.add %1718, %1720 : !llvm.i64 + %1722 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1723 = llvm.mul %730, %1722 : !llvm.i64 + %1724 = llvm.add %1721, %1723 : !llvm.i64 + %1725 = llvm.getelementptr %1717[%1724] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1726 = llvm.load %1725 : !llvm.ptr + %1727 = llvm.fadd %1726, %1716 {RelaxedPrecision} : !llvm.float + %1728 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1729 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1730 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1731 = llvm.mul %1035, %1730 : !llvm.i64 + %1732 = llvm.add %1729, %1731 : !llvm.i64 + %1733 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1734 = llvm.mul %730, %1733 : !llvm.i64 + %1735 = llvm.add %1732, %1734 : !llvm.i64 + %1736 = llvm.getelementptr %1728[%1735] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1727, %1736 : !llvm.ptr + %1737 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1738 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1739 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1740 = llvm.mul %1035, %1739 : !llvm.i64 + %1741 = llvm.add %1738, %1740 : !llvm.i64 + %1742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1743 = llvm.mul %730, %1742 : !llvm.i64 + %1744 = llvm.add %1741, %1743 : !llvm.i64 + %1745 = llvm.getelementptr %1737[%1744] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1746 = llvm.load %1745 : !llvm.ptr + %1747 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1749 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1750 = llvm.mul %1035, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %730, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1747[%1754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1746, %1755 : !llvm.ptr + %1756 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1759 = llvm.mul %1035, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %59, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1765 = llvm.load %1764 : !llvm.ptr + %1766 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1767 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1768 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1769 = llvm.mul %59, %1768 : !llvm.i64 + %1770 = llvm.add %1767, %1769 : !llvm.i64 + %1771 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1772 = llvm.mul %791, %1771 : !llvm.i64 + %1773 = llvm.add %1770, %1772 : !llvm.i64 + %1774 = llvm.getelementptr %1766[%1773] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1775 = llvm.load 
%1774 : !llvm.ptr + %1776 = llvm.fmul %1765, %1775 {RelaxedPrecision} : !llvm.float + %1777 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1779 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1780 = llvm.mul %1035, %1779 : !llvm.i64 + %1781 = llvm.add %1778, %1780 : !llvm.i64 + %1782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1783 = llvm.mul %791, %1782 : !llvm.i64 + %1784 = llvm.add %1781, %1783 : !llvm.i64 + %1785 = llvm.getelementptr %1777[%1784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1786 = llvm.load %1785 : !llvm.ptr + %1787 = llvm.fadd %1786, %1776 {RelaxedPrecision} : !llvm.float + %1788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1791 = llvm.mul %1035, %1790 : !llvm.i64 + %1792 = llvm.add %1789, %1791 : !llvm.i64 + %1793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1794 = llvm.mul %791, %1793 : !llvm.i64 + %1795 = llvm.add %1792, %1794 : !llvm.i64 + %1796 = llvm.getelementptr %1788[%1795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1787, %1796 : !llvm.ptr + %1797 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1799 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1800 = llvm.mul %1035, %1799 : !llvm.i64 + %1801 = llvm.add %1798, %1800 : !llvm.i64 + %1802 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1803 = llvm.mul %791, %1802 : !llvm.i64 + %1804 = llvm.add %1801, %1803 : !llvm.i64 + %1805 = llvm.getelementptr %1797[%1804] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1806 = llvm.load %1805 : !llvm.ptr + %1807 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1808 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1809 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1810 = llvm.mul %1035, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1813 = llvm.mul %791, %1812 : !llvm.i64 + %1814 = llvm.add %1811, %1813 : !llvm.i64 + %1815 = llvm.getelementptr %1807[%1814] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1806, %1815 : !llvm.ptr + %1816 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1817 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1818 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1819 = llvm.mul %1035, %1818 : !llvm.i64 + %1820 = llvm.add %1817, %1819 : !llvm.i64 + %1821 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1822 = llvm.mul %59, %1821 : !llvm.i64 + %1823 = llvm.add %1820, %1822 : !llvm.i64 + %1824 = llvm.getelementptr %1816[%1823] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1825 = llvm.load %1824 : !llvm.ptr + %1826 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1828 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1829 = llvm.mul %59, %1828 : !llvm.i64 + %1830 = llvm.add %1827, %1829 : !llvm.i64 + %1831 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1832 = llvm.mul %852, %1831 : !llvm.i64 + %1833 = llvm.add %1830, %1832 : !llvm.i64 + %1834 = llvm.getelementptr %1826[%1833] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1835 = llvm.load %1834 : !llvm.ptr + %1836 = llvm.fmul %1825, %1835 {RelaxedPrecision} : !llvm.float + %1837 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1839 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1840 = llvm.mul %1035, %1839 : !llvm.i64 + %1841 = llvm.add %1838, %1840 : !llvm.i64 + %1842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1843 = llvm.mul %852, %1842 : !llvm.i64 + %1844 = llvm.add %1841, %1843 : !llvm.i64 + %1845 = llvm.getelementptr %1837[%1844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1846 = llvm.load %1845 : !llvm.ptr + %1847 = llvm.fadd %1846, %1836 {RelaxedPrecision} : !llvm.float + %1848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1851 = llvm.mul %1035, %1850 : !llvm.i64 + %1852 = llvm.add %1849, %1851 : !llvm.i64 + %1853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1854 = llvm.mul %852, %1853 : !llvm.i64 + %1855 = llvm.add %1852, %1854 : !llvm.i64 + %1856 = llvm.getelementptr %1848[%1855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1847, %1856 : !llvm.ptr + %1857 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1858 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1859 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1860 = llvm.mul %1035, %1859 : !llvm.i64 + %1861 = llvm.add %1858, %1860 : !llvm.i64 + %1862 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1863 = llvm.mul %852, %1862 : !llvm.i64 + %1864 = llvm.add %1861, %1863 : !llvm.i64 + %1865 = llvm.getelementptr %1857[%1864] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1866 = llvm.load %1865 : !llvm.ptr + %1867 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1868 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1869 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1870 = llvm.mul %1035, %1869 : !llvm.i64 + %1871 = llvm.add %1868, %1870 : !llvm.i64 + %1872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1873 = llvm.mul %852, %1872 : !llvm.i64 + %1874 = llvm.add %1871, %1873 : !llvm.i64 + %1875 = llvm.getelementptr %1867[%1874] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1866, %1875 : !llvm.ptr + %1876 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1877 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1878 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1879 = llvm.mul %1035, %1878 : !llvm.i64 + %1880 = llvm.add %1877, %1879 : !llvm.i64 + %1881 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1882 = llvm.mul %59, %1881 : !llvm.i64 + %1883 = llvm.add %1880, %1882 : !llvm.i64 + %1884 = llvm.getelementptr %1876[%1883] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1885 = llvm.load %1884 : !llvm.ptr + %1886 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1888 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1889 = llvm.mul %59, %1888 : !llvm.i64 + %1890 = llvm.add %1887, %1889 : !llvm.i64 + %1891 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1892 = llvm.mul %913, %1891 : !llvm.i64 + %1893 = llvm.add %1890, %1892 : !llvm.i64 + %1894 = llvm.getelementptr %1886[%1893] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1895 = llvm.load %1894 : !llvm.ptr + %1896 = llvm.fmul %1885, %1895 {RelaxedPrecision} : !llvm.float + %1897 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1899 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %1900 = llvm.mul %1035, %1899 : !llvm.i64 + %1901 = llvm.add %1898, %1900 : !llvm.i64 + %1902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1903 = llvm.mul %913, %1902 : !llvm.i64 + %1904 = llvm.add %1901, %1903 : !llvm.i64 + %1905 = llvm.getelementptr %1897[%1904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1906 = llvm.load %1905 : !llvm.ptr + %1907 = llvm.fadd %1906, %1896 {RelaxedPrecision} : !llvm.float + %1908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1911 = llvm.mul %1035, %1910 : !llvm.i64 + %1912 = llvm.add %1909, %1911 : !llvm.i64 + %1913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1914 = llvm.mul %913, %1913 : !llvm.i64 + %1915 = llvm.add %1912, %1914 : !llvm.i64 + %1916 = llvm.getelementptr %1908[%1915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1907, %1916 : !llvm.ptr + %1917 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1918 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1919 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1920 = llvm.mul %1035, %1919 : !llvm.i64 + %1921 = llvm.add %1918, %1920 : !llvm.i64 + %1922 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1923 = llvm.mul %913, %1922 : !llvm.i64 + %1924 = llvm.add %1921, %1923 : !llvm.i64 + %1925 = llvm.getelementptr %1917[%1924] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1926 = llvm.load %1925 : !llvm.ptr + %1927 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1928 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1929 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1930 = llvm.mul %1035, %1929 : !llvm.i64 + %1931 = llvm.add %1928, %1930 : !llvm.i64 + %1932 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1933 = llvm.mul %913, %1932 : !llvm.i64 + %1934 = llvm.add %1931, %1933 : !llvm.i64 + %1935 = llvm.getelementptr %1927[%1934] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1926, %1935 : !llvm.ptr + %1936 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1937 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1938 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1939 = llvm.mul %1035, %1938 : !llvm.i64 + %1940 = llvm.add %1937, %1939 : !llvm.i64 + %1941 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1942 = llvm.mul %59, %1941 : !llvm.i64 + %1943 = llvm.add %1940, %1942 : !llvm.i64 + %1944 = llvm.getelementptr %1936[%1943] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1945 = llvm.load %1944 : !llvm.ptr + %1946 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1949 = llvm.mul %59, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1952 = llvm.mul %974, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.getelementptr %1946[%1953] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1955 = llvm.load %1954 : !llvm.ptr + %1956 = llvm.fmul %1945, %1955 {RelaxedPrecision} : !llvm.float + %1957 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1958 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1959 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1960 = llvm.mul %1035, %1959 : !llvm.i64 + %1961 = llvm.add %1958, %1960 : 
!llvm.i64 + %1962 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1963 = llvm.mul %974, %1962 : !llvm.i64 + %1964 = llvm.add %1961, %1963 : !llvm.i64 + %1965 = llvm.getelementptr %1957[%1964] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1966 = llvm.load %1965 : !llvm.ptr + %1967 = llvm.fadd %1966, %1956 {RelaxedPrecision} : !llvm.float + %1968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1971 = llvm.mul %1035, %1970 : !llvm.i64 + %1972 = llvm.add %1969, %1971 : !llvm.i64 + %1973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1974 = llvm.mul %974, %1973 : !llvm.i64 + %1975 = llvm.add %1972, %1974 : !llvm.i64 + %1976 = llvm.getelementptr %1968[%1975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1967, %1976 : !llvm.ptr + %1977 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1978 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1979 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1980 = llvm.mul %1035, %1979 : !llvm.i64 + %1981 = llvm.add %1978, %1980 : !llvm.i64 + %1982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1983 = llvm.mul %974, %1982 : !llvm.i64 + %1984 = llvm.add %1981, %1983 : !llvm.i64 + %1985 = llvm.getelementptr %1977[%1984] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1986 = llvm.load %1985 : !llvm.ptr + %1987 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1988 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1989 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1990 = llvm.mul %1035, %1989 : !llvm.i64 + %1991 = llvm.add %1988, %1990 : !llvm.i64 + %1992 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1993 = llvm.mul %974, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.getelementptr %1987[%1994] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1986, %1995 : !llvm.ptr + %1996 = llvm.add %50, %34 : !llvm.i64 + %1997 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1998 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1999 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2000 = llvm.mul %1996, %1999 : !llvm.i64 + %2001 = llvm.add %1998, %2000 : !llvm.i64 + %2002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2003 = llvm.mul %59, %2002 : !llvm.i64 + %2004 = llvm.add %2001, %2003 : !llvm.i64 + %2005 = llvm.getelementptr %1997[%2004] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2006 = llvm.load %2005 : !llvm.ptr + %2007 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2009 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2010 = llvm.mul %59, %2009 : !llvm.i64 + %2011 = llvm.add %2008, %2010 : !llvm.i64 + %2012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2013 = llvm.mul %58, %2012 : !llvm.i64 + %2014 = llvm.add %2011, %2013 : !llvm.i64 + %2015 = llvm.getelementptr %2007[%2014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2016 = llvm.load %2015 : !llvm.ptr + %2017 = llvm.fmul %2006, %2016 {RelaxedPrecision} : !llvm.float + %2018 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2020 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2021 = llvm.mul %1996, %2020 : !llvm.i64 + %2022 = llvm.add %2019, %2021 : !llvm.i64 + %2023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2024 = llvm.mul %58, 
%2023 : !llvm.i64 + %2025 = llvm.add %2022, %2024 : !llvm.i64 + %2026 = llvm.getelementptr %2018[%2025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2027 = llvm.load %2026 : !llvm.ptr + %2028 = llvm.fadd %2027, %2017 {RelaxedPrecision} : !llvm.float + %2029 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2030 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2031 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2032 = llvm.mul %1996, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2035 = llvm.mul %58, %2034 : !llvm.i64 + %2036 = llvm.add %2033, %2035 : !llvm.i64 + %2037 = llvm.getelementptr %2029[%2036] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2028, %2037 : !llvm.ptr + %2038 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2039 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2040 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2041 = llvm.mul %1996, %2040 : !llvm.i64 + %2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2044 = llvm.mul %58, %2043 : !llvm.i64 + %2045 = llvm.add %2042, %2044 : !llvm.i64 + %2046 = llvm.getelementptr %2038[%2045] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2047 = llvm.load %2046 : !llvm.ptr + %2048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2051 = llvm.mul %1996, %2050 : !llvm.i64 + %2052 = llvm.add %2049, %2051 : !llvm.i64 + %2053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2054 = llvm.mul %58, %2053 : !llvm.i64 + %2055 = llvm.add %2052, %2054 : !llvm.i64 + %2056 = llvm.getelementptr %2048[%2055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2047, %2056 : !llvm.ptr + %2057 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2059 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2060 = llvm.mul %1996, %2059 : !llvm.i64 + %2061 = llvm.add %2058, %2060 : !llvm.i64 + %2062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2063 = llvm.mul %59, %2062 : !llvm.i64 + %2064 = llvm.add %2061, %2063 : !llvm.i64 + %2065 = llvm.getelementptr %2057[%2064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2066 = llvm.load %2065 : !llvm.ptr + %2067 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2070 = llvm.mul %59, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %120, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2076 = llvm.load %2075 : !llvm.ptr + %2077 = llvm.fmul %2066, %2076 {RelaxedPrecision} : !llvm.float + %2078 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2080 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2081 = llvm.mul %1996, %2080 : !llvm.i64 + %2082 = llvm.add %2079, %2081 : !llvm.i64 + %2083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2084 = llvm.mul %120, %2083 : !llvm.i64 + %2085 = llvm.add %2082, %2084 : !llvm.i64 + %2086 = llvm.getelementptr %2078[%2085] : (!llvm.ptr, !llvm.i64) 
-> !llvm.ptr + %2087 = llvm.load %2086 : !llvm.ptr + %2088 = llvm.fadd %2087, %2077 {RelaxedPrecision} : !llvm.float + %2089 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2090 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2091 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2092 = llvm.mul %1996, %2091 : !llvm.i64 + %2093 = llvm.add %2090, %2092 : !llvm.i64 + %2094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2095 = llvm.mul %120, %2094 : !llvm.i64 + %2096 = llvm.add %2093, %2095 : !llvm.i64 + %2097 = llvm.getelementptr %2089[%2096] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2088, %2097 : !llvm.ptr + %2098 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2100 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2101 = llvm.mul %1996, %2100 : !llvm.i64 + %2102 = llvm.add %2099, %2101 : !llvm.i64 + %2103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2104 = llvm.mul %120, %2103 : !llvm.i64 + %2105 = llvm.add %2102, %2104 : !llvm.i64 + %2106 = llvm.getelementptr %2098[%2105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2107 = llvm.load %2106 : !llvm.ptr + %2108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2111 = llvm.mul %1996, %2110 : !llvm.i64 + %2112 = llvm.add %2109, %2111 : !llvm.i64 + %2113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2114 = llvm.mul %120, %2113 : !llvm.i64 + %2115 = llvm.add %2112, %2114 : !llvm.i64 + %2116 = llvm.getelementptr %2108[%2115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2107, %2116 : !llvm.ptr + %2117 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2119 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2120 = llvm.mul %1996, %2119 : !llvm.i64 + %2121 = llvm.add %2118, %2120 : !llvm.i64 + %2122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2123 = llvm.mul %59, %2122 : !llvm.i64 + %2124 = llvm.add %2121, %2123 : !llvm.i64 + %2125 = llvm.getelementptr %2117[%2124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2126 = llvm.load %2125 : !llvm.ptr + %2127 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2128 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2129 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2130 = llvm.mul %59, %2129 : !llvm.i64 + %2131 = llvm.add %2128, %2130 : !llvm.i64 + %2132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2133 = llvm.mul %181, %2132 : !llvm.i64 + %2134 = llvm.add %2131, %2133 : !llvm.i64 + %2135 = llvm.getelementptr %2127[%2134] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2136 = llvm.load %2135 : !llvm.ptr + %2137 = llvm.fmul %2126, %2136 {RelaxedPrecision} : !llvm.float + %2138 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2140 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2141 = llvm.mul %1996, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2144 = llvm.mul %181, %2143 : !llvm.i64 + %2145 = llvm.add %2142, %2144 : !llvm.i64 + %2146 = llvm.getelementptr %2138[%2145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2147 = llvm.load %2146 : !llvm.ptr + %2148 = llvm.fadd %2147, %2137 {RelaxedPrecision} : !llvm.float + %2149 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2151 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2152 = llvm.mul %1996, %2151 : !llvm.i64 + %2153 = llvm.add %2150, %2152 : !llvm.i64 + %2154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2155 = llvm.mul %181, %2154 : !llvm.i64 + %2156 = llvm.add %2153, %2155 : !llvm.i64 + %2157 = llvm.getelementptr %2149[%2156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2148, %2157 : !llvm.ptr + %2158 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2161 = llvm.mul %1996, %2160 : !llvm.i64 + %2162 = llvm.add %2159, %2161 : !llvm.i64 + %2163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2164 = llvm.mul %181, %2163 : !llvm.i64 + %2165 = llvm.add %2162, %2164 : !llvm.i64 + %2166 = llvm.getelementptr %2158[%2165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2167 = llvm.load %2166 : !llvm.ptr + %2168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2171 = llvm.mul %1996, %2170 : !llvm.i64 + %2172 = llvm.add %2169, %2171 : !llvm.i64 + %2173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2174 = llvm.mul %181, %2173 : !llvm.i64 + %2175 = llvm.add %2172, %2174 : !llvm.i64 + %2176 = llvm.getelementptr %2168[%2175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2167, %2176 : !llvm.ptr + %2177 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2180 = llvm.mul %1996, %2179 : !llvm.i64 + %2181 = llvm.add %2178, %2180 : !llvm.i64 + %2182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2183 = llvm.mul %59, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.getelementptr %2177[%2184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2186 = llvm.load %2185 : !llvm.ptr + %2187 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2188 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2189 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2190 = llvm.mul %59, %2189 : !llvm.i64 + %2191 = llvm.add %2188, %2190 : !llvm.i64 + %2192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2193 = llvm.mul %242, %2192 : !llvm.i64 + %2194 = llvm.add %2191, %2193 : !llvm.i64 + %2195 = llvm.getelementptr %2187[%2194] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2196 = llvm.load %2195 : !llvm.ptr + %2197 = llvm.fmul %2186, %2196 {RelaxedPrecision} : !llvm.float + %2198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2201 = llvm.mul %1996, %2200 : !llvm.i64 + %2202 = llvm.add %2199, %2201 : !llvm.i64 + %2203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2204 = llvm.mul %242, %2203 : !llvm.i64 + %2205 = llvm.add %2202, %2204 : !llvm.i64 + %2206 = llvm.getelementptr %2198[%2205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2207 = llvm.load %2206 : !llvm.ptr + %2208 = llvm.fadd %2207, %2197 {RelaxedPrecision} : !llvm.float + %2209 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2210 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %2211 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2212 = llvm.mul %1996, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2215 = llvm.mul %242, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : !llvm.i64 + %2217 = llvm.getelementptr %2209[%2216] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2208, %2217 : !llvm.ptr + %2218 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2220 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2221 = llvm.mul %1996, %2220 : !llvm.i64 + %2222 = llvm.add %2219, %2221 : !llvm.i64 + %2223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2224 = llvm.mul %242, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.getelementptr %2218[%2225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2227 = llvm.load %2226 : !llvm.ptr + %2228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2231 = llvm.mul %1996, %2230 : !llvm.i64 + %2232 = llvm.add %2229, %2231 : !llvm.i64 + %2233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2234 = llvm.mul %242, %2233 : !llvm.i64 + %2235 = llvm.add %2232, %2234 : !llvm.i64 + %2236 = llvm.getelementptr %2228[%2235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2227, %2236 : !llvm.ptr + %2237 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2239 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2240 = llvm.mul %1996, %2239 : !llvm.i64 + %2241 = llvm.add %2238, %2240 : !llvm.i64 + %2242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2243 = llvm.mul %59, %2242 : !llvm.i64 + %2244 = llvm.add %2241, %2243 : !llvm.i64 + %2245 = llvm.getelementptr %2237[%2244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2246 = llvm.load %2245 : !llvm.ptr + %2247 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2248 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2249 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2250 = llvm.mul %59, %2249 : !llvm.i64 + %2251 = llvm.add %2248, %2250 : !llvm.i64 + %2252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2253 = llvm.mul %303, %2252 : !llvm.i64 + %2254 = llvm.add %2251, %2253 : !llvm.i64 + %2255 = llvm.getelementptr %2247[%2254] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2256 = llvm.load %2255 : !llvm.ptr + %2257 = llvm.fmul %2246, %2256 {RelaxedPrecision} : !llvm.float + %2258 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2260 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2261 = llvm.mul %1996, %2260 : !llvm.i64 + %2262 = llvm.add %2259, %2261 : !llvm.i64 + %2263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2264 = llvm.mul %303, %2263 : !llvm.i64 + %2265 = llvm.add %2262, %2264 : !llvm.i64 + %2266 = llvm.getelementptr %2258[%2265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2267 = llvm.load %2266 : !llvm.ptr + %2268 = llvm.fadd %2267, %2257 {RelaxedPrecision} : !llvm.float + %2269 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2270 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2271 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2272 = llvm.mul %1996, %2271 : !llvm.i64 + %2273 = 
llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2275 = llvm.mul %303, %2274 : !llvm.i64 + %2276 = llvm.add %2273, %2275 : !llvm.i64 + %2277 = llvm.getelementptr %2269[%2276] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2268, %2277 : !llvm.ptr + %2278 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2279 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2280 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2281 = llvm.mul %1996, %2280 : !llvm.i64 + %2282 = llvm.add %2279, %2281 : !llvm.i64 + %2283 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2284 = llvm.mul %303, %2283 : !llvm.i64 + %2285 = llvm.add %2282, %2284 : !llvm.i64 + %2286 = llvm.getelementptr %2278[%2285] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2287 = llvm.load %2286 : !llvm.ptr + %2288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2291 = llvm.mul %1996, %2290 : !llvm.i64 + %2292 = llvm.add %2289, %2291 : !llvm.i64 + %2293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2294 = llvm.mul %303, %2293 : !llvm.i64 + %2295 = llvm.add %2292, %2294 : !llvm.i64 + %2296 = llvm.getelementptr %2288[%2295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2287, %2296 : !llvm.ptr + %2297 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2299 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2300 = llvm.mul %1996, %2299 : !llvm.i64 + %2301 = llvm.add %2298, %2300 : !llvm.i64 + %2302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2303 = llvm.mul %59, %2302 : !llvm.i64 + %2304 = llvm.add %2301, %2303 : !llvm.i64 + %2305 = llvm.getelementptr %2297[%2304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2306 = llvm.load %2305 : !llvm.ptr + %2307 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2308 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2309 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2310 = llvm.mul %59, %2309 : !llvm.i64 + %2311 = llvm.add %2308, %2310 : !llvm.i64 + %2312 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2313 = llvm.mul %364, %2312 : !llvm.i64 + %2314 = llvm.add %2311, %2313 : !llvm.i64 + %2315 = llvm.getelementptr %2307[%2314] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2316 = llvm.load %2315 : !llvm.ptr + %2317 = llvm.fmul %2306, %2316 {RelaxedPrecision} : !llvm.float + %2318 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2320 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2321 = llvm.mul %1996, %2320 : !llvm.i64 + %2322 = llvm.add %2319, %2321 : !llvm.i64 + %2323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2324 = llvm.mul %364, %2323 : !llvm.i64 + %2325 = llvm.add %2322, %2324 : !llvm.i64 + %2326 = llvm.getelementptr %2318[%2325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2327 = llvm.load %2326 : !llvm.ptr + %2328 = llvm.fadd %2327, %2317 {RelaxedPrecision} : !llvm.float + %2329 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2330 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2331 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2332 = llvm.mul %1996, %2331 : !llvm.i64 + %2333 = llvm.add %2330, %2332 : !llvm.i64 + %2334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2335 = llvm.mul %364, %2334 : 
!llvm.i64 + %2336 = llvm.add %2333, %2335 : !llvm.i64 + %2337 = llvm.getelementptr %2329[%2336] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2328, %2337 : !llvm.ptr + %2338 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2339 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2340 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2341 = llvm.mul %1996, %2340 : !llvm.i64 + %2342 = llvm.add %2339, %2341 : !llvm.i64 + %2343 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2344 = llvm.mul %364, %2343 : !llvm.i64 + %2345 = llvm.add %2342, %2344 : !llvm.i64 + %2346 = llvm.getelementptr %2338[%2345] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2347 = llvm.load %2346 : !llvm.ptr + %2348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2351 = llvm.mul %1996, %2350 : !llvm.i64 + %2352 = llvm.add %2349, %2351 : !llvm.i64 + %2353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2354 = llvm.mul %364, %2353 : !llvm.i64 + %2355 = llvm.add %2352, %2354 : !llvm.i64 + %2356 = llvm.getelementptr %2348[%2355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2347, %2356 : !llvm.ptr + %2357 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2359 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2360 = llvm.mul %1996, %2359 : !llvm.i64 + %2361 = llvm.add %2358, %2360 : !llvm.i64 + %2362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2363 = llvm.mul %59, %2362 : !llvm.i64 + %2364 = llvm.add %2361, %2363 : !llvm.i64 + %2365 = llvm.getelementptr %2357[%2364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2366 = llvm.load %2365 : !llvm.ptr + %2367 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2368 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2369 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2370 = llvm.mul %59, %2369 : !llvm.i64 + %2371 = llvm.add %2368, %2370 : !llvm.i64 + %2372 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2373 = llvm.mul %425, %2372 : !llvm.i64 + %2374 = llvm.add %2371, %2373 : !llvm.i64 + %2375 = llvm.getelementptr %2367[%2374] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2376 = llvm.load %2375 : !llvm.ptr + %2377 = llvm.fmul %2366, %2376 {RelaxedPrecision} : !llvm.float + %2378 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2380 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2381 = llvm.mul %1996, %2380 : !llvm.i64 + %2382 = llvm.add %2379, %2381 : !llvm.i64 + %2383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2384 = llvm.mul %425, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.getelementptr %2378[%2385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2387 = llvm.load %2386 : !llvm.ptr + %2388 = llvm.fadd %2387, %2377 {RelaxedPrecision} : !llvm.float + %2389 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2392 = llvm.mul %1996, %2391 : !llvm.i64 + %2393 = llvm.add %2390, %2392 : !llvm.i64 + %2394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2395 = llvm.mul %425, %2394 : !llvm.i64 + %2396 = llvm.add %2393, %2395 : !llvm.i64 + %2397 = llvm.getelementptr %2389[%2396] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + llvm.store %2388, %2397 : !llvm.ptr + %2398 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2399 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2401 = llvm.mul %1996, %2400 : !llvm.i64 + %2402 = llvm.add %2399, %2401 : !llvm.i64 + %2403 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2404 = llvm.mul %425, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.getelementptr %2398[%2405] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2407 = llvm.load %2406 : !llvm.ptr + %2408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2411 = llvm.mul %1996, %2410 : !llvm.i64 + %2412 = llvm.add %2409, %2411 : !llvm.i64 + %2413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2414 = llvm.mul %425, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : !llvm.i64 + %2416 = llvm.getelementptr %2408[%2415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2407, %2416 : !llvm.ptr + %2417 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2419 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2420 = llvm.mul %1996, %2419 : !llvm.i64 + %2421 = llvm.add %2418, %2420 : !llvm.i64 + %2422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2423 = llvm.mul %59, %2422 : !llvm.i64 + %2424 = llvm.add %2421, %2423 : !llvm.i64 + %2425 = llvm.getelementptr %2417[%2424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2426 = llvm.load %2425 : !llvm.ptr + %2427 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2428 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2429 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2430 = llvm.mul %59, %2429 : !llvm.i64 + %2431 = llvm.add %2428, %2430 : !llvm.i64 + %2432 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2433 = llvm.mul %486, %2432 : !llvm.i64 + %2434 = llvm.add %2431, %2433 : !llvm.i64 + %2435 = llvm.getelementptr %2427[%2434] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2436 = llvm.load %2435 : !llvm.ptr + %2437 = llvm.fmul %2426, %2436 {RelaxedPrecision} : !llvm.float + %2438 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2440 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2441 = llvm.mul %1996, %2440 : !llvm.i64 + %2442 = llvm.add %2439, %2441 : !llvm.i64 + %2443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2444 = llvm.mul %486, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.getelementptr %2438[%2445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2447 = llvm.load %2446 : !llvm.ptr + %2448 = llvm.fadd %2447, %2437 {RelaxedPrecision} : !llvm.float + %2449 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2450 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2451 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2452 = llvm.mul %1996, %2451 : !llvm.i64 + %2453 = llvm.add %2450, %2452 : !llvm.i64 + %2454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2455 = llvm.mul %486, %2454 : !llvm.i64 + %2456 = llvm.add %2453, %2455 : !llvm.i64 + %2457 = llvm.getelementptr %2449[%2456] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2448, %2457 : !llvm.ptr + %2458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %2459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2461 = llvm.mul %1996, %2460 : !llvm.i64 + %2462 = llvm.add %2459, %2461 : !llvm.i64 + %2463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2464 = llvm.mul %486, %2463 : !llvm.i64 + %2465 = llvm.add %2462, %2464 : !llvm.i64 + %2466 = llvm.getelementptr %2458[%2465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2467 = llvm.load %2466 : !llvm.ptr + %2468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2471 = llvm.mul %1996, %2470 : !llvm.i64 + %2472 = llvm.add %2469, %2471 : !llvm.i64 + %2473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2474 = llvm.mul %486, %2473 : !llvm.i64 + %2475 = llvm.add %2472, %2474 : !llvm.i64 + %2476 = llvm.getelementptr %2468[%2475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2467, %2476 : !llvm.ptr + %2477 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2479 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2480 = llvm.mul %1996, %2479 : !llvm.i64 + %2481 = llvm.add %2478, %2480 : !llvm.i64 + %2482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2483 = llvm.mul %59, %2482 : !llvm.i64 + %2484 = llvm.add %2481, %2483 : !llvm.i64 + %2485 = llvm.getelementptr %2477[%2484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2486 = llvm.load %2485 : !llvm.ptr + %2487 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2489 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2490 = llvm.mul %59, %2489 : !llvm.i64 + %2491 = llvm.add %2488, %2490 : !llvm.i64 + %2492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2493 = llvm.mul %547, %2492 : !llvm.i64 + %2494 = llvm.add %2491, %2493 : !llvm.i64 + %2495 = llvm.getelementptr %2487[%2494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2496 = llvm.load %2495 : !llvm.ptr + %2497 = llvm.fmul %2486, %2496 {RelaxedPrecision} : !llvm.float + %2498 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2500 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2501 = llvm.mul %1996, %2500 : !llvm.i64 + %2502 = llvm.add %2499, %2501 : !llvm.i64 + %2503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2504 = llvm.mul %547, %2503 : !llvm.i64 + %2505 = llvm.add %2502, %2504 : !llvm.i64 + %2506 = llvm.getelementptr %2498[%2505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2507 = llvm.load %2506 : !llvm.ptr + %2508 = llvm.fadd %2507, %2497 {RelaxedPrecision} : !llvm.float + %2509 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2510 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2511 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2512 = llvm.mul %1996, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2515 = llvm.mul %547, %2514 : !llvm.i64 + %2516 = llvm.add %2513, %2515 : !llvm.i64 + %2517 = llvm.getelementptr %2509[%2516] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2508, %2517 : !llvm.ptr + %2518 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2519 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2520 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %2521 = llvm.mul %1996, %2520 : !llvm.i64 + %2522 = llvm.add %2519, %2521 : !llvm.i64 + %2523 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2524 = llvm.mul %547, %2523 : !llvm.i64 + %2525 = llvm.add %2522, %2524 : !llvm.i64 + %2526 = llvm.getelementptr %2518[%2525] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2527 = llvm.load %2526 : !llvm.ptr + %2528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2531 = llvm.mul %1996, %2530 : !llvm.i64 + %2532 = llvm.add %2529, %2531 : !llvm.i64 + %2533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2534 = llvm.mul %547, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.getelementptr %2528[%2535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2527, %2536 : !llvm.ptr + %2537 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2539 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2540 = llvm.mul %1996, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2543 = llvm.mul %59, %2542 : !llvm.i64 + %2544 = llvm.add %2541, %2543 : !llvm.i64 + %2545 = llvm.getelementptr %2537[%2544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2546 = llvm.load %2545 : !llvm.ptr + %2547 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2548 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2549 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2550 = llvm.mul %59, %2549 : !llvm.i64 + %2551 = llvm.add %2548, %2550 : !llvm.i64 + %2552 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2553 = llvm.mul %608, %2552 : !llvm.i64 + %2554 = llvm.add %2551, %2553 : !llvm.i64 + %2555 = llvm.getelementptr %2547[%2554] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2556 = llvm.load %2555 : !llvm.ptr + %2557 = llvm.fmul %2546, %2556 {RelaxedPrecision} : !llvm.float + %2558 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2561 = llvm.mul %1996, %2560 : !llvm.i64 + %2562 = llvm.add %2559, %2561 : !llvm.i64 + %2563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2564 = llvm.mul %608, %2563 : !llvm.i64 + %2565 = llvm.add %2562, %2564 : !llvm.i64 + %2566 = llvm.getelementptr %2558[%2565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2567 = llvm.load %2566 : !llvm.ptr + %2568 = llvm.fadd %2567, %2557 {RelaxedPrecision} : !llvm.float + %2569 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2570 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2571 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2572 = llvm.mul %1996, %2571 : !llvm.i64 + %2573 = llvm.add %2570, %2572 : !llvm.i64 + %2574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2575 = llvm.mul %608, %2574 : !llvm.i64 + %2576 = llvm.add %2573, %2575 : !llvm.i64 + %2577 = llvm.getelementptr %2569[%2576] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2568, %2577 : !llvm.ptr + %2578 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2580 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2581 = llvm.mul %1996, %2580 : !llvm.i64 + %2582 = llvm.add %2579, %2581 : !llvm.i64 + %2583 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %2584 = llvm.mul %608, %2583 : !llvm.i64 + %2585 = llvm.add %2582, %2584 : !llvm.i64 + %2586 = llvm.getelementptr %2578[%2585] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2587 = llvm.load %2586 : !llvm.ptr + %2588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2591 = llvm.mul %1996, %2590 : !llvm.i64 + %2592 = llvm.add %2589, %2591 : !llvm.i64 + %2593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2594 = llvm.mul %608, %2593 : !llvm.i64 + %2595 = llvm.add %2592, %2594 : !llvm.i64 + %2596 = llvm.getelementptr %2588[%2595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2587, %2596 : !llvm.ptr + %2597 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2599 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2600 = llvm.mul %1996, %2599 : !llvm.i64 + %2601 = llvm.add %2598, %2600 : !llvm.i64 + %2602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2603 = llvm.mul %59, %2602 : !llvm.i64 + %2604 = llvm.add %2601, %2603 : !llvm.i64 + %2605 = llvm.getelementptr %2597[%2604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2606 = llvm.load %2605 : !llvm.ptr + %2607 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2608 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2610 = llvm.mul %59, %2609 : !llvm.i64 + %2611 = llvm.add %2608, %2610 : !llvm.i64 + %2612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2613 = llvm.mul %669, %2612 : !llvm.i64 + %2614 = llvm.add %2611, %2613 : !llvm.i64 + %2615 = llvm.getelementptr %2607[%2614] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2616 = llvm.load %2615 : !llvm.ptr + %2617 = llvm.fmul %2606, %2616 {RelaxedPrecision} : !llvm.float + %2618 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2620 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2621 = llvm.mul %1996, %2620 : !llvm.i64 + %2622 = llvm.add %2619, %2621 : !llvm.i64 + %2623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2624 = llvm.mul %669, %2623 : !llvm.i64 + %2625 = llvm.add %2622, %2624 : !llvm.i64 + %2626 = llvm.getelementptr %2618[%2625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2627 = llvm.load %2626 : !llvm.ptr + %2628 = llvm.fadd %2627, %2617 {RelaxedPrecision} : !llvm.float + %2629 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2630 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2631 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2632 = llvm.mul %1996, %2631 : !llvm.i64 + %2633 = llvm.add %2630, %2632 : !llvm.i64 + %2634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2635 = llvm.mul %669, %2634 : !llvm.i64 + %2636 = llvm.add %2633, %2635 : !llvm.i64 + %2637 = llvm.getelementptr %2629[%2636] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2628, %2637 : !llvm.ptr + %2638 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2639 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2640 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2641 = llvm.mul %1996, %2640 : !llvm.i64 + %2642 = llvm.add %2639, %2641 : !llvm.i64 + %2643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2644 = llvm.mul %669, %2643 : !llvm.i64 + %2645 = llvm.add %2642, %2644 : 
!llvm.i64 + %2646 = llvm.getelementptr %2638[%2645] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2647 = llvm.load %2646 : !llvm.ptr + %2648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2651 = llvm.mul %1996, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2654 = llvm.mul %669, %2653 : !llvm.i64 + %2655 = llvm.add %2652, %2654 : !llvm.i64 + %2656 = llvm.getelementptr %2648[%2655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2647, %2656 : !llvm.ptr + %2657 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2659 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2660 = llvm.mul %1996, %2659 : !llvm.i64 + %2661 = llvm.add %2658, %2660 : !llvm.i64 + %2662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2663 = llvm.mul %59, %2662 : !llvm.i64 + %2664 = llvm.add %2661, %2663 : !llvm.i64 + %2665 = llvm.getelementptr %2657[%2664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2666 = llvm.load %2665 : !llvm.ptr + %2667 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2668 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2670 = llvm.mul %59, %2669 : !llvm.i64 + %2671 = llvm.add %2668, %2670 : !llvm.i64 + %2672 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2673 = llvm.mul %730, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.getelementptr %2667[%2674] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2676 = llvm.load %2675 : !llvm.ptr + %2677 = llvm.fmul %2666, %2676 {RelaxedPrecision} : !llvm.float + %2678 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2680 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2681 = llvm.mul %1996, %2680 : !llvm.i64 + %2682 = llvm.add %2679, %2681 : !llvm.i64 + %2683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2684 = llvm.mul %730, %2683 : !llvm.i64 + %2685 = llvm.add %2682, %2684 : !llvm.i64 + %2686 = llvm.getelementptr %2678[%2685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2687 = llvm.load %2686 : !llvm.ptr + %2688 = llvm.fadd %2687, %2677 {RelaxedPrecision} : !llvm.float + %2689 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2690 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2691 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2692 = llvm.mul %1996, %2691 : !llvm.i64 + %2693 = llvm.add %2690, %2692 : !llvm.i64 + %2694 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2695 = llvm.mul %730, %2694 : !llvm.i64 + %2696 = llvm.add %2693, %2695 : !llvm.i64 + %2697 = llvm.getelementptr %2689[%2696] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2688, %2697 : !llvm.ptr + %2698 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2701 = llvm.mul %1996, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2704 = llvm.mul %730, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.getelementptr %2698[%2705] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2707 = llvm.load %2706 : 
!llvm.ptr + %2708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2711 = llvm.mul %1996, %2710 : !llvm.i64 + %2712 = llvm.add %2709, %2711 : !llvm.i64 + %2713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2714 = llvm.mul %730, %2713 : !llvm.i64 + %2715 = llvm.add %2712, %2714 : !llvm.i64 + %2716 = llvm.getelementptr %2708[%2715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2707, %2716 : !llvm.ptr + %2717 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2718 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2719 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2720 = llvm.mul %1996, %2719 : !llvm.i64 + %2721 = llvm.add %2718, %2720 : !llvm.i64 + %2722 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2723 = llvm.mul %59, %2722 : !llvm.i64 + %2724 = llvm.add %2721, %2723 : !llvm.i64 + %2725 = llvm.getelementptr %2717[%2724] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2726 = llvm.load %2725 : !llvm.ptr + %2727 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2729 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2730 = llvm.mul %59, %2729 : !llvm.i64 + %2731 = llvm.add %2728, %2730 : !llvm.i64 + %2732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2733 = llvm.mul %791, %2732 : !llvm.i64 + %2734 = llvm.add %2731, %2733 : !llvm.i64 + %2735 = llvm.getelementptr %2727[%2734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2736 = llvm.load %2735 : !llvm.ptr + %2737 = llvm.fmul %2726, %2736 {RelaxedPrecision} : !llvm.float + %2738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2741 = llvm.mul %1996, %2740 : !llvm.i64 + %2742 = llvm.add %2739, %2741 : !llvm.i64 + %2743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2744 = llvm.mul %791, %2743 : !llvm.i64 + %2745 = llvm.add %2742, %2744 : !llvm.i64 + %2746 = llvm.getelementptr %2738[%2745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2747 = llvm.load %2746 : !llvm.ptr + %2748 = llvm.fadd %2747, %2737 {RelaxedPrecision} : !llvm.float + %2749 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2750 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2751 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2752 = llvm.mul %1996, %2751 : !llvm.i64 + %2753 = llvm.add %2750, %2752 : !llvm.i64 + %2754 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2755 = llvm.mul %791, %2754 : !llvm.i64 + %2756 = llvm.add %2753, %2755 : !llvm.i64 + %2757 = llvm.getelementptr %2749[%2756] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2748, %2757 : !llvm.ptr + %2758 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2759 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2760 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2761 = llvm.mul %1996, %2760 : !llvm.i64 + %2762 = llvm.add %2759, %2761 : !llvm.i64 + %2763 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2764 = llvm.mul %791, %2763 : !llvm.i64 + %2765 = llvm.add %2762, %2764 : !llvm.i64 + %2766 = llvm.getelementptr %2758[%2765] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2767 = llvm.load %2766 : !llvm.ptr + %2768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2769 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %2770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2771 = llvm.mul %1996, %2770 : !llvm.i64 + %2772 = llvm.add %2769, %2771 : !llvm.i64 + %2773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2774 = llvm.mul %791, %2773 : !llvm.i64 + %2775 = llvm.add %2772, %2774 : !llvm.i64 + %2776 = llvm.getelementptr %2768[%2775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2767, %2776 : !llvm.ptr + %2777 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2779 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2780 = llvm.mul %1996, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = llvm.mul %59, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2777[%2784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2786 = llvm.load %2785 : !llvm.ptr + %2787 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2788 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2789 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2790 = llvm.mul %59, %2789 : !llvm.i64 + %2791 = llvm.add %2788, %2790 : !llvm.i64 + %2792 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2793 = llvm.mul %852, %2792 : !llvm.i64 + %2794 = llvm.add %2791, %2793 : !llvm.i64 + %2795 = llvm.getelementptr %2787[%2794] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2796 = llvm.load %2795 : !llvm.ptr + %2797 = llvm.fmul %2786, %2796 {RelaxedPrecision} : !llvm.float + %2798 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2800 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2801 = llvm.mul %1996, %2800 : !llvm.i64 + %2802 = llvm.add %2799, %2801 : !llvm.i64 + %2803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2804 = llvm.mul %852, %2803 : !llvm.i64 + %2805 = llvm.add %2802, %2804 : !llvm.i64 + %2806 = llvm.getelementptr %2798[%2805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2807 = llvm.load %2806 : !llvm.ptr + %2808 = llvm.fadd %2807, %2797 {RelaxedPrecision} : !llvm.float + %2809 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2810 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2811 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2812 = llvm.mul %1996, %2811 : !llvm.i64 + %2813 = llvm.add %2810, %2812 : !llvm.i64 + %2814 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2815 = llvm.mul %852, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = llvm.getelementptr %2809[%2816] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2808, %2817 : !llvm.ptr + %2818 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2819 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2820 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2821 = llvm.mul %1996, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2824 = llvm.mul %852, %2823 : !llvm.i64 + %2825 = llvm.add %2822, %2824 : !llvm.i64 + %2826 = llvm.getelementptr %2818[%2825] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2827 = llvm.load %2826 : !llvm.ptr + %2828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2831 = llvm.mul %1996, 
%2830 : !llvm.i64 + %2832 = llvm.add %2829, %2831 : !llvm.i64 + %2833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2834 = llvm.mul %852, %2833 : !llvm.i64 + %2835 = llvm.add %2832, %2834 : !llvm.i64 + %2836 = llvm.getelementptr %2828[%2835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2827, %2836 : !llvm.ptr + %2837 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2839 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2840 = llvm.mul %1996, %2839 : !llvm.i64 + %2841 = llvm.add %2838, %2840 : !llvm.i64 + %2842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2843 = llvm.mul %59, %2842 : !llvm.i64 + %2844 = llvm.add %2841, %2843 : !llvm.i64 + %2845 = llvm.getelementptr %2837[%2844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2846 = llvm.load %2845 : !llvm.ptr + %2847 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2848 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2849 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2850 = llvm.mul %59, %2849 : !llvm.i64 + %2851 = llvm.add %2848, %2850 : !llvm.i64 + %2852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2853 = llvm.mul %913, %2852 : !llvm.i64 + %2854 = llvm.add %2851, %2853 : !llvm.i64 + %2855 = llvm.getelementptr %2847[%2854] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2856 = llvm.load %2855 : !llvm.ptr + %2857 = llvm.fmul %2846, %2856 {RelaxedPrecision} : !llvm.float + %2858 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2860 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2861 = llvm.mul %1996, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2864 = llvm.mul %913, %2863 : !llvm.i64 + %2865 = llvm.add %2862, %2864 : !llvm.i64 + %2866 = llvm.getelementptr %2858[%2865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2867 = llvm.load %2866 : !llvm.ptr + %2868 = llvm.fadd %2867, %2857 {RelaxedPrecision} : !llvm.float + %2869 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2871 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2872 = llvm.mul %1996, %2871 : !llvm.i64 + %2873 = llvm.add %2870, %2872 : !llvm.i64 + %2874 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2875 = llvm.mul %913, %2874 : !llvm.i64 + %2876 = llvm.add %2873, %2875 : !llvm.i64 + %2877 = llvm.getelementptr %2869[%2876] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2868, %2877 : !llvm.ptr + %2878 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2880 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2881 = llvm.mul %1996, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2884 = llvm.mul %913, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.getelementptr %2878[%2885] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2887 = llvm.load %2886 : !llvm.ptr + %2888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2891 = llvm.mul %1996, %2890 : !llvm.i64 + %2892 = llvm.add %2889, %2891 : !llvm.i64 + %2893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2894 = 
llvm.mul %913, %2893 : !llvm.i64 + %2895 = llvm.add %2892, %2894 : !llvm.i64 + %2896 = llvm.getelementptr %2888[%2895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2887, %2896 : !llvm.ptr + %2897 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2899 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2900 = llvm.mul %1996, %2899 : !llvm.i64 + %2901 = llvm.add %2898, %2900 : !llvm.i64 + %2902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2903 = llvm.mul %59, %2902 : !llvm.i64 + %2904 = llvm.add %2901, %2903 : !llvm.i64 + %2905 = llvm.getelementptr %2897[%2904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2906 = llvm.load %2905 : !llvm.ptr + %2907 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2908 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2909 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2910 = llvm.mul %59, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : !llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %974, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2907[%2914] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2916 = llvm.load %2915 : !llvm.ptr + %2917 = llvm.fmul %2906, %2916 {RelaxedPrecision} : !llvm.float + %2918 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2920 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2921 = llvm.mul %1996, %2920 : !llvm.i64 + %2922 = llvm.add %2919, %2921 : !llvm.i64 + %2923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2924 = llvm.mul %974, %2923 : !llvm.i64 + %2925 = llvm.add %2922, %2924 : !llvm.i64 + %2926 = llvm.getelementptr %2918[%2925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2927 = llvm.load %2926 : !llvm.ptr + %2928 = llvm.fadd %2927, %2917 {RelaxedPrecision} : !llvm.float + %2929 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2930 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2931 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2932 = llvm.mul %1996, %2931 : !llvm.i64 + %2933 = llvm.add %2930, %2932 : !llvm.i64 + %2934 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2935 = llvm.mul %974, %2934 : !llvm.i64 + %2936 = llvm.add %2933, %2935 : !llvm.i64 + %2937 = llvm.getelementptr %2929[%2936] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2928, %2937 : !llvm.ptr + %2938 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2940 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2941 = llvm.mul %1996, %2940 : !llvm.i64 + %2942 = llvm.add %2939, %2941 : !llvm.i64 + %2943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2944 = llvm.mul %974, %2943 : !llvm.i64 + %2945 = llvm.add %2942, %2944 : !llvm.i64 + %2946 = llvm.getelementptr %2938[%2945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2947 = llvm.load %2946 : !llvm.ptr + %2948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2951 = llvm.mul %1996, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2954 = llvm.mul %974, %2953 : !llvm.i64 + %2955 = llvm.add %2952, %2954 : !llvm.i64 + %2956 = llvm.getelementptr %2948[%2955] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2947, %2956 : !llvm.ptr + %2957 = llvm.add %50, %35 : !llvm.i64 + %2958 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2960 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2961 = llvm.mul %2957, %2960 : !llvm.i64 + %2962 = llvm.add %2959, %2961 : !llvm.i64 + %2963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2964 = llvm.mul %59, %2963 : !llvm.i64 + %2965 = llvm.add %2962, %2964 : !llvm.i64 + %2966 = llvm.getelementptr %2958[%2965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2967 = llvm.load %2966 : !llvm.ptr + %2968 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2971 = llvm.mul %59, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2974 = llvm.mul %58, %2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.getelementptr %2968[%2975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2977 = llvm.load %2976 : !llvm.ptr + %2978 = llvm.fmul %2967, %2977 {RelaxedPrecision} : !llvm.float + %2979 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2981 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2982 = llvm.mul %2957, %2981 : !llvm.i64 + %2983 = llvm.add %2980, %2982 : !llvm.i64 + %2984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2985 = llvm.mul %58, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.getelementptr %2979[%2986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2988 = llvm.load %2987 : !llvm.ptr + %2989 = llvm.fadd %2988, %2978 {RelaxedPrecision} : !llvm.float + %2990 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2992 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2993 = llvm.mul %2957, %2992 : !llvm.i64 + %2994 = llvm.add %2991, %2993 : !llvm.i64 + %2995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2996 = llvm.mul %58, %2995 : !llvm.i64 + %2997 = llvm.add %2994, %2996 : !llvm.i64 + %2998 = llvm.getelementptr %2990[%2997] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2989, %2998 : !llvm.ptr + %2999 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3000 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3001 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3002 = llvm.mul %2957, %3001 : !llvm.i64 + %3003 = llvm.add %3000, %3002 : !llvm.i64 + %3004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3005 = llvm.mul %58, %3004 : !llvm.i64 + %3006 = llvm.add %3003, %3005 : !llvm.i64 + %3007 = llvm.getelementptr %2999[%3006] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3008 = llvm.load %3007 : !llvm.ptr + %3009 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3010 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3011 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3012 = llvm.mul %2957, %3011 : !llvm.i64 + %3013 = llvm.add %3010, %3012 : !llvm.i64 + %3014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3015 = llvm.mul %58, %3014 : !llvm.i64 + %3016 = llvm.add %3013, %3015 : !llvm.i64 + %3017 = llvm.getelementptr %3009[%3016] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3008, %3017 : !llvm.ptr + %3018 = 
llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3020 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3021 = llvm.mul %2957, %3020 : !llvm.i64 + %3022 = llvm.add %3019, %3021 : !llvm.i64 + %3023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3024 = llvm.mul %59, %3023 : !llvm.i64 + %3025 = llvm.add %3022, %3024 : !llvm.i64 + %3026 = llvm.getelementptr %3018[%3025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3027 = llvm.load %3026 : !llvm.ptr + %3028 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3031 = llvm.mul %59, %3030 : !llvm.i64 + %3032 = llvm.add %3029, %3031 : !llvm.i64 + %3033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3034 = llvm.mul %120, %3033 : !llvm.i64 + %3035 = llvm.add %3032, %3034 : !llvm.i64 + %3036 = llvm.getelementptr %3028[%3035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3037 = llvm.load %3036 : !llvm.ptr + %3038 = llvm.fmul %3027, %3037 {RelaxedPrecision} : !llvm.float + %3039 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3041 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3042 = llvm.mul %2957, %3041 : !llvm.i64 + %3043 = llvm.add %3040, %3042 : !llvm.i64 + %3044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3045 = llvm.mul %120, %3044 : !llvm.i64 + %3046 = llvm.add %3043, %3045 : !llvm.i64 + %3047 = llvm.getelementptr %3039[%3046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3048 = llvm.load %3047 : !llvm.ptr + %3049 = llvm.fadd %3048, %3038 {RelaxedPrecision} : !llvm.float + %3050 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3051 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3052 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3053 = llvm.mul %2957, %3052 : !llvm.i64 + %3054 = llvm.add %3051, %3053 : !llvm.i64 + %3055 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3056 = llvm.mul %120, %3055 : !llvm.i64 + %3057 = llvm.add %3054, %3056 : !llvm.i64 + %3058 = llvm.getelementptr %3050[%3057] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3049, %3058 : !llvm.ptr + %3059 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3060 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3061 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3062 = llvm.mul %2957, %3061 : !llvm.i64 + %3063 = llvm.add %3060, %3062 : !llvm.i64 + %3064 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3065 = llvm.mul %120, %3064 : !llvm.i64 + %3066 = llvm.add %3063, %3065 : !llvm.i64 + %3067 = llvm.getelementptr %3059[%3066] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3068 = llvm.load %3067 : !llvm.ptr + %3069 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3070 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3071 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3072 = llvm.mul %2957, %3071 : !llvm.i64 + %3073 = llvm.add %3070, %3072 : !llvm.i64 + %3074 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3075 = llvm.mul %120, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.getelementptr %3069[%3076] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3068, %3077 : !llvm.ptr + %3078 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3079 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %3080 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3081 = llvm.mul %2957, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3084 = llvm.mul %59, %3083 : !llvm.i64 + %3085 = llvm.add %3082, %3084 : !llvm.i64 + %3086 = llvm.getelementptr %3078[%3085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3087 = llvm.load %3086 : !llvm.ptr + %3088 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3091 = llvm.mul %59, %3090 : !llvm.i64 + %3092 = llvm.add %3089, %3091 : !llvm.i64 + %3093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3094 = llvm.mul %181, %3093 : !llvm.i64 + %3095 = llvm.add %3092, %3094 : !llvm.i64 + %3096 = llvm.getelementptr %3088[%3095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3097 = llvm.load %3096 : !llvm.ptr + %3098 = llvm.fmul %3087, %3097 {RelaxedPrecision} : !llvm.float + %3099 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3101 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3102 = llvm.mul %2957, %3101 : !llvm.i64 + %3103 = llvm.add %3100, %3102 : !llvm.i64 + %3104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3105 = llvm.mul %181, %3104 : !llvm.i64 + %3106 = llvm.add %3103, %3105 : !llvm.i64 + %3107 = llvm.getelementptr %3099[%3106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3108 = llvm.load %3107 : !llvm.ptr + %3109 = llvm.fadd %3108, %3098 {RelaxedPrecision} : !llvm.float + %3110 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3111 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3112 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3113 = llvm.mul %2957, %3112 : !llvm.i64 + %3114 = llvm.add %3111, %3113 : !llvm.i64 + %3115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3116 = llvm.mul %181, %3115 : !llvm.i64 + %3117 = llvm.add %3114, %3116 : !llvm.i64 + %3118 = llvm.getelementptr %3110[%3117] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3109, %3118 : !llvm.ptr + %3119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3121 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3122 = llvm.mul %2957, %3121 : !llvm.i64 + %3123 = llvm.add %3120, %3122 : !llvm.i64 + %3124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3125 = llvm.mul %181, %3124 : !llvm.i64 + %3126 = llvm.add %3123, %3125 : !llvm.i64 + %3127 = llvm.getelementptr %3119[%3126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3128 = llvm.load %3127 : !llvm.ptr + %3129 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3130 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3131 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3132 = llvm.mul %2957, %3131 : !llvm.i64 + %3133 = llvm.add %3130, %3132 : !llvm.i64 + %3134 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3135 = llvm.mul %181, %3134 : !llvm.i64 + %3136 = llvm.add %3133, %3135 : !llvm.i64 + %3137 = llvm.getelementptr %3129[%3136] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3128, %3137 : !llvm.ptr + %3138 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3141 = llvm.mul %2957, %3140 : !llvm.i64 + %3142 = 
llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3144 = llvm.mul %59, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.getelementptr %3138[%3145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3147 = llvm.load %3146 : !llvm.ptr + %3148 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3151 = llvm.mul %59, %3150 : !llvm.i64 + %3152 = llvm.add %3149, %3151 : !llvm.i64 + %3153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3154 = llvm.mul %242, %3153 : !llvm.i64 + %3155 = llvm.add %3152, %3154 : !llvm.i64 + %3156 = llvm.getelementptr %3148[%3155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3157 = llvm.load %3156 : !llvm.ptr + %3158 = llvm.fmul %3147, %3157 {RelaxedPrecision} : !llvm.float + %3159 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3161 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3162 = llvm.mul %2957, %3161 : !llvm.i64 + %3163 = llvm.add %3160, %3162 : !llvm.i64 + %3164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3165 = llvm.mul %242, %3164 : !llvm.i64 + %3166 = llvm.add %3163, %3165 : !llvm.i64 + %3167 = llvm.getelementptr %3159[%3166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3168 = llvm.load %3167 : !llvm.ptr + %3169 = llvm.fadd %3168, %3158 {RelaxedPrecision} : !llvm.float + %3170 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3173 = llvm.mul %2957, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %242, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3169, %3178 : !llvm.ptr + %3179 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3182 = llvm.mul %2957, %3181 : !llvm.i64 + %3183 = llvm.add %3180, %3182 : !llvm.i64 + %3184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3185 = llvm.mul %242, %3184 : !llvm.i64 + %3186 = llvm.add %3183, %3185 : !llvm.i64 + %3187 = llvm.getelementptr %3179[%3186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3188 = llvm.load %3187 : !llvm.ptr + %3189 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3191 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3192 = llvm.mul %2957, %3191 : !llvm.i64 + %3193 = llvm.add %3190, %3192 : !llvm.i64 + %3194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3195 = llvm.mul %242, %3194 : !llvm.i64 + %3196 = llvm.add %3193, %3195 : !llvm.i64 + %3197 = llvm.getelementptr %3189[%3196] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3188, %3197 : !llvm.ptr + %3198 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3200 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3201 = llvm.mul %2957, %3200 : !llvm.i64 + %3202 = llvm.add %3199, %3201 : !llvm.i64 + %3203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3204 = llvm.mul %59, %3203 : 
!llvm.i64 + %3205 = llvm.add %3202, %3204 : !llvm.i64 + %3206 = llvm.getelementptr %3198[%3205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3207 = llvm.load %3206 : !llvm.ptr + %3208 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3211 = llvm.mul %59, %3210 : !llvm.i64 + %3212 = llvm.add %3209, %3211 : !llvm.i64 + %3213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3214 = llvm.mul %303, %3213 : !llvm.i64 + %3215 = llvm.add %3212, %3214 : !llvm.i64 + %3216 = llvm.getelementptr %3208[%3215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3217 = llvm.load %3216 : !llvm.ptr + %3218 = llvm.fmul %3207, %3217 {RelaxedPrecision} : !llvm.float + %3219 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3221 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3222 = llvm.mul %2957, %3221 : !llvm.i64 + %3223 = llvm.add %3220, %3222 : !llvm.i64 + %3224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3225 = llvm.mul %303, %3224 : !llvm.i64 + %3226 = llvm.add %3223, %3225 : !llvm.i64 + %3227 = llvm.getelementptr %3219[%3226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3228 = llvm.load %3227 : !llvm.ptr + %3229 = llvm.fadd %3228, %3218 {RelaxedPrecision} : !llvm.float + %3230 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3232 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3233 = llvm.mul %2957, %3232 : !llvm.i64 + %3234 = llvm.add %3231, %3233 : !llvm.i64 + %3235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3236 = llvm.mul %303, %3235 : !llvm.i64 + %3237 = llvm.add %3234, %3236 : !llvm.i64 + %3238 = llvm.getelementptr %3230[%3237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3229, %3238 : !llvm.ptr + %3239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3242 = llvm.mul %2957, %3241 : !llvm.i64 + %3243 = llvm.add %3240, %3242 : !llvm.i64 + %3244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3245 = llvm.mul %303, %3244 : !llvm.i64 + %3246 = llvm.add %3243, %3245 : !llvm.i64 + %3247 = llvm.getelementptr %3239[%3246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3248 = llvm.load %3247 : !llvm.ptr + %3249 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3250 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3251 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3252 = llvm.mul %2957, %3251 : !llvm.i64 + %3253 = llvm.add %3250, %3252 : !llvm.i64 + %3254 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3255 = llvm.mul %303, %3254 : !llvm.i64 + %3256 = llvm.add %3253, %3255 : !llvm.i64 + %3257 = llvm.getelementptr %3249[%3256] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3248, %3257 : !llvm.ptr + %3258 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3260 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3261 = llvm.mul %2957, %3260 : !llvm.i64 + %3262 = llvm.add %3259, %3261 : !llvm.i64 + %3263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3264 = llvm.mul %59, %3263 : !llvm.i64 + %3265 = llvm.add %3262, %3264 : !llvm.i64 + %3266 = llvm.getelementptr %3258[%3265] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %3267 = llvm.load %3266 : !llvm.ptr + %3268 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3271 = llvm.mul %59, %3270 : !llvm.i64 + %3272 = llvm.add %3269, %3271 : !llvm.i64 + %3273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3274 = llvm.mul %364, %3273 : !llvm.i64 + %3275 = llvm.add %3272, %3274 : !llvm.i64 + %3276 = llvm.getelementptr %3268[%3275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3277 = llvm.load %3276 : !llvm.ptr + %3278 = llvm.fmul %3267, %3277 {RelaxedPrecision} : !llvm.float + %3279 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3281 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3282 = llvm.mul %2957, %3281 : !llvm.i64 + %3283 = llvm.add %3280, %3282 : !llvm.i64 + %3284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3285 = llvm.mul %364, %3284 : !llvm.i64 + %3286 = llvm.add %3283, %3285 : !llvm.i64 + %3287 = llvm.getelementptr %3279[%3286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3288 = llvm.load %3287 : !llvm.ptr + %3289 = llvm.fadd %3288, %3278 {RelaxedPrecision} : !llvm.float + %3290 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3291 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3292 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3293 = llvm.mul %2957, %3292 : !llvm.i64 + %3294 = llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3296 = llvm.mul %364, %3295 : !llvm.i64 + %3297 = llvm.add %3294, %3296 : !llvm.i64 + %3298 = llvm.getelementptr %3290[%3297] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3289, %3298 : !llvm.ptr + %3299 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3300 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3301 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3302 = llvm.mul %2957, %3301 : !llvm.i64 + %3303 = llvm.add %3300, %3302 : !llvm.i64 + %3304 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3305 = llvm.mul %364, %3304 : !llvm.i64 + %3306 = llvm.add %3303, %3305 : !llvm.i64 + %3307 = llvm.getelementptr %3299[%3306] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3308 = llvm.load %3307 : !llvm.ptr + %3309 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3310 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3311 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3312 = llvm.mul %2957, %3311 : !llvm.i64 + %3313 = llvm.add %3310, %3312 : !llvm.i64 + %3314 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3315 = llvm.mul %364, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.getelementptr %3309[%3316] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3308, %3317 : !llvm.ptr + %3318 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3320 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3321 = llvm.mul %2957, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3324 = llvm.mul %59, %3323 : !llvm.i64 + %3325 = llvm.add %3322, %3324 : !llvm.i64 + %3326 = llvm.getelementptr %3318[%3325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3327 = llvm.load %3326 : !llvm.ptr + %3328 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %3329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3331 = llvm.mul %59, %3330 : !llvm.i64 + %3332 = llvm.add %3329, %3331 : !llvm.i64 + %3333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3334 = llvm.mul %425, %3333 : !llvm.i64 + %3335 = llvm.add %3332, %3334 : !llvm.i64 + %3336 = llvm.getelementptr %3328[%3335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3337 = llvm.load %3336 : !llvm.ptr + %3338 = llvm.fmul %3327, %3337 {RelaxedPrecision} : !llvm.float + %3339 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3341 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3342 = llvm.mul %2957, %3341 : !llvm.i64 + %3343 = llvm.add %3340, %3342 : !llvm.i64 + %3344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3345 = llvm.mul %425, %3344 : !llvm.i64 + %3346 = llvm.add %3343, %3345 : !llvm.i64 + %3347 = llvm.getelementptr %3339[%3346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3348 = llvm.load %3347 : !llvm.ptr + %3349 = llvm.fadd %3348, %3338 {RelaxedPrecision} : !llvm.float + %3350 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3351 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3352 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3353 = llvm.mul %2957, %3352 : !llvm.i64 + %3354 = llvm.add %3351, %3353 : !llvm.i64 + %3355 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3356 = llvm.mul %425, %3355 : !llvm.i64 + %3357 = llvm.add %3354, %3356 : !llvm.i64 + %3358 = llvm.getelementptr %3350[%3357] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3349, %3358 : !llvm.ptr + %3359 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3360 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3361 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3362 = llvm.mul %2957, %3361 : !llvm.i64 + %3363 = llvm.add %3360, %3362 : !llvm.i64 + %3364 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3365 = llvm.mul %425, %3364 : !llvm.i64 + %3366 = llvm.add %3363, %3365 : !llvm.i64 + %3367 = llvm.getelementptr %3359[%3366] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3368 = llvm.load %3367 : !llvm.ptr + %3369 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3370 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3371 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3372 = llvm.mul %2957, %3371 : !llvm.i64 + %3373 = llvm.add %3370, %3372 : !llvm.i64 + %3374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3375 = llvm.mul %425, %3374 : !llvm.i64 + %3376 = llvm.add %3373, %3375 : !llvm.i64 + %3377 = llvm.getelementptr %3369[%3376] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3368, %3377 : !llvm.ptr + %3378 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3380 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3381 = llvm.mul %2957, %3380 : !llvm.i64 + %3382 = llvm.add %3379, %3381 : !llvm.i64 + %3383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3384 = llvm.mul %59, %3383 : !llvm.i64 + %3385 = llvm.add %3382, %3384 : !llvm.i64 + %3386 = llvm.getelementptr %3378[%3385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3387 = llvm.load %3386 : !llvm.ptr + %3388 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3390 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %3391 = llvm.mul %59, %3390 : !llvm.i64 + %3392 = llvm.add %3389, %3391 : !llvm.i64 + %3393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3394 = llvm.mul %486, %3393 : !llvm.i64 + %3395 = llvm.add %3392, %3394 : !llvm.i64 + %3396 = llvm.getelementptr %3388[%3395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3397 = llvm.load %3396 : !llvm.ptr + %3398 = llvm.fmul %3387, %3397 {RelaxedPrecision} : !llvm.float + %3399 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3401 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3402 = llvm.mul %2957, %3401 : !llvm.i64 + %3403 = llvm.add %3400, %3402 : !llvm.i64 + %3404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3405 = llvm.mul %486, %3404 : !llvm.i64 + %3406 = llvm.add %3403, %3405 : !llvm.i64 + %3407 = llvm.getelementptr %3399[%3406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3408 = llvm.load %3407 : !llvm.ptr + %3409 = llvm.fadd %3408, %3398 {RelaxedPrecision} : !llvm.float + %3410 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3413 = llvm.mul %2957, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3416 = llvm.mul %486, %3415 : !llvm.i64 + %3417 = llvm.add %3414, %3416 : !llvm.i64 + %3418 = llvm.getelementptr %3410[%3417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3409, %3418 : !llvm.ptr + %3419 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3422 = llvm.mul %2957, %3421 : !llvm.i64 + %3423 = llvm.add %3420, %3422 : !llvm.i64 + %3424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3425 = llvm.mul %486, %3424 : !llvm.i64 + %3426 = llvm.add %3423, %3425 : !llvm.i64 + %3427 = llvm.getelementptr %3419[%3426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3428 = llvm.load %3427 : !llvm.ptr + %3429 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3430 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3431 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3432 = llvm.mul %2957, %3431 : !llvm.i64 + %3433 = llvm.add %3430, %3432 : !llvm.i64 + %3434 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3435 = llvm.mul %486, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.getelementptr %3429[%3436] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3428, %3437 : !llvm.ptr + %3438 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3440 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3441 = llvm.mul %2957, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3444 = llvm.mul %59, %3443 : !llvm.i64 + %3445 = llvm.add %3442, %3444 : !llvm.i64 + %3446 = llvm.getelementptr %3438[%3445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3447 = llvm.load %3446 : !llvm.ptr + %3448 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3451 = llvm.mul %59, %3450 : !llvm.i64 + %3452 = llvm.add %3449, %3451 : !llvm.i64 + %3453 = llvm.mlir.constant(1 
: index) : !llvm.i64 + %3454 = llvm.mul %547, %3453 : !llvm.i64 + %3455 = llvm.add %3452, %3454 : !llvm.i64 + %3456 = llvm.getelementptr %3448[%3455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3457 = llvm.load %3456 : !llvm.ptr + %3458 = llvm.fmul %3447, %3457 {RelaxedPrecision} : !llvm.float + %3459 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3461 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3462 = llvm.mul %2957, %3461 : !llvm.i64 + %3463 = llvm.add %3460, %3462 : !llvm.i64 + %3464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3465 = llvm.mul %547, %3464 : !llvm.i64 + %3466 = llvm.add %3463, %3465 : !llvm.i64 + %3467 = llvm.getelementptr %3459[%3466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3468 = llvm.load %3467 : !llvm.ptr + %3469 = llvm.fadd %3468, %3458 {RelaxedPrecision} : !llvm.float + %3470 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3471 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3472 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3473 = llvm.mul %2957, %3472 : !llvm.i64 + %3474 = llvm.add %3471, %3473 : !llvm.i64 + %3475 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3476 = llvm.mul %547, %3475 : !llvm.i64 + %3477 = llvm.add %3474, %3476 : !llvm.i64 + %3478 = llvm.getelementptr %3470[%3477] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3469, %3478 : !llvm.ptr + %3479 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3480 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3481 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3482 = llvm.mul %2957, %3481 : !llvm.i64 + %3483 = llvm.add %3480, %3482 : !llvm.i64 + %3484 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3485 = llvm.mul %547, %3484 : !llvm.i64 + %3486 = llvm.add %3483, %3485 : !llvm.i64 + %3487 = llvm.getelementptr %3479[%3486] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3488 = llvm.load %3487 : !llvm.ptr + %3489 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3491 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3492 = llvm.mul %2957, %3491 : !llvm.i64 + %3493 = llvm.add %3490, %3492 : !llvm.i64 + %3494 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3495 = llvm.mul %547, %3494 : !llvm.i64 + %3496 = llvm.add %3493, %3495 : !llvm.i64 + %3497 = llvm.getelementptr %3489[%3496] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3488, %3497 : !llvm.ptr + %3498 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3500 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3501 = llvm.mul %2957, %3500 : !llvm.i64 + %3502 = llvm.add %3499, %3501 : !llvm.i64 + %3503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3504 = llvm.mul %59, %3503 : !llvm.i64 + %3505 = llvm.add %3502, %3504 : !llvm.i64 + %3506 = llvm.getelementptr %3498[%3505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3507 = llvm.load %3506 : !llvm.ptr + %3508 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3511 = llvm.mul %59, %3510 : !llvm.i64 + %3512 = llvm.add %3509, %3511 : !llvm.i64 + %3513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3514 = llvm.mul %608, %3513 : !llvm.i64 + %3515 = llvm.add %3512, %3514 : !llvm.i64 + %3516 = 
llvm.getelementptr %3508[%3515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3517 = llvm.load %3516 : !llvm.ptr + %3518 = llvm.fmul %3507, %3517 {RelaxedPrecision} : !llvm.float + %3519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3522 = llvm.mul %2957, %3521 : !llvm.i64 + %3523 = llvm.add %3520, %3522 : !llvm.i64 + %3524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3525 = llvm.mul %608, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.getelementptr %3519[%3526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3528 = llvm.load %3527 : !llvm.ptr + %3529 = llvm.fadd %3528, %3518 {RelaxedPrecision} : !llvm.float + %3530 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3531 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3532 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3533 = llvm.mul %2957, %3532 : !llvm.i64 + %3534 = llvm.add %3531, %3533 : !llvm.i64 + %3535 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3536 = llvm.mul %608, %3535 : !llvm.i64 + %3537 = llvm.add %3534, %3536 : !llvm.i64 + %3538 = llvm.getelementptr %3530[%3537] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3529, %3538 : !llvm.ptr + %3539 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3540 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3541 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3542 = llvm.mul %2957, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : !llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %608, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3539[%3546] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3548 = llvm.load %3547 : !llvm.ptr + %3549 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3550 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3551 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3552 = llvm.mul %2957, %3551 : !llvm.i64 + %3553 = llvm.add %3550, %3552 : !llvm.i64 + %3554 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3555 = llvm.mul %608, %3554 : !llvm.i64 + %3556 = llvm.add %3553, %3555 : !llvm.i64 + %3557 = llvm.getelementptr %3549[%3556] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3548, %3557 : !llvm.ptr + %3558 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3560 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3561 = llvm.mul %2957, %3560 : !llvm.i64 + %3562 = llvm.add %3559, %3561 : !llvm.i64 + %3563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3564 = llvm.mul %59, %3563 : !llvm.i64 + %3565 = llvm.add %3562, %3564 : !llvm.i64 + %3566 = llvm.getelementptr %3558[%3565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3567 = llvm.load %3566 : !llvm.ptr + %3568 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3571 = llvm.mul %59, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3574 = llvm.mul %669, %3573 : !llvm.i64 + %3575 = llvm.add %3572, %3574 : !llvm.i64 + %3576 = llvm.getelementptr %3568[%3575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3577 = llvm.load %3576 : !llvm.ptr + %3578 = 
llvm.fmul %3567, %3577 {RelaxedPrecision} : !llvm.float + %3579 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3581 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3582 = llvm.mul %2957, %3581 : !llvm.i64 + %3583 = llvm.add %3580, %3582 : !llvm.i64 + %3584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3585 = llvm.mul %669, %3584 : !llvm.i64 + %3586 = llvm.add %3583, %3585 : !llvm.i64 + %3587 = llvm.getelementptr %3579[%3586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3588 = llvm.load %3587 : !llvm.ptr + %3589 = llvm.fadd %3588, %3578 {RelaxedPrecision} : !llvm.float + %3590 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3591 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3592 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3593 = llvm.mul %2957, %3592 : !llvm.i64 + %3594 = llvm.add %3591, %3593 : !llvm.i64 + %3595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3596 = llvm.mul %669, %3595 : !llvm.i64 + %3597 = llvm.add %3594, %3596 : !llvm.i64 + %3598 = llvm.getelementptr %3590[%3597] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3589, %3598 : !llvm.ptr + %3599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3602 = llvm.mul %2957, %3601 : !llvm.i64 + %3603 = llvm.add %3600, %3602 : !llvm.i64 + %3604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3605 = llvm.mul %669, %3604 : !llvm.i64 + %3606 = llvm.add %3603, %3605 : !llvm.i64 + %3607 = llvm.getelementptr %3599[%3606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3608 = llvm.load %3607 : !llvm.ptr + %3609 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3610 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3611 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3612 = llvm.mul %2957, %3611 : !llvm.i64 + %3613 = llvm.add %3610, %3612 : !llvm.i64 + %3614 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3615 = llvm.mul %669, %3614 : !llvm.i64 + %3616 = llvm.add %3613, %3615 : !llvm.i64 + %3617 = llvm.getelementptr %3609[%3616] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3608, %3617 : !llvm.ptr + %3618 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3620 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3621 = llvm.mul %2957, %3620 : !llvm.i64 + %3622 = llvm.add %3619, %3621 : !llvm.i64 + %3623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3624 = llvm.mul %59, %3623 : !llvm.i64 + %3625 = llvm.add %3622, %3624 : !llvm.i64 + %3626 = llvm.getelementptr %3618[%3625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3627 = llvm.load %3626 : !llvm.ptr + %3628 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3631 = llvm.mul %59, %3630 : !llvm.i64 + %3632 = llvm.add %3629, %3631 : !llvm.i64 + %3633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3634 = llvm.mul %730, %3633 : !llvm.i64 + %3635 = llvm.add %3632, %3634 : !llvm.i64 + %3636 = llvm.getelementptr %3628[%3635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3637 = llvm.load %3636 : !llvm.ptr + %3638 = llvm.fmul %3627, %3637 {RelaxedPrecision} : !llvm.float + %3639 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %3640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3641 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3642 = llvm.mul %2957, %3641 : !llvm.i64 + %3643 = llvm.add %3640, %3642 : !llvm.i64 + %3644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3645 = llvm.mul %730, %3644 : !llvm.i64 + %3646 = llvm.add %3643, %3645 : !llvm.i64 + %3647 = llvm.getelementptr %3639[%3646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3648 = llvm.load %3647 : !llvm.ptr + %3649 = llvm.fadd %3648, %3638 {RelaxedPrecision} : !llvm.float + %3650 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3651 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3652 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3653 = llvm.mul %2957, %3652 : !llvm.i64 + %3654 = llvm.add %3651, %3653 : !llvm.i64 + %3655 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3656 = llvm.mul %730, %3655 : !llvm.i64 + %3657 = llvm.add %3654, %3656 : !llvm.i64 + %3658 = llvm.getelementptr %3650[%3657] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3649, %3658 : !llvm.ptr + %3659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3662 = llvm.mul %2957, %3661 : !llvm.i64 + %3663 = llvm.add %3660, %3662 : !llvm.i64 + %3664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3665 = llvm.mul %730, %3664 : !llvm.i64 + %3666 = llvm.add %3663, %3665 : !llvm.i64 + %3667 = llvm.getelementptr %3659[%3666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3668 = llvm.load %3667 : !llvm.ptr + %3669 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3670 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3671 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3672 = llvm.mul %2957, %3671 : !llvm.i64 + %3673 = llvm.add %3670, %3672 : !llvm.i64 + %3674 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3675 = llvm.mul %730, %3674 : !llvm.i64 + %3676 = llvm.add %3673, %3675 : !llvm.i64 + %3677 = llvm.getelementptr %3669[%3676] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3668, %3677 : !llvm.ptr + %3678 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3680 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3681 = llvm.mul %2957, %3680 : !llvm.i64 + %3682 = llvm.add %3679, %3681 : !llvm.i64 + %3683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3684 = llvm.mul %59, %3683 : !llvm.i64 + %3685 = llvm.add %3682, %3684 : !llvm.i64 + %3686 = llvm.getelementptr %3678[%3685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3687 = llvm.load %3686 : !llvm.ptr + %3688 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3691 = llvm.mul %59, %3690 : !llvm.i64 + %3692 = llvm.add %3689, %3691 : !llvm.i64 + %3693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3694 = llvm.mul %791, %3693 : !llvm.i64 + %3695 = llvm.add %3692, %3694 : !llvm.i64 + %3696 = llvm.getelementptr %3688[%3695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3697 = llvm.load %3696 : !llvm.ptr + %3698 = llvm.fmul %3687, %3697 {RelaxedPrecision} : !llvm.float + %3699 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3701 = llvm.mlir.constant(512 : index) 
: !llvm.i64 + %3702 = llvm.mul %2957, %3701 : !llvm.i64 + %3703 = llvm.add %3700, %3702 : !llvm.i64 + %3704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3705 = llvm.mul %791, %3704 : !llvm.i64 + %3706 = llvm.add %3703, %3705 : !llvm.i64 + %3707 = llvm.getelementptr %3699[%3706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3708 = llvm.load %3707 : !llvm.ptr + %3709 = llvm.fadd %3708, %3698 {RelaxedPrecision} : !llvm.float + %3710 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3711 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3712 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3713 = llvm.mul %2957, %3712 : !llvm.i64 + %3714 = llvm.add %3711, %3713 : !llvm.i64 + %3715 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3716 = llvm.mul %791, %3715 : !llvm.i64 + %3717 = llvm.add %3714, %3716 : !llvm.i64 + %3718 = llvm.getelementptr %3710[%3717] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3709, %3718 : !llvm.ptr + %3719 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3720 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3721 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3722 = llvm.mul %2957, %3721 : !llvm.i64 + %3723 = llvm.add %3720, %3722 : !llvm.i64 + %3724 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3725 = llvm.mul %791, %3724 : !llvm.i64 + %3726 = llvm.add %3723, %3725 : !llvm.i64 + %3727 = llvm.getelementptr %3719[%3726] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3728 = llvm.load %3727 : !llvm.ptr + %3729 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3730 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3731 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3732 = llvm.mul %2957, %3731 : !llvm.i64 + %3733 = llvm.add %3730, %3732 : !llvm.i64 + %3734 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3735 = llvm.mul %791, %3734 : !llvm.i64 + %3736 = llvm.add %3733, %3735 : !llvm.i64 + %3737 = llvm.getelementptr %3729[%3736] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3728, %3737 : !llvm.ptr + %3738 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3740 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3741 = llvm.mul %2957, %3740 : !llvm.i64 + %3742 = llvm.add %3739, %3741 : !llvm.i64 + %3743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3744 = llvm.mul %59, %3743 : !llvm.i64 + %3745 = llvm.add %3742, %3744 : !llvm.i64 + %3746 = llvm.getelementptr %3738[%3745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3747 = llvm.load %3746 : !llvm.ptr + %3748 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3750 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3751 = llvm.mul %59, %3750 : !llvm.i64 + %3752 = llvm.add %3749, %3751 : !llvm.i64 + %3753 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3754 = llvm.mul %852, %3753 : !llvm.i64 + %3755 = llvm.add %3752, %3754 : !llvm.i64 + %3756 = llvm.getelementptr %3748[%3755] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3757 = llvm.load %3756 : !llvm.ptr + %3758 = llvm.fmul %3747, %3757 {RelaxedPrecision} : !llvm.float + %3759 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3761 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3762 = llvm.mul %2957, %3761 : !llvm.i64 + %3763 = llvm.add %3760, %3762 : !llvm.i64 + %3764 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %3765 = llvm.mul %852, %3764 : !llvm.i64 + %3766 = llvm.add %3763, %3765 : !llvm.i64 + %3767 = llvm.getelementptr %3759[%3766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3768 = llvm.load %3767 : !llvm.ptr + %3769 = llvm.fadd %3768, %3758 {RelaxedPrecision} : !llvm.float + %3770 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3771 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3772 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3773 = llvm.mul %2957, %3772 : !llvm.i64 + %3774 = llvm.add %3771, %3773 : !llvm.i64 + %3775 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3776 = llvm.mul %852, %3775 : !llvm.i64 + %3777 = llvm.add %3774, %3776 : !llvm.i64 + %3778 = llvm.getelementptr %3770[%3777] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3769, %3778 : !llvm.ptr + %3779 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3780 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3781 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3782 = llvm.mul %2957, %3781 : !llvm.i64 + %3783 = llvm.add %3780, %3782 : !llvm.i64 + %3784 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3785 = llvm.mul %852, %3784 : !llvm.i64 + %3786 = llvm.add %3783, %3785 : !llvm.i64 + %3787 = llvm.getelementptr %3779[%3786] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3788 = llvm.load %3787 : !llvm.ptr + %3789 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3790 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3791 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3792 = llvm.mul %2957, %3791 : !llvm.i64 + %3793 = llvm.add %3790, %3792 : !llvm.i64 + %3794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3795 = llvm.mul %852, %3794 : !llvm.i64 + %3796 = llvm.add %3793, %3795 : !llvm.i64 + %3797 = llvm.getelementptr %3789[%3796] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3788, %3797 : !llvm.ptr + %3798 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3800 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3801 = llvm.mul %2957, %3800 : !llvm.i64 + %3802 = llvm.add %3799, %3801 : !llvm.i64 + %3803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3804 = llvm.mul %59, %3803 : !llvm.i64 + %3805 = llvm.add %3802, %3804 : !llvm.i64 + %3806 = llvm.getelementptr %3798[%3805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3807 = llvm.load %3806 : !llvm.ptr + %3808 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3811 = llvm.mul %59, %3810 : !llvm.i64 + %3812 = llvm.add %3809, %3811 : !llvm.i64 + %3813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3814 = llvm.mul %913, %3813 : !llvm.i64 + %3815 = llvm.add %3812, %3814 : !llvm.i64 + %3816 = llvm.getelementptr %3808[%3815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3817 = llvm.load %3816 : !llvm.ptr + %3818 = llvm.fmul %3807, %3817 {RelaxedPrecision} : !llvm.float + %3819 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3821 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3822 = llvm.mul %2957, %3821 : !llvm.i64 + %3823 = llvm.add %3820, %3822 : !llvm.i64 + %3824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3825 = llvm.mul %913, %3824 : !llvm.i64 + %3826 = llvm.add %3823, %3825 : 
!llvm.i64 + %3827 = llvm.getelementptr %3819[%3826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3828 = llvm.load %3827 : !llvm.ptr + %3829 = llvm.fadd %3828, %3818 {RelaxedPrecision} : !llvm.float + %3830 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3831 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3832 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3833 = llvm.mul %2957, %3832 : !llvm.i64 + %3834 = llvm.add %3831, %3833 : !llvm.i64 + %3835 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3836 = llvm.mul %913, %3835 : !llvm.i64 + %3837 = llvm.add %3834, %3836 : !llvm.i64 + %3838 = llvm.getelementptr %3830[%3837] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3829, %3838 : !llvm.ptr + %3839 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3840 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3841 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3842 = llvm.mul %2957, %3841 : !llvm.i64 + %3843 = llvm.add %3840, %3842 : !llvm.i64 + %3844 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3845 = llvm.mul %913, %3844 : !llvm.i64 + %3846 = llvm.add %3843, %3845 : !llvm.i64 + %3847 = llvm.getelementptr %3839[%3846] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3848 = llvm.load %3847 : !llvm.ptr + %3849 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3850 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3851 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3852 = llvm.mul %2957, %3851 : !llvm.i64 + %3853 = llvm.add %3850, %3852 : !llvm.i64 + %3854 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3855 = llvm.mul %913, %3854 : !llvm.i64 + %3856 = llvm.add %3853, %3855 : !llvm.i64 + %3857 = llvm.getelementptr %3849[%3856] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3848, %3857 : !llvm.ptr + %3858 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3860 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3861 = llvm.mul %2957, %3860 : !llvm.i64 + %3862 = llvm.add %3859, %3861 : !llvm.i64 + %3863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3864 = llvm.mul %59, %3863 : !llvm.i64 + %3865 = llvm.add %3862, %3864 : !llvm.i64 + %3866 = llvm.getelementptr %3858[%3865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3867 = llvm.load %3866 : !llvm.ptr + %3868 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3871 = llvm.mul %59, %3870 : !llvm.i64 + %3872 = llvm.add %3869, %3871 : !llvm.i64 + %3873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3874 = llvm.mul %974, %3873 : !llvm.i64 + %3875 = llvm.add %3872, %3874 : !llvm.i64 + %3876 = llvm.getelementptr %3868[%3875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3877 = llvm.load %3876 : !llvm.ptr + %3878 = llvm.fmul %3867, %3877 {RelaxedPrecision} : !llvm.float + %3879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3882 = llvm.mul %2957, %3881 : !llvm.i64 + %3883 = llvm.add %3880, %3882 : !llvm.i64 + %3884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3885 = llvm.mul %974, %3884 : !llvm.i64 + %3886 = llvm.add %3883, %3885 : !llvm.i64 + %3887 = llvm.getelementptr %3879[%3886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3888 = llvm.load %3887 : 
!llvm.ptr + %3889 = llvm.fadd %3888, %3878 {RelaxedPrecision} : !llvm.float + %3890 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3892 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3893 = llvm.mul %2957, %3892 : !llvm.i64 + %3894 = llvm.add %3891, %3893 : !llvm.i64 + %3895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3896 = llvm.mul %974, %3895 : !llvm.i64 + %3897 = llvm.add %3894, %3896 : !llvm.i64 + %3898 = llvm.getelementptr %3890[%3897] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3889, %3898 : !llvm.ptr + %3899 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3901 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3902 = llvm.mul %2957, %3901 : !llvm.i64 + %3903 = llvm.add %3900, %3902 : !llvm.i64 + %3904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3905 = llvm.mul %974, %3904 : !llvm.i64 + %3906 = llvm.add %3903, %3905 : !llvm.i64 + %3907 = llvm.getelementptr %3899[%3906] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3908 = llvm.load %3907 : !llvm.ptr + %3909 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3910 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3911 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3912 = llvm.mul %2957, %3911 : !llvm.i64 + %3913 = llvm.add %3910, %3912 : !llvm.i64 + %3914 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3915 = llvm.mul %974, %3914 : !llvm.i64 + %3916 = llvm.add %3913, %3915 : !llvm.i64 + %3917 = llvm.getelementptr %3909[%3916] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3908, %3917 : !llvm.ptr + %3918 = llvm.add %50, %36 : !llvm.i64 + %3919 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3921 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3922 = llvm.mul %3918, %3921 : !llvm.i64 + %3923 = llvm.add %3920, %3922 : !llvm.i64 + %3924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3925 = llvm.mul %59, %3924 : !llvm.i64 + %3926 = llvm.add %3923, %3925 : !llvm.i64 + %3927 = llvm.getelementptr %3919[%3926] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3928 = llvm.load %3927 : !llvm.ptr + %3929 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3930 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3931 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3932 = llvm.mul %59, %3931 : !llvm.i64 + %3933 = llvm.add %3930, %3932 : !llvm.i64 + %3934 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3935 = llvm.mul %58, %3934 : !llvm.i64 + %3936 = llvm.add %3933, %3935 : !llvm.i64 + %3937 = llvm.getelementptr %3929[%3936] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3938 = llvm.load %3937 : !llvm.ptr + %3939 = llvm.fmul %3928, %3938 {RelaxedPrecision} : !llvm.float + %3940 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3941 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3942 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3943 = llvm.mul %3918, %3942 : !llvm.i64 + %3944 = llvm.add %3941, %3943 : !llvm.i64 + %3945 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3946 = llvm.mul %58, %3945 : !llvm.i64 + %3947 = llvm.add %3944, %3946 : !llvm.i64 + %3948 = llvm.getelementptr %3940[%3947] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3949 = llvm.load %3948 : !llvm.ptr + %3950 = llvm.fadd %3949, %3939 {RelaxedPrecision} : !llvm.float + %3951 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3952 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3953 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3954 = llvm.mul %3918, %3953 : !llvm.i64 + %3955 = llvm.add %3952, %3954 : !llvm.i64 + %3956 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3957 = llvm.mul %58, %3956 : !llvm.i64 + %3958 = llvm.add %3955, %3957 : !llvm.i64 + %3959 = llvm.getelementptr %3951[%3958] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3950, %3959 : !llvm.ptr + %3960 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3961 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3962 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3963 = llvm.mul %3918, %3962 : !llvm.i64 + %3964 = llvm.add %3961, %3963 : !llvm.i64 + %3965 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3966 = llvm.mul %58, %3965 : !llvm.i64 + %3967 = llvm.add %3964, %3966 : !llvm.i64 + %3968 = llvm.getelementptr %3960[%3967] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3969 = llvm.load %3968 : !llvm.ptr + %3970 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3971 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3972 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3973 = llvm.mul %3918, %3972 : !llvm.i64 + %3974 = llvm.add %3971, %3973 : !llvm.i64 + %3975 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3976 = llvm.mul %58, %3975 : !llvm.i64 + %3977 = llvm.add %3974, %3976 : !llvm.i64 + %3978 = llvm.getelementptr %3970[%3977] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3969, %3978 : !llvm.ptr + %3979 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3981 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3982 = llvm.mul %3918, %3981 : !llvm.i64 + %3983 = llvm.add %3980, %3982 : !llvm.i64 + %3984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3985 = llvm.mul %59, %3984 : !llvm.i64 + %3986 = llvm.add %3983, %3985 : !llvm.i64 + %3987 = llvm.getelementptr %3979[%3986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3988 = llvm.load %3987 : !llvm.ptr + %3989 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3990 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3991 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3992 = llvm.mul %59, %3991 : !llvm.i64 + %3993 = llvm.add %3990, %3992 : !llvm.i64 + %3994 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3995 = llvm.mul %120, %3994 : !llvm.i64 + %3996 = llvm.add %3993, %3995 : !llvm.i64 + %3997 = llvm.getelementptr %3989[%3996] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3998 = llvm.load %3997 : !llvm.ptr + %3999 = llvm.fmul %3988, %3998 {RelaxedPrecision} : !llvm.float + %4000 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4001 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4002 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4003 = llvm.mul %3918, %4002 : !llvm.i64 + %4004 = llvm.add %4001, %4003 : !llvm.i64 + %4005 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4006 = llvm.mul %120, %4005 : !llvm.i64 + %4007 = llvm.add %4004, %4006 : !llvm.i64 + %4008 = llvm.getelementptr %4000[%4007] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4009 = llvm.load %4008 : !llvm.ptr + %4010 = llvm.fadd %4009, %3999 {RelaxedPrecision} : !llvm.float + %4011 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4012 = llvm.mlir.constant(0 : index) 
: !llvm.i64 + %4013 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4014 = llvm.mul %3918, %4013 : !llvm.i64 + %4015 = llvm.add %4012, %4014 : !llvm.i64 + %4016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4017 = llvm.mul %120, %4016 : !llvm.i64 + %4018 = llvm.add %4015, %4017 : !llvm.i64 + %4019 = llvm.getelementptr %4011[%4018] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4010, %4019 : !llvm.ptr + %4020 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4022 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4023 = llvm.mul %3918, %4022 : !llvm.i64 + %4024 = llvm.add %4021, %4023 : !llvm.i64 + %4025 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4026 = llvm.mul %120, %4025 : !llvm.i64 + %4027 = llvm.add %4024, %4026 : !llvm.i64 + %4028 = llvm.getelementptr %4020[%4027] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4029 = llvm.load %4028 : !llvm.ptr + %4030 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4031 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4032 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4033 = llvm.mul %3918, %4032 : !llvm.i64 + %4034 = llvm.add %4031, %4033 : !llvm.i64 + %4035 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4036 = llvm.mul %120, %4035 : !llvm.i64 + %4037 = llvm.add %4034, %4036 : !llvm.i64 + %4038 = llvm.getelementptr %4030[%4037] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4029, %4038 : !llvm.ptr + %4039 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4041 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4042 = llvm.mul %3918, %4041 : !llvm.i64 + %4043 = llvm.add %4040, %4042 : !llvm.i64 + %4044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4045 = llvm.mul %59, %4044 : !llvm.i64 + %4046 = llvm.add %4043, %4045 : !llvm.i64 + %4047 = llvm.getelementptr %4039[%4046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4048 = llvm.load %4047 : !llvm.ptr + %4049 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4050 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4051 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4052 = llvm.mul %59, %4051 : !llvm.i64 + %4053 = llvm.add %4050, %4052 : !llvm.i64 + %4054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4055 = llvm.mul %181, %4054 : !llvm.i64 + %4056 = llvm.add %4053, %4055 : !llvm.i64 + %4057 = llvm.getelementptr %4049[%4056] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4058 = llvm.load %4057 : !llvm.ptr + %4059 = llvm.fmul %4048, %4058 {RelaxedPrecision} : !llvm.float + %4060 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4062 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4063 = llvm.mul %3918, %4062 : !llvm.i64 + %4064 = llvm.add %4061, %4063 : !llvm.i64 + %4065 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4066 = llvm.mul %181, %4065 : !llvm.i64 + %4067 = llvm.add %4064, %4066 : !llvm.i64 + %4068 = llvm.getelementptr %4060[%4067] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4069 = llvm.load %4068 : !llvm.ptr + %4070 = llvm.fadd %4069, %4059 {RelaxedPrecision} : !llvm.float + %4071 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4072 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4073 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4074 = llvm.mul %3918, %4073 : !llvm.i64 + %4075 = 
llvm.add %4072, %4074 : !llvm.i64 + %4076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4077 = llvm.mul %181, %4076 : !llvm.i64 + %4078 = llvm.add %4075, %4077 : !llvm.i64 + %4079 = llvm.getelementptr %4071[%4078] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4070, %4079 : !llvm.ptr + %4080 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4082 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4083 = llvm.mul %3918, %4082 : !llvm.i64 + %4084 = llvm.add %4081, %4083 : !llvm.i64 + %4085 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4086 = llvm.mul %181, %4085 : !llvm.i64 + %4087 = llvm.add %4084, %4086 : !llvm.i64 + %4088 = llvm.getelementptr %4080[%4087] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4089 = llvm.load %4088 : !llvm.ptr + %4090 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4091 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4092 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4093 = llvm.mul %3918, %4092 : !llvm.i64 + %4094 = llvm.add %4091, %4093 : !llvm.i64 + %4095 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4096 = llvm.mul %181, %4095 : !llvm.i64 + %4097 = llvm.add %4094, %4096 : !llvm.i64 + %4098 = llvm.getelementptr %4090[%4097] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4089, %4098 : !llvm.ptr + %4099 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4101 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4102 = llvm.mul %3918, %4101 : !llvm.i64 + %4103 = llvm.add %4100, %4102 : !llvm.i64 + %4104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4105 = llvm.mul %59, %4104 : !llvm.i64 + %4106 = llvm.add %4103, %4105 : !llvm.i64 + %4107 = llvm.getelementptr %4099[%4106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4108 = llvm.load %4107 : !llvm.ptr + %4109 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4110 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4111 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4112 = llvm.mul %59, %4111 : !llvm.i64 + %4113 = llvm.add %4110, %4112 : !llvm.i64 + %4114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4115 = llvm.mul %242, %4114 : !llvm.i64 + %4116 = llvm.add %4113, %4115 : !llvm.i64 + %4117 = llvm.getelementptr %4109[%4116] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4118 = llvm.load %4117 : !llvm.ptr + %4119 = llvm.fmul %4108, %4118 {RelaxedPrecision} : !llvm.float + %4120 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4122 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4123 = llvm.mul %3918, %4122 : !llvm.i64 + %4124 = llvm.add %4121, %4123 : !llvm.i64 + %4125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4126 = llvm.mul %242, %4125 : !llvm.i64 + %4127 = llvm.add %4124, %4126 : !llvm.i64 + %4128 = llvm.getelementptr %4120[%4127] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4129 = llvm.load %4128 : !llvm.ptr + %4130 = llvm.fadd %4129, %4119 {RelaxedPrecision} : !llvm.float + %4131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4134 = llvm.mul %3918, %4133 : !llvm.i64 + %4135 = llvm.add %4132, %4134 : !llvm.i64 + %4136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4137 = llvm.mul %242, %4136 : 
!llvm.i64 + %4138 = llvm.add %4135, %4137 : !llvm.i64 + %4139 = llvm.getelementptr %4131[%4138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4130, %4139 : !llvm.ptr + %4140 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4141 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4142 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4143 = llvm.mul %3918, %4142 : !llvm.i64 + %4144 = llvm.add %4141, %4143 : !llvm.i64 + %4145 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4146 = llvm.mul %242, %4145 : !llvm.i64 + %4147 = llvm.add %4144, %4146 : !llvm.i64 + %4148 = llvm.getelementptr %4140[%4147] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4149 = llvm.load %4148 : !llvm.ptr + %4150 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4152 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4153 = llvm.mul %3918, %4152 : !llvm.i64 + %4154 = llvm.add %4151, %4153 : !llvm.i64 + %4155 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4156 = llvm.mul %242, %4155 : !llvm.i64 + %4157 = llvm.add %4154, %4156 : !llvm.i64 + %4158 = llvm.getelementptr %4150[%4157] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4149, %4158 : !llvm.ptr + %4159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4161 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4162 = llvm.mul %3918, %4161 : !llvm.i64 + %4163 = llvm.add %4160, %4162 : !llvm.i64 + %4164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4165 = llvm.mul %59, %4164 : !llvm.i64 + %4166 = llvm.add %4163, %4165 : !llvm.i64 + %4167 = llvm.getelementptr %4159[%4166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4168 = llvm.load %4167 : !llvm.ptr + %4169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4171 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4172 = llvm.mul %59, %4171 : !llvm.i64 + %4173 = llvm.add %4170, %4172 : !llvm.i64 + %4174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4175 = llvm.mul %303, %4174 : !llvm.i64 + %4176 = llvm.add %4173, %4175 : !llvm.i64 + %4177 = llvm.getelementptr %4169[%4176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4178 = llvm.load %4177 : !llvm.ptr + %4179 = llvm.fmul %4168, %4178 {RelaxedPrecision} : !llvm.float + %4180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4182 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4183 = llvm.mul %3918, %4182 : !llvm.i64 + %4184 = llvm.add %4181, %4183 : !llvm.i64 + %4185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4186 = llvm.mul %303, %4185 : !llvm.i64 + %4187 = llvm.add %4184, %4186 : !llvm.i64 + %4188 = llvm.getelementptr %4180[%4187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4189 = llvm.load %4188 : !llvm.ptr + %4190 = llvm.fadd %4189, %4179 {RelaxedPrecision} : !llvm.float + %4191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4194 = llvm.mul %3918, %4193 : !llvm.i64 + %4195 = llvm.add %4192, %4194 : !llvm.i64 + %4196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4197 = llvm.mul %303, %4196 : !llvm.i64 + %4198 = llvm.add %4195, %4197 : !llvm.i64 + %4199 = llvm.getelementptr %4191[%4198] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + llvm.store %4190, %4199 : !llvm.ptr + %4200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4203 = llvm.mul %3918, %4202 : !llvm.i64 + %4204 = llvm.add %4201, %4203 : !llvm.i64 + %4205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4206 = llvm.mul %303, %4205 : !llvm.i64 + %4207 = llvm.add %4204, %4206 : !llvm.i64 + %4208 = llvm.getelementptr %4200[%4207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4209 = llvm.load %4208 : !llvm.ptr + %4210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4212 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4213 = llvm.mul %3918, %4212 : !llvm.i64 + %4214 = llvm.add %4211, %4213 : !llvm.i64 + %4215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4216 = llvm.mul %303, %4215 : !llvm.i64 + %4217 = llvm.add %4214, %4216 : !llvm.i64 + %4218 = llvm.getelementptr %4210[%4217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4209, %4218 : !llvm.ptr + %4219 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4221 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4222 = llvm.mul %3918, %4221 : !llvm.i64 + %4223 = llvm.add %4220, %4222 : !llvm.i64 + %4224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4225 = llvm.mul %59, %4224 : !llvm.i64 + %4226 = llvm.add %4223, %4225 : !llvm.i64 + %4227 = llvm.getelementptr %4219[%4226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4228 = llvm.load %4227 : !llvm.ptr + %4229 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4230 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4231 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4232 = llvm.mul %59, %4231 : !llvm.i64 + %4233 = llvm.add %4230, %4232 : !llvm.i64 + %4234 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4235 = llvm.mul %364, %4234 : !llvm.i64 + %4236 = llvm.add %4233, %4235 : !llvm.i64 + %4237 = llvm.getelementptr %4229[%4236] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4238 = llvm.load %4237 : !llvm.ptr + %4239 = llvm.fmul %4228, %4238 {RelaxedPrecision} : !llvm.float + %4240 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4242 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4243 = llvm.mul %3918, %4242 : !llvm.i64 + %4244 = llvm.add %4241, %4243 : !llvm.i64 + %4245 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4246 = llvm.mul %364, %4245 : !llvm.i64 + %4247 = llvm.add %4244, %4246 : !llvm.i64 + %4248 = llvm.getelementptr %4240[%4247] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4249 = llvm.load %4248 : !llvm.ptr + %4250 = llvm.fadd %4249, %4239 {RelaxedPrecision} : !llvm.float + %4251 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4252 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4253 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4254 = llvm.mul %3918, %4253 : !llvm.i64 + %4255 = llvm.add %4252, %4254 : !llvm.i64 + %4256 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4257 = llvm.mul %364, %4256 : !llvm.i64 + %4258 = llvm.add %4255, %4257 : !llvm.i64 + %4259 = llvm.getelementptr %4251[%4258] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4250, %4259 : !llvm.ptr + %4260 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %4261 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4262 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4263 = llvm.mul %3918, %4262 : !llvm.i64 + %4264 = llvm.add %4261, %4263 : !llvm.i64 + %4265 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4266 = llvm.mul %364, %4265 : !llvm.i64 + %4267 = llvm.add %4264, %4266 : !llvm.i64 + %4268 = llvm.getelementptr %4260[%4267] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4269 = llvm.load %4268 : !llvm.ptr + %4270 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4272 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4273 = llvm.mul %3918, %4272 : !llvm.i64 + %4274 = llvm.add %4271, %4273 : !llvm.i64 + %4275 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4276 = llvm.mul %364, %4275 : !llvm.i64 + %4277 = llvm.add %4274, %4276 : !llvm.i64 + %4278 = llvm.getelementptr %4270[%4277] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4269, %4278 : !llvm.ptr + %4279 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4281 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4282 = llvm.mul %3918, %4281 : !llvm.i64 + %4283 = llvm.add %4280, %4282 : !llvm.i64 + %4284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4285 = llvm.mul %59, %4284 : !llvm.i64 + %4286 = llvm.add %4283, %4285 : !llvm.i64 + %4287 = llvm.getelementptr %4279[%4286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4288 = llvm.load %4287 : !llvm.ptr + %4289 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4290 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4291 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4292 = llvm.mul %59, %4291 : !llvm.i64 + %4293 = llvm.add %4290, %4292 : !llvm.i64 + %4294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4295 = llvm.mul %425, %4294 : !llvm.i64 + %4296 = llvm.add %4293, %4295 : !llvm.i64 + %4297 = llvm.getelementptr %4289[%4296] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4298 = llvm.load %4297 : !llvm.ptr + %4299 = llvm.fmul %4288, %4298 {RelaxedPrecision} : !llvm.float + %4300 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4302 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4303 = llvm.mul %3918, %4302 : !llvm.i64 + %4304 = llvm.add %4301, %4303 : !llvm.i64 + %4305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4306 = llvm.mul %425, %4305 : !llvm.i64 + %4307 = llvm.add %4304, %4306 : !llvm.i64 + %4308 = llvm.getelementptr %4300[%4307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4309 = llvm.load %4308 : !llvm.ptr + %4310 = llvm.fadd %4309, %4299 {RelaxedPrecision} : !llvm.float + %4311 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4314 = llvm.mul %3918, %4313 : !llvm.i64 + %4315 = llvm.add %4312, %4314 : !llvm.i64 + %4316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4317 = llvm.mul %425, %4316 : !llvm.i64 + %4318 = llvm.add %4315, %4317 : !llvm.i64 + %4319 = llvm.getelementptr %4311[%4318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4310, %4319 : !llvm.ptr + %4320 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4321 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4322 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %4323 = llvm.mul %3918, %4322 : !llvm.i64 + %4324 = llvm.add %4321, %4323 : !llvm.i64 + %4325 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4326 = llvm.mul %425, %4325 : !llvm.i64 + %4327 = llvm.add %4324, %4326 : !llvm.i64 + %4328 = llvm.getelementptr %4320[%4327] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4329 = llvm.load %4328 : !llvm.ptr + %4330 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4331 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4332 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4333 = llvm.mul %3918, %4332 : !llvm.i64 + %4334 = llvm.add %4331, %4333 : !llvm.i64 + %4335 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4336 = llvm.mul %425, %4335 : !llvm.i64 + %4337 = llvm.add %4334, %4336 : !llvm.i64 + %4338 = llvm.getelementptr %4330[%4337] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4329, %4338 : !llvm.ptr + %4339 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4341 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4342 = llvm.mul %3918, %4341 : !llvm.i64 + %4343 = llvm.add %4340, %4342 : !llvm.i64 + %4344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4345 = llvm.mul %59, %4344 : !llvm.i64 + %4346 = llvm.add %4343, %4345 : !llvm.i64 + %4347 = llvm.getelementptr %4339[%4346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4348 = llvm.load %4347 : !llvm.ptr + %4349 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4350 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4351 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4352 = llvm.mul %59, %4351 : !llvm.i64 + %4353 = llvm.add %4350, %4352 : !llvm.i64 + %4354 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4355 = llvm.mul %486, %4354 : !llvm.i64 + %4356 = llvm.add %4353, %4355 : !llvm.i64 + %4357 = llvm.getelementptr %4349[%4356] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4358 = llvm.load %4357 : !llvm.ptr + %4359 = llvm.fmul %4348, %4358 {RelaxedPrecision} : !llvm.float + %4360 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4361 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4362 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4363 = llvm.mul %3918, %4362 : !llvm.i64 + %4364 = llvm.add %4361, %4363 : !llvm.i64 + %4365 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4366 = llvm.mul %486, %4365 : !llvm.i64 + %4367 = llvm.add %4364, %4366 : !llvm.i64 + %4368 = llvm.getelementptr %4360[%4367] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4369 = llvm.load %4368 : !llvm.ptr + %4370 = llvm.fadd %4369, %4359 {RelaxedPrecision} : !llvm.float + %4371 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4372 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4373 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4374 = llvm.mul %3918, %4373 : !llvm.i64 + %4375 = llvm.add %4372, %4374 : !llvm.i64 + %4376 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4377 = llvm.mul %486, %4376 : !llvm.i64 + %4378 = llvm.add %4375, %4377 : !llvm.i64 + %4379 = llvm.getelementptr %4371[%4378] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4370, %4379 : !llvm.ptr + %4380 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4382 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4383 = llvm.mul %3918, %4382 : !llvm.i64 + %4384 = llvm.add %4381, %4383 : !llvm.i64 + %4385 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %4386 = llvm.mul %486, %4385 : !llvm.i64 + %4387 = llvm.add %4384, %4386 : !llvm.i64 + %4388 = llvm.getelementptr %4380[%4387] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4389 = llvm.load %4388 : !llvm.ptr + %4390 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4392 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4393 = llvm.mul %3918, %4392 : !llvm.i64 + %4394 = llvm.add %4391, %4393 : !llvm.i64 + %4395 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4396 = llvm.mul %486, %4395 : !llvm.i64 + %4397 = llvm.add %4394, %4396 : !llvm.i64 + %4398 = llvm.getelementptr %4390[%4397] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4389, %4398 : !llvm.ptr + %4399 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4401 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4402 = llvm.mul %3918, %4401 : !llvm.i64 + %4403 = llvm.add %4400, %4402 : !llvm.i64 + %4404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4405 = llvm.mul %59, %4404 : !llvm.i64 + %4406 = llvm.add %4403, %4405 : !llvm.i64 + %4407 = llvm.getelementptr %4399[%4406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4408 = llvm.load %4407 : !llvm.ptr + %4409 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4410 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4411 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4412 = llvm.mul %59, %4411 : !llvm.i64 + %4413 = llvm.add %4410, %4412 : !llvm.i64 + %4414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4415 = llvm.mul %547, %4414 : !llvm.i64 + %4416 = llvm.add %4413, %4415 : !llvm.i64 + %4417 = llvm.getelementptr %4409[%4416] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4418 = llvm.load %4417 : !llvm.ptr + %4419 = llvm.fmul %4408, %4418 {RelaxedPrecision} : !llvm.float + %4420 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4421 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4422 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4423 = llvm.mul %3918, %4422 : !llvm.i64 + %4424 = llvm.add %4421, %4423 : !llvm.i64 + %4425 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4426 = llvm.mul %547, %4425 : !llvm.i64 + %4427 = llvm.add %4424, %4426 : !llvm.i64 + %4428 = llvm.getelementptr %4420[%4427] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4429 = llvm.load %4428 : !llvm.ptr + %4430 = llvm.fadd %4429, %4419 {RelaxedPrecision} : !llvm.float + %4431 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4434 = llvm.mul %3918, %4433 : !llvm.i64 + %4435 = llvm.add %4432, %4434 : !llvm.i64 + %4436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4437 = llvm.mul %547, %4436 : !llvm.i64 + %4438 = llvm.add %4435, %4437 : !llvm.i64 + %4439 = llvm.getelementptr %4431[%4438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4430, %4439 : !llvm.ptr + %4440 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4441 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4442 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4443 = llvm.mul %3918, %4442 : !llvm.i64 + %4444 = llvm.add %4441, %4443 : !llvm.i64 + %4445 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4446 = llvm.mul %547, %4445 : !llvm.i64 + %4447 = llvm.add %4444, %4446 : 
!llvm.i64 + %4448 = llvm.getelementptr %4440[%4447] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4449 = llvm.load %4448 : !llvm.ptr + %4450 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4451 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4452 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4453 = llvm.mul %3918, %4452 : !llvm.i64 + %4454 = llvm.add %4451, %4453 : !llvm.i64 + %4455 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4456 = llvm.mul %547, %4455 : !llvm.i64 + %4457 = llvm.add %4454, %4456 : !llvm.i64 + %4458 = llvm.getelementptr %4450[%4457] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4449, %4458 : !llvm.ptr + %4459 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4461 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4462 = llvm.mul %3918, %4461 : !llvm.i64 + %4463 = llvm.add %4460, %4462 : !llvm.i64 + %4464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4465 = llvm.mul %59, %4464 : !llvm.i64 + %4466 = llvm.add %4463, %4465 : !llvm.i64 + %4467 = llvm.getelementptr %4459[%4466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4468 = llvm.load %4467 : !llvm.ptr + %4469 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4470 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4471 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4472 = llvm.mul %59, %4471 : !llvm.i64 + %4473 = llvm.add %4470, %4472 : !llvm.i64 + %4474 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4475 = llvm.mul %608, %4474 : !llvm.i64 + %4476 = llvm.add %4473, %4475 : !llvm.i64 + %4477 = llvm.getelementptr %4469[%4476] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4478 = llvm.load %4477 : !llvm.ptr + %4479 = llvm.fmul %4468, %4478 {RelaxedPrecision} : !llvm.float + %4480 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4481 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4482 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4483 = llvm.mul %3918, %4482 : !llvm.i64 + %4484 = llvm.add %4481, %4483 : !llvm.i64 + %4485 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4486 = llvm.mul %608, %4485 : !llvm.i64 + %4487 = llvm.add %4484, %4486 : !llvm.i64 + %4488 = llvm.getelementptr %4480[%4487] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4489 = llvm.load %4488 : !llvm.ptr + %4490 = llvm.fadd %4489, %4479 {RelaxedPrecision} : !llvm.float + %4491 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4492 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4493 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4494 = llvm.mul %3918, %4493 : !llvm.i64 + %4495 = llvm.add %4492, %4494 : !llvm.i64 + %4496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4497 = llvm.mul %608, %4496 : !llvm.i64 + %4498 = llvm.add %4495, %4497 : !llvm.i64 + %4499 = llvm.getelementptr %4491[%4498] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4490, %4499 : !llvm.ptr + %4500 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4501 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4502 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4503 = llvm.mul %3918, %4502 : !llvm.i64 + %4504 = llvm.add %4501, %4503 : !llvm.i64 + %4505 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4506 = llvm.mul %608, %4505 : !llvm.i64 + %4507 = llvm.add %4504, %4506 : !llvm.i64 + %4508 = llvm.getelementptr %4500[%4507] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4509 = llvm.load %4508 : 
!llvm.ptr + %4510 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4512 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4513 = llvm.mul %3918, %4512 : !llvm.i64 + %4514 = llvm.add %4511, %4513 : !llvm.i64 + %4515 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4516 = llvm.mul %608, %4515 : !llvm.i64 + %4517 = llvm.add %4514, %4516 : !llvm.i64 + %4518 = llvm.getelementptr %4510[%4517] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4509, %4518 : !llvm.ptr + %4519 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4521 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4522 = llvm.mul %3918, %4521 : !llvm.i64 + %4523 = llvm.add %4520, %4522 : !llvm.i64 + %4524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4525 = llvm.mul %59, %4524 : !llvm.i64 + %4526 = llvm.add %4523, %4525 : !llvm.i64 + %4527 = llvm.getelementptr %4519[%4526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4528 = llvm.load %4527 : !llvm.ptr + %4529 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4530 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4531 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4532 = llvm.mul %59, %4531 : !llvm.i64 + %4533 = llvm.add %4530, %4532 : !llvm.i64 + %4534 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4535 = llvm.mul %669, %4534 : !llvm.i64 + %4536 = llvm.add %4533, %4535 : !llvm.i64 + %4537 = llvm.getelementptr %4529[%4536] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4538 = llvm.load %4537 : !llvm.ptr + %4539 = llvm.fmul %4528, %4538 {RelaxedPrecision} : !llvm.float + %4540 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4542 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4543 = llvm.mul %3918, %4542 : !llvm.i64 + %4544 = llvm.add %4541, %4543 : !llvm.i64 + %4545 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4546 = llvm.mul %669, %4545 : !llvm.i64 + %4547 = llvm.add %4544, %4546 : !llvm.i64 + %4548 = llvm.getelementptr %4540[%4547] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4549 = llvm.load %4548 : !llvm.ptr + %4550 = llvm.fadd %4549, %4539 {RelaxedPrecision} : !llvm.float + %4551 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4552 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4553 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4554 = llvm.mul %3918, %4553 : !llvm.i64 + %4555 = llvm.add %4552, %4554 : !llvm.i64 + %4556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4557 = llvm.mul %669, %4556 : !llvm.i64 + %4558 = llvm.add %4555, %4557 : !llvm.i64 + %4559 = llvm.getelementptr %4551[%4558] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4550, %4559 : !llvm.ptr + %4560 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4561 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4562 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4563 = llvm.mul %3918, %4562 : !llvm.i64 + %4564 = llvm.add %4561, %4563 : !llvm.i64 + %4565 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4566 = llvm.mul %669, %4565 : !llvm.i64 + %4567 = llvm.add %4564, %4566 : !llvm.i64 + %4568 = llvm.getelementptr %4560[%4567] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4569 = llvm.load %4568 : !llvm.ptr + %4570 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4571 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %4572 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4573 = llvm.mul %3918, %4572 : !llvm.i64 + %4574 = llvm.add %4571, %4573 : !llvm.i64 + %4575 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4576 = llvm.mul %669, %4575 : !llvm.i64 + %4577 = llvm.add %4574, %4576 : !llvm.i64 + %4578 = llvm.getelementptr %4570[%4577] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4569, %4578 : !llvm.ptr + %4579 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4581 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4582 = llvm.mul %3918, %4581 : !llvm.i64 + %4583 = llvm.add %4580, %4582 : !llvm.i64 + %4584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4585 = llvm.mul %59, %4584 : !llvm.i64 + %4586 = llvm.add %4583, %4585 : !llvm.i64 + %4587 = llvm.getelementptr %4579[%4586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4588 = llvm.load %4587 : !llvm.ptr + %4589 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4591 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4592 = llvm.mul %59, %4591 : !llvm.i64 + %4593 = llvm.add %4590, %4592 : !llvm.i64 + %4594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4595 = llvm.mul %730, %4594 : !llvm.i64 + %4596 = llvm.add %4593, %4595 : !llvm.i64 + %4597 = llvm.getelementptr %4589[%4596] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4598 = llvm.load %4597 : !llvm.ptr + %4599 = llvm.fmul %4588, %4598 {RelaxedPrecision} : !llvm.float + %4600 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4602 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4603 = llvm.mul %3918, %4602 : !llvm.i64 + %4604 = llvm.add %4601, %4603 : !llvm.i64 + %4605 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4606 = llvm.mul %730, %4605 : !llvm.i64 + %4607 = llvm.add %4604, %4606 : !llvm.i64 + %4608 = llvm.getelementptr %4600[%4607] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4609 = llvm.load %4608 : !llvm.ptr + %4610 = llvm.fadd %4609, %4599 {RelaxedPrecision} : !llvm.float + %4611 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4612 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4613 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4614 = llvm.mul %3918, %4613 : !llvm.i64 + %4615 = llvm.add %4612, %4614 : !llvm.i64 + %4616 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4617 = llvm.mul %730, %4616 : !llvm.i64 + %4618 = llvm.add %4615, %4617 : !llvm.i64 + %4619 = llvm.getelementptr %4611[%4618] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4610, %4619 : !llvm.ptr + %4620 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4621 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4622 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4623 = llvm.mul %3918, %4622 : !llvm.i64 + %4624 = llvm.add %4621, %4623 : !llvm.i64 + %4625 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4626 = llvm.mul %730, %4625 : !llvm.i64 + %4627 = llvm.add %4624, %4626 : !llvm.i64 + %4628 = llvm.getelementptr %4620[%4627] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4629 = llvm.load %4628 : !llvm.ptr + %4630 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4632 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4633 = llvm.mul %3918, 
%4632 : !llvm.i64 + %4634 = llvm.add %4631, %4633 : !llvm.i64 + %4635 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4636 = llvm.mul %730, %4635 : !llvm.i64 + %4637 = llvm.add %4634, %4636 : !llvm.i64 + %4638 = llvm.getelementptr %4630[%4637] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4629, %4638 : !llvm.ptr + %4639 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4641 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4642 = llvm.mul %3918, %4641 : !llvm.i64 + %4643 = llvm.add %4640, %4642 : !llvm.i64 + %4644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4645 = llvm.mul %59, %4644 : !llvm.i64 + %4646 = llvm.add %4643, %4645 : !llvm.i64 + %4647 = llvm.getelementptr %4639[%4646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4648 = llvm.load %4647 : !llvm.ptr + %4649 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4651 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4652 = llvm.mul %59, %4651 : !llvm.i64 + %4653 = llvm.add %4650, %4652 : !llvm.i64 + %4654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4655 = llvm.mul %791, %4654 : !llvm.i64 + %4656 = llvm.add %4653, %4655 : !llvm.i64 + %4657 = llvm.getelementptr %4649[%4656] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4658 = llvm.load %4657 : !llvm.ptr + %4659 = llvm.fmul %4648, %4658 {RelaxedPrecision} : !llvm.float + %4660 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4662 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4663 = llvm.mul %3918, %4662 : !llvm.i64 + %4664 = llvm.add %4661, %4663 : !llvm.i64 + %4665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4666 = llvm.mul %791, %4665 : !llvm.i64 + %4667 = llvm.add %4664, %4666 : !llvm.i64 + %4668 = llvm.getelementptr %4660[%4667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4669 = llvm.load %4668 : !llvm.ptr + %4670 = llvm.fadd %4669, %4659 {RelaxedPrecision} : !llvm.float + %4671 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4672 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4673 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4674 = llvm.mul %3918, %4673 : !llvm.i64 + %4675 = llvm.add %4672, %4674 : !llvm.i64 + %4676 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4677 = llvm.mul %791, %4676 : !llvm.i64 + %4678 = llvm.add %4675, %4677 : !llvm.i64 + %4679 = llvm.getelementptr %4671[%4678] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4670, %4679 : !llvm.ptr + %4680 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4681 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4682 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4683 = llvm.mul %3918, %4682 : !llvm.i64 + %4684 = llvm.add %4681, %4683 : !llvm.i64 + %4685 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4686 = llvm.mul %791, %4685 : !llvm.i64 + %4687 = llvm.add %4684, %4686 : !llvm.i64 + %4688 = llvm.getelementptr %4680[%4687] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4689 = llvm.load %4688 : !llvm.ptr + %4690 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4691 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4692 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4693 = llvm.mul %3918, %4692 : !llvm.i64 + %4694 = llvm.add %4691, %4693 : !llvm.i64 + %4695 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4696 = 
llvm.mul %791, %4695 : !llvm.i64 + %4697 = llvm.add %4694, %4696 : !llvm.i64 + %4698 = llvm.getelementptr %4690[%4697] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4689, %4698 : !llvm.ptr + %4699 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4701 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4702 = llvm.mul %3918, %4701 : !llvm.i64 + %4703 = llvm.add %4700, %4702 : !llvm.i64 + %4704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4705 = llvm.mul %59, %4704 : !llvm.i64 + %4706 = llvm.add %4703, %4705 : !llvm.i64 + %4707 = llvm.getelementptr %4699[%4706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4708 = llvm.load %4707 : !llvm.ptr + %4709 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4710 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4711 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4712 = llvm.mul %59, %4711 : !llvm.i64 + %4713 = llvm.add %4710, %4712 : !llvm.i64 + %4714 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4715 = llvm.mul %852, %4714 : !llvm.i64 + %4716 = llvm.add %4713, %4715 : !llvm.i64 + %4717 = llvm.getelementptr %4709[%4716] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4718 = llvm.load %4717 : !llvm.ptr + %4719 = llvm.fmul %4708, %4718 {RelaxedPrecision} : !llvm.float + %4720 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4721 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4722 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4723 = llvm.mul %3918, %4722 : !llvm.i64 + %4724 = llvm.add %4721, %4723 : !llvm.i64 + %4725 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4726 = llvm.mul %852, %4725 : !llvm.i64 + %4727 = llvm.add %4724, %4726 : !llvm.i64 + %4728 = llvm.getelementptr %4720[%4727] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4729 = llvm.load %4728 : !llvm.ptr + %4730 = llvm.fadd %4729, %4719 {RelaxedPrecision} : !llvm.float + %4731 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4732 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4733 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4734 = llvm.mul %3918, %4733 : !llvm.i64 + %4735 = llvm.add %4732, %4734 : !llvm.i64 + %4736 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4737 = llvm.mul %852, %4736 : !llvm.i64 + %4738 = llvm.add %4735, %4737 : !llvm.i64 + %4739 = llvm.getelementptr %4731[%4738] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4730, %4739 : !llvm.ptr + %4740 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4741 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4743 = llvm.mul %3918, %4742 : !llvm.i64 + %4744 = llvm.add %4741, %4743 : !llvm.i64 + %4745 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4746 = llvm.mul %852, %4745 : !llvm.i64 + %4747 = llvm.add %4744, %4746 : !llvm.i64 + %4748 = llvm.getelementptr %4740[%4747] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4749 = llvm.load %4748 : !llvm.ptr + %4750 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4751 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4752 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4753 = llvm.mul %3918, %4752 : !llvm.i64 + %4754 = llvm.add %4751, %4753 : !llvm.i64 + %4755 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4756 = llvm.mul %852, %4755 : !llvm.i64 + %4757 = llvm.add %4754, %4756 : !llvm.i64 + %4758 = llvm.getelementptr %4750[%4757] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4749, %4758 : !llvm.ptr + %4759 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4761 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4762 = llvm.mul %3918, %4761 : !llvm.i64 + %4763 = llvm.add %4760, %4762 : !llvm.i64 + %4764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4765 = llvm.mul %59, %4764 : !llvm.i64 + %4766 = llvm.add %4763, %4765 : !llvm.i64 + %4767 = llvm.getelementptr %4759[%4766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4768 = llvm.load %4767 : !llvm.ptr + %4769 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4770 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4771 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4772 = llvm.mul %59, %4771 : !llvm.i64 + %4773 = llvm.add %4770, %4772 : !llvm.i64 + %4774 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4775 = llvm.mul %913, %4774 : !llvm.i64 + %4776 = llvm.add %4773, %4775 : !llvm.i64 + %4777 = llvm.getelementptr %4769[%4776] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4778 = llvm.load %4777 : !llvm.ptr + %4779 = llvm.fmul %4768, %4778 {RelaxedPrecision} : !llvm.float + %4780 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4781 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4782 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4783 = llvm.mul %3918, %4782 : !llvm.i64 + %4784 = llvm.add %4781, %4783 : !llvm.i64 + %4785 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4786 = llvm.mul %913, %4785 : !llvm.i64 + %4787 = llvm.add %4784, %4786 : !llvm.i64 + %4788 = llvm.getelementptr %4780[%4787] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4789 = llvm.load %4788 : !llvm.ptr + %4790 = llvm.fadd %4789, %4779 {RelaxedPrecision} : !llvm.float + %4791 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4792 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4793 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4794 = llvm.mul %3918, %4793 : !llvm.i64 + %4795 = llvm.add %4792, %4794 : !llvm.i64 + %4796 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4797 = llvm.mul %913, %4796 : !llvm.i64 + %4798 = llvm.add %4795, %4797 : !llvm.i64 + %4799 = llvm.getelementptr %4791[%4798] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4790, %4799 : !llvm.ptr + %4800 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4803 = llvm.mul %3918, %4802 : !llvm.i64 + %4804 = llvm.add %4801, %4803 : !llvm.i64 + %4805 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4806 = llvm.mul %913, %4805 : !llvm.i64 + %4807 = llvm.add %4804, %4806 : !llvm.i64 + %4808 = llvm.getelementptr %4800[%4807] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4809 = llvm.load %4808 : !llvm.ptr + %4810 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4811 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4812 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4813 = llvm.mul %3918, %4812 : !llvm.i64 + %4814 = llvm.add %4811, %4813 : !llvm.i64 + %4815 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4816 = llvm.mul %913, %4815 : !llvm.i64 + %4817 = llvm.add %4814, %4816 : !llvm.i64 + %4818 = llvm.getelementptr %4810[%4817] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4809, %4818 : !llvm.ptr + %4819 = llvm.extractvalue %7[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4821 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4822 = llvm.mul %3918, %4821 : !llvm.i64 + %4823 = llvm.add %4820, %4822 : !llvm.i64 + %4824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4825 = llvm.mul %59, %4824 : !llvm.i64 + %4826 = llvm.add %4823, %4825 : !llvm.i64 + %4827 = llvm.getelementptr %4819[%4826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4828 = llvm.load %4827 : !llvm.ptr + %4829 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4830 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4831 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4832 = llvm.mul %59, %4831 : !llvm.i64 + %4833 = llvm.add %4830, %4832 : !llvm.i64 + %4834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4835 = llvm.mul %974, %4834 : !llvm.i64 + %4836 = llvm.add %4833, %4835 : !llvm.i64 + %4837 = llvm.getelementptr %4829[%4836] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4838 = llvm.load %4837 : !llvm.ptr + %4839 = llvm.fmul %4828, %4838 {RelaxedPrecision} : !llvm.float + %4840 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4841 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4842 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4843 = llvm.mul %3918, %4842 : !llvm.i64 + %4844 = llvm.add %4841, %4843 : !llvm.i64 + %4845 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4846 = llvm.mul %974, %4845 : !llvm.i64 + %4847 = llvm.add %4844, %4846 : !llvm.i64 + %4848 = llvm.getelementptr %4840[%4847] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4849 = llvm.load %4848 : !llvm.ptr + %4850 = llvm.fadd %4849, %4839 {RelaxedPrecision} : !llvm.float + %4851 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4852 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4853 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4854 = llvm.mul %3918, %4853 : !llvm.i64 + %4855 = llvm.add %4852, %4854 : !llvm.i64 + %4856 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4857 = llvm.mul %974, %4856 : !llvm.i64 + %4858 = llvm.add %4855, %4857 : !llvm.i64 + %4859 = llvm.getelementptr %4851[%4858] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4850, %4859 : !llvm.ptr + %4860 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4861 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4862 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4863 = llvm.mul %3918, %4862 : !llvm.i64 + %4864 = llvm.add %4861, %4863 : !llvm.i64 + %4865 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4866 = llvm.mul %974, %4865 : !llvm.i64 + %4867 = llvm.add %4864, %4866 : !llvm.i64 + %4868 = llvm.getelementptr %4860[%4867] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4869 = llvm.load %4868 : !llvm.ptr + %4870 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4871 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4872 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4873 = llvm.mul %3918, %4872 : !llvm.i64 + %4874 = llvm.add %4871, %4873 : !llvm.i64 + %4875 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4876 = llvm.mul %974, %4875 : !llvm.i64 + %4877 = llvm.add %4874, %4876 : !llvm.i64 + %4878 = llvm.getelementptr %4870[%4877] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4869, %4878 : !llvm.ptr + %4879 = llvm.add %50, %37 : !llvm.i64 + %4880 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4881 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %4882 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4883 = llvm.mul %4879, %4882 : !llvm.i64 + %4884 = llvm.add %4881, %4883 : !llvm.i64 + %4885 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4886 = llvm.mul %59, %4885 : !llvm.i64 + %4887 = llvm.add %4884, %4886 : !llvm.i64 + %4888 = llvm.getelementptr %4880[%4887] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4889 = llvm.load %4888 : !llvm.ptr + %4890 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4892 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4893 = llvm.mul %59, %4892 : !llvm.i64 + %4894 = llvm.add %4891, %4893 : !llvm.i64 + %4895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4896 = llvm.mul %58, %4895 : !llvm.i64 + %4897 = llvm.add %4894, %4896 : !llvm.i64 + %4898 = llvm.getelementptr %4890[%4897] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4899 = llvm.load %4898 : !llvm.ptr + %4900 = llvm.fmul %4889, %4899 {RelaxedPrecision} : !llvm.float + %4901 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4903 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4904 = llvm.mul %4879, %4903 : !llvm.i64 + %4905 = llvm.add %4902, %4904 : !llvm.i64 + %4906 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4907 = llvm.mul %58, %4906 : !llvm.i64 + %4908 = llvm.add %4905, %4907 : !llvm.i64 + %4909 = llvm.getelementptr %4901[%4908] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4910 = llvm.load %4909 : !llvm.ptr + %4911 = llvm.fadd %4910, %4900 {RelaxedPrecision} : !llvm.float + %4912 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4913 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4914 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4915 = llvm.mul %4879, %4914 : !llvm.i64 + %4916 = llvm.add %4913, %4915 : !llvm.i64 + %4917 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4918 = llvm.mul %58, %4917 : !llvm.i64 + %4919 = llvm.add %4916, %4918 : !llvm.i64 + %4920 = llvm.getelementptr %4912[%4919] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4911, %4920 : !llvm.ptr + %4921 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4922 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4923 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4924 = llvm.mul %4879, %4923 : !llvm.i64 + %4925 = llvm.add %4922, %4924 : !llvm.i64 + %4926 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4927 = llvm.mul %58, %4926 : !llvm.i64 + %4928 = llvm.add %4925, %4927 : !llvm.i64 + %4929 = llvm.getelementptr %4921[%4928] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4930 = llvm.load %4929 : !llvm.ptr + %4931 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4932 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4933 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4934 = llvm.mul %4879, %4933 : !llvm.i64 + %4935 = llvm.add %4932, %4934 : !llvm.i64 + %4936 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4937 = llvm.mul %58, %4936 : !llvm.i64 + %4938 = llvm.add %4935, %4937 : !llvm.i64 + %4939 = llvm.getelementptr %4931[%4938] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4930, %4939 : !llvm.ptr + %4940 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4941 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4942 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4943 = llvm.mul %4879, %4942 : 
!llvm.i64 + %4944 = llvm.add %4941, %4943 : !llvm.i64 + %4945 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4946 = llvm.mul %59, %4945 : !llvm.i64 + %4947 = llvm.add %4944, %4946 : !llvm.i64 + %4948 = llvm.getelementptr %4940[%4947] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4949 = llvm.load %4948 : !llvm.ptr + %4950 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4951 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4952 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4953 = llvm.mul %59, %4952 : !llvm.i64 + %4954 = llvm.add %4951, %4953 : !llvm.i64 + %4955 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4956 = llvm.mul %120, %4955 : !llvm.i64 + %4957 = llvm.add %4954, %4956 : !llvm.i64 + %4958 = llvm.getelementptr %4950[%4957] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4959 = llvm.load %4958 : !llvm.ptr + %4960 = llvm.fmul %4949, %4959 {RelaxedPrecision} : !llvm.float + %4961 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4962 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4963 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4964 = llvm.mul %4879, %4963 : !llvm.i64 + %4965 = llvm.add %4962, %4964 : !llvm.i64 + %4966 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4967 = llvm.mul %120, %4966 : !llvm.i64 + %4968 = llvm.add %4965, %4967 : !llvm.i64 + %4969 = llvm.getelementptr %4961[%4968] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4970 = llvm.load %4969 : !llvm.ptr + %4971 = llvm.fadd %4970, %4960 {RelaxedPrecision} : !llvm.float + %4972 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4973 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4974 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4975 = llvm.mul %4879, %4974 : !llvm.i64 + %4976 = llvm.add %4973, %4975 : !llvm.i64 + %4977 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4978 = llvm.mul %120, %4977 : !llvm.i64 + %4979 = llvm.add %4976, %4978 : !llvm.i64 + %4980 = llvm.getelementptr %4972[%4979] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4971, %4980 : !llvm.ptr + %4981 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4982 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4983 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4984 = llvm.mul %4879, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %120, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4981[%4988] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4990 = llvm.load %4989 : !llvm.ptr + %4991 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4992 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4993 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4994 = llvm.mul %4879, %4993 : !llvm.i64 + %4995 = llvm.add %4992, %4994 : !llvm.i64 + %4996 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4997 = llvm.mul %120, %4996 : !llvm.i64 + %4998 = llvm.add %4995, %4997 : !llvm.i64 + %4999 = llvm.getelementptr %4991[%4998] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4990, %4999 : !llvm.ptr + %5000 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5001 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5002 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5003 = llvm.mul %4879, %5002 : !llvm.i64 + %5004 = llvm.add %5001, %5003 : !llvm.i64 + %5005 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5006 = llvm.mul 
%59, %5005 : !llvm.i64 + %5007 = llvm.add %5004, %5006 : !llvm.i64 + %5008 = llvm.getelementptr %5000[%5007] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5009 = llvm.load %5008 : !llvm.ptr + %5010 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5011 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5012 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5013 = llvm.mul %59, %5012 : !llvm.i64 + %5014 = llvm.add %5011, %5013 : !llvm.i64 + %5015 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5016 = llvm.mul %181, %5015 : !llvm.i64 + %5017 = llvm.add %5014, %5016 : !llvm.i64 + %5018 = llvm.getelementptr %5010[%5017] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5019 = llvm.load %5018 : !llvm.ptr + %5020 = llvm.fmul %5009, %5019 {RelaxedPrecision} : !llvm.float + %5021 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5022 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5023 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5024 = llvm.mul %4879, %5023 : !llvm.i64 + %5025 = llvm.add %5022, %5024 : !llvm.i64 + %5026 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5027 = llvm.mul %181, %5026 : !llvm.i64 + %5028 = llvm.add %5025, %5027 : !llvm.i64 + %5029 = llvm.getelementptr %5021[%5028] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5030 = llvm.load %5029 : !llvm.ptr + %5031 = llvm.fadd %5030, %5020 {RelaxedPrecision} : !llvm.float + %5032 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5033 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5034 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5035 = llvm.mul %4879, %5034 : !llvm.i64 + %5036 = llvm.add %5033, %5035 : !llvm.i64 + %5037 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5038 = llvm.mul %181, %5037 : !llvm.i64 + %5039 = llvm.add %5036, %5038 : !llvm.i64 + %5040 = llvm.getelementptr %5032[%5039] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5031, %5040 : !llvm.ptr + %5041 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5042 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5043 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5044 = llvm.mul %4879, %5043 : !llvm.i64 + %5045 = llvm.add %5042, %5044 : !llvm.i64 + %5046 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5047 = llvm.mul %181, %5046 : !llvm.i64 + %5048 = llvm.add %5045, %5047 : !llvm.i64 + %5049 = llvm.getelementptr %5041[%5048] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5050 = llvm.load %5049 : !llvm.ptr + %5051 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5052 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5053 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5054 = llvm.mul %4879, %5053 : !llvm.i64 + %5055 = llvm.add %5052, %5054 : !llvm.i64 + %5056 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5057 = llvm.mul %181, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.getelementptr %5051[%5058] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5050, %5059 : !llvm.ptr + %5060 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5062 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5063 = llvm.mul %4879, %5062 : !llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5066 = llvm.mul %59, %5065 : !llvm.i64 + %5067 = llvm.add %5064, %5066 : !llvm.i64 + %5068 = llvm.getelementptr %5060[%5067] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %5069 = llvm.load %5068 : !llvm.ptr + %5070 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5071 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5072 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5073 = llvm.mul %59, %5072 : !llvm.i64 + %5074 = llvm.add %5071, %5073 : !llvm.i64 + %5075 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5076 = llvm.mul %242, %5075 : !llvm.i64 + %5077 = llvm.add %5074, %5076 : !llvm.i64 + %5078 = llvm.getelementptr %5070[%5077] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5079 = llvm.load %5078 : !llvm.ptr + %5080 = llvm.fmul %5069, %5079 {RelaxedPrecision} : !llvm.float + %5081 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5082 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5083 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5084 = llvm.mul %4879, %5083 : !llvm.i64 + %5085 = llvm.add %5082, %5084 : !llvm.i64 + %5086 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5087 = llvm.mul %242, %5086 : !llvm.i64 + %5088 = llvm.add %5085, %5087 : !llvm.i64 + %5089 = llvm.getelementptr %5081[%5088] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5090 = llvm.load %5089 : !llvm.ptr + %5091 = llvm.fadd %5090, %5080 {RelaxedPrecision} : !llvm.float + %5092 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5093 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5094 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5095 = llvm.mul %4879, %5094 : !llvm.i64 + %5096 = llvm.add %5093, %5095 : !llvm.i64 + %5097 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5098 = llvm.mul %242, %5097 : !llvm.i64 + %5099 = llvm.add %5096, %5098 : !llvm.i64 + %5100 = llvm.getelementptr %5092[%5099] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5091, %5100 : !llvm.ptr + %5101 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5102 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5103 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5104 = llvm.mul %4879, %5103 : !llvm.i64 + %5105 = llvm.add %5102, %5104 : !llvm.i64 + %5106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5107 = llvm.mul %242, %5106 : !llvm.i64 + %5108 = llvm.add %5105, %5107 : !llvm.i64 + %5109 = llvm.getelementptr %5101[%5108] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5110 = llvm.load %5109 : !llvm.ptr + %5111 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5112 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5113 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5114 = llvm.mul %4879, %5113 : !llvm.i64 + %5115 = llvm.add %5112, %5114 : !llvm.i64 + %5116 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5117 = llvm.mul %242, %5116 : !llvm.i64 + %5118 = llvm.add %5115, %5117 : !llvm.i64 + %5119 = llvm.getelementptr %5111[%5118] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5110, %5119 : !llvm.ptr + %5120 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5123 = llvm.mul %4879, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5126 = llvm.mul %59, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.getelementptr %5120[%5127] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5129 = llvm.load %5128 : !llvm.ptr + %5130 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %5131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5132 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5133 = llvm.mul %59, %5132 : !llvm.i64 + %5134 = llvm.add %5131, %5133 : !llvm.i64 + %5135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5136 = llvm.mul %303, %5135 : !llvm.i64 + %5137 = llvm.add %5134, %5136 : !llvm.i64 + %5138 = llvm.getelementptr %5130[%5137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5139 = llvm.load %5138 : !llvm.ptr + %5140 = llvm.fmul %5129, %5139 {RelaxedPrecision} : !llvm.float + %5141 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5142 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5143 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5144 = llvm.mul %4879, %5143 : !llvm.i64 + %5145 = llvm.add %5142, %5144 : !llvm.i64 + %5146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5147 = llvm.mul %303, %5146 : !llvm.i64 + %5148 = llvm.add %5145, %5147 : !llvm.i64 + %5149 = llvm.getelementptr %5141[%5148] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5150 = llvm.load %5149 : !llvm.ptr + %5151 = llvm.fadd %5150, %5140 {RelaxedPrecision} : !llvm.float + %5152 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5153 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5154 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5155 = llvm.mul %4879, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 : !llvm.i64 + %5157 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5158 = llvm.mul %303, %5157 : !llvm.i64 + %5159 = llvm.add %5156, %5158 : !llvm.i64 + %5160 = llvm.getelementptr %5152[%5159] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5151, %5160 : !llvm.ptr + %5161 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5162 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5163 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5164 = llvm.mul %4879, %5163 : !llvm.i64 + %5165 = llvm.add %5162, %5164 : !llvm.i64 + %5166 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5167 = llvm.mul %303, %5166 : !llvm.i64 + %5168 = llvm.add %5165, %5167 : !llvm.i64 + %5169 = llvm.getelementptr %5161[%5168] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5170 = llvm.load %5169 : !llvm.ptr + %5171 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5172 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5173 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5174 = llvm.mul %4879, %5173 : !llvm.i64 + %5175 = llvm.add %5172, %5174 : !llvm.i64 + %5176 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5177 = llvm.mul %303, %5176 : !llvm.i64 + %5178 = llvm.add %5175, %5177 : !llvm.i64 + %5179 = llvm.getelementptr %5171[%5178] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5170, %5179 : !llvm.ptr + %5180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5183 = llvm.mul %4879, %5182 : !llvm.i64 + %5184 = llvm.add %5181, %5183 : !llvm.i64 + %5185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5186 = llvm.mul %59, %5185 : !llvm.i64 + %5187 = llvm.add %5184, %5186 : !llvm.i64 + %5188 = llvm.getelementptr %5180[%5187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5189 = llvm.load %5188 : !llvm.ptr + %5190 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5192 = llvm.mlir.constant(512 : 
index) : !llvm.i64 + %5193 = llvm.mul %59, %5192 : !llvm.i64 + %5194 = llvm.add %5191, %5193 : !llvm.i64 + %5195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5196 = llvm.mul %364, %5195 : !llvm.i64 + %5197 = llvm.add %5194, %5196 : !llvm.i64 + %5198 = llvm.getelementptr %5190[%5197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5199 = llvm.load %5198 : !llvm.ptr + %5200 = llvm.fmul %5189, %5199 {RelaxedPrecision} : !llvm.float + %5201 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5202 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5203 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5204 = llvm.mul %4879, %5203 : !llvm.i64 + %5205 = llvm.add %5202, %5204 : !llvm.i64 + %5206 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5207 = llvm.mul %364, %5206 : !llvm.i64 + %5208 = llvm.add %5205, %5207 : !llvm.i64 + %5209 = llvm.getelementptr %5201[%5208] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5210 = llvm.load %5209 : !llvm.ptr + %5211 = llvm.fadd %5210, %5200 {RelaxedPrecision} : !llvm.float + %5212 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5213 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5214 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5215 = llvm.mul %4879, %5214 : !llvm.i64 + %5216 = llvm.add %5213, %5215 : !llvm.i64 + %5217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5218 = llvm.mul %364, %5217 : !llvm.i64 + %5219 = llvm.add %5216, %5218 : !llvm.i64 + %5220 = llvm.getelementptr %5212[%5219] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5211, %5220 : !llvm.ptr + %5221 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5224 = llvm.mul %4879, %5223 : !llvm.i64 + %5225 = llvm.add %5222, %5224 : !llvm.i64 + %5226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5227 = llvm.mul %364, %5226 : !llvm.i64 + %5228 = llvm.add %5225, %5227 : !llvm.i64 + %5229 = llvm.getelementptr %5221[%5228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5230 = llvm.load %5229 : !llvm.ptr + %5231 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5232 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5233 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5234 = llvm.mul %4879, %5233 : !llvm.i64 + %5235 = llvm.add %5232, %5234 : !llvm.i64 + %5236 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5237 = llvm.mul %364, %5236 : !llvm.i64 + %5238 = llvm.add %5235, %5237 : !llvm.i64 + %5239 = llvm.getelementptr %5231[%5238] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5230, %5239 : !llvm.ptr + %5240 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5242 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5243 = llvm.mul %4879, %5242 : !llvm.i64 + %5244 = llvm.add %5241, %5243 : !llvm.i64 + %5245 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5246 = llvm.mul %59, %5245 : !llvm.i64 + %5247 = llvm.add %5244, %5246 : !llvm.i64 + %5248 = llvm.getelementptr %5240[%5247] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5249 = llvm.load %5248 : !llvm.ptr + %5250 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5252 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5253 = llvm.mul %59, %5252 : !llvm.i64 + %5254 = llvm.add %5251, %5253 : !llvm.i64 + %5255 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %5256 = llvm.mul %425, %5255 : !llvm.i64 + %5257 = llvm.add %5254, %5256 : !llvm.i64 + %5258 = llvm.getelementptr %5250[%5257] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5259 = llvm.load %5258 : !llvm.ptr + %5260 = llvm.fmul %5249, %5259 {RelaxedPrecision} : !llvm.float + %5261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5263 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5264 = llvm.mul %4879, %5263 : !llvm.i64 + %5265 = llvm.add %5262, %5264 : !llvm.i64 + %5266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5267 = llvm.mul %425, %5266 : !llvm.i64 + %5268 = llvm.add %5265, %5267 : !llvm.i64 + %5269 = llvm.getelementptr %5261[%5268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5270 = llvm.load %5269 : !llvm.ptr + %5271 = llvm.fadd %5270, %5260 {RelaxedPrecision} : !llvm.float + %5272 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5273 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5274 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5275 = llvm.mul %4879, %5274 : !llvm.i64 + %5276 = llvm.add %5273, %5275 : !llvm.i64 + %5277 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5278 = llvm.mul %425, %5277 : !llvm.i64 + %5279 = llvm.add %5276, %5278 : !llvm.i64 + %5280 = llvm.getelementptr %5272[%5279] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5271, %5280 : !llvm.ptr + %5281 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5282 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5283 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5284 = llvm.mul %4879, %5283 : !llvm.i64 + %5285 = llvm.add %5282, %5284 : !llvm.i64 + %5286 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5287 = llvm.mul %425, %5286 : !llvm.i64 + %5288 = llvm.add %5285, %5287 : !llvm.i64 + %5289 = llvm.getelementptr %5281[%5288] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5290 = llvm.load %5289 : !llvm.ptr + %5291 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5292 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5293 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5294 = llvm.mul %4879, %5293 : !llvm.i64 + %5295 = llvm.add %5292, %5294 : !llvm.i64 + %5296 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5297 = llvm.mul %425, %5296 : !llvm.i64 + %5298 = llvm.add %5295, %5297 : !llvm.i64 + %5299 = llvm.getelementptr %5291[%5298] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5290, %5299 : !llvm.ptr + %5300 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5302 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5303 = llvm.mul %4879, %5302 : !llvm.i64 + %5304 = llvm.add %5301, %5303 : !llvm.i64 + %5305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5306 = llvm.mul %59, %5305 : !llvm.i64 + %5307 = llvm.add %5304, %5306 : !llvm.i64 + %5308 = llvm.getelementptr %5300[%5307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5309 = llvm.load %5308 : !llvm.ptr + %5310 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5311 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5312 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5313 = llvm.mul %59, %5312 : !llvm.i64 + %5314 = llvm.add %5311, %5313 : !llvm.i64 + %5315 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5316 = llvm.mul %486, %5315 : !llvm.i64 + %5317 = llvm.add %5314, %5316 : 
!llvm.i64 + %5318 = llvm.getelementptr %5310[%5317] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5319 = llvm.load %5318 : !llvm.ptr + %5320 = llvm.fmul %5309, %5319 {RelaxedPrecision} : !llvm.float + %5321 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5322 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5323 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5324 = llvm.mul %4879, %5323 : !llvm.i64 + %5325 = llvm.add %5322, %5324 : !llvm.i64 + %5326 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5327 = llvm.mul %486, %5326 : !llvm.i64 + %5328 = llvm.add %5325, %5327 : !llvm.i64 + %5329 = llvm.getelementptr %5321[%5328] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5330 = llvm.load %5329 : !llvm.ptr + %5331 = llvm.fadd %5330, %5320 {RelaxedPrecision} : !llvm.float + %5332 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5333 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5334 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5335 = llvm.mul %4879, %5334 : !llvm.i64 + %5336 = llvm.add %5333, %5335 : !llvm.i64 + %5337 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5338 = llvm.mul %486, %5337 : !llvm.i64 + %5339 = llvm.add %5336, %5338 : !llvm.i64 + %5340 = llvm.getelementptr %5332[%5339] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5331, %5340 : !llvm.ptr + %5341 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5342 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5343 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5344 = llvm.mul %4879, %5343 : !llvm.i64 + %5345 = llvm.add %5342, %5344 : !llvm.i64 + %5346 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5347 = llvm.mul %486, %5346 : !llvm.i64 + %5348 = llvm.add %5345, %5347 : !llvm.i64 + %5349 = llvm.getelementptr %5341[%5348] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5350 = llvm.load %5349 : !llvm.ptr + %5351 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5352 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5353 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5354 = llvm.mul %4879, %5353 : !llvm.i64 + %5355 = llvm.add %5352, %5354 : !llvm.i64 + %5356 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5357 = llvm.mul %486, %5356 : !llvm.i64 + %5358 = llvm.add %5355, %5357 : !llvm.i64 + %5359 = llvm.getelementptr %5351[%5358] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5350, %5359 : !llvm.ptr + %5360 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5361 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5362 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5363 = llvm.mul %4879, %5362 : !llvm.i64 + %5364 = llvm.add %5361, %5363 : !llvm.i64 + %5365 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5366 = llvm.mul %59, %5365 : !llvm.i64 + %5367 = llvm.add %5364, %5366 : !llvm.i64 + %5368 = llvm.getelementptr %5360[%5367] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5369 = llvm.load %5368 : !llvm.ptr + %5370 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5371 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5372 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5373 = llvm.mul %59, %5372 : !llvm.i64 + %5374 = llvm.add %5371, %5373 : !llvm.i64 + %5375 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5376 = llvm.mul %547, %5375 : !llvm.i64 + %5377 = llvm.add %5374, %5376 : !llvm.i64 + %5378 = llvm.getelementptr %5370[%5377] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5379 = llvm.load %5378 : 
!llvm.ptr + %5380 = llvm.fmul %5369, %5379 {RelaxedPrecision} : !llvm.float + %5381 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5383 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5384 = llvm.mul %4879, %5383 : !llvm.i64 + %5385 = llvm.add %5382, %5384 : !llvm.i64 + %5386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5387 = llvm.mul %547, %5386 : !llvm.i64 + %5388 = llvm.add %5385, %5387 : !llvm.i64 + %5389 = llvm.getelementptr %5381[%5388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5390 = llvm.load %5389 : !llvm.ptr + %5391 = llvm.fadd %5390, %5380 {RelaxedPrecision} : !llvm.float + %5392 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5393 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5394 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5395 = llvm.mul %4879, %5394 : !llvm.i64 + %5396 = llvm.add %5393, %5395 : !llvm.i64 + %5397 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5398 = llvm.mul %547, %5397 : !llvm.i64 + %5399 = llvm.add %5396, %5398 : !llvm.i64 + %5400 = llvm.getelementptr %5392[%5399] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5391, %5400 : !llvm.ptr + %5401 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5403 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5404 = llvm.mul %4879, %5403 : !llvm.i64 + %5405 = llvm.add %5402, %5404 : !llvm.i64 + %5406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5407 = llvm.mul %547, %5406 : !llvm.i64 + %5408 = llvm.add %5405, %5407 : !llvm.i64 + %5409 = llvm.getelementptr %5401[%5408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5410 = llvm.load %5409 : !llvm.ptr + %5411 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5413 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5414 = llvm.mul %4879, %5413 : !llvm.i64 + %5415 = llvm.add %5412, %5414 : !llvm.i64 + %5416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5417 = llvm.mul %547, %5416 : !llvm.i64 + %5418 = llvm.add %5415, %5417 : !llvm.i64 + %5419 = llvm.getelementptr %5411[%5418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5410, %5419 : !llvm.ptr + %5420 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5421 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5422 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5423 = llvm.mul %4879, %5422 : !llvm.i64 + %5424 = llvm.add %5421, %5423 : !llvm.i64 + %5425 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5426 = llvm.mul %59, %5425 : !llvm.i64 + %5427 = llvm.add %5424, %5426 : !llvm.i64 + %5428 = llvm.getelementptr %5420[%5427] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5429 = llvm.load %5428 : !llvm.ptr + %5430 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5431 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5432 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5433 = llvm.mul %59, %5432 : !llvm.i64 + %5434 = llvm.add %5431, %5433 : !llvm.i64 + %5435 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5436 = llvm.mul %608, %5435 : !llvm.i64 + %5437 = llvm.add %5434, %5436 : !llvm.i64 + %5438 = llvm.getelementptr %5430[%5437] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5439 = llvm.load %5438 : !llvm.ptr + %5440 = llvm.fmul %5429, %5439 {RelaxedPrecision} : !llvm.float + %5441 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5443 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5444 = llvm.mul %4879, %5443 : !llvm.i64 + %5445 = llvm.add %5442, %5444 : !llvm.i64 + %5446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5447 = llvm.mul %608, %5446 : !llvm.i64 + %5448 = llvm.add %5445, %5447 : !llvm.i64 + %5449 = llvm.getelementptr %5441[%5448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5450 = llvm.load %5449 : !llvm.ptr + %5451 = llvm.fadd %5450, %5440 {RelaxedPrecision} : !llvm.float + %5452 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5455 = llvm.mul %4879, %5454 : !llvm.i64 + %5456 = llvm.add %5453, %5455 : !llvm.i64 + %5457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5458 = llvm.mul %608, %5457 : !llvm.i64 + %5459 = llvm.add %5456, %5458 : !llvm.i64 + %5460 = llvm.getelementptr %5452[%5459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5451, %5460 : !llvm.ptr + %5461 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5462 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5463 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5464 = llvm.mul %4879, %5463 : !llvm.i64 + %5465 = llvm.add %5462, %5464 : !llvm.i64 + %5466 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5467 = llvm.mul %608, %5466 : !llvm.i64 + %5468 = llvm.add %5465, %5467 : !llvm.i64 + %5469 = llvm.getelementptr %5461[%5468] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5470 = llvm.load %5469 : !llvm.ptr + %5471 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5472 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5473 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5474 = llvm.mul %4879, %5473 : !llvm.i64 + %5475 = llvm.add %5472, %5474 : !llvm.i64 + %5476 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5477 = llvm.mul %608, %5476 : !llvm.i64 + %5478 = llvm.add %5475, %5477 : !llvm.i64 + %5479 = llvm.getelementptr %5471[%5478] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5470, %5479 : !llvm.ptr + %5480 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5481 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5482 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5483 = llvm.mul %4879, %5482 : !llvm.i64 + %5484 = llvm.add %5481, %5483 : !llvm.i64 + %5485 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5486 = llvm.mul %59, %5485 : !llvm.i64 + %5487 = llvm.add %5484, %5486 : !llvm.i64 + %5488 = llvm.getelementptr %5480[%5487] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5489 = llvm.load %5488 : !llvm.ptr + %5490 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5491 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5492 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5493 = llvm.mul %59, %5492 : !llvm.i64 + %5494 = llvm.add %5491, %5493 : !llvm.i64 + %5495 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5496 = llvm.mul %669, %5495 : !llvm.i64 + %5497 = llvm.add %5494, %5496 : !llvm.i64 + %5498 = llvm.getelementptr %5490[%5497] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5499 = llvm.load %5498 : !llvm.ptr + %5500 = llvm.fmul %5489, %5499 {RelaxedPrecision} : !llvm.float + %5501 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5502 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5503 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %5504 = llvm.mul %4879, %5503 : !llvm.i64 + %5505 = llvm.add %5502, %5504 : !llvm.i64 + %5506 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5507 = llvm.mul %669, %5506 : !llvm.i64 + %5508 = llvm.add %5505, %5507 : !llvm.i64 + %5509 = llvm.getelementptr %5501[%5508] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5510 = llvm.load %5509 : !llvm.ptr + %5511 = llvm.fadd %5510, %5500 {RelaxedPrecision} : !llvm.float + %5512 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5513 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5514 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5515 = llvm.mul %4879, %5514 : !llvm.i64 + %5516 = llvm.add %5513, %5515 : !llvm.i64 + %5517 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5518 = llvm.mul %669, %5517 : !llvm.i64 + %5519 = llvm.add %5516, %5518 : !llvm.i64 + %5520 = llvm.getelementptr %5512[%5519] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5511, %5520 : !llvm.ptr + %5521 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5522 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5523 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5524 = llvm.mul %4879, %5523 : !llvm.i64 + %5525 = llvm.add %5522, %5524 : !llvm.i64 + %5526 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5527 = llvm.mul %669, %5526 : !llvm.i64 + %5528 = llvm.add %5525, %5527 : !llvm.i64 + %5529 = llvm.getelementptr %5521[%5528] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5530 = llvm.load %5529 : !llvm.ptr + %5531 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5533 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5534 = llvm.mul %4879, %5533 : !llvm.i64 + %5535 = llvm.add %5532, %5534 : !llvm.i64 + %5536 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5537 = llvm.mul %669, %5536 : !llvm.i64 + %5538 = llvm.add %5535, %5537 : !llvm.i64 + %5539 = llvm.getelementptr %5531[%5538] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5530, %5539 : !llvm.ptr + %5540 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5542 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5543 = llvm.mul %4879, %5542 : !llvm.i64 + %5544 = llvm.add %5541, %5543 : !llvm.i64 + %5545 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5546 = llvm.mul %59, %5545 : !llvm.i64 + %5547 = llvm.add %5544, %5546 : !llvm.i64 + %5548 = llvm.getelementptr %5540[%5547] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5549 = llvm.load %5548 : !llvm.ptr + %5550 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5551 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5552 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5553 = llvm.mul %59, %5552 : !llvm.i64 + %5554 = llvm.add %5551, %5553 : !llvm.i64 + %5555 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5556 = llvm.mul %730, %5555 : !llvm.i64 + %5557 = llvm.add %5554, %5556 : !llvm.i64 + %5558 = llvm.getelementptr %5550[%5557] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5559 = llvm.load %5558 : !llvm.ptr + %5560 = llvm.fmul %5549, %5559 {RelaxedPrecision} : !llvm.float + %5561 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5562 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5563 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5564 = llvm.mul %4879, %5563 : !llvm.i64 + %5565 = llvm.add %5562, %5564 : 
!llvm.i64 + %5566 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5567 = llvm.mul %730, %5566 : !llvm.i64 + %5568 = llvm.add %5565, %5567 : !llvm.i64 + %5569 = llvm.getelementptr %5561[%5568] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5570 = llvm.load %5569 : !llvm.ptr + %5571 = llvm.fadd %5570, %5560 {RelaxedPrecision} : !llvm.float + %5572 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5573 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5574 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5575 = llvm.mul %4879, %5574 : !llvm.i64 + %5576 = llvm.add %5573, %5575 : !llvm.i64 + %5577 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5578 = llvm.mul %730, %5577 : !llvm.i64 + %5579 = llvm.add %5576, %5578 : !llvm.i64 + %5580 = llvm.getelementptr %5572[%5579] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5571, %5580 : !llvm.ptr + %5581 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5582 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5583 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5584 = llvm.mul %4879, %5583 : !llvm.i64 + %5585 = llvm.add %5582, %5584 : !llvm.i64 + %5586 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5587 = llvm.mul %730, %5586 : !llvm.i64 + %5588 = llvm.add %5585, %5587 : !llvm.i64 + %5589 = llvm.getelementptr %5581[%5588] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5590 = llvm.load %5589 : !llvm.ptr + %5591 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5593 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5594 = llvm.mul %4879, %5593 : !llvm.i64 + %5595 = llvm.add %5592, %5594 : !llvm.i64 + %5596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5597 = llvm.mul %730, %5596 : !llvm.i64 + %5598 = llvm.add %5595, %5597 : !llvm.i64 + %5599 = llvm.getelementptr %5591[%5598] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5590, %5599 : !llvm.ptr + %5600 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5602 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5603 = llvm.mul %4879, %5602 : !llvm.i64 + %5604 = llvm.add %5601, %5603 : !llvm.i64 + %5605 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5606 = llvm.mul %59, %5605 : !llvm.i64 + %5607 = llvm.add %5604, %5606 : !llvm.i64 + %5608 = llvm.getelementptr %5600[%5607] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5609 = llvm.load %5608 : !llvm.ptr + %5610 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5611 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5612 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5613 = llvm.mul %59, %5612 : !llvm.i64 + %5614 = llvm.add %5611, %5613 : !llvm.i64 + %5615 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5616 = llvm.mul %791, %5615 : !llvm.i64 + %5617 = llvm.add %5614, %5616 : !llvm.i64 + %5618 = llvm.getelementptr %5610[%5617] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5619 = llvm.load %5618 : !llvm.ptr + %5620 = llvm.fmul %5609, %5619 {RelaxedPrecision} : !llvm.float + %5621 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5622 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5623 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5624 = llvm.mul %4879, %5623 : !llvm.i64 + %5625 = llvm.add %5622, %5624 : !llvm.i64 + %5626 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5627 = llvm.mul %791, %5626 : !llvm.i64 + %5628 = llvm.add 
%5625, %5627 : !llvm.i64 + %5629 = llvm.getelementptr %5621[%5628] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5630 = llvm.load %5629 : !llvm.ptr + %5631 = llvm.fadd %5630, %5620 {RelaxedPrecision} : !llvm.float + %5632 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5633 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5634 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5635 = llvm.mul %4879, %5634 : !llvm.i64 + %5636 = llvm.add %5633, %5635 : !llvm.i64 + %5637 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5638 = llvm.mul %791, %5637 : !llvm.i64 + %5639 = llvm.add %5636, %5638 : !llvm.i64 + %5640 = llvm.getelementptr %5632[%5639] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5631, %5640 : !llvm.ptr + %5641 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5642 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5643 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5644 = llvm.mul %4879, %5643 : !llvm.i64 + %5645 = llvm.add %5642, %5644 : !llvm.i64 + %5646 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5647 = llvm.mul %791, %5646 : !llvm.i64 + %5648 = llvm.add %5645, %5647 : !llvm.i64 + %5649 = llvm.getelementptr %5641[%5648] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5650 = llvm.load %5649 : !llvm.ptr + %5651 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5652 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5653 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5654 = llvm.mul %4879, %5653 : !llvm.i64 + %5655 = llvm.add %5652, %5654 : !llvm.i64 + %5656 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5657 = llvm.mul %791, %5656 : !llvm.i64 + %5658 = llvm.add %5655, %5657 : !llvm.i64 + %5659 = llvm.getelementptr %5651[%5658] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5650, %5659 : !llvm.ptr + %5660 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5662 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5663 = llvm.mul %4879, %5662 : !llvm.i64 + %5664 = llvm.add %5661, %5663 : !llvm.i64 + %5665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5666 = llvm.mul %59, %5665 : !llvm.i64 + %5667 = llvm.add %5664, %5666 : !llvm.i64 + %5668 = llvm.getelementptr %5660[%5667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5669 = llvm.load %5668 : !llvm.ptr + %5670 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5672 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5673 = llvm.mul %59, %5672 : !llvm.i64 + %5674 = llvm.add %5671, %5673 : !llvm.i64 + %5675 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5676 = llvm.mul %852, %5675 : !llvm.i64 + %5677 = llvm.add %5674, %5676 : !llvm.i64 + %5678 = llvm.getelementptr %5670[%5677] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5679 = llvm.load %5678 : !llvm.ptr + %5680 = llvm.fmul %5669, %5679 {RelaxedPrecision} : !llvm.float + %5681 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5682 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5683 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5684 = llvm.mul %4879, %5683 : !llvm.i64 + %5685 = llvm.add %5682, %5684 : !llvm.i64 + %5686 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5687 = llvm.mul %852, %5686 : !llvm.i64 + %5688 = llvm.add %5685, %5687 : !llvm.i64 + %5689 = llvm.getelementptr %5681[%5688] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5690 = llvm.load 
%5689 : !llvm.ptr + %5691 = llvm.fadd %5690, %5680 {RelaxedPrecision} : !llvm.float + %5692 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5694 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5695 = llvm.mul %4879, %5694 : !llvm.i64 + %5696 = llvm.add %5693, %5695 : !llvm.i64 + %5697 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5698 = llvm.mul %852, %5697 : !llvm.i64 + %5699 = llvm.add %5696, %5698 : !llvm.i64 + %5700 = llvm.getelementptr %5692[%5699] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5691, %5700 : !llvm.ptr + %5701 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5702 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5703 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5704 = llvm.mul %4879, %5703 : !llvm.i64 + %5705 = llvm.add %5702, %5704 : !llvm.i64 + %5706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5707 = llvm.mul %852, %5706 : !llvm.i64 + %5708 = llvm.add %5705, %5707 : !llvm.i64 + %5709 = llvm.getelementptr %5701[%5708] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5710 = llvm.load %5709 : !llvm.ptr + %5711 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5712 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5713 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5714 = llvm.mul %4879, %5713 : !llvm.i64 + %5715 = llvm.add %5712, %5714 : !llvm.i64 + %5716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5717 = llvm.mul %852, %5716 : !llvm.i64 + %5718 = llvm.add %5715, %5717 : !llvm.i64 + %5719 = llvm.getelementptr %5711[%5718] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5710, %5719 : !llvm.ptr + %5720 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5721 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5722 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5723 = llvm.mul %4879, %5722 : !llvm.i64 + %5724 = llvm.add %5721, %5723 : !llvm.i64 + %5725 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5726 = llvm.mul %59, %5725 : !llvm.i64 + %5727 = llvm.add %5724, %5726 : !llvm.i64 + %5728 = llvm.getelementptr %5720[%5727] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5729 = llvm.load %5728 : !llvm.ptr + %5730 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5731 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5732 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5733 = llvm.mul %59, %5732 : !llvm.i64 + %5734 = llvm.add %5731, %5733 : !llvm.i64 + %5735 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5736 = llvm.mul %913, %5735 : !llvm.i64 + %5737 = llvm.add %5734, %5736 : !llvm.i64 + %5738 = llvm.getelementptr %5730[%5737] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5739 = llvm.load %5738 : !llvm.ptr + %5740 = llvm.fmul %5729, %5739 {RelaxedPrecision} : !llvm.float + %5741 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5742 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5743 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5744 = llvm.mul %4879, %5743 : !llvm.i64 + %5745 = llvm.add %5742, %5744 : !llvm.i64 + %5746 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5747 = llvm.mul %913, %5746 : !llvm.i64 + %5748 = llvm.add %5745, %5747 : !llvm.i64 + %5749 = llvm.getelementptr %5741[%5748] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5750 = llvm.load %5749 : !llvm.ptr + %5751 = llvm.fadd %5750, %5740 {RelaxedPrecision} : !llvm.float + %5752 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5754 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5755 = llvm.mul %4879, %5754 : !llvm.i64 + %5756 = llvm.add %5753, %5755 : !llvm.i64 + %5757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5758 = llvm.mul %913, %5757 : !llvm.i64 + %5759 = llvm.add %5756, %5758 : !llvm.i64 + %5760 = llvm.getelementptr %5752[%5759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5751, %5760 : !llvm.ptr + %5761 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5762 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5763 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5764 = llvm.mul %4879, %5763 : !llvm.i64 + %5765 = llvm.add %5762, %5764 : !llvm.i64 + %5766 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5767 = llvm.mul %913, %5766 : !llvm.i64 + %5768 = llvm.add %5765, %5767 : !llvm.i64 + %5769 = llvm.getelementptr %5761[%5768] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5770 = llvm.load %5769 : !llvm.ptr + %5771 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5772 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5773 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5774 = llvm.mul %4879, %5773 : !llvm.i64 + %5775 = llvm.add %5772, %5774 : !llvm.i64 + %5776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5777 = llvm.mul %913, %5776 : !llvm.i64 + %5778 = llvm.add %5775, %5777 : !llvm.i64 + %5779 = llvm.getelementptr %5771[%5778] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5770, %5779 : !llvm.ptr + %5780 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5781 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5782 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5783 = llvm.mul %4879, %5782 : !llvm.i64 + %5784 = llvm.add %5781, %5783 : !llvm.i64 + %5785 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5786 = llvm.mul %59, %5785 : !llvm.i64 + %5787 = llvm.add %5784, %5786 : !llvm.i64 + %5788 = llvm.getelementptr %5780[%5787] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5789 = llvm.load %5788 : !llvm.ptr + %5790 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5791 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5792 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5793 = llvm.mul %59, %5792 : !llvm.i64 + %5794 = llvm.add %5791, %5793 : !llvm.i64 + %5795 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5796 = llvm.mul %974, %5795 : !llvm.i64 + %5797 = llvm.add %5794, %5796 : !llvm.i64 + %5798 = llvm.getelementptr %5790[%5797] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5799 = llvm.load %5798 : !llvm.ptr + %5800 = llvm.fmul %5789, %5799 {RelaxedPrecision} : !llvm.float + %5801 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5802 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5803 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5804 = llvm.mul %4879, %5803 : !llvm.i64 + %5805 = llvm.add %5802, %5804 : !llvm.i64 + %5806 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5807 = llvm.mul %974, %5806 : !llvm.i64 + %5808 = llvm.add %5805, %5807 : !llvm.i64 + %5809 = llvm.getelementptr %5801[%5808] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5810 = llvm.load %5809 : !llvm.ptr + %5811 = llvm.fadd %5810, %5800 {RelaxedPrecision} : !llvm.float + %5812 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5814 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %5815 = llvm.mul %4879, %5814 : !llvm.i64 + %5816 = llvm.add %5813, %5815 : !llvm.i64 + %5817 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5818 = llvm.mul %974, %5817 : !llvm.i64 + %5819 = llvm.add %5816, %5818 : !llvm.i64 + %5820 = llvm.getelementptr %5812[%5819] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5811, %5820 : !llvm.ptr + %5821 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5822 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5823 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5824 = llvm.mul %4879, %5823 : !llvm.i64 + %5825 = llvm.add %5822, %5824 : !llvm.i64 + %5826 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5827 = llvm.mul %974, %5826 : !llvm.i64 + %5828 = llvm.add %5825, %5827 : !llvm.i64 + %5829 = llvm.getelementptr %5821[%5828] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5830 = llvm.load %5829 : !llvm.ptr + %5831 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5832 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5833 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5834 = llvm.mul %4879, %5833 : !llvm.i64 + %5835 = llvm.add %5832, %5834 : !llvm.i64 + %5836 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5837 = llvm.mul %974, %5836 : !llvm.i64 + %5838 = llvm.add %5835, %5837 : !llvm.i64 + %5839 = llvm.getelementptr %5831[%5838] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5830, %5839 : !llvm.ptr + %5840 = llvm.add %56, %33 : !llvm.i64 + llvm.br ^bb9(%5840 : !llvm.i64) + ^bb11: // pred: ^bb9 + %5841 = llvm.add %54, %36 : !llvm.i64 + llvm.br ^bb7(%5841 : !llvm.i64) + ^bb12: // pred: ^bb7 + %5842 = llvm.add %52, %30 : !llvm.i64 + llvm.br ^bb5(%5842 : !llvm.i64) + ^bb13: // pred: ^bb5 + %5843 = llvm.add %50, %38 : !llvm.i64 + llvm.br ^bb3(%5843 : !llvm.i64) + ^bb14: // pred: ^bb3 + llvm.br ^bb15(%32 : !llvm.i64) + ^bb15(%5844: !llvm.i64): // 2 preds: ^bb14, ^bb22 + %5845 = llvm.icmp "slt" %5844, %29 : !llvm.i64 + llvm.cond_br %5845, ^bb16, ^bb23 + ^bb16: // pred: ^bb15 + llvm.br ^bb17(%32 : !llvm.i64) + ^bb17(%5846: !llvm.i64): // 2 preds: ^bb16, ^bb21 + %5847 = llvm.icmp "slt" %5846, %31 : !llvm.i64 + llvm.cond_br %5847, ^bb18, ^bb22 + ^bb18: // pred: ^bb17 + llvm.br ^bb19(%32 : !llvm.i64) + ^bb19(%5848: !llvm.i64): // 2 preds: ^bb18, ^bb20 + %5849 = llvm.icmp "slt" %5848, %36 : !llvm.i64 + llvm.cond_br %5849, ^bb20, ^bb21 + ^bb20: // pred: ^bb19 + %5850 = llvm.add %48, %5844 : !llvm.i64 + %5851 = llvm.add %5846, %5848 : !llvm.i64 + %5852 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5854 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5855 = llvm.mul %28, %5854 : !llvm.i64 + %5856 = llvm.add %5853, %5855 : !llvm.i64 + %5857 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5858 = llvm.mul %5851, %5857 : !llvm.i64 + %5859 = llvm.add %5856, %5858 : !llvm.i64 + %5860 = llvm.getelementptr %5852[%5859] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5861 = llvm.load %5860 : !llvm.ptr + %5862 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5863 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5864 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5865 = llvm.mul %5851, %5864 : !llvm.i64 + %5866 = llvm.add %5863, %5865 : !llvm.i64 + %5867 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5868 = llvm.mul %5850, %5867 : !llvm.i64 + %5869 = llvm.add %5866, %5868 : !llvm.i64 + %5870 = llvm.getelementptr 
%5862[%5869] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5871 = llvm.load %5870 : !llvm.ptr + %5872 = llvm.fmul %5861, %5871 {RelaxedPrecision} : !llvm.float + %5873 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5874 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5876 = llvm.mul %28, %5875 : !llvm.i64 + %5877 = llvm.add %5874, %5876 : !llvm.i64 + %5878 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5879 = llvm.mul %5850, %5878 : !llvm.i64 + %5880 = llvm.add %5877, %5879 : !llvm.i64 + %5881 = llvm.getelementptr %5873[%5880] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5882 = llvm.load %5881 : !llvm.ptr + %5883 = llvm.fadd %5882, %5872 {RelaxedPrecision} : !llvm.float + %5884 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5885 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5886 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5887 = llvm.mul %28, %5886 : !llvm.i64 + %5888 = llvm.add %5885, %5887 : !llvm.i64 + %5889 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5890 = llvm.mul %5850, %5889 : !llvm.i64 + %5891 = llvm.add %5888, %5890 : !llvm.i64 + %5892 = llvm.getelementptr %5884[%5891] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5883, %5892 : !llvm.ptr + %5893 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5894 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5895 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5896 = llvm.mul %28, %5895 : !llvm.i64 + %5897 = llvm.add %5894, %5896 : !llvm.i64 + %5898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5899 = llvm.mul %5850, %5898 : !llvm.i64 + %5900 = llvm.add %5897, %5899 : !llvm.i64 + %5901 = llvm.getelementptr %5893[%5900] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5902 = llvm.load %5901 : !llvm.ptr + %5903 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5904 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5905 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5906 = llvm.mul %28, %5905 : !llvm.i64 + %5907 = llvm.add %5904, %5906 : !llvm.i64 + %5908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5909 = llvm.mul %5850, %5908 : !llvm.i64 + %5910 = llvm.add %5907, %5909 : !llvm.i64 + %5911 = llvm.getelementptr %5903[%5910] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5902, %5911 : !llvm.ptr + %5912 = llvm.add %5850, %33 : !llvm.i64 + %5913 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5915 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5916 = llvm.mul %28, %5915 : !llvm.i64 + %5917 = llvm.add %5914, %5916 : !llvm.i64 + %5918 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5919 = llvm.mul %5851, %5918 : !llvm.i64 + %5920 = llvm.add %5917, %5919 : !llvm.i64 + %5921 = llvm.getelementptr %5913[%5920] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5922 = llvm.load %5921 : !llvm.ptr + %5923 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5924 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5925 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5926 = llvm.mul %5851, %5925 : !llvm.i64 + %5927 = llvm.add %5924, %5926 : !llvm.i64 + %5928 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5929 = llvm.mul %5912, %5928 : !llvm.i64 + %5930 = llvm.add %5927, %5929 : !llvm.i64 + %5931 = llvm.getelementptr %5923[%5930] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5932 = llvm.load %5931 : 
!llvm.ptr + %5933 = llvm.fmul %5922, %5932 {RelaxedPrecision} : !llvm.float + %5934 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5935 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5936 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5937 = llvm.mul %28, %5936 : !llvm.i64 + %5938 = llvm.add %5935, %5937 : !llvm.i64 + %5939 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5940 = llvm.mul %5912, %5939 : !llvm.i64 + %5941 = llvm.add %5938, %5940 : !llvm.i64 + %5942 = llvm.getelementptr %5934[%5941] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5943 = llvm.load %5942 : !llvm.ptr + %5944 = llvm.fadd %5943, %5933 {RelaxedPrecision} : !llvm.float + %5945 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5946 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5947 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5948 = llvm.mul %28, %5947 : !llvm.i64 + %5949 = llvm.add %5946, %5948 : !llvm.i64 + %5950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5951 = llvm.mul %5912, %5950 : !llvm.i64 + %5952 = llvm.add %5949, %5951 : !llvm.i64 + %5953 = llvm.getelementptr %5945[%5952] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5944, %5953 : !llvm.ptr + %5954 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5955 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5956 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5957 = llvm.mul %28, %5956 : !llvm.i64 + %5958 = llvm.add %5955, %5957 : !llvm.i64 + %5959 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5960 = llvm.mul %5912, %5959 : !llvm.i64 + %5961 = llvm.add %5958, %5960 : !llvm.i64 + %5962 = llvm.getelementptr %5954[%5961] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5963 = llvm.load %5962 : !llvm.ptr + %5964 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5966 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5967 = llvm.mul %28, %5966 : !llvm.i64 + %5968 = llvm.add %5965, %5967 : !llvm.i64 + %5969 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5970 = llvm.mul %5912, %5969 : !llvm.i64 + %5971 = llvm.add %5968, %5970 : !llvm.i64 + %5972 = llvm.getelementptr %5964[%5971] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5963, %5972 : !llvm.ptr + %5973 = llvm.add %5850, %34 : !llvm.i64 + %5974 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5976 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5977 = llvm.mul %28, %5976 : !llvm.i64 + %5978 = llvm.add %5975, %5977 : !llvm.i64 + %5979 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5980 = llvm.mul %5851, %5979 : !llvm.i64 + %5981 = llvm.add %5978, %5980 : !llvm.i64 + %5982 = llvm.getelementptr %5974[%5981] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5983 = llvm.load %5982 : !llvm.ptr + %5984 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5985 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5986 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5987 = llvm.mul %5851, %5986 : !llvm.i64 + %5988 = llvm.add %5985, %5987 : !llvm.i64 + %5989 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5990 = llvm.mul %5973, %5989 : !llvm.i64 + %5991 = llvm.add %5988, %5990 : !llvm.i64 + %5992 = llvm.getelementptr %5984[%5991] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5993 = llvm.load %5992 : !llvm.ptr + %5994 = llvm.fmul %5983, %5993 {RelaxedPrecision} : !llvm.float + %5995 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5996 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5997 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5998 = llvm.mul %28, %5997 : !llvm.i64 + %5999 = llvm.add %5996, %5998 : !llvm.i64 + %6000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6001 = llvm.mul %5973, %6000 : !llvm.i64 + %6002 = llvm.add %5999, %6001 : !llvm.i64 + %6003 = llvm.getelementptr %5995[%6002] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6004 = llvm.load %6003 : !llvm.ptr + %6005 = llvm.fadd %6004, %5994 {RelaxedPrecision} : !llvm.float + %6006 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6009 = llvm.mul %28, %6008 : !llvm.i64 + %6010 = llvm.add %6007, %6009 : !llvm.i64 + %6011 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6012 = llvm.mul %5973, %6011 : !llvm.i64 + %6013 = llvm.add %6010, %6012 : !llvm.i64 + %6014 = llvm.getelementptr %6006[%6013] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6005, %6014 : !llvm.ptr + %6015 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6016 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6017 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6018 = llvm.mul %28, %6017 : !llvm.i64 + %6019 = llvm.add %6016, %6018 : !llvm.i64 + %6020 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6021 = llvm.mul %5973, %6020 : !llvm.i64 + %6022 = llvm.add %6019, %6021 : !llvm.i64 + %6023 = llvm.getelementptr %6015[%6022] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6024 = llvm.load %6023 : !llvm.ptr + %6025 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6026 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6027 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6028 = llvm.mul %28, %6027 : !llvm.i64 + %6029 = llvm.add %6026, %6028 : !llvm.i64 + %6030 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6031 = llvm.mul %5973, %6030 : !llvm.i64 + %6032 = llvm.add %6029, %6031 : !llvm.i64 + %6033 = llvm.getelementptr %6025[%6032] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6024, %6033 : !llvm.ptr + %6034 = llvm.add %5850, %35 : !llvm.i64 + %6035 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6036 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6037 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6038 = llvm.mul %28, %6037 : !llvm.i64 + %6039 = llvm.add %6036, %6038 : !llvm.i64 + %6040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6041 = llvm.mul %5851, %6040 : !llvm.i64 + %6042 = llvm.add %6039, %6041 : !llvm.i64 + %6043 = llvm.getelementptr %6035[%6042] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6044 = llvm.load %6043 : !llvm.ptr + %6045 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6046 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6047 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6048 = llvm.mul %5851, %6047 : !llvm.i64 + %6049 = llvm.add %6046, %6048 : !llvm.i64 + %6050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6051 = llvm.mul %6034, %6050 : !llvm.i64 + %6052 = llvm.add %6049, %6051 : !llvm.i64 + %6053 = llvm.getelementptr %6045[%6052] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6054 = llvm.load %6053 : !llvm.ptr + %6055 = llvm.fmul %6044, %6054 {RelaxedPrecision} : !llvm.float + %6056 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x 
i64>)> + %6057 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6058 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6059 = llvm.mul %28, %6058 : !llvm.i64 + %6060 = llvm.add %6057, %6059 : !llvm.i64 + %6061 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6062 = llvm.mul %6034, %6061 : !llvm.i64 + %6063 = llvm.add %6060, %6062 : !llvm.i64 + %6064 = llvm.getelementptr %6056[%6063] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6065 = llvm.load %6064 : !llvm.ptr + %6066 = llvm.fadd %6065, %6055 {RelaxedPrecision} : !llvm.float + %6067 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6069 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6070 = llvm.mul %28, %6069 : !llvm.i64 + %6071 = llvm.add %6068, %6070 : !llvm.i64 + %6072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6073 = llvm.mul %6034, %6072 : !llvm.i64 + %6074 = llvm.add %6071, %6073 : !llvm.i64 + %6075 = llvm.getelementptr %6067[%6074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6066, %6075 : !llvm.ptr + %6076 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6077 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6078 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6079 = llvm.mul %28, %6078 : !llvm.i64 + %6080 = llvm.add %6077, %6079 : !llvm.i64 + %6081 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6082 = llvm.mul %6034, %6081 : !llvm.i64 + %6083 = llvm.add %6080, %6082 : !llvm.i64 + %6084 = llvm.getelementptr %6076[%6083] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6085 = llvm.load %6084 : !llvm.ptr + %6086 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6088 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6089 = llvm.mul %28, %6088 : !llvm.i64 + %6090 = llvm.add %6087, %6089 : !llvm.i64 + %6091 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6092 = llvm.mul %6034, %6091 : !llvm.i64 + %6093 = llvm.add %6090, %6092 : !llvm.i64 + %6094 = llvm.getelementptr %6086[%6093] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6085, %6094 : !llvm.ptr + %6095 = llvm.add %5850, %36 : !llvm.i64 + %6096 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6098 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6099 = llvm.mul %28, %6098 : !llvm.i64 + %6100 = llvm.add %6097, %6099 : !llvm.i64 + %6101 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6102 = llvm.mul %5851, %6101 : !llvm.i64 + %6103 = llvm.add %6100, %6102 : !llvm.i64 + %6104 = llvm.getelementptr %6096[%6103] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6105 = llvm.load %6104 : !llvm.ptr + %6106 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6108 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6109 = llvm.mul %5851, %6108 : !llvm.i64 + %6110 = llvm.add %6107, %6109 : !llvm.i64 + %6111 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6112 = llvm.mul %6095, %6111 : !llvm.i64 + %6113 = llvm.add %6110, %6112 : !llvm.i64 + %6114 = llvm.getelementptr %6106[%6113] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6115 = llvm.load %6114 : !llvm.ptr + %6116 = llvm.fmul %6105, %6115 {RelaxedPrecision} : !llvm.float + %6117 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6119 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %6120 = llvm.mul %28, %6119 : !llvm.i64 + %6121 = llvm.add %6118, %6120 : !llvm.i64 + %6122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6123 = llvm.mul %6095, %6122 : !llvm.i64 + %6124 = llvm.add %6121, %6123 : !llvm.i64 + %6125 = llvm.getelementptr %6117[%6124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6126 = llvm.load %6125 : !llvm.ptr + %6127 = llvm.fadd %6126, %6116 {RelaxedPrecision} : !llvm.float + %6128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6131 = llvm.mul %28, %6130 : !llvm.i64 + %6132 = llvm.add %6129, %6131 : !llvm.i64 + %6133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6134 = llvm.mul %6095, %6133 : !llvm.i64 + %6135 = llvm.add %6132, %6134 : !llvm.i64 + %6136 = llvm.getelementptr %6128[%6135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6127, %6136 : !llvm.ptr + %6137 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6140 = llvm.mul %28, %6139 : !llvm.i64 + %6141 = llvm.add %6138, %6140 : !llvm.i64 + %6142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6143 = llvm.mul %6095, %6142 : !llvm.i64 + %6144 = llvm.add %6141, %6143 : !llvm.i64 + %6145 = llvm.getelementptr %6137[%6144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6146 = llvm.load %6145 : !llvm.ptr + %6147 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6149 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6150 = llvm.mul %28, %6149 : !llvm.i64 + %6151 = llvm.add %6148, %6150 : !llvm.i64 + %6152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6153 = llvm.mul %6095, %6152 : !llvm.i64 + %6154 = llvm.add %6151, %6153 : !llvm.i64 + %6155 = llvm.getelementptr %6147[%6154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6146, %6155 : !llvm.ptr + %6156 = llvm.add %5850, %37 : !llvm.i64 + %6157 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6159 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6160 = llvm.mul %28, %6159 : !llvm.i64 + %6161 = llvm.add %6158, %6160 : !llvm.i64 + %6162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6163 = llvm.mul %5851, %6162 : !llvm.i64 + %6164 = llvm.add %6161, %6163 : !llvm.i64 + %6165 = llvm.getelementptr %6157[%6164] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6166 = llvm.load %6165 : !llvm.ptr + %6167 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6169 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6170 = llvm.mul %5851, %6169 : !llvm.i64 + %6171 = llvm.add %6168, %6170 : !llvm.i64 + %6172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6173 = llvm.mul %6156, %6172 : !llvm.i64 + %6174 = llvm.add %6171, %6173 : !llvm.i64 + %6175 = llvm.getelementptr %6167[%6174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6176 = llvm.load %6175 : !llvm.ptr + %6177 = llvm.fmul %6166, %6176 {RelaxedPrecision} : !llvm.float + %6178 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6180 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6181 = llvm.mul %28, %6180 : !llvm.i64 
+ %6182 = llvm.add %6179, %6181 : !llvm.i64 + %6183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6184 = llvm.mul %6156, %6183 : !llvm.i64 + %6185 = llvm.add %6182, %6184 : !llvm.i64 + %6186 = llvm.getelementptr %6178[%6185] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6187 = llvm.load %6186 : !llvm.ptr + %6188 = llvm.fadd %6187, %6177 {RelaxedPrecision} : !llvm.float + %6189 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6191 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6192 = llvm.mul %28, %6191 : !llvm.i64 + %6193 = llvm.add %6190, %6192 : !llvm.i64 + %6194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6195 = llvm.mul %6156, %6194 : !llvm.i64 + %6196 = llvm.add %6193, %6195 : !llvm.i64 + %6197 = llvm.getelementptr %6189[%6196] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6188, %6197 : !llvm.ptr + %6198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6201 = llvm.mul %28, %6200 : !llvm.i64 + %6202 = llvm.add %6199, %6201 : !llvm.i64 + %6203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6204 = llvm.mul %6156, %6203 : !llvm.i64 + %6205 = llvm.add %6202, %6204 : !llvm.i64 + %6206 = llvm.getelementptr %6198[%6205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6207 = llvm.load %6206 : !llvm.ptr + %6208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6211 = llvm.mul %28, %6210 : !llvm.i64 + %6212 = llvm.add %6209, %6211 : !llvm.i64 + %6213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6214 = llvm.mul %6156, %6213 : !llvm.i64 + %6215 = llvm.add %6212, %6214 : !llvm.i64 + %6216 = llvm.getelementptr %6208[%6215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6207, %6216 : !llvm.ptr + %6217 = llvm.add %5850, %38 : !llvm.i64 + %6218 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6220 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6221 = llvm.mul %28, %6220 : !llvm.i64 + %6222 = llvm.add %6219, %6221 : !llvm.i64 + %6223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6224 = llvm.mul %5851, %6223 : !llvm.i64 + %6225 = llvm.add %6222, %6224 : !llvm.i64 + %6226 = llvm.getelementptr %6218[%6225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6227 = llvm.load %6226 : !llvm.ptr + %6228 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6231 = llvm.mul %5851, %6230 : !llvm.i64 + %6232 = llvm.add %6229, %6231 : !llvm.i64 + %6233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6234 = llvm.mul %6217, %6233 : !llvm.i64 + %6235 = llvm.add %6232, %6234 : !llvm.i64 + %6236 = llvm.getelementptr %6228[%6235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6237 = llvm.load %6236 : !llvm.ptr + %6238 = llvm.fmul %6227, %6237 {RelaxedPrecision} : !llvm.float + %6239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6242 = llvm.mul %28, %6241 : !llvm.i64 + %6243 = llvm.add %6240, %6242 : !llvm.i64 + %6244 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %6245 = llvm.mul %6217, %6244 : !llvm.i64 + %6246 = llvm.add %6243, %6245 : !llvm.i64 + %6247 = llvm.getelementptr %6239[%6246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6248 = llvm.load %6247 : !llvm.ptr + %6249 = llvm.fadd %6248, %6238 {RelaxedPrecision} : !llvm.float + %6250 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6252 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6253 = llvm.mul %28, %6252 : !llvm.i64 + %6254 = llvm.add %6251, %6253 : !llvm.i64 + %6255 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6256 = llvm.mul %6217, %6255 : !llvm.i64 + %6257 = llvm.add %6254, %6256 : !llvm.i64 + %6258 = llvm.getelementptr %6250[%6257] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6249, %6258 : !llvm.ptr + %6259 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6260 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6261 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6262 = llvm.mul %28, %6261 : !llvm.i64 + %6263 = llvm.add %6260, %6262 : !llvm.i64 + %6264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6265 = llvm.mul %6217, %6264 : !llvm.i64 + %6266 = llvm.add %6263, %6265 : !llvm.i64 + %6267 = llvm.getelementptr %6259[%6266] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6268 = llvm.load %6267 : !llvm.ptr + %6269 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6270 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6271 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6272 = llvm.mul %28, %6271 : !llvm.i64 + %6273 = llvm.add %6270, %6272 : !llvm.i64 + %6274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6275 = llvm.mul %6217, %6274 : !llvm.i64 + %6276 = llvm.add %6273, %6275 : !llvm.i64 + %6277 = llvm.getelementptr %6269[%6276] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6268, %6277 : !llvm.ptr + %6278 = llvm.add %5850, %39 : !llvm.i64 + %6279 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6281 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6282 = llvm.mul %28, %6281 : !llvm.i64 + %6283 = llvm.add %6280, %6282 : !llvm.i64 + %6284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6285 = llvm.mul %5851, %6284 : !llvm.i64 + %6286 = llvm.add %6283, %6285 : !llvm.i64 + %6287 = llvm.getelementptr %6279[%6286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6288 = llvm.load %6287 : !llvm.ptr + %6289 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6290 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6291 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6292 = llvm.mul %5851, %6291 : !llvm.i64 + %6293 = llvm.add %6290, %6292 : !llvm.i64 + %6294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6295 = llvm.mul %6278, %6294 : !llvm.i64 + %6296 = llvm.add %6293, %6295 : !llvm.i64 + %6297 = llvm.getelementptr %6289[%6296] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6298 = llvm.load %6297 : !llvm.ptr + %6299 = llvm.fmul %6288, %6298 {RelaxedPrecision} : !llvm.float + %6300 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6302 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6303 = llvm.mul %28, %6302 : !llvm.i64 + %6304 = llvm.add %6301, %6303 : !llvm.i64 + %6305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6306 = llvm.mul %6278, %6305 : !llvm.i64 + %6307 = llvm.add %6304, %6306 
: !llvm.i64 + %6308 = llvm.getelementptr %6300[%6307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6309 = llvm.load %6308 : !llvm.ptr + %6310 = llvm.fadd %6309, %6299 {RelaxedPrecision} : !llvm.float + %6311 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6314 = llvm.mul %28, %6313 : !llvm.i64 + %6315 = llvm.add %6312, %6314 : !llvm.i64 + %6316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6317 = llvm.mul %6278, %6316 : !llvm.i64 + %6318 = llvm.add %6315, %6317 : !llvm.i64 + %6319 = llvm.getelementptr %6311[%6318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6310, %6319 : !llvm.ptr + %6320 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6321 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6322 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6323 = llvm.mul %28, %6322 : !llvm.i64 + %6324 = llvm.add %6321, %6323 : !llvm.i64 + %6325 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6326 = llvm.mul %6278, %6325 : !llvm.i64 + %6327 = llvm.add %6324, %6326 : !llvm.i64 + %6328 = llvm.getelementptr %6320[%6327] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6329 = llvm.load %6328 : !llvm.ptr + %6330 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6331 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6332 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6333 = llvm.mul %28, %6332 : !llvm.i64 + %6334 = llvm.add %6331, %6333 : !llvm.i64 + %6335 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6336 = llvm.mul %6278, %6335 : !llvm.i64 + %6337 = llvm.add %6334, %6336 : !llvm.i64 + %6338 = llvm.getelementptr %6330[%6337] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6329, %6338 : !llvm.ptr + %6339 = llvm.add %5850, %40 : !llvm.i64 + %6340 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6342 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6343 = llvm.mul %28, %6342 : !llvm.i64 + %6344 = llvm.add %6341, %6343 : !llvm.i64 + %6345 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6346 = llvm.mul %5851, %6345 : !llvm.i64 + %6347 = llvm.add %6344, %6346 : !llvm.i64 + %6348 = llvm.getelementptr %6340[%6347] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6349 = llvm.load %6348 : !llvm.ptr + %6350 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6351 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6352 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6353 = llvm.mul %5851, %6352 : !llvm.i64 + %6354 = llvm.add %6351, %6353 : !llvm.i64 + %6355 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6356 = llvm.mul %6339, %6355 : !llvm.i64 + %6357 = llvm.add %6354, %6356 : !llvm.i64 + %6358 = llvm.getelementptr %6350[%6357] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6359 = llvm.load %6358 : !llvm.ptr + %6360 = llvm.fmul %6349, %6359 {RelaxedPrecision} : !llvm.float + %6361 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6362 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6363 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6364 = llvm.mul %28, %6363 : !llvm.i64 + %6365 = llvm.add %6362, %6364 : !llvm.i64 + %6366 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6367 = llvm.mul %6339, %6366 : !llvm.i64 + %6368 = llvm.add %6365, %6367 : !llvm.i64 + %6369 = llvm.getelementptr %6361[%6368] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %6370 = llvm.load %6369 : !llvm.ptr + %6371 = llvm.fadd %6370, %6360 {RelaxedPrecision} : !llvm.float + %6372 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6373 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6374 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6375 = llvm.mul %28, %6374 : !llvm.i64 + %6376 = llvm.add %6373, %6375 : !llvm.i64 + %6377 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6378 = llvm.mul %6339, %6377 : !llvm.i64 + %6379 = llvm.add %6376, %6378 : !llvm.i64 + %6380 = llvm.getelementptr %6372[%6379] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6371, %6380 : !llvm.ptr + %6381 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6383 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6384 = llvm.mul %28, %6383 : !llvm.i64 + %6385 = llvm.add %6382, %6384 : !llvm.i64 + %6386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6387 = llvm.mul %6339, %6386 : !llvm.i64 + %6388 = llvm.add %6385, %6387 : !llvm.i64 + %6389 = llvm.getelementptr %6381[%6388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6390 = llvm.load %6389 : !llvm.ptr + %6391 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6393 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6394 = llvm.mul %28, %6393 : !llvm.i64 + %6395 = llvm.add %6392, %6394 : !llvm.i64 + %6396 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6397 = llvm.mul %6339, %6396 : !llvm.i64 + %6398 = llvm.add %6395, %6397 : !llvm.i64 + %6399 = llvm.getelementptr %6391[%6398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6390, %6399 : !llvm.ptr + %6400 = llvm.add %5850, %41 : !llvm.i64 + %6401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6404 = llvm.mul %28, %6403 : !llvm.i64 + %6405 = llvm.add %6402, %6404 : !llvm.i64 + %6406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6407 = llvm.mul %5851, %6406 : !llvm.i64 + %6408 = llvm.add %6405, %6407 : !llvm.i64 + %6409 = llvm.getelementptr %6401[%6408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6410 = llvm.load %6409 : !llvm.ptr + %6411 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6413 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6414 = llvm.mul %5851, %6413 : !llvm.i64 + %6415 = llvm.add %6412, %6414 : !llvm.i64 + %6416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6417 = llvm.mul %6400, %6416 : !llvm.i64 + %6418 = llvm.add %6415, %6417 : !llvm.i64 + %6419 = llvm.getelementptr %6411[%6418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6420 = llvm.load %6419 : !llvm.ptr + %6421 = llvm.fmul %6410, %6420 {RelaxedPrecision} : !llvm.float + %6422 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6423 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6424 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6425 = llvm.mul %28, %6424 : !llvm.i64 + %6426 = llvm.add %6423, %6425 : !llvm.i64 + %6427 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6428 = llvm.mul %6400, %6427 : !llvm.i64 + %6429 = llvm.add %6426, %6428 : !llvm.i64 + %6430 = llvm.getelementptr %6422[%6429] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6431 = llvm.load %6430 : !llvm.ptr + %6432 = llvm.fadd %6431, %6421 
{RelaxedPrecision} : !llvm.float + %6433 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6434 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6435 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6436 = llvm.mul %28, %6435 : !llvm.i64 + %6437 = llvm.add %6434, %6436 : !llvm.i64 + %6438 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6439 = llvm.mul %6400, %6438 : !llvm.i64 + %6440 = llvm.add %6437, %6439 : !llvm.i64 + %6441 = llvm.getelementptr %6433[%6440] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6432, %6441 : !llvm.ptr + %6442 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6444 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6445 = llvm.mul %28, %6444 : !llvm.i64 + %6446 = llvm.add %6443, %6445 : !llvm.i64 + %6447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6448 = llvm.mul %6400, %6447 : !llvm.i64 + %6449 = llvm.add %6446, %6448 : !llvm.i64 + %6450 = llvm.getelementptr %6442[%6449] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6451 = llvm.load %6450 : !llvm.ptr + %6452 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6455 = llvm.mul %28, %6454 : !llvm.i64 + %6456 = llvm.add %6453, %6455 : !llvm.i64 + %6457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6458 = llvm.mul %6400, %6457 : !llvm.i64 + %6459 = llvm.add %6456, %6458 : !llvm.i64 + %6460 = llvm.getelementptr %6452[%6459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6451, %6460 : !llvm.ptr + %6461 = llvm.add %5850, %42 : !llvm.i64 + %6462 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6463 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6464 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6465 = llvm.mul %28, %6464 : !llvm.i64 + %6466 = llvm.add %6463, %6465 : !llvm.i64 + %6467 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6468 = llvm.mul %5851, %6467 : !llvm.i64 + %6469 = llvm.add %6466, %6468 : !llvm.i64 + %6470 = llvm.getelementptr %6462[%6469] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6471 = llvm.load %6470 : !llvm.ptr + %6472 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6473 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6474 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6475 = llvm.mul %5851, %6474 : !llvm.i64 + %6476 = llvm.add %6473, %6475 : !llvm.i64 + %6477 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6478 = llvm.mul %6461, %6477 : !llvm.i64 + %6479 = llvm.add %6476, %6478 : !llvm.i64 + %6480 = llvm.getelementptr %6472[%6479] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6481 = llvm.load %6480 : !llvm.ptr + %6482 = llvm.fmul %6471, %6481 {RelaxedPrecision} : !llvm.float + %6483 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6484 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6485 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6486 = llvm.mul %28, %6485 : !llvm.i64 + %6487 = llvm.add %6484, %6486 : !llvm.i64 + %6488 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6489 = llvm.mul %6461, %6488 : !llvm.i64 + %6490 = llvm.add %6487, %6489 : !llvm.i64 + %6491 = llvm.getelementptr %6483[%6490] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6492 = llvm.load %6491 : !llvm.ptr + %6493 = llvm.fadd %6492, %6482 {RelaxedPrecision} : !llvm.float + %6494 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6495 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6496 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6497 = llvm.mul %28, %6496 : !llvm.i64 + %6498 = llvm.add %6495, %6497 : !llvm.i64 + %6499 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6500 = llvm.mul %6461, %6499 : !llvm.i64 + %6501 = llvm.add %6498, %6500 : !llvm.i64 + %6502 = llvm.getelementptr %6494[%6501] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6493, %6502 : !llvm.ptr + %6503 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6505 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6506 = llvm.mul %28, %6505 : !llvm.i64 + %6507 = llvm.add %6504, %6506 : !llvm.i64 + %6508 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6509 = llvm.mul %6461, %6508 : !llvm.i64 + %6510 = llvm.add %6507, %6509 : !llvm.i64 + %6511 = llvm.getelementptr %6503[%6510] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6512 = llvm.load %6511 : !llvm.ptr + %6513 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6514 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6515 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6516 = llvm.mul %28, %6515 : !llvm.i64 + %6517 = llvm.add %6514, %6516 : !llvm.i64 + %6518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6519 = llvm.mul %6461, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.getelementptr %6513[%6520] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6512, %6521 : !llvm.ptr + %6522 = llvm.add %5850, %43 : !llvm.i64 + %6523 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6524 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6525 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6526 = llvm.mul %28, %6525 : !llvm.i64 + %6527 = llvm.add %6524, %6526 : !llvm.i64 + %6528 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6529 = llvm.mul %5851, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.getelementptr %6523[%6530] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6532 = llvm.load %6531 : !llvm.ptr + %6533 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6534 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6535 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6536 = llvm.mul %5851, %6535 : !llvm.i64 + %6537 = llvm.add %6534, %6536 : !llvm.i64 + %6538 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6539 = llvm.mul %6522, %6538 : !llvm.i64 + %6540 = llvm.add %6537, %6539 : !llvm.i64 + %6541 = llvm.getelementptr %6533[%6540] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6542 = llvm.load %6541 : !llvm.ptr + %6543 = llvm.fmul %6532, %6542 {RelaxedPrecision} : !llvm.float + %6544 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6545 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6546 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6547 = llvm.mul %28, %6546 : !llvm.i64 + %6548 = llvm.add %6545, %6547 : !llvm.i64 + %6549 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6550 = llvm.mul %6522, %6549 : !llvm.i64 + %6551 = llvm.add %6548, %6550 : !llvm.i64 + %6552 = llvm.getelementptr %6544[%6551] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6553 = llvm.load %6552 : !llvm.ptr + %6554 = llvm.fadd %6553, %6543 {RelaxedPrecision} : !llvm.float + %6555 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6556 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %6557 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6558 = llvm.mul %28, %6557 : !llvm.i64 + %6559 = llvm.add %6556, %6558 : !llvm.i64 + %6560 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6561 = llvm.mul %6522, %6560 : !llvm.i64 + %6562 = llvm.add %6559, %6561 : !llvm.i64 + %6563 = llvm.getelementptr %6555[%6562] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6554, %6563 : !llvm.ptr + %6564 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6565 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6566 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6567 = llvm.mul %28, %6566 : !llvm.i64 + %6568 = llvm.add %6565, %6567 : !llvm.i64 + %6569 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6570 = llvm.mul %6522, %6569 : !llvm.i64 + %6571 = llvm.add %6568, %6570 : !llvm.i64 + %6572 = llvm.getelementptr %6564[%6571] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6573 = llvm.load %6572 : !llvm.ptr + %6574 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6575 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6576 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6577 = llvm.mul %28, %6576 : !llvm.i64 + %6578 = llvm.add %6575, %6577 : !llvm.i64 + %6579 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6580 = llvm.mul %6522, %6579 : !llvm.i64 + %6581 = llvm.add %6578, %6580 : !llvm.i64 + %6582 = llvm.getelementptr %6574[%6581] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6573, %6582 : !llvm.ptr + %6583 = llvm.add %5850, %44 : !llvm.i64 + %6584 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6585 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6586 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6587 = llvm.mul %28, %6586 : !llvm.i64 + %6588 = llvm.add %6585, %6587 : !llvm.i64 + %6589 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6590 = llvm.mul %5851, %6589 : !llvm.i64 + %6591 = llvm.add %6588, %6590 : !llvm.i64 + %6592 = llvm.getelementptr %6584[%6591] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6593 = llvm.load %6592 : !llvm.ptr + %6594 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6595 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6596 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6597 = llvm.mul %5851, %6596 : !llvm.i64 + %6598 = llvm.add %6595, %6597 : !llvm.i64 + %6599 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6600 = llvm.mul %6583, %6599 : !llvm.i64 + %6601 = llvm.add %6598, %6600 : !llvm.i64 + %6602 = llvm.getelementptr %6594[%6601] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6603 = llvm.load %6602 : !llvm.ptr + %6604 = llvm.fmul %6593, %6603 {RelaxedPrecision} : !llvm.float + %6605 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6606 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6607 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6608 = llvm.mul %28, %6607 : !llvm.i64 + %6609 = llvm.add %6606, %6608 : !llvm.i64 + %6610 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6611 = llvm.mul %6583, %6610 : !llvm.i64 + %6612 = llvm.add %6609, %6611 : !llvm.i64 + %6613 = llvm.getelementptr %6605[%6612] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6614 = llvm.load %6613 : !llvm.ptr + %6615 = llvm.fadd %6614, %6604 {RelaxedPrecision} : !llvm.float + %6616 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6617 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6618 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %6619 = llvm.mul %28, %6618 : !llvm.i64 + %6620 = llvm.add %6617, %6619 : !llvm.i64 + %6621 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6622 = llvm.mul %6583, %6621 : !llvm.i64 + %6623 = llvm.add %6620, %6622 : !llvm.i64 + %6624 = llvm.getelementptr %6616[%6623] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6615, %6624 : !llvm.ptr + %6625 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6626 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6627 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6628 = llvm.mul %28, %6627 : !llvm.i64 + %6629 = llvm.add %6626, %6628 : !llvm.i64 + %6630 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6631 = llvm.mul %6583, %6630 : !llvm.i64 + %6632 = llvm.add %6629, %6631 : !llvm.i64 + %6633 = llvm.getelementptr %6625[%6632] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6634 = llvm.load %6633 : !llvm.ptr + %6635 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6638 = llvm.mul %28, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6641 = llvm.mul %6583, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.getelementptr %6635[%6642] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6634, %6643 : !llvm.ptr + %6644 = llvm.add %5850, %45 : !llvm.i64 + %6645 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6646 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6647 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6648 = llvm.mul %28, %6647 : !llvm.i64 + %6649 = llvm.add %6646, %6648 : !llvm.i64 + %6650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6651 = llvm.mul %5851, %6650 : !llvm.i64 + %6652 = llvm.add %6649, %6651 : !llvm.i64 + %6653 = llvm.getelementptr %6645[%6652] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6654 = llvm.load %6653 : !llvm.ptr + %6655 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6656 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6657 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6658 = llvm.mul %5851, %6657 : !llvm.i64 + %6659 = llvm.add %6656, %6658 : !llvm.i64 + %6660 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6661 = llvm.mul %6644, %6660 : !llvm.i64 + %6662 = llvm.add %6659, %6661 : !llvm.i64 + %6663 = llvm.getelementptr %6655[%6662] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6664 = llvm.load %6663 : !llvm.ptr + %6665 = llvm.fmul %6654, %6664 {RelaxedPrecision} : !llvm.float + %6666 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6667 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6668 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6669 = llvm.mul %28, %6668 : !llvm.i64 + %6670 = llvm.add %6667, %6669 : !llvm.i64 + %6671 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6672 = llvm.mul %6644, %6671 : !llvm.i64 + %6673 = llvm.add %6670, %6672 : !llvm.i64 + %6674 = llvm.getelementptr %6666[%6673] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6675 = llvm.load %6674 : !llvm.ptr + %6676 = llvm.fadd %6675, %6665 {RelaxedPrecision} : !llvm.float + %6677 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6680 = llvm.mul %28, %6679 : !llvm.i64 + %6681 = llvm.add %6678, %6680 : 
!llvm.i64 + %6682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6683 = llvm.mul %6644, %6682 : !llvm.i64 + %6684 = llvm.add %6681, %6683 : !llvm.i64 + %6685 = llvm.getelementptr %6677[%6684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6676, %6685 : !llvm.ptr + %6686 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6687 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6688 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6689 = llvm.mul %28, %6688 : !llvm.i64 + %6690 = llvm.add %6687, %6689 : !llvm.i64 + %6691 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6692 = llvm.mul %6644, %6691 : !llvm.i64 + %6693 = llvm.add %6690, %6692 : !llvm.i64 + %6694 = llvm.getelementptr %6686[%6693] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6695 = llvm.load %6694 : !llvm.ptr + %6696 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6697 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6698 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6699 = llvm.mul %28, %6698 : !llvm.i64 + %6700 = llvm.add %6697, %6699 : !llvm.i64 + %6701 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6702 = llvm.mul %6644, %6701 : !llvm.i64 + %6703 = llvm.add %6700, %6702 : !llvm.i64 + %6704 = llvm.getelementptr %6696[%6703] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6695, %6704 : !llvm.ptr + %6705 = llvm.add %5850, %46 : !llvm.i64 + %6706 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6708 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6709 = llvm.mul %28, %6708 : !llvm.i64 + %6710 = llvm.add %6707, %6709 : !llvm.i64 + %6711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6712 = llvm.mul %5851, %6711 : !llvm.i64 + %6713 = llvm.add %6710, %6712 : !llvm.i64 + %6714 = llvm.getelementptr %6706[%6713] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6715 = llvm.load %6714 : !llvm.ptr + %6716 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6717 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6718 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6719 = llvm.mul %5851, %6718 : !llvm.i64 + %6720 = llvm.add %6717, %6719 : !llvm.i64 + %6721 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6722 = llvm.mul %6705, %6721 : !llvm.i64 + %6723 = llvm.add %6720, %6722 : !llvm.i64 + %6724 = llvm.getelementptr %6716[%6723] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6725 = llvm.load %6724 : !llvm.ptr + %6726 = llvm.fmul %6715, %6725 {RelaxedPrecision} : !llvm.float + %6727 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6729 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6730 = llvm.mul %28, %6729 : !llvm.i64 + %6731 = llvm.add %6728, %6730 : !llvm.i64 + %6732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6733 = llvm.mul %6705, %6732 : !llvm.i64 + %6734 = llvm.add %6731, %6733 : !llvm.i64 + %6735 = llvm.getelementptr %6727[%6734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6736 = llvm.load %6735 : !llvm.ptr + %6737 = llvm.fadd %6736, %6726 {RelaxedPrecision} : !llvm.float + %6738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6741 = llvm.mul %28, %6740 : !llvm.i64 + %6742 = llvm.add %6739, %6741 : !llvm.i64 + %6743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6744 = llvm.mul 
%6705, %6743 : !llvm.i64 + %6745 = llvm.add %6742, %6744 : !llvm.i64 + %6746 = llvm.getelementptr %6738[%6745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6737, %6746 : !llvm.ptr + %6747 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6749 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6750 = llvm.mul %28, %6749 : !llvm.i64 + %6751 = llvm.add %6748, %6750 : !llvm.i64 + %6752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6753 = llvm.mul %6705, %6752 : !llvm.i64 + %6754 = llvm.add %6751, %6753 : !llvm.i64 + %6755 = llvm.getelementptr %6747[%6754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6756 = llvm.load %6755 : !llvm.ptr + %6757 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6758 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6759 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6760 = llvm.mul %28, %6759 : !llvm.i64 + %6761 = llvm.add %6758, %6760 : !llvm.i64 + %6762 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6763 = llvm.mul %6705, %6762 : !llvm.i64 + %6764 = llvm.add %6761, %6763 : !llvm.i64 + %6765 = llvm.getelementptr %6757[%6764] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6756, %6765 : !llvm.ptr + %6766 = llvm.add %5850, %47 : !llvm.i64 + %6767 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6768 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6769 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6770 = llvm.mul %28, %6769 : !llvm.i64 + %6771 = llvm.add %6768, %6770 : !llvm.i64 + %6772 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6773 = llvm.mul %5851, %6772 : !llvm.i64 + %6774 = llvm.add %6771, %6773 : !llvm.i64 + %6775 = llvm.getelementptr %6767[%6774] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6776 = llvm.load %6775 : !llvm.ptr + %6777 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6779 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6780 = llvm.mul %5851, %6779 : !llvm.i64 + %6781 = llvm.add %6778, %6780 : !llvm.i64 + %6782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6783 = llvm.mul %6766, %6782 : !llvm.i64 + %6784 = llvm.add %6781, %6783 : !llvm.i64 + %6785 = llvm.getelementptr %6777[%6784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6786 = llvm.load %6785 : !llvm.ptr + %6787 = llvm.fmul %6776, %6786 {RelaxedPrecision} : !llvm.float + %6788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6791 = llvm.mul %28, %6790 : !llvm.i64 + %6792 = llvm.add %6789, %6791 : !llvm.i64 + %6793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6794 = llvm.mul %6766, %6793 : !llvm.i64 + %6795 = llvm.add %6792, %6794 : !llvm.i64 + %6796 = llvm.getelementptr %6788[%6795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6797 = llvm.load %6796 : !llvm.ptr + %6798 = llvm.fadd %6797, %6787 {RelaxedPrecision} : !llvm.float + %6799 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6800 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6801 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6802 = llvm.mul %28, %6801 : !llvm.i64 + %6803 = llvm.add %6800, %6802 : !llvm.i64 + %6804 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6805 = llvm.mul %6766, %6804 : !llvm.i64 + %6806 = llvm.add %6803, %6805 : !llvm.i64 + %6807 = 
llvm.getelementptr %6799[%6806] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6798, %6807 : !llvm.ptr + %6808 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6811 = llvm.mul %28, %6810 : !llvm.i64 + %6812 = llvm.add %6809, %6811 : !llvm.i64 + %6813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6814 = llvm.mul %6766, %6813 : !llvm.i64 + %6815 = llvm.add %6812, %6814 : !llvm.i64 + %6816 = llvm.getelementptr %6808[%6815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6817 = llvm.load %6816 : !llvm.ptr + %6818 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6819 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6820 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6821 = llvm.mul %28, %6820 : !llvm.i64 + %6822 = llvm.add %6819, %6821 : !llvm.i64 + %6823 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6824 = llvm.mul %6766, %6823 : !llvm.i64 + %6825 = llvm.add %6822, %6824 : !llvm.i64 + %6826 = llvm.getelementptr %6818[%6825] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6817, %6826 : !llvm.ptr + %6827 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6828 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6829 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6830 = llvm.mul %24, %6829 : !llvm.i64 + %6831 = llvm.add %6828, %6830 : !llvm.i64 + %6832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6833 = llvm.mul %5851, %6832 : !llvm.i64 + %6834 = llvm.add %6831, %6833 : !llvm.i64 + %6835 = llvm.getelementptr %6827[%6834] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6836 = llvm.load %6835 : !llvm.ptr + %6837 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6839 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6840 = llvm.mul %5851, %6839 : !llvm.i64 + %6841 = llvm.add %6838, %6840 : !llvm.i64 + %6842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6843 = llvm.mul %5850, %6842 : !llvm.i64 + %6844 = llvm.add %6841, %6843 : !llvm.i64 + %6845 = llvm.getelementptr %6837[%6844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6846 = llvm.load %6845 : !llvm.ptr + %6847 = llvm.fmul %6836, %6846 {RelaxedPrecision} : !llvm.float + %6848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6851 = llvm.mul %24, %6850 : !llvm.i64 + %6852 = llvm.add %6849, %6851 : !llvm.i64 + %6853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6854 = llvm.mul %5850, %6853 : !llvm.i64 + %6855 = llvm.add %6852, %6854 : !llvm.i64 + %6856 = llvm.getelementptr %6848[%6855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6857 = llvm.load %6856 : !llvm.ptr + %6858 = llvm.fadd %6857, %6847 {RelaxedPrecision} : !llvm.float + %6859 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6860 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6861 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6862 = llvm.mul %24, %6861 : !llvm.i64 + %6863 = llvm.add %6860, %6862 : !llvm.i64 + %6864 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6865 = llvm.mul %5850, %6864 : !llvm.i64 + %6866 = llvm.add %6863, %6865 : !llvm.i64 + %6867 = llvm.getelementptr %6859[%6866] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6858, %6867 : !llvm.ptr + %6868 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6871 = llvm.mul %24, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6874 = llvm.mul %5850, %6873 : !llvm.i64 + %6875 = llvm.add %6872, %6874 : !llvm.i64 + %6876 = llvm.getelementptr %6868[%6875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6877 = llvm.load %6876 : !llvm.ptr + %6878 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6880 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6881 = llvm.mul %24, %6880 : !llvm.i64 + %6882 = llvm.add %6879, %6881 : !llvm.i64 + %6883 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6884 = llvm.mul %5850, %6883 : !llvm.i64 + %6885 = llvm.add %6882, %6884 : !llvm.i64 + %6886 = llvm.getelementptr %6878[%6885] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6877, %6886 : !llvm.ptr + %6887 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6888 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6889 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6890 = llvm.mul %24, %6889 : !llvm.i64 + %6891 = llvm.add %6888, %6890 : !llvm.i64 + %6892 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6893 = llvm.mul %5851, %6892 : !llvm.i64 + %6894 = llvm.add %6891, %6893 : !llvm.i64 + %6895 = llvm.getelementptr %6887[%6894] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6896 = llvm.load %6895 : !llvm.ptr + %6897 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6899 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6900 = llvm.mul %5851, %6899 : !llvm.i64 + %6901 = llvm.add %6898, %6900 : !llvm.i64 + %6902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6903 = llvm.mul %5912, %6902 : !llvm.i64 + %6904 = llvm.add %6901, %6903 : !llvm.i64 + %6905 = llvm.getelementptr %6897[%6904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6906 = llvm.load %6905 : !llvm.ptr + %6907 = llvm.fmul %6896, %6906 {RelaxedPrecision} : !llvm.float + %6908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6911 = llvm.mul %24, %6910 : !llvm.i64 + %6912 = llvm.add %6909, %6911 : !llvm.i64 + %6913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6914 = llvm.mul %5912, %6913 : !llvm.i64 + %6915 = llvm.add %6912, %6914 : !llvm.i64 + %6916 = llvm.getelementptr %6908[%6915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6917 = llvm.load %6916 : !llvm.ptr + %6918 = llvm.fadd %6917, %6907 {RelaxedPrecision} : !llvm.float + %6919 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6921 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6922 = llvm.mul %24, %6921 : !llvm.i64 + %6923 = llvm.add %6920, %6922 : !llvm.i64 + %6924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6925 = llvm.mul %5912, %6924 : !llvm.i64 + %6926 = llvm.add %6923, %6925 : !llvm.i64 + %6927 = llvm.getelementptr %6919[%6926] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6918, %6927 : !llvm.ptr + %6928 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6929 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %6930 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6931 = llvm.mul %24, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6934 = llvm.mul %5912, %6933 : !llvm.i64 + %6935 = llvm.add %6932, %6934 : !llvm.i64 + %6936 = llvm.getelementptr %6928[%6935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6937 = llvm.load %6936 : !llvm.ptr + %6938 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6940 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6941 = llvm.mul %24, %6940 : !llvm.i64 + %6942 = llvm.add %6939, %6941 : !llvm.i64 + %6943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6944 = llvm.mul %5912, %6943 : !llvm.i64 + %6945 = llvm.add %6942, %6944 : !llvm.i64 + %6946 = llvm.getelementptr %6938[%6945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6937, %6946 : !llvm.ptr + %6947 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6948 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6949 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6950 = llvm.mul %24, %6949 : !llvm.i64 + %6951 = llvm.add %6948, %6950 : !llvm.i64 + %6952 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6953 = llvm.mul %5851, %6952 : !llvm.i64 + %6954 = llvm.add %6951, %6953 : !llvm.i64 + %6955 = llvm.getelementptr %6947[%6954] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6956 = llvm.load %6955 : !llvm.ptr + %6957 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6958 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6959 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6960 = llvm.mul %5851, %6959 : !llvm.i64 + %6961 = llvm.add %6958, %6960 : !llvm.i64 + %6962 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6963 = llvm.mul %5973, %6962 : !llvm.i64 + %6964 = llvm.add %6961, %6963 : !llvm.i64 + %6965 = llvm.getelementptr %6957[%6964] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6966 = llvm.load %6965 : !llvm.ptr + %6967 = llvm.fmul %6956, %6966 {RelaxedPrecision} : !llvm.float + %6968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6971 = llvm.mul %24, %6970 : !llvm.i64 + %6972 = llvm.add %6969, %6971 : !llvm.i64 + %6973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6974 = llvm.mul %5973, %6973 : !llvm.i64 + %6975 = llvm.add %6972, %6974 : !llvm.i64 + %6976 = llvm.getelementptr %6968[%6975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6977 = llvm.load %6976 : !llvm.ptr + %6978 = llvm.fadd %6977, %6967 {RelaxedPrecision} : !llvm.float + %6979 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6981 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6982 = llvm.mul %24, %6981 : !llvm.i64 + %6983 = llvm.add %6980, %6982 : !llvm.i64 + %6984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6985 = llvm.mul %5973, %6984 : !llvm.i64 + %6986 = llvm.add %6983, %6985 : !llvm.i64 + %6987 = llvm.getelementptr %6979[%6986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6978, %6987 : !llvm.ptr + %6988 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6991 = llvm.mul %24, %6990 : !llvm.i64 + %6992 = 
llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %5973, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6997 = llvm.load %6996 : !llvm.ptr + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %24, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %5973, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6997, %7006 : !llvm.ptr + %7007 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7010 = llvm.mul %24, %7009 : !llvm.i64 + %7011 = llvm.add %7008, %7010 : !llvm.i64 + %7012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7013 = llvm.mul %5851, %7012 : !llvm.i64 + %7014 = llvm.add %7011, %7013 : !llvm.i64 + %7015 = llvm.getelementptr %7007[%7014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7016 = llvm.load %7015 : !llvm.ptr + %7017 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7018 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7019 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7020 = llvm.mul %5851, %7019 : !llvm.i64 + %7021 = llvm.add %7018, %7020 : !llvm.i64 + %7022 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7023 = llvm.mul %6034, %7022 : !llvm.i64 + %7024 = llvm.add %7021, %7023 : !llvm.i64 + %7025 = llvm.getelementptr %7017[%7024] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7026 = llvm.load %7025 : !llvm.ptr + %7027 = llvm.fmul %7016, %7026 {RelaxedPrecision} : !llvm.float + %7028 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7031 = llvm.mul %24, %7030 : !llvm.i64 + %7032 = llvm.add %7029, %7031 : !llvm.i64 + %7033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7034 = llvm.mul %6034, %7033 : !llvm.i64 + %7035 = llvm.add %7032, %7034 : !llvm.i64 + %7036 = llvm.getelementptr %7028[%7035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7037 = llvm.load %7036 : !llvm.ptr + %7038 = llvm.fadd %7037, %7027 {RelaxedPrecision} : !llvm.float + %7039 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7041 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7042 = llvm.mul %24, %7041 : !llvm.i64 + %7043 = llvm.add %7040, %7042 : !llvm.i64 + %7044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7045 = llvm.mul %6034, %7044 : !llvm.i64 + %7046 = llvm.add %7043, %7045 : !llvm.i64 + %7047 = llvm.getelementptr %7039[%7046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7038, %7047 : !llvm.ptr + %7048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7051 = llvm.mul %24, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %6034, %7053 : 
!llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7057 = llvm.load %7056 : !llvm.ptr + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %24, %7060 : !llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %6034, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7057, %7066 : !llvm.ptr + %7067 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7070 = llvm.mul %24, %7069 : !llvm.i64 + %7071 = llvm.add %7068, %7070 : !llvm.i64 + %7072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7073 = llvm.mul %5851, %7072 : !llvm.i64 + %7074 = llvm.add %7071, %7073 : !llvm.i64 + %7075 = llvm.getelementptr %7067[%7074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7076 = llvm.load %7075 : !llvm.ptr + %7077 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7078 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7079 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7080 = llvm.mul %5851, %7079 : !llvm.i64 + %7081 = llvm.add %7078, %7080 : !llvm.i64 + %7082 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7083 = llvm.mul %6095, %7082 : !llvm.i64 + %7084 = llvm.add %7081, %7083 : !llvm.i64 + %7085 = llvm.getelementptr %7077[%7084] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7086 = llvm.load %7085 : !llvm.ptr + %7087 = llvm.fmul %7076, %7086 {RelaxedPrecision} : !llvm.float + %7088 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7091 = llvm.mul %24, %7090 : !llvm.i64 + %7092 = llvm.add %7089, %7091 : !llvm.i64 + %7093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7094 = llvm.mul %6095, %7093 : !llvm.i64 + %7095 = llvm.add %7092, %7094 : !llvm.i64 + %7096 = llvm.getelementptr %7088[%7095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7097 = llvm.load %7096 : !llvm.ptr + %7098 = llvm.fadd %7097, %7087 {RelaxedPrecision} : !llvm.float + %7099 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7101 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7102 = llvm.mul %24, %7101 : !llvm.i64 + %7103 = llvm.add %7100, %7102 : !llvm.i64 + %7104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7105 = llvm.mul %6095, %7104 : !llvm.i64 + %7106 = llvm.add %7103, %7105 : !llvm.i64 + %7107 = llvm.getelementptr %7099[%7106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7098, %7107 : !llvm.ptr + %7108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7111 = llvm.mul %24, %7110 : !llvm.i64 + %7112 = llvm.add %7109, %7111 : !llvm.i64 + %7113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7114 = llvm.mul %6095, %7113 : !llvm.i64 + %7115 = llvm.add %7112, %7114 : !llvm.i64 + %7116 = llvm.getelementptr %7108[%7115] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %7117 = llvm.load %7116 : !llvm.ptr + %7118 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7119 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7120 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7121 = llvm.mul %24, %7120 : !llvm.i64 + %7122 = llvm.add %7119, %7121 : !llvm.i64 + %7123 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7124 = llvm.mul %6095, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 + %7126 = llvm.getelementptr %7118[%7125] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7117, %7126 : !llvm.ptr + %7127 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7128 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7129 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7130 = llvm.mul %24, %7129 : !llvm.i64 + %7131 = llvm.add %7128, %7130 : !llvm.i64 + %7132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7133 = llvm.mul %5851, %7132 : !llvm.i64 + %7134 = llvm.add %7131, %7133 : !llvm.i64 + %7135 = llvm.getelementptr %7127[%7134] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7136 = llvm.load %7135 : !llvm.ptr + %7137 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7140 = llvm.mul %5851, %7139 : !llvm.i64 + %7141 = llvm.add %7138, %7140 : !llvm.i64 + %7142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7143 = llvm.mul %6156, %7142 : !llvm.i64 + %7144 = llvm.add %7141, %7143 : !llvm.i64 + %7145 = llvm.getelementptr %7137[%7144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7146 = llvm.load %7145 : !llvm.ptr + %7147 = llvm.fmul %7136, %7146 {RelaxedPrecision} : !llvm.float + %7148 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7151 = llvm.mul %24, %7150 : !llvm.i64 + %7152 = llvm.add %7149, %7151 : !llvm.i64 + %7153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7154 = llvm.mul %6156, %7153 : !llvm.i64 + %7155 = llvm.add %7152, %7154 : !llvm.i64 + %7156 = llvm.getelementptr %7148[%7155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7157 = llvm.load %7156 : !llvm.ptr + %7158 = llvm.fadd %7157, %7147 {RelaxedPrecision} : !llvm.float + %7159 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7161 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7162 = llvm.mul %24, %7161 : !llvm.i64 + %7163 = llvm.add %7160, %7162 : !llvm.i64 + %7164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7165 = llvm.mul %6156, %7164 : !llvm.i64 + %7166 = llvm.add %7163, %7165 : !llvm.i64 + %7167 = llvm.getelementptr %7159[%7166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7158, %7167 : !llvm.ptr + %7168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7171 = llvm.mul %24, %7170 : !llvm.i64 + %7172 = llvm.add %7169, %7171 : !llvm.i64 + %7173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7174 = llvm.mul %6156, %7173 : !llvm.i64 + %7175 = llvm.add %7172, %7174 : !llvm.i64 + %7176 = llvm.getelementptr %7168[%7175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7177 = llvm.load %7176 : !llvm.ptr + %7178 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %7179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7180 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7181 = llvm.mul %24, %7180 : !llvm.i64 + %7182 = llvm.add %7179, %7181 : !llvm.i64 + %7183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7184 = llvm.mul %6156, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.getelementptr %7178[%7185] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7177, %7186 : !llvm.ptr + %7187 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7188 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7189 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7190 = llvm.mul %24, %7189 : !llvm.i64 + %7191 = llvm.add %7188, %7190 : !llvm.i64 + %7192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7193 = llvm.mul %5851, %7192 : !llvm.i64 + %7194 = llvm.add %7191, %7193 : !llvm.i64 + %7195 = llvm.getelementptr %7187[%7194] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7196 = llvm.load %7195 : !llvm.ptr + %7197 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7198 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7199 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7200 = llvm.mul %5851, %7199 : !llvm.i64 + %7201 = llvm.add %7198, %7200 : !llvm.i64 + %7202 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7203 = llvm.mul %6217, %7202 : !llvm.i64 + %7204 = llvm.add %7201, %7203 : !llvm.i64 + %7205 = llvm.getelementptr %7197[%7204] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7206 = llvm.load %7205 : !llvm.ptr + %7207 = llvm.fmul %7196, %7206 {RelaxedPrecision} : !llvm.float + %7208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7211 = llvm.mul %24, %7210 : !llvm.i64 + %7212 = llvm.add %7209, %7211 : !llvm.i64 + %7213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7214 = llvm.mul %6217, %7213 : !llvm.i64 + %7215 = llvm.add %7212, %7214 : !llvm.i64 + %7216 = llvm.getelementptr %7208[%7215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7217 = llvm.load %7216 : !llvm.ptr + %7218 = llvm.fadd %7217, %7207 {RelaxedPrecision} : !llvm.float + %7219 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7221 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7222 = llvm.mul %24, %7221 : !llvm.i64 + %7223 = llvm.add %7220, %7222 : !llvm.i64 + %7224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7225 = llvm.mul %6217, %7224 : !llvm.i64 + %7226 = llvm.add %7223, %7225 : !llvm.i64 + %7227 = llvm.getelementptr %7219[%7226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7218, %7227 : !llvm.ptr + %7228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7231 = llvm.mul %24, %7230 : !llvm.i64 + %7232 = llvm.add %7229, %7231 : !llvm.i64 + %7233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7234 = llvm.mul %6217, %7233 : !llvm.i64 + %7235 = llvm.add %7232, %7234 : !llvm.i64 + %7236 = llvm.getelementptr %7228[%7235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7237 = llvm.load %7236 : !llvm.ptr + %7238 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7239 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7240 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %7241 = llvm.mul %24, %7240 : !llvm.i64 + %7242 = llvm.add %7239, %7241 : !llvm.i64 + %7243 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7244 = llvm.mul %6217, %7243 : !llvm.i64 + %7245 = llvm.add %7242, %7244 : !llvm.i64 + %7246 = llvm.getelementptr %7238[%7245] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7237, %7246 : !llvm.ptr + %7247 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7248 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7249 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7250 = llvm.mul %24, %7249 : !llvm.i64 + %7251 = llvm.add %7248, %7250 : !llvm.i64 + %7252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7253 = llvm.mul %5851, %7252 : !llvm.i64 + %7254 = llvm.add %7251, %7253 : !llvm.i64 + %7255 = llvm.getelementptr %7247[%7254] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7256 = llvm.load %7255 : !llvm.ptr + %7257 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7258 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7259 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7260 = llvm.mul %5851, %7259 : !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7263 = llvm.mul %6278, %7262 : !llvm.i64 + %7264 = llvm.add %7261, %7263 : !llvm.i64 + %7265 = llvm.getelementptr %7257[%7264] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7266 = llvm.load %7265 : !llvm.ptr + %7267 = llvm.fmul %7256, %7266 {RelaxedPrecision} : !llvm.float + %7268 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7271 = llvm.mul %24, %7270 : !llvm.i64 + %7272 = llvm.add %7269, %7271 : !llvm.i64 + %7273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7274 = llvm.mul %6278, %7273 : !llvm.i64 + %7275 = llvm.add %7272, %7274 : !llvm.i64 + %7276 = llvm.getelementptr %7268[%7275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7277 = llvm.load %7276 : !llvm.ptr + %7278 = llvm.fadd %7277, %7267 {RelaxedPrecision} : !llvm.float + %7279 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7281 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7282 = llvm.mul %24, %7281 : !llvm.i64 + %7283 = llvm.add %7280, %7282 : !llvm.i64 + %7284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7285 = llvm.mul %6278, %7284 : !llvm.i64 + %7286 = llvm.add %7283, %7285 : !llvm.i64 + %7287 = llvm.getelementptr %7279[%7286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7278, %7287 : !llvm.ptr + %7288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7291 = llvm.mul %24, %7290 : !llvm.i64 + %7292 = llvm.add %7289, %7291 : !llvm.i64 + %7293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7294 = llvm.mul %6278, %7293 : !llvm.i64 + %7295 = llvm.add %7292, %7294 : !llvm.i64 + %7296 = llvm.getelementptr %7288[%7295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7297 = llvm.load %7296 : !llvm.ptr + %7298 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7300 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7301 = llvm.mul %24, %7300 : !llvm.i64 + %7302 = llvm.add %7299, %7301 : !llvm.i64 + %7303 = llvm.mlir.constant(1 
: index) : !llvm.i64 + %7304 = llvm.mul %6278, %7303 : !llvm.i64 + %7305 = llvm.add %7302, %7304 : !llvm.i64 + %7306 = llvm.getelementptr %7298[%7305] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7297, %7306 : !llvm.ptr + %7307 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7308 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7309 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7310 = llvm.mul %24, %7309 : !llvm.i64 + %7311 = llvm.add %7308, %7310 : !llvm.i64 + %7312 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7313 = llvm.mul %5851, %7312 : !llvm.i64 + %7314 = llvm.add %7311, %7313 : !llvm.i64 + %7315 = llvm.getelementptr %7307[%7314] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7316 = llvm.load %7315 : !llvm.ptr + %7317 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7318 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7319 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7320 = llvm.mul %5851, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7323 = llvm.mul %6339, %7322 : !llvm.i64 + %7324 = llvm.add %7321, %7323 : !llvm.i64 + %7325 = llvm.getelementptr %7317[%7324] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7326 = llvm.load %7325 : !llvm.ptr + %7327 = llvm.fmul %7316, %7326 {RelaxedPrecision} : !llvm.float + %7328 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7331 = llvm.mul %24, %7330 : !llvm.i64 + %7332 = llvm.add %7329, %7331 : !llvm.i64 + %7333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7334 = llvm.mul %6339, %7333 : !llvm.i64 + %7335 = llvm.add %7332, %7334 : !llvm.i64 + %7336 = llvm.getelementptr %7328[%7335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7337 = llvm.load %7336 : !llvm.ptr + %7338 = llvm.fadd %7337, %7327 {RelaxedPrecision} : !llvm.float + %7339 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7341 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7342 = llvm.mul %24, %7341 : !llvm.i64 + %7343 = llvm.add %7340, %7342 : !llvm.i64 + %7344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7345 = llvm.mul %6339, %7344 : !llvm.i64 + %7346 = llvm.add %7343, %7345 : !llvm.i64 + %7347 = llvm.getelementptr %7339[%7346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7338, %7347 : !llvm.ptr + %7348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7351 = llvm.mul %24, %7350 : !llvm.i64 + %7352 = llvm.add %7349, %7351 : !llvm.i64 + %7353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7354 = llvm.mul %6339, %7353 : !llvm.i64 + %7355 = llvm.add %7352, %7354 : !llvm.i64 + %7356 = llvm.getelementptr %7348[%7355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7357 = llvm.load %7356 : !llvm.ptr + %7358 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7360 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7361 = llvm.mul %24, %7360 : !llvm.i64 + %7362 = llvm.add %7359, %7361 : !llvm.i64 + %7363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7364 = llvm.mul %6339, %7363 : !llvm.i64 + %7365 = llvm.add %7362, %7364 : !llvm.i64 + %7366 = 
llvm.getelementptr %7358[%7365] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7357, %7366 : !llvm.ptr + %7367 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7368 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7369 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7370 = llvm.mul %24, %7369 : !llvm.i64 + %7371 = llvm.add %7368, %7370 : !llvm.i64 + %7372 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7373 = llvm.mul %5851, %7372 : !llvm.i64 + %7374 = llvm.add %7371, %7373 : !llvm.i64 + %7375 = llvm.getelementptr %7367[%7374] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7376 = llvm.load %7375 : !llvm.ptr + %7377 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7378 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7379 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7380 = llvm.mul %5851, %7379 : !llvm.i64 + %7381 = llvm.add %7378, %7380 : !llvm.i64 + %7382 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7383 = llvm.mul %6400, %7382 : !llvm.i64 + %7384 = llvm.add %7381, %7383 : !llvm.i64 + %7385 = llvm.getelementptr %7377[%7384] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7386 = llvm.load %7385 : !llvm.ptr + %7387 = llvm.fmul %7376, %7386 {RelaxedPrecision} : !llvm.float + %7388 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7391 = llvm.mul %24, %7390 : !llvm.i64 + %7392 = llvm.add %7389, %7391 : !llvm.i64 + %7393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7394 = llvm.mul %6400, %7393 : !llvm.i64 + %7395 = llvm.add %7392, %7394 : !llvm.i64 + %7396 = llvm.getelementptr %7388[%7395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7397 = llvm.load %7396 : !llvm.ptr + %7398 = llvm.fadd %7397, %7387 {RelaxedPrecision} : !llvm.float + %7399 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7401 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7402 = llvm.mul %24, %7401 : !llvm.i64 + %7403 = llvm.add %7400, %7402 : !llvm.i64 + %7404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7405 = llvm.mul %6400, %7404 : !llvm.i64 + %7406 = llvm.add %7403, %7405 : !llvm.i64 + %7407 = llvm.getelementptr %7399[%7406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7398, %7407 : !llvm.ptr + %7408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7411 = llvm.mul %24, %7410 : !llvm.i64 + %7412 = llvm.add %7409, %7411 : !llvm.i64 + %7413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7414 = llvm.mul %6400, %7413 : !llvm.i64 + %7415 = llvm.add %7412, %7414 : !llvm.i64 + %7416 = llvm.getelementptr %7408[%7415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7417 = llvm.load %7416 : !llvm.ptr + %7418 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mul %24, %7420 : !llvm.i64 + %7422 = llvm.add %7419, %7421 : !llvm.i64 + %7423 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7424 = llvm.mul %6400, %7423 : !llvm.i64 + %7425 = llvm.add %7422, %7424 : !llvm.i64 + %7426 = llvm.getelementptr %7418[%7425] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7417, %7426 : !llvm.ptr + %7427 = 
llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7428 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7429 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7430 = llvm.mul %24, %7429 : !llvm.i64 + %7431 = llvm.add %7428, %7430 : !llvm.i64 + %7432 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7433 = llvm.mul %5851, %7432 : !llvm.i64 + %7434 = llvm.add %7431, %7433 : !llvm.i64 + %7435 = llvm.getelementptr %7427[%7434] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7436 = llvm.load %7435 : !llvm.ptr + %7437 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7438 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7439 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7440 = llvm.mul %5851, %7439 : !llvm.i64 + %7441 = llvm.add %7438, %7440 : !llvm.i64 + %7442 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7443 = llvm.mul %6461, %7442 : !llvm.i64 + %7444 = llvm.add %7441, %7443 : !llvm.i64 + %7445 = llvm.getelementptr %7437[%7444] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7446 = llvm.load %7445 : !llvm.ptr + %7447 = llvm.fmul %7436, %7446 {RelaxedPrecision} : !llvm.float + %7448 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7451 = llvm.mul %24, %7450 : !llvm.i64 + %7452 = llvm.add %7449, %7451 : !llvm.i64 + %7453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7454 = llvm.mul %6461, %7453 : !llvm.i64 + %7455 = llvm.add %7452, %7454 : !llvm.i64 + %7456 = llvm.getelementptr %7448[%7455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7457 = llvm.load %7456 : !llvm.ptr + %7458 = llvm.fadd %7457, %7447 {RelaxedPrecision} : !llvm.float + %7459 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7461 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7462 = llvm.mul %24, %7461 : !llvm.i64 + %7463 = llvm.add %7460, %7462 : !llvm.i64 + %7464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7465 = llvm.mul %6461, %7464 : !llvm.i64 + %7466 = llvm.add %7463, %7465 : !llvm.i64 + %7467 = llvm.getelementptr %7459[%7466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7458, %7467 : !llvm.ptr + %7468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7471 = llvm.mul %24, %7470 : !llvm.i64 + %7472 = llvm.add %7469, %7471 : !llvm.i64 + %7473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7474 = llvm.mul %6461, %7473 : !llvm.i64 + %7475 = llvm.add %7472, %7474 : !llvm.i64 + %7476 = llvm.getelementptr %7468[%7475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7477 = llvm.load %7476 : !llvm.ptr + %7478 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7479 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7480 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7481 = llvm.mul %24, %7480 : !llvm.i64 + %7482 = llvm.add %7479, %7481 : !llvm.i64 + %7483 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7484 = llvm.mul %6461, %7483 : !llvm.i64 + %7485 = llvm.add %7482, %7484 : !llvm.i64 + %7486 = llvm.getelementptr %7478[%7485] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7477, %7486 : !llvm.ptr + %7487 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7488 = llvm.mlir.constant(0 : index) 
: !llvm.i64 + %7489 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7490 = llvm.mul %24, %7489 : !llvm.i64 + %7491 = llvm.add %7488, %7490 : !llvm.i64 + %7492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7493 = llvm.mul %5851, %7492 : !llvm.i64 + %7494 = llvm.add %7491, %7493 : !llvm.i64 + %7495 = llvm.getelementptr %7487[%7494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7496 = llvm.load %7495 : !llvm.ptr + %7497 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7500 = llvm.mul %5851, %7499 : !llvm.i64 + %7501 = llvm.add %7498, %7500 : !llvm.i64 + %7502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7503 = llvm.mul %6522, %7502 : !llvm.i64 + %7504 = llvm.add %7501, %7503 : !llvm.i64 + %7505 = llvm.getelementptr %7497[%7504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7506 = llvm.load %7505 : !llvm.ptr + %7507 = llvm.fmul %7496, %7506 {RelaxedPrecision} : !llvm.float + %7508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7511 = llvm.mul %24, %7510 : !llvm.i64 + %7512 = llvm.add %7509, %7511 : !llvm.i64 + %7513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7514 = llvm.mul %6522, %7513 : !llvm.i64 + %7515 = llvm.add %7512, %7514 : !llvm.i64 + %7516 = llvm.getelementptr %7508[%7515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7517 = llvm.load %7516 : !llvm.ptr + %7518 = llvm.fadd %7517, %7507 {RelaxedPrecision} : !llvm.float + %7519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7522 = llvm.mul %24, %7521 : !llvm.i64 + %7523 = llvm.add %7520, %7522 : !llvm.i64 + %7524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7525 = llvm.mul %6522, %7524 : !llvm.i64 + %7526 = llvm.add %7523, %7525 : !llvm.i64 + %7527 = llvm.getelementptr %7519[%7526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7518, %7527 : !llvm.ptr + %7528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7531 = llvm.mul %24, %7530 : !llvm.i64 + %7532 = llvm.add %7529, %7531 : !llvm.i64 + %7533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7534 = llvm.mul %6522, %7533 : !llvm.i64 + %7535 = llvm.add %7532, %7534 : !llvm.i64 + %7536 = llvm.getelementptr %7528[%7535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7537 = llvm.load %7536 : !llvm.ptr + %7538 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7539 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7540 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7541 = llvm.mul %24, %7540 : !llvm.i64 + %7542 = llvm.add %7539, %7541 : !llvm.i64 + %7543 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7544 = llvm.mul %6522, %7543 : !llvm.i64 + %7545 = llvm.add %7542, %7544 : !llvm.i64 + %7546 = llvm.getelementptr %7538[%7545] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7537, %7546 : !llvm.ptr + %7547 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7548 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7549 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7550 = llvm.mul %24, %7549 : !llvm.i64 + %7551 = llvm.add 
%7548, %7550 : !llvm.i64 + %7552 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7553 = llvm.mul %5851, %7552 : !llvm.i64 + %7554 = llvm.add %7551, %7553 : !llvm.i64 + %7555 = llvm.getelementptr %7547[%7554] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7556 = llvm.load %7555 : !llvm.ptr + %7557 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7558 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7559 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7560 = llvm.mul %5851, %7559 : !llvm.i64 + %7561 = llvm.add %7558, %7560 : !llvm.i64 + %7562 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7563 = llvm.mul %6583, %7562 : !llvm.i64 + %7564 = llvm.add %7561, %7563 : !llvm.i64 + %7565 = llvm.getelementptr %7557[%7564] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7566 = llvm.load %7565 : !llvm.ptr + %7567 = llvm.fmul %7556, %7566 {RelaxedPrecision} : !llvm.float + %7568 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7571 = llvm.mul %24, %7570 : !llvm.i64 + %7572 = llvm.add %7569, %7571 : !llvm.i64 + %7573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7574 = llvm.mul %6583, %7573 : !llvm.i64 + %7575 = llvm.add %7572, %7574 : !llvm.i64 + %7576 = llvm.getelementptr %7568[%7575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7577 = llvm.load %7576 : !llvm.ptr + %7578 = llvm.fadd %7577, %7567 {RelaxedPrecision} : !llvm.float + %7579 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7581 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7582 = llvm.mul %24, %7581 : !llvm.i64 + %7583 = llvm.add %7580, %7582 : !llvm.i64 + %7584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7585 = llvm.mul %6583, %7584 : !llvm.i64 + %7586 = llvm.add %7583, %7585 : !llvm.i64 + %7587 = llvm.getelementptr %7579[%7586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7578, %7587 : !llvm.ptr + %7588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7591 = llvm.mul %24, %7590 : !llvm.i64 + %7592 = llvm.add %7589, %7591 : !llvm.i64 + %7593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7594 = llvm.mul %6583, %7593 : !llvm.i64 + %7595 = llvm.add %7592, %7594 : !llvm.i64 + %7596 = llvm.getelementptr %7588[%7595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7597 = llvm.load %7596 : !llvm.ptr + %7598 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7599 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7600 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7601 = llvm.mul %24, %7600 : !llvm.i64 + %7602 = llvm.add %7599, %7601 : !llvm.i64 + %7603 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7604 = llvm.mul %6583, %7603 : !llvm.i64 + %7605 = llvm.add %7602, %7604 : !llvm.i64 + %7606 = llvm.getelementptr %7598[%7605] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7597, %7606 : !llvm.ptr + %7607 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7608 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7609 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7610 = llvm.mul %24, %7609 : !llvm.i64 + %7611 = llvm.add %7608, %7610 : !llvm.i64 + %7612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7613 = llvm.mul %5851, %7612 : !llvm.i64 + 
%7614 = llvm.add %7611, %7613 : !llvm.i64 + %7615 = llvm.getelementptr %7607[%7614] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7616 = llvm.load %7615 : !llvm.ptr + %7617 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7618 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7619 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7620 = llvm.mul %5851, %7619 : !llvm.i64 + %7621 = llvm.add %7618, %7620 : !llvm.i64 + %7622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7623 = llvm.mul %6644, %7622 : !llvm.i64 + %7624 = llvm.add %7621, %7623 : !llvm.i64 + %7625 = llvm.getelementptr %7617[%7624] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7626 = llvm.load %7625 : !llvm.ptr + %7627 = llvm.fmul %7616, %7626 {RelaxedPrecision} : !llvm.float + %7628 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7631 = llvm.mul %24, %7630 : !llvm.i64 + %7632 = llvm.add %7629, %7631 : !llvm.i64 + %7633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7634 = llvm.mul %6644, %7633 : !llvm.i64 + %7635 = llvm.add %7632, %7634 : !llvm.i64 + %7636 = llvm.getelementptr %7628[%7635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7637 = llvm.load %7636 : !llvm.ptr + %7638 = llvm.fadd %7637, %7627 {RelaxedPrecision} : !llvm.float + %7639 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7641 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7642 = llvm.mul %24, %7641 : !llvm.i64 + %7643 = llvm.add %7640, %7642 : !llvm.i64 + %7644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7645 = llvm.mul %6644, %7644 : !llvm.i64 + %7646 = llvm.add %7643, %7645 : !llvm.i64 + %7647 = llvm.getelementptr %7639[%7646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7638, %7647 : !llvm.ptr + %7648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7651 = llvm.mul %24, %7650 : !llvm.i64 + %7652 = llvm.add %7649, %7651 : !llvm.i64 + %7653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7654 = llvm.mul %6644, %7653 : !llvm.i64 + %7655 = llvm.add %7652, %7654 : !llvm.i64 + %7656 = llvm.getelementptr %7648[%7655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7657 = llvm.load %7656 : !llvm.ptr + %7658 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7659 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7660 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7661 = llvm.mul %24, %7660 : !llvm.i64 + %7662 = llvm.add %7659, %7661 : !llvm.i64 + %7663 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7664 = llvm.mul %6644, %7663 : !llvm.i64 + %7665 = llvm.add %7662, %7664 : !llvm.i64 + %7666 = llvm.getelementptr %7658[%7665] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7657, %7666 : !llvm.ptr + %7667 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7668 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7669 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7670 = llvm.mul %24, %7669 : !llvm.i64 + %7671 = llvm.add %7668, %7670 : !llvm.i64 + %7672 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7673 = llvm.mul %5851, %7672 : !llvm.i64 + %7674 = llvm.add %7671, %7673 : !llvm.i64 + %7675 = llvm.getelementptr %7667[%7674] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + 
%7676 = llvm.load %7675 : !llvm.ptr + %7677 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7680 = llvm.mul %5851, %7679 : !llvm.i64 + %7681 = llvm.add %7678, %7680 : !llvm.i64 + %7682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7683 = llvm.mul %6705, %7682 : !llvm.i64 + %7684 = llvm.add %7681, %7683 : !llvm.i64 + %7685 = llvm.getelementptr %7677[%7684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7686 = llvm.load %7685 : !llvm.ptr + %7687 = llvm.fmul %7676, %7686 {RelaxedPrecision} : !llvm.float + %7688 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7691 = llvm.mul %24, %7690 : !llvm.i64 + %7692 = llvm.add %7689, %7691 : !llvm.i64 + %7693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7694 = llvm.mul %6705, %7693 : !llvm.i64 + %7695 = llvm.add %7692, %7694 : !llvm.i64 + %7696 = llvm.getelementptr %7688[%7695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7697 = llvm.load %7696 : !llvm.ptr + %7698 = llvm.fadd %7697, %7687 {RelaxedPrecision} : !llvm.float + %7699 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7701 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7702 = llvm.mul %24, %7701 : !llvm.i64 + %7703 = llvm.add %7700, %7702 : !llvm.i64 + %7704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7705 = llvm.mul %6705, %7704 : !llvm.i64 + %7706 = llvm.add %7703, %7705 : !llvm.i64 + %7707 = llvm.getelementptr %7699[%7706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7698, %7707 : !llvm.ptr + %7708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7711 = llvm.mul %24, %7710 : !llvm.i64 + %7712 = llvm.add %7709, %7711 : !llvm.i64 + %7713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7714 = llvm.mul %6705, %7713 : !llvm.i64 + %7715 = llvm.add %7712, %7714 : !llvm.i64 + %7716 = llvm.getelementptr %7708[%7715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7717 = llvm.load %7716 : !llvm.ptr + %7718 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7720 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7721 = llvm.mul %24, %7720 : !llvm.i64 + %7722 = llvm.add %7719, %7721 : !llvm.i64 + %7723 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7724 = llvm.mul %6705, %7723 : !llvm.i64 + %7725 = llvm.add %7722, %7724 : !llvm.i64 + %7726 = llvm.getelementptr %7718[%7725] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7717, %7726 : !llvm.ptr + %7727 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7729 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7730 = llvm.mul %24, %7729 : !llvm.i64 + %7731 = llvm.add %7728, %7730 : !llvm.i64 + %7732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7733 = llvm.mul %5851, %7732 : !llvm.i64 + %7734 = llvm.add %7731, %7733 : !llvm.i64 + %7735 = llvm.getelementptr %7727[%7734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7736 = llvm.load %7735 : !llvm.ptr + %7737 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x 
i64>)> + %7738 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7739 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7740 = llvm.mul %5851, %7739 : !llvm.i64 + %7741 = llvm.add %7738, %7740 : !llvm.i64 + %7742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7743 = llvm.mul %6766, %7742 : !llvm.i64 + %7744 = llvm.add %7741, %7743 : !llvm.i64 + %7745 = llvm.getelementptr %7737[%7744] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7746 = llvm.load %7745 : !llvm.ptr + %7747 = llvm.fmul %7736, %7746 {RelaxedPrecision} : !llvm.float + %7748 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7750 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7751 = llvm.mul %24, %7750 : !llvm.i64 + %7752 = llvm.add %7749, %7751 : !llvm.i64 + %7753 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7754 = llvm.mul %6766, %7753 : !llvm.i64 + %7755 = llvm.add %7752, %7754 : !llvm.i64 + %7756 = llvm.getelementptr %7748[%7755] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7757 = llvm.load %7756 : !llvm.ptr + %7758 = llvm.fadd %7757, %7747 {RelaxedPrecision} : !llvm.float + %7759 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7761 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7762 = llvm.mul %24, %7761 : !llvm.i64 + %7763 = llvm.add %7760, %7762 : !llvm.i64 + %7764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7765 = llvm.mul %6766, %7764 : !llvm.i64 + %7766 = llvm.add %7763, %7765 : !llvm.i64 + %7767 = llvm.getelementptr %7759[%7766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7758, %7767 : !llvm.ptr + %7768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7771 = llvm.mul %24, %7770 : !llvm.i64 + %7772 = llvm.add %7769, %7771 : !llvm.i64 + %7773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7774 = llvm.mul %6766, %7773 : !llvm.i64 + %7775 = llvm.add %7772, %7774 : !llvm.i64 + %7776 = llvm.getelementptr %7768[%7775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7777 = llvm.load %7776 : !llvm.ptr + %7778 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7779 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7780 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7781 = llvm.mul %24, %7780 : !llvm.i64 + %7782 = llvm.add %7779, %7781 : !llvm.i64 + %7783 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7784 = llvm.mul %6766, %7783 : !llvm.i64 + %7785 = llvm.add %7782, %7784 : !llvm.i64 + %7786 = llvm.getelementptr %7778[%7785] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7777, %7786 : !llvm.ptr + %7787 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7788 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7789 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7790 = llvm.mul %25, %7789 : !llvm.i64 + %7791 = llvm.add %7788, %7790 : !llvm.i64 + %7792 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7793 = llvm.mul %5851, %7792 : !llvm.i64 + %7794 = llvm.add %7791, %7793 : !llvm.i64 + %7795 = llvm.getelementptr %7787[%7794] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7796 = llvm.load %7795 : !llvm.ptr + %7797 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7799 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7800 = 
llvm.mul %5851, %7799 : !llvm.i64 + %7801 = llvm.add %7798, %7800 : !llvm.i64 + %7802 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7803 = llvm.mul %5850, %7802 : !llvm.i64 + %7804 = llvm.add %7801, %7803 : !llvm.i64 + %7805 = llvm.getelementptr %7797[%7804] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7806 = llvm.load %7805 : !llvm.ptr + %7807 = llvm.fmul %7796, %7806 {RelaxedPrecision} : !llvm.float + %7808 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7811 = llvm.mul %25, %7810 : !llvm.i64 + %7812 = llvm.add %7809, %7811 : !llvm.i64 + %7813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7814 = llvm.mul %5850, %7813 : !llvm.i64 + %7815 = llvm.add %7812, %7814 : !llvm.i64 + %7816 = llvm.getelementptr %7808[%7815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7817 = llvm.load %7816 : !llvm.ptr + %7818 = llvm.fadd %7817, %7807 {RelaxedPrecision} : !llvm.float + %7819 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7821 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7822 = llvm.mul %25, %7821 : !llvm.i64 + %7823 = llvm.add %7820, %7822 : !llvm.i64 + %7824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7825 = llvm.mul %5850, %7824 : !llvm.i64 + %7826 = llvm.add %7823, %7825 : !llvm.i64 + %7827 = llvm.getelementptr %7819[%7826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7818, %7827 : !llvm.ptr + %7828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7831 = llvm.mul %25, %7830 : !llvm.i64 + %7832 = llvm.add %7829, %7831 : !llvm.i64 + %7833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7834 = llvm.mul %5850, %7833 : !llvm.i64 + %7835 = llvm.add %7832, %7834 : !llvm.i64 + %7836 = llvm.getelementptr %7828[%7835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7837 = llvm.load %7836 : !llvm.ptr + %7838 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7840 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7841 = llvm.mul %25, %7840 : !llvm.i64 + %7842 = llvm.add %7839, %7841 : !llvm.i64 + %7843 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7844 = llvm.mul %5850, %7843 : !llvm.i64 + %7845 = llvm.add %7842, %7844 : !llvm.i64 + %7846 = llvm.getelementptr %7838[%7845] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7837, %7846 : !llvm.ptr + %7847 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7848 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7849 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7850 = llvm.mul %25, %7849 : !llvm.i64 + %7851 = llvm.add %7848, %7850 : !llvm.i64 + %7852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7853 = llvm.mul %5851, %7852 : !llvm.i64 + %7854 = llvm.add %7851, %7853 : !llvm.i64 + %7855 = llvm.getelementptr %7847[%7854] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7856 = llvm.load %7855 : !llvm.ptr + %7857 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7858 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7859 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7860 = llvm.mul %5851, %7859 : !llvm.i64 + %7861 = llvm.add %7858, %7860 : !llvm.i64 + %7862 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %7863 = llvm.mul %5912, %7862 : !llvm.i64 + %7864 = llvm.add %7861, %7863 : !llvm.i64 + %7865 = llvm.getelementptr %7857[%7864] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7866 = llvm.load %7865 : !llvm.ptr + %7867 = llvm.fmul %7856, %7866 {RelaxedPrecision} : !llvm.float + %7868 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7871 = llvm.mul %25, %7870 : !llvm.i64 + %7872 = llvm.add %7869, %7871 : !llvm.i64 + %7873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7874 = llvm.mul %5912, %7873 : !llvm.i64 + %7875 = llvm.add %7872, %7874 : !llvm.i64 + %7876 = llvm.getelementptr %7868[%7875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7877 = llvm.load %7876 : !llvm.ptr + %7878 = llvm.fadd %7877, %7867 {RelaxedPrecision} : !llvm.float + %7879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7882 = llvm.mul %25, %7881 : !llvm.i64 + %7883 = llvm.add %7880, %7882 : !llvm.i64 + %7884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7885 = llvm.mul %5912, %7884 : !llvm.i64 + %7886 = llvm.add %7883, %7885 : !llvm.i64 + %7887 = llvm.getelementptr %7879[%7886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7878, %7887 : !llvm.ptr + %7888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7891 = llvm.mul %25, %7890 : !llvm.i64 + %7892 = llvm.add %7889, %7891 : !llvm.i64 + %7893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7894 = llvm.mul %5912, %7893 : !llvm.i64 + %7895 = llvm.add %7892, %7894 : !llvm.i64 + %7896 = llvm.getelementptr %7888[%7895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7897 = llvm.load %7896 : !llvm.ptr + %7898 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7899 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7900 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7901 = llvm.mul %25, %7900 : !llvm.i64 + %7902 = llvm.add %7899, %7901 : !llvm.i64 + %7903 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7904 = llvm.mul %5912, %7903 : !llvm.i64 + %7905 = llvm.add %7902, %7904 : !llvm.i64 + %7906 = llvm.getelementptr %7898[%7905] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7897, %7906 : !llvm.ptr + %7907 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7908 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7909 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7910 = llvm.mul %25, %7909 : !llvm.i64 + %7911 = llvm.add %7908, %7910 : !llvm.i64 + %7912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7913 = llvm.mul %5851, %7912 : !llvm.i64 + %7914 = llvm.add %7911, %7913 : !llvm.i64 + %7915 = llvm.getelementptr %7907[%7914] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7916 = llvm.load %7915 : !llvm.ptr + %7917 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7918 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7919 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7920 = llvm.mul %5851, %7919 : !llvm.i64 + %7921 = llvm.add %7918, %7920 : !llvm.i64 + %7922 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7923 = llvm.mul %5973, %7922 : !llvm.i64 + %7924 = llvm.add %7921, %7923 : !llvm.i64 + %7925 = llvm.getelementptr 
%7917[%7924] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7926 = llvm.load %7925 : !llvm.ptr + %7927 = llvm.fmul %7916, %7926 {RelaxedPrecision} : !llvm.float + %7928 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7930 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7931 = llvm.mul %25, %7930 : !llvm.i64 + %7932 = llvm.add %7929, %7931 : !llvm.i64 + %7933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7934 = llvm.mul %5973, %7933 : !llvm.i64 + %7935 = llvm.add %7932, %7934 : !llvm.i64 + %7936 = llvm.getelementptr %7928[%7935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7937 = llvm.load %7936 : !llvm.ptr + %7938 = llvm.fadd %7937, %7927 {RelaxedPrecision} : !llvm.float + %7939 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7940 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7941 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7942 = llvm.mul %25, %7941 : !llvm.i64 + %7943 = llvm.add %7940, %7942 : !llvm.i64 + %7944 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7945 = llvm.mul %5973, %7944 : !llvm.i64 + %7946 = llvm.add %7943, %7945 : !llvm.i64 + %7947 = llvm.getelementptr %7939[%7946] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7938, %7947 : !llvm.ptr + %7948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7951 = llvm.mul %25, %7950 : !llvm.i64 + %7952 = llvm.add %7949, %7951 : !llvm.i64 + %7953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7954 = llvm.mul %5973, %7953 : !llvm.i64 + %7955 = llvm.add %7952, %7954 : !llvm.i64 + %7956 = llvm.getelementptr %7948[%7955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7957 = llvm.load %7956 : !llvm.ptr + %7958 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7960 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7961 = llvm.mul %25, %7960 : !llvm.i64 + %7962 = llvm.add %7959, %7961 : !llvm.i64 + %7963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7964 = llvm.mul %5973, %7963 : !llvm.i64 + %7965 = llvm.add %7962, %7964 : !llvm.i64 + %7966 = llvm.getelementptr %7958[%7965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7957, %7966 : !llvm.ptr + %7967 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7968 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7969 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7970 = llvm.mul %25, %7969 : !llvm.i64 + %7971 = llvm.add %7968, %7970 : !llvm.i64 + %7972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7973 = llvm.mul %5851, %7972 : !llvm.i64 + %7974 = llvm.add %7971, %7973 : !llvm.i64 + %7975 = llvm.getelementptr %7967[%7974] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7976 = llvm.load %7975 : !llvm.ptr + %7977 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7978 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7979 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7980 = llvm.mul %5851, %7979 : !llvm.i64 + %7981 = llvm.add %7978, %7980 : !llvm.i64 + %7982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7983 = llvm.mul %6034, %7982 : !llvm.i64 + %7984 = llvm.add %7981, %7983 : !llvm.i64 + %7985 = llvm.getelementptr %7977[%7984] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7986 = llvm.load %7985 : !llvm.ptr + %7987 = llvm.fmul %7976, %7986 
{RelaxedPrecision} : !llvm.float + %7988 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7990 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7991 = llvm.mul %25, %7990 : !llvm.i64 + %7992 = llvm.add %7989, %7991 : !llvm.i64 + %7993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7994 = llvm.mul %6034, %7993 : !llvm.i64 + %7995 = llvm.add %7992, %7994 : !llvm.i64 + %7996 = llvm.getelementptr %7988[%7995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7997 = llvm.load %7996 : !llvm.ptr + %7998 = llvm.fadd %7997, %7987 {RelaxedPrecision} : !llvm.float + %7999 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8000 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8001 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8002 = llvm.mul %25, %8001 : !llvm.i64 + %8003 = llvm.add %8000, %8002 : !llvm.i64 + %8004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8005 = llvm.mul %6034, %8004 : !llvm.i64 + %8006 = llvm.add %8003, %8005 : !llvm.i64 + %8007 = llvm.getelementptr %7999[%8006] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7998, %8007 : !llvm.ptr + %8008 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8010 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8011 = llvm.mul %25, %8010 : !llvm.i64 + %8012 = llvm.add %8009, %8011 : !llvm.i64 + %8013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8014 = llvm.mul %6034, %8013 : !llvm.i64 + %8015 = llvm.add %8012, %8014 : !llvm.i64 + %8016 = llvm.getelementptr %8008[%8015] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8017 = llvm.load %8016 : !llvm.ptr + %8018 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8020 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8021 = llvm.mul %25, %8020 : !llvm.i64 + %8022 = llvm.add %8019, %8021 : !llvm.i64 + %8023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8024 = llvm.mul %6034, %8023 : !llvm.i64 + %8025 = llvm.add %8022, %8024 : !llvm.i64 + %8026 = llvm.getelementptr %8018[%8025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8017, %8026 : !llvm.ptr + %8027 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8028 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8029 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8030 = llvm.mul %25, %8029 : !llvm.i64 + %8031 = llvm.add %8028, %8030 : !llvm.i64 + %8032 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8033 = llvm.mul %5851, %8032 : !llvm.i64 + %8034 = llvm.add %8031, %8033 : !llvm.i64 + %8035 = llvm.getelementptr %8027[%8034] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8036 = llvm.load %8035 : !llvm.ptr + %8037 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8038 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8039 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8040 = llvm.mul %5851, %8039 : !llvm.i64 + %8041 = llvm.add %8038, %8040 : !llvm.i64 + %8042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8043 = llvm.mul %6095, %8042 : !llvm.i64 + %8044 = llvm.add %8041, %8043 : !llvm.i64 + %8045 = llvm.getelementptr %8037[%8044] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8046 = llvm.load %8045 : !llvm.ptr + %8047 = llvm.fmul %8036, %8046 {RelaxedPrecision} : !llvm.float + %8048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x 
i64>)> + %8049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8051 = llvm.mul %25, %8050 : !llvm.i64 + %8052 = llvm.add %8049, %8051 : !llvm.i64 + %8053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8054 = llvm.mul %6095, %8053 : !llvm.i64 + %8055 = llvm.add %8052, %8054 : !llvm.i64 + %8056 = llvm.getelementptr %8048[%8055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8057 = llvm.load %8056 : !llvm.ptr + %8058 = llvm.fadd %8057, %8047 {RelaxedPrecision} : !llvm.float + %8059 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8060 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8061 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8062 = llvm.mul %25, %8061 : !llvm.i64 + %8063 = llvm.add %8060, %8062 : !llvm.i64 + %8064 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8065 = llvm.mul %6095, %8064 : !llvm.i64 + %8066 = llvm.add %8063, %8065 : !llvm.i64 + %8067 = llvm.getelementptr %8059[%8066] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8058, %8067 : !llvm.ptr + %8068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8071 = llvm.mul %25, %8070 : !llvm.i64 + %8072 = llvm.add %8069, %8071 : !llvm.i64 + %8073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8074 = llvm.mul %6095, %8073 : !llvm.i64 + %8075 = llvm.add %8072, %8074 : !llvm.i64 + %8076 = llvm.getelementptr %8068[%8075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8077 = llvm.load %8076 : !llvm.ptr + %8078 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8080 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8081 = llvm.mul %25, %8080 : !llvm.i64 + %8082 = llvm.add %8079, %8081 : !llvm.i64 + %8083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8084 = llvm.mul %6095, %8083 : !llvm.i64 + %8085 = llvm.add %8082, %8084 : !llvm.i64 + %8086 = llvm.getelementptr %8078[%8085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8077, %8086 : !llvm.ptr + %8087 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8088 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8089 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8090 = llvm.mul %25, %8089 : !llvm.i64 + %8091 = llvm.add %8088, %8090 : !llvm.i64 + %8092 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8093 = llvm.mul %5851, %8092 : !llvm.i64 + %8094 = llvm.add %8091, %8093 : !llvm.i64 + %8095 = llvm.getelementptr %8087[%8094] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8096 = llvm.load %8095 : !llvm.ptr + %8097 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8098 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8099 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8100 = llvm.mul %5851, %8099 : !llvm.i64 + %8101 = llvm.add %8098, %8100 : !llvm.i64 + %8102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8103 = llvm.mul %6156, %8102 : !llvm.i64 + %8104 = llvm.add %8101, %8103 : !llvm.i64 + %8105 = llvm.getelementptr %8097[%8104] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8106 = llvm.load %8105 : !llvm.ptr + %8107 = llvm.fmul %8096, %8106 {RelaxedPrecision} : !llvm.float + %8108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8111 = 
llvm.mul %25, %8110 : !llvm.i64 + %8112 = llvm.add %8109, %8111 : !llvm.i64 + %8113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8114 = llvm.mul %6156, %8113 : !llvm.i64 + %8115 = llvm.add %8112, %8114 : !llvm.i64 + %8116 = llvm.getelementptr %8108[%8115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8117 = llvm.load %8116 : !llvm.ptr + %8118 = llvm.fadd %8117, %8107 {RelaxedPrecision} : !llvm.float + %8119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8121 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8122 = llvm.mul %25, %8121 : !llvm.i64 + %8123 = llvm.add %8120, %8122 : !llvm.i64 + %8124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8125 = llvm.mul %6156, %8124 : !llvm.i64 + %8126 = llvm.add %8123, %8125 : !llvm.i64 + %8127 = llvm.getelementptr %8119[%8126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8118, %8127 : !llvm.ptr + %8128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8131 = llvm.mul %25, %8130 : !llvm.i64 + %8132 = llvm.add %8129, %8131 : !llvm.i64 + %8133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8134 = llvm.mul %6156, %8133 : !llvm.i64 + %8135 = llvm.add %8132, %8134 : !llvm.i64 + %8136 = llvm.getelementptr %8128[%8135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8137 = llvm.load %8136 : !llvm.ptr + %8138 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8140 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8141 = llvm.mul %25, %8140 : !llvm.i64 + %8142 = llvm.add %8139, %8141 : !llvm.i64 + %8143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8144 = llvm.mul %6156, %8143 : !llvm.i64 + %8145 = llvm.add %8142, %8144 : !llvm.i64 + %8146 = llvm.getelementptr %8138[%8145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8137, %8146 : !llvm.ptr + %8147 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8149 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8150 = llvm.mul %25, %8149 : !llvm.i64 + %8151 = llvm.add %8148, %8150 : !llvm.i64 + %8152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8153 = llvm.mul %5851, %8152 : !llvm.i64 + %8154 = llvm.add %8151, %8153 : !llvm.i64 + %8155 = llvm.getelementptr %8147[%8154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8156 = llvm.load %8155 : !llvm.ptr + %8157 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8159 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8160 = llvm.mul %5851, %8159 : !llvm.i64 + %8161 = llvm.add %8158, %8160 : !llvm.i64 + %8162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8163 = llvm.mul %6217, %8162 : !llvm.i64 + %8164 = llvm.add %8161, %8163 : !llvm.i64 + %8165 = llvm.getelementptr %8157[%8164] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8166 = llvm.load %8165 : !llvm.ptr + %8167 = llvm.fmul %8156, %8166 {RelaxedPrecision} : !llvm.float + %8168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8171 = llvm.mul %25, %8170 : !llvm.i64 + %8172 = llvm.add %8169, %8171 : !llvm.i64 + %8173 = llvm.mlir.constant(1 : index) : !llvm.i64 
+ %8174 = llvm.mul %6217, %8173 : !llvm.i64 + %8175 = llvm.add %8172, %8174 : !llvm.i64 + %8176 = llvm.getelementptr %8168[%8175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8177 = llvm.load %8176 : !llvm.ptr + %8178 = llvm.fadd %8177, %8167 {RelaxedPrecision} : !llvm.float + %8179 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8182 = llvm.mul %25, %8181 : !llvm.i64 + %8183 = llvm.add %8180, %8182 : !llvm.i64 + %8184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8185 = llvm.mul %6217, %8184 : !llvm.i64 + %8186 = llvm.add %8183, %8185 : !llvm.i64 + %8187 = llvm.getelementptr %8179[%8186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8178, %8187 : !llvm.ptr + %8188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8191 = llvm.mul %25, %8190 : !llvm.i64 + %8192 = llvm.add %8189, %8191 : !llvm.i64 + %8193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8194 = llvm.mul %6217, %8193 : !llvm.i64 + %8195 = llvm.add %8192, %8194 : !llvm.i64 + %8196 = llvm.getelementptr %8188[%8195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8197 = llvm.load %8196 : !llvm.ptr + %8198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8201 = llvm.mul %25, %8200 : !llvm.i64 + %8202 = llvm.add %8199, %8201 : !llvm.i64 + %8203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8204 = llvm.mul %6217, %8203 : !llvm.i64 + %8205 = llvm.add %8202, %8204 : !llvm.i64 + %8206 = llvm.getelementptr %8198[%8205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8197, %8206 : !llvm.ptr + %8207 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8208 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8209 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8210 = llvm.mul %25, %8209 : !llvm.i64 + %8211 = llvm.add %8208, %8210 : !llvm.i64 + %8212 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8213 = llvm.mul %5851, %8212 : !llvm.i64 + %8214 = llvm.add %8211, %8213 : !llvm.i64 + %8215 = llvm.getelementptr %8207[%8214] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8216 = llvm.load %8215 : !llvm.ptr + %8217 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8218 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8219 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8220 = llvm.mul %5851, %8219 : !llvm.i64 + %8221 = llvm.add %8218, %8220 : !llvm.i64 + %8222 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8223 = llvm.mul %6278, %8222 : !llvm.i64 + %8224 = llvm.add %8221, %8223 : !llvm.i64 + %8225 = llvm.getelementptr %8217[%8224] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8226 = llvm.load %8225 : !llvm.ptr + %8227 = llvm.fmul %8216, %8226 {RelaxedPrecision} : !llvm.float + %8228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8231 = llvm.mul %25, %8230 : !llvm.i64 + %8232 = llvm.add %8229, %8231 : !llvm.i64 + %8233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8234 = llvm.mul %6278, %8233 : !llvm.i64 + %8235 = llvm.add %8232, %8234 : !llvm.i64 + %8236 = llvm.getelementptr 
%8228[%8235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8237 = llvm.load %8236 : !llvm.ptr + %8238 = llvm.fadd %8237, %8227 {RelaxedPrecision} : !llvm.float + %8239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8242 = llvm.mul %25, %8241 : !llvm.i64 + %8243 = llvm.add %8240, %8242 : !llvm.i64 + %8244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8245 = llvm.mul %6278, %8244 : !llvm.i64 + %8246 = llvm.add %8243, %8245 : !llvm.i64 + %8247 = llvm.getelementptr %8239[%8246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8238, %8247 : !llvm.ptr + %8248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8251 = llvm.mul %25, %8250 : !llvm.i64 + %8252 = llvm.add %8249, %8251 : !llvm.i64 + %8253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8254 = llvm.mul %6278, %8253 : !llvm.i64 + %8255 = llvm.add %8252, %8254 : !llvm.i64 + %8256 = llvm.getelementptr %8248[%8255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8257 = llvm.load %8256 : !llvm.ptr + %8258 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8260 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8261 = llvm.mul %25, %8260 : !llvm.i64 + %8262 = llvm.add %8259, %8261 : !llvm.i64 + %8263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8264 = llvm.mul %6278, %8263 : !llvm.i64 + %8265 = llvm.add %8262, %8264 : !llvm.i64 + %8266 = llvm.getelementptr %8258[%8265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8257, %8266 : !llvm.ptr + %8267 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8269 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8270 = llvm.mul %25, %8269 : !llvm.i64 + %8271 = llvm.add %8268, %8270 : !llvm.i64 + %8272 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8273 = llvm.mul %5851, %8272 : !llvm.i64 + %8274 = llvm.add %8271, %8273 : !llvm.i64 + %8275 = llvm.getelementptr %8267[%8274] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8276 = llvm.load %8275 : !llvm.ptr + %8277 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8278 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8279 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8280 = llvm.mul %5851, %8279 : !llvm.i64 + %8281 = llvm.add %8278, %8280 : !llvm.i64 + %8282 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8283 = llvm.mul %6339, %8282 : !llvm.i64 + %8284 = llvm.add %8281, %8283 : !llvm.i64 + %8285 = llvm.getelementptr %8277[%8284] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8286 = llvm.load %8285 : !llvm.ptr + %8287 = llvm.fmul %8276, %8286 {RelaxedPrecision} : !llvm.float + %8288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8291 = llvm.mul %25, %8290 : !llvm.i64 + %8292 = llvm.add %8289, %8291 : !llvm.i64 + %8293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8294 = llvm.mul %6339, %8293 : !llvm.i64 + %8295 = llvm.add %8292, %8294 : !llvm.i64 + %8296 = llvm.getelementptr %8288[%8295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8297 = llvm.load %8296 : !llvm.ptr + %8298 = llvm.fadd %8297, %8287 
{RelaxedPrecision} : !llvm.float + %8299 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8300 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8301 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8302 = llvm.mul %25, %8301 : !llvm.i64 + %8303 = llvm.add %8300, %8302 : !llvm.i64 + %8304 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8305 = llvm.mul %6339, %8304 : !llvm.i64 + %8306 = llvm.add %8303, %8305 : !llvm.i64 + %8307 = llvm.getelementptr %8299[%8306] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8298, %8307 : !llvm.ptr + %8308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8311 = llvm.mul %25, %8310 : !llvm.i64 + %8312 = llvm.add %8309, %8311 : !llvm.i64 + %8313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8314 = llvm.mul %6339, %8313 : !llvm.i64 + %8315 = llvm.add %8312, %8314 : !llvm.i64 + %8316 = llvm.getelementptr %8308[%8315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8317 = llvm.load %8316 : !llvm.ptr + %8318 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8320 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8321 = llvm.mul %25, %8320 : !llvm.i64 + %8322 = llvm.add %8319, %8321 : !llvm.i64 + %8323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8324 = llvm.mul %6339, %8323 : !llvm.i64 + %8325 = llvm.add %8322, %8324 : !llvm.i64 + %8326 = llvm.getelementptr %8318[%8325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8317, %8326 : !llvm.ptr + %8327 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8328 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8329 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8330 = llvm.mul %25, %8329 : !llvm.i64 + %8331 = llvm.add %8328, %8330 : !llvm.i64 + %8332 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8333 = llvm.mul %5851, %8332 : !llvm.i64 + %8334 = llvm.add %8331, %8333 : !llvm.i64 + %8335 = llvm.getelementptr %8327[%8334] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8336 = llvm.load %8335 : !llvm.ptr + %8337 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8339 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8340 = llvm.mul %5851, %8339 : !llvm.i64 + %8341 = llvm.add %8338, %8340 : !llvm.i64 + %8342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8343 = llvm.mul %6400, %8342 : !llvm.i64 + %8344 = llvm.add %8341, %8343 : !llvm.i64 + %8345 = llvm.getelementptr %8337[%8344] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8346 = llvm.load %8345 : !llvm.ptr + %8347 = llvm.fmul %8336, %8346 {RelaxedPrecision} : !llvm.float + %8348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8351 = llvm.mul %25, %8350 : !llvm.i64 + %8352 = llvm.add %8349, %8351 : !llvm.i64 + %8353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8354 = llvm.mul %6400, %8353 : !llvm.i64 + %8355 = llvm.add %8352, %8354 : !llvm.i64 + %8356 = llvm.getelementptr %8348[%8355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8357 = llvm.load %8356 : !llvm.ptr + %8358 = llvm.fadd %8357, %8347 {RelaxedPrecision} : !llvm.float + %8359 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x 
i64>)> + %8360 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8361 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8362 = llvm.mul %25, %8361 : !llvm.i64 + %8363 = llvm.add %8360, %8362 : !llvm.i64 + %8364 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8365 = llvm.mul %6400, %8364 : !llvm.i64 + %8366 = llvm.add %8363, %8365 : !llvm.i64 + %8367 = llvm.getelementptr %8359[%8366] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8358, %8367 : !llvm.ptr + %8368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8371 = llvm.mul %25, %8370 : !llvm.i64 + %8372 = llvm.add %8369, %8371 : !llvm.i64 + %8373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8374 = llvm.mul %6400, %8373 : !llvm.i64 + %8375 = llvm.add %8372, %8374 : !llvm.i64 + %8376 = llvm.getelementptr %8368[%8375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8377 = llvm.load %8376 : !llvm.ptr + %8378 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8380 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8381 = llvm.mul %25, %8380 : !llvm.i64 + %8382 = llvm.add %8379, %8381 : !llvm.i64 + %8383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8384 = llvm.mul %6400, %8383 : !llvm.i64 + %8385 = llvm.add %8382, %8384 : !llvm.i64 + %8386 = llvm.getelementptr %8378[%8385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8377, %8386 : !llvm.ptr + %8387 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8389 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8390 = llvm.mul %25, %8389 : !llvm.i64 + %8391 = llvm.add %8388, %8390 : !llvm.i64 + %8392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8393 = llvm.mul %5851, %8392 : !llvm.i64 + %8394 = llvm.add %8391, %8393 : !llvm.i64 + %8395 = llvm.getelementptr %8387[%8394] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8396 = llvm.load %8395 : !llvm.ptr + %8397 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8398 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8399 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8400 = llvm.mul %5851, %8399 : !llvm.i64 + %8401 = llvm.add %8398, %8400 : !llvm.i64 + %8402 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8403 = llvm.mul %6461, %8402 : !llvm.i64 + %8404 = llvm.add %8401, %8403 : !llvm.i64 + %8405 = llvm.getelementptr %8397[%8404] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8406 = llvm.load %8405 : !llvm.ptr + %8407 = llvm.fmul %8396, %8406 {RelaxedPrecision} : !llvm.float + %8408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8411 = llvm.mul %25, %8410 : !llvm.i64 + %8412 = llvm.add %8409, %8411 : !llvm.i64 + %8413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8414 = llvm.mul %6461, %8413 : !llvm.i64 + %8415 = llvm.add %8412, %8414 : !llvm.i64 + %8416 = llvm.getelementptr %8408[%8415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8417 = llvm.load %8416 : !llvm.ptr + %8418 = llvm.fadd %8417, %8407 {RelaxedPrecision} : !llvm.float + %8419 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8422 = 
llvm.mul %25, %8421 : !llvm.i64 + %8423 = llvm.add %8420, %8422 : !llvm.i64 + %8424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8425 = llvm.mul %6461, %8424 : !llvm.i64 + %8426 = llvm.add %8423, %8425 : !llvm.i64 + %8427 = llvm.getelementptr %8419[%8426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8418, %8427 : !llvm.ptr + %8428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8431 = llvm.mul %25, %8430 : !llvm.i64 + %8432 = llvm.add %8429, %8431 : !llvm.i64 + %8433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8434 = llvm.mul %6461, %8433 : !llvm.i64 + %8435 = llvm.add %8432, %8434 : !llvm.i64 + %8436 = llvm.getelementptr %8428[%8435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8437 = llvm.load %8436 : !llvm.ptr + %8438 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8440 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8441 = llvm.mul %25, %8440 : !llvm.i64 + %8442 = llvm.add %8439, %8441 : !llvm.i64 + %8443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8444 = llvm.mul %6461, %8443 : !llvm.i64 + %8445 = llvm.add %8442, %8444 : !llvm.i64 + %8446 = llvm.getelementptr %8438[%8445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8437, %8446 : !llvm.ptr + %8447 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8449 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8450 = llvm.mul %25, %8449 : !llvm.i64 + %8451 = llvm.add %8448, %8450 : !llvm.i64 + %8452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8453 = llvm.mul %5851, %8452 : !llvm.i64 + %8454 = llvm.add %8451, %8453 : !llvm.i64 + %8455 = llvm.getelementptr %8447[%8454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8456 = llvm.load %8455 : !llvm.ptr + %8457 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8458 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8459 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8460 = llvm.mul %5851, %8459 : !llvm.i64 + %8461 = llvm.add %8458, %8460 : !llvm.i64 + %8462 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8463 = llvm.mul %6522, %8462 : !llvm.i64 + %8464 = llvm.add %8461, %8463 : !llvm.i64 + %8465 = llvm.getelementptr %8457[%8464] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8466 = llvm.load %8465 : !llvm.ptr + %8467 = llvm.fmul %8456, %8466 {RelaxedPrecision} : !llvm.float + %8468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8471 = llvm.mul %25, %8470 : !llvm.i64 + %8472 = llvm.add %8469, %8471 : !llvm.i64 + %8473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8474 = llvm.mul %6522, %8473 : !llvm.i64 + %8475 = llvm.add %8472, %8474 : !llvm.i64 + %8476 = llvm.getelementptr %8468[%8475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8477 = llvm.load %8476 : !llvm.ptr + %8478 = llvm.fadd %8477, %8467 {RelaxedPrecision} : !llvm.float + %8479 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8480 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8481 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8482 = llvm.mul %25, %8481 : !llvm.i64 + %8483 = llvm.add %8480, %8482 : !llvm.i64 + %8484 = llvm.mlir.constant(1 : index) : !llvm.i64 
+ %8485 = llvm.mul %6522, %8484 : !llvm.i64 + %8486 = llvm.add %8483, %8485 : !llvm.i64 + %8487 = llvm.getelementptr %8479[%8486] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8478, %8487 : !llvm.ptr + %8488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8491 = llvm.mul %25, %8490 : !llvm.i64 + %8492 = llvm.add %8489, %8491 : !llvm.i64 + %8493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8494 = llvm.mul %6522, %8493 : !llvm.i64 + %8495 = llvm.add %8492, %8494 : !llvm.i64 + %8496 = llvm.getelementptr %8488[%8495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8497 = llvm.load %8496 : !llvm.ptr + %8498 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8500 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8501 = llvm.mul %25, %8500 : !llvm.i64 + %8502 = llvm.add %8499, %8501 : !llvm.i64 + %8503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8504 = llvm.mul %6522, %8503 : !llvm.i64 + %8505 = llvm.add %8502, %8504 : !llvm.i64 + %8506 = llvm.getelementptr %8498[%8505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8497, %8506 : !llvm.ptr + %8507 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8508 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8509 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8510 = llvm.mul %25, %8509 : !llvm.i64 + %8511 = llvm.add %8508, %8510 : !llvm.i64 + %8512 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8513 = llvm.mul %5851, %8512 : !llvm.i64 + %8514 = llvm.add %8511, %8513 : !llvm.i64 + %8515 = llvm.getelementptr %8507[%8514] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8516 = llvm.load %8515 : !llvm.ptr + %8517 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8519 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8520 = llvm.mul %5851, %8519 : !llvm.i64 + %8521 = llvm.add %8518, %8520 : !llvm.i64 + %8522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8523 = llvm.mul %6583, %8522 : !llvm.i64 + %8524 = llvm.add %8521, %8523 : !llvm.i64 + %8525 = llvm.getelementptr %8517[%8524] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8526 = llvm.load %8525 : !llvm.ptr + %8527 = llvm.fmul %8516, %8526 {RelaxedPrecision} : !llvm.float + %8528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8531 = llvm.mul %25, %8530 : !llvm.i64 + %8532 = llvm.add %8529, %8531 : !llvm.i64 + %8533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8534 = llvm.mul %6583, %8533 : !llvm.i64 + %8535 = llvm.add %8532, %8534 : !llvm.i64 + %8536 = llvm.getelementptr %8528[%8535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8537 = llvm.load %8536 : !llvm.ptr + %8538 = llvm.fadd %8537, %8527 {RelaxedPrecision} : !llvm.float + %8539 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8540 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8541 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8542 = llvm.mul %25, %8541 : !llvm.i64 + %8543 = llvm.add %8540, %8542 : !llvm.i64 + %8544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8545 = llvm.mul %6583, %8544 : !llvm.i64 + %8546 = llvm.add %8543, %8545 : !llvm.i64 + %8547 = llvm.getelementptr 
%8539[%8546] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8538, %8547 : !llvm.ptr + %8548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8551 = llvm.mul %25, %8550 : !llvm.i64 + %8552 = llvm.add %8549, %8551 : !llvm.i64 + %8553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8554 = llvm.mul %6583, %8553 : !llvm.i64 + %8555 = llvm.add %8552, %8554 : !llvm.i64 + %8556 = llvm.getelementptr %8548[%8555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8557 = llvm.load %8556 : !llvm.ptr + %8558 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8561 = llvm.mul %25, %8560 : !llvm.i64 + %8562 = llvm.add %8559, %8561 : !llvm.i64 + %8563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8564 = llvm.mul %6583, %8563 : !llvm.i64 + %8565 = llvm.add %8562, %8564 : !llvm.i64 + %8566 = llvm.getelementptr %8558[%8565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8557, %8566 : !llvm.ptr + %8567 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8568 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8569 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8570 = llvm.mul %25, %8569 : !llvm.i64 + %8571 = llvm.add %8568, %8570 : !llvm.i64 + %8572 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8573 = llvm.mul %5851, %8572 : !llvm.i64 + %8574 = llvm.add %8571, %8573 : !llvm.i64 + %8575 = llvm.getelementptr %8567[%8574] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8576 = llvm.load %8575 : !llvm.ptr + %8577 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8578 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8579 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8580 = llvm.mul %5851, %8579 : !llvm.i64 + %8581 = llvm.add %8578, %8580 : !llvm.i64 + %8582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8583 = llvm.mul %6644, %8582 : !llvm.i64 + %8584 = llvm.add %8581, %8583 : !llvm.i64 + %8585 = llvm.getelementptr %8577[%8584] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8586 = llvm.load %8585 : !llvm.ptr + %8587 = llvm.fmul %8576, %8586 {RelaxedPrecision} : !llvm.float + %8588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8591 = llvm.mul %25, %8590 : !llvm.i64 + %8592 = llvm.add %8589, %8591 : !llvm.i64 + %8593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8594 = llvm.mul %6644, %8593 : !llvm.i64 + %8595 = llvm.add %8592, %8594 : !llvm.i64 + %8596 = llvm.getelementptr %8588[%8595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8597 = llvm.load %8596 : !llvm.ptr + %8598 = llvm.fadd %8597, %8587 {RelaxedPrecision} : !llvm.float + %8599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8602 = llvm.mul %25, %8601 : !llvm.i64 + %8603 = llvm.add %8600, %8602 : !llvm.i64 + %8604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8605 = llvm.mul %6644, %8604 : !llvm.i64 + %8606 = llvm.add %8603, %8605 : !llvm.i64 + %8607 = llvm.getelementptr %8599[%8606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8598, %8607 : !llvm.ptr + %8608 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8611 = llvm.mul %25, %8610 : !llvm.i64 + %8612 = llvm.add %8609, %8611 : !llvm.i64 + %8613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8614 = llvm.mul %6644, %8613 : !llvm.i64 + %8615 = llvm.add %8612, %8614 : !llvm.i64 + %8616 = llvm.getelementptr %8608[%8615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8617 = llvm.load %8616 : !llvm.ptr + %8618 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8620 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8621 = llvm.mul %25, %8620 : !llvm.i64 + %8622 = llvm.add %8619, %8621 : !llvm.i64 + %8623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8624 = llvm.mul %6644, %8623 : !llvm.i64 + %8625 = llvm.add %8622, %8624 : !llvm.i64 + %8626 = llvm.getelementptr %8618[%8625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8617, %8626 : !llvm.ptr + %8627 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8628 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8629 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8630 = llvm.mul %25, %8629 : !llvm.i64 + %8631 = llvm.add %8628, %8630 : !llvm.i64 + %8632 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8633 = llvm.mul %5851, %8632 : !llvm.i64 + %8634 = llvm.add %8631, %8633 : !llvm.i64 + %8635 = llvm.getelementptr %8627[%8634] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8636 = llvm.load %8635 : !llvm.ptr + %8637 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8638 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8639 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8640 = llvm.mul %5851, %8639 : !llvm.i64 + %8641 = llvm.add %8638, %8640 : !llvm.i64 + %8642 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8643 = llvm.mul %6705, %8642 : !llvm.i64 + %8644 = llvm.add %8641, %8643 : !llvm.i64 + %8645 = llvm.getelementptr %8637[%8644] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8646 = llvm.load %8645 : !llvm.ptr + %8647 = llvm.fmul %8636, %8646 {RelaxedPrecision} : !llvm.float + %8648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8651 = llvm.mul %25, %8650 : !llvm.i64 + %8652 = llvm.add %8649, %8651 : !llvm.i64 + %8653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8654 = llvm.mul %6705, %8653 : !llvm.i64 + %8655 = llvm.add %8652, %8654 : !llvm.i64 + %8656 = llvm.getelementptr %8648[%8655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8657 = llvm.load %8656 : !llvm.ptr + %8658 = llvm.fadd %8657, %8647 {RelaxedPrecision} : !llvm.float + %8659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8662 = llvm.mul %25, %8661 : !llvm.i64 + %8663 = llvm.add %8660, %8662 : !llvm.i64 + %8664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8665 = llvm.mul %6705, %8664 : !llvm.i64 + %8666 = llvm.add %8663, %8665 : !llvm.i64 + %8667 = llvm.getelementptr %8659[%8666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8658, %8667 : !llvm.ptr + %8668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8670 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %8671 = llvm.mul %25, %8670 : !llvm.i64 + %8672 = llvm.add %8669, %8671 : !llvm.i64 + %8673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8674 = llvm.mul %6705, %8673 : !llvm.i64 + %8675 = llvm.add %8672, %8674 : !llvm.i64 + %8676 = llvm.getelementptr %8668[%8675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8677 = llvm.load %8676 : !llvm.ptr + %8678 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8680 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8681 = llvm.mul %25, %8680 : !llvm.i64 + %8682 = llvm.add %8679, %8681 : !llvm.i64 + %8683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8684 = llvm.mul %6705, %8683 : !llvm.i64 + %8685 = llvm.add %8682, %8684 : !llvm.i64 + %8686 = llvm.getelementptr %8678[%8685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8677, %8686 : !llvm.ptr + %8687 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8688 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8689 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8690 = llvm.mul %25, %8689 : !llvm.i64 + %8691 = llvm.add %8688, %8690 : !llvm.i64 + %8692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8693 = llvm.mul %5851, %8692 : !llvm.i64 + %8694 = llvm.add %8691, %8693 : !llvm.i64 + %8695 = llvm.getelementptr %8687[%8694] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8696 = llvm.load %8695 : !llvm.ptr + %8697 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8698 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8699 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8700 = llvm.mul %5851, %8699 : !llvm.i64 + %8701 = llvm.add %8698, %8700 : !llvm.i64 + %8702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8703 = llvm.mul %6766, %8702 : !llvm.i64 + %8704 = llvm.add %8701, %8703 : !llvm.i64 + %8705 = llvm.getelementptr %8697[%8704] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8706 = llvm.load %8705 : !llvm.ptr + %8707 = llvm.fmul %8696, %8706 {RelaxedPrecision} : !llvm.float + %8708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8711 = llvm.mul %25, %8710 : !llvm.i64 + %8712 = llvm.add %8709, %8711 : !llvm.i64 + %8713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8714 = llvm.mul %6766, %8713 : !llvm.i64 + %8715 = llvm.add %8712, %8714 : !llvm.i64 + %8716 = llvm.getelementptr %8708[%8715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8717 = llvm.load %8716 : !llvm.ptr + %8718 = llvm.fadd %8717, %8707 {RelaxedPrecision} : !llvm.float + %8719 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8720 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8721 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8722 = llvm.mul %25, %8721 : !llvm.i64 + %8723 = llvm.add %8720, %8722 : !llvm.i64 + %8724 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8725 = llvm.mul %6766, %8724 : !llvm.i64 + %8726 = llvm.add %8723, %8725 : !llvm.i64 + %8727 = llvm.getelementptr %8719[%8726] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8718, %8727 : !llvm.ptr + %8728 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8729 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8730 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8731 = llvm.mul %25, %8730 : !llvm.i64 + %8732 = llvm.add %8729, %8731 : 
!llvm.i64 + %8733 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8734 = llvm.mul %6766, %8733 : !llvm.i64 + %8735 = llvm.add %8732, %8734 : !llvm.i64 + %8736 = llvm.getelementptr %8728[%8735] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8737 = llvm.load %8736 : !llvm.ptr + %8738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8741 = llvm.mul %25, %8740 : !llvm.i64 + %8742 = llvm.add %8739, %8741 : !llvm.i64 + %8743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8744 = llvm.mul %6766, %8743 : !llvm.i64 + %8745 = llvm.add %8742, %8744 : !llvm.i64 + %8746 = llvm.getelementptr %8738[%8745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8737, %8746 : !llvm.ptr + %8747 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8749 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8750 = llvm.mul %26, %8749 : !llvm.i64 + %8751 = llvm.add %8748, %8750 : !llvm.i64 + %8752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8753 = llvm.mul %5851, %8752 : !llvm.i64 + %8754 = llvm.add %8751, %8753 : !llvm.i64 + %8755 = llvm.getelementptr %8747[%8754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8756 = llvm.load %8755 : !llvm.ptr + %8757 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8758 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8759 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8760 = llvm.mul %5851, %8759 : !llvm.i64 + %8761 = llvm.add %8758, %8760 : !llvm.i64 + %8762 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8763 = llvm.mul %5850, %8762 : !llvm.i64 + %8764 = llvm.add %8761, %8763 : !llvm.i64 + %8765 = llvm.getelementptr %8757[%8764] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8766 = llvm.load %8765 : !llvm.ptr + %8767 = llvm.fmul %8756, %8766 {RelaxedPrecision} : !llvm.float + %8768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8771 = llvm.mul %26, %8770 : !llvm.i64 + %8772 = llvm.add %8769, %8771 : !llvm.i64 + %8773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8774 = llvm.mul %5850, %8773 : !llvm.i64 + %8775 = llvm.add %8772, %8774 : !llvm.i64 + %8776 = llvm.getelementptr %8768[%8775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8777 = llvm.load %8776 : !llvm.ptr + %8778 = llvm.fadd %8777, %8767 {RelaxedPrecision} : !llvm.float + %8779 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8780 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8781 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8782 = llvm.mul %26, %8781 : !llvm.i64 + %8783 = llvm.add %8780, %8782 : !llvm.i64 + %8784 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8785 = llvm.mul %5850, %8784 : !llvm.i64 + %8786 = llvm.add %8783, %8785 : !llvm.i64 + %8787 = llvm.getelementptr %8779[%8786] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8778, %8787 : !llvm.ptr + %8788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8791 = llvm.mul %26, %8790 : !llvm.i64 + %8792 = llvm.add %8789, %8791 : !llvm.i64 + %8793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8794 = llvm.mul %5850, %8793 : !llvm.i64 + %8795 = llvm.add 
%8792, %8794 : !llvm.i64 + %8796 = llvm.getelementptr %8788[%8795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8797 = llvm.load %8796 : !llvm.ptr + %8798 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8800 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8801 = llvm.mul %26, %8800 : !llvm.i64 + %8802 = llvm.add %8799, %8801 : !llvm.i64 + %8803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8804 = llvm.mul %5850, %8803 : !llvm.i64 + %8805 = llvm.add %8802, %8804 : !llvm.i64 + %8806 = llvm.getelementptr %8798[%8805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8797, %8806 : !llvm.ptr + %8807 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8808 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8809 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8810 = llvm.mul %26, %8809 : !llvm.i64 + %8811 = llvm.add %8808, %8810 : !llvm.i64 + %8812 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8813 = llvm.mul %5851, %8812 : !llvm.i64 + %8814 = llvm.add %8811, %8813 : !llvm.i64 + %8815 = llvm.getelementptr %8807[%8814] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8816 = llvm.load %8815 : !llvm.ptr + %8817 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8818 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8819 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8820 = llvm.mul %5851, %8819 : !llvm.i64 + %8821 = llvm.add %8818, %8820 : !llvm.i64 + %8822 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8823 = llvm.mul %5912, %8822 : !llvm.i64 + %8824 = llvm.add %8821, %8823 : !llvm.i64 + %8825 = llvm.getelementptr %8817[%8824] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8826 = llvm.load %8825 : !llvm.ptr + %8827 = llvm.fmul %8816, %8826 {RelaxedPrecision} : !llvm.float + %8828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8831 = llvm.mul %26, %8830 : !llvm.i64 + %8832 = llvm.add %8829, %8831 : !llvm.i64 + %8833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8834 = llvm.mul %5912, %8833 : !llvm.i64 + %8835 = llvm.add %8832, %8834 : !llvm.i64 + %8836 = llvm.getelementptr %8828[%8835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8837 = llvm.load %8836 : !llvm.ptr + %8838 = llvm.fadd %8837, %8827 {RelaxedPrecision} : !llvm.float + %8839 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8840 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8841 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8842 = llvm.mul %26, %8841 : !llvm.i64 + %8843 = llvm.add %8840, %8842 : !llvm.i64 + %8844 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8845 = llvm.mul %5912, %8844 : !llvm.i64 + %8846 = llvm.add %8843, %8845 : !llvm.i64 + %8847 = llvm.getelementptr %8839[%8846] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8838, %8847 : !llvm.ptr + %8848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8851 = llvm.mul %26, %8850 : !llvm.i64 + %8852 = llvm.add %8849, %8851 : !llvm.i64 + %8853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8854 = llvm.mul %5912, %8853 : !llvm.i64 + %8855 = llvm.add %8852, %8854 : !llvm.i64 + %8856 = llvm.getelementptr %8848[%8855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8857 = llvm.load 
%8856 : !llvm.ptr + %8858 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8860 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8861 = llvm.mul %26, %8860 : !llvm.i64 + %8862 = llvm.add %8859, %8861 : !llvm.i64 + %8863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8864 = llvm.mul %5912, %8863 : !llvm.i64 + %8865 = llvm.add %8862, %8864 : !llvm.i64 + %8866 = llvm.getelementptr %8858[%8865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8857, %8866 : !llvm.ptr + %8867 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8868 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8869 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8870 = llvm.mul %26, %8869 : !llvm.i64 + %8871 = llvm.add %8868, %8870 : !llvm.i64 + %8872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8873 = llvm.mul %5851, %8872 : !llvm.i64 + %8874 = llvm.add %8871, %8873 : !llvm.i64 + %8875 = llvm.getelementptr %8867[%8874] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8876 = llvm.load %8875 : !llvm.ptr + %8877 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8878 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8879 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8880 = llvm.mul %5851, %8879 : !llvm.i64 + %8881 = llvm.add %8878, %8880 : !llvm.i64 + %8882 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8883 = llvm.mul %5973, %8882 : !llvm.i64 + %8884 = llvm.add %8881, %8883 : !llvm.i64 + %8885 = llvm.getelementptr %8877[%8884] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8886 = llvm.load %8885 : !llvm.ptr + %8887 = llvm.fmul %8876, %8886 {RelaxedPrecision} : !llvm.float + %8888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8891 = llvm.mul %26, %8890 : !llvm.i64 + %8892 = llvm.add %8889, %8891 : !llvm.i64 + %8893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8894 = llvm.mul %5973, %8893 : !llvm.i64 + %8895 = llvm.add %8892, %8894 : !llvm.i64 + %8896 = llvm.getelementptr %8888[%8895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8897 = llvm.load %8896 : !llvm.ptr + %8898 = llvm.fadd %8897, %8887 {RelaxedPrecision} : !llvm.float + %8899 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8901 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8902 = llvm.mul %26, %8901 : !llvm.i64 + %8903 = llvm.add %8900, %8902 : !llvm.i64 + %8904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8905 = llvm.mul %5973, %8904 : !llvm.i64 + %8906 = llvm.add %8903, %8905 : !llvm.i64 + %8907 = llvm.getelementptr %8899[%8906] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8898, %8907 : !llvm.ptr + %8908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8911 = llvm.mul %26, %8910 : !llvm.i64 + %8912 = llvm.add %8909, %8911 : !llvm.i64 + %8913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8914 = llvm.mul %5973, %8913 : !llvm.i64 + %8915 = llvm.add %8912, %8914 : !llvm.i64 + %8916 = llvm.getelementptr %8908[%8915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8917 = llvm.load %8916 : !llvm.ptr + %8918 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8919 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %8920 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8921 = llvm.mul %26, %8920 : !llvm.i64 + %8922 = llvm.add %8919, %8921 : !llvm.i64 + %8923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8924 = llvm.mul %5973, %8923 : !llvm.i64 + %8925 = llvm.add %8922, %8924 : !llvm.i64 + %8926 = llvm.getelementptr %8918[%8925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8917, %8926 : !llvm.ptr + %8927 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8928 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8929 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8930 = llvm.mul %26, %8929 : !llvm.i64 + %8931 = llvm.add %8928, %8930 : !llvm.i64 + %8932 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8933 = llvm.mul %5851, %8932 : !llvm.i64 + %8934 = llvm.add %8931, %8933 : !llvm.i64 + %8935 = llvm.getelementptr %8927[%8934] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8936 = llvm.load %8935 : !llvm.ptr + %8937 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8938 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8939 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8940 = llvm.mul %5851, %8939 : !llvm.i64 + %8941 = llvm.add %8938, %8940 : !llvm.i64 + %8942 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8943 = llvm.mul %6034, %8942 : !llvm.i64 + %8944 = llvm.add %8941, %8943 : !llvm.i64 + %8945 = llvm.getelementptr %8937[%8944] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8946 = llvm.load %8945 : !llvm.ptr + %8947 = llvm.fmul %8936, %8946 {RelaxedPrecision} : !llvm.float + %8948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8951 = llvm.mul %26, %8950 : !llvm.i64 + %8952 = llvm.add %8949, %8951 : !llvm.i64 + %8953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8954 = llvm.mul %6034, %8953 : !llvm.i64 + %8955 = llvm.add %8952, %8954 : !llvm.i64 + %8956 = llvm.getelementptr %8948[%8955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8957 = llvm.load %8956 : !llvm.ptr + %8958 = llvm.fadd %8957, %8947 {RelaxedPrecision} : !llvm.float + %8959 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8960 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8961 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8962 = llvm.mul %26, %8961 : !llvm.i64 + %8963 = llvm.add %8960, %8962 : !llvm.i64 + %8964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8965 = llvm.mul %6034, %8964 : !llvm.i64 + %8966 = llvm.add %8963, %8965 : !llvm.i64 + %8967 = llvm.getelementptr %8959[%8966] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8958, %8967 : !llvm.ptr + %8968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8971 = llvm.mul %26, %8970 : !llvm.i64 + %8972 = llvm.add %8969, %8971 : !llvm.i64 + %8973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8974 = llvm.mul %6034, %8973 : !llvm.i64 + %8975 = llvm.add %8972, %8974 : !llvm.i64 + %8976 = llvm.getelementptr %8968[%8975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8977 = llvm.load %8976 : !llvm.ptr + %8978 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8980 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8981 = llvm.mul %26, %8980 : 
!llvm.i64 + %8982 = llvm.add %8979, %8981 : !llvm.i64 + %8983 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8984 = llvm.mul %6034, %8983 : !llvm.i64 + %8985 = llvm.add %8982, %8984 : !llvm.i64 + %8986 = llvm.getelementptr %8978[%8985] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8977, %8986 : !llvm.ptr + %8987 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8988 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8989 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8990 = llvm.mul %26, %8989 : !llvm.i64 + %8991 = llvm.add %8988, %8990 : !llvm.i64 + %8992 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8993 = llvm.mul %5851, %8992 : !llvm.i64 + %8994 = llvm.add %8991, %8993 : !llvm.i64 + %8995 = llvm.getelementptr %8987[%8994] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8996 = llvm.load %8995 : !llvm.ptr + %8997 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8998 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8999 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9000 = llvm.mul %5851, %8999 : !llvm.i64 + %9001 = llvm.add %8998, %9000 : !llvm.i64 + %9002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9003 = llvm.mul %6095, %9002 : !llvm.i64 + %9004 = llvm.add %9001, %9003 : !llvm.i64 + %9005 = llvm.getelementptr %8997[%9004] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9006 = llvm.load %9005 : !llvm.ptr + %9007 = llvm.fmul %8996, %9006 {RelaxedPrecision} : !llvm.float + %9008 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9010 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9011 = llvm.mul %26, %9010 : !llvm.i64 + %9012 = llvm.add %9009, %9011 : !llvm.i64 + %9013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9014 = llvm.mul %6095, %9013 : !llvm.i64 + %9015 = llvm.add %9012, %9014 : !llvm.i64 + %9016 = llvm.getelementptr %9008[%9015] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9017 = llvm.load %9016 : !llvm.ptr + %9018 = llvm.fadd %9017, %9007 {RelaxedPrecision} : !llvm.float + %9019 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9020 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9021 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9022 = llvm.mul %26, %9021 : !llvm.i64 + %9023 = llvm.add %9020, %9022 : !llvm.i64 + %9024 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9025 = llvm.mul %6095, %9024 : !llvm.i64 + %9026 = llvm.add %9023, %9025 : !llvm.i64 + %9027 = llvm.getelementptr %9019[%9026] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9018, %9027 : !llvm.ptr + %9028 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9031 = llvm.mul %26, %9030 : !llvm.i64 + %9032 = llvm.add %9029, %9031 : !llvm.i64 + %9033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9034 = llvm.mul %6095, %9033 : !llvm.i64 + %9035 = llvm.add %9032, %9034 : !llvm.i64 + %9036 = llvm.getelementptr %9028[%9035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9037 = llvm.load %9036 : !llvm.ptr + %9038 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9039 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9040 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9041 = llvm.mul %26, %9040 : !llvm.i64 + %9042 = llvm.add %9039, %9041 : !llvm.i64 + %9043 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9044 = llvm.mul 
%6095, %9043 : !llvm.i64 + %9045 = llvm.add %9042, %9044 : !llvm.i64 + %9046 = llvm.getelementptr %9038[%9045] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9037, %9046 : !llvm.ptr + %9047 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9048 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9049 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9050 = llvm.mul %26, %9049 : !llvm.i64 + %9051 = llvm.add %9048, %9050 : !llvm.i64 + %9052 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9053 = llvm.mul %5851, %9052 : !llvm.i64 + %9054 = llvm.add %9051, %9053 : !llvm.i64 + %9055 = llvm.getelementptr %9047[%9054] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9056 = llvm.load %9055 : !llvm.ptr + %9057 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9059 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9060 = llvm.mul %5851, %9059 : !llvm.i64 + %9061 = llvm.add %9058, %9060 : !llvm.i64 + %9062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9063 = llvm.mul %6156, %9062 : !llvm.i64 + %9064 = llvm.add %9061, %9063 : !llvm.i64 + %9065 = llvm.getelementptr %9057[%9064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9066 = llvm.load %9065 : !llvm.ptr + %9067 = llvm.fmul %9056, %9066 {RelaxedPrecision} : !llvm.float + %9068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9071 = llvm.mul %26, %9070 : !llvm.i64 + %9072 = llvm.add %9069, %9071 : !llvm.i64 + %9073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9074 = llvm.mul %6156, %9073 : !llvm.i64 + %9075 = llvm.add %9072, %9074 : !llvm.i64 + %9076 = llvm.getelementptr %9068[%9075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9077 = llvm.load %9076 : !llvm.ptr + %9078 = llvm.fadd %9077, %9067 {RelaxedPrecision} : !llvm.float + %9079 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9080 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9081 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9082 = llvm.mul %26, %9081 : !llvm.i64 + %9083 = llvm.add %9080, %9082 : !llvm.i64 + %9084 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9085 = llvm.mul %6156, %9084 : !llvm.i64 + %9086 = llvm.add %9083, %9085 : !llvm.i64 + %9087 = llvm.getelementptr %9079[%9086] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9078, %9087 : !llvm.ptr + %9088 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9091 = llvm.mul %26, %9090 : !llvm.i64 + %9092 = llvm.add %9089, %9091 : !llvm.i64 + %9093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9094 = llvm.mul %6156, %9093 : !llvm.i64 + %9095 = llvm.add %9092, %9094 : !llvm.i64 + %9096 = llvm.getelementptr %9088[%9095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9097 = llvm.load %9096 : !llvm.ptr + %9098 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9100 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9101 = llvm.mul %26, %9100 : !llvm.i64 + %9102 = llvm.add %9099, %9101 : !llvm.i64 + %9103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9104 = llvm.mul %6156, %9103 : !llvm.i64 + %9105 = llvm.add %9102, %9104 : !llvm.i64 + %9106 = llvm.getelementptr %9098[%9105] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + llvm.store %9097, %9106 : !llvm.ptr + %9107 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9109 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9110 = llvm.mul %26, %9109 : !llvm.i64 + %9111 = llvm.add %9108, %9110 : !llvm.i64 + %9112 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9113 = llvm.mul %5851, %9112 : !llvm.i64 + %9114 = llvm.add %9111, %9113 : !llvm.i64 + %9115 = llvm.getelementptr %9107[%9114] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9116 = llvm.load %9115 : !llvm.ptr + %9117 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9120 = llvm.mul %5851, %9119 : !llvm.i64 + %9121 = llvm.add %9118, %9120 : !llvm.i64 + %9122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9123 = llvm.mul %6217, %9122 : !llvm.i64 + %9124 = llvm.add %9121, %9123 : !llvm.i64 + %9125 = llvm.getelementptr %9117[%9124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9126 = llvm.load %9125 : !llvm.ptr + %9127 = llvm.fmul %9116, %9126 {RelaxedPrecision} : !llvm.float + %9128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9131 = llvm.mul %26, %9130 : !llvm.i64 + %9132 = llvm.add %9129, %9131 : !llvm.i64 + %9133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9134 = llvm.mul %6217, %9133 : !llvm.i64 + %9135 = llvm.add %9132, %9134 : !llvm.i64 + %9136 = llvm.getelementptr %9128[%9135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9137 = llvm.load %9136 : !llvm.ptr + %9138 = llvm.fadd %9137, %9127 {RelaxedPrecision} : !llvm.float + %9139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9142 = llvm.mul %26, %9141 : !llvm.i64 + %9143 = llvm.add %9140, %9142 : !llvm.i64 + %9144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9145 = llvm.mul %6217, %9144 : !llvm.i64 + %9146 = llvm.add %9143, %9145 : !llvm.i64 + %9147 = llvm.getelementptr %9139[%9146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9138, %9147 : !llvm.ptr + %9148 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9151 = llvm.mul %26, %9150 : !llvm.i64 + %9152 = llvm.add %9149, %9151 : !llvm.i64 + %9153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9154 = llvm.mul %6217, %9153 : !llvm.i64 + %9155 = llvm.add %9152, %9154 : !llvm.i64 + %9156 = llvm.getelementptr %9148[%9155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9157 = llvm.load %9156 : !llvm.ptr + %9158 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9161 = llvm.mul %26, %9160 : !llvm.i64 + %9162 = llvm.add %9159, %9161 : !llvm.i64 + %9163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9164 = llvm.mul %6217, %9163 : !llvm.i64 + %9165 = llvm.add %9162, %9164 : !llvm.i64 + %9166 = llvm.getelementptr %9158[%9165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9157, %9166 : !llvm.ptr + %9167 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %9168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9169 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9170 = llvm.mul %26, %9169 : !llvm.i64 + %9171 = llvm.add %9168, %9170 : !llvm.i64 + %9172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9173 = llvm.mul %5851, %9172 : !llvm.i64 + %9174 = llvm.add %9171, %9173 : !llvm.i64 + %9175 = llvm.getelementptr %9167[%9174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9176 = llvm.load %9175 : !llvm.ptr + %9177 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9179 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9180 = llvm.mul %5851, %9179 : !llvm.i64 + %9181 = llvm.add %9178, %9180 : !llvm.i64 + %9182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9183 = llvm.mul %6278, %9182 : !llvm.i64 + %9184 = llvm.add %9181, %9183 : !llvm.i64 + %9185 = llvm.getelementptr %9177[%9184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9186 = llvm.load %9185 : !llvm.ptr + %9187 = llvm.fmul %9176, %9186 {RelaxedPrecision} : !llvm.float + %9188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9191 = llvm.mul %26, %9190 : !llvm.i64 + %9192 = llvm.add %9189, %9191 : !llvm.i64 + %9193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9194 = llvm.mul %6278, %9193 : !llvm.i64 + %9195 = llvm.add %9192, %9194 : !llvm.i64 + %9196 = llvm.getelementptr %9188[%9195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9197 = llvm.load %9196 : !llvm.ptr + %9198 = llvm.fadd %9197, %9187 {RelaxedPrecision} : !llvm.float + %9199 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9200 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9202 = llvm.mul %26, %9201 : !llvm.i64 + %9203 = llvm.add %9200, %9202 : !llvm.i64 + %9204 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9205 = llvm.mul %6278, %9204 : !llvm.i64 + %9206 = llvm.add %9203, %9205 : !llvm.i64 + %9207 = llvm.getelementptr %9199[%9206] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9198, %9207 : !llvm.ptr + %9208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9211 = llvm.mul %26, %9210 : !llvm.i64 + %9212 = llvm.add %9209, %9211 : !llvm.i64 + %9213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9214 = llvm.mul %6278, %9213 : !llvm.i64 + %9215 = llvm.add %9212, %9214 : !llvm.i64 + %9216 = llvm.getelementptr %9208[%9215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9217 = llvm.load %9216 : !llvm.ptr + %9218 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9220 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9221 = llvm.mul %26, %9220 : !llvm.i64 + %9222 = llvm.add %9219, %9221 : !llvm.i64 + %9223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9224 = llvm.mul %6278, %9223 : !llvm.i64 + %9225 = llvm.add %9222, %9224 : !llvm.i64 + %9226 = llvm.getelementptr %9218[%9225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9217, %9226 : !llvm.ptr + %9227 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9228 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9229 = llvm.mlir.constant(128 : index) 
: !llvm.i64 + %9230 = llvm.mul %26, %9229 : !llvm.i64 + %9231 = llvm.add %9228, %9230 : !llvm.i64 + %9232 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9233 = llvm.mul %5851, %9232 : !llvm.i64 + %9234 = llvm.add %9231, %9233 : !llvm.i64 + %9235 = llvm.getelementptr %9227[%9234] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9236 = llvm.load %9235 : !llvm.ptr + %9237 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9239 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9240 = llvm.mul %5851, %9239 : !llvm.i64 + %9241 = llvm.add %9238, %9240 : !llvm.i64 + %9242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9243 = llvm.mul %6339, %9242 : !llvm.i64 + %9244 = llvm.add %9241, %9243 : !llvm.i64 + %9245 = llvm.getelementptr %9237[%9244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9246 = llvm.load %9245 : !llvm.ptr + %9247 = llvm.fmul %9236, %9246 {RelaxedPrecision} : !llvm.float + %9248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9251 = llvm.mul %26, %9250 : !llvm.i64 + %9252 = llvm.add %9249, %9251 : !llvm.i64 + %9253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9254 = llvm.mul %6339, %9253 : !llvm.i64 + %9255 = llvm.add %9252, %9254 : !llvm.i64 + %9256 = llvm.getelementptr %9248[%9255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9257 = llvm.load %9256 : !llvm.ptr + %9258 = llvm.fadd %9257, %9247 {RelaxedPrecision} : !llvm.float + %9259 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9260 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9261 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9262 = llvm.mul %26, %9261 : !llvm.i64 + %9263 = llvm.add %9260, %9262 : !llvm.i64 + %9264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9265 = llvm.mul %6339, %9264 : !llvm.i64 + %9266 = llvm.add %9263, %9265 : !llvm.i64 + %9267 = llvm.getelementptr %9259[%9266] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9258, %9267 : !llvm.ptr + %9268 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9271 = llvm.mul %26, %9270 : !llvm.i64 + %9272 = llvm.add %9269, %9271 : !llvm.i64 + %9273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9274 = llvm.mul %6339, %9273 : !llvm.i64 + %9275 = llvm.add %9272, %9274 : !llvm.i64 + %9276 = llvm.getelementptr %9268[%9275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9277 = llvm.load %9276 : !llvm.ptr + %9278 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9279 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9280 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9281 = llvm.mul %26, %9280 : !llvm.i64 + %9282 = llvm.add %9279, %9281 : !llvm.i64 + %9283 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9284 = llvm.mul %6339, %9283 : !llvm.i64 + %9285 = llvm.add %9282, %9284 : !llvm.i64 + %9286 = llvm.getelementptr %9278[%9285] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9277, %9286 : !llvm.ptr + %9287 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9288 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9289 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9290 = llvm.mul %26, %9289 : !llvm.i64 + %9291 = llvm.add %9288, %9290 : !llvm.i64 + %9292 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %9293 = llvm.mul %5851, %9292 : !llvm.i64 + %9294 = llvm.add %9291, %9293 : !llvm.i64 + %9295 = llvm.getelementptr %9287[%9294] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9296 = llvm.load %9295 : !llvm.ptr + %9297 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9299 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9300 = llvm.mul %5851, %9299 : !llvm.i64 + %9301 = llvm.add %9298, %9300 : !llvm.i64 + %9302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9303 = llvm.mul %6400, %9302 : !llvm.i64 + %9304 = llvm.add %9301, %9303 : !llvm.i64 + %9305 = llvm.getelementptr %9297[%9304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9306 = llvm.load %9305 : !llvm.ptr + %9307 = llvm.fmul %9296, %9306 {RelaxedPrecision} : !llvm.float + %9308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9311 = llvm.mul %26, %9310 : !llvm.i64 + %9312 = llvm.add %9309, %9311 : !llvm.i64 + %9313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9314 = llvm.mul %6400, %9313 : !llvm.i64 + %9315 = llvm.add %9312, %9314 : !llvm.i64 + %9316 = llvm.getelementptr %9308[%9315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9317 = llvm.load %9316 : !llvm.ptr + %9318 = llvm.fadd %9317, %9307 {RelaxedPrecision} : !llvm.float + %9319 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9320 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9321 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9322 = llvm.mul %26, %9321 : !llvm.i64 + %9323 = llvm.add %9320, %9322 : !llvm.i64 + %9324 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9325 = llvm.mul %6400, %9324 : !llvm.i64 + %9326 = llvm.add %9323, %9325 : !llvm.i64 + %9327 = llvm.getelementptr %9319[%9326] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9318, %9327 : !llvm.ptr + %9328 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9331 = llvm.mul %26, %9330 : !llvm.i64 + %9332 = llvm.add %9329, %9331 : !llvm.i64 + %9333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9334 = llvm.mul %6400, %9333 : !llvm.i64 + %9335 = llvm.add %9332, %9334 : !llvm.i64 + %9336 = llvm.getelementptr %9328[%9335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9337 = llvm.load %9336 : !llvm.ptr + %9338 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9339 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9340 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9341 = llvm.mul %26, %9340 : !llvm.i64 + %9342 = llvm.add %9339, %9341 : !llvm.i64 + %9343 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9344 = llvm.mul %6400, %9343 : !llvm.i64 + %9345 = llvm.add %9342, %9344 : !llvm.i64 + %9346 = llvm.getelementptr %9338[%9345] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9337, %9346 : !llvm.ptr + %9347 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9349 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9350 = llvm.mul %26, %9349 : !llvm.i64 + %9351 = llvm.add %9348, %9350 : !llvm.i64 + %9352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9353 = llvm.mul %5851, %9352 : !llvm.i64 + %9354 = llvm.add %9351, %9353 : 
!llvm.i64 + %9355 = llvm.getelementptr %9347[%9354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9356 = llvm.load %9355 : !llvm.ptr + %9357 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9359 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9360 = llvm.mul %5851, %9359 : !llvm.i64 + %9361 = llvm.add %9358, %9360 : !llvm.i64 + %9362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9363 = llvm.mul %6461, %9362 : !llvm.i64 + %9364 = llvm.add %9361, %9363 : !llvm.i64 + %9365 = llvm.getelementptr %9357[%9364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9366 = llvm.load %9365 : !llvm.ptr + %9367 = llvm.fmul %9356, %9366 {RelaxedPrecision} : !llvm.float + %9368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9371 = llvm.mul %26, %9370 : !llvm.i64 + %9372 = llvm.add %9369, %9371 : !llvm.i64 + %9373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9374 = llvm.mul %6461, %9373 : !llvm.i64 + %9375 = llvm.add %9372, %9374 : !llvm.i64 + %9376 = llvm.getelementptr %9368[%9375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9377 = llvm.load %9376 : !llvm.ptr + %9378 = llvm.fadd %9377, %9367 {RelaxedPrecision} : !llvm.float + %9379 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9381 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9382 = llvm.mul %26, %9381 : !llvm.i64 + %9383 = llvm.add %9380, %9382 : !llvm.i64 + %9384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9385 = llvm.mul %6461, %9384 : !llvm.i64 + %9386 = llvm.add %9383, %9385 : !llvm.i64 + %9387 = llvm.getelementptr %9379[%9386] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9378, %9387 : !llvm.ptr + %9388 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9391 = llvm.mul %26, %9390 : !llvm.i64 + %9392 = llvm.add %9389, %9391 : !llvm.i64 + %9393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9394 = llvm.mul %6461, %9393 : !llvm.i64 + %9395 = llvm.add %9392, %9394 : !llvm.i64 + %9396 = llvm.getelementptr %9388[%9395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9397 = llvm.load %9396 : !llvm.ptr + %9398 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9399 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9401 = llvm.mul %26, %9400 : !llvm.i64 + %9402 = llvm.add %9399, %9401 : !llvm.i64 + %9403 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9404 = llvm.mul %6461, %9403 : !llvm.i64 + %9405 = llvm.add %9402, %9404 : !llvm.i64 + %9406 = llvm.getelementptr %9398[%9405] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9397, %9406 : !llvm.ptr + %9407 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9408 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9409 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9410 = llvm.mul %26, %9409 : !llvm.i64 + %9411 = llvm.add %9408, %9410 : !llvm.i64 + %9412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9413 = llvm.mul %5851, %9412 : !llvm.i64 + %9414 = llvm.add %9411, %9413 : !llvm.i64 + %9415 = llvm.getelementptr %9407[%9414] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9416 = llvm.load %9415 : !llvm.ptr 
+ %9417 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9419 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9420 = llvm.mul %5851, %9419 : !llvm.i64 + %9421 = llvm.add %9418, %9420 : !llvm.i64 + %9422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9423 = llvm.mul %6522, %9422 : !llvm.i64 + %9424 = llvm.add %9421, %9423 : !llvm.i64 + %9425 = llvm.getelementptr %9417[%9424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9426 = llvm.load %9425 : !llvm.ptr + %9427 = llvm.fmul %9416, %9426 {RelaxedPrecision} : !llvm.float + %9428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9431 = llvm.mul %26, %9430 : !llvm.i64 + %9432 = llvm.add %9429, %9431 : !llvm.i64 + %9433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9434 = llvm.mul %6522, %9433 : !llvm.i64 + %9435 = llvm.add %9432, %9434 : !llvm.i64 + %9436 = llvm.getelementptr %9428[%9435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9437 = llvm.load %9436 : !llvm.ptr + %9438 = llvm.fadd %9437, %9427 {RelaxedPrecision} : !llvm.float + %9439 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9440 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9441 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9442 = llvm.mul %26, %9441 : !llvm.i64 + %9443 = llvm.add %9440, %9442 : !llvm.i64 + %9444 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9445 = llvm.mul %6522, %9444 : !llvm.i64 + %9446 = llvm.add %9443, %9445 : !llvm.i64 + %9447 = llvm.getelementptr %9439[%9446] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9438, %9447 : !llvm.ptr + %9448 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9451 = llvm.mul %26, %9450 : !llvm.i64 + %9452 = llvm.add %9449, %9451 : !llvm.i64 + %9453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9454 = llvm.mul %6522, %9453 : !llvm.i64 + %9455 = llvm.add %9452, %9454 : !llvm.i64 + %9456 = llvm.getelementptr %9448[%9455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9457 = llvm.load %9456 : !llvm.ptr + %9458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9461 = llvm.mul %26, %9460 : !llvm.i64 + %9462 = llvm.add %9459, %9461 : !llvm.i64 + %9463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9464 = llvm.mul %6522, %9463 : !llvm.i64 + %9465 = llvm.add %9462, %9464 : !llvm.i64 + %9466 = llvm.getelementptr %9458[%9465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9457, %9466 : !llvm.ptr + %9467 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9468 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9469 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9470 = llvm.mul %26, %9469 : !llvm.i64 + %9471 = llvm.add %9468, %9470 : !llvm.i64 + %9472 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9473 = llvm.mul %5851, %9472 : !llvm.i64 + %9474 = llvm.add %9471, %9473 : !llvm.i64 + %9475 = llvm.getelementptr %9467[%9474] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9476 = llvm.load %9475 : !llvm.ptr + %9477 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9478 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %9479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9480 = llvm.mul %5851, %9479 : !llvm.i64 + %9481 = llvm.add %9478, %9480 : !llvm.i64 + %9482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9483 = llvm.mul %6583, %9482 : !llvm.i64 + %9484 = llvm.add %9481, %9483 : !llvm.i64 + %9485 = llvm.getelementptr %9477[%9484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9486 = llvm.load %9485 : !llvm.ptr + %9487 = llvm.fmul %9476, %9486 {RelaxedPrecision} : !llvm.float + %9488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9491 = llvm.mul %26, %9490 : !llvm.i64 + %9492 = llvm.add %9489, %9491 : !llvm.i64 + %9493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9494 = llvm.mul %6583, %9493 : !llvm.i64 + %9495 = llvm.add %9492, %9494 : !llvm.i64 + %9496 = llvm.getelementptr %9488[%9495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9497 = llvm.load %9496 : !llvm.ptr + %9498 = llvm.fadd %9497, %9487 {RelaxedPrecision} : !llvm.float + %9499 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9500 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9501 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9502 = llvm.mul %26, %9501 : !llvm.i64 + %9503 = llvm.add %9500, %9502 : !llvm.i64 + %9504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9505 = llvm.mul %6583, %9504 : !llvm.i64 + %9506 = llvm.add %9503, %9505 : !llvm.i64 + %9507 = llvm.getelementptr %9499[%9506] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9498, %9507 : !llvm.ptr + %9508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9511 = llvm.mul %26, %9510 : !llvm.i64 + %9512 = llvm.add %9509, %9511 : !llvm.i64 + %9513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9514 = llvm.mul %6583, %9513 : !llvm.i64 + %9515 = llvm.add %9512, %9514 : !llvm.i64 + %9516 = llvm.getelementptr %9508[%9515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9517 = llvm.load %9516 : !llvm.ptr + %9518 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9519 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9520 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9521 = llvm.mul %26, %9520 : !llvm.i64 + %9522 = llvm.add %9519, %9521 : !llvm.i64 + %9523 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9524 = llvm.mul %6583, %9523 : !llvm.i64 + %9525 = llvm.add %9522, %9524 : !llvm.i64 + %9526 = llvm.getelementptr %9518[%9525] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9517, %9526 : !llvm.ptr + %9527 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9528 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9529 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9530 = llvm.mul %26, %9529 : !llvm.i64 + %9531 = llvm.add %9528, %9530 : !llvm.i64 + %9532 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9533 = llvm.mul %5851, %9532 : !llvm.i64 + %9534 = llvm.add %9531, %9533 : !llvm.i64 + %9535 = llvm.getelementptr %9527[%9534] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9536 = llvm.load %9535 : !llvm.ptr + %9537 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9539 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9540 = llvm.mul %5851, %9539 
: !llvm.i64 + %9541 = llvm.add %9538, %9540 : !llvm.i64 + %9542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9543 = llvm.mul %6644, %9542 : !llvm.i64 + %9544 = llvm.add %9541, %9543 : !llvm.i64 + %9545 = llvm.getelementptr %9537[%9544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9546 = llvm.load %9545 : !llvm.ptr + %9547 = llvm.fmul %9536, %9546 {RelaxedPrecision} : !llvm.float + %9548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9551 = llvm.mul %26, %9550 : !llvm.i64 + %9552 = llvm.add %9549, %9551 : !llvm.i64 + %9553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9554 = llvm.mul %6644, %9553 : !llvm.i64 + %9555 = llvm.add %9552, %9554 : !llvm.i64 + %9556 = llvm.getelementptr %9548[%9555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9557 = llvm.load %9556 : !llvm.ptr + %9558 = llvm.fadd %9557, %9547 {RelaxedPrecision} : !llvm.float + %9559 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9560 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9561 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9562 = llvm.mul %26, %9561 : !llvm.i64 + %9563 = llvm.add %9560, %9562 : !llvm.i64 + %9564 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9565 = llvm.mul %6644, %9564 : !llvm.i64 + %9566 = llvm.add %9563, %9565 : !llvm.i64 + %9567 = llvm.getelementptr %9559[%9566] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9558, %9567 : !llvm.ptr + %9568 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9571 = llvm.mul %26, %9570 : !llvm.i64 + %9572 = llvm.add %9569, %9571 : !llvm.i64 + %9573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9574 = llvm.mul %6644, %9573 : !llvm.i64 + %9575 = llvm.add %9572, %9574 : !llvm.i64 + %9576 = llvm.getelementptr %9568[%9575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9577 = llvm.load %9576 : !llvm.ptr + %9578 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9580 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9581 = llvm.mul %26, %9580 : !llvm.i64 + %9582 = llvm.add %9579, %9581 : !llvm.i64 + %9583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9584 = llvm.mul %6644, %9583 : !llvm.i64 + %9585 = llvm.add %9582, %9584 : !llvm.i64 + %9586 = llvm.getelementptr %9578[%9585] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9577, %9586 : !llvm.ptr + %9587 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9589 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9590 = llvm.mul %26, %9589 : !llvm.i64 + %9591 = llvm.add %9588, %9590 : !llvm.i64 + %9592 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9593 = llvm.mul %5851, %9592 : !llvm.i64 + %9594 = llvm.add %9591, %9593 : !llvm.i64 + %9595 = llvm.getelementptr %9587[%9594] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9596 = llvm.load %9595 : !llvm.ptr + %9597 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9599 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9600 = llvm.mul %5851, %9599 : !llvm.i64 + %9601 = llvm.add %9598, %9600 : !llvm.i64 + %9602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9603 = llvm.mul 
%6705, %9602 : !llvm.i64 + %9604 = llvm.add %9601, %9603 : !llvm.i64 + %9605 = llvm.getelementptr %9597[%9604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9606 = llvm.load %9605 : !llvm.ptr + %9607 = llvm.fmul %9596, %9606 {RelaxedPrecision} : !llvm.float + %9608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9611 = llvm.mul %26, %9610 : !llvm.i64 + %9612 = llvm.add %9609, %9611 : !llvm.i64 + %9613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9614 = llvm.mul %6705, %9613 : !llvm.i64 + %9615 = llvm.add %9612, %9614 : !llvm.i64 + %9616 = llvm.getelementptr %9608[%9615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9617 = llvm.load %9616 : !llvm.ptr + %9618 = llvm.fadd %9617, %9607 {RelaxedPrecision} : !llvm.float + %9619 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9620 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9621 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9622 = llvm.mul %26, %9621 : !llvm.i64 + %9623 = llvm.add %9620, %9622 : !llvm.i64 + %9624 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9625 = llvm.mul %6705, %9624 : !llvm.i64 + %9626 = llvm.add %9623, %9625 : !llvm.i64 + %9627 = llvm.getelementptr %9619[%9626] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9618, %9627 : !llvm.ptr + %9628 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9631 = llvm.mul %26, %9630 : !llvm.i64 + %9632 = llvm.add %9629, %9631 : !llvm.i64 + %9633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9634 = llvm.mul %6705, %9633 : !llvm.i64 + %9635 = llvm.add %9632, %9634 : !llvm.i64 + %9636 = llvm.getelementptr %9628[%9635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9637 = llvm.load %9636 : !llvm.ptr + %9638 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9639 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9640 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9641 = llvm.mul %26, %9640 : !llvm.i64 + %9642 = llvm.add %9639, %9641 : !llvm.i64 + %9643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9644 = llvm.mul %6705, %9643 : !llvm.i64 + %9645 = llvm.add %9642, %9644 : !llvm.i64 + %9646 = llvm.getelementptr %9638[%9645] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9637, %9646 : !llvm.ptr + %9647 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9648 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9649 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9650 = llvm.mul %26, %9649 : !llvm.i64 + %9651 = llvm.add %9648, %9650 : !llvm.i64 + %9652 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9653 = llvm.mul %5851, %9652 : !llvm.i64 + %9654 = llvm.add %9651, %9653 : !llvm.i64 + %9655 = llvm.getelementptr %9647[%9654] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9656 = llvm.load %9655 : !llvm.ptr + %9657 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9659 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9660 = llvm.mul %5851, %9659 : !llvm.i64 + %9661 = llvm.add %9658, %9660 : !llvm.i64 + %9662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9663 = llvm.mul %6766, %9662 : !llvm.i64 + %9664 = llvm.add %9661, %9663 : !llvm.i64 + %9665 = llvm.getelementptr %9657[%9664] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %9666 = llvm.load %9665 : !llvm.ptr + %9667 = llvm.fmul %9656, %9666 {RelaxedPrecision} : !llvm.float + %9668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9671 = llvm.mul %26, %9670 : !llvm.i64 + %9672 = llvm.add %9669, %9671 : !llvm.i64 + %9673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9674 = llvm.mul %6766, %9673 : !llvm.i64 + %9675 = llvm.add %9672, %9674 : !llvm.i64 + %9676 = llvm.getelementptr %9668[%9675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9677 = llvm.load %9676 : !llvm.ptr + %9678 = llvm.fadd %9677, %9667 {RelaxedPrecision} : !llvm.float + %9679 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9680 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9681 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9682 = llvm.mul %26, %9681 : !llvm.i64 + %9683 = llvm.add %9680, %9682 : !llvm.i64 + %9684 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9685 = llvm.mul %6766, %9684 : !llvm.i64 + %9686 = llvm.add %9683, %9685 : !llvm.i64 + %9687 = llvm.getelementptr %9679[%9686] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9678, %9687 : !llvm.ptr + %9688 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9691 = llvm.mul %26, %9690 : !llvm.i64 + %9692 = llvm.add %9689, %9691 : !llvm.i64 + %9693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9694 = llvm.mul %6766, %9693 : !llvm.i64 + %9695 = llvm.add %9692, %9694 : !llvm.i64 + %9696 = llvm.getelementptr %9688[%9695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9697 = llvm.load %9696 : !llvm.ptr + %9698 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9700 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9701 = llvm.mul %26, %9700 : !llvm.i64 + %9702 = llvm.add %9699, %9701 : !llvm.i64 + %9703 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9704 = llvm.mul %6766, %9703 : !llvm.i64 + %9705 = llvm.add %9702, %9704 : !llvm.i64 + %9706 = llvm.getelementptr %9698[%9705] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9697, %9706 : !llvm.ptr + %9707 = llvm.add %5848, %33 : !llvm.i64 + llvm.br ^bb19(%9707 : !llvm.i64) + ^bb21: // pred: ^bb19 + %9708 = llvm.add %5846, %36 : !llvm.i64 + llvm.br ^bb17(%9708 : !llvm.i64) + ^bb22: // pred: ^bb17 + %9709 = llvm.add %5844, %30 : !llvm.i64 + llvm.br ^bb15(%9709 : !llvm.i64) + ^bb23: // pred: ^bb15 + %9710 = llvm.add %48, %29 : !llvm.i64 + llvm.br ^bb1(%9710 : !llvm.i64) + ^bb24: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, 
array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/optimized_matmul/mlir/22_FunctionPointerResolution.mlir b/Tutorials/optimized_matmul/mlir/22_FunctionPointerResolution.mlir new file mode 100644 index 00000000..be537b05 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/22_FunctionPointerResolution.mlir @@ -0,0 +1,10140 @@ +module @optimized_matmul { + llvm.func @optimized_matmul_py_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(781 : index) : !llvm.i64 + %25 = llvm.mlir.constant(782 : index) : !llvm.i64 + %26 = llvm.mlir.constant(783 : index) : !llvm.i64 + %27 = llvm.mlir.constant(512 : index) : !llvm.i64 + %28 = llvm.mlir.constant(780 : index) : !llvm.i64 + %29 = llvm.mlir.constant(256 : index) : !llvm.i64 + %30 = llvm.mlir.constant(16 : index) : !llvm.i64 + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.mlir.constant(0 : index) : !llvm.i64 + %33 = llvm.mlir.constant(1 : index) : !llvm.i64 + %34 = llvm.mlir.constant(2 : index) : !llvm.i64 + %35 = llvm.mlir.constant(3 : index) : !llvm.i64 + %36 = llvm.mlir.constant(4 : index) : !llvm.i64 + %37 = llvm.mlir.constant(5 : index) : !llvm.i64 + %38 = llvm.mlir.constant(6 : index) : !llvm.i64 + %39 = llvm.mlir.constant(7 : index) : !llvm.i64 + %40 = llvm.mlir.constant(8 : index) : !llvm.i64 + %41 = llvm.mlir.constant(9 : index) : !llvm.i64 + %42 = llvm.mlir.constant(10 : index) : !llvm.i64 + %43 = llvm.mlir.constant(11 : index) : !llvm.i64 + %44 = llvm.mlir.constant(12 : index) : !llvm.i64 + %45 = llvm.mlir.constant(13 : index) : !llvm.i64 + %46 = llvm.mlir.constant(14 : index) : !llvm.i64 + %47 = llvm.mlir.constant(15 : index) : !llvm.i64 + llvm.br ^bb1(%32 : !llvm.i64) + ^bb1(%48: !llvm.i64): // 2 preds: ^bb0, ^bb23 + %49 = llvm.icmp "slt" %48, %27 : !llvm.i64 + llvm.cond_br %49, ^bb2, ^bb24 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%32 : !llvm.i64) + ^bb3(%50: !llvm.i64): // 2 preds: ^bb2, ^bb13 + %51 = llvm.icmp "slt" %50, %28 : !llvm.i64 + llvm.cond_br %51, ^bb4, ^bb14 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%32 : !llvm.i64) + ^bb5(%52: !llvm.i64): // 2 preds: ^bb4, ^bb12 + %53 = llvm.icmp "slt" %52, %29 : !llvm.i64 + llvm.cond_br %53, ^bb6, ^bb13 + ^bb6: // pred: ^bb5 + llvm.br ^bb7(%32 : !llvm.i64) + ^bb7(%54: !llvm.i64): // 2 preds: ^bb6, ^bb11 + %55 = llvm.icmp "slt" %54, %31 : !llvm.i64 + llvm.cond_br %55, ^bb8, ^bb12 + ^bb8: // pred: ^bb7 + llvm.br ^bb9(%32 : !llvm.i64) + ^bb9(%56: !llvm.i64): // 2 preds: ^bb8, ^bb10 + %57 = llvm.icmp "slt" %56, %36 : !llvm.i64 + llvm.cond_br %57, ^bb10, ^bb11 + ^bb10: // pred: ^bb9 + %58 = llvm.add %48, %52 : !llvm.i64 + %59 = llvm.add %54, %56 : !llvm.i64 + %60 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %61 = llvm.mlir.constant(0 : index) : !llvm.i64 + %62 = llvm.mlir.constant(128 : index) : !llvm.i64 + %63 = llvm.mul %50, %62 : !llvm.i64 + %64 = llvm.add %61, %63 : !llvm.i64 + %65 = llvm.mlir.constant(1 : index) : !llvm.i64 + %66 = llvm.mul %59, %65 : !llvm.i64 + %67 = llvm.add %64, %66 : !llvm.i64 + %68 = llvm.getelementptr %60[%67] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %69 = llvm.load %68 : !llvm.ptr + %70 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %71 = llvm.mlir.constant(0 : index) : !llvm.i64 + %72 = llvm.mlir.constant(512 : index) : !llvm.i64 + %73 = llvm.mul %59, %72 : !llvm.i64 + %74 = llvm.add %71, %73 : !llvm.i64 + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.mul %58, %75 : !llvm.i64 + %77 = llvm.add %74, %76 : !llvm.i64 + %78 = llvm.getelementptr %70[%77] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %79 = llvm.load %78 : !llvm.ptr + %80 = llvm.fmul %69, %79 {RelaxedPrecision} : !llvm.float + %81 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.mlir.constant(0 : index) : !llvm.i64 + %83 = llvm.mlir.constant(512 : index) : !llvm.i64 + %84 = llvm.mul %50, %83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.mlir.constant(1 : index) : !llvm.i64 + %87 = llvm.mul %58, %86 : !llvm.i64 + %88 = llvm.add %85, %87 : !llvm.i64 + %89 = llvm.getelementptr %81[%88] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %90 = llvm.load %89 : !llvm.ptr + %91 = llvm.fadd %90, %80 {RelaxedPrecision} : !llvm.float + %92 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %93 = llvm.mlir.constant(0 : index) : !llvm.i64 + %94 = llvm.mlir.constant(512 : index) : !llvm.i64 + %95 = llvm.mul %50, %94 : !llvm.i64 + %96 = llvm.add %93, %95 : !llvm.i64 + %97 = llvm.mlir.constant(1 : index) : !llvm.i64 + %98 = llvm.mul %58, %97 : !llvm.i64 + %99 = llvm.add %96, %98 : !llvm.i64 + %100 = llvm.getelementptr %92[%99] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %91, %100 : !llvm.ptr + %101 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.mlir.constant(0 : index) : !llvm.i64 + %103 = llvm.mlir.constant(512 : index) : !llvm.i64 + %104 = llvm.mul %50, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %107 = llvm.mul %58, %106 : !llvm.i64 + %108 = llvm.add %105, %107 : !llvm.i64 + %109 = llvm.getelementptr %101[%108] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %110 = llvm.load %109 : !llvm.ptr + %111 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %112 = llvm.mlir.constant(0 : index) : !llvm.i64 + %113 = llvm.mlir.constant(512 : index) : !llvm.i64 + %114 = llvm.mul %50, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.mlir.constant(1 : index) : !llvm.i64 + %117 = llvm.mul %58, %116 : !llvm.i64 + %118 = llvm.add %115, %117 : !llvm.i64 + %119 = llvm.getelementptr %111[%118] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %110, %119 : !llvm.ptr + %120 = llvm.add %58, %33 : !llvm.i64 + %121 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %123 = llvm.mlir.constant(128 : index) : !llvm.i64 + %124 = llvm.mul %50, %123 : !llvm.i64 + %125 = llvm.add %122, %124 : !llvm.i64 + %126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %127 = llvm.mul %59, %126 : 
!llvm.i64 + %128 = llvm.add %125, %127 : !llvm.i64 + %129 = llvm.getelementptr %121[%128] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %130 = llvm.load %129 : !llvm.ptr + %131 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %134 = llvm.mul %59, %133 : !llvm.i64 + %135 = llvm.add %132, %134 : !llvm.i64 + %136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %137 = llvm.mul %120, %136 : !llvm.i64 + %138 = llvm.add %135, %137 : !llvm.i64 + %139 = llvm.getelementptr %131[%138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %140 = llvm.load %139 : !llvm.ptr + %141 = llvm.fmul %130, %140 {RelaxedPrecision} : !llvm.float + %142 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %143 = llvm.mlir.constant(0 : index) : !llvm.i64 + %144 = llvm.mlir.constant(512 : index) : !llvm.i64 + %145 = llvm.mul %50, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.mlir.constant(1 : index) : !llvm.i64 + %148 = llvm.mul %120, %147 : !llvm.i64 + %149 = llvm.add %146, %148 : !llvm.i64 + %150 = llvm.getelementptr %142[%149] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %151 = llvm.load %150 : !llvm.ptr + %152 = llvm.fadd %151, %141 {RelaxedPrecision} : !llvm.float + %153 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %154 = llvm.mlir.constant(0 : index) : !llvm.i64 + %155 = llvm.mlir.constant(512 : index) : !llvm.i64 + %156 = llvm.mul %50, %155 : !llvm.i64 + %157 = llvm.add %154, %156 : !llvm.i64 + %158 = llvm.mlir.constant(1 : index) : !llvm.i64 + %159 = llvm.mul %120, %158 : !llvm.i64 + %160 = llvm.add %157, %159 : !llvm.i64 + %161 = llvm.getelementptr %153[%160] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %152, %161 : !llvm.ptr + %162 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %163 = llvm.mlir.constant(0 : index) : !llvm.i64 + %164 = llvm.mlir.constant(512 : index) : !llvm.i64 + %165 = llvm.mul %50, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.mlir.constant(1 : index) : !llvm.i64 + %168 = llvm.mul %120, %167 : !llvm.i64 + %169 = llvm.add %166, %168 : !llvm.i64 + %170 = llvm.getelementptr %162[%169] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %171 = llvm.load %170 : !llvm.ptr + %172 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %173 = llvm.mlir.constant(0 : index) : !llvm.i64 + %174 = llvm.mlir.constant(512 : index) : !llvm.i64 + %175 = llvm.mul %50, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.mlir.constant(1 : index) : !llvm.i64 + %178 = llvm.mul %120, %177 : !llvm.i64 + %179 = llvm.add %176, %178 : !llvm.i64 + %180 = llvm.getelementptr %172[%179] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %171, %180 : !llvm.ptr + %181 = llvm.add %58, %34 : !llvm.i64 + %182 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %183 = llvm.mlir.constant(0 : index) : !llvm.i64 + %184 = llvm.mlir.constant(128 : index) : !llvm.i64 + %185 = llvm.mul %50, %184 : !llvm.i64 + %186 = llvm.add %183, %185 : !llvm.i64 + %187 = llvm.mlir.constant(1 : index) : !llvm.i64 + %188 = llvm.mul %59, %187 : !llvm.i64 + %189 = llvm.add %186, %188 : !llvm.i64 + %190 = llvm.getelementptr %182[%189] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %191 = llvm.load %190 : !llvm.ptr + %192 = llvm.extractvalue %15[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %193 = llvm.mlir.constant(0 : index) : !llvm.i64 + %194 = llvm.mlir.constant(512 : index) : !llvm.i64 + %195 = llvm.mul %59, %194 : !llvm.i64 + %196 = llvm.add %193, %195 : !llvm.i64 + %197 = llvm.mlir.constant(1 : index) : !llvm.i64 + %198 = llvm.mul %181, %197 : !llvm.i64 + %199 = llvm.add %196, %198 : !llvm.i64 + %200 = llvm.getelementptr %192[%199] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %201 = llvm.load %200 : !llvm.ptr + %202 = llvm.fmul %191, %201 {RelaxedPrecision} : !llvm.float + %203 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %204 = llvm.mlir.constant(0 : index) : !llvm.i64 + %205 = llvm.mlir.constant(512 : index) : !llvm.i64 + %206 = llvm.mul %50, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.mlir.constant(1 : index) : !llvm.i64 + %209 = llvm.mul %181, %208 : !llvm.i64 + %210 = llvm.add %207, %209 : !llvm.i64 + %211 = llvm.getelementptr %203[%210] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %212 = llvm.load %211 : !llvm.ptr + %213 = llvm.fadd %212, %202 {RelaxedPrecision} : !llvm.float + %214 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %215 = llvm.mlir.constant(0 : index) : !llvm.i64 + %216 = llvm.mlir.constant(512 : index) : !llvm.i64 + %217 = llvm.mul %50, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.mlir.constant(1 : index) : !llvm.i64 + %220 = llvm.mul %181, %219 : !llvm.i64 + %221 = llvm.add %218, %220 : !llvm.i64 + %222 = llvm.getelementptr %214[%221] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %213, %222 : !llvm.ptr + %223 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %224 = llvm.mlir.constant(0 : index) : !llvm.i64 + %225 = llvm.mlir.constant(512 : index) : !llvm.i64 + %226 = llvm.mul %50, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.mlir.constant(1 : index) : !llvm.i64 + %229 = llvm.mul %181, %228 : !llvm.i64 + %230 = llvm.add %227, %229 : !llvm.i64 + %231 = llvm.getelementptr %223[%230] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %232 = llvm.load %231 : !llvm.ptr + %233 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %234 = llvm.mlir.constant(0 : index) : !llvm.i64 + %235 = llvm.mlir.constant(512 : index) : !llvm.i64 + %236 = llvm.mul %50, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %239 = llvm.mul %181, %238 : !llvm.i64 + %240 = llvm.add %237, %239 : !llvm.i64 + %241 = llvm.getelementptr %233[%240] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %232, %241 : !llvm.ptr + %242 = llvm.add %58, %35 : !llvm.i64 + %243 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %244 = llvm.mlir.constant(0 : index) : !llvm.i64 + %245 = llvm.mlir.constant(128 : index) : !llvm.i64 + %246 = llvm.mul %50, %245 : !llvm.i64 + %247 = llvm.add %244, %246 : !llvm.i64 + %248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %249 = llvm.mul %59, %248 : !llvm.i64 + %250 = llvm.add %247, %249 : !llvm.i64 + %251 = llvm.getelementptr %243[%250] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %252 = llvm.load %251 : !llvm.ptr + %253 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = llvm.mlir.constant(512 : index) : !llvm.i64 + %256 = llvm.mul %59, %255 : !llvm.i64 + %257 = 
llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = llvm.mul %242, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %262 = llvm.load %261 : !llvm.ptr + %263 = llvm.fmul %252, %262 {RelaxedPrecision} : !llvm.float + %264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %267 = llvm.mul %50, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %270 = llvm.mul %242, %269 : !llvm.i64 + %271 = llvm.add %268, %270 : !llvm.i64 + %272 = llvm.getelementptr %264[%271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %273 = llvm.load %272 : !llvm.ptr + %274 = llvm.fadd %273, %263 {RelaxedPrecision} : !llvm.float + %275 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %276 = llvm.mlir.constant(0 : index) : !llvm.i64 + %277 = llvm.mlir.constant(512 : index) : !llvm.i64 + %278 = llvm.mul %50, %277 : !llvm.i64 + %279 = llvm.add %276, %278 : !llvm.i64 + %280 = llvm.mlir.constant(1 : index) : !llvm.i64 + %281 = llvm.mul %242, %280 : !llvm.i64 + %282 = llvm.add %279, %281 : !llvm.i64 + %283 = llvm.getelementptr %275[%282] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %274, %283 : !llvm.ptr + %284 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %50, %286 : !llvm.i64 + %288 = llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %242, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.load %292 : !llvm.ptr + %294 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %295 = llvm.mlir.constant(0 : index) : !llvm.i64 + %296 = llvm.mlir.constant(512 : index) : !llvm.i64 + %297 = llvm.mul %50, %296 : !llvm.i64 + %298 = llvm.add %295, %297 : !llvm.i64 + %299 = llvm.mlir.constant(1 : index) : !llvm.i64 + %300 = llvm.mul %242, %299 : !llvm.i64 + %301 = llvm.add %298, %300 : !llvm.i64 + %302 = llvm.getelementptr %294[%301] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %293, %302 : !llvm.ptr + %303 = llvm.add %58, %36 : !llvm.i64 + %304 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %305 = llvm.mlir.constant(0 : index) : !llvm.i64 + %306 = llvm.mlir.constant(128 : index) : !llvm.i64 + %307 = llvm.mul %50, %306 : !llvm.i64 + %308 = llvm.add %305, %307 : !llvm.i64 + %309 = llvm.mlir.constant(1 : index) : !llvm.i64 + %310 = llvm.mul %59, %309 : !llvm.i64 + %311 = llvm.add %308, %310 : !llvm.i64 + %312 = llvm.getelementptr %304[%311] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %313 = llvm.load %312 : !llvm.ptr + %314 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %316 = llvm.mlir.constant(512 : index) : !llvm.i64 + %317 = llvm.mul %59, %316 : !llvm.i64 + %318 = llvm.add %315, %317 : !llvm.i64 + %319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %320 = llvm.mul %303, %319 : !llvm.i64 + %321 = llvm.add %318, %320 : !llvm.i64 + %322 = llvm.getelementptr %314[%321] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %323 = llvm.load %322 : !llvm.ptr + %324 = llvm.fmul %313, %323 {RelaxedPrecision} : !llvm.float + %325 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %326 = llvm.mlir.constant(0 : index) : !llvm.i64 + %327 = llvm.mlir.constant(512 : index) : !llvm.i64 + %328 = llvm.mul %50, %327 : !llvm.i64 + %329 = llvm.add %326, %328 : !llvm.i64 + %330 = llvm.mlir.constant(1 : index) : !llvm.i64 + %331 = llvm.mul %303, %330 : !llvm.i64 + %332 = llvm.add %329, %331 : !llvm.i64 + %333 = llvm.getelementptr %325[%332] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %334 = llvm.load %333 : !llvm.ptr + %335 = llvm.fadd %334, %324 {RelaxedPrecision} : !llvm.float + %336 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %337 = llvm.mlir.constant(0 : index) : !llvm.i64 + %338 = llvm.mlir.constant(512 : index) : !llvm.i64 + %339 = llvm.mul %50, %338 : !llvm.i64 + %340 = llvm.add %337, %339 : !llvm.i64 + %341 = llvm.mlir.constant(1 : index) : !llvm.i64 + %342 = llvm.mul %303, %341 : !llvm.i64 + %343 = llvm.add %340, %342 : !llvm.i64 + %344 = llvm.getelementptr %336[%343] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %335, %344 : !llvm.ptr + %345 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %346 = llvm.mlir.constant(0 : index) : !llvm.i64 + %347 = llvm.mlir.constant(512 : index) : !llvm.i64 + %348 = llvm.mul %50, %347 : !llvm.i64 + %349 = llvm.add %346, %348 : !llvm.i64 + %350 = llvm.mlir.constant(1 : index) : !llvm.i64 + %351 = llvm.mul %303, %350 : !llvm.i64 + %352 = llvm.add %349, %351 : !llvm.i64 + %353 = llvm.getelementptr %345[%352] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %354 = llvm.load %353 : !llvm.ptr + %355 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %356 = llvm.mlir.constant(0 : index) : !llvm.i64 + %357 = llvm.mlir.constant(512 : index) : !llvm.i64 + %358 = llvm.mul %50, %357 : !llvm.i64 + %359 = llvm.add %356, %358 : !llvm.i64 + %360 = llvm.mlir.constant(1 : index) : !llvm.i64 + %361 = llvm.mul %303, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.getelementptr %355[%362] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %354, %363 : !llvm.ptr + %364 = llvm.add %58, %37 : !llvm.i64 + %365 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %366 = llvm.mlir.constant(0 : index) : !llvm.i64 + %367 = llvm.mlir.constant(128 : index) : !llvm.i64 + %368 = llvm.mul %50, %367 : !llvm.i64 + %369 = llvm.add %366, %368 : !llvm.i64 + %370 = llvm.mlir.constant(1 : index) : !llvm.i64 + %371 = llvm.mul %59, %370 : !llvm.i64 + %372 = llvm.add %369, %371 : !llvm.i64 + %373 = llvm.getelementptr %365[%372] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %374 = llvm.load %373 : !llvm.ptr + %375 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %376 = llvm.mlir.constant(0 : index) : !llvm.i64 + %377 = llvm.mlir.constant(512 : index) : !llvm.i64 + %378 = llvm.mul %59, %377 : !llvm.i64 + %379 = llvm.add %376, %378 : !llvm.i64 + %380 = llvm.mlir.constant(1 : index) : !llvm.i64 + %381 = llvm.mul %364, %380 : !llvm.i64 + %382 = llvm.add %379, %381 : !llvm.i64 + %383 = llvm.getelementptr %375[%382] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %384 = llvm.load %383 : !llvm.ptr + %385 = llvm.fmul %374, %384 {RelaxedPrecision} : !llvm.float + %386 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x 
i64>)> + %387 = llvm.mlir.constant(0 : index) : !llvm.i64 + %388 = llvm.mlir.constant(512 : index) : !llvm.i64 + %389 = llvm.mul %50, %388 : !llvm.i64 + %390 = llvm.add %387, %389 : !llvm.i64 + %391 = llvm.mlir.constant(1 : index) : !llvm.i64 + %392 = llvm.mul %364, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.getelementptr %386[%393] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %395 = llvm.load %394 : !llvm.ptr + %396 = llvm.fadd %395, %385 {RelaxedPrecision} : !llvm.float + %397 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %398 = llvm.mlir.constant(0 : index) : !llvm.i64 + %399 = llvm.mlir.constant(512 : index) : !llvm.i64 + %400 = llvm.mul %50, %399 : !llvm.i64 + %401 = llvm.add %398, %400 : !llvm.i64 + %402 = llvm.mlir.constant(1 : index) : !llvm.i64 + %403 = llvm.mul %364, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.getelementptr %397[%404] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %396, %405 : !llvm.ptr + %406 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %408 = llvm.mlir.constant(512 : index) : !llvm.i64 + %409 = llvm.mul %50, %408 : !llvm.i64 + %410 = llvm.add %407, %409 : !llvm.i64 + %411 = llvm.mlir.constant(1 : index) : !llvm.i64 + %412 = llvm.mul %364, %411 : !llvm.i64 + %413 = llvm.add %410, %412 : !llvm.i64 + %414 = llvm.getelementptr %406[%413] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %415 = llvm.load %414 : !llvm.ptr + %416 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %417 = llvm.mlir.constant(0 : index) : !llvm.i64 + %418 = llvm.mlir.constant(512 : index) : !llvm.i64 + %419 = llvm.mul %50, %418 : !llvm.i64 + %420 = llvm.add %417, %419 : !llvm.i64 + %421 = llvm.mlir.constant(1 : index) : !llvm.i64 + %422 = llvm.mul %364, %421 : !llvm.i64 + %423 = llvm.add %420, %422 : !llvm.i64 + %424 = llvm.getelementptr %416[%423] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %415, %424 : !llvm.ptr + %425 = llvm.add %58, %38 : !llvm.i64 + %426 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %427 = llvm.mlir.constant(0 : index) : !llvm.i64 + %428 = llvm.mlir.constant(128 : index) : !llvm.i64 + %429 = llvm.mul %50, %428 : !llvm.i64 + %430 = llvm.add %427, %429 : !llvm.i64 + %431 = llvm.mlir.constant(1 : index) : !llvm.i64 + %432 = llvm.mul %59, %431 : !llvm.i64 + %433 = llvm.add %430, %432 : !llvm.i64 + %434 = llvm.getelementptr %426[%433] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %435 = llvm.load %434 : !llvm.ptr + %436 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %437 = llvm.mlir.constant(0 : index) : !llvm.i64 + %438 = llvm.mlir.constant(512 : index) : !llvm.i64 + %439 = llvm.mul %59, %438 : !llvm.i64 + %440 = llvm.add %437, %439 : !llvm.i64 + %441 = llvm.mlir.constant(1 : index) : !llvm.i64 + %442 = llvm.mul %425, %441 : !llvm.i64 + %443 = llvm.add %440, %442 : !llvm.i64 + %444 = llvm.getelementptr %436[%443] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %445 = llvm.load %444 : !llvm.ptr + %446 = llvm.fmul %435, %445 {RelaxedPrecision} : !llvm.float + %447 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %449 = llvm.mlir.constant(512 : index) : !llvm.i64 + %450 = llvm.mul %50, %449 : !llvm.i64 + %451 = llvm.add %448, %450 : !llvm.i64 + %452 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %453 = llvm.mul %425, %452 : !llvm.i64 + %454 = llvm.add %451, %453 : !llvm.i64 + %455 = llvm.getelementptr %447[%454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %456 = llvm.load %455 : !llvm.ptr + %457 = llvm.fadd %456, %446 {RelaxedPrecision} : !llvm.float + %458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %461 = llvm.mul %50, %460 : !llvm.i64 + %462 = llvm.add %459, %461 : !llvm.i64 + %463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %464 = llvm.mul %425, %463 : !llvm.i64 + %465 = llvm.add %462, %464 : !llvm.i64 + %466 = llvm.getelementptr %458[%465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %457, %466 : !llvm.ptr + %467 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %468 = llvm.mlir.constant(0 : index) : !llvm.i64 + %469 = llvm.mlir.constant(512 : index) : !llvm.i64 + %470 = llvm.mul %50, %469 : !llvm.i64 + %471 = llvm.add %468, %470 : !llvm.i64 + %472 = llvm.mlir.constant(1 : index) : !llvm.i64 + %473 = llvm.mul %425, %472 : !llvm.i64 + %474 = llvm.add %471, %473 : !llvm.i64 + %475 = llvm.getelementptr %467[%474] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %476 = llvm.load %475 : !llvm.ptr + %477 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %480 = llvm.mul %50, %479 : !llvm.i64 + %481 = llvm.add %478, %480 : !llvm.i64 + %482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %483 = llvm.mul %425, %482 : !llvm.i64 + %484 = llvm.add %481, %483 : !llvm.i64 + %485 = llvm.getelementptr %477[%484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %476, %485 : !llvm.ptr + %486 = llvm.add %58, %39 : !llvm.i64 + %487 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %489 = llvm.mlir.constant(128 : index) : !llvm.i64 + %490 = llvm.mul %50, %489 : !llvm.i64 + %491 = llvm.add %488, %490 : !llvm.i64 + %492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %493 = llvm.mul %59, %492 : !llvm.i64 + %494 = llvm.add %491, %493 : !llvm.i64 + %495 = llvm.getelementptr %487[%494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %496 = llvm.load %495 : !llvm.ptr + %497 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %500 = llvm.mul %59, %499 : !llvm.i64 + %501 = llvm.add %498, %500 : !llvm.i64 + %502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %503 = llvm.mul %486, %502 : !llvm.i64 + %504 = llvm.add %501, %503 : !llvm.i64 + %505 = llvm.getelementptr %497[%504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %506 = llvm.load %505 : !llvm.ptr + %507 = llvm.fmul %496, %506 {RelaxedPrecision} : !llvm.float + %508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %511 = llvm.mul %50, %510 : !llvm.i64 + %512 = llvm.add %509, %511 : !llvm.i64 + %513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %514 = llvm.mul %486, %513 : !llvm.i64 + %515 = llvm.add %512, %514 : !llvm.i64 + %516 = llvm.getelementptr %508[%515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %517 = 
llvm.load %516 : !llvm.ptr + %518 = llvm.fadd %517, %507 {RelaxedPrecision} : !llvm.float + %519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %522 = llvm.mul %50, %521 : !llvm.i64 + %523 = llvm.add %520, %522 : !llvm.i64 + %524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %525 = llvm.mul %486, %524 : !llvm.i64 + %526 = llvm.add %523, %525 : !llvm.i64 + %527 = llvm.getelementptr %519[%526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %518, %527 : !llvm.ptr + %528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %531 = llvm.mul %50, %530 : !llvm.i64 + %532 = llvm.add %529, %531 : !llvm.i64 + %533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %534 = llvm.mul %486, %533 : !llvm.i64 + %535 = llvm.add %532, %534 : !llvm.i64 + %536 = llvm.getelementptr %528[%535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %537 = llvm.load %536 : !llvm.ptr + %538 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %539 = llvm.mlir.constant(0 : index) : !llvm.i64 + %540 = llvm.mlir.constant(512 : index) : !llvm.i64 + %541 = llvm.mul %50, %540 : !llvm.i64 + %542 = llvm.add %539, %541 : !llvm.i64 + %543 = llvm.mlir.constant(1 : index) : !llvm.i64 + %544 = llvm.mul %486, %543 : !llvm.i64 + %545 = llvm.add %542, %544 : !llvm.i64 + %546 = llvm.getelementptr %538[%545] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %537, %546 : !llvm.ptr + %547 = llvm.add %58, %40 : !llvm.i64 + %548 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %550 = llvm.mlir.constant(128 : index) : !llvm.i64 + %551 = llvm.mul %50, %550 : !llvm.i64 + %552 = llvm.add %549, %551 : !llvm.i64 + %553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %554 = llvm.mul %59, %553 : !llvm.i64 + %555 = llvm.add %552, %554 : !llvm.i64 + %556 = llvm.getelementptr %548[%555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %557 = llvm.load %556 : !llvm.ptr + %558 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %561 = llvm.mul %59, %560 : !llvm.i64 + %562 = llvm.add %559, %561 : !llvm.i64 + %563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %564 = llvm.mul %547, %563 : !llvm.i64 + %565 = llvm.add %562, %564 : !llvm.i64 + %566 = llvm.getelementptr %558[%565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %567 = llvm.load %566 : !llvm.ptr + %568 = llvm.fmul %557, %567 {RelaxedPrecision} : !llvm.float + %569 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %570 = llvm.mlir.constant(0 : index) : !llvm.i64 + %571 = llvm.mlir.constant(512 : index) : !llvm.i64 + %572 = llvm.mul %50, %571 : !llvm.i64 + %573 = llvm.add %570, %572 : !llvm.i64 + %574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %575 = llvm.mul %547, %574 : !llvm.i64 + %576 = llvm.add %573, %575 : !llvm.i64 + %577 = llvm.getelementptr %569[%576] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %578 = llvm.load %577 : !llvm.ptr + %579 = llvm.fadd %578, %568 {RelaxedPrecision} : !llvm.float + %580 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %581 = llvm.mlir.constant(0 
: index) : !llvm.i64 + %582 = llvm.mlir.constant(512 : index) : !llvm.i64 + %583 = llvm.mul %50, %582 : !llvm.i64 + %584 = llvm.add %581, %583 : !llvm.i64 + %585 = llvm.mlir.constant(1 : index) : !llvm.i64 + %586 = llvm.mul %547, %585 : !llvm.i64 + %587 = llvm.add %584, %586 : !llvm.i64 + %588 = llvm.getelementptr %580[%587] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %579, %588 : !llvm.ptr + %589 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %591 = llvm.mlir.constant(512 : index) : !llvm.i64 + %592 = llvm.mul %50, %591 : !llvm.i64 + %593 = llvm.add %590, %592 : !llvm.i64 + %594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %595 = llvm.mul %547, %594 : !llvm.i64 + %596 = llvm.add %593, %595 : !llvm.i64 + %597 = llvm.getelementptr %589[%596] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %598 = llvm.load %597 : !llvm.ptr + %599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %602 = llvm.mul %50, %601 : !llvm.i64 + %603 = llvm.add %600, %602 : !llvm.i64 + %604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %605 = llvm.mul %547, %604 : !llvm.i64 + %606 = llvm.add %603, %605 : !llvm.i64 + %607 = llvm.getelementptr %599[%606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %598, %607 : !llvm.ptr + %608 = llvm.add %58, %41 : !llvm.i64 + %609 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %610 = llvm.mlir.constant(0 : index) : !llvm.i64 + %611 = llvm.mlir.constant(128 : index) : !llvm.i64 + %612 = llvm.mul %50, %611 : !llvm.i64 + %613 = llvm.add %610, %612 : !llvm.i64 + %614 = llvm.mlir.constant(1 : index) : !llvm.i64 + %615 = llvm.mul %59, %614 : !llvm.i64 + %616 = llvm.add %613, %615 : !llvm.i64 + %617 = llvm.getelementptr %609[%616] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %618 = llvm.load %617 : !llvm.ptr + %619 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %620 = llvm.mlir.constant(0 : index) : !llvm.i64 + %621 = llvm.mlir.constant(512 : index) : !llvm.i64 + %622 = llvm.mul %59, %621 : !llvm.i64 + %623 = llvm.add %620, %622 : !llvm.i64 + %624 = llvm.mlir.constant(1 : index) : !llvm.i64 + %625 = llvm.mul %608, %624 : !llvm.i64 + %626 = llvm.add %623, %625 : !llvm.i64 + %627 = llvm.getelementptr %619[%626] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %628 = llvm.load %627 : !llvm.ptr + %629 = llvm.fmul %618, %628 {RelaxedPrecision} : !llvm.float + %630 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %632 = llvm.mlir.constant(512 : index) : !llvm.i64 + %633 = llvm.mul %50, %632 : !llvm.i64 + %634 = llvm.add %631, %633 : !llvm.i64 + %635 = llvm.mlir.constant(1 : index) : !llvm.i64 + %636 = llvm.mul %608, %635 : !llvm.i64 + %637 = llvm.add %634, %636 : !llvm.i64 + %638 = llvm.getelementptr %630[%637] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %639 = llvm.load %638 : !llvm.ptr + %640 = llvm.fadd %639, %629 {RelaxedPrecision} : !llvm.float + %641 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %642 = llvm.mlir.constant(0 : index) : !llvm.i64 + %643 = llvm.mlir.constant(512 : index) : !llvm.i64 + %644 = llvm.mul %50, %643 : !llvm.i64 + %645 = llvm.add %642, %644 : !llvm.i64 + %646 = llvm.mlir.constant(1 : index) : !llvm.i64 + %647 = 
llvm.mul %608, %646 : !llvm.i64 + %648 = llvm.add %645, %647 : !llvm.i64 + %649 = llvm.getelementptr %641[%648] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %640, %649 : !llvm.ptr + %650 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %651 = llvm.mlir.constant(0 : index) : !llvm.i64 + %652 = llvm.mlir.constant(512 : index) : !llvm.i64 + %653 = llvm.mul %50, %652 : !llvm.i64 + %654 = llvm.add %651, %653 : !llvm.i64 + %655 = llvm.mlir.constant(1 : index) : !llvm.i64 + %656 = llvm.mul %608, %655 : !llvm.i64 + %657 = llvm.add %654, %656 : !llvm.i64 + %658 = llvm.getelementptr %650[%657] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %659 = llvm.load %658 : !llvm.ptr + %660 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %662 = llvm.mlir.constant(512 : index) : !llvm.i64 + %663 = llvm.mul %50, %662 : !llvm.i64 + %664 = llvm.add %661, %663 : !llvm.i64 + %665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %666 = llvm.mul %608, %665 : !llvm.i64 + %667 = llvm.add %664, %666 : !llvm.i64 + %668 = llvm.getelementptr %660[%667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %659, %668 : !llvm.ptr + %669 = llvm.add %58, %42 : !llvm.i64 + %670 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %672 = llvm.mlir.constant(128 : index) : !llvm.i64 + %673 = llvm.mul %50, %672 : !llvm.i64 + %674 = llvm.add %671, %673 : !llvm.i64 + %675 = llvm.mlir.constant(1 : index) : !llvm.i64 + %676 = llvm.mul %59, %675 : !llvm.i64 + %677 = llvm.add %674, %676 : !llvm.i64 + %678 = llvm.getelementptr %670[%677] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %679 = llvm.load %678 : !llvm.ptr + %680 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %681 = llvm.mlir.constant(0 : index) : !llvm.i64 + %682 = llvm.mlir.constant(512 : index) : !llvm.i64 + %683 = llvm.mul %59, %682 : !llvm.i64 + %684 = llvm.add %681, %683 : !llvm.i64 + %685 = llvm.mlir.constant(1 : index) : !llvm.i64 + %686 = llvm.mul %669, %685 : !llvm.i64 + %687 = llvm.add %684, %686 : !llvm.i64 + %688 = llvm.getelementptr %680[%687] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %689 = llvm.load %688 : !llvm.ptr + %690 = llvm.fmul %679, %689 {RelaxedPrecision} : !llvm.float + %691 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %692 = llvm.mlir.constant(0 : index) : !llvm.i64 + %693 = llvm.mlir.constant(512 : index) : !llvm.i64 + %694 = llvm.mul %50, %693 : !llvm.i64 + %695 = llvm.add %692, %694 : !llvm.i64 + %696 = llvm.mlir.constant(1 : index) : !llvm.i64 + %697 = llvm.mul %669, %696 : !llvm.i64 + %698 = llvm.add %695, %697 : !llvm.i64 + %699 = llvm.getelementptr %691[%698] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %700 = llvm.load %699 : !llvm.ptr + %701 = llvm.fadd %700, %690 {RelaxedPrecision} : !llvm.float + %702 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %703 = llvm.mlir.constant(0 : index) : !llvm.i64 + %704 = llvm.mlir.constant(512 : index) : !llvm.i64 + %705 = llvm.mul %50, %704 : !llvm.i64 + %706 = llvm.add %703, %705 : !llvm.i64 + %707 = llvm.mlir.constant(1 : index) : !llvm.i64 + %708 = llvm.mul %669, %707 : !llvm.i64 + %709 = llvm.add %706, %708 : !llvm.i64 + %710 = llvm.getelementptr %702[%709] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %701, %710 : !llvm.ptr + %711 = llvm.extractvalue 
%23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %712 = llvm.mlir.constant(0 : index) : !llvm.i64 + %713 = llvm.mlir.constant(512 : index) : !llvm.i64 + %714 = llvm.mul %50, %713 : !llvm.i64 + %715 = llvm.add %712, %714 : !llvm.i64 + %716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %717 = llvm.mul %669, %716 : !llvm.i64 + %718 = llvm.add %715, %717 : !llvm.i64 + %719 = llvm.getelementptr %711[%718] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %720 = llvm.load %719 : !llvm.ptr + %721 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %722 = llvm.mlir.constant(0 : index) : !llvm.i64 + %723 = llvm.mlir.constant(512 : index) : !llvm.i64 + %724 = llvm.mul %50, %723 : !llvm.i64 + %725 = llvm.add %722, %724 : !llvm.i64 + %726 = llvm.mlir.constant(1 : index) : !llvm.i64 + %727 = llvm.mul %669, %726 : !llvm.i64 + %728 = llvm.add %725, %727 : !llvm.i64 + %729 = llvm.getelementptr %721[%728] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %720, %729 : !llvm.ptr + %730 = llvm.add %58, %43 : !llvm.i64 + %731 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %732 = llvm.mlir.constant(0 : index) : !llvm.i64 + %733 = llvm.mlir.constant(128 : index) : !llvm.i64 + %734 = llvm.mul %50, %733 : !llvm.i64 + %735 = llvm.add %732, %734 : !llvm.i64 + %736 = llvm.mlir.constant(1 : index) : !llvm.i64 + %737 = llvm.mul %59, %736 : !llvm.i64 + %738 = llvm.add %735, %737 : !llvm.i64 + %739 = llvm.getelementptr %731[%738] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %740 = llvm.load %739 : !llvm.ptr + %741 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %742 = llvm.mlir.constant(0 : index) : !llvm.i64 + %743 = llvm.mlir.constant(512 : index) : !llvm.i64 + %744 = llvm.mul %59, %743 : !llvm.i64 + %745 = llvm.add %742, %744 : !llvm.i64 + %746 = llvm.mlir.constant(1 : index) : !llvm.i64 + %747 = llvm.mul %730, %746 : !llvm.i64 + %748 = llvm.add %745, %747 : !llvm.i64 + %749 = llvm.getelementptr %741[%748] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %750 = llvm.load %749 : !llvm.ptr + %751 = llvm.fmul %740, %750 {RelaxedPrecision} : !llvm.float + %752 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %754 = llvm.mlir.constant(512 : index) : !llvm.i64 + %755 = llvm.mul %50, %754 : !llvm.i64 + %756 = llvm.add %753, %755 : !llvm.i64 + %757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %758 = llvm.mul %730, %757 : !llvm.i64 + %759 = llvm.add %756, %758 : !llvm.i64 + %760 = llvm.getelementptr %752[%759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %761 = llvm.load %760 : !llvm.ptr + %762 = llvm.fadd %761, %751 {RelaxedPrecision} : !llvm.float + %763 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %764 = llvm.mlir.constant(0 : index) : !llvm.i64 + %765 = llvm.mlir.constant(512 : index) : !llvm.i64 + %766 = llvm.mul %50, %765 : !llvm.i64 + %767 = llvm.add %764, %766 : !llvm.i64 + %768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %769 = llvm.mul %730, %768 : !llvm.i64 + %770 = llvm.add %767, %769 : !llvm.i64 + %771 = llvm.getelementptr %763[%770] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %762, %771 : !llvm.ptr + %772 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %773 = llvm.mlir.constant(0 : index) : !llvm.i64 + %774 = llvm.mlir.constant(512 : index) : !llvm.i64 + %775 = llvm.mul %50, %774 : !llvm.i64 + 
%776 = llvm.add %773, %775 : !llvm.i64 + %777 = llvm.mlir.constant(1 : index) : !llvm.i64 + %778 = llvm.mul %730, %777 : !llvm.i64 + %779 = llvm.add %776, %778 : !llvm.i64 + %780 = llvm.getelementptr %772[%779] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %781 = llvm.load %780 : !llvm.ptr + %782 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %784 = llvm.mlir.constant(512 : index) : !llvm.i64 + %785 = llvm.mul %50, %784 : !llvm.i64 + %786 = llvm.add %783, %785 : !llvm.i64 + %787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %788 = llvm.mul %730, %787 : !llvm.i64 + %789 = llvm.add %786, %788 : !llvm.i64 + %790 = llvm.getelementptr %782[%789] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %781, %790 : !llvm.ptr + %791 = llvm.add %58, %44 : !llvm.i64 + %792 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %794 = llvm.mlir.constant(128 : index) : !llvm.i64 + %795 = llvm.mul %50, %794 : !llvm.i64 + %796 = llvm.add %793, %795 : !llvm.i64 + %797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %798 = llvm.mul %59, %797 : !llvm.i64 + %799 = llvm.add %796, %798 : !llvm.i64 + %800 = llvm.getelementptr %792[%799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %801 = llvm.load %800 : !llvm.ptr + %802 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %803 = llvm.mlir.constant(0 : index) : !llvm.i64 + %804 = llvm.mlir.constant(512 : index) : !llvm.i64 + %805 = llvm.mul %59, %804 : !llvm.i64 + %806 = llvm.add %803, %805 : !llvm.i64 + %807 = llvm.mlir.constant(1 : index) : !llvm.i64 + %808 = llvm.mul %791, %807 : !llvm.i64 + %809 = llvm.add %806, %808 : !llvm.i64 + %810 = llvm.getelementptr %802[%809] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %811 = llvm.load %810 : !llvm.ptr + %812 = llvm.fmul %801, %811 {RelaxedPrecision} : !llvm.float + %813 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %814 = llvm.mlir.constant(0 : index) : !llvm.i64 + %815 = llvm.mlir.constant(512 : index) : !llvm.i64 + %816 = llvm.mul %50, %815 : !llvm.i64 + %817 = llvm.add %814, %816 : !llvm.i64 + %818 = llvm.mlir.constant(1 : index) : !llvm.i64 + %819 = llvm.mul %791, %818 : !llvm.i64 + %820 = llvm.add %817, %819 : !llvm.i64 + %821 = llvm.getelementptr %813[%820] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %822 = llvm.load %821 : !llvm.ptr + %823 = llvm.fadd %822, %812 {RelaxedPrecision} : !llvm.float + %824 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %826 = llvm.mlir.constant(512 : index) : !llvm.i64 + %827 = llvm.mul %50, %826 : !llvm.i64 + %828 = llvm.add %825, %827 : !llvm.i64 + %829 = llvm.mlir.constant(1 : index) : !llvm.i64 + %830 = llvm.mul %791, %829 : !llvm.i64 + %831 = llvm.add %828, %830 : !llvm.i64 + %832 = llvm.getelementptr %824[%831] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %823, %832 : !llvm.ptr + %833 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %834 = llvm.mlir.constant(0 : index) : !llvm.i64 + %835 = llvm.mlir.constant(512 : index) : !llvm.i64 + %836 = llvm.mul %50, %835 : !llvm.i64 + %837 = llvm.add %834, %836 : !llvm.i64 + %838 = llvm.mlir.constant(1 : index) : !llvm.i64 + %839 = llvm.mul %791, %838 : !llvm.i64 + %840 = llvm.add %837, %839 : !llvm.i64 + %841 = llvm.getelementptr %833[%840] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %842 = llvm.load %841 : !llvm.ptr + %843 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %844 = llvm.mlir.constant(0 : index) : !llvm.i64 + %845 = llvm.mlir.constant(512 : index) : !llvm.i64 + %846 = llvm.mul %50, %845 : !llvm.i64 + %847 = llvm.add %844, %846 : !llvm.i64 + %848 = llvm.mlir.constant(1 : index) : !llvm.i64 + %849 = llvm.mul %791, %848 : !llvm.i64 + %850 = llvm.add %847, %849 : !llvm.i64 + %851 = llvm.getelementptr %843[%850] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %842, %851 : !llvm.ptr + %852 = llvm.add %58, %45 : !llvm.i64 + %853 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %854 = llvm.mlir.constant(0 : index) : !llvm.i64 + %855 = llvm.mlir.constant(128 : index) : !llvm.i64 + %856 = llvm.mul %50, %855 : !llvm.i64 + %857 = llvm.add %854, %856 : !llvm.i64 + %858 = llvm.mlir.constant(1 : index) : !llvm.i64 + %859 = llvm.mul %59, %858 : !llvm.i64 + %860 = llvm.add %857, %859 : !llvm.i64 + %861 = llvm.getelementptr %853[%860] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %862 = llvm.load %861 : !llvm.ptr + %863 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %864 = llvm.mlir.constant(0 : index) : !llvm.i64 + %865 = llvm.mlir.constant(512 : index) : !llvm.i64 + %866 = llvm.mul %59, %865 : !llvm.i64 + %867 = llvm.add %864, %866 : !llvm.i64 + %868 = llvm.mlir.constant(1 : index) : !llvm.i64 + %869 = llvm.mul %852, %868 : !llvm.i64 + %870 = llvm.add %867, %869 : !llvm.i64 + %871 = llvm.getelementptr %863[%870] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %872 = llvm.load %871 : !llvm.ptr + %873 = llvm.fmul %862, %872 {RelaxedPrecision} : !llvm.float + %874 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %875 = llvm.mlir.constant(0 : index) : !llvm.i64 + %876 = llvm.mlir.constant(512 : index) : !llvm.i64 + %877 = llvm.mul %50, %876 : !llvm.i64 + %878 = llvm.add %875, %877 : !llvm.i64 + %879 = llvm.mlir.constant(1 : index) : !llvm.i64 + %880 = llvm.mul %852, %879 : !llvm.i64 + %881 = llvm.add %878, %880 : !llvm.i64 + %882 = llvm.getelementptr %874[%881] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %883 = llvm.load %882 : !llvm.ptr + %884 = llvm.fadd %883, %873 {RelaxedPrecision} : !llvm.float + %885 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %886 = llvm.mlir.constant(0 : index) : !llvm.i64 + %887 = llvm.mlir.constant(512 : index) : !llvm.i64 + %888 = llvm.mul %50, %887 : !llvm.i64 + %889 = llvm.add %886, %888 : !llvm.i64 + %890 = llvm.mlir.constant(1 : index) : !llvm.i64 + %891 = llvm.mul %852, %890 : !llvm.i64 + %892 = llvm.add %889, %891 : !llvm.i64 + %893 = llvm.getelementptr %885[%892] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %884, %893 : !llvm.ptr + %894 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %895 = llvm.mlir.constant(0 : index) : !llvm.i64 + %896 = llvm.mlir.constant(512 : index) : !llvm.i64 + %897 = llvm.mul %50, %896 : !llvm.i64 + %898 = llvm.add %895, %897 : !llvm.i64 + %899 = llvm.mlir.constant(1 : index) : !llvm.i64 + %900 = llvm.mul %852, %899 : !llvm.i64 + %901 = llvm.add %898, %900 : !llvm.i64 + %902 = llvm.getelementptr %894[%901] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %903 = llvm.load %902 : !llvm.ptr + %904 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %905 = llvm.mlir.constant(0 : index) : 
!llvm.i64 + %906 = llvm.mlir.constant(512 : index) : !llvm.i64 + %907 = llvm.mul %50, %906 : !llvm.i64 + %908 = llvm.add %905, %907 : !llvm.i64 + %909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %910 = llvm.mul %852, %909 : !llvm.i64 + %911 = llvm.add %908, %910 : !llvm.i64 + %912 = llvm.getelementptr %904[%911] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %903, %912 : !llvm.ptr + %913 = llvm.add %58, %46 : !llvm.i64 + %914 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %915 = llvm.mlir.constant(0 : index) : !llvm.i64 + %916 = llvm.mlir.constant(128 : index) : !llvm.i64 + %917 = llvm.mul %50, %916 : !llvm.i64 + %918 = llvm.add %915, %917 : !llvm.i64 + %919 = llvm.mlir.constant(1 : index) : !llvm.i64 + %920 = llvm.mul %59, %919 : !llvm.i64 + %921 = llvm.add %918, %920 : !llvm.i64 + %922 = llvm.getelementptr %914[%921] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %923 = llvm.load %922 : !llvm.ptr + %924 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %925 = llvm.mlir.constant(0 : index) : !llvm.i64 + %926 = llvm.mlir.constant(512 : index) : !llvm.i64 + %927 = llvm.mul %59, %926 : !llvm.i64 + %928 = llvm.add %925, %927 : !llvm.i64 + %929 = llvm.mlir.constant(1 : index) : !llvm.i64 + %930 = llvm.mul %913, %929 : !llvm.i64 + %931 = llvm.add %928, %930 : !llvm.i64 + %932 = llvm.getelementptr %924[%931] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %933 = llvm.load %932 : !llvm.ptr + %934 = llvm.fmul %923, %933 {RelaxedPrecision} : !llvm.float + %935 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %936 = llvm.mlir.constant(0 : index) : !llvm.i64 + %937 = llvm.mlir.constant(512 : index) : !llvm.i64 + %938 = llvm.mul %50, %937 : !llvm.i64 + %939 = llvm.add %936, %938 : !llvm.i64 + %940 = llvm.mlir.constant(1 : index) : !llvm.i64 + %941 = llvm.mul %913, %940 : !llvm.i64 + %942 = llvm.add %939, %941 : !llvm.i64 + %943 = llvm.getelementptr %935[%942] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %944 = llvm.load %943 : !llvm.ptr + %945 = llvm.fadd %944, %934 {RelaxedPrecision} : !llvm.float + %946 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %948 = llvm.mlir.constant(512 : index) : !llvm.i64 + %949 = llvm.mul %50, %948 : !llvm.i64 + %950 = llvm.add %947, %949 : !llvm.i64 + %951 = llvm.mlir.constant(1 : index) : !llvm.i64 + %952 = llvm.mul %913, %951 : !llvm.i64 + %953 = llvm.add %950, %952 : !llvm.i64 + %954 = llvm.getelementptr %946[%953] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %945, %954 : !llvm.ptr + %955 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %956 = llvm.mlir.constant(0 : index) : !llvm.i64 + %957 = llvm.mlir.constant(512 : index) : !llvm.i64 + %958 = llvm.mul %50, %957 : !llvm.i64 + %959 = llvm.add %956, %958 : !llvm.i64 + %960 = llvm.mlir.constant(1 : index) : !llvm.i64 + %961 = llvm.mul %913, %960 : !llvm.i64 + %962 = llvm.add %959, %961 : !llvm.i64 + %963 = llvm.getelementptr %955[%962] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %964 = llvm.load %963 : !llvm.ptr + %965 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %966 = llvm.mlir.constant(0 : index) : !llvm.i64 + %967 = llvm.mlir.constant(512 : index) : !llvm.i64 + %968 = llvm.mul %50, %967 : !llvm.i64 + %969 = llvm.add %966, %968 : !llvm.i64 + %970 = llvm.mlir.constant(1 : index) : !llvm.i64 + %971 = llvm.mul 
%913, %970 : !llvm.i64 + %972 = llvm.add %969, %971 : !llvm.i64 + %973 = llvm.getelementptr %965[%972] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %964, %973 : !llvm.ptr + %974 = llvm.add %58, %47 : !llvm.i64 + %975 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %976 = llvm.mlir.constant(0 : index) : !llvm.i64 + %977 = llvm.mlir.constant(128 : index) : !llvm.i64 + %978 = llvm.mul %50, %977 : !llvm.i64 + %979 = llvm.add %976, %978 : !llvm.i64 + %980 = llvm.mlir.constant(1 : index) : !llvm.i64 + %981 = llvm.mul %59, %980 : !llvm.i64 + %982 = llvm.add %979, %981 : !llvm.i64 + %983 = llvm.getelementptr %975[%982] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %984 = llvm.load %983 : !llvm.ptr + %985 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %986 = llvm.mlir.constant(0 : index) : !llvm.i64 + %987 = llvm.mlir.constant(512 : index) : !llvm.i64 + %988 = llvm.mul %59, %987 : !llvm.i64 + %989 = llvm.add %986, %988 : !llvm.i64 + %990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %991 = llvm.mul %974, %990 : !llvm.i64 + %992 = llvm.add %989, %991 : !llvm.i64 + %993 = llvm.getelementptr %985[%992] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %994 = llvm.load %993 : !llvm.ptr + %995 = llvm.fmul %984, %994 {RelaxedPrecision} : !llvm.float + %996 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %997 = llvm.mlir.constant(0 : index) : !llvm.i64 + %998 = llvm.mlir.constant(512 : index) : !llvm.i64 + %999 = llvm.mul %50, %998 : !llvm.i64 + %1000 = llvm.add %997, %999 : !llvm.i64 + %1001 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1002 = llvm.mul %974, %1001 : !llvm.i64 + %1003 = llvm.add %1000, %1002 : !llvm.i64 + %1004 = llvm.getelementptr %996[%1003] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1005 = llvm.load %1004 : !llvm.ptr + %1006 = llvm.fadd %1005, %995 {RelaxedPrecision} : !llvm.float + %1007 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1009 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1010 = llvm.mul %50, %1009 : !llvm.i64 + %1011 = llvm.add %1008, %1010 : !llvm.i64 + %1012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1013 = llvm.mul %974, %1012 : !llvm.i64 + %1014 = llvm.add %1011, %1013 : !llvm.i64 + %1015 = llvm.getelementptr %1007[%1014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1006, %1015 : !llvm.ptr + %1016 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1017 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1018 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1019 = llvm.mul %50, %1018 : !llvm.i64 + %1020 = llvm.add %1017, %1019 : !llvm.i64 + %1021 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1022 = llvm.mul %974, %1021 : !llvm.i64 + %1023 = llvm.add %1020, %1022 : !llvm.i64 + %1024 = llvm.getelementptr %1016[%1023] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1025 = llvm.load %1024 : !llvm.ptr + %1026 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1027 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1028 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1029 = llvm.mul %50, %1028 : !llvm.i64 + %1030 = llvm.add %1027, %1029 : !llvm.i64 + %1031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1032 = llvm.mul %974, %1031 : !llvm.i64 + %1033 = llvm.add %1030, %1032 : !llvm.i64 + %1034 = llvm.getelementptr %1026[%1033] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + 
llvm.store %1025, %1034 : !llvm.ptr + %1035 = llvm.add %50, %33 : !llvm.i64 + %1036 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1037 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1038 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1039 = llvm.mul %1035, %1038 : !llvm.i64 + %1040 = llvm.add %1037, %1039 : !llvm.i64 + %1041 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1042 = llvm.mul %59, %1041 : !llvm.i64 + %1043 = llvm.add %1040, %1042 : !llvm.i64 + %1044 = llvm.getelementptr %1036[%1043] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1045 = llvm.load %1044 : !llvm.ptr + %1046 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1048 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1049 = llvm.mul %59, %1048 : !llvm.i64 + %1050 = llvm.add %1047, %1049 : !llvm.i64 + %1051 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1052 = llvm.mul %58, %1051 : !llvm.i64 + %1053 = llvm.add %1050, %1052 : !llvm.i64 + %1054 = llvm.getelementptr %1046[%1053] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1055 = llvm.load %1054 : !llvm.ptr + %1056 = llvm.fmul %1045, %1055 {RelaxedPrecision} : !llvm.float + %1057 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1059 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1060 = llvm.mul %1035, %1059 : !llvm.i64 + %1061 = llvm.add %1058, %1060 : !llvm.i64 + %1062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1063 = llvm.mul %58, %1062 : !llvm.i64 + %1064 = llvm.add %1061, %1063 : !llvm.i64 + %1065 = llvm.getelementptr %1057[%1064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1066 = llvm.load %1065 : !llvm.ptr + %1067 = llvm.fadd %1066, %1056 {RelaxedPrecision} : !llvm.float + %1068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1071 = llvm.mul %1035, %1070 : !llvm.i64 + %1072 = llvm.add %1069, %1071 : !llvm.i64 + %1073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1074 = llvm.mul %58, %1073 : !llvm.i64 + %1075 = llvm.add %1072, %1074 : !llvm.i64 + %1076 = llvm.getelementptr %1068[%1075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1067, %1076 : !llvm.ptr + %1077 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1078 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1079 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1080 = llvm.mul %1035, %1079 : !llvm.i64 + %1081 = llvm.add %1078, %1080 : !llvm.i64 + %1082 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1083 = llvm.mul %58, %1082 : !llvm.i64 + %1084 = llvm.add %1081, %1083 : !llvm.i64 + %1085 = llvm.getelementptr %1077[%1084] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1086 = llvm.load %1085 : !llvm.ptr + %1087 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1088 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1089 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1090 = llvm.mul %1035, %1089 : !llvm.i64 + %1091 = llvm.add %1088, %1090 : !llvm.i64 + %1092 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1093 = llvm.mul %58, %1092 : !llvm.i64 + %1094 = llvm.add %1091, %1093 : !llvm.i64 + %1095 = llvm.getelementptr %1087[%1094] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1086, %1095 : !llvm.ptr + %1096 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, 
ptr, i64, array<2 x i64>, array<2 x i64>)> + %1097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1098 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1099 = llvm.mul %1035, %1098 : !llvm.i64 + %1100 = llvm.add %1097, %1099 : !llvm.i64 + %1101 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1102 = llvm.mul %59, %1101 : !llvm.i64 + %1103 = llvm.add %1100, %1102 : !llvm.i64 + %1104 = llvm.getelementptr %1096[%1103] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1105 = llvm.load %1104 : !llvm.ptr + %1106 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1108 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1109 = llvm.mul %59, %1108 : !llvm.i64 + %1110 = llvm.add %1107, %1109 : !llvm.i64 + %1111 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1112 = llvm.mul %120, %1111 : !llvm.i64 + %1113 = llvm.add %1110, %1112 : !llvm.i64 + %1114 = llvm.getelementptr %1106[%1113] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1115 = llvm.load %1114 : !llvm.ptr + %1116 = llvm.fmul %1105, %1115 {RelaxedPrecision} : !llvm.float + %1117 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1120 = llvm.mul %1035, %1119 : !llvm.i64 + %1121 = llvm.add %1118, %1120 : !llvm.i64 + %1122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1123 = llvm.mul %120, %1122 : !llvm.i64 + %1124 = llvm.add %1121, %1123 : !llvm.i64 + %1125 = llvm.getelementptr %1117[%1124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1126 = llvm.load %1125 : !llvm.ptr + %1127 = llvm.fadd %1126, %1116 {RelaxedPrecision} : !llvm.float + %1128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1131 = llvm.mul %1035, %1130 : !llvm.i64 + %1132 = llvm.add %1129, %1131 : !llvm.i64 + %1133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1134 = llvm.mul %120, %1133 : !llvm.i64 + %1135 = llvm.add %1132, %1134 : !llvm.i64 + %1136 = llvm.getelementptr %1128[%1135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1127, %1136 : !llvm.ptr + %1137 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1140 = llvm.mul %1035, %1139 : !llvm.i64 + %1141 = llvm.add %1138, %1140 : !llvm.i64 + %1142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1143 = llvm.mul %120, %1142 : !llvm.i64 + %1144 = llvm.add %1141, %1143 : !llvm.i64 + %1145 = llvm.getelementptr %1137[%1144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1146 = llvm.load %1145 : !llvm.ptr + %1147 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1149 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1150 = llvm.mul %1035, %1149 : !llvm.i64 + %1151 = llvm.add %1148, %1150 : !llvm.i64 + %1152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1153 = llvm.mul %120, %1152 : !llvm.i64 + %1154 = llvm.add %1151, %1153 : !llvm.i64 + %1155 = llvm.getelementptr %1147[%1154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1146, %1155 : !llvm.ptr + %1156 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1157 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1158 = 
llvm.mlir.constant(128 : index) : !llvm.i64 + %1159 = llvm.mul %1035, %1158 : !llvm.i64 + %1160 = llvm.add %1157, %1159 : !llvm.i64 + %1161 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1162 = llvm.mul %59, %1161 : !llvm.i64 + %1163 = llvm.add %1160, %1162 : !llvm.i64 + %1164 = llvm.getelementptr %1156[%1163] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1165 = llvm.load %1164 : !llvm.ptr + %1166 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1167 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1168 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1169 = llvm.mul %59, %1168 : !llvm.i64 + %1170 = llvm.add %1167, %1169 : !llvm.i64 + %1171 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1172 = llvm.mul %181, %1171 : !llvm.i64 + %1173 = llvm.add %1170, %1172 : !llvm.i64 + %1174 = llvm.getelementptr %1166[%1173] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1175 = llvm.load %1174 : !llvm.ptr + %1176 = llvm.fmul %1165, %1175 {RelaxedPrecision} : !llvm.float + %1177 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1179 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1180 = llvm.mul %1035, %1179 : !llvm.i64 + %1181 = llvm.add %1178, %1180 : !llvm.i64 + %1182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1183 = llvm.mul %181, %1182 : !llvm.i64 + %1184 = llvm.add %1181, %1183 : !llvm.i64 + %1185 = llvm.getelementptr %1177[%1184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1186 = llvm.load %1185 : !llvm.ptr + %1187 = llvm.fadd %1186, %1176 {RelaxedPrecision} : !llvm.float + %1188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1191 = llvm.mul %1035, %1190 : !llvm.i64 + %1192 = llvm.add %1189, %1191 : !llvm.i64 + %1193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1194 = llvm.mul %181, %1193 : !llvm.i64 + %1195 = llvm.add %1192, %1194 : !llvm.i64 + %1196 = llvm.getelementptr %1188[%1195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1187, %1196 : !llvm.ptr + %1197 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1198 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1199 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1200 = llvm.mul %1035, %1199 : !llvm.i64 + %1201 = llvm.add %1198, %1200 : !llvm.i64 + %1202 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1203 = llvm.mul %181, %1202 : !llvm.i64 + %1204 = llvm.add %1201, %1203 : !llvm.i64 + %1205 = llvm.getelementptr %1197[%1204] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1206 = llvm.load %1205 : !llvm.ptr + %1207 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1208 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1209 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1210 = llvm.mul %1035, %1209 : !llvm.i64 + %1211 = llvm.add %1208, %1210 : !llvm.i64 + %1212 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1213 = llvm.mul %181, %1212 : !llvm.i64 + %1214 = llvm.add %1211, %1213 : !llvm.i64 + %1215 = llvm.getelementptr %1207[%1214] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1206, %1215 : !llvm.ptr + %1216 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1217 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1218 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1219 = llvm.mul %1035, %1218 : !llvm.i64 + %1220 = llvm.add %1217, %1219 : 
!llvm.i64 + %1221 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1222 = llvm.mul %59, %1221 : !llvm.i64 + %1223 = llvm.add %1220, %1222 : !llvm.i64 + %1224 = llvm.getelementptr %1216[%1223] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1225 = llvm.load %1224 : !llvm.ptr + %1226 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1227 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1228 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1229 = llvm.mul %59, %1228 : !llvm.i64 + %1230 = llvm.add %1227, %1229 : !llvm.i64 + %1231 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1232 = llvm.mul %242, %1231 : !llvm.i64 + %1233 = llvm.add %1230, %1232 : !llvm.i64 + %1234 = llvm.getelementptr %1226[%1233] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1235 = llvm.load %1234 : !llvm.ptr + %1236 = llvm.fmul %1225, %1235 {RelaxedPrecision} : !llvm.float + %1237 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1239 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1240 = llvm.mul %1035, %1239 : !llvm.i64 + %1241 = llvm.add %1238, %1240 : !llvm.i64 + %1242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1243 = llvm.mul %242, %1242 : !llvm.i64 + %1244 = llvm.add %1241, %1243 : !llvm.i64 + %1245 = llvm.getelementptr %1237[%1244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1246 = llvm.load %1245 : !llvm.ptr + %1247 = llvm.fadd %1246, %1236 {RelaxedPrecision} : !llvm.float + %1248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1251 = llvm.mul %1035, %1250 : !llvm.i64 + %1252 = llvm.add %1249, %1251 : !llvm.i64 + %1253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1254 = llvm.mul %242, %1253 : !llvm.i64 + %1255 = llvm.add %1252, %1254 : !llvm.i64 + %1256 = llvm.getelementptr %1248[%1255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1247, %1256 : !llvm.ptr + %1257 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1258 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1259 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1260 = llvm.mul %1035, %1259 : !llvm.i64 + %1261 = llvm.add %1258, %1260 : !llvm.i64 + %1262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1263 = llvm.mul %242, %1262 : !llvm.i64 + %1264 = llvm.add %1261, %1263 : !llvm.i64 + %1265 = llvm.getelementptr %1257[%1264] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1266 = llvm.load %1265 : !llvm.ptr + %1267 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1269 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1270 = llvm.mul %1035, %1269 : !llvm.i64 + %1271 = llvm.add %1268, %1270 : !llvm.i64 + %1272 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1273 = llvm.mul %242, %1272 : !llvm.i64 + %1274 = llvm.add %1271, %1273 : !llvm.i64 + %1275 = llvm.getelementptr %1267[%1274] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1266, %1275 : !llvm.ptr + %1276 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1277 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1278 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1279 = llvm.mul %1035, %1278 : !llvm.i64 + %1280 = llvm.add %1277, %1279 : !llvm.i64 + %1281 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1282 = llvm.mul %59, %1281 : !llvm.i64 + %1283 = llvm.add 
%1280, %1282 : !llvm.i64 + %1284 = llvm.getelementptr %1276[%1283] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1285 = llvm.load %1284 : !llvm.ptr + %1286 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1288 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1289 = llvm.mul %59, %1288 : !llvm.i64 + %1290 = llvm.add %1287, %1289 : !llvm.i64 + %1291 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1292 = llvm.mul %303, %1291 : !llvm.i64 + %1293 = llvm.add %1290, %1292 : !llvm.i64 + %1294 = llvm.getelementptr %1286[%1293] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1295 = llvm.load %1294 : !llvm.ptr + %1296 = llvm.fmul %1285, %1295 {RelaxedPrecision} : !llvm.float + %1297 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1299 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1300 = llvm.mul %1035, %1299 : !llvm.i64 + %1301 = llvm.add %1298, %1300 : !llvm.i64 + %1302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1303 = llvm.mul %303, %1302 : !llvm.i64 + %1304 = llvm.add %1301, %1303 : !llvm.i64 + %1305 = llvm.getelementptr %1297[%1304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1306 = llvm.load %1305 : !llvm.ptr + %1307 = llvm.fadd %1306, %1296 {RelaxedPrecision} : !llvm.float + %1308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1311 = llvm.mul %1035, %1310 : !llvm.i64 + %1312 = llvm.add %1309, %1311 : !llvm.i64 + %1313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1314 = llvm.mul %303, %1313 : !llvm.i64 + %1315 = llvm.add %1312, %1314 : !llvm.i64 + %1316 = llvm.getelementptr %1308[%1315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1307, %1316 : !llvm.ptr + %1317 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1318 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1319 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1320 = llvm.mul %1035, %1319 : !llvm.i64 + %1321 = llvm.add %1318, %1320 : !llvm.i64 + %1322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1323 = llvm.mul %303, %1322 : !llvm.i64 + %1324 = llvm.add %1321, %1323 : !llvm.i64 + %1325 = llvm.getelementptr %1317[%1324] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1326 = llvm.load %1325 : !llvm.ptr + %1327 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1328 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1329 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1330 = llvm.mul %1035, %1329 : !llvm.i64 + %1331 = llvm.add %1328, %1330 : !llvm.i64 + %1332 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1333 = llvm.mul %303, %1332 : !llvm.i64 + %1334 = llvm.add %1331, %1333 : !llvm.i64 + %1335 = llvm.getelementptr %1327[%1334] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1326, %1335 : !llvm.ptr + %1336 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1337 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1338 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1339 = llvm.mul %1035, %1338 : !llvm.i64 + %1340 = llvm.add %1337, %1339 : !llvm.i64 + %1341 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1342 = llvm.mul %59, %1341 : !llvm.i64 + %1343 = llvm.add %1340, %1342 : !llvm.i64 + %1344 = llvm.getelementptr %1336[%1343] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1345 = llvm.load 
%1344 : !llvm.ptr + %1346 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1347 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1348 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1349 = llvm.mul %59, %1348 : !llvm.i64 + %1350 = llvm.add %1347, %1349 : !llvm.i64 + %1351 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1352 = llvm.mul %364, %1351 : !llvm.i64 + %1353 = llvm.add %1350, %1352 : !llvm.i64 + %1354 = llvm.getelementptr %1346[%1353] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1355 = llvm.load %1354 : !llvm.ptr + %1356 = llvm.fmul %1345, %1355 {RelaxedPrecision} : !llvm.float + %1357 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1359 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1360 = llvm.mul %1035, %1359 : !llvm.i64 + %1361 = llvm.add %1358, %1360 : !llvm.i64 + %1362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1363 = llvm.mul %364, %1362 : !llvm.i64 + %1364 = llvm.add %1361, %1363 : !llvm.i64 + %1365 = llvm.getelementptr %1357[%1364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1366 = llvm.load %1365 : !llvm.ptr + %1367 = llvm.fadd %1366, %1356 {RelaxedPrecision} : !llvm.float + %1368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1371 = llvm.mul %1035, %1370 : !llvm.i64 + %1372 = llvm.add %1369, %1371 : !llvm.i64 + %1373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1374 = llvm.mul %364, %1373 : !llvm.i64 + %1375 = llvm.add %1372, %1374 : !llvm.i64 + %1376 = llvm.getelementptr %1368[%1375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1367, %1376 : !llvm.ptr + %1377 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1378 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1379 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1380 = llvm.mul %1035, %1379 : !llvm.i64 + %1381 = llvm.add %1378, %1380 : !llvm.i64 + %1382 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1383 = llvm.mul %364, %1382 : !llvm.i64 + %1384 = llvm.add %1381, %1383 : !llvm.i64 + %1385 = llvm.getelementptr %1377[%1384] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1386 = llvm.load %1385 : !llvm.ptr + %1387 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1389 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1390 = llvm.mul %1035, %1389 : !llvm.i64 + %1391 = llvm.add %1388, %1390 : !llvm.i64 + %1392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1393 = llvm.mul %364, %1392 : !llvm.i64 + %1394 = llvm.add %1391, %1393 : !llvm.i64 + %1395 = llvm.getelementptr %1387[%1394] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1386, %1395 : !llvm.ptr + %1396 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1397 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1398 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1399 = llvm.mul %1035, %1398 : !llvm.i64 + %1400 = llvm.add %1397, %1399 : !llvm.i64 + %1401 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1402 = llvm.mul %59, %1401 : !llvm.i64 + %1403 = llvm.add %1400, %1402 : !llvm.i64 + %1404 = llvm.getelementptr %1396[%1403] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1405 = llvm.load %1404 : !llvm.ptr + %1406 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1407 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %1408 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1409 = llvm.mul %59, %1408 : !llvm.i64 + %1410 = llvm.add %1407, %1409 : !llvm.i64 + %1411 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1412 = llvm.mul %425, %1411 : !llvm.i64 + %1413 = llvm.add %1410, %1412 : !llvm.i64 + %1414 = llvm.getelementptr %1406[%1413] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1415 = llvm.load %1414 : !llvm.ptr + %1416 = llvm.fmul %1405, %1415 {RelaxedPrecision} : !llvm.float + %1417 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1419 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1420 = llvm.mul %1035, %1419 : !llvm.i64 + %1421 = llvm.add %1418, %1420 : !llvm.i64 + %1422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1423 = llvm.mul %425, %1422 : !llvm.i64 + %1424 = llvm.add %1421, %1423 : !llvm.i64 + %1425 = llvm.getelementptr %1417[%1424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1426 = llvm.load %1425 : !llvm.ptr + %1427 = llvm.fadd %1426, %1416 {RelaxedPrecision} : !llvm.float + %1428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1431 = llvm.mul %1035, %1430 : !llvm.i64 + %1432 = llvm.add %1429, %1431 : !llvm.i64 + %1433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1434 = llvm.mul %425, %1433 : !llvm.i64 + %1435 = llvm.add %1432, %1434 : !llvm.i64 + %1436 = llvm.getelementptr %1428[%1435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1427, %1436 : !llvm.ptr + %1437 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1438 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1439 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1440 = llvm.mul %1035, %1439 : !llvm.i64 + %1441 = llvm.add %1438, %1440 : !llvm.i64 + %1442 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1443 = llvm.mul %425, %1442 : !llvm.i64 + %1444 = llvm.add %1441, %1443 : !llvm.i64 + %1445 = llvm.getelementptr %1437[%1444] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1446 = llvm.load %1445 : !llvm.ptr + %1447 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1449 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1450 = llvm.mul %1035, %1449 : !llvm.i64 + %1451 = llvm.add %1448, %1450 : !llvm.i64 + %1452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1453 = llvm.mul %425, %1452 : !llvm.i64 + %1454 = llvm.add %1451, %1453 : !llvm.i64 + %1455 = llvm.getelementptr %1447[%1454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1446, %1455 : !llvm.ptr + %1456 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1457 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1458 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1459 = llvm.mul %1035, %1458 : !llvm.i64 + %1460 = llvm.add %1457, %1459 : !llvm.i64 + %1461 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1462 = llvm.mul %59, %1461 : !llvm.i64 + %1463 = llvm.add %1460, %1462 : !llvm.i64 + %1464 = llvm.getelementptr %1456[%1463] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1465 = llvm.load %1464 : !llvm.ptr + %1466 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1467 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1468 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1469 = llvm.mul %59, %1468 
: !llvm.i64 + %1470 = llvm.add %1467, %1469 : !llvm.i64 + %1471 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1472 = llvm.mul %486, %1471 : !llvm.i64 + %1473 = llvm.add %1470, %1472 : !llvm.i64 + %1474 = llvm.getelementptr %1466[%1473] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1475 = llvm.load %1474 : !llvm.ptr + %1476 = llvm.fmul %1465, %1475 {RelaxedPrecision} : !llvm.float + %1477 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1480 = llvm.mul %1035, %1479 : !llvm.i64 + %1481 = llvm.add %1478, %1480 : !llvm.i64 + %1482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1483 = llvm.mul %486, %1482 : !llvm.i64 + %1484 = llvm.add %1481, %1483 : !llvm.i64 + %1485 = llvm.getelementptr %1477[%1484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1486 = llvm.load %1485 : !llvm.ptr + %1487 = llvm.fadd %1486, %1476 {RelaxedPrecision} : !llvm.float + %1488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1491 = llvm.mul %1035, %1490 : !llvm.i64 + %1492 = llvm.add %1489, %1491 : !llvm.i64 + %1493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1494 = llvm.mul %486, %1493 : !llvm.i64 + %1495 = llvm.add %1492, %1494 : !llvm.i64 + %1496 = llvm.getelementptr %1488[%1495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1487, %1496 : !llvm.ptr + %1497 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1500 = llvm.mul %1035, %1499 : !llvm.i64 + %1501 = llvm.add %1498, %1500 : !llvm.i64 + %1502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1503 = llvm.mul %486, %1502 : !llvm.i64 + %1504 = llvm.add %1501, %1503 : !llvm.i64 + %1505 = llvm.getelementptr %1497[%1504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1506 = llvm.load %1505 : !llvm.ptr + %1507 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1508 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1509 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1510 = llvm.mul %1035, %1509 : !llvm.i64 + %1511 = llvm.add %1508, %1510 : !llvm.i64 + %1512 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1513 = llvm.mul %486, %1512 : !llvm.i64 + %1514 = llvm.add %1511, %1513 : !llvm.i64 + %1515 = llvm.getelementptr %1507[%1514] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1506, %1515 : !llvm.ptr + %1516 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1518 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1519 = llvm.mul %1035, %1518 : !llvm.i64 + %1520 = llvm.add %1517, %1519 : !llvm.i64 + %1521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1522 = llvm.mul %59, %1521 : !llvm.i64 + %1523 = llvm.add %1520, %1522 : !llvm.i64 + %1524 = llvm.getelementptr %1516[%1523] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1525 = llvm.load %1524 : !llvm.ptr + %1526 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1529 = llvm.mul %59, %1528 : !llvm.i64 + %1530 = llvm.add %1527, %1529 : !llvm.i64 + %1531 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1532 = llvm.mul 
%547, %1531 : !llvm.i64 + %1533 = llvm.add %1530, %1532 : !llvm.i64 + %1534 = llvm.getelementptr %1526[%1533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1535 = llvm.load %1534 : !llvm.ptr + %1536 = llvm.fmul %1525, %1535 {RelaxedPrecision} : !llvm.float + %1537 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1539 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1540 = llvm.mul %1035, %1539 : !llvm.i64 + %1541 = llvm.add %1538, %1540 : !llvm.i64 + %1542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1543 = llvm.mul %547, %1542 : !llvm.i64 + %1544 = llvm.add %1541, %1543 : !llvm.i64 + %1545 = llvm.getelementptr %1537[%1544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1546 = llvm.load %1545 : !llvm.ptr + %1547 = llvm.fadd %1546, %1536 {RelaxedPrecision} : !llvm.float + %1548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1551 = llvm.mul %1035, %1550 : !llvm.i64 + %1552 = llvm.add %1549, %1551 : !llvm.i64 + %1553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1554 = llvm.mul %547, %1553 : !llvm.i64 + %1555 = llvm.add %1552, %1554 : !llvm.i64 + %1556 = llvm.getelementptr %1548[%1555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1547, %1556 : !llvm.ptr + %1557 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1558 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1559 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1560 = llvm.mul %1035, %1559 : !llvm.i64 + %1561 = llvm.add %1558, %1560 : !llvm.i64 + %1562 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1563 = llvm.mul %547, %1562 : !llvm.i64 + %1564 = llvm.add %1561, %1563 : !llvm.i64 + %1565 = llvm.getelementptr %1557[%1564] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1566 = llvm.load %1565 : !llvm.ptr + %1567 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1568 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1569 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1570 = llvm.mul %1035, %1569 : !llvm.i64 + %1571 = llvm.add %1568, %1570 : !llvm.i64 + %1572 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1573 = llvm.mul %547, %1572 : !llvm.i64 + %1574 = llvm.add %1571, %1573 : !llvm.i64 + %1575 = llvm.getelementptr %1567[%1574] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1566, %1575 : !llvm.ptr + %1576 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1577 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1578 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1579 = llvm.mul %1035, %1578 : !llvm.i64 + %1580 = llvm.add %1577, %1579 : !llvm.i64 + %1581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1582 = llvm.mul %59, %1581 : !llvm.i64 + %1583 = llvm.add %1580, %1582 : !llvm.i64 + %1584 = llvm.getelementptr %1576[%1583] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1585 = llvm.load %1584 : !llvm.ptr + %1586 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1587 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1588 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1589 = llvm.mul %59, %1588 : !llvm.i64 + %1590 = llvm.add %1587, %1589 : !llvm.i64 + %1591 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1592 = llvm.mul %608, %1591 : !llvm.i64 + %1593 = llvm.add %1590, %1592 : !llvm.i64 + %1594 = llvm.getelementptr %1586[%1593] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %1595 = llvm.load %1594 : !llvm.ptr + %1596 = llvm.fmul %1585, %1595 {RelaxedPrecision} : !llvm.float + %1597 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1599 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1600 = llvm.mul %1035, %1599 : !llvm.i64 + %1601 = llvm.add %1598, %1600 : !llvm.i64 + %1602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1603 = llvm.mul %608, %1602 : !llvm.i64 + %1604 = llvm.add %1601, %1603 : !llvm.i64 + %1605 = llvm.getelementptr %1597[%1604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1606 = llvm.load %1605 : !llvm.ptr + %1607 = llvm.fadd %1606, %1596 {RelaxedPrecision} : !llvm.float + %1608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1611 = llvm.mul %1035, %1610 : !llvm.i64 + %1612 = llvm.add %1609, %1611 : !llvm.i64 + %1613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1614 = llvm.mul %608, %1613 : !llvm.i64 + %1615 = llvm.add %1612, %1614 : !llvm.i64 + %1616 = llvm.getelementptr %1608[%1615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1607, %1616 : !llvm.ptr + %1617 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1618 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1619 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1620 = llvm.mul %1035, %1619 : !llvm.i64 + %1621 = llvm.add %1618, %1620 : !llvm.i64 + %1622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1623 = llvm.mul %608, %1622 : !llvm.i64 + %1624 = llvm.add %1621, %1623 : !llvm.i64 + %1625 = llvm.getelementptr %1617[%1624] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1626 = llvm.load %1625 : !llvm.ptr + %1627 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1628 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1629 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1630 = llvm.mul %1035, %1629 : !llvm.i64 + %1631 = llvm.add %1628, %1630 : !llvm.i64 + %1632 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1633 = llvm.mul %608, %1632 : !llvm.i64 + %1634 = llvm.add %1631, %1633 : !llvm.i64 + %1635 = llvm.getelementptr %1627[%1634] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1626, %1635 : !llvm.ptr + %1636 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1637 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1638 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1639 = llvm.mul %1035, %1638 : !llvm.i64 + %1640 = llvm.add %1637, %1639 : !llvm.i64 + %1641 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1642 = llvm.mul %59, %1641 : !llvm.i64 + %1643 = llvm.add %1640, %1642 : !llvm.i64 + %1644 = llvm.getelementptr %1636[%1643] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1645 = llvm.load %1644 : !llvm.ptr + %1646 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1647 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1648 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1649 = llvm.mul %59, %1648 : !llvm.i64 + %1650 = llvm.add %1647, %1649 : !llvm.i64 + %1651 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1652 = llvm.mul %669, %1651 : !llvm.i64 + %1653 = llvm.add %1650, %1652 : !llvm.i64 + %1654 = llvm.getelementptr %1646[%1653] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1655 = llvm.load %1654 : !llvm.ptr + %1656 = llvm.fmul %1645, %1655 {RelaxedPrecision} : 
!llvm.float + %1657 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1659 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1660 = llvm.mul %1035, %1659 : !llvm.i64 + %1661 = llvm.add %1658, %1660 : !llvm.i64 + %1662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1663 = llvm.mul %669, %1662 : !llvm.i64 + %1664 = llvm.add %1661, %1663 : !llvm.i64 + %1665 = llvm.getelementptr %1657[%1664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1666 = llvm.load %1665 : !llvm.ptr + %1667 = llvm.fadd %1666, %1656 {RelaxedPrecision} : !llvm.float + %1668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1671 = llvm.mul %1035, %1670 : !llvm.i64 + %1672 = llvm.add %1669, %1671 : !llvm.i64 + %1673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1674 = llvm.mul %669, %1673 : !llvm.i64 + %1675 = llvm.add %1672, %1674 : !llvm.i64 + %1676 = llvm.getelementptr %1668[%1675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1667, %1676 : !llvm.ptr + %1677 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1680 = llvm.mul %1035, %1679 : !llvm.i64 + %1681 = llvm.add %1678, %1680 : !llvm.i64 + %1682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1683 = llvm.mul %669, %1682 : !llvm.i64 + %1684 = llvm.add %1681, %1683 : !llvm.i64 + %1685 = llvm.getelementptr %1677[%1684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1686 = llvm.load %1685 : !llvm.ptr + %1687 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1688 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1689 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1690 = llvm.mul %1035, %1689 : !llvm.i64 + %1691 = llvm.add %1688, %1690 : !llvm.i64 + %1692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1693 = llvm.mul %669, %1692 : !llvm.i64 + %1694 = llvm.add %1691, %1693 : !llvm.i64 + %1695 = llvm.getelementptr %1687[%1694] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1686, %1695 : !llvm.ptr + %1696 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1697 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1698 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1699 = llvm.mul %1035, %1698 : !llvm.i64 + %1700 = llvm.add %1697, %1699 : !llvm.i64 + %1701 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1702 = llvm.mul %59, %1701 : !llvm.i64 + %1703 = llvm.add %1700, %1702 : !llvm.i64 + %1704 = llvm.getelementptr %1696[%1703] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1705 = llvm.load %1704 : !llvm.ptr + %1706 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1708 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1709 = llvm.mul %59, %1708 : !llvm.i64 + %1710 = llvm.add %1707, %1709 : !llvm.i64 + %1711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1712 = llvm.mul %730, %1711 : !llvm.i64 + %1713 = llvm.add %1710, %1712 : !llvm.i64 + %1714 = llvm.getelementptr %1706[%1713] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1715 = llvm.load %1714 : !llvm.ptr + %1716 = llvm.fmul %1705, %1715 {RelaxedPrecision} : !llvm.float + %1717 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1718 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %1719 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1720 = llvm.mul %1035, %1719 : !llvm.i64 + %1721 = llvm.add %1718, %1720 : !llvm.i64 + %1722 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1723 = llvm.mul %730, %1722 : !llvm.i64 + %1724 = llvm.add %1721, %1723 : !llvm.i64 + %1725 = llvm.getelementptr %1717[%1724] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1726 = llvm.load %1725 : !llvm.ptr + %1727 = llvm.fadd %1726, %1716 {RelaxedPrecision} : !llvm.float + %1728 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1729 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1730 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1731 = llvm.mul %1035, %1730 : !llvm.i64 + %1732 = llvm.add %1729, %1731 : !llvm.i64 + %1733 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1734 = llvm.mul %730, %1733 : !llvm.i64 + %1735 = llvm.add %1732, %1734 : !llvm.i64 + %1736 = llvm.getelementptr %1728[%1735] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1727, %1736 : !llvm.ptr + %1737 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1738 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1739 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1740 = llvm.mul %1035, %1739 : !llvm.i64 + %1741 = llvm.add %1738, %1740 : !llvm.i64 + %1742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1743 = llvm.mul %730, %1742 : !llvm.i64 + %1744 = llvm.add %1741, %1743 : !llvm.i64 + %1745 = llvm.getelementptr %1737[%1744] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1746 = llvm.load %1745 : !llvm.ptr + %1747 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1749 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1750 = llvm.mul %1035, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %730, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1747[%1754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1746, %1755 : !llvm.ptr + %1756 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1759 = llvm.mul %1035, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %59, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1765 = llvm.load %1764 : !llvm.ptr + %1766 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1767 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1768 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1769 = llvm.mul %59, %1768 : !llvm.i64 + %1770 = llvm.add %1767, %1769 : !llvm.i64 + %1771 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1772 = llvm.mul %791, %1771 : !llvm.i64 + %1773 = llvm.add %1770, %1772 : !llvm.i64 + %1774 = llvm.getelementptr %1766[%1773] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1775 = llvm.load %1774 : !llvm.ptr + %1776 = llvm.fmul %1765, %1775 {RelaxedPrecision} : !llvm.float + %1777 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1779 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1780 = llvm.mul %1035, 
%1779 : !llvm.i64 + %1781 = llvm.add %1778, %1780 : !llvm.i64 + %1782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1783 = llvm.mul %791, %1782 : !llvm.i64 + %1784 = llvm.add %1781, %1783 : !llvm.i64 + %1785 = llvm.getelementptr %1777[%1784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1786 = llvm.load %1785 : !llvm.ptr + %1787 = llvm.fadd %1786, %1776 {RelaxedPrecision} : !llvm.float + %1788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1791 = llvm.mul %1035, %1790 : !llvm.i64 + %1792 = llvm.add %1789, %1791 : !llvm.i64 + %1793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1794 = llvm.mul %791, %1793 : !llvm.i64 + %1795 = llvm.add %1792, %1794 : !llvm.i64 + %1796 = llvm.getelementptr %1788[%1795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1787, %1796 : !llvm.ptr + %1797 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1799 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1800 = llvm.mul %1035, %1799 : !llvm.i64 + %1801 = llvm.add %1798, %1800 : !llvm.i64 + %1802 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1803 = llvm.mul %791, %1802 : !llvm.i64 + %1804 = llvm.add %1801, %1803 : !llvm.i64 + %1805 = llvm.getelementptr %1797[%1804] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1806 = llvm.load %1805 : !llvm.ptr + %1807 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1808 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1809 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1810 = llvm.mul %1035, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1813 = llvm.mul %791, %1812 : !llvm.i64 + %1814 = llvm.add %1811, %1813 : !llvm.i64 + %1815 = llvm.getelementptr %1807[%1814] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1806, %1815 : !llvm.ptr + %1816 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1817 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1818 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1819 = llvm.mul %1035, %1818 : !llvm.i64 + %1820 = llvm.add %1817, %1819 : !llvm.i64 + %1821 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1822 = llvm.mul %59, %1821 : !llvm.i64 + %1823 = llvm.add %1820, %1822 : !llvm.i64 + %1824 = llvm.getelementptr %1816[%1823] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1825 = llvm.load %1824 : !llvm.ptr + %1826 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1828 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1829 = llvm.mul %59, %1828 : !llvm.i64 + %1830 = llvm.add %1827, %1829 : !llvm.i64 + %1831 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1832 = llvm.mul %852, %1831 : !llvm.i64 + %1833 = llvm.add %1830, %1832 : !llvm.i64 + %1834 = llvm.getelementptr %1826[%1833] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1835 = llvm.load %1834 : !llvm.ptr + %1836 = llvm.fmul %1825, %1835 {RelaxedPrecision} : !llvm.float + %1837 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1839 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1840 = llvm.mul %1035, %1839 : !llvm.i64 + %1841 = llvm.add %1838, %1840 : !llvm.i64 + %1842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1843 = 
llvm.mul %852, %1842 : !llvm.i64 + %1844 = llvm.add %1841, %1843 : !llvm.i64 + %1845 = llvm.getelementptr %1837[%1844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1846 = llvm.load %1845 : !llvm.ptr + %1847 = llvm.fadd %1846, %1836 {RelaxedPrecision} : !llvm.float + %1848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1851 = llvm.mul %1035, %1850 : !llvm.i64 + %1852 = llvm.add %1849, %1851 : !llvm.i64 + %1853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1854 = llvm.mul %852, %1853 : !llvm.i64 + %1855 = llvm.add %1852, %1854 : !llvm.i64 + %1856 = llvm.getelementptr %1848[%1855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1847, %1856 : !llvm.ptr + %1857 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1858 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1859 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1860 = llvm.mul %1035, %1859 : !llvm.i64 + %1861 = llvm.add %1858, %1860 : !llvm.i64 + %1862 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1863 = llvm.mul %852, %1862 : !llvm.i64 + %1864 = llvm.add %1861, %1863 : !llvm.i64 + %1865 = llvm.getelementptr %1857[%1864] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1866 = llvm.load %1865 : !llvm.ptr + %1867 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1868 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1869 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1870 = llvm.mul %1035, %1869 : !llvm.i64 + %1871 = llvm.add %1868, %1870 : !llvm.i64 + %1872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1873 = llvm.mul %852, %1872 : !llvm.i64 + %1874 = llvm.add %1871, %1873 : !llvm.i64 + %1875 = llvm.getelementptr %1867[%1874] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1866, %1875 : !llvm.ptr + %1876 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1877 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1878 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1879 = llvm.mul %1035, %1878 : !llvm.i64 + %1880 = llvm.add %1877, %1879 : !llvm.i64 + %1881 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1882 = llvm.mul %59, %1881 : !llvm.i64 + %1883 = llvm.add %1880, %1882 : !llvm.i64 + %1884 = llvm.getelementptr %1876[%1883] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1885 = llvm.load %1884 : !llvm.ptr + %1886 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1888 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1889 = llvm.mul %59, %1888 : !llvm.i64 + %1890 = llvm.add %1887, %1889 : !llvm.i64 + %1891 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1892 = llvm.mul %913, %1891 : !llvm.i64 + %1893 = llvm.add %1890, %1892 : !llvm.i64 + %1894 = llvm.getelementptr %1886[%1893] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1895 = llvm.load %1894 : !llvm.ptr + %1896 = llvm.fmul %1885, %1895 {RelaxedPrecision} : !llvm.float + %1897 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1899 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1900 = llvm.mul %1035, %1899 : !llvm.i64 + %1901 = llvm.add %1898, %1900 : !llvm.i64 + %1902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1903 = llvm.mul %913, %1902 : !llvm.i64 + %1904 = llvm.add %1901, %1903 : !llvm.i64 + %1905 = llvm.getelementptr %1897[%1904] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1906 = llvm.load %1905 : !llvm.ptr + %1907 = llvm.fadd %1906, %1896 {RelaxedPrecision} : !llvm.float + %1908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1911 = llvm.mul %1035, %1910 : !llvm.i64 + %1912 = llvm.add %1909, %1911 : !llvm.i64 + %1913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1914 = llvm.mul %913, %1913 : !llvm.i64 + %1915 = llvm.add %1912, %1914 : !llvm.i64 + %1916 = llvm.getelementptr %1908[%1915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1907, %1916 : !llvm.ptr + %1917 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1918 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1919 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1920 = llvm.mul %1035, %1919 : !llvm.i64 + %1921 = llvm.add %1918, %1920 : !llvm.i64 + %1922 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1923 = llvm.mul %913, %1922 : !llvm.i64 + %1924 = llvm.add %1921, %1923 : !llvm.i64 + %1925 = llvm.getelementptr %1917[%1924] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1926 = llvm.load %1925 : !llvm.ptr + %1927 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1928 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1929 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1930 = llvm.mul %1035, %1929 : !llvm.i64 + %1931 = llvm.add %1928, %1930 : !llvm.i64 + %1932 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1933 = llvm.mul %913, %1932 : !llvm.i64 + %1934 = llvm.add %1931, %1933 : !llvm.i64 + %1935 = llvm.getelementptr %1927[%1934] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1926, %1935 : !llvm.ptr + %1936 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1937 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1938 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1939 = llvm.mul %1035, %1938 : !llvm.i64 + %1940 = llvm.add %1937, %1939 : !llvm.i64 + %1941 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1942 = llvm.mul %59, %1941 : !llvm.i64 + %1943 = llvm.add %1940, %1942 : !llvm.i64 + %1944 = llvm.getelementptr %1936[%1943] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1945 = llvm.load %1944 : !llvm.ptr + %1946 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1949 = llvm.mul %59, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1952 = llvm.mul %974, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.getelementptr %1946[%1953] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1955 = llvm.load %1954 : !llvm.ptr + %1956 = llvm.fmul %1945, %1955 {RelaxedPrecision} : !llvm.float + %1957 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1958 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1959 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1960 = llvm.mul %1035, %1959 : !llvm.i64 + %1961 = llvm.add %1958, %1960 : !llvm.i64 + %1962 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1963 = llvm.mul %974, %1962 : !llvm.i64 + %1964 = llvm.add %1961, %1963 : !llvm.i64 + %1965 = llvm.getelementptr %1957[%1964] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1966 = llvm.load %1965 : !llvm.ptr + %1967 = llvm.fadd %1966, %1956 {RelaxedPrecision} 
: !llvm.float + %1968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1971 = llvm.mul %1035, %1970 : !llvm.i64 + %1972 = llvm.add %1969, %1971 : !llvm.i64 + %1973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1974 = llvm.mul %974, %1973 : !llvm.i64 + %1975 = llvm.add %1972, %1974 : !llvm.i64 + %1976 = llvm.getelementptr %1968[%1975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1967, %1976 : !llvm.ptr + %1977 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1978 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1979 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1980 = llvm.mul %1035, %1979 : !llvm.i64 + %1981 = llvm.add %1978, %1980 : !llvm.i64 + %1982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1983 = llvm.mul %974, %1982 : !llvm.i64 + %1984 = llvm.add %1981, %1983 : !llvm.i64 + %1985 = llvm.getelementptr %1977[%1984] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1986 = llvm.load %1985 : !llvm.ptr + %1987 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1988 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1989 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1990 = llvm.mul %1035, %1989 : !llvm.i64 + %1991 = llvm.add %1988, %1990 : !llvm.i64 + %1992 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1993 = llvm.mul %974, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.getelementptr %1987[%1994] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1986, %1995 : !llvm.ptr + %1996 = llvm.add %50, %34 : !llvm.i64 + %1997 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1998 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1999 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2000 = llvm.mul %1996, %1999 : !llvm.i64 + %2001 = llvm.add %1998, %2000 : !llvm.i64 + %2002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2003 = llvm.mul %59, %2002 : !llvm.i64 + %2004 = llvm.add %2001, %2003 : !llvm.i64 + %2005 = llvm.getelementptr %1997[%2004] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2006 = llvm.load %2005 : !llvm.ptr + %2007 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2009 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2010 = llvm.mul %59, %2009 : !llvm.i64 + %2011 = llvm.add %2008, %2010 : !llvm.i64 + %2012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2013 = llvm.mul %58, %2012 : !llvm.i64 + %2014 = llvm.add %2011, %2013 : !llvm.i64 + %2015 = llvm.getelementptr %2007[%2014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2016 = llvm.load %2015 : !llvm.ptr + %2017 = llvm.fmul %2006, %2016 {RelaxedPrecision} : !llvm.float + %2018 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2020 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2021 = llvm.mul %1996, %2020 : !llvm.i64 + %2022 = llvm.add %2019, %2021 : !llvm.i64 + %2023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2024 = llvm.mul %58, %2023 : !llvm.i64 + %2025 = llvm.add %2022, %2024 : !llvm.i64 + %2026 = llvm.getelementptr %2018[%2025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2027 = llvm.load %2026 : !llvm.ptr + %2028 = llvm.fadd %2027, %2017 {RelaxedPrecision} : !llvm.float + %2029 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 
x i64>, array<2 x i64>)> + %2030 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2031 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2032 = llvm.mul %1996, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2035 = llvm.mul %58, %2034 : !llvm.i64 + %2036 = llvm.add %2033, %2035 : !llvm.i64 + %2037 = llvm.getelementptr %2029[%2036] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2028, %2037 : !llvm.ptr + %2038 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2039 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2040 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2041 = llvm.mul %1996, %2040 : !llvm.i64 + %2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2044 = llvm.mul %58, %2043 : !llvm.i64 + %2045 = llvm.add %2042, %2044 : !llvm.i64 + %2046 = llvm.getelementptr %2038[%2045] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2047 = llvm.load %2046 : !llvm.ptr + %2048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2051 = llvm.mul %1996, %2050 : !llvm.i64 + %2052 = llvm.add %2049, %2051 : !llvm.i64 + %2053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2054 = llvm.mul %58, %2053 : !llvm.i64 + %2055 = llvm.add %2052, %2054 : !llvm.i64 + %2056 = llvm.getelementptr %2048[%2055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2047, %2056 : !llvm.ptr + %2057 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2059 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2060 = llvm.mul %1996, %2059 : !llvm.i64 + %2061 = llvm.add %2058, %2060 : !llvm.i64 + %2062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2063 = llvm.mul %59, %2062 : !llvm.i64 + %2064 = llvm.add %2061, %2063 : !llvm.i64 + %2065 = llvm.getelementptr %2057[%2064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2066 = llvm.load %2065 : !llvm.ptr + %2067 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2070 = llvm.mul %59, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %120, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2076 = llvm.load %2075 : !llvm.ptr + %2077 = llvm.fmul %2066, %2076 {RelaxedPrecision} : !llvm.float + %2078 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2080 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2081 = llvm.mul %1996, %2080 : !llvm.i64 + %2082 = llvm.add %2079, %2081 : !llvm.i64 + %2083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2084 = llvm.mul %120, %2083 : !llvm.i64 + %2085 = llvm.add %2082, %2084 : !llvm.i64 + %2086 = llvm.getelementptr %2078[%2085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2087 = llvm.load %2086 : !llvm.ptr + %2088 = llvm.fadd %2087, %2077 {RelaxedPrecision} : !llvm.float + %2089 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2090 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2091 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %2092 = llvm.mul %1996, %2091 : !llvm.i64 + %2093 = llvm.add %2090, %2092 : !llvm.i64 + %2094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2095 = llvm.mul %120, %2094 : !llvm.i64 + %2096 = llvm.add %2093, %2095 : !llvm.i64 + %2097 = llvm.getelementptr %2089[%2096] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2088, %2097 : !llvm.ptr + %2098 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2100 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2101 = llvm.mul %1996, %2100 : !llvm.i64 + %2102 = llvm.add %2099, %2101 : !llvm.i64 + %2103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2104 = llvm.mul %120, %2103 : !llvm.i64 + %2105 = llvm.add %2102, %2104 : !llvm.i64 + %2106 = llvm.getelementptr %2098[%2105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2107 = llvm.load %2106 : !llvm.ptr + %2108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2111 = llvm.mul %1996, %2110 : !llvm.i64 + %2112 = llvm.add %2109, %2111 : !llvm.i64 + %2113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2114 = llvm.mul %120, %2113 : !llvm.i64 + %2115 = llvm.add %2112, %2114 : !llvm.i64 + %2116 = llvm.getelementptr %2108[%2115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2107, %2116 : !llvm.ptr + %2117 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2119 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2120 = llvm.mul %1996, %2119 : !llvm.i64 + %2121 = llvm.add %2118, %2120 : !llvm.i64 + %2122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2123 = llvm.mul %59, %2122 : !llvm.i64 + %2124 = llvm.add %2121, %2123 : !llvm.i64 + %2125 = llvm.getelementptr %2117[%2124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2126 = llvm.load %2125 : !llvm.ptr + %2127 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2128 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2129 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2130 = llvm.mul %59, %2129 : !llvm.i64 + %2131 = llvm.add %2128, %2130 : !llvm.i64 + %2132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2133 = llvm.mul %181, %2132 : !llvm.i64 + %2134 = llvm.add %2131, %2133 : !llvm.i64 + %2135 = llvm.getelementptr %2127[%2134] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2136 = llvm.load %2135 : !llvm.ptr + %2137 = llvm.fmul %2126, %2136 {RelaxedPrecision} : !llvm.float + %2138 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2140 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2141 = llvm.mul %1996, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2144 = llvm.mul %181, %2143 : !llvm.i64 + %2145 = llvm.add %2142, %2144 : !llvm.i64 + %2146 = llvm.getelementptr %2138[%2145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2147 = llvm.load %2146 : !llvm.ptr + %2148 = llvm.fadd %2147, %2137 {RelaxedPrecision} : !llvm.float + %2149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2151 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2152 = llvm.mul %1996, %2151 : !llvm.i64 + %2153 = llvm.add %2150, %2152 : !llvm.i64 + %2154 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %2155 = llvm.mul %181, %2154 : !llvm.i64 + %2156 = llvm.add %2153, %2155 : !llvm.i64 + %2157 = llvm.getelementptr %2149[%2156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2148, %2157 : !llvm.ptr + %2158 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2161 = llvm.mul %1996, %2160 : !llvm.i64 + %2162 = llvm.add %2159, %2161 : !llvm.i64 + %2163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2164 = llvm.mul %181, %2163 : !llvm.i64 + %2165 = llvm.add %2162, %2164 : !llvm.i64 + %2166 = llvm.getelementptr %2158[%2165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2167 = llvm.load %2166 : !llvm.ptr + %2168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2171 = llvm.mul %1996, %2170 : !llvm.i64 + %2172 = llvm.add %2169, %2171 : !llvm.i64 + %2173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2174 = llvm.mul %181, %2173 : !llvm.i64 + %2175 = llvm.add %2172, %2174 : !llvm.i64 + %2176 = llvm.getelementptr %2168[%2175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2167, %2176 : !llvm.ptr + %2177 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2180 = llvm.mul %1996, %2179 : !llvm.i64 + %2181 = llvm.add %2178, %2180 : !llvm.i64 + %2182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2183 = llvm.mul %59, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.getelementptr %2177[%2184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2186 = llvm.load %2185 : !llvm.ptr + %2187 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2188 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2189 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2190 = llvm.mul %59, %2189 : !llvm.i64 + %2191 = llvm.add %2188, %2190 : !llvm.i64 + %2192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2193 = llvm.mul %242, %2192 : !llvm.i64 + %2194 = llvm.add %2191, %2193 : !llvm.i64 + %2195 = llvm.getelementptr %2187[%2194] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2196 = llvm.load %2195 : !llvm.ptr + %2197 = llvm.fmul %2186, %2196 {RelaxedPrecision} : !llvm.float + %2198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2201 = llvm.mul %1996, %2200 : !llvm.i64 + %2202 = llvm.add %2199, %2201 : !llvm.i64 + %2203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2204 = llvm.mul %242, %2203 : !llvm.i64 + %2205 = llvm.add %2202, %2204 : !llvm.i64 + %2206 = llvm.getelementptr %2198[%2205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2207 = llvm.load %2206 : !llvm.ptr + %2208 = llvm.fadd %2207, %2197 {RelaxedPrecision} : !llvm.float + %2209 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2211 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2212 = llvm.mul %1996, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2215 = llvm.mul %242, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : 
!llvm.i64 + %2217 = llvm.getelementptr %2209[%2216] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2208, %2217 : !llvm.ptr + %2218 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2220 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2221 = llvm.mul %1996, %2220 : !llvm.i64 + %2222 = llvm.add %2219, %2221 : !llvm.i64 + %2223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2224 = llvm.mul %242, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.getelementptr %2218[%2225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2227 = llvm.load %2226 : !llvm.ptr + %2228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2231 = llvm.mul %1996, %2230 : !llvm.i64 + %2232 = llvm.add %2229, %2231 : !llvm.i64 + %2233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2234 = llvm.mul %242, %2233 : !llvm.i64 + %2235 = llvm.add %2232, %2234 : !llvm.i64 + %2236 = llvm.getelementptr %2228[%2235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2227, %2236 : !llvm.ptr + %2237 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2239 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2240 = llvm.mul %1996, %2239 : !llvm.i64 + %2241 = llvm.add %2238, %2240 : !llvm.i64 + %2242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2243 = llvm.mul %59, %2242 : !llvm.i64 + %2244 = llvm.add %2241, %2243 : !llvm.i64 + %2245 = llvm.getelementptr %2237[%2244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2246 = llvm.load %2245 : !llvm.ptr + %2247 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2248 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2249 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2250 = llvm.mul %59, %2249 : !llvm.i64 + %2251 = llvm.add %2248, %2250 : !llvm.i64 + %2252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2253 = llvm.mul %303, %2252 : !llvm.i64 + %2254 = llvm.add %2251, %2253 : !llvm.i64 + %2255 = llvm.getelementptr %2247[%2254] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2256 = llvm.load %2255 : !llvm.ptr + %2257 = llvm.fmul %2246, %2256 {RelaxedPrecision} : !llvm.float + %2258 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2260 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2261 = llvm.mul %1996, %2260 : !llvm.i64 + %2262 = llvm.add %2259, %2261 : !llvm.i64 + %2263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2264 = llvm.mul %303, %2263 : !llvm.i64 + %2265 = llvm.add %2262, %2264 : !llvm.i64 + %2266 = llvm.getelementptr %2258[%2265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2267 = llvm.load %2266 : !llvm.ptr + %2268 = llvm.fadd %2267, %2257 {RelaxedPrecision} : !llvm.float + %2269 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2270 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2271 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2272 = llvm.mul %1996, %2271 : !llvm.i64 + %2273 = llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2275 = llvm.mul %303, %2274 : !llvm.i64 + %2276 = llvm.add %2273, %2275 : !llvm.i64 + %2277 = llvm.getelementptr %2269[%2276] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2268, %2277 : 
!llvm.ptr + %2278 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2279 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2280 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2281 = llvm.mul %1996, %2280 : !llvm.i64 + %2282 = llvm.add %2279, %2281 : !llvm.i64 + %2283 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2284 = llvm.mul %303, %2283 : !llvm.i64 + %2285 = llvm.add %2282, %2284 : !llvm.i64 + %2286 = llvm.getelementptr %2278[%2285] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2287 = llvm.load %2286 : !llvm.ptr + %2288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2291 = llvm.mul %1996, %2290 : !llvm.i64 + %2292 = llvm.add %2289, %2291 : !llvm.i64 + %2293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2294 = llvm.mul %303, %2293 : !llvm.i64 + %2295 = llvm.add %2292, %2294 : !llvm.i64 + %2296 = llvm.getelementptr %2288[%2295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2287, %2296 : !llvm.ptr + %2297 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2299 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2300 = llvm.mul %1996, %2299 : !llvm.i64 + %2301 = llvm.add %2298, %2300 : !llvm.i64 + %2302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2303 = llvm.mul %59, %2302 : !llvm.i64 + %2304 = llvm.add %2301, %2303 : !llvm.i64 + %2305 = llvm.getelementptr %2297[%2304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2306 = llvm.load %2305 : !llvm.ptr + %2307 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2308 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2309 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2310 = llvm.mul %59, %2309 : !llvm.i64 + %2311 = llvm.add %2308, %2310 : !llvm.i64 + %2312 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2313 = llvm.mul %364, %2312 : !llvm.i64 + %2314 = llvm.add %2311, %2313 : !llvm.i64 + %2315 = llvm.getelementptr %2307[%2314] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2316 = llvm.load %2315 : !llvm.ptr + %2317 = llvm.fmul %2306, %2316 {RelaxedPrecision} : !llvm.float + %2318 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2320 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2321 = llvm.mul %1996, %2320 : !llvm.i64 + %2322 = llvm.add %2319, %2321 : !llvm.i64 + %2323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2324 = llvm.mul %364, %2323 : !llvm.i64 + %2325 = llvm.add %2322, %2324 : !llvm.i64 + %2326 = llvm.getelementptr %2318[%2325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2327 = llvm.load %2326 : !llvm.ptr + %2328 = llvm.fadd %2327, %2317 {RelaxedPrecision} : !llvm.float + %2329 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2330 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2331 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2332 = llvm.mul %1996, %2331 : !llvm.i64 + %2333 = llvm.add %2330, %2332 : !llvm.i64 + %2334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2335 = llvm.mul %364, %2334 : !llvm.i64 + %2336 = llvm.add %2333, %2335 : !llvm.i64 + %2337 = llvm.getelementptr %2329[%2336] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2328, %2337 : !llvm.ptr + %2338 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2339 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %2340 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2341 = llvm.mul %1996, %2340 : !llvm.i64 + %2342 = llvm.add %2339, %2341 : !llvm.i64 + %2343 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2344 = llvm.mul %364, %2343 : !llvm.i64 + %2345 = llvm.add %2342, %2344 : !llvm.i64 + %2346 = llvm.getelementptr %2338[%2345] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2347 = llvm.load %2346 : !llvm.ptr + %2348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2351 = llvm.mul %1996, %2350 : !llvm.i64 + %2352 = llvm.add %2349, %2351 : !llvm.i64 + %2353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2354 = llvm.mul %364, %2353 : !llvm.i64 + %2355 = llvm.add %2352, %2354 : !llvm.i64 + %2356 = llvm.getelementptr %2348[%2355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2347, %2356 : !llvm.ptr + %2357 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2359 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2360 = llvm.mul %1996, %2359 : !llvm.i64 + %2361 = llvm.add %2358, %2360 : !llvm.i64 + %2362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2363 = llvm.mul %59, %2362 : !llvm.i64 + %2364 = llvm.add %2361, %2363 : !llvm.i64 + %2365 = llvm.getelementptr %2357[%2364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2366 = llvm.load %2365 : !llvm.ptr + %2367 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2368 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2369 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2370 = llvm.mul %59, %2369 : !llvm.i64 + %2371 = llvm.add %2368, %2370 : !llvm.i64 + %2372 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2373 = llvm.mul %425, %2372 : !llvm.i64 + %2374 = llvm.add %2371, %2373 : !llvm.i64 + %2375 = llvm.getelementptr %2367[%2374] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2376 = llvm.load %2375 : !llvm.ptr + %2377 = llvm.fmul %2366, %2376 {RelaxedPrecision} : !llvm.float + %2378 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2380 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2381 = llvm.mul %1996, %2380 : !llvm.i64 + %2382 = llvm.add %2379, %2381 : !llvm.i64 + %2383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2384 = llvm.mul %425, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.getelementptr %2378[%2385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2387 = llvm.load %2386 : !llvm.ptr + %2388 = llvm.fadd %2387, %2377 {RelaxedPrecision} : !llvm.float + %2389 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2392 = llvm.mul %1996, %2391 : !llvm.i64 + %2393 = llvm.add %2390, %2392 : !llvm.i64 + %2394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2395 = llvm.mul %425, %2394 : !llvm.i64 + %2396 = llvm.add %2393, %2395 : !llvm.i64 + %2397 = llvm.getelementptr %2389[%2396] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2388, %2397 : !llvm.ptr + %2398 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2399 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2401 = llvm.mul %1996, 
%2400 : !llvm.i64 + %2402 = llvm.add %2399, %2401 : !llvm.i64 + %2403 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2404 = llvm.mul %425, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.getelementptr %2398[%2405] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2407 = llvm.load %2406 : !llvm.ptr + %2408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2411 = llvm.mul %1996, %2410 : !llvm.i64 + %2412 = llvm.add %2409, %2411 : !llvm.i64 + %2413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2414 = llvm.mul %425, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : !llvm.i64 + %2416 = llvm.getelementptr %2408[%2415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2407, %2416 : !llvm.ptr + %2417 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2419 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2420 = llvm.mul %1996, %2419 : !llvm.i64 + %2421 = llvm.add %2418, %2420 : !llvm.i64 + %2422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2423 = llvm.mul %59, %2422 : !llvm.i64 + %2424 = llvm.add %2421, %2423 : !llvm.i64 + %2425 = llvm.getelementptr %2417[%2424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2426 = llvm.load %2425 : !llvm.ptr + %2427 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2428 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2429 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2430 = llvm.mul %59, %2429 : !llvm.i64 + %2431 = llvm.add %2428, %2430 : !llvm.i64 + %2432 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2433 = llvm.mul %486, %2432 : !llvm.i64 + %2434 = llvm.add %2431, %2433 : !llvm.i64 + %2435 = llvm.getelementptr %2427[%2434] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2436 = llvm.load %2435 : !llvm.ptr + %2437 = llvm.fmul %2426, %2436 {RelaxedPrecision} : !llvm.float + %2438 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2440 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2441 = llvm.mul %1996, %2440 : !llvm.i64 + %2442 = llvm.add %2439, %2441 : !llvm.i64 + %2443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2444 = llvm.mul %486, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.getelementptr %2438[%2445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2447 = llvm.load %2446 : !llvm.ptr + %2448 = llvm.fadd %2447, %2437 {RelaxedPrecision} : !llvm.float + %2449 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2450 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2451 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2452 = llvm.mul %1996, %2451 : !llvm.i64 + %2453 = llvm.add %2450, %2452 : !llvm.i64 + %2454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2455 = llvm.mul %486, %2454 : !llvm.i64 + %2456 = llvm.add %2453, %2455 : !llvm.i64 + %2457 = llvm.getelementptr %2449[%2456] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2448, %2457 : !llvm.ptr + %2458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2461 = llvm.mul %1996, %2460 : !llvm.i64 + %2462 = llvm.add %2459, %2461 : !llvm.i64 + %2463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2464 = 
llvm.mul %486, %2463 : !llvm.i64 + %2465 = llvm.add %2462, %2464 : !llvm.i64 + %2466 = llvm.getelementptr %2458[%2465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2467 = llvm.load %2466 : !llvm.ptr + %2468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2471 = llvm.mul %1996, %2470 : !llvm.i64 + %2472 = llvm.add %2469, %2471 : !llvm.i64 + %2473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2474 = llvm.mul %486, %2473 : !llvm.i64 + %2475 = llvm.add %2472, %2474 : !llvm.i64 + %2476 = llvm.getelementptr %2468[%2475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2467, %2476 : !llvm.ptr + %2477 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2479 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2480 = llvm.mul %1996, %2479 : !llvm.i64 + %2481 = llvm.add %2478, %2480 : !llvm.i64 + %2482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2483 = llvm.mul %59, %2482 : !llvm.i64 + %2484 = llvm.add %2481, %2483 : !llvm.i64 + %2485 = llvm.getelementptr %2477[%2484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2486 = llvm.load %2485 : !llvm.ptr + %2487 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2489 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2490 = llvm.mul %59, %2489 : !llvm.i64 + %2491 = llvm.add %2488, %2490 : !llvm.i64 + %2492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2493 = llvm.mul %547, %2492 : !llvm.i64 + %2494 = llvm.add %2491, %2493 : !llvm.i64 + %2495 = llvm.getelementptr %2487[%2494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2496 = llvm.load %2495 : !llvm.ptr + %2497 = llvm.fmul %2486, %2496 {RelaxedPrecision} : !llvm.float + %2498 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2500 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2501 = llvm.mul %1996, %2500 : !llvm.i64 + %2502 = llvm.add %2499, %2501 : !llvm.i64 + %2503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2504 = llvm.mul %547, %2503 : !llvm.i64 + %2505 = llvm.add %2502, %2504 : !llvm.i64 + %2506 = llvm.getelementptr %2498[%2505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2507 = llvm.load %2506 : !llvm.ptr + %2508 = llvm.fadd %2507, %2497 {RelaxedPrecision} : !llvm.float + %2509 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2510 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2511 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2512 = llvm.mul %1996, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2515 = llvm.mul %547, %2514 : !llvm.i64 + %2516 = llvm.add %2513, %2515 : !llvm.i64 + %2517 = llvm.getelementptr %2509[%2516] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2508, %2517 : !llvm.ptr + %2518 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2519 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2520 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2521 = llvm.mul %1996, %2520 : !llvm.i64 + %2522 = llvm.add %2519, %2521 : !llvm.i64 + %2523 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2524 = llvm.mul %547, %2523 : !llvm.i64 + %2525 = llvm.add %2522, %2524 : !llvm.i64 + %2526 = llvm.getelementptr %2518[%2525] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2527 = llvm.load %2526 : !llvm.ptr + %2528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2531 = llvm.mul %1996, %2530 : !llvm.i64 + %2532 = llvm.add %2529, %2531 : !llvm.i64 + %2533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2534 = llvm.mul %547, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.getelementptr %2528[%2535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2527, %2536 : !llvm.ptr + %2537 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2539 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2540 = llvm.mul %1996, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2543 = llvm.mul %59, %2542 : !llvm.i64 + %2544 = llvm.add %2541, %2543 : !llvm.i64 + %2545 = llvm.getelementptr %2537[%2544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2546 = llvm.load %2545 : !llvm.ptr + %2547 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2548 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2549 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2550 = llvm.mul %59, %2549 : !llvm.i64 + %2551 = llvm.add %2548, %2550 : !llvm.i64 + %2552 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2553 = llvm.mul %608, %2552 : !llvm.i64 + %2554 = llvm.add %2551, %2553 : !llvm.i64 + %2555 = llvm.getelementptr %2547[%2554] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2556 = llvm.load %2555 : !llvm.ptr + %2557 = llvm.fmul %2546, %2556 {RelaxedPrecision} : !llvm.float + %2558 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2561 = llvm.mul %1996, %2560 : !llvm.i64 + %2562 = llvm.add %2559, %2561 : !llvm.i64 + %2563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2564 = llvm.mul %608, %2563 : !llvm.i64 + %2565 = llvm.add %2562, %2564 : !llvm.i64 + %2566 = llvm.getelementptr %2558[%2565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2567 = llvm.load %2566 : !llvm.ptr + %2568 = llvm.fadd %2567, %2557 {RelaxedPrecision} : !llvm.float + %2569 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2570 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2571 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2572 = llvm.mul %1996, %2571 : !llvm.i64 + %2573 = llvm.add %2570, %2572 : !llvm.i64 + %2574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2575 = llvm.mul %608, %2574 : !llvm.i64 + %2576 = llvm.add %2573, %2575 : !llvm.i64 + %2577 = llvm.getelementptr %2569[%2576] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2568, %2577 : !llvm.ptr + %2578 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2580 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2581 = llvm.mul %1996, %2580 : !llvm.i64 + %2582 = llvm.add %2579, %2581 : !llvm.i64 + %2583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2584 = llvm.mul %608, %2583 : !llvm.i64 + %2585 = llvm.add %2582, %2584 : !llvm.i64 + %2586 = llvm.getelementptr %2578[%2585] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2587 = llvm.load %2586 : !llvm.ptr + %2588 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2591 = llvm.mul %1996, %2590 : !llvm.i64 + %2592 = llvm.add %2589, %2591 : !llvm.i64 + %2593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2594 = llvm.mul %608, %2593 : !llvm.i64 + %2595 = llvm.add %2592, %2594 : !llvm.i64 + %2596 = llvm.getelementptr %2588[%2595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2587, %2596 : !llvm.ptr + %2597 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2599 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2600 = llvm.mul %1996, %2599 : !llvm.i64 + %2601 = llvm.add %2598, %2600 : !llvm.i64 + %2602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2603 = llvm.mul %59, %2602 : !llvm.i64 + %2604 = llvm.add %2601, %2603 : !llvm.i64 + %2605 = llvm.getelementptr %2597[%2604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2606 = llvm.load %2605 : !llvm.ptr + %2607 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2608 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2610 = llvm.mul %59, %2609 : !llvm.i64 + %2611 = llvm.add %2608, %2610 : !llvm.i64 + %2612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2613 = llvm.mul %669, %2612 : !llvm.i64 + %2614 = llvm.add %2611, %2613 : !llvm.i64 + %2615 = llvm.getelementptr %2607[%2614] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2616 = llvm.load %2615 : !llvm.ptr + %2617 = llvm.fmul %2606, %2616 {RelaxedPrecision} : !llvm.float + %2618 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2620 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2621 = llvm.mul %1996, %2620 : !llvm.i64 + %2622 = llvm.add %2619, %2621 : !llvm.i64 + %2623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2624 = llvm.mul %669, %2623 : !llvm.i64 + %2625 = llvm.add %2622, %2624 : !llvm.i64 + %2626 = llvm.getelementptr %2618[%2625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2627 = llvm.load %2626 : !llvm.ptr + %2628 = llvm.fadd %2627, %2617 {RelaxedPrecision} : !llvm.float + %2629 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2630 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2631 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2632 = llvm.mul %1996, %2631 : !llvm.i64 + %2633 = llvm.add %2630, %2632 : !llvm.i64 + %2634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2635 = llvm.mul %669, %2634 : !llvm.i64 + %2636 = llvm.add %2633, %2635 : !llvm.i64 + %2637 = llvm.getelementptr %2629[%2636] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2628, %2637 : !llvm.ptr + %2638 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2639 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2640 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2641 = llvm.mul %1996, %2640 : !llvm.i64 + %2642 = llvm.add %2639, %2641 : !llvm.i64 + %2643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2644 = llvm.mul %669, %2643 : !llvm.i64 + %2645 = llvm.add %2642, %2644 : !llvm.i64 + %2646 = llvm.getelementptr %2638[%2645] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2647 = llvm.load %2646 : !llvm.ptr + %2648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2650 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %2651 = llvm.mul %1996, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2654 = llvm.mul %669, %2653 : !llvm.i64 + %2655 = llvm.add %2652, %2654 : !llvm.i64 + %2656 = llvm.getelementptr %2648[%2655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2647, %2656 : !llvm.ptr + %2657 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2659 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2660 = llvm.mul %1996, %2659 : !llvm.i64 + %2661 = llvm.add %2658, %2660 : !llvm.i64 + %2662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2663 = llvm.mul %59, %2662 : !llvm.i64 + %2664 = llvm.add %2661, %2663 : !llvm.i64 + %2665 = llvm.getelementptr %2657[%2664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2666 = llvm.load %2665 : !llvm.ptr + %2667 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2668 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2670 = llvm.mul %59, %2669 : !llvm.i64 + %2671 = llvm.add %2668, %2670 : !llvm.i64 + %2672 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2673 = llvm.mul %730, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.getelementptr %2667[%2674] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2676 = llvm.load %2675 : !llvm.ptr + %2677 = llvm.fmul %2666, %2676 {RelaxedPrecision} : !llvm.float + %2678 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2680 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2681 = llvm.mul %1996, %2680 : !llvm.i64 + %2682 = llvm.add %2679, %2681 : !llvm.i64 + %2683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2684 = llvm.mul %730, %2683 : !llvm.i64 + %2685 = llvm.add %2682, %2684 : !llvm.i64 + %2686 = llvm.getelementptr %2678[%2685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2687 = llvm.load %2686 : !llvm.ptr + %2688 = llvm.fadd %2687, %2677 {RelaxedPrecision} : !llvm.float + %2689 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2690 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2691 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2692 = llvm.mul %1996, %2691 : !llvm.i64 + %2693 = llvm.add %2690, %2692 : !llvm.i64 + %2694 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2695 = llvm.mul %730, %2694 : !llvm.i64 + %2696 = llvm.add %2693, %2695 : !llvm.i64 + %2697 = llvm.getelementptr %2689[%2696] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2688, %2697 : !llvm.ptr + %2698 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2701 = llvm.mul %1996, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2704 = llvm.mul %730, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.getelementptr %2698[%2705] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2707 = llvm.load %2706 : !llvm.ptr + %2708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2711 = llvm.mul %1996, %2710 : !llvm.i64 + %2712 = llvm.add %2709, %2711 : 
!llvm.i64 + %2713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2714 = llvm.mul %730, %2713 : !llvm.i64 + %2715 = llvm.add %2712, %2714 : !llvm.i64 + %2716 = llvm.getelementptr %2708[%2715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2707, %2716 : !llvm.ptr + %2717 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2718 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2719 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2720 = llvm.mul %1996, %2719 : !llvm.i64 + %2721 = llvm.add %2718, %2720 : !llvm.i64 + %2722 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2723 = llvm.mul %59, %2722 : !llvm.i64 + %2724 = llvm.add %2721, %2723 : !llvm.i64 + %2725 = llvm.getelementptr %2717[%2724] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2726 = llvm.load %2725 : !llvm.ptr + %2727 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2729 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2730 = llvm.mul %59, %2729 : !llvm.i64 + %2731 = llvm.add %2728, %2730 : !llvm.i64 + %2732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2733 = llvm.mul %791, %2732 : !llvm.i64 + %2734 = llvm.add %2731, %2733 : !llvm.i64 + %2735 = llvm.getelementptr %2727[%2734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2736 = llvm.load %2735 : !llvm.ptr + %2737 = llvm.fmul %2726, %2736 {RelaxedPrecision} : !llvm.float + %2738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2741 = llvm.mul %1996, %2740 : !llvm.i64 + %2742 = llvm.add %2739, %2741 : !llvm.i64 + %2743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2744 = llvm.mul %791, %2743 : !llvm.i64 + %2745 = llvm.add %2742, %2744 : !llvm.i64 + %2746 = llvm.getelementptr %2738[%2745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2747 = llvm.load %2746 : !llvm.ptr + %2748 = llvm.fadd %2747, %2737 {RelaxedPrecision} : !llvm.float + %2749 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2750 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2751 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2752 = llvm.mul %1996, %2751 : !llvm.i64 + %2753 = llvm.add %2750, %2752 : !llvm.i64 + %2754 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2755 = llvm.mul %791, %2754 : !llvm.i64 + %2756 = llvm.add %2753, %2755 : !llvm.i64 + %2757 = llvm.getelementptr %2749[%2756] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2748, %2757 : !llvm.ptr + %2758 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2759 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2760 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2761 = llvm.mul %1996, %2760 : !llvm.i64 + %2762 = llvm.add %2759, %2761 : !llvm.i64 + %2763 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2764 = llvm.mul %791, %2763 : !llvm.i64 + %2765 = llvm.add %2762, %2764 : !llvm.i64 + %2766 = llvm.getelementptr %2758[%2765] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2767 = llvm.load %2766 : !llvm.ptr + %2768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2771 = llvm.mul %1996, %2770 : !llvm.i64 + %2772 = llvm.add %2769, %2771 : !llvm.i64 + %2773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2774 = llvm.mul %791, %2773 : !llvm.i64 + %2775 = llvm.add 
%2772, %2774 : !llvm.i64 + %2776 = llvm.getelementptr %2768[%2775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2767, %2776 : !llvm.ptr + %2777 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2779 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2780 = llvm.mul %1996, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = llvm.mul %59, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2777[%2784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2786 = llvm.load %2785 : !llvm.ptr + %2787 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2788 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2789 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2790 = llvm.mul %59, %2789 : !llvm.i64 + %2791 = llvm.add %2788, %2790 : !llvm.i64 + %2792 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2793 = llvm.mul %852, %2792 : !llvm.i64 + %2794 = llvm.add %2791, %2793 : !llvm.i64 + %2795 = llvm.getelementptr %2787[%2794] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2796 = llvm.load %2795 : !llvm.ptr + %2797 = llvm.fmul %2786, %2796 {RelaxedPrecision} : !llvm.float + %2798 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2800 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2801 = llvm.mul %1996, %2800 : !llvm.i64 + %2802 = llvm.add %2799, %2801 : !llvm.i64 + %2803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2804 = llvm.mul %852, %2803 : !llvm.i64 + %2805 = llvm.add %2802, %2804 : !llvm.i64 + %2806 = llvm.getelementptr %2798[%2805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2807 = llvm.load %2806 : !llvm.ptr + %2808 = llvm.fadd %2807, %2797 {RelaxedPrecision} : !llvm.float + %2809 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2810 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2811 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2812 = llvm.mul %1996, %2811 : !llvm.i64 + %2813 = llvm.add %2810, %2812 : !llvm.i64 + %2814 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2815 = llvm.mul %852, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = llvm.getelementptr %2809[%2816] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2808, %2817 : !llvm.ptr + %2818 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2819 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2820 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2821 = llvm.mul %1996, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2824 = llvm.mul %852, %2823 : !llvm.i64 + %2825 = llvm.add %2822, %2824 : !llvm.i64 + %2826 = llvm.getelementptr %2818[%2825] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2827 = llvm.load %2826 : !llvm.ptr + %2828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2831 = llvm.mul %1996, %2830 : !llvm.i64 + %2832 = llvm.add %2829, %2831 : !llvm.i64 + %2833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2834 = llvm.mul %852, %2833 : !llvm.i64 + %2835 = llvm.add %2832, %2834 : !llvm.i64 + %2836 = llvm.getelementptr %2828[%2835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2827, 
%2836 : !llvm.ptr + %2837 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2839 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2840 = llvm.mul %1996, %2839 : !llvm.i64 + %2841 = llvm.add %2838, %2840 : !llvm.i64 + %2842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2843 = llvm.mul %59, %2842 : !llvm.i64 + %2844 = llvm.add %2841, %2843 : !llvm.i64 + %2845 = llvm.getelementptr %2837[%2844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2846 = llvm.load %2845 : !llvm.ptr + %2847 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2848 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2849 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2850 = llvm.mul %59, %2849 : !llvm.i64 + %2851 = llvm.add %2848, %2850 : !llvm.i64 + %2852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2853 = llvm.mul %913, %2852 : !llvm.i64 + %2854 = llvm.add %2851, %2853 : !llvm.i64 + %2855 = llvm.getelementptr %2847[%2854] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2856 = llvm.load %2855 : !llvm.ptr + %2857 = llvm.fmul %2846, %2856 {RelaxedPrecision} : !llvm.float + %2858 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2860 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2861 = llvm.mul %1996, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2864 = llvm.mul %913, %2863 : !llvm.i64 + %2865 = llvm.add %2862, %2864 : !llvm.i64 + %2866 = llvm.getelementptr %2858[%2865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2867 = llvm.load %2866 : !llvm.ptr + %2868 = llvm.fadd %2867, %2857 {RelaxedPrecision} : !llvm.float + %2869 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2871 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2872 = llvm.mul %1996, %2871 : !llvm.i64 + %2873 = llvm.add %2870, %2872 : !llvm.i64 + %2874 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2875 = llvm.mul %913, %2874 : !llvm.i64 + %2876 = llvm.add %2873, %2875 : !llvm.i64 + %2877 = llvm.getelementptr %2869[%2876] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2868, %2877 : !llvm.ptr + %2878 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2880 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2881 = llvm.mul %1996, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2884 = llvm.mul %913, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.getelementptr %2878[%2885] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2887 = llvm.load %2886 : !llvm.ptr + %2888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2891 = llvm.mul %1996, %2890 : !llvm.i64 + %2892 = llvm.add %2889, %2891 : !llvm.i64 + %2893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2894 = llvm.mul %913, %2893 : !llvm.i64 + %2895 = llvm.add %2892, %2894 : !llvm.i64 + %2896 = llvm.getelementptr %2888[%2895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2887, %2896 : !llvm.ptr + %2897 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2898 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %2899 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2900 = llvm.mul %1996, %2899 : !llvm.i64 + %2901 = llvm.add %2898, %2900 : !llvm.i64 + %2902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2903 = llvm.mul %59, %2902 : !llvm.i64 + %2904 = llvm.add %2901, %2903 : !llvm.i64 + %2905 = llvm.getelementptr %2897[%2904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2906 = llvm.load %2905 : !llvm.ptr + %2907 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2908 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2909 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2910 = llvm.mul %59, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : !llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %974, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2907[%2914] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2916 = llvm.load %2915 : !llvm.ptr + %2917 = llvm.fmul %2906, %2916 {RelaxedPrecision} : !llvm.float + %2918 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2920 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2921 = llvm.mul %1996, %2920 : !llvm.i64 + %2922 = llvm.add %2919, %2921 : !llvm.i64 + %2923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2924 = llvm.mul %974, %2923 : !llvm.i64 + %2925 = llvm.add %2922, %2924 : !llvm.i64 + %2926 = llvm.getelementptr %2918[%2925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2927 = llvm.load %2926 : !llvm.ptr + %2928 = llvm.fadd %2927, %2917 {RelaxedPrecision} : !llvm.float + %2929 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2930 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2931 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2932 = llvm.mul %1996, %2931 : !llvm.i64 + %2933 = llvm.add %2930, %2932 : !llvm.i64 + %2934 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2935 = llvm.mul %974, %2934 : !llvm.i64 + %2936 = llvm.add %2933, %2935 : !llvm.i64 + %2937 = llvm.getelementptr %2929[%2936] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2928, %2937 : !llvm.ptr + %2938 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2940 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2941 = llvm.mul %1996, %2940 : !llvm.i64 + %2942 = llvm.add %2939, %2941 : !llvm.i64 + %2943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2944 = llvm.mul %974, %2943 : !llvm.i64 + %2945 = llvm.add %2942, %2944 : !llvm.i64 + %2946 = llvm.getelementptr %2938[%2945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2947 = llvm.load %2946 : !llvm.ptr + %2948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2951 = llvm.mul %1996, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2954 = llvm.mul %974, %2953 : !llvm.i64 + %2955 = llvm.add %2952, %2954 : !llvm.i64 + %2956 = llvm.getelementptr %2948[%2955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2947, %2956 : !llvm.ptr + %2957 = llvm.add %50, %35 : !llvm.i64 + %2958 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2960 = llvm.mlir.constant(128 : index) : 
!llvm.i64 + %2961 = llvm.mul %2957, %2960 : !llvm.i64 + %2962 = llvm.add %2959, %2961 : !llvm.i64 + %2963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2964 = llvm.mul %59, %2963 : !llvm.i64 + %2965 = llvm.add %2962, %2964 : !llvm.i64 + %2966 = llvm.getelementptr %2958[%2965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2967 = llvm.load %2966 : !llvm.ptr + %2968 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2971 = llvm.mul %59, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2974 = llvm.mul %58, %2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.getelementptr %2968[%2975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2977 = llvm.load %2976 : !llvm.ptr + %2978 = llvm.fmul %2967, %2977 {RelaxedPrecision} : !llvm.float + %2979 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2981 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2982 = llvm.mul %2957, %2981 : !llvm.i64 + %2983 = llvm.add %2980, %2982 : !llvm.i64 + %2984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2985 = llvm.mul %58, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.getelementptr %2979[%2986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2988 = llvm.load %2987 : !llvm.ptr + %2989 = llvm.fadd %2988, %2978 {RelaxedPrecision} : !llvm.float + %2990 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2992 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2993 = llvm.mul %2957, %2992 : !llvm.i64 + %2994 = llvm.add %2991, %2993 : !llvm.i64 + %2995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2996 = llvm.mul %58, %2995 : !llvm.i64 + %2997 = llvm.add %2994, %2996 : !llvm.i64 + %2998 = llvm.getelementptr %2990[%2997] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2989, %2998 : !llvm.ptr + %2999 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3000 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3001 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3002 = llvm.mul %2957, %3001 : !llvm.i64 + %3003 = llvm.add %3000, %3002 : !llvm.i64 + %3004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3005 = llvm.mul %58, %3004 : !llvm.i64 + %3006 = llvm.add %3003, %3005 : !llvm.i64 + %3007 = llvm.getelementptr %2999[%3006] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3008 = llvm.load %3007 : !llvm.ptr + %3009 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3010 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3011 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3012 = llvm.mul %2957, %3011 : !llvm.i64 + %3013 = llvm.add %3010, %3012 : !llvm.i64 + %3014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3015 = llvm.mul %58, %3014 : !llvm.i64 + %3016 = llvm.add %3013, %3015 : !llvm.i64 + %3017 = llvm.getelementptr %3009[%3016] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3008, %3017 : !llvm.ptr + %3018 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3020 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3021 = llvm.mul %2957, %3020 : !llvm.i64 + %3022 = llvm.add %3019, %3021 : !llvm.i64 + %3023 = llvm.mlir.constant(1 : 
index) : !llvm.i64 + %3024 = llvm.mul %59, %3023 : !llvm.i64 + %3025 = llvm.add %3022, %3024 : !llvm.i64 + %3026 = llvm.getelementptr %3018[%3025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3027 = llvm.load %3026 : !llvm.ptr + %3028 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3031 = llvm.mul %59, %3030 : !llvm.i64 + %3032 = llvm.add %3029, %3031 : !llvm.i64 + %3033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3034 = llvm.mul %120, %3033 : !llvm.i64 + %3035 = llvm.add %3032, %3034 : !llvm.i64 + %3036 = llvm.getelementptr %3028[%3035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3037 = llvm.load %3036 : !llvm.ptr + %3038 = llvm.fmul %3027, %3037 {RelaxedPrecision} : !llvm.float + %3039 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3041 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3042 = llvm.mul %2957, %3041 : !llvm.i64 + %3043 = llvm.add %3040, %3042 : !llvm.i64 + %3044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3045 = llvm.mul %120, %3044 : !llvm.i64 + %3046 = llvm.add %3043, %3045 : !llvm.i64 + %3047 = llvm.getelementptr %3039[%3046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3048 = llvm.load %3047 : !llvm.ptr + %3049 = llvm.fadd %3048, %3038 {RelaxedPrecision} : !llvm.float + %3050 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3051 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3052 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3053 = llvm.mul %2957, %3052 : !llvm.i64 + %3054 = llvm.add %3051, %3053 : !llvm.i64 + %3055 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3056 = llvm.mul %120, %3055 : !llvm.i64 + %3057 = llvm.add %3054, %3056 : !llvm.i64 + %3058 = llvm.getelementptr %3050[%3057] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3049, %3058 : !llvm.ptr + %3059 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3060 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3061 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3062 = llvm.mul %2957, %3061 : !llvm.i64 + %3063 = llvm.add %3060, %3062 : !llvm.i64 + %3064 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3065 = llvm.mul %120, %3064 : !llvm.i64 + %3066 = llvm.add %3063, %3065 : !llvm.i64 + %3067 = llvm.getelementptr %3059[%3066] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3068 = llvm.load %3067 : !llvm.ptr + %3069 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3070 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3071 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3072 = llvm.mul %2957, %3071 : !llvm.i64 + %3073 = llvm.add %3070, %3072 : !llvm.i64 + %3074 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3075 = llvm.mul %120, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.getelementptr %3069[%3076] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3068, %3077 : !llvm.ptr + %3078 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3080 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3081 = llvm.mul %2957, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3084 = llvm.mul %59, %3083 : !llvm.i64 + %3085 = llvm.add %3082, %3084 : !llvm.i64 + %3086 = 
llvm.getelementptr %3078[%3085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3087 = llvm.load %3086 : !llvm.ptr + %3088 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3091 = llvm.mul %59, %3090 : !llvm.i64 + %3092 = llvm.add %3089, %3091 : !llvm.i64 + %3093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3094 = llvm.mul %181, %3093 : !llvm.i64 + %3095 = llvm.add %3092, %3094 : !llvm.i64 + %3096 = llvm.getelementptr %3088[%3095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3097 = llvm.load %3096 : !llvm.ptr + %3098 = llvm.fmul %3087, %3097 {RelaxedPrecision} : !llvm.float + %3099 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3101 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3102 = llvm.mul %2957, %3101 : !llvm.i64 + %3103 = llvm.add %3100, %3102 : !llvm.i64 + %3104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3105 = llvm.mul %181, %3104 : !llvm.i64 + %3106 = llvm.add %3103, %3105 : !llvm.i64 + %3107 = llvm.getelementptr %3099[%3106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3108 = llvm.load %3107 : !llvm.ptr + %3109 = llvm.fadd %3108, %3098 {RelaxedPrecision} : !llvm.float + %3110 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3111 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3112 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3113 = llvm.mul %2957, %3112 : !llvm.i64 + %3114 = llvm.add %3111, %3113 : !llvm.i64 + %3115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3116 = llvm.mul %181, %3115 : !llvm.i64 + %3117 = llvm.add %3114, %3116 : !llvm.i64 + %3118 = llvm.getelementptr %3110[%3117] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3109, %3118 : !llvm.ptr + %3119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3121 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3122 = llvm.mul %2957, %3121 : !llvm.i64 + %3123 = llvm.add %3120, %3122 : !llvm.i64 + %3124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3125 = llvm.mul %181, %3124 : !llvm.i64 + %3126 = llvm.add %3123, %3125 : !llvm.i64 + %3127 = llvm.getelementptr %3119[%3126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3128 = llvm.load %3127 : !llvm.ptr + %3129 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3130 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3131 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3132 = llvm.mul %2957, %3131 : !llvm.i64 + %3133 = llvm.add %3130, %3132 : !llvm.i64 + %3134 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3135 = llvm.mul %181, %3134 : !llvm.i64 + %3136 = llvm.add %3133, %3135 : !llvm.i64 + %3137 = llvm.getelementptr %3129[%3136] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3128, %3137 : !llvm.ptr + %3138 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3141 = llvm.mul %2957, %3140 : !llvm.i64 + %3142 = llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3144 = llvm.mul %59, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.getelementptr %3138[%3145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3147 = llvm.load %3146 : !llvm.ptr + %3148 = 
llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3151 = llvm.mul %59, %3150 : !llvm.i64 + %3152 = llvm.add %3149, %3151 : !llvm.i64 + %3153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3154 = llvm.mul %242, %3153 : !llvm.i64 + %3155 = llvm.add %3152, %3154 : !llvm.i64 + %3156 = llvm.getelementptr %3148[%3155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3157 = llvm.load %3156 : !llvm.ptr + %3158 = llvm.fmul %3147, %3157 {RelaxedPrecision} : !llvm.float + %3159 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3161 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3162 = llvm.mul %2957, %3161 : !llvm.i64 + %3163 = llvm.add %3160, %3162 : !llvm.i64 + %3164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3165 = llvm.mul %242, %3164 : !llvm.i64 + %3166 = llvm.add %3163, %3165 : !llvm.i64 + %3167 = llvm.getelementptr %3159[%3166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3168 = llvm.load %3167 : !llvm.ptr + %3169 = llvm.fadd %3168, %3158 {RelaxedPrecision} : !llvm.float + %3170 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3173 = llvm.mul %2957, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %242, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3169, %3178 : !llvm.ptr + %3179 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3182 = llvm.mul %2957, %3181 : !llvm.i64 + %3183 = llvm.add %3180, %3182 : !llvm.i64 + %3184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3185 = llvm.mul %242, %3184 : !llvm.i64 + %3186 = llvm.add %3183, %3185 : !llvm.i64 + %3187 = llvm.getelementptr %3179[%3186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3188 = llvm.load %3187 : !llvm.ptr + %3189 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3191 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3192 = llvm.mul %2957, %3191 : !llvm.i64 + %3193 = llvm.add %3190, %3192 : !llvm.i64 + %3194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3195 = llvm.mul %242, %3194 : !llvm.i64 + %3196 = llvm.add %3193, %3195 : !llvm.i64 + %3197 = llvm.getelementptr %3189[%3196] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3188, %3197 : !llvm.ptr + %3198 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3200 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3201 = llvm.mul %2957, %3200 : !llvm.i64 + %3202 = llvm.add %3199, %3201 : !llvm.i64 + %3203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3204 = llvm.mul %59, %3203 : !llvm.i64 + %3205 = llvm.add %3202, %3204 : !llvm.i64 + %3206 = llvm.getelementptr %3198[%3205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3207 = llvm.load %3206 : !llvm.ptr + %3208 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3209 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %3210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3211 = llvm.mul %59, %3210 : !llvm.i64 + %3212 = llvm.add %3209, %3211 : !llvm.i64 + %3213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3214 = llvm.mul %303, %3213 : !llvm.i64 + %3215 = llvm.add %3212, %3214 : !llvm.i64 + %3216 = llvm.getelementptr %3208[%3215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3217 = llvm.load %3216 : !llvm.ptr + %3218 = llvm.fmul %3207, %3217 {RelaxedPrecision} : !llvm.float + %3219 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3221 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3222 = llvm.mul %2957, %3221 : !llvm.i64 + %3223 = llvm.add %3220, %3222 : !llvm.i64 + %3224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3225 = llvm.mul %303, %3224 : !llvm.i64 + %3226 = llvm.add %3223, %3225 : !llvm.i64 + %3227 = llvm.getelementptr %3219[%3226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3228 = llvm.load %3227 : !llvm.ptr + %3229 = llvm.fadd %3228, %3218 {RelaxedPrecision} : !llvm.float + %3230 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3232 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3233 = llvm.mul %2957, %3232 : !llvm.i64 + %3234 = llvm.add %3231, %3233 : !llvm.i64 + %3235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3236 = llvm.mul %303, %3235 : !llvm.i64 + %3237 = llvm.add %3234, %3236 : !llvm.i64 + %3238 = llvm.getelementptr %3230[%3237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3229, %3238 : !llvm.ptr + %3239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3242 = llvm.mul %2957, %3241 : !llvm.i64 + %3243 = llvm.add %3240, %3242 : !llvm.i64 + %3244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3245 = llvm.mul %303, %3244 : !llvm.i64 + %3246 = llvm.add %3243, %3245 : !llvm.i64 + %3247 = llvm.getelementptr %3239[%3246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3248 = llvm.load %3247 : !llvm.ptr + %3249 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3250 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3251 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3252 = llvm.mul %2957, %3251 : !llvm.i64 + %3253 = llvm.add %3250, %3252 : !llvm.i64 + %3254 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3255 = llvm.mul %303, %3254 : !llvm.i64 + %3256 = llvm.add %3253, %3255 : !llvm.i64 + %3257 = llvm.getelementptr %3249[%3256] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3248, %3257 : !llvm.ptr + %3258 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3260 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3261 = llvm.mul %2957, %3260 : !llvm.i64 + %3262 = llvm.add %3259, %3261 : !llvm.i64 + %3263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3264 = llvm.mul %59, %3263 : !llvm.i64 + %3265 = llvm.add %3262, %3264 : !llvm.i64 + %3266 = llvm.getelementptr %3258[%3265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3267 = llvm.load %3266 : !llvm.ptr + %3268 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3271 = llvm.mul %59, %3270 : !llvm.i64 + %3272 = 
llvm.add %3269, %3271 : !llvm.i64 + %3273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3274 = llvm.mul %364, %3273 : !llvm.i64 + %3275 = llvm.add %3272, %3274 : !llvm.i64 + %3276 = llvm.getelementptr %3268[%3275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3277 = llvm.load %3276 : !llvm.ptr + %3278 = llvm.fmul %3267, %3277 {RelaxedPrecision} : !llvm.float + %3279 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3281 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3282 = llvm.mul %2957, %3281 : !llvm.i64 + %3283 = llvm.add %3280, %3282 : !llvm.i64 + %3284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3285 = llvm.mul %364, %3284 : !llvm.i64 + %3286 = llvm.add %3283, %3285 : !llvm.i64 + %3287 = llvm.getelementptr %3279[%3286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3288 = llvm.load %3287 : !llvm.ptr + %3289 = llvm.fadd %3288, %3278 {RelaxedPrecision} : !llvm.float + %3290 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3291 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3292 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3293 = llvm.mul %2957, %3292 : !llvm.i64 + %3294 = llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3296 = llvm.mul %364, %3295 : !llvm.i64 + %3297 = llvm.add %3294, %3296 : !llvm.i64 + %3298 = llvm.getelementptr %3290[%3297] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3289, %3298 : !llvm.ptr + %3299 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3300 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3301 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3302 = llvm.mul %2957, %3301 : !llvm.i64 + %3303 = llvm.add %3300, %3302 : !llvm.i64 + %3304 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3305 = llvm.mul %364, %3304 : !llvm.i64 + %3306 = llvm.add %3303, %3305 : !llvm.i64 + %3307 = llvm.getelementptr %3299[%3306] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3308 = llvm.load %3307 : !llvm.ptr + %3309 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3310 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3311 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3312 = llvm.mul %2957, %3311 : !llvm.i64 + %3313 = llvm.add %3310, %3312 : !llvm.i64 + %3314 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3315 = llvm.mul %364, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.getelementptr %3309[%3316] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3308, %3317 : !llvm.ptr + %3318 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3320 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3321 = llvm.mul %2957, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3324 = llvm.mul %59, %3323 : !llvm.i64 + %3325 = llvm.add %3322, %3324 : !llvm.i64 + %3326 = llvm.getelementptr %3318[%3325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3327 = llvm.load %3326 : !llvm.ptr + %3328 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3331 = llvm.mul %59, %3330 : !llvm.i64 + %3332 = llvm.add %3329, %3331 : !llvm.i64 + %3333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3334 = llvm.mul %425, %3333 : 
!llvm.i64 + %3335 = llvm.add %3332, %3334 : !llvm.i64 + %3336 = llvm.getelementptr %3328[%3335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3337 = llvm.load %3336 : !llvm.ptr + %3338 = llvm.fmul %3327, %3337 {RelaxedPrecision} : !llvm.float + %3339 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3341 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3342 = llvm.mul %2957, %3341 : !llvm.i64 + %3343 = llvm.add %3340, %3342 : !llvm.i64 + %3344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3345 = llvm.mul %425, %3344 : !llvm.i64 + %3346 = llvm.add %3343, %3345 : !llvm.i64 + %3347 = llvm.getelementptr %3339[%3346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3348 = llvm.load %3347 : !llvm.ptr + %3349 = llvm.fadd %3348, %3338 {RelaxedPrecision} : !llvm.float + %3350 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3351 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3352 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3353 = llvm.mul %2957, %3352 : !llvm.i64 + %3354 = llvm.add %3351, %3353 : !llvm.i64 + %3355 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3356 = llvm.mul %425, %3355 : !llvm.i64 + %3357 = llvm.add %3354, %3356 : !llvm.i64 + %3358 = llvm.getelementptr %3350[%3357] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3349, %3358 : !llvm.ptr + %3359 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3360 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3361 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3362 = llvm.mul %2957, %3361 : !llvm.i64 + %3363 = llvm.add %3360, %3362 : !llvm.i64 + %3364 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3365 = llvm.mul %425, %3364 : !llvm.i64 + %3366 = llvm.add %3363, %3365 : !llvm.i64 + %3367 = llvm.getelementptr %3359[%3366] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3368 = llvm.load %3367 : !llvm.ptr + %3369 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3370 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3371 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3372 = llvm.mul %2957, %3371 : !llvm.i64 + %3373 = llvm.add %3370, %3372 : !llvm.i64 + %3374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3375 = llvm.mul %425, %3374 : !llvm.i64 + %3376 = llvm.add %3373, %3375 : !llvm.i64 + %3377 = llvm.getelementptr %3369[%3376] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3368, %3377 : !llvm.ptr + %3378 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3380 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3381 = llvm.mul %2957, %3380 : !llvm.i64 + %3382 = llvm.add %3379, %3381 : !llvm.i64 + %3383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3384 = llvm.mul %59, %3383 : !llvm.i64 + %3385 = llvm.add %3382, %3384 : !llvm.i64 + %3386 = llvm.getelementptr %3378[%3385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3387 = llvm.load %3386 : !llvm.ptr + %3388 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3391 = llvm.mul %59, %3390 : !llvm.i64 + %3392 = llvm.add %3389, %3391 : !llvm.i64 + %3393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3394 = llvm.mul %486, %3393 : !llvm.i64 + %3395 = llvm.add %3392, %3394 : !llvm.i64 + %3396 = llvm.getelementptr %3388[%3395] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %3397 = llvm.load %3396 : !llvm.ptr + %3398 = llvm.fmul %3387, %3397 {RelaxedPrecision} : !llvm.float + %3399 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3401 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3402 = llvm.mul %2957, %3401 : !llvm.i64 + %3403 = llvm.add %3400, %3402 : !llvm.i64 + %3404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3405 = llvm.mul %486, %3404 : !llvm.i64 + %3406 = llvm.add %3403, %3405 : !llvm.i64 + %3407 = llvm.getelementptr %3399[%3406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3408 = llvm.load %3407 : !llvm.ptr + %3409 = llvm.fadd %3408, %3398 {RelaxedPrecision} : !llvm.float + %3410 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3413 = llvm.mul %2957, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3416 = llvm.mul %486, %3415 : !llvm.i64 + %3417 = llvm.add %3414, %3416 : !llvm.i64 + %3418 = llvm.getelementptr %3410[%3417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3409, %3418 : !llvm.ptr + %3419 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3422 = llvm.mul %2957, %3421 : !llvm.i64 + %3423 = llvm.add %3420, %3422 : !llvm.i64 + %3424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3425 = llvm.mul %486, %3424 : !llvm.i64 + %3426 = llvm.add %3423, %3425 : !llvm.i64 + %3427 = llvm.getelementptr %3419[%3426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3428 = llvm.load %3427 : !llvm.ptr + %3429 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3430 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3431 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3432 = llvm.mul %2957, %3431 : !llvm.i64 + %3433 = llvm.add %3430, %3432 : !llvm.i64 + %3434 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3435 = llvm.mul %486, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.getelementptr %3429[%3436] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3428, %3437 : !llvm.ptr + %3438 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3440 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3441 = llvm.mul %2957, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3444 = llvm.mul %59, %3443 : !llvm.i64 + %3445 = llvm.add %3442, %3444 : !llvm.i64 + %3446 = llvm.getelementptr %3438[%3445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3447 = llvm.load %3446 : !llvm.ptr + %3448 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3451 = llvm.mul %59, %3450 : !llvm.i64 + %3452 = llvm.add %3449, %3451 : !llvm.i64 + %3453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3454 = llvm.mul %547, %3453 : !llvm.i64 + %3455 = llvm.add %3452, %3454 : !llvm.i64 + %3456 = llvm.getelementptr %3448[%3455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3457 = llvm.load %3456 : !llvm.ptr + %3458 = llvm.fmul %3447, %3457 {RelaxedPrecision} : !llvm.float + %3459 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3461 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3462 = llvm.mul %2957, %3461 : !llvm.i64 + %3463 = llvm.add %3460, %3462 : !llvm.i64 + %3464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3465 = llvm.mul %547, %3464 : !llvm.i64 + %3466 = llvm.add %3463, %3465 : !llvm.i64 + %3467 = llvm.getelementptr %3459[%3466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3468 = llvm.load %3467 : !llvm.ptr + %3469 = llvm.fadd %3468, %3458 {RelaxedPrecision} : !llvm.float + %3470 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3471 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3472 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3473 = llvm.mul %2957, %3472 : !llvm.i64 + %3474 = llvm.add %3471, %3473 : !llvm.i64 + %3475 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3476 = llvm.mul %547, %3475 : !llvm.i64 + %3477 = llvm.add %3474, %3476 : !llvm.i64 + %3478 = llvm.getelementptr %3470[%3477] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3469, %3478 : !llvm.ptr + %3479 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3480 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3481 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3482 = llvm.mul %2957, %3481 : !llvm.i64 + %3483 = llvm.add %3480, %3482 : !llvm.i64 + %3484 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3485 = llvm.mul %547, %3484 : !llvm.i64 + %3486 = llvm.add %3483, %3485 : !llvm.i64 + %3487 = llvm.getelementptr %3479[%3486] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3488 = llvm.load %3487 : !llvm.ptr + %3489 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3491 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3492 = llvm.mul %2957, %3491 : !llvm.i64 + %3493 = llvm.add %3490, %3492 : !llvm.i64 + %3494 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3495 = llvm.mul %547, %3494 : !llvm.i64 + %3496 = llvm.add %3493, %3495 : !llvm.i64 + %3497 = llvm.getelementptr %3489[%3496] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3488, %3497 : !llvm.ptr + %3498 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3500 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3501 = llvm.mul %2957, %3500 : !llvm.i64 + %3502 = llvm.add %3499, %3501 : !llvm.i64 + %3503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3504 = llvm.mul %59, %3503 : !llvm.i64 + %3505 = llvm.add %3502, %3504 : !llvm.i64 + %3506 = llvm.getelementptr %3498[%3505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3507 = llvm.load %3506 : !llvm.ptr + %3508 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3511 = llvm.mul %59, %3510 : !llvm.i64 + %3512 = llvm.add %3509, %3511 : !llvm.i64 + %3513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3514 = llvm.mul %608, %3513 : !llvm.i64 + %3515 = llvm.add %3512, %3514 : !llvm.i64 + %3516 = llvm.getelementptr %3508[%3515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3517 = llvm.load %3516 : !llvm.ptr + %3518 = llvm.fmul %3507, %3517 {RelaxedPrecision} : !llvm.float + %3519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3520 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %3521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3522 = llvm.mul %2957, %3521 : !llvm.i64 + %3523 = llvm.add %3520, %3522 : !llvm.i64 + %3524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3525 = llvm.mul %608, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.getelementptr %3519[%3526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3528 = llvm.load %3527 : !llvm.ptr + %3529 = llvm.fadd %3528, %3518 {RelaxedPrecision} : !llvm.float + %3530 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3531 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3532 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3533 = llvm.mul %2957, %3532 : !llvm.i64 + %3534 = llvm.add %3531, %3533 : !llvm.i64 + %3535 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3536 = llvm.mul %608, %3535 : !llvm.i64 + %3537 = llvm.add %3534, %3536 : !llvm.i64 + %3538 = llvm.getelementptr %3530[%3537] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3529, %3538 : !llvm.ptr + %3539 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3540 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3541 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3542 = llvm.mul %2957, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : !llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %608, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3539[%3546] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3548 = llvm.load %3547 : !llvm.ptr + %3549 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3550 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3551 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3552 = llvm.mul %2957, %3551 : !llvm.i64 + %3553 = llvm.add %3550, %3552 : !llvm.i64 + %3554 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3555 = llvm.mul %608, %3554 : !llvm.i64 + %3556 = llvm.add %3553, %3555 : !llvm.i64 + %3557 = llvm.getelementptr %3549[%3556] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3548, %3557 : !llvm.ptr + %3558 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3560 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3561 = llvm.mul %2957, %3560 : !llvm.i64 + %3562 = llvm.add %3559, %3561 : !llvm.i64 + %3563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3564 = llvm.mul %59, %3563 : !llvm.i64 + %3565 = llvm.add %3562, %3564 : !llvm.i64 + %3566 = llvm.getelementptr %3558[%3565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3567 = llvm.load %3566 : !llvm.ptr + %3568 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3571 = llvm.mul %59, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3574 = llvm.mul %669, %3573 : !llvm.i64 + %3575 = llvm.add %3572, %3574 : !llvm.i64 + %3576 = llvm.getelementptr %3568[%3575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3577 = llvm.load %3576 : !llvm.ptr + %3578 = llvm.fmul %3567, %3577 {RelaxedPrecision} : !llvm.float + %3579 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3581 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3582 = llvm.mul %2957, %3581 : !llvm.i64 + %3583 = 
llvm.add %3580, %3582 : !llvm.i64 + %3584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3585 = llvm.mul %669, %3584 : !llvm.i64 + %3586 = llvm.add %3583, %3585 : !llvm.i64 + %3587 = llvm.getelementptr %3579[%3586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3588 = llvm.load %3587 : !llvm.ptr + %3589 = llvm.fadd %3588, %3578 {RelaxedPrecision} : !llvm.float + %3590 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3591 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3592 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3593 = llvm.mul %2957, %3592 : !llvm.i64 + %3594 = llvm.add %3591, %3593 : !llvm.i64 + %3595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3596 = llvm.mul %669, %3595 : !llvm.i64 + %3597 = llvm.add %3594, %3596 : !llvm.i64 + %3598 = llvm.getelementptr %3590[%3597] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3589, %3598 : !llvm.ptr + %3599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3602 = llvm.mul %2957, %3601 : !llvm.i64 + %3603 = llvm.add %3600, %3602 : !llvm.i64 + %3604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3605 = llvm.mul %669, %3604 : !llvm.i64 + %3606 = llvm.add %3603, %3605 : !llvm.i64 + %3607 = llvm.getelementptr %3599[%3606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3608 = llvm.load %3607 : !llvm.ptr + %3609 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3610 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3611 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3612 = llvm.mul %2957, %3611 : !llvm.i64 + %3613 = llvm.add %3610, %3612 : !llvm.i64 + %3614 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3615 = llvm.mul %669, %3614 : !llvm.i64 + %3616 = llvm.add %3613, %3615 : !llvm.i64 + %3617 = llvm.getelementptr %3609[%3616] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3608, %3617 : !llvm.ptr + %3618 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3620 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3621 = llvm.mul %2957, %3620 : !llvm.i64 + %3622 = llvm.add %3619, %3621 : !llvm.i64 + %3623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3624 = llvm.mul %59, %3623 : !llvm.i64 + %3625 = llvm.add %3622, %3624 : !llvm.i64 + %3626 = llvm.getelementptr %3618[%3625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3627 = llvm.load %3626 : !llvm.ptr + %3628 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3631 = llvm.mul %59, %3630 : !llvm.i64 + %3632 = llvm.add %3629, %3631 : !llvm.i64 + %3633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3634 = llvm.mul %730, %3633 : !llvm.i64 + %3635 = llvm.add %3632, %3634 : !llvm.i64 + %3636 = llvm.getelementptr %3628[%3635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3637 = llvm.load %3636 : !llvm.ptr + %3638 = llvm.fmul %3627, %3637 {RelaxedPrecision} : !llvm.float + %3639 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3641 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3642 = llvm.mul %2957, %3641 : !llvm.i64 + %3643 = llvm.add %3640, %3642 : !llvm.i64 + %3644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3645 = llvm.mul %730, %3644 : 
!llvm.i64 + %3646 = llvm.add %3643, %3645 : !llvm.i64 + %3647 = llvm.getelementptr %3639[%3646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3648 = llvm.load %3647 : !llvm.ptr + %3649 = llvm.fadd %3648, %3638 {RelaxedPrecision} : !llvm.float + %3650 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3651 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3652 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3653 = llvm.mul %2957, %3652 : !llvm.i64 + %3654 = llvm.add %3651, %3653 : !llvm.i64 + %3655 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3656 = llvm.mul %730, %3655 : !llvm.i64 + %3657 = llvm.add %3654, %3656 : !llvm.i64 + %3658 = llvm.getelementptr %3650[%3657] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3649, %3658 : !llvm.ptr + %3659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3662 = llvm.mul %2957, %3661 : !llvm.i64 + %3663 = llvm.add %3660, %3662 : !llvm.i64 + %3664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3665 = llvm.mul %730, %3664 : !llvm.i64 + %3666 = llvm.add %3663, %3665 : !llvm.i64 + %3667 = llvm.getelementptr %3659[%3666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3668 = llvm.load %3667 : !llvm.ptr + %3669 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3670 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3671 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3672 = llvm.mul %2957, %3671 : !llvm.i64 + %3673 = llvm.add %3670, %3672 : !llvm.i64 + %3674 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3675 = llvm.mul %730, %3674 : !llvm.i64 + %3676 = llvm.add %3673, %3675 : !llvm.i64 + %3677 = llvm.getelementptr %3669[%3676] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3668, %3677 : !llvm.ptr + %3678 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3680 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3681 = llvm.mul %2957, %3680 : !llvm.i64 + %3682 = llvm.add %3679, %3681 : !llvm.i64 + %3683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3684 = llvm.mul %59, %3683 : !llvm.i64 + %3685 = llvm.add %3682, %3684 : !llvm.i64 + %3686 = llvm.getelementptr %3678[%3685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3687 = llvm.load %3686 : !llvm.ptr + %3688 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3691 = llvm.mul %59, %3690 : !llvm.i64 + %3692 = llvm.add %3689, %3691 : !llvm.i64 + %3693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3694 = llvm.mul %791, %3693 : !llvm.i64 + %3695 = llvm.add %3692, %3694 : !llvm.i64 + %3696 = llvm.getelementptr %3688[%3695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3697 = llvm.load %3696 : !llvm.ptr + %3698 = llvm.fmul %3687, %3697 {RelaxedPrecision} : !llvm.float + %3699 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3701 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3702 = llvm.mul %2957, %3701 : !llvm.i64 + %3703 = llvm.add %3700, %3702 : !llvm.i64 + %3704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3705 = llvm.mul %791, %3704 : !llvm.i64 + %3706 = llvm.add %3703, %3705 : !llvm.i64 + %3707 = llvm.getelementptr %3699[%3706] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %3708 = llvm.load %3707 : !llvm.ptr + %3709 = llvm.fadd %3708, %3698 {RelaxedPrecision} : !llvm.float + %3710 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3711 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3712 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3713 = llvm.mul %2957, %3712 : !llvm.i64 + %3714 = llvm.add %3711, %3713 : !llvm.i64 + %3715 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3716 = llvm.mul %791, %3715 : !llvm.i64 + %3717 = llvm.add %3714, %3716 : !llvm.i64 + %3718 = llvm.getelementptr %3710[%3717] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3709, %3718 : !llvm.ptr + %3719 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3720 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3721 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3722 = llvm.mul %2957, %3721 : !llvm.i64 + %3723 = llvm.add %3720, %3722 : !llvm.i64 + %3724 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3725 = llvm.mul %791, %3724 : !llvm.i64 + %3726 = llvm.add %3723, %3725 : !llvm.i64 + %3727 = llvm.getelementptr %3719[%3726] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3728 = llvm.load %3727 : !llvm.ptr + %3729 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3730 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3731 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3732 = llvm.mul %2957, %3731 : !llvm.i64 + %3733 = llvm.add %3730, %3732 : !llvm.i64 + %3734 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3735 = llvm.mul %791, %3734 : !llvm.i64 + %3736 = llvm.add %3733, %3735 : !llvm.i64 + %3737 = llvm.getelementptr %3729[%3736] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3728, %3737 : !llvm.ptr + %3738 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3740 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3741 = llvm.mul %2957, %3740 : !llvm.i64 + %3742 = llvm.add %3739, %3741 : !llvm.i64 + %3743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3744 = llvm.mul %59, %3743 : !llvm.i64 + %3745 = llvm.add %3742, %3744 : !llvm.i64 + %3746 = llvm.getelementptr %3738[%3745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3747 = llvm.load %3746 : !llvm.ptr + %3748 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3750 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3751 = llvm.mul %59, %3750 : !llvm.i64 + %3752 = llvm.add %3749, %3751 : !llvm.i64 + %3753 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3754 = llvm.mul %852, %3753 : !llvm.i64 + %3755 = llvm.add %3752, %3754 : !llvm.i64 + %3756 = llvm.getelementptr %3748[%3755] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3757 = llvm.load %3756 : !llvm.ptr + %3758 = llvm.fmul %3747, %3757 {RelaxedPrecision} : !llvm.float + %3759 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3761 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3762 = llvm.mul %2957, %3761 : !llvm.i64 + %3763 = llvm.add %3760, %3762 : !llvm.i64 + %3764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3765 = llvm.mul %852, %3764 : !llvm.i64 + %3766 = llvm.add %3763, %3765 : !llvm.i64 + %3767 = llvm.getelementptr %3759[%3766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3768 = llvm.load %3767 : !llvm.ptr + %3769 = llvm.fadd %3768, %3758 {RelaxedPrecision} : !llvm.float + %3770 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3771 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3772 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3773 = llvm.mul %2957, %3772 : !llvm.i64 + %3774 = llvm.add %3771, %3773 : !llvm.i64 + %3775 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3776 = llvm.mul %852, %3775 : !llvm.i64 + %3777 = llvm.add %3774, %3776 : !llvm.i64 + %3778 = llvm.getelementptr %3770[%3777] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3769, %3778 : !llvm.ptr + %3779 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3780 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3781 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3782 = llvm.mul %2957, %3781 : !llvm.i64 + %3783 = llvm.add %3780, %3782 : !llvm.i64 + %3784 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3785 = llvm.mul %852, %3784 : !llvm.i64 + %3786 = llvm.add %3783, %3785 : !llvm.i64 + %3787 = llvm.getelementptr %3779[%3786] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3788 = llvm.load %3787 : !llvm.ptr + %3789 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3790 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3791 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3792 = llvm.mul %2957, %3791 : !llvm.i64 + %3793 = llvm.add %3790, %3792 : !llvm.i64 + %3794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3795 = llvm.mul %852, %3794 : !llvm.i64 + %3796 = llvm.add %3793, %3795 : !llvm.i64 + %3797 = llvm.getelementptr %3789[%3796] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3788, %3797 : !llvm.ptr + %3798 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3800 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3801 = llvm.mul %2957, %3800 : !llvm.i64 + %3802 = llvm.add %3799, %3801 : !llvm.i64 + %3803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3804 = llvm.mul %59, %3803 : !llvm.i64 + %3805 = llvm.add %3802, %3804 : !llvm.i64 + %3806 = llvm.getelementptr %3798[%3805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3807 = llvm.load %3806 : !llvm.ptr + %3808 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3811 = llvm.mul %59, %3810 : !llvm.i64 + %3812 = llvm.add %3809, %3811 : !llvm.i64 + %3813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3814 = llvm.mul %913, %3813 : !llvm.i64 + %3815 = llvm.add %3812, %3814 : !llvm.i64 + %3816 = llvm.getelementptr %3808[%3815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3817 = llvm.load %3816 : !llvm.ptr + %3818 = llvm.fmul %3807, %3817 {RelaxedPrecision} : !llvm.float + %3819 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3821 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3822 = llvm.mul %2957, %3821 : !llvm.i64 + %3823 = llvm.add %3820, %3822 : !llvm.i64 + %3824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3825 = llvm.mul %913, %3824 : !llvm.i64 + %3826 = llvm.add %3823, %3825 : !llvm.i64 + %3827 = llvm.getelementptr %3819[%3826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3828 = llvm.load %3827 : !llvm.ptr + %3829 = llvm.fadd %3828, %3818 {RelaxedPrecision} : !llvm.float + %3830 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3831 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %3832 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3833 = llvm.mul %2957, %3832 : !llvm.i64 + %3834 = llvm.add %3831, %3833 : !llvm.i64 + %3835 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3836 = llvm.mul %913, %3835 : !llvm.i64 + %3837 = llvm.add %3834, %3836 : !llvm.i64 + %3838 = llvm.getelementptr %3830[%3837] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3829, %3838 : !llvm.ptr + %3839 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3840 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3841 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3842 = llvm.mul %2957, %3841 : !llvm.i64 + %3843 = llvm.add %3840, %3842 : !llvm.i64 + %3844 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3845 = llvm.mul %913, %3844 : !llvm.i64 + %3846 = llvm.add %3843, %3845 : !llvm.i64 + %3847 = llvm.getelementptr %3839[%3846] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3848 = llvm.load %3847 : !llvm.ptr + %3849 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3850 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3851 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3852 = llvm.mul %2957, %3851 : !llvm.i64 + %3853 = llvm.add %3850, %3852 : !llvm.i64 + %3854 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3855 = llvm.mul %913, %3854 : !llvm.i64 + %3856 = llvm.add %3853, %3855 : !llvm.i64 + %3857 = llvm.getelementptr %3849[%3856] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3848, %3857 : !llvm.ptr + %3858 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3860 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3861 = llvm.mul %2957, %3860 : !llvm.i64 + %3862 = llvm.add %3859, %3861 : !llvm.i64 + %3863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3864 = llvm.mul %59, %3863 : !llvm.i64 + %3865 = llvm.add %3862, %3864 : !llvm.i64 + %3866 = llvm.getelementptr %3858[%3865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3867 = llvm.load %3866 : !llvm.ptr + %3868 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3871 = llvm.mul %59, %3870 : !llvm.i64 + %3872 = llvm.add %3869, %3871 : !llvm.i64 + %3873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3874 = llvm.mul %974, %3873 : !llvm.i64 + %3875 = llvm.add %3872, %3874 : !llvm.i64 + %3876 = llvm.getelementptr %3868[%3875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3877 = llvm.load %3876 : !llvm.ptr + %3878 = llvm.fmul %3867, %3877 {RelaxedPrecision} : !llvm.float + %3879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3882 = llvm.mul %2957, %3881 : !llvm.i64 + %3883 = llvm.add %3880, %3882 : !llvm.i64 + %3884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3885 = llvm.mul %974, %3884 : !llvm.i64 + %3886 = llvm.add %3883, %3885 : !llvm.i64 + %3887 = llvm.getelementptr %3879[%3886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3888 = llvm.load %3887 : !llvm.ptr + %3889 = llvm.fadd %3888, %3878 {RelaxedPrecision} : !llvm.float + %3890 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3892 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3893 = llvm.mul %2957, %3892 : !llvm.i64 + %3894 = 
llvm.add %3891, %3893 : !llvm.i64 + %3895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3896 = llvm.mul %974, %3895 : !llvm.i64 + %3897 = llvm.add %3894, %3896 : !llvm.i64 + %3898 = llvm.getelementptr %3890[%3897] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3889, %3898 : !llvm.ptr + %3899 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3901 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3902 = llvm.mul %2957, %3901 : !llvm.i64 + %3903 = llvm.add %3900, %3902 : !llvm.i64 + %3904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3905 = llvm.mul %974, %3904 : !llvm.i64 + %3906 = llvm.add %3903, %3905 : !llvm.i64 + %3907 = llvm.getelementptr %3899[%3906] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3908 = llvm.load %3907 : !llvm.ptr + %3909 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3910 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3911 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3912 = llvm.mul %2957, %3911 : !llvm.i64 + %3913 = llvm.add %3910, %3912 : !llvm.i64 + %3914 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3915 = llvm.mul %974, %3914 : !llvm.i64 + %3916 = llvm.add %3913, %3915 : !llvm.i64 + %3917 = llvm.getelementptr %3909[%3916] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3908, %3917 : !llvm.ptr + %3918 = llvm.add %50, %36 : !llvm.i64 + %3919 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3921 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3922 = llvm.mul %3918, %3921 : !llvm.i64 + %3923 = llvm.add %3920, %3922 : !llvm.i64 + %3924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3925 = llvm.mul %59, %3924 : !llvm.i64 + %3926 = llvm.add %3923, %3925 : !llvm.i64 + %3927 = llvm.getelementptr %3919[%3926] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3928 = llvm.load %3927 : !llvm.ptr + %3929 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3930 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3931 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3932 = llvm.mul %59, %3931 : !llvm.i64 + %3933 = llvm.add %3930, %3932 : !llvm.i64 + %3934 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3935 = llvm.mul %58, %3934 : !llvm.i64 + %3936 = llvm.add %3933, %3935 : !llvm.i64 + %3937 = llvm.getelementptr %3929[%3936] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3938 = llvm.load %3937 : !llvm.ptr + %3939 = llvm.fmul %3928, %3938 {RelaxedPrecision} : !llvm.float + %3940 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3941 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3942 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3943 = llvm.mul %3918, %3942 : !llvm.i64 + %3944 = llvm.add %3941, %3943 : !llvm.i64 + %3945 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3946 = llvm.mul %58, %3945 : !llvm.i64 + %3947 = llvm.add %3944, %3946 : !llvm.i64 + %3948 = llvm.getelementptr %3940[%3947] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3949 = llvm.load %3948 : !llvm.ptr + %3950 = llvm.fadd %3949, %3939 {RelaxedPrecision} : !llvm.float + %3951 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3952 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3953 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3954 = llvm.mul %3918, %3953 : !llvm.i64 + %3955 = llvm.add %3952, %3954 : !llvm.i64 + %3956 = llvm.mlir.constant(1 : index) : !llvm.i64 + 
%3957 = llvm.mul %58, %3956 : !llvm.i64 + %3958 = llvm.add %3955, %3957 : !llvm.i64 + %3959 = llvm.getelementptr %3951[%3958] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3950, %3959 : !llvm.ptr + %3960 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3961 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3962 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3963 = llvm.mul %3918, %3962 : !llvm.i64 + %3964 = llvm.add %3961, %3963 : !llvm.i64 + %3965 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3966 = llvm.mul %58, %3965 : !llvm.i64 + %3967 = llvm.add %3964, %3966 : !llvm.i64 + %3968 = llvm.getelementptr %3960[%3967] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3969 = llvm.load %3968 : !llvm.ptr + %3970 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3971 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3972 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3973 = llvm.mul %3918, %3972 : !llvm.i64 + %3974 = llvm.add %3971, %3973 : !llvm.i64 + %3975 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3976 = llvm.mul %58, %3975 : !llvm.i64 + %3977 = llvm.add %3974, %3976 : !llvm.i64 + %3978 = llvm.getelementptr %3970[%3977] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3969, %3978 : !llvm.ptr + %3979 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3981 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3982 = llvm.mul %3918, %3981 : !llvm.i64 + %3983 = llvm.add %3980, %3982 : !llvm.i64 + %3984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3985 = llvm.mul %59, %3984 : !llvm.i64 + %3986 = llvm.add %3983, %3985 : !llvm.i64 + %3987 = llvm.getelementptr %3979[%3986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3988 = llvm.load %3987 : !llvm.ptr + %3989 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3990 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3991 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3992 = llvm.mul %59, %3991 : !llvm.i64 + %3993 = llvm.add %3990, %3992 : !llvm.i64 + %3994 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3995 = llvm.mul %120, %3994 : !llvm.i64 + %3996 = llvm.add %3993, %3995 : !llvm.i64 + %3997 = llvm.getelementptr %3989[%3996] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3998 = llvm.load %3997 : !llvm.ptr + %3999 = llvm.fmul %3988, %3998 {RelaxedPrecision} : !llvm.float + %4000 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4001 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4002 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4003 = llvm.mul %3918, %4002 : !llvm.i64 + %4004 = llvm.add %4001, %4003 : !llvm.i64 + %4005 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4006 = llvm.mul %120, %4005 : !llvm.i64 + %4007 = llvm.add %4004, %4006 : !llvm.i64 + %4008 = llvm.getelementptr %4000[%4007] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4009 = llvm.load %4008 : !llvm.ptr + %4010 = llvm.fadd %4009, %3999 {RelaxedPrecision} : !llvm.float + %4011 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4012 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4013 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4014 = llvm.mul %3918, %4013 : !llvm.i64 + %4015 = llvm.add %4012, %4014 : !llvm.i64 + %4016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4017 = llvm.mul %120, %4016 : !llvm.i64 + %4018 = llvm.add %4015, %4017 : !llvm.i64 + %4019 = llvm.getelementptr %4011[%4018] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4010, %4019 : !llvm.ptr + %4020 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4022 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4023 = llvm.mul %3918, %4022 : !llvm.i64 + %4024 = llvm.add %4021, %4023 : !llvm.i64 + %4025 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4026 = llvm.mul %120, %4025 : !llvm.i64 + %4027 = llvm.add %4024, %4026 : !llvm.i64 + %4028 = llvm.getelementptr %4020[%4027] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4029 = llvm.load %4028 : !llvm.ptr + %4030 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4031 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4032 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4033 = llvm.mul %3918, %4032 : !llvm.i64 + %4034 = llvm.add %4031, %4033 : !llvm.i64 + %4035 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4036 = llvm.mul %120, %4035 : !llvm.i64 + %4037 = llvm.add %4034, %4036 : !llvm.i64 + %4038 = llvm.getelementptr %4030[%4037] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4029, %4038 : !llvm.ptr + %4039 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4041 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4042 = llvm.mul %3918, %4041 : !llvm.i64 + %4043 = llvm.add %4040, %4042 : !llvm.i64 + %4044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4045 = llvm.mul %59, %4044 : !llvm.i64 + %4046 = llvm.add %4043, %4045 : !llvm.i64 + %4047 = llvm.getelementptr %4039[%4046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4048 = llvm.load %4047 : !llvm.ptr + %4049 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4050 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4051 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4052 = llvm.mul %59, %4051 : !llvm.i64 + %4053 = llvm.add %4050, %4052 : !llvm.i64 + %4054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4055 = llvm.mul %181, %4054 : !llvm.i64 + %4056 = llvm.add %4053, %4055 : !llvm.i64 + %4057 = llvm.getelementptr %4049[%4056] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4058 = llvm.load %4057 : !llvm.ptr + %4059 = llvm.fmul %4048, %4058 {RelaxedPrecision} : !llvm.float + %4060 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4062 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4063 = llvm.mul %3918, %4062 : !llvm.i64 + %4064 = llvm.add %4061, %4063 : !llvm.i64 + %4065 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4066 = llvm.mul %181, %4065 : !llvm.i64 + %4067 = llvm.add %4064, %4066 : !llvm.i64 + %4068 = llvm.getelementptr %4060[%4067] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4069 = llvm.load %4068 : !llvm.ptr + %4070 = llvm.fadd %4069, %4059 {RelaxedPrecision} : !llvm.float + %4071 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4072 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4073 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4074 = llvm.mul %3918, %4073 : !llvm.i64 + %4075 = llvm.add %4072, %4074 : !llvm.i64 + %4076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4077 = llvm.mul %181, %4076 : !llvm.i64 + %4078 = llvm.add %4075, %4077 : !llvm.i64 + %4079 = llvm.getelementptr %4071[%4078] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4070, %4079 : !llvm.ptr + %4080 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4082 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4083 = llvm.mul %3918, %4082 : !llvm.i64 + %4084 = llvm.add %4081, %4083 : !llvm.i64 + %4085 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4086 = llvm.mul %181, %4085 : !llvm.i64 + %4087 = llvm.add %4084, %4086 : !llvm.i64 + %4088 = llvm.getelementptr %4080[%4087] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4089 = llvm.load %4088 : !llvm.ptr + %4090 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4091 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4092 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4093 = llvm.mul %3918, %4092 : !llvm.i64 + %4094 = llvm.add %4091, %4093 : !llvm.i64 + %4095 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4096 = llvm.mul %181, %4095 : !llvm.i64 + %4097 = llvm.add %4094, %4096 : !llvm.i64 + %4098 = llvm.getelementptr %4090[%4097] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4089, %4098 : !llvm.ptr + %4099 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4101 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4102 = llvm.mul %3918, %4101 : !llvm.i64 + %4103 = llvm.add %4100, %4102 : !llvm.i64 + %4104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4105 = llvm.mul %59, %4104 : !llvm.i64 + %4106 = llvm.add %4103, %4105 : !llvm.i64 + %4107 = llvm.getelementptr %4099[%4106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4108 = llvm.load %4107 : !llvm.ptr + %4109 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4110 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4111 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4112 = llvm.mul %59, %4111 : !llvm.i64 + %4113 = llvm.add %4110, %4112 : !llvm.i64 + %4114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4115 = llvm.mul %242, %4114 : !llvm.i64 + %4116 = llvm.add %4113, %4115 : !llvm.i64 + %4117 = llvm.getelementptr %4109[%4116] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4118 = llvm.load %4117 : !llvm.ptr + %4119 = llvm.fmul %4108, %4118 {RelaxedPrecision} : !llvm.float + %4120 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4122 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4123 = llvm.mul %3918, %4122 : !llvm.i64 + %4124 = llvm.add %4121, %4123 : !llvm.i64 + %4125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4126 = llvm.mul %242, %4125 : !llvm.i64 + %4127 = llvm.add %4124, %4126 : !llvm.i64 + %4128 = llvm.getelementptr %4120[%4127] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4129 = llvm.load %4128 : !llvm.ptr + %4130 = llvm.fadd %4129, %4119 {RelaxedPrecision} : !llvm.float + %4131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4134 = llvm.mul %3918, %4133 : !llvm.i64 + %4135 = llvm.add %4132, %4134 : !llvm.i64 + %4136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4137 = llvm.mul %242, %4136 : !llvm.i64 + %4138 = llvm.add %4135, %4137 : !llvm.i64 + %4139 = llvm.getelementptr %4131[%4138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4130, %4139 : !llvm.ptr + %4140 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4141 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4142 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %4143 = llvm.mul %3918, %4142 : !llvm.i64 + %4144 = llvm.add %4141, %4143 : !llvm.i64 + %4145 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4146 = llvm.mul %242, %4145 : !llvm.i64 + %4147 = llvm.add %4144, %4146 : !llvm.i64 + %4148 = llvm.getelementptr %4140[%4147] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4149 = llvm.load %4148 : !llvm.ptr + %4150 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4152 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4153 = llvm.mul %3918, %4152 : !llvm.i64 + %4154 = llvm.add %4151, %4153 : !llvm.i64 + %4155 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4156 = llvm.mul %242, %4155 : !llvm.i64 + %4157 = llvm.add %4154, %4156 : !llvm.i64 + %4158 = llvm.getelementptr %4150[%4157] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4149, %4158 : !llvm.ptr + %4159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4161 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4162 = llvm.mul %3918, %4161 : !llvm.i64 + %4163 = llvm.add %4160, %4162 : !llvm.i64 + %4164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4165 = llvm.mul %59, %4164 : !llvm.i64 + %4166 = llvm.add %4163, %4165 : !llvm.i64 + %4167 = llvm.getelementptr %4159[%4166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4168 = llvm.load %4167 : !llvm.ptr + %4169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4171 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4172 = llvm.mul %59, %4171 : !llvm.i64 + %4173 = llvm.add %4170, %4172 : !llvm.i64 + %4174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4175 = llvm.mul %303, %4174 : !llvm.i64 + %4176 = llvm.add %4173, %4175 : !llvm.i64 + %4177 = llvm.getelementptr %4169[%4176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4178 = llvm.load %4177 : !llvm.ptr + %4179 = llvm.fmul %4168, %4178 {RelaxedPrecision} : !llvm.float + %4180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4182 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4183 = llvm.mul %3918, %4182 : !llvm.i64 + %4184 = llvm.add %4181, %4183 : !llvm.i64 + %4185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4186 = llvm.mul %303, %4185 : !llvm.i64 + %4187 = llvm.add %4184, %4186 : !llvm.i64 + %4188 = llvm.getelementptr %4180[%4187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4189 = llvm.load %4188 : !llvm.ptr + %4190 = llvm.fadd %4189, %4179 {RelaxedPrecision} : !llvm.float + %4191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4194 = llvm.mul %3918, %4193 : !llvm.i64 + %4195 = llvm.add %4192, %4194 : !llvm.i64 + %4196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4197 = llvm.mul %303, %4196 : !llvm.i64 + %4198 = llvm.add %4195, %4197 : !llvm.i64 + %4199 = llvm.getelementptr %4191[%4198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4190, %4199 : !llvm.ptr + %4200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4203 = llvm.mul %3918, %4202 : !llvm.i64 + %4204 = llvm.add %4201, %4203 : 
!llvm.i64 + %4205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4206 = llvm.mul %303, %4205 : !llvm.i64 + %4207 = llvm.add %4204, %4206 : !llvm.i64 + %4208 = llvm.getelementptr %4200[%4207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4209 = llvm.load %4208 : !llvm.ptr + %4210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4212 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4213 = llvm.mul %3918, %4212 : !llvm.i64 + %4214 = llvm.add %4211, %4213 : !llvm.i64 + %4215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4216 = llvm.mul %303, %4215 : !llvm.i64 + %4217 = llvm.add %4214, %4216 : !llvm.i64 + %4218 = llvm.getelementptr %4210[%4217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4209, %4218 : !llvm.ptr + %4219 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4221 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4222 = llvm.mul %3918, %4221 : !llvm.i64 + %4223 = llvm.add %4220, %4222 : !llvm.i64 + %4224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4225 = llvm.mul %59, %4224 : !llvm.i64 + %4226 = llvm.add %4223, %4225 : !llvm.i64 + %4227 = llvm.getelementptr %4219[%4226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4228 = llvm.load %4227 : !llvm.ptr + %4229 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4230 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4231 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4232 = llvm.mul %59, %4231 : !llvm.i64 + %4233 = llvm.add %4230, %4232 : !llvm.i64 + %4234 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4235 = llvm.mul %364, %4234 : !llvm.i64 + %4236 = llvm.add %4233, %4235 : !llvm.i64 + %4237 = llvm.getelementptr %4229[%4236] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4238 = llvm.load %4237 : !llvm.ptr + %4239 = llvm.fmul %4228, %4238 {RelaxedPrecision} : !llvm.float + %4240 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4242 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4243 = llvm.mul %3918, %4242 : !llvm.i64 + %4244 = llvm.add %4241, %4243 : !llvm.i64 + %4245 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4246 = llvm.mul %364, %4245 : !llvm.i64 + %4247 = llvm.add %4244, %4246 : !llvm.i64 + %4248 = llvm.getelementptr %4240[%4247] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4249 = llvm.load %4248 : !llvm.ptr + %4250 = llvm.fadd %4249, %4239 {RelaxedPrecision} : !llvm.float + %4251 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4252 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4253 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4254 = llvm.mul %3918, %4253 : !llvm.i64 + %4255 = llvm.add %4252, %4254 : !llvm.i64 + %4256 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4257 = llvm.mul %364, %4256 : !llvm.i64 + %4258 = llvm.add %4255, %4257 : !llvm.i64 + %4259 = llvm.getelementptr %4251[%4258] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4250, %4259 : !llvm.ptr + %4260 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4261 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4262 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4263 = llvm.mul %3918, %4262 : !llvm.i64 + %4264 = llvm.add %4261, %4263 : !llvm.i64 + %4265 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4266 = llvm.mul %364, %4265 : !llvm.i64 + %4267 = llvm.add 
%4264, %4266 : !llvm.i64 + %4268 = llvm.getelementptr %4260[%4267] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4269 = llvm.load %4268 : !llvm.ptr + %4270 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4272 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4273 = llvm.mul %3918, %4272 : !llvm.i64 + %4274 = llvm.add %4271, %4273 : !llvm.i64 + %4275 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4276 = llvm.mul %364, %4275 : !llvm.i64 + %4277 = llvm.add %4274, %4276 : !llvm.i64 + %4278 = llvm.getelementptr %4270[%4277] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4269, %4278 : !llvm.ptr + %4279 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4281 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4282 = llvm.mul %3918, %4281 : !llvm.i64 + %4283 = llvm.add %4280, %4282 : !llvm.i64 + %4284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4285 = llvm.mul %59, %4284 : !llvm.i64 + %4286 = llvm.add %4283, %4285 : !llvm.i64 + %4287 = llvm.getelementptr %4279[%4286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4288 = llvm.load %4287 : !llvm.ptr + %4289 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4290 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4291 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4292 = llvm.mul %59, %4291 : !llvm.i64 + %4293 = llvm.add %4290, %4292 : !llvm.i64 + %4294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4295 = llvm.mul %425, %4294 : !llvm.i64 + %4296 = llvm.add %4293, %4295 : !llvm.i64 + %4297 = llvm.getelementptr %4289[%4296] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4298 = llvm.load %4297 : !llvm.ptr + %4299 = llvm.fmul %4288, %4298 {RelaxedPrecision} : !llvm.float + %4300 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4302 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4303 = llvm.mul %3918, %4302 : !llvm.i64 + %4304 = llvm.add %4301, %4303 : !llvm.i64 + %4305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4306 = llvm.mul %425, %4305 : !llvm.i64 + %4307 = llvm.add %4304, %4306 : !llvm.i64 + %4308 = llvm.getelementptr %4300[%4307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4309 = llvm.load %4308 : !llvm.ptr + %4310 = llvm.fadd %4309, %4299 {RelaxedPrecision} : !llvm.float + %4311 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4314 = llvm.mul %3918, %4313 : !llvm.i64 + %4315 = llvm.add %4312, %4314 : !llvm.i64 + %4316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4317 = llvm.mul %425, %4316 : !llvm.i64 + %4318 = llvm.add %4315, %4317 : !llvm.i64 + %4319 = llvm.getelementptr %4311[%4318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4310, %4319 : !llvm.ptr + %4320 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4321 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4322 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4323 = llvm.mul %3918, %4322 : !llvm.i64 + %4324 = llvm.add %4321, %4323 : !llvm.i64 + %4325 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4326 = llvm.mul %425, %4325 : !llvm.i64 + %4327 = llvm.add %4324, %4326 : !llvm.i64 + %4328 = llvm.getelementptr %4320[%4327] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4329 = llvm.load 
%4328 : !llvm.ptr + %4330 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4331 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4332 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4333 = llvm.mul %3918, %4332 : !llvm.i64 + %4334 = llvm.add %4331, %4333 : !llvm.i64 + %4335 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4336 = llvm.mul %425, %4335 : !llvm.i64 + %4337 = llvm.add %4334, %4336 : !llvm.i64 + %4338 = llvm.getelementptr %4330[%4337] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4329, %4338 : !llvm.ptr + %4339 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4341 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4342 = llvm.mul %3918, %4341 : !llvm.i64 + %4343 = llvm.add %4340, %4342 : !llvm.i64 + %4344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4345 = llvm.mul %59, %4344 : !llvm.i64 + %4346 = llvm.add %4343, %4345 : !llvm.i64 + %4347 = llvm.getelementptr %4339[%4346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4348 = llvm.load %4347 : !llvm.ptr + %4349 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4350 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4351 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4352 = llvm.mul %59, %4351 : !llvm.i64 + %4353 = llvm.add %4350, %4352 : !llvm.i64 + %4354 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4355 = llvm.mul %486, %4354 : !llvm.i64 + %4356 = llvm.add %4353, %4355 : !llvm.i64 + %4357 = llvm.getelementptr %4349[%4356] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4358 = llvm.load %4357 : !llvm.ptr + %4359 = llvm.fmul %4348, %4358 {RelaxedPrecision} : !llvm.float + %4360 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4361 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4362 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4363 = llvm.mul %3918, %4362 : !llvm.i64 + %4364 = llvm.add %4361, %4363 : !llvm.i64 + %4365 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4366 = llvm.mul %486, %4365 : !llvm.i64 + %4367 = llvm.add %4364, %4366 : !llvm.i64 + %4368 = llvm.getelementptr %4360[%4367] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4369 = llvm.load %4368 : !llvm.ptr + %4370 = llvm.fadd %4369, %4359 {RelaxedPrecision} : !llvm.float + %4371 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4372 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4373 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4374 = llvm.mul %3918, %4373 : !llvm.i64 + %4375 = llvm.add %4372, %4374 : !llvm.i64 + %4376 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4377 = llvm.mul %486, %4376 : !llvm.i64 + %4378 = llvm.add %4375, %4377 : !llvm.i64 + %4379 = llvm.getelementptr %4371[%4378] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4370, %4379 : !llvm.ptr + %4380 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4382 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4383 = llvm.mul %3918, %4382 : !llvm.i64 + %4384 = llvm.add %4381, %4383 : !llvm.i64 + %4385 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4386 = llvm.mul %486, %4385 : !llvm.i64 + %4387 = llvm.add %4384, %4386 : !llvm.i64 + %4388 = llvm.getelementptr %4380[%4387] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4389 = llvm.load %4388 : !llvm.ptr + %4390 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4391 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %4392 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4393 = llvm.mul %3918, %4392 : !llvm.i64 + %4394 = llvm.add %4391, %4393 : !llvm.i64 + %4395 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4396 = llvm.mul %486, %4395 : !llvm.i64 + %4397 = llvm.add %4394, %4396 : !llvm.i64 + %4398 = llvm.getelementptr %4390[%4397] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4389, %4398 : !llvm.ptr + %4399 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4401 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4402 = llvm.mul %3918, %4401 : !llvm.i64 + %4403 = llvm.add %4400, %4402 : !llvm.i64 + %4404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4405 = llvm.mul %59, %4404 : !llvm.i64 + %4406 = llvm.add %4403, %4405 : !llvm.i64 + %4407 = llvm.getelementptr %4399[%4406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4408 = llvm.load %4407 : !llvm.ptr + %4409 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4410 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4411 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4412 = llvm.mul %59, %4411 : !llvm.i64 + %4413 = llvm.add %4410, %4412 : !llvm.i64 + %4414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4415 = llvm.mul %547, %4414 : !llvm.i64 + %4416 = llvm.add %4413, %4415 : !llvm.i64 + %4417 = llvm.getelementptr %4409[%4416] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4418 = llvm.load %4417 : !llvm.ptr + %4419 = llvm.fmul %4408, %4418 {RelaxedPrecision} : !llvm.float + %4420 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4421 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4422 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4423 = llvm.mul %3918, %4422 : !llvm.i64 + %4424 = llvm.add %4421, %4423 : !llvm.i64 + %4425 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4426 = llvm.mul %547, %4425 : !llvm.i64 + %4427 = llvm.add %4424, %4426 : !llvm.i64 + %4428 = llvm.getelementptr %4420[%4427] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4429 = llvm.load %4428 : !llvm.ptr + %4430 = llvm.fadd %4429, %4419 {RelaxedPrecision} : !llvm.float + %4431 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4434 = llvm.mul %3918, %4433 : !llvm.i64 + %4435 = llvm.add %4432, %4434 : !llvm.i64 + %4436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4437 = llvm.mul %547, %4436 : !llvm.i64 + %4438 = llvm.add %4435, %4437 : !llvm.i64 + %4439 = llvm.getelementptr %4431[%4438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4430, %4439 : !llvm.ptr + %4440 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4441 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4442 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4443 = llvm.mul %3918, %4442 : !llvm.i64 + %4444 = llvm.add %4441, %4443 : !llvm.i64 + %4445 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4446 = llvm.mul %547, %4445 : !llvm.i64 + %4447 = llvm.add %4444, %4446 : !llvm.i64 + %4448 = llvm.getelementptr %4440[%4447] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4449 = llvm.load %4448 : !llvm.ptr + %4450 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4451 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4452 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4453 = llvm.mul %3918, 
%4452 : !llvm.i64 + %4454 = llvm.add %4451, %4453 : !llvm.i64 + %4455 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4456 = llvm.mul %547, %4455 : !llvm.i64 + %4457 = llvm.add %4454, %4456 : !llvm.i64 + %4458 = llvm.getelementptr %4450[%4457] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4449, %4458 : !llvm.ptr + %4459 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4461 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4462 = llvm.mul %3918, %4461 : !llvm.i64 + %4463 = llvm.add %4460, %4462 : !llvm.i64 + %4464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4465 = llvm.mul %59, %4464 : !llvm.i64 + %4466 = llvm.add %4463, %4465 : !llvm.i64 + %4467 = llvm.getelementptr %4459[%4466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4468 = llvm.load %4467 : !llvm.ptr + %4469 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4470 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4471 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4472 = llvm.mul %59, %4471 : !llvm.i64 + %4473 = llvm.add %4470, %4472 : !llvm.i64 + %4474 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4475 = llvm.mul %608, %4474 : !llvm.i64 + %4476 = llvm.add %4473, %4475 : !llvm.i64 + %4477 = llvm.getelementptr %4469[%4476] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4478 = llvm.load %4477 : !llvm.ptr + %4479 = llvm.fmul %4468, %4478 {RelaxedPrecision} : !llvm.float + %4480 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4481 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4482 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4483 = llvm.mul %3918, %4482 : !llvm.i64 + %4484 = llvm.add %4481, %4483 : !llvm.i64 + %4485 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4486 = llvm.mul %608, %4485 : !llvm.i64 + %4487 = llvm.add %4484, %4486 : !llvm.i64 + %4488 = llvm.getelementptr %4480[%4487] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4489 = llvm.load %4488 : !llvm.ptr + %4490 = llvm.fadd %4489, %4479 {RelaxedPrecision} : !llvm.float + %4491 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4492 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4493 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4494 = llvm.mul %3918, %4493 : !llvm.i64 + %4495 = llvm.add %4492, %4494 : !llvm.i64 + %4496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4497 = llvm.mul %608, %4496 : !llvm.i64 + %4498 = llvm.add %4495, %4497 : !llvm.i64 + %4499 = llvm.getelementptr %4491[%4498] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4490, %4499 : !llvm.ptr + %4500 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4501 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4502 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4503 = llvm.mul %3918, %4502 : !llvm.i64 + %4504 = llvm.add %4501, %4503 : !llvm.i64 + %4505 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4506 = llvm.mul %608, %4505 : !llvm.i64 + %4507 = llvm.add %4504, %4506 : !llvm.i64 + %4508 = llvm.getelementptr %4500[%4507] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4509 = llvm.load %4508 : !llvm.ptr + %4510 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4512 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4513 = llvm.mul %3918, %4512 : !llvm.i64 + %4514 = llvm.add %4511, %4513 : !llvm.i64 + %4515 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4516 = 
llvm.mul %608, %4515 : !llvm.i64 + %4517 = llvm.add %4514, %4516 : !llvm.i64 + %4518 = llvm.getelementptr %4510[%4517] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4509, %4518 : !llvm.ptr + %4519 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4521 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4522 = llvm.mul %3918, %4521 : !llvm.i64 + %4523 = llvm.add %4520, %4522 : !llvm.i64 + %4524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4525 = llvm.mul %59, %4524 : !llvm.i64 + %4526 = llvm.add %4523, %4525 : !llvm.i64 + %4527 = llvm.getelementptr %4519[%4526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4528 = llvm.load %4527 : !llvm.ptr + %4529 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4530 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4531 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4532 = llvm.mul %59, %4531 : !llvm.i64 + %4533 = llvm.add %4530, %4532 : !llvm.i64 + %4534 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4535 = llvm.mul %669, %4534 : !llvm.i64 + %4536 = llvm.add %4533, %4535 : !llvm.i64 + %4537 = llvm.getelementptr %4529[%4536] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4538 = llvm.load %4537 : !llvm.ptr + %4539 = llvm.fmul %4528, %4538 {RelaxedPrecision} : !llvm.float + %4540 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4542 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4543 = llvm.mul %3918, %4542 : !llvm.i64 + %4544 = llvm.add %4541, %4543 : !llvm.i64 + %4545 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4546 = llvm.mul %669, %4545 : !llvm.i64 + %4547 = llvm.add %4544, %4546 : !llvm.i64 + %4548 = llvm.getelementptr %4540[%4547] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4549 = llvm.load %4548 : !llvm.ptr + %4550 = llvm.fadd %4549, %4539 {RelaxedPrecision} : !llvm.float + %4551 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4552 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4553 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4554 = llvm.mul %3918, %4553 : !llvm.i64 + %4555 = llvm.add %4552, %4554 : !llvm.i64 + %4556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4557 = llvm.mul %669, %4556 : !llvm.i64 + %4558 = llvm.add %4555, %4557 : !llvm.i64 + %4559 = llvm.getelementptr %4551[%4558] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4550, %4559 : !llvm.ptr + %4560 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4561 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4562 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4563 = llvm.mul %3918, %4562 : !llvm.i64 + %4564 = llvm.add %4561, %4563 : !llvm.i64 + %4565 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4566 = llvm.mul %669, %4565 : !llvm.i64 + %4567 = llvm.add %4564, %4566 : !llvm.i64 + %4568 = llvm.getelementptr %4560[%4567] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4569 = llvm.load %4568 : !llvm.ptr + %4570 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4571 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4572 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4573 = llvm.mul %3918, %4572 : !llvm.i64 + %4574 = llvm.add %4571, %4573 : !llvm.i64 + %4575 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4576 = llvm.mul %669, %4575 : !llvm.i64 + %4577 = llvm.add %4574, %4576 : !llvm.i64 + %4578 = llvm.getelementptr %4570[%4577] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4569, %4578 : !llvm.ptr + %4579 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4581 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4582 = llvm.mul %3918, %4581 : !llvm.i64 + %4583 = llvm.add %4580, %4582 : !llvm.i64 + %4584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4585 = llvm.mul %59, %4584 : !llvm.i64 + %4586 = llvm.add %4583, %4585 : !llvm.i64 + %4587 = llvm.getelementptr %4579[%4586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4588 = llvm.load %4587 : !llvm.ptr + %4589 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4591 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4592 = llvm.mul %59, %4591 : !llvm.i64 + %4593 = llvm.add %4590, %4592 : !llvm.i64 + %4594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4595 = llvm.mul %730, %4594 : !llvm.i64 + %4596 = llvm.add %4593, %4595 : !llvm.i64 + %4597 = llvm.getelementptr %4589[%4596] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4598 = llvm.load %4597 : !llvm.ptr + %4599 = llvm.fmul %4588, %4598 {RelaxedPrecision} : !llvm.float + %4600 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4602 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4603 = llvm.mul %3918, %4602 : !llvm.i64 + %4604 = llvm.add %4601, %4603 : !llvm.i64 + %4605 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4606 = llvm.mul %730, %4605 : !llvm.i64 + %4607 = llvm.add %4604, %4606 : !llvm.i64 + %4608 = llvm.getelementptr %4600[%4607] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4609 = llvm.load %4608 : !llvm.ptr + %4610 = llvm.fadd %4609, %4599 {RelaxedPrecision} : !llvm.float + %4611 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4612 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4613 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4614 = llvm.mul %3918, %4613 : !llvm.i64 + %4615 = llvm.add %4612, %4614 : !llvm.i64 + %4616 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4617 = llvm.mul %730, %4616 : !llvm.i64 + %4618 = llvm.add %4615, %4617 : !llvm.i64 + %4619 = llvm.getelementptr %4611[%4618] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4610, %4619 : !llvm.ptr + %4620 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4621 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4622 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4623 = llvm.mul %3918, %4622 : !llvm.i64 + %4624 = llvm.add %4621, %4623 : !llvm.i64 + %4625 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4626 = llvm.mul %730, %4625 : !llvm.i64 + %4627 = llvm.add %4624, %4626 : !llvm.i64 + %4628 = llvm.getelementptr %4620[%4627] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4629 = llvm.load %4628 : !llvm.ptr + %4630 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4632 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4633 = llvm.mul %3918, %4632 : !llvm.i64 + %4634 = llvm.add %4631, %4633 : !llvm.i64 + %4635 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4636 = llvm.mul %730, %4635 : !llvm.i64 + %4637 = llvm.add %4634, %4636 : !llvm.i64 + %4638 = llvm.getelementptr %4630[%4637] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4629, %4638 : !llvm.ptr + %4639 = llvm.extractvalue %7[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4641 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4642 = llvm.mul %3918, %4641 : !llvm.i64 + %4643 = llvm.add %4640, %4642 : !llvm.i64 + %4644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4645 = llvm.mul %59, %4644 : !llvm.i64 + %4646 = llvm.add %4643, %4645 : !llvm.i64 + %4647 = llvm.getelementptr %4639[%4646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4648 = llvm.load %4647 : !llvm.ptr + %4649 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4651 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4652 = llvm.mul %59, %4651 : !llvm.i64 + %4653 = llvm.add %4650, %4652 : !llvm.i64 + %4654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4655 = llvm.mul %791, %4654 : !llvm.i64 + %4656 = llvm.add %4653, %4655 : !llvm.i64 + %4657 = llvm.getelementptr %4649[%4656] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4658 = llvm.load %4657 : !llvm.ptr + %4659 = llvm.fmul %4648, %4658 {RelaxedPrecision} : !llvm.float + %4660 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4662 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4663 = llvm.mul %3918, %4662 : !llvm.i64 + %4664 = llvm.add %4661, %4663 : !llvm.i64 + %4665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4666 = llvm.mul %791, %4665 : !llvm.i64 + %4667 = llvm.add %4664, %4666 : !llvm.i64 + %4668 = llvm.getelementptr %4660[%4667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4669 = llvm.load %4668 : !llvm.ptr + %4670 = llvm.fadd %4669, %4659 {RelaxedPrecision} : !llvm.float + %4671 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4672 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4673 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4674 = llvm.mul %3918, %4673 : !llvm.i64 + %4675 = llvm.add %4672, %4674 : !llvm.i64 + %4676 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4677 = llvm.mul %791, %4676 : !llvm.i64 + %4678 = llvm.add %4675, %4677 : !llvm.i64 + %4679 = llvm.getelementptr %4671[%4678] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4670, %4679 : !llvm.ptr + %4680 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4681 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4682 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4683 = llvm.mul %3918, %4682 : !llvm.i64 + %4684 = llvm.add %4681, %4683 : !llvm.i64 + %4685 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4686 = llvm.mul %791, %4685 : !llvm.i64 + %4687 = llvm.add %4684, %4686 : !llvm.i64 + %4688 = llvm.getelementptr %4680[%4687] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4689 = llvm.load %4688 : !llvm.ptr + %4690 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4691 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4692 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4693 = llvm.mul %3918, %4692 : !llvm.i64 + %4694 = llvm.add %4691, %4693 : !llvm.i64 + %4695 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4696 = llvm.mul %791, %4695 : !llvm.i64 + %4697 = llvm.add %4694, %4696 : !llvm.i64 + %4698 = llvm.getelementptr %4690[%4697] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4689, %4698 : !llvm.ptr + %4699 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4701 = 
llvm.mlir.constant(128 : index) : !llvm.i64 + %4702 = llvm.mul %3918, %4701 : !llvm.i64 + %4703 = llvm.add %4700, %4702 : !llvm.i64 + %4704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4705 = llvm.mul %59, %4704 : !llvm.i64 + %4706 = llvm.add %4703, %4705 : !llvm.i64 + %4707 = llvm.getelementptr %4699[%4706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4708 = llvm.load %4707 : !llvm.ptr + %4709 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4710 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4711 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4712 = llvm.mul %59, %4711 : !llvm.i64 + %4713 = llvm.add %4710, %4712 : !llvm.i64 + %4714 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4715 = llvm.mul %852, %4714 : !llvm.i64 + %4716 = llvm.add %4713, %4715 : !llvm.i64 + %4717 = llvm.getelementptr %4709[%4716] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4718 = llvm.load %4717 : !llvm.ptr + %4719 = llvm.fmul %4708, %4718 {RelaxedPrecision} : !llvm.float + %4720 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4721 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4722 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4723 = llvm.mul %3918, %4722 : !llvm.i64 + %4724 = llvm.add %4721, %4723 : !llvm.i64 + %4725 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4726 = llvm.mul %852, %4725 : !llvm.i64 + %4727 = llvm.add %4724, %4726 : !llvm.i64 + %4728 = llvm.getelementptr %4720[%4727] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4729 = llvm.load %4728 : !llvm.ptr + %4730 = llvm.fadd %4729, %4719 {RelaxedPrecision} : !llvm.float + %4731 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4732 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4733 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4734 = llvm.mul %3918, %4733 : !llvm.i64 + %4735 = llvm.add %4732, %4734 : !llvm.i64 + %4736 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4737 = llvm.mul %852, %4736 : !llvm.i64 + %4738 = llvm.add %4735, %4737 : !llvm.i64 + %4739 = llvm.getelementptr %4731[%4738] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4730, %4739 : !llvm.ptr + %4740 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4741 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4743 = llvm.mul %3918, %4742 : !llvm.i64 + %4744 = llvm.add %4741, %4743 : !llvm.i64 + %4745 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4746 = llvm.mul %852, %4745 : !llvm.i64 + %4747 = llvm.add %4744, %4746 : !llvm.i64 + %4748 = llvm.getelementptr %4740[%4747] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4749 = llvm.load %4748 : !llvm.ptr + %4750 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4751 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4752 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4753 = llvm.mul %3918, %4752 : !llvm.i64 + %4754 = llvm.add %4751, %4753 : !llvm.i64 + %4755 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4756 = llvm.mul %852, %4755 : !llvm.i64 + %4757 = llvm.add %4754, %4756 : !llvm.i64 + %4758 = llvm.getelementptr %4750[%4757] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4749, %4758 : !llvm.ptr + %4759 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4761 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4762 = llvm.mul %3918, %4761 : !llvm.i64 + %4763 = llvm.add %4760, %4762 : 
!llvm.i64 + %4764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4765 = llvm.mul %59, %4764 : !llvm.i64 + %4766 = llvm.add %4763, %4765 : !llvm.i64 + %4767 = llvm.getelementptr %4759[%4766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4768 = llvm.load %4767 : !llvm.ptr + %4769 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4770 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4771 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4772 = llvm.mul %59, %4771 : !llvm.i64 + %4773 = llvm.add %4770, %4772 : !llvm.i64 + %4774 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4775 = llvm.mul %913, %4774 : !llvm.i64 + %4776 = llvm.add %4773, %4775 : !llvm.i64 + %4777 = llvm.getelementptr %4769[%4776] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4778 = llvm.load %4777 : !llvm.ptr + %4779 = llvm.fmul %4768, %4778 {RelaxedPrecision} : !llvm.float + %4780 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4781 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4782 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4783 = llvm.mul %3918, %4782 : !llvm.i64 + %4784 = llvm.add %4781, %4783 : !llvm.i64 + %4785 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4786 = llvm.mul %913, %4785 : !llvm.i64 + %4787 = llvm.add %4784, %4786 : !llvm.i64 + %4788 = llvm.getelementptr %4780[%4787] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4789 = llvm.load %4788 : !llvm.ptr + %4790 = llvm.fadd %4789, %4779 {RelaxedPrecision} : !llvm.float + %4791 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4792 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4793 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4794 = llvm.mul %3918, %4793 : !llvm.i64 + %4795 = llvm.add %4792, %4794 : !llvm.i64 + %4796 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4797 = llvm.mul %913, %4796 : !llvm.i64 + %4798 = llvm.add %4795, %4797 : !llvm.i64 + %4799 = llvm.getelementptr %4791[%4798] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4790, %4799 : !llvm.ptr + %4800 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4803 = llvm.mul %3918, %4802 : !llvm.i64 + %4804 = llvm.add %4801, %4803 : !llvm.i64 + %4805 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4806 = llvm.mul %913, %4805 : !llvm.i64 + %4807 = llvm.add %4804, %4806 : !llvm.i64 + %4808 = llvm.getelementptr %4800[%4807] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4809 = llvm.load %4808 : !llvm.ptr + %4810 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4811 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4812 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4813 = llvm.mul %3918, %4812 : !llvm.i64 + %4814 = llvm.add %4811, %4813 : !llvm.i64 + %4815 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4816 = llvm.mul %913, %4815 : !llvm.i64 + %4817 = llvm.add %4814, %4816 : !llvm.i64 + %4818 = llvm.getelementptr %4810[%4817] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4809, %4818 : !llvm.ptr + %4819 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4821 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4822 = llvm.mul %3918, %4821 : !llvm.i64 + %4823 = llvm.add %4820, %4822 : !llvm.i64 + %4824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4825 = llvm.mul %59, %4824 : !llvm.i64 + %4826 = llvm.add 
%4823, %4825 : !llvm.i64 + %4827 = llvm.getelementptr %4819[%4826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4828 = llvm.load %4827 : !llvm.ptr + %4829 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4830 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4831 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4832 = llvm.mul %59, %4831 : !llvm.i64 + %4833 = llvm.add %4830, %4832 : !llvm.i64 + %4834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4835 = llvm.mul %974, %4834 : !llvm.i64 + %4836 = llvm.add %4833, %4835 : !llvm.i64 + %4837 = llvm.getelementptr %4829[%4836] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4838 = llvm.load %4837 : !llvm.ptr + %4839 = llvm.fmul %4828, %4838 {RelaxedPrecision} : !llvm.float + %4840 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4841 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4842 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4843 = llvm.mul %3918, %4842 : !llvm.i64 + %4844 = llvm.add %4841, %4843 : !llvm.i64 + %4845 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4846 = llvm.mul %974, %4845 : !llvm.i64 + %4847 = llvm.add %4844, %4846 : !llvm.i64 + %4848 = llvm.getelementptr %4840[%4847] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4849 = llvm.load %4848 : !llvm.ptr + %4850 = llvm.fadd %4849, %4839 {RelaxedPrecision} : !llvm.float + %4851 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4852 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4853 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4854 = llvm.mul %3918, %4853 : !llvm.i64 + %4855 = llvm.add %4852, %4854 : !llvm.i64 + %4856 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4857 = llvm.mul %974, %4856 : !llvm.i64 + %4858 = llvm.add %4855, %4857 : !llvm.i64 + %4859 = llvm.getelementptr %4851[%4858] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4850, %4859 : !llvm.ptr + %4860 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4861 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4862 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4863 = llvm.mul %3918, %4862 : !llvm.i64 + %4864 = llvm.add %4861, %4863 : !llvm.i64 + %4865 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4866 = llvm.mul %974, %4865 : !llvm.i64 + %4867 = llvm.add %4864, %4866 : !llvm.i64 + %4868 = llvm.getelementptr %4860[%4867] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4869 = llvm.load %4868 : !llvm.ptr + %4870 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4871 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4872 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4873 = llvm.mul %3918, %4872 : !llvm.i64 + %4874 = llvm.add %4871, %4873 : !llvm.i64 + %4875 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4876 = llvm.mul %974, %4875 : !llvm.i64 + %4877 = llvm.add %4874, %4876 : !llvm.i64 + %4878 = llvm.getelementptr %4870[%4877] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4869, %4878 : !llvm.ptr + %4879 = llvm.add %50, %37 : !llvm.i64 + %4880 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4881 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4882 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4883 = llvm.mul %4879, %4882 : !llvm.i64 + %4884 = llvm.add %4881, %4883 : !llvm.i64 + %4885 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4886 = llvm.mul %59, %4885 : !llvm.i64 + %4887 = llvm.add %4884, %4886 : !llvm.i64 + %4888 = llvm.getelementptr %4880[%4887] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %4889 = llvm.load %4888 : !llvm.ptr + %4890 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4892 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4893 = llvm.mul %59, %4892 : !llvm.i64 + %4894 = llvm.add %4891, %4893 : !llvm.i64 + %4895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4896 = llvm.mul %58, %4895 : !llvm.i64 + %4897 = llvm.add %4894, %4896 : !llvm.i64 + %4898 = llvm.getelementptr %4890[%4897] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4899 = llvm.load %4898 : !llvm.ptr + %4900 = llvm.fmul %4889, %4899 {RelaxedPrecision} : !llvm.float + %4901 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4903 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4904 = llvm.mul %4879, %4903 : !llvm.i64 + %4905 = llvm.add %4902, %4904 : !llvm.i64 + %4906 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4907 = llvm.mul %58, %4906 : !llvm.i64 + %4908 = llvm.add %4905, %4907 : !llvm.i64 + %4909 = llvm.getelementptr %4901[%4908] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4910 = llvm.load %4909 : !llvm.ptr + %4911 = llvm.fadd %4910, %4900 {RelaxedPrecision} : !llvm.float + %4912 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4913 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4914 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4915 = llvm.mul %4879, %4914 : !llvm.i64 + %4916 = llvm.add %4913, %4915 : !llvm.i64 + %4917 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4918 = llvm.mul %58, %4917 : !llvm.i64 + %4919 = llvm.add %4916, %4918 : !llvm.i64 + %4920 = llvm.getelementptr %4912[%4919] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4911, %4920 : !llvm.ptr + %4921 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4922 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4923 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4924 = llvm.mul %4879, %4923 : !llvm.i64 + %4925 = llvm.add %4922, %4924 : !llvm.i64 + %4926 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4927 = llvm.mul %58, %4926 : !llvm.i64 + %4928 = llvm.add %4925, %4927 : !llvm.i64 + %4929 = llvm.getelementptr %4921[%4928] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4930 = llvm.load %4929 : !llvm.ptr + %4931 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4932 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4933 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4934 = llvm.mul %4879, %4933 : !llvm.i64 + %4935 = llvm.add %4932, %4934 : !llvm.i64 + %4936 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4937 = llvm.mul %58, %4936 : !llvm.i64 + %4938 = llvm.add %4935, %4937 : !llvm.i64 + %4939 = llvm.getelementptr %4931[%4938] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4930, %4939 : !llvm.ptr + %4940 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4941 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4942 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4943 = llvm.mul %4879, %4942 : !llvm.i64 + %4944 = llvm.add %4941, %4943 : !llvm.i64 + %4945 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4946 = llvm.mul %59, %4945 : !llvm.i64 + %4947 = llvm.add %4944, %4946 : !llvm.i64 + %4948 = llvm.getelementptr %4940[%4947] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4949 = llvm.load %4948 : !llvm.ptr + %4950 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %4951 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4952 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4953 = llvm.mul %59, %4952 : !llvm.i64 + %4954 = llvm.add %4951, %4953 : !llvm.i64 + %4955 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4956 = llvm.mul %120, %4955 : !llvm.i64 + %4957 = llvm.add %4954, %4956 : !llvm.i64 + %4958 = llvm.getelementptr %4950[%4957] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4959 = llvm.load %4958 : !llvm.ptr + %4960 = llvm.fmul %4949, %4959 {RelaxedPrecision} : !llvm.float + %4961 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4962 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4963 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4964 = llvm.mul %4879, %4963 : !llvm.i64 + %4965 = llvm.add %4962, %4964 : !llvm.i64 + %4966 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4967 = llvm.mul %120, %4966 : !llvm.i64 + %4968 = llvm.add %4965, %4967 : !llvm.i64 + %4969 = llvm.getelementptr %4961[%4968] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4970 = llvm.load %4969 : !llvm.ptr + %4971 = llvm.fadd %4970, %4960 {RelaxedPrecision} : !llvm.float + %4972 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4973 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4974 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4975 = llvm.mul %4879, %4974 : !llvm.i64 + %4976 = llvm.add %4973, %4975 : !llvm.i64 + %4977 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4978 = llvm.mul %120, %4977 : !llvm.i64 + %4979 = llvm.add %4976, %4978 : !llvm.i64 + %4980 = llvm.getelementptr %4972[%4979] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4971, %4980 : !llvm.ptr + %4981 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4982 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4983 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4984 = llvm.mul %4879, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %120, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4981[%4988] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4990 = llvm.load %4989 : !llvm.ptr + %4991 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4992 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4993 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4994 = llvm.mul %4879, %4993 : !llvm.i64 + %4995 = llvm.add %4992, %4994 : !llvm.i64 + %4996 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4997 = llvm.mul %120, %4996 : !llvm.i64 + %4998 = llvm.add %4995, %4997 : !llvm.i64 + %4999 = llvm.getelementptr %4991[%4998] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4990, %4999 : !llvm.ptr + %5000 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5001 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5002 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5003 = llvm.mul %4879, %5002 : !llvm.i64 + %5004 = llvm.add %5001, %5003 : !llvm.i64 + %5005 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5006 = llvm.mul %59, %5005 : !llvm.i64 + %5007 = llvm.add %5004, %5006 : !llvm.i64 + %5008 = llvm.getelementptr %5000[%5007] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5009 = llvm.load %5008 : !llvm.ptr + %5010 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5011 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5012 = llvm.mlir.constant(512 : index) 
: !llvm.i64 + %5013 = llvm.mul %59, %5012 : !llvm.i64 + %5014 = llvm.add %5011, %5013 : !llvm.i64 + %5015 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5016 = llvm.mul %181, %5015 : !llvm.i64 + %5017 = llvm.add %5014, %5016 : !llvm.i64 + %5018 = llvm.getelementptr %5010[%5017] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5019 = llvm.load %5018 : !llvm.ptr + %5020 = llvm.fmul %5009, %5019 {RelaxedPrecision} : !llvm.float + %5021 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5022 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5023 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5024 = llvm.mul %4879, %5023 : !llvm.i64 + %5025 = llvm.add %5022, %5024 : !llvm.i64 + %5026 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5027 = llvm.mul %181, %5026 : !llvm.i64 + %5028 = llvm.add %5025, %5027 : !llvm.i64 + %5029 = llvm.getelementptr %5021[%5028] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5030 = llvm.load %5029 : !llvm.ptr + %5031 = llvm.fadd %5030, %5020 {RelaxedPrecision} : !llvm.float + %5032 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5033 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5034 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5035 = llvm.mul %4879, %5034 : !llvm.i64 + %5036 = llvm.add %5033, %5035 : !llvm.i64 + %5037 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5038 = llvm.mul %181, %5037 : !llvm.i64 + %5039 = llvm.add %5036, %5038 : !llvm.i64 + %5040 = llvm.getelementptr %5032[%5039] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5031, %5040 : !llvm.ptr + %5041 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5042 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5043 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5044 = llvm.mul %4879, %5043 : !llvm.i64 + %5045 = llvm.add %5042, %5044 : !llvm.i64 + %5046 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5047 = llvm.mul %181, %5046 : !llvm.i64 + %5048 = llvm.add %5045, %5047 : !llvm.i64 + %5049 = llvm.getelementptr %5041[%5048] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5050 = llvm.load %5049 : !llvm.ptr + %5051 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5052 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5053 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5054 = llvm.mul %4879, %5053 : !llvm.i64 + %5055 = llvm.add %5052, %5054 : !llvm.i64 + %5056 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5057 = llvm.mul %181, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.getelementptr %5051[%5058] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5050, %5059 : !llvm.ptr + %5060 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5062 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5063 = llvm.mul %4879, %5062 : !llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5066 = llvm.mul %59, %5065 : !llvm.i64 + %5067 = llvm.add %5064, %5066 : !llvm.i64 + %5068 = llvm.getelementptr %5060[%5067] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5069 = llvm.load %5068 : !llvm.ptr + %5070 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5071 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5072 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5073 = llvm.mul %59, %5072 : !llvm.i64 + %5074 = llvm.add %5071, %5073 : !llvm.i64 + %5075 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %5076 = llvm.mul %242, %5075 : !llvm.i64 + %5077 = llvm.add %5074, %5076 : !llvm.i64 + %5078 = llvm.getelementptr %5070[%5077] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5079 = llvm.load %5078 : !llvm.ptr + %5080 = llvm.fmul %5069, %5079 {RelaxedPrecision} : !llvm.float + %5081 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5082 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5083 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5084 = llvm.mul %4879, %5083 : !llvm.i64 + %5085 = llvm.add %5082, %5084 : !llvm.i64 + %5086 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5087 = llvm.mul %242, %5086 : !llvm.i64 + %5088 = llvm.add %5085, %5087 : !llvm.i64 + %5089 = llvm.getelementptr %5081[%5088] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5090 = llvm.load %5089 : !llvm.ptr + %5091 = llvm.fadd %5090, %5080 {RelaxedPrecision} : !llvm.float + %5092 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5093 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5094 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5095 = llvm.mul %4879, %5094 : !llvm.i64 + %5096 = llvm.add %5093, %5095 : !llvm.i64 + %5097 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5098 = llvm.mul %242, %5097 : !llvm.i64 + %5099 = llvm.add %5096, %5098 : !llvm.i64 + %5100 = llvm.getelementptr %5092[%5099] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5091, %5100 : !llvm.ptr + %5101 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5102 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5103 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5104 = llvm.mul %4879, %5103 : !llvm.i64 + %5105 = llvm.add %5102, %5104 : !llvm.i64 + %5106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5107 = llvm.mul %242, %5106 : !llvm.i64 + %5108 = llvm.add %5105, %5107 : !llvm.i64 + %5109 = llvm.getelementptr %5101[%5108] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5110 = llvm.load %5109 : !llvm.ptr + %5111 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5112 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5113 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5114 = llvm.mul %4879, %5113 : !llvm.i64 + %5115 = llvm.add %5112, %5114 : !llvm.i64 + %5116 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5117 = llvm.mul %242, %5116 : !llvm.i64 + %5118 = llvm.add %5115, %5117 : !llvm.i64 + %5119 = llvm.getelementptr %5111[%5118] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5110, %5119 : !llvm.ptr + %5120 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5123 = llvm.mul %4879, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5126 = llvm.mul %59, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.getelementptr %5120[%5127] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5129 = llvm.load %5128 : !llvm.ptr + %5130 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5132 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5133 = llvm.mul %59, %5132 : !llvm.i64 + %5134 = llvm.add %5131, %5133 : !llvm.i64 + %5135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5136 = llvm.mul %303, %5135 : !llvm.i64 + %5137 = llvm.add %5134, %5136 : 
!llvm.i64 + %5138 = llvm.getelementptr %5130[%5137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5139 = llvm.load %5138 : !llvm.ptr + %5140 = llvm.fmul %5129, %5139 {RelaxedPrecision} : !llvm.float + %5141 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5142 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5143 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5144 = llvm.mul %4879, %5143 : !llvm.i64 + %5145 = llvm.add %5142, %5144 : !llvm.i64 + %5146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5147 = llvm.mul %303, %5146 : !llvm.i64 + %5148 = llvm.add %5145, %5147 : !llvm.i64 + %5149 = llvm.getelementptr %5141[%5148] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5150 = llvm.load %5149 : !llvm.ptr + %5151 = llvm.fadd %5150, %5140 {RelaxedPrecision} : !llvm.float + %5152 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5153 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5154 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5155 = llvm.mul %4879, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 : !llvm.i64 + %5157 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5158 = llvm.mul %303, %5157 : !llvm.i64 + %5159 = llvm.add %5156, %5158 : !llvm.i64 + %5160 = llvm.getelementptr %5152[%5159] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5151, %5160 : !llvm.ptr + %5161 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5162 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5163 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5164 = llvm.mul %4879, %5163 : !llvm.i64 + %5165 = llvm.add %5162, %5164 : !llvm.i64 + %5166 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5167 = llvm.mul %303, %5166 : !llvm.i64 + %5168 = llvm.add %5165, %5167 : !llvm.i64 + %5169 = llvm.getelementptr %5161[%5168] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5170 = llvm.load %5169 : !llvm.ptr + %5171 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5172 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5173 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5174 = llvm.mul %4879, %5173 : !llvm.i64 + %5175 = llvm.add %5172, %5174 : !llvm.i64 + %5176 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5177 = llvm.mul %303, %5176 : !llvm.i64 + %5178 = llvm.add %5175, %5177 : !llvm.i64 + %5179 = llvm.getelementptr %5171[%5178] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5170, %5179 : !llvm.ptr + %5180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5183 = llvm.mul %4879, %5182 : !llvm.i64 + %5184 = llvm.add %5181, %5183 : !llvm.i64 + %5185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5186 = llvm.mul %59, %5185 : !llvm.i64 + %5187 = llvm.add %5184, %5186 : !llvm.i64 + %5188 = llvm.getelementptr %5180[%5187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5189 = llvm.load %5188 : !llvm.ptr + %5190 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5192 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5193 = llvm.mul %59, %5192 : !llvm.i64 + %5194 = llvm.add %5191, %5193 : !llvm.i64 + %5195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5196 = llvm.mul %364, %5195 : !llvm.i64 + %5197 = llvm.add %5194, %5196 : !llvm.i64 + %5198 = llvm.getelementptr %5190[%5197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5199 = llvm.load %5198 : 
!llvm.ptr + %5200 = llvm.fmul %5189, %5199 {RelaxedPrecision} : !llvm.float + %5201 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5202 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5203 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5204 = llvm.mul %4879, %5203 : !llvm.i64 + %5205 = llvm.add %5202, %5204 : !llvm.i64 + %5206 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5207 = llvm.mul %364, %5206 : !llvm.i64 + %5208 = llvm.add %5205, %5207 : !llvm.i64 + %5209 = llvm.getelementptr %5201[%5208] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5210 = llvm.load %5209 : !llvm.ptr + %5211 = llvm.fadd %5210, %5200 {RelaxedPrecision} : !llvm.float + %5212 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5213 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5214 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5215 = llvm.mul %4879, %5214 : !llvm.i64 + %5216 = llvm.add %5213, %5215 : !llvm.i64 + %5217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5218 = llvm.mul %364, %5217 : !llvm.i64 + %5219 = llvm.add %5216, %5218 : !llvm.i64 + %5220 = llvm.getelementptr %5212[%5219] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5211, %5220 : !llvm.ptr + %5221 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5224 = llvm.mul %4879, %5223 : !llvm.i64 + %5225 = llvm.add %5222, %5224 : !llvm.i64 + %5226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5227 = llvm.mul %364, %5226 : !llvm.i64 + %5228 = llvm.add %5225, %5227 : !llvm.i64 + %5229 = llvm.getelementptr %5221[%5228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5230 = llvm.load %5229 : !llvm.ptr + %5231 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5232 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5233 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5234 = llvm.mul %4879, %5233 : !llvm.i64 + %5235 = llvm.add %5232, %5234 : !llvm.i64 + %5236 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5237 = llvm.mul %364, %5236 : !llvm.i64 + %5238 = llvm.add %5235, %5237 : !llvm.i64 + %5239 = llvm.getelementptr %5231[%5238] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5230, %5239 : !llvm.ptr + %5240 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5242 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5243 = llvm.mul %4879, %5242 : !llvm.i64 + %5244 = llvm.add %5241, %5243 : !llvm.i64 + %5245 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5246 = llvm.mul %59, %5245 : !llvm.i64 + %5247 = llvm.add %5244, %5246 : !llvm.i64 + %5248 = llvm.getelementptr %5240[%5247] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5249 = llvm.load %5248 : !llvm.ptr + %5250 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5252 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5253 = llvm.mul %59, %5252 : !llvm.i64 + %5254 = llvm.add %5251, %5253 : !llvm.i64 + %5255 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5256 = llvm.mul %425, %5255 : !llvm.i64 + %5257 = llvm.add %5254, %5256 : !llvm.i64 + %5258 = llvm.getelementptr %5250[%5257] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5259 = llvm.load %5258 : !llvm.ptr + %5260 = llvm.fmul %5249, %5259 {RelaxedPrecision} : !llvm.float + %5261 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5263 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5264 = llvm.mul %4879, %5263 : !llvm.i64 + %5265 = llvm.add %5262, %5264 : !llvm.i64 + %5266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5267 = llvm.mul %425, %5266 : !llvm.i64 + %5268 = llvm.add %5265, %5267 : !llvm.i64 + %5269 = llvm.getelementptr %5261[%5268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5270 = llvm.load %5269 : !llvm.ptr + %5271 = llvm.fadd %5270, %5260 {RelaxedPrecision} : !llvm.float + %5272 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5273 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5274 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5275 = llvm.mul %4879, %5274 : !llvm.i64 + %5276 = llvm.add %5273, %5275 : !llvm.i64 + %5277 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5278 = llvm.mul %425, %5277 : !llvm.i64 + %5279 = llvm.add %5276, %5278 : !llvm.i64 + %5280 = llvm.getelementptr %5272[%5279] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5271, %5280 : !llvm.ptr + %5281 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5282 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5283 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5284 = llvm.mul %4879, %5283 : !llvm.i64 + %5285 = llvm.add %5282, %5284 : !llvm.i64 + %5286 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5287 = llvm.mul %425, %5286 : !llvm.i64 + %5288 = llvm.add %5285, %5287 : !llvm.i64 + %5289 = llvm.getelementptr %5281[%5288] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5290 = llvm.load %5289 : !llvm.ptr + %5291 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5292 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5293 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5294 = llvm.mul %4879, %5293 : !llvm.i64 + %5295 = llvm.add %5292, %5294 : !llvm.i64 + %5296 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5297 = llvm.mul %425, %5296 : !llvm.i64 + %5298 = llvm.add %5295, %5297 : !llvm.i64 + %5299 = llvm.getelementptr %5291[%5298] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5290, %5299 : !llvm.ptr + %5300 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5302 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5303 = llvm.mul %4879, %5302 : !llvm.i64 + %5304 = llvm.add %5301, %5303 : !llvm.i64 + %5305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5306 = llvm.mul %59, %5305 : !llvm.i64 + %5307 = llvm.add %5304, %5306 : !llvm.i64 + %5308 = llvm.getelementptr %5300[%5307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5309 = llvm.load %5308 : !llvm.ptr + %5310 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5311 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5312 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5313 = llvm.mul %59, %5312 : !llvm.i64 + %5314 = llvm.add %5311, %5313 : !llvm.i64 + %5315 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5316 = llvm.mul %486, %5315 : !llvm.i64 + %5317 = llvm.add %5314, %5316 : !llvm.i64 + %5318 = llvm.getelementptr %5310[%5317] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5319 = llvm.load %5318 : !llvm.ptr + %5320 = llvm.fmul %5309, %5319 {RelaxedPrecision} : !llvm.float + %5321 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5322 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5323 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %5324 = llvm.mul %4879, %5323 : !llvm.i64 + %5325 = llvm.add %5322, %5324 : !llvm.i64 + %5326 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5327 = llvm.mul %486, %5326 : !llvm.i64 + %5328 = llvm.add %5325, %5327 : !llvm.i64 + %5329 = llvm.getelementptr %5321[%5328] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5330 = llvm.load %5329 : !llvm.ptr + %5331 = llvm.fadd %5330, %5320 {RelaxedPrecision} : !llvm.float + %5332 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5333 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5334 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5335 = llvm.mul %4879, %5334 : !llvm.i64 + %5336 = llvm.add %5333, %5335 : !llvm.i64 + %5337 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5338 = llvm.mul %486, %5337 : !llvm.i64 + %5339 = llvm.add %5336, %5338 : !llvm.i64 + %5340 = llvm.getelementptr %5332[%5339] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5331, %5340 : !llvm.ptr + %5341 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5342 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5343 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5344 = llvm.mul %4879, %5343 : !llvm.i64 + %5345 = llvm.add %5342, %5344 : !llvm.i64 + %5346 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5347 = llvm.mul %486, %5346 : !llvm.i64 + %5348 = llvm.add %5345, %5347 : !llvm.i64 + %5349 = llvm.getelementptr %5341[%5348] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5350 = llvm.load %5349 : !llvm.ptr + %5351 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5352 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5353 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5354 = llvm.mul %4879, %5353 : !llvm.i64 + %5355 = llvm.add %5352, %5354 : !llvm.i64 + %5356 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5357 = llvm.mul %486, %5356 : !llvm.i64 + %5358 = llvm.add %5355, %5357 : !llvm.i64 + %5359 = llvm.getelementptr %5351[%5358] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5350, %5359 : !llvm.ptr + %5360 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5361 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5362 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5363 = llvm.mul %4879, %5362 : !llvm.i64 + %5364 = llvm.add %5361, %5363 : !llvm.i64 + %5365 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5366 = llvm.mul %59, %5365 : !llvm.i64 + %5367 = llvm.add %5364, %5366 : !llvm.i64 + %5368 = llvm.getelementptr %5360[%5367] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5369 = llvm.load %5368 : !llvm.ptr + %5370 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5371 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5372 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5373 = llvm.mul %59, %5372 : !llvm.i64 + %5374 = llvm.add %5371, %5373 : !llvm.i64 + %5375 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5376 = llvm.mul %547, %5375 : !llvm.i64 + %5377 = llvm.add %5374, %5376 : !llvm.i64 + %5378 = llvm.getelementptr %5370[%5377] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5379 = llvm.load %5378 : !llvm.ptr + %5380 = llvm.fmul %5369, %5379 {RelaxedPrecision} : !llvm.float + %5381 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5383 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5384 = llvm.mul %4879, %5383 : !llvm.i64 + %5385 = llvm.add %5382, %5384 : 
!llvm.i64 + %5386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5387 = llvm.mul %547, %5386 : !llvm.i64 + %5388 = llvm.add %5385, %5387 : !llvm.i64 + %5389 = llvm.getelementptr %5381[%5388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5390 = llvm.load %5389 : !llvm.ptr + %5391 = llvm.fadd %5390, %5380 {RelaxedPrecision} : !llvm.float + %5392 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5393 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5394 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5395 = llvm.mul %4879, %5394 : !llvm.i64 + %5396 = llvm.add %5393, %5395 : !llvm.i64 + %5397 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5398 = llvm.mul %547, %5397 : !llvm.i64 + %5399 = llvm.add %5396, %5398 : !llvm.i64 + %5400 = llvm.getelementptr %5392[%5399] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5391, %5400 : !llvm.ptr + %5401 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5403 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5404 = llvm.mul %4879, %5403 : !llvm.i64 + %5405 = llvm.add %5402, %5404 : !llvm.i64 + %5406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5407 = llvm.mul %547, %5406 : !llvm.i64 + %5408 = llvm.add %5405, %5407 : !llvm.i64 + %5409 = llvm.getelementptr %5401[%5408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5410 = llvm.load %5409 : !llvm.ptr + %5411 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5413 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5414 = llvm.mul %4879, %5413 : !llvm.i64 + %5415 = llvm.add %5412, %5414 : !llvm.i64 + %5416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5417 = llvm.mul %547, %5416 : !llvm.i64 + %5418 = llvm.add %5415, %5417 : !llvm.i64 + %5419 = llvm.getelementptr %5411[%5418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5410, %5419 : !llvm.ptr + %5420 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5421 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5422 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5423 = llvm.mul %4879, %5422 : !llvm.i64 + %5424 = llvm.add %5421, %5423 : !llvm.i64 + %5425 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5426 = llvm.mul %59, %5425 : !llvm.i64 + %5427 = llvm.add %5424, %5426 : !llvm.i64 + %5428 = llvm.getelementptr %5420[%5427] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5429 = llvm.load %5428 : !llvm.ptr + %5430 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5431 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5432 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5433 = llvm.mul %59, %5432 : !llvm.i64 + %5434 = llvm.add %5431, %5433 : !llvm.i64 + %5435 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5436 = llvm.mul %608, %5435 : !llvm.i64 + %5437 = llvm.add %5434, %5436 : !llvm.i64 + %5438 = llvm.getelementptr %5430[%5437] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5439 = llvm.load %5438 : !llvm.ptr + %5440 = llvm.fmul %5429, %5439 {RelaxedPrecision} : !llvm.float + %5441 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5443 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5444 = llvm.mul %4879, %5443 : !llvm.i64 + %5445 = llvm.add %5442, %5444 : !llvm.i64 + %5446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5447 = llvm.mul %608, %5446 : !llvm.i64 + %5448 = llvm.add 
%5445, %5447 : !llvm.i64 + %5449 = llvm.getelementptr %5441[%5448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5450 = llvm.load %5449 : !llvm.ptr + %5451 = llvm.fadd %5450, %5440 {RelaxedPrecision} : !llvm.float + %5452 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5455 = llvm.mul %4879, %5454 : !llvm.i64 + %5456 = llvm.add %5453, %5455 : !llvm.i64 + %5457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5458 = llvm.mul %608, %5457 : !llvm.i64 + %5459 = llvm.add %5456, %5458 : !llvm.i64 + %5460 = llvm.getelementptr %5452[%5459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5451, %5460 : !llvm.ptr + %5461 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5462 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5463 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5464 = llvm.mul %4879, %5463 : !llvm.i64 + %5465 = llvm.add %5462, %5464 : !llvm.i64 + %5466 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5467 = llvm.mul %608, %5466 : !llvm.i64 + %5468 = llvm.add %5465, %5467 : !llvm.i64 + %5469 = llvm.getelementptr %5461[%5468] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5470 = llvm.load %5469 : !llvm.ptr + %5471 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5472 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5473 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5474 = llvm.mul %4879, %5473 : !llvm.i64 + %5475 = llvm.add %5472, %5474 : !llvm.i64 + %5476 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5477 = llvm.mul %608, %5476 : !llvm.i64 + %5478 = llvm.add %5475, %5477 : !llvm.i64 + %5479 = llvm.getelementptr %5471[%5478] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5470, %5479 : !llvm.ptr + %5480 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5481 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5482 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5483 = llvm.mul %4879, %5482 : !llvm.i64 + %5484 = llvm.add %5481, %5483 : !llvm.i64 + %5485 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5486 = llvm.mul %59, %5485 : !llvm.i64 + %5487 = llvm.add %5484, %5486 : !llvm.i64 + %5488 = llvm.getelementptr %5480[%5487] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5489 = llvm.load %5488 : !llvm.ptr + %5490 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5491 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5492 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5493 = llvm.mul %59, %5492 : !llvm.i64 + %5494 = llvm.add %5491, %5493 : !llvm.i64 + %5495 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5496 = llvm.mul %669, %5495 : !llvm.i64 + %5497 = llvm.add %5494, %5496 : !llvm.i64 + %5498 = llvm.getelementptr %5490[%5497] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5499 = llvm.load %5498 : !llvm.ptr + %5500 = llvm.fmul %5489, %5499 {RelaxedPrecision} : !llvm.float + %5501 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5502 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5503 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5504 = llvm.mul %4879, %5503 : !llvm.i64 + %5505 = llvm.add %5502, %5504 : !llvm.i64 + %5506 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5507 = llvm.mul %669, %5506 : !llvm.i64 + %5508 = llvm.add %5505, %5507 : !llvm.i64 + %5509 = llvm.getelementptr %5501[%5508] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5510 = llvm.load 
%5509 : !llvm.ptr + %5511 = llvm.fadd %5510, %5500 {RelaxedPrecision} : !llvm.float + %5512 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5513 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5514 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5515 = llvm.mul %4879, %5514 : !llvm.i64 + %5516 = llvm.add %5513, %5515 : !llvm.i64 + %5517 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5518 = llvm.mul %669, %5517 : !llvm.i64 + %5519 = llvm.add %5516, %5518 : !llvm.i64 + %5520 = llvm.getelementptr %5512[%5519] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5511, %5520 : !llvm.ptr + %5521 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5522 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5523 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5524 = llvm.mul %4879, %5523 : !llvm.i64 + %5525 = llvm.add %5522, %5524 : !llvm.i64 + %5526 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5527 = llvm.mul %669, %5526 : !llvm.i64 + %5528 = llvm.add %5525, %5527 : !llvm.i64 + %5529 = llvm.getelementptr %5521[%5528] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5530 = llvm.load %5529 : !llvm.ptr + %5531 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5533 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5534 = llvm.mul %4879, %5533 : !llvm.i64 + %5535 = llvm.add %5532, %5534 : !llvm.i64 + %5536 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5537 = llvm.mul %669, %5536 : !llvm.i64 + %5538 = llvm.add %5535, %5537 : !llvm.i64 + %5539 = llvm.getelementptr %5531[%5538] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5530, %5539 : !llvm.ptr + %5540 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5542 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5543 = llvm.mul %4879, %5542 : !llvm.i64 + %5544 = llvm.add %5541, %5543 : !llvm.i64 + %5545 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5546 = llvm.mul %59, %5545 : !llvm.i64 + %5547 = llvm.add %5544, %5546 : !llvm.i64 + %5548 = llvm.getelementptr %5540[%5547] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5549 = llvm.load %5548 : !llvm.ptr + %5550 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5551 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5552 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5553 = llvm.mul %59, %5552 : !llvm.i64 + %5554 = llvm.add %5551, %5553 : !llvm.i64 + %5555 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5556 = llvm.mul %730, %5555 : !llvm.i64 + %5557 = llvm.add %5554, %5556 : !llvm.i64 + %5558 = llvm.getelementptr %5550[%5557] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5559 = llvm.load %5558 : !llvm.ptr + %5560 = llvm.fmul %5549, %5559 {RelaxedPrecision} : !llvm.float + %5561 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5562 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5563 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5564 = llvm.mul %4879, %5563 : !llvm.i64 + %5565 = llvm.add %5562, %5564 : !llvm.i64 + %5566 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5567 = llvm.mul %730, %5566 : !llvm.i64 + %5568 = llvm.add %5565, %5567 : !llvm.i64 + %5569 = llvm.getelementptr %5561[%5568] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5570 = llvm.load %5569 : !llvm.ptr + %5571 = llvm.fadd %5570, %5560 {RelaxedPrecision} : !llvm.float + %5572 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5573 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5574 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5575 = llvm.mul %4879, %5574 : !llvm.i64 + %5576 = llvm.add %5573, %5575 : !llvm.i64 + %5577 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5578 = llvm.mul %730, %5577 : !llvm.i64 + %5579 = llvm.add %5576, %5578 : !llvm.i64 + %5580 = llvm.getelementptr %5572[%5579] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5571, %5580 : !llvm.ptr + %5581 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5582 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5583 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5584 = llvm.mul %4879, %5583 : !llvm.i64 + %5585 = llvm.add %5582, %5584 : !llvm.i64 + %5586 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5587 = llvm.mul %730, %5586 : !llvm.i64 + %5588 = llvm.add %5585, %5587 : !llvm.i64 + %5589 = llvm.getelementptr %5581[%5588] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5590 = llvm.load %5589 : !llvm.ptr + %5591 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5593 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5594 = llvm.mul %4879, %5593 : !llvm.i64 + %5595 = llvm.add %5592, %5594 : !llvm.i64 + %5596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5597 = llvm.mul %730, %5596 : !llvm.i64 + %5598 = llvm.add %5595, %5597 : !llvm.i64 + %5599 = llvm.getelementptr %5591[%5598] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5590, %5599 : !llvm.ptr + %5600 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5602 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5603 = llvm.mul %4879, %5602 : !llvm.i64 + %5604 = llvm.add %5601, %5603 : !llvm.i64 + %5605 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5606 = llvm.mul %59, %5605 : !llvm.i64 + %5607 = llvm.add %5604, %5606 : !llvm.i64 + %5608 = llvm.getelementptr %5600[%5607] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5609 = llvm.load %5608 : !llvm.ptr + %5610 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5611 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5612 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5613 = llvm.mul %59, %5612 : !llvm.i64 + %5614 = llvm.add %5611, %5613 : !llvm.i64 + %5615 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5616 = llvm.mul %791, %5615 : !llvm.i64 + %5617 = llvm.add %5614, %5616 : !llvm.i64 + %5618 = llvm.getelementptr %5610[%5617] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5619 = llvm.load %5618 : !llvm.ptr + %5620 = llvm.fmul %5609, %5619 {RelaxedPrecision} : !llvm.float + %5621 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5622 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5623 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5624 = llvm.mul %4879, %5623 : !llvm.i64 + %5625 = llvm.add %5622, %5624 : !llvm.i64 + %5626 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5627 = llvm.mul %791, %5626 : !llvm.i64 + %5628 = llvm.add %5625, %5627 : !llvm.i64 + %5629 = llvm.getelementptr %5621[%5628] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5630 = llvm.load %5629 : !llvm.ptr + %5631 = llvm.fadd %5630, %5620 {RelaxedPrecision} : !llvm.float + %5632 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5633 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5634 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %5635 = llvm.mul %4879, %5634 : !llvm.i64 + %5636 = llvm.add %5633, %5635 : !llvm.i64 + %5637 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5638 = llvm.mul %791, %5637 : !llvm.i64 + %5639 = llvm.add %5636, %5638 : !llvm.i64 + %5640 = llvm.getelementptr %5632[%5639] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5631, %5640 : !llvm.ptr + %5641 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5642 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5643 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5644 = llvm.mul %4879, %5643 : !llvm.i64 + %5645 = llvm.add %5642, %5644 : !llvm.i64 + %5646 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5647 = llvm.mul %791, %5646 : !llvm.i64 + %5648 = llvm.add %5645, %5647 : !llvm.i64 + %5649 = llvm.getelementptr %5641[%5648] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5650 = llvm.load %5649 : !llvm.ptr + %5651 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5652 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5653 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5654 = llvm.mul %4879, %5653 : !llvm.i64 + %5655 = llvm.add %5652, %5654 : !llvm.i64 + %5656 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5657 = llvm.mul %791, %5656 : !llvm.i64 + %5658 = llvm.add %5655, %5657 : !llvm.i64 + %5659 = llvm.getelementptr %5651[%5658] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5650, %5659 : !llvm.ptr + %5660 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5662 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5663 = llvm.mul %4879, %5662 : !llvm.i64 + %5664 = llvm.add %5661, %5663 : !llvm.i64 + %5665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5666 = llvm.mul %59, %5665 : !llvm.i64 + %5667 = llvm.add %5664, %5666 : !llvm.i64 + %5668 = llvm.getelementptr %5660[%5667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5669 = llvm.load %5668 : !llvm.ptr + %5670 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5672 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5673 = llvm.mul %59, %5672 : !llvm.i64 + %5674 = llvm.add %5671, %5673 : !llvm.i64 + %5675 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5676 = llvm.mul %852, %5675 : !llvm.i64 + %5677 = llvm.add %5674, %5676 : !llvm.i64 + %5678 = llvm.getelementptr %5670[%5677] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5679 = llvm.load %5678 : !llvm.ptr + %5680 = llvm.fmul %5669, %5679 {RelaxedPrecision} : !llvm.float + %5681 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5682 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5683 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5684 = llvm.mul %4879, %5683 : !llvm.i64 + %5685 = llvm.add %5682, %5684 : !llvm.i64 + %5686 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5687 = llvm.mul %852, %5686 : !llvm.i64 + %5688 = llvm.add %5685, %5687 : !llvm.i64 + %5689 = llvm.getelementptr %5681[%5688] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5690 = llvm.load %5689 : !llvm.ptr + %5691 = llvm.fadd %5690, %5680 {RelaxedPrecision} : !llvm.float + %5692 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5694 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5695 = llvm.mul %4879, %5694 : !llvm.i64 + %5696 = llvm.add %5693, %5695 : 
!llvm.i64 + %5697 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5698 = llvm.mul %852, %5697 : !llvm.i64 + %5699 = llvm.add %5696, %5698 : !llvm.i64 + %5700 = llvm.getelementptr %5692[%5699] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5691, %5700 : !llvm.ptr + %5701 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5702 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5703 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5704 = llvm.mul %4879, %5703 : !llvm.i64 + %5705 = llvm.add %5702, %5704 : !llvm.i64 + %5706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5707 = llvm.mul %852, %5706 : !llvm.i64 + %5708 = llvm.add %5705, %5707 : !llvm.i64 + %5709 = llvm.getelementptr %5701[%5708] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5710 = llvm.load %5709 : !llvm.ptr + %5711 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5712 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5713 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5714 = llvm.mul %4879, %5713 : !llvm.i64 + %5715 = llvm.add %5712, %5714 : !llvm.i64 + %5716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5717 = llvm.mul %852, %5716 : !llvm.i64 + %5718 = llvm.add %5715, %5717 : !llvm.i64 + %5719 = llvm.getelementptr %5711[%5718] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5710, %5719 : !llvm.ptr + %5720 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5721 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5722 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5723 = llvm.mul %4879, %5722 : !llvm.i64 + %5724 = llvm.add %5721, %5723 : !llvm.i64 + %5725 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5726 = llvm.mul %59, %5725 : !llvm.i64 + %5727 = llvm.add %5724, %5726 : !llvm.i64 + %5728 = llvm.getelementptr %5720[%5727] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5729 = llvm.load %5728 : !llvm.ptr + %5730 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5731 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5732 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5733 = llvm.mul %59, %5732 : !llvm.i64 + %5734 = llvm.add %5731, %5733 : !llvm.i64 + %5735 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5736 = llvm.mul %913, %5735 : !llvm.i64 + %5737 = llvm.add %5734, %5736 : !llvm.i64 + %5738 = llvm.getelementptr %5730[%5737] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5739 = llvm.load %5738 : !llvm.ptr + %5740 = llvm.fmul %5729, %5739 {RelaxedPrecision} : !llvm.float + %5741 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5742 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5743 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5744 = llvm.mul %4879, %5743 : !llvm.i64 + %5745 = llvm.add %5742, %5744 : !llvm.i64 + %5746 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5747 = llvm.mul %913, %5746 : !llvm.i64 + %5748 = llvm.add %5745, %5747 : !llvm.i64 + %5749 = llvm.getelementptr %5741[%5748] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5750 = llvm.load %5749 : !llvm.ptr + %5751 = llvm.fadd %5750, %5740 {RelaxedPrecision} : !llvm.float + %5752 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5754 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5755 = llvm.mul %4879, %5754 : !llvm.i64 + %5756 = llvm.add %5753, %5755 : !llvm.i64 + %5757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5758 = llvm.mul %913, %5757 : !llvm.i64 + %5759 = llvm.add 
%5756, %5758 : !llvm.i64 + %5760 = llvm.getelementptr %5752[%5759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5751, %5760 : !llvm.ptr + %5761 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5762 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5763 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5764 = llvm.mul %4879, %5763 : !llvm.i64 + %5765 = llvm.add %5762, %5764 : !llvm.i64 + %5766 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5767 = llvm.mul %913, %5766 : !llvm.i64 + %5768 = llvm.add %5765, %5767 : !llvm.i64 + %5769 = llvm.getelementptr %5761[%5768] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5770 = llvm.load %5769 : !llvm.ptr + %5771 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5772 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5773 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5774 = llvm.mul %4879, %5773 : !llvm.i64 + %5775 = llvm.add %5772, %5774 : !llvm.i64 + %5776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5777 = llvm.mul %913, %5776 : !llvm.i64 + %5778 = llvm.add %5775, %5777 : !llvm.i64 + %5779 = llvm.getelementptr %5771[%5778] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5770, %5779 : !llvm.ptr + %5780 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5781 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5782 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5783 = llvm.mul %4879, %5782 : !llvm.i64 + %5784 = llvm.add %5781, %5783 : !llvm.i64 + %5785 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5786 = llvm.mul %59, %5785 : !llvm.i64 + %5787 = llvm.add %5784, %5786 : !llvm.i64 + %5788 = llvm.getelementptr %5780[%5787] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5789 = llvm.load %5788 : !llvm.ptr + %5790 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5791 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5792 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5793 = llvm.mul %59, %5792 : !llvm.i64 + %5794 = llvm.add %5791, %5793 : !llvm.i64 + %5795 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5796 = llvm.mul %974, %5795 : !llvm.i64 + %5797 = llvm.add %5794, %5796 : !llvm.i64 + %5798 = llvm.getelementptr %5790[%5797] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5799 = llvm.load %5798 : !llvm.ptr + %5800 = llvm.fmul %5789, %5799 {RelaxedPrecision} : !llvm.float + %5801 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5802 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5803 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5804 = llvm.mul %4879, %5803 : !llvm.i64 + %5805 = llvm.add %5802, %5804 : !llvm.i64 + %5806 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5807 = llvm.mul %974, %5806 : !llvm.i64 + %5808 = llvm.add %5805, %5807 : !llvm.i64 + %5809 = llvm.getelementptr %5801[%5808] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5810 = llvm.load %5809 : !llvm.ptr + %5811 = llvm.fadd %5810, %5800 {RelaxedPrecision} : !llvm.float + %5812 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5814 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5815 = llvm.mul %4879, %5814 : !llvm.i64 + %5816 = llvm.add %5813, %5815 : !llvm.i64 + %5817 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5818 = llvm.mul %974, %5817 : !llvm.i64 + %5819 = llvm.add %5816, %5818 : !llvm.i64 + %5820 = llvm.getelementptr %5812[%5819] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5811, 
%5820 : !llvm.ptr + %5821 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5822 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5823 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5824 = llvm.mul %4879, %5823 : !llvm.i64 + %5825 = llvm.add %5822, %5824 : !llvm.i64 + %5826 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5827 = llvm.mul %974, %5826 : !llvm.i64 + %5828 = llvm.add %5825, %5827 : !llvm.i64 + %5829 = llvm.getelementptr %5821[%5828] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5830 = llvm.load %5829 : !llvm.ptr + %5831 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5832 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5833 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5834 = llvm.mul %4879, %5833 : !llvm.i64 + %5835 = llvm.add %5832, %5834 : !llvm.i64 + %5836 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5837 = llvm.mul %974, %5836 : !llvm.i64 + %5838 = llvm.add %5835, %5837 : !llvm.i64 + %5839 = llvm.getelementptr %5831[%5838] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5830, %5839 : !llvm.ptr + %5840 = llvm.add %56, %33 : !llvm.i64 + llvm.br ^bb9(%5840 : !llvm.i64) + ^bb11: // pred: ^bb9 + %5841 = llvm.add %54, %36 : !llvm.i64 + llvm.br ^bb7(%5841 : !llvm.i64) + ^bb12: // pred: ^bb7 + %5842 = llvm.add %52, %30 : !llvm.i64 + llvm.br ^bb5(%5842 : !llvm.i64) + ^bb13: // pred: ^bb5 + %5843 = llvm.add %50, %38 : !llvm.i64 + llvm.br ^bb3(%5843 : !llvm.i64) + ^bb14: // pred: ^bb3 + llvm.br ^bb15(%32 : !llvm.i64) + ^bb15(%5844: !llvm.i64): // 2 preds: ^bb14, ^bb22 + %5845 = llvm.icmp "slt" %5844, %29 : !llvm.i64 + llvm.cond_br %5845, ^bb16, ^bb23 + ^bb16: // pred: ^bb15 + llvm.br ^bb17(%32 : !llvm.i64) + ^bb17(%5846: !llvm.i64): // 2 preds: ^bb16, ^bb21 + %5847 = llvm.icmp "slt" %5846, %31 : !llvm.i64 + llvm.cond_br %5847, ^bb18, ^bb22 + ^bb18: // pred: ^bb17 + llvm.br ^bb19(%32 : !llvm.i64) + ^bb19(%5848: !llvm.i64): // 2 preds: ^bb18, ^bb20 + %5849 = llvm.icmp "slt" %5848, %36 : !llvm.i64 + llvm.cond_br %5849, ^bb20, ^bb21 + ^bb20: // pred: ^bb19 + %5850 = llvm.add %48, %5844 : !llvm.i64 + %5851 = llvm.add %5846, %5848 : !llvm.i64 + %5852 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5854 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5855 = llvm.mul %28, %5854 : !llvm.i64 + %5856 = llvm.add %5853, %5855 : !llvm.i64 + %5857 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5858 = llvm.mul %5851, %5857 : !llvm.i64 + %5859 = llvm.add %5856, %5858 : !llvm.i64 + %5860 = llvm.getelementptr %5852[%5859] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5861 = llvm.load %5860 : !llvm.ptr + %5862 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5863 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5864 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5865 = llvm.mul %5851, %5864 : !llvm.i64 + %5866 = llvm.add %5863, %5865 : !llvm.i64 + %5867 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5868 = llvm.mul %5850, %5867 : !llvm.i64 + %5869 = llvm.add %5866, %5868 : !llvm.i64 + %5870 = llvm.getelementptr %5862[%5869] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5871 = llvm.load %5870 : !llvm.ptr + %5872 = llvm.fmul %5861, %5871 {RelaxedPrecision} : !llvm.float + %5873 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5874 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5876 = 
llvm.mul %28, %5875 : !llvm.i64 + %5877 = llvm.add %5874, %5876 : !llvm.i64 + %5878 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5879 = llvm.mul %5850, %5878 : !llvm.i64 + %5880 = llvm.add %5877, %5879 : !llvm.i64 + %5881 = llvm.getelementptr %5873[%5880] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5882 = llvm.load %5881 : !llvm.ptr + %5883 = llvm.fadd %5882, %5872 {RelaxedPrecision} : !llvm.float + %5884 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5885 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5886 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5887 = llvm.mul %28, %5886 : !llvm.i64 + %5888 = llvm.add %5885, %5887 : !llvm.i64 + %5889 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5890 = llvm.mul %5850, %5889 : !llvm.i64 + %5891 = llvm.add %5888, %5890 : !llvm.i64 + %5892 = llvm.getelementptr %5884[%5891] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5883, %5892 : !llvm.ptr + %5893 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5894 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5895 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5896 = llvm.mul %28, %5895 : !llvm.i64 + %5897 = llvm.add %5894, %5896 : !llvm.i64 + %5898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5899 = llvm.mul %5850, %5898 : !llvm.i64 + %5900 = llvm.add %5897, %5899 : !llvm.i64 + %5901 = llvm.getelementptr %5893[%5900] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5902 = llvm.load %5901 : !llvm.ptr + %5903 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5904 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5905 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5906 = llvm.mul %28, %5905 : !llvm.i64 + %5907 = llvm.add %5904, %5906 : !llvm.i64 + %5908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5909 = llvm.mul %5850, %5908 : !llvm.i64 + %5910 = llvm.add %5907, %5909 : !llvm.i64 + %5911 = llvm.getelementptr %5903[%5910] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5902, %5911 : !llvm.ptr + %5912 = llvm.add %5850, %33 : !llvm.i64 + %5913 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5915 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5916 = llvm.mul %28, %5915 : !llvm.i64 + %5917 = llvm.add %5914, %5916 : !llvm.i64 + %5918 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5919 = llvm.mul %5851, %5918 : !llvm.i64 + %5920 = llvm.add %5917, %5919 : !llvm.i64 + %5921 = llvm.getelementptr %5913[%5920] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5922 = llvm.load %5921 : !llvm.ptr + %5923 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5924 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5925 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5926 = llvm.mul %5851, %5925 : !llvm.i64 + %5927 = llvm.add %5924, %5926 : !llvm.i64 + %5928 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5929 = llvm.mul %5912, %5928 : !llvm.i64 + %5930 = llvm.add %5927, %5929 : !llvm.i64 + %5931 = llvm.getelementptr %5923[%5930] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5932 = llvm.load %5931 : !llvm.ptr + %5933 = llvm.fmul %5922, %5932 {RelaxedPrecision} : !llvm.float + %5934 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5935 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5936 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5937 = llvm.mul %28, %5936 : !llvm.i64 + %5938 = llvm.add %5935, %5937 : !llvm.i64 + %5939 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %5940 = llvm.mul %5912, %5939 : !llvm.i64 + %5941 = llvm.add %5938, %5940 : !llvm.i64 + %5942 = llvm.getelementptr %5934[%5941] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5943 = llvm.load %5942 : !llvm.ptr + %5944 = llvm.fadd %5943, %5933 {RelaxedPrecision} : !llvm.float + %5945 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5946 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5947 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5948 = llvm.mul %28, %5947 : !llvm.i64 + %5949 = llvm.add %5946, %5948 : !llvm.i64 + %5950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5951 = llvm.mul %5912, %5950 : !llvm.i64 + %5952 = llvm.add %5949, %5951 : !llvm.i64 + %5953 = llvm.getelementptr %5945[%5952] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5944, %5953 : !llvm.ptr + %5954 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5955 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5956 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5957 = llvm.mul %28, %5956 : !llvm.i64 + %5958 = llvm.add %5955, %5957 : !llvm.i64 + %5959 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5960 = llvm.mul %5912, %5959 : !llvm.i64 + %5961 = llvm.add %5958, %5960 : !llvm.i64 + %5962 = llvm.getelementptr %5954[%5961] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5963 = llvm.load %5962 : !llvm.ptr + %5964 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5966 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5967 = llvm.mul %28, %5966 : !llvm.i64 + %5968 = llvm.add %5965, %5967 : !llvm.i64 + %5969 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5970 = llvm.mul %5912, %5969 : !llvm.i64 + %5971 = llvm.add %5968, %5970 : !llvm.i64 + %5972 = llvm.getelementptr %5964[%5971] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5963, %5972 : !llvm.ptr + %5973 = llvm.add %5850, %34 : !llvm.i64 + %5974 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5976 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5977 = llvm.mul %28, %5976 : !llvm.i64 + %5978 = llvm.add %5975, %5977 : !llvm.i64 + %5979 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5980 = llvm.mul %5851, %5979 : !llvm.i64 + %5981 = llvm.add %5978, %5980 : !llvm.i64 + %5982 = llvm.getelementptr %5974[%5981] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5983 = llvm.load %5982 : !llvm.ptr + %5984 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5985 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5986 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5987 = llvm.mul %5851, %5986 : !llvm.i64 + %5988 = llvm.add %5985, %5987 : !llvm.i64 + %5989 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5990 = llvm.mul %5973, %5989 : !llvm.i64 + %5991 = llvm.add %5988, %5990 : !llvm.i64 + %5992 = llvm.getelementptr %5984[%5991] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5993 = llvm.load %5992 : !llvm.ptr + %5994 = llvm.fmul %5983, %5993 {RelaxedPrecision} : !llvm.float + %5995 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5996 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5997 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5998 = llvm.mul %28, %5997 : !llvm.i64 + %5999 = llvm.add %5996, %5998 : !llvm.i64 + %6000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6001 = llvm.mul %5973, %6000 : !llvm.i64 
+ %6002 = llvm.add %5999, %6001 : !llvm.i64 + %6003 = llvm.getelementptr %5995[%6002] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6004 = llvm.load %6003 : !llvm.ptr + %6005 = llvm.fadd %6004, %5994 {RelaxedPrecision} : !llvm.float + %6006 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6009 = llvm.mul %28, %6008 : !llvm.i64 + %6010 = llvm.add %6007, %6009 : !llvm.i64 + %6011 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6012 = llvm.mul %5973, %6011 : !llvm.i64 + %6013 = llvm.add %6010, %6012 : !llvm.i64 + %6014 = llvm.getelementptr %6006[%6013] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6005, %6014 : !llvm.ptr + %6015 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6016 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6017 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6018 = llvm.mul %28, %6017 : !llvm.i64 + %6019 = llvm.add %6016, %6018 : !llvm.i64 + %6020 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6021 = llvm.mul %5973, %6020 : !llvm.i64 + %6022 = llvm.add %6019, %6021 : !llvm.i64 + %6023 = llvm.getelementptr %6015[%6022] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6024 = llvm.load %6023 : !llvm.ptr + %6025 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6026 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6027 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6028 = llvm.mul %28, %6027 : !llvm.i64 + %6029 = llvm.add %6026, %6028 : !llvm.i64 + %6030 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6031 = llvm.mul %5973, %6030 : !llvm.i64 + %6032 = llvm.add %6029, %6031 : !llvm.i64 + %6033 = llvm.getelementptr %6025[%6032] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6024, %6033 : !llvm.ptr + %6034 = llvm.add %5850, %35 : !llvm.i64 + %6035 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6036 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6037 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6038 = llvm.mul %28, %6037 : !llvm.i64 + %6039 = llvm.add %6036, %6038 : !llvm.i64 + %6040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6041 = llvm.mul %5851, %6040 : !llvm.i64 + %6042 = llvm.add %6039, %6041 : !llvm.i64 + %6043 = llvm.getelementptr %6035[%6042] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6044 = llvm.load %6043 : !llvm.ptr + %6045 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6046 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6047 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6048 = llvm.mul %5851, %6047 : !llvm.i64 + %6049 = llvm.add %6046, %6048 : !llvm.i64 + %6050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6051 = llvm.mul %6034, %6050 : !llvm.i64 + %6052 = llvm.add %6049, %6051 : !llvm.i64 + %6053 = llvm.getelementptr %6045[%6052] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6054 = llvm.load %6053 : !llvm.ptr + %6055 = llvm.fmul %6044, %6054 {RelaxedPrecision} : !llvm.float + %6056 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6057 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6058 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6059 = llvm.mul %28, %6058 : !llvm.i64 + %6060 = llvm.add %6057, %6059 : !llvm.i64 + %6061 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6062 = llvm.mul %6034, %6061 : !llvm.i64 + %6063 = llvm.add %6060, %6062 : !llvm.i64 + %6064 = llvm.getelementptr %6056[%6063] 
: (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6065 = llvm.load %6064 : !llvm.ptr + %6066 = llvm.fadd %6065, %6055 {RelaxedPrecision} : !llvm.float + %6067 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6069 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6070 = llvm.mul %28, %6069 : !llvm.i64 + %6071 = llvm.add %6068, %6070 : !llvm.i64 + %6072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6073 = llvm.mul %6034, %6072 : !llvm.i64 + %6074 = llvm.add %6071, %6073 : !llvm.i64 + %6075 = llvm.getelementptr %6067[%6074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6066, %6075 : !llvm.ptr + %6076 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6077 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6078 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6079 = llvm.mul %28, %6078 : !llvm.i64 + %6080 = llvm.add %6077, %6079 : !llvm.i64 + %6081 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6082 = llvm.mul %6034, %6081 : !llvm.i64 + %6083 = llvm.add %6080, %6082 : !llvm.i64 + %6084 = llvm.getelementptr %6076[%6083] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6085 = llvm.load %6084 : !llvm.ptr + %6086 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6088 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6089 = llvm.mul %28, %6088 : !llvm.i64 + %6090 = llvm.add %6087, %6089 : !llvm.i64 + %6091 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6092 = llvm.mul %6034, %6091 : !llvm.i64 + %6093 = llvm.add %6090, %6092 : !llvm.i64 + %6094 = llvm.getelementptr %6086[%6093] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6085, %6094 : !llvm.ptr + %6095 = llvm.add %5850, %36 : !llvm.i64 + %6096 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6098 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6099 = llvm.mul %28, %6098 : !llvm.i64 + %6100 = llvm.add %6097, %6099 : !llvm.i64 + %6101 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6102 = llvm.mul %5851, %6101 : !llvm.i64 + %6103 = llvm.add %6100, %6102 : !llvm.i64 + %6104 = llvm.getelementptr %6096[%6103] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6105 = llvm.load %6104 : !llvm.ptr + %6106 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6108 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6109 = llvm.mul %5851, %6108 : !llvm.i64 + %6110 = llvm.add %6107, %6109 : !llvm.i64 + %6111 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6112 = llvm.mul %6095, %6111 : !llvm.i64 + %6113 = llvm.add %6110, %6112 : !llvm.i64 + %6114 = llvm.getelementptr %6106[%6113] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6115 = llvm.load %6114 : !llvm.ptr + %6116 = llvm.fmul %6105, %6115 {RelaxedPrecision} : !llvm.float + %6117 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6120 = llvm.mul %28, %6119 : !llvm.i64 + %6121 = llvm.add %6118, %6120 : !llvm.i64 + %6122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6123 = llvm.mul %6095, %6122 : !llvm.i64 + %6124 = llvm.add %6121, %6123 : !llvm.i64 + %6125 = llvm.getelementptr %6117[%6124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6126 = llvm.load %6125 : !llvm.ptr + %6127 = 
llvm.fadd %6126, %6116 {RelaxedPrecision} : !llvm.float + %6128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6131 = llvm.mul %28, %6130 : !llvm.i64 + %6132 = llvm.add %6129, %6131 : !llvm.i64 + %6133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6134 = llvm.mul %6095, %6133 : !llvm.i64 + %6135 = llvm.add %6132, %6134 : !llvm.i64 + %6136 = llvm.getelementptr %6128[%6135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6127, %6136 : !llvm.ptr + %6137 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6140 = llvm.mul %28, %6139 : !llvm.i64 + %6141 = llvm.add %6138, %6140 : !llvm.i64 + %6142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6143 = llvm.mul %6095, %6142 : !llvm.i64 + %6144 = llvm.add %6141, %6143 : !llvm.i64 + %6145 = llvm.getelementptr %6137[%6144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6146 = llvm.load %6145 : !llvm.ptr + %6147 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6149 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6150 = llvm.mul %28, %6149 : !llvm.i64 + %6151 = llvm.add %6148, %6150 : !llvm.i64 + %6152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6153 = llvm.mul %6095, %6152 : !llvm.i64 + %6154 = llvm.add %6151, %6153 : !llvm.i64 + %6155 = llvm.getelementptr %6147[%6154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6146, %6155 : !llvm.ptr + %6156 = llvm.add %5850, %37 : !llvm.i64 + %6157 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6159 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6160 = llvm.mul %28, %6159 : !llvm.i64 + %6161 = llvm.add %6158, %6160 : !llvm.i64 + %6162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6163 = llvm.mul %5851, %6162 : !llvm.i64 + %6164 = llvm.add %6161, %6163 : !llvm.i64 + %6165 = llvm.getelementptr %6157[%6164] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6166 = llvm.load %6165 : !llvm.ptr + %6167 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6169 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6170 = llvm.mul %5851, %6169 : !llvm.i64 + %6171 = llvm.add %6168, %6170 : !llvm.i64 + %6172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6173 = llvm.mul %6156, %6172 : !llvm.i64 + %6174 = llvm.add %6171, %6173 : !llvm.i64 + %6175 = llvm.getelementptr %6167[%6174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6176 = llvm.load %6175 : !llvm.ptr + %6177 = llvm.fmul %6166, %6176 {RelaxedPrecision} : !llvm.float + %6178 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6180 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6181 = llvm.mul %28, %6180 : !llvm.i64 + %6182 = llvm.add %6179, %6181 : !llvm.i64 + %6183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6184 = llvm.mul %6156, %6183 : !llvm.i64 + %6185 = llvm.add %6182, %6184 : !llvm.i64 + %6186 = llvm.getelementptr %6178[%6185] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6187 = llvm.load %6186 : !llvm.ptr + %6188 = llvm.fadd %6187, %6177 {RelaxedPrecision} : !llvm.float + %6189 = llvm.extractvalue 
%23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6191 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6192 = llvm.mul %28, %6191 : !llvm.i64 + %6193 = llvm.add %6190, %6192 : !llvm.i64 + %6194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6195 = llvm.mul %6156, %6194 : !llvm.i64 + %6196 = llvm.add %6193, %6195 : !llvm.i64 + %6197 = llvm.getelementptr %6189[%6196] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6188, %6197 : !llvm.ptr + %6198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6201 = llvm.mul %28, %6200 : !llvm.i64 + %6202 = llvm.add %6199, %6201 : !llvm.i64 + %6203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6204 = llvm.mul %6156, %6203 : !llvm.i64 + %6205 = llvm.add %6202, %6204 : !llvm.i64 + %6206 = llvm.getelementptr %6198[%6205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6207 = llvm.load %6206 : !llvm.ptr + %6208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6211 = llvm.mul %28, %6210 : !llvm.i64 + %6212 = llvm.add %6209, %6211 : !llvm.i64 + %6213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6214 = llvm.mul %6156, %6213 : !llvm.i64 + %6215 = llvm.add %6212, %6214 : !llvm.i64 + %6216 = llvm.getelementptr %6208[%6215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6207, %6216 : !llvm.ptr + %6217 = llvm.add %5850, %38 : !llvm.i64 + %6218 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6220 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6221 = llvm.mul %28, %6220 : !llvm.i64 + %6222 = llvm.add %6219, %6221 : !llvm.i64 + %6223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6224 = llvm.mul %5851, %6223 : !llvm.i64 + %6225 = llvm.add %6222, %6224 : !llvm.i64 + %6226 = llvm.getelementptr %6218[%6225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6227 = llvm.load %6226 : !llvm.ptr + %6228 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6231 = llvm.mul %5851, %6230 : !llvm.i64 + %6232 = llvm.add %6229, %6231 : !llvm.i64 + %6233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6234 = llvm.mul %6217, %6233 : !llvm.i64 + %6235 = llvm.add %6232, %6234 : !llvm.i64 + %6236 = llvm.getelementptr %6228[%6235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6237 = llvm.load %6236 : !llvm.ptr + %6238 = llvm.fmul %6227, %6237 {RelaxedPrecision} : !llvm.float + %6239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6242 = llvm.mul %28, %6241 : !llvm.i64 + %6243 = llvm.add %6240, %6242 : !llvm.i64 + %6244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6245 = llvm.mul %6217, %6244 : !llvm.i64 + %6246 = llvm.add %6243, %6245 : !llvm.i64 + %6247 = llvm.getelementptr %6239[%6246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6248 = llvm.load %6247 : !llvm.ptr + %6249 = llvm.fadd %6248, %6238 {RelaxedPrecision} : !llvm.float + %6250 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6251 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %6252 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6253 = llvm.mul %28, %6252 : !llvm.i64 + %6254 = llvm.add %6251, %6253 : !llvm.i64 + %6255 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6256 = llvm.mul %6217, %6255 : !llvm.i64 + %6257 = llvm.add %6254, %6256 : !llvm.i64 + %6258 = llvm.getelementptr %6250[%6257] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6249, %6258 : !llvm.ptr + %6259 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6260 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6261 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6262 = llvm.mul %28, %6261 : !llvm.i64 + %6263 = llvm.add %6260, %6262 : !llvm.i64 + %6264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6265 = llvm.mul %6217, %6264 : !llvm.i64 + %6266 = llvm.add %6263, %6265 : !llvm.i64 + %6267 = llvm.getelementptr %6259[%6266] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6268 = llvm.load %6267 : !llvm.ptr + %6269 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6270 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6271 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6272 = llvm.mul %28, %6271 : !llvm.i64 + %6273 = llvm.add %6270, %6272 : !llvm.i64 + %6274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6275 = llvm.mul %6217, %6274 : !llvm.i64 + %6276 = llvm.add %6273, %6275 : !llvm.i64 + %6277 = llvm.getelementptr %6269[%6276] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6268, %6277 : !llvm.ptr + %6278 = llvm.add %5850, %39 : !llvm.i64 + %6279 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6281 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6282 = llvm.mul %28, %6281 : !llvm.i64 + %6283 = llvm.add %6280, %6282 : !llvm.i64 + %6284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6285 = llvm.mul %5851, %6284 : !llvm.i64 + %6286 = llvm.add %6283, %6285 : !llvm.i64 + %6287 = llvm.getelementptr %6279[%6286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6288 = llvm.load %6287 : !llvm.ptr + %6289 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6290 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6291 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6292 = llvm.mul %5851, %6291 : !llvm.i64 + %6293 = llvm.add %6290, %6292 : !llvm.i64 + %6294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6295 = llvm.mul %6278, %6294 : !llvm.i64 + %6296 = llvm.add %6293, %6295 : !llvm.i64 + %6297 = llvm.getelementptr %6289[%6296] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6298 = llvm.load %6297 : !llvm.ptr + %6299 = llvm.fmul %6288, %6298 {RelaxedPrecision} : !llvm.float + %6300 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6302 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6303 = llvm.mul %28, %6302 : !llvm.i64 + %6304 = llvm.add %6301, %6303 : !llvm.i64 + %6305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6306 = llvm.mul %6278, %6305 : !llvm.i64 + %6307 = llvm.add %6304, %6306 : !llvm.i64 + %6308 = llvm.getelementptr %6300[%6307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6309 = llvm.load %6308 : !llvm.ptr + %6310 = llvm.fadd %6309, %6299 {RelaxedPrecision} : !llvm.float + %6311 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6313 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %6314 = llvm.mul %28, %6313 : !llvm.i64 + %6315 = llvm.add %6312, %6314 : !llvm.i64 + %6316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6317 = llvm.mul %6278, %6316 : !llvm.i64 + %6318 = llvm.add %6315, %6317 : !llvm.i64 + %6319 = llvm.getelementptr %6311[%6318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6310, %6319 : !llvm.ptr + %6320 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6321 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6322 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6323 = llvm.mul %28, %6322 : !llvm.i64 + %6324 = llvm.add %6321, %6323 : !llvm.i64 + %6325 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6326 = llvm.mul %6278, %6325 : !llvm.i64 + %6327 = llvm.add %6324, %6326 : !llvm.i64 + %6328 = llvm.getelementptr %6320[%6327] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6329 = llvm.load %6328 : !llvm.ptr + %6330 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6331 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6332 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6333 = llvm.mul %28, %6332 : !llvm.i64 + %6334 = llvm.add %6331, %6333 : !llvm.i64 + %6335 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6336 = llvm.mul %6278, %6335 : !llvm.i64 + %6337 = llvm.add %6334, %6336 : !llvm.i64 + %6338 = llvm.getelementptr %6330[%6337] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6329, %6338 : !llvm.ptr + %6339 = llvm.add %5850, %40 : !llvm.i64 + %6340 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6342 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6343 = llvm.mul %28, %6342 : !llvm.i64 + %6344 = llvm.add %6341, %6343 : !llvm.i64 + %6345 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6346 = llvm.mul %5851, %6345 : !llvm.i64 + %6347 = llvm.add %6344, %6346 : !llvm.i64 + %6348 = llvm.getelementptr %6340[%6347] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6349 = llvm.load %6348 : !llvm.ptr + %6350 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6351 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6352 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6353 = llvm.mul %5851, %6352 : !llvm.i64 + %6354 = llvm.add %6351, %6353 : !llvm.i64 + %6355 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6356 = llvm.mul %6339, %6355 : !llvm.i64 + %6357 = llvm.add %6354, %6356 : !llvm.i64 + %6358 = llvm.getelementptr %6350[%6357] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6359 = llvm.load %6358 : !llvm.ptr + %6360 = llvm.fmul %6349, %6359 {RelaxedPrecision} : !llvm.float + %6361 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6362 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6363 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6364 = llvm.mul %28, %6363 : !llvm.i64 + %6365 = llvm.add %6362, %6364 : !llvm.i64 + %6366 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6367 = llvm.mul %6339, %6366 : !llvm.i64 + %6368 = llvm.add %6365, %6367 : !llvm.i64 + %6369 = llvm.getelementptr %6361[%6368] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6370 = llvm.load %6369 : !llvm.ptr + %6371 = llvm.fadd %6370, %6360 {RelaxedPrecision} : !llvm.float + %6372 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6373 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6374 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6375 = llvm.mul %28, %6374 : !llvm.i64 + %6376 = llvm.add %6373, %6375 : 
!llvm.i64 + %6377 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6378 = llvm.mul %6339, %6377 : !llvm.i64 + %6379 = llvm.add %6376, %6378 : !llvm.i64 + %6380 = llvm.getelementptr %6372[%6379] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6371, %6380 : !llvm.ptr + %6381 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6383 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6384 = llvm.mul %28, %6383 : !llvm.i64 + %6385 = llvm.add %6382, %6384 : !llvm.i64 + %6386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6387 = llvm.mul %6339, %6386 : !llvm.i64 + %6388 = llvm.add %6385, %6387 : !llvm.i64 + %6389 = llvm.getelementptr %6381[%6388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6390 = llvm.load %6389 : !llvm.ptr + %6391 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6393 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6394 = llvm.mul %28, %6393 : !llvm.i64 + %6395 = llvm.add %6392, %6394 : !llvm.i64 + %6396 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6397 = llvm.mul %6339, %6396 : !llvm.i64 + %6398 = llvm.add %6395, %6397 : !llvm.i64 + %6399 = llvm.getelementptr %6391[%6398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6390, %6399 : !llvm.ptr + %6400 = llvm.add %5850, %41 : !llvm.i64 + %6401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6404 = llvm.mul %28, %6403 : !llvm.i64 + %6405 = llvm.add %6402, %6404 : !llvm.i64 + %6406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6407 = llvm.mul %5851, %6406 : !llvm.i64 + %6408 = llvm.add %6405, %6407 : !llvm.i64 + %6409 = llvm.getelementptr %6401[%6408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6410 = llvm.load %6409 : !llvm.ptr + %6411 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6413 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6414 = llvm.mul %5851, %6413 : !llvm.i64 + %6415 = llvm.add %6412, %6414 : !llvm.i64 + %6416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6417 = llvm.mul %6400, %6416 : !llvm.i64 + %6418 = llvm.add %6415, %6417 : !llvm.i64 + %6419 = llvm.getelementptr %6411[%6418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6420 = llvm.load %6419 : !llvm.ptr + %6421 = llvm.fmul %6410, %6420 {RelaxedPrecision} : !llvm.float + %6422 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6423 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6424 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6425 = llvm.mul %28, %6424 : !llvm.i64 + %6426 = llvm.add %6423, %6425 : !llvm.i64 + %6427 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6428 = llvm.mul %6400, %6427 : !llvm.i64 + %6429 = llvm.add %6426, %6428 : !llvm.i64 + %6430 = llvm.getelementptr %6422[%6429] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6431 = llvm.load %6430 : !llvm.ptr + %6432 = llvm.fadd %6431, %6421 {RelaxedPrecision} : !llvm.float + %6433 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6434 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6435 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6436 = llvm.mul %28, %6435 : !llvm.i64 + %6437 = llvm.add %6434, %6436 : !llvm.i64 + %6438 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6439 = llvm.mul 
%6400, %6438 : !llvm.i64 + %6440 = llvm.add %6437, %6439 : !llvm.i64 + %6441 = llvm.getelementptr %6433[%6440] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6432, %6441 : !llvm.ptr + %6442 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6444 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6445 = llvm.mul %28, %6444 : !llvm.i64 + %6446 = llvm.add %6443, %6445 : !llvm.i64 + %6447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6448 = llvm.mul %6400, %6447 : !llvm.i64 + %6449 = llvm.add %6446, %6448 : !llvm.i64 + %6450 = llvm.getelementptr %6442[%6449] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6451 = llvm.load %6450 : !llvm.ptr + %6452 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6455 = llvm.mul %28, %6454 : !llvm.i64 + %6456 = llvm.add %6453, %6455 : !llvm.i64 + %6457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6458 = llvm.mul %6400, %6457 : !llvm.i64 + %6459 = llvm.add %6456, %6458 : !llvm.i64 + %6460 = llvm.getelementptr %6452[%6459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6451, %6460 : !llvm.ptr + %6461 = llvm.add %5850, %42 : !llvm.i64 + %6462 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6463 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6464 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6465 = llvm.mul %28, %6464 : !llvm.i64 + %6466 = llvm.add %6463, %6465 : !llvm.i64 + %6467 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6468 = llvm.mul %5851, %6467 : !llvm.i64 + %6469 = llvm.add %6466, %6468 : !llvm.i64 + %6470 = llvm.getelementptr %6462[%6469] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6471 = llvm.load %6470 : !llvm.ptr + %6472 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6473 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6474 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6475 = llvm.mul %5851, %6474 : !llvm.i64 + %6476 = llvm.add %6473, %6475 : !llvm.i64 + %6477 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6478 = llvm.mul %6461, %6477 : !llvm.i64 + %6479 = llvm.add %6476, %6478 : !llvm.i64 + %6480 = llvm.getelementptr %6472[%6479] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6481 = llvm.load %6480 : !llvm.ptr + %6482 = llvm.fmul %6471, %6481 {RelaxedPrecision} : !llvm.float + %6483 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6484 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6485 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6486 = llvm.mul %28, %6485 : !llvm.i64 + %6487 = llvm.add %6484, %6486 : !llvm.i64 + %6488 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6489 = llvm.mul %6461, %6488 : !llvm.i64 + %6490 = llvm.add %6487, %6489 : !llvm.i64 + %6491 = llvm.getelementptr %6483[%6490] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6492 = llvm.load %6491 : !llvm.ptr + %6493 = llvm.fadd %6492, %6482 {RelaxedPrecision} : !llvm.float + %6494 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6495 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6496 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6497 = llvm.mul %28, %6496 : !llvm.i64 + %6498 = llvm.add %6495, %6497 : !llvm.i64 + %6499 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6500 = llvm.mul %6461, %6499 : !llvm.i64 + %6501 = llvm.add %6498, %6500 : !llvm.i64 + %6502 = 
llvm.getelementptr %6494[%6501] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6493, %6502 : !llvm.ptr + %6503 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6505 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6506 = llvm.mul %28, %6505 : !llvm.i64 + %6507 = llvm.add %6504, %6506 : !llvm.i64 + %6508 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6509 = llvm.mul %6461, %6508 : !llvm.i64 + %6510 = llvm.add %6507, %6509 : !llvm.i64 + %6511 = llvm.getelementptr %6503[%6510] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6512 = llvm.load %6511 : !llvm.ptr + %6513 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6514 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6515 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6516 = llvm.mul %28, %6515 : !llvm.i64 + %6517 = llvm.add %6514, %6516 : !llvm.i64 + %6518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6519 = llvm.mul %6461, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.getelementptr %6513[%6520] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6512, %6521 : !llvm.ptr + %6522 = llvm.add %5850, %43 : !llvm.i64 + %6523 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6524 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6525 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6526 = llvm.mul %28, %6525 : !llvm.i64 + %6527 = llvm.add %6524, %6526 : !llvm.i64 + %6528 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6529 = llvm.mul %5851, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.getelementptr %6523[%6530] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6532 = llvm.load %6531 : !llvm.ptr + %6533 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6534 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6535 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6536 = llvm.mul %5851, %6535 : !llvm.i64 + %6537 = llvm.add %6534, %6536 : !llvm.i64 + %6538 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6539 = llvm.mul %6522, %6538 : !llvm.i64 + %6540 = llvm.add %6537, %6539 : !llvm.i64 + %6541 = llvm.getelementptr %6533[%6540] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6542 = llvm.load %6541 : !llvm.ptr + %6543 = llvm.fmul %6532, %6542 {RelaxedPrecision} : !llvm.float + %6544 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6545 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6546 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6547 = llvm.mul %28, %6546 : !llvm.i64 + %6548 = llvm.add %6545, %6547 : !llvm.i64 + %6549 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6550 = llvm.mul %6522, %6549 : !llvm.i64 + %6551 = llvm.add %6548, %6550 : !llvm.i64 + %6552 = llvm.getelementptr %6544[%6551] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6553 = llvm.load %6552 : !llvm.ptr + %6554 = llvm.fadd %6553, %6543 {RelaxedPrecision} : !llvm.float + %6555 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6556 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6557 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6558 = llvm.mul %28, %6557 : !llvm.i64 + %6559 = llvm.add %6556, %6558 : !llvm.i64 + %6560 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6561 = llvm.mul %6522, %6560 : !llvm.i64 + %6562 = llvm.add %6559, %6561 : !llvm.i64 + %6563 = llvm.getelementptr %6555[%6562] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store 
%6554, %6563 : !llvm.ptr + %6564 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6565 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6566 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6567 = llvm.mul %28, %6566 : !llvm.i64 + %6568 = llvm.add %6565, %6567 : !llvm.i64 + %6569 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6570 = llvm.mul %6522, %6569 : !llvm.i64 + %6571 = llvm.add %6568, %6570 : !llvm.i64 + %6572 = llvm.getelementptr %6564[%6571] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6573 = llvm.load %6572 : !llvm.ptr + %6574 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6575 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6576 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6577 = llvm.mul %28, %6576 : !llvm.i64 + %6578 = llvm.add %6575, %6577 : !llvm.i64 + %6579 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6580 = llvm.mul %6522, %6579 : !llvm.i64 + %6581 = llvm.add %6578, %6580 : !llvm.i64 + %6582 = llvm.getelementptr %6574[%6581] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6573, %6582 : !llvm.ptr + %6583 = llvm.add %5850, %44 : !llvm.i64 + %6584 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6585 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6586 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6587 = llvm.mul %28, %6586 : !llvm.i64 + %6588 = llvm.add %6585, %6587 : !llvm.i64 + %6589 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6590 = llvm.mul %5851, %6589 : !llvm.i64 + %6591 = llvm.add %6588, %6590 : !llvm.i64 + %6592 = llvm.getelementptr %6584[%6591] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6593 = llvm.load %6592 : !llvm.ptr + %6594 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6595 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6596 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6597 = llvm.mul %5851, %6596 : !llvm.i64 + %6598 = llvm.add %6595, %6597 : !llvm.i64 + %6599 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6600 = llvm.mul %6583, %6599 : !llvm.i64 + %6601 = llvm.add %6598, %6600 : !llvm.i64 + %6602 = llvm.getelementptr %6594[%6601] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6603 = llvm.load %6602 : !llvm.ptr + %6604 = llvm.fmul %6593, %6603 {RelaxedPrecision} : !llvm.float + %6605 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6606 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6607 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6608 = llvm.mul %28, %6607 : !llvm.i64 + %6609 = llvm.add %6606, %6608 : !llvm.i64 + %6610 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6611 = llvm.mul %6583, %6610 : !llvm.i64 + %6612 = llvm.add %6609, %6611 : !llvm.i64 + %6613 = llvm.getelementptr %6605[%6612] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6614 = llvm.load %6613 : !llvm.ptr + %6615 = llvm.fadd %6614, %6604 {RelaxedPrecision} : !llvm.float + %6616 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6617 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6618 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6619 = llvm.mul %28, %6618 : !llvm.i64 + %6620 = llvm.add %6617, %6619 : !llvm.i64 + %6621 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6622 = llvm.mul %6583, %6621 : !llvm.i64 + %6623 = llvm.add %6620, %6622 : !llvm.i64 + %6624 = llvm.getelementptr %6616[%6623] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6615, %6624 : !llvm.ptr + %6625 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %6626 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6627 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6628 = llvm.mul %28, %6627 : !llvm.i64 + %6629 = llvm.add %6626, %6628 : !llvm.i64 + %6630 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6631 = llvm.mul %6583, %6630 : !llvm.i64 + %6632 = llvm.add %6629, %6631 : !llvm.i64 + %6633 = llvm.getelementptr %6625[%6632] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6634 = llvm.load %6633 : !llvm.ptr + %6635 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6638 = llvm.mul %28, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6641 = llvm.mul %6583, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.getelementptr %6635[%6642] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6634, %6643 : !llvm.ptr + %6644 = llvm.add %5850, %45 : !llvm.i64 + %6645 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6646 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6647 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6648 = llvm.mul %28, %6647 : !llvm.i64 + %6649 = llvm.add %6646, %6648 : !llvm.i64 + %6650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6651 = llvm.mul %5851, %6650 : !llvm.i64 + %6652 = llvm.add %6649, %6651 : !llvm.i64 + %6653 = llvm.getelementptr %6645[%6652] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6654 = llvm.load %6653 : !llvm.ptr + %6655 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6656 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6657 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6658 = llvm.mul %5851, %6657 : !llvm.i64 + %6659 = llvm.add %6656, %6658 : !llvm.i64 + %6660 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6661 = llvm.mul %6644, %6660 : !llvm.i64 + %6662 = llvm.add %6659, %6661 : !llvm.i64 + %6663 = llvm.getelementptr %6655[%6662] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6664 = llvm.load %6663 : !llvm.ptr + %6665 = llvm.fmul %6654, %6664 {RelaxedPrecision} : !llvm.float + %6666 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6667 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6668 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6669 = llvm.mul %28, %6668 : !llvm.i64 + %6670 = llvm.add %6667, %6669 : !llvm.i64 + %6671 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6672 = llvm.mul %6644, %6671 : !llvm.i64 + %6673 = llvm.add %6670, %6672 : !llvm.i64 + %6674 = llvm.getelementptr %6666[%6673] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6675 = llvm.load %6674 : !llvm.ptr + %6676 = llvm.fadd %6675, %6665 {RelaxedPrecision} : !llvm.float + %6677 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6680 = llvm.mul %28, %6679 : !llvm.i64 + %6681 = llvm.add %6678, %6680 : !llvm.i64 + %6682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6683 = llvm.mul %6644, %6682 : !llvm.i64 + %6684 = llvm.add %6681, %6683 : !llvm.i64 + %6685 = llvm.getelementptr %6677[%6684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6676, %6685 : !llvm.ptr + %6686 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6687 = llvm.mlir.constant(0 : index) : 
!llvm.i64 + %6688 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6689 = llvm.mul %28, %6688 : !llvm.i64 + %6690 = llvm.add %6687, %6689 : !llvm.i64 + %6691 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6692 = llvm.mul %6644, %6691 : !llvm.i64 + %6693 = llvm.add %6690, %6692 : !llvm.i64 + %6694 = llvm.getelementptr %6686[%6693] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6695 = llvm.load %6694 : !llvm.ptr + %6696 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6697 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6698 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6699 = llvm.mul %28, %6698 : !llvm.i64 + %6700 = llvm.add %6697, %6699 : !llvm.i64 + %6701 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6702 = llvm.mul %6644, %6701 : !llvm.i64 + %6703 = llvm.add %6700, %6702 : !llvm.i64 + %6704 = llvm.getelementptr %6696[%6703] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6695, %6704 : !llvm.ptr + %6705 = llvm.add %5850, %46 : !llvm.i64 + %6706 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6708 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6709 = llvm.mul %28, %6708 : !llvm.i64 + %6710 = llvm.add %6707, %6709 : !llvm.i64 + %6711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6712 = llvm.mul %5851, %6711 : !llvm.i64 + %6713 = llvm.add %6710, %6712 : !llvm.i64 + %6714 = llvm.getelementptr %6706[%6713] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6715 = llvm.load %6714 : !llvm.ptr + %6716 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6717 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6718 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6719 = llvm.mul %5851, %6718 : !llvm.i64 + %6720 = llvm.add %6717, %6719 : !llvm.i64 + %6721 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6722 = llvm.mul %6705, %6721 : !llvm.i64 + %6723 = llvm.add %6720, %6722 : !llvm.i64 + %6724 = llvm.getelementptr %6716[%6723] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6725 = llvm.load %6724 : !llvm.ptr + %6726 = llvm.fmul %6715, %6725 {RelaxedPrecision} : !llvm.float + %6727 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6729 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6730 = llvm.mul %28, %6729 : !llvm.i64 + %6731 = llvm.add %6728, %6730 : !llvm.i64 + %6732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6733 = llvm.mul %6705, %6732 : !llvm.i64 + %6734 = llvm.add %6731, %6733 : !llvm.i64 + %6735 = llvm.getelementptr %6727[%6734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6736 = llvm.load %6735 : !llvm.ptr + %6737 = llvm.fadd %6736, %6726 {RelaxedPrecision} : !llvm.float + %6738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6741 = llvm.mul %28, %6740 : !llvm.i64 + %6742 = llvm.add %6739, %6741 : !llvm.i64 + %6743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6744 = llvm.mul %6705, %6743 : !llvm.i64 + %6745 = llvm.add %6742, %6744 : !llvm.i64 + %6746 = llvm.getelementptr %6738[%6745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6737, %6746 : !llvm.ptr + %6747 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6749 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6750 = llvm.mul 
%28, %6749 : !llvm.i64 + %6751 = llvm.add %6748, %6750 : !llvm.i64 + %6752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6753 = llvm.mul %6705, %6752 : !llvm.i64 + %6754 = llvm.add %6751, %6753 : !llvm.i64 + %6755 = llvm.getelementptr %6747[%6754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6756 = llvm.load %6755 : !llvm.ptr + %6757 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6758 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6759 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6760 = llvm.mul %28, %6759 : !llvm.i64 + %6761 = llvm.add %6758, %6760 : !llvm.i64 + %6762 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6763 = llvm.mul %6705, %6762 : !llvm.i64 + %6764 = llvm.add %6761, %6763 : !llvm.i64 + %6765 = llvm.getelementptr %6757[%6764] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6756, %6765 : !llvm.ptr + %6766 = llvm.add %5850, %47 : !llvm.i64 + %6767 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6768 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6769 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6770 = llvm.mul %28, %6769 : !llvm.i64 + %6771 = llvm.add %6768, %6770 : !llvm.i64 + %6772 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6773 = llvm.mul %5851, %6772 : !llvm.i64 + %6774 = llvm.add %6771, %6773 : !llvm.i64 + %6775 = llvm.getelementptr %6767[%6774] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6776 = llvm.load %6775 : !llvm.ptr + %6777 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6779 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6780 = llvm.mul %5851, %6779 : !llvm.i64 + %6781 = llvm.add %6778, %6780 : !llvm.i64 + %6782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6783 = llvm.mul %6766, %6782 : !llvm.i64 + %6784 = llvm.add %6781, %6783 : !llvm.i64 + %6785 = llvm.getelementptr %6777[%6784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6786 = llvm.load %6785 : !llvm.ptr + %6787 = llvm.fmul %6776, %6786 {RelaxedPrecision} : !llvm.float + %6788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6791 = llvm.mul %28, %6790 : !llvm.i64 + %6792 = llvm.add %6789, %6791 : !llvm.i64 + %6793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6794 = llvm.mul %6766, %6793 : !llvm.i64 + %6795 = llvm.add %6792, %6794 : !llvm.i64 + %6796 = llvm.getelementptr %6788[%6795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6797 = llvm.load %6796 : !llvm.ptr + %6798 = llvm.fadd %6797, %6787 {RelaxedPrecision} : !llvm.float + %6799 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6800 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6801 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6802 = llvm.mul %28, %6801 : !llvm.i64 + %6803 = llvm.add %6800, %6802 : !llvm.i64 + %6804 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6805 = llvm.mul %6766, %6804 : !llvm.i64 + %6806 = llvm.add %6803, %6805 : !llvm.i64 + %6807 = llvm.getelementptr %6799[%6806] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6798, %6807 : !llvm.ptr + %6808 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6811 = llvm.mul %28, %6810 : !llvm.i64 + %6812 = llvm.add %6809, %6811 : !llvm.i64 + %6813 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %6814 = llvm.mul %6766, %6813 : !llvm.i64 + %6815 = llvm.add %6812, %6814 : !llvm.i64 + %6816 = llvm.getelementptr %6808[%6815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6817 = llvm.load %6816 : !llvm.ptr + %6818 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6819 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6820 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6821 = llvm.mul %28, %6820 : !llvm.i64 + %6822 = llvm.add %6819, %6821 : !llvm.i64 + %6823 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6824 = llvm.mul %6766, %6823 : !llvm.i64 + %6825 = llvm.add %6822, %6824 : !llvm.i64 + %6826 = llvm.getelementptr %6818[%6825] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6817, %6826 : !llvm.ptr + %6827 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6828 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6829 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6830 = llvm.mul %24, %6829 : !llvm.i64 + %6831 = llvm.add %6828, %6830 : !llvm.i64 + %6832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6833 = llvm.mul %5851, %6832 : !llvm.i64 + %6834 = llvm.add %6831, %6833 : !llvm.i64 + %6835 = llvm.getelementptr %6827[%6834] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6836 = llvm.load %6835 : !llvm.ptr + %6837 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6839 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6840 = llvm.mul %5851, %6839 : !llvm.i64 + %6841 = llvm.add %6838, %6840 : !llvm.i64 + %6842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6843 = llvm.mul %5850, %6842 : !llvm.i64 + %6844 = llvm.add %6841, %6843 : !llvm.i64 + %6845 = llvm.getelementptr %6837[%6844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6846 = llvm.load %6845 : !llvm.ptr + %6847 = llvm.fmul %6836, %6846 {RelaxedPrecision} : !llvm.float + %6848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6851 = llvm.mul %24, %6850 : !llvm.i64 + %6852 = llvm.add %6849, %6851 : !llvm.i64 + %6853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6854 = llvm.mul %5850, %6853 : !llvm.i64 + %6855 = llvm.add %6852, %6854 : !llvm.i64 + %6856 = llvm.getelementptr %6848[%6855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6857 = llvm.load %6856 : !llvm.ptr + %6858 = llvm.fadd %6857, %6847 {RelaxedPrecision} : !llvm.float + %6859 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6860 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6861 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6862 = llvm.mul %24, %6861 : !llvm.i64 + %6863 = llvm.add %6860, %6862 : !llvm.i64 + %6864 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6865 = llvm.mul %5850, %6864 : !llvm.i64 + %6866 = llvm.add %6863, %6865 : !llvm.i64 + %6867 = llvm.getelementptr %6859[%6866] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6858, %6867 : !llvm.ptr + %6868 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6871 = llvm.mul %24, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6874 = llvm.mul %5850, %6873 : !llvm.i64 + %6875 = llvm.add %6872, %6874 : 
!llvm.i64 + %6876 = llvm.getelementptr %6868[%6875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6877 = llvm.load %6876 : !llvm.ptr + %6878 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6880 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6881 = llvm.mul %24, %6880 : !llvm.i64 + %6882 = llvm.add %6879, %6881 : !llvm.i64 + %6883 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6884 = llvm.mul %5850, %6883 : !llvm.i64 + %6885 = llvm.add %6882, %6884 : !llvm.i64 + %6886 = llvm.getelementptr %6878[%6885] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6877, %6886 : !llvm.ptr + %6887 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6888 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6889 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6890 = llvm.mul %24, %6889 : !llvm.i64 + %6891 = llvm.add %6888, %6890 : !llvm.i64 + %6892 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6893 = llvm.mul %5851, %6892 : !llvm.i64 + %6894 = llvm.add %6891, %6893 : !llvm.i64 + %6895 = llvm.getelementptr %6887[%6894] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6896 = llvm.load %6895 : !llvm.ptr + %6897 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6899 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6900 = llvm.mul %5851, %6899 : !llvm.i64 + %6901 = llvm.add %6898, %6900 : !llvm.i64 + %6902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6903 = llvm.mul %5912, %6902 : !llvm.i64 + %6904 = llvm.add %6901, %6903 : !llvm.i64 + %6905 = llvm.getelementptr %6897[%6904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6906 = llvm.load %6905 : !llvm.ptr + %6907 = llvm.fmul %6896, %6906 {RelaxedPrecision} : !llvm.float + %6908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6911 = llvm.mul %24, %6910 : !llvm.i64 + %6912 = llvm.add %6909, %6911 : !llvm.i64 + %6913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6914 = llvm.mul %5912, %6913 : !llvm.i64 + %6915 = llvm.add %6912, %6914 : !llvm.i64 + %6916 = llvm.getelementptr %6908[%6915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6917 = llvm.load %6916 : !llvm.ptr + %6918 = llvm.fadd %6917, %6907 {RelaxedPrecision} : !llvm.float + %6919 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6921 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6922 = llvm.mul %24, %6921 : !llvm.i64 + %6923 = llvm.add %6920, %6922 : !llvm.i64 + %6924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6925 = llvm.mul %5912, %6924 : !llvm.i64 + %6926 = llvm.add %6923, %6925 : !llvm.i64 + %6927 = llvm.getelementptr %6919[%6926] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6918, %6927 : !llvm.ptr + %6928 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6930 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6931 = llvm.mul %24, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6934 = llvm.mul %5912, %6933 : !llvm.i64 + %6935 = llvm.add %6932, %6934 : !llvm.i64 + %6936 = llvm.getelementptr %6928[%6935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6937 = llvm.load %6936 : !llvm.ptr 
+ %6938 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6940 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6941 = llvm.mul %24, %6940 : !llvm.i64 + %6942 = llvm.add %6939, %6941 : !llvm.i64 + %6943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6944 = llvm.mul %5912, %6943 : !llvm.i64 + %6945 = llvm.add %6942, %6944 : !llvm.i64 + %6946 = llvm.getelementptr %6938[%6945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6937, %6946 : !llvm.ptr + %6947 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6948 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6949 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6950 = llvm.mul %24, %6949 : !llvm.i64 + %6951 = llvm.add %6948, %6950 : !llvm.i64 + %6952 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6953 = llvm.mul %5851, %6952 : !llvm.i64 + %6954 = llvm.add %6951, %6953 : !llvm.i64 + %6955 = llvm.getelementptr %6947[%6954] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6956 = llvm.load %6955 : !llvm.ptr + %6957 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6958 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6959 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6960 = llvm.mul %5851, %6959 : !llvm.i64 + %6961 = llvm.add %6958, %6960 : !llvm.i64 + %6962 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6963 = llvm.mul %5973, %6962 : !llvm.i64 + %6964 = llvm.add %6961, %6963 : !llvm.i64 + %6965 = llvm.getelementptr %6957[%6964] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6966 = llvm.load %6965 : !llvm.ptr + %6967 = llvm.fmul %6956, %6966 {RelaxedPrecision} : !llvm.float + %6968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6971 = llvm.mul %24, %6970 : !llvm.i64 + %6972 = llvm.add %6969, %6971 : !llvm.i64 + %6973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6974 = llvm.mul %5973, %6973 : !llvm.i64 + %6975 = llvm.add %6972, %6974 : !llvm.i64 + %6976 = llvm.getelementptr %6968[%6975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6977 = llvm.load %6976 : !llvm.ptr + %6978 = llvm.fadd %6977, %6967 {RelaxedPrecision} : !llvm.float + %6979 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6981 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6982 = llvm.mul %24, %6981 : !llvm.i64 + %6983 = llvm.add %6980, %6982 : !llvm.i64 + %6984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6985 = llvm.mul %5973, %6984 : !llvm.i64 + %6986 = llvm.add %6983, %6985 : !llvm.i64 + %6987 = llvm.getelementptr %6979[%6986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6978, %6987 : !llvm.ptr + %6988 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6991 = llvm.mul %24, %6990 : !llvm.i64 + %6992 = llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %5973, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6997 = llvm.load %6996 : !llvm.ptr + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %24, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %5973, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6997, %7006 : !llvm.ptr + %7007 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7010 = llvm.mul %24, %7009 : !llvm.i64 + %7011 = llvm.add %7008, %7010 : !llvm.i64 + %7012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7013 = llvm.mul %5851, %7012 : !llvm.i64 + %7014 = llvm.add %7011, %7013 : !llvm.i64 + %7015 = llvm.getelementptr %7007[%7014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7016 = llvm.load %7015 : !llvm.ptr + %7017 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7018 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7019 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7020 = llvm.mul %5851, %7019 : !llvm.i64 + %7021 = llvm.add %7018, %7020 : !llvm.i64 + %7022 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7023 = llvm.mul %6034, %7022 : !llvm.i64 + %7024 = llvm.add %7021, %7023 : !llvm.i64 + %7025 = llvm.getelementptr %7017[%7024] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7026 = llvm.load %7025 : !llvm.ptr + %7027 = llvm.fmul %7016, %7026 {RelaxedPrecision} : !llvm.float + %7028 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7031 = llvm.mul %24, %7030 : !llvm.i64 + %7032 = llvm.add %7029, %7031 : !llvm.i64 + %7033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7034 = llvm.mul %6034, %7033 : !llvm.i64 + %7035 = llvm.add %7032, %7034 : !llvm.i64 + %7036 = llvm.getelementptr %7028[%7035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7037 = llvm.load %7036 : !llvm.ptr + %7038 = llvm.fadd %7037, %7027 {RelaxedPrecision} : !llvm.float + %7039 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7041 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7042 = llvm.mul %24, %7041 : !llvm.i64 + %7043 = llvm.add %7040, %7042 : !llvm.i64 + %7044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7045 = llvm.mul %6034, %7044 : !llvm.i64 + %7046 = llvm.add %7043, %7045 : !llvm.i64 + %7047 = llvm.getelementptr %7039[%7046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7038, %7047 : !llvm.ptr + %7048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7051 = llvm.mul %24, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %6034, %7053 : !llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7057 = llvm.load %7056 : !llvm.ptr + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %24, %7060 : 
!llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %6034, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7057, %7066 : !llvm.ptr + %7067 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7070 = llvm.mul %24, %7069 : !llvm.i64 + %7071 = llvm.add %7068, %7070 : !llvm.i64 + %7072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7073 = llvm.mul %5851, %7072 : !llvm.i64 + %7074 = llvm.add %7071, %7073 : !llvm.i64 + %7075 = llvm.getelementptr %7067[%7074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7076 = llvm.load %7075 : !llvm.ptr + %7077 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7078 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7079 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7080 = llvm.mul %5851, %7079 : !llvm.i64 + %7081 = llvm.add %7078, %7080 : !llvm.i64 + %7082 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7083 = llvm.mul %6095, %7082 : !llvm.i64 + %7084 = llvm.add %7081, %7083 : !llvm.i64 + %7085 = llvm.getelementptr %7077[%7084] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7086 = llvm.load %7085 : !llvm.ptr + %7087 = llvm.fmul %7076, %7086 {RelaxedPrecision} : !llvm.float + %7088 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7091 = llvm.mul %24, %7090 : !llvm.i64 + %7092 = llvm.add %7089, %7091 : !llvm.i64 + %7093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7094 = llvm.mul %6095, %7093 : !llvm.i64 + %7095 = llvm.add %7092, %7094 : !llvm.i64 + %7096 = llvm.getelementptr %7088[%7095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7097 = llvm.load %7096 : !llvm.ptr + %7098 = llvm.fadd %7097, %7087 {RelaxedPrecision} : !llvm.float + %7099 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7101 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7102 = llvm.mul %24, %7101 : !llvm.i64 + %7103 = llvm.add %7100, %7102 : !llvm.i64 + %7104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7105 = llvm.mul %6095, %7104 : !llvm.i64 + %7106 = llvm.add %7103, %7105 : !llvm.i64 + %7107 = llvm.getelementptr %7099[%7106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7098, %7107 : !llvm.ptr + %7108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7111 = llvm.mul %24, %7110 : !llvm.i64 + %7112 = llvm.add %7109, %7111 : !llvm.i64 + %7113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7114 = llvm.mul %6095, %7113 : !llvm.i64 + %7115 = llvm.add %7112, %7114 : !llvm.i64 + %7116 = llvm.getelementptr %7108[%7115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7117 = llvm.load %7116 : !llvm.ptr + %7118 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7119 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7120 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7121 = llvm.mul %24, %7120 : !llvm.i64 + %7122 = llvm.add %7119, %7121 : !llvm.i64 + %7123 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7124 = llvm.mul 
%6095, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 + %7126 = llvm.getelementptr %7118[%7125] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7117, %7126 : !llvm.ptr + %7127 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7128 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7129 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7130 = llvm.mul %24, %7129 : !llvm.i64 + %7131 = llvm.add %7128, %7130 : !llvm.i64 + %7132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7133 = llvm.mul %5851, %7132 : !llvm.i64 + %7134 = llvm.add %7131, %7133 : !llvm.i64 + %7135 = llvm.getelementptr %7127[%7134] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7136 = llvm.load %7135 : !llvm.ptr + %7137 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7140 = llvm.mul %5851, %7139 : !llvm.i64 + %7141 = llvm.add %7138, %7140 : !llvm.i64 + %7142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7143 = llvm.mul %6156, %7142 : !llvm.i64 + %7144 = llvm.add %7141, %7143 : !llvm.i64 + %7145 = llvm.getelementptr %7137[%7144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7146 = llvm.load %7145 : !llvm.ptr + %7147 = llvm.fmul %7136, %7146 {RelaxedPrecision} : !llvm.float + %7148 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7151 = llvm.mul %24, %7150 : !llvm.i64 + %7152 = llvm.add %7149, %7151 : !llvm.i64 + %7153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7154 = llvm.mul %6156, %7153 : !llvm.i64 + %7155 = llvm.add %7152, %7154 : !llvm.i64 + %7156 = llvm.getelementptr %7148[%7155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7157 = llvm.load %7156 : !llvm.ptr + %7158 = llvm.fadd %7157, %7147 {RelaxedPrecision} : !llvm.float + %7159 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7161 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7162 = llvm.mul %24, %7161 : !llvm.i64 + %7163 = llvm.add %7160, %7162 : !llvm.i64 + %7164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7165 = llvm.mul %6156, %7164 : !llvm.i64 + %7166 = llvm.add %7163, %7165 : !llvm.i64 + %7167 = llvm.getelementptr %7159[%7166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7158, %7167 : !llvm.ptr + %7168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7171 = llvm.mul %24, %7170 : !llvm.i64 + %7172 = llvm.add %7169, %7171 : !llvm.i64 + %7173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7174 = llvm.mul %6156, %7173 : !llvm.i64 + %7175 = llvm.add %7172, %7174 : !llvm.i64 + %7176 = llvm.getelementptr %7168[%7175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7177 = llvm.load %7176 : !llvm.ptr + %7178 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7180 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7181 = llvm.mul %24, %7180 : !llvm.i64 + %7182 = llvm.add %7179, %7181 : !llvm.i64 + %7183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7184 = llvm.mul %6156, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.getelementptr %7178[%7185] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + llvm.store %7177, %7186 : !llvm.ptr + %7187 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7188 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7189 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7190 = llvm.mul %24, %7189 : !llvm.i64 + %7191 = llvm.add %7188, %7190 : !llvm.i64 + %7192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7193 = llvm.mul %5851, %7192 : !llvm.i64 + %7194 = llvm.add %7191, %7193 : !llvm.i64 + %7195 = llvm.getelementptr %7187[%7194] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7196 = llvm.load %7195 : !llvm.ptr + %7197 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7198 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7199 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7200 = llvm.mul %5851, %7199 : !llvm.i64 + %7201 = llvm.add %7198, %7200 : !llvm.i64 + %7202 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7203 = llvm.mul %6217, %7202 : !llvm.i64 + %7204 = llvm.add %7201, %7203 : !llvm.i64 + %7205 = llvm.getelementptr %7197[%7204] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7206 = llvm.load %7205 : !llvm.ptr + %7207 = llvm.fmul %7196, %7206 {RelaxedPrecision} : !llvm.float + %7208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7211 = llvm.mul %24, %7210 : !llvm.i64 + %7212 = llvm.add %7209, %7211 : !llvm.i64 + %7213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7214 = llvm.mul %6217, %7213 : !llvm.i64 + %7215 = llvm.add %7212, %7214 : !llvm.i64 + %7216 = llvm.getelementptr %7208[%7215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7217 = llvm.load %7216 : !llvm.ptr + %7218 = llvm.fadd %7217, %7207 {RelaxedPrecision} : !llvm.float + %7219 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7221 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7222 = llvm.mul %24, %7221 : !llvm.i64 + %7223 = llvm.add %7220, %7222 : !llvm.i64 + %7224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7225 = llvm.mul %6217, %7224 : !llvm.i64 + %7226 = llvm.add %7223, %7225 : !llvm.i64 + %7227 = llvm.getelementptr %7219[%7226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7218, %7227 : !llvm.ptr + %7228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7231 = llvm.mul %24, %7230 : !llvm.i64 + %7232 = llvm.add %7229, %7231 : !llvm.i64 + %7233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7234 = llvm.mul %6217, %7233 : !llvm.i64 + %7235 = llvm.add %7232, %7234 : !llvm.i64 + %7236 = llvm.getelementptr %7228[%7235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7237 = llvm.load %7236 : !llvm.ptr + %7238 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7239 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7240 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7241 = llvm.mul %24, %7240 : !llvm.i64 + %7242 = llvm.add %7239, %7241 : !llvm.i64 + %7243 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7244 = llvm.mul %6217, %7243 : !llvm.i64 + %7245 = llvm.add %7242, %7244 : !llvm.i64 + %7246 = llvm.getelementptr %7238[%7245] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7237, %7246 : !llvm.ptr + %7247 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %7248 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7249 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7250 = llvm.mul %24, %7249 : !llvm.i64 + %7251 = llvm.add %7248, %7250 : !llvm.i64 + %7252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7253 = llvm.mul %5851, %7252 : !llvm.i64 + %7254 = llvm.add %7251, %7253 : !llvm.i64 + %7255 = llvm.getelementptr %7247[%7254] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7256 = llvm.load %7255 : !llvm.ptr + %7257 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7258 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7259 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7260 = llvm.mul %5851, %7259 : !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7263 = llvm.mul %6278, %7262 : !llvm.i64 + %7264 = llvm.add %7261, %7263 : !llvm.i64 + %7265 = llvm.getelementptr %7257[%7264] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7266 = llvm.load %7265 : !llvm.ptr + %7267 = llvm.fmul %7256, %7266 {RelaxedPrecision} : !llvm.float + %7268 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7271 = llvm.mul %24, %7270 : !llvm.i64 + %7272 = llvm.add %7269, %7271 : !llvm.i64 + %7273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7274 = llvm.mul %6278, %7273 : !llvm.i64 + %7275 = llvm.add %7272, %7274 : !llvm.i64 + %7276 = llvm.getelementptr %7268[%7275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7277 = llvm.load %7276 : !llvm.ptr + %7278 = llvm.fadd %7277, %7267 {RelaxedPrecision} : !llvm.float + %7279 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7281 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7282 = llvm.mul %24, %7281 : !llvm.i64 + %7283 = llvm.add %7280, %7282 : !llvm.i64 + %7284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7285 = llvm.mul %6278, %7284 : !llvm.i64 + %7286 = llvm.add %7283, %7285 : !llvm.i64 + %7287 = llvm.getelementptr %7279[%7286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7278, %7287 : !llvm.ptr + %7288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7291 = llvm.mul %24, %7290 : !llvm.i64 + %7292 = llvm.add %7289, %7291 : !llvm.i64 + %7293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7294 = llvm.mul %6278, %7293 : !llvm.i64 + %7295 = llvm.add %7292, %7294 : !llvm.i64 + %7296 = llvm.getelementptr %7288[%7295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7297 = llvm.load %7296 : !llvm.ptr + %7298 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7300 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7301 = llvm.mul %24, %7300 : !llvm.i64 + %7302 = llvm.add %7299, %7301 : !llvm.i64 + %7303 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7304 = llvm.mul %6278, %7303 : !llvm.i64 + %7305 = llvm.add %7302, %7304 : !llvm.i64 + %7306 = llvm.getelementptr %7298[%7305] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7297, %7306 : !llvm.ptr + %7307 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7308 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7309 = llvm.mlir.constant(128 : index) 
: !llvm.i64 + %7310 = llvm.mul %24, %7309 : !llvm.i64 + %7311 = llvm.add %7308, %7310 : !llvm.i64 + %7312 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7313 = llvm.mul %5851, %7312 : !llvm.i64 + %7314 = llvm.add %7311, %7313 : !llvm.i64 + %7315 = llvm.getelementptr %7307[%7314] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7316 = llvm.load %7315 : !llvm.ptr + %7317 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7318 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7319 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7320 = llvm.mul %5851, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7323 = llvm.mul %6339, %7322 : !llvm.i64 + %7324 = llvm.add %7321, %7323 : !llvm.i64 + %7325 = llvm.getelementptr %7317[%7324] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7326 = llvm.load %7325 : !llvm.ptr + %7327 = llvm.fmul %7316, %7326 {RelaxedPrecision} : !llvm.float + %7328 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7331 = llvm.mul %24, %7330 : !llvm.i64 + %7332 = llvm.add %7329, %7331 : !llvm.i64 + %7333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7334 = llvm.mul %6339, %7333 : !llvm.i64 + %7335 = llvm.add %7332, %7334 : !llvm.i64 + %7336 = llvm.getelementptr %7328[%7335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7337 = llvm.load %7336 : !llvm.ptr + %7338 = llvm.fadd %7337, %7327 {RelaxedPrecision} : !llvm.float + %7339 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7341 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7342 = llvm.mul %24, %7341 : !llvm.i64 + %7343 = llvm.add %7340, %7342 : !llvm.i64 + %7344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7345 = llvm.mul %6339, %7344 : !llvm.i64 + %7346 = llvm.add %7343, %7345 : !llvm.i64 + %7347 = llvm.getelementptr %7339[%7346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7338, %7347 : !llvm.ptr + %7348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7351 = llvm.mul %24, %7350 : !llvm.i64 + %7352 = llvm.add %7349, %7351 : !llvm.i64 + %7353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7354 = llvm.mul %6339, %7353 : !llvm.i64 + %7355 = llvm.add %7352, %7354 : !llvm.i64 + %7356 = llvm.getelementptr %7348[%7355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7357 = llvm.load %7356 : !llvm.ptr + %7358 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7360 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7361 = llvm.mul %24, %7360 : !llvm.i64 + %7362 = llvm.add %7359, %7361 : !llvm.i64 + %7363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7364 = llvm.mul %6339, %7363 : !llvm.i64 + %7365 = llvm.add %7362, %7364 : !llvm.i64 + %7366 = llvm.getelementptr %7358[%7365] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7357, %7366 : !llvm.ptr + %7367 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7368 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7369 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7370 = llvm.mul %24, %7369 : !llvm.i64 + %7371 = llvm.add %7368, %7370 : !llvm.i64 + %7372 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %7373 = llvm.mul %5851, %7372 : !llvm.i64 + %7374 = llvm.add %7371, %7373 : !llvm.i64 + %7375 = llvm.getelementptr %7367[%7374] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7376 = llvm.load %7375 : !llvm.ptr + %7377 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7378 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7379 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7380 = llvm.mul %5851, %7379 : !llvm.i64 + %7381 = llvm.add %7378, %7380 : !llvm.i64 + %7382 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7383 = llvm.mul %6400, %7382 : !llvm.i64 + %7384 = llvm.add %7381, %7383 : !llvm.i64 + %7385 = llvm.getelementptr %7377[%7384] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7386 = llvm.load %7385 : !llvm.ptr + %7387 = llvm.fmul %7376, %7386 {RelaxedPrecision} : !llvm.float + %7388 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7391 = llvm.mul %24, %7390 : !llvm.i64 + %7392 = llvm.add %7389, %7391 : !llvm.i64 + %7393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7394 = llvm.mul %6400, %7393 : !llvm.i64 + %7395 = llvm.add %7392, %7394 : !llvm.i64 + %7396 = llvm.getelementptr %7388[%7395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7397 = llvm.load %7396 : !llvm.ptr + %7398 = llvm.fadd %7397, %7387 {RelaxedPrecision} : !llvm.float + %7399 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7401 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7402 = llvm.mul %24, %7401 : !llvm.i64 + %7403 = llvm.add %7400, %7402 : !llvm.i64 + %7404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7405 = llvm.mul %6400, %7404 : !llvm.i64 + %7406 = llvm.add %7403, %7405 : !llvm.i64 + %7407 = llvm.getelementptr %7399[%7406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7398, %7407 : !llvm.ptr + %7408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7411 = llvm.mul %24, %7410 : !llvm.i64 + %7412 = llvm.add %7409, %7411 : !llvm.i64 + %7413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7414 = llvm.mul %6400, %7413 : !llvm.i64 + %7415 = llvm.add %7412, %7414 : !llvm.i64 + %7416 = llvm.getelementptr %7408[%7415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7417 = llvm.load %7416 : !llvm.ptr + %7418 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mul %24, %7420 : !llvm.i64 + %7422 = llvm.add %7419, %7421 : !llvm.i64 + %7423 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7424 = llvm.mul %6400, %7423 : !llvm.i64 + %7425 = llvm.add %7422, %7424 : !llvm.i64 + %7426 = llvm.getelementptr %7418[%7425] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7417, %7426 : !llvm.ptr + %7427 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7428 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7429 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7430 = llvm.mul %24, %7429 : !llvm.i64 + %7431 = llvm.add %7428, %7430 : !llvm.i64 + %7432 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7433 = llvm.mul %5851, %7432 : !llvm.i64 + %7434 = llvm.add %7431, %7433 : 
!llvm.i64 + %7435 = llvm.getelementptr %7427[%7434] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7436 = llvm.load %7435 : !llvm.ptr + %7437 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7438 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7439 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7440 = llvm.mul %5851, %7439 : !llvm.i64 + %7441 = llvm.add %7438, %7440 : !llvm.i64 + %7442 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7443 = llvm.mul %6461, %7442 : !llvm.i64 + %7444 = llvm.add %7441, %7443 : !llvm.i64 + %7445 = llvm.getelementptr %7437[%7444] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7446 = llvm.load %7445 : !llvm.ptr + %7447 = llvm.fmul %7436, %7446 {RelaxedPrecision} : !llvm.float + %7448 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7451 = llvm.mul %24, %7450 : !llvm.i64 + %7452 = llvm.add %7449, %7451 : !llvm.i64 + %7453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7454 = llvm.mul %6461, %7453 : !llvm.i64 + %7455 = llvm.add %7452, %7454 : !llvm.i64 + %7456 = llvm.getelementptr %7448[%7455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7457 = llvm.load %7456 : !llvm.ptr + %7458 = llvm.fadd %7457, %7447 {RelaxedPrecision} : !llvm.float + %7459 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7461 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7462 = llvm.mul %24, %7461 : !llvm.i64 + %7463 = llvm.add %7460, %7462 : !llvm.i64 + %7464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7465 = llvm.mul %6461, %7464 : !llvm.i64 + %7466 = llvm.add %7463, %7465 : !llvm.i64 + %7467 = llvm.getelementptr %7459[%7466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7458, %7467 : !llvm.ptr + %7468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7471 = llvm.mul %24, %7470 : !llvm.i64 + %7472 = llvm.add %7469, %7471 : !llvm.i64 + %7473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7474 = llvm.mul %6461, %7473 : !llvm.i64 + %7475 = llvm.add %7472, %7474 : !llvm.i64 + %7476 = llvm.getelementptr %7468[%7475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7477 = llvm.load %7476 : !llvm.ptr + %7478 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7479 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7480 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7481 = llvm.mul %24, %7480 : !llvm.i64 + %7482 = llvm.add %7479, %7481 : !llvm.i64 + %7483 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7484 = llvm.mul %6461, %7483 : !llvm.i64 + %7485 = llvm.add %7482, %7484 : !llvm.i64 + %7486 = llvm.getelementptr %7478[%7485] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7477, %7486 : !llvm.ptr + %7487 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7489 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7490 = llvm.mul %24, %7489 : !llvm.i64 + %7491 = llvm.add %7488, %7490 : !llvm.i64 + %7492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7493 = llvm.mul %5851, %7492 : !llvm.i64 + %7494 = llvm.add %7491, %7493 : !llvm.i64 + %7495 = llvm.getelementptr %7487[%7494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7496 = llvm.load %7495 : !llvm.ptr 
+ %7497 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7500 = llvm.mul %5851, %7499 : !llvm.i64 + %7501 = llvm.add %7498, %7500 : !llvm.i64 + %7502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7503 = llvm.mul %6522, %7502 : !llvm.i64 + %7504 = llvm.add %7501, %7503 : !llvm.i64 + %7505 = llvm.getelementptr %7497[%7504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7506 = llvm.load %7505 : !llvm.ptr + %7507 = llvm.fmul %7496, %7506 {RelaxedPrecision} : !llvm.float + %7508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7511 = llvm.mul %24, %7510 : !llvm.i64 + %7512 = llvm.add %7509, %7511 : !llvm.i64 + %7513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7514 = llvm.mul %6522, %7513 : !llvm.i64 + %7515 = llvm.add %7512, %7514 : !llvm.i64 + %7516 = llvm.getelementptr %7508[%7515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7517 = llvm.load %7516 : !llvm.ptr + %7518 = llvm.fadd %7517, %7507 {RelaxedPrecision} : !llvm.float + %7519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7522 = llvm.mul %24, %7521 : !llvm.i64 + %7523 = llvm.add %7520, %7522 : !llvm.i64 + %7524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7525 = llvm.mul %6522, %7524 : !llvm.i64 + %7526 = llvm.add %7523, %7525 : !llvm.i64 + %7527 = llvm.getelementptr %7519[%7526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7518, %7527 : !llvm.ptr + %7528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7531 = llvm.mul %24, %7530 : !llvm.i64 + %7532 = llvm.add %7529, %7531 : !llvm.i64 + %7533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7534 = llvm.mul %6522, %7533 : !llvm.i64 + %7535 = llvm.add %7532, %7534 : !llvm.i64 + %7536 = llvm.getelementptr %7528[%7535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7537 = llvm.load %7536 : !llvm.ptr + %7538 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7539 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7540 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7541 = llvm.mul %24, %7540 : !llvm.i64 + %7542 = llvm.add %7539, %7541 : !llvm.i64 + %7543 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7544 = llvm.mul %6522, %7543 : !llvm.i64 + %7545 = llvm.add %7542, %7544 : !llvm.i64 + %7546 = llvm.getelementptr %7538[%7545] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7537, %7546 : !llvm.ptr + %7547 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7548 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7549 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7550 = llvm.mul %24, %7549 : !llvm.i64 + %7551 = llvm.add %7548, %7550 : !llvm.i64 + %7552 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7553 = llvm.mul %5851, %7552 : !llvm.i64 + %7554 = llvm.add %7551, %7553 : !llvm.i64 + %7555 = llvm.getelementptr %7547[%7554] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7556 = llvm.load %7555 : !llvm.ptr + %7557 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7558 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %7559 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7560 = llvm.mul %5851, %7559 : !llvm.i64 + %7561 = llvm.add %7558, %7560 : !llvm.i64 + %7562 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7563 = llvm.mul %6583, %7562 : !llvm.i64 + %7564 = llvm.add %7561, %7563 : !llvm.i64 + %7565 = llvm.getelementptr %7557[%7564] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7566 = llvm.load %7565 : !llvm.ptr + %7567 = llvm.fmul %7556, %7566 {RelaxedPrecision} : !llvm.float + %7568 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7571 = llvm.mul %24, %7570 : !llvm.i64 + %7572 = llvm.add %7569, %7571 : !llvm.i64 + %7573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7574 = llvm.mul %6583, %7573 : !llvm.i64 + %7575 = llvm.add %7572, %7574 : !llvm.i64 + %7576 = llvm.getelementptr %7568[%7575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7577 = llvm.load %7576 : !llvm.ptr + %7578 = llvm.fadd %7577, %7567 {RelaxedPrecision} : !llvm.float + %7579 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7581 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7582 = llvm.mul %24, %7581 : !llvm.i64 + %7583 = llvm.add %7580, %7582 : !llvm.i64 + %7584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7585 = llvm.mul %6583, %7584 : !llvm.i64 + %7586 = llvm.add %7583, %7585 : !llvm.i64 + %7587 = llvm.getelementptr %7579[%7586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7578, %7587 : !llvm.ptr + %7588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7591 = llvm.mul %24, %7590 : !llvm.i64 + %7592 = llvm.add %7589, %7591 : !llvm.i64 + %7593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7594 = llvm.mul %6583, %7593 : !llvm.i64 + %7595 = llvm.add %7592, %7594 : !llvm.i64 + %7596 = llvm.getelementptr %7588[%7595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7597 = llvm.load %7596 : !llvm.ptr + %7598 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7599 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7600 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7601 = llvm.mul %24, %7600 : !llvm.i64 + %7602 = llvm.add %7599, %7601 : !llvm.i64 + %7603 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7604 = llvm.mul %6583, %7603 : !llvm.i64 + %7605 = llvm.add %7602, %7604 : !llvm.i64 + %7606 = llvm.getelementptr %7598[%7605] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7597, %7606 : !llvm.ptr + %7607 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7608 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7609 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7610 = llvm.mul %24, %7609 : !llvm.i64 + %7611 = llvm.add %7608, %7610 : !llvm.i64 + %7612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7613 = llvm.mul %5851, %7612 : !llvm.i64 + %7614 = llvm.add %7611, %7613 : !llvm.i64 + %7615 = llvm.getelementptr %7607[%7614] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7616 = llvm.load %7615 : !llvm.ptr + %7617 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7618 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7619 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7620 = llvm.mul %5851, %7619 
: !llvm.i64 + %7621 = llvm.add %7618, %7620 : !llvm.i64 + %7622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7623 = llvm.mul %6644, %7622 : !llvm.i64 + %7624 = llvm.add %7621, %7623 : !llvm.i64 + %7625 = llvm.getelementptr %7617[%7624] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7626 = llvm.load %7625 : !llvm.ptr + %7627 = llvm.fmul %7616, %7626 {RelaxedPrecision} : !llvm.float + %7628 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7631 = llvm.mul %24, %7630 : !llvm.i64 + %7632 = llvm.add %7629, %7631 : !llvm.i64 + %7633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7634 = llvm.mul %6644, %7633 : !llvm.i64 + %7635 = llvm.add %7632, %7634 : !llvm.i64 + %7636 = llvm.getelementptr %7628[%7635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7637 = llvm.load %7636 : !llvm.ptr + %7638 = llvm.fadd %7637, %7627 {RelaxedPrecision} : !llvm.float + %7639 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7641 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7642 = llvm.mul %24, %7641 : !llvm.i64 + %7643 = llvm.add %7640, %7642 : !llvm.i64 + %7644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7645 = llvm.mul %6644, %7644 : !llvm.i64 + %7646 = llvm.add %7643, %7645 : !llvm.i64 + %7647 = llvm.getelementptr %7639[%7646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7638, %7647 : !llvm.ptr + %7648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7651 = llvm.mul %24, %7650 : !llvm.i64 + %7652 = llvm.add %7649, %7651 : !llvm.i64 + %7653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7654 = llvm.mul %6644, %7653 : !llvm.i64 + %7655 = llvm.add %7652, %7654 : !llvm.i64 + %7656 = llvm.getelementptr %7648[%7655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7657 = llvm.load %7656 : !llvm.ptr + %7658 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7659 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7660 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7661 = llvm.mul %24, %7660 : !llvm.i64 + %7662 = llvm.add %7659, %7661 : !llvm.i64 + %7663 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7664 = llvm.mul %6644, %7663 : !llvm.i64 + %7665 = llvm.add %7662, %7664 : !llvm.i64 + %7666 = llvm.getelementptr %7658[%7665] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7657, %7666 : !llvm.ptr + %7667 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7668 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7669 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7670 = llvm.mul %24, %7669 : !llvm.i64 + %7671 = llvm.add %7668, %7670 : !llvm.i64 + %7672 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7673 = llvm.mul %5851, %7672 : !llvm.i64 + %7674 = llvm.add %7671, %7673 : !llvm.i64 + %7675 = llvm.getelementptr %7667[%7674] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7676 = llvm.load %7675 : !llvm.ptr + %7677 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7680 = llvm.mul %5851, %7679 : !llvm.i64 + %7681 = llvm.add %7678, %7680 : !llvm.i64 + %7682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7683 = llvm.mul 
%6705, %7682 : !llvm.i64 + %7684 = llvm.add %7681, %7683 : !llvm.i64 + %7685 = llvm.getelementptr %7677[%7684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7686 = llvm.load %7685 : !llvm.ptr + %7687 = llvm.fmul %7676, %7686 {RelaxedPrecision} : !llvm.float + %7688 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7691 = llvm.mul %24, %7690 : !llvm.i64 + %7692 = llvm.add %7689, %7691 : !llvm.i64 + %7693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7694 = llvm.mul %6705, %7693 : !llvm.i64 + %7695 = llvm.add %7692, %7694 : !llvm.i64 + %7696 = llvm.getelementptr %7688[%7695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7697 = llvm.load %7696 : !llvm.ptr + %7698 = llvm.fadd %7697, %7687 {RelaxedPrecision} : !llvm.float + %7699 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7701 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7702 = llvm.mul %24, %7701 : !llvm.i64 + %7703 = llvm.add %7700, %7702 : !llvm.i64 + %7704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7705 = llvm.mul %6705, %7704 : !llvm.i64 + %7706 = llvm.add %7703, %7705 : !llvm.i64 + %7707 = llvm.getelementptr %7699[%7706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7698, %7707 : !llvm.ptr + %7708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7711 = llvm.mul %24, %7710 : !llvm.i64 + %7712 = llvm.add %7709, %7711 : !llvm.i64 + %7713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7714 = llvm.mul %6705, %7713 : !llvm.i64 + %7715 = llvm.add %7712, %7714 : !llvm.i64 + %7716 = llvm.getelementptr %7708[%7715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7717 = llvm.load %7716 : !llvm.ptr + %7718 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7720 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7721 = llvm.mul %24, %7720 : !llvm.i64 + %7722 = llvm.add %7719, %7721 : !llvm.i64 + %7723 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7724 = llvm.mul %6705, %7723 : !llvm.i64 + %7725 = llvm.add %7722, %7724 : !llvm.i64 + %7726 = llvm.getelementptr %7718[%7725] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7717, %7726 : !llvm.ptr + %7727 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7729 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7730 = llvm.mul %24, %7729 : !llvm.i64 + %7731 = llvm.add %7728, %7730 : !llvm.i64 + %7732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7733 = llvm.mul %5851, %7732 : !llvm.i64 + %7734 = llvm.add %7731, %7733 : !llvm.i64 + %7735 = llvm.getelementptr %7727[%7734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7736 = llvm.load %7735 : !llvm.ptr + %7737 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7738 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7739 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7740 = llvm.mul %5851, %7739 : !llvm.i64 + %7741 = llvm.add %7738, %7740 : !llvm.i64 + %7742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7743 = llvm.mul %6766, %7742 : !llvm.i64 + %7744 = llvm.add %7741, %7743 : !llvm.i64 + %7745 = llvm.getelementptr %7737[%7744] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %7746 = llvm.load %7745 : !llvm.ptr + %7747 = llvm.fmul %7736, %7746 {RelaxedPrecision} : !llvm.float + %7748 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7750 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7751 = llvm.mul %24, %7750 : !llvm.i64 + %7752 = llvm.add %7749, %7751 : !llvm.i64 + %7753 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7754 = llvm.mul %6766, %7753 : !llvm.i64 + %7755 = llvm.add %7752, %7754 : !llvm.i64 + %7756 = llvm.getelementptr %7748[%7755] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7757 = llvm.load %7756 : !llvm.ptr + %7758 = llvm.fadd %7757, %7747 {RelaxedPrecision} : !llvm.float + %7759 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7761 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7762 = llvm.mul %24, %7761 : !llvm.i64 + %7763 = llvm.add %7760, %7762 : !llvm.i64 + %7764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7765 = llvm.mul %6766, %7764 : !llvm.i64 + %7766 = llvm.add %7763, %7765 : !llvm.i64 + %7767 = llvm.getelementptr %7759[%7766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7758, %7767 : !llvm.ptr + %7768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7771 = llvm.mul %24, %7770 : !llvm.i64 + %7772 = llvm.add %7769, %7771 : !llvm.i64 + %7773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7774 = llvm.mul %6766, %7773 : !llvm.i64 + %7775 = llvm.add %7772, %7774 : !llvm.i64 + %7776 = llvm.getelementptr %7768[%7775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7777 = llvm.load %7776 : !llvm.ptr + %7778 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7779 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7780 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7781 = llvm.mul %24, %7780 : !llvm.i64 + %7782 = llvm.add %7779, %7781 : !llvm.i64 + %7783 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7784 = llvm.mul %6766, %7783 : !llvm.i64 + %7785 = llvm.add %7782, %7784 : !llvm.i64 + %7786 = llvm.getelementptr %7778[%7785] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7777, %7786 : !llvm.ptr + %7787 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7788 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7789 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7790 = llvm.mul %25, %7789 : !llvm.i64 + %7791 = llvm.add %7788, %7790 : !llvm.i64 + %7792 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7793 = llvm.mul %5851, %7792 : !llvm.i64 + %7794 = llvm.add %7791, %7793 : !llvm.i64 + %7795 = llvm.getelementptr %7787[%7794] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7796 = llvm.load %7795 : !llvm.ptr + %7797 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7799 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7800 = llvm.mul %5851, %7799 : !llvm.i64 + %7801 = llvm.add %7798, %7800 : !llvm.i64 + %7802 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7803 = llvm.mul %5850, %7802 : !llvm.i64 + %7804 = llvm.add %7801, %7803 : !llvm.i64 + %7805 = llvm.getelementptr %7797[%7804] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7806 = llvm.load %7805 : !llvm.ptr + %7807 = llvm.fmul %7796, %7806 {RelaxedPrecision} : !llvm.float 
+ %7808 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7811 = llvm.mul %25, %7810 : !llvm.i64 + %7812 = llvm.add %7809, %7811 : !llvm.i64 + %7813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7814 = llvm.mul %5850, %7813 : !llvm.i64 + %7815 = llvm.add %7812, %7814 : !llvm.i64 + %7816 = llvm.getelementptr %7808[%7815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7817 = llvm.load %7816 : !llvm.ptr + %7818 = llvm.fadd %7817, %7807 {RelaxedPrecision} : !llvm.float + %7819 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7821 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7822 = llvm.mul %25, %7821 : !llvm.i64 + %7823 = llvm.add %7820, %7822 : !llvm.i64 + %7824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7825 = llvm.mul %5850, %7824 : !llvm.i64 + %7826 = llvm.add %7823, %7825 : !llvm.i64 + %7827 = llvm.getelementptr %7819[%7826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7818, %7827 : !llvm.ptr + %7828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7831 = llvm.mul %25, %7830 : !llvm.i64 + %7832 = llvm.add %7829, %7831 : !llvm.i64 + %7833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7834 = llvm.mul %5850, %7833 : !llvm.i64 + %7835 = llvm.add %7832, %7834 : !llvm.i64 + %7836 = llvm.getelementptr %7828[%7835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7837 = llvm.load %7836 : !llvm.ptr + %7838 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7840 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7841 = llvm.mul %25, %7840 : !llvm.i64 + %7842 = llvm.add %7839, %7841 : !llvm.i64 + %7843 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7844 = llvm.mul %5850, %7843 : !llvm.i64 + %7845 = llvm.add %7842, %7844 : !llvm.i64 + %7846 = llvm.getelementptr %7838[%7845] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7837, %7846 : !llvm.ptr + %7847 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7848 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7849 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7850 = llvm.mul %25, %7849 : !llvm.i64 + %7851 = llvm.add %7848, %7850 : !llvm.i64 + %7852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7853 = llvm.mul %5851, %7852 : !llvm.i64 + %7854 = llvm.add %7851, %7853 : !llvm.i64 + %7855 = llvm.getelementptr %7847[%7854] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7856 = llvm.load %7855 : !llvm.ptr + %7857 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7858 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7859 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7860 = llvm.mul %5851, %7859 : !llvm.i64 + %7861 = llvm.add %7858, %7860 : !llvm.i64 + %7862 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7863 = llvm.mul %5912, %7862 : !llvm.i64 + %7864 = llvm.add %7861, %7863 : !llvm.i64 + %7865 = llvm.getelementptr %7857[%7864] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7866 = llvm.load %7865 : !llvm.ptr + %7867 = llvm.fmul %7856, %7866 {RelaxedPrecision} : !llvm.float + %7868 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7869 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %7870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7871 = llvm.mul %25, %7870 : !llvm.i64 + %7872 = llvm.add %7869, %7871 : !llvm.i64 + %7873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7874 = llvm.mul %5912, %7873 : !llvm.i64 + %7875 = llvm.add %7872, %7874 : !llvm.i64 + %7876 = llvm.getelementptr %7868[%7875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7877 = llvm.load %7876 : !llvm.ptr + %7878 = llvm.fadd %7877, %7867 {RelaxedPrecision} : !llvm.float + %7879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7882 = llvm.mul %25, %7881 : !llvm.i64 + %7883 = llvm.add %7880, %7882 : !llvm.i64 + %7884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7885 = llvm.mul %5912, %7884 : !llvm.i64 + %7886 = llvm.add %7883, %7885 : !llvm.i64 + %7887 = llvm.getelementptr %7879[%7886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7878, %7887 : !llvm.ptr + %7888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7891 = llvm.mul %25, %7890 : !llvm.i64 + %7892 = llvm.add %7889, %7891 : !llvm.i64 + %7893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7894 = llvm.mul %5912, %7893 : !llvm.i64 + %7895 = llvm.add %7892, %7894 : !llvm.i64 + %7896 = llvm.getelementptr %7888[%7895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7897 = llvm.load %7896 : !llvm.ptr + %7898 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7899 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7900 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7901 = llvm.mul %25, %7900 : !llvm.i64 + %7902 = llvm.add %7899, %7901 : !llvm.i64 + %7903 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7904 = llvm.mul %5912, %7903 : !llvm.i64 + %7905 = llvm.add %7902, %7904 : !llvm.i64 + %7906 = llvm.getelementptr %7898[%7905] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7897, %7906 : !llvm.ptr + %7907 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7908 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7909 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7910 = llvm.mul %25, %7909 : !llvm.i64 + %7911 = llvm.add %7908, %7910 : !llvm.i64 + %7912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7913 = llvm.mul %5851, %7912 : !llvm.i64 + %7914 = llvm.add %7911, %7913 : !llvm.i64 + %7915 = llvm.getelementptr %7907[%7914] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7916 = llvm.load %7915 : !llvm.ptr + %7917 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7918 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7919 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7920 = llvm.mul %5851, %7919 : !llvm.i64 + %7921 = llvm.add %7918, %7920 : !llvm.i64 + %7922 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7923 = llvm.mul %5973, %7922 : !llvm.i64 + %7924 = llvm.add %7921, %7923 : !llvm.i64 + %7925 = llvm.getelementptr %7917[%7924] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7926 = llvm.load %7925 : !llvm.ptr + %7927 = llvm.fmul %7916, %7926 {RelaxedPrecision} : !llvm.float + %7928 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7930 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7931 = llvm.mul %25, %7930 : 
!llvm.i64 + %7932 = llvm.add %7929, %7931 : !llvm.i64 + %7933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7934 = llvm.mul %5973, %7933 : !llvm.i64 + %7935 = llvm.add %7932, %7934 : !llvm.i64 + %7936 = llvm.getelementptr %7928[%7935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7937 = llvm.load %7936 : !llvm.ptr + %7938 = llvm.fadd %7937, %7927 {RelaxedPrecision} : !llvm.float + %7939 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7940 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7941 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7942 = llvm.mul %25, %7941 : !llvm.i64 + %7943 = llvm.add %7940, %7942 : !llvm.i64 + %7944 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7945 = llvm.mul %5973, %7944 : !llvm.i64 + %7946 = llvm.add %7943, %7945 : !llvm.i64 + %7947 = llvm.getelementptr %7939[%7946] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7938, %7947 : !llvm.ptr + %7948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7951 = llvm.mul %25, %7950 : !llvm.i64 + %7952 = llvm.add %7949, %7951 : !llvm.i64 + %7953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7954 = llvm.mul %5973, %7953 : !llvm.i64 + %7955 = llvm.add %7952, %7954 : !llvm.i64 + %7956 = llvm.getelementptr %7948[%7955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7957 = llvm.load %7956 : !llvm.ptr + %7958 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7960 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7961 = llvm.mul %25, %7960 : !llvm.i64 + %7962 = llvm.add %7959, %7961 : !llvm.i64 + %7963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7964 = llvm.mul %5973, %7963 : !llvm.i64 + %7965 = llvm.add %7962, %7964 : !llvm.i64 + %7966 = llvm.getelementptr %7958[%7965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7957, %7966 : !llvm.ptr + %7967 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7968 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7969 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7970 = llvm.mul %25, %7969 : !llvm.i64 + %7971 = llvm.add %7968, %7970 : !llvm.i64 + %7972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7973 = llvm.mul %5851, %7972 : !llvm.i64 + %7974 = llvm.add %7971, %7973 : !llvm.i64 + %7975 = llvm.getelementptr %7967[%7974] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7976 = llvm.load %7975 : !llvm.ptr + %7977 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7978 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7979 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7980 = llvm.mul %5851, %7979 : !llvm.i64 + %7981 = llvm.add %7978, %7980 : !llvm.i64 + %7982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7983 = llvm.mul %6034, %7982 : !llvm.i64 + %7984 = llvm.add %7981, %7983 : !llvm.i64 + %7985 = llvm.getelementptr %7977[%7984] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7986 = llvm.load %7985 : !llvm.ptr + %7987 = llvm.fmul %7976, %7986 {RelaxedPrecision} : !llvm.float + %7988 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7990 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7991 = llvm.mul %25, %7990 : !llvm.i64 + %7992 = llvm.add %7989, %7991 : !llvm.i64 + %7993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7994 = llvm.mul 
%6034, %7993 : !llvm.i64 + %7995 = llvm.add %7992, %7994 : !llvm.i64 + %7996 = llvm.getelementptr %7988[%7995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7997 = llvm.load %7996 : !llvm.ptr + %7998 = llvm.fadd %7997, %7987 {RelaxedPrecision} : !llvm.float + %7999 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8000 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8001 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8002 = llvm.mul %25, %8001 : !llvm.i64 + %8003 = llvm.add %8000, %8002 : !llvm.i64 + %8004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8005 = llvm.mul %6034, %8004 : !llvm.i64 + %8006 = llvm.add %8003, %8005 : !llvm.i64 + %8007 = llvm.getelementptr %7999[%8006] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7998, %8007 : !llvm.ptr + %8008 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8010 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8011 = llvm.mul %25, %8010 : !llvm.i64 + %8012 = llvm.add %8009, %8011 : !llvm.i64 + %8013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8014 = llvm.mul %6034, %8013 : !llvm.i64 + %8015 = llvm.add %8012, %8014 : !llvm.i64 + %8016 = llvm.getelementptr %8008[%8015] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8017 = llvm.load %8016 : !llvm.ptr + %8018 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8020 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8021 = llvm.mul %25, %8020 : !llvm.i64 + %8022 = llvm.add %8019, %8021 : !llvm.i64 + %8023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8024 = llvm.mul %6034, %8023 : !llvm.i64 + %8025 = llvm.add %8022, %8024 : !llvm.i64 + %8026 = llvm.getelementptr %8018[%8025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8017, %8026 : !llvm.ptr + %8027 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8028 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8029 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8030 = llvm.mul %25, %8029 : !llvm.i64 + %8031 = llvm.add %8028, %8030 : !llvm.i64 + %8032 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8033 = llvm.mul %5851, %8032 : !llvm.i64 + %8034 = llvm.add %8031, %8033 : !llvm.i64 + %8035 = llvm.getelementptr %8027[%8034] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8036 = llvm.load %8035 : !llvm.ptr + %8037 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8038 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8039 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8040 = llvm.mul %5851, %8039 : !llvm.i64 + %8041 = llvm.add %8038, %8040 : !llvm.i64 + %8042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8043 = llvm.mul %6095, %8042 : !llvm.i64 + %8044 = llvm.add %8041, %8043 : !llvm.i64 + %8045 = llvm.getelementptr %8037[%8044] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8046 = llvm.load %8045 : !llvm.ptr + %8047 = llvm.fmul %8036, %8046 {RelaxedPrecision} : !llvm.float + %8048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8051 = llvm.mul %25, %8050 : !llvm.i64 + %8052 = llvm.add %8049, %8051 : !llvm.i64 + %8053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8054 = llvm.mul %6095, %8053 : !llvm.i64 + %8055 = llvm.add %8052, %8054 : !llvm.i64 + %8056 = llvm.getelementptr %8048[%8055] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %8057 = llvm.load %8056 : !llvm.ptr + %8058 = llvm.fadd %8057, %8047 {RelaxedPrecision} : !llvm.float + %8059 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8060 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8061 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8062 = llvm.mul %25, %8061 : !llvm.i64 + %8063 = llvm.add %8060, %8062 : !llvm.i64 + %8064 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8065 = llvm.mul %6095, %8064 : !llvm.i64 + %8066 = llvm.add %8063, %8065 : !llvm.i64 + %8067 = llvm.getelementptr %8059[%8066] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8058, %8067 : !llvm.ptr + %8068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8071 = llvm.mul %25, %8070 : !llvm.i64 + %8072 = llvm.add %8069, %8071 : !llvm.i64 + %8073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8074 = llvm.mul %6095, %8073 : !llvm.i64 + %8075 = llvm.add %8072, %8074 : !llvm.i64 + %8076 = llvm.getelementptr %8068[%8075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8077 = llvm.load %8076 : !llvm.ptr + %8078 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8080 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8081 = llvm.mul %25, %8080 : !llvm.i64 + %8082 = llvm.add %8079, %8081 : !llvm.i64 + %8083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8084 = llvm.mul %6095, %8083 : !llvm.i64 + %8085 = llvm.add %8082, %8084 : !llvm.i64 + %8086 = llvm.getelementptr %8078[%8085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8077, %8086 : !llvm.ptr + %8087 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8088 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8089 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8090 = llvm.mul %25, %8089 : !llvm.i64 + %8091 = llvm.add %8088, %8090 : !llvm.i64 + %8092 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8093 = llvm.mul %5851, %8092 : !llvm.i64 + %8094 = llvm.add %8091, %8093 : !llvm.i64 + %8095 = llvm.getelementptr %8087[%8094] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8096 = llvm.load %8095 : !llvm.ptr + %8097 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8098 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8099 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8100 = llvm.mul %5851, %8099 : !llvm.i64 + %8101 = llvm.add %8098, %8100 : !llvm.i64 + %8102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8103 = llvm.mul %6156, %8102 : !llvm.i64 + %8104 = llvm.add %8101, %8103 : !llvm.i64 + %8105 = llvm.getelementptr %8097[%8104] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8106 = llvm.load %8105 : !llvm.ptr + %8107 = llvm.fmul %8096, %8106 {RelaxedPrecision} : !llvm.float + %8108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8111 = llvm.mul %25, %8110 : !llvm.i64 + %8112 = llvm.add %8109, %8111 : !llvm.i64 + %8113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8114 = llvm.mul %6156, %8113 : !llvm.i64 + %8115 = llvm.add %8112, %8114 : !llvm.i64 + %8116 = llvm.getelementptr %8108[%8115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8117 = llvm.load %8116 : !llvm.ptr + %8118 = llvm.fadd %8117, %8107 {RelaxedPrecision} : !llvm.float 
+ %8119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8121 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8122 = llvm.mul %25, %8121 : !llvm.i64 + %8123 = llvm.add %8120, %8122 : !llvm.i64 + %8124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8125 = llvm.mul %6156, %8124 : !llvm.i64 + %8126 = llvm.add %8123, %8125 : !llvm.i64 + %8127 = llvm.getelementptr %8119[%8126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8118, %8127 : !llvm.ptr + %8128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8131 = llvm.mul %25, %8130 : !llvm.i64 + %8132 = llvm.add %8129, %8131 : !llvm.i64 + %8133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8134 = llvm.mul %6156, %8133 : !llvm.i64 + %8135 = llvm.add %8132, %8134 : !llvm.i64 + %8136 = llvm.getelementptr %8128[%8135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8137 = llvm.load %8136 : !llvm.ptr + %8138 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8140 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8141 = llvm.mul %25, %8140 : !llvm.i64 + %8142 = llvm.add %8139, %8141 : !llvm.i64 + %8143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8144 = llvm.mul %6156, %8143 : !llvm.i64 + %8145 = llvm.add %8142, %8144 : !llvm.i64 + %8146 = llvm.getelementptr %8138[%8145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8137, %8146 : !llvm.ptr + %8147 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8149 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8150 = llvm.mul %25, %8149 : !llvm.i64 + %8151 = llvm.add %8148, %8150 : !llvm.i64 + %8152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8153 = llvm.mul %5851, %8152 : !llvm.i64 + %8154 = llvm.add %8151, %8153 : !llvm.i64 + %8155 = llvm.getelementptr %8147[%8154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8156 = llvm.load %8155 : !llvm.ptr + %8157 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8159 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8160 = llvm.mul %5851, %8159 : !llvm.i64 + %8161 = llvm.add %8158, %8160 : !llvm.i64 + %8162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8163 = llvm.mul %6217, %8162 : !llvm.i64 + %8164 = llvm.add %8161, %8163 : !llvm.i64 + %8165 = llvm.getelementptr %8157[%8164] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8166 = llvm.load %8165 : !llvm.ptr + %8167 = llvm.fmul %8156, %8166 {RelaxedPrecision} : !llvm.float + %8168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8171 = llvm.mul %25, %8170 : !llvm.i64 + %8172 = llvm.add %8169, %8171 : !llvm.i64 + %8173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8174 = llvm.mul %6217, %8173 : !llvm.i64 + %8175 = llvm.add %8172, %8174 : !llvm.i64 + %8176 = llvm.getelementptr %8168[%8175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8177 = llvm.load %8176 : !llvm.ptr + %8178 = llvm.fadd %8177, %8167 {RelaxedPrecision} : !llvm.float + %8179 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8180 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %8181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8182 = llvm.mul %25, %8181 : !llvm.i64 + %8183 = llvm.add %8180, %8182 : !llvm.i64 + %8184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8185 = llvm.mul %6217, %8184 : !llvm.i64 + %8186 = llvm.add %8183, %8185 : !llvm.i64 + %8187 = llvm.getelementptr %8179[%8186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8178, %8187 : !llvm.ptr + %8188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8191 = llvm.mul %25, %8190 : !llvm.i64 + %8192 = llvm.add %8189, %8191 : !llvm.i64 + %8193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8194 = llvm.mul %6217, %8193 : !llvm.i64 + %8195 = llvm.add %8192, %8194 : !llvm.i64 + %8196 = llvm.getelementptr %8188[%8195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8197 = llvm.load %8196 : !llvm.ptr + %8198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8201 = llvm.mul %25, %8200 : !llvm.i64 + %8202 = llvm.add %8199, %8201 : !llvm.i64 + %8203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8204 = llvm.mul %6217, %8203 : !llvm.i64 + %8205 = llvm.add %8202, %8204 : !llvm.i64 + %8206 = llvm.getelementptr %8198[%8205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8197, %8206 : !llvm.ptr + %8207 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8208 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8209 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8210 = llvm.mul %25, %8209 : !llvm.i64 + %8211 = llvm.add %8208, %8210 : !llvm.i64 + %8212 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8213 = llvm.mul %5851, %8212 : !llvm.i64 + %8214 = llvm.add %8211, %8213 : !llvm.i64 + %8215 = llvm.getelementptr %8207[%8214] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8216 = llvm.load %8215 : !llvm.ptr + %8217 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8218 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8219 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8220 = llvm.mul %5851, %8219 : !llvm.i64 + %8221 = llvm.add %8218, %8220 : !llvm.i64 + %8222 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8223 = llvm.mul %6278, %8222 : !llvm.i64 + %8224 = llvm.add %8221, %8223 : !llvm.i64 + %8225 = llvm.getelementptr %8217[%8224] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8226 = llvm.load %8225 : !llvm.ptr + %8227 = llvm.fmul %8216, %8226 {RelaxedPrecision} : !llvm.float + %8228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8231 = llvm.mul %25, %8230 : !llvm.i64 + %8232 = llvm.add %8229, %8231 : !llvm.i64 + %8233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8234 = llvm.mul %6278, %8233 : !llvm.i64 + %8235 = llvm.add %8232, %8234 : !llvm.i64 + %8236 = llvm.getelementptr %8228[%8235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8237 = llvm.load %8236 : !llvm.ptr + %8238 = llvm.fadd %8237, %8227 {RelaxedPrecision} : !llvm.float + %8239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8242 = llvm.mul %25, %8241 : 
!llvm.i64 + %8243 = llvm.add %8240, %8242 : !llvm.i64 + %8244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8245 = llvm.mul %6278, %8244 : !llvm.i64 + %8246 = llvm.add %8243, %8245 : !llvm.i64 + %8247 = llvm.getelementptr %8239[%8246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8238, %8247 : !llvm.ptr + %8248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8251 = llvm.mul %25, %8250 : !llvm.i64 + %8252 = llvm.add %8249, %8251 : !llvm.i64 + %8253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8254 = llvm.mul %6278, %8253 : !llvm.i64 + %8255 = llvm.add %8252, %8254 : !llvm.i64 + %8256 = llvm.getelementptr %8248[%8255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8257 = llvm.load %8256 : !llvm.ptr + %8258 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8260 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8261 = llvm.mul %25, %8260 : !llvm.i64 + %8262 = llvm.add %8259, %8261 : !llvm.i64 + %8263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8264 = llvm.mul %6278, %8263 : !llvm.i64 + %8265 = llvm.add %8262, %8264 : !llvm.i64 + %8266 = llvm.getelementptr %8258[%8265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8257, %8266 : !llvm.ptr + %8267 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8269 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8270 = llvm.mul %25, %8269 : !llvm.i64 + %8271 = llvm.add %8268, %8270 : !llvm.i64 + %8272 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8273 = llvm.mul %5851, %8272 : !llvm.i64 + %8274 = llvm.add %8271, %8273 : !llvm.i64 + %8275 = llvm.getelementptr %8267[%8274] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8276 = llvm.load %8275 : !llvm.ptr + %8277 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8278 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8279 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8280 = llvm.mul %5851, %8279 : !llvm.i64 + %8281 = llvm.add %8278, %8280 : !llvm.i64 + %8282 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8283 = llvm.mul %6339, %8282 : !llvm.i64 + %8284 = llvm.add %8281, %8283 : !llvm.i64 + %8285 = llvm.getelementptr %8277[%8284] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8286 = llvm.load %8285 : !llvm.ptr + %8287 = llvm.fmul %8276, %8286 {RelaxedPrecision} : !llvm.float + %8288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8291 = llvm.mul %25, %8290 : !llvm.i64 + %8292 = llvm.add %8289, %8291 : !llvm.i64 + %8293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8294 = llvm.mul %6339, %8293 : !llvm.i64 + %8295 = llvm.add %8292, %8294 : !llvm.i64 + %8296 = llvm.getelementptr %8288[%8295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8297 = llvm.load %8296 : !llvm.ptr + %8298 = llvm.fadd %8297, %8287 {RelaxedPrecision} : !llvm.float + %8299 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8300 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8301 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8302 = llvm.mul %25, %8301 : !llvm.i64 + %8303 = llvm.add %8300, %8302 : !llvm.i64 + %8304 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8305 = llvm.mul 
%6339, %8304 : !llvm.i64 + %8306 = llvm.add %8303, %8305 : !llvm.i64 + %8307 = llvm.getelementptr %8299[%8306] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8298, %8307 : !llvm.ptr + %8308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8311 = llvm.mul %25, %8310 : !llvm.i64 + %8312 = llvm.add %8309, %8311 : !llvm.i64 + %8313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8314 = llvm.mul %6339, %8313 : !llvm.i64 + %8315 = llvm.add %8312, %8314 : !llvm.i64 + %8316 = llvm.getelementptr %8308[%8315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8317 = llvm.load %8316 : !llvm.ptr + %8318 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8320 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8321 = llvm.mul %25, %8320 : !llvm.i64 + %8322 = llvm.add %8319, %8321 : !llvm.i64 + %8323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8324 = llvm.mul %6339, %8323 : !llvm.i64 + %8325 = llvm.add %8322, %8324 : !llvm.i64 + %8326 = llvm.getelementptr %8318[%8325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8317, %8326 : !llvm.ptr + %8327 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8328 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8329 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8330 = llvm.mul %25, %8329 : !llvm.i64 + %8331 = llvm.add %8328, %8330 : !llvm.i64 + %8332 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8333 = llvm.mul %5851, %8332 : !llvm.i64 + %8334 = llvm.add %8331, %8333 : !llvm.i64 + %8335 = llvm.getelementptr %8327[%8334] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8336 = llvm.load %8335 : !llvm.ptr + %8337 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8339 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8340 = llvm.mul %5851, %8339 : !llvm.i64 + %8341 = llvm.add %8338, %8340 : !llvm.i64 + %8342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8343 = llvm.mul %6400, %8342 : !llvm.i64 + %8344 = llvm.add %8341, %8343 : !llvm.i64 + %8345 = llvm.getelementptr %8337[%8344] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8346 = llvm.load %8345 : !llvm.ptr + %8347 = llvm.fmul %8336, %8346 {RelaxedPrecision} : !llvm.float + %8348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8351 = llvm.mul %25, %8350 : !llvm.i64 + %8352 = llvm.add %8349, %8351 : !llvm.i64 + %8353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8354 = llvm.mul %6400, %8353 : !llvm.i64 + %8355 = llvm.add %8352, %8354 : !llvm.i64 + %8356 = llvm.getelementptr %8348[%8355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8357 = llvm.load %8356 : !llvm.ptr + %8358 = llvm.fadd %8357, %8347 {RelaxedPrecision} : !llvm.float + %8359 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8360 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8361 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8362 = llvm.mul %25, %8361 : !llvm.i64 + %8363 = llvm.add %8360, %8362 : !llvm.i64 + %8364 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8365 = llvm.mul %6400, %8364 : !llvm.i64 + %8366 = llvm.add %8363, %8365 : !llvm.i64 + %8367 = llvm.getelementptr %8359[%8366] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + llvm.store %8358, %8367 : !llvm.ptr + %8368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8371 = llvm.mul %25, %8370 : !llvm.i64 + %8372 = llvm.add %8369, %8371 : !llvm.i64 + %8373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8374 = llvm.mul %6400, %8373 : !llvm.i64 + %8375 = llvm.add %8372, %8374 : !llvm.i64 + %8376 = llvm.getelementptr %8368[%8375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8377 = llvm.load %8376 : !llvm.ptr + %8378 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8380 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8381 = llvm.mul %25, %8380 : !llvm.i64 + %8382 = llvm.add %8379, %8381 : !llvm.i64 + %8383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8384 = llvm.mul %6400, %8383 : !llvm.i64 + %8385 = llvm.add %8382, %8384 : !llvm.i64 + %8386 = llvm.getelementptr %8378[%8385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8377, %8386 : !llvm.ptr + %8387 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8389 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8390 = llvm.mul %25, %8389 : !llvm.i64 + %8391 = llvm.add %8388, %8390 : !llvm.i64 + %8392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8393 = llvm.mul %5851, %8392 : !llvm.i64 + %8394 = llvm.add %8391, %8393 : !llvm.i64 + %8395 = llvm.getelementptr %8387[%8394] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8396 = llvm.load %8395 : !llvm.ptr + %8397 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8398 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8399 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8400 = llvm.mul %5851, %8399 : !llvm.i64 + %8401 = llvm.add %8398, %8400 : !llvm.i64 + %8402 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8403 = llvm.mul %6461, %8402 : !llvm.i64 + %8404 = llvm.add %8401, %8403 : !llvm.i64 + %8405 = llvm.getelementptr %8397[%8404] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8406 = llvm.load %8405 : !llvm.ptr + %8407 = llvm.fmul %8396, %8406 {RelaxedPrecision} : !llvm.float + %8408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8411 = llvm.mul %25, %8410 : !llvm.i64 + %8412 = llvm.add %8409, %8411 : !llvm.i64 + %8413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8414 = llvm.mul %6461, %8413 : !llvm.i64 + %8415 = llvm.add %8412, %8414 : !llvm.i64 + %8416 = llvm.getelementptr %8408[%8415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8417 = llvm.load %8416 : !llvm.ptr + %8418 = llvm.fadd %8417, %8407 {RelaxedPrecision} : !llvm.float + %8419 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8422 = llvm.mul %25, %8421 : !llvm.i64 + %8423 = llvm.add %8420, %8422 : !llvm.i64 + %8424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8425 = llvm.mul %6461, %8424 : !llvm.i64 + %8426 = llvm.add %8423, %8425 : !llvm.i64 + %8427 = llvm.getelementptr %8419[%8426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8418, %8427 : !llvm.ptr + %8428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %8429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8431 = llvm.mul %25, %8430 : !llvm.i64 + %8432 = llvm.add %8429, %8431 : !llvm.i64 + %8433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8434 = llvm.mul %6461, %8433 : !llvm.i64 + %8435 = llvm.add %8432, %8434 : !llvm.i64 + %8436 = llvm.getelementptr %8428[%8435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8437 = llvm.load %8436 : !llvm.ptr + %8438 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8440 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8441 = llvm.mul %25, %8440 : !llvm.i64 + %8442 = llvm.add %8439, %8441 : !llvm.i64 + %8443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8444 = llvm.mul %6461, %8443 : !llvm.i64 + %8445 = llvm.add %8442, %8444 : !llvm.i64 + %8446 = llvm.getelementptr %8438[%8445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8437, %8446 : !llvm.ptr + %8447 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8449 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8450 = llvm.mul %25, %8449 : !llvm.i64 + %8451 = llvm.add %8448, %8450 : !llvm.i64 + %8452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8453 = llvm.mul %5851, %8452 : !llvm.i64 + %8454 = llvm.add %8451, %8453 : !llvm.i64 + %8455 = llvm.getelementptr %8447[%8454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8456 = llvm.load %8455 : !llvm.ptr + %8457 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8458 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8459 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8460 = llvm.mul %5851, %8459 : !llvm.i64 + %8461 = llvm.add %8458, %8460 : !llvm.i64 + %8462 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8463 = llvm.mul %6522, %8462 : !llvm.i64 + %8464 = llvm.add %8461, %8463 : !llvm.i64 + %8465 = llvm.getelementptr %8457[%8464] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8466 = llvm.load %8465 : !llvm.ptr + %8467 = llvm.fmul %8456, %8466 {RelaxedPrecision} : !llvm.float + %8468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8471 = llvm.mul %25, %8470 : !llvm.i64 + %8472 = llvm.add %8469, %8471 : !llvm.i64 + %8473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8474 = llvm.mul %6522, %8473 : !llvm.i64 + %8475 = llvm.add %8472, %8474 : !llvm.i64 + %8476 = llvm.getelementptr %8468[%8475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8477 = llvm.load %8476 : !llvm.ptr + %8478 = llvm.fadd %8477, %8467 {RelaxedPrecision} : !llvm.float + %8479 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8480 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8481 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8482 = llvm.mul %25, %8481 : !llvm.i64 + %8483 = llvm.add %8480, %8482 : !llvm.i64 + %8484 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8485 = llvm.mul %6522, %8484 : !llvm.i64 + %8486 = llvm.add %8483, %8485 : !llvm.i64 + %8487 = llvm.getelementptr %8479[%8486] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8478, %8487 : !llvm.ptr + %8488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8490 = llvm.mlir.constant(512 : index) 
: !llvm.i64 + %8491 = llvm.mul %25, %8490 : !llvm.i64 + %8492 = llvm.add %8489, %8491 : !llvm.i64 + %8493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8494 = llvm.mul %6522, %8493 : !llvm.i64 + %8495 = llvm.add %8492, %8494 : !llvm.i64 + %8496 = llvm.getelementptr %8488[%8495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8497 = llvm.load %8496 : !llvm.ptr + %8498 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8500 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8501 = llvm.mul %25, %8500 : !llvm.i64 + %8502 = llvm.add %8499, %8501 : !llvm.i64 + %8503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8504 = llvm.mul %6522, %8503 : !llvm.i64 + %8505 = llvm.add %8502, %8504 : !llvm.i64 + %8506 = llvm.getelementptr %8498[%8505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8497, %8506 : !llvm.ptr + %8507 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8508 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8509 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8510 = llvm.mul %25, %8509 : !llvm.i64 + %8511 = llvm.add %8508, %8510 : !llvm.i64 + %8512 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8513 = llvm.mul %5851, %8512 : !llvm.i64 + %8514 = llvm.add %8511, %8513 : !llvm.i64 + %8515 = llvm.getelementptr %8507[%8514] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8516 = llvm.load %8515 : !llvm.ptr + %8517 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8519 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8520 = llvm.mul %5851, %8519 : !llvm.i64 + %8521 = llvm.add %8518, %8520 : !llvm.i64 + %8522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8523 = llvm.mul %6583, %8522 : !llvm.i64 + %8524 = llvm.add %8521, %8523 : !llvm.i64 + %8525 = llvm.getelementptr %8517[%8524] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8526 = llvm.load %8525 : !llvm.ptr + %8527 = llvm.fmul %8516, %8526 {RelaxedPrecision} : !llvm.float + %8528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8531 = llvm.mul %25, %8530 : !llvm.i64 + %8532 = llvm.add %8529, %8531 : !llvm.i64 + %8533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8534 = llvm.mul %6583, %8533 : !llvm.i64 + %8535 = llvm.add %8532, %8534 : !llvm.i64 + %8536 = llvm.getelementptr %8528[%8535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8537 = llvm.load %8536 : !llvm.ptr + %8538 = llvm.fadd %8537, %8527 {RelaxedPrecision} : !llvm.float + %8539 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8540 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8541 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8542 = llvm.mul %25, %8541 : !llvm.i64 + %8543 = llvm.add %8540, %8542 : !llvm.i64 + %8544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8545 = llvm.mul %6583, %8544 : !llvm.i64 + %8546 = llvm.add %8543, %8545 : !llvm.i64 + %8547 = llvm.getelementptr %8539[%8546] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8538, %8547 : !llvm.ptr + %8548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8551 = llvm.mul %25, %8550 : !llvm.i64 + %8552 = llvm.add %8549, %8551 : !llvm.i64 + %8553 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %8554 = llvm.mul %6583, %8553 : !llvm.i64 + %8555 = llvm.add %8552, %8554 : !llvm.i64 + %8556 = llvm.getelementptr %8548[%8555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8557 = llvm.load %8556 : !llvm.ptr + %8558 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8561 = llvm.mul %25, %8560 : !llvm.i64 + %8562 = llvm.add %8559, %8561 : !llvm.i64 + %8563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8564 = llvm.mul %6583, %8563 : !llvm.i64 + %8565 = llvm.add %8562, %8564 : !llvm.i64 + %8566 = llvm.getelementptr %8558[%8565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8557, %8566 : !llvm.ptr + %8567 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8568 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8569 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8570 = llvm.mul %25, %8569 : !llvm.i64 + %8571 = llvm.add %8568, %8570 : !llvm.i64 + %8572 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8573 = llvm.mul %5851, %8572 : !llvm.i64 + %8574 = llvm.add %8571, %8573 : !llvm.i64 + %8575 = llvm.getelementptr %8567[%8574] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8576 = llvm.load %8575 : !llvm.ptr + %8577 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8578 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8579 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8580 = llvm.mul %5851, %8579 : !llvm.i64 + %8581 = llvm.add %8578, %8580 : !llvm.i64 + %8582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8583 = llvm.mul %6644, %8582 : !llvm.i64 + %8584 = llvm.add %8581, %8583 : !llvm.i64 + %8585 = llvm.getelementptr %8577[%8584] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8586 = llvm.load %8585 : !llvm.ptr + %8587 = llvm.fmul %8576, %8586 {RelaxedPrecision} : !llvm.float + %8588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8591 = llvm.mul %25, %8590 : !llvm.i64 + %8592 = llvm.add %8589, %8591 : !llvm.i64 + %8593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8594 = llvm.mul %6644, %8593 : !llvm.i64 + %8595 = llvm.add %8592, %8594 : !llvm.i64 + %8596 = llvm.getelementptr %8588[%8595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8597 = llvm.load %8596 : !llvm.ptr + %8598 = llvm.fadd %8597, %8587 {RelaxedPrecision} : !llvm.float + %8599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8602 = llvm.mul %25, %8601 : !llvm.i64 + %8603 = llvm.add %8600, %8602 : !llvm.i64 + %8604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8605 = llvm.mul %6644, %8604 : !llvm.i64 + %8606 = llvm.add %8603, %8605 : !llvm.i64 + %8607 = llvm.getelementptr %8599[%8606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8598, %8607 : !llvm.ptr + %8608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8611 = llvm.mul %25, %8610 : !llvm.i64 + %8612 = llvm.add %8609, %8611 : !llvm.i64 + %8613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8614 = llvm.mul %6644, %8613 : !llvm.i64 + %8615 = llvm.add %8612, %8614 : 
!llvm.i64 + %8616 = llvm.getelementptr %8608[%8615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8617 = llvm.load %8616 : !llvm.ptr + %8618 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8620 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8621 = llvm.mul %25, %8620 : !llvm.i64 + %8622 = llvm.add %8619, %8621 : !llvm.i64 + %8623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8624 = llvm.mul %6644, %8623 : !llvm.i64 + %8625 = llvm.add %8622, %8624 : !llvm.i64 + %8626 = llvm.getelementptr %8618[%8625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8617, %8626 : !llvm.ptr + %8627 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8628 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8629 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8630 = llvm.mul %25, %8629 : !llvm.i64 + %8631 = llvm.add %8628, %8630 : !llvm.i64 + %8632 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8633 = llvm.mul %5851, %8632 : !llvm.i64 + %8634 = llvm.add %8631, %8633 : !llvm.i64 + %8635 = llvm.getelementptr %8627[%8634] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8636 = llvm.load %8635 : !llvm.ptr + %8637 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8638 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8639 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8640 = llvm.mul %5851, %8639 : !llvm.i64 + %8641 = llvm.add %8638, %8640 : !llvm.i64 + %8642 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8643 = llvm.mul %6705, %8642 : !llvm.i64 + %8644 = llvm.add %8641, %8643 : !llvm.i64 + %8645 = llvm.getelementptr %8637[%8644] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8646 = llvm.load %8645 : !llvm.ptr + %8647 = llvm.fmul %8636, %8646 {RelaxedPrecision} : !llvm.float + %8648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8651 = llvm.mul %25, %8650 : !llvm.i64 + %8652 = llvm.add %8649, %8651 : !llvm.i64 + %8653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8654 = llvm.mul %6705, %8653 : !llvm.i64 + %8655 = llvm.add %8652, %8654 : !llvm.i64 + %8656 = llvm.getelementptr %8648[%8655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8657 = llvm.load %8656 : !llvm.ptr + %8658 = llvm.fadd %8657, %8647 {RelaxedPrecision} : !llvm.float + %8659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8662 = llvm.mul %25, %8661 : !llvm.i64 + %8663 = llvm.add %8660, %8662 : !llvm.i64 + %8664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8665 = llvm.mul %6705, %8664 : !llvm.i64 + %8666 = llvm.add %8663, %8665 : !llvm.i64 + %8667 = llvm.getelementptr %8659[%8666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8658, %8667 : !llvm.ptr + %8668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8671 = llvm.mul %25, %8670 : !llvm.i64 + %8672 = llvm.add %8669, %8671 : !llvm.i64 + %8673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8674 = llvm.mul %6705, %8673 : !llvm.i64 + %8675 = llvm.add %8672, %8674 : !llvm.i64 + %8676 = llvm.getelementptr %8668[%8675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8677 = llvm.load %8676 : !llvm.ptr 
+ %8678 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8680 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8681 = llvm.mul %25, %8680 : !llvm.i64 + %8682 = llvm.add %8679, %8681 : !llvm.i64 + %8683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8684 = llvm.mul %6705, %8683 : !llvm.i64 + %8685 = llvm.add %8682, %8684 : !llvm.i64 + %8686 = llvm.getelementptr %8678[%8685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8677, %8686 : !llvm.ptr + %8687 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8688 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8689 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8690 = llvm.mul %25, %8689 : !llvm.i64 + %8691 = llvm.add %8688, %8690 : !llvm.i64 + %8692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8693 = llvm.mul %5851, %8692 : !llvm.i64 + %8694 = llvm.add %8691, %8693 : !llvm.i64 + %8695 = llvm.getelementptr %8687[%8694] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8696 = llvm.load %8695 : !llvm.ptr + %8697 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8698 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8699 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8700 = llvm.mul %5851, %8699 : !llvm.i64 + %8701 = llvm.add %8698, %8700 : !llvm.i64 + %8702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8703 = llvm.mul %6766, %8702 : !llvm.i64 + %8704 = llvm.add %8701, %8703 : !llvm.i64 + %8705 = llvm.getelementptr %8697[%8704] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8706 = llvm.load %8705 : !llvm.ptr + %8707 = llvm.fmul %8696, %8706 {RelaxedPrecision} : !llvm.float + %8708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8711 = llvm.mul %25, %8710 : !llvm.i64 + %8712 = llvm.add %8709, %8711 : !llvm.i64 + %8713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8714 = llvm.mul %6766, %8713 : !llvm.i64 + %8715 = llvm.add %8712, %8714 : !llvm.i64 + %8716 = llvm.getelementptr %8708[%8715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8717 = llvm.load %8716 : !llvm.ptr + %8718 = llvm.fadd %8717, %8707 {RelaxedPrecision} : !llvm.float + %8719 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8720 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8721 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8722 = llvm.mul %25, %8721 : !llvm.i64 + %8723 = llvm.add %8720, %8722 : !llvm.i64 + %8724 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8725 = llvm.mul %6766, %8724 : !llvm.i64 + %8726 = llvm.add %8723, %8725 : !llvm.i64 + %8727 = llvm.getelementptr %8719[%8726] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8718, %8727 : !llvm.ptr + %8728 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8729 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8730 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8731 = llvm.mul %25, %8730 : !llvm.i64 + %8732 = llvm.add %8729, %8731 : !llvm.i64 + %8733 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8734 = llvm.mul %6766, %8733 : !llvm.i64 + %8735 = llvm.add %8732, %8734 : !llvm.i64 + %8736 = llvm.getelementptr %8728[%8735] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8737 = llvm.load %8736 : !llvm.ptr + %8738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8739 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %8740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8741 = llvm.mul %25, %8740 : !llvm.i64 + %8742 = llvm.add %8739, %8741 : !llvm.i64 + %8743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8744 = llvm.mul %6766, %8743 : !llvm.i64 + %8745 = llvm.add %8742, %8744 : !llvm.i64 + %8746 = llvm.getelementptr %8738[%8745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8737, %8746 : !llvm.ptr + %8747 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8749 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8750 = llvm.mul %26, %8749 : !llvm.i64 + %8751 = llvm.add %8748, %8750 : !llvm.i64 + %8752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8753 = llvm.mul %5851, %8752 : !llvm.i64 + %8754 = llvm.add %8751, %8753 : !llvm.i64 + %8755 = llvm.getelementptr %8747[%8754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8756 = llvm.load %8755 : !llvm.ptr + %8757 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8758 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8759 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8760 = llvm.mul %5851, %8759 : !llvm.i64 + %8761 = llvm.add %8758, %8760 : !llvm.i64 + %8762 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8763 = llvm.mul %5850, %8762 : !llvm.i64 + %8764 = llvm.add %8761, %8763 : !llvm.i64 + %8765 = llvm.getelementptr %8757[%8764] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8766 = llvm.load %8765 : !llvm.ptr + %8767 = llvm.fmul %8756, %8766 {RelaxedPrecision} : !llvm.float + %8768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8771 = llvm.mul %26, %8770 : !llvm.i64 + %8772 = llvm.add %8769, %8771 : !llvm.i64 + %8773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8774 = llvm.mul %5850, %8773 : !llvm.i64 + %8775 = llvm.add %8772, %8774 : !llvm.i64 + %8776 = llvm.getelementptr %8768[%8775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8777 = llvm.load %8776 : !llvm.ptr + %8778 = llvm.fadd %8777, %8767 {RelaxedPrecision} : !llvm.float + %8779 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8780 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8781 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8782 = llvm.mul %26, %8781 : !llvm.i64 + %8783 = llvm.add %8780, %8782 : !llvm.i64 + %8784 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8785 = llvm.mul %5850, %8784 : !llvm.i64 + %8786 = llvm.add %8783, %8785 : !llvm.i64 + %8787 = llvm.getelementptr %8779[%8786] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8778, %8787 : !llvm.ptr + %8788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8791 = llvm.mul %26, %8790 : !llvm.i64 + %8792 = llvm.add %8789, %8791 : !llvm.i64 + %8793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8794 = llvm.mul %5850, %8793 : !llvm.i64 + %8795 = llvm.add %8792, %8794 : !llvm.i64 + %8796 = llvm.getelementptr %8788[%8795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8797 = llvm.load %8796 : !llvm.ptr + %8798 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8800 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8801 = llvm.mul %26, %8800 : 
!llvm.i64 + %8802 = llvm.add %8799, %8801 : !llvm.i64 + %8803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8804 = llvm.mul %5850, %8803 : !llvm.i64 + %8805 = llvm.add %8802, %8804 : !llvm.i64 + %8806 = llvm.getelementptr %8798[%8805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8797, %8806 : !llvm.ptr + %8807 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8808 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8809 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8810 = llvm.mul %26, %8809 : !llvm.i64 + %8811 = llvm.add %8808, %8810 : !llvm.i64 + %8812 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8813 = llvm.mul %5851, %8812 : !llvm.i64 + %8814 = llvm.add %8811, %8813 : !llvm.i64 + %8815 = llvm.getelementptr %8807[%8814] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8816 = llvm.load %8815 : !llvm.ptr + %8817 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8818 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8819 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8820 = llvm.mul %5851, %8819 : !llvm.i64 + %8821 = llvm.add %8818, %8820 : !llvm.i64 + %8822 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8823 = llvm.mul %5912, %8822 : !llvm.i64 + %8824 = llvm.add %8821, %8823 : !llvm.i64 + %8825 = llvm.getelementptr %8817[%8824] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8826 = llvm.load %8825 : !llvm.ptr + %8827 = llvm.fmul %8816, %8826 {RelaxedPrecision} : !llvm.float + %8828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8831 = llvm.mul %26, %8830 : !llvm.i64 + %8832 = llvm.add %8829, %8831 : !llvm.i64 + %8833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8834 = llvm.mul %5912, %8833 : !llvm.i64 + %8835 = llvm.add %8832, %8834 : !llvm.i64 + %8836 = llvm.getelementptr %8828[%8835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8837 = llvm.load %8836 : !llvm.ptr + %8838 = llvm.fadd %8837, %8827 {RelaxedPrecision} : !llvm.float + %8839 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8840 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8841 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8842 = llvm.mul %26, %8841 : !llvm.i64 + %8843 = llvm.add %8840, %8842 : !llvm.i64 + %8844 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8845 = llvm.mul %5912, %8844 : !llvm.i64 + %8846 = llvm.add %8843, %8845 : !llvm.i64 + %8847 = llvm.getelementptr %8839[%8846] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8838, %8847 : !llvm.ptr + %8848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8851 = llvm.mul %26, %8850 : !llvm.i64 + %8852 = llvm.add %8849, %8851 : !llvm.i64 + %8853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8854 = llvm.mul %5912, %8853 : !llvm.i64 + %8855 = llvm.add %8852, %8854 : !llvm.i64 + %8856 = llvm.getelementptr %8848[%8855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8857 = llvm.load %8856 : !llvm.ptr + %8858 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8860 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8861 = llvm.mul %26, %8860 : !llvm.i64 + %8862 = llvm.add %8859, %8861 : !llvm.i64 + %8863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8864 = llvm.mul 
%5912, %8863 : !llvm.i64 + %8865 = llvm.add %8862, %8864 : !llvm.i64 + %8866 = llvm.getelementptr %8858[%8865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8857, %8866 : !llvm.ptr + %8867 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8868 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8869 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8870 = llvm.mul %26, %8869 : !llvm.i64 + %8871 = llvm.add %8868, %8870 : !llvm.i64 + %8872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8873 = llvm.mul %5851, %8872 : !llvm.i64 + %8874 = llvm.add %8871, %8873 : !llvm.i64 + %8875 = llvm.getelementptr %8867[%8874] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8876 = llvm.load %8875 : !llvm.ptr + %8877 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8878 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8879 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8880 = llvm.mul %5851, %8879 : !llvm.i64 + %8881 = llvm.add %8878, %8880 : !llvm.i64 + %8882 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8883 = llvm.mul %5973, %8882 : !llvm.i64 + %8884 = llvm.add %8881, %8883 : !llvm.i64 + %8885 = llvm.getelementptr %8877[%8884] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8886 = llvm.load %8885 : !llvm.ptr + %8887 = llvm.fmul %8876, %8886 {RelaxedPrecision} : !llvm.float + %8888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8891 = llvm.mul %26, %8890 : !llvm.i64 + %8892 = llvm.add %8889, %8891 : !llvm.i64 + %8893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8894 = llvm.mul %5973, %8893 : !llvm.i64 + %8895 = llvm.add %8892, %8894 : !llvm.i64 + %8896 = llvm.getelementptr %8888[%8895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8897 = llvm.load %8896 : !llvm.ptr + %8898 = llvm.fadd %8897, %8887 {RelaxedPrecision} : !llvm.float + %8899 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8901 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8902 = llvm.mul %26, %8901 : !llvm.i64 + %8903 = llvm.add %8900, %8902 : !llvm.i64 + %8904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8905 = llvm.mul %5973, %8904 : !llvm.i64 + %8906 = llvm.add %8903, %8905 : !llvm.i64 + %8907 = llvm.getelementptr %8899[%8906] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8898, %8907 : !llvm.ptr + %8908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8911 = llvm.mul %26, %8910 : !llvm.i64 + %8912 = llvm.add %8909, %8911 : !llvm.i64 + %8913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8914 = llvm.mul %5973, %8913 : !llvm.i64 + %8915 = llvm.add %8912, %8914 : !llvm.i64 + %8916 = llvm.getelementptr %8908[%8915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8917 = llvm.load %8916 : !llvm.ptr + %8918 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8920 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8921 = llvm.mul %26, %8920 : !llvm.i64 + %8922 = llvm.add %8919, %8921 : !llvm.i64 + %8923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8924 = llvm.mul %5973, %8923 : !llvm.i64 + %8925 = llvm.add %8922, %8924 : !llvm.i64 + %8926 = llvm.getelementptr %8918[%8925] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + llvm.store %8917, %8926 : !llvm.ptr + %8927 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8928 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8929 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8930 = llvm.mul %26, %8929 : !llvm.i64 + %8931 = llvm.add %8928, %8930 : !llvm.i64 + %8932 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8933 = llvm.mul %5851, %8932 : !llvm.i64 + %8934 = llvm.add %8931, %8933 : !llvm.i64 + %8935 = llvm.getelementptr %8927[%8934] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8936 = llvm.load %8935 : !llvm.ptr + %8937 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8938 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8939 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8940 = llvm.mul %5851, %8939 : !llvm.i64 + %8941 = llvm.add %8938, %8940 : !llvm.i64 + %8942 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8943 = llvm.mul %6034, %8942 : !llvm.i64 + %8944 = llvm.add %8941, %8943 : !llvm.i64 + %8945 = llvm.getelementptr %8937[%8944] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8946 = llvm.load %8945 : !llvm.ptr + %8947 = llvm.fmul %8936, %8946 {RelaxedPrecision} : !llvm.float + %8948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8951 = llvm.mul %26, %8950 : !llvm.i64 + %8952 = llvm.add %8949, %8951 : !llvm.i64 + %8953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8954 = llvm.mul %6034, %8953 : !llvm.i64 + %8955 = llvm.add %8952, %8954 : !llvm.i64 + %8956 = llvm.getelementptr %8948[%8955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8957 = llvm.load %8956 : !llvm.ptr + %8958 = llvm.fadd %8957, %8947 {RelaxedPrecision} : !llvm.float + %8959 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8960 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8961 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8962 = llvm.mul %26, %8961 : !llvm.i64 + %8963 = llvm.add %8960, %8962 : !llvm.i64 + %8964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8965 = llvm.mul %6034, %8964 : !llvm.i64 + %8966 = llvm.add %8963, %8965 : !llvm.i64 + %8967 = llvm.getelementptr %8959[%8966] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8958, %8967 : !llvm.ptr + %8968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8971 = llvm.mul %26, %8970 : !llvm.i64 + %8972 = llvm.add %8969, %8971 : !llvm.i64 + %8973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8974 = llvm.mul %6034, %8973 : !llvm.i64 + %8975 = llvm.add %8972, %8974 : !llvm.i64 + %8976 = llvm.getelementptr %8968[%8975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8977 = llvm.load %8976 : !llvm.ptr + %8978 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8980 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8981 = llvm.mul %26, %8980 : !llvm.i64 + %8982 = llvm.add %8979, %8981 : !llvm.i64 + %8983 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8984 = llvm.mul %6034, %8983 : !llvm.i64 + %8985 = llvm.add %8982, %8984 : !llvm.i64 + %8986 = llvm.getelementptr %8978[%8985] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8977, %8986 : !llvm.ptr + %8987 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %8988 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8989 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8990 = llvm.mul %26, %8989 : !llvm.i64 + %8991 = llvm.add %8988, %8990 : !llvm.i64 + %8992 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8993 = llvm.mul %5851, %8992 : !llvm.i64 + %8994 = llvm.add %8991, %8993 : !llvm.i64 + %8995 = llvm.getelementptr %8987[%8994] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8996 = llvm.load %8995 : !llvm.ptr + %8997 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8998 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8999 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9000 = llvm.mul %5851, %8999 : !llvm.i64 + %9001 = llvm.add %8998, %9000 : !llvm.i64 + %9002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9003 = llvm.mul %6095, %9002 : !llvm.i64 + %9004 = llvm.add %9001, %9003 : !llvm.i64 + %9005 = llvm.getelementptr %8997[%9004] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9006 = llvm.load %9005 : !llvm.ptr + %9007 = llvm.fmul %8996, %9006 {RelaxedPrecision} : !llvm.float + %9008 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9010 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9011 = llvm.mul %26, %9010 : !llvm.i64 + %9012 = llvm.add %9009, %9011 : !llvm.i64 + %9013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9014 = llvm.mul %6095, %9013 : !llvm.i64 + %9015 = llvm.add %9012, %9014 : !llvm.i64 + %9016 = llvm.getelementptr %9008[%9015] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9017 = llvm.load %9016 : !llvm.ptr + %9018 = llvm.fadd %9017, %9007 {RelaxedPrecision} : !llvm.float + %9019 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9020 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9021 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9022 = llvm.mul %26, %9021 : !llvm.i64 + %9023 = llvm.add %9020, %9022 : !llvm.i64 + %9024 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9025 = llvm.mul %6095, %9024 : !llvm.i64 + %9026 = llvm.add %9023, %9025 : !llvm.i64 + %9027 = llvm.getelementptr %9019[%9026] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9018, %9027 : !llvm.ptr + %9028 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9031 = llvm.mul %26, %9030 : !llvm.i64 + %9032 = llvm.add %9029, %9031 : !llvm.i64 + %9033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9034 = llvm.mul %6095, %9033 : !llvm.i64 + %9035 = llvm.add %9032, %9034 : !llvm.i64 + %9036 = llvm.getelementptr %9028[%9035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9037 = llvm.load %9036 : !llvm.ptr + %9038 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9039 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9040 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9041 = llvm.mul %26, %9040 : !llvm.i64 + %9042 = llvm.add %9039, %9041 : !llvm.i64 + %9043 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9044 = llvm.mul %6095, %9043 : !llvm.i64 + %9045 = llvm.add %9042, %9044 : !llvm.i64 + %9046 = llvm.getelementptr %9038[%9045] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9037, %9046 : !llvm.ptr + %9047 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9048 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9049 = llvm.mlir.constant(128 : index) 
: !llvm.i64 + %9050 = llvm.mul %26, %9049 : !llvm.i64 + %9051 = llvm.add %9048, %9050 : !llvm.i64 + %9052 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9053 = llvm.mul %5851, %9052 : !llvm.i64 + %9054 = llvm.add %9051, %9053 : !llvm.i64 + %9055 = llvm.getelementptr %9047[%9054] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9056 = llvm.load %9055 : !llvm.ptr + %9057 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9059 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9060 = llvm.mul %5851, %9059 : !llvm.i64 + %9061 = llvm.add %9058, %9060 : !llvm.i64 + %9062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9063 = llvm.mul %6156, %9062 : !llvm.i64 + %9064 = llvm.add %9061, %9063 : !llvm.i64 + %9065 = llvm.getelementptr %9057[%9064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9066 = llvm.load %9065 : !llvm.ptr + %9067 = llvm.fmul %9056, %9066 {RelaxedPrecision} : !llvm.float + %9068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9071 = llvm.mul %26, %9070 : !llvm.i64 + %9072 = llvm.add %9069, %9071 : !llvm.i64 + %9073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9074 = llvm.mul %6156, %9073 : !llvm.i64 + %9075 = llvm.add %9072, %9074 : !llvm.i64 + %9076 = llvm.getelementptr %9068[%9075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9077 = llvm.load %9076 : !llvm.ptr + %9078 = llvm.fadd %9077, %9067 {RelaxedPrecision} : !llvm.float + %9079 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9080 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9081 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9082 = llvm.mul %26, %9081 : !llvm.i64 + %9083 = llvm.add %9080, %9082 : !llvm.i64 + %9084 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9085 = llvm.mul %6156, %9084 : !llvm.i64 + %9086 = llvm.add %9083, %9085 : !llvm.i64 + %9087 = llvm.getelementptr %9079[%9086] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9078, %9087 : !llvm.ptr + %9088 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9091 = llvm.mul %26, %9090 : !llvm.i64 + %9092 = llvm.add %9089, %9091 : !llvm.i64 + %9093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9094 = llvm.mul %6156, %9093 : !llvm.i64 + %9095 = llvm.add %9092, %9094 : !llvm.i64 + %9096 = llvm.getelementptr %9088[%9095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9097 = llvm.load %9096 : !llvm.ptr + %9098 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9100 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9101 = llvm.mul %26, %9100 : !llvm.i64 + %9102 = llvm.add %9099, %9101 : !llvm.i64 + %9103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9104 = llvm.mul %6156, %9103 : !llvm.i64 + %9105 = llvm.add %9102, %9104 : !llvm.i64 + %9106 = llvm.getelementptr %9098[%9105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9097, %9106 : !llvm.ptr + %9107 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9109 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9110 = llvm.mul %26, %9109 : !llvm.i64 + %9111 = llvm.add %9108, %9110 : !llvm.i64 + %9112 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %9113 = llvm.mul %5851, %9112 : !llvm.i64 + %9114 = llvm.add %9111, %9113 : !llvm.i64 + %9115 = llvm.getelementptr %9107[%9114] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9116 = llvm.load %9115 : !llvm.ptr + %9117 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9120 = llvm.mul %5851, %9119 : !llvm.i64 + %9121 = llvm.add %9118, %9120 : !llvm.i64 + %9122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9123 = llvm.mul %6217, %9122 : !llvm.i64 + %9124 = llvm.add %9121, %9123 : !llvm.i64 + %9125 = llvm.getelementptr %9117[%9124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9126 = llvm.load %9125 : !llvm.ptr + %9127 = llvm.fmul %9116, %9126 {RelaxedPrecision} : !llvm.float + %9128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9131 = llvm.mul %26, %9130 : !llvm.i64 + %9132 = llvm.add %9129, %9131 : !llvm.i64 + %9133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9134 = llvm.mul %6217, %9133 : !llvm.i64 + %9135 = llvm.add %9132, %9134 : !llvm.i64 + %9136 = llvm.getelementptr %9128[%9135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9137 = llvm.load %9136 : !llvm.ptr + %9138 = llvm.fadd %9137, %9127 {RelaxedPrecision} : !llvm.float + %9139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9142 = llvm.mul %26, %9141 : !llvm.i64 + %9143 = llvm.add %9140, %9142 : !llvm.i64 + %9144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9145 = llvm.mul %6217, %9144 : !llvm.i64 + %9146 = llvm.add %9143, %9145 : !llvm.i64 + %9147 = llvm.getelementptr %9139[%9146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9138, %9147 : !llvm.ptr + %9148 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9151 = llvm.mul %26, %9150 : !llvm.i64 + %9152 = llvm.add %9149, %9151 : !llvm.i64 + %9153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9154 = llvm.mul %6217, %9153 : !llvm.i64 + %9155 = llvm.add %9152, %9154 : !llvm.i64 + %9156 = llvm.getelementptr %9148[%9155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9157 = llvm.load %9156 : !llvm.ptr + %9158 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9161 = llvm.mul %26, %9160 : !llvm.i64 + %9162 = llvm.add %9159, %9161 : !llvm.i64 + %9163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9164 = llvm.mul %6217, %9163 : !llvm.i64 + %9165 = llvm.add %9162, %9164 : !llvm.i64 + %9166 = llvm.getelementptr %9158[%9165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9157, %9166 : !llvm.ptr + %9167 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9169 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9170 = llvm.mul %26, %9169 : !llvm.i64 + %9171 = llvm.add %9168, %9170 : !llvm.i64 + %9172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9173 = llvm.mul %5851, %9172 : !llvm.i64 + %9174 = llvm.add %9171, %9173 : 
!llvm.i64 + %9175 = llvm.getelementptr %9167[%9174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9176 = llvm.load %9175 : !llvm.ptr + %9177 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9179 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9180 = llvm.mul %5851, %9179 : !llvm.i64 + %9181 = llvm.add %9178, %9180 : !llvm.i64 + %9182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9183 = llvm.mul %6278, %9182 : !llvm.i64 + %9184 = llvm.add %9181, %9183 : !llvm.i64 + %9185 = llvm.getelementptr %9177[%9184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9186 = llvm.load %9185 : !llvm.ptr + %9187 = llvm.fmul %9176, %9186 {RelaxedPrecision} : !llvm.float + %9188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9191 = llvm.mul %26, %9190 : !llvm.i64 + %9192 = llvm.add %9189, %9191 : !llvm.i64 + %9193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9194 = llvm.mul %6278, %9193 : !llvm.i64 + %9195 = llvm.add %9192, %9194 : !llvm.i64 + %9196 = llvm.getelementptr %9188[%9195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9197 = llvm.load %9196 : !llvm.ptr + %9198 = llvm.fadd %9197, %9187 {RelaxedPrecision} : !llvm.float + %9199 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9200 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9202 = llvm.mul %26, %9201 : !llvm.i64 + %9203 = llvm.add %9200, %9202 : !llvm.i64 + %9204 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9205 = llvm.mul %6278, %9204 : !llvm.i64 + %9206 = llvm.add %9203, %9205 : !llvm.i64 + %9207 = llvm.getelementptr %9199[%9206] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9198, %9207 : !llvm.ptr + %9208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9211 = llvm.mul %26, %9210 : !llvm.i64 + %9212 = llvm.add %9209, %9211 : !llvm.i64 + %9213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9214 = llvm.mul %6278, %9213 : !llvm.i64 + %9215 = llvm.add %9212, %9214 : !llvm.i64 + %9216 = llvm.getelementptr %9208[%9215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9217 = llvm.load %9216 : !llvm.ptr + %9218 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9220 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9221 = llvm.mul %26, %9220 : !llvm.i64 + %9222 = llvm.add %9219, %9221 : !llvm.i64 + %9223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9224 = llvm.mul %6278, %9223 : !llvm.i64 + %9225 = llvm.add %9222, %9224 : !llvm.i64 + %9226 = llvm.getelementptr %9218[%9225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9217, %9226 : !llvm.ptr + %9227 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9228 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9229 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9230 = llvm.mul %26, %9229 : !llvm.i64 + %9231 = llvm.add %9228, %9230 : !llvm.i64 + %9232 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9233 = llvm.mul %5851, %9232 : !llvm.i64 + %9234 = llvm.add %9231, %9233 : !llvm.i64 + %9235 = llvm.getelementptr %9227[%9234] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9236 = llvm.load %9235 : !llvm.ptr 
+ %9237 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9239 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9240 = llvm.mul %5851, %9239 : !llvm.i64 + %9241 = llvm.add %9238, %9240 : !llvm.i64 + %9242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9243 = llvm.mul %6339, %9242 : !llvm.i64 + %9244 = llvm.add %9241, %9243 : !llvm.i64 + %9245 = llvm.getelementptr %9237[%9244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9246 = llvm.load %9245 : !llvm.ptr + %9247 = llvm.fmul %9236, %9246 {RelaxedPrecision} : !llvm.float + %9248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9251 = llvm.mul %26, %9250 : !llvm.i64 + %9252 = llvm.add %9249, %9251 : !llvm.i64 + %9253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9254 = llvm.mul %6339, %9253 : !llvm.i64 + %9255 = llvm.add %9252, %9254 : !llvm.i64 + %9256 = llvm.getelementptr %9248[%9255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9257 = llvm.load %9256 : !llvm.ptr + %9258 = llvm.fadd %9257, %9247 {RelaxedPrecision} : !llvm.float + %9259 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9260 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9261 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9262 = llvm.mul %26, %9261 : !llvm.i64 + %9263 = llvm.add %9260, %9262 : !llvm.i64 + %9264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9265 = llvm.mul %6339, %9264 : !llvm.i64 + %9266 = llvm.add %9263, %9265 : !llvm.i64 + %9267 = llvm.getelementptr %9259[%9266] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9258, %9267 : !llvm.ptr + %9268 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9271 = llvm.mul %26, %9270 : !llvm.i64 + %9272 = llvm.add %9269, %9271 : !llvm.i64 + %9273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9274 = llvm.mul %6339, %9273 : !llvm.i64 + %9275 = llvm.add %9272, %9274 : !llvm.i64 + %9276 = llvm.getelementptr %9268[%9275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9277 = llvm.load %9276 : !llvm.ptr + %9278 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9279 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9280 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9281 = llvm.mul %26, %9280 : !llvm.i64 + %9282 = llvm.add %9279, %9281 : !llvm.i64 + %9283 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9284 = llvm.mul %6339, %9283 : !llvm.i64 + %9285 = llvm.add %9282, %9284 : !llvm.i64 + %9286 = llvm.getelementptr %9278[%9285] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9277, %9286 : !llvm.ptr + %9287 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9288 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9289 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9290 = llvm.mul %26, %9289 : !llvm.i64 + %9291 = llvm.add %9288, %9290 : !llvm.i64 + %9292 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9293 = llvm.mul %5851, %9292 : !llvm.i64 + %9294 = llvm.add %9291, %9293 : !llvm.i64 + %9295 = llvm.getelementptr %9287[%9294] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9296 = llvm.load %9295 : !llvm.ptr + %9297 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9298 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %9299 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9300 = llvm.mul %5851, %9299 : !llvm.i64 + %9301 = llvm.add %9298, %9300 : !llvm.i64 + %9302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9303 = llvm.mul %6400, %9302 : !llvm.i64 + %9304 = llvm.add %9301, %9303 : !llvm.i64 + %9305 = llvm.getelementptr %9297[%9304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9306 = llvm.load %9305 : !llvm.ptr + %9307 = llvm.fmul %9296, %9306 {RelaxedPrecision} : !llvm.float + %9308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9311 = llvm.mul %26, %9310 : !llvm.i64 + %9312 = llvm.add %9309, %9311 : !llvm.i64 + %9313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9314 = llvm.mul %6400, %9313 : !llvm.i64 + %9315 = llvm.add %9312, %9314 : !llvm.i64 + %9316 = llvm.getelementptr %9308[%9315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9317 = llvm.load %9316 : !llvm.ptr + %9318 = llvm.fadd %9317, %9307 {RelaxedPrecision} : !llvm.float + %9319 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9320 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9321 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9322 = llvm.mul %26, %9321 : !llvm.i64 + %9323 = llvm.add %9320, %9322 : !llvm.i64 + %9324 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9325 = llvm.mul %6400, %9324 : !llvm.i64 + %9326 = llvm.add %9323, %9325 : !llvm.i64 + %9327 = llvm.getelementptr %9319[%9326] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9318, %9327 : !llvm.ptr + %9328 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9331 = llvm.mul %26, %9330 : !llvm.i64 + %9332 = llvm.add %9329, %9331 : !llvm.i64 + %9333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9334 = llvm.mul %6400, %9333 : !llvm.i64 + %9335 = llvm.add %9332, %9334 : !llvm.i64 + %9336 = llvm.getelementptr %9328[%9335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9337 = llvm.load %9336 : !llvm.ptr + %9338 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9339 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9340 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9341 = llvm.mul %26, %9340 : !llvm.i64 + %9342 = llvm.add %9339, %9341 : !llvm.i64 + %9343 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9344 = llvm.mul %6400, %9343 : !llvm.i64 + %9345 = llvm.add %9342, %9344 : !llvm.i64 + %9346 = llvm.getelementptr %9338[%9345] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9337, %9346 : !llvm.ptr + %9347 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9349 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9350 = llvm.mul %26, %9349 : !llvm.i64 + %9351 = llvm.add %9348, %9350 : !llvm.i64 + %9352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9353 = llvm.mul %5851, %9352 : !llvm.i64 + %9354 = llvm.add %9351, %9353 : !llvm.i64 + %9355 = llvm.getelementptr %9347[%9354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9356 = llvm.load %9355 : !llvm.ptr + %9357 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9359 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9360 = llvm.mul %5851, %9359 
: !llvm.i64 + %9361 = llvm.add %9358, %9360 : !llvm.i64 + %9362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9363 = llvm.mul %6461, %9362 : !llvm.i64 + %9364 = llvm.add %9361, %9363 : !llvm.i64 + %9365 = llvm.getelementptr %9357[%9364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9366 = llvm.load %9365 : !llvm.ptr + %9367 = llvm.fmul %9356, %9366 {RelaxedPrecision} : !llvm.float + %9368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9371 = llvm.mul %26, %9370 : !llvm.i64 + %9372 = llvm.add %9369, %9371 : !llvm.i64 + %9373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9374 = llvm.mul %6461, %9373 : !llvm.i64 + %9375 = llvm.add %9372, %9374 : !llvm.i64 + %9376 = llvm.getelementptr %9368[%9375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9377 = llvm.load %9376 : !llvm.ptr + %9378 = llvm.fadd %9377, %9367 {RelaxedPrecision} : !llvm.float + %9379 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9381 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9382 = llvm.mul %26, %9381 : !llvm.i64 + %9383 = llvm.add %9380, %9382 : !llvm.i64 + %9384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9385 = llvm.mul %6461, %9384 : !llvm.i64 + %9386 = llvm.add %9383, %9385 : !llvm.i64 + %9387 = llvm.getelementptr %9379[%9386] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9378, %9387 : !llvm.ptr + %9388 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9391 = llvm.mul %26, %9390 : !llvm.i64 + %9392 = llvm.add %9389, %9391 : !llvm.i64 + %9393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9394 = llvm.mul %6461, %9393 : !llvm.i64 + %9395 = llvm.add %9392, %9394 : !llvm.i64 + %9396 = llvm.getelementptr %9388[%9395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9397 = llvm.load %9396 : !llvm.ptr + %9398 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9399 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9401 = llvm.mul %26, %9400 : !llvm.i64 + %9402 = llvm.add %9399, %9401 : !llvm.i64 + %9403 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9404 = llvm.mul %6461, %9403 : !llvm.i64 + %9405 = llvm.add %9402, %9404 : !llvm.i64 + %9406 = llvm.getelementptr %9398[%9405] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9397, %9406 : !llvm.ptr + %9407 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9408 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9409 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9410 = llvm.mul %26, %9409 : !llvm.i64 + %9411 = llvm.add %9408, %9410 : !llvm.i64 + %9412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9413 = llvm.mul %5851, %9412 : !llvm.i64 + %9414 = llvm.add %9411, %9413 : !llvm.i64 + %9415 = llvm.getelementptr %9407[%9414] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9416 = llvm.load %9415 : !llvm.ptr + %9417 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9419 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9420 = llvm.mul %5851, %9419 : !llvm.i64 + %9421 = llvm.add %9418, %9420 : !llvm.i64 + %9422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9423 = llvm.mul 
%6522, %9422 : !llvm.i64 + %9424 = llvm.add %9421, %9423 : !llvm.i64 + %9425 = llvm.getelementptr %9417[%9424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9426 = llvm.load %9425 : !llvm.ptr + %9427 = llvm.fmul %9416, %9426 {RelaxedPrecision} : !llvm.float + %9428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9431 = llvm.mul %26, %9430 : !llvm.i64 + %9432 = llvm.add %9429, %9431 : !llvm.i64 + %9433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9434 = llvm.mul %6522, %9433 : !llvm.i64 + %9435 = llvm.add %9432, %9434 : !llvm.i64 + %9436 = llvm.getelementptr %9428[%9435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9437 = llvm.load %9436 : !llvm.ptr + %9438 = llvm.fadd %9437, %9427 {RelaxedPrecision} : !llvm.float + %9439 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9440 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9441 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9442 = llvm.mul %26, %9441 : !llvm.i64 + %9443 = llvm.add %9440, %9442 : !llvm.i64 + %9444 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9445 = llvm.mul %6522, %9444 : !llvm.i64 + %9446 = llvm.add %9443, %9445 : !llvm.i64 + %9447 = llvm.getelementptr %9439[%9446] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9438, %9447 : !llvm.ptr + %9448 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9451 = llvm.mul %26, %9450 : !llvm.i64 + %9452 = llvm.add %9449, %9451 : !llvm.i64 + %9453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9454 = llvm.mul %6522, %9453 : !llvm.i64 + %9455 = llvm.add %9452, %9454 : !llvm.i64 + %9456 = llvm.getelementptr %9448[%9455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9457 = llvm.load %9456 : !llvm.ptr + %9458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9461 = llvm.mul %26, %9460 : !llvm.i64 + %9462 = llvm.add %9459, %9461 : !llvm.i64 + %9463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9464 = llvm.mul %6522, %9463 : !llvm.i64 + %9465 = llvm.add %9462, %9464 : !llvm.i64 + %9466 = llvm.getelementptr %9458[%9465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9457, %9466 : !llvm.ptr + %9467 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9468 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9469 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9470 = llvm.mul %26, %9469 : !llvm.i64 + %9471 = llvm.add %9468, %9470 : !llvm.i64 + %9472 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9473 = llvm.mul %5851, %9472 : !llvm.i64 + %9474 = llvm.add %9471, %9473 : !llvm.i64 + %9475 = llvm.getelementptr %9467[%9474] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9476 = llvm.load %9475 : !llvm.ptr + %9477 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9480 = llvm.mul %5851, %9479 : !llvm.i64 + %9481 = llvm.add %9478, %9480 : !llvm.i64 + %9482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9483 = llvm.mul %6583, %9482 : !llvm.i64 + %9484 = llvm.add %9481, %9483 : !llvm.i64 + %9485 = llvm.getelementptr %9477[%9484] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %9486 = llvm.load %9485 : !llvm.ptr + %9487 = llvm.fmul %9476, %9486 {RelaxedPrecision} : !llvm.float + %9488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9491 = llvm.mul %26, %9490 : !llvm.i64 + %9492 = llvm.add %9489, %9491 : !llvm.i64 + %9493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9494 = llvm.mul %6583, %9493 : !llvm.i64 + %9495 = llvm.add %9492, %9494 : !llvm.i64 + %9496 = llvm.getelementptr %9488[%9495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9497 = llvm.load %9496 : !llvm.ptr + %9498 = llvm.fadd %9497, %9487 {RelaxedPrecision} : !llvm.float + %9499 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9500 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9501 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9502 = llvm.mul %26, %9501 : !llvm.i64 + %9503 = llvm.add %9500, %9502 : !llvm.i64 + %9504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9505 = llvm.mul %6583, %9504 : !llvm.i64 + %9506 = llvm.add %9503, %9505 : !llvm.i64 + %9507 = llvm.getelementptr %9499[%9506] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9498, %9507 : !llvm.ptr + %9508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9511 = llvm.mul %26, %9510 : !llvm.i64 + %9512 = llvm.add %9509, %9511 : !llvm.i64 + %9513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9514 = llvm.mul %6583, %9513 : !llvm.i64 + %9515 = llvm.add %9512, %9514 : !llvm.i64 + %9516 = llvm.getelementptr %9508[%9515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9517 = llvm.load %9516 : !llvm.ptr + %9518 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9519 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9520 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9521 = llvm.mul %26, %9520 : !llvm.i64 + %9522 = llvm.add %9519, %9521 : !llvm.i64 + %9523 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9524 = llvm.mul %6583, %9523 : !llvm.i64 + %9525 = llvm.add %9522, %9524 : !llvm.i64 + %9526 = llvm.getelementptr %9518[%9525] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9517, %9526 : !llvm.ptr + %9527 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9528 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9529 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9530 = llvm.mul %26, %9529 : !llvm.i64 + %9531 = llvm.add %9528, %9530 : !llvm.i64 + %9532 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9533 = llvm.mul %5851, %9532 : !llvm.i64 + %9534 = llvm.add %9531, %9533 : !llvm.i64 + %9535 = llvm.getelementptr %9527[%9534] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9536 = llvm.load %9535 : !llvm.ptr + %9537 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9539 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9540 = llvm.mul %5851, %9539 : !llvm.i64 + %9541 = llvm.add %9538, %9540 : !llvm.i64 + %9542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9543 = llvm.mul %6644, %9542 : !llvm.i64 + %9544 = llvm.add %9541, %9543 : !llvm.i64 + %9545 = llvm.getelementptr %9537[%9544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9546 = llvm.load %9545 : !llvm.ptr + %9547 = llvm.fmul %9536, %9546 {RelaxedPrecision} : !llvm.float 
+ %9548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9551 = llvm.mul %26, %9550 : !llvm.i64 + %9552 = llvm.add %9549, %9551 : !llvm.i64 + %9553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9554 = llvm.mul %6644, %9553 : !llvm.i64 + %9555 = llvm.add %9552, %9554 : !llvm.i64 + %9556 = llvm.getelementptr %9548[%9555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9557 = llvm.load %9556 : !llvm.ptr + %9558 = llvm.fadd %9557, %9547 {RelaxedPrecision} : !llvm.float + %9559 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9560 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9561 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9562 = llvm.mul %26, %9561 : !llvm.i64 + %9563 = llvm.add %9560, %9562 : !llvm.i64 + %9564 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9565 = llvm.mul %6644, %9564 : !llvm.i64 + %9566 = llvm.add %9563, %9565 : !llvm.i64 + %9567 = llvm.getelementptr %9559[%9566] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9558, %9567 : !llvm.ptr + %9568 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9571 = llvm.mul %26, %9570 : !llvm.i64 + %9572 = llvm.add %9569, %9571 : !llvm.i64 + %9573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9574 = llvm.mul %6644, %9573 : !llvm.i64 + %9575 = llvm.add %9572, %9574 : !llvm.i64 + %9576 = llvm.getelementptr %9568[%9575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9577 = llvm.load %9576 : !llvm.ptr + %9578 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9580 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9581 = llvm.mul %26, %9580 : !llvm.i64 + %9582 = llvm.add %9579, %9581 : !llvm.i64 + %9583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9584 = llvm.mul %6644, %9583 : !llvm.i64 + %9585 = llvm.add %9582, %9584 : !llvm.i64 + %9586 = llvm.getelementptr %9578[%9585] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9577, %9586 : !llvm.ptr + %9587 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9589 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9590 = llvm.mul %26, %9589 : !llvm.i64 + %9591 = llvm.add %9588, %9590 : !llvm.i64 + %9592 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9593 = llvm.mul %5851, %9592 : !llvm.i64 + %9594 = llvm.add %9591, %9593 : !llvm.i64 + %9595 = llvm.getelementptr %9587[%9594] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9596 = llvm.load %9595 : !llvm.ptr + %9597 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9599 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9600 = llvm.mul %5851, %9599 : !llvm.i64 + %9601 = llvm.add %9598, %9600 : !llvm.i64 + %9602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9603 = llvm.mul %6705, %9602 : !llvm.i64 + %9604 = llvm.add %9601, %9603 : !llvm.i64 + %9605 = llvm.getelementptr %9597[%9604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9606 = llvm.load %9605 : !llvm.ptr + %9607 = llvm.fmul %9596, %9606 {RelaxedPrecision} : !llvm.float + %9608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9609 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %9610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9611 = llvm.mul %26, %9610 : !llvm.i64 + %9612 = llvm.add %9609, %9611 : !llvm.i64 + %9613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9614 = llvm.mul %6705, %9613 : !llvm.i64 + %9615 = llvm.add %9612, %9614 : !llvm.i64 + %9616 = llvm.getelementptr %9608[%9615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9617 = llvm.load %9616 : !llvm.ptr + %9618 = llvm.fadd %9617, %9607 {RelaxedPrecision} : !llvm.float + %9619 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9620 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9621 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9622 = llvm.mul %26, %9621 : !llvm.i64 + %9623 = llvm.add %9620, %9622 : !llvm.i64 + %9624 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9625 = llvm.mul %6705, %9624 : !llvm.i64 + %9626 = llvm.add %9623, %9625 : !llvm.i64 + %9627 = llvm.getelementptr %9619[%9626] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9618, %9627 : !llvm.ptr + %9628 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9631 = llvm.mul %26, %9630 : !llvm.i64 + %9632 = llvm.add %9629, %9631 : !llvm.i64 + %9633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9634 = llvm.mul %6705, %9633 : !llvm.i64 + %9635 = llvm.add %9632, %9634 : !llvm.i64 + %9636 = llvm.getelementptr %9628[%9635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9637 = llvm.load %9636 : !llvm.ptr + %9638 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9639 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9640 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9641 = llvm.mul %26, %9640 : !llvm.i64 + %9642 = llvm.add %9639, %9641 : !llvm.i64 + %9643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9644 = llvm.mul %6705, %9643 : !llvm.i64 + %9645 = llvm.add %9642, %9644 : !llvm.i64 + %9646 = llvm.getelementptr %9638[%9645] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9637, %9646 : !llvm.ptr + %9647 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9648 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9649 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9650 = llvm.mul %26, %9649 : !llvm.i64 + %9651 = llvm.add %9648, %9650 : !llvm.i64 + %9652 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9653 = llvm.mul %5851, %9652 : !llvm.i64 + %9654 = llvm.add %9651, %9653 : !llvm.i64 + %9655 = llvm.getelementptr %9647[%9654] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9656 = llvm.load %9655 : !llvm.ptr + %9657 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9659 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9660 = llvm.mul %5851, %9659 : !llvm.i64 + %9661 = llvm.add %9658, %9660 : !llvm.i64 + %9662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9663 = llvm.mul %6766, %9662 : !llvm.i64 + %9664 = llvm.add %9661, %9663 : !llvm.i64 + %9665 = llvm.getelementptr %9657[%9664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9666 = llvm.load %9665 : !llvm.ptr + %9667 = llvm.fmul %9656, %9666 {RelaxedPrecision} : !llvm.float + %9668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9671 = llvm.mul %26, %9670 : 
!llvm.i64 + %9672 = llvm.add %9669, %9671 : !llvm.i64 + %9673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9674 = llvm.mul %6766, %9673 : !llvm.i64 + %9675 = llvm.add %9672, %9674 : !llvm.i64 + %9676 = llvm.getelementptr %9668[%9675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9677 = llvm.load %9676 : !llvm.ptr + %9678 = llvm.fadd %9677, %9667 {RelaxedPrecision} : !llvm.float + %9679 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9680 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9681 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9682 = llvm.mul %26, %9681 : !llvm.i64 + %9683 = llvm.add %9680, %9682 : !llvm.i64 + %9684 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9685 = llvm.mul %6766, %9684 : !llvm.i64 + %9686 = llvm.add %9683, %9685 : !llvm.i64 + %9687 = llvm.getelementptr %9679[%9686] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9678, %9687 : !llvm.ptr + %9688 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9691 = llvm.mul %26, %9690 : !llvm.i64 + %9692 = llvm.add %9689, %9691 : !llvm.i64 + %9693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9694 = llvm.mul %6766, %9693 : !llvm.i64 + %9695 = llvm.add %9692, %9694 : !llvm.i64 + %9696 = llvm.getelementptr %9688[%9695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9697 = llvm.load %9696 : !llvm.ptr + %9698 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9700 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9701 = llvm.mul %26, %9700 : !llvm.i64 + %9702 = llvm.add %9699, %9701 : !llvm.i64 + %9703 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9704 = llvm.mul %6766, %9703 : !llvm.i64 + %9705 = llvm.add %9702, %9704 : !llvm.i64 + %9706 = llvm.getelementptr %9698[%9705] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9697, %9706 : !llvm.ptr + %9707 = llvm.add %5848, %33 : !llvm.i64 + llvm.br ^bb19(%9707 : !llvm.i64) + ^bb21: // pred: ^bb19 + %9708 = llvm.add %5846, %36 : !llvm.i64 + llvm.br ^bb17(%9708 : !llvm.i64) + ^bb22: // pred: ^bb17 + %9709 = llvm.add %5844, %30 : !llvm.i64 + llvm.br ^bb15(%9709 : !llvm.i64) + ^bb23: // pred: ^bb15 + %9710 = llvm.add %48, %29 : !llvm.i64 + llvm.br ^bb1(%9710 : !llvm.i64) + ^bb24: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = 
llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/optimized_matmul/mlir/22_LLVMLegalizeForExport.mlir b/Tutorials/optimized_matmul/mlir/22_LLVMLegalizeForExport.mlir new file mode 100644 index 00000000..0531ddef --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/22_LLVMLegalizeForExport.mlir @@ -0,0 +1,7762 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.mlir.global internal @cache_17() : !llvm.array<4096 x vec<8 x float>> + llvm.mlir.global internal @cache_16() : !llvm.array<192 x vec<8 x float>> + llvm.func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(0 : i64) : !llvm.i64 + %25 = llvm.mlir.constant(1 : i64) : !llvm.i64 + %26 = llvm.mlir.constant(2 : i64) : !llvm.i64 + %27 = llvm.mlir.constant(3 : i64) : !llvm.i64 + %28 = llvm.mlir.constant(4 : i64) : !llvm.i64 + %29 = llvm.mlir.constant(5 : i64) : !llvm.i64 + %30 = llvm.mlir.constant(6 : i64) : !llvm.i64 + %31 = llvm.mlir.constant(7 : i64) : !llvm.i64 + %32 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %33 = llvm.mlir.constant(10 : index) : !llvm.i64 + %34 = llvm.mlir.constant(12 : index) : !llvm.i64 + %35 = llvm.mlir.constant(14 : index) : !llvm.i64 + %36 = llvm.mlir.constant(512 : index) : !llvm.i64 + %37 = llvm.mlir.constant(784 : index) : !llvm.i64 + %38 = llvm.mlir.constant(256 : index) : !llvm.i64 + %39 = llvm.mlir.constant(128 : index) : !llvm.i64 + %40 = llvm.mlir.constant(true) : !llvm.i1 + %41 = llvm.mlir.constant(24 : index) : !llvm.i64 + %42 = llvm.mlir.constant(32 : index) : !llvm.i64 + %43 = llvm.mlir.constant(40 : index) : !llvm.i64 + %44 = llvm.mlir.constant(48 : index) : !llvm.i64 + %45 = llvm.mlir.constant(3 : index) : !llvm.i64 + %46 = llvm.mlir.constant(56 : index) : !llvm.i64 + %47 = llvm.mlir.constant(64 : index) : !llvm.i64 + %48 = llvm.mlir.constant(4 : index) : !llvm.i64 + %49 = llvm.mlir.constant(72 : index) : !llvm.i64 + %50 = llvm.mlir.constant(9 : index) : !llvm.i64 + %51 = llvm.mlir.constant(80 : index) : !llvm.i64 + %52 = llvm.mlir.constant(5 : index) : !llvm.i64 + %53 = llvm.mlir.constant(88 : index) : !llvm.i64 + %54 = llvm.mlir.constant(11 : index) : !llvm.i64 + %55 = llvm.mlir.constant(96 : index) : !llvm.i64 + %56 = llvm.mlir.constant(6 : index) : !llvm.i64 + %57 = llvm.mlir.constant(104 : index) : !llvm.i64 + %58 = llvm.mlir.constant(13 : index) : !llvm.i64 + %59 = llvm.mlir.constant(112 : index) : !llvm.i64 + %60 = llvm.mlir.constant(-16 : index) : !llvm.i64 + %61 = llvm.mlir.constant(7 : index) : !llvm.i64 + %62 = llvm.mlir.constant(120 : index) : !llvm.i64 + %63 = llvm.mlir.constant(2 : index) : !llvm.i64 + %64 = llvm.mlir.constant(-1 : index) : !llvm.i64 + %65 = llvm.mlir.constant(-2 : index) : !llvm.i64 + %66 = llvm.mlir.constant(15 : index) : !llvm.i64 + %67 = llvm.mlir.constant(0 : index) : !llvm.i64 + %68 = llvm.mlir.constant(16 : index) : !llvm.i64 + %69 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %70 = llvm.mlir.constant(8 : index) : !llvm.i64 + %71 = llvm.mlir.constant(1 : index) : !llvm.i64 + %72 = llvm.mlir.constant(16 : index) : !llvm.i64 + %73 = llvm.mul %71, %72 : !llvm.i64 + %74 = llvm.mlir.null : !llvm.ptr> + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.getelementptr %74[%75] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %77 = llvm.ptrtoint %76 : !llvm.ptr> to !llvm.i64 + %78 = llvm.mul %73, %77 : !llvm.i64 + %79 = llvm.alloca %78 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %80 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %81 = llvm.insertvalue %79, %80[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.insertvalue %79, %81[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %83 = llvm.mlir.constant(0 : index) : !llvm.i64 + %84 = llvm.insertvalue %83, %82[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %85 = llvm.mlir.constant(1 : index) : !llvm.i64 + %86 = llvm.mlir.constant(16 : index) : !llvm.i64 + %87 = llvm.insertvalue %71, %84[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %88 = llvm.insertvalue %86, %87[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.insertvalue %72, %88[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %90 = llvm.insertvalue %85, %89[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %91 = llvm.mlir.constant(1 : index) : !llvm.i64 + %92 = llvm.mlir.constant(16 : index) : !llvm.i64 + %93 = llvm.mul %91, %92 : !llvm.i64 + %94 = llvm.mlir.null : !llvm.ptr> + %95 = llvm.mlir.constant(1 : index) : !llvm.i64 + %96 = llvm.getelementptr %94[%95] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %97 = llvm.ptrtoint %96 : !llvm.ptr> to !llvm.i64 + %98 = llvm.mul %93, %97 : !llvm.i64 + %99 = llvm.alloca %98 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %100 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %101 = llvm.insertvalue %99, %100[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.insertvalue %99, %101[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %103 = llvm.mlir.constant(0 : index) : !llvm.i64 + %104 = llvm.insertvalue %103, %102[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %105 = llvm.mlir.constant(1 : index) : !llvm.i64 + %106 = llvm.mlir.constant(16 : index) : !llvm.i64 + %107 = llvm.insertvalue %91, %104[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %108 = llvm.insertvalue %106, %107[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.insertvalue %92, %108[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %110 = llvm.insertvalue %105, %109[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %111 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %112 = llvm.mlir.addressof @cache_16 : !llvm.ptr>> + %113 = llvm.getelementptr %112[%111, %111] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %114 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %115 = llvm.insertvalue %113, %114[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %116 = llvm.insertvalue %113, %115[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %117 = llvm.mlir.constant(0 : index) : 
!llvm.i64 + %118 = llvm.insertvalue %117, %116[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %119 = llvm.mlir.constant(16 : index) : !llvm.i64 + %120 = llvm.insertvalue %119, %118[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %121 = llvm.mlir.constant(12 : index) : !llvm.i64 + %122 = llvm.insertvalue %121, %120[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %123 = llvm.mlir.constant(6 : index) : !llvm.i64 + %124 = llvm.insertvalue %123, %122[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %126 = llvm.insertvalue %125, %124[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %127 = llvm.mlir.constant(2 : index) : !llvm.i64 + %128 = llvm.insertvalue %127, %126[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %129 = llvm.mlir.constant(1 : index) : !llvm.i64 + %130 = llvm.insertvalue %129, %128[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %131 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %132 = llvm.mlir.addressof @cache_17 : !llvm.ptr>> + %133 = llvm.getelementptr %132[%131, %131] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %134 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %135 = llvm.insertvalue %133, %134[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %136 = llvm.insertvalue %133, %135[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %137 = llvm.mlir.constant(0 : index) : !llvm.i64 + %138 = llvm.insertvalue %137, %136[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %139 = llvm.mlir.constant(16 : index) : !llvm.i64 + %140 = llvm.insertvalue %139, %138[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.insertvalue %141, %140[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %143 = llvm.mlir.constant(128 : index) : !llvm.i64 + %144 = llvm.insertvalue %143, %142[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %145 = llvm.mlir.constant(2 : index) : !llvm.i64 + %146 = llvm.insertvalue %145, %144[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %147 = llvm.mlir.constant(2 : index) : !llvm.i64 + %148 = llvm.insertvalue %147, %146[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %150 = llvm.insertvalue %149, %148[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + llvm.br ^bb1(%67 : !llvm.i64) + ^bb1(%151: !llvm.i64): // 2 preds: ^bb0, ^bb52 + %152 = llvm.icmp "slt" %151, %36 : !llvm.i64 + llvm.cond_br %152, ^bb2, ^bb53 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%67 : !llvm.i64) + ^bb3(%153: !llvm.i64): // 2 preds: ^bb2, ^bb10 + %154 = llvm.icmp "slt" %153, %39 : !llvm.i64 + llvm.cond_br %154, ^bb4, ^bb11 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%67 : !llvm.i64) + ^bb5(%155: !llvm.i64): // 2 preds: ^bb4, ^bb9 + %156 = llvm.icmp "slt" %155, %38 : !llvm.i64 + llvm.cond_br %156, ^bb6, ^bb10 + ^bb6: // pred: ^bb5 + llvm.cond_br %40, ^bb7, ^bb8 + ^bb7: // pred: ^bb6 + %157 = llvm.add %151, %155 : !llvm.i64 + %158 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %161 = 
llvm.mul %153, %160 : !llvm.i64 + %162 = llvm.add %159, %161 : !llvm.i64 + %163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %164 = llvm.mul %157, %163 : !llvm.i64 + %165 = llvm.add %162, %164 : !llvm.i64 + %166 = llvm.getelementptr %158[%165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %167 = llvm.bitcast %166 : !llvm.ptr to !llvm.ptr> + %168 = llvm.load %167 {alignment = 4 : i64} : !llvm.ptr> + %169 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(16 : index) : !llvm.i64 + %172 = llvm.mul %67, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %67, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %168, %177 : !llvm.ptr> + %178 = llvm.add %157, %70 : !llvm.i64 + %179 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %182 = llvm.mul %153, %181 : !llvm.i64 + %183 = llvm.add %180, %182 : !llvm.i64 + %184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %185 = llvm.mul %178, %184 : !llvm.i64 + %186 = llvm.add %183, %185 : !llvm.i64 + %187 = llvm.getelementptr %179[%186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %188 = llvm.bitcast %187 : !llvm.ptr to !llvm.ptr> + %189 = llvm.load %188 {alignment = 4 : i64} : !llvm.ptr> + %190 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %192 = llvm.mlir.constant(16 : index) : !llvm.i64 + %193 = llvm.mul %67, %192 : !llvm.i64 + %194 = llvm.add %191, %193 : !llvm.i64 + %195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %196 = llvm.mul %69, %195 : !llvm.i64 + %197 = llvm.add %194, %196 : !llvm.i64 + %198 = llvm.getelementptr %190[%197] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %189, %198 : !llvm.ptr> + %199 = llvm.add %157, %68 : !llvm.i64 + %200 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %203 = llvm.mul %153, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %199, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.bitcast %208 : !llvm.ptr to !llvm.ptr> + %210 = llvm.load %209 {alignment = 4 : i64} : !llvm.ptr> + %211 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %212 = llvm.mlir.constant(0 : index) : !llvm.i64 + %213 = llvm.mlir.constant(16 : index) : !llvm.i64 + %214 = llvm.mul %67, %213 : !llvm.i64 + %215 = llvm.add %212, %214 : !llvm.i64 + %216 = llvm.mlir.constant(1 : index) : !llvm.i64 + %217 = llvm.mul %63, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.getelementptr %211[%218] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %210, %219 : !llvm.ptr> + %220 = llvm.add %157, %41 : !llvm.i64 + %221 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %224 = llvm.mul %153, %223 
: !llvm.i64 + %225 = llvm.add %222, %224 : !llvm.i64 + %226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %227 = llvm.mul %220, %226 : !llvm.i64 + %228 = llvm.add %225, %227 : !llvm.i64 + %229 = llvm.getelementptr %221[%228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %230 = llvm.bitcast %229 : !llvm.ptr to !llvm.ptr> + %231 = llvm.load %230 {alignment = 4 : i64} : !llvm.ptr> + %232 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %233 = llvm.mlir.constant(0 : index) : !llvm.i64 + %234 = llvm.mlir.constant(16 : index) : !llvm.i64 + %235 = llvm.mul %67, %234 : !llvm.i64 + %236 = llvm.add %233, %235 : !llvm.i64 + %237 = llvm.mlir.constant(1 : index) : !llvm.i64 + %238 = llvm.mul %45, %237 : !llvm.i64 + %239 = llvm.add %236, %238 : !llvm.i64 + %240 = llvm.getelementptr %232[%239] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %231, %240 : !llvm.ptr> + %241 = llvm.add %157, %42 : !llvm.i64 + %242 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %243 = llvm.mlir.constant(0 : index) : !llvm.i64 + %244 = llvm.mlir.constant(512 : index) : !llvm.i64 + %245 = llvm.mul %153, %244 : !llvm.i64 + %246 = llvm.add %243, %245 : !llvm.i64 + %247 = llvm.mlir.constant(1 : index) : !llvm.i64 + %248 = llvm.mul %241, %247 : !llvm.i64 + %249 = llvm.add %246, %248 : !llvm.i64 + %250 = llvm.getelementptr %242[%249] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %251 = llvm.bitcast %250 : !llvm.ptr to !llvm.ptr> + %252 = llvm.load %251 {alignment = 4 : i64} : !llvm.ptr> + %253 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = llvm.mlir.constant(16 : index) : !llvm.i64 + %256 = llvm.mul %67, %255 : !llvm.i64 + %257 = llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = llvm.mul %48, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %252, %261 : !llvm.ptr> + %262 = llvm.add %157, %43 : !llvm.i64 + %263 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %264 = llvm.mlir.constant(0 : index) : !llvm.i64 + %265 = llvm.mlir.constant(512 : index) : !llvm.i64 + %266 = llvm.mul %153, %265 : !llvm.i64 + %267 = llvm.add %264, %266 : !llvm.i64 + %268 = llvm.mlir.constant(1 : index) : !llvm.i64 + %269 = llvm.mul %262, %268 : !llvm.i64 + %270 = llvm.add %267, %269 : !llvm.i64 + %271 = llvm.getelementptr %263[%270] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %272 = llvm.bitcast %271 : !llvm.ptr to !llvm.ptr> + %273 = llvm.load %272 {alignment = 4 : i64} : !llvm.ptr> + %274 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %275 = llvm.mlir.constant(0 : index) : !llvm.i64 + %276 = llvm.mlir.constant(16 : index) : !llvm.i64 + %277 = llvm.mul %67, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.mlir.constant(1 : index) : !llvm.i64 + %280 = llvm.mul %52, %279 : !llvm.i64 + %281 = llvm.add %278, %280 : !llvm.i64 + %282 = llvm.getelementptr %274[%281] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %273, %282 : !llvm.ptr> + %283 = llvm.add %157, %44 : !llvm.i64 + %284 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %153, %286 : !llvm.i64 + %288 = 
llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %283, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.bitcast %292 : !llvm.ptr to !llvm.ptr> + %294 = llvm.load %293 {alignment = 4 : i64} : !llvm.ptr> + %295 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %296 = llvm.mlir.constant(0 : index) : !llvm.i64 + %297 = llvm.mlir.constant(16 : index) : !llvm.i64 + %298 = llvm.mul %67, %297 : !llvm.i64 + %299 = llvm.add %296, %298 : !llvm.i64 + %300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %301 = llvm.mul %56, %300 : !llvm.i64 + %302 = llvm.add %299, %301 : !llvm.i64 + %303 = llvm.getelementptr %295[%302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %294, %303 : !llvm.ptr> + %304 = llvm.add %157, %46 : !llvm.i64 + %305 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %306 = llvm.mlir.constant(0 : index) : !llvm.i64 + %307 = llvm.mlir.constant(512 : index) : !llvm.i64 + %308 = llvm.mul %153, %307 : !llvm.i64 + %309 = llvm.add %306, %308 : !llvm.i64 + %310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %311 = llvm.mul %304, %310 : !llvm.i64 + %312 = llvm.add %309, %311 : !llvm.i64 + %313 = llvm.getelementptr %305[%312] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %314 = llvm.bitcast %313 : !llvm.ptr to !llvm.ptr> + %315 = llvm.load %314 {alignment = 4 : i64} : !llvm.ptr> + %316 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %318 = llvm.mlir.constant(16 : index) : !llvm.i64 + %319 = llvm.mul %67, %318 : !llvm.i64 + %320 = llvm.add %317, %319 : !llvm.i64 + %321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %322 = llvm.mul %61, %321 : !llvm.i64 + %323 = llvm.add %320, %322 : !llvm.i64 + %324 = llvm.getelementptr %316[%323] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %315, %324 : !llvm.ptr> + %325 = llvm.add %157, %47 : !llvm.i64 + %326 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %328 = llvm.mlir.constant(512 : index) : !llvm.i64 + %329 = llvm.mul %153, %328 : !llvm.i64 + %330 = llvm.add %327, %329 : !llvm.i64 + %331 = llvm.mlir.constant(1 : index) : !llvm.i64 + %332 = llvm.mul %325, %331 : !llvm.i64 + %333 = llvm.add %330, %332 : !llvm.i64 + %334 = llvm.getelementptr %326[%333] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %335 = llvm.bitcast %334 : !llvm.ptr to !llvm.ptr> + %336 = llvm.load %335 {alignment = 4 : i64} : !llvm.ptr> + %337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %340 = llvm.mul %67, %339 : !llvm.i64 + %341 = llvm.add %338, %340 : !llvm.i64 + %342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %343 = llvm.mul %70, %342 : !llvm.i64 + %344 = llvm.add %341, %343 : !llvm.i64 + %345 = llvm.getelementptr %337[%344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %336, %345 : !llvm.ptr> + %346 = llvm.add %157, %49 : !llvm.i64 + %347 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %350 = llvm.mul %153, %349 : !llvm.i64 + %351 = llvm.add %348, %350 
: !llvm.i64 + %352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %353 = llvm.mul %346, %352 : !llvm.i64 + %354 = llvm.add %351, %353 : !llvm.i64 + %355 = llvm.getelementptr %347[%354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %356 = llvm.bitcast %355 : !llvm.ptr to !llvm.ptr> + %357 = llvm.load %356 {alignment = 4 : i64} : !llvm.ptr> + %358 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %360 = llvm.mlir.constant(16 : index) : !llvm.i64 + %361 = llvm.mul %67, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %364 = llvm.mul %50, %363 : !llvm.i64 + %365 = llvm.add %362, %364 : !llvm.i64 + %366 = llvm.getelementptr %358[%365] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %357, %366 : !llvm.ptr> + %367 = llvm.add %157, %51 : !llvm.i64 + %368 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %371 = llvm.mul %153, %370 : !llvm.i64 + %372 = llvm.add %369, %371 : !llvm.i64 + %373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %374 = llvm.mul %367, %373 : !llvm.i64 + %375 = llvm.add %372, %374 : !llvm.i64 + %376 = llvm.getelementptr %368[%375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %377 = llvm.bitcast %376 : !llvm.ptr to !llvm.ptr> + %378 = llvm.load %377 {alignment = 4 : i64} : !llvm.ptr> + %379 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %381 = llvm.mlir.constant(16 : index) : !llvm.i64 + %382 = llvm.mul %67, %381 : !llvm.i64 + %383 = llvm.add %380, %382 : !llvm.i64 + %384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %385 = llvm.mul %33, %384 : !llvm.i64 + %386 = llvm.add %383, %385 : !llvm.i64 + %387 = llvm.getelementptr %379[%386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %378, %387 : !llvm.ptr> + %388 = llvm.add %157, %53 : !llvm.i64 + %389 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %392 = llvm.mul %153, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %395 = llvm.mul %388, %394 : !llvm.i64 + %396 = llvm.add %393, %395 : !llvm.i64 + %397 = llvm.getelementptr %389[%396] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %398 = llvm.bitcast %397 : !llvm.ptr to !llvm.ptr> + %399 = llvm.load %398 {alignment = 4 : i64} : !llvm.ptr> + %400 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %403 = llvm.mul %67, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %406 = llvm.mul %54, %405 : !llvm.i64 + %407 = llvm.add %404, %406 : !llvm.i64 + %408 = llvm.getelementptr %400[%407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %399, %408 : !llvm.ptr> + %409 = llvm.add %157, %55 : !llvm.i64 + %410 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %413 = llvm.mul %153, %412 : !llvm.i64 + %414 = llvm.add %411, %413 : !llvm.i64 + %415 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %416 = llvm.mul %409, %415 : !llvm.i64 + %417 = llvm.add %414, %416 : !llvm.i64 + %418 = llvm.getelementptr %410[%417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %419 = llvm.bitcast %418 : !llvm.ptr to !llvm.ptr> + %420 = llvm.load %419 {alignment = 4 : i64} : !llvm.ptr> + %421 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %423 = llvm.mlir.constant(16 : index) : !llvm.i64 + %424 = llvm.mul %67, %423 : !llvm.i64 + %425 = llvm.add %422, %424 : !llvm.i64 + %426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %427 = llvm.mul %34, %426 : !llvm.i64 + %428 = llvm.add %425, %427 : !llvm.i64 + %429 = llvm.getelementptr %421[%428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %420, %429 : !llvm.ptr> + %430 = llvm.add %157, %57 : !llvm.i64 + %431 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %434 = llvm.mul %153, %433 : !llvm.i64 + %435 = llvm.add %432, %434 : !llvm.i64 + %436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %437 = llvm.mul %430, %436 : !llvm.i64 + %438 = llvm.add %435, %437 : !llvm.i64 + %439 = llvm.getelementptr %431[%438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %440 = llvm.bitcast %439 : !llvm.ptr to !llvm.ptr> + %441 = llvm.load %440 {alignment = 4 : i64} : !llvm.ptr> + %442 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %444 = llvm.mlir.constant(16 : index) : !llvm.i64 + %445 = llvm.mul %67, %444 : !llvm.i64 + %446 = llvm.add %443, %445 : !llvm.i64 + %447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %448 = llvm.mul %58, %447 : !llvm.i64 + %449 = llvm.add %446, %448 : !llvm.i64 + %450 = llvm.getelementptr %442[%449] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %441, %450 : !llvm.ptr> + %451 = llvm.add %157, %59 : !llvm.i64 + %452 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %455 = llvm.mul %153, %454 : !llvm.i64 + %456 = llvm.add %453, %455 : !llvm.i64 + %457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %458 = llvm.mul %451, %457 : !llvm.i64 + %459 = llvm.add %456, %458 : !llvm.i64 + %460 = llvm.getelementptr %452[%459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %461 = llvm.bitcast %460 : !llvm.ptr to !llvm.ptr> + %462 = llvm.load %461 {alignment = 4 : i64} : !llvm.ptr> + %463 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %464 = llvm.mlir.constant(0 : index) : !llvm.i64 + %465 = llvm.mlir.constant(16 : index) : !llvm.i64 + %466 = llvm.mul %67, %465 : !llvm.i64 + %467 = llvm.add %464, %466 : !llvm.i64 + %468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %469 = llvm.mul %35, %468 : !llvm.i64 + %470 = llvm.add %467, %469 : !llvm.i64 + %471 = llvm.getelementptr %463[%470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %462, %471 : !llvm.ptr> + %472 = llvm.add %157, %62 : !llvm.i64 + %473 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %474 = llvm.mlir.constant(0 : index) : !llvm.i64 + %475 = llvm.mlir.constant(512 : index) : !llvm.i64 + %476 = llvm.mul %153, %475 : !llvm.i64 + %477 = llvm.add %474, %476 : !llvm.i64 + %478 = llvm.mlir.constant(1 
: index) : !llvm.i64 + %479 = llvm.mul %472, %478 : !llvm.i64 + %480 = llvm.add %477, %479 : !llvm.i64 + %481 = llvm.getelementptr %473[%480] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %482 = llvm.bitcast %481 : !llvm.ptr to !llvm.ptr> + %483 = llvm.load %482 {alignment = 4 : i64} : !llvm.ptr> + %484 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %485 = llvm.mlir.constant(0 : index) : !llvm.i64 + %486 = llvm.mlir.constant(16 : index) : !llvm.i64 + %487 = llvm.mul %67, %486 : !llvm.i64 + %488 = llvm.add %485, %487 : !llvm.i64 + %489 = llvm.mlir.constant(1 : index) : !llvm.i64 + %490 = llvm.mul %66, %489 : !llvm.i64 + %491 = llvm.add %488, %490 : !llvm.i64 + %492 = llvm.getelementptr %484[%491] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %483, %492 : !llvm.ptr> + %493 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %494 = llvm.mlir.constant(0 : index) : !llvm.i64 + %495 = llvm.mlir.constant(16 : index) : !llvm.i64 + %496 = llvm.mul %67, %495 : !llvm.i64 + %497 = llvm.add %494, %496 : !llvm.i64 + %498 = llvm.mlir.constant(1 : index) : !llvm.i64 + %499 = llvm.mul %67, %498 : !llvm.i64 + %500 = llvm.add %497, %499 : !llvm.i64 + %501 = llvm.getelementptr %493[%500] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %502 = llvm.load %501 : !llvm.ptr> + %503 = llvm.icmp "slt" %155, %67 : !llvm.i64 + %504 = llvm.sub %64, %155 : !llvm.i64 + %505 = llvm.select %503, %504, %155 : !llvm.i1, !llvm.i64 + %506 = llvm.sdiv %505, %68 : !llvm.i64 + %507 = llvm.sub %64, %506 : !llvm.i64 + %508 = llvm.select %503, %507, %506 : !llvm.i1, !llvm.i64 + %509 = llvm.srem %508, %68 : !llvm.i64 + %510 = llvm.icmp "slt" %509, %67 : !llvm.i64 + %511 = llvm.add %509, %68 : !llvm.i64 + %512 = llvm.select %510, %511, %509 : !llvm.i1, !llvm.i64 + %513 = llvm.srem %153, %39 : !llvm.i64 + %514 = llvm.icmp "slt" %513, %67 : !llvm.i64 + %515 = llvm.add %513, %39 : !llvm.i64 + %516 = llvm.select %514, %515, %513 : !llvm.i1, !llvm.i64 + %517 = llvm.srem %155, %68 : !llvm.i64 + %518 = llvm.icmp "slt" %517, %67 : !llvm.i64 + %519 = llvm.add %517, %68 : !llvm.i64 + %520 = llvm.select %518, %519, %517 : !llvm.i1, !llvm.i64 + %521 = llvm.icmp "slt" %520, %67 : !llvm.i64 + %522 = llvm.sub %64, %520 : !llvm.i64 + %523 = llvm.select %521, %522, %520 : !llvm.i1, !llvm.i64 + %524 = llvm.sdiv %523, %70 : !llvm.i64 + %525 = llvm.sub %64, %524 : !llvm.i64 + %526 = llvm.select %521, %525, %524 : !llvm.i1, !llvm.i64 + %527 = llvm.srem %526, %63 : !llvm.i64 + %528 = llvm.icmp "slt" %527, %67 : !llvm.i64 + %529 = llvm.add %527, %63 : !llvm.i64 + %530 = llvm.select %528, %529, %527 : !llvm.i1, !llvm.i64 + %531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %534 = llvm.mul %512, %533 : !llvm.i64 + %535 = llvm.add %532, %534 : !llvm.i64 + %536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %537 = llvm.mul %516, %536 : !llvm.i64 + %538 = llvm.add %535, %537 : !llvm.i64 + %539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %540 = llvm.mul %530, %539 : !llvm.i64 + %541 = llvm.add %538, %540 : !llvm.i64 + %542 = llvm.getelementptr %531[%541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %502, %542 : !llvm.ptr> + %543 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %544 = llvm.mlir.constant(0 : index) : !llvm.i64 + %545 = llvm.mlir.constant(16 : index) : 
!llvm.i64 + %546 = llvm.mul %67, %545 : !llvm.i64 + %547 = llvm.add %544, %546 : !llvm.i64 + %548 = llvm.mlir.constant(1 : index) : !llvm.i64 + %549 = llvm.mul %69, %548 : !llvm.i64 + %550 = llvm.add %547, %549 : !llvm.i64 + %551 = llvm.getelementptr %543[%550] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %552 = llvm.load %551 : !llvm.ptr> + %553 = llvm.add %155, %70 : !llvm.i64 + %554 = llvm.icmp "slt" %553, %67 : !llvm.i64 + %555 = llvm.sub %64, %553 : !llvm.i64 + %556 = llvm.select %554, %555, %553 : !llvm.i1, !llvm.i64 + %557 = llvm.sdiv %556, %68 : !llvm.i64 + %558 = llvm.sub %64, %557 : !llvm.i64 + %559 = llvm.select %554, %558, %557 : !llvm.i1, !llvm.i64 + %560 = llvm.srem %559, %68 : !llvm.i64 + %561 = llvm.icmp "slt" %560, %67 : !llvm.i64 + %562 = llvm.add %560, %68 : !llvm.i64 + %563 = llvm.select %561, %562, %560 : !llvm.i1, !llvm.i64 + %564 = llvm.sdiv %505, %70 : !llvm.i64 + %565 = llvm.sub %64, %564 : !llvm.i64 + %566 = llvm.select %503, %565, %564 : !llvm.i1, !llvm.i64 + %567 = llvm.mul %559, %65 : !llvm.i64 + %568 = llvm.add %566, %567 : !llvm.i64 + %569 = llvm.add %568, %69 : !llvm.i64 + %570 = llvm.icmp "slt" %569, %67 : !llvm.i64 + %571 = llvm.sub %64, %569 : !llvm.i64 + %572 = llvm.select %570, %571, %569 : !llvm.i1, !llvm.i64 + %573 = llvm.sdiv %572, %63 : !llvm.i64 + %574 = llvm.sub %64, %573 : !llvm.i64 + %575 = llvm.select %570, %574, %573 : !llvm.i1, !llvm.i64 + %576 = llvm.mul %575, %65 : !llvm.i64 + %577 = llvm.add %568, %576 : !llvm.i64 + %578 = llvm.add %577, %69 : !llvm.i64 + %579 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %581 = llvm.mlir.constant(256 : index) : !llvm.i64 + %582 = llvm.mul %563, %581 : !llvm.i64 + %583 = llvm.add %580, %582 : !llvm.i64 + %584 = llvm.mlir.constant(2 : index) : !llvm.i64 + %585 = llvm.mul %516, %584 : !llvm.i64 + %586 = llvm.add %583, %585 : !llvm.i64 + %587 = llvm.mlir.constant(1 : index) : !llvm.i64 + %588 = llvm.mul %578, %587 : !llvm.i64 + %589 = llvm.add %586, %588 : !llvm.i64 + %590 = llvm.getelementptr %579[%589] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %552, %590 : !llvm.ptr> + %591 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %593 = llvm.mlir.constant(16 : index) : !llvm.i64 + %594 = llvm.mul %67, %593 : !llvm.i64 + %595 = llvm.add %592, %594 : !llvm.i64 + %596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %597 = llvm.mul %63, %596 : !llvm.i64 + %598 = llvm.add %595, %597 : !llvm.i64 + %599 = llvm.getelementptr %591[%598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %600 = llvm.load %599 : !llvm.ptr> + %601 = llvm.add %508, %69 : !llvm.i64 + %602 = llvm.icmp "slt" %601, %67 : !llvm.i64 + %603 = llvm.sub %64, %601 : !llvm.i64 + %604 = llvm.select %602, %603, %601 : !llvm.i1, !llvm.i64 + %605 = llvm.sdiv %604, %68 : !llvm.i64 + %606 = llvm.sub %64, %605 : !llvm.i64 + %607 = llvm.select %602, %606, %605 : !llvm.i1, !llvm.i64 + %608 = llvm.mul %607, %60 : !llvm.i64 + %609 = llvm.add %508, %608 : !llvm.i64 + %610 = llvm.add %609, %69 : !llvm.i64 + %611 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %612 = llvm.mlir.constant(0 : index) : !llvm.i64 + %613 = llvm.mlir.constant(256 : index) : !llvm.i64 + %614 = llvm.mul %610, %613 : !llvm.i64 + %615 = llvm.add %612, %614 : !llvm.i64 + %616 = llvm.mlir.constant(2 : index) : !llvm.i64 + %617 = llvm.mul %516, %616 : !llvm.i64 
+ %618 = llvm.add %615, %617 : !llvm.i64 + %619 = llvm.mlir.constant(1 : index) : !llvm.i64 + %620 = llvm.mul %530, %619 : !llvm.i64 + %621 = llvm.add %618, %620 : !llvm.i64 + %622 = llvm.getelementptr %611[%621] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %600, %622 : !llvm.ptr> + %623 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %624 = llvm.mlir.constant(0 : index) : !llvm.i64 + %625 = llvm.mlir.constant(16 : index) : !llvm.i64 + %626 = llvm.mul %67, %625 : !llvm.i64 + %627 = llvm.add %624, %626 : !llvm.i64 + %628 = llvm.mlir.constant(1 : index) : !llvm.i64 + %629 = llvm.mul %45, %628 : !llvm.i64 + %630 = llvm.add %627, %629 : !llvm.i64 + %631 = llvm.getelementptr %623[%630] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %632 = llvm.load %631 : !llvm.ptr> + %633 = llvm.add %155, %41 : !llvm.i64 + %634 = llvm.icmp "slt" %633, %67 : !llvm.i64 + %635 = llvm.sub %64, %633 : !llvm.i64 + %636 = llvm.select %634, %635, %633 : !llvm.i1, !llvm.i64 + %637 = llvm.sdiv %636, %68 : !llvm.i64 + %638 = llvm.sub %64, %637 : !llvm.i64 + %639 = llvm.select %634, %638, %637 : !llvm.i1, !llvm.i64 + %640 = llvm.srem %639, %68 : !llvm.i64 + %641 = llvm.icmp "slt" %640, %67 : !llvm.i64 + %642 = llvm.add %640, %68 : !llvm.i64 + %643 = llvm.select %641, %642, %640 : !llvm.i1, !llvm.i64 + %644 = llvm.mul %639, %65 : !llvm.i64 + %645 = llvm.add %566, %644 : !llvm.i64 + %646 = llvm.add %645, %45 : !llvm.i64 + %647 = llvm.icmp "slt" %646, %67 : !llvm.i64 + %648 = llvm.sub %64, %646 : !llvm.i64 + %649 = llvm.select %647, %648, %646 : !llvm.i1, !llvm.i64 + %650 = llvm.sdiv %649, %63 : !llvm.i64 + %651 = llvm.sub %64, %650 : !llvm.i64 + %652 = llvm.select %647, %651, %650 : !llvm.i1, !llvm.i64 + %653 = llvm.mul %652, %65 : !llvm.i64 + %654 = llvm.add %645, %653 : !llvm.i64 + %655 = llvm.add %654, %45 : !llvm.i64 + %656 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %658 = llvm.mlir.constant(256 : index) : !llvm.i64 + %659 = llvm.mul %643, %658 : !llvm.i64 + %660 = llvm.add %657, %659 : !llvm.i64 + %661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %662 = llvm.mul %516, %661 : !llvm.i64 + %663 = llvm.add %660, %662 : !llvm.i64 + %664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %665 = llvm.mul %655, %664 : !llvm.i64 + %666 = llvm.add %663, %665 : !llvm.i64 + %667 = llvm.getelementptr %656[%666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %632, %667 : !llvm.ptr> + %668 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %670 = llvm.mlir.constant(16 : index) : !llvm.i64 + %671 = llvm.mul %67, %670 : !llvm.i64 + %672 = llvm.add %669, %671 : !llvm.i64 + %673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %674 = llvm.mul %48, %673 : !llvm.i64 + %675 = llvm.add %672, %674 : !llvm.i64 + %676 = llvm.getelementptr %668[%675] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %677 = llvm.load %676 : !llvm.ptr> + %678 = llvm.add %508, %63 : !llvm.i64 + %679 = llvm.icmp "slt" %678, %67 : !llvm.i64 + %680 = llvm.sub %64, %678 : !llvm.i64 + %681 = llvm.select %679, %680, %678 : !llvm.i1, !llvm.i64 + %682 = llvm.sdiv %681, %68 : !llvm.i64 + %683 = llvm.sub %64, %682 : !llvm.i64 + %684 = llvm.select %679, %683, %682 : !llvm.i1, !llvm.i64 + %685 = llvm.mul %684, %60 : !llvm.i64 + %686 = llvm.add %508, %685 : !llvm.i64 + %687 = llvm.add %686, %63 : !llvm.i64 + %688 = llvm.extractvalue %150[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %690 = llvm.mlir.constant(256 : index) : !llvm.i64 + %691 = llvm.mul %687, %690 : !llvm.i64 + %692 = llvm.add %689, %691 : !llvm.i64 + %693 = llvm.mlir.constant(2 : index) : !llvm.i64 + %694 = llvm.mul %516, %693 : !llvm.i64 + %695 = llvm.add %692, %694 : !llvm.i64 + %696 = llvm.mlir.constant(1 : index) : !llvm.i64 + %697 = llvm.mul %530, %696 : !llvm.i64 + %698 = llvm.add %695, %697 : !llvm.i64 + %699 = llvm.getelementptr %688[%698] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %677, %699 : !llvm.ptr> + %700 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %701 = llvm.mlir.constant(0 : index) : !llvm.i64 + %702 = llvm.mlir.constant(16 : index) : !llvm.i64 + %703 = llvm.mul %67, %702 : !llvm.i64 + %704 = llvm.add %701, %703 : !llvm.i64 + %705 = llvm.mlir.constant(1 : index) : !llvm.i64 + %706 = llvm.mul %52, %705 : !llvm.i64 + %707 = llvm.add %704, %706 : !llvm.i64 + %708 = llvm.getelementptr %700[%707] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %709 = llvm.load %708 : !llvm.ptr> + %710 = llvm.add %155, %43 : !llvm.i64 + %711 = llvm.icmp "slt" %710, %67 : !llvm.i64 + %712 = llvm.sub %64, %710 : !llvm.i64 + %713 = llvm.select %711, %712, %710 : !llvm.i1, !llvm.i64 + %714 = llvm.sdiv %713, %68 : !llvm.i64 + %715 = llvm.sub %64, %714 : !llvm.i64 + %716 = llvm.select %711, %715, %714 : !llvm.i1, !llvm.i64 + %717 = llvm.srem %716, %68 : !llvm.i64 + %718 = llvm.icmp "slt" %717, %67 : !llvm.i64 + %719 = llvm.add %717, %68 : !llvm.i64 + %720 = llvm.select %718, %719, %717 : !llvm.i1, !llvm.i64 + %721 = llvm.mul %716, %65 : !llvm.i64 + %722 = llvm.add %566, %721 : !llvm.i64 + %723 = llvm.add %722, %52 : !llvm.i64 + %724 = llvm.icmp "slt" %723, %67 : !llvm.i64 + %725 = llvm.sub %64, %723 : !llvm.i64 + %726 = llvm.select %724, %725, %723 : !llvm.i1, !llvm.i64 + %727 = llvm.sdiv %726, %63 : !llvm.i64 + %728 = llvm.sub %64, %727 : !llvm.i64 + %729 = llvm.select %724, %728, %727 : !llvm.i1, !llvm.i64 + %730 = llvm.mul %729, %65 : !llvm.i64 + %731 = llvm.add %722, %730 : !llvm.i64 + %732 = llvm.add %731, %52 : !llvm.i64 + %733 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %734 = llvm.mlir.constant(0 : index) : !llvm.i64 + %735 = llvm.mlir.constant(256 : index) : !llvm.i64 + %736 = llvm.mul %720, %735 : !llvm.i64 + %737 = llvm.add %734, %736 : !llvm.i64 + %738 = llvm.mlir.constant(2 : index) : !llvm.i64 + %739 = llvm.mul %516, %738 : !llvm.i64 + %740 = llvm.add %737, %739 : !llvm.i64 + %741 = llvm.mlir.constant(1 : index) : !llvm.i64 + %742 = llvm.mul %732, %741 : !llvm.i64 + %743 = llvm.add %740, %742 : !llvm.i64 + %744 = llvm.getelementptr %733[%743] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %709, %744 : !llvm.ptr> + %745 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %746 = llvm.mlir.constant(0 : index) : !llvm.i64 + %747 = llvm.mlir.constant(16 : index) : !llvm.i64 + %748 = llvm.mul %67, %747 : !llvm.i64 + %749 = llvm.add %746, %748 : !llvm.i64 + %750 = llvm.mlir.constant(1 : index) : !llvm.i64 + %751 = llvm.mul %56, %750 : !llvm.i64 + %752 = llvm.add %749, %751 : !llvm.i64 + %753 = llvm.getelementptr %745[%752] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %754 = llvm.load %753 : !llvm.ptr> + %755 = llvm.add %508, %45 : !llvm.i64 + %756 = llvm.icmp "slt" %755, %67 : !llvm.i64 + %757 = llvm.sub %64, %755 : !llvm.i64 + %758 = 
llvm.select %756, %757, %755 : !llvm.i1, !llvm.i64 + %759 = llvm.sdiv %758, %68 : !llvm.i64 + %760 = llvm.sub %64, %759 : !llvm.i64 + %761 = llvm.select %756, %760, %759 : !llvm.i1, !llvm.i64 + %762 = llvm.mul %761, %60 : !llvm.i64 + %763 = llvm.add %508, %762 : !llvm.i64 + %764 = llvm.add %763, %45 : !llvm.i64 + %765 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %766 = llvm.mlir.constant(0 : index) : !llvm.i64 + %767 = llvm.mlir.constant(256 : index) : !llvm.i64 + %768 = llvm.mul %764, %767 : !llvm.i64 + %769 = llvm.add %766, %768 : !llvm.i64 + %770 = llvm.mlir.constant(2 : index) : !llvm.i64 + %771 = llvm.mul %516, %770 : !llvm.i64 + %772 = llvm.add %769, %771 : !llvm.i64 + %773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %774 = llvm.mul %530, %773 : !llvm.i64 + %775 = llvm.add %772, %774 : !llvm.i64 + %776 = llvm.getelementptr %765[%775] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %754, %776 : !llvm.ptr> + %777 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %779 = llvm.mlir.constant(16 : index) : !llvm.i64 + %780 = llvm.mul %67, %779 : !llvm.i64 + %781 = llvm.add %778, %780 : !llvm.i64 + %782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %783 = llvm.mul %61, %782 : !llvm.i64 + %784 = llvm.add %781, %783 : !llvm.i64 + %785 = llvm.getelementptr %777[%784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %786 = llvm.load %785 : !llvm.ptr> + %787 = llvm.add %155, %46 : !llvm.i64 + %788 = llvm.icmp "slt" %787, %67 : !llvm.i64 + %789 = llvm.sub %64, %787 : !llvm.i64 + %790 = llvm.select %788, %789, %787 : !llvm.i1, !llvm.i64 + %791 = llvm.sdiv %790, %68 : !llvm.i64 + %792 = llvm.sub %64, %791 : !llvm.i64 + %793 = llvm.select %788, %792, %791 : !llvm.i1, !llvm.i64 + %794 = llvm.srem %793, %68 : !llvm.i64 + %795 = llvm.icmp "slt" %794, %67 : !llvm.i64 + %796 = llvm.add %794, %68 : !llvm.i64 + %797 = llvm.select %795, %796, %794 : !llvm.i1, !llvm.i64 + %798 = llvm.mul %793, %65 : !llvm.i64 + %799 = llvm.add %566, %798 : !llvm.i64 + %800 = llvm.add %799, %61 : !llvm.i64 + %801 = llvm.icmp "slt" %800, %67 : !llvm.i64 + %802 = llvm.sub %64, %800 : !llvm.i64 + %803 = llvm.select %801, %802, %800 : !llvm.i1, !llvm.i64 + %804 = llvm.sdiv %803, %63 : !llvm.i64 + %805 = llvm.sub %64, %804 : !llvm.i64 + %806 = llvm.select %801, %805, %804 : !llvm.i1, !llvm.i64 + %807 = llvm.mul %806, %65 : !llvm.i64 + %808 = llvm.add %799, %807 : !llvm.i64 + %809 = llvm.add %808, %61 : !llvm.i64 + %810 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %811 = llvm.mlir.constant(0 : index) : !llvm.i64 + %812 = llvm.mlir.constant(256 : index) : !llvm.i64 + %813 = llvm.mul %797, %812 : !llvm.i64 + %814 = llvm.add %811, %813 : !llvm.i64 + %815 = llvm.mlir.constant(2 : index) : !llvm.i64 + %816 = llvm.mul %516, %815 : !llvm.i64 + %817 = llvm.add %814, %816 : !llvm.i64 + %818 = llvm.mlir.constant(1 : index) : !llvm.i64 + %819 = llvm.mul %809, %818 : !llvm.i64 + %820 = llvm.add %817, %819 : !llvm.i64 + %821 = llvm.getelementptr %810[%820] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %786, %821 : !llvm.ptr> + %822 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %823 = llvm.mlir.constant(0 : index) : !llvm.i64 + %824 = llvm.mlir.constant(16 : index) : !llvm.i64 + %825 = llvm.mul %67, %824 : !llvm.i64 + %826 = llvm.add %823, %825 : !llvm.i64 + %827 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %828 = llvm.mul %70, %827 : !llvm.i64 + %829 = llvm.add %826, %828 : !llvm.i64 + %830 = llvm.getelementptr %822[%829] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %831 = llvm.load %830 : !llvm.ptr> + %832 = llvm.add %508, %48 : !llvm.i64 + %833 = llvm.icmp "slt" %832, %67 : !llvm.i64 + %834 = llvm.sub %64, %832 : !llvm.i64 + %835 = llvm.select %833, %834, %832 : !llvm.i1, !llvm.i64 + %836 = llvm.sdiv %835, %68 : !llvm.i64 + %837 = llvm.sub %64, %836 : !llvm.i64 + %838 = llvm.select %833, %837, %836 : !llvm.i1, !llvm.i64 + %839 = llvm.mul %838, %60 : !llvm.i64 + %840 = llvm.add %508, %839 : !llvm.i64 + %841 = llvm.add %840, %48 : !llvm.i64 + %842 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %843 = llvm.mlir.constant(0 : index) : !llvm.i64 + %844 = llvm.mlir.constant(256 : index) : !llvm.i64 + %845 = llvm.mul %841, %844 : !llvm.i64 + %846 = llvm.add %843, %845 : !llvm.i64 + %847 = llvm.mlir.constant(2 : index) : !llvm.i64 + %848 = llvm.mul %516, %847 : !llvm.i64 + %849 = llvm.add %846, %848 : !llvm.i64 + %850 = llvm.mlir.constant(1 : index) : !llvm.i64 + %851 = llvm.mul %530, %850 : !llvm.i64 + %852 = llvm.add %849, %851 : !llvm.i64 + %853 = llvm.getelementptr %842[%852] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %831, %853 : !llvm.ptr> + %854 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %855 = llvm.mlir.constant(0 : index) : !llvm.i64 + %856 = llvm.mlir.constant(16 : index) : !llvm.i64 + %857 = llvm.mul %67, %856 : !llvm.i64 + %858 = llvm.add %855, %857 : !llvm.i64 + %859 = llvm.mlir.constant(1 : index) : !llvm.i64 + %860 = llvm.mul %50, %859 : !llvm.i64 + %861 = llvm.add %858, %860 : !llvm.i64 + %862 = llvm.getelementptr %854[%861] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %863 = llvm.load %862 : !llvm.ptr> + %864 = llvm.add %155, %49 : !llvm.i64 + %865 = llvm.icmp "slt" %864, %67 : !llvm.i64 + %866 = llvm.sub %64, %864 : !llvm.i64 + %867 = llvm.select %865, %866, %864 : !llvm.i1, !llvm.i64 + %868 = llvm.sdiv %867, %68 : !llvm.i64 + %869 = llvm.sub %64, %868 : !llvm.i64 + %870 = llvm.select %865, %869, %868 : !llvm.i1, !llvm.i64 + %871 = llvm.srem %870, %68 : !llvm.i64 + %872 = llvm.icmp "slt" %871, %67 : !llvm.i64 + %873 = llvm.add %871, %68 : !llvm.i64 + %874 = llvm.select %872, %873, %871 : !llvm.i1, !llvm.i64 + %875 = llvm.mul %870, %65 : !llvm.i64 + %876 = llvm.add %566, %875 : !llvm.i64 + %877 = llvm.add %876, %50 : !llvm.i64 + %878 = llvm.icmp "slt" %877, %67 : !llvm.i64 + %879 = llvm.sub %64, %877 : !llvm.i64 + %880 = llvm.select %878, %879, %877 : !llvm.i1, !llvm.i64 + %881 = llvm.sdiv %880, %63 : !llvm.i64 + %882 = llvm.sub %64, %881 : !llvm.i64 + %883 = llvm.select %878, %882, %881 : !llvm.i1, !llvm.i64 + %884 = llvm.mul %883, %65 : !llvm.i64 + %885 = llvm.add %876, %884 : !llvm.i64 + %886 = llvm.add %885, %50 : !llvm.i64 + %887 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %888 = llvm.mlir.constant(0 : index) : !llvm.i64 + %889 = llvm.mlir.constant(256 : index) : !llvm.i64 + %890 = llvm.mul %874, %889 : !llvm.i64 + %891 = llvm.add %888, %890 : !llvm.i64 + %892 = llvm.mlir.constant(2 : index) : !llvm.i64 + %893 = llvm.mul %516, %892 : !llvm.i64 + %894 = llvm.add %891, %893 : !llvm.i64 + %895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %896 = llvm.mul %886, %895 : !llvm.i64 + %897 = llvm.add %894, %896 : !llvm.i64 + %898 = llvm.getelementptr %887[%897] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %863, 
%898 : !llvm.ptr> + %899 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %901 = llvm.mlir.constant(16 : index) : !llvm.i64 + %902 = llvm.mul %67, %901 : !llvm.i64 + %903 = llvm.add %900, %902 : !llvm.i64 + %904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %905 = llvm.mul %33, %904 : !llvm.i64 + %906 = llvm.add %903, %905 : !llvm.i64 + %907 = llvm.getelementptr %899[%906] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %908 = llvm.load %907 : !llvm.ptr> + %909 = llvm.add %508, %52 : !llvm.i64 + %910 = llvm.icmp "slt" %909, %67 : !llvm.i64 + %911 = llvm.sub %64, %909 : !llvm.i64 + %912 = llvm.select %910, %911, %909 : !llvm.i1, !llvm.i64 + %913 = llvm.sdiv %912, %68 : !llvm.i64 + %914 = llvm.sub %64, %913 : !llvm.i64 + %915 = llvm.select %910, %914, %913 : !llvm.i1, !llvm.i64 + %916 = llvm.mul %915, %60 : !llvm.i64 + %917 = llvm.add %508, %916 : !llvm.i64 + %918 = llvm.add %917, %52 : !llvm.i64 + %919 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %921 = llvm.mlir.constant(256 : index) : !llvm.i64 + %922 = llvm.mul %918, %921 : !llvm.i64 + %923 = llvm.add %920, %922 : !llvm.i64 + %924 = llvm.mlir.constant(2 : index) : !llvm.i64 + %925 = llvm.mul %516, %924 : !llvm.i64 + %926 = llvm.add %923, %925 : !llvm.i64 + %927 = llvm.mlir.constant(1 : index) : !llvm.i64 + %928 = llvm.mul %530, %927 : !llvm.i64 + %929 = llvm.add %926, %928 : !llvm.i64 + %930 = llvm.getelementptr %919[%929] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %908, %930 : !llvm.ptr> + %931 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %932 = llvm.mlir.constant(0 : index) : !llvm.i64 + %933 = llvm.mlir.constant(16 : index) : !llvm.i64 + %934 = llvm.mul %67, %933 : !llvm.i64 + %935 = llvm.add %932, %934 : !llvm.i64 + %936 = llvm.mlir.constant(1 : index) : !llvm.i64 + %937 = llvm.mul %54, %936 : !llvm.i64 + %938 = llvm.add %935, %937 : !llvm.i64 + %939 = llvm.getelementptr %931[%938] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %940 = llvm.load %939 : !llvm.ptr> + %941 = llvm.add %155, %53 : !llvm.i64 + %942 = llvm.icmp "slt" %941, %67 : !llvm.i64 + %943 = llvm.sub %64, %941 : !llvm.i64 + %944 = llvm.select %942, %943, %941 : !llvm.i1, !llvm.i64 + %945 = llvm.sdiv %944, %68 : !llvm.i64 + %946 = llvm.sub %64, %945 : !llvm.i64 + %947 = llvm.select %942, %946, %945 : !llvm.i1, !llvm.i64 + %948 = llvm.srem %947, %68 : !llvm.i64 + %949 = llvm.icmp "slt" %948, %67 : !llvm.i64 + %950 = llvm.add %948, %68 : !llvm.i64 + %951 = llvm.select %949, %950, %948 : !llvm.i1, !llvm.i64 + %952 = llvm.mul %947, %65 : !llvm.i64 + %953 = llvm.add %566, %952 : !llvm.i64 + %954 = llvm.add %953, %54 : !llvm.i64 + %955 = llvm.icmp "slt" %954, %67 : !llvm.i64 + %956 = llvm.sub %64, %954 : !llvm.i64 + %957 = llvm.select %955, %956, %954 : !llvm.i1, !llvm.i64 + %958 = llvm.sdiv %957, %63 : !llvm.i64 + %959 = llvm.sub %64, %958 : !llvm.i64 + %960 = llvm.select %955, %959, %958 : !llvm.i1, !llvm.i64 + %961 = llvm.mul %960, %65 : !llvm.i64 + %962 = llvm.add %953, %961 : !llvm.i64 + %963 = llvm.add %962, %54 : !llvm.i64 + %964 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %966 = llvm.mlir.constant(256 : index) : !llvm.i64 + %967 = llvm.mul %951, %966 : !llvm.i64 + %968 = llvm.add %965, %967 : !llvm.i64 + %969 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %970 = llvm.mul %516, %969 : !llvm.i64 + %971 = llvm.add %968, %970 : !llvm.i64 + %972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %973 = llvm.mul %963, %972 : !llvm.i64 + %974 = llvm.add %971, %973 : !llvm.i64 + %975 = llvm.getelementptr %964[%974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %940, %975 : !llvm.ptr> + %976 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %977 = llvm.mlir.constant(0 : index) : !llvm.i64 + %978 = llvm.mlir.constant(16 : index) : !llvm.i64 + %979 = llvm.mul %67, %978 : !llvm.i64 + %980 = llvm.add %977, %979 : !llvm.i64 + %981 = llvm.mlir.constant(1 : index) : !llvm.i64 + %982 = llvm.mul %34, %981 : !llvm.i64 + %983 = llvm.add %980, %982 : !llvm.i64 + %984 = llvm.getelementptr %976[%983] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %985 = llvm.load %984 : !llvm.ptr> + %986 = llvm.add %508, %56 : !llvm.i64 + %987 = llvm.icmp "slt" %986, %67 : !llvm.i64 + %988 = llvm.sub %64, %986 : !llvm.i64 + %989 = llvm.select %987, %988, %986 : !llvm.i1, !llvm.i64 + %990 = llvm.sdiv %989, %68 : !llvm.i64 + %991 = llvm.sub %64, %990 : !llvm.i64 + %992 = llvm.select %987, %991, %990 : !llvm.i1, !llvm.i64 + %993 = llvm.mul %992, %60 : !llvm.i64 + %994 = llvm.add %508, %993 : !llvm.i64 + %995 = llvm.add %994, %56 : !llvm.i64 + %996 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %997 = llvm.mlir.constant(0 : index) : !llvm.i64 + %998 = llvm.mlir.constant(256 : index) : !llvm.i64 + %999 = llvm.mul %995, %998 : !llvm.i64 + %1000 = llvm.add %997, %999 : !llvm.i64 + %1001 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1002 = llvm.mul %516, %1001 : !llvm.i64 + %1003 = llvm.add %1000, %1002 : !llvm.i64 + %1004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1005 = llvm.mul %530, %1004 : !llvm.i64 + %1006 = llvm.add %1003, %1005 : !llvm.i64 + %1007 = llvm.getelementptr %996[%1006] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %985, %1007 : !llvm.ptr> + %1008 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1010 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1011 = llvm.mul %67, %1010 : !llvm.i64 + %1012 = llvm.add %1009, %1011 : !llvm.i64 + %1013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1014 = llvm.mul %58, %1013 : !llvm.i64 + %1015 = llvm.add %1012, %1014 : !llvm.i64 + %1016 = llvm.getelementptr %1008[%1015] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1017 = llvm.load %1016 : !llvm.ptr> + %1018 = llvm.add %155, %57 : !llvm.i64 + %1019 = llvm.icmp "slt" %1018, %67 : !llvm.i64 + %1020 = llvm.sub %64, %1018 : !llvm.i64 + %1021 = llvm.select %1019, %1020, %1018 : !llvm.i1, !llvm.i64 + %1022 = llvm.sdiv %1021, %68 : !llvm.i64 + %1023 = llvm.sub %64, %1022 : !llvm.i64 + %1024 = llvm.select %1019, %1023, %1022 : !llvm.i1, !llvm.i64 + %1025 = llvm.srem %1024, %68 : !llvm.i64 + %1026 = llvm.icmp "slt" %1025, %67 : !llvm.i64 + %1027 = llvm.add %1025, %68 : !llvm.i64 + %1028 = llvm.select %1026, %1027, %1025 : !llvm.i1, !llvm.i64 + %1029 = llvm.mul %1024, %65 : !llvm.i64 + %1030 = llvm.add %566, %1029 : !llvm.i64 + %1031 = llvm.add %1030, %58 : !llvm.i64 + %1032 = llvm.icmp "slt" %1031, %67 : !llvm.i64 + %1033 = llvm.sub %64, %1031 : !llvm.i64 + %1034 = llvm.select %1032, %1033, %1031 : !llvm.i1, !llvm.i64 + %1035 = llvm.sdiv %1034, %63 : !llvm.i64 + %1036 = llvm.sub %64, %1035 : !llvm.i64 + %1037 = llvm.select %1032, %1036, %1035 : !llvm.i1, 
!llvm.i64 + %1038 = llvm.mul %1037, %65 : !llvm.i64 + %1039 = llvm.add %1030, %1038 : !llvm.i64 + %1040 = llvm.add %1039, %58 : !llvm.i64 + %1041 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1042 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1043 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1044 = llvm.mul %1028, %1043 : !llvm.i64 + %1045 = llvm.add %1042, %1044 : !llvm.i64 + %1046 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1047 = llvm.mul %516, %1046 : !llvm.i64 + %1048 = llvm.add %1045, %1047 : !llvm.i64 + %1049 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1050 = llvm.mul %1040, %1049 : !llvm.i64 + %1051 = llvm.add %1048, %1050 : !llvm.i64 + %1052 = llvm.getelementptr %1041[%1051] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1017, %1052 : !llvm.ptr> + %1053 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1054 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1055 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1056 = llvm.mul %67, %1055 : !llvm.i64 + %1057 = llvm.add %1054, %1056 : !llvm.i64 + %1058 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1059 = llvm.mul %35, %1058 : !llvm.i64 + %1060 = llvm.add %1057, %1059 : !llvm.i64 + %1061 = llvm.getelementptr %1053[%1060] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1062 = llvm.load %1061 : !llvm.ptr> + %1063 = llvm.add %508, %61 : !llvm.i64 + %1064 = llvm.icmp "slt" %1063, %67 : !llvm.i64 + %1065 = llvm.sub %64, %1063 : !llvm.i64 + %1066 = llvm.select %1064, %1065, %1063 : !llvm.i1, !llvm.i64 + %1067 = llvm.sdiv %1066, %68 : !llvm.i64 + %1068 = llvm.sub %64, %1067 : !llvm.i64 + %1069 = llvm.select %1064, %1068, %1067 : !llvm.i1, !llvm.i64 + %1070 = llvm.mul %1069, %60 : !llvm.i64 + %1071 = llvm.add %508, %1070 : !llvm.i64 + %1072 = llvm.add %1071, %61 : !llvm.i64 + %1073 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1074 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1075 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1076 = llvm.mul %1072, %1075 : !llvm.i64 + %1077 = llvm.add %1074, %1076 : !llvm.i64 + %1078 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1079 = llvm.mul %516, %1078 : !llvm.i64 + %1080 = llvm.add %1077, %1079 : !llvm.i64 + %1081 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1082 = llvm.mul %530, %1081 : !llvm.i64 + %1083 = llvm.add %1080, %1082 : !llvm.i64 + %1084 = llvm.getelementptr %1073[%1083] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1062, %1084 : !llvm.ptr> + %1085 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1086 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1087 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1088 = llvm.mul %67, %1087 : !llvm.i64 + %1089 = llvm.add %1086, %1088 : !llvm.i64 + %1090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1091 = llvm.mul %66, %1090 : !llvm.i64 + %1092 = llvm.add %1089, %1091 : !llvm.i64 + %1093 = llvm.getelementptr %1085[%1092] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1094 = llvm.load %1093 : !llvm.ptr> + %1095 = llvm.add %155, %62 : !llvm.i64 + %1096 = llvm.icmp "slt" %1095, %67 : !llvm.i64 + %1097 = llvm.sub %64, %1095 : !llvm.i64 + %1098 = llvm.select %1096, %1097, %1095 : !llvm.i1, !llvm.i64 + %1099 = llvm.sdiv %1098, %68 : !llvm.i64 + %1100 = llvm.sub %64, %1099 : !llvm.i64 + %1101 = llvm.select %1096, %1100, %1099 : !llvm.i1, !llvm.i64 + %1102 = llvm.srem %1101, %68 : !llvm.i64 + %1103 = llvm.icmp "slt" %1102, %67 : !llvm.i64 + %1104 = 
llvm.add %1102, %68 : !llvm.i64 + %1105 = llvm.select %1103, %1104, %1102 : !llvm.i1, !llvm.i64 + %1106 = llvm.mul %1101, %65 : !llvm.i64 + %1107 = llvm.add %566, %1106 : !llvm.i64 + %1108 = llvm.add %1107, %66 : !llvm.i64 + %1109 = llvm.icmp "slt" %1108, %67 : !llvm.i64 + %1110 = llvm.sub %64, %1108 : !llvm.i64 + %1111 = llvm.select %1109, %1110, %1108 : !llvm.i1, !llvm.i64 + %1112 = llvm.sdiv %1111, %63 : !llvm.i64 + %1113 = llvm.sub %64, %1112 : !llvm.i64 + %1114 = llvm.select %1109, %1113, %1112 : !llvm.i1, !llvm.i64 + %1115 = llvm.mul %1114, %65 : !llvm.i64 + %1116 = llvm.add %1107, %1115 : !llvm.i64 + %1117 = llvm.add %1116, %66 : !llvm.i64 + %1118 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1119 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1120 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1121 = llvm.mul %1105, %1120 : !llvm.i64 + %1122 = llvm.add %1119, %1121 : !llvm.i64 + %1123 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1124 = llvm.mul %516, %1123 : !llvm.i64 + %1125 = llvm.add %1122, %1124 : !llvm.i64 + %1126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1127 = llvm.mul %1117, %1126 : !llvm.i64 + %1128 = llvm.add %1125, %1127 : !llvm.i64 + %1129 = llvm.getelementptr %1118[%1128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1094, %1129 : !llvm.ptr> + llvm.br ^bb9 + ^bb8: // pred: ^bb6 + %1130 = llvm.add %151, %155 : !llvm.i64 + %1131 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1134 = llvm.mul %153, %1133 : !llvm.i64 + %1135 = llvm.add %1132, %1134 : !llvm.i64 + %1136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1137 = llvm.mul %1130, %1136 : !llvm.i64 + %1138 = llvm.add %1135, %1137 : !llvm.i64 + %1139 = llvm.getelementptr %1131[%1138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1140 = llvm.bitcast %1139 : !llvm.ptr to !llvm.ptr> + %1141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1143 = llvm.trunc %1130 : !llvm.i64 to !llvm.i32 + %1144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1145 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1146 = llvm.insertelement %1143, %1144[%1145 : !llvm.i32] : !llvm.vec<8 x i32> + %1147 = llvm.shufflevector %1146, %1144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1148 = llvm.add %1147, %1142 : !llvm.vec<8 x i32> + %1149 = llvm.trunc %1141 : !llvm.i64 to !llvm.i32 + %1150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1152 = llvm.insertelement %1149, %1150[%1151 : !llvm.i32] : !llvm.vec<8 x i32> + %1153 = llvm.shufflevector %1152, %1150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1154 = llvm.icmp "slt" %1148, %1153 : !llvm.vec<8 x i32> + %1155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1156 = llvm.intr.masked.load %1140, %1154, %1155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1157 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1159 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1160 = llvm.mul %67, %1159 : !llvm.i64 + %1161 = llvm.add %1158, 
%1160 : !llvm.i64 + %1162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1163 = llvm.mul %67, %1162 : !llvm.i64 + %1164 = llvm.add %1161, %1163 : !llvm.i64 + %1165 = llvm.getelementptr %1157[%1164] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1156, %1165 : !llvm.ptr> + %1166 = llvm.add %1130, %70 : !llvm.i64 + %1167 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1169 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1170 = llvm.mul %153, %1169 : !llvm.i64 + %1171 = llvm.add %1168, %1170 : !llvm.i64 + %1172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1173 = llvm.mul %1166, %1172 : !llvm.i64 + %1174 = llvm.add %1171, %1173 : !llvm.i64 + %1175 = llvm.getelementptr %1167[%1174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1176 = llvm.bitcast %1175 : !llvm.ptr to !llvm.ptr> + %1177 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1178 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1179 = llvm.trunc %1166 : !llvm.i64 to !llvm.i32 + %1180 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1181 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1182 = llvm.insertelement %1179, %1180[%1181 : !llvm.i32] : !llvm.vec<8 x i32> + %1183 = llvm.shufflevector %1182, %1180 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1184 = llvm.add %1183, %1178 : !llvm.vec<8 x i32> + %1185 = llvm.trunc %1177 : !llvm.i64 to !llvm.i32 + %1186 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1187 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1188 = llvm.insertelement %1185, %1186[%1187 : !llvm.i32] : !llvm.vec<8 x i32> + %1189 = llvm.shufflevector %1188, %1186 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1190 = llvm.icmp "slt" %1184, %1189 : !llvm.vec<8 x i32> + %1191 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1192 = llvm.intr.masked.load %1176, %1190, %1191 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1193 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1194 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1195 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1196 = llvm.mul %67, %1195 : !llvm.i64 + %1197 = llvm.add %1194, %1196 : !llvm.i64 + %1198 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1199 = llvm.mul %69, %1198 : !llvm.i64 + %1200 = llvm.add %1197, %1199 : !llvm.i64 + %1201 = llvm.getelementptr %1193[%1200] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1192, %1201 : !llvm.ptr> + %1202 = llvm.add %1130, %68 : !llvm.i64 + %1203 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1204 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1205 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1206 = llvm.mul %153, %1205 : !llvm.i64 + %1207 = llvm.add %1204, %1206 : !llvm.i64 + %1208 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1209 = llvm.mul %1202, %1208 : !llvm.i64 + %1210 = llvm.add %1207, %1209 : !llvm.i64 + %1211 = llvm.getelementptr %1203[%1210] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1212 = llvm.bitcast %1211 : !llvm.ptr to !llvm.ptr> + %1213 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1214 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1215 = llvm.trunc %1202 : !llvm.i64 to !llvm.i32 + %1216 = 
llvm.mlir.undef : !llvm.vec<8 x i32> + %1217 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1218 = llvm.insertelement %1215, %1216[%1217 : !llvm.i32] : !llvm.vec<8 x i32> + %1219 = llvm.shufflevector %1218, %1216 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1220 = llvm.add %1219, %1214 : !llvm.vec<8 x i32> + %1221 = llvm.trunc %1213 : !llvm.i64 to !llvm.i32 + %1222 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1223 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1224 = llvm.insertelement %1221, %1222[%1223 : !llvm.i32] : !llvm.vec<8 x i32> + %1225 = llvm.shufflevector %1224, %1222 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1226 = llvm.icmp "slt" %1220, %1225 : !llvm.vec<8 x i32> + %1227 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1228 = llvm.intr.masked.load %1212, %1226, %1227 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1229 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1230 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1231 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1232 = llvm.mul %67, %1231 : !llvm.i64 + %1233 = llvm.add %1230, %1232 : !llvm.i64 + %1234 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1235 = llvm.mul %63, %1234 : !llvm.i64 + %1236 = llvm.add %1233, %1235 : !llvm.i64 + %1237 = llvm.getelementptr %1229[%1236] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1228, %1237 : !llvm.ptr> + %1238 = llvm.add %1130, %41 : !llvm.i64 + %1239 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1242 = llvm.mul %153, %1241 : !llvm.i64 + %1243 = llvm.add %1240, %1242 : !llvm.i64 + %1244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1245 = llvm.mul %1238, %1244 : !llvm.i64 + %1246 = llvm.add %1243, %1245 : !llvm.i64 + %1247 = llvm.getelementptr %1239[%1246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1248 = llvm.bitcast %1247 : !llvm.ptr to !llvm.ptr> + %1249 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1250 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1251 = llvm.trunc %1238 : !llvm.i64 to !llvm.i32 + %1252 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1253 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1254 = llvm.insertelement %1251, %1252[%1253 : !llvm.i32] : !llvm.vec<8 x i32> + %1255 = llvm.shufflevector %1254, %1252 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1256 = llvm.add %1255, %1250 : !llvm.vec<8 x i32> + %1257 = llvm.trunc %1249 : !llvm.i64 to !llvm.i32 + %1258 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1259 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1260 = llvm.insertelement %1257, %1258[%1259 : !llvm.i32] : !llvm.vec<8 x i32> + %1261 = llvm.shufflevector %1260, %1258 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1262 = llvm.icmp "slt" %1256, %1261 : !llvm.vec<8 x i32> + %1263 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1264 = llvm.intr.masked.load %1248, %1262, %1263 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1265 = llvm.extractvalue %90[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1266 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1267 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1268 = llvm.mul %67, %1267 : !llvm.i64 + %1269 = llvm.add %1266, %1268 : !llvm.i64 + %1270 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1271 = llvm.mul %45, %1270 : !llvm.i64 + %1272 = llvm.add %1269, %1271 : !llvm.i64 + %1273 = llvm.getelementptr %1265[%1272] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1264, %1273 : !llvm.ptr> + %1274 = llvm.add %1130, %42 : !llvm.i64 + %1275 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1276 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1277 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1278 = llvm.mul %153, %1277 : !llvm.i64 + %1279 = llvm.add %1276, %1278 : !llvm.i64 + %1280 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1281 = llvm.mul %1274, %1280 : !llvm.i64 + %1282 = llvm.add %1279, %1281 : !llvm.i64 + %1283 = llvm.getelementptr %1275[%1282] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1284 = llvm.bitcast %1283 : !llvm.ptr to !llvm.ptr> + %1285 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1286 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1287 = llvm.trunc %1274 : !llvm.i64 to !llvm.i32 + %1288 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1289 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1290 = llvm.insertelement %1287, %1288[%1289 : !llvm.i32] : !llvm.vec<8 x i32> + %1291 = llvm.shufflevector %1290, %1288 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1292 = llvm.add %1291, %1286 : !llvm.vec<8 x i32> + %1293 = llvm.trunc %1285 : !llvm.i64 to !llvm.i32 + %1294 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1295 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1296 = llvm.insertelement %1293, %1294[%1295 : !llvm.i32] : !llvm.vec<8 x i32> + %1297 = llvm.shufflevector %1296, %1294 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1298 = llvm.icmp "slt" %1292, %1297 : !llvm.vec<8 x i32> + %1299 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1300 = llvm.intr.masked.load %1284, %1298, %1299 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1301 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1302 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1303 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1304 = llvm.mul %67, %1303 : !llvm.i64 + %1305 = llvm.add %1302, %1304 : !llvm.i64 + %1306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1307 = llvm.mul %48, %1306 : !llvm.i64 + %1308 = llvm.add %1305, %1307 : !llvm.i64 + %1309 = llvm.getelementptr %1301[%1308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1300, %1309 : !llvm.ptr> + %1310 = llvm.add %1130, %43 : !llvm.i64 + %1311 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1314 = llvm.mul %153, %1313 : !llvm.i64 + %1315 = llvm.add %1312, %1314 : !llvm.i64 + %1316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1317 = llvm.mul %1310, %1316 : !llvm.i64 + %1318 = llvm.add %1315, %1317 : !llvm.i64 + %1319 = llvm.getelementptr %1311[%1318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1320 = llvm.bitcast %1319 : !llvm.ptr to 
!llvm.ptr> + %1321 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1322 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1323 = llvm.trunc %1310 : !llvm.i64 to !llvm.i32 + %1324 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1325 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1326 = llvm.insertelement %1323, %1324[%1325 : !llvm.i32] : !llvm.vec<8 x i32> + %1327 = llvm.shufflevector %1326, %1324 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1328 = llvm.add %1327, %1322 : !llvm.vec<8 x i32> + %1329 = llvm.trunc %1321 : !llvm.i64 to !llvm.i32 + %1330 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1331 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1332 = llvm.insertelement %1329, %1330[%1331 : !llvm.i32] : !llvm.vec<8 x i32> + %1333 = llvm.shufflevector %1332, %1330 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1334 = llvm.icmp "slt" %1328, %1333 : !llvm.vec<8 x i32> + %1335 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1336 = llvm.intr.masked.load %1320, %1334, %1335 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1340 = llvm.mul %67, %1339 : !llvm.i64 + %1341 = llvm.add %1338, %1340 : !llvm.i64 + %1342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1343 = llvm.mul %52, %1342 : !llvm.i64 + %1344 = llvm.add %1341, %1343 : !llvm.i64 + %1345 = llvm.getelementptr %1337[%1344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1336, %1345 : !llvm.ptr> + %1346 = llvm.add %1130, %44 : !llvm.i64 + %1347 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1350 = llvm.mul %153, %1349 : !llvm.i64 + %1351 = llvm.add %1348, %1350 : !llvm.i64 + %1352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1353 = llvm.mul %1346, %1352 : !llvm.i64 + %1354 = llvm.add %1351, %1353 : !llvm.i64 + %1355 = llvm.getelementptr %1347[%1354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1356 = llvm.bitcast %1355 : !llvm.ptr to !llvm.ptr> + %1357 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1358 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1359 = llvm.trunc %1346 : !llvm.i64 to !llvm.i32 + %1360 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1361 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1362 = llvm.insertelement %1359, %1360[%1361 : !llvm.i32] : !llvm.vec<8 x i32> + %1363 = llvm.shufflevector %1362, %1360 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1364 = llvm.add %1363, %1358 : !llvm.vec<8 x i32> + %1365 = llvm.trunc %1357 : !llvm.i64 to !llvm.i32 + %1366 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1367 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1368 = llvm.insertelement %1365, %1366[%1367 : !llvm.i32] : !llvm.vec<8 x i32> + %1369 = llvm.shufflevector %1368, %1366 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1370 = llvm.icmp "slt" %1364, %1369 : !llvm.vec<8 x i32> + %1371 = llvm.mlir.constant(dense<0.000000e+00> : 
vector<8xf32>) : !llvm.vec<8 x float> + %1372 = llvm.intr.masked.load %1356, %1370, %1371 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1373 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1375 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1376 = llvm.mul %67, %1375 : !llvm.i64 + %1377 = llvm.add %1374, %1376 : !llvm.i64 + %1378 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1379 = llvm.mul %56, %1378 : !llvm.i64 + %1380 = llvm.add %1377, %1379 : !llvm.i64 + %1381 = llvm.getelementptr %1373[%1380] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1372, %1381 : !llvm.ptr> + %1382 = llvm.add %1130, %46 : !llvm.i64 + %1383 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1384 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1385 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1386 = llvm.mul %153, %1385 : !llvm.i64 + %1387 = llvm.add %1384, %1386 : !llvm.i64 + %1388 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1389 = llvm.mul %1382, %1388 : !llvm.i64 + %1390 = llvm.add %1387, %1389 : !llvm.i64 + %1391 = llvm.getelementptr %1383[%1390] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1392 = llvm.bitcast %1391 : !llvm.ptr to !llvm.ptr> + %1393 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1394 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1395 = llvm.trunc %1382 : !llvm.i64 to !llvm.i32 + %1396 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1397 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1398 = llvm.insertelement %1395, %1396[%1397 : !llvm.i32] : !llvm.vec<8 x i32> + %1399 = llvm.shufflevector %1398, %1396 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1400 = llvm.add %1399, %1394 : !llvm.vec<8 x i32> + %1401 = llvm.trunc %1393 : !llvm.i64 to !llvm.i32 + %1402 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1403 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1404 = llvm.insertelement %1401, %1402[%1403 : !llvm.i32] : !llvm.vec<8 x i32> + %1405 = llvm.shufflevector %1404, %1402 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1406 = llvm.icmp "slt" %1400, %1405 : !llvm.vec<8 x i32> + %1407 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1408 = llvm.intr.masked.load %1392, %1406, %1407 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1409 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1410 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1411 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1412 = llvm.mul %67, %1411 : !llvm.i64 + %1413 = llvm.add %1410, %1412 : !llvm.i64 + %1414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1415 = llvm.mul %61, %1414 : !llvm.i64 + %1416 = llvm.add %1413, %1415 : !llvm.i64 + %1417 = llvm.getelementptr %1409[%1416] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1408, %1417 : !llvm.ptr> + %1418 = llvm.add %1130, %47 : !llvm.i64 + %1419 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1422 = llvm.mul %153, %1421 : !llvm.i64 + %1423 = llvm.add %1420, %1422 : !llvm.i64 + %1424 = llvm.mlir.constant(1 : 
index) : !llvm.i64 + %1425 = llvm.mul %1418, %1424 : !llvm.i64 + %1426 = llvm.add %1423, %1425 : !llvm.i64 + %1427 = llvm.getelementptr %1419[%1426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1428 = llvm.bitcast %1427 : !llvm.ptr to !llvm.ptr> + %1429 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1430 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1431 = llvm.trunc %1418 : !llvm.i64 to !llvm.i32 + %1432 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1433 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1434 = llvm.insertelement %1431, %1432[%1433 : !llvm.i32] : !llvm.vec<8 x i32> + %1435 = llvm.shufflevector %1434, %1432 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1436 = llvm.add %1435, %1430 : !llvm.vec<8 x i32> + %1437 = llvm.trunc %1429 : !llvm.i64 to !llvm.i32 + %1438 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1439 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1440 = llvm.insertelement %1437, %1438[%1439 : !llvm.i32] : !llvm.vec<8 x i32> + %1441 = llvm.shufflevector %1440, %1438 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1442 = llvm.icmp "slt" %1436, %1441 : !llvm.vec<8 x i32> + %1443 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1444 = llvm.intr.masked.load %1428, %1442, %1443 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1445 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1446 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1447 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1448 = llvm.mul %67, %1447 : !llvm.i64 + %1449 = llvm.add %1446, %1448 : !llvm.i64 + %1450 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1451 = llvm.mul %70, %1450 : !llvm.i64 + %1452 = llvm.add %1449, %1451 : !llvm.i64 + %1453 = llvm.getelementptr %1445[%1452] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1444, %1453 : !llvm.ptr> + %1454 = llvm.add %1130, %49 : !llvm.i64 + %1455 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1456 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1457 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1458 = llvm.mul %153, %1457 : !llvm.i64 + %1459 = llvm.add %1456, %1458 : !llvm.i64 + %1460 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1461 = llvm.mul %1454, %1460 : !llvm.i64 + %1462 = llvm.add %1459, %1461 : !llvm.i64 + %1463 = llvm.getelementptr %1455[%1462] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1464 = llvm.bitcast %1463 : !llvm.ptr to !llvm.ptr> + %1465 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1466 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1467 = llvm.trunc %1454 : !llvm.i64 to !llvm.i32 + %1468 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1469 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1470 = llvm.insertelement %1467, %1468[%1469 : !llvm.i32] : !llvm.vec<8 x i32> + %1471 = llvm.shufflevector %1470, %1468 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1472 = llvm.add %1471, %1466 : !llvm.vec<8 x i32> + %1473 = llvm.trunc %1465 : !llvm.i64 to !llvm.i32 + %1474 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1475 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1476 = llvm.insertelement %1473, %1474[%1475 : !llvm.i32] : !llvm.vec<8 x i32> + %1477 = llvm.shufflevector %1476, %1474 [0 
: i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1478 = llvm.icmp "slt" %1472, %1477 : !llvm.vec<8 x i32> + %1479 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1480 = llvm.intr.masked.load %1464, %1478, %1479 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1481 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1482 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1483 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1484 = llvm.mul %67, %1483 : !llvm.i64 + %1485 = llvm.add %1482, %1484 : !llvm.i64 + %1486 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1487 = llvm.mul %50, %1486 : !llvm.i64 + %1488 = llvm.add %1485, %1487 : !llvm.i64 + %1489 = llvm.getelementptr %1481[%1488] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1480, %1489 : !llvm.ptr> + %1490 = llvm.add %1130, %51 : !llvm.i64 + %1491 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1492 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1493 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1494 = llvm.mul %153, %1493 : !llvm.i64 + %1495 = llvm.add %1492, %1494 : !llvm.i64 + %1496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1497 = llvm.mul %1490, %1496 : !llvm.i64 + %1498 = llvm.add %1495, %1497 : !llvm.i64 + %1499 = llvm.getelementptr %1491[%1498] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1500 = llvm.bitcast %1499 : !llvm.ptr to !llvm.ptr> + %1501 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1502 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1503 = llvm.trunc %1490 : !llvm.i64 to !llvm.i32 + %1504 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1505 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1506 = llvm.insertelement %1503, %1504[%1505 : !llvm.i32] : !llvm.vec<8 x i32> + %1507 = llvm.shufflevector %1506, %1504 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1508 = llvm.add %1507, %1502 : !llvm.vec<8 x i32> + %1509 = llvm.trunc %1501 : !llvm.i64 to !llvm.i32 + %1510 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1511 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1512 = llvm.insertelement %1509, %1510[%1511 : !llvm.i32] : !llvm.vec<8 x i32> + %1513 = llvm.shufflevector %1512, %1510 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1514 = llvm.icmp "slt" %1508, %1513 : !llvm.vec<8 x i32> + %1515 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1516 = llvm.intr.masked.load %1500, %1514, %1515 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1517 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1519 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1520 = llvm.mul %67, %1519 : !llvm.i64 + %1521 = llvm.add %1518, %1520 : !llvm.i64 + %1522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1523 = llvm.mul %33, %1522 : !llvm.i64 + %1524 = llvm.add %1521, %1523 : !llvm.i64 + %1525 = llvm.getelementptr %1517[%1524] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1516, %1525 : !llvm.ptr> + %1526 = llvm.add %1130, %53 : !llvm.i64 + %1527 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + 
%1528 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1529 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1530 = llvm.mul %153, %1529 : !llvm.i64 + %1531 = llvm.add %1528, %1530 : !llvm.i64 + %1532 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1533 = llvm.mul %1526, %1532 : !llvm.i64 + %1534 = llvm.add %1531, %1533 : !llvm.i64 + %1535 = llvm.getelementptr %1527[%1534] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1536 = llvm.bitcast %1535 : !llvm.ptr to !llvm.ptr> + %1537 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1538 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1539 = llvm.trunc %1526 : !llvm.i64 to !llvm.i32 + %1540 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1541 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1542 = llvm.insertelement %1539, %1540[%1541 : !llvm.i32] : !llvm.vec<8 x i32> + %1543 = llvm.shufflevector %1542, %1540 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1544 = llvm.add %1543, %1538 : !llvm.vec<8 x i32> + %1545 = llvm.trunc %1537 : !llvm.i64 to !llvm.i32 + %1546 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1547 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1548 = llvm.insertelement %1545, %1546[%1547 : !llvm.i32] : !llvm.vec<8 x i32> + %1549 = llvm.shufflevector %1548, %1546 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1550 = llvm.icmp "slt" %1544, %1549 : !llvm.vec<8 x i32> + %1551 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1552 = llvm.intr.masked.load %1536, %1550, %1551 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1553 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1554 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1555 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1556 = llvm.mul %67, %1555 : !llvm.i64 + %1557 = llvm.add %1554, %1556 : !llvm.i64 + %1558 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1559 = llvm.mul %54, %1558 : !llvm.i64 + %1560 = llvm.add %1557, %1559 : !llvm.i64 + %1561 = llvm.getelementptr %1553[%1560] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1552, %1561 : !llvm.ptr> + %1562 = llvm.add %1130, %55 : !llvm.i64 + %1563 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1564 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1565 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1566 = llvm.mul %153, %1565 : !llvm.i64 + %1567 = llvm.add %1564, %1566 : !llvm.i64 + %1568 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1569 = llvm.mul %1562, %1568 : !llvm.i64 + %1570 = llvm.add %1567, %1569 : !llvm.i64 + %1571 = llvm.getelementptr %1563[%1570] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1572 = llvm.bitcast %1571 : !llvm.ptr to !llvm.ptr> + %1573 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1574 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1575 = llvm.trunc %1562 : !llvm.i64 to !llvm.i32 + %1576 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1577 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1578 = llvm.insertelement %1575, %1576[%1577 : !llvm.i32] : !llvm.vec<8 x i32> + %1579 = llvm.shufflevector %1578, %1576 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1580 = llvm.add %1579, %1574 : !llvm.vec<8 x i32> + %1581 = llvm.trunc %1573 : !llvm.i64 to !llvm.i32 
+ %1582 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1583 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1584 = llvm.insertelement %1581, %1582[%1583 : !llvm.i32] : !llvm.vec<8 x i32> + %1585 = llvm.shufflevector %1584, %1582 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1586 = llvm.icmp "slt" %1580, %1585 : !llvm.vec<8 x i32> + %1587 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1588 = llvm.intr.masked.load %1572, %1586, %1587 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1589 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1591 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1592 = llvm.mul %67, %1591 : !llvm.i64 + %1593 = llvm.add %1590, %1592 : !llvm.i64 + %1594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1595 = llvm.mul %34, %1594 : !llvm.i64 + %1596 = llvm.add %1593, %1595 : !llvm.i64 + %1597 = llvm.getelementptr %1589[%1596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1588, %1597 : !llvm.ptr> + %1598 = llvm.add %1130, %57 : !llvm.i64 + %1599 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1602 = llvm.mul %153, %1601 : !llvm.i64 + %1603 = llvm.add %1600, %1602 : !llvm.i64 + %1604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1605 = llvm.mul %1598, %1604 : !llvm.i64 + %1606 = llvm.add %1603, %1605 : !llvm.i64 + %1607 = llvm.getelementptr %1599[%1606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1608 = llvm.bitcast %1607 : !llvm.ptr to !llvm.ptr> + %1609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1610 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1611 = llvm.trunc %1598 : !llvm.i64 to !llvm.i32 + %1612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1614 = llvm.insertelement %1611, %1612[%1613 : !llvm.i32] : !llvm.vec<8 x i32> + %1615 = llvm.shufflevector %1614, %1612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1616 = llvm.add %1615, %1610 : !llvm.vec<8 x i32> + %1617 = llvm.trunc %1609 : !llvm.i64 to !llvm.i32 + %1618 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1620 = llvm.insertelement %1617, %1618[%1619 : !llvm.i32] : !llvm.vec<8 x i32> + %1621 = llvm.shufflevector %1620, %1618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1622 = llvm.icmp "slt" %1616, %1621 : !llvm.vec<8 x i32> + %1623 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1624 = llvm.intr.masked.load %1608, %1622, %1623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1625 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1626 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1627 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1628 = llvm.mul %67, %1627 : !llvm.i64 + %1629 = llvm.add %1626, %1628 : !llvm.i64 + %1630 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1631 = llvm.mul %58, %1630 : !llvm.i64 + %1632 = llvm.add %1629, %1631 : !llvm.i64 + %1633 = llvm.getelementptr %1625[%1632] : 
(!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1624, %1633 : !llvm.ptr> + %1634 = llvm.add %1130, %59 : !llvm.i64 + %1635 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1637 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1638 = llvm.mul %153, %1637 : !llvm.i64 + %1639 = llvm.add %1636, %1638 : !llvm.i64 + %1640 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1641 = llvm.mul %1634, %1640 : !llvm.i64 + %1642 = llvm.add %1639, %1641 : !llvm.i64 + %1643 = llvm.getelementptr %1635[%1642] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1644 = llvm.bitcast %1643 : !llvm.ptr to !llvm.ptr> + %1645 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1646 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1647 = llvm.trunc %1634 : !llvm.i64 to !llvm.i32 + %1648 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1649 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1650 = llvm.insertelement %1647, %1648[%1649 : !llvm.i32] : !llvm.vec<8 x i32> + %1651 = llvm.shufflevector %1650, %1648 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1652 = llvm.add %1651, %1646 : !llvm.vec<8 x i32> + %1653 = llvm.trunc %1645 : !llvm.i64 to !llvm.i32 + %1654 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1655 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1656 = llvm.insertelement %1653, %1654[%1655 : !llvm.i32] : !llvm.vec<8 x i32> + %1657 = llvm.shufflevector %1656, %1654 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1658 = llvm.icmp "slt" %1652, %1657 : !llvm.vec<8 x i32> + %1659 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1660 = llvm.intr.masked.load %1644, %1658, %1659 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1661 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1662 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1663 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1664 = llvm.mul %67, %1663 : !llvm.i64 + %1665 = llvm.add %1662, %1664 : !llvm.i64 + %1666 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1667 = llvm.mul %35, %1666 : !llvm.i64 + %1668 = llvm.add %1665, %1667 : !llvm.i64 + %1669 = llvm.getelementptr %1661[%1668] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1660, %1669 : !llvm.ptr> + %1670 = llvm.add %1130, %62 : !llvm.i64 + %1671 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1672 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1673 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1674 = llvm.mul %153, %1673 : !llvm.i64 + %1675 = llvm.add %1672, %1674 : !llvm.i64 + %1676 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1677 = llvm.mul %1670, %1676 : !llvm.i64 + %1678 = llvm.add %1675, %1677 : !llvm.i64 + %1679 = llvm.getelementptr %1671[%1678] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1680 = llvm.bitcast %1679 : !llvm.ptr to !llvm.ptr> + %1681 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1682 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1683 = llvm.trunc %1670 : !llvm.i64 to !llvm.i32 + %1684 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1685 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1686 = llvm.insertelement %1683, %1684[%1685 : !llvm.i32] : !llvm.vec<8 x i32> + %1687 = llvm.shufflevector %1686, %1684 
[0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1688 = llvm.add %1687, %1682 : !llvm.vec<8 x i32> + %1689 = llvm.trunc %1681 : !llvm.i64 to !llvm.i32 + %1690 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1691 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1692 = llvm.insertelement %1689, %1690[%1691 : !llvm.i32] : !llvm.vec<8 x i32> + %1693 = llvm.shufflevector %1692, %1690 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1694 = llvm.icmp "slt" %1688, %1693 : !llvm.vec<8 x i32> + %1695 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1696 = llvm.intr.masked.load %1680, %1694, %1695 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1697 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1698 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1699 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1700 = llvm.mul %67, %1699 : !llvm.i64 + %1701 = llvm.add %1698, %1700 : !llvm.i64 + %1702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1703 = llvm.mul %66, %1702 : !llvm.i64 + %1704 = llvm.add %1701, %1703 : !llvm.i64 + %1705 = llvm.getelementptr %1697[%1704] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1696, %1705 : !llvm.ptr> + %1706 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1708 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1709 = llvm.mul %67, %1708 : !llvm.i64 + %1710 = llvm.add %1707, %1709 : !llvm.i64 + %1711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1712 = llvm.mul %67, %1711 : !llvm.i64 + %1713 = llvm.add %1710, %1712 : !llvm.i64 + %1714 = llvm.getelementptr %1706[%1713] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1715 = llvm.load %1714 : !llvm.ptr> + %1716 = llvm.icmp "slt" %155, %67 : !llvm.i64 + %1717 = llvm.sub %64, %155 : !llvm.i64 + %1718 = llvm.select %1716, %1717, %155 : !llvm.i1, !llvm.i64 + %1719 = llvm.sdiv %1718, %68 : !llvm.i64 + %1720 = llvm.sub %64, %1719 : !llvm.i64 + %1721 = llvm.select %1716, %1720, %1719 : !llvm.i1, !llvm.i64 + %1722 = llvm.srem %1721, %68 : !llvm.i64 + %1723 = llvm.icmp "slt" %1722, %67 : !llvm.i64 + %1724 = llvm.add %1722, %68 : !llvm.i64 + %1725 = llvm.select %1723, %1724, %1722 : !llvm.i1, !llvm.i64 + %1726 = llvm.srem %153, %39 : !llvm.i64 + %1727 = llvm.icmp "slt" %1726, %67 : !llvm.i64 + %1728 = llvm.add %1726, %39 : !llvm.i64 + %1729 = llvm.select %1727, %1728, %1726 : !llvm.i1, !llvm.i64 + %1730 = llvm.srem %155, %68 : !llvm.i64 + %1731 = llvm.icmp "slt" %1730, %67 : !llvm.i64 + %1732 = llvm.add %1730, %68 : !llvm.i64 + %1733 = llvm.select %1731, %1732, %1730 : !llvm.i1, !llvm.i64 + %1734 = llvm.icmp "slt" %1733, %67 : !llvm.i64 + %1735 = llvm.sub %64, %1733 : !llvm.i64 + %1736 = llvm.select %1734, %1735, %1733 : !llvm.i1, !llvm.i64 + %1737 = llvm.sdiv %1736, %70 : !llvm.i64 + %1738 = llvm.sub %64, %1737 : !llvm.i64 + %1739 = llvm.select %1734, %1738, %1737 : !llvm.i1, !llvm.i64 + %1740 = llvm.srem %1739, %63 : !llvm.i64 + %1741 = llvm.icmp "slt" %1740, %67 : !llvm.i64 + %1742 = llvm.add %1740, %63 : !llvm.i64 + %1743 = llvm.select %1741, %1742, %1740 : !llvm.i1, !llvm.i64 + %1744 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1746 = llvm.mlir.constant(256 : index) : 
!llvm.i64 + %1747 = llvm.mul %1725, %1746 : !llvm.i64 + %1748 = llvm.add %1745, %1747 : !llvm.i64 + %1749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1750 = llvm.mul %1729, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %1743, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1744[%1754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1715, %1755 : !llvm.ptr> + %1756 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1759 = llvm.mul %67, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %69, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1765 = llvm.load %1764 : !llvm.ptr> + %1766 = llvm.add %155, %70 : !llvm.i64 + %1767 = llvm.icmp "slt" %1766, %67 : !llvm.i64 + %1768 = llvm.sub %64, %1766 : !llvm.i64 + %1769 = llvm.select %1767, %1768, %1766 : !llvm.i1, !llvm.i64 + %1770 = llvm.sdiv %1769, %68 : !llvm.i64 + %1771 = llvm.sub %64, %1770 : !llvm.i64 + %1772 = llvm.select %1767, %1771, %1770 : !llvm.i1, !llvm.i64 + %1773 = llvm.srem %1772, %68 : !llvm.i64 + %1774 = llvm.icmp "slt" %1773, %67 : !llvm.i64 + %1775 = llvm.add %1773, %68 : !llvm.i64 + %1776 = llvm.select %1774, %1775, %1773 : !llvm.i1, !llvm.i64 + %1777 = llvm.sdiv %1718, %70 : !llvm.i64 + %1778 = llvm.sub %64, %1777 : !llvm.i64 + %1779 = llvm.select %1716, %1778, %1777 : !llvm.i1, !llvm.i64 + %1780 = llvm.mul %1772, %65 : !llvm.i64 + %1781 = llvm.add %1779, %1780 : !llvm.i64 + %1782 = llvm.add %1781, %69 : !llvm.i64 + %1783 = llvm.icmp "slt" %1782, %67 : !llvm.i64 + %1784 = llvm.sub %64, %1782 : !llvm.i64 + %1785 = llvm.select %1783, %1784, %1782 : !llvm.i1, !llvm.i64 + %1786 = llvm.sdiv %1785, %63 : !llvm.i64 + %1787 = llvm.sub %64, %1786 : !llvm.i64 + %1788 = llvm.select %1783, %1787, %1786 : !llvm.i1, !llvm.i64 + %1789 = llvm.mul %1788, %65 : !llvm.i64 + %1790 = llvm.add %1781, %1789 : !llvm.i64 + %1791 = llvm.add %1790, %69 : !llvm.i64 + %1792 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1794 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1795 = llvm.mul %1776, %1794 : !llvm.i64 + %1796 = llvm.add %1793, %1795 : !llvm.i64 + %1797 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1798 = llvm.mul %1729, %1797 : !llvm.i64 + %1799 = llvm.add %1796, %1798 : !llvm.i64 + %1800 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1801 = llvm.mul %1791, %1800 : !llvm.i64 + %1802 = llvm.add %1799, %1801 : !llvm.i64 + %1803 = llvm.getelementptr %1792[%1802] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1765, %1803 : !llvm.ptr> + %1804 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1805 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1806 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1807 = llvm.mul %67, %1806 : !llvm.i64 + %1808 = llvm.add %1805, %1807 : !llvm.i64 + %1809 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1810 = llvm.mul %63, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.getelementptr %1804[%1811] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1813 = llvm.load %1812 : !llvm.ptr> + 
%1814 = llvm.add %1721, %69 : !llvm.i64 + %1815 = llvm.icmp "slt" %1814, %67 : !llvm.i64 + %1816 = llvm.sub %64, %1814 : !llvm.i64 + %1817 = llvm.select %1815, %1816, %1814 : !llvm.i1, !llvm.i64 + %1818 = llvm.sdiv %1817, %68 : !llvm.i64 + %1819 = llvm.sub %64, %1818 : !llvm.i64 + %1820 = llvm.select %1815, %1819, %1818 : !llvm.i1, !llvm.i64 + %1821 = llvm.mul %1820, %60 : !llvm.i64 + %1822 = llvm.add %1721, %1821 : !llvm.i64 + %1823 = llvm.add %1822, %69 : !llvm.i64 + %1824 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1826 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1827 = llvm.mul %1823, %1826 : !llvm.i64 + %1828 = llvm.add %1825, %1827 : !llvm.i64 + %1829 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1830 = llvm.mul %1729, %1829 : !llvm.i64 + %1831 = llvm.add %1828, %1830 : !llvm.i64 + %1832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1833 = llvm.mul %1743, %1832 : !llvm.i64 + %1834 = llvm.add %1831, %1833 : !llvm.i64 + %1835 = llvm.getelementptr %1824[%1834] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1813, %1835 : !llvm.ptr> + %1836 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1837 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1838 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1839 = llvm.mul %67, %1838 : !llvm.i64 + %1840 = llvm.add %1837, %1839 : !llvm.i64 + %1841 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1842 = llvm.mul %45, %1841 : !llvm.i64 + %1843 = llvm.add %1840, %1842 : !llvm.i64 + %1844 = llvm.getelementptr %1836[%1843] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1845 = llvm.load %1844 : !llvm.ptr> + %1846 = llvm.add %155, %41 : !llvm.i64 + %1847 = llvm.icmp "slt" %1846, %67 : !llvm.i64 + %1848 = llvm.sub %64, %1846 : !llvm.i64 + %1849 = llvm.select %1847, %1848, %1846 : !llvm.i1, !llvm.i64 + %1850 = llvm.sdiv %1849, %68 : !llvm.i64 + %1851 = llvm.sub %64, %1850 : !llvm.i64 + %1852 = llvm.select %1847, %1851, %1850 : !llvm.i1, !llvm.i64 + %1853 = llvm.srem %1852, %68 : !llvm.i64 + %1854 = llvm.icmp "slt" %1853, %67 : !llvm.i64 + %1855 = llvm.add %1853, %68 : !llvm.i64 + %1856 = llvm.select %1854, %1855, %1853 : !llvm.i1, !llvm.i64 + %1857 = llvm.mul %1852, %65 : !llvm.i64 + %1858 = llvm.add %1779, %1857 : !llvm.i64 + %1859 = llvm.add %1858, %45 : !llvm.i64 + %1860 = llvm.icmp "slt" %1859, %67 : !llvm.i64 + %1861 = llvm.sub %64, %1859 : !llvm.i64 + %1862 = llvm.select %1860, %1861, %1859 : !llvm.i1, !llvm.i64 + %1863 = llvm.sdiv %1862, %63 : !llvm.i64 + %1864 = llvm.sub %64, %1863 : !llvm.i64 + %1865 = llvm.select %1860, %1864, %1863 : !llvm.i1, !llvm.i64 + %1866 = llvm.mul %1865, %65 : !llvm.i64 + %1867 = llvm.add %1858, %1866 : !llvm.i64 + %1868 = llvm.add %1867, %45 : !llvm.i64 + %1869 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1871 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1872 = llvm.mul %1856, %1871 : !llvm.i64 + %1873 = llvm.add %1870, %1872 : !llvm.i64 + %1874 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1875 = llvm.mul %1729, %1874 : !llvm.i64 + %1876 = llvm.add %1873, %1875 : !llvm.i64 + %1877 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1878 = llvm.mul %1868, %1877 : !llvm.i64 + %1879 = llvm.add %1876, %1878 : !llvm.i64 + %1880 = llvm.getelementptr %1869[%1879] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1845, %1880 : !llvm.ptr> + %1881 = llvm.extractvalue %90[1] 
: !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1882 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1883 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1884 = llvm.mul %67, %1883 : !llvm.i64 + %1885 = llvm.add %1882, %1884 : !llvm.i64 + %1886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1887 = llvm.mul %48, %1886 : !llvm.i64 + %1888 = llvm.add %1885, %1887 : !llvm.i64 + %1889 = llvm.getelementptr %1881[%1888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1890 = llvm.load %1889 : !llvm.ptr> + %1891 = llvm.add %1721, %63 : !llvm.i64 + %1892 = llvm.icmp "slt" %1891, %67 : !llvm.i64 + %1893 = llvm.sub %64, %1891 : !llvm.i64 + %1894 = llvm.select %1892, %1893, %1891 : !llvm.i1, !llvm.i64 + %1895 = llvm.sdiv %1894, %68 : !llvm.i64 + %1896 = llvm.sub %64, %1895 : !llvm.i64 + %1897 = llvm.select %1892, %1896, %1895 : !llvm.i1, !llvm.i64 + %1898 = llvm.mul %1897, %60 : !llvm.i64 + %1899 = llvm.add %1721, %1898 : !llvm.i64 + %1900 = llvm.add %1899, %63 : !llvm.i64 + %1901 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1903 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1904 = llvm.mul %1900, %1903 : !llvm.i64 + %1905 = llvm.add %1902, %1904 : !llvm.i64 + %1906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1907 = llvm.mul %1729, %1906 : !llvm.i64 + %1908 = llvm.add %1905, %1907 : !llvm.i64 + %1909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1910 = llvm.mul %1743, %1909 : !llvm.i64 + %1911 = llvm.add %1908, %1910 : !llvm.i64 + %1912 = llvm.getelementptr %1901[%1911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1890, %1912 : !llvm.ptr> + %1913 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1915 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1916 = llvm.mul %67, %1915 : !llvm.i64 + %1917 = llvm.add %1914, %1916 : !llvm.i64 + %1918 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1919 = llvm.mul %52, %1918 : !llvm.i64 + %1920 = llvm.add %1917, %1919 : !llvm.i64 + %1921 = llvm.getelementptr %1913[%1920] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1922 = llvm.load %1921 : !llvm.ptr> + %1923 = llvm.add %155, %43 : !llvm.i64 + %1924 = llvm.icmp "slt" %1923, %67 : !llvm.i64 + %1925 = llvm.sub %64, %1923 : !llvm.i64 + %1926 = llvm.select %1924, %1925, %1923 : !llvm.i1, !llvm.i64 + %1927 = llvm.sdiv %1926, %68 : !llvm.i64 + %1928 = llvm.sub %64, %1927 : !llvm.i64 + %1929 = llvm.select %1924, %1928, %1927 : !llvm.i1, !llvm.i64 + %1930 = llvm.srem %1929, %68 : !llvm.i64 + %1931 = llvm.icmp "slt" %1930, %67 : !llvm.i64 + %1932 = llvm.add %1930, %68 : !llvm.i64 + %1933 = llvm.select %1931, %1932, %1930 : !llvm.i1, !llvm.i64 + %1934 = llvm.mul %1929, %65 : !llvm.i64 + %1935 = llvm.add %1779, %1934 : !llvm.i64 + %1936 = llvm.add %1935, %52 : !llvm.i64 + %1937 = llvm.icmp "slt" %1936, %67 : !llvm.i64 + %1938 = llvm.sub %64, %1936 : !llvm.i64 + %1939 = llvm.select %1937, %1938, %1936 : !llvm.i1, !llvm.i64 + %1940 = llvm.sdiv %1939, %63 : !llvm.i64 + %1941 = llvm.sub %64, %1940 : !llvm.i64 + %1942 = llvm.select %1937, %1941, %1940 : !llvm.i1, !llvm.i64 + %1943 = llvm.mul %1942, %65 : !llvm.i64 + %1944 = llvm.add %1935, %1943 : !llvm.i64 + %1945 = llvm.add %1944, %52 : !llvm.i64 + %1946 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(256 : index) : !llvm.i64 + 
%1949 = llvm.mul %1933, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1952 = llvm.mul %1729, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1955 = llvm.mul %1945, %1954 : !llvm.i64 + %1956 = llvm.add %1953, %1955 : !llvm.i64 + %1957 = llvm.getelementptr %1946[%1956] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1922, %1957 : !llvm.ptr> + %1958 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1960 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1961 = llvm.mul %67, %1960 : !llvm.i64 + %1962 = llvm.add %1959, %1961 : !llvm.i64 + %1963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1964 = llvm.mul %56, %1963 : !llvm.i64 + %1965 = llvm.add %1962, %1964 : !llvm.i64 + %1966 = llvm.getelementptr %1958[%1965] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1967 = llvm.load %1966 : !llvm.ptr> + %1968 = llvm.add %1721, %45 : !llvm.i64 + %1969 = llvm.icmp "slt" %1968, %67 : !llvm.i64 + %1970 = llvm.sub %64, %1968 : !llvm.i64 + %1971 = llvm.select %1969, %1970, %1968 : !llvm.i1, !llvm.i64 + %1972 = llvm.sdiv %1971, %68 : !llvm.i64 + %1973 = llvm.sub %64, %1972 : !llvm.i64 + %1974 = llvm.select %1969, %1973, %1972 : !llvm.i1, !llvm.i64 + %1975 = llvm.mul %1974, %60 : !llvm.i64 + %1976 = llvm.add %1721, %1975 : !llvm.i64 + %1977 = llvm.add %1976, %45 : !llvm.i64 + %1978 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1980 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1981 = llvm.mul %1977, %1980 : !llvm.i64 + %1982 = llvm.add %1979, %1981 : !llvm.i64 + %1983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1984 = llvm.mul %1729, %1983 : !llvm.i64 + %1985 = llvm.add %1982, %1984 : !llvm.i64 + %1986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1987 = llvm.mul %1743, %1986 : !llvm.i64 + %1988 = llvm.add %1985, %1987 : !llvm.i64 + %1989 = llvm.getelementptr %1978[%1988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1967, %1989 : !llvm.ptr> + %1990 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1992 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1993 = llvm.mul %67, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1996 = llvm.mul %61, %1995 : !llvm.i64 + %1997 = llvm.add %1994, %1996 : !llvm.i64 + %1998 = llvm.getelementptr %1990[%1997] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1999 = llvm.load %1998 : !llvm.ptr> + %2000 = llvm.add %155, %46 : !llvm.i64 + %2001 = llvm.icmp "slt" %2000, %67 : !llvm.i64 + %2002 = llvm.sub %64, %2000 : !llvm.i64 + %2003 = llvm.select %2001, %2002, %2000 : !llvm.i1, !llvm.i64 + %2004 = llvm.sdiv %2003, %68 : !llvm.i64 + %2005 = llvm.sub %64, %2004 : !llvm.i64 + %2006 = llvm.select %2001, %2005, %2004 : !llvm.i1, !llvm.i64 + %2007 = llvm.srem %2006, %68 : !llvm.i64 + %2008 = llvm.icmp "slt" %2007, %67 : !llvm.i64 + %2009 = llvm.add %2007, %68 : !llvm.i64 + %2010 = llvm.select %2008, %2009, %2007 : !llvm.i1, !llvm.i64 + %2011 = llvm.mul %2006, %65 : !llvm.i64 + %2012 = llvm.add %1779, %2011 : !llvm.i64 + %2013 = llvm.add %2012, %61 : !llvm.i64 + %2014 = llvm.icmp "slt" %2013, %67 : !llvm.i64 + %2015 = llvm.sub %64, %2013 : !llvm.i64 + %2016 = llvm.select %2014, 
%2015, %2013 : !llvm.i1, !llvm.i64 + %2017 = llvm.sdiv %2016, %63 : !llvm.i64 + %2018 = llvm.sub %64, %2017 : !llvm.i64 + %2019 = llvm.select %2014, %2018, %2017 : !llvm.i1, !llvm.i64 + %2020 = llvm.mul %2019, %65 : !llvm.i64 + %2021 = llvm.add %2012, %2020 : !llvm.i64 + %2022 = llvm.add %2021, %61 : !llvm.i64 + %2023 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2024 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2025 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2026 = llvm.mul %2010, %2025 : !llvm.i64 + %2027 = llvm.add %2024, %2026 : !llvm.i64 + %2028 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2029 = llvm.mul %1729, %2028 : !llvm.i64 + %2030 = llvm.add %2027, %2029 : !llvm.i64 + %2031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2032 = llvm.mul %2022, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.getelementptr %2023[%2033] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1999, %2034 : !llvm.ptr> + %2035 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2036 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2037 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2038 = llvm.mul %67, %2037 : !llvm.i64 + %2039 = llvm.add %2036, %2038 : !llvm.i64 + %2040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2041 = llvm.mul %70, %2040 : !llvm.i64 + %2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.getelementptr %2035[%2042] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2044 = llvm.load %2043 : !llvm.ptr> + %2045 = llvm.add %1721, %48 : !llvm.i64 + %2046 = llvm.icmp "slt" %2045, %67 : !llvm.i64 + %2047 = llvm.sub %64, %2045 : !llvm.i64 + %2048 = llvm.select %2046, %2047, %2045 : !llvm.i1, !llvm.i64 + %2049 = llvm.sdiv %2048, %68 : !llvm.i64 + %2050 = llvm.sub %64, %2049 : !llvm.i64 + %2051 = llvm.select %2046, %2050, %2049 : !llvm.i1, !llvm.i64 + %2052 = llvm.mul %2051, %60 : !llvm.i64 + %2053 = llvm.add %1721, %2052 : !llvm.i64 + %2054 = llvm.add %2053, %48 : !llvm.i64 + %2055 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2056 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2057 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2058 = llvm.mul %2054, %2057 : !llvm.i64 + %2059 = llvm.add %2056, %2058 : !llvm.i64 + %2060 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2061 = llvm.mul %1729, %2060 : !llvm.i64 + %2062 = llvm.add %2059, %2061 : !llvm.i64 + %2063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2064 = llvm.mul %1743, %2063 : !llvm.i64 + %2065 = llvm.add %2062, %2064 : !llvm.i64 + %2066 = llvm.getelementptr %2055[%2065] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2044, %2066 : !llvm.ptr> + %2067 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2070 = llvm.mul %67, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %50, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2076 = llvm.load %2075 : !llvm.ptr> + %2077 = llvm.add %155, %49 : !llvm.i64 + %2078 = llvm.icmp "slt" %2077, %67 : !llvm.i64 + %2079 = llvm.sub %64, %2077 : !llvm.i64 + %2080 = llvm.select %2078, %2079, %2077 : !llvm.i1, !llvm.i64 + %2081 = llvm.sdiv %2080, %68 : !llvm.i64 + %2082 = llvm.sub %64, %2081 : 
!llvm.i64 + %2083 = llvm.select %2078, %2082, %2081 : !llvm.i1, !llvm.i64 + %2084 = llvm.srem %2083, %68 : !llvm.i64 + %2085 = llvm.icmp "slt" %2084, %67 : !llvm.i64 + %2086 = llvm.add %2084, %68 : !llvm.i64 + %2087 = llvm.select %2085, %2086, %2084 : !llvm.i1, !llvm.i64 + %2088 = llvm.mul %2083, %65 : !llvm.i64 + %2089 = llvm.add %1779, %2088 : !llvm.i64 + %2090 = llvm.add %2089, %50 : !llvm.i64 + %2091 = llvm.icmp "slt" %2090, %67 : !llvm.i64 + %2092 = llvm.sub %64, %2090 : !llvm.i64 + %2093 = llvm.select %2091, %2092, %2090 : !llvm.i1, !llvm.i64 + %2094 = llvm.sdiv %2093, %63 : !llvm.i64 + %2095 = llvm.sub %64, %2094 : !llvm.i64 + %2096 = llvm.select %2091, %2095, %2094 : !llvm.i1, !llvm.i64 + %2097 = llvm.mul %2096, %65 : !llvm.i64 + %2098 = llvm.add %2089, %2097 : !llvm.i64 + %2099 = llvm.add %2098, %50 : !llvm.i64 + %2100 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2101 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2102 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2103 = llvm.mul %2087, %2102 : !llvm.i64 + %2104 = llvm.add %2101, %2103 : !llvm.i64 + %2105 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2106 = llvm.mul %1729, %2105 : !llvm.i64 + %2107 = llvm.add %2104, %2106 : !llvm.i64 + %2108 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2109 = llvm.mul %2099, %2108 : !llvm.i64 + %2110 = llvm.add %2107, %2109 : !llvm.i64 + %2111 = llvm.getelementptr %2100[%2110] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2076, %2111 : !llvm.ptr> + %2112 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2114 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2115 = llvm.mul %67, %2114 : !llvm.i64 + %2116 = llvm.add %2113, %2115 : !llvm.i64 + %2117 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2118 = llvm.mul %33, %2117 : !llvm.i64 + %2119 = llvm.add %2116, %2118 : !llvm.i64 + %2120 = llvm.getelementptr %2112[%2119] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2121 = llvm.load %2120 : !llvm.ptr> + %2122 = llvm.add %1721, %52 : !llvm.i64 + %2123 = llvm.icmp "slt" %2122, %67 : !llvm.i64 + %2124 = llvm.sub %64, %2122 : !llvm.i64 + %2125 = llvm.select %2123, %2124, %2122 : !llvm.i1, !llvm.i64 + %2126 = llvm.sdiv %2125, %68 : !llvm.i64 + %2127 = llvm.sub %64, %2126 : !llvm.i64 + %2128 = llvm.select %2123, %2127, %2126 : !llvm.i1, !llvm.i64 + %2129 = llvm.mul %2128, %60 : !llvm.i64 + %2130 = llvm.add %1721, %2129 : !llvm.i64 + %2131 = llvm.add %2130, %52 : !llvm.i64 + %2132 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2134 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2135 = llvm.mul %2131, %2134 : !llvm.i64 + %2136 = llvm.add %2133, %2135 : !llvm.i64 + %2137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2138 = llvm.mul %1729, %2137 : !llvm.i64 + %2139 = llvm.add %2136, %2138 : !llvm.i64 + %2140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2141 = llvm.mul %1743, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.getelementptr %2132[%2142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2121, %2143 : !llvm.ptr> + %2144 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2145 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2146 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2147 = llvm.mul %67, %2146 : !llvm.i64 + %2148 = llvm.add %2145, %2147 : !llvm.i64 + %2149 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %2150 = llvm.mul %54, %2149 : !llvm.i64 + %2151 = llvm.add %2148, %2150 : !llvm.i64 + %2152 = llvm.getelementptr %2144[%2151] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2153 = llvm.load %2152 : !llvm.ptr> + %2154 = llvm.add %155, %53 : !llvm.i64 + %2155 = llvm.icmp "slt" %2154, %67 : !llvm.i64 + %2156 = llvm.sub %64, %2154 : !llvm.i64 + %2157 = llvm.select %2155, %2156, %2154 : !llvm.i1, !llvm.i64 + %2158 = llvm.sdiv %2157, %68 : !llvm.i64 + %2159 = llvm.sub %64, %2158 : !llvm.i64 + %2160 = llvm.select %2155, %2159, %2158 : !llvm.i1, !llvm.i64 + %2161 = llvm.srem %2160, %68 : !llvm.i64 + %2162 = llvm.icmp "slt" %2161, %67 : !llvm.i64 + %2163 = llvm.add %2161, %68 : !llvm.i64 + %2164 = llvm.select %2162, %2163, %2161 : !llvm.i1, !llvm.i64 + %2165 = llvm.mul %2160, %65 : !llvm.i64 + %2166 = llvm.add %1779, %2165 : !llvm.i64 + %2167 = llvm.add %2166, %54 : !llvm.i64 + %2168 = llvm.icmp "slt" %2167, %67 : !llvm.i64 + %2169 = llvm.sub %64, %2167 : !llvm.i64 + %2170 = llvm.select %2168, %2169, %2167 : !llvm.i1, !llvm.i64 + %2171 = llvm.sdiv %2170, %63 : !llvm.i64 + %2172 = llvm.sub %64, %2171 : !llvm.i64 + %2173 = llvm.select %2168, %2172, %2171 : !llvm.i1, !llvm.i64 + %2174 = llvm.mul %2173, %65 : !llvm.i64 + %2175 = llvm.add %2166, %2174 : !llvm.i64 + %2176 = llvm.add %2175, %54 : !llvm.i64 + %2177 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2180 = llvm.mul %2164, %2179 : !llvm.i64 + %2181 = llvm.add %2178, %2180 : !llvm.i64 + %2182 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2183 = llvm.mul %1729, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2186 = llvm.mul %2176, %2185 : !llvm.i64 + %2187 = llvm.add %2184, %2186 : !llvm.i64 + %2188 = llvm.getelementptr %2177[%2187] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2153, %2188 : !llvm.ptr> + %2189 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2191 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2192 = llvm.mul %67, %2191 : !llvm.i64 + %2193 = llvm.add %2190, %2192 : !llvm.i64 + %2194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2195 = llvm.mul %34, %2194 : !llvm.i64 + %2196 = llvm.add %2193, %2195 : !llvm.i64 + %2197 = llvm.getelementptr %2189[%2196] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2198 = llvm.load %2197 : !llvm.ptr> + %2199 = llvm.add %1721, %56 : !llvm.i64 + %2200 = llvm.icmp "slt" %2199, %67 : !llvm.i64 + %2201 = llvm.sub %64, %2199 : !llvm.i64 + %2202 = llvm.select %2200, %2201, %2199 : !llvm.i1, !llvm.i64 + %2203 = llvm.sdiv %2202, %68 : !llvm.i64 + %2204 = llvm.sub %64, %2203 : !llvm.i64 + %2205 = llvm.select %2200, %2204, %2203 : !llvm.i1, !llvm.i64 + %2206 = llvm.mul %2205, %60 : !llvm.i64 + %2207 = llvm.add %1721, %2206 : !llvm.i64 + %2208 = llvm.add %2207, %56 : !llvm.i64 + %2209 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2211 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2212 = llvm.mul %2208, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2215 = llvm.mul %1729, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : !llvm.i64 + %2217 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %2218 = llvm.mul %1743, %2217 : !llvm.i64 + %2219 = llvm.add %2216, %2218 : !llvm.i64 + %2220 = llvm.getelementptr %2209[%2219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2198, %2220 : !llvm.ptr> + %2221 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2223 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2224 = llvm.mul %67, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2227 = llvm.mul %58, %2226 : !llvm.i64 + %2228 = llvm.add %2225, %2227 : !llvm.i64 + %2229 = llvm.getelementptr %2221[%2228] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2230 = llvm.load %2229 : !llvm.ptr> + %2231 = llvm.add %155, %57 : !llvm.i64 + %2232 = llvm.icmp "slt" %2231, %67 : !llvm.i64 + %2233 = llvm.sub %64, %2231 : !llvm.i64 + %2234 = llvm.select %2232, %2233, %2231 : !llvm.i1, !llvm.i64 + %2235 = llvm.sdiv %2234, %68 : !llvm.i64 + %2236 = llvm.sub %64, %2235 : !llvm.i64 + %2237 = llvm.select %2232, %2236, %2235 : !llvm.i1, !llvm.i64 + %2238 = llvm.srem %2237, %68 : !llvm.i64 + %2239 = llvm.icmp "slt" %2238, %67 : !llvm.i64 + %2240 = llvm.add %2238, %68 : !llvm.i64 + %2241 = llvm.select %2239, %2240, %2238 : !llvm.i1, !llvm.i64 + %2242 = llvm.mul %2237, %65 : !llvm.i64 + %2243 = llvm.add %1779, %2242 : !llvm.i64 + %2244 = llvm.add %2243, %58 : !llvm.i64 + %2245 = llvm.icmp "slt" %2244, %67 : !llvm.i64 + %2246 = llvm.sub %64, %2244 : !llvm.i64 + %2247 = llvm.select %2245, %2246, %2244 : !llvm.i1, !llvm.i64 + %2248 = llvm.sdiv %2247, %63 : !llvm.i64 + %2249 = llvm.sub %64, %2248 : !llvm.i64 + %2250 = llvm.select %2245, %2249, %2248 : !llvm.i1, !llvm.i64 + %2251 = llvm.mul %2250, %65 : !llvm.i64 + %2252 = llvm.add %2243, %2251 : !llvm.i64 + %2253 = llvm.add %2252, %58 : !llvm.i64 + %2254 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2256 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2257 = llvm.mul %2241, %2256 : !llvm.i64 + %2258 = llvm.add %2255, %2257 : !llvm.i64 + %2259 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2260 = llvm.mul %1729, %2259 : !llvm.i64 + %2261 = llvm.add %2258, %2260 : !llvm.i64 + %2262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2263 = llvm.mul %2253, %2262 : !llvm.i64 + %2264 = llvm.add %2261, %2263 : !llvm.i64 + %2265 = llvm.getelementptr %2254[%2264] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2230, %2265 : !llvm.ptr> + %2266 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2268 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2269 = llvm.mul %67, %2268 : !llvm.i64 + %2270 = llvm.add %2267, %2269 : !llvm.i64 + %2271 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2272 = llvm.mul %35, %2271 : !llvm.i64 + %2273 = llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.getelementptr %2266[%2273] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2275 = llvm.load %2274 : !llvm.ptr> + %2276 = llvm.add %1721, %61 : !llvm.i64 + %2277 = llvm.icmp "slt" %2276, %67 : !llvm.i64 + %2278 = llvm.sub %64, %2276 : !llvm.i64 + %2279 = llvm.select %2277, %2278, %2276 : !llvm.i1, !llvm.i64 + %2280 = llvm.sdiv %2279, %68 : !llvm.i64 + %2281 = llvm.sub %64, %2280 : !llvm.i64 + %2282 = llvm.select %2277, %2281, %2280 : !llvm.i1, !llvm.i64 + %2283 = llvm.mul %2282, %60 : !llvm.i64 + %2284 = llvm.add %1721, %2283 : !llvm.i64 + 
%2285 = llvm.add %2284, %61 : !llvm.i64 + %2286 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2288 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2289 = llvm.mul %2285, %2288 : !llvm.i64 + %2290 = llvm.add %2287, %2289 : !llvm.i64 + %2291 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2292 = llvm.mul %1729, %2291 : !llvm.i64 + %2293 = llvm.add %2290, %2292 : !llvm.i64 + %2294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2295 = llvm.mul %1743, %2294 : !llvm.i64 + %2296 = llvm.add %2293, %2295 : !llvm.i64 + %2297 = llvm.getelementptr %2286[%2296] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2275, %2297 : !llvm.ptr> + %2298 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2300 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2301 = llvm.mul %67, %2300 : !llvm.i64 + %2302 = llvm.add %2299, %2301 : !llvm.i64 + %2303 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2304 = llvm.mul %66, %2303 : !llvm.i64 + %2305 = llvm.add %2302, %2304 : !llvm.i64 + %2306 = llvm.getelementptr %2298[%2305] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2307 = llvm.load %2306 : !llvm.ptr> + %2308 = llvm.add %155, %62 : !llvm.i64 + %2309 = llvm.icmp "slt" %2308, %67 : !llvm.i64 + %2310 = llvm.sub %64, %2308 : !llvm.i64 + %2311 = llvm.select %2309, %2310, %2308 : !llvm.i1, !llvm.i64 + %2312 = llvm.sdiv %2311, %68 : !llvm.i64 + %2313 = llvm.sub %64, %2312 : !llvm.i64 + %2314 = llvm.select %2309, %2313, %2312 : !llvm.i1, !llvm.i64 + %2315 = llvm.srem %2314, %68 : !llvm.i64 + %2316 = llvm.icmp "slt" %2315, %67 : !llvm.i64 + %2317 = llvm.add %2315, %68 : !llvm.i64 + %2318 = llvm.select %2316, %2317, %2315 : !llvm.i1, !llvm.i64 + %2319 = llvm.mul %2314, %65 : !llvm.i64 + %2320 = llvm.add %1779, %2319 : !llvm.i64 + %2321 = llvm.add %2320, %66 : !llvm.i64 + %2322 = llvm.icmp "slt" %2321, %67 : !llvm.i64 + %2323 = llvm.sub %64, %2321 : !llvm.i64 + %2324 = llvm.select %2322, %2323, %2321 : !llvm.i1, !llvm.i64 + %2325 = llvm.sdiv %2324, %63 : !llvm.i64 + %2326 = llvm.sub %64, %2325 : !llvm.i64 + %2327 = llvm.select %2322, %2326, %2325 : !llvm.i1, !llvm.i64 + %2328 = llvm.mul %2327, %65 : !llvm.i64 + %2329 = llvm.add %2320, %2328 : !llvm.i64 + %2330 = llvm.add %2329, %66 : !llvm.i64 + %2331 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2332 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2333 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2334 = llvm.mul %2318, %2333 : !llvm.i64 + %2335 = llvm.add %2332, %2334 : !llvm.i64 + %2336 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2337 = llvm.mul %1729, %2336 : !llvm.i64 + %2338 = llvm.add %2335, %2337 : !llvm.i64 + %2339 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2340 = llvm.mul %2330, %2339 : !llvm.i64 + %2341 = llvm.add %2338, %2340 : !llvm.i64 + %2342 = llvm.getelementptr %2331[%2341] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2307, %2342 : !llvm.ptr> + llvm.br ^bb9 + ^bb9: // 2 preds: ^bb7, ^bb8 + %2343 = llvm.add %155, %39 : !llvm.i64 + llvm.br ^bb5(%2343 : !llvm.i64) + ^bb10: // pred: ^bb5 + %2344 = llvm.add %153, %69 : !llvm.i64 + llvm.br ^bb3(%2344 : !llvm.i64) + ^bb11: // pred: ^bb3 + llvm.br ^bb12(%67 : !llvm.i64) + ^bb12(%2345: !llvm.i64): // 2 preds: ^bb11, ^bb51 + %2346 = llvm.icmp "slt" %2345, %37 : !llvm.i64 + llvm.cond_br %2346, ^bb13, ^bb52 + ^bb13: // pred: ^bb12 + llvm.br ^bb14(%67 : 
!llvm.i64) + ^bb14(%2347: !llvm.i64): // 2 preds: ^bb13, ^bb21 + %2348 = llvm.icmp "slt" %2347, %68 : !llvm.i64 + llvm.cond_br %2348, ^bb15, ^bb22 + ^bb15: // pred: ^bb14 + llvm.br ^bb16(%67 : !llvm.i64) + ^bb16(%2349: !llvm.i64): // 2 preds: ^bb15, ^bb20 + %2350 = llvm.icmp "slt" %2349, %56 : !llvm.i64 + llvm.cond_br %2350, ^bb17, ^bb21 + ^bb17: // pred: ^bb16 + llvm.br ^bb18(%67 : !llvm.i64) + ^bb18(%2351: !llvm.i64): // 2 preds: ^bb17, ^bb19 + %2352 = llvm.icmp "slt" %2351, %63 : !llvm.i64 + llvm.cond_br %2352, ^bb19, ^bb20 + ^bb19: // pred: ^bb18 + %2353 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2354 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2355 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2356 = llvm.mul %2347, %2355 : !llvm.i64 + %2357 = llvm.add %2354, %2356 : !llvm.i64 + %2358 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2359 = llvm.mul %2349, %2358 : !llvm.i64 + %2360 = llvm.add %2357, %2359 : !llvm.i64 + %2361 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2362 = llvm.mul %2351, %2361 : !llvm.i64 + %2363 = llvm.add %2360, %2362 : !llvm.i64 + %2364 = llvm.getelementptr %2353[%2363] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %32, %2364 : !llvm.ptr> + %2365 = llvm.add %2351, %69 : !llvm.i64 + llvm.br ^bb18(%2365 : !llvm.i64) + ^bb20: // pred: ^bb18 + %2366 = llvm.add %2349, %69 : !llvm.i64 + llvm.br ^bb16(%2366 : !llvm.i64) + ^bb21: // pred: ^bb16 + %2367 = llvm.add %2347, %69 : !llvm.i64 + llvm.br ^bb14(%2367 : !llvm.i64) + ^bb22: // pred: ^bb14 + llvm.br ^bb23(%67 : !llvm.i64) + ^bb23(%2368: !llvm.i64): // 2 preds: ^bb22, ^bb39 + %2369 = llvm.icmp "slt" %2368, %38 : !llvm.i64 + llvm.cond_br %2369, ^bb24, ^bb40 + ^bb24: // pred: ^bb23 + llvm.br ^bb25(%67 : !llvm.i64) + ^bb25(%2370: !llvm.i64): // 2 preds: ^bb24, ^bb38 + %2371 = llvm.icmp "slt" %2370, %39 : !llvm.i64 + llvm.cond_br %2371, ^bb26, ^bb39 + ^bb26: // pred: ^bb25 + llvm.br ^bb27(%67 : !llvm.i64) + ^bb27(%2372: !llvm.i64): // 2 preds: ^bb26, ^bb34 + %2373 = llvm.icmp "slt" %2372, %67 : !llvm.i64 + llvm.cond_br %2373, ^bb28, ^bb35 + ^bb28: // pred: ^bb27 + llvm.br ^bb29(%67 : !llvm.i64) + ^bb29(%2374: !llvm.i64): // 2 preds: ^bb28, ^bb33 + %2375 = llvm.icmp "slt" %2374, %48 : !llvm.i64 + llvm.cond_br %2375, ^bb30, ^bb34 + ^bb30: // pred: ^bb29 + llvm.br ^bb31(%67 : !llvm.i64) + ^bb31(%2376: !llvm.i64): // 2 preds: ^bb30, ^bb32 + %2377 = llvm.icmp "slt" %2376, %67 : !llvm.i64 + llvm.cond_br %2377, ^bb32, ^bb33 + ^bb32: // pred: ^bb31 + %2378 = llvm.add %2345, %2372 : !llvm.i64 + %2379 = llvm.add %2378, %2376 : !llvm.i64 + %2380 = llvm.add %2370, %2374 : !llvm.i64 + %2381 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2383 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2384 = llvm.mul %2379, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2387 = llvm.mul %2380, %2386 : !llvm.i64 + %2388 = llvm.add %2385, %2387 : !llvm.i64 + %2389 = llvm.getelementptr %2381[%2388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2390 = llvm.load %2389 : !llvm.ptr + %2391 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2393 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2394 = llvm.mul %2379, %2393 : !llvm.i64 + %2395 = llvm.add %2392, %2394 : !llvm.i64 + %2396 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2397 = 
llvm.mul %2380, %2396 : !llvm.i64 + %2398 = llvm.add %2395, %2397 : !llvm.i64 + %2399 = llvm.getelementptr %2391[%2398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2400 = llvm.load %2399 : !llvm.ptr + %2401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2404 = llvm.mul %2379, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2407 = llvm.mul %2380, %2406 : !llvm.i64 + %2408 = llvm.add %2405, %2407 : !llvm.i64 + %2409 = llvm.getelementptr %2401[%2408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2410 = llvm.load %2409 : !llvm.ptr + %2411 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2413 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2414 = llvm.mul %2379, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : !llvm.i64 + %2416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2417 = llvm.mul %2380, %2416 : !llvm.i64 + %2418 = llvm.add %2415, %2417 : !llvm.i64 + %2419 = llvm.getelementptr %2411[%2418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2420 = llvm.load %2419 : !llvm.ptr + %2421 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2423 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2424 = llvm.mul %2379, %2423 : !llvm.i64 + %2425 = llvm.add %2422, %2424 : !llvm.i64 + %2426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2427 = llvm.mul %2380, %2426 : !llvm.i64 + %2428 = llvm.add %2425, %2427 : !llvm.i64 + %2429 = llvm.getelementptr %2421[%2428] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2430 = llvm.load %2429 : !llvm.ptr + %2431 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2433 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2434 = llvm.mul %2379, %2433 : !llvm.i64 + %2435 = llvm.add %2432, %2434 : !llvm.i64 + %2436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2437 = llvm.mul %2380, %2436 : !llvm.i64 + %2438 = llvm.add %2435, %2437 : !llvm.i64 + %2439 = llvm.getelementptr %2431[%2438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2440 = llvm.load %2439 : !llvm.ptr + %2441 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2443 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2444 = llvm.mul %2379, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2447 = llvm.mul %2380, %2446 : !llvm.i64 + %2448 = llvm.add %2445, %2447 : !llvm.i64 + %2449 = llvm.getelementptr %2441[%2448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2450 = llvm.load %2449 : !llvm.ptr + %2451 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2452 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2453 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2454 = llvm.mul %2379, %2453 : !llvm.i64 + %2455 = llvm.add %2452, %2454 : !llvm.i64 + %2456 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2457 = llvm.mul %2380, %2456 : !llvm.i64 + %2458 = llvm.add %2455, %2457 : !llvm.i64 + %2459 = llvm.getelementptr %2451[%2458] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2460 = llvm.load %2459 : !llvm.ptr + %2461 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %2462 = 
llvm.sub %64, %2368 : !llvm.i64 + %2463 = llvm.select %2461, %2462, %2368 : !llvm.i1, !llvm.i64 + %2464 = llvm.sdiv %2463, %68 : !llvm.i64 + %2465 = llvm.sub %64, %2464 : !llvm.i64 + %2466 = llvm.select %2461, %2465, %2464 : !llvm.i1, !llvm.i64 + %2467 = llvm.srem %2466, %68 : !llvm.i64 + %2468 = llvm.icmp "slt" %2467, %67 : !llvm.i64 + %2469 = llvm.add %2467, %68 : !llvm.i64 + %2470 = llvm.select %2468, %2469, %2467 : !llvm.i1, !llvm.i64 + %2471 = llvm.srem %2380, %39 : !llvm.i64 + %2472 = llvm.icmp "slt" %2471, %67 : !llvm.i64 + %2473 = llvm.add %2471, %39 : !llvm.i64 + %2474 = llvm.select %2472, %2473, %2471 : !llvm.i1, !llvm.i64 + %2475 = llvm.srem %2368, %68 : !llvm.i64 + %2476 = llvm.icmp "slt" %2475, %67 : !llvm.i64 + %2477 = llvm.add %2475, %68 : !llvm.i64 + %2478 = llvm.select %2476, %2477, %2475 : !llvm.i1, !llvm.i64 + %2479 = llvm.icmp "slt" %2478, %67 : !llvm.i64 + %2480 = llvm.sub %64, %2478 : !llvm.i64 + %2481 = llvm.select %2479, %2480, %2478 : !llvm.i1, !llvm.i64 + %2482 = llvm.sdiv %2481, %70 : !llvm.i64 + %2483 = llvm.sub %64, %2482 : !llvm.i64 + %2484 = llvm.select %2479, %2483, %2482 : !llvm.i1, !llvm.i64 + %2485 = llvm.srem %2484, %63 : !llvm.i64 + %2486 = llvm.icmp "slt" %2485, %67 : !llvm.i64 + %2487 = llvm.add %2485, %63 : !llvm.i64 + %2488 = llvm.select %2486, %2487, %2485 : !llvm.i1, !llvm.i64 + %2489 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2491 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2492 = llvm.mul %2470, %2491 : !llvm.i64 + %2493 = llvm.add %2490, %2492 : !llvm.i64 + %2494 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2495 = llvm.mul %2474, %2494 : !llvm.i64 + %2496 = llvm.add %2493, %2495 : !llvm.i64 + %2497 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2498 = llvm.mul %2488, %2497 : !llvm.i64 + %2499 = llvm.add %2496, %2498 : !llvm.i64 + %2500 = llvm.getelementptr %2489[%2499] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2501 = llvm.load %2500 : !llvm.ptr> + %2502 = llvm.extractelement %2501[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2503 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2505 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2506 = llvm.mul %2470, %2505 : !llvm.i64 + %2507 = llvm.add %2504, %2506 : !llvm.i64 + %2508 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2509 = llvm.mul %2474, %2508 : !llvm.i64 + %2510 = llvm.add %2507, %2509 : !llvm.i64 + %2511 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2512 = llvm.mul %2488, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.getelementptr %2503[%2513] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2515 = llvm.load %2514 : !llvm.ptr> + %2516 = llvm.extractelement %2515[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2517 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2519 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2520 = llvm.mul %2470, %2519 : !llvm.i64 + %2521 = llvm.add %2518, %2520 : !llvm.i64 + %2522 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2523 = llvm.mul %2474, %2522 : !llvm.i64 + %2524 = llvm.add %2521, %2523 : !llvm.i64 + %2525 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2526 = llvm.mul %2488, %2525 : !llvm.i64 + %2527 = llvm.add %2524, %2526 : !llvm.i64 + %2528 = llvm.getelementptr %2517[%2527] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2529 = 
llvm.load %2528 : !llvm.ptr> + %2530 = llvm.extractelement %2529[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2534 = llvm.mul %2470, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2537 = llvm.mul %2474, %2536 : !llvm.i64 + %2538 = llvm.add %2535, %2537 : !llvm.i64 + %2539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2540 = llvm.mul %2488, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.getelementptr %2531[%2541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2543 = llvm.load %2542 : !llvm.ptr> + %2544 = llvm.extractelement %2543[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2545 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2546 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2547 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2548 = llvm.mul %2470, %2547 : !llvm.i64 + %2549 = llvm.add %2546, %2548 : !llvm.i64 + %2550 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2551 = llvm.mul %2474, %2550 : !llvm.i64 + %2552 = llvm.add %2549, %2551 : !llvm.i64 + %2553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2554 = llvm.mul %2488, %2553 : !llvm.i64 + %2555 = llvm.add %2552, %2554 : !llvm.i64 + %2556 = llvm.getelementptr %2545[%2555] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2557 = llvm.load %2556 : !llvm.ptr> + %2558 = llvm.extractelement %2557[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2559 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2560 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2561 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2562 = llvm.mul %2470, %2561 : !llvm.i64 + %2563 = llvm.add %2560, %2562 : !llvm.i64 + %2564 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2565 = llvm.mul %2474, %2564 : !llvm.i64 + %2566 = llvm.add %2563, %2565 : !llvm.i64 + %2567 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2568 = llvm.mul %2488, %2567 : !llvm.i64 + %2569 = llvm.add %2566, %2568 : !llvm.i64 + %2570 = llvm.getelementptr %2559[%2569] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2571 = llvm.load %2570 : !llvm.ptr> + %2572 = llvm.extractelement %2571[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2573 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2574 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2575 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2576 = llvm.mul %2470, %2575 : !llvm.i64 + %2577 = llvm.add %2574, %2576 : !llvm.i64 + %2578 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2579 = llvm.mul %2474, %2578 : !llvm.i64 + %2580 = llvm.add %2577, %2579 : !llvm.i64 + %2581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2582 = llvm.mul %2488, %2581 : !llvm.i64 + %2583 = llvm.add %2580, %2582 : !llvm.i64 + %2584 = llvm.getelementptr %2573[%2583] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2585 = llvm.load %2584 : !llvm.ptr> + %2586 = llvm.extractelement %2585[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2587 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2589 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2590 = llvm.mul %2470, %2589 : !llvm.i64 + %2591 = llvm.add %2588, %2590 : !llvm.i64 + %2592 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2593 = 
llvm.mul %2474, %2592 : !llvm.i64 + %2594 = llvm.add %2591, %2593 : !llvm.i64 + %2595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2596 = llvm.mul %2488, %2595 : !llvm.i64 + %2597 = llvm.add %2594, %2596 : !llvm.i64 + %2598 = llvm.getelementptr %2587[%2597] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2599 = llvm.load %2598 : !llvm.ptr> + %2600 = llvm.extractelement %2599[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2601 = llvm.fmul %2390, %2502 {RelaxedPrecision} : !llvm.float + %2602 = llvm.fmul %2400, %2516 {RelaxedPrecision} : !llvm.float + %2603 = llvm.fmul %2410, %2530 {RelaxedPrecision} : !llvm.float + %2604 = llvm.fmul %2420, %2544 {RelaxedPrecision} : !llvm.float + %2605 = llvm.fmul %2430, %2558 {RelaxedPrecision} : !llvm.float + %2606 = llvm.fmul %2440, %2572 {RelaxedPrecision} : !llvm.float + %2607 = llvm.fmul %2450, %2586 {RelaxedPrecision} : !llvm.float + %2608 = llvm.fmul %2460, %2600 {RelaxedPrecision} : !llvm.float + %2609 = llvm.add %2372, %2376 : !llvm.i64 + %2610 = llvm.srem %2609, %56 : !llvm.i64 + %2611 = llvm.icmp "slt" %2610, %67 : !llvm.i64 + %2612 = llvm.add %2610, %56 : !llvm.i64 + %2613 = llvm.select %2611, %2612, %2610 : !llvm.i1, !llvm.i64 + %2614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2617 = llvm.mul %2470, %2616 : !llvm.i64 + %2618 = llvm.add %2615, %2617 : !llvm.i64 + %2619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2620 = llvm.mul %2613, %2619 : !llvm.i64 + %2621 = llvm.add %2618, %2620 : !llvm.i64 + %2622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2623 = llvm.mul %2488, %2622 : !llvm.i64 + %2624 = llvm.add %2621, %2623 : !llvm.i64 + %2625 = llvm.getelementptr %2614[%2624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2626 = llvm.load %2625 : !llvm.ptr> + %2627 = llvm.extractelement %2626[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2628 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2630 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2631 = llvm.mul %2470, %2630 : !llvm.i64 + %2632 = llvm.add %2629, %2631 : !llvm.i64 + %2633 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2634 = llvm.mul %2613, %2633 : !llvm.i64 + %2635 = llvm.add %2632, %2634 : !llvm.i64 + %2636 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2637 = llvm.mul %2488, %2636 : !llvm.i64 + %2638 = llvm.add %2635, %2637 : !llvm.i64 + %2639 = llvm.getelementptr %2628[%2638] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2640 = llvm.load %2639 : !llvm.ptr> + %2641 = llvm.extractelement %2640[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2642 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2643 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2644 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2645 = llvm.mul %2470, %2644 : !llvm.i64 + %2646 = llvm.add %2643, %2645 : !llvm.i64 + %2647 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2648 = llvm.mul %2613, %2647 : !llvm.i64 + %2649 = llvm.add %2646, %2648 : !llvm.i64 + %2650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2651 = llvm.mul %2488, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.getelementptr %2642[%2652] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2654 = llvm.load %2653 : !llvm.ptr> + %2655 = llvm.extractelement %2654[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2656 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2659 = llvm.mul %2470, %2658 : !llvm.i64 + %2660 = llvm.add %2657, %2659 : !llvm.i64 + %2661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2662 = llvm.mul %2613, %2661 : !llvm.i64 + %2663 = llvm.add %2660, %2662 : !llvm.i64 + %2664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2665 = llvm.mul %2488, %2664 : !llvm.i64 + %2666 = llvm.add %2663, %2665 : !llvm.i64 + %2667 = llvm.getelementptr %2656[%2666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2668 = llvm.load %2667 : !llvm.ptr> + %2669 = llvm.extractelement %2668[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2673 = llvm.mul %2470, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2676 = llvm.mul %2613, %2675 : !llvm.i64 + %2677 = llvm.add %2674, %2676 : !llvm.i64 + %2678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2679 = llvm.mul %2488, %2678 : !llvm.i64 + %2680 = llvm.add %2677, %2679 : !llvm.i64 + %2681 = llvm.getelementptr %2670[%2680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2682 = llvm.load %2681 : !llvm.ptr> + %2683 = llvm.extractelement %2682[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2684 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2685 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2686 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2687 = llvm.mul %2470, %2686 : !llvm.i64 + %2688 = llvm.add %2685, %2687 : !llvm.i64 + %2689 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2690 = llvm.mul %2613, %2689 : !llvm.i64 + %2691 = llvm.add %2688, %2690 : !llvm.i64 + %2692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2693 = llvm.mul %2488, %2692 : !llvm.i64 + %2694 = llvm.add %2691, %2693 : !llvm.i64 + %2695 = llvm.getelementptr %2684[%2694] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2696 = llvm.load %2695 : !llvm.ptr> + %2697 = llvm.extractelement %2696[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2698 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2701 = llvm.mul %2470, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2704 = llvm.mul %2613, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2707 = llvm.mul %2488, %2706 : !llvm.i64 + %2708 = llvm.add %2705, %2707 : !llvm.i64 + %2709 = llvm.getelementptr %2698[%2708] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2710 = llvm.load %2709 : !llvm.ptr> + %2711 = llvm.extractelement %2710[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2712 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2714 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2715 = llvm.mul %2470, %2714 : !llvm.i64 + %2716 = llvm.add %2713, %2715 : !llvm.i64 + %2717 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2718 = llvm.mul %2613, %2717 : !llvm.i64 + %2719 = llvm.add %2716, %2718 : !llvm.i64 + %2720 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2721 = 
llvm.mul %2488, %2720 : !llvm.i64 + %2722 = llvm.add %2719, %2721 : !llvm.i64 + %2723 = llvm.getelementptr %2712[%2722] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2724 = llvm.load %2723 : !llvm.ptr> + %2725 = llvm.extractelement %2724[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2726 = llvm.fadd %2627, %2601 {RelaxedPrecision} : !llvm.float + %2727 = llvm.fadd %2641, %2602 {RelaxedPrecision} : !llvm.float + %2728 = llvm.fadd %2655, %2603 {RelaxedPrecision} : !llvm.float + %2729 = llvm.fadd %2669, %2604 {RelaxedPrecision} : !llvm.float + %2730 = llvm.fadd %2683, %2605 {RelaxedPrecision} : !llvm.float + %2731 = llvm.fadd %2697, %2606 {RelaxedPrecision} : !llvm.float + %2732 = llvm.fadd %2711, %2607 {RelaxedPrecision} : !llvm.float + %2733 = llvm.fadd %2725, %2608 {RelaxedPrecision} : !llvm.float + %2734 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2735 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2736 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2737 = llvm.mul %2470, %2736 : !llvm.i64 + %2738 = llvm.add %2735, %2737 : !llvm.i64 + %2739 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2740 = llvm.mul %2613, %2739 : !llvm.i64 + %2741 = llvm.add %2738, %2740 : !llvm.i64 + %2742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2743 = llvm.mul %2488, %2742 : !llvm.i64 + %2744 = llvm.add %2741, %2743 : !llvm.i64 + %2745 = llvm.getelementptr %2734[%2744] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2746 = llvm.load %2745 : !llvm.ptr> + %2747 = llvm.insertelement %2726, %2746[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2748 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2750 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2751 = llvm.mul %2470, %2750 : !llvm.i64 + %2752 = llvm.add %2749, %2751 : !llvm.i64 + %2753 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2754 = llvm.mul %2613, %2753 : !llvm.i64 + %2755 = llvm.add %2752, %2754 : !llvm.i64 + %2756 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2757 = llvm.mul %2488, %2756 : !llvm.i64 + %2758 = llvm.add %2755, %2757 : !llvm.i64 + %2759 = llvm.getelementptr %2748[%2758] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2747, %2759 : !llvm.ptr> + %2760 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2761 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2762 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2763 = llvm.mul %2470, %2762 : !llvm.i64 + %2764 = llvm.add %2761, %2763 : !llvm.i64 + %2765 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2766 = llvm.mul %2613, %2765 : !llvm.i64 + %2767 = llvm.add %2764, %2766 : !llvm.i64 + %2768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2769 = llvm.mul %2488, %2768 : !llvm.i64 + %2770 = llvm.add %2767, %2769 : !llvm.i64 + %2771 = llvm.getelementptr %2760[%2770] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2772 = llvm.load %2771 : !llvm.ptr> + %2773 = llvm.insertelement %2727, %2772[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2774 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2775 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2776 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2777 = llvm.mul %2470, %2776 : !llvm.i64 + %2778 = llvm.add %2775, %2777 : !llvm.i64 + %2779 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2780 = llvm.mul %2613, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = 
llvm.mul %2488, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2774[%2784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2773, %2785 : !llvm.ptr> + %2786 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2787 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2788 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2789 = llvm.mul %2470, %2788 : !llvm.i64 + %2790 = llvm.add %2787, %2789 : !llvm.i64 + %2791 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2792 = llvm.mul %2613, %2791 : !llvm.i64 + %2793 = llvm.add %2790, %2792 : !llvm.i64 + %2794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2795 = llvm.mul %2488, %2794 : !llvm.i64 + %2796 = llvm.add %2793, %2795 : !llvm.i64 + %2797 = llvm.getelementptr %2786[%2796] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2798 = llvm.load %2797 : !llvm.ptr> + %2799 = llvm.insertelement %2728, %2798[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2800 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2802 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2803 = llvm.mul %2470, %2802 : !llvm.i64 + %2804 = llvm.add %2801, %2803 : !llvm.i64 + %2805 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2806 = llvm.mul %2613, %2805 : !llvm.i64 + %2807 = llvm.add %2804, %2806 : !llvm.i64 + %2808 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2809 = llvm.mul %2488, %2808 : !llvm.i64 + %2810 = llvm.add %2807, %2809 : !llvm.i64 + %2811 = llvm.getelementptr %2800[%2810] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2799, %2811 : !llvm.ptr> + %2812 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2814 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2815 = llvm.mul %2470, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2818 = llvm.mul %2613, %2817 : !llvm.i64 + %2819 = llvm.add %2816, %2818 : !llvm.i64 + %2820 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2821 = llvm.mul %2488, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.getelementptr %2812[%2822] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2824 = llvm.load %2823 : !llvm.ptr> + %2825 = llvm.insertelement %2729, %2824[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2826 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2828 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2829 = llvm.mul %2470, %2828 : !llvm.i64 + %2830 = llvm.add %2827, %2829 : !llvm.i64 + %2831 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2832 = llvm.mul %2613, %2831 : !llvm.i64 + %2833 = llvm.add %2830, %2832 : !llvm.i64 + %2834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2835 = llvm.mul %2488, %2834 : !llvm.i64 + %2836 = llvm.add %2833, %2835 : !llvm.i64 + %2837 = llvm.getelementptr %2826[%2836] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2825, %2837 : !llvm.ptr> + %2838 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2840 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2841 = llvm.mul %2470, %2840 : !llvm.i64 + %2842 = llvm.add %2839, %2841 : !llvm.i64 + %2843 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2844 = llvm.mul %2613, %2843 : !llvm.i64 + %2845 = 
llvm.add %2842, %2844 : !llvm.i64 + %2846 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2847 = llvm.mul %2488, %2846 : !llvm.i64 + %2848 = llvm.add %2845, %2847 : !llvm.i64 + %2849 = llvm.getelementptr %2838[%2848] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2850 = llvm.load %2849 : !llvm.ptr> + %2851 = llvm.insertelement %2730, %2850[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2852 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2854 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2855 = llvm.mul %2470, %2854 : !llvm.i64 + %2856 = llvm.add %2853, %2855 : !llvm.i64 + %2857 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2858 = llvm.mul %2613, %2857 : !llvm.i64 + %2859 = llvm.add %2856, %2858 : !llvm.i64 + %2860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2861 = llvm.mul %2488, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.getelementptr %2852[%2862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2851, %2863 : !llvm.ptr> + %2864 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2865 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2866 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2867 = llvm.mul %2470, %2866 : !llvm.i64 + %2868 = llvm.add %2865, %2867 : !llvm.i64 + %2869 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2870 = llvm.mul %2613, %2869 : !llvm.i64 + %2871 = llvm.add %2868, %2870 : !llvm.i64 + %2872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2873 = llvm.mul %2488, %2872 : !llvm.i64 + %2874 = llvm.add %2871, %2873 : !llvm.i64 + %2875 = llvm.getelementptr %2864[%2874] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2876 = llvm.load %2875 : !llvm.ptr> + %2877 = llvm.insertelement %2731, %2876[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2878 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2880 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2881 = llvm.mul %2470, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2884 = llvm.mul %2613, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2887 = llvm.mul %2488, %2886 : !llvm.i64 + %2888 = llvm.add %2885, %2887 : !llvm.i64 + %2889 = llvm.getelementptr %2878[%2888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2877, %2889 : !llvm.ptr> + %2890 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2892 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2893 = llvm.mul %2470, %2892 : !llvm.i64 + %2894 = llvm.add %2891, %2893 : !llvm.i64 + %2895 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2896 = llvm.mul %2613, %2895 : !llvm.i64 + %2897 = llvm.add %2894, %2896 : !llvm.i64 + %2898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2899 = llvm.mul %2488, %2898 : !llvm.i64 + %2900 = llvm.add %2897, %2899 : !llvm.i64 + %2901 = llvm.getelementptr %2890[%2900] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2902 = llvm.load %2901 : !llvm.ptr> + %2903 = llvm.insertelement %2732, %2902[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2904 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2905 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2906 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2907 = llvm.mul 
%2470, %2906 : !llvm.i64 + %2908 = llvm.add %2905, %2907 : !llvm.i64 + %2909 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2910 = llvm.mul %2613, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : !llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %2488, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2904[%2914] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2903, %2915 : !llvm.ptr> + %2916 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2917 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2918 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2919 = llvm.mul %2470, %2918 : !llvm.i64 + %2920 = llvm.add %2917, %2919 : !llvm.i64 + %2921 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2922 = llvm.mul %2613, %2921 : !llvm.i64 + %2923 = llvm.add %2920, %2922 : !llvm.i64 + %2924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2925 = llvm.mul %2488, %2924 : !llvm.i64 + %2926 = llvm.add %2923, %2925 : !llvm.i64 + %2927 = llvm.getelementptr %2916[%2926] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2928 = llvm.load %2927 : !llvm.ptr> + %2929 = llvm.insertelement %2733, %2928[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2930 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2931 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2932 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2933 = llvm.mul %2470, %2932 : !llvm.i64 + %2934 = llvm.add %2931, %2933 : !llvm.i64 + %2935 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2936 = llvm.mul %2613, %2935 : !llvm.i64 + %2937 = llvm.add %2934, %2936 : !llvm.i64 + %2938 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2939 = llvm.mul %2488, %2938 : !llvm.i64 + %2940 = llvm.add %2937, %2939 : !llvm.i64 + %2941 = llvm.getelementptr %2930[%2940] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2929, %2941 : !llvm.ptr> + %2942 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2943 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2944 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2945 = llvm.mul %2470, %2944 : !llvm.i64 + %2946 = llvm.add %2943, %2945 : !llvm.i64 + %2947 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2948 = llvm.mul %2613, %2947 : !llvm.i64 + %2949 = llvm.add %2946, %2948 : !llvm.i64 + %2950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2951 = llvm.mul %2488, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.getelementptr %2942[%2952] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2954 = llvm.load %2953 : !llvm.ptr> + %2955 = llvm.insertelement %2726, %2954[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2956 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2957 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2958 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2959 = llvm.mul %2470, %2958 : !llvm.i64 + %2960 = llvm.add %2957, %2959 : !llvm.i64 + %2961 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2962 = llvm.mul %2613, %2961 : !llvm.i64 + %2963 = llvm.add %2960, %2962 : !llvm.i64 + %2964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2965 = llvm.mul %2488, %2964 : !llvm.i64 + %2966 = llvm.add %2963, %2965 : !llvm.i64 + %2967 = llvm.getelementptr %2956[%2966] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2955, %2967 : !llvm.ptr> + %2968 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2969 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %2970 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2971 = llvm.mul %2470, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2974 = llvm.mul %2613, %2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2977 = llvm.mul %2488, %2976 : !llvm.i64 + %2978 = llvm.add %2975, %2977 : !llvm.i64 + %2979 = llvm.getelementptr %2968[%2978] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2980 = llvm.load %2979 : !llvm.ptr> + %2981 = llvm.insertelement %2727, %2980[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2982 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2984 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2985 = llvm.mul %2470, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2988 = llvm.mul %2613, %2987 : !llvm.i64 + %2989 = llvm.add %2986, %2988 : !llvm.i64 + %2990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2991 = llvm.mul %2488, %2990 : !llvm.i64 + %2992 = llvm.add %2989, %2991 : !llvm.i64 + %2993 = llvm.getelementptr %2982[%2992] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2981, %2993 : !llvm.ptr> + %2994 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2995 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2996 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2997 = llvm.mul %2470, %2996 : !llvm.i64 + %2998 = llvm.add %2995, %2997 : !llvm.i64 + %2999 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3000 = llvm.mul %2613, %2999 : !llvm.i64 + %3001 = llvm.add %2998, %3000 : !llvm.i64 + %3002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3003 = llvm.mul %2488, %3002 : !llvm.i64 + %3004 = llvm.add %3001, %3003 : !llvm.i64 + %3005 = llvm.getelementptr %2994[%3004] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3006 = llvm.load %3005 : !llvm.ptr> + %3007 = llvm.insertelement %2728, %3006[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3008 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3010 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3011 = llvm.mul %2470, %3010 : !llvm.i64 + %3012 = llvm.add %3009, %3011 : !llvm.i64 + %3013 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3014 = llvm.mul %2613, %3013 : !llvm.i64 + %3015 = llvm.add %3012, %3014 : !llvm.i64 + %3016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3017 = llvm.mul %2488, %3016 : !llvm.i64 + %3018 = llvm.add %3015, %3017 : !llvm.i64 + %3019 = llvm.getelementptr %3008[%3018] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3007, %3019 : !llvm.ptr> + %3020 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3022 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3023 = llvm.mul %2470, %3022 : !llvm.i64 + %3024 = llvm.add %3021, %3023 : !llvm.i64 + %3025 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3026 = llvm.mul %2613, %3025 : !llvm.i64 + %3027 = llvm.add %3024, %3026 : !llvm.i64 + %3028 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3029 = llvm.mul %2488, %3028 : !llvm.i64 + %3030 = llvm.add %3027, %3029 : !llvm.i64 + %3031 = llvm.getelementptr %3020[%3030] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3032 = llvm.load %3031 : !llvm.ptr> + %3033 = llvm.insertelement %2729, %3032[%27 
: !llvm.i64] : !llvm.vec<8 x float> + %3034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3037 = llvm.mul %2470, %3036 : !llvm.i64 + %3038 = llvm.add %3035, %3037 : !llvm.i64 + %3039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3040 = llvm.mul %2613, %3039 : !llvm.i64 + %3041 = llvm.add %3038, %3040 : !llvm.i64 + %3042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3043 = llvm.mul %2488, %3042 : !llvm.i64 + %3044 = llvm.add %3041, %3043 : !llvm.i64 + %3045 = llvm.getelementptr %3034[%3044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3033, %3045 : !llvm.ptr> + %3046 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3048 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3049 = llvm.mul %2470, %3048 : !llvm.i64 + %3050 = llvm.add %3047, %3049 : !llvm.i64 + %3051 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3052 = llvm.mul %2613, %3051 : !llvm.i64 + %3053 = llvm.add %3050, %3052 : !llvm.i64 + %3054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3055 = llvm.mul %2488, %3054 : !llvm.i64 + %3056 = llvm.add %3053, %3055 : !llvm.i64 + %3057 = llvm.getelementptr %3046[%3056] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3058 = llvm.load %3057 : !llvm.ptr> + %3059 = llvm.insertelement %2730, %3058[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3060 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3062 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3063 = llvm.mul %2470, %3062 : !llvm.i64 + %3064 = llvm.add %3061, %3063 : !llvm.i64 + %3065 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3066 = llvm.mul %2613, %3065 : !llvm.i64 + %3067 = llvm.add %3064, %3066 : !llvm.i64 + %3068 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3069 = llvm.mul %2488, %3068 : !llvm.i64 + %3070 = llvm.add %3067, %3069 : !llvm.i64 + %3071 = llvm.getelementptr %3060[%3070] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3059, %3071 : !llvm.ptr> + %3072 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3073 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3074 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3075 = llvm.mul %2470, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3078 = llvm.mul %2613, %3077 : !llvm.i64 + %3079 = llvm.add %3076, %3078 : !llvm.i64 + %3080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3081 = llvm.mul %2488, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.getelementptr %3072[%3082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3084 = llvm.load %3083 : !llvm.ptr> + %3085 = llvm.insertelement %2731, %3084[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3086 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3088 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3089 = llvm.mul %2470, %3088 : !llvm.i64 + %3090 = llvm.add %3087, %3089 : !llvm.i64 + %3091 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3092 = llvm.mul %2613, %3091 : !llvm.i64 + %3093 = llvm.add %3090, %3092 : !llvm.i64 + %3094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3095 = llvm.mul %2488, %3094 : !llvm.i64 + %3096 = llvm.add %3093, %3095 : 
!llvm.i64 + %3097 = llvm.getelementptr %3086[%3096] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3085, %3097 : !llvm.ptr> + %3098 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3100 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3101 = llvm.mul %2470, %3100 : !llvm.i64 + %3102 = llvm.add %3099, %3101 : !llvm.i64 + %3103 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3104 = llvm.mul %2613, %3103 : !llvm.i64 + %3105 = llvm.add %3102, %3104 : !llvm.i64 + %3106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3107 = llvm.mul %2488, %3106 : !llvm.i64 + %3108 = llvm.add %3105, %3107 : !llvm.i64 + %3109 = llvm.getelementptr %3098[%3108] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3110 = llvm.load %3109 : !llvm.ptr> + %3111 = llvm.insertelement %2732, %3110[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3112 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3114 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3115 = llvm.mul %2470, %3114 : !llvm.i64 + %3116 = llvm.add %3113, %3115 : !llvm.i64 + %3117 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3118 = llvm.mul %2613, %3117 : !llvm.i64 + %3119 = llvm.add %3116, %3118 : !llvm.i64 + %3120 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3121 = llvm.mul %2488, %3120 : !llvm.i64 + %3122 = llvm.add %3119, %3121 : !llvm.i64 + %3123 = llvm.getelementptr %3112[%3122] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3111, %3123 : !llvm.ptr> + %3124 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3126 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3127 = llvm.mul %2470, %3126 : !llvm.i64 + %3128 = llvm.add %3125, %3127 : !llvm.i64 + %3129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3130 = llvm.mul %2613, %3129 : !llvm.i64 + %3131 = llvm.add %3128, %3130 : !llvm.i64 + %3132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3133 = llvm.mul %2488, %3132 : !llvm.i64 + %3134 = llvm.add %3131, %3133 : !llvm.i64 + %3135 = llvm.getelementptr %3124[%3134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3136 = llvm.load %3135 : !llvm.ptr> + %3137 = llvm.insertelement %2733, %3136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3138 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3141 = llvm.mul %2470, %3140 : !llvm.i64 + %3142 = llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3144 = llvm.mul %2613, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3147 = llvm.mul %2488, %3146 : !llvm.i64 + %3148 = llvm.add %3145, %3147 : !llvm.i64 + %3149 = llvm.getelementptr %3138[%3148] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3137, %3149 : !llvm.ptr> + %3150 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3152 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3153 = llvm.mul %2379, %3152 : !llvm.i64 + %3154 = llvm.add %3151, %3153 : !llvm.i64 + %3155 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3156 = llvm.mul %2380, %3155 : !llvm.i64 + %3157 = llvm.add %3154, %3156 : !llvm.i64 + %3158 = llvm.getelementptr %3150[%3157] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3159 = llvm.load %3158 : !llvm.ptr + %3160 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3162 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3163 = llvm.mul %2379, %3162 : !llvm.i64 + %3164 = llvm.add %3161, %3163 : !llvm.i64 + %3165 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3166 = llvm.mul %2380, %3165 : !llvm.i64 + %3167 = llvm.add %3164, %3166 : !llvm.i64 + %3168 = llvm.getelementptr %3160[%3167] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3169 = llvm.load %3168 : !llvm.ptr + %3170 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3173 = llvm.mul %2379, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %2380, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3179 = llvm.load %3178 : !llvm.ptr + %3180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3183 = llvm.mul %2379, %3182 : !llvm.i64 + %3184 = llvm.add %3181, %3183 : !llvm.i64 + %3185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3186 = llvm.mul %2380, %3185 : !llvm.i64 + %3187 = llvm.add %3184, %3186 : !llvm.i64 + %3188 = llvm.getelementptr %3180[%3187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3189 = llvm.load %3188 : !llvm.ptr + %3190 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3192 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3193 = llvm.mul %2379, %3192 : !llvm.i64 + %3194 = llvm.add %3191, %3193 : !llvm.i64 + %3195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3196 = llvm.mul %2380, %3195 : !llvm.i64 + %3197 = llvm.add %3194, %3196 : !llvm.i64 + %3198 = llvm.getelementptr %3190[%3197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3199 = llvm.load %3198 : !llvm.ptr + %3200 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3202 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3203 = llvm.mul %2379, %3202 : !llvm.i64 + %3204 = llvm.add %3201, %3203 : !llvm.i64 + %3205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3206 = llvm.mul %2380, %3205 : !llvm.i64 + %3207 = llvm.add %3204, %3206 : !llvm.i64 + %3208 = llvm.getelementptr %3200[%3207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3209 = llvm.load %3208 : !llvm.ptr + %3210 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3212 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3213 = llvm.mul %2379, %3212 : !llvm.i64 + %3214 = llvm.add %3211, %3213 : !llvm.i64 + %3215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3216 = llvm.mul %2380, %3215 : !llvm.i64 + %3217 = llvm.add %3214, %3216 : !llvm.i64 + %3218 = llvm.getelementptr %3210[%3217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3219 = llvm.load %3218 : !llvm.ptr + %3220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3222 = 
llvm.mlir.constant(128 : index) : !llvm.i64 + %3223 = llvm.mul %2379, %3222 : !llvm.i64 + %3224 = llvm.add %3221, %3223 : !llvm.i64 + %3225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3226 = llvm.mul %2380, %3225 : !llvm.i64 + %3227 = llvm.add %3224, %3226 : !llvm.i64 + %3228 = llvm.getelementptr %3220[%3227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3229 = llvm.load %3228 : !llvm.ptr + %3230 = llvm.add %2368, %70 : !llvm.i64 + %3231 = llvm.icmp "slt" %3230, %67 : !llvm.i64 + %3232 = llvm.sub %64, %3230 : !llvm.i64 + %3233 = llvm.select %3231, %3232, %3230 : !llvm.i1, !llvm.i64 + %3234 = llvm.sdiv %3233, %68 : !llvm.i64 + %3235 = llvm.sub %64, %3234 : !llvm.i64 + %3236 = llvm.select %3231, %3235, %3234 : !llvm.i1, !llvm.i64 + %3237 = llvm.srem %3236, %68 : !llvm.i64 + %3238 = llvm.icmp "slt" %3237, %67 : !llvm.i64 + %3239 = llvm.add %3237, %68 : !llvm.i64 + %3240 = llvm.select %3238, %3239, %3237 : !llvm.i1, !llvm.i64 + %3241 = llvm.sdiv %2463, %70 : !llvm.i64 + %3242 = llvm.sub %64, %3241 : !llvm.i64 + %3243 = llvm.select %2461, %3242, %3241 : !llvm.i1, !llvm.i64 + %3244 = llvm.mul %3236, %65 : !llvm.i64 + %3245 = llvm.add %3243, %3244 : !llvm.i64 + %3246 = llvm.add %3245, %69 : !llvm.i64 + %3247 = llvm.icmp "slt" %3246, %67 : !llvm.i64 + %3248 = llvm.sub %64, %3246 : !llvm.i64 + %3249 = llvm.select %3247, %3248, %3246 : !llvm.i1, !llvm.i64 + %3250 = llvm.sdiv %3249, %63 : !llvm.i64 + %3251 = llvm.sub %64, %3250 : !llvm.i64 + %3252 = llvm.select %3247, %3251, %3250 : !llvm.i1, !llvm.i64 + %3253 = llvm.mul %3252, %65 : !llvm.i64 + %3254 = llvm.add %3245, %3253 : !llvm.i64 + %3255 = llvm.add %3254, %69 : !llvm.i64 + %3256 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3257 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3258 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3259 = llvm.mul %3240, %3258 : !llvm.i64 + %3260 = llvm.add %3257, %3259 : !llvm.i64 + %3261 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3262 = llvm.mul %2474, %3261 : !llvm.i64 + %3263 = llvm.add %3260, %3262 : !llvm.i64 + %3264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3265 = llvm.mul %3255, %3264 : !llvm.i64 + %3266 = llvm.add %3263, %3265 : !llvm.i64 + %3267 = llvm.getelementptr %3256[%3266] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3268 = llvm.load %3267 : !llvm.ptr> + %3269 = llvm.extractelement %3268[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3270 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3272 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3273 = llvm.mul %3240, %3272 : !llvm.i64 + %3274 = llvm.add %3271, %3273 : !llvm.i64 + %3275 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3276 = llvm.mul %2474, %3275 : !llvm.i64 + %3277 = llvm.add %3274, %3276 : !llvm.i64 + %3278 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3279 = llvm.mul %3255, %3278 : !llvm.i64 + %3280 = llvm.add %3277, %3279 : !llvm.i64 + %3281 = llvm.getelementptr %3270[%3280] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3282 = llvm.load %3281 : !llvm.ptr> + %3283 = llvm.extractelement %3282[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3284 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3286 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3287 = llvm.mul %3240, %3286 : !llvm.i64 + %3288 = llvm.add %3285, %3287 : !llvm.i64 + %3289 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3290 = 
llvm.mul %2474, %3289 : !llvm.i64 + %3291 = llvm.add %3288, %3290 : !llvm.i64 + %3292 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3293 = llvm.mul %3255, %3292 : !llvm.i64 + %3294 = llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.getelementptr %3284[%3294] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3296 = llvm.load %3295 : !llvm.ptr> + %3297 = llvm.extractelement %3296[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3298 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3300 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3301 = llvm.mul %3240, %3300 : !llvm.i64 + %3302 = llvm.add %3299, %3301 : !llvm.i64 + %3303 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3304 = llvm.mul %2474, %3303 : !llvm.i64 + %3305 = llvm.add %3302, %3304 : !llvm.i64 + %3306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3307 = llvm.mul %3255, %3306 : !llvm.i64 + %3308 = llvm.add %3305, %3307 : !llvm.i64 + %3309 = llvm.getelementptr %3298[%3308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3310 = llvm.load %3309 : !llvm.ptr> + %3311 = llvm.extractelement %3310[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3312 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3313 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3314 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3315 = llvm.mul %3240, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3318 = llvm.mul %2474, %3317 : !llvm.i64 + %3319 = llvm.add %3316, %3318 : !llvm.i64 + %3320 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3321 = llvm.mul %3255, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.getelementptr %3312[%3322] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3324 = llvm.load %3323 : !llvm.ptr> + %3325 = llvm.extractelement %3324[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3326 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3328 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3329 = llvm.mul %3240, %3328 : !llvm.i64 + %3330 = llvm.add %3327, %3329 : !llvm.i64 + %3331 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3332 = llvm.mul %2474, %3331 : !llvm.i64 + %3333 = llvm.add %3330, %3332 : !llvm.i64 + %3334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3335 = llvm.mul %3255, %3334 : !llvm.i64 + %3336 = llvm.add %3333, %3335 : !llvm.i64 + %3337 = llvm.getelementptr %3326[%3336] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3338 = llvm.load %3337 : !llvm.ptr> + %3339 = llvm.extractelement %3338[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3340 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3342 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3343 = llvm.mul %3240, %3342 : !llvm.i64 + %3344 = llvm.add %3341, %3343 : !llvm.i64 + %3345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3346 = llvm.mul %2474, %3345 : !llvm.i64 + %3347 = llvm.add %3344, %3346 : !llvm.i64 + %3348 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3349 = llvm.mul %3255, %3348 : !llvm.i64 + %3350 = llvm.add %3347, %3349 : !llvm.i64 + %3351 = llvm.getelementptr %3340[%3350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3352 = llvm.load %3351 : !llvm.ptr> + %3353 = llvm.extractelement %3352[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3354 = llvm.extractvalue %150[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3356 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3357 = llvm.mul %3240, %3356 : !llvm.i64 + %3358 = llvm.add %3355, %3357 : !llvm.i64 + %3359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3360 = llvm.mul %2474, %3359 : !llvm.i64 + %3361 = llvm.add %3358, %3360 : !llvm.i64 + %3362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3363 = llvm.mul %3255, %3362 : !llvm.i64 + %3364 = llvm.add %3361, %3363 : !llvm.i64 + %3365 = llvm.getelementptr %3354[%3364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3366 = llvm.load %3365 : !llvm.ptr> + %3367 = llvm.extractelement %3366[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3368 = llvm.fmul %3159, %3269 {RelaxedPrecision} : !llvm.float + %3369 = llvm.fmul %3169, %3283 {RelaxedPrecision} : !llvm.float + %3370 = llvm.fmul %3179, %3297 {RelaxedPrecision} : !llvm.float + %3371 = llvm.fmul %3189, %3311 {RelaxedPrecision} : !llvm.float + %3372 = llvm.fmul %3199, %3325 {RelaxedPrecision} : !llvm.float + %3373 = llvm.fmul %3209, %3339 {RelaxedPrecision} : !llvm.float + %3374 = llvm.fmul %3219, %3353 {RelaxedPrecision} : !llvm.float + %3375 = llvm.fmul %3229, %3367 {RelaxedPrecision} : !llvm.float + %3376 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3377 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3378 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3379 = llvm.mul %3240, %3378 : !llvm.i64 + %3380 = llvm.add %3377, %3379 : !llvm.i64 + %3381 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3382 = llvm.mul %2613, %3381 : !llvm.i64 + %3383 = llvm.add %3380, %3382 : !llvm.i64 + %3384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3385 = llvm.mul %3255, %3384 : !llvm.i64 + %3386 = llvm.add %3383, %3385 : !llvm.i64 + %3387 = llvm.getelementptr %3376[%3386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3388 = llvm.load %3387 : !llvm.ptr> + %3389 = llvm.extractelement %3388[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3390 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3392 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3393 = llvm.mul %3240, %3392 : !llvm.i64 + %3394 = llvm.add %3391, %3393 : !llvm.i64 + %3395 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3396 = llvm.mul %2613, %3395 : !llvm.i64 + %3397 = llvm.add %3394, %3396 : !llvm.i64 + %3398 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3399 = llvm.mul %3255, %3398 : !llvm.i64 + %3400 = llvm.add %3397, %3399 : !llvm.i64 + %3401 = llvm.getelementptr %3390[%3400] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3402 = llvm.load %3401 : !llvm.ptr> + %3403 = llvm.extractelement %3402[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3404 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3405 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3406 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3407 = llvm.mul %3240, %3406 : !llvm.i64 + %3408 = llvm.add %3405, %3407 : !llvm.i64 + %3409 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3410 = llvm.mul %2613, %3409 : !llvm.i64 + %3411 = llvm.add %3408, %3410 : !llvm.i64 + %3412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3413 = llvm.mul %3255, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.getelementptr %3404[%3414] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3416 = llvm.load %3415 : !llvm.ptr> + %3417 = llvm.extractelement %3416[%26 : 
!llvm.i64] : !llvm.vec<8 x float> + %3418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3421 = llvm.mul %3240, %3420 : !llvm.i64 + %3422 = llvm.add %3419, %3421 : !llvm.i64 + %3423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3424 = llvm.mul %2613, %3423 : !llvm.i64 + %3425 = llvm.add %3422, %3424 : !llvm.i64 + %3426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3427 = llvm.mul %3255, %3426 : !llvm.i64 + %3428 = llvm.add %3425, %3427 : !llvm.i64 + %3429 = llvm.getelementptr %3418[%3428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3430 = llvm.load %3429 : !llvm.ptr> + %3431 = llvm.extractelement %3430[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3435 = llvm.mul %3240, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3438 = llvm.mul %2613, %3437 : !llvm.i64 + %3439 = llvm.add %3436, %3438 : !llvm.i64 + %3440 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3441 = llvm.mul %3255, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.getelementptr %3432[%3442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3444 = llvm.load %3443 : !llvm.ptr> + %3445 = llvm.extractelement %3444[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3446 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3447 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3448 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3449 = llvm.mul %3240, %3448 : !llvm.i64 + %3450 = llvm.add %3447, %3449 : !llvm.i64 + %3451 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3452 = llvm.mul %2613, %3451 : !llvm.i64 + %3453 = llvm.add %3450, %3452 : !llvm.i64 + %3454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3455 = llvm.mul %3255, %3454 : !llvm.i64 + %3456 = llvm.add %3453, %3455 : !llvm.i64 + %3457 = llvm.getelementptr %3446[%3456] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3458 = llvm.load %3457 : !llvm.ptr> + %3459 = llvm.extractelement %3458[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3460 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3461 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3462 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3463 = llvm.mul %3240, %3462 : !llvm.i64 + %3464 = llvm.add %3461, %3463 : !llvm.i64 + %3465 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3466 = llvm.mul %2613, %3465 : !llvm.i64 + %3467 = llvm.add %3464, %3466 : !llvm.i64 + %3468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3469 = llvm.mul %3255, %3468 : !llvm.i64 + %3470 = llvm.add %3467, %3469 : !llvm.i64 + %3471 = llvm.getelementptr %3460[%3470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3472 = llvm.load %3471 : !llvm.ptr> + %3473 = llvm.extractelement %3472[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3477 = llvm.mul %3240, %3476 : !llvm.i64 + %3478 = llvm.add %3475, %3477 : !llvm.i64 + %3479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3480 = llvm.mul %2613, %3479 : !llvm.i64 + %3481 = llvm.add %3478, %3480 : 
!llvm.i64 + %3482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3483 = llvm.mul %3255, %3482 : !llvm.i64 + %3484 = llvm.add %3481, %3483 : !llvm.i64 + %3485 = llvm.getelementptr %3474[%3484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3486 = llvm.load %3485 : !llvm.ptr> + %3487 = llvm.extractelement %3486[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3488 = llvm.fadd %3389, %3368 {RelaxedPrecision} : !llvm.float + %3489 = llvm.fadd %3403, %3369 {RelaxedPrecision} : !llvm.float + %3490 = llvm.fadd %3417, %3370 {RelaxedPrecision} : !llvm.float + %3491 = llvm.fadd %3431, %3371 {RelaxedPrecision} : !llvm.float + %3492 = llvm.fadd %3445, %3372 {RelaxedPrecision} : !llvm.float + %3493 = llvm.fadd %3459, %3373 {RelaxedPrecision} : !llvm.float + %3494 = llvm.fadd %3473, %3374 {RelaxedPrecision} : !llvm.float + %3495 = llvm.fadd %3487, %3375 {RelaxedPrecision} : !llvm.float + %3496 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3497 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3498 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3499 = llvm.mul %3240, %3498 : !llvm.i64 + %3500 = llvm.add %3497, %3499 : !llvm.i64 + %3501 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3502 = llvm.mul %2613, %3501 : !llvm.i64 + %3503 = llvm.add %3500, %3502 : !llvm.i64 + %3504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3505 = llvm.mul %3255, %3504 : !llvm.i64 + %3506 = llvm.add %3503, %3505 : !llvm.i64 + %3507 = llvm.getelementptr %3496[%3506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3508 = llvm.load %3507 : !llvm.ptr> + %3509 = llvm.insertelement %3488, %3508[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3510 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3512 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3513 = llvm.mul %3240, %3512 : !llvm.i64 + %3514 = llvm.add %3511, %3513 : !llvm.i64 + %3515 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3516 = llvm.mul %2613, %3515 : !llvm.i64 + %3517 = llvm.add %3514, %3516 : !llvm.i64 + %3518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3519 = llvm.mul %3255, %3518 : !llvm.i64 + %3520 = llvm.add %3517, %3519 : !llvm.i64 + %3521 = llvm.getelementptr %3510[%3520] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3509, %3521 : !llvm.ptr> + %3522 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3523 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3524 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3525 = llvm.mul %3240, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3528 = llvm.mul %2613, %3527 : !llvm.i64 + %3529 = llvm.add %3526, %3528 : !llvm.i64 + %3530 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3531 = llvm.mul %3255, %3530 : !llvm.i64 + %3532 = llvm.add %3529, %3531 : !llvm.i64 + %3533 = llvm.getelementptr %3522[%3532] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3534 = llvm.load %3533 : !llvm.ptr> + %3535 = llvm.insertelement %3489, %3534[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3536 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3537 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3538 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3539 = llvm.mul %3240, %3538 : !llvm.i64 + %3540 = llvm.add %3537, %3539 : !llvm.i64 + %3541 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3542 = llvm.mul %2613, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : 
!llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %3255, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3536[%3546] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3535, %3547 : !llvm.ptr> + %3548 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3550 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3551 = llvm.mul %3240, %3550 : !llvm.i64 + %3552 = llvm.add %3549, %3551 : !llvm.i64 + %3553 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3554 = llvm.mul %2613, %3553 : !llvm.i64 + %3555 = llvm.add %3552, %3554 : !llvm.i64 + %3556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3557 = llvm.mul %3255, %3556 : !llvm.i64 + %3558 = llvm.add %3555, %3557 : !llvm.i64 + %3559 = llvm.getelementptr %3548[%3558] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3560 = llvm.load %3559 : !llvm.ptr> + %3561 = llvm.insertelement %3490, %3560[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3562 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3563 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3564 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3565 = llvm.mul %3240, %3564 : !llvm.i64 + %3566 = llvm.add %3563, %3565 : !llvm.i64 + %3567 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3568 = llvm.mul %2613, %3567 : !llvm.i64 + %3569 = llvm.add %3566, %3568 : !llvm.i64 + %3570 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3571 = llvm.mul %3255, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.getelementptr %3562[%3572] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3561, %3573 : !llvm.ptr> + %3574 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3575 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3576 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3577 = llvm.mul %3240, %3576 : !llvm.i64 + %3578 = llvm.add %3575, %3577 : !llvm.i64 + %3579 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3580 = llvm.mul %2613, %3579 : !llvm.i64 + %3581 = llvm.add %3578, %3580 : !llvm.i64 + %3582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3583 = llvm.mul %3255, %3582 : !llvm.i64 + %3584 = llvm.add %3581, %3583 : !llvm.i64 + %3585 = llvm.getelementptr %3574[%3584] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3586 = llvm.load %3585 : !llvm.ptr> + %3587 = llvm.insertelement %3491, %3586[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3588 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3590 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3591 = llvm.mul %3240, %3590 : !llvm.i64 + %3592 = llvm.add %3589, %3591 : !llvm.i64 + %3593 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3594 = llvm.mul %2613, %3593 : !llvm.i64 + %3595 = llvm.add %3592, %3594 : !llvm.i64 + %3596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3597 = llvm.mul %3255, %3596 : !llvm.i64 + %3598 = llvm.add %3595, %3597 : !llvm.i64 + %3599 = llvm.getelementptr %3588[%3598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3587, %3599 : !llvm.ptr> + %3600 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3602 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3603 = llvm.mul %3240, %3602 : !llvm.i64 + %3604 = llvm.add %3601, %3603 : !llvm.i64 + %3605 = llvm.mlir.constant(2 : index) 
: !llvm.i64 + %3606 = llvm.mul %2613, %3605 : !llvm.i64 + %3607 = llvm.add %3604, %3606 : !llvm.i64 + %3608 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3609 = llvm.mul %3255, %3608 : !llvm.i64 + %3610 = llvm.add %3607, %3609 : !llvm.i64 + %3611 = llvm.getelementptr %3600[%3610] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3612 = llvm.load %3611 : !llvm.ptr> + %3613 = llvm.insertelement %3492, %3612[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3617 = llvm.mul %3240, %3616 : !llvm.i64 + %3618 = llvm.add %3615, %3617 : !llvm.i64 + %3619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3620 = llvm.mul %2613, %3619 : !llvm.i64 + %3621 = llvm.add %3618, %3620 : !llvm.i64 + %3622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3623 = llvm.mul %3255, %3622 : !llvm.i64 + %3624 = llvm.add %3621, %3623 : !llvm.i64 + %3625 = llvm.getelementptr %3614[%3624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3613, %3625 : !llvm.ptr> + %3626 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3627 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3628 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3629 = llvm.mul %3240, %3628 : !llvm.i64 + %3630 = llvm.add %3627, %3629 : !llvm.i64 + %3631 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3632 = llvm.mul %2613, %3631 : !llvm.i64 + %3633 = llvm.add %3630, %3632 : !llvm.i64 + %3634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3635 = llvm.mul %3255, %3634 : !llvm.i64 + %3636 = llvm.add %3633, %3635 : !llvm.i64 + %3637 = llvm.getelementptr %3626[%3636] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3638 = llvm.load %3637 : !llvm.ptr> + %3639 = llvm.insertelement %3493, %3638[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3640 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3641 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3642 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3643 = llvm.mul %3240, %3642 : !llvm.i64 + %3644 = llvm.add %3641, %3643 : !llvm.i64 + %3645 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3646 = llvm.mul %2613, %3645 : !llvm.i64 + %3647 = llvm.add %3644, %3646 : !llvm.i64 + %3648 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3649 = llvm.mul %3255, %3648 : !llvm.i64 + %3650 = llvm.add %3647, %3649 : !llvm.i64 + %3651 = llvm.getelementptr %3640[%3650] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3639, %3651 : !llvm.ptr> + %3652 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3653 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3654 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3655 = llvm.mul %3240, %3654 : !llvm.i64 + %3656 = llvm.add %3653, %3655 : !llvm.i64 + %3657 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3658 = llvm.mul %2613, %3657 : !llvm.i64 + %3659 = llvm.add %3656, %3658 : !llvm.i64 + %3660 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3661 = llvm.mul %3255, %3660 : !llvm.i64 + %3662 = llvm.add %3659, %3661 : !llvm.i64 + %3663 = llvm.getelementptr %3652[%3662] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3664 = llvm.load %3663 : !llvm.ptr> + %3665 = llvm.insertelement %3494, %3664[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3666 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3667 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3668 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %3669 = llvm.mul %3240, %3668 : !llvm.i64 + %3670 = llvm.add %3667, %3669 : !llvm.i64 + %3671 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3672 = llvm.mul %2613, %3671 : !llvm.i64 + %3673 = llvm.add %3670, %3672 : !llvm.i64 + %3674 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3675 = llvm.mul %3255, %3674 : !llvm.i64 + %3676 = llvm.add %3673, %3675 : !llvm.i64 + %3677 = llvm.getelementptr %3666[%3676] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3665, %3677 : !llvm.ptr> + %3678 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3680 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3681 = llvm.mul %3240, %3680 : !llvm.i64 + %3682 = llvm.add %3679, %3681 : !llvm.i64 + %3683 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3684 = llvm.mul %2613, %3683 : !llvm.i64 + %3685 = llvm.add %3682, %3684 : !llvm.i64 + %3686 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3687 = llvm.mul %3255, %3686 : !llvm.i64 + %3688 = llvm.add %3685, %3687 : !llvm.i64 + %3689 = llvm.getelementptr %3678[%3688] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3690 = llvm.load %3689 : !llvm.ptr> + %3691 = llvm.insertelement %3495, %3690[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3692 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3694 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3695 = llvm.mul %3240, %3694 : !llvm.i64 + %3696 = llvm.add %3693, %3695 : !llvm.i64 + %3697 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3698 = llvm.mul %2613, %3697 : !llvm.i64 + %3699 = llvm.add %3696, %3698 : !llvm.i64 + %3700 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3701 = llvm.mul %3255, %3700 : !llvm.i64 + %3702 = llvm.add %3699, %3701 : !llvm.i64 + %3703 = llvm.getelementptr %3692[%3702] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3691, %3703 : !llvm.ptr> + %3704 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3705 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3706 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3707 = llvm.mul %3240, %3706 : !llvm.i64 + %3708 = llvm.add %3705, %3707 : !llvm.i64 + %3709 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3710 = llvm.mul %2613, %3709 : !llvm.i64 + %3711 = llvm.add %3708, %3710 : !llvm.i64 + %3712 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3713 = llvm.mul %3255, %3712 : !llvm.i64 + %3714 = llvm.add %3711, %3713 : !llvm.i64 + %3715 = llvm.getelementptr %3704[%3714] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3716 = llvm.load %3715 : !llvm.ptr> + %3717 = llvm.insertelement %3488, %3716[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3718 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3720 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3721 = llvm.mul %3240, %3720 : !llvm.i64 + %3722 = llvm.add %3719, %3721 : !llvm.i64 + %3723 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3724 = llvm.mul %2613, %3723 : !llvm.i64 + %3725 = llvm.add %3722, %3724 : !llvm.i64 + %3726 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3727 = llvm.mul %3255, %3726 : !llvm.i64 + %3728 = llvm.add %3725, %3727 : !llvm.i64 + %3729 = llvm.getelementptr %3718[%3728] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3717, %3729 : !llvm.ptr> + %3730 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, 
array<3 x i64>, array<3 x i64>)> + %3731 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3732 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3733 = llvm.mul %3240, %3732 : !llvm.i64 + %3734 = llvm.add %3731, %3733 : !llvm.i64 + %3735 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3736 = llvm.mul %2613, %3735 : !llvm.i64 + %3737 = llvm.add %3734, %3736 : !llvm.i64 + %3738 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3739 = llvm.mul %3255, %3738 : !llvm.i64 + %3740 = llvm.add %3737, %3739 : !llvm.i64 + %3741 = llvm.getelementptr %3730[%3740] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3742 = llvm.load %3741 : !llvm.ptr> + %3743 = llvm.insertelement %3489, %3742[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3744 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3746 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3747 = llvm.mul %3240, %3746 : !llvm.i64 + %3748 = llvm.add %3745, %3747 : !llvm.i64 + %3749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3750 = llvm.mul %2613, %3749 : !llvm.i64 + %3751 = llvm.add %3748, %3750 : !llvm.i64 + %3752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3753 = llvm.mul %3255, %3752 : !llvm.i64 + %3754 = llvm.add %3751, %3753 : !llvm.i64 + %3755 = llvm.getelementptr %3744[%3754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3743, %3755 : !llvm.ptr> + %3756 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3758 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3759 = llvm.mul %3240, %3758 : !llvm.i64 + %3760 = llvm.add %3757, %3759 : !llvm.i64 + %3761 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3762 = llvm.mul %2613, %3761 : !llvm.i64 + %3763 = llvm.add %3760, %3762 : !llvm.i64 + %3764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3765 = llvm.mul %3255, %3764 : !llvm.i64 + %3766 = llvm.add %3763, %3765 : !llvm.i64 + %3767 = llvm.getelementptr %3756[%3766] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3768 = llvm.load %3767 : !llvm.ptr> + %3769 = llvm.insertelement %3490, %3768[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3770 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3771 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3772 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3773 = llvm.mul %3240, %3772 : !llvm.i64 + %3774 = llvm.add %3771, %3773 : !llvm.i64 + %3775 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3776 = llvm.mul %2613, %3775 : !llvm.i64 + %3777 = llvm.add %3774, %3776 : !llvm.i64 + %3778 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3779 = llvm.mul %3255, %3778 : !llvm.i64 + %3780 = llvm.add %3777, %3779 : !llvm.i64 + %3781 = llvm.getelementptr %3770[%3780] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3769, %3781 : !llvm.ptr> + %3782 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3784 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3785 = llvm.mul %3240, %3784 : !llvm.i64 + %3786 = llvm.add %3783, %3785 : !llvm.i64 + %3787 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3788 = llvm.mul %2613, %3787 : !llvm.i64 + %3789 = llvm.add %3786, %3788 : !llvm.i64 + %3790 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3791 = llvm.mul %3255, %3790 : !llvm.i64 + %3792 = llvm.add %3789, %3791 : !llvm.i64 + %3793 = llvm.getelementptr %3782[%3792] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3794 = 
llvm.load %3793 : !llvm.ptr> + %3795 = llvm.insertelement %3491, %3794[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3796 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3797 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3798 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3799 = llvm.mul %3240, %3798 : !llvm.i64 + %3800 = llvm.add %3797, %3799 : !llvm.i64 + %3801 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3802 = llvm.mul %2613, %3801 : !llvm.i64 + %3803 = llvm.add %3800, %3802 : !llvm.i64 + %3804 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3805 = llvm.mul %3255, %3804 : !llvm.i64 + %3806 = llvm.add %3803, %3805 : !llvm.i64 + %3807 = llvm.getelementptr %3796[%3806] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3795, %3807 : !llvm.ptr> + %3808 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3810 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3811 = llvm.mul %3240, %3810 : !llvm.i64 + %3812 = llvm.add %3809, %3811 : !llvm.i64 + %3813 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3814 = llvm.mul %2613, %3813 : !llvm.i64 + %3815 = llvm.add %3812, %3814 : !llvm.i64 + %3816 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3817 = llvm.mul %3255, %3816 : !llvm.i64 + %3818 = llvm.add %3815, %3817 : !llvm.i64 + %3819 = llvm.getelementptr %3808[%3818] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3820 = llvm.load %3819 : !llvm.ptr> + %3821 = llvm.insertelement %3492, %3820[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3822 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3823 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3824 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3825 = llvm.mul %3240, %3824 : !llvm.i64 + %3826 = llvm.add %3823, %3825 : !llvm.i64 + %3827 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3828 = llvm.mul %2613, %3827 : !llvm.i64 + %3829 = llvm.add %3826, %3828 : !llvm.i64 + %3830 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3831 = llvm.mul %3255, %3830 : !llvm.i64 + %3832 = llvm.add %3829, %3831 : !llvm.i64 + %3833 = llvm.getelementptr %3822[%3832] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3821, %3833 : !llvm.ptr> + %3834 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3835 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3836 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3837 = llvm.mul %3240, %3836 : !llvm.i64 + %3838 = llvm.add %3835, %3837 : !llvm.i64 + %3839 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3840 = llvm.mul %2613, %3839 : !llvm.i64 + %3841 = llvm.add %3838, %3840 : !llvm.i64 + %3842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3843 = llvm.mul %3255, %3842 : !llvm.i64 + %3844 = llvm.add %3841, %3843 : !llvm.i64 + %3845 = llvm.getelementptr %3834[%3844] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3846 = llvm.load %3845 : !llvm.ptr> + %3847 = llvm.insertelement %3493, %3846[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3848 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3850 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3851 = llvm.mul %3240, %3850 : !llvm.i64 + %3852 = llvm.add %3849, %3851 : !llvm.i64 + %3853 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3854 = llvm.mul %2613, %3853 : !llvm.i64 + %3855 = llvm.add %3852, %3854 : !llvm.i64 + %3856 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3857 
= llvm.mul %3255, %3856 : !llvm.i64 + %3858 = llvm.add %3855, %3857 : !llvm.i64 + %3859 = llvm.getelementptr %3848[%3858] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3847, %3859 : !llvm.ptr> + %3860 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3861 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3862 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3863 = llvm.mul %3240, %3862 : !llvm.i64 + %3864 = llvm.add %3861, %3863 : !llvm.i64 + %3865 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3866 = llvm.mul %2613, %3865 : !llvm.i64 + %3867 = llvm.add %3864, %3866 : !llvm.i64 + %3868 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3869 = llvm.mul %3255, %3868 : !llvm.i64 + %3870 = llvm.add %3867, %3869 : !llvm.i64 + %3871 = llvm.getelementptr %3860[%3870] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3872 = llvm.load %3871 : !llvm.ptr> + %3873 = llvm.insertelement %3494, %3872[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3874 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3875 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3876 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3877 = llvm.mul %3240, %3876 : !llvm.i64 + %3878 = llvm.add %3875, %3877 : !llvm.i64 + %3879 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3880 = llvm.mul %2613, %3879 : !llvm.i64 + %3881 = llvm.add %3878, %3880 : !llvm.i64 + %3882 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3883 = llvm.mul %3255, %3882 : !llvm.i64 + %3884 = llvm.add %3881, %3883 : !llvm.i64 + %3885 = llvm.getelementptr %3874[%3884] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3873, %3885 : !llvm.ptr> + %3886 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3888 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3889 = llvm.mul %3240, %3888 : !llvm.i64 + %3890 = llvm.add %3887, %3889 : !llvm.i64 + %3891 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3892 = llvm.mul %2613, %3891 : !llvm.i64 + %3893 = llvm.add %3890, %3892 : !llvm.i64 + %3894 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3895 = llvm.mul %3255, %3894 : !llvm.i64 + %3896 = llvm.add %3893, %3895 : !llvm.i64 + %3897 = llvm.getelementptr %3886[%3896] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3898 = llvm.load %3897 : !llvm.ptr> + %3899 = llvm.insertelement %3495, %3898[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3900 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3901 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3902 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3903 = llvm.mul %3240, %3902 : !llvm.i64 + %3904 = llvm.add %3901, %3903 : !llvm.i64 + %3905 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3906 = llvm.mul %2613, %3905 : !llvm.i64 + %3907 = llvm.add %3904, %3906 : !llvm.i64 + %3908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3909 = llvm.mul %3255, %3908 : !llvm.i64 + %3910 = llvm.add %3907, %3909 : !llvm.i64 + %3911 = llvm.getelementptr %3900[%3910] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3899, %3911 : !llvm.ptr> + %3912 = llvm.add %2376, %69 : !llvm.i64 + llvm.br ^bb31(%3912 : !llvm.i64) + ^bb33: // pred: ^bb31 + %3913 = llvm.add %2374, %69 : !llvm.i64 + llvm.br ^bb29(%3913 : !llvm.i64) + ^bb34: // pred: ^bb29 + %3914 = llvm.add %2372, %56 : !llvm.i64 + llvm.br ^bb27(%3914 : !llvm.i64) + ^bb35: // pred: ^bb27 + llvm.br ^bb36(%67 : !llvm.i64) + ^bb36(%3915: !llvm.i64): // 2 preds: ^bb35, ^bb37 + %3916 = 
llvm.icmp "slt" %3915, %48 : !llvm.i64 + llvm.cond_br %3916, ^bb37, ^bb38 + ^bb37: // pred: ^bb36 + %3917 = llvm.add %2370, %3915 : !llvm.i64 + %3918 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3920 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3921 = llvm.mul %2345, %3920 : !llvm.i64 + %3922 = llvm.add %3919, %3921 : !llvm.i64 + %3923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3924 = llvm.mul %3917, %3923 : !llvm.i64 + %3925 = llvm.add %3922, %3924 : !llvm.i64 + %3926 = llvm.getelementptr %3918[%3925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3927 = llvm.load %3926 : !llvm.ptr + %3928 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3930 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3931 = llvm.mul %2345, %3930 : !llvm.i64 + %3932 = llvm.add %3929, %3931 : !llvm.i64 + %3933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3934 = llvm.mul %3917, %3933 : !llvm.i64 + %3935 = llvm.add %3932, %3934 : !llvm.i64 + %3936 = llvm.getelementptr %3928[%3935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3937 = llvm.load %3936 : !llvm.ptr + %3938 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3940 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3941 = llvm.mul %2345, %3940 : !llvm.i64 + %3942 = llvm.add %3939, %3941 : !llvm.i64 + %3943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3944 = llvm.mul %3917, %3943 : !llvm.i64 + %3945 = llvm.add %3942, %3944 : !llvm.i64 + %3946 = llvm.getelementptr %3938[%3945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3947 = llvm.load %3946 : !llvm.ptr + %3948 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3950 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3951 = llvm.mul %2345, %3950 : !llvm.i64 + %3952 = llvm.add %3949, %3951 : !llvm.i64 + %3953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3954 = llvm.mul %3917, %3953 : !llvm.i64 + %3955 = llvm.add %3952, %3954 : !llvm.i64 + %3956 = llvm.getelementptr %3948[%3955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3957 = llvm.load %3956 : !llvm.ptr + %3958 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3960 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3961 = llvm.mul %2345, %3960 : !llvm.i64 + %3962 = llvm.add %3959, %3961 : !llvm.i64 + %3963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3964 = llvm.mul %3917, %3963 : !llvm.i64 + %3965 = llvm.add %3962, %3964 : !llvm.i64 + %3966 = llvm.getelementptr %3958[%3965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3967 = llvm.load %3966 : !llvm.ptr + %3968 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3970 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3971 = llvm.mul %2345, %3970 : !llvm.i64 + %3972 = llvm.add %3969, %3971 : !llvm.i64 + %3973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3974 = llvm.mul %3917, %3973 : !llvm.i64 + %3975 = llvm.add %3972, %3974 : !llvm.i64 + %3976 = llvm.getelementptr %3968[%3975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3977 = llvm.load %3976 : !llvm.ptr + %3978 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3979 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %3980 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3981 = llvm.mul %2345, %3980 : !llvm.i64 + %3982 = llvm.add %3979, %3981 : !llvm.i64 + %3983 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3984 = llvm.mul %3917, %3983 : !llvm.i64 + %3985 = llvm.add %3982, %3984 : !llvm.i64 + %3986 = llvm.getelementptr %3978[%3985] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3987 = llvm.load %3986 : !llvm.ptr + %3988 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3990 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3991 = llvm.mul %2345, %3990 : !llvm.i64 + %3992 = llvm.add %3989, %3991 : !llvm.i64 + %3993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3994 = llvm.mul %3917, %3993 : !llvm.i64 + %3995 = llvm.add %3992, %3994 : !llvm.i64 + %3996 = llvm.getelementptr %3988[%3995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3997 = llvm.load %3996 : !llvm.ptr + %3998 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %3999 = llvm.sub %64, %2368 : !llvm.i64 + %4000 = llvm.select %3998, %3999, %2368 : !llvm.i1, !llvm.i64 + %4001 = llvm.sdiv %4000, %68 : !llvm.i64 + %4002 = llvm.sub %64, %4001 : !llvm.i64 + %4003 = llvm.select %3998, %4002, %4001 : !llvm.i1, !llvm.i64 + %4004 = llvm.srem %4003, %68 : !llvm.i64 + %4005 = llvm.icmp "slt" %4004, %67 : !llvm.i64 + %4006 = llvm.add %4004, %68 : !llvm.i64 + %4007 = llvm.select %4005, %4006, %4004 : !llvm.i1, !llvm.i64 + %4008 = llvm.srem %3917, %39 : !llvm.i64 + %4009 = llvm.icmp "slt" %4008, %67 : !llvm.i64 + %4010 = llvm.add %4008, %39 : !llvm.i64 + %4011 = llvm.select %4009, %4010, %4008 : !llvm.i1, !llvm.i64 + %4012 = llvm.srem %2368, %68 : !llvm.i64 + %4013 = llvm.icmp "slt" %4012, %67 : !llvm.i64 + %4014 = llvm.add %4012, %68 : !llvm.i64 + %4015 = llvm.select %4013, %4014, %4012 : !llvm.i1, !llvm.i64 + %4016 = llvm.icmp "slt" %4015, %67 : !llvm.i64 + %4017 = llvm.sub %64, %4015 : !llvm.i64 + %4018 = llvm.select %4016, %4017, %4015 : !llvm.i1, !llvm.i64 + %4019 = llvm.sdiv %4018, %70 : !llvm.i64 + %4020 = llvm.sub %64, %4019 : !llvm.i64 + %4021 = llvm.select %4016, %4020, %4019 : !llvm.i1, !llvm.i64 + %4022 = llvm.srem %4021, %63 : !llvm.i64 + %4023 = llvm.icmp "slt" %4022, %67 : !llvm.i64 + %4024 = llvm.add %4022, %63 : !llvm.i64 + %4025 = llvm.select %4023, %4024, %4022 : !llvm.i1, !llvm.i64 + %4026 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4027 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4028 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4029 = llvm.mul %4007, %4028 : !llvm.i64 + %4030 = llvm.add %4027, %4029 : !llvm.i64 + %4031 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4032 = llvm.mul %4011, %4031 : !llvm.i64 + %4033 = llvm.add %4030, %4032 : !llvm.i64 + %4034 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4035 = llvm.mul %4025, %4034 : !llvm.i64 + %4036 = llvm.add %4033, %4035 : !llvm.i64 + %4037 = llvm.getelementptr %4026[%4036] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4038 = llvm.load %4037 : !llvm.ptr> + %4039 = llvm.extractelement %4038[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4040 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4041 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4042 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4043 = llvm.mul %4007, %4042 : !llvm.i64 + %4044 = llvm.add %4041, %4043 : !llvm.i64 + %4045 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4046 = llvm.mul %4011, %4045 : 
!llvm.i64 + %4047 = llvm.add %4044, %4046 : !llvm.i64 + %4048 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4049 = llvm.mul %4025, %4048 : !llvm.i64 + %4050 = llvm.add %4047, %4049 : !llvm.i64 + %4051 = llvm.getelementptr %4040[%4050] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4052 = llvm.load %4051 : !llvm.ptr> + %4053 = llvm.extractelement %4052[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4054 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4056 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4057 = llvm.mul %4007, %4056 : !llvm.i64 + %4058 = llvm.add %4055, %4057 : !llvm.i64 + %4059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4060 = llvm.mul %4011, %4059 : !llvm.i64 + %4061 = llvm.add %4058, %4060 : !llvm.i64 + %4062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4063 = llvm.mul %4025, %4062 : !llvm.i64 + %4064 = llvm.add %4061, %4063 : !llvm.i64 + %4065 = llvm.getelementptr %4054[%4064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4066 = llvm.load %4065 : !llvm.ptr> + %4067 = llvm.extractelement %4066[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4068 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4070 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4071 = llvm.mul %4007, %4070 : !llvm.i64 + %4072 = llvm.add %4069, %4071 : !llvm.i64 + %4073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4074 = llvm.mul %4011, %4073 : !llvm.i64 + %4075 = llvm.add %4072, %4074 : !llvm.i64 + %4076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4077 = llvm.mul %4025, %4076 : !llvm.i64 + %4078 = llvm.add %4075, %4077 : !llvm.i64 + %4079 = llvm.getelementptr %4068[%4078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4080 = llvm.load %4079 : !llvm.ptr> + %4081 = llvm.extractelement %4080[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4082 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4083 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4084 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4085 = llvm.mul %4007, %4084 : !llvm.i64 + %4086 = llvm.add %4083, %4085 : !llvm.i64 + %4087 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4088 = llvm.mul %4011, %4087 : !llvm.i64 + %4089 = llvm.add %4086, %4088 : !llvm.i64 + %4090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4091 = llvm.mul %4025, %4090 : !llvm.i64 + %4092 = llvm.add %4089, %4091 : !llvm.i64 + %4093 = llvm.getelementptr %4082[%4092] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4094 = llvm.load %4093 : !llvm.ptr> + %4095 = llvm.extractelement %4094[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4096 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4098 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4099 = llvm.mul %4007, %4098 : !llvm.i64 + %4100 = llvm.add %4097, %4099 : !llvm.i64 + %4101 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4102 = llvm.mul %4011, %4101 : !llvm.i64 + %4103 = llvm.add %4100, %4102 : !llvm.i64 + %4104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4105 = llvm.mul %4025, %4104 : !llvm.i64 + %4106 = llvm.add %4103, %4105 : !llvm.i64 + %4107 = llvm.getelementptr %4096[%4106] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4108 = llvm.load %4107 : !llvm.ptr> + %4109 = llvm.extractelement %4108[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4110 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, 
array<3 x i64>, array<3 x i64>)> + %4111 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4112 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4113 = llvm.mul %4007, %4112 : !llvm.i64 + %4114 = llvm.add %4111, %4113 : !llvm.i64 + %4115 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4116 = llvm.mul %4011, %4115 : !llvm.i64 + %4117 = llvm.add %4114, %4116 : !llvm.i64 + %4118 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4119 = llvm.mul %4025, %4118 : !llvm.i64 + %4120 = llvm.add %4117, %4119 : !llvm.i64 + %4121 = llvm.getelementptr %4110[%4120] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4122 = llvm.load %4121 : !llvm.ptr> + %4123 = llvm.extractelement %4122[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4124 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4126 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4127 = llvm.mul %4007, %4126 : !llvm.i64 + %4128 = llvm.add %4125, %4127 : !llvm.i64 + %4129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4130 = llvm.mul %4011, %4129 : !llvm.i64 + %4131 = llvm.add %4128, %4130 : !llvm.i64 + %4132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4133 = llvm.mul %4025, %4132 : !llvm.i64 + %4134 = llvm.add %4131, %4133 : !llvm.i64 + %4135 = llvm.getelementptr %4124[%4134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4136 = llvm.load %4135 : !llvm.ptr> + %4137 = llvm.extractelement %4136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4138 = llvm.fmul %3927, %4039 {RelaxedPrecision} : !llvm.float + %4139 = llvm.fmul %3937, %4053 {RelaxedPrecision} : !llvm.float + %4140 = llvm.fmul %3947, %4067 {RelaxedPrecision} : !llvm.float + %4141 = llvm.fmul %3957, %4081 {RelaxedPrecision} : !llvm.float + %4142 = llvm.fmul %3967, %4095 {RelaxedPrecision} : !llvm.float + %4143 = llvm.fmul %3977, %4109 {RelaxedPrecision} : !llvm.float + %4144 = llvm.fmul %3987, %4123 {RelaxedPrecision} : !llvm.float + %4145 = llvm.fmul %3997, %4137 {RelaxedPrecision} : !llvm.float + %4146 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4149 = llvm.mul %4007, %4148 : !llvm.i64 + %4150 = llvm.add %4147, %4149 : !llvm.i64 + %4151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4152 = llvm.mul %67, %4151 : !llvm.i64 + %4153 = llvm.add %4150, %4152 : !llvm.i64 + %4154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4155 = llvm.mul %4025, %4154 : !llvm.i64 + %4156 = llvm.add %4153, %4155 : !llvm.i64 + %4157 = llvm.getelementptr %4146[%4156] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4158 = llvm.load %4157 : !llvm.ptr> + %4159 = llvm.extractelement %4158[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4160 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4162 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4163 = llvm.mul %4007, %4162 : !llvm.i64 + %4164 = llvm.add %4161, %4163 : !llvm.i64 + %4165 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4166 = llvm.mul %67, %4165 : !llvm.i64 + %4167 = llvm.add %4164, %4166 : !llvm.i64 + %4168 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4169 = llvm.mul %4025, %4168 : !llvm.i64 + %4170 = llvm.add %4167, %4169 : !llvm.i64 + %4171 = llvm.getelementptr %4160[%4170] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4172 = llvm.load %4171 : !llvm.ptr> + %4173 = llvm.extractelement %4172[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4174 = 
llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4175 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4176 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4177 = llvm.mul %4007, %4176 : !llvm.i64 + %4178 = llvm.add %4175, %4177 : !llvm.i64 + %4179 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4180 = llvm.mul %67, %4179 : !llvm.i64 + %4181 = llvm.add %4178, %4180 : !llvm.i64 + %4182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4183 = llvm.mul %4025, %4182 : !llvm.i64 + %4184 = llvm.add %4181, %4183 : !llvm.i64 + %4185 = llvm.getelementptr %4174[%4184] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4186 = llvm.load %4185 : !llvm.ptr> + %4187 = llvm.extractelement %4186[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4188 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4190 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4191 = llvm.mul %4007, %4190 : !llvm.i64 + %4192 = llvm.add %4189, %4191 : !llvm.i64 + %4193 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4194 = llvm.mul %67, %4193 : !llvm.i64 + %4195 = llvm.add %4192, %4194 : !llvm.i64 + %4196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4197 = llvm.mul %4025, %4196 : !llvm.i64 + %4198 = llvm.add %4195, %4197 : !llvm.i64 + %4199 = llvm.getelementptr %4188[%4198] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4200 = llvm.load %4199 : !llvm.ptr> + %4201 = llvm.extractelement %4200[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4202 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4203 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4204 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4205 = llvm.mul %4007, %4204 : !llvm.i64 + %4206 = llvm.add %4203, %4205 : !llvm.i64 + %4207 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4208 = llvm.mul %67, %4207 : !llvm.i64 + %4209 = llvm.add %4206, %4208 : !llvm.i64 + %4210 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4211 = llvm.mul %4025, %4210 : !llvm.i64 + %4212 = llvm.add %4209, %4211 : !llvm.i64 + %4213 = llvm.getelementptr %4202[%4212] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4214 = llvm.load %4213 : !llvm.ptr> + %4215 = llvm.extractelement %4214[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4216 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4217 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4218 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4219 = llvm.mul %4007, %4218 : !llvm.i64 + %4220 = llvm.add %4217, %4219 : !llvm.i64 + %4221 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4222 = llvm.mul %67, %4221 : !llvm.i64 + %4223 = llvm.add %4220, %4222 : !llvm.i64 + %4224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4225 = llvm.mul %4025, %4224 : !llvm.i64 + %4226 = llvm.add %4223, %4225 : !llvm.i64 + %4227 = llvm.getelementptr %4216[%4226] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4228 = llvm.load %4227 : !llvm.ptr> + %4229 = llvm.extractelement %4228[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4230 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4232 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4233 = llvm.mul %4007, %4232 : !llvm.i64 + %4234 = llvm.add %4231, %4233 : !llvm.i64 + %4235 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4236 = llvm.mul %67, %4235 : !llvm.i64 + %4237 = llvm.add %4234, %4236 : !llvm.i64 + %4238 = llvm.mlir.constant(1 : index) : !llvm.i64 
+ %4239 = llvm.mul %4025, %4238 : !llvm.i64 + %4240 = llvm.add %4237, %4239 : !llvm.i64 + %4241 = llvm.getelementptr %4230[%4240] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4242 = llvm.load %4241 : !llvm.ptr> + %4243 = llvm.extractelement %4242[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4244 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4245 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4246 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4247 = llvm.mul %4007, %4246 : !llvm.i64 + %4248 = llvm.add %4245, %4247 : !llvm.i64 + %4249 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4250 = llvm.mul %67, %4249 : !llvm.i64 + %4251 = llvm.add %4248, %4250 : !llvm.i64 + %4252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4253 = llvm.mul %4025, %4252 : !llvm.i64 + %4254 = llvm.add %4251, %4253 : !llvm.i64 + %4255 = llvm.getelementptr %4244[%4254] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4256 = llvm.load %4255 : !llvm.ptr> + %4257 = llvm.extractelement %4256[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4258 = llvm.fadd %4159, %4138 {RelaxedPrecision} : !llvm.float + %4259 = llvm.fadd %4173, %4139 {RelaxedPrecision} : !llvm.float + %4260 = llvm.fadd %4187, %4140 {RelaxedPrecision} : !llvm.float + %4261 = llvm.fadd %4201, %4141 {RelaxedPrecision} : !llvm.float + %4262 = llvm.fadd %4215, %4142 {RelaxedPrecision} : !llvm.float + %4263 = llvm.fadd %4229, %4143 {RelaxedPrecision} : !llvm.float + %4264 = llvm.fadd %4243, %4144 {RelaxedPrecision} : !llvm.float + %4265 = llvm.fadd %4257, %4145 {RelaxedPrecision} : !llvm.float + %4266 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4268 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4269 = llvm.mul %4007, %4268 : !llvm.i64 + %4270 = llvm.add %4267, %4269 : !llvm.i64 + %4271 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4272 = llvm.mul %67, %4271 : !llvm.i64 + %4273 = llvm.add %4270, %4272 : !llvm.i64 + %4274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4275 = llvm.mul %4025, %4274 : !llvm.i64 + %4276 = llvm.add %4273, %4275 : !llvm.i64 + %4277 = llvm.getelementptr %4266[%4276] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4278 = llvm.load %4277 : !llvm.ptr> + %4279 = llvm.insertelement %4258, %4278[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4280 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4281 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4282 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4283 = llvm.mul %4007, %4282 : !llvm.i64 + %4284 = llvm.add %4281, %4283 : !llvm.i64 + %4285 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4286 = llvm.mul %67, %4285 : !llvm.i64 + %4287 = llvm.add %4284, %4286 : !llvm.i64 + %4288 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4289 = llvm.mul %4025, %4288 : !llvm.i64 + %4290 = llvm.add %4287, %4289 : !llvm.i64 + %4291 = llvm.getelementptr %4280[%4290] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4279, %4291 : !llvm.ptr> + %4292 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4293 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4294 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4295 = llvm.mul %4007, %4294 : !llvm.i64 + %4296 = llvm.add %4293, %4295 : !llvm.i64 + %4297 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4298 = llvm.mul %67, %4297 : !llvm.i64 + %4299 = llvm.add %4296, %4298 : !llvm.i64 + %4300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4301 = 
llvm.mul %4025, %4300 : !llvm.i64 + %4302 = llvm.add %4299, %4301 : !llvm.i64 + %4303 = llvm.getelementptr %4292[%4302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4304 = llvm.load %4303 : !llvm.ptr> + %4305 = llvm.insertelement %4259, %4304[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4306 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4307 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4308 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4309 = llvm.mul %4007, %4308 : !llvm.i64 + %4310 = llvm.add %4307, %4309 : !llvm.i64 + %4311 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4312 = llvm.mul %67, %4311 : !llvm.i64 + %4313 = llvm.add %4310, %4312 : !llvm.i64 + %4314 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4315 = llvm.mul %4025, %4314 : !llvm.i64 + %4316 = llvm.add %4313, %4315 : !llvm.i64 + %4317 = llvm.getelementptr %4306[%4316] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4305, %4317 : !llvm.ptr> + %4318 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4320 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4321 = llvm.mul %4007, %4320 : !llvm.i64 + %4322 = llvm.add %4319, %4321 : !llvm.i64 + %4323 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4324 = llvm.mul %67, %4323 : !llvm.i64 + %4325 = llvm.add %4322, %4324 : !llvm.i64 + %4326 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4327 = llvm.mul %4025, %4326 : !llvm.i64 + %4328 = llvm.add %4325, %4327 : !llvm.i64 + %4329 = llvm.getelementptr %4318[%4328] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4330 = llvm.load %4329 : !llvm.ptr> + %4331 = llvm.insertelement %4260, %4330[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4332 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4333 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4334 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4335 = llvm.mul %4007, %4334 : !llvm.i64 + %4336 = llvm.add %4333, %4335 : !llvm.i64 + %4337 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4338 = llvm.mul %67, %4337 : !llvm.i64 + %4339 = llvm.add %4336, %4338 : !llvm.i64 + %4340 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4341 = llvm.mul %4025, %4340 : !llvm.i64 + %4342 = llvm.add %4339, %4341 : !llvm.i64 + %4343 = llvm.getelementptr %4332[%4342] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4331, %4343 : !llvm.ptr> + %4344 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4345 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4346 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4347 = llvm.mul %4007, %4346 : !llvm.i64 + %4348 = llvm.add %4345, %4347 : !llvm.i64 + %4349 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4350 = llvm.mul %67, %4349 : !llvm.i64 + %4351 = llvm.add %4348, %4350 : !llvm.i64 + %4352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4353 = llvm.mul %4025, %4352 : !llvm.i64 + %4354 = llvm.add %4351, %4353 : !llvm.i64 + %4355 = llvm.getelementptr %4344[%4354] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4356 = llvm.load %4355 : !llvm.ptr> + %4357 = llvm.insertelement %4261, %4356[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4358 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4360 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4361 = llvm.mul %4007, %4360 : !llvm.i64 + %4362 = llvm.add %4359, %4361 : !llvm.i64 + %4363 = llvm.mlir.constant(2 : 
index) : !llvm.i64 + %4364 = llvm.mul %67, %4363 : !llvm.i64 + %4365 = llvm.add %4362, %4364 : !llvm.i64 + %4366 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4367 = llvm.mul %4025, %4366 : !llvm.i64 + %4368 = llvm.add %4365, %4367 : !llvm.i64 + %4369 = llvm.getelementptr %4358[%4368] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4357, %4369 : !llvm.ptr> + %4370 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4371 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4372 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4373 = llvm.mul %4007, %4372 : !llvm.i64 + %4374 = llvm.add %4371, %4373 : !llvm.i64 + %4375 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4376 = llvm.mul %67, %4375 : !llvm.i64 + %4377 = llvm.add %4374, %4376 : !llvm.i64 + %4378 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4379 = llvm.mul %4025, %4378 : !llvm.i64 + %4380 = llvm.add %4377, %4379 : !llvm.i64 + %4381 = llvm.getelementptr %4370[%4380] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4382 = llvm.load %4381 : !llvm.ptr> + %4383 = llvm.insertelement %4262, %4382[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4384 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4385 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4386 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4387 = llvm.mul %4007, %4386 : !llvm.i64 + %4388 = llvm.add %4385, %4387 : !llvm.i64 + %4389 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4390 = llvm.mul %67, %4389 : !llvm.i64 + %4391 = llvm.add %4388, %4390 : !llvm.i64 + %4392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4393 = llvm.mul %4025, %4392 : !llvm.i64 + %4394 = llvm.add %4391, %4393 : !llvm.i64 + %4395 = llvm.getelementptr %4384[%4394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4383, %4395 : !llvm.ptr> + %4396 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4397 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4398 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4399 = llvm.mul %4007, %4398 : !llvm.i64 + %4400 = llvm.add %4397, %4399 : !llvm.i64 + %4401 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4402 = llvm.mul %67, %4401 : !llvm.i64 + %4403 = llvm.add %4400, %4402 : !llvm.i64 + %4404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4405 = llvm.mul %4025, %4404 : !llvm.i64 + %4406 = llvm.add %4403, %4405 : !llvm.i64 + %4407 = llvm.getelementptr %4396[%4406] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4408 = llvm.load %4407 : !llvm.ptr> + %4409 = llvm.insertelement %4263, %4408[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4410 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4412 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4413 = llvm.mul %4007, %4412 : !llvm.i64 + %4414 = llvm.add %4411, %4413 : !llvm.i64 + %4415 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4416 = llvm.mul %67, %4415 : !llvm.i64 + %4417 = llvm.add %4414, %4416 : !llvm.i64 + %4418 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4419 = llvm.mul %4025, %4418 : !llvm.i64 + %4420 = llvm.add %4417, %4419 : !llvm.i64 + %4421 = llvm.getelementptr %4410[%4420] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4409, %4421 : !llvm.ptr> + %4422 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4423 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4424 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4425 = llvm.mul %4007, %4424 : 
!llvm.i64 + %4426 = llvm.add %4423, %4425 : !llvm.i64 + %4427 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4428 = llvm.mul %67, %4427 : !llvm.i64 + %4429 = llvm.add %4426, %4428 : !llvm.i64 + %4430 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4431 = llvm.mul %4025, %4430 : !llvm.i64 + %4432 = llvm.add %4429, %4431 : !llvm.i64 + %4433 = llvm.getelementptr %4422[%4432] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4434 = llvm.load %4433 : !llvm.ptr> + %4435 = llvm.insertelement %4264, %4434[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4436 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4437 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4438 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4439 = llvm.mul %4007, %4438 : !llvm.i64 + %4440 = llvm.add %4437, %4439 : !llvm.i64 + %4441 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4442 = llvm.mul %67, %4441 : !llvm.i64 + %4443 = llvm.add %4440, %4442 : !llvm.i64 + %4444 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4445 = llvm.mul %4025, %4444 : !llvm.i64 + %4446 = llvm.add %4443, %4445 : !llvm.i64 + %4447 = llvm.getelementptr %4436[%4446] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4435, %4447 : !llvm.ptr> + %4448 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4450 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4451 = llvm.mul %4007, %4450 : !llvm.i64 + %4452 = llvm.add %4449, %4451 : !llvm.i64 + %4453 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4454 = llvm.mul %67, %4453 : !llvm.i64 + %4455 = llvm.add %4452, %4454 : !llvm.i64 + %4456 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4457 = llvm.mul %4025, %4456 : !llvm.i64 + %4458 = llvm.add %4455, %4457 : !llvm.i64 + %4459 = llvm.getelementptr %4448[%4458] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4460 = llvm.load %4459 : !llvm.ptr> + %4461 = llvm.insertelement %4265, %4460[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4462 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4463 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4464 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4465 = llvm.mul %4007, %4464 : !llvm.i64 + %4466 = llvm.add %4463, %4465 : !llvm.i64 + %4467 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4468 = llvm.mul %67, %4467 : !llvm.i64 + %4469 = llvm.add %4466, %4468 : !llvm.i64 + %4470 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4471 = llvm.mul %4025, %4470 : !llvm.i64 + %4472 = llvm.add %4469, %4471 : !llvm.i64 + %4473 = llvm.getelementptr %4462[%4472] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4461, %4473 : !llvm.ptr> + %4474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4477 = llvm.mul %4007, %4476 : !llvm.i64 + %4478 = llvm.add %4475, %4477 : !llvm.i64 + %4479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4480 = llvm.mul %67, %4479 : !llvm.i64 + %4481 = llvm.add %4478, %4480 : !llvm.i64 + %4482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4483 = llvm.mul %4025, %4482 : !llvm.i64 + %4484 = llvm.add %4481, %4483 : !llvm.i64 + %4485 = llvm.getelementptr %4474[%4484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4486 = llvm.load %4485 : !llvm.ptr> + %4487 = llvm.insertelement %4258, %4486[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4488 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x 
i64>, array<3 x i64>)> + %4489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4490 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4491 = llvm.mul %4007, %4490 : !llvm.i64 + %4492 = llvm.add %4489, %4491 : !llvm.i64 + %4493 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4494 = llvm.mul %67, %4493 : !llvm.i64 + %4495 = llvm.add %4492, %4494 : !llvm.i64 + %4496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4497 = llvm.mul %4025, %4496 : !llvm.i64 + %4498 = llvm.add %4495, %4497 : !llvm.i64 + %4499 = llvm.getelementptr %4488[%4498] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4487, %4499 : !llvm.ptr> + %4500 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4501 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4502 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4503 = llvm.mul %4007, %4502 : !llvm.i64 + %4504 = llvm.add %4501, %4503 : !llvm.i64 + %4505 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4506 = llvm.mul %67, %4505 : !llvm.i64 + %4507 = llvm.add %4504, %4506 : !llvm.i64 + %4508 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4509 = llvm.mul %4025, %4508 : !llvm.i64 + %4510 = llvm.add %4507, %4509 : !llvm.i64 + %4511 = llvm.getelementptr %4500[%4510] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4512 = llvm.load %4511 : !llvm.ptr> + %4513 = llvm.insertelement %4259, %4512[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4514 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4515 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4516 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4517 = llvm.mul %4007, %4516 : !llvm.i64 + %4518 = llvm.add %4515, %4517 : !llvm.i64 + %4519 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4520 = llvm.mul %67, %4519 : !llvm.i64 + %4521 = llvm.add %4518, %4520 : !llvm.i64 + %4522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4523 = llvm.mul %4025, %4522 : !llvm.i64 + %4524 = llvm.add %4521, %4523 : !llvm.i64 + %4525 = llvm.getelementptr %4514[%4524] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4513, %4525 : !llvm.ptr> + %4526 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4528 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4529 = llvm.mul %4007, %4528 : !llvm.i64 + %4530 = llvm.add %4527, %4529 : !llvm.i64 + %4531 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4532 = llvm.mul %67, %4531 : !llvm.i64 + %4533 = llvm.add %4530, %4532 : !llvm.i64 + %4534 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4535 = llvm.mul %4025, %4534 : !llvm.i64 + %4536 = llvm.add %4533, %4535 : !llvm.i64 + %4537 = llvm.getelementptr %4526[%4536] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4538 = llvm.load %4537 : !llvm.ptr> + %4539 = llvm.insertelement %4260, %4538[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4540 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4542 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4543 = llvm.mul %4007, %4542 : !llvm.i64 + %4544 = llvm.add %4541, %4543 : !llvm.i64 + %4545 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4546 = llvm.mul %67, %4545 : !llvm.i64 + %4547 = llvm.add %4544, %4546 : !llvm.i64 + %4548 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4549 = llvm.mul %4025, %4548 : !llvm.i64 + %4550 = llvm.add %4547, %4549 : !llvm.i64 + %4551 = llvm.getelementptr %4540[%4550] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4539, %4551 : !llvm.ptr> 
+ %4552 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4553 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4554 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4555 = llvm.mul %4007, %4554 : !llvm.i64 + %4556 = llvm.add %4553, %4555 : !llvm.i64 + %4557 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4558 = llvm.mul %67, %4557 : !llvm.i64 + %4559 = llvm.add %4556, %4558 : !llvm.i64 + %4560 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4561 = llvm.mul %4025, %4560 : !llvm.i64 + %4562 = llvm.add %4559, %4561 : !llvm.i64 + %4563 = llvm.getelementptr %4552[%4562] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4564 = llvm.load %4563 : !llvm.ptr> + %4565 = llvm.insertelement %4261, %4564[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4566 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4567 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4568 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4569 = llvm.mul %4007, %4568 : !llvm.i64 + %4570 = llvm.add %4567, %4569 : !llvm.i64 + %4571 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4572 = llvm.mul %67, %4571 : !llvm.i64 + %4573 = llvm.add %4570, %4572 : !llvm.i64 + %4574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4575 = llvm.mul %4025, %4574 : !llvm.i64 + %4576 = llvm.add %4573, %4575 : !llvm.i64 + %4577 = llvm.getelementptr %4566[%4576] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4565, %4577 : !llvm.ptr> + %4578 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4580 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4581 = llvm.mul %4007, %4580 : !llvm.i64 + %4582 = llvm.add %4579, %4581 : !llvm.i64 + %4583 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4584 = llvm.mul %67, %4583 : !llvm.i64 + %4585 = llvm.add %4582, %4584 : !llvm.i64 + %4586 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4587 = llvm.mul %4025, %4586 : !llvm.i64 + %4588 = llvm.add %4585, %4587 : !llvm.i64 + %4589 = llvm.getelementptr %4578[%4588] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4590 = llvm.load %4589 : !llvm.ptr> + %4591 = llvm.insertelement %4262, %4590[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4592 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4593 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4594 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4595 = llvm.mul %4007, %4594 : !llvm.i64 + %4596 = llvm.add %4593, %4595 : !llvm.i64 + %4597 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4598 = llvm.mul %67, %4597 : !llvm.i64 + %4599 = llvm.add %4596, %4598 : !llvm.i64 + %4600 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4601 = llvm.mul %4025, %4600 : !llvm.i64 + %4602 = llvm.add %4599, %4601 : !llvm.i64 + %4603 = llvm.getelementptr %4592[%4602] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4591, %4603 : !llvm.ptr> + %4604 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4605 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4606 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4607 = llvm.mul %4007, %4606 : !llvm.i64 + %4608 = llvm.add %4605, %4607 : !llvm.i64 + %4609 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4610 = llvm.mul %67, %4609 : !llvm.i64 + %4611 = llvm.add %4608, %4610 : !llvm.i64 + %4612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4613 = llvm.mul %4025, %4612 : !llvm.i64 + %4614 = llvm.add %4611, %4613 : !llvm.i64 + %4615 = llvm.getelementptr %4604[%4614] 
: (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4616 = llvm.load %4615 : !llvm.ptr> + %4617 = llvm.insertelement %4263, %4616[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4618 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4620 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4621 = llvm.mul %4007, %4620 : !llvm.i64 + %4622 = llvm.add %4619, %4621 : !llvm.i64 + %4623 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4624 = llvm.mul %67, %4623 : !llvm.i64 + %4625 = llvm.add %4622, %4624 : !llvm.i64 + %4626 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4627 = llvm.mul %4025, %4626 : !llvm.i64 + %4628 = llvm.add %4625, %4627 : !llvm.i64 + %4629 = llvm.getelementptr %4618[%4628] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4617, %4629 : !llvm.ptr> + %4630 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4632 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4633 = llvm.mul %4007, %4632 : !llvm.i64 + %4634 = llvm.add %4631, %4633 : !llvm.i64 + %4635 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4636 = llvm.mul %67, %4635 : !llvm.i64 + %4637 = llvm.add %4634, %4636 : !llvm.i64 + %4638 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4639 = llvm.mul %4025, %4638 : !llvm.i64 + %4640 = llvm.add %4637, %4639 : !llvm.i64 + %4641 = llvm.getelementptr %4630[%4640] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4642 = llvm.load %4641 : !llvm.ptr> + %4643 = llvm.insertelement %4264, %4642[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4644 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4645 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4646 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4647 = llvm.mul %4007, %4646 : !llvm.i64 + %4648 = llvm.add %4645, %4647 : !llvm.i64 + %4649 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4650 = llvm.mul %67, %4649 : !llvm.i64 + %4651 = llvm.add %4648, %4650 : !llvm.i64 + %4652 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4653 = llvm.mul %4025, %4652 : !llvm.i64 + %4654 = llvm.add %4651, %4653 : !llvm.i64 + %4655 = llvm.getelementptr %4644[%4654] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4643, %4655 : !llvm.ptr> + %4656 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4659 = llvm.mul %4007, %4658 : !llvm.i64 + %4660 = llvm.add %4657, %4659 : !llvm.i64 + %4661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4662 = llvm.mul %67, %4661 : !llvm.i64 + %4663 = llvm.add %4660, %4662 : !llvm.i64 + %4664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4665 = llvm.mul %4025, %4664 : !llvm.i64 + %4666 = llvm.add %4663, %4665 : !llvm.i64 + %4667 = llvm.getelementptr %4656[%4666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4668 = llvm.load %4667 : !llvm.ptr> + %4669 = llvm.insertelement %4265, %4668[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4673 = llvm.mul %4007, %4672 : !llvm.i64 + %4674 = llvm.add %4671, %4673 : !llvm.i64 + %4675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4676 = llvm.mul %67, %4675 : !llvm.i64 + %4677 = llvm.add %4674, %4676 : !llvm.i64 + %4678 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %4679 = llvm.mul %4025, %4678 : !llvm.i64 + %4680 = llvm.add %4677, %4679 : !llvm.i64 + %4681 = llvm.getelementptr %4670[%4680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4669, %4681 : !llvm.ptr> + %4682 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4683 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4684 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4685 = llvm.mul %2345, %4684 : !llvm.i64 + %4686 = llvm.add %4683, %4685 : !llvm.i64 + %4687 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4688 = llvm.mul %3917, %4687 : !llvm.i64 + %4689 = llvm.add %4686, %4688 : !llvm.i64 + %4690 = llvm.getelementptr %4682[%4689] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4691 = llvm.load %4690 : !llvm.ptr + %4692 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4694 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4695 = llvm.mul %2345, %4694 : !llvm.i64 + %4696 = llvm.add %4693, %4695 : !llvm.i64 + %4697 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4698 = llvm.mul %3917, %4697 : !llvm.i64 + %4699 = llvm.add %4696, %4698 : !llvm.i64 + %4700 = llvm.getelementptr %4692[%4699] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4701 = llvm.load %4700 : !llvm.ptr + %4702 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4703 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4704 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4705 = llvm.mul %2345, %4704 : !llvm.i64 + %4706 = llvm.add %4703, %4705 : !llvm.i64 + %4707 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4708 = llvm.mul %3917, %4707 : !llvm.i64 + %4709 = llvm.add %4706, %4708 : !llvm.i64 + %4710 = llvm.getelementptr %4702[%4709] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4711 = llvm.load %4710 : !llvm.ptr + %4712 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4714 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4715 = llvm.mul %2345, %4714 : !llvm.i64 + %4716 = llvm.add %4713, %4715 : !llvm.i64 + %4717 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4718 = llvm.mul %3917, %4717 : !llvm.i64 + %4719 = llvm.add %4716, %4718 : !llvm.i64 + %4720 = llvm.getelementptr %4712[%4719] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4721 = llvm.load %4720 : !llvm.ptr + %4722 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4724 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4725 = llvm.mul %2345, %4724 : !llvm.i64 + %4726 = llvm.add %4723, %4725 : !llvm.i64 + %4727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4728 = llvm.mul %3917, %4727 : !llvm.i64 + %4729 = llvm.add %4726, %4728 : !llvm.i64 + %4730 = llvm.getelementptr %4722[%4729] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4731 = llvm.load %4730 : !llvm.ptr + %4732 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4734 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4735 = llvm.mul %2345, %4734 : !llvm.i64 + %4736 = llvm.add %4733, %4735 : !llvm.i64 + %4737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4738 = llvm.mul %3917, %4737 : !llvm.i64 + %4739 = llvm.add %4736, %4738 : !llvm.i64 + %4740 = llvm.getelementptr %4732[%4739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4741 = llvm.load %4740 : !llvm.ptr + 
%4742 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4743 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4744 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4745 = llvm.mul %2345, %4744 : !llvm.i64 + %4746 = llvm.add %4743, %4745 : !llvm.i64 + %4747 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4748 = llvm.mul %3917, %4747 : !llvm.i64 + %4749 = llvm.add %4746, %4748 : !llvm.i64 + %4750 = llvm.getelementptr %4742[%4749] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4751 = llvm.load %4750 : !llvm.ptr + %4752 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4754 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4755 = llvm.mul %2345, %4754 : !llvm.i64 + %4756 = llvm.add %4753, %4755 : !llvm.i64 + %4757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4758 = llvm.mul %3917, %4757 : !llvm.i64 + %4759 = llvm.add %4756, %4758 : !llvm.i64 + %4760 = llvm.getelementptr %4752[%4759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4761 = llvm.load %4760 : !llvm.ptr + %4762 = llvm.add %2368, %70 : !llvm.i64 + %4763 = llvm.icmp "slt" %4762, %67 : !llvm.i64 + %4764 = llvm.sub %64, %4762 : !llvm.i64 + %4765 = llvm.select %4763, %4764, %4762 : !llvm.i1, !llvm.i64 + %4766 = llvm.sdiv %4765, %68 : !llvm.i64 + %4767 = llvm.sub %64, %4766 : !llvm.i64 + %4768 = llvm.select %4763, %4767, %4766 : !llvm.i1, !llvm.i64 + %4769 = llvm.srem %4768, %68 : !llvm.i64 + %4770 = llvm.icmp "slt" %4769, %67 : !llvm.i64 + %4771 = llvm.add %4769, %68 : !llvm.i64 + %4772 = llvm.select %4770, %4771, %4769 : !llvm.i1, !llvm.i64 + %4773 = llvm.sdiv %4000, %70 : !llvm.i64 + %4774 = llvm.sub %64, %4773 : !llvm.i64 + %4775 = llvm.select %3998, %4774, %4773 : !llvm.i1, !llvm.i64 + %4776 = llvm.mul %4768, %65 : !llvm.i64 + %4777 = llvm.add %4775, %4776 : !llvm.i64 + %4778 = llvm.add %4777, %69 : !llvm.i64 + %4779 = llvm.icmp "slt" %4778, %67 : !llvm.i64 + %4780 = llvm.sub %64, %4778 : !llvm.i64 + %4781 = llvm.select %4779, %4780, %4778 : !llvm.i1, !llvm.i64 + %4782 = llvm.sdiv %4781, %63 : !llvm.i64 + %4783 = llvm.sub %64, %4782 : !llvm.i64 + %4784 = llvm.select %4779, %4783, %4782 : !llvm.i1, !llvm.i64 + %4785 = llvm.mul %4784, %65 : !llvm.i64 + %4786 = llvm.add %4777, %4785 : !llvm.i64 + %4787 = llvm.add %4786, %69 : !llvm.i64 + %4788 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4790 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4791 = llvm.mul %4772, %4790 : !llvm.i64 + %4792 = llvm.add %4789, %4791 : !llvm.i64 + %4793 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4794 = llvm.mul %4011, %4793 : !llvm.i64 + %4795 = llvm.add %4792, %4794 : !llvm.i64 + %4796 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4797 = llvm.mul %4787, %4796 : !llvm.i64 + %4798 = llvm.add %4795, %4797 : !llvm.i64 + %4799 = llvm.getelementptr %4788[%4798] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4800 = llvm.load %4799 : !llvm.ptr> + %4801 = llvm.extractelement %4800[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4802 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4803 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4804 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4805 = llvm.mul %4772, %4804 : !llvm.i64 + %4806 = llvm.add %4803, %4805 : !llvm.i64 + %4807 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4808 = llvm.mul %4011, %4807 : !llvm.i64 + %4809 = llvm.add %4806, %4808 : 
!llvm.i64 + %4810 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4811 = llvm.mul %4787, %4810 : !llvm.i64 + %4812 = llvm.add %4809, %4811 : !llvm.i64 + %4813 = llvm.getelementptr %4802[%4812] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4814 = llvm.load %4813 : !llvm.ptr> + %4815 = llvm.extractelement %4814[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4816 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4817 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4818 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4819 = llvm.mul %4772, %4818 : !llvm.i64 + %4820 = llvm.add %4817, %4819 : !llvm.i64 + %4821 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4822 = llvm.mul %4011, %4821 : !llvm.i64 + %4823 = llvm.add %4820, %4822 : !llvm.i64 + %4824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4825 = llvm.mul %4787, %4824 : !llvm.i64 + %4826 = llvm.add %4823, %4825 : !llvm.i64 + %4827 = llvm.getelementptr %4816[%4826] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4828 = llvm.load %4827 : !llvm.ptr> + %4829 = llvm.extractelement %4828[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4830 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4831 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4832 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4833 = llvm.mul %4772, %4832 : !llvm.i64 + %4834 = llvm.add %4831, %4833 : !llvm.i64 + %4835 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4836 = llvm.mul %4011, %4835 : !llvm.i64 + %4837 = llvm.add %4834, %4836 : !llvm.i64 + %4838 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4839 = llvm.mul %4787, %4838 : !llvm.i64 + %4840 = llvm.add %4837, %4839 : !llvm.i64 + %4841 = llvm.getelementptr %4830[%4840] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4842 = llvm.load %4841 : !llvm.ptr> + %4843 = llvm.extractelement %4842[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4844 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4845 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4846 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4847 = llvm.mul %4772, %4846 : !llvm.i64 + %4848 = llvm.add %4845, %4847 : !llvm.i64 + %4849 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4850 = llvm.mul %4011, %4849 : !llvm.i64 + %4851 = llvm.add %4848, %4850 : !llvm.i64 + %4852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4853 = llvm.mul %4787, %4852 : !llvm.i64 + %4854 = llvm.add %4851, %4853 : !llvm.i64 + %4855 = llvm.getelementptr %4844[%4854] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4856 = llvm.load %4855 : !llvm.ptr> + %4857 = llvm.extractelement %4856[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4858 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4860 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4861 = llvm.mul %4772, %4860 : !llvm.i64 + %4862 = llvm.add %4859, %4861 : !llvm.i64 + %4863 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4864 = llvm.mul %4011, %4863 : !llvm.i64 + %4865 = llvm.add %4862, %4864 : !llvm.i64 + %4866 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4867 = llvm.mul %4787, %4866 : !llvm.i64 + %4868 = llvm.add %4865, %4867 : !llvm.i64 + %4869 = llvm.getelementptr %4858[%4868] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4870 = llvm.load %4869 : !llvm.ptr> + %4871 = llvm.extractelement %4870[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4872 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4873 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %4874 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4875 = llvm.mul %4772, %4874 : !llvm.i64 + %4876 = llvm.add %4873, %4875 : !llvm.i64 + %4877 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4878 = llvm.mul %4011, %4877 : !llvm.i64 + %4879 = llvm.add %4876, %4878 : !llvm.i64 + %4880 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4881 = llvm.mul %4787, %4880 : !llvm.i64 + %4882 = llvm.add %4879, %4881 : !llvm.i64 + %4883 = llvm.getelementptr %4872[%4882] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4884 = llvm.load %4883 : !llvm.ptr> + %4885 = llvm.extractelement %4884[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4886 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4888 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4889 = llvm.mul %4772, %4888 : !llvm.i64 + %4890 = llvm.add %4887, %4889 : !llvm.i64 + %4891 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4892 = llvm.mul %4011, %4891 : !llvm.i64 + %4893 = llvm.add %4890, %4892 : !llvm.i64 + %4894 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4895 = llvm.mul %4787, %4894 : !llvm.i64 + %4896 = llvm.add %4893, %4895 : !llvm.i64 + %4897 = llvm.getelementptr %4886[%4896] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4898 = llvm.load %4897 : !llvm.ptr> + %4899 = llvm.extractelement %4898[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4900 = llvm.fmul %4691, %4801 {RelaxedPrecision} : !llvm.float + %4901 = llvm.fmul %4701, %4815 {RelaxedPrecision} : !llvm.float + %4902 = llvm.fmul %4711, %4829 {RelaxedPrecision} : !llvm.float + %4903 = llvm.fmul %4721, %4843 {RelaxedPrecision} : !llvm.float + %4904 = llvm.fmul %4731, %4857 {RelaxedPrecision} : !llvm.float + %4905 = llvm.fmul %4741, %4871 {RelaxedPrecision} : !llvm.float + %4906 = llvm.fmul %4751, %4885 {RelaxedPrecision} : !llvm.float + %4907 = llvm.fmul %4761, %4899 {RelaxedPrecision} : !llvm.float + %4908 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4910 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4911 = llvm.mul %4772, %4910 : !llvm.i64 + %4912 = llvm.add %4909, %4911 : !llvm.i64 + %4913 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4914 = llvm.mul %67, %4913 : !llvm.i64 + %4915 = llvm.add %4912, %4914 : !llvm.i64 + %4916 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4917 = llvm.mul %4787, %4916 : !llvm.i64 + %4918 = llvm.add %4915, %4917 : !llvm.i64 + %4919 = llvm.getelementptr %4908[%4918] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4920 = llvm.load %4919 : !llvm.ptr> + %4921 = llvm.extractelement %4920[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4922 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4923 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4924 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4925 = llvm.mul %4772, %4924 : !llvm.i64 + %4926 = llvm.add %4923, %4925 : !llvm.i64 + %4927 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4928 = llvm.mul %67, %4927 : !llvm.i64 + %4929 = llvm.add %4926, %4928 : !llvm.i64 + %4930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4931 = llvm.mul %4787, %4930 : !llvm.i64 + %4932 = llvm.add %4929, %4931 : !llvm.i64 + %4933 = llvm.getelementptr %4922[%4932] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4934 = llvm.load %4933 : !llvm.ptr> + %4935 = llvm.extractelement %4934[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4936 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4937 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4938 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4939 = llvm.mul %4772, %4938 : !llvm.i64 + %4940 = llvm.add %4937, %4939 : !llvm.i64 + %4941 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4942 = llvm.mul %67, %4941 : !llvm.i64 + %4943 = llvm.add %4940, %4942 : !llvm.i64 + %4944 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4945 = llvm.mul %4787, %4944 : !llvm.i64 + %4946 = llvm.add %4943, %4945 : !llvm.i64 + %4947 = llvm.getelementptr %4936[%4946] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4948 = llvm.load %4947 : !llvm.ptr> + %4949 = llvm.extractelement %4948[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4950 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4951 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4952 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4953 = llvm.mul %4772, %4952 : !llvm.i64 + %4954 = llvm.add %4951, %4953 : !llvm.i64 + %4955 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4956 = llvm.mul %67, %4955 : !llvm.i64 + %4957 = llvm.add %4954, %4956 : !llvm.i64 + %4958 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4959 = llvm.mul %4787, %4958 : !llvm.i64 + %4960 = llvm.add %4957, %4959 : !llvm.i64 + %4961 = llvm.getelementptr %4950[%4960] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4962 = llvm.load %4961 : !llvm.ptr> + %4963 = llvm.extractelement %4962[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4964 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4966 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4967 = llvm.mul %4772, %4966 : !llvm.i64 + %4968 = llvm.add %4965, %4967 : !llvm.i64 + %4969 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4970 = llvm.mul %67, %4969 : !llvm.i64 + %4971 = llvm.add %4968, %4970 : !llvm.i64 + %4972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4973 = llvm.mul %4787, %4972 : !llvm.i64 + %4974 = llvm.add %4971, %4973 : !llvm.i64 + %4975 = llvm.getelementptr %4964[%4974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4976 = llvm.load %4975 : !llvm.ptr> + %4977 = llvm.extractelement %4976[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4978 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4980 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4981 = llvm.mul %4772, %4980 : !llvm.i64 + %4982 = llvm.add %4979, %4981 : !llvm.i64 + %4983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4984 = llvm.mul %67, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %4787, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4978[%4988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4990 = llvm.load %4989 : !llvm.ptr> + %4991 = llvm.extractelement %4990[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4992 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4993 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4994 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4995 = llvm.mul %4772, %4994 : !llvm.i64 + %4996 = llvm.add %4993, %4995 : !llvm.i64 + %4997 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4998 = llvm.mul %67, %4997 : !llvm.i64 + %4999 = llvm.add %4996, %4998 : !llvm.i64 + %5000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5001 = llvm.mul %4787, 
%5000 : !llvm.i64 + %5002 = llvm.add %4999, %5001 : !llvm.i64 + %5003 = llvm.getelementptr %4992[%5002] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5004 = llvm.load %5003 : !llvm.ptr> + %5005 = llvm.extractelement %5004[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5006 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5008 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5009 = llvm.mul %4772, %5008 : !llvm.i64 + %5010 = llvm.add %5007, %5009 : !llvm.i64 + %5011 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5012 = llvm.mul %67, %5011 : !llvm.i64 + %5013 = llvm.add %5010, %5012 : !llvm.i64 + %5014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5015 = llvm.mul %4787, %5014 : !llvm.i64 + %5016 = llvm.add %5013, %5015 : !llvm.i64 + %5017 = llvm.getelementptr %5006[%5016] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5018 = llvm.load %5017 : !llvm.ptr> + %5019 = llvm.extractelement %5018[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5020 = llvm.fadd %4921, %4900 {RelaxedPrecision} : !llvm.float + %5021 = llvm.fadd %4935, %4901 {RelaxedPrecision} : !llvm.float + %5022 = llvm.fadd %4949, %4902 {RelaxedPrecision} : !llvm.float + %5023 = llvm.fadd %4963, %4903 {RelaxedPrecision} : !llvm.float + %5024 = llvm.fadd %4977, %4904 {RelaxedPrecision} : !llvm.float + %5025 = llvm.fadd %4991, %4905 {RelaxedPrecision} : !llvm.float + %5026 = llvm.fadd %5005, %4906 {RelaxedPrecision} : !llvm.float + %5027 = llvm.fadd %5019, %4907 {RelaxedPrecision} : !llvm.float + %5028 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5030 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5031 = llvm.mul %4772, %5030 : !llvm.i64 + %5032 = llvm.add %5029, %5031 : !llvm.i64 + %5033 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5034 = llvm.mul %67, %5033 : !llvm.i64 + %5035 = llvm.add %5032, %5034 : !llvm.i64 + %5036 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5037 = llvm.mul %4787, %5036 : !llvm.i64 + %5038 = llvm.add %5035, %5037 : !llvm.i64 + %5039 = llvm.getelementptr %5028[%5038] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5040 = llvm.load %5039 : !llvm.ptr> + %5041 = llvm.insertelement %5020, %5040[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5042 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5043 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5044 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5045 = llvm.mul %4772, %5044 : !llvm.i64 + %5046 = llvm.add %5043, %5045 : !llvm.i64 + %5047 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5048 = llvm.mul %67, %5047 : !llvm.i64 + %5049 = llvm.add %5046, %5048 : !llvm.i64 + %5050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5051 = llvm.mul %4787, %5050 : !llvm.i64 + %5052 = llvm.add %5049, %5051 : !llvm.i64 + %5053 = llvm.getelementptr %5042[%5052] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5041, %5053 : !llvm.ptr> + %5054 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5056 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5057 = llvm.mul %4772, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5060 = llvm.mul %67, %5059 : !llvm.i64 + %5061 = llvm.add %5058, %5060 : !llvm.i64 + %5062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5063 = llvm.mul %4787, %5062 : 
!llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.getelementptr %5054[%5064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5066 = llvm.load %5065 : !llvm.ptr> + %5067 = llvm.insertelement %5021, %5066[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5068 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5070 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5071 = llvm.mul %4772, %5070 : !llvm.i64 + %5072 = llvm.add %5069, %5071 : !llvm.i64 + %5073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5074 = llvm.mul %67, %5073 : !llvm.i64 + %5075 = llvm.add %5072, %5074 : !llvm.i64 + %5076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5077 = llvm.mul %4787, %5076 : !llvm.i64 + %5078 = llvm.add %5075, %5077 : !llvm.i64 + %5079 = llvm.getelementptr %5068[%5078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5067, %5079 : !llvm.ptr> + %5080 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5082 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5083 = llvm.mul %4772, %5082 : !llvm.i64 + %5084 = llvm.add %5081, %5083 : !llvm.i64 + %5085 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5086 = llvm.mul %67, %5085 : !llvm.i64 + %5087 = llvm.add %5084, %5086 : !llvm.i64 + %5088 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5089 = llvm.mul %4787, %5088 : !llvm.i64 + %5090 = llvm.add %5087, %5089 : !llvm.i64 + %5091 = llvm.getelementptr %5080[%5090] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5092 = llvm.load %5091 : !llvm.ptr> + %5093 = llvm.insertelement %5022, %5092[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5094 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5095 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5096 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5097 = llvm.mul %4772, %5096 : !llvm.i64 + %5098 = llvm.add %5095, %5097 : !llvm.i64 + %5099 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5100 = llvm.mul %67, %5099 : !llvm.i64 + %5101 = llvm.add %5098, %5100 : !llvm.i64 + %5102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5103 = llvm.mul %4787, %5102 : !llvm.i64 + %5104 = llvm.add %5101, %5103 : !llvm.i64 + %5105 = llvm.getelementptr %5094[%5104] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5093, %5105 : !llvm.ptr> + %5106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5109 = llvm.mul %4772, %5108 : !llvm.i64 + %5110 = llvm.add %5107, %5109 : !llvm.i64 + %5111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5112 = llvm.mul %67, %5111 : !llvm.i64 + %5113 = llvm.add %5110, %5112 : !llvm.i64 + %5114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5115 = llvm.mul %4787, %5114 : !llvm.i64 + %5116 = llvm.add %5113, %5115 : !llvm.i64 + %5117 = llvm.getelementptr %5106[%5116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5118 = llvm.load %5117 : !llvm.ptr> + %5119 = llvm.insertelement %5023, %5118[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5120 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5123 = llvm.mul %4772, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5126 = 
llvm.mul %67, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5129 = llvm.mul %4787, %5128 : !llvm.i64 + %5130 = llvm.add %5127, %5129 : !llvm.i64 + %5131 = llvm.getelementptr %5120[%5130] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5119, %5131 : !llvm.ptr> + %5132 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5134 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5135 = llvm.mul %4772, %5134 : !llvm.i64 + %5136 = llvm.add %5133, %5135 : !llvm.i64 + %5137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5138 = llvm.mul %67, %5137 : !llvm.i64 + %5139 = llvm.add %5136, %5138 : !llvm.i64 + %5140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5141 = llvm.mul %4787, %5140 : !llvm.i64 + %5142 = llvm.add %5139, %5141 : !llvm.i64 + %5143 = llvm.getelementptr %5132[%5142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5144 = llvm.load %5143 : !llvm.ptr> + %5145 = llvm.insertelement %5024, %5144[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5146 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5149 = llvm.mul %4772, %5148 : !llvm.i64 + %5150 = llvm.add %5147, %5149 : !llvm.i64 + %5151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5152 = llvm.mul %67, %5151 : !llvm.i64 + %5153 = llvm.add %5150, %5152 : !llvm.i64 + %5154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5155 = llvm.mul %4787, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 : !llvm.i64 + %5157 = llvm.getelementptr %5146[%5156] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5145, %5157 : !llvm.ptr> + %5158 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5160 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5161 = llvm.mul %4772, %5160 : !llvm.i64 + %5162 = llvm.add %5159, %5161 : !llvm.i64 + %5163 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5164 = llvm.mul %67, %5163 : !llvm.i64 + %5165 = llvm.add %5162, %5164 : !llvm.i64 + %5166 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5167 = llvm.mul %4787, %5166 : !llvm.i64 + %5168 = llvm.add %5165, %5167 : !llvm.i64 + %5169 = llvm.getelementptr %5158[%5168] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5170 = llvm.load %5169 : !llvm.ptr> + %5171 = llvm.insertelement %5025, %5170[%29 : !llvm.i64] : !llvm.vec<8 x float> + %5172 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5173 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5174 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5175 = llvm.mul %4772, %5174 : !llvm.i64 + %5176 = llvm.add %5173, %5175 : !llvm.i64 + %5177 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5178 = llvm.mul %67, %5177 : !llvm.i64 + %5179 = llvm.add %5176, %5178 : !llvm.i64 + %5180 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5181 = llvm.mul %4787, %5180 : !llvm.i64 + %5182 = llvm.add %5179, %5181 : !llvm.i64 + %5183 = llvm.getelementptr %5172[%5182] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5171, %5183 : !llvm.ptr> + %5184 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5185 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5186 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5187 = llvm.mul %4772, %5186 : !llvm.i64 + %5188 = llvm.add %5185, 
%5187 : !llvm.i64 + %5189 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5190 = llvm.mul %67, %5189 : !llvm.i64 + %5191 = llvm.add %5188, %5190 : !llvm.i64 + %5192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5193 = llvm.mul %4787, %5192 : !llvm.i64 + %5194 = llvm.add %5191, %5193 : !llvm.i64 + %5195 = llvm.getelementptr %5184[%5194] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5196 = llvm.load %5195 : !llvm.ptr> + %5197 = llvm.insertelement %5026, %5196[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5198 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5200 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5201 = llvm.mul %4772, %5200 : !llvm.i64 + %5202 = llvm.add %5199, %5201 : !llvm.i64 + %5203 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5204 = llvm.mul %67, %5203 : !llvm.i64 + %5205 = llvm.add %5202, %5204 : !llvm.i64 + %5206 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5207 = llvm.mul %4787, %5206 : !llvm.i64 + %5208 = llvm.add %5205, %5207 : !llvm.i64 + %5209 = llvm.getelementptr %5198[%5208] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5197, %5209 : !llvm.ptr> + %5210 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5212 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5213 = llvm.mul %4772, %5212 : !llvm.i64 + %5214 = llvm.add %5211, %5213 : !llvm.i64 + %5215 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5216 = llvm.mul %67, %5215 : !llvm.i64 + %5217 = llvm.add %5214, %5216 : !llvm.i64 + %5218 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5219 = llvm.mul %4787, %5218 : !llvm.i64 + %5220 = llvm.add %5217, %5219 : !llvm.i64 + %5221 = llvm.getelementptr %5210[%5220] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5222 = llvm.load %5221 : !llvm.ptr> + %5223 = llvm.insertelement %5027, %5222[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5224 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5225 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5226 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5227 = llvm.mul %4772, %5226 : !llvm.i64 + %5228 = llvm.add %5225, %5227 : !llvm.i64 + %5229 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5230 = llvm.mul %67, %5229 : !llvm.i64 + %5231 = llvm.add %5228, %5230 : !llvm.i64 + %5232 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5233 = llvm.mul %4787, %5232 : !llvm.i64 + %5234 = llvm.add %5231, %5233 : !llvm.i64 + %5235 = llvm.getelementptr %5224[%5234] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5223, %5235 : !llvm.ptr> + %5236 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5237 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5238 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5239 = llvm.mul %4772, %5238 : !llvm.i64 + %5240 = llvm.add %5237, %5239 : !llvm.i64 + %5241 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5242 = llvm.mul %67, %5241 : !llvm.i64 + %5243 = llvm.add %5240, %5242 : !llvm.i64 + %5244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5245 = llvm.mul %4787, %5244 : !llvm.i64 + %5246 = llvm.add %5243, %5245 : !llvm.i64 + %5247 = llvm.getelementptr %5236[%5246] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5248 = llvm.load %5247 : !llvm.ptr> + %5249 = llvm.insertelement %5020, %5248[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5250 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5251 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %5252 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5253 = llvm.mul %4772, %5252 : !llvm.i64 + %5254 = llvm.add %5251, %5253 : !llvm.i64 + %5255 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5256 = llvm.mul %67, %5255 : !llvm.i64 + %5257 = llvm.add %5254, %5256 : !llvm.i64 + %5258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5259 = llvm.mul %4787, %5258 : !llvm.i64 + %5260 = llvm.add %5257, %5259 : !llvm.i64 + %5261 = llvm.getelementptr %5250[%5260] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5249, %5261 : !llvm.ptr> + %5262 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5263 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5264 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5265 = llvm.mul %4772, %5264 : !llvm.i64 + %5266 = llvm.add %5263, %5265 : !llvm.i64 + %5267 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5268 = llvm.mul %67, %5267 : !llvm.i64 + %5269 = llvm.add %5266, %5268 : !llvm.i64 + %5270 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5271 = llvm.mul %4787, %5270 : !llvm.i64 + %5272 = llvm.add %5269, %5271 : !llvm.i64 + %5273 = llvm.getelementptr %5262[%5272] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5274 = llvm.load %5273 : !llvm.ptr> + %5275 = llvm.insertelement %5021, %5274[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5276 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5277 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5278 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5279 = llvm.mul %4772, %5278 : !llvm.i64 + %5280 = llvm.add %5277, %5279 : !llvm.i64 + %5281 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5282 = llvm.mul %67, %5281 : !llvm.i64 + %5283 = llvm.add %5280, %5282 : !llvm.i64 + %5284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5285 = llvm.mul %4787, %5284 : !llvm.i64 + %5286 = llvm.add %5283, %5285 : !llvm.i64 + %5287 = llvm.getelementptr %5276[%5286] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5275, %5287 : !llvm.ptr> + %5288 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5290 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5291 = llvm.mul %4772, %5290 : !llvm.i64 + %5292 = llvm.add %5289, %5291 : !llvm.i64 + %5293 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5294 = llvm.mul %67, %5293 : !llvm.i64 + %5295 = llvm.add %5292, %5294 : !llvm.i64 + %5296 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5297 = llvm.mul %4787, %5296 : !llvm.i64 + %5298 = llvm.add %5295, %5297 : !llvm.i64 + %5299 = llvm.getelementptr %5288[%5298] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5300 = llvm.load %5299 : !llvm.ptr> + %5301 = llvm.insertelement %5022, %5300[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5302 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5303 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5304 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5305 = llvm.mul %4772, %5304 : !llvm.i64 + %5306 = llvm.add %5303, %5305 : !llvm.i64 + %5307 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5308 = llvm.mul %67, %5307 : !llvm.i64 + %5309 = llvm.add %5306, %5308 : !llvm.i64 + %5310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5311 = llvm.mul %4787, %5310 : !llvm.i64 + %5312 = llvm.add %5309, %5311 : !llvm.i64 + %5313 = llvm.getelementptr %5302[%5312] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5301, %5313 : !llvm.ptr> + %5314 = llvm.extractvalue 
%130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5316 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5317 = llvm.mul %4772, %5316 : !llvm.i64 + %5318 = llvm.add %5315, %5317 : !llvm.i64 + %5319 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5320 = llvm.mul %67, %5319 : !llvm.i64 + %5321 = llvm.add %5318, %5320 : !llvm.i64 + %5322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5323 = llvm.mul %4787, %5322 : !llvm.i64 + %5324 = llvm.add %5321, %5323 : !llvm.i64 + %5325 = llvm.getelementptr %5314[%5324] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5326 = llvm.load %5325 : !llvm.ptr> + %5327 = llvm.insertelement %5023, %5326[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5328 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5330 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5331 = llvm.mul %4772, %5330 : !llvm.i64 + %5332 = llvm.add %5329, %5331 : !llvm.i64 + %5333 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5334 = llvm.mul %67, %5333 : !llvm.i64 + %5335 = llvm.add %5332, %5334 : !llvm.i64 + %5336 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5337 = llvm.mul %4787, %5336 : !llvm.i64 + %5338 = llvm.add %5335, %5337 : !llvm.i64 + %5339 = llvm.getelementptr %5328[%5338] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5327, %5339 : !llvm.ptr> + %5340 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5342 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5343 = llvm.mul %4772, %5342 : !llvm.i64 + %5344 = llvm.add %5341, %5343 : !llvm.i64 + %5345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5346 = llvm.mul %67, %5345 : !llvm.i64 + %5347 = llvm.add %5344, %5346 : !llvm.i64 + %5348 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5349 = llvm.mul %4787, %5348 : !llvm.i64 + %5350 = llvm.add %5347, %5349 : !llvm.i64 + %5351 = llvm.getelementptr %5340[%5350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5352 = llvm.load %5351 : !llvm.ptr> + %5353 = llvm.insertelement %5024, %5352[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5354 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5356 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5357 = llvm.mul %4772, %5356 : !llvm.i64 + %5358 = llvm.add %5355, %5357 : !llvm.i64 + %5359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5360 = llvm.mul %67, %5359 : !llvm.i64 + %5361 = llvm.add %5358, %5360 : !llvm.i64 + %5362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5363 = llvm.mul %4787, %5362 : !llvm.i64 + %5364 = llvm.add %5361, %5363 : !llvm.i64 + %5365 = llvm.getelementptr %5354[%5364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5353, %5365 : !llvm.ptr> + %5366 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5367 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5368 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5369 = llvm.mul %4772, %5368 : !llvm.i64 + %5370 = llvm.add %5367, %5369 : !llvm.i64 + %5371 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5372 = llvm.mul %67, %5371 : !llvm.i64 + %5373 = llvm.add %5370, %5372 : !llvm.i64 + %5374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5375 = llvm.mul %4787, %5374 : !llvm.i64 + %5376 = llvm.add %5373, %5375 : !llvm.i64 + %5377 = llvm.getelementptr %5366[%5376] : (!llvm.ptr>, !llvm.i64) 
-> !llvm.ptr> + %5378 = llvm.load %5377 : !llvm.ptr> + %5379 = llvm.insertelement %5025, %5378[%29 : !llvm.i64] : !llvm.vec<8 x float> + %5380 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5382 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5383 = llvm.mul %4772, %5382 : !llvm.i64 + %5384 = llvm.add %5381, %5383 : !llvm.i64 + %5385 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5386 = llvm.mul %67, %5385 : !llvm.i64 + %5387 = llvm.add %5384, %5386 : !llvm.i64 + %5388 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5389 = llvm.mul %4787, %5388 : !llvm.i64 + %5390 = llvm.add %5387, %5389 : !llvm.i64 + %5391 = llvm.getelementptr %5380[%5390] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5379, %5391 : !llvm.ptr> + %5392 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5393 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5394 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5395 = llvm.mul %4772, %5394 : !llvm.i64 + %5396 = llvm.add %5393, %5395 : !llvm.i64 + %5397 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5398 = llvm.mul %67, %5397 : !llvm.i64 + %5399 = llvm.add %5396, %5398 : !llvm.i64 + %5400 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5401 = llvm.mul %4787, %5400 : !llvm.i64 + %5402 = llvm.add %5399, %5401 : !llvm.i64 + %5403 = llvm.getelementptr %5392[%5402] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5404 = llvm.load %5403 : !llvm.ptr> + %5405 = llvm.insertelement %5026, %5404[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5406 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5408 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5409 = llvm.mul %4772, %5408 : !llvm.i64 + %5410 = llvm.add %5407, %5409 : !llvm.i64 + %5411 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5412 = llvm.mul %67, %5411 : !llvm.i64 + %5413 = llvm.add %5410, %5412 : !llvm.i64 + %5414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5415 = llvm.mul %4787, %5414 : !llvm.i64 + %5416 = llvm.add %5413, %5415 : !llvm.i64 + %5417 = llvm.getelementptr %5406[%5416] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5405, %5417 : !llvm.ptr> + %5418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5421 = llvm.mul %4772, %5420 : !llvm.i64 + %5422 = llvm.add %5419, %5421 : !llvm.i64 + %5423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5424 = llvm.mul %67, %5423 : !llvm.i64 + %5425 = llvm.add %5422, %5424 : !llvm.i64 + %5426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5427 = llvm.mul %4787, %5426 : !llvm.i64 + %5428 = llvm.add %5425, %5427 : !llvm.i64 + %5429 = llvm.getelementptr %5418[%5428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5430 = llvm.load %5429 : !llvm.ptr> + %5431 = llvm.insertelement %5027, %5430[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5435 = llvm.mul %4772, %5434 : !llvm.i64 + %5436 = llvm.add %5433, %5435 : !llvm.i64 + %5437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5438 = llvm.mul %67, %5437 : !llvm.i64 + %5439 = llvm.add %5436, %5438 : !llvm.i64 + %5440 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %5441 = llvm.mul %4787, %5440 : !llvm.i64 + %5442 = llvm.add %5439, %5441 : !llvm.i64 + %5443 = llvm.getelementptr %5432[%5442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5431, %5443 : !llvm.ptr> + %5444 = llvm.add %3915, %69 : !llvm.i64 + llvm.br ^bb36(%5444 : !llvm.i64) + ^bb38: // pred: ^bb36 + %5445 = llvm.add %2370, %48 : !llvm.i64 + llvm.br ^bb25(%5445 : !llvm.i64) + ^bb39: // pred: ^bb25 + %5446 = llvm.add %2368, %68 : !llvm.i64 + llvm.br ^bb23(%5446 : !llvm.i64) + ^bb40: // pred: ^bb23 + llvm.br ^bb41(%67 : !llvm.i64) + ^bb41(%5447: !llvm.i64): // 2 preds: ^bb40, ^bb50 + %5448 = llvm.icmp "slt" %5447, %38 : !llvm.i64 + llvm.cond_br %5448, ^bb42, ^bb51 + ^bb42: // pred: ^bb41 + llvm.cond_br %40, ^bb43, ^bb47 + ^bb43: // pred: ^bb42 + %5449 = llvm.add %151, %5447 : !llvm.i64 + %5450 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5451 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5452 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5453 = llvm.mul %2345, %5452 : !llvm.i64 + %5454 = llvm.add %5451, %5453 : !llvm.i64 + %5455 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5456 = llvm.mul %5449, %5455 : !llvm.i64 + %5457 = llvm.add %5454, %5456 : !llvm.i64 + %5458 = llvm.getelementptr %5450[%5457] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5459 = llvm.bitcast %5458 : !llvm.ptr to !llvm.ptr> + %5460 = llvm.load %5459 {alignment = 4 : i64} : !llvm.ptr> + %5461 = llvm.icmp "slt" %5447, %67 : !llvm.i64 + %5462 = llvm.sub %64, %5447 : !llvm.i64 + %5463 = llvm.select %5461, %5462, %5447 : !llvm.i1, !llvm.i64 + %5464 = llvm.sdiv %5463, %68 : !llvm.i64 + %5465 = llvm.sub %64, %5464 : !llvm.i64 + %5466 = llvm.select %5461, %5465, %5464 : !llvm.i1, !llvm.i64 + %5467 = llvm.srem %5466, %68 : !llvm.i64 + %5468 = llvm.icmp "slt" %5467, %67 : !llvm.i64 + %5469 = llvm.add %5467, %68 : !llvm.i64 + %5470 = llvm.select %5468, %5469, %5467 : !llvm.i1, !llvm.i64 + %5471 = llvm.srem %5447, %68 : !llvm.i64 + %5472 = llvm.icmp "slt" %5471, %67 : !llvm.i64 + %5473 = llvm.add %5471, %68 : !llvm.i64 + %5474 = llvm.select %5472, %5473, %5471 : !llvm.i1, !llvm.i64 + %5475 = llvm.icmp "slt" %5474, %67 : !llvm.i64 + %5476 = llvm.sub %64, %5474 : !llvm.i64 + %5477 = llvm.select %5475, %5476, %5474 : !llvm.i1, !llvm.i64 + %5478 = llvm.sdiv %5477, %70 : !llvm.i64 + %5479 = llvm.sub %64, %5478 : !llvm.i64 + %5480 = llvm.select %5475, %5479, %5478 : !llvm.i1, !llvm.i64 + %5481 = llvm.srem %5480, %63 : !llvm.i64 + %5482 = llvm.icmp "slt" %5481, %67 : !llvm.i64 + %5483 = llvm.add %5481, %63 : !llvm.i64 + %5484 = llvm.select %5482, %5483, %5481 : !llvm.i1, !llvm.i64 + %5485 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5486 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5487 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5488 = llvm.mul %5470, %5487 : !llvm.i64 + %5489 = llvm.add %5486, %5488 : !llvm.i64 + %5490 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5491 = llvm.mul %67, %5490 : !llvm.i64 + %5492 = llvm.add %5489, %5491 : !llvm.i64 + %5493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5494 = llvm.mul %5484, %5493 : !llvm.i64 + %5495 = llvm.add %5492, %5494 : !llvm.i64 + %5496 = llvm.getelementptr %5485[%5495] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5497 = llvm.load %5496 : !llvm.ptr> + %5498 = llvm.fadd %5460, %5497 : !llvm.vec<8 x float> + %5499 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5500 = llvm.mlir.constant(0 : index) : !llvm.i64 + 
%5501 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5502 = llvm.mul %67, %5501 : !llvm.i64 + %5503 = llvm.add %5500, %5502 : !llvm.i64 + %5504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5505 = llvm.mul %67, %5504 : !llvm.i64 + %5506 = llvm.add %5503, %5505 : !llvm.i64 + %5507 = llvm.getelementptr %5499[%5506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5498, %5507 : !llvm.ptr> + %5508 = llvm.add %5449, %70 : !llvm.i64 + %5509 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5510 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5511 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5512 = llvm.mul %2345, %5511 : !llvm.i64 + %5513 = llvm.add %5510, %5512 : !llvm.i64 + %5514 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5515 = llvm.mul %5508, %5514 : !llvm.i64 + %5516 = llvm.add %5513, %5515 : !llvm.i64 + %5517 = llvm.getelementptr %5509[%5516] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5518 = llvm.bitcast %5517 : !llvm.ptr to !llvm.ptr> + %5519 = llvm.load %5518 {alignment = 4 : i64} : !llvm.ptr> + %5520 = llvm.add %5447, %70 : !llvm.i64 + %5521 = llvm.icmp "slt" %5520, %67 : !llvm.i64 + %5522 = llvm.sub %64, %5520 : !llvm.i64 + %5523 = llvm.select %5521, %5522, %5520 : !llvm.i1, !llvm.i64 + %5524 = llvm.sdiv %5523, %68 : !llvm.i64 + %5525 = llvm.sub %64, %5524 : !llvm.i64 + %5526 = llvm.select %5521, %5525, %5524 : !llvm.i1, !llvm.i64 + %5527 = llvm.srem %5526, %68 : !llvm.i64 + %5528 = llvm.icmp "slt" %5527, %67 : !llvm.i64 + %5529 = llvm.add %5527, %68 : !llvm.i64 + %5530 = llvm.select %5528, %5529, %5527 : !llvm.i1, !llvm.i64 + %5531 = llvm.sdiv %5463, %70 : !llvm.i64 + %5532 = llvm.sub %64, %5531 : !llvm.i64 + %5533 = llvm.select %5461, %5532, %5531 : !llvm.i1, !llvm.i64 + %5534 = llvm.mul %5526, %65 : !llvm.i64 + %5535 = llvm.add %5533, %5534 : !llvm.i64 + %5536 = llvm.add %5535, %69 : !llvm.i64 + %5537 = llvm.icmp "slt" %5536, %67 : !llvm.i64 + %5538 = llvm.sub %64, %5536 : !llvm.i64 + %5539 = llvm.select %5537, %5538, %5536 : !llvm.i1, !llvm.i64 + %5540 = llvm.sdiv %5539, %63 : !llvm.i64 + %5541 = llvm.sub %64, %5540 : !llvm.i64 + %5542 = llvm.select %5537, %5541, %5540 : !llvm.i1, !llvm.i64 + %5543 = llvm.mul %5542, %65 : !llvm.i64 + %5544 = llvm.add %5535, %5543 : !llvm.i64 + %5545 = llvm.add %5544, %69 : !llvm.i64 + %5546 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5547 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5548 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5549 = llvm.mul %5530, %5548 : !llvm.i64 + %5550 = llvm.add %5547, %5549 : !llvm.i64 + %5551 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5552 = llvm.mul %67, %5551 : !llvm.i64 + %5553 = llvm.add %5550, %5552 : !llvm.i64 + %5554 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5555 = llvm.mul %5545, %5554 : !llvm.i64 + %5556 = llvm.add %5553, %5555 : !llvm.i64 + %5557 = llvm.getelementptr %5546[%5556] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5558 = llvm.load %5557 : !llvm.ptr> + %5559 = llvm.fadd %5519, %5558 : !llvm.vec<8 x float> + %5560 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5561 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5562 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5563 = llvm.mul %67, %5562 : !llvm.i64 + %5564 = llvm.add %5561, %5563 : !llvm.i64 + %5565 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5566 = llvm.mul %69, %5565 : !llvm.i64 + %5567 = llvm.add %5564, %5566 : !llvm.i64 + %5568 = llvm.getelementptr %5560[%5567] : 
(!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5559, %5568 : !llvm.ptr> + %5569 = llvm.add %5449, %68 : !llvm.i64 + %5570 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5571 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5572 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5573 = llvm.mul %2345, %5572 : !llvm.i64 + %5574 = llvm.add %5571, %5573 : !llvm.i64 + %5575 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5576 = llvm.mul %5569, %5575 : !llvm.i64 + %5577 = llvm.add %5574, %5576 : !llvm.i64 + %5578 = llvm.getelementptr %5570[%5577] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5579 = llvm.bitcast %5578 : !llvm.ptr to !llvm.ptr> + %5580 = llvm.load %5579 {alignment = 4 : i64} : !llvm.ptr> + %5581 = llvm.add %5466, %69 : !llvm.i64 + %5582 = llvm.icmp "slt" %5581, %67 : !llvm.i64 + %5583 = llvm.sub %64, %5581 : !llvm.i64 + %5584 = llvm.select %5582, %5583, %5581 : !llvm.i1, !llvm.i64 + %5585 = llvm.sdiv %5584, %68 : !llvm.i64 + %5586 = llvm.sub %64, %5585 : !llvm.i64 + %5587 = llvm.select %5582, %5586, %5585 : !llvm.i1, !llvm.i64 + %5588 = llvm.mul %5587, %60 : !llvm.i64 + %5589 = llvm.add %5466, %5588 : !llvm.i64 + %5590 = llvm.add %5589, %69 : !llvm.i64 + %5591 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5593 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5594 = llvm.mul %5590, %5593 : !llvm.i64 + %5595 = llvm.add %5592, %5594 : !llvm.i64 + %5596 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5597 = llvm.mul %67, %5596 : !llvm.i64 + %5598 = llvm.add %5595, %5597 : !llvm.i64 + %5599 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5600 = llvm.mul %5484, %5599 : !llvm.i64 + %5601 = llvm.add %5598, %5600 : !llvm.i64 + %5602 = llvm.getelementptr %5591[%5601] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5603 = llvm.load %5602 : !llvm.ptr> + %5604 = llvm.fadd %5580, %5603 : !llvm.vec<8 x float> + %5605 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5606 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5607 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5608 = llvm.mul %67, %5607 : !llvm.i64 + %5609 = llvm.add %5606, %5608 : !llvm.i64 + %5610 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5611 = llvm.mul %63, %5610 : !llvm.i64 + %5612 = llvm.add %5609, %5611 : !llvm.i64 + %5613 = llvm.getelementptr %5605[%5612] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5604, %5613 : !llvm.ptr> + %5614 = llvm.add %5449, %41 : !llvm.i64 + %5615 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5616 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5617 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5618 = llvm.mul %2345, %5617 : !llvm.i64 + %5619 = llvm.add %5616, %5618 : !llvm.i64 + %5620 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5621 = llvm.mul %5614, %5620 : !llvm.i64 + %5622 = llvm.add %5619, %5621 : !llvm.i64 + %5623 = llvm.getelementptr %5615[%5622] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5624 = llvm.bitcast %5623 : !llvm.ptr to !llvm.ptr> + %5625 = llvm.load %5624 {alignment = 4 : i64} : !llvm.ptr> + %5626 = llvm.add %5447, %41 : !llvm.i64 + %5627 = llvm.icmp "slt" %5626, %67 : !llvm.i64 + %5628 = llvm.sub %64, %5626 : !llvm.i64 + %5629 = llvm.select %5627, %5628, %5626 : !llvm.i1, !llvm.i64 + %5630 = llvm.sdiv %5629, %68 : !llvm.i64 + %5631 = llvm.sub %64, %5630 : !llvm.i64 + %5632 = llvm.select %5627, %5631, %5630 : !llvm.i1, !llvm.i64 + %5633 = 
llvm.srem %5632, %68 : !llvm.i64 + %5634 = llvm.icmp "slt" %5633, %67 : !llvm.i64 + %5635 = llvm.add %5633, %68 : !llvm.i64 + %5636 = llvm.select %5634, %5635, %5633 : !llvm.i1, !llvm.i64 + %5637 = llvm.mul %5632, %65 : !llvm.i64 + %5638 = llvm.add %5533, %5637 : !llvm.i64 + %5639 = llvm.add %5638, %45 : !llvm.i64 + %5640 = llvm.icmp "slt" %5639, %67 : !llvm.i64 + %5641 = llvm.sub %64, %5639 : !llvm.i64 + %5642 = llvm.select %5640, %5641, %5639 : !llvm.i1, !llvm.i64 + %5643 = llvm.sdiv %5642, %63 : !llvm.i64 + %5644 = llvm.sub %64, %5643 : !llvm.i64 + %5645 = llvm.select %5640, %5644, %5643 : !llvm.i1, !llvm.i64 + %5646 = llvm.mul %5645, %65 : !llvm.i64 + %5647 = llvm.add %5638, %5646 : !llvm.i64 + %5648 = llvm.add %5647, %45 : !llvm.i64 + %5649 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5651 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5652 = llvm.mul %5636, %5651 : !llvm.i64 + %5653 = llvm.add %5650, %5652 : !llvm.i64 + %5654 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5655 = llvm.mul %67, %5654 : !llvm.i64 + %5656 = llvm.add %5653, %5655 : !llvm.i64 + %5657 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5658 = llvm.mul %5648, %5657 : !llvm.i64 + %5659 = llvm.add %5656, %5658 : !llvm.i64 + %5660 = llvm.getelementptr %5649[%5659] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5661 = llvm.load %5660 : !llvm.ptr> + %5662 = llvm.fadd %5625, %5661 : !llvm.vec<8 x float> + %5663 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5664 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5665 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5666 = llvm.mul %67, %5665 : !llvm.i64 + %5667 = llvm.add %5664, %5666 : !llvm.i64 + %5668 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5669 = llvm.mul %45, %5668 : !llvm.i64 + %5670 = llvm.add %5667, %5669 : !llvm.i64 + %5671 = llvm.getelementptr %5663[%5670] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5662, %5671 : !llvm.ptr> + %5672 = llvm.add %5449, %42 : !llvm.i64 + %5673 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5674 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5675 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5676 = llvm.mul %2345, %5675 : !llvm.i64 + %5677 = llvm.add %5674, %5676 : !llvm.i64 + %5678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5679 = llvm.mul %5672, %5678 : !llvm.i64 + %5680 = llvm.add %5677, %5679 : !llvm.i64 + %5681 = llvm.getelementptr %5673[%5680] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5682 = llvm.bitcast %5681 : !llvm.ptr to !llvm.ptr> + %5683 = llvm.load %5682 {alignment = 4 : i64} : !llvm.ptr> + %5684 = llvm.add %5466, %63 : !llvm.i64 + %5685 = llvm.icmp "slt" %5684, %67 : !llvm.i64 + %5686 = llvm.sub %64, %5684 : !llvm.i64 + %5687 = llvm.select %5685, %5686, %5684 : !llvm.i1, !llvm.i64 + %5688 = llvm.sdiv %5687, %68 : !llvm.i64 + %5689 = llvm.sub %64, %5688 : !llvm.i64 + %5690 = llvm.select %5685, %5689, %5688 : !llvm.i1, !llvm.i64 + %5691 = llvm.mul %5690, %60 : !llvm.i64 + %5692 = llvm.add %5466, %5691 : !llvm.i64 + %5693 = llvm.add %5692, %63 : !llvm.i64 + %5694 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5695 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5696 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5697 = llvm.mul %5693, %5696 : !llvm.i64 + %5698 = llvm.add %5695, %5697 : !llvm.i64 + %5699 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5700 = 
llvm.mul %67, %5699 : !llvm.i64 + %5701 = llvm.add %5698, %5700 : !llvm.i64 + %5702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5703 = llvm.mul %5484, %5702 : !llvm.i64 + %5704 = llvm.add %5701, %5703 : !llvm.i64 + %5705 = llvm.getelementptr %5694[%5704] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5706 = llvm.load %5705 : !llvm.ptr> + %5707 = llvm.fadd %5683, %5706 : !llvm.vec<8 x float> + %5708 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5710 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5711 = llvm.mul %67, %5710 : !llvm.i64 + %5712 = llvm.add %5709, %5711 : !llvm.i64 + %5713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5714 = llvm.mul %48, %5713 : !llvm.i64 + %5715 = llvm.add %5712, %5714 : !llvm.i64 + %5716 = llvm.getelementptr %5708[%5715] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5707, %5716 : !llvm.ptr> + %5717 = llvm.add %5449, %43 : !llvm.i64 + %5718 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5720 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5721 = llvm.mul %2345, %5720 : !llvm.i64 + %5722 = llvm.add %5719, %5721 : !llvm.i64 + %5723 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5724 = llvm.mul %5717, %5723 : !llvm.i64 + %5725 = llvm.add %5722, %5724 : !llvm.i64 + %5726 = llvm.getelementptr %5718[%5725] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5727 = llvm.bitcast %5726 : !llvm.ptr to !llvm.ptr> + %5728 = llvm.load %5727 {alignment = 4 : i64} : !llvm.ptr> + %5729 = llvm.add %5447, %43 : !llvm.i64 + %5730 = llvm.icmp "slt" %5729, %67 : !llvm.i64 + %5731 = llvm.sub %64, %5729 : !llvm.i64 + %5732 = llvm.select %5730, %5731, %5729 : !llvm.i1, !llvm.i64 + %5733 = llvm.sdiv %5732, %68 : !llvm.i64 + %5734 = llvm.sub %64, %5733 : !llvm.i64 + %5735 = llvm.select %5730, %5734, %5733 : !llvm.i1, !llvm.i64 + %5736 = llvm.srem %5735, %68 : !llvm.i64 + %5737 = llvm.icmp "slt" %5736, %67 : !llvm.i64 + %5738 = llvm.add %5736, %68 : !llvm.i64 + %5739 = llvm.select %5737, %5738, %5736 : !llvm.i1, !llvm.i64 + %5740 = llvm.mul %5735, %65 : !llvm.i64 + %5741 = llvm.add %5533, %5740 : !llvm.i64 + %5742 = llvm.add %5741, %52 : !llvm.i64 + %5743 = llvm.icmp "slt" %5742, %67 : !llvm.i64 + %5744 = llvm.sub %64, %5742 : !llvm.i64 + %5745 = llvm.select %5743, %5744, %5742 : !llvm.i1, !llvm.i64 + %5746 = llvm.sdiv %5745, %63 : !llvm.i64 + %5747 = llvm.sub %64, %5746 : !llvm.i64 + %5748 = llvm.select %5743, %5747, %5746 : !llvm.i1, !llvm.i64 + %5749 = llvm.mul %5748, %65 : !llvm.i64 + %5750 = llvm.add %5741, %5749 : !llvm.i64 + %5751 = llvm.add %5750, %52 : !llvm.i64 + %5752 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5754 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5755 = llvm.mul %5739, %5754 : !llvm.i64 + %5756 = llvm.add %5753, %5755 : !llvm.i64 + %5757 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5758 = llvm.mul %67, %5757 : !llvm.i64 + %5759 = llvm.add %5756, %5758 : !llvm.i64 + %5760 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5761 = llvm.mul %5751, %5760 : !llvm.i64 + %5762 = llvm.add %5759, %5761 : !llvm.i64 + %5763 = llvm.getelementptr %5752[%5762] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5764 = llvm.load %5763 : !llvm.ptr> + %5765 = llvm.fadd %5728, %5764 : !llvm.vec<8 x float> + %5766 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, 
array<2 x i64>)> + %5767 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5768 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5769 = llvm.mul %67, %5768 : !llvm.i64 + %5770 = llvm.add %5767, %5769 : !llvm.i64 + %5771 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5772 = llvm.mul %52, %5771 : !llvm.i64 + %5773 = llvm.add %5770, %5772 : !llvm.i64 + %5774 = llvm.getelementptr %5766[%5773] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5765, %5774 : !llvm.ptr> + %5775 = llvm.add %5449, %44 : !llvm.i64 + %5776 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5777 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5778 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5779 = llvm.mul %2345, %5778 : !llvm.i64 + %5780 = llvm.add %5777, %5779 : !llvm.i64 + %5781 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5782 = llvm.mul %5775, %5781 : !llvm.i64 + %5783 = llvm.add %5780, %5782 : !llvm.i64 + %5784 = llvm.getelementptr %5776[%5783] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5785 = llvm.bitcast %5784 : !llvm.ptr to !llvm.ptr> + %5786 = llvm.load %5785 {alignment = 4 : i64} : !llvm.ptr> + %5787 = llvm.add %5466, %45 : !llvm.i64 + %5788 = llvm.icmp "slt" %5787, %67 : !llvm.i64 + %5789 = llvm.sub %64, %5787 : !llvm.i64 + %5790 = llvm.select %5788, %5789, %5787 : !llvm.i1, !llvm.i64 + %5791 = llvm.sdiv %5790, %68 : !llvm.i64 + %5792 = llvm.sub %64, %5791 : !llvm.i64 + %5793 = llvm.select %5788, %5792, %5791 : !llvm.i1, !llvm.i64 + %5794 = llvm.mul %5793, %60 : !llvm.i64 + %5795 = llvm.add %5466, %5794 : !llvm.i64 + %5796 = llvm.add %5795, %45 : !llvm.i64 + %5797 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5799 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5800 = llvm.mul %5796, %5799 : !llvm.i64 + %5801 = llvm.add %5798, %5800 : !llvm.i64 + %5802 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5803 = llvm.mul %67, %5802 : !llvm.i64 + %5804 = llvm.add %5801, %5803 : !llvm.i64 + %5805 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5806 = llvm.mul %5484, %5805 : !llvm.i64 + %5807 = llvm.add %5804, %5806 : !llvm.i64 + %5808 = llvm.getelementptr %5797[%5807] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5809 = llvm.load %5808 : !llvm.ptr> + %5810 = llvm.fadd %5786, %5809 : !llvm.vec<8 x float> + %5811 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5812 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5813 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5814 = llvm.mul %67, %5813 : !llvm.i64 + %5815 = llvm.add %5812, %5814 : !llvm.i64 + %5816 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5817 = llvm.mul %56, %5816 : !llvm.i64 + %5818 = llvm.add %5815, %5817 : !llvm.i64 + %5819 = llvm.getelementptr %5811[%5818] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5810, %5819 : !llvm.ptr> + %5820 = llvm.add %5449, %46 : !llvm.i64 + %5821 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5822 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5823 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5824 = llvm.mul %2345, %5823 : !llvm.i64 + %5825 = llvm.add %5822, %5824 : !llvm.i64 + %5826 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5827 = llvm.mul %5820, %5826 : !llvm.i64 + %5828 = llvm.add %5825, %5827 : !llvm.i64 + %5829 = llvm.getelementptr %5821[%5828] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5830 = llvm.bitcast %5829 : !llvm.ptr to !llvm.ptr> + %5831 = llvm.load %5830 
{alignment = 4 : i64} : !llvm.ptr> + %5832 = llvm.add %5447, %46 : !llvm.i64 + %5833 = llvm.icmp "slt" %5832, %67 : !llvm.i64 + %5834 = llvm.sub %64, %5832 : !llvm.i64 + %5835 = llvm.select %5833, %5834, %5832 : !llvm.i1, !llvm.i64 + %5836 = llvm.sdiv %5835, %68 : !llvm.i64 + %5837 = llvm.sub %64, %5836 : !llvm.i64 + %5838 = llvm.select %5833, %5837, %5836 : !llvm.i1, !llvm.i64 + %5839 = llvm.srem %5838, %68 : !llvm.i64 + %5840 = llvm.icmp "slt" %5839, %67 : !llvm.i64 + %5841 = llvm.add %5839, %68 : !llvm.i64 + %5842 = llvm.select %5840, %5841, %5839 : !llvm.i1, !llvm.i64 + %5843 = llvm.mul %5838, %65 : !llvm.i64 + %5844 = llvm.add %5533, %5843 : !llvm.i64 + %5845 = llvm.add %5844, %61 : !llvm.i64 + %5846 = llvm.icmp "slt" %5845, %67 : !llvm.i64 + %5847 = llvm.sub %64, %5845 : !llvm.i64 + %5848 = llvm.select %5846, %5847, %5845 : !llvm.i1, !llvm.i64 + %5849 = llvm.sdiv %5848, %63 : !llvm.i64 + %5850 = llvm.sub %64, %5849 : !llvm.i64 + %5851 = llvm.select %5846, %5850, %5849 : !llvm.i1, !llvm.i64 + %5852 = llvm.mul %5851, %65 : !llvm.i64 + %5853 = llvm.add %5844, %5852 : !llvm.i64 + %5854 = llvm.add %5853, %61 : !llvm.i64 + %5855 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5857 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5858 = llvm.mul %5842, %5857 : !llvm.i64 + %5859 = llvm.add %5856, %5858 : !llvm.i64 + %5860 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5861 = llvm.mul %67, %5860 : !llvm.i64 + %5862 = llvm.add %5859, %5861 : !llvm.i64 + %5863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5864 = llvm.mul %5854, %5863 : !llvm.i64 + %5865 = llvm.add %5862, %5864 : !llvm.i64 + %5866 = llvm.getelementptr %5855[%5865] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5867 = llvm.load %5866 : !llvm.ptr> + %5868 = llvm.fadd %5831, %5867 : !llvm.vec<8 x float> + %5869 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5871 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5872 = llvm.mul %67, %5871 : !llvm.i64 + %5873 = llvm.add %5870, %5872 : !llvm.i64 + %5874 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5875 = llvm.mul %61, %5874 : !llvm.i64 + %5876 = llvm.add %5873, %5875 : !llvm.i64 + %5877 = llvm.getelementptr %5869[%5876] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5868, %5877 : !llvm.ptr> + %5878 = llvm.add %5449, %47 : !llvm.i64 + %5879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5882 = llvm.mul %2345, %5881 : !llvm.i64 + %5883 = llvm.add %5880, %5882 : !llvm.i64 + %5884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5885 = llvm.mul %5878, %5884 : !llvm.i64 + %5886 = llvm.add %5883, %5885 : !llvm.i64 + %5887 = llvm.getelementptr %5879[%5886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5888 = llvm.bitcast %5887 : !llvm.ptr to !llvm.ptr> + %5889 = llvm.load %5888 {alignment = 4 : i64} : !llvm.ptr> + %5890 = llvm.add %5466, %48 : !llvm.i64 + %5891 = llvm.icmp "slt" %5890, %67 : !llvm.i64 + %5892 = llvm.sub %64, %5890 : !llvm.i64 + %5893 = llvm.select %5891, %5892, %5890 : !llvm.i1, !llvm.i64 + %5894 = llvm.sdiv %5893, %68 : !llvm.i64 + %5895 = llvm.sub %64, %5894 : !llvm.i64 + %5896 = llvm.select %5891, %5895, %5894 : !llvm.i1, !llvm.i64 + %5897 = llvm.mul %5896, %60 : !llvm.i64 + %5898 = llvm.add %5466, %5897 : !llvm.i64 + %5899 = 
llvm.add %5898, %48 : !llvm.i64 + %5900 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5901 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5902 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5903 = llvm.mul %5899, %5902 : !llvm.i64 + %5904 = llvm.add %5901, %5903 : !llvm.i64 + %5905 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5906 = llvm.mul %67, %5905 : !llvm.i64 + %5907 = llvm.add %5904, %5906 : !llvm.i64 + %5908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5909 = llvm.mul %5484, %5908 : !llvm.i64 + %5910 = llvm.add %5907, %5909 : !llvm.i64 + %5911 = llvm.getelementptr %5900[%5910] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5912 = llvm.load %5911 : !llvm.ptr> + %5913 = llvm.fadd %5889, %5912 : !llvm.vec<8 x float> + %5914 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5915 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5916 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5917 = llvm.mul %67, %5916 : !llvm.i64 + %5918 = llvm.add %5915, %5917 : !llvm.i64 + %5919 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5920 = llvm.mul %70, %5919 : !llvm.i64 + %5921 = llvm.add %5918, %5920 : !llvm.i64 + %5922 = llvm.getelementptr %5914[%5921] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5913, %5922 : !llvm.ptr> + %5923 = llvm.add %5449, %49 : !llvm.i64 + %5924 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5925 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5926 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5927 = llvm.mul %2345, %5926 : !llvm.i64 + %5928 = llvm.add %5925, %5927 : !llvm.i64 + %5929 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5930 = llvm.mul %5923, %5929 : !llvm.i64 + %5931 = llvm.add %5928, %5930 : !llvm.i64 + %5932 = llvm.getelementptr %5924[%5931] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5933 = llvm.bitcast %5932 : !llvm.ptr to !llvm.ptr> + %5934 = llvm.load %5933 {alignment = 4 : i64} : !llvm.ptr> + %5935 = llvm.add %5447, %49 : !llvm.i64 + %5936 = llvm.icmp "slt" %5935, %67 : !llvm.i64 + %5937 = llvm.sub %64, %5935 : !llvm.i64 + %5938 = llvm.select %5936, %5937, %5935 : !llvm.i1, !llvm.i64 + %5939 = llvm.sdiv %5938, %68 : !llvm.i64 + %5940 = llvm.sub %64, %5939 : !llvm.i64 + %5941 = llvm.select %5936, %5940, %5939 : !llvm.i1, !llvm.i64 + %5942 = llvm.srem %5941, %68 : !llvm.i64 + %5943 = llvm.icmp "slt" %5942, %67 : !llvm.i64 + %5944 = llvm.add %5942, %68 : !llvm.i64 + %5945 = llvm.select %5943, %5944, %5942 : !llvm.i1, !llvm.i64 + %5946 = llvm.mul %5941, %65 : !llvm.i64 + %5947 = llvm.add %5533, %5946 : !llvm.i64 + %5948 = llvm.add %5947, %50 : !llvm.i64 + %5949 = llvm.icmp "slt" %5948, %67 : !llvm.i64 + %5950 = llvm.sub %64, %5948 : !llvm.i64 + %5951 = llvm.select %5949, %5950, %5948 : !llvm.i1, !llvm.i64 + %5952 = llvm.sdiv %5951, %63 : !llvm.i64 + %5953 = llvm.sub %64, %5952 : !llvm.i64 + %5954 = llvm.select %5949, %5953, %5952 : !llvm.i1, !llvm.i64 + %5955 = llvm.mul %5954, %65 : !llvm.i64 + %5956 = llvm.add %5947, %5955 : !llvm.i64 + %5957 = llvm.add %5956, %50 : !llvm.i64 + %5958 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5960 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5961 = llvm.mul %5945, %5960 : !llvm.i64 + %5962 = llvm.add %5959, %5961 : !llvm.i64 + %5963 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5964 = llvm.mul %67, %5963 : !llvm.i64 + %5965 = llvm.add %5962, %5964 : !llvm.i64 + %5966 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %5967 = llvm.mul %5957, %5966 : !llvm.i64 + %5968 = llvm.add %5965, %5967 : !llvm.i64 + %5969 = llvm.getelementptr %5958[%5968] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5970 = llvm.load %5969 : !llvm.ptr> + %5971 = llvm.fadd %5934, %5970 : !llvm.vec<8 x float> + %5972 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5973 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5974 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5975 = llvm.mul %67, %5974 : !llvm.i64 + %5976 = llvm.add %5973, %5975 : !llvm.i64 + %5977 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5978 = llvm.mul %50, %5977 : !llvm.i64 + %5979 = llvm.add %5976, %5978 : !llvm.i64 + %5980 = llvm.getelementptr %5972[%5979] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5971, %5980 : !llvm.ptr> + %5981 = llvm.add %5449, %51 : !llvm.i64 + %5982 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5984 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5985 = llvm.mul %2345, %5984 : !llvm.i64 + %5986 = llvm.add %5983, %5985 : !llvm.i64 + %5987 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5988 = llvm.mul %5981, %5987 : !llvm.i64 + %5989 = llvm.add %5986, %5988 : !llvm.i64 + %5990 = llvm.getelementptr %5982[%5989] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5991 = llvm.bitcast %5990 : !llvm.ptr to !llvm.ptr> + %5992 = llvm.load %5991 {alignment = 4 : i64} : !llvm.ptr> + %5993 = llvm.add %5466, %52 : !llvm.i64 + %5994 = llvm.icmp "slt" %5993, %67 : !llvm.i64 + %5995 = llvm.sub %64, %5993 : !llvm.i64 + %5996 = llvm.select %5994, %5995, %5993 : !llvm.i1, !llvm.i64 + %5997 = llvm.sdiv %5996, %68 : !llvm.i64 + %5998 = llvm.sub %64, %5997 : !llvm.i64 + %5999 = llvm.select %5994, %5998, %5997 : !llvm.i1, !llvm.i64 + %6000 = llvm.mul %5999, %60 : !llvm.i64 + %6001 = llvm.add %5466, %6000 : !llvm.i64 + %6002 = llvm.add %6001, %52 : !llvm.i64 + %6003 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6004 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6005 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6006 = llvm.mul %6002, %6005 : !llvm.i64 + %6007 = llvm.add %6004, %6006 : !llvm.i64 + %6008 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6009 = llvm.mul %67, %6008 : !llvm.i64 + %6010 = llvm.add %6007, %6009 : !llvm.i64 + %6011 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6012 = llvm.mul %5484, %6011 : !llvm.i64 + %6013 = llvm.add %6010, %6012 : !llvm.i64 + %6014 = llvm.getelementptr %6003[%6013] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6015 = llvm.load %6014 : !llvm.ptr> + %6016 = llvm.fadd %5992, %6015 : !llvm.vec<8 x float> + %6017 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6018 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6019 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6020 = llvm.mul %67, %6019 : !llvm.i64 + %6021 = llvm.add %6018, %6020 : !llvm.i64 + %6022 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6023 = llvm.mul %33, %6022 : !llvm.i64 + %6024 = llvm.add %6021, %6023 : !llvm.i64 + %6025 = llvm.getelementptr %6017[%6024] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6016, %6025 : !llvm.ptr> + %6026 = llvm.add %5449, %53 : !llvm.i64 + %6027 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6028 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6029 = llvm.mlir.constant(512 : index) : !llvm.i64 + 
%6030 = llvm.mul %2345, %6029 : !llvm.i64 + %6031 = llvm.add %6028, %6030 : !llvm.i64 + %6032 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6033 = llvm.mul %6026, %6032 : !llvm.i64 + %6034 = llvm.add %6031, %6033 : !llvm.i64 + %6035 = llvm.getelementptr %6027[%6034] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6036 = llvm.bitcast %6035 : !llvm.ptr to !llvm.ptr> + %6037 = llvm.load %6036 {alignment = 4 : i64} : !llvm.ptr> + %6038 = llvm.add %5447, %53 : !llvm.i64 + %6039 = llvm.icmp "slt" %6038, %67 : !llvm.i64 + %6040 = llvm.sub %64, %6038 : !llvm.i64 + %6041 = llvm.select %6039, %6040, %6038 : !llvm.i1, !llvm.i64 + %6042 = llvm.sdiv %6041, %68 : !llvm.i64 + %6043 = llvm.sub %64, %6042 : !llvm.i64 + %6044 = llvm.select %6039, %6043, %6042 : !llvm.i1, !llvm.i64 + %6045 = llvm.srem %6044, %68 : !llvm.i64 + %6046 = llvm.icmp "slt" %6045, %67 : !llvm.i64 + %6047 = llvm.add %6045, %68 : !llvm.i64 + %6048 = llvm.select %6046, %6047, %6045 : !llvm.i1, !llvm.i64 + %6049 = llvm.mul %6044, %65 : !llvm.i64 + %6050 = llvm.add %5533, %6049 : !llvm.i64 + %6051 = llvm.add %6050, %54 : !llvm.i64 + %6052 = llvm.icmp "slt" %6051, %67 : !llvm.i64 + %6053 = llvm.sub %64, %6051 : !llvm.i64 + %6054 = llvm.select %6052, %6053, %6051 : !llvm.i1, !llvm.i64 + %6055 = llvm.sdiv %6054, %63 : !llvm.i64 + %6056 = llvm.sub %64, %6055 : !llvm.i64 + %6057 = llvm.select %6052, %6056, %6055 : !llvm.i1, !llvm.i64 + %6058 = llvm.mul %6057, %65 : !llvm.i64 + %6059 = llvm.add %6050, %6058 : !llvm.i64 + %6060 = llvm.add %6059, %54 : !llvm.i64 + %6061 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6062 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6063 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6064 = llvm.mul %6048, %6063 : !llvm.i64 + %6065 = llvm.add %6062, %6064 : !llvm.i64 + %6066 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6067 = llvm.mul %67, %6066 : !llvm.i64 + %6068 = llvm.add %6065, %6067 : !llvm.i64 + %6069 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6070 = llvm.mul %6060, %6069 : !llvm.i64 + %6071 = llvm.add %6068, %6070 : !llvm.i64 + %6072 = llvm.getelementptr %6061[%6071] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6073 = llvm.load %6072 : !llvm.ptr> + %6074 = llvm.fadd %6037, %6073 : !llvm.vec<8 x float> + %6075 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6076 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6077 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6078 = llvm.mul %67, %6077 : !llvm.i64 + %6079 = llvm.add %6076, %6078 : !llvm.i64 + %6080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6081 = llvm.mul %54, %6080 : !llvm.i64 + %6082 = llvm.add %6079, %6081 : !llvm.i64 + %6083 = llvm.getelementptr %6075[%6082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6074, %6083 : !llvm.ptr> + %6084 = llvm.add %5449, %55 : !llvm.i64 + %6085 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6086 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6087 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6088 = llvm.mul %2345, %6087 : !llvm.i64 + %6089 = llvm.add %6086, %6088 : !llvm.i64 + %6090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6091 = llvm.mul %6084, %6090 : !llvm.i64 + %6092 = llvm.add %6089, %6091 : !llvm.i64 + %6093 = llvm.getelementptr %6085[%6092] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6094 = llvm.bitcast %6093 : !llvm.ptr to !llvm.ptr> + %6095 = llvm.load %6094 {alignment = 4 : i64} : !llvm.ptr> + %6096 = llvm.add %5466, %56 : !llvm.i64 + %6097 = 
llvm.icmp "slt" %6096, %67 : !llvm.i64 + %6098 = llvm.sub %64, %6096 : !llvm.i64 + %6099 = llvm.select %6097, %6098, %6096 : !llvm.i1, !llvm.i64 + %6100 = llvm.sdiv %6099, %68 : !llvm.i64 + %6101 = llvm.sub %64, %6100 : !llvm.i64 + %6102 = llvm.select %6097, %6101, %6100 : !llvm.i1, !llvm.i64 + %6103 = llvm.mul %6102, %60 : !llvm.i64 + %6104 = llvm.add %5466, %6103 : !llvm.i64 + %6105 = llvm.add %6104, %56 : !llvm.i64 + %6106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6109 = llvm.mul %6105, %6108 : !llvm.i64 + %6110 = llvm.add %6107, %6109 : !llvm.i64 + %6111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6112 = llvm.mul %67, %6111 : !llvm.i64 + %6113 = llvm.add %6110, %6112 : !llvm.i64 + %6114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6115 = llvm.mul %5484, %6114 : !llvm.i64 + %6116 = llvm.add %6113, %6115 : !llvm.i64 + %6117 = llvm.getelementptr %6106[%6116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6118 = llvm.load %6117 : !llvm.ptr> + %6119 = llvm.fadd %6095, %6118 : !llvm.vec<8 x float> + %6120 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6122 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6123 = llvm.mul %67, %6122 : !llvm.i64 + %6124 = llvm.add %6121, %6123 : !llvm.i64 + %6125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6126 = llvm.mul %34, %6125 : !llvm.i64 + %6127 = llvm.add %6124, %6126 : !llvm.i64 + %6128 = llvm.getelementptr %6120[%6127] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6119, %6128 : !llvm.ptr> + %6129 = llvm.add %5449, %57 : !llvm.i64 + %6130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6132 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6133 = llvm.mul %2345, %6132 : !llvm.i64 + %6134 = llvm.add %6131, %6133 : !llvm.i64 + %6135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6136 = llvm.mul %6129, %6135 : !llvm.i64 + %6137 = llvm.add %6134, %6136 : !llvm.i64 + %6138 = llvm.getelementptr %6130[%6137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6139 = llvm.bitcast %6138 : !llvm.ptr to !llvm.ptr> + %6140 = llvm.load %6139 {alignment = 4 : i64} : !llvm.ptr> + %6141 = llvm.add %5447, %57 : !llvm.i64 + %6142 = llvm.icmp "slt" %6141, %67 : !llvm.i64 + %6143 = llvm.sub %64, %6141 : !llvm.i64 + %6144 = llvm.select %6142, %6143, %6141 : !llvm.i1, !llvm.i64 + %6145 = llvm.sdiv %6144, %68 : !llvm.i64 + %6146 = llvm.sub %64, %6145 : !llvm.i64 + %6147 = llvm.select %6142, %6146, %6145 : !llvm.i1, !llvm.i64 + %6148 = llvm.srem %6147, %68 : !llvm.i64 + %6149 = llvm.icmp "slt" %6148, %67 : !llvm.i64 + %6150 = llvm.add %6148, %68 : !llvm.i64 + %6151 = llvm.select %6149, %6150, %6148 : !llvm.i1, !llvm.i64 + %6152 = llvm.mul %6147, %65 : !llvm.i64 + %6153 = llvm.add %5533, %6152 : !llvm.i64 + %6154 = llvm.add %6153, %58 : !llvm.i64 + %6155 = llvm.icmp "slt" %6154, %67 : !llvm.i64 + %6156 = llvm.sub %64, %6154 : !llvm.i64 + %6157 = llvm.select %6155, %6156, %6154 : !llvm.i1, !llvm.i64 + %6158 = llvm.sdiv %6157, %63 : !llvm.i64 + %6159 = llvm.sub %64, %6158 : !llvm.i64 + %6160 = llvm.select %6155, %6159, %6158 : !llvm.i1, !llvm.i64 + %6161 = llvm.mul %6160, %65 : !llvm.i64 + %6162 = llvm.add %6153, %6161 : !llvm.i64 + %6163 = llvm.add %6162, %58 : !llvm.i64 + %6164 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6165 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6166 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6167 = llvm.mul %6151, %6166 : !llvm.i64 + %6168 = llvm.add %6165, %6167 : !llvm.i64 + %6169 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6170 = llvm.mul %67, %6169 : !llvm.i64 + %6171 = llvm.add %6168, %6170 : !llvm.i64 + %6172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6173 = llvm.mul %6163, %6172 : !llvm.i64 + %6174 = llvm.add %6171, %6173 : !llvm.i64 + %6175 = llvm.getelementptr %6164[%6174] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6176 = llvm.load %6175 : !llvm.ptr> + %6177 = llvm.fadd %6140, %6176 : !llvm.vec<8 x float> + %6178 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6180 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6181 = llvm.mul %67, %6180 : !llvm.i64 + %6182 = llvm.add %6179, %6181 : !llvm.i64 + %6183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6184 = llvm.mul %58, %6183 : !llvm.i64 + %6185 = llvm.add %6182, %6184 : !llvm.i64 + %6186 = llvm.getelementptr %6178[%6185] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6177, %6186 : !llvm.ptr> + %6187 = llvm.add %5449, %59 : !llvm.i64 + %6188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6191 = llvm.mul %2345, %6190 : !llvm.i64 + %6192 = llvm.add %6189, %6191 : !llvm.i64 + %6193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6194 = llvm.mul %6187, %6193 : !llvm.i64 + %6195 = llvm.add %6192, %6194 : !llvm.i64 + %6196 = llvm.getelementptr %6188[%6195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6197 = llvm.bitcast %6196 : !llvm.ptr to !llvm.ptr> + %6198 = llvm.load %6197 {alignment = 4 : i64} : !llvm.ptr> + %6199 = llvm.add %5466, %61 : !llvm.i64 + %6200 = llvm.icmp "slt" %6199, %67 : !llvm.i64 + %6201 = llvm.sub %64, %6199 : !llvm.i64 + %6202 = llvm.select %6200, %6201, %6199 : !llvm.i1, !llvm.i64 + %6203 = llvm.sdiv %6202, %68 : !llvm.i64 + %6204 = llvm.sub %64, %6203 : !llvm.i64 + %6205 = llvm.select %6200, %6204, %6203 : !llvm.i1, !llvm.i64 + %6206 = llvm.mul %6205, %60 : !llvm.i64 + %6207 = llvm.add %5466, %6206 : !llvm.i64 + %6208 = llvm.add %6207, %61 : !llvm.i64 + %6209 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6211 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6212 = llvm.mul %6208, %6211 : !llvm.i64 + %6213 = llvm.add %6210, %6212 : !llvm.i64 + %6214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6215 = llvm.mul %67, %6214 : !llvm.i64 + %6216 = llvm.add %6213, %6215 : !llvm.i64 + %6217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6218 = llvm.mul %5484, %6217 : !llvm.i64 + %6219 = llvm.add %6216, %6218 : !llvm.i64 + %6220 = llvm.getelementptr %6209[%6219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6221 = llvm.load %6220 : !llvm.ptr> + %6222 = llvm.fadd %6198, %6221 : !llvm.vec<8 x float> + %6223 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6224 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6225 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6226 = llvm.mul %67, %6225 : !llvm.i64 + %6227 = llvm.add %6224, %6226 : !llvm.i64 + %6228 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6229 = llvm.mul %35, %6228 : !llvm.i64 + %6230 
= llvm.add %6227, %6229 : !llvm.i64 + %6231 = llvm.getelementptr %6223[%6230] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6222, %6231 : !llvm.ptr> + %6232 = llvm.add %5449, %62 : !llvm.i64 + %6233 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6234 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6235 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6236 = llvm.mul %2345, %6235 : !llvm.i64 + %6237 = llvm.add %6234, %6236 : !llvm.i64 + %6238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6239 = llvm.mul %6232, %6238 : !llvm.i64 + %6240 = llvm.add %6237, %6239 : !llvm.i64 + %6241 = llvm.getelementptr %6233[%6240] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6242 = llvm.bitcast %6241 : !llvm.ptr to !llvm.ptr> + %6243 = llvm.load %6242 {alignment = 4 : i64} : !llvm.ptr> + %6244 = llvm.add %5447, %62 : !llvm.i64 + %6245 = llvm.icmp "slt" %6244, %67 : !llvm.i64 + %6246 = llvm.sub %64, %6244 : !llvm.i64 + %6247 = llvm.select %6245, %6246, %6244 : !llvm.i1, !llvm.i64 + %6248 = llvm.sdiv %6247, %68 : !llvm.i64 + %6249 = llvm.sub %64, %6248 : !llvm.i64 + %6250 = llvm.select %6245, %6249, %6248 : !llvm.i1, !llvm.i64 + %6251 = llvm.srem %6250, %68 : !llvm.i64 + %6252 = llvm.icmp "slt" %6251, %67 : !llvm.i64 + %6253 = llvm.add %6251, %68 : !llvm.i64 + %6254 = llvm.select %6252, %6253, %6251 : !llvm.i1, !llvm.i64 + %6255 = llvm.mul %6250, %65 : !llvm.i64 + %6256 = llvm.add %5533, %6255 : !llvm.i64 + %6257 = llvm.add %6256, %66 : !llvm.i64 + %6258 = llvm.icmp "slt" %6257, %67 : !llvm.i64 + %6259 = llvm.sub %64, %6257 : !llvm.i64 + %6260 = llvm.select %6258, %6259, %6257 : !llvm.i1, !llvm.i64 + %6261 = llvm.sdiv %6260, %63 : !llvm.i64 + %6262 = llvm.sub %64, %6261 : !llvm.i64 + %6263 = llvm.select %6258, %6262, %6261 : !llvm.i1, !llvm.i64 + %6264 = llvm.mul %6263, %65 : !llvm.i64 + %6265 = llvm.add %6256, %6264 : !llvm.i64 + %6266 = llvm.add %6265, %66 : !llvm.i64 + %6267 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6269 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6270 = llvm.mul %6254, %6269 : !llvm.i64 + %6271 = llvm.add %6268, %6270 : !llvm.i64 + %6272 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6273 = llvm.mul %67, %6272 : !llvm.i64 + %6274 = llvm.add %6271, %6273 : !llvm.i64 + %6275 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6276 = llvm.mul %6266, %6275 : !llvm.i64 + %6277 = llvm.add %6274, %6276 : !llvm.i64 + %6278 = llvm.getelementptr %6267[%6277] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6279 = llvm.load %6278 : !llvm.ptr> + %6280 = llvm.fadd %6243, %6279 : !llvm.vec<8 x float> + %6281 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6282 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6283 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6284 = llvm.mul %67, %6283 : !llvm.i64 + %6285 = llvm.add %6282, %6284 : !llvm.i64 + %6286 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6287 = llvm.mul %66, %6286 : !llvm.i64 + %6288 = llvm.add %6285, %6287 : !llvm.i64 + %6289 = llvm.getelementptr %6281[%6288] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6280, %6289 : !llvm.ptr> + llvm.br ^bb44(%67 : !llvm.i64) + ^bb44(%6290: !llvm.i64): // 2 preds: ^bb43, ^bb45 + %6291 = llvm.icmp "slt" %6290, %68 : !llvm.i64 + llvm.cond_br %6291, ^bb45, ^bb46 + ^bb45: // pred: ^bb44 + %6292 = llvm.mul %6290, %70 : !llvm.i64 + %6293 = llvm.add %5449, %6292 : !llvm.i64 + %6294 = llvm.extractvalue %110[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6295 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6296 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6297 = llvm.mul %67, %6296 : !llvm.i64 + %6298 = llvm.add %6295, %6297 : !llvm.i64 + %6299 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6300 = llvm.mul %6290, %6299 : !llvm.i64 + %6301 = llvm.add %6298, %6300 : !llvm.i64 + %6302 = llvm.getelementptr %6294[%6301] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6303 = llvm.load %6302 : !llvm.ptr> + %6304 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6305 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6306 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6307 = llvm.mul %2345, %6306 : !llvm.i64 + %6308 = llvm.add %6305, %6307 : !llvm.i64 + %6309 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6310 = llvm.mul %6293, %6309 : !llvm.i64 + %6311 = llvm.add %6308, %6310 : !llvm.i64 + %6312 = llvm.getelementptr %6304[%6311] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6313 = llvm.bitcast %6312 : !llvm.ptr to !llvm.ptr> + llvm.store %6303, %6313 {alignment = 4 : i64} : !llvm.ptr> + %6314 = llvm.add %6290, %69 : !llvm.i64 + llvm.br ^bb44(%6314 : !llvm.i64) + ^bb46: // 2 preds: ^bb44, ^bb48 + llvm.br ^bb50 + ^bb47: // pred: ^bb42 + %6315 = llvm.add %151, %5447 : !llvm.i64 + %6316 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6318 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6319 = llvm.mul %2345, %6318 : !llvm.i64 + %6320 = llvm.add %6317, %6319 : !llvm.i64 + %6321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6322 = llvm.mul %6315, %6321 : !llvm.i64 + %6323 = llvm.add %6320, %6322 : !llvm.i64 + %6324 = llvm.getelementptr %6316[%6323] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6325 = llvm.bitcast %6324 : !llvm.ptr to !llvm.ptr> + %6326 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6327 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6328 = llvm.trunc %6315 : !llvm.i64 to !llvm.i32 + %6329 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6330 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6331 = llvm.insertelement %6328, %6329[%6330 : !llvm.i32] : !llvm.vec<8 x i32> + %6332 = llvm.shufflevector %6331, %6329 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6333 = llvm.add %6332, %6327 : !llvm.vec<8 x i32> + %6334 = llvm.trunc %6326 : !llvm.i64 to !llvm.i32 + %6335 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6336 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6337 = llvm.insertelement %6334, %6335[%6336 : !llvm.i32] : !llvm.vec<8 x i32> + %6338 = llvm.shufflevector %6337, %6335 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6339 = llvm.icmp "slt" %6333, %6338 : !llvm.vec<8 x i32> + %6340 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6341 = llvm.intr.masked.load %6325, %6339, %6340 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6342 = llvm.icmp "slt" %5447, %67 : !llvm.i64 + %6343 = llvm.sub %64, %5447 : !llvm.i64 + %6344 = llvm.select %6342, %6343, %5447 : !llvm.i1, !llvm.i64 + %6345 = llvm.sdiv %6344, %68 : !llvm.i64 + %6346 = llvm.sub %64, %6345 : !llvm.i64 + %6347 = llvm.select %6342, %6346, %6345 : !llvm.i1, !llvm.i64 + %6348 = llvm.srem %6347, %68 : !llvm.i64 + %6349 = 
llvm.icmp "slt" %6348, %67 : !llvm.i64 + %6350 = llvm.add %6348, %68 : !llvm.i64 + %6351 = llvm.select %6349, %6350, %6348 : !llvm.i1, !llvm.i64 + %6352 = llvm.srem %5447, %68 : !llvm.i64 + %6353 = llvm.icmp "slt" %6352, %67 : !llvm.i64 + %6354 = llvm.add %6352, %68 : !llvm.i64 + %6355 = llvm.select %6353, %6354, %6352 : !llvm.i1, !llvm.i64 + %6356 = llvm.icmp "slt" %6355, %67 : !llvm.i64 + %6357 = llvm.sub %64, %6355 : !llvm.i64 + %6358 = llvm.select %6356, %6357, %6355 : !llvm.i1, !llvm.i64 + %6359 = llvm.sdiv %6358, %70 : !llvm.i64 + %6360 = llvm.sub %64, %6359 : !llvm.i64 + %6361 = llvm.select %6356, %6360, %6359 : !llvm.i1, !llvm.i64 + %6362 = llvm.srem %6361, %63 : !llvm.i64 + %6363 = llvm.icmp "slt" %6362, %67 : !llvm.i64 + %6364 = llvm.add %6362, %63 : !llvm.i64 + %6365 = llvm.select %6363, %6364, %6362 : !llvm.i1, !llvm.i64 + %6366 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6367 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6368 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6369 = llvm.mul %6351, %6368 : !llvm.i64 + %6370 = llvm.add %6367, %6369 : !llvm.i64 + %6371 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6372 = llvm.mul %67, %6371 : !llvm.i64 + %6373 = llvm.add %6370, %6372 : !llvm.i64 + %6374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6375 = llvm.mul %6365, %6374 : !llvm.i64 + %6376 = llvm.add %6373, %6375 : !llvm.i64 + %6377 = llvm.getelementptr %6366[%6376] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6378 = llvm.load %6377 : !llvm.ptr> + %6379 = llvm.fadd %6341, %6378 : !llvm.vec<8 x float> + %6380 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6382 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6383 = llvm.mul %67, %6382 : !llvm.i64 + %6384 = llvm.add %6381, %6383 : !llvm.i64 + %6385 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6386 = llvm.mul %67, %6385 : !llvm.i64 + %6387 = llvm.add %6384, %6386 : !llvm.i64 + %6388 = llvm.getelementptr %6380[%6387] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6379, %6388 : !llvm.ptr> + %6389 = llvm.add %6315, %70 : !llvm.i64 + %6390 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6392 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6393 = llvm.mul %2345, %6392 : !llvm.i64 + %6394 = llvm.add %6391, %6393 : !llvm.i64 + %6395 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6396 = llvm.mul %6389, %6395 : !llvm.i64 + %6397 = llvm.add %6394, %6396 : !llvm.i64 + %6398 = llvm.getelementptr %6390[%6397] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6399 = llvm.bitcast %6398 : !llvm.ptr to !llvm.ptr> + %6400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6401 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6402 = llvm.trunc %6389 : !llvm.i64 to !llvm.i32 + %6403 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6404 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6405 = llvm.insertelement %6402, %6403[%6404 : !llvm.i32] : !llvm.vec<8 x i32> + %6406 = llvm.shufflevector %6405, %6403 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6407 = llvm.add %6406, %6401 : !llvm.vec<8 x i32> + %6408 = llvm.trunc %6400 : !llvm.i64 to !llvm.i32 + %6409 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6410 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6411 = llvm.insertelement %6408, %6409[%6410 : 
!llvm.i32] : !llvm.vec<8 x i32> + %6412 = llvm.shufflevector %6411, %6409 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6413 = llvm.icmp "slt" %6407, %6412 : !llvm.vec<8 x i32> + %6414 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6415 = llvm.intr.masked.load %6399, %6413, %6414 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6416 = llvm.add %5447, %70 : !llvm.i64 + %6417 = llvm.icmp "slt" %6416, %67 : !llvm.i64 + %6418 = llvm.sub %64, %6416 : !llvm.i64 + %6419 = llvm.select %6417, %6418, %6416 : !llvm.i1, !llvm.i64 + %6420 = llvm.sdiv %6419, %68 : !llvm.i64 + %6421 = llvm.sub %64, %6420 : !llvm.i64 + %6422 = llvm.select %6417, %6421, %6420 : !llvm.i1, !llvm.i64 + %6423 = llvm.srem %6422, %68 : !llvm.i64 + %6424 = llvm.icmp "slt" %6423, %67 : !llvm.i64 + %6425 = llvm.add %6423, %68 : !llvm.i64 + %6426 = llvm.select %6424, %6425, %6423 : !llvm.i1, !llvm.i64 + %6427 = llvm.sdiv %6344, %70 : !llvm.i64 + %6428 = llvm.sub %64, %6427 : !llvm.i64 + %6429 = llvm.select %6342, %6428, %6427 : !llvm.i1, !llvm.i64 + %6430 = llvm.mul %6422, %65 : !llvm.i64 + %6431 = llvm.add %6429, %6430 : !llvm.i64 + %6432 = llvm.add %6431, %69 : !llvm.i64 + %6433 = llvm.icmp "slt" %6432, %67 : !llvm.i64 + %6434 = llvm.sub %64, %6432 : !llvm.i64 + %6435 = llvm.select %6433, %6434, %6432 : !llvm.i1, !llvm.i64 + %6436 = llvm.sdiv %6435, %63 : !llvm.i64 + %6437 = llvm.sub %64, %6436 : !llvm.i64 + %6438 = llvm.select %6433, %6437, %6436 : !llvm.i1, !llvm.i64 + %6439 = llvm.mul %6438, %65 : !llvm.i64 + %6440 = llvm.add %6431, %6439 : !llvm.i64 + %6441 = llvm.add %6440, %69 : !llvm.i64 + %6442 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6444 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6445 = llvm.mul %6426, %6444 : !llvm.i64 + %6446 = llvm.add %6443, %6445 : !llvm.i64 + %6447 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6448 = llvm.mul %67, %6447 : !llvm.i64 + %6449 = llvm.add %6446, %6448 : !llvm.i64 + %6450 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6451 = llvm.mul %6441, %6450 : !llvm.i64 + %6452 = llvm.add %6449, %6451 : !llvm.i64 + %6453 = llvm.getelementptr %6442[%6452] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6454 = llvm.load %6453 : !llvm.ptr> + %6455 = llvm.fadd %6415, %6454 : !llvm.vec<8 x float> + %6456 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6457 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6458 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6459 = llvm.mul %67, %6458 : !llvm.i64 + %6460 = llvm.add %6457, %6459 : !llvm.i64 + %6461 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6462 = llvm.mul %69, %6461 : !llvm.i64 + %6463 = llvm.add %6460, %6462 : !llvm.i64 + %6464 = llvm.getelementptr %6456[%6463] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6455, %6464 : !llvm.ptr> + %6465 = llvm.add %6315, %68 : !llvm.i64 + %6466 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6467 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6468 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6469 = llvm.mul %2345, %6468 : !llvm.i64 + %6470 = llvm.add %6467, %6469 : !llvm.i64 + %6471 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6472 = llvm.mul %6465, %6471 : !llvm.i64 + %6473 = llvm.add %6470, %6472 : !llvm.i64 + %6474 = llvm.getelementptr 
%6466[%6473] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6475 = llvm.bitcast %6474 : !llvm.ptr to !llvm.ptr> + %6476 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6477 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6478 = llvm.trunc %6465 : !llvm.i64 to !llvm.i32 + %6479 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6480 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6481 = llvm.insertelement %6478, %6479[%6480 : !llvm.i32] : !llvm.vec<8 x i32> + %6482 = llvm.shufflevector %6481, %6479 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6483 = llvm.add %6482, %6477 : !llvm.vec<8 x i32> + %6484 = llvm.trunc %6476 : !llvm.i64 to !llvm.i32 + %6485 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6486 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6487 = llvm.insertelement %6484, %6485[%6486 : !llvm.i32] : !llvm.vec<8 x i32> + %6488 = llvm.shufflevector %6487, %6485 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6489 = llvm.icmp "slt" %6483, %6488 : !llvm.vec<8 x i32> + %6490 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6491 = llvm.intr.masked.load %6475, %6489, %6490 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6492 = llvm.add %6347, %69 : !llvm.i64 + %6493 = llvm.icmp "slt" %6492, %67 : !llvm.i64 + %6494 = llvm.sub %64, %6492 : !llvm.i64 + %6495 = llvm.select %6493, %6494, %6492 : !llvm.i1, !llvm.i64 + %6496 = llvm.sdiv %6495, %68 : !llvm.i64 + %6497 = llvm.sub %64, %6496 : !llvm.i64 + %6498 = llvm.select %6493, %6497, %6496 : !llvm.i1, !llvm.i64 + %6499 = llvm.mul %6498, %60 : !llvm.i64 + %6500 = llvm.add %6347, %6499 : !llvm.i64 + %6501 = llvm.add %6500, %69 : !llvm.i64 + %6502 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6503 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6504 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6505 = llvm.mul %6501, %6504 : !llvm.i64 + %6506 = llvm.add %6503, %6505 : !llvm.i64 + %6507 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6508 = llvm.mul %67, %6507 : !llvm.i64 + %6509 = llvm.add %6506, %6508 : !llvm.i64 + %6510 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6511 = llvm.mul %6365, %6510 : !llvm.i64 + %6512 = llvm.add %6509, %6511 : !llvm.i64 + %6513 = llvm.getelementptr %6502[%6512] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6514 = llvm.load %6513 : !llvm.ptr> + %6515 = llvm.fadd %6491, %6514 : !llvm.vec<8 x float> + %6516 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6518 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6519 = llvm.mul %67, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6522 = llvm.mul %63, %6521 : !llvm.i64 + %6523 = llvm.add %6520, %6522 : !llvm.i64 + %6524 = llvm.getelementptr %6516[%6523] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6515, %6524 : !llvm.ptr> + %6525 = llvm.add %6315, %41 : !llvm.i64 + %6526 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6529 = llvm.mul %2345, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.mlir.constant(1 : index) : !llvm.i64 + 
%6532 = llvm.mul %6525, %6531 : !llvm.i64 + %6533 = llvm.add %6530, %6532 : !llvm.i64 + %6534 = llvm.getelementptr %6526[%6533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6535 = llvm.bitcast %6534 : !llvm.ptr to !llvm.ptr> + %6536 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6537 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6538 = llvm.trunc %6525 : !llvm.i64 to !llvm.i32 + %6539 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6540 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6541 = llvm.insertelement %6538, %6539[%6540 : !llvm.i32] : !llvm.vec<8 x i32> + %6542 = llvm.shufflevector %6541, %6539 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6543 = llvm.add %6542, %6537 : !llvm.vec<8 x i32> + %6544 = llvm.trunc %6536 : !llvm.i64 to !llvm.i32 + %6545 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6546 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6547 = llvm.insertelement %6544, %6545[%6546 : !llvm.i32] : !llvm.vec<8 x i32> + %6548 = llvm.shufflevector %6547, %6545 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6549 = llvm.icmp "slt" %6543, %6548 : !llvm.vec<8 x i32> + %6550 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6551 = llvm.intr.masked.load %6535, %6549, %6550 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6552 = llvm.add %5447, %41 : !llvm.i64 + %6553 = llvm.icmp "slt" %6552, %67 : !llvm.i64 + %6554 = llvm.sub %64, %6552 : !llvm.i64 + %6555 = llvm.select %6553, %6554, %6552 : !llvm.i1, !llvm.i64 + %6556 = llvm.sdiv %6555, %68 : !llvm.i64 + %6557 = llvm.sub %64, %6556 : !llvm.i64 + %6558 = llvm.select %6553, %6557, %6556 : !llvm.i1, !llvm.i64 + %6559 = llvm.srem %6558, %68 : !llvm.i64 + %6560 = llvm.icmp "slt" %6559, %67 : !llvm.i64 + %6561 = llvm.add %6559, %68 : !llvm.i64 + %6562 = llvm.select %6560, %6561, %6559 : !llvm.i1, !llvm.i64 + %6563 = llvm.mul %6558, %65 : !llvm.i64 + %6564 = llvm.add %6429, %6563 : !llvm.i64 + %6565 = llvm.add %6564, %45 : !llvm.i64 + %6566 = llvm.icmp "slt" %6565, %67 : !llvm.i64 + %6567 = llvm.sub %64, %6565 : !llvm.i64 + %6568 = llvm.select %6566, %6567, %6565 : !llvm.i1, !llvm.i64 + %6569 = llvm.sdiv %6568, %63 : !llvm.i64 + %6570 = llvm.sub %64, %6569 : !llvm.i64 + %6571 = llvm.select %6566, %6570, %6569 : !llvm.i1, !llvm.i64 + %6572 = llvm.mul %6571, %65 : !llvm.i64 + %6573 = llvm.add %6564, %6572 : !llvm.i64 + %6574 = llvm.add %6573, %45 : !llvm.i64 + %6575 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6576 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6577 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6578 = llvm.mul %6562, %6577 : !llvm.i64 + %6579 = llvm.add %6576, %6578 : !llvm.i64 + %6580 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6581 = llvm.mul %67, %6580 : !llvm.i64 + %6582 = llvm.add %6579, %6581 : !llvm.i64 + %6583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6584 = llvm.mul %6574, %6583 : !llvm.i64 + %6585 = llvm.add %6582, %6584 : !llvm.i64 + %6586 = llvm.getelementptr %6575[%6585] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6587 = llvm.load %6586 : !llvm.ptr> + %6588 = llvm.fadd %6551, %6587 : !llvm.vec<8 x float> + %6589 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6591 = llvm.mlir.constant(16 : index) : 
!llvm.i64 + %6592 = llvm.mul %67, %6591 : !llvm.i64 + %6593 = llvm.add %6590, %6592 : !llvm.i64 + %6594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6595 = llvm.mul %45, %6594 : !llvm.i64 + %6596 = llvm.add %6593, %6595 : !llvm.i64 + %6597 = llvm.getelementptr %6589[%6596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6588, %6597 : !llvm.ptr> + %6598 = llvm.add %6315, %42 : !llvm.i64 + %6599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6602 = llvm.mul %2345, %6601 : !llvm.i64 + %6603 = llvm.add %6600, %6602 : !llvm.i64 + %6604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6605 = llvm.mul %6598, %6604 : !llvm.i64 + %6606 = llvm.add %6603, %6605 : !llvm.i64 + %6607 = llvm.getelementptr %6599[%6606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6608 = llvm.bitcast %6607 : !llvm.ptr to !llvm.ptr> + %6609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6610 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6611 = llvm.trunc %6598 : !llvm.i64 to !llvm.i32 + %6612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6614 = llvm.insertelement %6611, %6612[%6613 : !llvm.i32] : !llvm.vec<8 x i32> + %6615 = llvm.shufflevector %6614, %6612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6616 = llvm.add %6615, %6610 : !llvm.vec<8 x i32> + %6617 = llvm.trunc %6609 : !llvm.i64 to !llvm.i32 + %6618 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6620 = llvm.insertelement %6617, %6618[%6619 : !llvm.i32] : !llvm.vec<8 x i32> + %6621 = llvm.shufflevector %6620, %6618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6622 = llvm.icmp "slt" %6616, %6621 : !llvm.vec<8 x i32> + %6623 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6624 = llvm.intr.masked.load %6608, %6622, %6623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6625 = llvm.add %6347, %63 : !llvm.i64 + %6626 = llvm.icmp "slt" %6625, %67 : !llvm.i64 + %6627 = llvm.sub %64, %6625 : !llvm.i64 + %6628 = llvm.select %6626, %6627, %6625 : !llvm.i1, !llvm.i64 + %6629 = llvm.sdiv %6628, %68 : !llvm.i64 + %6630 = llvm.sub %64, %6629 : !llvm.i64 + %6631 = llvm.select %6626, %6630, %6629 : !llvm.i1, !llvm.i64 + %6632 = llvm.mul %6631, %60 : !llvm.i64 + %6633 = llvm.add %6347, %6632 : !llvm.i64 + %6634 = llvm.add %6633, %63 : !llvm.i64 + %6635 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6638 = llvm.mul %6634, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6641 = llvm.mul %67, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6644 = llvm.mul %6365, %6643 : !llvm.i64 + %6645 = llvm.add %6642, %6644 : !llvm.i64 + %6646 = llvm.getelementptr %6635[%6645] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6647 = llvm.load %6646 : !llvm.ptr> + %6648 = llvm.fadd %6624, %6647 : !llvm.vec<8 x float> + %6649 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x 
i64>, array<2 x i64>)> + %6650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6651 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6652 = llvm.mul %67, %6651 : !llvm.i64 + %6653 = llvm.add %6650, %6652 : !llvm.i64 + %6654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6655 = llvm.mul %48, %6654 : !llvm.i64 + %6656 = llvm.add %6653, %6655 : !llvm.i64 + %6657 = llvm.getelementptr %6649[%6656] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6648, %6657 : !llvm.ptr> + %6658 = llvm.add %6315, %43 : !llvm.i64 + %6659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6662 = llvm.mul %2345, %6661 : !llvm.i64 + %6663 = llvm.add %6660, %6662 : !llvm.i64 + %6664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6665 = llvm.mul %6658, %6664 : !llvm.i64 + %6666 = llvm.add %6663, %6665 : !llvm.i64 + %6667 = llvm.getelementptr %6659[%6666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6668 = llvm.bitcast %6667 : !llvm.ptr to !llvm.ptr> + %6669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6670 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6671 = llvm.trunc %6658 : !llvm.i64 to !llvm.i32 + %6672 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6673 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6674 = llvm.insertelement %6671, %6672[%6673 : !llvm.i32] : !llvm.vec<8 x i32> + %6675 = llvm.shufflevector %6674, %6672 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6676 = llvm.add %6675, %6670 : !llvm.vec<8 x i32> + %6677 = llvm.trunc %6669 : !llvm.i64 to !llvm.i32 + %6678 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6679 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6680 = llvm.insertelement %6677, %6678[%6679 : !llvm.i32] : !llvm.vec<8 x i32> + %6681 = llvm.shufflevector %6680, %6678 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6682 = llvm.icmp "slt" %6676, %6681 : !llvm.vec<8 x i32> + %6683 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6684 = llvm.intr.masked.load %6668, %6682, %6683 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6685 = llvm.add %5447, %43 : !llvm.i64 + %6686 = llvm.icmp "slt" %6685, %67 : !llvm.i64 + %6687 = llvm.sub %64, %6685 : !llvm.i64 + %6688 = llvm.select %6686, %6687, %6685 : !llvm.i1, !llvm.i64 + %6689 = llvm.sdiv %6688, %68 : !llvm.i64 + %6690 = llvm.sub %64, %6689 : !llvm.i64 + %6691 = llvm.select %6686, %6690, %6689 : !llvm.i1, !llvm.i64 + %6692 = llvm.srem %6691, %68 : !llvm.i64 + %6693 = llvm.icmp "slt" %6692, %67 : !llvm.i64 + %6694 = llvm.add %6692, %68 : !llvm.i64 + %6695 = llvm.select %6693, %6694, %6692 : !llvm.i1, !llvm.i64 + %6696 = llvm.mul %6691, %65 : !llvm.i64 + %6697 = llvm.add %6429, %6696 : !llvm.i64 + %6698 = llvm.add %6697, %52 : !llvm.i64 + %6699 = llvm.icmp "slt" %6698, %67 : !llvm.i64 + %6700 = llvm.sub %64, %6698 : !llvm.i64 + %6701 = llvm.select %6699, %6700, %6698 : !llvm.i1, !llvm.i64 + %6702 = llvm.sdiv %6701, %63 : !llvm.i64 + %6703 = llvm.sub %64, %6702 : !llvm.i64 + %6704 = llvm.select %6699, %6703, %6702 : !llvm.i1, !llvm.i64 + %6705 = llvm.mul %6704, %65 : !llvm.i64 + %6706 = llvm.add %6697, %6705 : !llvm.i64 + %6707 = llvm.add %6706, %52 : !llvm.i64 + %6708 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, 
array<3 x i64>)> + %6709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6710 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6711 = llvm.mul %6695, %6710 : !llvm.i64 + %6712 = llvm.add %6709, %6711 : !llvm.i64 + %6713 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6714 = llvm.mul %67, %6713 : !llvm.i64 + %6715 = llvm.add %6712, %6714 : !llvm.i64 + %6716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6717 = llvm.mul %6707, %6716 : !llvm.i64 + %6718 = llvm.add %6715, %6717 : !llvm.i64 + %6719 = llvm.getelementptr %6708[%6718] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6720 = llvm.load %6719 : !llvm.ptr> + %6721 = llvm.fadd %6684, %6720 : !llvm.vec<8 x float> + %6722 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6724 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6725 = llvm.mul %67, %6724 : !llvm.i64 + %6726 = llvm.add %6723, %6725 : !llvm.i64 + %6727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6728 = llvm.mul %52, %6727 : !llvm.i64 + %6729 = llvm.add %6726, %6728 : !llvm.i64 + %6730 = llvm.getelementptr %6722[%6729] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6721, %6730 : !llvm.ptr> + %6731 = llvm.add %6315, %44 : !llvm.i64 + %6732 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6734 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6735 = llvm.mul %2345, %6734 : !llvm.i64 + %6736 = llvm.add %6733, %6735 : !llvm.i64 + %6737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6738 = llvm.mul %6731, %6737 : !llvm.i64 + %6739 = llvm.add %6736, %6738 : !llvm.i64 + %6740 = llvm.getelementptr %6732[%6739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6741 = llvm.bitcast %6740 : !llvm.ptr to !llvm.ptr> + %6742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6743 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6744 = llvm.trunc %6731 : !llvm.i64 to !llvm.i32 + %6745 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6746 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6747 = llvm.insertelement %6744, %6745[%6746 : !llvm.i32] : !llvm.vec<8 x i32> + %6748 = llvm.shufflevector %6747, %6745 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6749 = llvm.add %6748, %6743 : !llvm.vec<8 x i32> + %6750 = llvm.trunc %6742 : !llvm.i64 to !llvm.i32 + %6751 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6752 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6753 = llvm.insertelement %6750, %6751[%6752 : !llvm.i32] : !llvm.vec<8 x i32> + %6754 = llvm.shufflevector %6753, %6751 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6755 = llvm.icmp "slt" %6749, %6754 : !llvm.vec<8 x i32> + %6756 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6757 = llvm.intr.masked.load %6741, %6755, %6756 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6758 = llvm.add %6347, %45 : !llvm.i64 + %6759 = llvm.icmp "slt" %6758, %67 : !llvm.i64 + %6760 = llvm.sub %64, %6758 : !llvm.i64 + %6761 = llvm.select %6759, %6760, %6758 : !llvm.i1, !llvm.i64 + %6762 = llvm.sdiv %6761, %68 : !llvm.i64 + %6763 = llvm.sub %64, %6762 : !llvm.i64 + %6764 = llvm.select %6759, %6763, %6762 : !llvm.i1, !llvm.i64 + %6765 = llvm.mul %6764, %60 : !llvm.i64 + %6766 = llvm.add %6347, %6765 : !llvm.i64 + %6767 = 
llvm.add %6766, %45 : !llvm.i64 + %6768 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6770 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6771 = llvm.mul %6767, %6770 : !llvm.i64 + %6772 = llvm.add %6769, %6771 : !llvm.i64 + %6773 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6774 = llvm.mul %67, %6773 : !llvm.i64 + %6775 = llvm.add %6772, %6774 : !llvm.i64 + %6776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6777 = llvm.mul %6365, %6776 : !llvm.i64 + %6778 = llvm.add %6775, %6777 : !llvm.i64 + %6779 = llvm.getelementptr %6768[%6778] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6780 = llvm.load %6779 : !llvm.ptr> + %6781 = llvm.fadd %6757, %6780 : !llvm.vec<8 x float> + %6782 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6784 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6785 = llvm.mul %67, %6784 : !llvm.i64 + %6786 = llvm.add %6783, %6785 : !llvm.i64 + %6787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6788 = llvm.mul %56, %6787 : !llvm.i64 + %6789 = llvm.add %6786, %6788 : !llvm.i64 + %6790 = llvm.getelementptr %6782[%6789] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6781, %6790 : !llvm.ptr> + %6791 = llvm.add %6315, %46 : !llvm.i64 + %6792 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6794 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6795 = llvm.mul %2345, %6794 : !llvm.i64 + %6796 = llvm.add %6793, %6795 : !llvm.i64 + %6797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6798 = llvm.mul %6791, %6797 : !llvm.i64 + %6799 = llvm.add %6796, %6798 : !llvm.i64 + %6800 = llvm.getelementptr %6792[%6799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6801 = llvm.bitcast %6800 : !llvm.ptr to !llvm.ptr> + %6802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6803 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6804 = llvm.trunc %6791 : !llvm.i64 to !llvm.i32 + %6805 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6806 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6807 = llvm.insertelement %6804, %6805[%6806 : !llvm.i32] : !llvm.vec<8 x i32> + %6808 = llvm.shufflevector %6807, %6805 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6809 = llvm.add %6808, %6803 : !llvm.vec<8 x i32> + %6810 = llvm.trunc %6802 : !llvm.i64 to !llvm.i32 + %6811 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6812 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6813 = llvm.insertelement %6810, %6811[%6812 : !llvm.i32] : !llvm.vec<8 x i32> + %6814 = llvm.shufflevector %6813, %6811 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6815 = llvm.icmp "slt" %6809, %6814 : !llvm.vec<8 x i32> + %6816 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6817 = llvm.intr.masked.load %6801, %6815, %6816 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6818 = llvm.add %5447, %46 : !llvm.i64 + %6819 = llvm.icmp "slt" %6818, %67 : !llvm.i64 + %6820 = llvm.sub %64, %6818 : !llvm.i64 + %6821 = llvm.select %6819, %6820, %6818 : !llvm.i1, !llvm.i64 + %6822 = llvm.sdiv %6821, %68 : !llvm.i64 + %6823 = llvm.sub %64, %6822 : !llvm.i64 + %6824 = llvm.select %6819, %6823, %6822 : 
!llvm.i1, !llvm.i64 + %6825 = llvm.srem %6824, %68 : !llvm.i64 + %6826 = llvm.icmp "slt" %6825, %67 : !llvm.i64 + %6827 = llvm.add %6825, %68 : !llvm.i64 + %6828 = llvm.select %6826, %6827, %6825 : !llvm.i1, !llvm.i64 + %6829 = llvm.mul %6824, %65 : !llvm.i64 + %6830 = llvm.add %6429, %6829 : !llvm.i64 + %6831 = llvm.add %6830, %61 : !llvm.i64 + %6832 = llvm.icmp "slt" %6831, %67 : !llvm.i64 + %6833 = llvm.sub %64, %6831 : !llvm.i64 + %6834 = llvm.select %6832, %6833, %6831 : !llvm.i1, !llvm.i64 + %6835 = llvm.sdiv %6834, %63 : !llvm.i64 + %6836 = llvm.sub %64, %6835 : !llvm.i64 + %6837 = llvm.select %6832, %6836, %6835 : !llvm.i1, !llvm.i64 + %6838 = llvm.mul %6837, %65 : !llvm.i64 + %6839 = llvm.add %6830, %6838 : !llvm.i64 + %6840 = llvm.add %6839, %61 : !llvm.i64 + %6841 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6842 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6843 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6844 = llvm.mul %6828, %6843 : !llvm.i64 + %6845 = llvm.add %6842, %6844 : !llvm.i64 + %6846 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6847 = llvm.mul %67, %6846 : !llvm.i64 + %6848 = llvm.add %6845, %6847 : !llvm.i64 + %6849 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6850 = llvm.mul %6840, %6849 : !llvm.i64 + %6851 = llvm.add %6848, %6850 : !llvm.i64 + %6852 = llvm.getelementptr %6841[%6851] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6853 = llvm.load %6852 : !llvm.ptr> + %6854 = llvm.fadd %6817, %6853 : !llvm.vec<8 x float> + %6855 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6857 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6858 = llvm.mul %67, %6857 : !llvm.i64 + %6859 = llvm.add %6856, %6858 : !llvm.i64 + %6860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6861 = llvm.mul %61, %6860 : !llvm.i64 + %6862 = llvm.add %6859, %6861 : !llvm.i64 + %6863 = llvm.getelementptr %6855[%6862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6854, %6863 : !llvm.ptr> + %6864 = llvm.add %6315, %47 : !llvm.i64 + %6865 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6866 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6867 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6868 = llvm.mul %2345, %6867 : !llvm.i64 + %6869 = llvm.add %6866, %6868 : !llvm.i64 + %6870 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6871 = llvm.mul %6864, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.getelementptr %6865[%6872] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6874 = llvm.bitcast %6873 : !llvm.ptr to !llvm.ptr> + %6875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6876 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6877 = llvm.trunc %6864 : !llvm.i64 to !llvm.i32 + %6878 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6879 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6880 = llvm.insertelement %6877, %6878[%6879 : !llvm.i32] : !llvm.vec<8 x i32> + %6881 = llvm.shufflevector %6880, %6878 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6882 = llvm.add %6881, %6876 : !llvm.vec<8 x i32> + %6883 = llvm.trunc %6875 : !llvm.i64 to !llvm.i32 + %6884 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6885 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6886 = llvm.insertelement %6883, %6884[%6885 : !llvm.i32] : !llvm.vec<8 x i32> + %6887 = llvm.shufflevector %6886, 
%6884 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6888 = llvm.icmp "slt" %6882, %6887 : !llvm.vec<8 x i32> + %6889 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6890 = llvm.intr.masked.load %6874, %6888, %6889 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6891 = llvm.add %6347, %48 : !llvm.i64 + %6892 = llvm.icmp "slt" %6891, %67 : !llvm.i64 + %6893 = llvm.sub %64, %6891 : !llvm.i64 + %6894 = llvm.select %6892, %6893, %6891 : !llvm.i1, !llvm.i64 + %6895 = llvm.sdiv %6894, %68 : !llvm.i64 + %6896 = llvm.sub %64, %6895 : !llvm.i64 + %6897 = llvm.select %6892, %6896, %6895 : !llvm.i1, !llvm.i64 + %6898 = llvm.mul %6897, %60 : !llvm.i64 + %6899 = llvm.add %6347, %6898 : !llvm.i64 + %6900 = llvm.add %6899, %48 : !llvm.i64 + %6901 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6903 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6904 = llvm.mul %6900, %6903 : !llvm.i64 + %6905 = llvm.add %6902, %6904 : !llvm.i64 + %6906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6907 = llvm.mul %67, %6906 : !llvm.i64 + %6908 = llvm.add %6905, %6907 : !llvm.i64 + %6909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6910 = llvm.mul %6365, %6909 : !llvm.i64 + %6911 = llvm.add %6908, %6910 : !llvm.i64 + %6912 = llvm.getelementptr %6901[%6911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6913 = llvm.load %6912 : !llvm.ptr> + %6914 = llvm.fadd %6890, %6913 : !llvm.vec<8 x float> + %6915 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6916 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6917 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6918 = llvm.mul %67, %6917 : !llvm.i64 + %6919 = llvm.add %6916, %6918 : !llvm.i64 + %6920 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6921 = llvm.mul %70, %6920 : !llvm.i64 + %6922 = llvm.add %6919, %6921 : !llvm.i64 + %6923 = llvm.getelementptr %6915[%6922] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6914, %6923 : !llvm.ptr> + %6924 = llvm.add %6315, %49 : !llvm.i64 + %6925 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6926 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6927 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6928 = llvm.mul %2345, %6927 : !llvm.i64 + %6929 = llvm.add %6926, %6928 : !llvm.i64 + %6930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6931 = llvm.mul %6924, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.getelementptr %6925[%6932] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6934 = llvm.bitcast %6933 : !llvm.ptr to !llvm.ptr> + %6935 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6936 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6937 = llvm.trunc %6924 : !llvm.i64 to !llvm.i32 + %6938 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6939 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6940 = llvm.insertelement %6937, %6938[%6939 : !llvm.i32] : !llvm.vec<8 x i32> + %6941 = llvm.shufflevector %6940, %6938 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6942 = llvm.add %6941, %6936 : !llvm.vec<8 x i32> + %6943 = llvm.trunc %6935 : !llvm.i64 to !llvm.i32 + %6944 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6945 = llvm.mlir.constant(0 : i32) : !llvm.i32 + 
%6946 = llvm.insertelement %6943, %6944[%6945 : !llvm.i32] : !llvm.vec<8 x i32> + %6947 = llvm.shufflevector %6946, %6944 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6948 = llvm.icmp "slt" %6942, %6947 : !llvm.vec<8 x i32> + %6949 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6950 = llvm.intr.masked.load %6934, %6948, %6949 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6951 = llvm.add %5447, %49 : !llvm.i64 + %6952 = llvm.icmp "slt" %6951, %67 : !llvm.i64 + %6953 = llvm.sub %64, %6951 : !llvm.i64 + %6954 = llvm.select %6952, %6953, %6951 : !llvm.i1, !llvm.i64 + %6955 = llvm.sdiv %6954, %68 : !llvm.i64 + %6956 = llvm.sub %64, %6955 : !llvm.i64 + %6957 = llvm.select %6952, %6956, %6955 : !llvm.i1, !llvm.i64 + %6958 = llvm.srem %6957, %68 : !llvm.i64 + %6959 = llvm.icmp "slt" %6958, %67 : !llvm.i64 + %6960 = llvm.add %6958, %68 : !llvm.i64 + %6961 = llvm.select %6959, %6960, %6958 : !llvm.i1, !llvm.i64 + %6962 = llvm.mul %6957, %65 : !llvm.i64 + %6963 = llvm.add %6429, %6962 : !llvm.i64 + %6964 = llvm.add %6963, %50 : !llvm.i64 + %6965 = llvm.icmp "slt" %6964, %67 : !llvm.i64 + %6966 = llvm.sub %64, %6964 : !llvm.i64 + %6967 = llvm.select %6965, %6966, %6964 : !llvm.i1, !llvm.i64 + %6968 = llvm.sdiv %6967, %63 : !llvm.i64 + %6969 = llvm.sub %64, %6968 : !llvm.i64 + %6970 = llvm.select %6965, %6969, %6968 : !llvm.i1, !llvm.i64 + %6971 = llvm.mul %6970, %65 : !llvm.i64 + %6972 = llvm.add %6963, %6971 : !llvm.i64 + %6973 = llvm.add %6972, %50 : !llvm.i64 + %6974 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6976 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6977 = llvm.mul %6961, %6976 : !llvm.i64 + %6978 = llvm.add %6975, %6977 : !llvm.i64 + %6979 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6980 = llvm.mul %67, %6979 : !llvm.i64 + %6981 = llvm.add %6978, %6980 : !llvm.i64 + %6982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6983 = llvm.mul %6973, %6982 : !llvm.i64 + %6984 = llvm.add %6981, %6983 : !llvm.i64 + %6985 = llvm.getelementptr %6974[%6984] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6986 = llvm.load %6985 : !llvm.ptr> + %6987 = llvm.fadd %6950, %6986 : !llvm.vec<8 x float> + %6988 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6991 = llvm.mul %67, %6990 : !llvm.i64 + %6992 = llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %50, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6987, %6996 : !llvm.ptr> + %6997 = llvm.add %6315, %51 : !llvm.i64 + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %2345, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %6997, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7007 = llvm.bitcast %7006 : !llvm.ptr to !llvm.ptr> 
+ %7008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7010 = llvm.trunc %6997 : !llvm.i64 to !llvm.i32 + %7011 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7012 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7013 = llvm.insertelement %7010, %7011[%7012 : !llvm.i32] : !llvm.vec<8 x i32> + %7014 = llvm.shufflevector %7013, %7011 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7015 = llvm.add %7014, %7009 : !llvm.vec<8 x i32> + %7016 = llvm.trunc %7008 : !llvm.i64 to !llvm.i32 + %7017 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7018 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7019 = llvm.insertelement %7016, %7017[%7018 : !llvm.i32] : !llvm.vec<8 x i32> + %7020 = llvm.shufflevector %7019, %7017 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7021 = llvm.icmp "slt" %7015, %7020 : !llvm.vec<8 x i32> + %7022 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7023 = llvm.intr.masked.load %7007, %7021, %7022 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7024 = llvm.add %6347, %52 : !llvm.i64 + %7025 = llvm.icmp "slt" %7024, %67 : !llvm.i64 + %7026 = llvm.sub %64, %7024 : !llvm.i64 + %7027 = llvm.select %7025, %7026, %7024 : !llvm.i1, !llvm.i64 + %7028 = llvm.sdiv %7027, %68 : !llvm.i64 + %7029 = llvm.sub %64, %7028 : !llvm.i64 + %7030 = llvm.select %7025, %7029, %7028 : !llvm.i1, !llvm.i64 + %7031 = llvm.mul %7030, %60 : !llvm.i64 + %7032 = llvm.add %6347, %7031 : !llvm.i64 + %7033 = llvm.add %7032, %52 : !llvm.i64 + %7034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7037 = llvm.mul %7033, %7036 : !llvm.i64 + %7038 = llvm.add %7035, %7037 : !llvm.i64 + %7039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7040 = llvm.mul %67, %7039 : !llvm.i64 + %7041 = llvm.add %7038, %7040 : !llvm.i64 + %7042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7043 = llvm.mul %6365, %7042 : !llvm.i64 + %7044 = llvm.add %7041, %7043 : !llvm.i64 + %7045 = llvm.getelementptr %7034[%7044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7046 = llvm.load %7045 : !llvm.ptr> + %7047 = llvm.fadd %7023, %7046 : !llvm.vec<8 x float> + %7048 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7051 = llvm.mul %67, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %33, %7053 : !llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7047, %7056 : !llvm.ptr> + %7057 = llvm.add %6315, %53 : !llvm.i64 + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %2345, %7060 : !llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %7057, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = 
llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7067 = llvm.bitcast %7066 : !llvm.ptr to !llvm.ptr> + %7068 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7070 = llvm.trunc %7057 : !llvm.i64 to !llvm.i32 + %7071 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7072 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7073 = llvm.insertelement %7070, %7071[%7072 : !llvm.i32] : !llvm.vec<8 x i32> + %7074 = llvm.shufflevector %7073, %7071 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7075 = llvm.add %7074, %7069 : !llvm.vec<8 x i32> + %7076 = llvm.trunc %7068 : !llvm.i64 to !llvm.i32 + %7077 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7078 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7079 = llvm.insertelement %7076, %7077[%7078 : !llvm.i32] : !llvm.vec<8 x i32> + %7080 = llvm.shufflevector %7079, %7077 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7081 = llvm.icmp "slt" %7075, %7080 : !llvm.vec<8 x i32> + %7082 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7083 = llvm.intr.masked.load %7067, %7081, %7082 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7084 = llvm.add %5447, %53 : !llvm.i64 + %7085 = llvm.icmp "slt" %7084, %67 : !llvm.i64 + %7086 = llvm.sub %64, %7084 : !llvm.i64 + %7087 = llvm.select %7085, %7086, %7084 : !llvm.i1, !llvm.i64 + %7088 = llvm.sdiv %7087, %68 : !llvm.i64 + %7089 = llvm.sub %64, %7088 : !llvm.i64 + %7090 = llvm.select %7085, %7089, %7088 : !llvm.i1, !llvm.i64 + %7091 = llvm.srem %7090, %68 : !llvm.i64 + %7092 = llvm.icmp "slt" %7091, %67 : !llvm.i64 + %7093 = llvm.add %7091, %68 : !llvm.i64 + %7094 = llvm.select %7092, %7093, %7091 : !llvm.i1, !llvm.i64 + %7095 = llvm.mul %7090, %65 : !llvm.i64 + %7096 = llvm.add %6429, %7095 : !llvm.i64 + %7097 = llvm.add %7096, %54 : !llvm.i64 + %7098 = llvm.icmp "slt" %7097, %67 : !llvm.i64 + %7099 = llvm.sub %64, %7097 : !llvm.i64 + %7100 = llvm.select %7098, %7099, %7097 : !llvm.i1, !llvm.i64 + %7101 = llvm.sdiv %7100, %63 : !llvm.i64 + %7102 = llvm.sub %64, %7101 : !llvm.i64 + %7103 = llvm.select %7098, %7102, %7101 : !llvm.i1, !llvm.i64 + %7104 = llvm.mul %7103, %65 : !llvm.i64 + %7105 = llvm.add %7096, %7104 : !llvm.i64 + %7106 = llvm.add %7105, %54 : !llvm.i64 + %7107 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7109 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7110 = llvm.mul %7094, %7109 : !llvm.i64 + %7111 = llvm.add %7108, %7110 : !llvm.i64 + %7112 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7113 = llvm.mul %67, %7112 : !llvm.i64 + %7114 = llvm.add %7111, %7113 : !llvm.i64 + %7115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7116 = llvm.mul %7106, %7115 : !llvm.i64 + %7117 = llvm.add %7114, %7116 : !llvm.i64 + %7118 = llvm.getelementptr %7107[%7117] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7119 = llvm.load %7118 : !llvm.ptr> + %7120 = llvm.fadd %7083, %7119 : !llvm.vec<8 x float> + %7121 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7123 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7124 = llvm.mul %67, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 
+ %7126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7127 = llvm.mul %54, %7126 : !llvm.i64 + %7128 = llvm.add %7125, %7127 : !llvm.i64 + %7129 = llvm.getelementptr %7121[%7128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7120, %7129 : !llvm.ptr> + %7130 = llvm.add %6315, %55 : !llvm.i64 + %7131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7134 = llvm.mul %2345, %7133 : !llvm.i64 + %7135 = llvm.add %7132, %7134 : !llvm.i64 + %7136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7137 = llvm.mul %7130, %7136 : !llvm.i64 + %7138 = llvm.add %7135, %7137 : !llvm.i64 + %7139 = llvm.getelementptr %7131[%7138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7140 = llvm.bitcast %7139 : !llvm.ptr to !llvm.ptr> + %7141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7143 = llvm.trunc %7130 : !llvm.i64 to !llvm.i32 + %7144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7145 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7146 = llvm.insertelement %7143, %7144[%7145 : !llvm.i32] : !llvm.vec<8 x i32> + %7147 = llvm.shufflevector %7146, %7144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7148 = llvm.add %7147, %7142 : !llvm.vec<8 x i32> + %7149 = llvm.trunc %7141 : !llvm.i64 to !llvm.i32 + %7150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7152 = llvm.insertelement %7149, %7150[%7151 : !llvm.i32] : !llvm.vec<8 x i32> + %7153 = llvm.shufflevector %7152, %7150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7154 = llvm.icmp "slt" %7148, %7153 : !llvm.vec<8 x i32> + %7155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7156 = llvm.intr.masked.load %7140, %7154, %7155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7157 = llvm.add %6347, %56 : !llvm.i64 + %7158 = llvm.icmp "slt" %7157, %67 : !llvm.i64 + %7159 = llvm.sub %64, %7157 : !llvm.i64 + %7160 = llvm.select %7158, %7159, %7157 : !llvm.i1, !llvm.i64 + %7161 = llvm.sdiv %7160, %68 : !llvm.i64 + %7162 = llvm.sub %64, %7161 : !llvm.i64 + %7163 = llvm.select %7158, %7162, %7161 : !llvm.i1, !llvm.i64 + %7164 = llvm.mul %7163, %60 : !llvm.i64 + %7165 = llvm.add %6347, %7164 : !llvm.i64 + %7166 = llvm.add %7165, %56 : !llvm.i64 + %7167 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7169 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7170 = llvm.mul %7166, %7169 : !llvm.i64 + %7171 = llvm.add %7168, %7170 : !llvm.i64 + %7172 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7173 = llvm.mul %67, %7172 : !llvm.i64 + %7174 = llvm.add %7171, %7173 : !llvm.i64 + %7175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7176 = llvm.mul %6365, %7175 : !llvm.i64 + %7177 = llvm.add %7174, %7176 : !llvm.i64 + %7178 = llvm.getelementptr %7167[%7177] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7179 = llvm.load %7178 : !llvm.ptr> + %7180 = llvm.fadd %7156, %7179 : !llvm.vec<8 x float> + %7181 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7182 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7183 = 
llvm.mlir.constant(16 : index) : !llvm.i64 + %7184 = llvm.mul %67, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7187 = llvm.mul %34, %7186 : !llvm.i64 + %7188 = llvm.add %7185, %7187 : !llvm.i64 + %7189 = llvm.getelementptr %7181[%7188] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7180, %7189 : !llvm.ptr> + %7190 = llvm.add %6315, %57 : !llvm.i64 + %7191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7194 = llvm.mul %2345, %7193 : !llvm.i64 + %7195 = llvm.add %7192, %7194 : !llvm.i64 + %7196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7197 = llvm.mul %7190, %7196 : !llvm.i64 + %7198 = llvm.add %7195, %7197 : !llvm.i64 + %7199 = llvm.getelementptr %7191[%7198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7200 = llvm.bitcast %7199 : !llvm.ptr to !llvm.ptr> + %7201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7202 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7203 = llvm.trunc %7190 : !llvm.i64 to !llvm.i32 + %7204 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7205 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7206 = llvm.insertelement %7203, %7204[%7205 : !llvm.i32] : !llvm.vec<8 x i32> + %7207 = llvm.shufflevector %7206, %7204 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7208 = llvm.add %7207, %7202 : !llvm.vec<8 x i32> + %7209 = llvm.trunc %7201 : !llvm.i64 to !llvm.i32 + %7210 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7211 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7212 = llvm.insertelement %7209, %7210[%7211 : !llvm.i32] : !llvm.vec<8 x i32> + %7213 = llvm.shufflevector %7212, %7210 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7214 = llvm.icmp "slt" %7208, %7213 : !llvm.vec<8 x i32> + %7215 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7216 = llvm.intr.masked.load %7200, %7214, %7215 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7217 = llvm.add %5447, %57 : !llvm.i64 + %7218 = llvm.icmp "slt" %7217, %67 : !llvm.i64 + %7219 = llvm.sub %64, %7217 : !llvm.i64 + %7220 = llvm.select %7218, %7219, %7217 : !llvm.i1, !llvm.i64 + %7221 = llvm.sdiv %7220, %68 : !llvm.i64 + %7222 = llvm.sub %64, %7221 : !llvm.i64 + %7223 = llvm.select %7218, %7222, %7221 : !llvm.i1, !llvm.i64 + %7224 = llvm.srem %7223, %68 : !llvm.i64 + %7225 = llvm.icmp "slt" %7224, %67 : !llvm.i64 + %7226 = llvm.add %7224, %68 : !llvm.i64 + %7227 = llvm.select %7225, %7226, %7224 : !llvm.i1, !llvm.i64 + %7228 = llvm.mul %7223, %65 : !llvm.i64 + %7229 = llvm.add %6429, %7228 : !llvm.i64 + %7230 = llvm.add %7229, %58 : !llvm.i64 + %7231 = llvm.icmp "slt" %7230, %67 : !llvm.i64 + %7232 = llvm.sub %64, %7230 : !llvm.i64 + %7233 = llvm.select %7231, %7232, %7230 : !llvm.i1, !llvm.i64 + %7234 = llvm.sdiv %7233, %63 : !llvm.i64 + %7235 = llvm.sub %64, %7234 : !llvm.i64 + %7236 = llvm.select %7231, %7235, %7234 : !llvm.i1, !llvm.i64 + %7237 = llvm.mul %7236, %65 : !llvm.i64 + %7238 = llvm.add %7229, %7237 : !llvm.i64 + %7239 = llvm.add %7238, %58 : !llvm.i64 + %7240 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7242 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %7243 = llvm.mul %7227, %7242 : !llvm.i64 + %7244 = llvm.add %7241, %7243 : !llvm.i64 + %7245 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7246 = llvm.mul %67, %7245 : !llvm.i64 + %7247 = llvm.add %7244, %7246 : !llvm.i64 + %7248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7249 = llvm.mul %7239, %7248 : !llvm.i64 + %7250 = llvm.add %7247, %7249 : !llvm.i64 + %7251 = llvm.getelementptr %7240[%7250] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7252 = llvm.load %7251 : !llvm.ptr> + %7253 = llvm.fadd %7216, %7252 : !llvm.vec<8 x float> + %7254 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7256 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7257 = llvm.mul %67, %7256 : !llvm.i64 + %7258 = llvm.add %7255, %7257 : !llvm.i64 + %7259 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7260 = llvm.mul %58, %7259 : !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.getelementptr %7254[%7261] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7253, %7262 : !llvm.ptr> + %7263 = llvm.add %6315, %59 : !llvm.i64 + %7264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7267 = llvm.mul %2345, %7266 : !llvm.i64 + %7268 = llvm.add %7265, %7267 : !llvm.i64 + %7269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7270 = llvm.mul %7263, %7269 : !llvm.i64 + %7271 = llvm.add %7268, %7270 : !llvm.i64 + %7272 = llvm.getelementptr %7264[%7271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7273 = llvm.bitcast %7272 : !llvm.ptr to !llvm.ptr> + %7274 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7275 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7276 = llvm.trunc %7263 : !llvm.i64 to !llvm.i32 + %7277 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7278 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7279 = llvm.insertelement %7276, %7277[%7278 : !llvm.i32] : !llvm.vec<8 x i32> + %7280 = llvm.shufflevector %7279, %7277 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7281 = llvm.add %7280, %7275 : !llvm.vec<8 x i32> + %7282 = llvm.trunc %7274 : !llvm.i64 to !llvm.i32 + %7283 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7284 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7285 = llvm.insertelement %7282, %7283[%7284 : !llvm.i32] : !llvm.vec<8 x i32> + %7286 = llvm.shufflevector %7285, %7283 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7287 = llvm.icmp "slt" %7281, %7286 : !llvm.vec<8 x i32> + %7288 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7289 = llvm.intr.masked.load %7273, %7287, %7288 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7290 = llvm.add %6347, %61 : !llvm.i64 + %7291 = llvm.icmp "slt" %7290, %67 : !llvm.i64 + %7292 = llvm.sub %64, %7290 : !llvm.i64 + %7293 = llvm.select %7291, %7292, %7290 : !llvm.i1, !llvm.i64 + %7294 = llvm.sdiv %7293, %68 : !llvm.i64 + %7295 = llvm.sub %64, %7294 : !llvm.i64 + %7296 = llvm.select %7291, %7295, %7294 : !llvm.i1, !llvm.i64 + %7297 = llvm.mul %7296, %60 : !llvm.i64 + %7298 = llvm.add %6347, %7297 : !llvm.i64 + %7299 = llvm.add %7298, %61 : !llvm.i64 + %7300 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7302 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7303 = llvm.mul %7299, %7302 : !llvm.i64 + %7304 = llvm.add %7301, %7303 : !llvm.i64 + %7305 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7306 = llvm.mul %67, %7305 : !llvm.i64 + %7307 = llvm.add %7304, %7306 : !llvm.i64 + %7308 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7309 = llvm.mul %6365, %7308 : !llvm.i64 + %7310 = llvm.add %7307, %7309 : !llvm.i64 + %7311 = llvm.getelementptr %7300[%7310] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7312 = llvm.load %7311 : !llvm.ptr> + %7313 = llvm.fadd %7289, %7312 : !llvm.vec<8 x float> + %7314 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7316 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7317 = llvm.mul %67, %7316 : !llvm.i64 + %7318 = llvm.add %7315, %7317 : !llvm.i64 + %7319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7320 = llvm.mul %35, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = llvm.getelementptr %7314[%7321] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7313, %7322 : !llvm.ptr> + %7323 = llvm.add %6315, %62 : !llvm.i64 + %7324 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7325 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7326 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7327 = llvm.mul %2345, %7326 : !llvm.i64 + %7328 = llvm.add %7325, %7327 : !llvm.i64 + %7329 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7330 = llvm.mul %7323, %7329 : !llvm.i64 + %7331 = llvm.add %7328, %7330 : !llvm.i64 + %7332 = llvm.getelementptr %7324[%7331] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7333 = llvm.bitcast %7332 : !llvm.ptr to !llvm.ptr> + %7334 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7335 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7336 = llvm.trunc %7323 : !llvm.i64 to !llvm.i32 + %7337 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7338 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7339 = llvm.insertelement %7336, %7337[%7338 : !llvm.i32] : !llvm.vec<8 x i32> + %7340 = llvm.shufflevector %7339, %7337 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7341 = llvm.add %7340, %7335 : !llvm.vec<8 x i32> + %7342 = llvm.trunc %7334 : !llvm.i64 to !llvm.i32 + %7343 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7344 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7345 = llvm.insertelement %7342, %7343[%7344 : !llvm.i32] : !llvm.vec<8 x i32> + %7346 = llvm.shufflevector %7345, %7343 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7347 = llvm.icmp "slt" %7341, %7346 : !llvm.vec<8 x i32> + %7348 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7349 = llvm.intr.masked.load %7333, %7347, %7348 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7350 = llvm.add %5447, %62 : !llvm.i64 + %7351 = llvm.icmp "slt" %7350, %67 : !llvm.i64 + %7352 = llvm.sub %64, %7350 : !llvm.i64 + %7353 = llvm.select %7351, %7352, %7350 : !llvm.i1, !llvm.i64 + %7354 = llvm.sdiv %7353, %68 : !llvm.i64 + %7355 = llvm.sub %64, %7354 : !llvm.i64 + %7356 = llvm.select %7351, %7355, %7354 : !llvm.i1, !llvm.i64 + %7357 = llvm.srem %7356, %68 : !llvm.i64 + %7358 
= llvm.icmp "slt" %7357, %67 : !llvm.i64 + %7359 = llvm.add %7357, %68 : !llvm.i64 + %7360 = llvm.select %7358, %7359, %7357 : !llvm.i1, !llvm.i64 + %7361 = llvm.mul %7356, %65 : !llvm.i64 + %7362 = llvm.add %6429, %7361 : !llvm.i64 + %7363 = llvm.add %7362, %66 : !llvm.i64 + %7364 = llvm.icmp "slt" %7363, %67 : !llvm.i64 + %7365 = llvm.sub %64, %7363 : !llvm.i64 + %7366 = llvm.select %7364, %7365, %7363 : !llvm.i1, !llvm.i64 + %7367 = llvm.sdiv %7366, %63 : !llvm.i64 + %7368 = llvm.sub %64, %7367 : !llvm.i64 + %7369 = llvm.select %7364, %7368, %7367 : !llvm.i1, !llvm.i64 + %7370 = llvm.mul %7369, %65 : !llvm.i64 + %7371 = llvm.add %7362, %7370 : !llvm.i64 + %7372 = llvm.add %7371, %66 : !llvm.i64 + %7373 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7375 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7376 = llvm.mul %7360, %7375 : !llvm.i64 + %7377 = llvm.add %7374, %7376 : !llvm.i64 + %7378 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7379 = llvm.mul %67, %7378 : !llvm.i64 + %7380 = llvm.add %7377, %7379 : !llvm.i64 + %7381 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7382 = llvm.mul %7372, %7381 : !llvm.i64 + %7383 = llvm.add %7380, %7382 : !llvm.i64 + %7384 = llvm.getelementptr %7373[%7383] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7385 = llvm.load %7384 : !llvm.ptr> + %7386 = llvm.fadd %7349, %7385 : !llvm.vec<8 x float> + %7387 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7389 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7390 = llvm.mul %67, %7389 : !llvm.i64 + %7391 = llvm.add %7388, %7390 : !llvm.i64 + %7392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7393 = llvm.mul %66, %7392 : !llvm.i64 + %7394 = llvm.add %7391, %7393 : !llvm.i64 + %7395 = llvm.getelementptr %7387[%7394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7386, %7395 : !llvm.ptr> + llvm.br ^bb48(%67 : !llvm.i64) + ^bb48(%7396: !llvm.i64): // 2 preds: ^bb47, ^bb49 + %7397 = llvm.icmp "slt" %7396, %68 : !llvm.i64 + llvm.cond_br %7397, ^bb49, ^bb46 + ^bb49: // pred: ^bb48 + %7398 = llvm.mul %7396, %70 : !llvm.i64 + %7399 = llvm.add %6315, %7398 : !llvm.i64 + %7400 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7403 = llvm.mul %67, %7402 : !llvm.i64 + %7404 = llvm.add %7401, %7403 : !llvm.i64 + %7405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7406 = llvm.mul %7396, %7405 : !llvm.i64 + %7407 = llvm.add %7404, %7406 : !llvm.i64 + %7408 = llvm.getelementptr %7400[%7407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7409 = llvm.load %7408 : !llvm.ptr> + %7410 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7413 = llvm.mul %2345, %7412 : !llvm.i64 + %7414 = llvm.add %7411, %7413 : !llvm.i64 + %7415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7416 = llvm.mul %7399, %7415 : !llvm.i64 + %7417 = llvm.add %7414, %7416 : !llvm.i64 + %7418 = llvm.getelementptr %7410[%7417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7419 = llvm.bitcast %7418 : !llvm.ptr to !llvm.ptr> + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : 
!llvm.vec<8 x i32> + %7422 = llvm.trunc %7399 : !llvm.i64 to !llvm.i32 + %7423 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7424 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7425 = llvm.insertelement %7422, %7423[%7424 : !llvm.i32] : !llvm.vec<8 x i32> + %7426 = llvm.shufflevector %7425, %7423 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7427 = llvm.add %7426, %7421 : !llvm.vec<8 x i32> + %7428 = llvm.trunc %7420 : !llvm.i64 to !llvm.i32 + %7429 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7430 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7431 = llvm.insertelement %7428, %7429[%7430 : !llvm.i32] : !llvm.vec<8 x i32> + %7432 = llvm.shufflevector %7431, %7429 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7433 = llvm.icmp "slt" %7427, %7432 : !llvm.vec<8 x i32> + llvm.intr.masked.store %7409, %7419, %7433 {alignment = 4 : i32} : !llvm.vec<8 x float>, !llvm.vec<8 x i1> into !llvm.ptr> + %7434 = llvm.add %7396, %69 : !llvm.i64 + llvm.br ^bb48(%7434 : !llvm.i64) + ^bb50: // pred: ^bb46 + %7435 = llvm.add %5447, %39 : !llvm.i64 + llvm.br ^bb41(%7435 : !llvm.i64) + ^bb51: // pred: ^bb41 + %7436 = llvm.add %2345, %69 : !llvm.i64 + llvm.br ^bb12(%7436 : !llvm.i64) + ^bb52: // pred: ^bb12 + %7437 = llvm.add %151, %38 : !llvm.i64 + llvm.br ^bb1(%7437 : !llvm.i64) + ^bb53: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py_4a6286d9(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue 
%22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : 
(!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +}
diff --git a/Tutorials/optimized_matmul/mlir/23_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir b/Tutorials/optimized_matmul/mlir/23_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir
new file mode 100644
index 00000000..be537b05
--- /dev/null
+++ b/Tutorials/optimized_matmul/mlir/23_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir
@@ -0,0 +1,10140 @@
+module @optimized_matmul { + llvm.func @optimized_matmul_py_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] :
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(781 : index) : !llvm.i64 + %25 = llvm.mlir.constant(782 : index) : !llvm.i64 + %26 = llvm.mlir.constant(783 : index) : !llvm.i64 + %27 = llvm.mlir.constant(512 : index) : !llvm.i64 + %28 = llvm.mlir.constant(780 : index) : !llvm.i64 + %29 = llvm.mlir.constant(256 : index) : !llvm.i64 + %30 = llvm.mlir.constant(16 : index) : !llvm.i64 + %31 = llvm.mlir.constant(128 : index) : !llvm.i64 + %32 = llvm.mlir.constant(0 : index) : !llvm.i64 + %33 = llvm.mlir.constant(1 : index) : !llvm.i64 + %34 = llvm.mlir.constant(2 : index) : !llvm.i64 + %35 = llvm.mlir.constant(3 : index) : !llvm.i64 + %36 = llvm.mlir.constant(4 : index) : !llvm.i64 + %37 = llvm.mlir.constant(5 : index) : !llvm.i64 + %38 = llvm.mlir.constant(6 : index) : !llvm.i64 + %39 = llvm.mlir.constant(7 : index) : !llvm.i64 + %40 = llvm.mlir.constant(8 : index) : !llvm.i64 + %41 = llvm.mlir.constant(9 : index) : !llvm.i64 + %42 = llvm.mlir.constant(10 : index) : !llvm.i64 + %43 = llvm.mlir.constant(11 : index) : !llvm.i64 + %44 = llvm.mlir.constant(12 : index) : !llvm.i64 + %45 = llvm.mlir.constant(13 : index) : !llvm.i64 + %46 = llvm.mlir.constant(14 : index) : !llvm.i64 + %47 = llvm.mlir.constant(15 : index) : !llvm.i64 + llvm.br ^bb1(%32 : !llvm.i64) + ^bb1(%48: !llvm.i64): // 2 preds: ^bb0, ^bb23 + %49 = llvm.icmp "slt" %48, %27 : !llvm.i64 + llvm.cond_br %49, ^bb2, ^bb24 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%32 : !llvm.i64) + ^bb3(%50: !llvm.i64): // 2 preds: ^bb2, ^bb13 + %51 = llvm.icmp "slt" %50, %28 : !llvm.i64 + llvm.cond_br %51, ^bb4, ^bb14 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%32 : !llvm.i64) + ^bb5(%52: !llvm.i64): // 2 preds: ^bb4, ^bb12 + %53 = llvm.icmp "slt" %52, %29 : !llvm.i64 + llvm.cond_br %53, ^bb6, ^bb13 + ^bb6: // pred: ^bb5 + llvm.br ^bb7(%32 : !llvm.i64) + ^bb7(%54: !llvm.i64): // 2 preds: ^bb6, ^bb11 + %55 = llvm.icmp "slt" %54, %31 : !llvm.i64 + llvm.cond_br %55, ^bb8, ^bb12 + ^bb8: // pred: ^bb7 + llvm.br ^bb9(%32 : !llvm.i64) + ^bb9(%56: !llvm.i64): // 2 preds: ^bb8, ^bb10 + %57 = llvm.icmp "slt" %56, %36 : !llvm.i64 + llvm.cond_br %57, ^bb10, ^bb11 + ^bb10: // pred: ^bb9 + %58 = llvm.add %48, %52 : !llvm.i64 + %59 = llvm.add %54, %56 : !llvm.i64 + %60 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %61 = llvm.mlir.constant(0 : index) : !llvm.i64 + %62 = llvm.mlir.constant(128 : index) : !llvm.i64 + %63 = llvm.mul %50, %62 : !llvm.i64 + %64 = llvm.add %61, %63 : !llvm.i64 + %65 = llvm.mlir.constant(1 : index) : !llvm.i64 + %66 = llvm.mul %59, %65 : !llvm.i64 + %67 = llvm.add %64, %66 : !llvm.i64 + %68 = llvm.getelementptr %60[%67] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %69 = llvm.load %68 : !llvm.ptr + %70 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %71 = llvm.mlir.constant(0 : index) : !llvm.i64 + %72 = llvm.mlir.constant(512 : index) : !llvm.i64 + %73 = llvm.mul %59, %72 : !llvm.i64 + %74 = llvm.add %71, %73 : !llvm.i64 + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.mul %58, %75 : !llvm.i64 + %77 = llvm.add %74, %76 : !llvm.i64 + %78 = llvm.getelementptr %70[%77] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %79 = llvm.load %78 : !llvm.ptr + %80 = llvm.fmul %69, %79 {RelaxedPrecision} : !llvm.float + %81 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.mlir.constant(0 : index) : !llvm.i64 + %83 = llvm.mlir.constant(512 : index) : !llvm.i64 + %84 = llvm.mul %50, 
%83 : !llvm.i64 + %85 = llvm.add %82, %84 : !llvm.i64 + %86 = llvm.mlir.constant(1 : index) : !llvm.i64 + %87 = llvm.mul %58, %86 : !llvm.i64 + %88 = llvm.add %85, %87 : !llvm.i64 + %89 = llvm.getelementptr %81[%88] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %90 = llvm.load %89 : !llvm.ptr + %91 = llvm.fadd %90, %80 {RelaxedPrecision} : !llvm.float + %92 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %93 = llvm.mlir.constant(0 : index) : !llvm.i64 + %94 = llvm.mlir.constant(512 : index) : !llvm.i64 + %95 = llvm.mul %50, %94 : !llvm.i64 + %96 = llvm.add %93, %95 : !llvm.i64 + %97 = llvm.mlir.constant(1 : index) : !llvm.i64 + %98 = llvm.mul %58, %97 : !llvm.i64 + %99 = llvm.add %96, %98 : !llvm.i64 + %100 = llvm.getelementptr %92[%99] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %91, %100 : !llvm.ptr + %101 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.mlir.constant(0 : index) : !llvm.i64 + %103 = llvm.mlir.constant(512 : index) : !llvm.i64 + %104 = llvm.mul %50, %103 : !llvm.i64 + %105 = llvm.add %102, %104 : !llvm.i64 + %106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %107 = llvm.mul %58, %106 : !llvm.i64 + %108 = llvm.add %105, %107 : !llvm.i64 + %109 = llvm.getelementptr %101[%108] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %110 = llvm.load %109 : !llvm.ptr + %111 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %112 = llvm.mlir.constant(0 : index) : !llvm.i64 + %113 = llvm.mlir.constant(512 : index) : !llvm.i64 + %114 = llvm.mul %50, %113 : !llvm.i64 + %115 = llvm.add %112, %114 : !llvm.i64 + %116 = llvm.mlir.constant(1 : index) : !llvm.i64 + %117 = llvm.mul %58, %116 : !llvm.i64 + %118 = llvm.add %115, %117 : !llvm.i64 + %119 = llvm.getelementptr %111[%118] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %110, %119 : !llvm.ptr + %120 = llvm.add %58, %33 : !llvm.i64 + %121 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %123 = llvm.mlir.constant(128 : index) : !llvm.i64 + %124 = llvm.mul %50, %123 : !llvm.i64 + %125 = llvm.add %122, %124 : !llvm.i64 + %126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %127 = llvm.mul %59, %126 : !llvm.i64 + %128 = llvm.add %125, %127 : !llvm.i64 + %129 = llvm.getelementptr %121[%128] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %130 = llvm.load %129 : !llvm.ptr + %131 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %134 = llvm.mul %59, %133 : !llvm.i64 + %135 = llvm.add %132, %134 : !llvm.i64 + %136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %137 = llvm.mul %120, %136 : !llvm.i64 + %138 = llvm.add %135, %137 : !llvm.i64 + %139 = llvm.getelementptr %131[%138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %140 = llvm.load %139 : !llvm.ptr + %141 = llvm.fmul %130, %140 {RelaxedPrecision} : !llvm.float + %142 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %143 = llvm.mlir.constant(0 : index) : !llvm.i64 + %144 = llvm.mlir.constant(512 : index) : !llvm.i64 + %145 = llvm.mul %50, %144 : !llvm.i64 + %146 = llvm.add %143, %145 : !llvm.i64 + %147 = llvm.mlir.constant(1 : index) : !llvm.i64 + %148 = llvm.mul %120, %147 : !llvm.i64 + %149 = llvm.add %146, %148 : !llvm.i64 + %150 = llvm.getelementptr %142[%149] : (!llvm.ptr, !llvm.i64) 
-> !llvm.ptr + %151 = llvm.load %150 : !llvm.ptr + %152 = llvm.fadd %151, %141 {RelaxedPrecision} : !llvm.float + %153 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %154 = llvm.mlir.constant(0 : index) : !llvm.i64 + %155 = llvm.mlir.constant(512 : index) : !llvm.i64 + %156 = llvm.mul %50, %155 : !llvm.i64 + %157 = llvm.add %154, %156 : !llvm.i64 + %158 = llvm.mlir.constant(1 : index) : !llvm.i64 + %159 = llvm.mul %120, %158 : !llvm.i64 + %160 = llvm.add %157, %159 : !llvm.i64 + %161 = llvm.getelementptr %153[%160] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %152, %161 : !llvm.ptr + %162 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %163 = llvm.mlir.constant(0 : index) : !llvm.i64 + %164 = llvm.mlir.constant(512 : index) : !llvm.i64 + %165 = llvm.mul %50, %164 : !llvm.i64 + %166 = llvm.add %163, %165 : !llvm.i64 + %167 = llvm.mlir.constant(1 : index) : !llvm.i64 + %168 = llvm.mul %120, %167 : !llvm.i64 + %169 = llvm.add %166, %168 : !llvm.i64 + %170 = llvm.getelementptr %162[%169] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %171 = llvm.load %170 : !llvm.ptr + %172 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %173 = llvm.mlir.constant(0 : index) : !llvm.i64 + %174 = llvm.mlir.constant(512 : index) : !llvm.i64 + %175 = llvm.mul %50, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.mlir.constant(1 : index) : !llvm.i64 + %178 = llvm.mul %120, %177 : !llvm.i64 + %179 = llvm.add %176, %178 : !llvm.i64 + %180 = llvm.getelementptr %172[%179] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %171, %180 : !llvm.ptr + %181 = llvm.add %58, %34 : !llvm.i64 + %182 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %183 = llvm.mlir.constant(0 : index) : !llvm.i64 + %184 = llvm.mlir.constant(128 : index) : !llvm.i64 + %185 = llvm.mul %50, %184 : !llvm.i64 + %186 = llvm.add %183, %185 : !llvm.i64 + %187 = llvm.mlir.constant(1 : index) : !llvm.i64 + %188 = llvm.mul %59, %187 : !llvm.i64 + %189 = llvm.add %186, %188 : !llvm.i64 + %190 = llvm.getelementptr %182[%189] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %191 = llvm.load %190 : !llvm.ptr + %192 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %193 = llvm.mlir.constant(0 : index) : !llvm.i64 + %194 = llvm.mlir.constant(512 : index) : !llvm.i64 + %195 = llvm.mul %59, %194 : !llvm.i64 + %196 = llvm.add %193, %195 : !llvm.i64 + %197 = llvm.mlir.constant(1 : index) : !llvm.i64 + %198 = llvm.mul %181, %197 : !llvm.i64 + %199 = llvm.add %196, %198 : !llvm.i64 + %200 = llvm.getelementptr %192[%199] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %201 = llvm.load %200 : !llvm.ptr + %202 = llvm.fmul %191, %201 {RelaxedPrecision} : !llvm.float + %203 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %204 = llvm.mlir.constant(0 : index) : !llvm.i64 + %205 = llvm.mlir.constant(512 : index) : !llvm.i64 + %206 = llvm.mul %50, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.mlir.constant(1 : index) : !llvm.i64 + %209 = llvm.mul %181, %208 : !llvm.i64 + %210 = llvm.add %207, %209 : !llvm.i64 + %211 = llvm.getelementptr %203[%210] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %212 = llvm.load %211 : !llvm.ptr + %213 = llvm.fadd %212, %202 {RelaxedPrecision} : !llvm.float + %214 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %215 
= llvm.mlir.constant(0 : index) : !llvm.i64 + %216 = llvm.mlir.constant(512 : index) : !llvm.i64 + %217 = llvm.mul %50, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.mlir.constant(1 : index) : !llvm.i64 + %220 = llvm.mul %181, %219 : !llvm.i64 + %221 = llvm.add %218, %220 : !llvm.i64 + %222 = llvm.getelementptr %214[%221] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %213, %222 : !llvm.ptr + %223 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %224 = llvm.mlir.constant(0 : index) : !llvm.i64 + %225 = llvm.mlir.constant(512 : index) : !llvm.i64 + %226 = llvm.mul %50, %225 : !llvm.i64 + %227 = llvm.add %224, %226 : !llvm.i64 + %228 = llvm.mlir.constant(1 : index) : !llvm.i64 + %229 = llvm.mul %181, %228 : !llvm.i64 + %230 = llvm.add %227, %229 : !llvm.i64 + %231 = llvm.getelementptr %223[%230] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %232 = llvm.load %231 : !llvm.ptr + %233 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %234 = llvm.mlir.constant(0 : index) : !llvm.i64 + %235 = llvm.mlir.constant(512 : index) : !llvm.i64 + %236 = llvm.mul %50, %235 : !llvm.i64 + %237 = llvm.add %234, %236 : !llvm.i64 + %238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %239 = llvm.mul %181, %238 : !llvm.i64 + %240 = llvm.add %237, %239 : !llvm.i64 + %241 = llvm.getelementptr %233[%240] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %232, %241 : !llvm.ptr + %242 = llvm.add %58, %35 : !llvm.i64 + %243 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %244 = llvm.mlir.constant(0 : index) : !llvm.i64 + %245 = llvm.mlir.constant(128 : index) : !llvm.i64 + %246 = llvm.mul %50, %245 : !llvm.i64 + %247 = llvm.add %244, %246 : !llvm.i64 + %248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %249 = llvm.mul %59, %248 : !llvm.i64 + %250 = llvm.add %247, %249 : !llvm.i64 + %251 = llvm.getelementptr %243[%250] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %252 = llvm.load %251 : !llvm.ptr + %253 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = llvm.mlir.constant(512 : index) : !llvm.i64 + %256 = llvm.mul %59, %255 : !llvm.i64 + %257 = llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = llvm.mul %242, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %262 = llvm.load %261 : !llvm.ptr + %263 = llvm.fmul %252, %262 {RelaxedPrecision} : !llvm.float + %264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %267 = llvm.mul %50, %266 : !llvm.i64 + %268 = llvm.add %265, %267 : !llvm.i64 + %269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %270 = llvm.mul %242, %269 : !llvm.i64 + %271 = llvm.add %268, %270 : !llvm.i64 + %272 = llvm.getelementptr %264[%271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %273 = llvm.load %272 : !llvm.ptr + %274 = llvm.fadd %273, %263 {RelaxedPrecision} : !llvm.float + %275 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %276 = llvm.mlir.constant(0 : index) : !llvm.i64 + %277 = llvm.mlir.constant(512 : index) : !llvm.i64 + %278 = llvm.mul %50, %277 : !llvm.i64 + %279 = llvm.add %276, %278 : !llvm.i64 + %280 = llvm.mlir.constant(1 : index) 
: !llvm.i64 + %281 = llvm.mul %242, %280 : !llvm.i64 + %282 = llvm.add %279, %281 : !llvm.i64 + %283 = llvm.getelementptr %275[%282] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %274, %283 : !llvm.ptr + %284 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %50, %286 : !llvm.i64 + %288 = llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %242, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.load %292 : !llvm.ptr + %294 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %295 = llvm.mlir.constant(0 : index) : !llvm.i64 + %296 = llvm.mlir.constant(512 : index) : !llvm.i64 + %297 = llvm.mul %50, %296 : !llvm.i64 + %298 = llvm.add %295, %297 : !llvm.i64 + %299 = llvm.mlir.constant(1 : index) : !llvm.i64 + %300 = llvm.mul %242, %299 : !llvm.i64 + %301 = llvm.add %298, %300 : !llvm.i64 + %302 = llvm.getelementptr %294[%301] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %293, %302 : !llvm.ptr + %303 = llvm.add %58, %36 : !llvm.i64 + %304 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %305 = llvm.mlir.constant(0 : index) : !llvm.i64 + %306 = llvm.mlir.constant(128 : index) : !llvm.i64 + %307 = llvm.mul %50, %306 : !llvm.i64 + %308 = llvm.add %305, %307 : !llvm.i64 + %309 = llvm.mlir.constant(1 : index) : !llvm.i64 + %310 = llvm.mul %59, %309 : !llvm.i64 + %311 = llvm.add %308, %310 : !llvm.i64 + %312 = llvm.getelementptr %304[%311] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %313 = llvm.load %312 : !llvm.ptr + %314 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %316 = llvm.mlir.constant(512 : index) : !llvm.i64 + %317 = llvm.mul %59, %316 : !llvm.i64 + %318 = llvm.add %315, %317 : !llvm.i64 + %319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %320 = llvm.mul %303, %319 : !llvm.i64 + %321 = llvm.add %318, %320 : !llvm.i64 + %322 = llvm.getelementptr %314[%321] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %323 = llvm.load %322 : !llvm.ptr + %324 = llvm.fmul %313, %323 {RelaxedPrecision} : !llvm.float + %325 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %326 = llvm.mlir.constant(0 : index) : !llvm.i64 + %327 = llvm.mlir.constant(512 : index) : !llvm.i64 + %328 = llvm.mul %50, %327 : !llvm.i64 + %329 = llvm.add %326, %328 : !llvm.i64 + %330 = llvm.mlir.constant(1 : index) : !llvm.i64 + %331 = llvm.mul %303, %330 : !llvm.i64 + %332 = llvm.add %329, %331 : !llvm.i64 + %333 = llvm.getelementptr %325[%332] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %334 = llvm.load %333 : !llvm.ptr + %335 = llvm.fadd %334, %324 {RelaxedPrecision} : !llvm.float + %336 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %337 = llvm.mlir.constant(0 : index) : !llvm.i64 + %338 = llvm.mlir.constant(512 : index) : !llvm.i64 + %339 = llvm.mul %50, %338 : !llvm.i64 + %340 = llvm.add %337, %339 : !llvm.i64 + %341 = llvm.mlir.constant(1 : index) : !llvm.i64 + %342 = llvm.mul %303, %341 : !llvm.i64 + %343 = llvm.add %340, %342 : !llvm.i64 + %344 = llvm.getelementptr %336[%343] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %335, %344 : !llvm.ptr + %345 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %346 = llvm.mlir.constant(0 : index) : !llvm.i64 + %347 = llvm.mlir.constant(512 : index) : !llvm.i64 + %348 = llvm.mul %50, %347 : !llvm.i64 + %349 = llvm.add %346, %348 : !llvm.i64 + %350 = llvm.mlir.constant(1 : index) : !llvm.i64 + %351 = llvm.mul %303, %350 : !llvm.i64 + %352 = llvm.add %349, %351 : !llvm.i64 + %353 = llvm.getelementptr %345[%352] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %354 = llvm.load %353 : !llvm.ptr + %355 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %356 = llvm.mlir.constant(0 : index) : !llvm.i64 + %357 = llvm.mlir.constant(512 : index) : !llvm.i64 + %358 = llvm.mul %50, %357 : !llvm.i64 + %359 = llvm.add %356, %358 : !llvm.i64 + %360 = llvm.mlir.constant(1 : index) : !llvm.i64 + %361 = llvm.mul %303, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.getelementptr %355[%362] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %354, %363 : !llvm.ptr + %364 = llvm.add %58, %37 : !llvm.i64 + %365 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %366 = llvm.mlir.constant(0 : index) : !llvm.i64 + %367 = llvm.mlir.constant(128 : index) : !llvm.i64 + %368 = llvm.mul %50, %367 : !llvm.i64 + %369 = llvm.add %366, %368 : !llvm.i64 + %370 = llvm.mlir.constant(1 : index) : !llvm.i64 + %371 = llvm.mul %59, %370 : !llvm.i64 + %372 = llvm.add %369, %371 : !llvm.i64 + %373 = llvm.getelementptr %365[%372] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %374 = llvm.load %373 : !llvm.ptr + %375 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %376 = llvm.mlir.constant(0 : index) : !llvm.i64 + %377 = llvm.mlir.constant(512 : index) : !llvm.i64 + %378 = llvm.mul %59, %377 : !llvm.i64 + %379 = llvm.add %376, %378 : !llvm.i64 + %380 = llvm.mlir.constant(1 : index) : !llvm.i64 + %381 = llvm.mul %364, %380 : !llvm.i64 + %382 = llvm.add %379, %381 : !llvm.i64 + %383 = llvm.getelementptr %375[%382] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %384 = llvm.load %383 : !llvm.ptr + %385 = llvm.fmul %374, %384 {RelaxedPrecision} : !llvm.float + %386 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %387 = llvm.mlir.constant(0 : index) : !llvm.i64 + %388 = llvm.mlir.constant(512 : index) : !llvm.i64 + %389 = llvm.mul %50, %388 : !llvm.i64 + %390 = llvm.add %387, %389 : !llvm.i64 + %391 = llvm.mlir.constant(1 : index) : !llvm.i64 + %392 = llvm.mul %364, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.getelementptr %386[%393] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %395 = llvm.load %394 : !llvm.ptr + %396 = llvm.fadd %395, %385 {RelaxedPrecision} : !llvm.float + %397 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %398 = llvm.mlir.constant(0 : index) : !llvm.i64 + %399 = llvm.mlir.constant(512 : index) : !llvm.i64 + %400 = llvm.mul %50, %399 : !llvm.i64 + %401 = llvm.add %398, %400 : !llvm.i64 + %402 = llvm.mlir.constant(1 : index) : !llvm.i64 + %403 = llvm.mul %364, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.getelementptr %397[%404] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %396, %405 : !llvm.ptr + %406 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %408 = llvm.mlir.constant(512 : index) : !llvm.i64 + %409 = llvm.mul %50, 
%408 : !llvm.i64 + %410 = llvm.add %407, %409 : !llvm.i64 + %411 = llvm.mlir.constant(1 : index) : !llvm.i64 + %412 = llvm.mul %364, %411 : !llvm.i64 + %413 = llvm.add %410, %412 : !llvm.i64 + %414 = llvm.getelementptr %406[%413] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %415 = llvm.load %414 : !llvm.ptr + %416 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %417 = llvm.mlir.constant(0 : index) : !llvm.i64 + %418 = llvm.mlir.constant(512 : index) : !llvm.i64 + %419 = llvm.mul %50, %418 : !llvm.i64 + %420 = llvm.add %417, %419 : !llvm.i64 + %421 = llvm.mlir.constant(1 : index) : !llvm.i64 + %422 = llvm.mul %364, %421 : !llvm.i64 + %423 = llvm.add %420, %422 : !llvm.i64 + %424 = llvm.getelementptr %416[%423] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %415, %424 : !llvm.ptr + %425 = llvm.add %58, %38 : !llvm.i64 + %426 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %427 = llvm.mlir.constant(0 : index) : !llvm.i64 + %428 = llvm.mlir.constant(128 : index) : !llvm.i64 + %429 = llvm.mul %50, %428 : !llvm.i64 + %430 = llvm.add %427, %429 : !llvm.i64 + %431 = llvm.mlir.constant(1 : index) : !llvm.i64 + %432 = llvm.mul %59, %431 : !llvm.i64 + %433 = llvm.add %430, %432 : !llvm.i64 + %434 = llvm.getelementptr %426[%433] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %435 = llvm.load %434 : !llvm.ptr + %436 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %437 = llvm.mlir.constant(0 : index) : !llvm.i64 + %438 = llvm.mlir.constant(512 : index) : !llvm.i64 + %439 = llvm.mul %59, %438 : !llvm.i64 + %440 = llvm.add %437, %439 : !llvm.i64 + %441 = llvm.mlir.constant(1 : index) : !llvm.i64 + %442 = llvm.mul %425, %441 : !llvm.i64 + %443 = llvm.add %440, %442 : !llvm.i64 + %444 = llvm.getelementptr %436[%443] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %445 = llvm.load %444 : !llvm.ptr + %446 = llvm.fmul %435, %445 {RelaxedPrecision} : !llvm.float + %447 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %449 = llvm.mlir.constant(512 : index) : !llvm.i64 + %450 = llvm.mul %50, %449 : !llvm.i64 + %451 = llvm.add %448, %450 : !llvm.i64 + %452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %453 = llvm.mul %425, %452 : !llvm.i64 + %454 = llvm.add %451, %453 : !llvm.i64 + %455 = llvm.getelementptr %447[%454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %456 = llvm.load %455 : !llvm.ptr + %457 = llvm.fadd %456, %446 {RelaxedPrecision} : !llvm.float + %458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %461 = llvm.mul %50, %460 : !llvm.i64 + %462 = llvm.add %459, %461 : !llvm.i64 + %463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %464 = llvm.mul %425, %463 : !llvm.i64 + %465 = llvm.add %462, %464 : !llvm.i64 + %466 = llvm.getelementptr %458[%465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %457, %466 : !llvm.ptr + %467 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %468 = llvm.mlir.constant(0 : index) : !llvm.i64 + %469 = llvm.mlir.constant(512 : index) : !llvm.i64 + %470 = llvm.mul %50, %469 : !llvm.i64 + %471 = llvm.add %468, %470 : !llvm.i64 + %472 = llvm.mlir.constant(1 : index) : !llvm.i64 + %473 = llvm.mul %425, %472 : !llvm.i64 + %474 = llvm.add %471, %473 : !llvm.i64 + %475 = 
llvm.getelementptr %467[%474] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %476 = llvm.load %475 : !llvm.ptr + %477 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %480 = llvm.mul %50, %479 : !llvm.i64 + %481 = llvm.add %478, %480 : !llvm.i64 + %482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %483 = llvm.mul %425, %482 : !llvm.i64 + %484 = llvm.add %481, %483 : !llvm.i64 + %485 = llvm.getelementptr %477[%484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %476, %485 : !llvm.ptr + %486 = llvm.add %58, %39 : !llvm.i64 + %487 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %489 = llvm.mlir.constant(128 : index) : !llvm.i64 + %490 = llvm.mul %50, %489 : !llvm.i64 + %491 = llvm.add %488, %490 : !llvm.i64 + %492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %493 = llvm.mul %59, %492 : !llvm.i64 + %494 = llvm.add %491, %493 : !llvm.i64 + %495 = llvm.getelementptr %487[%494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %496 = llvm.load %495 : !llvm.ptr + %497 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %500 = llvm.mul %59, %499 : !llvm.i64 + %501 = llvm.add %498, %500 : !llvm.i64 + %502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %503 = llvm.mul %486, %502 : !llvm.i64 + %504 = llvm.add %501, %503 : !llvm.i64 + %505 = llvm.getelementptr %497[%504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %506 = llvm.load %505 : !llvm.ptr + %507 = llvm.fmul %496, %506 {RelaxedPrecision} : !llvm.float + %508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %511 = llvm.mul %50, %510 : !llvm.i64 + %512 = llvm.add %509, %511 : !llvm.i64 + %513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %514 = llvm.mul %486, %513 : !llvm.i64 + %515 = llvm.add %512, %514 : !llvm.i64 + %516 = llvm.getelementptr %508[%515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %517 = llvm.load %516 : !llvm.ptr + %518 = llvm.fadd %517, %507 {RelaxedPrecision} : !llvm.float + %519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %522 = llvm.mul %50, %521 : !llvm.i64 + %523 = llvm.add %520, %522 : !llvm.i64 + %524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %525 = llvm.mul %486, %524 : !llvm.i64 + %526 = llvm.add %523, %525 : !llvm.i64 + %527 = llvm.getelementptr %519[%526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %518, %527 : !llvm.ptr + %528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %531 = llvm.mul %50, %530 : !llvm.i64 + %532 = llvm.add %529, %531 : !llvm.i64 + %533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %534 = llvm.mul %486, %533 : !llvm.i64 + %535 = llvm.add %532, %534 : !llvm.i64 + %536 = llvm.getelementptr %528[%535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %537 = llvm.load %536 : !llvm.ptr + %538 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %539 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %540 = llvm.mlir.constant(512 : index) : !llvm.i64 + %541 = llvm.mul %50, %540 : !llvm.i64 + %542 = llvm.add %539, %541 : !llvm.i64 + %543 = llvm.mlir.constant(1 : index) : !llvm.i64 + %544 = llvm.mul %486, %543 : !llvm.i64 + %545 = llvm.add %542, %544 : !llvm.i64 + %546 = llvm.getelementptr %538[%545] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %537, %546 : !llvm.ptr + %547 = llvm.add %58, %40 : !llvm.i64 + %548 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %550 = llvm.mlir.constant(128 : index) : !llvm.i64 + %551 = llvm.mul %50, %550 : !llvm.i64 + %552 = llvm.add %549, %551 : !llvm.i64 + %553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %554 = llvm.mul %59, %553 : !llvm.i64 + %555 = llvm.add %552, %554 : !llvm.i64 + %556 = llvm.getelementptr %548[%555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %557 = llvm.load %556 : !llvm.ptr + %558 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %561 = llvm.mul %59, %560 : !llvm.i64 + %562 = llvm.add %559, %561 : !llvm.i64 + %563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %564 = llvm.mul %547, %563 : !llvm.i64 + %565 = llvm.add %562, %564 : !llvm.i64 + %566 = llvm.getelementptr %558[%565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %567 = llvm.load %566 : !llvm.ptr + %568 = llvm.fmul %557, %567 {RelaxedPrecision} : !llvm.float + %569 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %570 = llvm.mlir.constant(0 : index) : !llvm.i64 + %571 = llvm.mlir.constant(512 : index) : !llvm.i64 + %572 = llvm.mul %50, %571 : !llvm.i64 + %573 = llvm.add %570, %572 : !llvm.i64 + %574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %575 = llvm.mul %547, %574 : !llvm.i64 + %576 = llvm.add %573, %575 : !llvm.i64 + %577 = llvm.getelementptr %569[%576] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %578 = llvm.load %577 : !llvm.ptr + %579 = llvm.fadd %578, %568 {RelaxedPrecision} : !llvm.float + %580 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %581 = llvm.mlir.constant(0 : index) : !llvm.i64 + %582 = llvm.mlir.constant(512 : index) : !llvm.i64 + %583 = llvm.mul %50, %582 : !llvm.i64 + %584 = llvm.add %581, %583 : !llvm.i64 + %585 = llvm.mlir.constant(1 : index) : !llvm.i64 + %586 = llvm.mul %547, %585 : !llvm.i64 + %587 = llvm.add %584, %586 : !llvm.i64 + %588 = llvm.getelementptr %580[%587] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %579, %588 : !llvm.ptr + %589 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %591 = llvm.mlir.constant(512 : index) : !llvm.i64 + %592 = llvm.mul %50, %591 : !llvm.i64 + %593 = llvm.add %590, %592 : !llvm.i64 + %594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %595 = llvm.mul %547, %594 : !llvm.i64 + %596 = llvm.add %593, %595 : !llvm.i64 + %597 = llvm.getelementptr %589[%596] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %598 = llvm.load %597 : !llvm.ptr + %599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %602 = llvm.mul %50, %601 : !llvm.i64 + %603 = llvm.add %600, %602 : !llvm.i64 + %604 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %605 = llvm.mul %547, %604 : !llvm.i64 + %606 = llvm.add %603, %605 : !llvm.i64 + %607 = llvm.getelementptr %599[%606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %598, %607 : !llvm.ptr + %608 = llvm.add %58, %41 : !llvm.i64 + %609 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %610 = llvm.mlir.constant(0 : index) : !llvm.i64 + %611 = llvm.mlir.constant(128 : index) : !llvm.i64 + %612 = llvm.mul %50, %611 : !llvm.i64 + %613 = llvm.add %610, %612 : !llvm.i64 + %614 = llvm.mlir.constant(1 : index) : !llvm.i64 + %615 = llvm.mul %59, %614 : !llvm.i64 + %616 = llvm.add %613, %615 : !llvm.i64 + %617 = llvm.getelementptr %609[%616] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %618 = llvm.load %617 : !llvm.ptr + %619 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %620 = llvm.mlir.constant(0 : index) : !llvm.i64 + %621 = llvm.mlir.constant(512 : index) : !llvm.i64 + %622 = llvm.mul %59, %621 : !llvm.i64 + %623 = llvm.add %620, %622 : !llvm.i64 + %624 = llvm.mlir.constant(1 : index) : !llvm.i64 + %625 = llvm.mul %608, %624 : !llvm.i64 + %626 = llvm.add %623, %625 : !llvm.i64 + %627 = llvm.getelementptr %619[%626] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %628 = llvm.load %627 : !llvm.ptr + %629 = llvm.fmul %618, %628 {RelaxedPrecision} : !llvm.float + %630 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %632 = llvm.mlir.constant(512 : index) : !llvm.i64 + %633 = llvm.mul %50, %632 : !llvm.i64 + %634 = llvm.add %631, %633 : !llvm.i64 + %635 = llvm.mlir.constant(1 : index) : !llvm.i64 + %636 = llvm.mul %608, %635 : !llvm.i64 + %637 = llvm.add %634, %636 : !llvm.i64 + %638 = llvm.getelementptr %630[%637] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %639 = llvm.load %638 : !llvm.ptr + %640 = llvm.fadd %639, %629 {RelaxedPrecision} : !llvm.float + %641 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %642 = llvm.mlir.constant(0 : index) : !llvm.i64 + %643 = llvm.mlir.constant(512 : index) : !llvm.i64 + %644 = llvm.mul %50, %643 : !llvm.i64 + %645 = llvm.add %642, %644 : !llvm.i64 + %646 = llvm.mlir.constant(1 : index) : !llvm.i64 + %647 = llvm.mul %608, %646 : !llvm.i64 + %648 = llvm.add %645, %647 : !llvm.i64 + %649 = llvm.getelementptr %641[%648] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %640, %649 : !llvm.ptr + %650 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %651 = llvm.mlir.constant(0 : index) : !llvm.i64 + %652 = llvm.mlir.constant(512 : index) : !llvm.i64 + %653 = llvm.mul %50, %652 : !llvm.i64 + %654 = llvm.add %651, %653 : !llvm.i64 + %655 = llvm.mlir.constant(1 : index) : !llvm.i64 + %656 = llvm.mul %608, %655 : !llvm.i64 + %657 = llvm.add %654, %656 : !llvm.i64 + %658 = llvm.getelementptr %650[%657] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %659 = llvm.load %658 : !llvm.ptr + %660 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %662 = llvm.mlir.constant(512 : index) : !llvm.i64 + %663 = llvm.mul %50, %662 : !llvm.i64 + %664 = llvm.add %661, %663 : !llvm.i64 + %665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %666 = llvm.mul %608, %665 : !llvm.i64 + %667 = llvm.add %664, %666 : !llvm.i64 + %668 = llvm.getelementptr %660[%667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %659, %668 : !llvm.ptr + %669 = 
llvm.add %58, %42 : !llvm.i64 + %670 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %672 = llvm.mlir.constant(128 : index) : !llvm.i64 + %673 = llvm.mul %50, %672 : !llvm.i64 + %674 = llvm.add %671, %673 : !llvm.i64 + %675 = llvm.mlir.constant(1 : index) : !llvm.i64 + %676 = llvm.mul %59, %675 : !llvm.i64 + %677 = llvm.add %674, %676 : !llvm.i64 + %678 = llvm.getelementptr %670[%677] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %679 = llvm.load %678 : !llvm.ptr + %680 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %681 = llvm.mlir.constant(0 : index) : !llvm.i64 + %682 = llvm.mlir.constant(512 : index) : !llvm.i64 + %683 = llvm.mul %59, %682 : !llvm.i64 + %684 = llvm.add %681, %683 : !llvm.i64 + %685 = llvm.mlir.constant(1 : index) : !llvm.i64 + %686 = llvm.mul %669, %685 : !llvm.i64 + %687 = llvm.add %684, %686 : !llvm.i64 + %688 = llvm.getelementptr %680[%687] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %689 = llvm.load %688 : !llvm.ptr + %690 = llvm.fmul %679, %689 {RelaxedPrecision} : !llvm.float + %691 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %692 = llvm.mlir.constant(0 : index) : !llvm.i64 + %693 = llvm.mlir.constant(512 : index) : !llvm.i64 + %694 = llvm.mul %50, %693 : !llvm.i64 + %695 = llvm.add %692, %694 : !llvm.i64 + %696 = llvm.mlir.constant(1 : index) : !llvm.i64 + %697 = llvm.mul %669, %696 : !llvm.i64 + %698 = llvm.add %695, %697 : !llvm.i64 + %699 = llvm.getelementptr %691[%698] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %700 = llvm.load %699 : !llvm.ptr + %701 = llvm.fadd %700, %690 {RelaxedPrecision} : !llvm.float + %702 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %703 = llvm.mlir.constant(0 : index) : !llvm.i64 + %704 = llvm.mlir.constant(512 : index) : !llvm.i64 + %705 = llvm.mul %50, %704 : !llvm.i64 + %706 = llvm.add %703, %705 : !llvm.i64 + %707 = llvm.mlir.constant(1 : index) : !llvm.i64 + %708 = llvm.mul %669, %707 : !llvm.i64 + %709 = llvm.add %706, %708 : !llvm.i64 + %710 = llvm.getelementptr %702[%709] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %701, %710 : !llvm.ptr + %711 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %712 = llvm.mlir.constant(0 : index) : !llvm.i64 + %713 = llvm.mlir.constant(512 : index) : !llvm.i64 + %714 = llvm.mul %50, %713 : !llvm.i64 + %715 = llvm.add %712, %714 : !llvm.i64 + %716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %717 = llvm.mul %669, %716 : !llvm.i64 + %718 = llvm.add %715, %717 : !llvm.i64 + %719 = llvm.getelementptr %711[%718] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %720 = llvm.load %719 : !llvm.ptr + %721 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %722 = llvm.mlir.constant(0 : index) : !llvm.i64 + %723 = llvm.mlir.constant(512 : index) : !llvm.i64 + %724 = llvm.mul %50, %723 : !llvm.i64 + %725 = llvm.add %722, %724 : !llvm.i64 + %726 = llvm.mlir.constant(1 : index) : !llvm.i64 + %727 = llvm.mul %669, %726 : !llvm.i64 + %728 = llvm.add %725, %727 : !llvm.i64 + %729 = llvm.getelementptr %721[%728] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %720, %729 : !llvm.ptr + %730 = llvm.add %58, %43 : !llvm.i64 + %731 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %732 = llvm.mlir.constant(0 : index) : !llvm.i64 + %733 = llvm.mlir.constant(128 : 
index) : !llvm.i64 + %734 = llvm.mul %50, %733 : !llvm.i64 + %735 = llvm.add %732, %734 : !llvm.i64 + %736 = llvm.mlir.constant(1 : index) : !llvm.i64 + %737 = llvm.mul %59, %736 : !llvm.i64 + %738 = llvm.add %735, %737 : !llvm.i64 + %739 = llvm.getelementptr %731[%738] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %740 = llvm.load %739 : !llvm.ptr + %741 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %742 = llvm.mlir.constant(0 : index) : !llvm.i64 + %743 = llvm.mlir.constant(512 : index) : !llvm.i64 + %744 = llvm.mul %59, %743 : !llvm.i64 + %745 = llvm.add %742, %744 : !llvm.i64 + %746 = llvm.mlir.constant(1 : index) : !llvm.i64 + %747 = llvm.mul %730, %746 : !llvm.i64 + %748 = llvm.add %745, %747 : !llvm.i64 + %749 = llvm.getelementptr %741[%748] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %750 = llvm.load %749 : !llvm.ptr + %751 = llvm.fmul %740, %750 {RelaxedPrecision} : !llvm.float + %752 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %754 = llvm.mlir.constant(512 : index) : !llvm.i64 + %755 = llvm.mul %50, %754 : !llvm.i64 + %756 = llvm.add %753, %755 : !llvm.i64 + %757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %758 = llvm.mul %730, %757 : !llvm.i64 + %759 = llvm.add %756, %758 : !llvm.i64 + %760 = llvm.getelementptr %752[%759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %761 = llvm.load %760 : !llvm.ptr + %762 = llvm.fadd %761, %751 {RelaxedPrecision} : !llvm.float + %763 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %764 = llvm.mlir.constant(0 : index) : !llvm.i64 + %765 = llvm.mlir.constant(512 : index) : !llvm.i64 + %766 = llvm.mul %50, %765 : !llvm.i64 + %767 = llvm.add %764, %766 : !llvm.i64 + %768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %769 = llvm.mul %730, %768 : !llvm.i64 + %770 = llvm.add %767, %769 : !llvm.i64 + %771 = llvm.getelementptr %763[%770] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %762, %771 : !llvm.ptr + %772 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %773 = llvm.mlir.constant(0 : index) : !llvm.i64 + %774 = llvm.mlir.constant(512 : index) : !llvm.i64 + %775 = llvm.mul %50, %774 : !llvm.i64 + %776 = llvm.add %773, %775 : !llvm.i64 + %777 = llvm.mlir.constant(1 : index) : !llvm.i64 + %778 = llvm.mul %730, %777 : !llvm.i64 + %779 = llvm.add %776, %778 : !llvm.i64 + %780 = llvm.getelementptr %772[%779] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %781 = llvm.load %780 : !llvm.ptr + %782 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %784 = llvm.mlir.constant(512 : index) : !llvm.i64 + %785 = llvm.mul %50, %784 : !llvm.i64 + %786 = llvm.add %783, %785 : !llvm.i64 + %787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %788 = llvm.mul %730, %787 : !llvm.i64 + %789 = llvm.add %786, %788 : !llvm.i64 + %790 = llvm.getelementptr %782[%789] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %781, %790 : !llvm.ptr + %791 = llvm.add %58, %44 : !llvm.i64 + %792 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %794 = llvm.mlir.constant(128 : index) : !llvm.i64 + %795 = llvm.mul %50, %794 : !llvm.i64 + %796 = llvm.add %793, %795 : !llvm.i64 + %797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %798 = llvm.mul %59, %797 : !llvm.i64 + %799 = llvm.add %796, 
%798 : !llvm.i64 + %800 = llvm.getelementptr %792[%799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %801 = llvm.load %800 : !llvm.ptr + %802 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %803 = llvm.mlir.constant(0 : index) : !llvm.i64 + %804 = llvm.mlir.constant(512 : index) : !llvm.i64 + %805 = llvm.mul %59, %804 : !llvm.i64 + %806 = llvm.add %803, %805 : !llvm.i64 + %807 = llvm.mlir.constant(1 : index) : !llvm.i64 + %808 = llvm.mul %791, %807 : !llvm.i64 + %809 = llvm.add %806, %808 : !llvm.i64 + %810 = llvm.getelementptr %802[%809] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %811 = llvm.load %810 : !llvm.ptr + %812 = llvm.fmul %801, %811 {RelaxedPrecision} : !llvm.float + %813 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %814 = llvm.mlir.constant(0 : index) : !llvm.i64 + %815 = llvm.mlir.constant(512 : index) : !llvm.i64 + %816 = llvm.mul %50, %815 : !llvm.i64 + %817 = llvm.add %814, %816 : !llvm.i64 + %818 = llvm.mlir.constant(1 : index) : !llvm.i64 + %819 = llvm.mul %791, %818 : !llvm.i64 + %820 = llvm.add %817, %819 : !llvm.i64 + %821 = llvm.getelementptr %813[%820] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %822 = llvm.load %821 : !llvm.ptr + %823 = llvm.fadd %822, %812 {RelaxedPrecision} : !llvm.float + %824 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %826 = llvm.mlir.constant(512 : index) : !llvm.i64 + %827 = llvm.mul %50, %826 : !llvm.i64 + %828 = llvm.add %825, %827 : !llvm.i64 + %829 = llvm.mlir.constant(1 : index) : !llvm.i64 + %830 = llvm.mul %791, %829 : !llvm.i64 + %831 = llvm.add %828, %830 : !llvm.i64 + %832 = llvm.getelementptr %824[%831] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %823, %832 : !llvm.ptr + %833 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %834 = llvm.mlir.constant(0 : index) : !llvm.i64 + %835 = llvm.mlir.constant(512 : index) : !llvm.i64 + %836 = llvm.mul %50, %835 : !llvm.i64 + %837 = llvm.add %834, %836 : !llvm.i64 + %838 = llvm.mlir.constant(1 : index) : !llvm.i64 + %839 = llvm.mul %791, %838 : !llvm.i64 + %840 = llvm.add %837, %839 : !llvm.i64 + %841 = llvm.getelementptr %833[%840] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %842 = llvm.load %841 : !llvm.ptr + %843 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %844 = llvm.mlir.constant(0 : index) : !llvm.i64 + %845 = llvm.mlir.constant(512 : index) : !llvm.i64 + %846 = llvm.mul %50, %845 : !llvm.i64 + %847 = llvm.add %844, %846 : !llvm.i64 + %848 = llvm.mlir.constant(1 : index) : !llvm.i64 + %849 = llvm.mul %791, %848 : !llvm.i64 + %850 = llvm.add %847, %849 : !llvm.i64 + %851 = llvm.getelementptr %843[%850] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %842, %851 : !llvm.ptr + %852 = llvm.add %58, %45 : !llvm.i64 + %853 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %854 = llvm.mlir.constant(0 : index) : !llvm.i64 + %855 = llvm.mlir.constant(128 : index) : !llvm.i64 + %856 = llvm.mul %50, %855 : !llvm.i64 + %857 = llvm.add %854, %856 : !llvm.i64 + %858 = llvm.mlir.constant(1 : index) : !llvm.i64 + %859 = llvm.mul %59, %858 : !llvm.i64 + %860 = llvm.add %857, %859 : !llvm.i64 + %861 = llvm.getelementptr %853[%860] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %862 = llvm.load %861 : !llvm.ptr + %863 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, 
array<2 x i64>)> + %864 = llvm.mlir.constant(0 : index) : !llvm.i64 + %865 = llvm.mlir.constant(512 : index) : !llvm.i64 + %866 = llvm.mul %59, %865 : !llvm.i64 + %867 = llvm.add %864, %866 : !llvm.i64 + %868 = llvm.mlir.constant(1 : index) : !llvm.i64 + %869 = llvm.mul %852, %868 : !llvm.i64 + %870 = llvm.add %867, %869 : !llvm.i64 + %871 = llvm.getelementptr %863[%870] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %872 = llvm.load %871 : !llvm.ptr + %873 = llvm.fmul %862, %872 {RelaxedPrecision} : !llvm.float + %874 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %875 = llvm.mlir.constant(0 : index) : !llvm.i64 + %876 = llvm.mlir.constant(512 : index) : !llvm.i64 + %877 = llvm.mul %50, %876 : !llvm.i64 + %878 = llvm.add %875, %877 : !llvm.i64 + %879 = llvm.mlir.constant(1 : index) : !llvm.i64 + %880 = llvm.mul %852, %879 : !llvm.i64 + %881 = llvm.add %878, %880 : !llvm.i64 + %882 = llvm.getelementptr %874[%881] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %883 = llvm.load %882 : !llvm.ptr + %884 = llvm.fadd %883, %873 {RelaxedPrecision} : !llvm.float + %885 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %886 = llvm.mlir.constant(0 : index) : !llvm.i64 + %887 = llvm.mlir.constant(512 : index) : !llvm.i64 + %888 = llvm.mul %50, %887 : !llvm.i64 + %889 = llvm.add %886, %888 : !llvm.i64 + %890 = llvm.mlir.constant(1 : index) : !llvm.i64 + %891 = llvm.mul %852, %890 : !llvm.i64 + %892 = llvm.add %889, %891 : !llvm.i64 + %893 = llvm.getelementptr %885[%892] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %884, %893 : !llvm.ptr + %894 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %895 = llvm.mlir.constant(0 : index) : !llvm.i64 + %896 = llvm.mlir.constant(512 : index) : !llvm.i64 + %897 = llvm.mul %50, %896 : !llvm.i64 + %898 = llvm.add %895, %897 : !llvm.i64 + %899 = llvm.mlir.constant(1 : index) : !llvm.i64 + %900 = llvm.mul %852, %899 : !llvm.i64 + %901 = llvm.add %898, %900 : !llvm.i64 + %902 = llvm.getelementptr %894[%901] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %903 = llvm.load %902 : !llvm.ptr + %904 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %905 = llvm.mlir.constant(0 : index) : !llvm.i64 + %906 = llvm.mlir.constant(512 : index) : !llvm.i64 + %907 = llvm.mul %50, %906 : !llvm.i64 + %908 = llvm.add %905, %907 : !llvm.i64 + %909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %910 = llvm.mul %852, %909 : !llvm.i64 + %911 = llvm.add %908, %910 : !llvm.i64 + %912 = llvm.getelementptr %904[%911] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %903, %912 : !llvm.ptr + %913 = llvm.add %58, %46 : !llvm.i64 + %914 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %915 = llvm.mlir.constant(0 : index) : !llvm.i64 + %916 = llvm.mlir.constant(128 : index) : !llvm.i64 + %917 = llvm.mul %50, %916 : !llvm.i64 + %918 = llvm.add %915, %917 : !llvm.i64 + %919 = llvm.mlir.constant(1 : index) : !llvm.i64 + %920 = llvm.mul %59, %919 : !llvm.i64 + %921 = llvm.add %918, %920 : !llvm.i64 + %922 = llvm.getelementptr %914[%921] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %923 = llvm.load %922 : !llvm.ptr + %924 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %925 = llvm.mlir.constant(0 : index) : !llvm.i64 + %926 = llvm.mlir.constant(512 : index) : !llvm.i64 + %927 = llvm.mul %59, %926 : !llvm.i64 + %928 = llvm.add %925, %927 : !llvm.i64 + %929 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %930 = llvm.mul %913, %929 : !llvm.i64 + %931 = llvm.add %928, %930 : !llvm.i64 + %932 = llvm.getelementptr %924[%931] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %933 = llvm.load %932 : !llvm.ptr + %934 = llvm.fmul %923, %933 {RelaxedPrecision} : !llvm.float + %935 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %936 = llvm.mlir.constant(0 : index) : !llvm.i64 + %937 = llvm.mlir.constant(512 : index) : !llvm.i64 + %938 = llvm.mul %50, %937 : !llvm.i64 + %939 = llvm.add %936, %938 : !llvm.i64 + %940 = llvm.mlir.constant(1 : index) : !llvm.i64 + %941 = llvm.mul %913, %940 : !llvm.i64 + %942 = llvm.add %939, %941 : !llvm.i64 + %943 = llvm.getelementptr %935[%942] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %944 = llvm.load %943 : !llvm.ptr + %945 = llvm.fadd %944, %934 {RelaxedPrecision} : !llvm.float + %946 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %948 = llvm.mlir.constant(512 : index) : !llvm.i64 + %949 = llvm.mul %50, %948 : !llvm.i64 + %950 = llvm.add %947, %949 : !llvm.i64 + %951 = llvm.mlir.constant(1 : index) : !llvm.i64 + %952 = llvm.mul %913, %951 : !llvm.i64 + %953 = llvm.add %950, %952 : !llvm.i64 + %954 = llvm.getelementptr %946[%953] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %945, %954 : !llvm.ptr + %955 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %956 = llvm.mlir.constant(0 : index) : !llvm.i64 + %957 = llvm.mlir.constant(512 : index) : !llvm.i64 + %958 = llvm.mul %50, %957 : !llvm.i64 + %959 = llvm.add %956, %958 : !llvm.i64 + %960 = llvm.mlir.constant(1 : index) : !llvm.i64 + %961 = llvm.mul %913, %960 : !llvm.i64 + %962 = llvm.add %959, %961 : !llvm.i64 + %963 = llvm.getelementptr %955[%962] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %964 = llvm.load %963 : !llvm.ptr + %965 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %966 = llvm.mlir.constant(0 : index) : !llvm.i64 + %967 = llvm.mlir.constant(512 : index) : !llvm.i64 + %968 = llvm.mul %50, %967 : !llvm.i64 + %969 = llvm.add %966, %968 : !llvm.i64 + %970 = llvm.mlir.constant(1 : index) : !llvm.i64 + %971 = llvm.mul %913, %970 : !llvm.i64 + %972 = llvm.add %969, %971 : !llvm.i64 + %973 = llvm.getelementptr %965[%972] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %964, %973 : !llvm.ptr + %974 = llvm.add %58, %47 : !llvm.i64 + %975 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %976 = llvm.mlir.constant(0 : index) : !llvm.i64 + %977 = llvm.mlir.constant(128 : index) : !llvm.i64 + %978 = llvm.mul %50, %977 : !llvm.i64 + %979 = llvm.add %976, %978 : !llvm.i64 + %980 = llvm.mlir.constant(1 : index) : !llvm.i64 + %981 = llvm.mul %59, %980 : !llvm.i64 + %982 = llvm.add %979, %981 : !llvm.i64 + %983 = llvm.getelementptr %975[%982] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %984 = llvm.load %983 : !llvm.ptr + %985 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %986 = llvm.mlir.constant(0 : index) : !llvm.i64 + %987 = llvm.mlir.constant(512 : index) : !llvm.i64 + %988 = llvm.mul %59, %987 : !llvm.i64 + %989 = llvm.add %986, %988 : !llvm.i64 + %990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %991 = llvm.mul %974, %990 : !llvm.i64 + %992 = llvm.add %989, %991 : !llvm.i64 + %993 = llvm.getelementptr %985[%992] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %994 = 
llvm.load %993 : !llvm.ptr + %995 = llvm.fmul %984, %994 {RelaxedPrecision} : !llvm.float + %996 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %997 = llvm.mlir.constant(0 : index) : !llvm.i64 + %998 = llvm.mlir.constant(512 : index) : !llvm.i64 + %999 = llvm.mul %50, %998 : !llvm.i64 + %1000 = llvm.add %997, %999 : !llvm.i64 + %1001 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1002 = llvm.mul %974, %1001 : !llvm.i64 + %1003 = llvm.add %1000, %1002 : !llvm.i64 + %1004 = llvm.getelementptr %996[%1003] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1005 = llvm.load %1004 : !llvm.ptr + %1006 = llvm.fadd %1005, %995 {RelaxedPrecision} : !llvm.float + %1007 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1009 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1010 = llvm.mul %50, %1009 : !llvm.i64 + %1011 = llvm.add %1008, %1010 : !llvm.i64 + %1012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1013 = llvm.mul %974, %1012 : !llvm.i64 + %1014 = llvm.add %1011, %1013 : !llvm.i64 + %1015 = llvm.getelementptr %1007[%1014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1006, %1015 : !llvm.ptr + %1016 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1017 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1018 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1019 = llvm.mul %50, %1018 : !llvm.i64 + %1020 = llvm.add %1017, %1019 : !llvm.i64 + %1021 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1022 = llvm.mul %974, %1021 : !llvm.i64 + %1023 = llvm.add %1020, %1022 : !llvm.i64 + %1024 = llvm.getelementptr %1016[%1023] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1025 = llvm.load %1024 : !llvm.ptr + %1026 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1027 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1028 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1029 = llvm.mul %50, %1028 : !llvm.i64 + %1030 = llvm.add %1027, %1029 : !llvm.i64 + %1031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1032 = llvm.mul %974, %1031 : !llvm.i64 + %1033 = llvm.add %1030, %1032 : !llvm.i64 + %1034 = llvm.getelementptr %1026[%1033] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1025, %1034 : !llvm.ptr + %1035 = llvm.add %50, %33 : !llvm.i64 + %1036 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1037 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1038 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1039 = llvm.mul %1035, %1038 : !llvm.i64 + %1040 = llvm.add %1037, %1039 : !llvm.i64 + %1041 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1042 = llvm.mul %59, %1041 : !llvm.i64 + %1043 = llvm.add %1040, %1042 : !llvm.i64 + %1044 = llvm.getelementptr %1036[%1043] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1045 = llvm.load %1044 : !llvm.ptr + %1046 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1048 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1049 = llvm.mul %59, %1048 : !llvm.i64 + %1050 = llvm.add %1047, %1049 : !llvm.i64 + %1051 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1052 = llvm.mul %58, %1051 : !llvm.i64 + %1053 = llvm.add %1050, %1052 : !llvm.i64 + %1054 = llvm.getelementptr %1046[%1053] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1055 = llvm.load %1054 : !llvm.ptr + %1056 = llvm.fmul %1045, %1055 {RelaxedPrecision} : !llvm.float + %1057 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1059 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1060 = llvm.mul %1035, %1059 : !llvm.i64 + %1061 = llvm.add %1058, %1060 : !llvm.i64 + %1062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1063 = llvm.mul %58, %1062 : !llvm.i64 + %1064 = llvm.add %1061, %1063 : !llvm.i64 + %1065 = llvm.getelementptr %1057[%1064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1066 = llvm.load %1065 : !llvm.ptr + %1067 = llvm.fadd %1066, %1056 {RelaxedPrecision} : !llvm.float + %1068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1071 = llvm.mul %1035, %1070 : !llvm.i64 + %1072 = llvm.add %1069, %1071 : !llvm.i64 + %1073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1074 = llvm.mul %58, %1073 : !llvm.i64 + %1075 = llvm.add %1072, %1074 : !llvm.i64 + %1076 = llvm.getelementptr %1068[%1075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1067, %1076 : !llvm.ptr + %1077 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1078 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1079 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1080 = llvm.mul %1035, %1079 : !llvm.i64 + %1081 = llvm.add %1078, %1080 : !llvm.i64 + %1082 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1083 = llvm.mul %58, %1082 : !llvm.i64 + %1084 = llvm.add %1081, %1083 : !llvm.i64 + %1085 = llvm.getelementptr %1077[%1084] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1086 = llvm.load %1085 : !llvm.ptr + %1087 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1088 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1089 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1090 = llvm.mul %1035, %1089 : !llvm.i64 + %1091 = llvm.add %1088, %1090 : !llvm.i64 + %1092 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1093 = llvm.mul %58, %1092 : !llvm.i64 + %1094 = llvm.add %1091, %1093 : !llvm.i64 + %1095 = llvm.getelementptr %1087[%1094] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1086, %1095 : !llvm.ptr + %1096 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1098 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1099 = llvm.mul %1035, %1098 : !llvm.i64 + %1100 = llvm.add %1097, %1099 : !llvm.i64 + %1101 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1102 = llvm.mul %59, %1101 : !llvm.i64 + %1103 = llvm.add %1100, %1102 : !llvm.i64 + %1104 = llvm.getelementptr %1096[%1103] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1105 = llvm.load %1104 : !llvm.ptr + %1106 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1108 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1109 = llvm.mul %59, %1108 : !llvm.i64 + %1110 = llvm.add %1107, %1109 : !llvm.i64 + %1111 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1112 = llvm.mul %120, %1111 : !llvm.i64 + %1113 = llvm.add %1110, %1112 : !llvm.i64 + %1114 = llvm.getelementptr %1106[%1113] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1115 = llvm.load %1114 : !llvm.ptr + %1116 = llvm.fmul %1105, %1115 {RelaxedPrecision} : !llvm.float + %1117 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1118 = llvm.mlir.constant(0 : index) : 
!llvm.i64 + %1119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1120 = llvm.mul %1035, %1119 : !llvm.i64 + %1121 = llvm.add %1118, %1120 : !llvm.i64 + %1122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1123 = llvm.mul %120, %1122 : !llvm.i64 + %1124 = llvm.add %1121, %1123 : !llvm.i64 + %1125 = llvm.getelementptr %1117[%1124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1126 = llvm.load %1125 : !llvm.ptr + %1127 = llvm.fadd %1126, %1116 {RelaxedPrecision} : !llvm.float + %1128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1131 = llvm.mul %1035, %1130 : !llvm.i64 + %1132 = llvm.add %1129, %1131 : !llvm.i64 + %1133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1134 = llvm.mul %120, %1133 : !llvm.i64 + %1135 = llvm.add %1132, %1134 : !llvm.i64 + %1136 = llvm.getelementptr %1128[%1135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1127, %1136 : !llvm.ptr + %1137 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1140 = llvm.mul %1035, %1139 : !llvm.i64 + %1141 = llvm.add %1138, %1140 : !llvm.i64 + %1142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1143 = llvm.mul %120, %1142 : !llvm.i64 + %1144 = llvm.add %1141, %1143 : !llvm.i64 + %1145 = llvm.getelementptr %1137[%1144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1146 = llvm.load %1145 : !llvm.ptr + %1147 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1149 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1150 = llvm.mul %1035, %1149 : !llvm.i64 + %1151 = llvm.add %1148, %1150 : !llvm.i64 + %1152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1153 = llvm.mul %120, %1152 : !llvm.i64 + %1154 = llvm.add %1151, %1153 : !llvm.i64 + %1155 = llvm.getelementptr %1147[%1154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1146, %1155 : !llvm.ptr + %1156 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1157 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1158 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1159 = llvm.mul %1035, %1158 : !llvm.i64 + %1160 = llvm.add %1157, %1159 : !llvm.i64 + %1161 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1162 = llvm.mul %59, %1161 : !llvm.i64 + %1163 = llvm.add %1160, %1162 : !llvm.i64 + %1164 = llvm.getelementptr %1156[%1163] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1165 = llvm.load %1164 : !llvm.ptr + %1166 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1167 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1168 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1169 = llvm.mul %59, %1168 : !llvm.i64 + %1170 = llvm.add %1167, %1169 : !llvm.i64 + %1171 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1172 = llvm.mul %181, %1171 : !llvm.i64 + %1173 = llvm.add %1170, %1172 : !llvm.i64 + %1174 = llvm.getelementptr %1166[%1173] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1175 = llvm.load %1174 : !llvm.ptr + %1176 = llvm.fmul %1165, %1175 {RelaxedPrecision} : !llvm.float + %1177 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1179 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1180 = llvm.mul %1035, %1179 : !llvm.i64 + %1181 = llvm.add 
%1178, %1180 : !llvm.i64 + %1182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1183 = llvm.mul %181, %1182 : !llvm.i64 + %1184 = llvm.add %1181, %1183 : !llvm.i64 + %1185 = llvm.getelementptr %1177[%1184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1186 = llvm.load %1185 : !llvm.ptr + %1187 = llvm.fadd %1186, %1176 {RelaxedPrecision} : !llvm.float + %1188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1191 = llvm.mul %1035, %1190 : !llvm.i64 + %1192 = llvm.add %1189, %1191 : !llvm.i64 + %1193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1194 = llvm.mul %181, %1193 : !llvm.i64 + %1195 = llvm.add %1192, %1194 : !llvm.i64 + %1196 = llvm.getelementptr %1188[%1195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1187, %1196 : !llvm.ptr + %1197 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1198 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1199 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1200 = llvm.mul %1035, %1199 : !llvm.i64 + %1201 = llvm.add %1198, %1200 : !llvm.i64 + %1202 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1203 = llvm.mul %181, %1202 : !llvm.i64 + %1204 = llvm.add %1201, %1203 : !llvm.i64 + %1205 = llvm.getelementptr %1197[%1204] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1206 = llvm.load %1205 : !llvm.ptr + %1207 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1208 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1209 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1210 = llvm.mul %1035, %1209 : !llvm.i64 + %1211 = llvm.add %1208, %1210 : !llvm.i64 + %1212 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1213 = llvm.mul %181, %1212 : !llvm.i64 + %1214 = llvm.add %1211, %1213 : !llvm.i64 + %1215 = llvm.getelementptr %1207[%1214] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1206, %1215 : !llvm.ptr + %1216 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1217 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1218 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1219 = llvm.mul %1035, %1218 : !llvm.i64 + %1220 = llvm.add %1217, %1219 : !llvm.i64 + %1221 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1222 = llvm.mul %59, %1221 : !llvm.i64 + %1223 = llvm.add %1220, %1222 : !llvm.i64 + %1224 = llvm.getelementptr %1216[%1223] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1225 = llvm.load %1224 : !llvm.ptr + %1226 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1227 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1228 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1229 = llvm.mul %59, %1228 : !llvm.i64 + %1230 = llvm.add %1227, %1229 : !llvm.i64 + %1231 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1232 = llvm.mul %242, %1231 : !llvm.i64 + %1233 = llvm.add %1230, %1232 : !llvm.i64 + %1234 = llvm.getelementptr %1226[%1233] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1235 = llvm.load %1234 : !llvm.ptr + %1236 = llvm.fmul %1225, %1235 {RelaxedPrecision} : !llvm.float + %1237 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1239 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1240 = llvm.mul %1035, %1239 : !llvm.i64 + %1241 = llvm.add %1238, %1240 : !llvm.i64 + %1242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1243 = llvm.mul %242, %1242 : !llvm.i64 + 
%1244 = llvm.add %1241, %1243 : !llvm.i64 + %1245 = llvm.getelementptr %1237[%1244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1246 = llvm.load %1245 : !llvm.ptr + %1247 = llvm.fadd %1246, %1236 {RelaxedPrecision} : !llvm.float + %1248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1251 = llvm.mul %1035, %1250 : !llvm.i64 + %1252 = llvm.add %1249, %1251 : !llvm.i64 + %1253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1254 = llvm.mul %242, %1253 : !llvm.i64 + %1255 = llvm.add %1252, %1254 : !llvm.i64 + %1256 = llvm.getelementptr %1248[%1255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1247, %1256 : !llvm.ptr + %1257 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1258 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1259 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1260 = llvm.mul %1035, %1259 : !llvm.i64 + %1261 = llvm.add %1258, %1260 : !llvm.i64 + %1262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1263 = llvm.mul %242, %1262 : !llvm.i64 + %1264 = llvm.add %1261, %1263 : !llvm.i64 + %1265 = llvm.getelementptr %1257[%1264] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1266 = llvm.load %1265 : !llvm.ptr + %1267 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1269 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1270 = llvm.mul %1035, %1269 : !llvm.i64 + %1271 = llvm.add %1268, %1270 : !llvm.i64 + %1272 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1273 = llvm.mul %242, %1272 : !llvm.i64 + %1274 = llvm.add %1271, %1273 : !llvm.i64 + %1275 = llvm.getelementptr %1267[%1274] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1266, %1275 : !llvm.ptr + %1276 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1277 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1278 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1279 = llvm.mul %1035, %1278 : !llvm.i64 + %1280 = llvm.add %1277, %1279 : !llvm.i64 + %1281 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1282 = llvm.mul %59, %1281 : !llvm.i64 + %1283 = llvm.add %1280, %1282 : !llvm.i64 + %1284 = llvm.getelementptr %1276[%1283] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1285 = llvm.load %1284 : !llvm.ptr + %1286 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1288 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1289 = llvm.mul %59, %1288 : !llvm.i64 + %1290 = llvm.add %1287, %1289 : !llvm.i64 + %1291 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1292 = llvm.mul %303, %1291 : !llvm.i64 + %1293 = llvm.add %1290, %1292 : !llvm.i64 + %1294 = llvm.getelementptr %1286[%1293] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1295 = llvm.load %1294 : !llvm.ptr + %1296 = llvm.fmul %1285, %1295 {RelaxedPrecision} : !llvm.float + %1297 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1299 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1300 = llvm.mul %1035, %1299 : !llvm.i64 + %1301 = llvm.add %1298, %1300 : !llvm.i64 + %1302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1303 = llvm.mul %303, %1302 : !llvm.i64 + %1304 = llvm.add %1301, %1303 : !llvm.i64 + %1305 = llvm.getelementptr %1297[%1304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + 
%1306 = llvm.load %1305 : !llvm.ptr + %1307 = llvm.fadd %1306, %1296 {RelaxedPrecision} : !llvm.float + %1308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1311 = llvm.mul %1035, %1310 : !llvm.i64 + %1312 = llvm.add %1309, %1311 : !llvm.i64 + %1313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1314 = llvm.mul %303, %1313 : !llvm.i64 + %1315 = llvm.add %1312, %1314 : !llvm.i64 + %1316 = llvm.getelementptr %1308[%1315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1307, %1316 : !llvm.ptr + %1317 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1318 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1319 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1320 = llvm.mul %1035, %1319 : !llvm.i64 + %1321 = llvm.add %1318, %1320 : !llvm.i64 + %1322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1323 = llvm.mul %303, %1322 : !llvm.i64 + %1324 = llvm.add %1321, %1323 : !llvm.i64 + %1325 = llvm.getelementptr %1317[%1324] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1326 = llvm.load %1325 : !llvm.ptr + %1327 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1328 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1329 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1330 = llvm.mul %1035, %1329 : !llvm.i64 + %1331 = llvm.add %1328, %1330 : !llvm.i64 + %1332 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1333 = llvm.mul %303, %1332 : !llvm.i64 + %1334 = llvm.add %1331, %1333 : !llvm.i64 + %1335 = llvm.getelementptr %1327[%1334] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1326, %1335 : !llvm.ptr + %1336 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1337 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1338 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1339 = llvm.mul %1035, %1338 : !llvm.i64 + %1340 = llvm.add %1337, %1339 : !llvm.i64 + %1341 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1342 = llvm.mul %59, %1341 : !llvm.i64 + %1343 = llvm.add %1340, %1342 : !llvm.i64 + %1344 = llvm.getelementptr %1336[%1343] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1345 = llvm.load %1344 : !llvm.ptr + %1346 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1347 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1348 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1349 = llvm.mul %59, %1348 : !llvm.i64 + %1350 = llvm.add %1347, %1349 : !llvm.i64 + %1351 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1352 = llvm.mul %364, %1351 : !llvm.i64 + %1353 = llvm.add %1350, %1352 : !llvm.i64 + %1354 = llvm.getelementptr %1346[%1353] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1355 = llvm.load %1354 : !llvm.ptr + %1356 = llvm.fmul %1345, %1355 {RelaxedPrecision} : !llvm.float + %1357 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1359 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1360 = llvm.mul %1035, %1359 : !llvm.i64 + %1361 = llvm.add %1358, %1360 : !llvm.i64 + %1362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1363 = llvm.mul %364, %1362 : !llvm.i64 + %1364 = llvm.add %1361, %1363 : !llvm.i64 + %1365 = llvm.getelementptr %1357[%1364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1366 = llvm.load %1365 : !llvm.ptr + %1367 = llvm.fadd %1366, %1356 {RelaxedPrecision} : !llvm.float + %1368 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1371 = llvm.mul %1035, %1370 : !llvm.i64 + %1372 = llvm.add %1369, %1371 : !llvm.i64 + %1373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1374 = llvm.mul %364, %1373 : !llvm.i64 + %1375 = llvm.add %1372, %1374 : !llvm.i64 + %1376 = llvm.getelementptr %1368[%1375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1367, %1376 : !llvm.ptr + %1377 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1378 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1379 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1380 = llvm.mul %1035, %1379 : !llvm.i64 + %1381 = llvm.add %1378, %1380 : !llvm.i64 + %1382 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1383 = llvm.mul %364, %1382 : !llvm.i64 + %1384 = llvm.add %1381, %1383 : !llvm.i64 + %1385 = llvm.getelementptr %1377[%1384] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1386 = llvm.load %1385 : !llvm.ptr + %1387 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1389 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1390 = llvm.mul %1035, %1389 : !llvm.i64 + %1391 = llvm.add %1388, %1390 : !llvm.i64 + %1392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1393 = llvm.mul %364, %1392 : !llvm.i64 + %1394 = llvm.add %1391, %1393 : !llvm.i64 + %1395 = llvm.getelementptr %1387[%1394] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1386, %1395 : !llvm.ptr + %1396 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1397 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1398 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1399 = llvm.mul %1035, %1398 : !llvm.i64 + %1400 = llvm.add %1397, %1399 : !llvm.i64 + %1401 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1402 = llvm.mul %59, %1401 : !llvm.i64 + %1403 = llvm.add %1400, %1402 : !llvm.i64 + %1404 = llvm.getelementptr %1396[%1403] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1405 = llvm.load %1404 : !llvm.ptr + %1406 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1408 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1409 = llvm.mul %59, %1408 : !llvm.i64 + %1410 = llvm.add %1407, %1409 : !llvm.i64 + %1411 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1412 = llvm.mul %425, %1411 : !llvm.i64 + %1413 = llvm.add %1410, %1412 : !llvm.i64 + %1414 = llvm.getelementptr %1406[%1413] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1415 = llvm.load %1414 : !llvm.ptr + %1416 = llvm.fmul %1405, %1415 {RelaxedPrecision} : !llvm.float + %1417 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1419 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1420 = llvm.mul %1035, %1419 : !llvm.i64 + %1421 = llvm.add %1418, %1420 : !llvm.i64 + %1422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1423 = llvm.mul %425, %1422 : !llvm.i64 + %1424 = llvm.add %1421, %1423 : !llvm.i64 + %1425 = llvm.getelementptr %1417[%1424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1426 = llvm.load %1425 : !llvm.ptr + %1427 = llvm.fadd %1426, %1416 {RelaxedPrecision} : !llvm.float + %1428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1429 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %1430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1431 = llvm.mul %1035, %1430 : !llvm.i64 + %1432 = llvm.add %1429, %1431 : !llvm.i64 + %1433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1434 = llvm.mul %425, %1433 : !llvm.i64 + %1435 = llvm.add %1432, %1434 : !llvm.i64 + %1436 = llvm.getelementptr %1428[%1435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1427, %1436 : !llvm.ptr + %1437 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1438 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1439 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1440 = llvm.mul %1035, %1439 : !llvm.i64 + %1441 = llvm.add %1438, %1440 : !llvm.i64 + %1442 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1443 = llvm.mul %425, %1442 : !llvm.i64 + %1444 = llvm.add %1441, %1443 : !llvm.i64 + %1445 = llvm.getelementptr %1437[%1444] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1446 = llvm.load %1445 : !llvm.ptr + %1447 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1448 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1449 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1450 = llvm.mul %1035, %1449 : !llvm.i64 + %1451 = llvm.add %1448, %1450 : !llvm.i64 + %1452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1453 = llvm.mul %425, %1452 : !llvm.i64 + %1454 = llvm.add %1451, %1453 : !llvm.i64 + %1455 = llvm.getelementptr %1447[%1454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1446, %1455 : !llvm.ptr + %1456 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1457 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1458 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1459 = llvm.mul %1035, %1458 : !llvm.i64 + %1460 = llvm.add %1457, %1459 : !llvm.i64 + %1461 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1462 = llvm.mul %59, %1461 : !llvm.i64 + %1463 = llvm.add %1460, %1462 : !llvm.i64 + %1464 = llvm.getelementptr %1456[%1463] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1465 = llvm.load %1464 : !llvm.ptr + %1466 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1467 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1468 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1469 = llvm.mul %59, %1468 : !llvm.i64 + %1470 = llvm.add %1467, %1469 : !llvm.i64 + %1471 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1472 = llvm.mul %486, %1471 : !llvm.i64 + %1473 = llvm.add %1470, %1472 : !llvm.i64 + %1474 = llvm.getelementptr %1466[%1473] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1475 = llvm.load %1474 : !llvm.ptr + %1476 = llvm.fmul %1465, %1475 {RelaxedPrecision} : !llvm.float + %1477 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1480 = llvm.mul %1035, %1479 : !llvm.i64 + %1481 = llvm.add %1478, %1480 : !llvm.i64 + %1482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1483 = llvm.mul %486, %1482 : !llvm.i64 + %1484 = llvm.add %1481, %1483 : !llvm.i64 + %1485 = llvm.getelementptr %1477[%1484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1486 = llvm.load %1485 : !llvm.ptr + %1487 = llvm.fadd %1486, %1476 {RelaxedPrecision} : !llvm.float + %1488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1491 = llvm.mul %1035, %1490 : !llvm.i64 + %1492 = 
llvm.add %1489, %1491 : !llvm.i64 + %1493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1494 = llvm.mul %486, %1493 : !llvm.i64 + %1495 = llvm.add %1492, %1494 : !llvm.i64 + %1496 = llvm.getelementptr %1488[%1495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1487, %1496 : !llvm.ptr + %1497 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1500 = llvm.mul %1035, %1499 : !llvm.i64 + %1501 = llvm.add %1498, %1500 : !llvm.i64 + %1502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1503 = llvm.mul %486, %1502 : !llvm.i64 + %1504 = llvm.add %1501, %1503 : !llvm.i64 + %1505 = llvm.getelementptr %1497[%1504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1506 = llvm.load %1505 : !llvm.ptr + %1507 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1508 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1509 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1510 = llvm.mul %1035, %1509 : !llvm.i64 + %1511 = llvm.add %1508, %1510 : !llvm.i64 + %1512 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1513 = llvm.mul %486, %1512 : !llvm.i64 + %1514 = llvm.add %1511, %1513 : !llvm.i64 + %1515 = llvm.getelementptr %1507[%1514] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1506, %1515 : !llvm.ptr + %1516 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1518 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1519 = llvm.mul %1035, %1518 : !llvm.i64 + %1520 = llvm.add %1517, %1519 : !llvm.i64 + %1521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1522 = llvm.mul %59, %1521 : !llvm.i64 + %1523 = llvm.add %1520, %1522 : !llvm.i64 + %1524 = llvm.getelementptr %1516[%1523] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1525 = llvm.load %1524 : !llvm.ptr + %1526 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1529 = llvm.mul %59, %1528 : !llvm.i64 + %1530 = llvm.add %1527, %1529 : !llvm.i64 + %1531 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1532 = llvm.mul %547, %1531 : !llvm.i64 + %1533 = llvm.add %1530, %1532 : !llvm.i64 + %1534 = llvm.getelementptr %1526[%1533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1535 = llvm.load %1534 : !llvm.ptr + %1536 = llvm.fmul %1525, %1535 {RelaxedPrecision} : !llvm.float + %1537 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1539 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1540 = llvm.mul %1035, %1539 : !llvm.i64 + %1541 = llvm.add %1538, %1540 : !llvm.i64 + %1542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1543 = llvm.mul %547, %1542 : !llvm.i64 + %1544 = llvm.add %1541, %1543 : !llvm.i64 + %1545 = llvm.getelementptr %1537[%1544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1546 = llvm.load %1545 : !llvm.ptr + %1547 = llvm.fadd %1546, %1536 {RelaxedPrecision} : !llvm.float + %1548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1551 = llvm.mul %1035, %1550 : !llvm.i64 + %1552 = llvm.add %1549, %1551 : !llvm.i64 + %1553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1554 = llvm.mul %547, %1553 : 
!llvm.i64 + %1555 = llvm.add %1552, %1554 : !llvm.i64 + %1556 = llvm.getelementptr %1548[%1555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1547, %1556 : !llvm.ptr + %1557 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1558 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1559 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1560 = llvm.mul %1035, %1559 : !llvm.i64 + %1561 = llvm.add %1558, %1560 : !llvm.i64 + %1562 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1563 = llvm.mul %547, %1562 : !llvm.i64 + %1564 = llvm.add %1561, %1563 : !llvm.i64 + %1565 = llvm.getelementptr %1557[%1564] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1566 = llvm.load %1565 : !llvm.ptr + %1567 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1568 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1569 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1570 = llvm.mul %1035, %1569 : !llvm.i64 + %1571 = llvm.add %1568, %1570 : !llvm.i64 + %1572 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1573 = llvm.mul %547, %1572 : !llvm.i64 + %1574 = llvm.add %1571, %1573 : !llvm.i64 + %1575 = llvm.getelementptr %1567[%1574] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1566, %1575 : !llvm.ptr + %1576 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1577 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1578 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1579 = llvm.mul %1035, %1578 : !llvm.i64 + %1580 = llvm.add %1577, %1579 : !llvm.i64 + %1581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1582 = llvm.mul %59, %1581 : !llvm.i64 + %1583 = llvm.add %1580, %1582 : !llvm.i64 + %1584 = llvm.getelementptr %1576[%1583] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1585 = llvm.load %1584 : !llvm.ptr + %1586 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1587 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1588 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1589 = llvm.mul %59, %1588 : !llvm.i64 + %1590 = llvm.add %1587, %1589 : !llvm.i64 + %1591 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1592 = llvm.mul %608, %1591 : !llvm.i64 + %1593 = llvm.add %1590, %1592 : !llvm.i64 + %1594 = llvm.getelementptr %1586[%1593] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1595 = llvm.load %1594 : !llvm.ptr + %1596 = llvm.fmul %1585, %1595 {RelaxedPrecision} : !llvm.float + %1597 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1599 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1600 = llvm.mul %1035, %1599 : !llvm.i64 + %1601 = llvm.add %1598, %1600 : !llvm.i64 + %1602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1603 = llvm.mul %608, %1602 : !llvm.i64 + %1604 = llvm.add %1601, %1603 : !llvm.i64 + %1605 = llvm.getelementptr %1597[%1604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1606 = llvm.load %1605 : !llvm.ptr + %1607 = llvm.fadd %1606, %1596 {RelaxedPrecision} : !llvm.float + %1608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1611 = llvm.mul %1035, %1610 : !llvm.i64 + %1612 = llvm.add %1609, %1611 : !llvm.i64 + %1613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1614 = llvm.mul %608, %1613 : !llvm.i64 + %1615 = llvm.add %1612, %1614 : !llvm.i64 + %1616 = llvm.getelementptr %1608[%1615] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + llvm.store %1607, %1616 : !llvm.ptr + %1617 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1618 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1619 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1620 = llvm.mul %1035, %1619 : !llvm.i64 + %1621 = llvm.add %1618, %1620 : !llvm.i64 + %1622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1623 = llvm.mul %608, %1622 : !llvm.i64 + %1624 = llvm.add %1621, %1623 : !llvm.i64 + %1625 = llvm.getelementptr %1617[%1624] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1626 = llvm.load %1625 : !llvm.ptr + %1627 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1628 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1629 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1630 = llvm.mul %1035, %1629 : !llvm.i64 + %1631 = llvm.add %1628, %1630 : !llvm.i64 + %1632 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1633 = llvm.mul %608, %1632 : !llvm.i64 + %1634 = llvm.add %1631, %1633 : !llvm.i64 + %1635 = llvm.getelementptr %1627[%1634] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1626, %1635 : !llvm.ptr + %1636 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1637 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1638 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1639 = llvm.mul %1035, %1638 : !llvm.i64 + %1640 = llvm.add %1637, %1639 : !llvm.i64 + %1641 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1642 = llvm.mul %59, %1641 : !llvm.i64 + %1643 = llvm.add %1640, %1642 : !llvm.i64 + %1644 = llvm.getelementptr %1636[%1643] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1645 = llvm.load %1644 : !llvm.ptr + %1646 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1647 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1648 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1649 = llvm.mul %59, %1648 : !llvm.i64 + %1650 = llvm.add %1647, %1649 : !llvm.i64 + %1651 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1652 = llvm.mul %669, %1651 : !llvm.i64 + %1653 = llvm.add %1650, %1652 : !llvm.i64 + %1654 = llvm.getelementptr %1646[%1653] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1655 = llvm.load %1654 : !llvm.ptr + %1656 = llvm.fmul %1645, %1655 {RelaxedPrecision} : !llvm.float + %1657 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1659 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1660 = llvm.mul %1035, %1659 : !llvm.i64 + %1661 = llvm.add %1658, %1660 : !llvm.i64 + %1662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1663 = llvm.mul %669, %1662 : !llvm.i64 + %1664 = llvm.add %1661, %1663 : !llvm.i64 + %1665 = llvm.getelementptr %1657[%1664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1666 = llvm.load %1665 : !llvm.ptr + %1667 = llvm.fadd %1666, %1656 {RelaxedPrecision} : !llvm.float + %1668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1671 = llvm.mul %1035, %1670 : !llvm.i64 + %1672 = llvm.add %1669, %1671 : !llvm.i64 + %1673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1674 = llvm.mul %669, %1673 : !llvm.i64 + %1675 = llvm.add %1672, %1674 : !llvm.i64 + %1676 = llvm.getelementptr %1668[%1675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1667, %1676 : !llvm.ptr + %1677 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %1678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1680 = llvm.mul %1035, %1679 : !llvm.i64 + %1681 = llvm.add %1678, %1680 : !llvm.i64 + %1682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1683 = llvm.mul %669, %1682 : !llvm.i64 + %1684 = llvm.add %1681, %1683 : !llvm.i64 + %1685 = llvm.getelementptr %1677[%1684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1686 = llvm.load %1685 : !llvm.ptr + %1687 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1688 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1689 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1690 = llvm.mul %1035, %1689 : !llvm.i64 + %1691 = llvm.add %1688, %1690 : !llvm.i64 + %1692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1693 = llvm.mul %669, %1692 : !llvm.i64 + %1694 = llvm.add %1691, %1693 : !llvm.i64 + %1695 = llvm.getelementptr %1687[%1694] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1686, %1695 : !llvm.ptr + %1696 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1697 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1698 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1699 = llvm.mul %1035, %1698 : !llvm.i64 + %1700 = llvm.add %1697, %1699 : !llvm.i64 + %1701 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1702 = llvm.mul %59, %1701 : !llvm.i64 + %1703 = llvm.add %1700, %1702 : !llvm.i64 + %1704 = llvm.getelementptr %1696[%1703] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1705 = llvm.load %1704 : !llvm.ptr + %1706 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1708 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1709 = llvm.mul %59, %1708 : !llvm.i64 + %1710 = llvm.add %1707, %1709 : !llvm.i64 + %1711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1712 = llvm.mul %730, %1711 : !llvm.i64 + %1713 = llvm.add %1710, %1712 : !llvm.i64 + %1714 = llvm.getelementptr %1706[%1713] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1715 = llvm.load %1714 : !llvm.ptr + %1716 = llvm.fmul %1705, %1715 {RelaxedPrecision} : !llvm.float + %1717 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1718 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1719 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1720 = llvm.mul %1035, %1719 : !llvm.i64 + %1721 = llvm.add %1718, %1720 : !llvm.i64 + %1722 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1723 = llvm.mul %730, %1722 : !llvm.i64 + %1724 = llvm.add %1721, %1723 : !llvm.i64 + %1725 = llvm.getelementptr %1717[%1724] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1726 = llvm.load %1725 : !llvm.ptr + %1727 = llvm.fadd %1726, %1716 {RelaxedPrecision} : !llvm.float + %1728 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1729 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1730 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1731 = llvm.mul %1035, %1730 : !llvm.i64 + %1732 = llvm.add %1729, %1731 : !llvm.i64 + %1733 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1734 = llvm.mul %730, %1733 : !llvm.i64 + %1735 = llvm.add %1732, %1734 : !llvm.i64 + %1736 = llvm.getelementptr %1728[%1735] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1727, %1736 : !llvm.ptr + %1737 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1738 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1739 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %1740 = llvm.mul %1035, %1739 : !llvm.i64 + %1741 = llvm.add %1738, %1740 : !llvm.i64 + %1742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1743 = llvm.mul %730, %1742 : !llvm.i64 + %1744 = llvm.add %1741, %1743 : !llvm.i64 + %1745 = llvm.getelementptr %1737[%1744] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1746 = llvm.load %1745 : !llvm.ptr + %1747 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1749 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1750 = llvm.mul %1035, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %730, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1747[%1754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1746, %1755 : !llvm.ptr + %1756 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1759 = llvm.mul %1035, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %59, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1765 = llvm.load %1764 : !llvm.ptr + %1766 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1767 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1768 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1769 = llvm.mul %59, %1768 : !llvm.i64 + %1770 = llvm.add %1767, %1769 : !llvm.i64 + %1771 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1772 = llvm.mul %791, %1771 : !llvm.i64 + %1773 = llvm.add %1770, %1772 : !llvm.i64 + %1774 = llvm.getelementptr %1766[%1773] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1775 = llvm.load %1774 : !llvm.ptr + %1776 = llvm.fmul %1765, %1775 {RelaxedPrecision} : !llvm.float + %1777 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1779 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1780 = llvm.mul %1035, %1779 : !llvm.i64 + %1781 = llvm.add %1778, %1780 : !llvm.i64 + %1782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1783 = llvm.mul %791, %1782 : !llvm.i64 + %1784 = llvm.add %1781, %1783 : !llvm.i64 + %1785 = llvm.getelementptr %1777[%1784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1786 = llvm.load %1785 : !llvm.ptr + %1787 = llvm.fadd %1786, %1776 {RelaxedPrecision} : !llvm.float + %1788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1791 = llvm.mul %1035, %1790 : !llvm.i64 + %1792 = llvm.add %1789, %1791 : !llvm.i64 + %1793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1794 = llvm.mul %791, %1793 : !llvm.i64 + %1795 = llvm.add %1792, %1794 : !llvm.i64 + %1796 = llvm.getelementptr %1788[%1795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1787, %1796 : !llvm.ptr + %1797 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1799 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1800 = llvm.mul %1035, %1799 : !llvm.i64 + %1801 = llvm.add %1798, %1800 : !llvm.i64 + %1802 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %1803 = llvm.mul %791, %1802 : !llvm.i64 + %1804 = llvm.add %1801, %1803 : !llvm.i64 + %1805 = llvm.getelementptr %1797[%1804] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1806 = llvm.load %1805 : !llvm.ptr + %1807 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1808 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1809 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1810 = llvm.mul %1035, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1813 = llvm.mul %791, %1812 : !llvm.i64 + %1814 = llvm.add %1811, %1813 : !llvm.i64 + %1815 = llvm.getelementptr %1807[%1814] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1806, %1815 : !llvm.ptr + %1816 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1817 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1818 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1819 = llvm.mul %1035, %1818 : !llvm.i64 + %1820 = llvm.add %1817, %1819 : !llvm.i64 + %1821 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1822 = llvm.mul %59, %1821 : !llvm.i64 + %1823 = llvm.add %1820, %1822 : !llvm.i64 + %1824 = llvm.getelementptr %1816[%1823] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1825 = llvm.load %1824 : !llvm.ptr + %1826 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1828 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1829 = llvm.mul %59, %1828 : !llvm.i64 + %1830 = llvm.add %1827, %1829 : !llvm.i64 + %1831 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1832 = llvm.mul %852, %1831 : !llvm.i64 + %1833 = llvm.add %1830, %1832 : !llvm.i64 + %1834 = llvm.getelementptr %1826[%1833] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1835 = llvm.load %1834 : !llvm.ptr + %1836 = llvm.fmul %1825, %1835 {RelaxedPrecision} : !llvm.float + %1837 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1839 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1840 = llvm.mul %1035, %1839 : !llvm.i64 + %1841 = llvm.add %1838, %1840 : !llvm.i64 + %1842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1843 = llvm.mul %852, %1842 : !llvm.i64 + %1844 = llvm.add %1841, %1843 : !llvm.i64 + %1845 = llvm.getelementptr %1837[%1844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1846 = llvm.load %1845 : !llvm.ptr + %1847 = llvm.fadd %1846, %1836 {RelaxedPrecision} : !llvm.float + %1848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1851 = llvm.mul %1035, %1850 : !llvm.i64 + %1852 = llvm.add %1849, %1851 : !llvm.i64 + %1853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1854 = llvm.mul %852, %1853 : !llvm.i64 + %1855 = llvm.add %1852, %1854 : !llvm.i64 + %1856 = llvm.getelementptr %1848[%1855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1847, %1856 : !llvm.ptr + %1857 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1858 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1859 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1860 = llvm.mul %1035, %1859 : !llvm.i64 + %1861 = llvm.add %1858, %1860 : !llvm.i64 + %1862 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1863 = llvm.mul %852, %1862 : !llvm.i64 + %1864 = llvm.add %1861, %1863 : 
!llvm.i64 + %1865 = llvm.getelementptr %1857[%1864] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1866 = llvm.load %1865 : !llvm.ptr + %1867 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1868 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1869 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1870 = llvm.mul %1035, %1869 : !llvm.i64 + %1871 = llvm.add %1868, %1870 : !llvm.i64 + %1872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1873 = llvm.mul %852, %1872 : !llvm.i64 + %1874 = llvm.add %1871, %1873 : !llvm.i64 + %1875 = llvm.getelementptr %1867[%1874] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1866, %1875 : !llvm.ptr + %1876 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1877 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1878 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1879 = llvm.mul %1035, %1878 : !llvm.i64 + %1880 = llvm.add %1877, %1879 : !llvm.i64 + %1881 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1882 = llvm.mul %59, %1881 : !llvm.i64 + %1883 = llvm.add %1880, %1882 : !llvm.i64 + %1884 = llvm.getelementptr %1876[%1883] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1885 = llvm.load %1884 : !llvm.ptr + %1886 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1888 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1889 = llvm.mul %59, %1888 : !llvm.i64 + %1890 = llvm.add %1887, %1889 : !llvm.i64 + %1891 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1892 = llvm.mul %913, %1891 : !llvm.i64 + %1893 = llvm.add %1890, %1892 : !llvm.i64 + %1894 = llvm.getelementptr %1886[%1893] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1895 = llvm.load %1894 : !llvm.ptr + %1896 = llvm.fmul %1885, %1895 {RelaxedPrecision} : !llvm.float + %1897 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1899 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1900 = llvm.mul %1035, %1899 : !llvm.i64 + %1901 = llvm.add %1898, %1900 : !llvm.i64 + %1902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1903 = llvm.mul %913, %1902 : !llvm.i64 + %1904 = llvm.add %1901, %1903 : !llvm.i64 + %1905 = llvm.getelementptr %1897[%1904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1906 = llvm.load %1905 : !llvm.ptr + %1907 = llvm.fadd %1906, %1896 {RelaxedPrecision} : !llvm.float + %1908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1911 = llvm.mul %1035, %1910 : !llvm.i64 + %1912 = llvm.add %1909, %1911 : !llvm.i64 + %1913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1914 = llvm.mul %913, %1913 : !llvm.i64 + %1915 = llvm.add %1912, %1914 : !llvm.i64 + %1916 = llvm.getelementptr %1908[%1915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1907, %1916 : !llvm.ptr + %1917 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1918 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1919 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1920 = llvm.mul %1035, %1919 : !llvm.i64 + %1921 = llvm.add %1918, %1920 : !llvm.i64 + %1922 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1923 = llvm.mul %913, %1922 : !llvm.i64 + %1924 = llvm.add %1921, %1923 : !llvm.i64 + %1925 = llvm.getelementptr %1917[%1924] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1926 = llvm.load %1925 : 
!llvm.ptr + %1927 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1928 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1929 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1930 = llvm.mul %1035, %1929 : !llvm.i64 + %1931 = llvm.add %1928, %1930 : !llvm.i64 + %1932 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1933 = llvm.mul %913, %1932 : !llvm.i64 + %1934 = llvm.add %1931, %1933 : !llvm.i64 + %1935 = llvm.getelementptr %1927[%1934] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1926, %1935 : !llvm.ptr + %1936 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1937 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1938 = llvm.mlir.constant(128 : index) : !llvm.i64 + %1939 = llvm.mul %1035, %1938 : !llvm.i64 + %1940 = llvm.add %1937, %1939 : !llvm.i64 + %1941 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1942 = llvm.mul %59, %1941 : !llvm.i64 + %1943 = llvm.add %1940, %1942 : !llvm.i64 + %1944 = llvm.getelementptr %1936[%1943] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1945 = llvm.load %1944 : !llvm.ptr + %1946 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1949 = llvm.mul %59, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1952 = llvm.mul %974, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.getelementptr %1946[%1953] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1955 = llvm.load %1954 : !llvm.ptr + %1956 = llvm.fmul %1945, %1955 {RelaxedPrecision} : !llvm.float + %1957 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1958 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1959 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1960 = llvm.mul %1035, %1959 : !llvm.i64 + %1961 = llvm.add %1958, %1960 : !llvm.i64 + %1962 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1963 = llvm.mul %974, %1962 : !llvm.i64 + %1964 = llvm.add %1961, %1963 : !llvm.i64 + %1965 = llvm.getelementptr %1957[%1964] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1966 = llvm.load %1965 : !llvm.ptr + %1967 = llvm.fadd %1966, %1956 {RelaxedPrecision} : !llvm.float + %1968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1971 = llvm.mul %1035, %1970 : !llvm.i64 + %1972 = llvm.add %1969, %1971 : !llvm.i64 + %1973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1974 = llvm.mul %974, %1973 : !llvm.i64 + %1975 = llvm.add %1972, %1974 : !llvm.i64 + %1976 = llvm.getelementptr %1968[%1975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1967, %1976 : !llvm.ptr + %1977 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1978 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1979 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1980 = llvm.mul %1035, %1979 : !llvm.i64 + %1981 = llvm.add %1978, %1980 : !llvm.i64 + %1982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1983 = llvm.mul %974, %1982 : !llvm.i64 + %1984 = llvm.add %1981, %1983 : !llvm.i64 + %1985 = llvm.getelementptr %1977[%1984] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1986 = llvm.load %1985 : !llvm.ptr + %1987 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1988 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %1989 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1990 = llvm.mul %1035, %1989 : !llvm.i64 + %1991 = llvm.add %1988, %1990 : !llvm.i64 + %1992 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1993 = llvm.mul %974, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.getelementptr %1987[%1994] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %1986, %1995 : !llvm.ptr + %1996 = llvm.add %50, %34 : !llvm.i64 + %1997 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1998 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1999 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2000 = llvm.mul %1996, %1999 : !llvm.i64 + %2001 = llvm.add %1998, %2000 : !llvm.i64 + %2002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2003 = llvm.mul %59, %2002 : !llvm.i64 + %2004 = llvm.add %2001, %2003 : !llvm.i64 + %2005 = llvm.getelementptr %1997[%2004] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2006 = llvm.load %2005 : !llvm.ptr + %2007 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2009 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2010 = llvm.mul %59, %2009 : !llvm.i64 + %2011 = llvm.add %2008, %2010 : !llvm.i64 + %2012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2013 = llvm.mul %58, %2012 : !llvm.i64 + %2014 = llvm.add %2011, %2013 : !llvm.i64 + %2015 = llvm.getelementptr %2007[%2014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2016 = llvm.load %2015 : !llvm.ptr + %2017 = llvm.fmul %2006, %2016 {RelaxedPrecision} : !llvm.float + %2018 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2020 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2021 = llvm.mul %1996, %2020 : !llvm.i64 + %2022 = llvm.add %2019, %2021 : !llvm.i64 + %2023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2024 = llvm.mul %58, %2023 : !llvm.i64 + %2025 = llvm.add %2022, %2024 : !llvm.i64 + %2026 = llvm.getelementptr %2018[%2025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2027 = llvm.load %2026 : !llvm.ptr + %2028 = llvm.fadd %2027, %2017 {RelaxedPrecision} : !llvm.float + %2029 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2030 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2031 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2032 = llvm.mul %1996, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2035 = llvm.mul %58, %2034 : !llvm.i64 + %2036 = llvm.add %2033, %2035 : !llvm.i64 + %2037 = llvm.getelementptr %2029[%2036] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2028, %2037 : !llvm.ptr + %2038 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2039 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2040 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2041 = llvm.mul %1996, %2040 : !llvm.i64 + %2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2044 = llvm.mul %58, %2043 : !llvm.i64 + %2045 = llvm.add %2042, %2044 : !llvm.i64 + %2046 = llvm.getelementptr %2038[%2045] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2047 = llvm.load %2046 : !llvm.ptr + %2048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2050 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %2051 = llvm.mul %1996, %2050 : !llvm.i64 + %2052 = llvm.add %2049, %2051 : !llvm.i64 + %2053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2054 = llvm.mul %58, %2053 : !llvm.i64 + %2055 = llvm.add %2052, %2054 : !llvm.i64 + %2056 = llvm.getelementptr %2048[%2055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2047, %2056 : !llvm.ptr + %2057 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2059 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2060 = llvm.mul %1996, %2059 : !llvm.i64 + %2061 = llvm.add %2058, %2060 : !llvm.i64 + %2062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2063 = llvm.mul %59, %2062 : !llvm.i64 + %2064 = llvm.add %2061, %2063 : !llvm.i64 + %2065 = llvm.getelementptr %2057[%2064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2066 = llvm.load %2065 : !llvm.ptr + %2067 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2070 = llvm.mul %59, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %120, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2076 = llvm.load %2075 : !llvm.ptr + %2077 = llvm.fmul %2066, %2076 {RelaxedPrecision} : !llvm.float + %2078 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2080 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2081 = llvm.mul %1996, %2080 : !llvm.i64 + %2082 = llvm.add %2079, %2081 : !llvm.i64 + %2083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2084 = llvm.mul %120, %2083 : !llvm.i64 + %2085 = llvm.add %2082, %2084 : !llvm.i64 + %2086 = llvm.getelementptr %2078[%2085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2087 = llvm.load %2086 : !llvm.ptr + %2088 = llvm.fadd %2087, %2077 {RelaxedPrecision} : !llvm.float + %2089 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2090 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2091 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2092 = llvm.mul %1996, %2091 : !llvm.i64 + %2093 = llvm.add %2090, %2092 : !llvm.i64 + %2094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2095 = llvm.mul %120, %2094 : !llvm.i64 + %2096 = llvm.add %2093, %2095 : !llvm.i64 + %2097 = llvm.getelementptr %2089[%2096] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2088, %2097 : !llvm.ptr + %2098 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2100 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2101 = llvm.mul %1996, %2100 : !llvm.i64 + %2102 = llvm.add %2099, %2101 : !llvm.i64 + %2103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2104 = llvm.mul %120, %2103 : !llvm.i64 + %2105 = llvm.add %2102, %2104 : !llvm.i64 + %2106 = llvm.getelementptr %2098[%2105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2107 = llvm.load %2106 : !llvm.ptr + %2108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2111 = llvm.mul %1996, %2110 : !llvm.i64 + %2112 = llvm.add %2109, %2111 : !llvm.i64 + %2113 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %2114 = llvm.mul %120, %2113 : !llvm.i64 + %2115 = llvm.add %2112, %2114 : !llvm.i64 + %2116 = llvm.getelementptr %2108[%2115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2107, %2116 : !llvm.ptr + %2117 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2119 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2120 = llvm.mul %1996, %2119 : !llvm.i64 + %2121 = llvm.add %2118, %2120 : !llvm.i64 + %2122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2123 = llvm.mul %59, %2122 : !llvm.i64 + %2124 = llvm.add %2121, %2123 : !llvm.i64 + %2125 = llvm.getelementptr %2117[%2124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2126 = llvm.load %2125 : !llvm.ptr + %2127 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2128 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2129 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2130 = llvm.mul %59, %2129 : !llvm.i64 + %2131 = llvm.add %2128, %2130 : !llvm.i64 + %2132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2133 = llvm.mul %181, %2132 : !llvm.i64 + %2134 = llvm.add %2131, %2133 : !llvm.i64 + %2135 = llvm.getelementptr %2127[%2134] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2136 = llvm.load %2135 : !llvm.ptr + %2137 = llvm.fmul %2126, %2136 {RelaxedPrecision} : !llvm.float + %2138 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2140 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2141 = llvm.mul %1996, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2144 = llvm.mul %181, %2143 : !llvm.i64 + %2145 = llvm.add %2142, %2144 : !llvm.i64 + %2146 = llvm.getelementptr %2138[%2145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2147 = llvm.load %2146 : !llvm.ptr + %2148 = llvm.fadd %2147, %2137 {RelaxedPrecision} : !llvm.float + %2149 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2150 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2151 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2152 = llvm.mul %1996, %2151 : !llvm.i64 + %2153 = llvm.add %2150, %2152 : !llvm.i64 + %2154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2155 = llvm.mul %181, %2154 : !llvm.i64 + %2156 = llvm.add %2153, %2155 : !llvm.i64 + %2157 = llvm.getelementptr %2149[%2156] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2148, %2157 : !llvm.ptr + %2158 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2161 = llvm.mul %1996, %2160 : !llvm.i64 + %2162 = llvm.add %2159, %2161 : !llvm.i64 + %2163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2164 = llvm.mul %181, %2163 : !llvm.i64 + %2165 = llvm.add %2162, %2164 : !llvm.i64 + %2166 = llvm.getelementptr %2158[%2165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2167 = llvm.load %2166 : !llvm.ptr + %2168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2171 = llvm.mul %1996, %2170 : !llvm.i64 + %2172 = llvm.add %2169, %2171 : !llvm.i64 + %2173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2174 = llvm.mul %181, %2173 : !llvm.i64 + %2175 = llvm.add %2172, %2174 : 
!llvm.i64 + %2176 = llvm.getelementptr %2168[%2175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2167, %2176 : !llvm.ptr + %2177 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2180 = llvm.mul %1996, %2179 : !llvm.i64 + %2181 = llvm.add %2178, %2180 : !llvm.i64 + %2182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2183 = llvm.mul %59, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.getelementptr %2177[%2184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2186 = llvm.load %2185 : !llvm.ptr + %2187 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2188 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2189 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2190 = llvm.mul %59, %2189 : !llvm.i64 + %2191 = llvm.add %2188, %2190 : !llvm.i64 + %2192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2193 = llvm.mul %242, %2192 : !llvm.i64 + %2194 = llvm.add %2191, %2193 : !llvm.i64 + %2195 = llvm.getelementptr %2187[%2194] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2196 = llvm.load %2195 : !llvm.ptr + %2197 = llvm.fmul %2186, %2196 {RelaxedPrecision} : !llvm.float + %2198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2201 = llvm.mul %1996, %2200 : !llvm.i64 + %2202 = llvm.add %2199, %2201 : !llvm.i64 + %2203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2204 = llvm.mul %242, %2203 : !llvm.i64 + %2205 = llvm.add %2202, %2204 : !llvm.i64 + %2206 = llvm.getelementptr %2198[%2205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2207 = llvm.load %2206 : !llvm.ptr + %2208 = llvm.fadd %2207, %2197 {RelaxedPrecision} : !llvm.float + %2209 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2211 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2212 = llvm.mul %1996, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2215 = llvm.mul %242, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : !llvm.i64 + %2217 = llvm.getelementptr %2209[%2216] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2208, %2217 : !llvm.ptr + %2218 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2220 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2221 = llvm.mul %1996, %2220 : !llvm.i64 + %2222 = llvm.add %2219, %2221 : !llvm.i64 + %2223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2224 = llvm.mul %242, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.getelementptr %2218[%2225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2227 = llvm.load %2226 : !llvm.ptr + %2228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2231 = llvm.mul %1996, %2230 : !llvm.i64 + %2232 = llvm.add %2229, %2231 : !llvm.i64 + %2233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2234 = llvm.mul %242, %2233 : !llvm.i64 + %2235 = llvm.add %2232, %2234 : !llvm.i64 + %2236 = llvm.getelementptr %2228[%2235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2227, %2236 : 
!llvm.ptr + %2237 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2239 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2240 = llvm.mul %1996, %2239 : !llvm.i64 + %2241 = llvm.add %2238, %2240 : !llvm.i64 + %2242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2243 = llvm.mul %59, %2242 : !llvm.i64 + %2244 = llvm.add %2241, %2243 : !llvm.i64 + %2245 = llvm.getelementptr %2237[%2244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2246 = llvm.load %2245 : !llvm.ptr + %2247 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2248 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2249 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2250 = llvm.mul %59, %2249 : !llvm.i64 + %2251 = llvm.add %2248, %2250 : !llvm.i64 + %2252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2253 = llvm.mul %303, %2252 : !llvm.i64 + %2254 = llvm.add %2251, %2253 : !llvm.i64 + %2255 = llvm.getelementptr %2247[%2254] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2256 = llvm.load %2255 : !llvm.ptr + %2257 = llvm.fmul %2246, %2256 {RelaxedPrecision} : !llvm.float + %2258 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2260 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2261 = llvm.mul %1996, %2260 : !llvm.i64 + %2262 = llvm.add %2259, %2261 : !llvm.i64 + %2263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2264 = llvm.mul %303, %2263 : !llvm.i64 + %2265 = llvm.add %2262, %2264 : !llvm.i64 + %2266 = llvm.getelementptr %2258[%2265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2267 = llvm.load %2266 : !llvm.ptr + %2268 = llvm.fadd %2267, %2257 {RelaxedPrecision} : !llvm.float + %2269 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2270 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2271 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2272 = llvm.mul %1996, %2271 : !llvm.i64 + %2273 = llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2275 = llvm.mul %303, %2274 : !llvm.i64 + %2276 = llvm.add %2273, %2275 : !llvm.i64 + %2277 = llvm.getelementptr %2269[%2276] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2268, %2277 : !llvm.ptr + %2278 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2279 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2280 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2281 = llvm.mul %1996, %2280 : !llvm.i64 + %2282 = llvm.add %2279, %2281 : !llvm.i64 + %2283 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2284 = llvm.mul %303, %2283 : !llvm.i64 + %2285 = llvm.add %2282, %2284 : !llvm.i64 + %2286 = llvm.getelementptr %2278[%2285] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2287 = llvm.load %2286 : !llvm.ptr + %2288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2291 = llvm.mul %1996, %2290 : !llvm.i64 + %2292 = llvm.add %2289, %2291 : !llvm.i64 + %2293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2294 = llvm.mul %303, %2293 : !llvm.i64 + %2295 = llvm.add %2292, %2294 : !llvm.i64 + %2296 = llvm.getelementptr %2288[%2295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2287, %2296 : !llvm.ptr + %2297 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2298 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %2299 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2300 = llvm.mul %1996, %2299 : !llvm.i64 + %2301 = llvm.add %2298, %2300 : !llvm.i64 + %2302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2303 = llvm.mul %59, %2302 : !llvm.i64 + %2304 = llvm.add %2301, %2303 : !llvm.i64 + %2305 = llvm.getelementptr %2297[%2304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2306 = llvm.load %2305 : !llvm.ptr + %2307 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2308 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2309 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2310 = llvm.mul %59, %2309 : !llvm.i64 + %2311 = llvm.add %2308, %2310 : !llvm.i64 + %2312 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2313 = llvm.mul %364, %2312 : !llvm.i64 + %2314 = llvm.add %2311, %2313 : !llvm.i64 + %2315 = llvm.getelementptr %2307[%2314] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2316 = llvm.load %2315 : !llvm.ptr + %2317 = llvm.fmul %2306, %2316 {RelaxedPrecision} : !llvm.float + %2318 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2320 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2321 = llvm.mul %1996, %2320 : !llvm.i64 + %2322 = llvm.add %2319, %2321 : !llvm.i64 + %2323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2324 = llvm.mul %364, %2323 : !llvm.i64 + %2325 = llvm.add %2322, %2324 : !llvm.i64 + %2326 = llvm.getelementptr %2318[%2325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2327 = llvm.load %2326 : !llvm.ptr + %2328 = llvm.fadd %2327, %2317 {RelaxedPrecision} : !llvm.float + %2329 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2330 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2331 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2332 = llvm.mul %1996, %2331 : !llvm.i64 + %2333 = llvm.add %2330, %2332 : !llvm.i64 + %2334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2335 = llvm.mul %364, %2334 : !llvm.i64 + %2336 = llvm.add %2333, %2335 : !llvm.i64 + %2337 = llvm.getelementptr %2329[%2336] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2328, %2337 : !llvm.ptr + %2338 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2339 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2340 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2341 = llvm.mul %1996, %2340 : !llvm.i64 + %2342 = llvm.add %2339, %2341 : !llvm.i64 + %2343 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2344 = llvm.mul %364, %2343 : !llvm.i64 + %2345 = llvm.add %2342, %2344 : !llvm.i64 + %2346 = llvm.getelementptr %2338[%2345] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2347 = llvm.load %2346 : !llvm.ptr + %2348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2351 = llvm.mul %1996, %2350 : !llvm.i64 + %2352 = llvm.add %2349, %2351 : !llvm.i64 + %2353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2354 = llvm.mul %364, %2353 : !llvm.i64 + %2355 = llvm.add %2352, %2354 : !llvm.i64 + %2356 = llvm.getelementptr %2348[%2355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2347, %2356 : !llvm.ptr + %2357 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2359 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2360 = llvm.mul %1996, 
%2359 : !llvm.i64 + %2361 = llvm.add %2358, %2360 : !llvm.i64 + %2362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2363 = llvm.mul %59, %2362 : !llvm.i64 + %2364 = llvm.add %2361, %2363 : !llvm.i64 + %2365 = llvm.getelementptr %2357[%2364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2366 = llvm.load %2365 : !llvm.ptr + %2367 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2368 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2369 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2370 = llvm.mul %59, %2369 : !llvm.i64 + %2371 = llvm.add %2368, %2370 : !llvm.i64 + %2372 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2373 = llvm.mul %425, %2372 : !llvm.i64 + %2374 = llvm.add %2371, %2373 : !llvm.i64 + %2375 = llvm.getelementptr %2367[%2374] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2376 = llvm.load %2375 : !llvm.ptr + %2377 = llvm.fmul %2366, %2376 {RelaxedPrecision} : !llvm.float + %2378 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2380 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2381 = llvm.mul %1996, %2380 : !llvm.i64 + %2382 = llvm.add %2379, %2381 : !llvm.i64 + %2383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2384 = llvm.mul %425, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.getelementptr %2378[%2385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2387 = llvm.load %2386 : !llvm.ptr + %2388 = llvm.fadd %2387, %2377 {RelaxedPrecision} : !llvm.float + %2389 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2392 = llvm.mul %1996, %2391 : !llvm.i64 + %2393 = llvm.add %2390, %2392 : !llvm.i64 + %2394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2395 = llvm.mul %425, %2394 : !llvm.i64 + %2396 = llvm.add %2393, %2395 : !llvm.i64 + %2397 = llvm.getelementptr %2389[%2396] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2388, %2397 : !llvm.ptr + %2398 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2399 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2401 = llvm.mul %1996, %2400 : !llvm.i64 + %2402 = llvm.add %2399, %2401 : !llvm.i64 + %2403 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2404 = llvm.mul %425, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.getelementptr %2398[%2405] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2407 = llvm.load %2406 : !llvm.ptr + %2408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2411 = llvm.mul %1996, %2410 : !llvm.i64 + %2412 = llvm.add %2409, %2411 : !llvm.i64 + %2413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2414 = llvm.mul %425, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : !llvm.i64 + %2416 = llvm.getelementptr %2408[%2415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2407, %2416 : !llvm.ptr + %2417 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2419 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2420 = llvm.mul %1996, %2419 : !llvm.i64 + %2421 = llvm.add %2418, %2420 : !llvm.i64 + %2422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2423 = 
llvm.mul %59, %2422 : !llvm.i64 + %2424 = llvm.add %2421, %2423 : !llvm.i64 + %2425 = llvm.getelementptr %2417[%2424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2426 = llvm.load %2425 : !llvm.ptr + %2427 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2428 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2429 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2430 = llvm.mul %59, %2429 : !llvm.i64 + %2431 = llvm.add %2428, %2430 : !llvm.i64 + %2432 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2433 = llvm.mul %486, %2432 : !llvm.i64 + %2434 = llvm.add %2431, %2433 : !llvm.i64 + %2435 = llvm.getelementptr %2427[%2434] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2436 = llvm.load %2435 : !llvm.ptr + %2437 = llvm.fmul %2426, %2436 {RelaxedPrecision} : !llvm.float + %2438 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2440 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2441 = llvm.mul %1996, %2440 : !llvm.i64 + %2442 = llvm.add %2439, %2441 : !llvm.i64 + %2443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2444 = llvm.mul %486, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.getelementptr %2438[%2445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2447 = llvm.load %2446 : !llvm.ptr + %2448 = llvm.fadd %2447, %2437 {RelaxedPrecision} : !llvm.float + %2449 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2450 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2451 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2452 = llvm.mul %1996, %2451 : !llvm.i64 + %2453 = llvm.add %2450, %2452 : !llvm.i64 + %2454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2455 = llvm.mul %486, %2454 : !llvm.i64 + %2456 = llvm.add %2453, %2455 : !llvm.i64 + %2457 = llvm.getelementptr %2449[%2456] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2448, %2457 : !llvm.ptr + %2458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2461 = llvm.mul %1996, %2460 : !llvm.i64 + %2462 = llvm.add %2459, %2461 : !llvm.i64 + %2463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2464 = llvm.mul %486, %2463 : !llvm.i64 + %2465 = llvm.add %2462, %2464 : !llvm.i64 + %2466 = llvm.getelementptr %2458[%2465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2467 = llvm.load %2466 : !llvm.ptr + %2468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2471 = llvm.mul %1996, %2470 : !llvm.i64 + %2472 = llvm.add %2469, %2471 : !llvm.i64 + %2473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2474 = llvm.mul %486, %2473 : !llvm.i64 + %2475 = llvm.add %2472, %2474 : !llvm.i64 + %2476 = llvm.getelementptr %2468[%2475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2467, %2476 : !llvm.ptr + %2477 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2479 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2480 = llvm.mul %1996, %2479 : !llvm.i64 + %2481 = llvm.add %2478, %2480 : !llvm.i64 + %2482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2483 = llvm.mul %59, %2482 : !llvm.i64 + %2484 = llvm.add %2481, %2483 : !llvm.i64 + %2485 = llvm.getelementptr %2477[%2484] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2486 = llvm.load %2485 : !llvm.ptr + %2487 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2489 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2490 = llvm.mul %59, %2489 : !llvm.i64 + %2491 = llvm.add %2488, %2490 : !llvm.i64 + %2492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2493 = llvm.mul %547, %2492 : !llvm.i64 + %2494 = llvm.add %2491, %2493 : !llvm.i64 + %2495 = llvm.getelementptr %2487[%2494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2496 = llvm.load %2495 : !llvm.ptr + %2497 = llvm.fmul %2486, %2496 {RelaxedPrecision} : !llvm.float + %2498 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2500 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2501 = llvm.mul %1996, %2500 : !llvm.i64 + %2502 = llvm.add %2499, %2501 : !llvm.i64 + %2503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2504 = llvm.mul %547, %2503 : !llvm.i64 + %2505 = llvm.add %2502, %2504 : !llvm.i64 + %2506 = llvm.getelementptr %2498[%2505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2507 = llvm.load %2506 : !llvm.ptr + %2508 = llvm.fadd %2507, %2497 {RelaxedPrecision} : !llvm.float + %2509 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2510 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2511 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2512 = llvm.mul %1996, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2515 = llvm.mul %547, %2514 : !llvm.i64 + %2516 = llvm.add %2513, %2515 : !llvm.i64 + %2517 = llvm.getelementptr %2509[%2516] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2508, %2517 : !llvm.ptr + %2518 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2519 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2520 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2521 = llvm.mul %1996, %2520 : !llvm.i64 + %2522 = llvm.add %2519, %2521 : !llvm.i64 + %2523 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2524 = llvm.mul %547, %2523 : !llvm.i64 + %2525 = llvm.add %2522, %2524 : !llvm.i64 + %2526 = llvm.getelementptr %2518[%2525] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2527 = llvm.load %2526 : !llvm.ptr + %2528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2531 = llvm.mul %1996, %2530 : !llvm.i64 + %2532 = llvm.add %2529, %2531 : !llvm.i64 + %2533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2534 = llvm.mul %547, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.getelementptr %2528[%2535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2527, %2536 : !llvm.ptr + %2537 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2539 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2540 = llvm.mul %1996, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2543 = llvm.mul %59, %2542 : !llvm.i64 + %2544 = llvm.add %2541, %2543 : !llvm.i64 + %2545 = llvm.getelementptr %2537[%2544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2546 = llvm.load %2545 : !llvm.ptr + %2547 = llvm.extractvalue %15[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2548 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2549 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2550 = llvm.mul %59, %2549 : !llvm.i64 + %2551 = llvm.add %2548, %2550 : !llvm.i64 + %2552 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2553 = llvm.mul %608, %2552 : !llvm.i64 + %2554 = llvm.add %2551, %2553 : !llvm.i64 + %2555 = llvm.getelementptr %2547[%2554] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2556 = llvm.load %2555 : !llvm.ptr + %2557 = llvm.fmul %2546, %2556 {RelaxedPrecision} : !llvm.float + %2558 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2561 = llvm.mul %1996, %2560 : !llvm.i64 + %2562 = llvm.add %2559, %2561 : !llvm.i64 + %2563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2564 = llvm.mul %608, %2563 : !llvm.i64 + %2565 = llvm.add %2562, %2564 : !llvm.i64 + %2566 = llvm.getelementptr %2558[%2565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2567 = llvm.load %2566 : !llvm.ptr + %2568 = llvm.fadd %2567, %2557 {RelaxedPrecision} : !llvm.float + %2569 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2570 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2571 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2572 = llvm.mul %1996, %2571 : !llvm.i64 + %2573 = llvm.add %2570, %2572 : !llvm.i64 + %2574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2575 = llvm.mul %608, %2574 : !llvm.i64 + %2576 = llvm.add %2573, %2575 : !llvm.i64 + %2577 = llvm.getelementptr %2569[%2576] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2568, %2577 : !llvm.ptr + %2578 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2580 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2581 = llvm.mul %1996, %2580 : !llvm.i64 + %2582 = llvm.add %2579, %2581 : !llvm.i64 + %2583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2584 = llvm.mul %608, %2583 : !llvm.i64 + %2585 = llvm.add %2582, %2584 : !llvm.i64 + %2586 = llvm.getelementptr %2578[%2585] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2587 = llvm.load %2586 : !llvm.ptr + %2588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2591 = llvm.mul %1996, %2590 : !llvm.i64 + %2592 = llvm.add %2589, %2591 : !llvm.i64 + %2593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2594 = llvm.mul %608, %2593 : !llvm.i64 + %2595 = llvm.add %2592, %2594 : !llvm.i64 + %2596 = llvm.getelementptr %2588[%2595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2587, %2596 : !llvm.ptr + %2597 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2599 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2600 = llvm.mul %1996, %2599 : !llvm.i64 + %2601 = llvm.add %2598, %2600 : !llvm.i64 + %2602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2603 = llvm.mul %59, %2602 : !llvm.i64 + %2604 = llvm.add %2601, %2603 : !llvm.i64 + %2605 = llvm.getelementptr %2597[%2604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2606 = llvm.load %2605 : !llvm.ptr + %2607 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2608 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2609 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %2610 = llvm.mul %59, %2609 : !llvm.i64 + %2611 = llvm.add %2608, %2610 : !llvm.i64 + %2612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2613 = llvm.mul %669, %2612 : !llvm.i64 + %2614 = llvm.add %2611, %2613 : !llvm.i64 + %2615 = llvm.getelementptr %2607[%2614] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2616 = llvm.load %2615 : !llvm.ptr + %2617 = llvm.fmul %2606, %2616 {RelaxedPrecision} : !llvm.float + %2618 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2620 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2621 = llvm.mul %1996, %2620 : !llvm.i64 + %2622 = llvm.add %2619, %2621 : !llvm.i64 + %2623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2624 = llvm.mul %669, %2623 : !llvm.i64 + %2625 = llvm.add %2622, %2624 : !llvm.i64 + %2626 = llvm.getelementptr %2618[%2625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2627 = llvm.load %2626 : !llvm.ptr + %2628 = llvm.fadd %2627, %2617 {RelaxedPrecision} : !llvm.float + %2629 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2630 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2631 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2632 = llvm.mul %1996, %2631 : !llvm.i64 + %2633 = llvm.add %2630, %2632 : !llvm.i64 + %2634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2635 = llvm.mul %669, %2634 : !llvm.i64 + %2636 = llvm.add %2633, %2635 : !llvm.i64 + %2637 = llvm.getelementptr %2629[%2636] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2628, %2637 : !llvm.ptr + %2638 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2639 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2640 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2641 = llvm.mul %1996, %2640 : !llvm.i64 + %2642 = llvm.add %2639, %2641 : !llvm.i64 + %2643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2644 = llvm.mul %669, %2643 : !llvm.i64 + %2645 = llvm.add %2642, %2644 : !llvm.i64 + %2646 = llvm.getelementptr %2638[%2645] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2647 = llvm.load %2646 : !llvm.ptr + %2648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2651 = llvm.mul %1996, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2654 = llvm.mul %669, %2653 : !llvm.i64 + %2655 = llvm.add %2652, %2654 : !llvm.i64 + %2656 = llvm.getelementptr %2648[%2655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2647, %2656 : !llvm.ptr + %2657 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2659 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2660 = llvm.mul %1996, %2659 : !llvm.i64 + %2661 = llvm.add %2658, %2660 : !llvm.i64 + %2662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2663 = llvm.mul %59, %2662 : !llvm.i64 + %2664 = llvm.add %2661, %2663 : !llvm.i64 + %2665 = llvm.getelementptr %2657[%2664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2666 = llvm.load %2665 : !llvm.ptr + %2667 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2668 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2670 = llvm.mul %59, %2669 : !llvm.i64 + %2671 = llvm.add %2668, %2670 : 
!llvm.i64 + %2672 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2673 = llvm.mul %730, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.getelementptr %2667[%2674] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2676 = llvm.load %2675 : !llvm.ptr + %2677 = llvm.fmul %2666, %2676 {RelaxedPrecision} : !llvm.float + %2678 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2680 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2681 = llvm.mul %1996, %2680 : !llvm.i64 + %2682 = llvm.add %2679, %2681 : !llvm.i64 + %2683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2684 = llvm.mul %730, %2683 : !llvm.i64 + %2685 = llvm.add %2682, %2684 : !llvm.i64 + %2686 = llvm.getelementptr %2678[%2685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2687 = llvm.load %2686 : !llvm.ptr + %2688 = llvm.fadd %2687, %2677 {RelaxedPrecision} : !llvm.float + %2689 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2690 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2691 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2692 = llvm.mul %1996, %2691 : !llvm.i64 + %2693 = llvm.add %2690, %2692 : !llvm.i64 + %2694 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2695 = llvm.mul %730, %2694 : !llvm.i64 + %2696 = llvm.add %2693, %2695 : !llvm.i64 + %2697 = llvm.getelementptr %2689[%2696] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2688, %2697 : !llvm.ptr + %2698 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2701 = llvm.mul %1996, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2704 = llvm.mul %730, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.getelementptr %2698[%2705] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2707 = llvm.load %2706 : !llvm.ptr + %2708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2711 = llvm.mul %1996, %2710 : !llvm.i64 + %2712 = llvm.add %2709, %2711 : !llvm.i64 + %2713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2714 = llvm.mul %730, %2713 : !llvm.i64 + %2715 = llvm.add %2712, %2714 : !llvm.i64 + %2716 = llvm.getelementptr %2708[%2715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2707, %2716 : !llvm.ptr + %2717 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2718 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2719 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2720 = llvm.mul %1996, %2719 : !llvm.i64 + %2721 = llvm.add %2718, %2720 : !llvm.i64 + %2722 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2723 = llvm.mul %59, %2722 : !llvm.i64 + %2724 = llvm.add %2721, %2723 : !llvm.i64 + %2725 = llvm.getelementptr %2717[%2724] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2726 = llvm.load %2725 : !llvm.ptr + %2727 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2729 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2730 = llvm.mul %59, %2729 : !llvm.i64 + %2731 = llvm.add %2728, %2730 : !llvm.i64 + %2732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2733 = llvm.mul %791, %2732 : !llvm.i64 + %2734 = llvm.add 
%2731, %2733 : !llvm.i64 + %2735 = llvm.getelementptr %2727[%2734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2736 = llvm.load %2735 : !llvm.ptr + %2737 = llvm.fmul %2726, %2736 {RelaxedPrecision} : !llvm.float + %2738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2741 = llvm.mul %1996, %2740 : !llvm.i64 + %2742 = llvm.add %2739, %2741 : !llvm.i64 + %2743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2744 = llvm.mul %791, %2743 : !llvm.i64 + %2745 = llvm.add %2742, %2744 : !llvm.i64 + %2746 = llvm.getelementptr %2738[%2745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2747 = llvm.load %2746 : !llvm.ptr + %2748 = llvm.fadd %2747, %2737 {RelaxedPrecision} : !llvm.float + %2749 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2750 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2751 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2752 = llvm.mul %1996, %2751 : !llvm.i64 + %2753 = llvm.add %2750, %2752 : !llvm.i64 + %2754 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2755 = llvm.mul %791, %2754 : !llvm.i64 + %2756 = llvm.add %2753, %2755 : !llvm.i64 + %2757 = llvm.getelementptr %2749[%2756] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2748, %2757 : !llvm.ptr + %2758 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2759 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2760 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2761 = llvm.mul %1996, %2760 : !llvm.i64 + %2762 = llvm.add %2759, %2761 : !llvm.i64 + %2763 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2764 = llvm.mul %791, %2763 : !llvm.i64 + %2765 = llvm.add %2762, %2764 : !llvm.i64 + %2766 = llvm.getelementptr %2758[%2765] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2767 = llvm.load %2766 : !llvm.ptr + %2768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2771 = llvm.mul %1996, %2770 : !llvm.i64 + %2772 = llvm.add %2769, %2771 : !llvm.i64 + %2773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2774 = llvm.mul %791, %2773 : !llvm.i64 + %2775 = llvm.add %2772, %2774 : !llvm.i64 + %2776 = llvm.getelementptr %2768[%2775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2767, %2776 : !llvm.ptr + %2777 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2779 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2780 = llvm.mul %1996, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = llvm.mul %59, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2777[%2784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2786 = llvm.load %2785 : !llvm.ptr + %2787 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2788 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2789 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2790 = llvm.mul %59, %2789 : !llvm.i64 + %2791 = llvm.add %2788, %2790 : !llvm.i64 + %2792 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2793 = llvm.mul %852, %2792 : !llvm.i64 + %2794 = llvm.add %2791, %2793 : !llvm.i64 + %2795 = llvm.getelementptr %2787[%2794] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2796 = llvm.load 
%2795 : !llvm.ptr + %2797 = llvm.fmul %2786, %2796 {RelaxedPrecision} : !llvm.float + %2798 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2800 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2801 = llvm.mul %1996, %2800 : !llvm.i64 + %2802 = llvm.add %2799, %2801 : !llvm.i64 + %2803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2804 = llvm.mul %852, %2803 : !llvm.i64 + %2805 = llvm.add %2802, %2804 : !llvm.i64 + %2806 = llvm.getelementptr %2798[%2805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2807 = llvm.load %2806 : !llvm.ptr + %2808 = llvm.fadd %2807, %2797 {RelaxedPrecision} : !llvm.float + %2809 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2810 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2811 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2812 = llvm.mul %1996, %2811 : !llvm.i64 + %2813 = llvm.add %2810, %2812 : !llvm.i64 + %2814 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2815 = llvm.mul %852, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = llvm.getelementptr %2809[%2816] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2808, %2817 : !llvm.ptr + %2818 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2819 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2820 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2821 = llvm.mul %1996, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2824 = llvm.mul %852, %2823 : !llvm.i64 + %2825 = llvm.add %2822, %2824 : !llvm.i64 + %2826 = llvm.getelementptr %2818[%2825] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2827 = llvm.load %2826 : !llvm.ptr + %2828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2831 = llvm.mul %1996, %2830 : !llvm.i64 + %2832 = llvm.add %2829, %2831 : !llvm.i64 + %2833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2834 = llvm.mul %852, %2833 : !llvm.i64 + %2835 = llvm.add %2832, %2834 : !llvm.i64 + %2836 = llvm.getelementptr %2828[%2835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2827, %2836 : !llvm.ptr + %2837 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2839 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2840 = llvm.mul %1996, %2839 : !llvm.i64 + %2841 = llvm.add %2838, %2840 : !llvm.i64 + %2842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2843 = llvm.mul %59, %2842 : !llvm.i64 + %2844 = llvm.add %2841, %2843 : !llvm.i64 + %2845 = llvm.getelementptr %2837[%2844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2846 = llvm.load %2845 : !llvm.ptr + %2847 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2848 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2849 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2850 = llvm.mul %59, %2849 : !llvm.i64 + %2851 = llvm.add %2848, %2850 : !llvm.i64 + %2852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2853 = llvm.mul %913, %2852 : !llvm.i64 + %2854 = llvm.add %2851, %2853 : !llvm.i64 + %2855 = llvm.getelementptr %2847[%2854] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2856 = llvm.load %2855 : !llvm.ptr + %2857 = llvm.fmul %2846, %2856 {RelaxedPrecision} : !llvm.float + %2858 = llvm.extractvalue %23[1] : 
!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2860 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2861 = llvm.mul %1996, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2864 = llvm.mul %913, %2863 : !llvm.i64 + %2865 = llvm.add %2862, %2864 : !llvm.i64 + %2866 = llvm.getelementptr %2858[%2865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2867 = llvm.load %2866 : !llvm.ptr + %2868 = llvm.fadd %2867, %2857 {RelaxedPrecision} : !llvm.float + %2869 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2871 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2872 = llvm.mul %1996, %2871 : !llvm.i64 + %2873 = llvm.add %2870, %2872 : !llvm.i64 + %2874 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2875 = llvm.mul %913, %2874 : !llvm.i64 + %2876 = llvm.add %2873, %2875 : !llvm.i64 + %2877 = llvm.getelementptr %2869[%2876] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2868, %2877 : !llvm.ptr + %2878 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2880 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2881 = llvm.mul %1996, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2884 = llvm.mul %913, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.getelementptr %2878[%2885] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2887 = llvm.load %2886 : !llvm.ptr + %2888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2891 = llvm.mul %1996, %2890 : !llvm.i64 + %2892 = llvm.add %2889, %2891 : !llvm.i64 + %2893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2894 = llvm.mul %913, %2893 : !llvm.i64 + %2895 = llvm.add %2892, %2894 : !llvm.i64 + %2896 = llvm.getelementptr %2888[%2895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2887, %2896 : !llvm.ptr + %2897 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2899 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2900 = llvm.mul %1996, %2899 : !llvm.i64 + %2901 = llvm.add %2898, %2900 : !llvm.i64 + %2902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2903 = llvm.mul %59, %2902 : !llvm.i64 + %2904 = llvm.add %2901, %2903 : !llvm.i64 + %2905 = llvm.getelementptr %2897[%2904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2906 = llvm.load %2905 : !llvm.ptr + %2907 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2908 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2909 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2910 = llvm.mul %59, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : !llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %974, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2907[%2914] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2916 = llvm.load %2915 : !llvm.ptr + %2917 = llvm.fmul %2906, %2916 {RelaxedPrecision} : !llvm.float + %2918 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2920 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %2921 = llvm.mul %1996, %2920 : !llvm.i64 + %2922 = llvm.add %2919, %2921 : !llvm.i64 + %2923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2924 = llvm.mul %974, %2923 : !llvm.i64 + %2925 = llvm.add %2922, %2924 : !llvm.i64 + %2926 = llvm.getelementptr %2918[%2925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2927 = llvm.load %2926 : !llvm.ptr + %2928 = llvm.fadd %2927, %2917 {RelaxedPrecision} : !llvm.float + %2929 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2930 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2931 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2932 = llvm.mul %1996, %2931 : !llvm.i64 + %2933 = llvm.add %2930, %2932 : !llvm.i64 + %2934 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2935 = llvm.mul %974, %2934 : !llvm.i64 + %2936 = llvm.add %2933, %2935 : !llvm.i64 + %2937 = llvm.getelementptr %2929[%2936] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2928, %2937 : !llvm.ptr + %2938 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2940 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2941 = llvm.mul %1996, %2940 : !llvm.i64 + %2942 = llvm.add %2939, %2941 : !llvm.i64 + %2943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2944 = llvm.mul %974, %2943 : !llvm.i64 + %2945 = llvm.add %2942, %2944 : !llvm.i64 + %2946 = llvm.getelementptr %2938[%2945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2947 = llvm.load %2946 : !llvm.ptr + %2948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2951 = llvm.mul %1996, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2954 = llvm.mul %974, %2953 : !llvm.i64 + %2955 = llvm.add %2952, %2954 : !llvm.i64 + %2956 = llvm.getelementptr %2948[%2955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2947, %2956 : !llvm.ptr + %2957 = llvm.add %50, %35 : !llvm.i64 + %2958 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2960 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2961 = llvm.mul %2957, %2960 : !llvm.i64 + %2962 = llvm.add %2959, %2961 : !llvm.i64 + %2963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2964 = llvm.mul %59, %2963 : !llvm.i64 + %2965 = llvm.add %2962, %2964 : !llvm.i64 + %2966 = llvm.getelementptr %2958[%2965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2967 = llvm.load %2966 : !llvm.ptr + %2968 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2971 = llvm.mul %59, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2974 = llvm.mul %58, %2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.getelementptr %2968[%2975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2977 = llvm.load %2976 : !llvm.ptr + %2978 = llvm.fmul %2967, %2977 {RelaxedPrecision} : !llvm.float + %2979 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2981 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2982 = llvm.mul %2957, %2981 : !llvm.i64 
+ %2983 = llvm.add %2980, %2982 : !llvm.i64 + %2984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2985 = llvm.mul %58, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.getelementptr %2979[%2986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2988 = llvm.load %2987 : !llvm.ptr + %2989 = llvm.fadd %2988, %2978 {RelaxedPrecision} : !llvm.float + %2990 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2992 = llvm.mlir.constant(512 : index) : !llvm.i64 + %2993 = llvm.mul %2957, %2992 : !llvm.i64 + %2994 = llvm.add %2991, %2993 : !llvm.i64 + %2995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2996 = llvm.mul %58, %2995 : !llvm.i64 + %2997 = llvm.add %2994, %2996 : !llvm.i64 + %2998 = llvm.getelementptr %2990[%2997] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %2989, %2998 : !llvm.ptr + %2999 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3000 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3001 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3002 = llvm.mul %2957, %3001 : !llvm.i64 + %3003 = llvm.add %3000, %3002 : !llvm.i64 + %3004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3005 = llvm.mul %58, %3004 : !llvm.i64 + %3006 = llvm.add %3003, %3005 : !llvm.i64 + %3007 = llvm.getelementptr %2999[%3006] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3008 = llvm.load %3007 : !llvm.ptr + %3009 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3010 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3011 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3012 = llvm.mul %2957, %3011 : !llvm.i64 + %3013 = llvm.add %3010, %3012 : !llvm.i64 + %3014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3015 = llvm.mul %58, %3014 : !llvm.i64 + %3016 = llvm.add %3013, %3015 : !llvm.i64 + %3017 = llvm.getelementptr %3009[%3016] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3008, %3017 : !llvm.ptr + %3018 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3020 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3021 = llvm.mul %2957, %3020 : !llvm.i64 + %3022 = llvm.add %3019, %3021 : !llvm.i64 + %3023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3024 = llvm.mul %59, %3023 : !llvm.i64 + %3025 = llvm.add %3022, %3024 : !llvm.i64 + %3026 = llvm.getelementptr %3018[%3025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3027 = llvm.load %3026 : !llvm.ptr + %3028 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3031 = llvm.mul %59, %3030 : !llvm.i64 + %3032 = llvm.add %3029, %3031 : !llvm.i64 + %3033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3034 = llvm.mul %120, %3033 : !llvm.i64 + %3035 = llvm.add %3032, %3034 : !llvm.i64 + %3036 = llvm.getelementptr %3028[%3035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3037 = llvm.load %3036 : !llvm.ptr + %3038 = llvm.fmul %3027, %3037 {RelaxedPrecision} : !llvm.float + %3039 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3041 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3042 = llvm.mul %2957, %3041 : !llvm.i64 + %3043 = llvm.add %3040, %3042 : !llvm.i64 + %3044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3045 = llvm.mul %120, %3044 : 
!llvm.i64 + %3046 = llvm.add %3043, %3045 : !llvm.i64 + %3047 = llvm.getelementptr %3039[%3046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3048 = llvm.load %3047 : !llvm.ptr + %3049 = llvm.fadd %3048, %3038 {RelaxedPrecision} : !llvm.float + %3050 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3051 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3052 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3053 = llvm.mul %2957, %3052 : !llvm.i64 + %3054 = llvm.add %3051, %3053 : !llvm.i64 + %3055 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3056 = llvm.mul %120, %3055 : !llvm.i64 + %3057 = llvm.add %3054, %3056 : !llvm.i64 + %3058 = llvm.getelementptr %3050[%3057] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3049, %3058 : !llvm.ptr + %3059 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3060 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3061 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3062 = llvm.mul %2957, %3061 : !llvm.i64 + %3063 = llvm.add %3060, %3062 : !llvm.i64 + %3064 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3065 = llvm.mul %120, %3064 : !llvm.i64 + %3066 = llvm.add %3063, %3065 : !llvm.i64 + %3067 = llvm.getelementptr %3059[%3066] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3068 = llvm.load %3067 : !llvm.ptr + %3069 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3070 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3071 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3072 = llvm.mul %2957, %3071 : !llvm.i64 + %3073 = llvm.add %3070, %3072 : !llvm.i64 + %3074 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3075 = llvm.mul %120, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.getelementptr %3069[%3076] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3068, %3077 : !llvm.ptr + %3078 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3080 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3081 = llvm.mul %2957, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3084 = llvm.mul %59, %3083 : !llvm.i64 + %3085 = llvm.add %3082, %3084 : !llvm.i64 + %3086 = llvm.getelementptr %3078[%3085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3087 = llvm.load %3086 : !llvm.ptr + %3088 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3091 = llvm.mul %59, %3090 : !llvm.i64 + %3092 = llvm.add %3089, %3091 : !llvm.i64 + %3093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3094 = llvm.mul %181, %3093 : !llvm.i64 + %3095 = llvm.add %3092, %3094 : !llvm.i64 + %3096 = llvm.getelementptr %3088[%3095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3097 = llvm.load %3096 : !llvm.ptr + %3098 = llvm.fmul %3087, %3097 {RelaxedPrecision} : !llvm.float + %3099 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3101 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3102 = llvm.mul %2957, %3101 : !llvm.i64 + %3103 = llvm.add %3100, %3102 : !llvm.i64 + %3104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3105 = llvm.mul %181, %3104 : !llvm.i64 + %3106 = llvm.add %3103, %3105 : !llvm.i64 + %3107 = llvm.getelementptr %3099[%3106] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %3108 = llvm.load %3107 : !llvm.ptr + %3109 = llvm.fadd %3108, %3098 {RelaxedPrecision} : !llvm.float + %3110 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3111 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3112 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3113 = llvm.mul %2957, %3112 : !llvm.i64 + %3114 = llvm.add %3111, %3113 : !llvm.i64 + %3115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3116 = llvm.mul %181, %3115 : !llvm.i64 + %3117 = llvm.add %3114, %3116 : !llvm.i64 + %3118 = llvm.getelementptr %3110[%3117] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3109, %3118 : !llvm.ptr + %3119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3121 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3122 = llvm.mul %2957, %3121 : !llvm.i64 + %3123 = llvm.add %3120, %3122 : !llvm.i64 + %3124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3125 = llvm.mul %181, %3124 : !llvm.i64 + %3126 = llvm.add %3123, %3125 : !llvm.i64 + %3127 = llvm.getelementptr %3119[%3126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3128 = llvm.load %3127 : !llvm.ptr + %3129 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3130 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3131 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3132 = llvm.mul %2957, %3131 : !llvm.i64 + %3133 = llvm.add %3130, %3132 : !llvm.i64 + %3134 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3135 = llvm.mul %181, %3134 : !llvm.i64 + %3136 = llvm.add %3133, %3135 : !llvm.i64 + %3137 = llvm.getelementptr %3129[%3136] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3128, %3137 : !llvm.ptr + %3138 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3141 = llvm.mul %2957, %3140 : !llvm.i64 + %3142 = llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3144 = llvm.mul %59, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.getelementptr %3138[%3145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3147 = llvm.load %3146 : !llvm.ptr + %3148 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3151 = llvm.mul %59, %3150 : !llvm.i64 + %3152 = llvm.add %3149, %3151 : !llvm.i64 + %3153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3154 = llvm.mul %242, %3153 : !llvm.i64 + %3155 = llvm.add %3152, %3154 : !llvm.i64 + %3156 = llvm.getelementptr %3148[%3155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3157 = llvm.load %3156 : !llvm.ptr + %3158 = llvm.fmul %3147, %3157 {RelaxedPrecision} : !llvm.float + %3159 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3161 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3162 = llvm.mul %2957, %3161 : !llvm.i64 + %3163 = llvm.add %3160, %3162 : !llvm.i64 + %3164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3165 = llvm.mul %242, %3164 : !llvm.i64 + %3166 = llvm.add %3163, %3165 : !llvm.i64 + %3167 = llvm.getelementptr %3159[%3166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3168 = llvm.load %3167 : !llvm.ptr + %3169 = llvm.fadd %3168, %3158 {RelaxedPrecision} : !llvm.float + %3170 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3173 = llvm.mul %2957, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %242, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3169, %3178 : !llvm.ptr + %3179 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3182 = llvm.mul %2957, %3181 : !llvm.i64 + %3183 = llvm.add %3180, %3182 : !llvm.i64 + %3184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3185 = llvm.mul %242, %3184 : !llvm.i64 + %3186 = llvm.add %3183, %3185 : !llvm.i64 + %3187 = llvm.getelementptr %3179[%3186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3188 = llvm.load %3187 : !llvm.ptr + %3189 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3191 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3192 = llvm.mul %2957, %3191 : !llvm.i64 + %3193 = llvm.add %3190, %3192 : !llvm.i64 + %3194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3195 = llvm.mul %242, %3194 : !llvm.i64 + %3196 = llvm.add %3193, %3195 : !llvm.i64 + %3197 = llvm.getelementptr %3189[%3196] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3188, %3197 : !llvm.ptr + %3198 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3200 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3201 = llvm.mul %2957, %3200 : !llvm.i64 + %3202 = llvm.add %3199, %3201 : !llvm.i64 + %3203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3204 = llvm.mul %59, %3203 : !llvm.i64 + %3205 = llvm.add %3202, %3204 : !llvm.i64 + %3206 = llvm.getelementptr %3198[%3205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3207 = llvm.load %3206 : !llvm.ptr + %3208 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3211 = llvm.mul %59, %3210 : !llvm.i64 + %3212 = llvm.add %3209, %3211 : !llvm.i64 + %3213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3214 = llvm.mul %303, %3213 : !llvm.i64 + %3215 = llvm.add %3212, %3214 : !llvm.i64 + %3216 = llvm.getelementptr %3208[%3215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3217 = llvm.load %3216 : !llvm.ptr + %3218 = llvm.fmul %3207, %3217 {RelaxedPrecision} : !llvm.float + %3219 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3221 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3222 = llvm.mul %2957, %3221 : !llvm.i64 + %3223 = llvm.add %3220, %3222 : !llvm.i64 + %3224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3225 = llvm.mul %303, %3224 : !llvm.i64 + %3226 = llvm.add %3223, %3225 : !llvm.i64 + %3227 = llvm.getelementptr %3219[%3226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3228 = llvm.load %3227 : !llvm.ptr + %3229 = llvm.fadd %3228, %3218 {RelaxedPrecision} : !llvm.float + %3230 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3231 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %3232 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3233 = llvm.mul %2957, %3232 : !llvm.i64 + %3234 = llvm.add %3231, %3233 : !llvm.i64 + %3235 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3236 = llvm.mul %303, %3235 : !llvm.i64 + %3237 = llvm.add %3234, %3236 : !llvm.i64 + %3238 = llvm.getelementptr %3230[%3237] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3229, %3238 : !llvm.ptr + %3239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3242 = llvm.mul %2957, %3241 : !llvm.i64 + %3243 = llvm.add %3240, %3242 : !llvm.i64 + %3244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3245 = llvm.mul %303, %3244 : !llvm.i64 + %3246 = llvm.add %3243, %3245 : !llvm.i64 + %3247 = llvm.getelementptr %3239[%3246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3248 = llvm.load %3247 : !llvm.ptr + %3249 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3250 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3251 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3252 = llvm.mul %2957, %3251 : !llvm.i64 + %3253 = llvm.add %3250, %3252 : !llvm.i64 + %3254 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3255 = llvm.mul %303, %3254 : !llvm.i64 + %3256 = llvm.add %3253, %3255 : !llvm.i64 + %3257 = llvm.getelementptr %3249[%3256] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3248, %3257 : !llvm.ptr + %3258 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3260 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3261 = llvm.mul %2957, %3260 : !llvm.i64 + %3262 = llvm.add %3259, %3261 : !llvm.i64 + %3263 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3264 = llvm.mul %59, %3263 : !llvm.i64 + %3265 = llvm.add %3262, %3264 : !llvm.i64 + %3266 = llvm.getelementptr %3258[%3265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3267 = llvm.load %3266 : !llvm.ptr + %3268 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3271 = llvm.mul %59, %3270 : !llvm.i64 + %3272 = llvm.add %3269, %3271 : !llvm.i64 + %3273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3274 = llvm.mul %364, %3273 : !llvm.i64 + %3275 = llvm.add %3272, %3274 : !llvm.i64 + %3276 = llvm.getelementptr %3268[%3275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3277 = llvm.load %3276 : !llvm.ptr + %3278 = llvm.fmul %3267, %3277 {RelaxedPrecision} : !llvm.float + %3279 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3281 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3282 = llvm.mul %2957, %3281 : !llvm.i64 + %3283 = llvm.add %3280, %3282 : !llvm.i64 + %3284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3285 = llvm.mul %364, %3284 : !llvm.i64 + %3286 = llvm.add %3283, %3285 : !llvm.i64 + %3287 = llvm.getelementptr %3279[%3286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3288 = llvm.load %3287 : !llvm.ptr + %3289 = llvm.fadd %3288, %3278 {RelaxedPrecision} : !llvm.float + %3290 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3291 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3292 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3293 = llvm.mul %2957, %3292 : !llvm.i64 + %3294 = 
llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3296 = llvm.mul %364, %3295 : !llvm.i64 + %3297 = llvm.add %3294, %3296 : !llvm.i64 + %3298 = llvm.getelementptr %3290[%3297] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3289, %3298 : !llvm.ptr + %3299 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3300 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3301 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3302 = llvm.mul %2957, %3301 : !llvm.i64 + %3303 = llvm.add %3300, %3302 : !llvm.i64 + %3304 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3305 = llvm.mul %364, %3304 : !llvm.i64 + %3306 = llvm.add %3303, %3305 : !llvm.i64 + %3307 = llvm.getelementptr %3299[%3306] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3308 = llvm.load %3307 : !llvm.ptr + %3309 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3310 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3311 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3312 = llvm.mul %2957, %3311 : !llvm.i64 + %3313 = llvm.add %3310, %3312 : !llvm.i64 + %3314 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3315 = llvm.mul %364, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.getelementptr %3309[%3316] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3308, %3317 : !llvm.ptr + %3318 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3320 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3321 = llvm.mul %2957, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3324 = llvm.mul %59, %3323 : !llvm.i64 + %3325 = llvm.add %3322, %3324 : !llvm.i64 + %3326 = llvm.getelementptr %3318[%3325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3327 = llvm.load %3326 : !llvm.ptr + %3328 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3331 = llvm.mul %59, %3330 : !llvm.i64 + %3332 = llvm.add %3329, %3331 : !llvm.i64 + %3333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3334 = llvm.mul %425, %3333 : !llvm.i64 + %3335 = llvm.add %3332, %3334 : !llvm.i64 + %3336 = llvm.getelementptr %3328[%3335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3337 = llvm.load %3336 : !llvm.ptr + %3338 = llvm.fmul %3327, %3337 {RelaxedPrecision} : !llvm.float + %3339 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3341 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3342 = llvm.mul %2957, %3341 : !llvm.i64 + %3343 = llvm.add %3340, %3342 : !llvm.i64 + %3344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3345 = llvm.mul %425, %3344 : !llvm.i64 + %3346 = llvm.add %3343, %3345 : !llvm.i64 + %3347 = llvm.getelementptr %3339[%3346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3348 = llvm.load %3347 : !llvm.ptr + %3349 = llvm.fadd %3348, %3338 {RelaxedPrecision} : !llvm.float + %3350 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3351 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3352 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3353 = llvm.mul %2957, %3352 : !llvm.i64 + %3354 = llvm.add %3351, %3353 : !llvm.i64 + %3355 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3356 = llvm.mul %425, %3355 : 
!llvm.i64 + %3357 = llvm.add %3354, %3356 : !llvm.i64 + %3358 = llvm.getelementptr %3350[%3357] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3349, %3358 : !llvm.ptr + %3359 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3360 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3361 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3362 = llvm.mul %2957, %3361 : !llvm.i64 + %3363 = llvm.add %3360, %3362 : !llvm.i64 + %3364 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3365 = llvm.mul %425, %3364 : !llvm.i64 + %3366 = llvm.add %3363, %3365 : !llvm.i64 + %3367 = llvm.getelementptr %3359[%3366] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3368 = llvm.load %3367 : !llvm.ptr + %3369 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3370 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3371 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3372 = llvm.mul %2957, %3371 : !llvm.i64 + %3373 = llvm.add %3370, %3372 : !llvm.i64 + %3374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3375 = llvm.mul %425, %3374 : !llvm.i64 + %3376 = llvm.add %3373, %3375 : !llvm.i64 + %3377 = llvm.getelementptr %3369[%3376] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3368, %3377 : !llvm.ptr + %3378 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3380 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3381 = llvm.mul %2957, %3380 : !llvm.i64 + %3382 = llvm.add %3379, %3381 : !llvm.i64 + %3383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3384 = llvm.mul %59, %3383 : !llvm.i64 + %3385 = llvm.add %3382, %3384 : !llvm.i64 + %3386 = llvm.getelementptr %3378[%3385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3387 = llvm.load %3386 : !llvm.ptr + %3388 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3391 = llvm.mul %59, %3390 : !llvm.i64 + %3392 = llvm.add %3389, %3391 : !llvm.i64 + %3393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3394 = llvm.mul %486, %3393 : !llvm.i64 + %3395 = llvm.add %3392, %3394 : !llvm.i64 + %3396 = llvm.getelementptr %3388[%3395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3397 = llvm.load %3396 : !llvm.ptr + %3398 = llvm.fmul %3387, %3397 {RelaxedPrecision} : !llvm.float + %3399 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3401 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3402 = llvm.mul %2957, %3401 : !llvm.i64 + %3403 = llvm.add %3400, %3402 : !llvm.i64 + %3404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3405 = llvm.mul %486, %3404 : !llvm.i64 + %3406 = llvm.add %3403, %3405 : !llvm.i64 + %3407 = llvm.getelementptr %3399[%3406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3408 = llvm.load %3407 : !llvm.ptr + %3409 = llvm.fadd %3408, %3398 {RelaxedPrecision} : !llvm.float + %3410 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3413 = llvm.mul %2957, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3416 = llvm.mul %486, %3415 : !llvm.i64 + %3417 = llvm.add %3414, %3416 : !llvm.i64 + %3418 = llvm.getelementptr %3410[%3417] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + llvm.store %3409, %3418 : !llvm.ptr + %3419 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3422 = llvm.mul %2957, %3421 : !llvm.i64 + %3423 = llvm.add %3420, %3422 : !llvm.i64 + %3424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3425 = llvm.mul %486, %3424 : !llvm.i64 + %3426 = llvm.add %3423, %3425 : !llvm.i64 + %3427 = llvm.getelementptr %3419[%3426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3428 = llvm.load %3427 : !llvm.ptr + %3429 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3430 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3431 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3432 = llvm.mul %2957, %3431 : !llvm.i64 + %3433 = llvm.add %3430, %3432 : !llvm.i64 + %3434 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3435 = llvm.mul %486, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.getelementptr %3429[%3436] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3428, %3437 : !llvm.ptr + %3438 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3440 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3441 = llvm.mul %2957, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3444 = llvm.mul %59, %3443 : !llvm.i64 + %3445 = llvm.add %3442, %3444 : !llvm.i64 + %3446 = llvm.getelementptr %3438[%3445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3447 = llvm.load %3446 : !llvm.ptr + %3448 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3451 = llvm.mul %59, %3450 : !llvm.i64 + %3452 = llvm.add %3449, %3451 : !llvm.i64 + %3453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3454 = llvm.mul %547, %3453 : !llvm.i64 + %3455 = llvm.add %3452, %3454 : !llvm.i64 + %3456 = llvm.getelementptr %3448[%3455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3457 = llvm.load %3456 : !llvm.ptr + %3458 = llvm.fmul %3447, %3457 {RelaxedPrecision} : !llvm.float + %3459 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3461 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3462 = llvm.mul %2957, %3461 : !llvm.i64 + %3463 = llvm.add %3460, %3462 : !llvm.i64 + %3464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3465 = llvm.mul %547, %3464 : !llvm.i64 + %3466 = llvm.add %3463, %3465 : !llvm.i64 + %3467 = llvm.getelementptr %3459[%3466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3468 = llvm.load %3467 : !llvm.ptr + %3469 = llvm.fadd %3468, %3458 {RelaxedPrecision} : !llvm.float + %3470 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3471 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3472 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3473 = llvm.mul %2957, %3472 : !llvm.i64 + %3474 = llvm.add %3471, %3473 : !llvm.i64 + %3475 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3476 = llvm.mul %547, %3475 : !llvm.i64 + %3477 = llvm.add %3474, %3476 : !llvm.i64 + %3478 = llvm.getelementptr %3470[%3477] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3469, %3478 : !llvm.ptr + %3479 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %3480 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3481 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3482 = llvm.mul %2957, %3481 : !llvm.i64 + %3483 = llvm.add %3480, %3482 : !llvm.i64 + %3484 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3485 = llvm.mul %547, %3484 : !llvm.i64 + %3486 = llvm.add %3483, %3485 : !llvm.i64 + %3487 = llvm.getelementptr %3479[%3486] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3488 = llvm.load %3487 : !llvm.ptr + %3489 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3491 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3492 = llvm.mul %2957, %3491 : !llvm.i64 + %3493 = llvm.add %3490, %3492 : !llvm.i64 + %3494 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3495 = llvm.mul %547, %3494 : !llvm.i64 + %3496 = llvm.add %3493, %3495 : !llvm.i64 + %3497 = llvm.getelementptr %3489[%3496] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3488, %3497 : !llvm.ptr + %3498 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3500 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3501 = llvm.mul %2957, %3500 : !llvm.i64 + %3502 = llvm.add %3499, %3501 : !llvm.i64 + %3503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3504 = llvm.mul %59, %3503 : !llvm.i64 + %3505 = llvm.add %3502, %3504 : !llvm.i64 + %3506 = llvm.getelementptr %3498[%3505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3507 = llvm.load %3506 : !llvm.ptr + %3508 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3511 = llvm.mul %59, %3510 : !llvm.i64 + %3512 = llvm.add %3509, %3511 : !llvm.i64 + %3513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3514 = llvm.mul %608, %3513 : !llvm.i64 + %3515 = llvm.add %3512, %3514 : !llvm.i64 + %3516 = llvm.getelementptr %3508[%3515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3517 = llvm.load %3516 : !llvm.ptr + %3518 = llvm.fmul %3507, %3517 {RelaxedPrecision} : !llvm.float + %3519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3522 = llvm.mul %2957, %3521 : !llvm.i64 + %3523 = llvm.add %3520, %3522 : !llvm.i64 + %3524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3525 = llvm.mul %608, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.getelementptr %3519[%3526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3528 = llvm.load %3527 : !llvm.ptr + %3529 = llvm.fadd %3528, %3518 {RelaxedPrecision} : !llvm.float + %3530 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3531 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3532 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3533 = llvm.mul %2957, %3532 : !llvm.i64 + %3534 = llvm.add %3531, %3533 : !llvm.i64 + %3535 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3536 = llvm.mul %608, %3535 : !llvm.i64 + %3537 = llvm.add %3534, %3536 : !llvm.i64 + %3538 = llvm.getelementptr %3530[%3537] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3529, %3538 : !llvm.ptr + %3539 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3540 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3541 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %3542 = llvm.mul %2957, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : !llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %608, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3539[%3546] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3548 = llvm.load %3547 : !llvm.ptr + %3549 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3550 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3551 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3552 = llvm.mul %2957, %3551 : !llvm.i64 + %3553 = llvm.add %3550, %3552 : !llvm.i64 + %3554 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3555 = llvm.mul %608, %3554 : !llvm.i64 + %3556 = llvm.add %3553, %3555 : !llvm.i64 + %3557 = llvm.getelementptr %3549[%3556] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3548, %3557 : !llvm.ptr + %3558 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3560 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3561 = llvm.mul %2957, %3560 : !llvm.i64 + %3562 = llvm.add %3559, %3561 : !llvm.i64 + %3563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3564 = llvm.mul %59, %3563 : !llvm.i64 + %3565 = llvm.add %3562, %3564 : !llvm.i64 + %3566 = llvm.getelementptr %3558[%3565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3567 = llvm.load %3566 : !llvm.ptr + %3568 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3571 = llvm.mul %59, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3574 = llvm.mul %669, %3573 : !llvm.i64 + %3575 = llvm.add %3572, %3574 : !llvm.i64 + %3576 = llvm.getelementptr %3568[%3575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3577 = llvm.load %3576 : !llvm.ptr + %3578 = llvm.fmul %3567, %3577 {RelaxedPrecision} : !llvm.float + %3579 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3581 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3582 = llvm.mul %2957, %3581 : !llvm.i64 + %3583 = llvm.add %3580, %3582 : !llvm.i64 + %3584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3585 = llvm.mul %669, %3584 : !llvm.i64 + %3586 = llvm.add %3583, %3585 : !llvm.i64 + %3587 = llvm.getelementptr %3579[%3586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3588 = llvm.load %3587 : !llvm.ptr + %3589 = llvm.fadd %3588, %3578 {RelaxedPrecision} : !llvm.float + %3590 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3591 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3592 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3593 = llvm.mul %2957, %3592 : !llvm.i64 + %3594 = llvm.add %3591, %3593 : !llvm.i64 + %3595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3596 = llvm.mul %669, %3595 : !llvm.i64 + %3597 = llvm.add %3594, %3596 : !llvm.i64 + %3598 = llvm.getelementptr %3590[%3597] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3589, %3598 : !llvm.ptr + %3599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3602 = llvm.mul %2957, %3601 : !llvm.i64 + %3603 = llvm.add %3600, %3602 : !llvm.i64 + %3604 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %3605 = llvm.mul %669, %3604 : !llvm.i64 + %3606 = llvm.add %3603, %3605 : !llvm.i64 + %3607 = llvm.getelementptr %3599[%3606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3608 = llvm.load %3607 : !llvm.ptr + %3609 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3610 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3611 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3612 = llvm.mul %2957, %3611 : !llvm.i64 + %3613 = llvm.add %3610, %3612 : !llvm.i64 + %3614 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3615 = llvm.mul %669, %3614 : !llvm.i64 + %3616 = llvm.add %3613, %3615 : !llvm.i64 + %3617 = llvm.getelementptr %3609[%3616] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3608, %3617 : !llvm.ptr + %3618 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3620 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3621 = llvm.mul %2957, %3620 : !llvm.i64 + %3622 = llvm.add %3619, %3621 : !llvm.i64 + %3623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3624 = llvm.mul %59, %3623 : !llvm.i64 + %3625 = llvm.add %3622, %3624 : !llvm.i64 + %3626 = llvm.getelementptr %3618[%3625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3627 = llvm.load %3626 : !llvm.ptr + %3628 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3631 = llvm.mul %59, %3630 : !llvm.i64 + %3632 = llvm.add %3629, %3631 : !llvm.i64 + %3633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3634 = llvm.mul %730, %3633 : !llvm.i64 + %3635 = llvm.add %3632, %3634 : !llvm.i64 + %3636 = llvm.getelementptr %3628[%3635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3637 = llvm.load %3636 : !llvm.ptr + %3638 = llvm.fmul %3627, %3637 {RelaxedPrecision} : !llvm.float + %3639 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3641 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3642 = llvm.mul %2957, %3641 : !llvm.i64 + %3643 = llvm.add %3640, %3642 : !llvm.i64 + %3644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3645 = llvm.mul %730, %3644 : !llvm.i64 + %3646 = llvm.add %3643, %3645 : !llvm.i64 + %3647 = llvm.getelementptr %3639[%3646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3648 = llvm.load %3647 : !llvm.ptr + %3649 = llvm.fadd %3648, %3638 {RelaxedPrecision} : !llvm.float + %3650 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3651 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3652 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3653 = llvm.mul %2957, %3652 : !llvm.i64 + %3654 = llvm.add %3651, %3653 : !llvm.i64 + %3655 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3656 = llvm.mul %730, %3655 : !llvm.i64 + %3657 = llvm.add %3654, %3656 : !llvm.i64 + %3658 = llvm.getelementptr %3650[%3657] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3649, %3658 : !llvm.ptr + %3659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3662 = llvm.mul %2957, %3661 : !llvm.i64 + %3663 = llvm.add %3660, %3662 : !llvm.i64 + %3664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3665 = llvm.mul %730, %3664 : !llvm.i64 + %3666 = llvm.add %3663, %3665 : 
!llvm.i64 + %3667 = llvm.getelementptr %3659[%3666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3668 = llvm.load %3667 : !llvm.ptr + %3669 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3670 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3671 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3672 = llvm.mul %2957, %3671 : !llvm.i64 + %3673 = llvm.add %3670, %3672 : !llvm.i64 + %3674 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3675 = llvm.mul %730, %3674 : !llvm.i64 + %3676 = llvm.add %3673, %3675 : !llvm.i64 + %3677 = llvm.getelementptr %3669[%3676] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3668, %3677 : !llvm.ptr + %3678 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3680 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3681 = llvm.mul %2957, %3680 : !llvm.i64 + %3682 = llvm.add %3679, %3681 : !llvm.i64 + %3683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3684 = llvm.mul %59, %3683 : !llvm.i64 + %3685 = llvm.add %3682, %3684 : !llvm.i64 + %3686 = llvm.getelementptr %3678[%3685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3687 = llvm.load %3686 : !llvm.ptr + %3688 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3691 = llvm.mul %59, %3690 : !llvm.i64 + %3692 = llvm.add %3689, %3691 : !llvm.i64 + %3693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3694 = llvm.mul %791, %3693 : !llvm.i64 + %3695 = llvm.add %3692, %3694 : !llvm.i64 + %3696 = llvm.getelementptr %3688[%3695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3697 = llvm.load %3696 : !llvm.ptr + %3698 = llvm.fmul %3687, %3697 {RelaxedPrecision} : !llvm.float + %3699 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3701 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3702 = llvm.mul %2957, %3701 : !llvm.i64 + %3703 = llvm.add %3700, %3702 : !llvm.i64 + %3704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3705 = llvm.mul %791, %3704 : !llvm.i64 + %3706 = llvm.add %3703, %3705 : !llvm.i64 + %3707 = llvm.getelementptr %3699[%3706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3708 = llvm.load %3707 : !llvm.ptr + %3709 = llvm.fadd %3708, %3698 {RelaxedPrecision} : !llvm.float + %3710 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3711 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3712 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3713 = llvm.mul %2957, %3712 : !llvm.i64 + %3714 = llvm.add %3711, %3713 : !llvm.i64 + %3715 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3716 = llvm.mul %791, %3715 : !llvm.i64 + %3717 = llvm.add %3714, %3716 : !llvm.i64 + %3718 = llvm.getelementptr %3710[%3717] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3709, %3718 : !llvm.ptr + %3719 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3720 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3721 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3722 = llvm.mul %2957, %3721 : !llvm.i64 + %3723 = llvm.add %3720, %3722 : !llvm.i64 + %3724 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3725 = llvm.mul %791, %3724 : !llvm.i64 + %3726 = llvm.add %3723, %3725 : !llvm.i64 + %3727 = llvm.getelementptr %3719[%3726] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3728 = llvm.load %3727 : 
!llvm.ptr + %3729 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3730 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3731 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3732 = llvm.mul %2957, %3731 : !llvm.i64 + %3733 = llvm.add %3730, %3732 : !llvm.i64 + %3734 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3735 = llvm.mul %791, %3734 : !llvm.i64 + %3736 = llvm.add %3733, %3735 : !llvm.i64 + %3737 = llvm.getelementptr %3729[%3736] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3728, %3737 : !llvm.ptr + %3738 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3740 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3741 = llvm.mul %2957, %3740 : !llvm.i64 + %3742 = llvm.add %3739, %3741 : !llvm.i64 + %3743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3744 = llvm.mul %59, %3743 : !llvm.i64 + %3745 = llvm.add %3742, %3744 : !llvm.i64 + %3746 = llvm.getelementptr %3738[%3745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3747 = llvm.load %3746 : !llvm.ptr + %3748 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3750 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3751 = llvm.mul %59, %3750 : !llvm.i64 + %3752 = llvm.add %3749, %3751 : !llvm.i64 + %3753 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3754 = llvm.mul %852, %3753 : !llvm.i64 + %3755 = llvm.add %3752, %3754 : !llvm.i64 + %3756 = llvm.getelementptr %3748[%3755] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3757 = llvm.load %3756 : !llvm.ptr + %3758 = llvm.fmul %3747, %3757 {RelaxedPrecision} : !llvm.float + %3759 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3761 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3762 = llvm.mul %2957, %3761 : !llvm.i64 + %3763 = llvm.add %3760, %3762 : !llvm.i64 + %3764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3765 = llvm.mul %852, %3764 : !llvm.i64 + %3766 = llvm.add %3763, %3765 : !llvm.i64 + %3767 = llvm.getelementptr %3759[%3766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3768 = llvm.load %3767 : !llvm.ptr + %3769 = llvm.fadd %3768, %3758 {RelaxedPrecision} : !llvm.float + %3770 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3771 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3772 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3773 = llvm.mul %2957, %3772 : !llvm.i64 + %3774 = llvm.add %3771, %3773 : !llvm.i64 + %3775 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3776 = llvm.mul %852, %3775 : !llvm.i64 + %3777 = llvm.add %3774, %3776 : !llvm.i64 + %3778 = llvm.getelementptr %3770[%3777] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3769, %3778 : !llvm.ptr + %3779 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3780 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3781 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3782 = llvm.mul %2957, %3781 : !llvm.i64 + %3783 = llvm.add %3780, %3782 : !llvm.i64 + %3784 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3785 = llvm.mul %852, %3784 : !llvm.i64 + %3786 = llvm.add %3783, %3785 : !llvm.i64 + %3787 = llvm.getelementptr %3779[%3786] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3788 = llvm.load %3787 : !llvm.ptr + %3789 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3790 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %3791 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3792 = llvm.mul %2957, %3791 : !llvm.i64 + %3793 = llvm.add %3790, %3792 : !llvm.i64 + %3794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3795 = llvm.mul %852, %3794 : !llvm.i64 + %3796 = llvm.add %3793, %3795 : !llvm.i64 + %3797 = llvm.getelementptr %3789[%3796] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3788, %3797 : !llvm.ptr + %3798 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3800 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3801 = llvm.mul %2957, %3800 : !llvm.i64 + %3802 = llvm.add %3799, %3801 : !llvm.i64 + %3803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3804 = llvm.mul %59, %3803 : !llvm.i64 + %3805 = llvm.add %3802, %3804 : !llvm.i64 + %3806 = llvm.getelementptr %3798[%3805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3807 = llvm.load %3806 : !llvm.ptr + %3808 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3811 = llvm.mul %59, %3810 : !llvm.i64 + %3812 = llvm.add %3809, %3811 : !llvm.i64 + %3813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3814 = llvm.mul %913, %3813 : !llvm.i64 + %3815 = llvm.add %3812, %3814 : !llvm.i64 + %3816 = llvm.getelementptr %3808[%3815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3817 = llvm.load %3816 : !llvm.ptr + %3818 = llvm.fmul %3807, %3817 {RelaxedPrecision} : !llvm.float + %3819 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3821 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3822 = llvm.mul %2957, %3821 : !llvm.i64 + %3823 = llvm.add %3820, %3822 : !llvm.i64 + %3824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3825 = llvm.mul %913, %3824 : !llvm.i64 + %3826 = llvm.add %3823, %3825 : !llvm.i64 + %3827 = llvm.getelementptr %3819[%3826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3828 = llvm.load %3827 : !llvm.ptr + %3829 = llvm.fadd %3828, %3818 {RelaxedPrecision} : !llvm.float + %3830 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3831 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3832 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3833 = llvm.mul %2957, %3832 : !llvm.i64 + %3834 = llvm.add %3831, %3833 : !llvm.i64 + %3835 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3836 = llvm.mul %913, %3835 : !llvm.i64 + %3837 = llvm.add %3834, %3836 : !llvm.i64 + %3838 = llvm.getelementptr %3830[%3837] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3829, %3838 : !llvm.ptr + %3839 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3840 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3841 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3842 = llvm.mul %2957, %3841 : !llvm.i64 + %3843 = llvm.add %3840, %3842 : !llvm.i64 + %3844 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3845 = llvm.mul %913, %3844 : !llvm.i64 + %3846 = llvm.add %3843, %3845 : !llvm.i64 + %3847 = llvm.getelementptr %3839[%3846] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3848 = llvm.load %3847 : !llvm.ptr + %3849 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3850 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3851 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3852 = llvm.mul %2957, 
%3851 : !llvm.i64 + %3853 = llvm.add %3850, %3852 : !llvm.i64 + %3854 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3855 = llvm.mul %913, %3854 : !llvm.i64 + %3856 = llvm.add %3853, %3855 : !llvm.i64 + %3857 = llvm.getelementptr %3849[%3856] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3848, %3857 : !llvm.ptr + %3858 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3860 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3861 = llvm.mul %2957, %3860 : !llvm.i64 + %3862 = llvm.add %3859, %3861 : !llvm.i64 + %3863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3864 = llvm.mul %59, %3863 : !llvm.i64 + %3865 = llvm.add %3862, %3864 : !llvm.i64 + %3866 = llvm.getelementptr %3858[%3865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3867 = llvm.load %3866 : !llvm.ptr + %3868 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3871 = llvm.mul %59, %3870 : !llvm.i64 + %3872 = llvm.add %3869, %3871 : !llvm.i64 + %3873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3874 = llvm.mul %974, %3873 : !llvm.i64 + %3875 = llvm.add %3872, %3874 : !llvm.i64 + %3876 = llvm.getelementptr %3868[%3875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3877 = llvm.load %3876 : !llvm.ptr + %3878 = llvm.fmul %3867, %3877 {RelaxedPrecision} : !llvm.float + %3879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3882 = llvm.mul %2957, %3881 : !llvm.i64 + %3883 = llvm.add %3880, %3882 : !llvm.i64 + %3884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3885 = llvm.mul %974, %3884 : !llvm.i64 + %3886 = llvm.add %3883, %3885 : !llvm.i64 + %3887 = llvm.getelementptr %3879[%3886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3888 = llvm.load %3887 : !llvm.ptr + %3889 = llvm.fadd %3888, %3878 {RelaxedPrecision} : !llvm.float + %3890 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3892 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3893 = llvm.mul %2957, %3892 : !llvm.i64 + %3894 = llvm.add %3891, %3893 : !llvm.i64 + %3895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3896 = llvm.mul %974, %3895 : !llvm.i64 + %3897 = llvm.add %3894, %3896 : !llvm.i64 + %3898 = llvm.getelementptr %3890[%3897] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3889, %3898 : !llvm.ptr + %3899 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3901 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3902 = llvm.mul %2957, %3901 : !llvm.i64 + %3903 = llvm.add %3900, %3902 : !llvm.i64 + %3904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3905 = llvm.mul %974, %3904 : !llvm.i64 + %3906 = llvm.add %3903, %3905 : !llvm.i64 + %3907 = llvm.getelementptr %3899[%3906] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3908 = llvm.load %3907 : !llvm.ptr + %3909 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3910 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3911 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3912 = llvm.mul %2957, %3911 : !llvm.i64 + %3913 = llvm.add %3910, %3912 : !llvm.i64 + %3914 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3915 = 
llvm.mul %974, %3914 : !llvm.i64 + %3916 = llvm.add %3913, %3915 : !llvm.i64 + %3917 = llvm.getelementptr %3909[%3916] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3908, %3917 : !llvm.ptr + %3918 = llvm.add %50, %36 : !llvm.i64 + %3919 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3921 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3922 = llvm.mul %3918, %3921 : !llvm.i64 + %3923 = llvm.add %3920, %3922 : !llvm.i64 + %3924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3925 = llvm.mul %59, %3924 : !llvm.i64 + %3926 = llvm.add %3923, %3925 : !llvm.i64 + %3927 = llvm.getelementptr %3919[%3926] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3928 = llvm.load %3927 : !llvm.ptr + %3929 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3930 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3931 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3932 = llvm.mul %59, %3931 : !llvm.i64 + %3933 = llvm.add %3930, %3932 : !llvm.i64 + %3934 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3935 = llvm.mul %58, %3934 : !llvm.i64 + %3936 = llvm.add %3933, %3935 : !llvm.i64 + %3937 = llvm.getelementptr %3929[%3936] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3938 = llvm.load %3937 : !llvm.ptr + %3939 = llvm.fmul %3928, %3938 {RelaxedPrecision} : !llvm.float + %3940 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3941 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3942 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3943 = llvm.mul %3918, %3942 : !llvm.i64 + %3944 = llvm.add %3941, %3943 : !llvm.i64 + %3945 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3946 = llvm.mul %58, %3945 : !llvm.i64 + %3947 = llvm.add %3944, %3946 : !llvm.i64 + %3948 = llvm.getelementptr %3940[%3947] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3949 = llvm.load %3948 : !llvm.ptr + %3950 = llvm.fadd %3949, %3939 {RelaxedPrecision} : !llvm.float + %3951 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3952 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3953 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3954 = llvm.mul %3918, %3953 : !llvm.i64 + %3955 = llvm.add %3952, %3954 : !llvm.i64 + %3956 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3957 = llvm.mul %58, %3956 : !llvm.i64 + %3958 = llvm.add %3955, %3957 : !llvm.i64 + %3959 = llvm.getelementptr %3951[%3958] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3950, %3959 : !llvm.ptr + %3960 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3961 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3962 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3963 = llvm.mul %3918, %3962 : !llvm.i64 + %3964 = llvm.add %3961, %3963 : !llvm.i64 + %3965 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3966 = llvm.mul %58, %3965 : !llvm.i64 + %3967 = llvm.add %3964, %3966 : !llvm.i64 + %3968 = llvm.getelementptr %3960[%3967] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3969 = llvm.load %3968 : !llvm.ptr + %3970 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3971 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3972 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3973 = llvm.mul %3918, %3972 : !llvm.i64 + %3974 = llvm.add %3971, %3973 : !llvm.i64 + %3975 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3976 = llvm.mul %58, %3975 : !llvm.i64 + %3977 = llvm.add %3974, %3976 : !llvm.i64 + %3978 = 
llvm.getelementptr %3970[%3977] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %3969, %3978 : !llvm.ptr + %3979 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3981 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3982 = llvm.mul %3918, %3981 : !llvm.i64 + %3983 = llvm.add %3980, %3982 : !llvm.i64 + %3984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3985 = llvm.mul %59, %3984 : !llvm.i64 + %3986 = llvm.add %3983, %3985 : !llvm.i64 + %3987 = llvm.getelementptr %3979[%3986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3988 = llvm.load %3987 : !llvm.ptr + %3989 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3990 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3991 = llvm.mlir.constant(512 : index) : !llvm.i64 + %3992 = llvm.mul %59, %3991 : !llvm.i64 + %3993 = llvm.add %3990, %3992 : !llvm.i64 + %3994 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3995 = llvm.mul %120, %3994 : !llvm.i64 + %3996 = llvm.add %3993, %3995 : !llvm.i64 + %3997 = llvm.getelementptr %3989[%3996] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3998 = llvm.load %3997 : !llvm.ptr + %3999 = llvm.fmul %3988, %3998 {RelaxedPrecision} : !llvm.float + %4000 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4001 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4002 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4003 = llvm.mul %3918, %4002 : !llvm.i64 + %4004 = llvm.add %4001, %4003 : !llvm.i64 + %4005 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4006 = llvm.mul %120, %4005 : !llvm.i64 + %4007 = llvm.add %4004, %4006 : !llvm.i64 + %4008 = llvm.getelementptr %4000[%4007] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4009 = llvm.load %4008 : !llvm.ptr + %4010 = llvm.fadd %4009, %3999 {RelaxedPrecision} : !llvm.float + %4011 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4012 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4013 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4014 = llvm.mul %3918, %4013 : !llvm.i64 + %4015 = llvm.add %4012, %4014 : !llvm.i64 + %4016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4017 = llvm.mul %120, %4016 : !llvm.i64 + %4018 = llvm.add %4015, %4017 : !llvm.i64 + %4019 = llvm.getelementptr %4011[%4018] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4010, %4019 : !llvm.ptr + %4020 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4022 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4023 = llvm.mul %3918, %4022 : !llvm.i64 + %4024 = llvm.add %4021, %4023 : !llvm.i64 + %4025 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4026 = llvm.mul %120, %4025 : !llvm.i64 + %4027 = llvm.add %4024, %4026 : !llvm.i64 + %4028 = llvm.getelementptr %4020[%4027] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4029 = llvm.load %4028 : !llvm.ptr + %4030 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4031 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4032 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4033 = llvm.mul %3918, %4032 : !llvm.i64 + %4034 = llvm.add %4031, %4033 : !llvm.i64 + %4035 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4036 = llvm.mul %120, %4035 : !llvm.i64 + %4037 = llvm.add %4034, %4036 : !llvm.i64 + %4038 = llvm.getelementptr %4030[%4037] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4029, %4038 : !llvm.ptr + %4039 = 
llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4041 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4042 = llvm.mul %3918, %4041 : !llvm.i64 + %4043 = llvm.add %4040, %4042 : !llvm.i64 + %4044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4045 = llvm.mul %59, %4044 : !llvm.i64 + %4046 = llvm.add %4043, %4045 : !llvm.i64 + %4047 = llvm.getelementptr %4039[%4046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4048 = llvm.load %4047 : !llvm.ptr + %4049 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4050 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4051 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4052 = llvm.mul %59, %4051 : !llvm.i64 + %4053 = llvm.add %4050, %4052 : !llvm.i64 + %4054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4055 = llvm.mul %181, %4054 : !llvm.i64 + %4056 = llvm.add %4053, %4055 : !llvm.i64 + %4057 = llvm.getelementptr %4049[%4056] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4058 = llvm.load %4057 : !llvm.ptr + %4059 = llvm.fmul %4048, %4058 {RelaxedPrecision} : !llvm.float + %4060 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4062 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4063 = llvm.mul %3918, %4062 : !llvm.i64 + %4064 = llvm.add %4061, %4063 : !llvm.i64 + %4065 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4066 = llvm.mul %181, %4065 : !llvm.i64 + %4067 = llvm.add %4064, %4066 : !llvm.i64 + %4068 = llvm.getelementptr %4060[%4067] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4069 = llvm.load %4068 : !llvm.ptr + %4070 = llvm.fadd %4069, %4059 {RelaxedPrecision} : !llvm.float + %4071 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4072 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4073 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4074 = llvm.mul %3918, %4073 : !llvm.i64 + %4075 = llvm.add %4072, %4074 : !llvm.i64 + %4076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4077 = llvm.mul %181, %4076 : !llvm.i64 + %4078 = llvm.add %4075, %4077 : !llvm.i64 + %4079 = llvm.getelementptr %4071[%4078] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4070, %4079 : !llvm.ptr + %4080 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4082 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4083 = llvm.mul %3918, %4082 : !llvm.i64 + %4084 = llvm.add %4081, %4083 : !llvm.i64 + %4085 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4086 = llvm.mul %181, %4085 : !llvm.i64 + %4087 = llvm.add %4084, %4086 : !llvm.i64 + %4088 = llvm.getelementptr %4080[%4087] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4089 = llvm.load %4088 : !llvm.ptr + %4090 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4091 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4092 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4093 = llvm.mul %3918, %4092 : !llvm.i64 + %4094 = llvm.add %4091, %4093 : !llvm.i64 + %4095 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4096 = llvm.mul %181, %4095 : !llvm.i64 + %4097 = llvm.add %4094, %4096 : !llvm.i64 + %4098 = llvm.getelementptr %4090[%4097] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4089, %4098 : !llvm.ptr + %4099 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4100 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %4101 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4102 = llvm.mul %3918, %4101 : !llvm.i64 + %4103 = llvm.add %4100, %4102 : !llvm.i64 + %4104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4105 = llvm.mul %59, %4104 : !llvm.i64 + %4106 = llvm.add %4103, %4105 : !llvm.i64 + %4107 = llvm.getelementptr %4099[%4106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4108 = llvm.load %4107 : !llvm.ptr + %4109 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4110 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4111 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4112 = llvm.mul %59, %4111 : !llvm.i64 + %4113 = llvm.add %4110, %4112 : !llvm.i64 + %4114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4115 = llvm.mul %242, %4114 : !llvm.i64 + %4116 = llvm.add %4113, %4115 : !llvm.i64 + %4117 = llvm.getelementptr %4109[%4116] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4118 = llvm.load %4117 : !llvm.ptr + %4119 = llvm.fmul %4108, %4118 {RelaxedPrecision} : !llvm.float + %4120 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4122 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4123 = llvm.mul %3918, %4122 : !llvm.i64 + %4124 = llvm.add %4121, %4123 : !llvm.i64 + %4125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4126 = llvm.mul %242, %4125 : !llvm.i64 + %4127 = llvm.add %4124, %4126 : !llvm.i64 + %4128 = llvm.getelementptr %4120[%4127] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4129 = llvm.load %4128 : !llvm.ptr + %4130 = llvm.fadd %4129, %4119 {RelaxedPrecision} : !llvm.float + %4131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4134 = llvm.mul %3918, %4133 : !llvm.i64 + %4135 = llvm.add %4132, %4134 : !llvm.i64 + %4136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4137 = llvm.mul %242, %4136 : !llvm.i64 + %4138 = llvm.add %4135, %4137 : !llvm.i64 + %4139 = llvm.getelementptr %4131[%4138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4130, %4139 : !llvm.ptr + %4140 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4141 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4142 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4143 = llvm.mul %3918, %4142 : !llvm.i64 + %4144 = llvm.add %4141, %4143 : !llvm.i64 + %4145 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4146 = llvm.mul %242, %4145 : !llvm.i64 + %4147 = llvm.add %4144, %4146 : !llvm.i64 + %4148 = llvm.getelementptr %4140[%4147] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4149 = llvm.load %4148 : !llvm.ptr + %4150 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4152 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4153 = llvm.mul %3918, %4152 : !llvm.i64 + %4154 = llvm.add %4151, %4153 : !llvm.i64 + %4155 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4156 = llvm.mul %242, %4155 : !llvm.i64 + %4157 = llvm.add %4154, %4156 : !llvm.i64 + %4158 = llvm.getelementptr %4150[%4157] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4149, %4158 : !llvm.ptr + %4159 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4161 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4162 = llvm.mul %3918, %4161 : !llvm.i64 + %4163 = 
llvm.add %4160, %4162 : !llvm.i64 + %4164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4165 = llvm.mul %59, %4164 : !llvm.i64 + %4166 = llvm.add %4163, %4165 : !llvm.i64 + %4167 = llvm.getelementptr %4159[%4166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4168 = llvm.load %4167 : !llvm.ptr + %4169 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4171 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4172 = llvm.mul %59, %4171 : !llvm.i64 + %4173 = llvm.add %4170, %4172 : !llvm.i64 + %4174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4175 = llvm.mul %303, %4174 : !llvm.i64 + %4176 = llvm.add %4173, %4175 : !llvm.i64 + %4177 = llvm.getelementptr %4169[%4176] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4178 = llvm.load %4177 : !llvm.ptr + %4179 = llvm.fmul %4168, %4178 {RelaxedPrecision} : !llvm.float + %4180 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4182 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4183 = llvm.mul %3918, %4182 : !llvm.i64 + %4184 = llvm.add %4181, %4183 : !llvm.i64 + %4185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4186 = llvm.mul %303, %4185 : !llvm.i64 + %4187 = llvm.add %4184, %4186 : !llvm.i64 + %4188 = llvm.getelementptr %4180[%4187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4189 = llvm.load %4188 : !llvm.ptr + %4190 = llvm.fadd %4189, %4179 {RelaxedPrecision} : !llvm.float + %4191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4194 = llvm.mul %3918, %4193 : !llvm.i64 + %4195 = llvm.add %4192, %4194 : !llvm.i64 + %4196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4197 = llvm.mul %303, %4196 : !llvm.i64 + %4198 = llvm.add %4195, %4197 : !llvm.i64 + %4199 = llvm.getelementptr %4191[%4198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4190, %4199 : !llvm.ptr + %4200 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4203 = llvm.mul %3918, %4202 : !llvm.i64 + %4204 = llvm.add %4201, %4203 : !llvm.i64 + %4205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4206 = llvm.mul %303, %4205 : !llvm.i64 + %4207 = llvm.add %4204, %4206 : !llvm.i64 + %4208 = llvm.getelementptr %4200[%4207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4209 = llvm.load %4208 : !llvm.ptr + %4210 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4212 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4213 = llvm.mul %3918, %4212 : !llvm.i64 + %4214 = llvm.add %4211, %4213 : !llvm.i64 + %4215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4216 = llvm.mul %303, %4215 : !llvm.i64 + %4217 = llvm.add %4214, %4216 : !llvm.i64 + %4218 = llvm.getelementptr %4210[%4217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4209, %4218 : !llvm.ptr + %4219 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4221 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4222 = llvm.mul %3918, %4221 : !llvm.i64 + %4223 = llvm.add %4220, %4222 : !llvm.i64 + %4224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4225 = llvm.mul %59, %4224 : 
!llvm.i64 + %4226 = llvm.add %4223, %4225 : !llvm.i64 + %4227 = llvm.getelementptr %4219[%4226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4228 = llvm.load %4227 : !llvm.ptr + %4229 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4230 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4231 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4232 = llvm.mul %59, %4231 : !llvm.i64 + %4233 = llvm.add %4230, %4232 : !llvm.i64 + %4234 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4235 = llvm.mul %364, %4234 : !llvm.i64 + %4236 = llvm.add %4233, %4235 : !llvm.i64 + %4237 = llvm.getelementptr %4229[%4236] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4238 = llvm.load %4237 : !llvm.ptr + %4239 = llvm.fmul %4228, %4238 {RelaxedPrecision} : !llvm.float + %4240 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4242 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4243 = llvm.mul %3918, %4242 : !llvm.i64 + %4244 = llvm.add %4241, %4243 : !llvm.i64 + %4245 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4246 = llvm.mul %364, %4245 : !llvm.i64 + %4247 = llvm.add %4244, %4246 : !llvm.i64 + %4248 = llvm.getelementptr %4240[%4247] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4249 = llvm.load %4248 : !llvm.ptr + %4250 = llvm.fadd %4249, %4239 {RelaxedPrecision} : !llvm.float + %4251 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4252 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4253 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4254 = llvm.mul %3918, %4253 : !llvm.i64 + %4255 = llvm.add %4252, %4254 : !llvm.i64 + %4256 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4257 = llvm.mul %364, %4256 : !llvm.i64 + %4258 = llvm.add %4255, %4257 : !llvm.i64 + %4259 = llvm.getelementptr %4251[%4258] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4250, %4259 : !llvm.ptr + %4260 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4261 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4262 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4263 = llvm.mul %3918, %4262 : !llvm.i64 + %4264 = llvm.add %4261, %4263 : !llvm.i64 + %4265 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4266 = llvm.mul %364, %4265 : !llvm.i64 + %4267 = llvm.add %4264, %4266 : !llvm.i64 + %4268 = llvm.getelementptr %4260[%4267] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4269 = llvm.load %4268 : !llvm.ptr + %4270 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4272 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4273 = llvm.mul %3918, %4272 : !llvm.i64 + %4274 = llvm.add %4271, %4273 : !llvm.i64 + %4275 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4276 = llvm.mul %364, %4275 : !llvm.i64 + %4277 = llvm.add %4274, %4276 : !llvm.i64 + %4278 = llvm.getelementptr %4270[%4277] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4269, %4278 : !llvm.ptr + %4279 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4281 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4282 = llvm.mul %3918, %4281 : !llvm.i64 + %4283 = llvm.add %4280, %4282 : !llvm.i64 + %4284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4285 = llvm.mul %59, %4284 : !llvm.i64 + %4286 = llvm.add %4283, %4285 : !llvm.i64 + %4287 = llvm.getelementptr %4279[%4286] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %4288 = llvm.load %4287 : !llvm.ptr + %4289 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4290 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4291 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4292 = llvm.mul %59, %4291 : !llvm.i64 + %4293 = llvm.add %4290, %4292 : !llvm.i64 + %4294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4295 = llvm.mul %425, %4294 : !llvm.i64 + %4296 = llvm.add %4293, %4295 : !llvm.i64 + %4297 = llvm.getelementptr %4289[%4296] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4298 = llvm.load %4297 : !llvm.ptr + %4299 = llvm.fmul %4288, %4298 {RelaxedPrecision} : !llvm.float + %4300 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4302 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4303 = llvm.mul %3918, %4302 : !llvm.i64 + %4304 = llvm.add %4301, %4303 : !llvm.i64 + %4305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4306 = llvm.mul %425, %4305 : !llvm.i64 + %4307 = llvm.add %4304, %4306 : !llvm.i64 + %4308 = llvm.getelementptr %4300[%4307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4309 = llvm.load %4308 : !llvm.ptr + %4310 = llvm.fadd %4309, %4299 {RelaxedPrecision} : !llvm.float + %4311 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4314 = llvm.mul %3918, %4313 : !llvm.i64 + %4315 = llvm.add %4312, %4314 : !llvm.i64 + %4316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4317 = llvm.mul %425, %4316 : !llvm.i64 + %4318 = llvm.add %4315, %4317 : !llvm.i64 + %4319 = llvm.getelementptr %4311[%4318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4310, %4319 : !llvm.ptr + %4320 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4321 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4322 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4323 = llvm.mul %3918, %4322 : !llvm.i64 + %4324 = llvm.add %4321, %4323 : !llvm.i64 + %4325 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4326 = llvm.mul %425, %4325 : !llvm.i64 + %4327 = llvm.add %4324, %4326 : !llvm.i64 + %4328 = llvm.getelementptr %4320[%4327] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4329 = llvm.load %4328 : !llvm.ptr + %4330 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4331 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4332 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4333 = llvm.mul %3918, %4332 : !llvm.i64 + %4334 = llvm.add %4331, %4333 : !llvm.i64 + %4335 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4336 = llvm.mul %425, %4335 : !llvm.i64 + %4337 = llvm.add %4334, %4336 : !llvm.i64 + %4338 = llvm.getelementptr %4330[%4337] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4329, %4338 : !llvm.ptr + %4339 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4341 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4342 = llvm.mul %3918, %4341 : !llvm.i64 + %4343 = llvm.add %4340, %4342 : !llvm.i64 + %4344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4345 = llvm.mul %59, %4344 : !llvm.i64 + %4346 = llvm.add %4343, %4345 : !llvm.i64 + %4347 = llvm.getelementptr %4339[%4346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4348 = llvm.load %4347 : !llvm.ptr + %4349 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %4350 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4351 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4352 = llvm.mul %59, %4351 : !llvm.i64 + %4353 = llvm.add %4350, %4352 : !llvm.i64 + %4354 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4355 = llvm.mul %486, %4354 : !llvm.i64 + %4356 = llvm.add %4353, %4355 : !llvm.i64 + %4357 = llvm.getelementptr %4349[%4356] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4358 = llvm.load %4357 : !llvm.ptr + %4359 = llvm.fmul %4348, %4358 {RelaxedPrecision} : !llvm.float + %4360 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4361 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4362 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4363 = llvm.mul %3918, %4362 : !llvm.i64 + %4364 = llvm.add %4361, %4363 : !llvm.i64 + %4365 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4366 = llvm.mul %486, %4365 : !llvm.i64 + %4367 = llvm.add %4364, %4366 : !llvm.i64 + %4368 = llvm.getelementptr %4360[%4367] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4369 = llvm.load %4368 : !llvm.ptr + %4370 = llvm.fadd %4369, %4359 {RelaxedPrecision} : !llvm.float + %4371 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4372 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4373 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4374 = llvm.mul %3918, %4373 : !llvm.i64 + %4375 = llvm.add %4372, %4374 : !llvm.i64 + %4376 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4377 = llvm.mul %486, %4376 : !llvm.i64 + %4378 = llvm.add %4375, %4377 : !llvm.i64 + %4379 = llvm.getelementptr %4371[%4378] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4370, %4379 : !llvm.ptr + %4380 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4382 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4383 = llvm.mul %3918, %4382 : !llvm.i64 + %4384 = llvm.add %4381, %4383 : !llvm.i64 + %4385 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4386 = llvm.mul %486, %4385 : !llvm.i64 + %4387 = llvm.add %4384, %4386 : !llvm.i64 + %4388 = llvm.getelementptr %4380[%4387] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4389 = llvm.load %4388 : !llvm.ptr + %4390 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4392 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4393 = llvm.mul %3918, %4392 : !llvm.i64 + %4394 = llvm.add %4391, %4393 : !llvm.i64 + %4395 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4396 = llvm.mul %486, %4395 : !llvm.i64 + %4397 = llvm.add %4394, %4396 : !llvm.i64 + %4398 = llvm.getelementptr %4390[%4397] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4389, %4398 : !llvm.ptr + %4399 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4401 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4402 = llvm.mul %3918, %4401 : !llvm.i64 + %4403 = llvm.add %4400, %4402 : !llvm.i64 + %4404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4405 = llvm.mul %59, %4404 : !llvm.i64 + %4406 = llvm.add %4403, %4405 : !llvm.i64 + %4407 = llvm.getelementptr %4399[%4406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4408 = llvm.load %4407 : !llvm.ptr + %4409 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4410 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4411 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %4412 = llvm.mul %59, %4411 : !llvm.i64 + %4413 = llvm.add %4410, %4412 : !llvm.i64 + %4414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4415 = llvm.mul %547, %4414 : !llvm.i64 + %4416 = llvm.add %4413, %4415 : !llvm.i64 + %4417 = llvm.getelementptr %4409[%4416] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4418 = llvm.load %4417 : !llvm.ptr + %4419 = llvm.fmul %4408, %4418 {RelaxedPrecision} : !llvm.float + %4420 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4421 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4422 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4423 = llvm.mul %3918, %4422 : !llvm.i64 + %4424 = llvm.add %4421, %4423 : !llvm.i64 + %4425 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4426 = llvm.mul %547, %4425 : !llvm.i64 + %4427 = llvm.add %4424, %4426 : !llvm.i64 + %4428 = llvm.getelementptr %4420[%4427] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4429 = llvm.load %4428 : !llvm.ptr + %4430 = llvm.fadd %4429, %4419 {RelaxedPrecision} : !llvm.float + %4431 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4434 = llvm.mul %3918, %4433 : !llvm.i64 + %4435 = llvm.add %4432, %4434 : !llvm.i64 + %4436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4437 = llvm.mul %547, %4436 : !llvm.i64 + %4438 = llvm.add %4435, %4437 : !llvm.i64 + %4439 = llvm.getelementptr %4431[%4438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4430, %4439 : !llvm.ptr + %4440 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4441 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4442 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4443 = llvm.mul %3918, %4442 : !llvm.i64 + %4444 = llvm.add %4441, %4443 : !llvm.i64 + %4445 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4446 = llvm.mul %547, %4445 : !llvm.i64 + %4447 = llvm.add %4444, %4446 : !llvm.i64 + %4448 = llvm.getelementptr %4440[%4447] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4449 = llvm.load %4448 : !llvm.ptr + %4450 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4451 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4452 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4453 = llvm.mul %3918, %4452 : !llvm.i64 + %4454 = llvm.add %4451, %4453 : !llvm.i64 + %4455 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4456 = llvm.mul %547, %4455 : !llvm.i64 + %4457 = llvm.add %4454, %4456 : !llvm.i64 + %4458 = llvm.getelementptr %4450[%4457] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4449, %4458 : !llvm.ptr + %4459 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4461 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4462 = llvm.mul %3918, %4461 : !llvm.i64 + %4463 = llvm.add %4460, %4462 : !llvm.i64 + %4464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4465 = llvm.mul %59, %4464 : !llvm.i64 + %4466 = llvm.add %4463, %4465 : !llvm.i64 + %4467 = llvm.getelementptr %4459[%4466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4468 = llvm.load %4467 : !llvm.ptr + %4469 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4470 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4471 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4472 = llvm.mul %59, %4471 : !llvm.i64 + %4473 = llvm.add %4470, %4472 : !llvm.i64 + %4474 = llvm.mlir.constant(1 
: index) : !llvm.i64 + %4475 = llvm.mul %608, %4474 : !llvm.i64 + %4476 = llvm.add %4473, %4475 : !llvm.i64 + %4477 = llvm.getelementptr %4469[%4476] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4478 = llvm.load %4477 : !llvm.ptr + %4479 = llvm.fmul %4468, %4478 {RelaxedPrecision} : !llvm.float + %4480 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4481 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4482 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4483 = llvm.mul %3918, %4482 : !llvm.i64 + %4484 = llvm.add %4481, %4483 : !llvm.i64 + %4485 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4486 = llvm.mul %608, %4485 : !llvm.i64 + %4487 = llvm.add %4484, %4486 : !llvm.i64 + %4488 = llvm.getelementptr %4480[%4487] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4489 = llvm.load %4488 : !llvm.ptr + %4490 = llvm.fadd %4489, %4479 {RelaxedPrecision} : !llvm.float + %4491 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4492 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4493 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4494 = llvm.mul %3918, %4493 : !llvm.i64 + %4495 = llvm.add %4492, %4494 : !llvm.i64 + %4496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4497 = llvm.mul %608, %4496 : !llvm.i64 + %4498 = llvm.add %4495, %4497 : !llvm.i64 + %4499 = llvm.getelementptr %4491[%4498] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4490, %4499 : !llvm.ptr + %4500 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4501 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4502 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4503 = llvm.mul %3918, %4502 : !llvm.i64 + %4504 = llvm.add %4501, %4503 : !llvm.i64 + %4505 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4506 = llvm.mul %608, %4505 : !llvm.i64 + %4507 = llvm.add %4504, %4506 : !llvm.i64 + %4508 = llvm.getelementptr %4500[%4507] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4509 = llvm.load %4508 : !llvm.ptr + %4510 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4512 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4513 = llvm.mul %3918, %4512 : !llvm.i64 + %4514 = llvm.add %4511, %4513 : !llvm.i64 + %4515 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4516 = llvm.mul %608, %4515 : !llvm.i64 + %4517 = llvm.add %4514, %4516 : !llvm.i64 + %4518 = llvm.getelementptr %4510[%4517] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4509, %4518 : !llvm.ptr + %4519 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4521 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4522 = llvm.mul %3918, %4521 : !llvm.i64 + %4523 = llvm.add %4520, %4522 : !llvm.i64 + %4524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4525 = llvm.mul %59, %4524 : !llvm.i64 + %4526 = llvm.add %4523, %4525 : !llvm.i64 + %4527 = llvm.getelementptr %4519[%4526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4528 = llvm.load %4527 : !llvm.ptr + %4529 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4530 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4531 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4532 = llvm.mul %59, %4531 : !llvm.i64 + %4533 = llvm.add %4530, %4532 : !llvm.i64 + %4534 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4535 = llvm.mul %669, %4534 : !llvm.i64 + %4536 = llvm.add %4533, %4535 : !llvm.i64 + %4537 = 
llvm.getelementptr %4529[%4536] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4538 = llvm.load %4537 : !llvm.ptr + %4539 = llvm.fmul %4528, %4538 {RelaxedPrecision} : !llvm.float + %4540 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4542 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4543 = llvm.mul %3918, %4542 : !llvm.i64 + %4544 = llvm.add %4541, %4543 : !llvm.i64 + %4545 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4546 = llvm.mul %669, %4545 : !llvm.i64 + %4547 = llvm.add %4544, %4546 : !llvm.i64 + %4548 = llvm.getelementptr %4540[%4547] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4549 = llvm.load %4548 : !llvm.ptr + %4550 = llvm.fadd %4549, %4539 {RelaxedPrecision} : !llvm.float + %4551 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4552 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4553 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4554 = llvm.mul %3918, %4553 : !llvm.i64 + %4555 = llvm.add %4552, %4554 : !llvm.i64 + %4556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4557 = llvm.mul %669, %4556 : !llvm.i64 + %4558 = llvm.add %4555, %4557 : !llvm.i64 + %4559 = llvm.getelementptr %4551[%4558] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4550, %4559 : !llvm.ptr + %4560 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4561 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4562 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4563 = llvm.mul %3918, %4562 : !llvm.i64 + %4564 = llvm.add %4561, %4563 : !llvm.i64 + %4565 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4566 = llvm.mul %669, %4565 : !llvm.i64 + %4567 = llvm.add %4564, %4566 : !llvm.i64 + %4568 = llvm.getelementptr %4560[%4567] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4569 = llvm.load %4568 : !llvm.ptr + %4570 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4571 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4572 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4573 = llvm.mul %3918, %4572 : !llvm.i64 + %4574 = llvm.add %4571, %4573 : !llvm.i64 + %4575 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4576 = llvm.mul %669, %4575 : !llvm.i64 + %4577 = llvm.add %4574, %4576 : !llvm.i64 + %4578 = llvm.getelementptr %4570[%4577] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4569, %4578 : !llvm.ptr + %4579 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4581 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4582 = llvm.mul %3918, %4581 : !llvm.i64 + %4583 = llvm.add %4580, %4582 : !llvm.i64 + %4584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4585 = llvm.mul %59, %4584 : !llvm.i64 + %4586 = llvm.add %4583, %4585 : !llvm.i64 + %4587 = llvm.getelementptr %4579[%4586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4588 = llvm.load %4587 : !llvm.ptr + %4589 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4591 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4592 = llvm.mul %59, %4591 : !llvm.i64 + %4593 = llvm.add %4590, %4592 : !llvm.i64 + %4594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4595 = llvm.mul %730, %4594 : !llvm.i64 + %4596 = llvm.add %4593, %4595 : !llvm.i64 + %4597 = llvm.getelementptr %4589[%4596] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4598 = llvm.load %4597 : !llvm.ptr + %4599 = 
llvm.fmul %4588, %4598 {RelaxedPrecision} : !llvm.float + %4600 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4602 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4603 = llvm.mul %3918, %4602 : !llvm.i64 + %4604 = llvm.add %4601, %4603 : !llvm.i64 + %4605 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4606 = llvm.mul %730, %4605 : !llvm.i64 + %4607 = llvm.add %4604, %4606 : !llvm.i64 + %4608 = llvm.getelementptr %4600[%4607] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4609 = llvm.load %4608 : !llvm.ptr + %4610 = llvm.fadd %4609, %4599 {RelaxedPrecision} : !llvm.float + %4611 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4612 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4613 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4614 = llvm.mul %3918, %4613 : !llvm.i64 + %4615 = llvm.add %4612, %4614 : !llvm.i64 + %4616 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4617 = llvm.mul %730, %4616 : !llvm.i64 + %4618 = llvm.add %4615, %4617 : !llvm.i64 + %4619 = llvm.getelementptr %4611[%4618] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4610, %4619 : !llvm.ptr + %4620 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4621 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4622 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4623 = llvm.mul %3918, %4622 : !llvm.i64 + %4624 = llvm.add %4621, %4623 : !llvm.i64 + %4625 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4626 = llvm.mul %730, %4625 : !llvm.i64 + %4627 = llvm.add %4624, %4626 : !llvm.i64 + %4628 = llvm.getelementptr %4620[%4627] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4629 = llvm.load %4628 : !llvm.ptr + %4630 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4632 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4633 = llvm.mul %3918, %4632 : !llvm.i64 + %4634 = llvm.add %4631, %4633 : !llvm.i64 + %4635 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4636 = llvm.mul %730, %4635 : !llvm.i64 + %4637 = llvm.add %4634, %4636 : !llvm.i64 + %4638 = llvm.getelementptr %4630[%4637] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4629, %4638 : !llvm.ptr + %4639 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4641 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4642 = llvm.mul %3918, %4641 : !llvm.i64 + %4643 = llvm.add %4640, %4642 : !llvm.i64 + %4644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4645 = llvm.mul %59, %4644 : !llvm.i64 + %4646 = llvm.add %4643, %4645 : !llvm.i64 + %4647 = llvm.getelementptr %4639[%4646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4648 = llvm.load %4647 : !llvm.ptr + %4649 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4651 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4652 = llvm.mul %59, %4651 : !llvm.i64 + %4653 = llvm.add %4650, %4652 : !llvm.i64 + %4654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4655 = llvm.mul %791, %4654 : !llvm.i64 + %4656 = llvm.add %4653, %4655 : !llvm.i64 + %4657 = llvm.getelementptr %4649[%4656] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4658 = llvm.load %4657 : !llvm.ptr + %4659 = llvm.fmul %4648, %4658 {RelaxedPrecision} : !llvm.float + %4660 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %4661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4662 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4663 = llvm.mul %3918, %4662 : !llvm.i64 + %4664 = llvm.add %4661, %4663 : !llvm.i64 + %4665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4666 = llvm.mul %791, %4665 : !llvm.i64 + %4667 = llvm.add %4664, %4666 : !llvm.i64 + %4668 = llvm.getelementptr %4660[%4667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4669 = llvm.load %4668 : !llvm.ptr + %4670 = llvm.fadd %4669, %4659 {RelaxedPrecision} : !llvm.float + %4671 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4672 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4673 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4674 = llvm.mul %3918, %4673 : !llvm.i64 + %4675 = llvm.add %4672, %4674 : !llvm.i64 + %4676 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4677 = llvm.mul %791, %4676 : !llvm.i64 + %4678 = llvm.add %4675, %4677 : !llvm.i64 + %4679 = llvm.getelementptr %4671[%4678] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4670, %4679 : !llvm.ptr + %4680 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4681 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4682 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4683 = llvm.mul %3918, %4682 : !llvm.i64 + %4684 = llvm.add %4681, %4683 : !llvm.i64 + %4685 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4686 = llvm.mul %791, %4685 : !llvm.i64 + %4687 = llvm.add %4684, %4686 : !llvm.i64 + %4688 = llvm.getelementptr %4680[%4687] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4689 = llvm.load %4688 : !llvm.ptr + %4690 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4691 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4692 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4693 = llvm.mul %3918, %4692 : !llvm.i64 + %4694 = llvm.add %4691, %4693 : !llvm.i64 + %4695 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4696 = llvm.mul %791, %4695 : !llvm.i64 + %4697 = llvm.add %4694, %4696 : !llvm.i64 + %4698 = llvm.getelementptr %4690[%4697] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4689, %4698 : !llvm.ptr + %4699 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4701 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4702 = llvm.mul %3918, %4701 : !llvm.i64 + %4703 = llvm.add %4700, %4702 : !llvm.i64 + %4704 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4705 = llvm.mul %59, %4704 : !llvm.i64 + %4706 = llvm.add %4703, %4705 : !llvm.i64 + %4707 = llvm.getelementptr %4699[%4706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4708 = llvm.load %4707 : !llvm.ptr + %4709 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4710 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4711 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4712 = llvm.mul %59, %4711 : !llvm.i64 + %4713 = llvm.add %4710, %4712 : !llvm.i64 + %4714 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4715 = llvm.mul %852, %4714 : !llvm.i64 + %4716 = llvm.add %4713, %4715 : !llvm.i64 + %4717 = llvm.getelementptr %4709[%4716] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4718 = llvm.load %4717 : !llvm.ptr + %4719 = llvm.fmul %4708, %4718 {RelaxedPrecision} : !llvm.float + %4720 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4721 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4722 = llvm.mlir.constant(512 : index) 
: !llvm.i64 + %4723 = llvm.mul %3918, %4722 : !llvm.i64 + %4724 = llvm.add %4721, %4723 : !llvm.i64 + %4725 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4726 = llvm.mul %852, %4725 : !llvm.i64 + %4727 = llvm.add %4724, %4726 : !llvm.i64 + %4728 = llvm.getelementptr %4720[%4727] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4729 = llvm.load %4728 : !llvm.ptr + %4730 = llvm.fadd %4729, %4719 {RelaxedPrecision} : !llvm.float + %4731 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4732 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4733 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4734 = llvm.mul %3918, %4733 : !llvm.i64 + %4735 = llvm.add %4732, %4734 : !llvm.i64 + %4736 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4737 = llvm.mul %852, %4736 : !llvm.i64 + %4738 = llvm.add %4735, %4737 : !llvm.i64 + %4739 = llvm.getelementptr %4731[%4738] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4730, %4739 : !llvm.ptr + %4740 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4741 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4743 = llvm.mul %3918, %4742 : !llvm.i64 + %4744 = llvm.add %4741, %4743 : !llvm.i64 + %4745 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4746 = llvm.mul %852, %4745 : !llvm.i64 + %4747 = llvm.add %4744, %4746 : !llvm.i64 + %4748 = llvm.getelementptr %4740[%4747] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4749 = llvm.load %4748 : !llvm.ptr + %4750 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4751 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4752 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4753 = llvm.mul %3918, %4752 : !llvm.i64 + %4754 = llvm.add %4751, %4753 : !llvm.i64 + %4755 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4756 = llvm.mul %852, %4755 : !llvm.i64 + %4757 = llvm.add %4754, %4756 : !llvm.i64 + %4758 = llvm.getelementptr %4750[%4757] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4749, %4758 : !llvm.ptr + %4759 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4761 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4762 = llvm.mul %3918, %4761 : !llvm.i64 + %4763 = llvm.add %4760, %4762 : !llvm.i64 + %4764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4765 = llvm.mul %59, %4764 : !llvm.i64 + %4766 = llvm.add %4763, %4765 : !llvm.i64 + %4767 = llvm.getelementptr %4759[%4766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4768 = llvm.load %4767 : !llvm.ptr + %4769 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4770 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4771 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4772 = llvm.mul %59, %4771 : !llvm.i64 + %4773 = llvm.add %4770, %4772 : !llvm.i64 + %4774 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4775 = llvm.mul %913, %4774 : !llvm.i64 + %4776 = llvm.add %4773, %4775 : !llvm.i64 + %4777 = llvm.getelementptr %4769[%4776] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4778 = llvm.load %4777 : !llvm.ptr + %4779 = llvm.fmul %4768, %4778 {RelaxedPrecision} : !llvm.float + %4780 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4781 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4782 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4783 = llvm.mul %3918, %4782 : !llvm.i64 + %4784 = llvm.add %4781, %4783 : !llvm.i64 + %4785 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %4786 = llvm.mul %913, %4785 : !llvm.i64 + %4787 = llvm.add %4784, %4786 : !llvm.i64 + %4788 = llvm.getelementptr %4780[%4787] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4789 = llvm.load %4788 : !llvm.ptr + %4790 = llvm.fadd %4789, %4779 {RelaxedPrecision} : !llvm.float + %4791 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4792 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4793 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4794 = llvm.mul %3918, %4793 : !llvm.i64 + %4795 = llvm.add %4792, %4794 : !llvm.i64 + %4796 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4797 = llvm.mul %913, %4796 : !llvm.i64 + %4798 = llvm.add %4795, %4797 : !llvm.i64 + %4799 = llvm.getelementptr %4791[%4798] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4790, %4799 : !llvm.ptr + %4800 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4803 = llvm.mul %3918, %4802 : !llvm.i64 + %4804 = llvm.add %4801, %4803 : !llvm.i64 + %4805 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4806 = llvm.mul %913, %4805 : !llvm.i64 + %4807 = llvm.add %4804, %4806 : !llvm.i64 + %4808 = llvm.getelementptr %4800[%4807] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4809 = llvm.load %4808 : !llvm.ptr + %4810 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4811 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4812 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4813 = llvm.mul %3918, %4812 : !llvm.i64 + %4814 = llvm.add %4811, %4813 : !llvm.i64 + %4815 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4816 = llvm.mul %913, %4815 : !llvm.i64 + %4817 = llvm.add %4814, %4816 : !llvm.i64 + %4818 = llvm.getelementptr %4810[%4817] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4809, %4818 : !llvm.ptr + %4819 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4821 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4822 = llvm.mul %3918, %4821 : !llvm.i64 + %4823 = llvm.add %4820, %4822 : !llvm.i64 + %4824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4825 = llvm.mul %59, %4824 : !llvm.i64 + %4826 = llvm.add %4823, %4825 : !llvm.i64 + %4827 = llvm.getelementptr %4819[%4826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4828 = llvm.load %4827 : !llvm.ptr + %4829 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4830 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4831 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4832 = llvm.mul %59, %4831 : !llvm.i64 + %4833 = llvm.add %4830, %4832 : !llvm.i64 + %4834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4835 = llvm.mul %974, %4834 : !llvm.i64 + %4836 = llvm.add %4833, %4835 : !llvm.i64 + %4837 = llvm.getelementptr %4829[%4836] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4838 = llvm.load %4837 : !llvm.ptr + %4839 = llvm.fmul %4828, %4838 {RelaxedPrecision} : !llvm.float + %4840 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4841 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4842 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4843 = llvm.mul %3918, %4842 : !llvm.i64 + %4844 = llvm.add %4841, %4843 : !llvm.i64 + %4845 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4846 = llvm.mul %974, %4845 : !llvm.i64 + %4847 = llvm.add %4844, %4846 : 
!llvm.i64 + %4848 = llvm.getelementptr %4840[%4847] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4849 = llvm.load %4848 : !llvm.ptr + %4850 = llvm.fadd %4849, %4839 {RelaxedPrecision} : !llvm.float + %4851 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4852 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4853 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4854 = llvm.mul %3918, %4853 : !llvm.i64 + %4855 = llvm.add %4852, %4854 : !llvm.i64 + %4856 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4857 = llvm.mul %974, %4856 : !llvm.i64 + %4858 = llvm.add %4855, %4857 : !llvm.i64 + %4859 = llvm.getelementptr %4851[%4858] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4850, %4859 : !llvm.ptr + %4860 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4861 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4862 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4863 = llvm.mul %3918, %4862 : !llvm.i64 + %4864 = llvm.add %4861, %4863 : !llvm.i64 + %4865 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4866 = llvm.mul %974, %4865 : !llvm.i64 + %4867 = llvm.add %4864, %4866 : !llvm.i64 + %4868 = llvm.getelementptr %4860[%4867] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4869 = llvm.load %4868 : !llvm.ptr + %4870 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4871 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4872 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4873 = llvm.mul %3918, %4872 : !llvm.i64 + %4874 = llvm.add %4871, %4873 : !llvm.i64 + %4875 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4876 = llvm.mul %974, %4875 : !llvm.i64 + %4877 = llvm.add %4874, %4876 : !llvm.i64 + %4878 = llvm.getelementptr %4870[%4877] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4869, %4878 : !llvm.ptr + %4879 = llvm.add %50, %37 : !llvm.i64 + %4880 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4881 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4882 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4883 = llvm.mul %4879, %4882 : !llvm.i64 + %4884 = llvm.add %4881, %4883 : !llvm.i64 + %4885 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4886 = llvm.mul %59, %4885 : !llvm.i64 + %4887 = llvm.add %4884, %4886 : !llvm.i64 + %4888 = llvm.getelementptr %4880[%4887] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4889 = llvm.load %4888 : !llvm.ptr + %4890 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4892 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4893 = llvm.mul %59, %4892 : !llvm.i64 + %4894 = llvm.add %4891, %4893 : !llvm.i64 + %4895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4896 = llvm.mul %58, %4895 : !llvm.i64 + %4897 = llvm.add %4894, %4896 : !llvm.i64 + %4898 = llvm.getelementptr %4890[%4897] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4899 = llvm.load %4898 : !llvm.ptr + %4900 = llvm.fmul %4889, %4899 {RelaxedPrecision} : !llvm.float + %4901 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4903 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4904 = llvm.mul %4879, %4903 : !llvm.i64 + %4905 = llvm.add %4902, %4904 : !llvm.i64 + %4906 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4907 = llvm.mul %58, %4906 : !llvm.i64 + %4908 = llvm.add %4905, %4907 : !llvm.i64 + %4909 = llvm.getelementptr %4901[%4908] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %4910 = llvm.load %4909 : !llvm.ptr + %4911 = llvm.fadd %4910, %4900 {RelaxedPrecision} : !llvm.float + %4912 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4913 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4914 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4915 = llvm.mul %4879, %4914 : !llvm.i64 + %4916 = llvm.add %4913, %4915 : !llvm.i64 + %4917 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4918 = llvm.mul %58, %4917 : !llvm.i64 + %4919 = llvm.add %4916, %4918 : !llvm.i64 + %4920 = llvm.getelementptr %4912[%4919] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4911, %4920 : !llvm.ptr + %4921 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4922 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4923 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4924 = llvm.mul %4879, %4923 : !llvm.i64 + %4925 = llvm.add %4922, %4924 : !llvm.i64 + %4926 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4927 = llvm.mul %58, %4926 : !llvm.i64 + %4928 = llvm.add %4925, %4927 : !llvm.i64 + %4929 = llvm.getelementptr %4921[%4928] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4930 = llvm.load %4929 : !llvm.ptr + %4931 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4932 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4933 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4934 = llvm.mul %4879, %4933 : !llvm.i64 + %4935 = llvm.add %4932, %4934 : !llvm.i64 + %4936 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4937 = llvm.mul %58, %4936 : !llvm.i64 + %4938 = llvm.add %4935, %4937 : !llvm.i64 + %4939 = llvm.getelementptr %4931[%4938] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4930, %4939 : !llvm.ptr + %4940 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4941 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4942 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4943 = llvm.mul %4879, %4942 : !llvm.i64 + %4944 = llvm.add %4941, %4943 : !llvm.i64 + %4945 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4946 = llvm.mul %59, %4945 : !llvm.i64 + %4947 = llvm.add %4944, %4946 : !llvm.i64 + %4948 = llvm.getelementptr %4940[%4947] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4949 = llvm.load %4948 : !llvm.ptr + %4950 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4951 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4952 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4953 = llvm.mul %59, %4952 : !llvm.i64 + %4954 = llvm.add %4951, %4953 : !llvm.i64 + %4955 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4956 = llvm.mul %120, %4955 : !llvm.i64 + %4957 = llvm.add %4954, %4956 : !llvm.i64 + %4958 = llvm.getelementptr %4950[%4957] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4959 = llvm.load %4958 : !llvm.ptr + %4960 = llvm.fmul %4949, %4959 {RelaxedPrecision} : !llvm.float + %4961 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4962 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4963 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4964 = llvm.mul %4879, %4963 : !llvm.i64 + %4965 = llvm.add %4962, %4964 : !llvm.i64 + %4966 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4967 = llvm.mul %120, %4966 : !llvm.i64 + %4968 = llvm.add %4965, %4967 : !llvm.i64 + %4969 = llvm.getelementptr %4961[%4968] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4970 = llvm.load %4969 : !llvm.ptr + %4971 = llvm.fadd %4970, %4960 {RelaxedPrecision} : !llvm.float + %4972 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4973 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4974 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4975 = llvm.mul %4879, %4974 : !llvm.i64 + %4976 = llvm.add %4973, %4975 : !llvm.i64 + %4977 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4978 = llvm.mul %120, %4977 : !llvm.i64 + %4979 = llvm.add %4976, %4978 : !llvm.i64 + %4980 = llvm.getelementptr %4972[%4979] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4971, %4980 : !llvm.ptr + %4981 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4982 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4983 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4984 = llvm.mul %4879, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %120, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4981[%4988] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4990 = llvm.load %4989 : !llvm.ptr + %4991 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4992 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4993 = llvm.mlir.constant(512 : index) : !llvm.i64 + %4994 = llvm.mul %4879, %4993 : !llvm.i64 + %4995 = llvm.add %4992, %4994 : !llvm.i64 + %4996 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4997 = llvm.mul %120, %4996 : !llvm.i64 + %4998 = llvm.add %4995, %4997 : !llvm.i64 + %4999 = llvm.getelementptr %4991[%4998] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %4990, %4999 : !llvm.ptr + %5000 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5001 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5002 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5003 = llvm.mul %4879, %5002 : !llvm.i64 + %5004 = llvm.add %5001, %5003 : !llvm.i64 + %5005 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5006 = llvm.mul %59, %5005 : !llvm.i64 + %5007 = llvm.add %5004, %5006 : !llvm.i64 + %5008 = llvm.getelementptr %5000[%5007] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5009 = llvm.load %5008 : !llvm.ptr + %5010 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5011 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5012 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5013 = llvm.mul %59, %5012 : !llvm.i64 + %5014 = llvm.add %5011, %5013 : !llvm.i64 + %5015 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5016 = llvm.mul %181, %5015 : !llvm.i64 + %5017 = llvm.add %5014, %5016 : !llvm.i64 + %5018 = llvm.getelementptr %5010[%5017] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5019 = llvm.load %5018 : !llvm.ptr + %5020 = llvm.fmul %5009, %5019 {RelaxedPrecision} : !llvm.float + %5021 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5022 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5023 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5024 = llvm.mul %4879, %5023 : !llvm.i64 + %5025 = llvm.add %5022, %5024 : !llvm.i64 + %5026 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5027 = llvm.mul %181, %5026 : !llvm.i64 + %5028 = llvm.add %5025, %5027 : !llvm.i64 + %5029 = llvm.getelementptr %5021[%5028] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5030 = llvm.load %5029 : !llvm.ptr + %5031 = llvm.fadd %5030, %5020 {RelaxedPrecision} : !llvm.float + %5032 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5033 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %5034 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5035 = llvm.mul %4879, %5034 : !llvm.i64 + %5036 = llvm.add %5033, %5035 : !llvm.i64 + %5037 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5038 = llvm.mul %181, %5037 : !llvm.i64 + %5039 = llvm.add %5036, %5038 : !llvm.i64 + %5040 = llvm.getelementptr %5032[%5039] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5031, %5040 : !llvm.ptr + %5041 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5042 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5043 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5044 = llvm.mul %4879, %5043 : !llvm.i64 + %5045 = llvm.add %5042, %5044 : !llvm.i64 + %5046 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5047 = llvm.mul %181, %5046 : !llvm.i64 + %5048 = llvm.add %5045, %5047 : !llvm.i64 + %5049 = llvm.getelementptr %5041[%5048] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5050 = llvm.load %5049 : !llvm.ptr + %5051 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5052 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5053 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5054 = llvm.mul %4879, %5053 : !llvm.i64 + %5055 = llvm.add %5052, %5054 : !llvm.i64 + %5056 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5057 = llvm.mul %181, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.getelementptr %5051[%5058] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5050, %5059 : !llvm.ptr + %5060 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5062 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5063 = llvm.mul %4879, %5062 : !llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5066 = llvm.mul %59, %5065 : !llvm.i64 + %5067 = llvm.add %5064, %5066 : !llvm.i64 + %5068 = llvm.getelementptr %5060[%5067] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5069 = llvm.load %5068 : !llvm.ptr + %5070 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5071 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5072 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5073 = llvm.mul %59, %5072 : !llvm.i64 + %5074 = llvm.add %5071, %5073 : !llvm.i64 + %5075 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5076 = llvm.mul %242, %5075 : !llvm.i64 + %5077 = llvm.add %5074, %5076 : !llvm.i64 + %5078 = llvm.getelementptr %5070[%5077] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5079 = llvm.load %5078 : !llvm.ptr + %5080 = llvm.fmul %5069, %5079 {RelaxedPrecision} : !llvm.float + %5081 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5082 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5083 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5084 = llvm.mul %4879, %5083 : !llvm.i64 + %5085 = llvm.add %5082, %5084 : !llvm.i64 + %5086 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5087 = llvm.mul %242, %5086 : !llvm.i64 + %5088 = llvm.add %5085, %5087 : !llvm.i64 + %5089 = llvm.getelementptr %5081[%5088] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5090 = llvm.load %5089 : !llvm.ptr + %5091 = llvm.fadd %5090, %5080 {RelaxedPrecision} : !llvm.float + %5092 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5093 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5094 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5095 = llvm.mul %4879, %5094 : !llvm.i64 + %5096 = 
llvm.add %5093, %5095 : !llvm.i64 + %5097 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5098 = llvm.mul %242, %5097 : !llvm.i64 + %5099 = llvm.add %5096, %5098 : !llvm.i64 + %5100 = llvm.getelementptr %5092[%5099] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5091, %5100 : !llvm.ptr + %5101 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5102 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5103 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5104 = llvm.mul %4879, %5103 : !llvm.i64 + %5105 = llvm.add %5102, %5104 : !llvm.i64 + %5106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5107 = llvm.mul %242, %5106 : !llvm.i64 + %5108 = llvm.add %5105, %5107 : !llvm.i64 + %5109 = llvm.getelementptr %5101[%5108] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5110 = llvm.load %5109 : !llvm.ptr + %5111 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5112 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5113 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5114 = llvm.mul %4879, %5113 : !llvm.i64 + %5115 = llvm.add %5112, %5114 : !llvm.i64 + %5116 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5117 = llvm.mul %242, %5116 : !llvm.i64 + %5118 = llvm.add %5115, %5117 : !llvm.i64 + %5119 = llvm.getelementptr %5111[%5118] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5110, %5119 : !llvm.ptr + %5120 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5123 = llvm.mul %4879, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5126 = llvm.mul %59, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.getelementptr %5120[%5127] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5129 = llvm.load %5128 : !llvm.ptr + %5130 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5132 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5133 = llvm.mul %59, %5132 : !llvm.i64 + %5134 = llvm.add %5131, %5133 : !llvm.i64 + %5135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5136 = llvm.mul %303, %5135 : !llvm.i64 + %5137 = llvm.add %5134, %5136 : !llvm.i64 + %5138 = llvm.getelementptr %5130[%5137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5139 = llvm.load %5138 : !llvm.ptr + %5140 = llvm.fmul %5129, %5139 {RelaxedPrecision} : !llvm.float + %5141 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5142 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5143 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5144 = llvm.mul %4879, %5143 : !llvm.i64 + %5145 = llvm.add %5142, %5144 : !llvm.i64 + %5146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5147 = llvm.mul %303, %5146 : !llvm.i64 + %5148 = llvm.add %5145, %5147 : !llvm.i64 + %5149 = llvm.getelementptr %5141[%5148] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5150 = llvm.load %5149 : !llvm.ptr + %5151 = llvm.fadd %5150, %5140 {RelaxedPrecision} : !llvm.float + %5152 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5153 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5154 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5155 = llvm.mul %4879, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 : !llvm.i64 + %5157 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5158 = llvm.mul %303, %5157 : 
!llvm.i64 + %5159 = llvm.add %5156, %5158 : !llvm.i64 + %5160 = llvm.getelementptr %5152[%5159] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5151, %5160 : !llvm.ptr + %5161 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5162 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5163 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5164 = llvm.mul %4879, %5163 : !llvm.i64 + %5165 = llvm.add %5162, %5164 : !llvm.i64 + %5166 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5167 = llvm.mul %303, %5166 : !llvm.i64 + %5168 = llvm.add %5165, %5167 : !llvm.i64 + %5169 = llvm.getelementptr %5161[%5168] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5170 = llvm.load %5169 : !llvm.ptr + %5171 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5172 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5173 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5174 = llvm.mul %4879, %5173 : !llvm.i64 + %5175 = llvm.add %5172, %5174 : !llvm.i64 + %5176 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5177 = llvm.mul %303, %5176 : !llvm.i64 + %5178 = llvm.add %5175, %5177 : !llvm.i64 + %5179 = llvm.getelementptr %5171[%5178] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5170, %5179 : !llvm.ptr + %5180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5183 = llvm.mul %4879, %5182 : !llvm.i64 + %5184 = llvm.add %5181, %5183 : !llvm.i64 + %5185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5186 = llvm.mul %59, %5185 : !llvm.i64 + %5187 = llvm.add %5184, %5186 : !llvm.i64 + %5188 = llvm.getelementptr %5180[%5187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5189 = llvm.load %5188 : !llvm.ptr + %5190 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5192 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5193 = llvm.mul %59, %5192 : !llvm.i64 + %5194 = llvm.add %5191, %5193 : !llvm.i64 + %5195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5196 = llvm.mul %364, %5195 : !llvm.i64 + %5197 = llvm.add %5194, %5196 : !llvm.i64 + %5198 = llvm.getelementptr %5190[%5197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5199 = llvm.load %5198 : !llvm.ptr + %5200 = llvm.fmul %5189, %5199 {RelaxedPrecision} : !llvm.float + %5201 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5202 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5203 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5204 = llvm.mul %4879, %5203 : !llvm.i64 + %5205 = llvm.add %5202, %5204 : !llvm.i64 + %5206 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5207 = llvm.mul %364, %5206 : !llvm.i64 + %5208 = llvm.add %5205, %5207 : !llvm.i64 + %5209 = llvm.getelementptr %5201[%5208] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5210 = llvm.load %5209 : !llvm.ptr + %5211 = llvm.fadd %5210, %5200 {RelaxedPrecision} : !llvm.float + %5212 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5213 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5214 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5215 = llvm.mul %4879, %5214 : !llvm.i64 + %5216 = llvm.add %5213, %5215 : !llvm.i64 + %5217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5218 = llvm.mul %364, %5217 : !llvm.i64 + %5219 = llvm.add %5216, %5218 : !llvm.i64 + %5220 = llvm.getelementptr %5212[%5219] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + llvm.store %5211, %5220 : !llvm.ptr + %5221 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5224 = llvm.mul %4879, %5223 : !llvm.i64 + %5225 = llvm.add %5222, %5224 : !llvm.i64 + %5226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5227 = llvm.mul %364, %5226 : !llvm.i64 + %5228 = llvm.add %5225, %5227 : !llvm.i64 + %5229 = llvm.getelementptr %5221[%5228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5230 = llvm.load %5229 : !llvm.ptr + %5231 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5232 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5233 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5234 = llvm.mul %4879, %5233 : !llvm.i64 + %5235 = llvm.add %5232, %5234 : !llvm.i64 + %5236 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5237 = llvm.mul %364, %5236 : !llvm.i64 + %5238 = llvm.add %5235, %5237 : !llvm.i64 + %5239 = llvm.getelementptr %5231[%5238] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5230, %5239 : !llvm.ptr + %5240 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5242 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5243 = llvm.mul %4879, %5242 : !llvm.i64 + %5244 = llvm.add %5241, %5243 : !llvm.i64 + %5245 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5246 = llvm.mul %59, %5245 : !llvm.i64 + %5247 = llvm.add %5244, %5246 : !llvm.i64 + %5248 = llvm.getelementptr %5240[%5247] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5249 = llvm.load %5248 : !llvm.ptr + %5250 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5252 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5253 = llvm.mul %59, %5252 : !llvm.i64 + %5254 = llvm.add %5251, %5253 : !llvm.i64 + %5255 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5256 = llvm.mul %425, %5255 : !llvm.i64 + %5257 = llvm.add %5254, %5256 : !llvm.i64 + %5258 = llvm.getelementptr %5250[%5257] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5259 = llvm.load %5258 : !llvm.ptr + %5260 = llvm.fmul %5249, %5259 {RelaxedPrecision} : !llvm.float + %5261 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5262 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5263 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5264 = llvm.mul %4879, %5263 : !llvm.i64 + %5265 = llvm.add %5262, %5264 : !llvm.i64 + %5266 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5267 = llvm.mul %425, %5266 : !llvm.i64 + %5268 = llvm.add %5265, %5267 : !llvm.i64 + %5269 = llvm.getelementptr %5261[%5268] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5270 = llvm.load %5269 : !llvm.ptr + %5271 = llvm.fadd %5270, %5260 {RelaxedPrecision} : !llvm.float + %5272 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5273 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5274 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5275 = llvm.mul %4879, %5274 : !llvm.i64 + %5276 = llvm.add %5273, %5275 : !llvm.i64 + %5277 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5278 = llvm.mul %425, %5277 : !llvm.i64 + %5279 = llvm.add %5276, %5278 : !llvm.i64 + %5280 = llvm.getelementptr %5272[%5279] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5271, %5280 : !llvm.ptr + %5281 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %5282 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5283 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5284 = llvm.mul %4879, %5283 : !llvm.i64 + %5285 = llvm.add %5282, %5284 : !llvm.i64 + %5286 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5287 = llvm.mul %425, %5286 : !llvm.i64 + %5288 = llvm.add %5285, %5287 : !llvm.i64 + %5289 = llvm.getelementptr %5281[%5288] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5290 = llvm.load %5289 : !llvm.ptr + %5291 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5292 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5293 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5294 = llvm.mul %4879, %5293 : !llvm.i64 + %5295 = llvm.add %5292, %5294 : !llvm.i64 + %5296 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5297 = llvm.mul %425, %5296 : !llvm.i64 + %5298 = llvm.add %5295, %5297 : !llvm.i64 + %5299 = llvm.getelementptr %5291[%5298] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5290, %5299 : !llvm.ptr + %5300 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5302 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5303 = llvm.mul %4879, %5302 : !llvm.i64 + %5304 = llvm.add %5301, %5303 : !llvm.i64 + %5305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5306 = llvm.mul %59, %5305 : !llvm.i64 + %5307 = llvm.add %5304, %5306 : !llvm.i64 + %5308 = llvm.getelementptr %5300[%5307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5309 = llvm.load %5308 : !llvm.ptr + %5310 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5311 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5312 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5313 = llvm.mul %59, %5312 : !llvm.i64 + %5314 = llvm.add %5311, %5313 : !llvm.i64 + %5315 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5316 = llvm.mul %486, %5315 : !llvm.i64 + %5317 = llvm.add %5314, %5316 : !llvm.i64 + %5318 = llvm.getelementptr %5310[%5317] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5319 = llvm.load %5318 : !llvm.ptr + %5320 = llvm.fmul %5309, %5319 {RelaxedPrecision} : !llvm.float + %5321 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5322 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5323 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5324 = llvm.mul %4879, %5323 : !llvm.i64 + %5325 = llvm.add %5322, %5324 : !llvm.i64 + %5326 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5327 = llvm.mul %486, %5326 : !llvm.i64 + %5328 = llvm.add %5325, %5327 : !llvm.i64 + %5329 = llvm.getelementptr %5321[%5328] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5330 = llvm.load %5329 : !llvm.ptr + %5331 = llvm.fadd %5330, %5320 {RelaxedPrecision} : !llvm.float + %5332 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5333 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5334 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5335 = llvm.mul %4879, %5334 : !llvm.i64 + %5336 = llvm.add %5333, %5335 : !llvm.i64 + %5337 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5338 = llvm.mul %486, %5337 : !llvm.i64 + %5339 = llvm.add %5336, %5338 : !llvm.i64 + %5340 = llvm.getelementptr %5332[%5339] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5331, %5340 : !llvm.ptr + %5341 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5342 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5343 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %5344 = llvm.mul %4879, %5343 : !llvm.i64 + %5345 = llvm.add %5342, %5344 : !llvm.i64 + %5346 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5347 = llvm.mul %486, %5346 : !llvm.i64 + %5348 = llvm.add %5345, %5347 : !llvm.i64 + %5349 = llvm.getelementptr %5341[%5348] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5350 = llvm.load %5349 : !llvm.ptr + %5351 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5352 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5353 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5354 = llvm.mul %4879, %5353 : !llvm.i64 + %5355 = llvm.add %5352, %5354 : !llvm.i64 + %5356 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5357 = llvm.mul %486, %5356 : !llvm.i64 + %5358 = llvm.add %5355, %5357 : !llvm.i64 + %5359 = llvm.getelementptr %5351[%5358] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5350, %5359 : !llvm.ptr + %5360 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5361 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5362 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5363 = llvm.mul %4879, %5362 : !llvm.i64 + %5364 = llvm.add %5361, %5363 : !llvm.i64 + %5365 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5366 = llvm.mul %59, %5365 : !llvm.i64 + %5367 = llvm.add %5364, %5366 : !llvm.i64 + %5368 = llvm.getelementptr %5360[%5367] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5369 = llvm.load %5368 : !llvm.ptr + %5370 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5371 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5372 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5373 = llvm.mul %59, %5372 : !llvm.i64 + %5374 = llvm.add %5371, %5373 : !llvm.i64 + %5375 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5376 = llvm.mul %547, %5375 : !llvm.i64 + %5377 = llvm.add %5374, %5376 : !llvm.i64 + %5378 = llvm.getelementptr %5370[%5377] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5379 = llvm.load %5378 : !llvm.ptr + %5380 = llvm.fmul %5369, %5379 {RelaxedPrecision} : !llvm.float + %5381 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5383 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5384 = llvm.mul %4879, %5383 : !llvm.i64 + %5385 = llvm.add %5382, %5384 : !llvm.i64 + %5386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5387 = llvm.mul %547, %5386 : !llvm.i64 + %5388 = llvm.add %5385, %5387 : !llvm.i64 + %5389 = llvm.getelementptr %5381[%5388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5390 = llvm.load %5389 : !llvm.ptr + %5391 = llvm.fadd %5390, %5380 {RelaxedPrecision} : !llvm.float + %5392 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5393 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5394 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5395 = llvm.mul %4879, %5394 : !llvm.i64 + %5396 = llvm.add %5393, %5395 : !llvm.i64 + %5397 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5398 = llvm.mul %547, %5397 : !llvm.i64 + %5399 = llvm.add %5396, %5398 : !llvm.i64 + %5400 = llvm.getelementptr %5392[%5399] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5391, %5400 : !llvm.ptr + %5401 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5403 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5404 = llvm.mul %4879, %5403 : !llvm.i64 + %5405 = llvm.add %5402, %5404 : !llvm.i64 + %5406 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %5407 = llvm.mul %547, %5406 : !llvm.i64 + %5408 = llvm.add %5405, %5407 : !llvm.i64 + %5409 = llvm.getelementptr %5401[%5408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5410 = llvm.load %5409 : !llvm.ptr + %5411 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5413 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5414 = llvm.mul %4879, %5413 : !llvm.i64 + %5415 = llvm.add %5412, %5414 : !llvm.i64 + %5416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5417 = llvm.mul %547, %5416 : !llvm.i64 + %5418 = llvm.add %5415, %5417 : !llvm.i64 + %5419 = llvm.getelementptr %5411[%5418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5410, %5419 : !llvm.ptr + %5420 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5421 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5422 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5423 = llvm.mul %4879, %5422 : !llvm.i64 + %5424 = llvm.add %5421, %5423 : !llvm.i64 + %5425 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5426 = llvm.mul %59, %5425 : !llvm.i64 + %5427 = llvm.add %5424, %5426 : !llvm.i64 + %5428 = llvm.getelementptr %5420[%5427] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5429 = llvm.load %5428 : !llvm.ptr + %5430 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5431 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5432 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5433 = llvm.mul %59, %5432 : !llvm.i64 + %5434 = llvm.add %5431, %5433 : !llvm.i64 + %5435 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5436 = llvm.mul %608, %5435 : !llvm.i64 + %5437 = llvm.add %5434, %5436 : !llvm.i64 + %5438 = llvm.getelementptr %5430[%5437] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5439 = llvm.load %5438 : !llvm.ptr + %5440 = llvm.fmul %5429, %5439 {RelaxedPrecision} : !llvm.float + %5441 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5443 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5444 = llvm.mul %4879, %5443 : !llvm.i64 + %5445 = llvm.add %5442, %5444 : !llvm.i64 + %5446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5447 = llvm.mul %608, %5446 : !llvm.i64 + %5448 = llvm.add %5445, %5447 : !llvm.i64 + %5449 = llvm.getelementptr %5441[%5448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5450 = llvm.load %5449 : !llvm.ptr + %5451 = llvm.fadd %5450, %5440 {RelaxedPrecision} : !llvm.float + %5452 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5455 = llvm.mul %4879, %5454 : !llvm.i64 + %5456 = llvm.add %5453, %5455 : !llvm.i64 + %5457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5458 = llvm.mul %608, %5457 : !llvm.i64 + %5459 = llvm.add %5456, %5458 : !llvm.i64 + %5460 = llvm.getelementptr %5452[%5459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5451, %5460 : !llvm.ptr + %5461 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5462 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5463 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5464 = llvm.mul %4879, %5463 : !llvm.i64 + %5465 = llvm.add %5462, %5464 : !llvm.i64 + %5466 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5467 = llvm.mul %608, %5466 : !llvm.i64 + %5468 = llvm.add %5465, %5467 : 
!llvm.i64 + %5469 = llvm.getelementptr %5461[%5468] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5470 = llvm.load %5469 : !llvm.ptr + %5471 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5472 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5473 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5474 = llvm.mul %4879, %5473 : !llvm.i64 + %5475 = llvm.add %5472, %5474 : !llvm.i64 + %5476 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5477 = llvm.mul %608, %5476 : !llvm.i64 + %5478 = llvm.add %5475, %5477 : !llvm.i64 + %5479 = llvm.getelementptr %5471[%5478] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5470, %5479 : !llvm.ptr + %5480 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5481 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5482 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5483 = llvm.mul %4879, %5482 : !llvm.i64 + %5484 = llvm.add %5481, %5483 : !llvm.i64 + %5485 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5486 = llvm.mul %59, %5485 : !llvm.i64 + %5487 = llvm.add %5484, %5486 : !llvm.i64 + %5488 = llvm.getelementptr %5480[%5487] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5489 = llvm.load %5488 : !llvm.ptr + %5490 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5491 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5492 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5493 = llvm.mul %59, %5492 : !llvm.i64 + %5494 = llvm.add %5491, %5493 : !llvm.i64 + %5495 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5496 = llvm.mul %669, %5495 : !llvm.i64 + %5497 = llvm.add %5494, %5496 : !llvm.i64 + %5498 = llvm.getelementptr %5490[%5497] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5499 = llvm.load %5498 : !llvm.ptr + %5500 = llvm.fmul %5489, %5499 {RelaxedPrecision} : !llvm.float + %5501 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5502 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5503 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5504 = llvm.mul %4879, %5503 : !llvm.i64 + %5505 = llvm.add %5502, %5504 : !llvm.i64 + %5506 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5507 = llvm.mul %669, %5506 : !llvm.i64 + %5508 = llvm.add %5505, %5507 : !llvm.i64 + %5509 = llvm.getelementptr %5501[%5508] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5510 = llvm.load %5509 : !llvm.ptr + %5511 = llvm.fadd %5510, %5500 {RelaxedPrecision} : !llvm.float + %5512 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5513 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5514 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5515 = llvm.mul %4879, %5514 : !llvm.i64 + %5516 = llvm.add %5513, %5515 : !llvm.i64 + %5517 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5518 = llvm.mul %669, %5517 : !llvm.i64 + %5519 = llvm.add %5516, %5518 : !llvm.i64 + %5520 = llvm.getelementptr %5512[%5519] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5511, %5520 : !llvm.ptr + %5521 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5522 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5523 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5524 = llvm.mul %4879, %5523 : !llvm.i64 + %5525 = llvm.add %5522, %5524 : !llvm.i64 + %5526 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5527 = llvm.mul %669, %5526 : !llvm.i64 + %5528 = llvm.add %5525, %5527 : !llvm.i64 + %5529 = llvm.getelementptr %5521[%5528] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5530 = llvm.load %5529 : 
!llvm.ptr + %5531 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5533 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5534 = llvm.mul %4879, %5533 : !llvm.i64 + %5535 = llvm.add %5532, %5534 : !llvm.i64 + %5536 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5537 = llvm.mul %669, %5536 : !llvm.i64 + %5538 = llvm.add %5535, %5537 : !llvm.i64 + %5539 = llvm.getelementptr %5531[%5538] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5530, %5539 : !llvm.ptr + %5540 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5542 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5543 = llvm.mul %4879, %5542 : !llvm.i64 + %5544 = llvm.add %5541, %5543 : !llvm.i64 + %5545 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5546 = llvm.mul %59, %5545 : !llvm.i64 + %5547 = llvm.add %5544, %5546 : !llvm.i64 + %5548 = llvm.getelementptr %5540[%5547] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5549 = llvm.load %5548 : !llvm.ptr + %5550 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5551 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5552 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5553 = llvm.mul %59, %5552 : !llvm.i64 + %5554 = llvm.add %5551, %5553 : !llvm.i64 + %5555 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5556 = llvm.mul %730, %5555 : !llvm.i64 + %5557 = llvm.add %5554, %5556 : !llvm.i64 + %5558 = llvm.getelementptr %5550[%5557] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5559 = llvm.load %5558 : !llvm.ptr + %5560 = llvm.fmul %5549, %5559 {RelaxedPrecision} : !llvm.float + %5561 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5562 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5563 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5564 = llvm.mul %4879, %5563 : !llvm.i64 + %5565 = llvm.add %5562, %5564 : !llvm.i64 + %5566 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5567 = llvm.mul %730, %5566 : !llvm.i64 + %5568 = llvm.add %5565, %5567 : !llvm.i64 + %5569 = llvm.getelementptr %5561[%5568] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5570 = llvm.load %5569 : !llvm.ptr + %5571 = llvm.fadd %5570, %5560 {RelaxedPrecision} : !llvm.float + %5572 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5573 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5574 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5575 = llvm.mul %4879, %5574 : !llvm.i64 + %5576 = llvm.add %5573, %5575 : !llvm.i64 + %5577 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5578 = llvm.mul %730, %5577 : !llvm.i64 + %5579 = llvm.add %5576, %5578 : !llvm.i64 + %5580 = llvm.getelementptr %5572[%5579] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5571, %5580 : !llvm.ptr + %5581 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5582 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5583 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5584 = llvm.mul %4879, %5583 : !llvm.i64 + %5585 = llvm.add %5582, %5584 : !llvm.i64 + %5586 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5587 = llvm.mul %730, %5586 : !llvm.i64 + %5588 = llvm.add %5585, %5587 : !llvm.i64 + %5589 = llvm.getelementptr %5581[%5588] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5590 = llvm.load %5589 : !llvm.ptr + %5591 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5592 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %5593 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5594 = llvm.mul %4879, %5593 : !llvm.i64 + %5595 = llvm.add %5592, %5594 : !llvm.i64 + %5596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5597 = llvm.mul %730, %5596 : !llvm.i64 + %5598 = llvm.add %5595, %5597 : !llvm.i64 + %5599 = llvm.getelementptr %5591[%5598] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5590, %5599 : !llvm.ptr + %5600 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5602 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5603 = llvm.mul %4879, %5602 : !llvm.i64 + %5604 = llvm.add %5601, %5603 : !llvm.i64 + %5605 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5606 = llvm.mul %59, %5605 : !llvm.i64 + %5607 = llvm.add %5604, %5606 : !llvm.i64 + %5608 = llvm.getelementptr %5600[%5607] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5609 = llvm.load %5608 : !llvm.ptr + %5610 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5611 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5612 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5613 = llvm.mul %59, %5612 : !llvm.i64 + %5614 = llvm.add %5611, %5613 : !llvm.i64 + %5615 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5616 = llvm.mul %791, %5615 : !llvm.i64 + %5617 = llvm.add %5614, %5616 : !llvm.i64 + %5618 = llvm.getelementptr %5610[%5617] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5619 = llvm.load %5618 : !llvm.ptr + %5620 = llvm.fmul %5609, %5619 {RelaxedPrecision} : !llvm.float + %5621 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5622 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5623 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5624 = llvm.mul %4879, %5623 : !llvm.i64 + %5625 = llvm.add %5622, %5624 : !llvm.i64 + %5626 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5627 = llvm.mul %791, %5626 : !llvm.i64 + %5628 = llvm.add %5625, %5627 : !llvm.i64 + %5629 = llvm.getelementptr %5621[%5628] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5630 = llvm.load %5629 : !llvm.ptr + %5631 = llvm.fadd %5630, %5620 {RelaxedPrecision} : !llvm.float + %5632 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5633 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5634 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5635 = llvm.mul %4879, %5634 : !llvm.i64 + %5636 = llvm.add %5633, %5635 : !llvm.i64 + %5637 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5638 = llvm.mul %791, %5637 : !llvm.i64 + %5639 = llvm.add %5636, %5638 : !llvm.i64 + %5640 = llvm.getelementptr %5632[%5639] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5631, %5640 : !llvm.ptr + %5641 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5642 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5643 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5644 = llvm.mul %4879, %5643 : !llvm.i64 + %5645 = llvm.add %5642, %5644 : !llvm.i64 + %5646 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5647 = llvm.mul %791, %5646 : !llvm.i64 + %5648 = llvm.add %5645, %5647 : !llvm.i64 + %5649 = llvm.getelementptr %5641[%5648] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5650 = llvm.load %5649 : !llvm.ptr + %5651 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5652 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5653 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5654 = llvm.mul %4879, 
%5653 : !llvm.i64 + %5655 = llvm.add %5652, %5654 : !llvm.i64 + %5656 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5657 = llvm.mul %791, %5656 : !llvm.i64 + %5658 = llvm.add %5655, %5657 : !llvm.i64 + %5659 = llvm.getelementptr %5651[%5658] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5650, %5659 : !llvm.ptr + %5660 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5661 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5662 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5663 = llvm.mul %4879, %5662 : !llvm.i64 + %5664 = llvm.add %5661, %5663 : !llvm.i64 + %5665 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5666 = llvm.mul %59, %5665 : !llvm.i64 + %5667 = llvm.add %5664, %5666 : !llvm.i64 + %5668 = llvm.getelementptr %5660[%5667] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5669 = llvm.load %5668 : !llvm.ptr + %5670 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5672 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5673 = llvm.mul %59, %5672 : !llvm.i64 + %5674 = llvm.add %5671, %5673 : !llvm.i64 + %5675 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5676 = llvm.mul %852, %5675 : !llvm.i64 + %5677 = llvm.add %5674, %5676 : !llvm.i64 + %5678 = llvm.getelementptr %5670[%5677] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5679 = llvm.load %5678 : !llvm.ptr + %5680 = llvm.fmul %5669, %5679 {RelaxedPrecision} : !llvm.float + %5681 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5682 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5683 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5684 = llvm.mul %4879, %5683 : !llvm.i64 + %5685 = llvm.add %5682, %5684 : !llvm.i64 + %5686 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5687 = llvm.mul %852, %5686 : !llvm.i64 + %5688 = llvm.add %5685, %5687 : !llvm.i64 + %5689 = llvm.getelementptr %5681[%5688] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5690 = llvm.load %5689 : !llvm.ptr + %5691 = llvm.fadd %5690, %5680 {RelaxedPrecision} : !llvm.float + %5692 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5694 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5695 = llvm.mul %4879, %5694 : !llvm.i64 + %5696 = llvm.add %5693, %5695 : !llvm.i64 + %5697 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5698 = llvm.mul %852, %5697 : !llvm.i64 + %5699 = llvm.add %5696, %5698 : !llvm.i64 + %5700 = llvm.getelementptr %5692[%5699] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5691, %5700 : !llvm.ptr + %5701 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5702 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5703 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5704 = llvm.mul %4879, %5703 : !llvm.i64 + %5705 = llvm.add %5702, %5704 : !llvm.i64 + %5706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5707 = llvm.mul %852, %5706 : !llvm.i64 + %5708 = llvm.add %5705, %5707 : !llvm.i64 + %5709 = llvm.getelementptr %5701[%5708] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5710 = llvm.load %5709 : !llvm.ptr + %5711 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5712 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5713 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5714 = llvm.mul %4879, %5713 : !llvm.i64 + %5715 = llvm.add %5712, %5714 : !llvm.i64 + %5716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5717 = 
llvm.mul %852, %5716 : !llvm.i64 + %5718 = llvm.add %5715, %5717 : !llvm.i64 + %5719 = llvm.getelementptr %5711[%5718] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5710, %5719 : !llvm.ptr + %5720 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5721 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5722 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5723 = llvm.mul %4879, %5722 : !llvm.i64 + %5724 = llvm.add %5721, %5723 : !llvm.i64 + %5725 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5726 = llvm.mul %59, %5725 : !llvm.i64 + %5727 = llvm.add %5724, %5726 : !llvm.i64 + %5728 = llvm.getelementptr %5720[%5727] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5729 = llvm.load %5728 : !llvm.ptr + %5730 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5731 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5732 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5733 = llvm.mul %59, %5732 : !llvm.i64 + %5734 = llvm.add %5731, %5733 : !llvm.i64 + %5735 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5736 = llvm.mul %913, %5735 : !llvm.i64 + %5737 = llvm.add %5734, %5736 : !llvm.i64 + %5738 = llvm.getelementptr %5730[%5737] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5739 = llvm.load %5738 : !llvm.ptr + %5740 = llvm.fmul %5729, %5739 {RelaxedPrecision} : !llvm.float + %5741 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5742 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5743 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5744 = llvm.mul %4879, %5743 : !llvm.i64 + %5745 = llvm.add %5742, %5744 : !llvm.i64 + %5746 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5747 = llvm.mul %913, %5746 : !llvm.i64 + %5748 = llvm.add %5745, %5747 : !llvm.i64 + %5749 = llvm.getelementptr %5741[%5748] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5750 = llvm.load %5749 : !llvm.ptr + %5751 = llvm.fadd %5750, %5740 {RelaxedPrecision} : !llvm.float + %5752 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5754 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5755 = llvm.mul %4879, %5754 : !llvm.i64 + %5756 = llvm.add %5753, %5755 : !llvm.i64 + %5757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5758 = llvm.mul %913, %5757 : !llvm.i64 + %5759 = llvm.add %5756, %5758 : !llvm.i64 + %5760 = llvm.getelementptr %5752[%5759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5751, %5760 : !llvm.ptr + %5761 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5762 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5763 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5764 = llvm.mul %4879, %5763 : !llvm.i64 + %5765 = llvm.add %5762, %5764 : !llvm.i64 + %5766 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5767 = llvm.mul %913, %5766 : !llvm.i64 + %5768 = llvm.add %5765, %5767 : !llvm.i64 + %5769 = llvm.getelementptr %5761[%5768] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5770 = llvm.load %5769 : !llvm.ptr + %5771 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5772 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5773 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5774 = llvm.mul %4879, %5773 : !llvm.i64 + %5775 = llvm.add %5772, %5774 : !llvm.i64 + %5776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5777 = llvm.mul %913, %5776 : !llvm.i64 + %5778 = llvm.add %5775, %5777 : !llvm.i64 + %5779 = llvm.getelementptr %5771[%5778] : 
(!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5770, %5779 : !llvm.ptr + %5780 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5781 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5782 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5783 = llvm.mul %4879, %5782 : !llvm.i64 + %5784 = llvm.add %5781, %5783 : !llvm.i64 + %5785 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5786 = llvm.mul %59, %5785 : !llvm.i64 + %5787 = llvm.add %5784, %5786 : !llvm.i64 + %5788 = llvm.getelementptr %5780[%5787] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5789 = llvm.load %5788 : !llvm.ptr + %5790 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5791 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5792 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5793 = llvm.mul %59, %5792 : !llvm.i64 + %5794 = llvm.add %5791, %5793 : !llvm.i64 + %5795 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5796 = llvm.mul %974, %5795 : !llvm.i64 + %5797 = llvm.add %5794, %5796 : !llvm.i64 + %5798 = llvm.getelementptr %5790[%5797] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5799 = llvm.load %5798 : !llvm.ptr + %5800 = llvm.fmul %5789, %5799 {RelaxedPrecision} : !llvm.float + %5801 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5802 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5803 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5804 = llvm.mul %4879, %5803 : !llvm.i64 + %5805 = llvm.add %5802, %5804 : !llvm.i64 + %5806 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5807 = llvm.mul %974, %5806 : !llvm.i64 + %5808 = llvm.add %5805, %5807 : !llvm.i64 + %5809 = llvm.getelementptr %5801[%5808] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5810 = llvm.load %5809 : !llvm.ptr + %5811 = llvm.fadd %5810, %5800 {RelaxedPrecision} : !llvm.float + %5812 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5814 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5815 = llvm.mul %4879, %5814 : !llvm.i64 + %5816 = llvm.add %5813, %5815 : !llvm.i64 + %5817 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5818 = llvm.mul %974, %5817 : !llvm.i64 + %5819 = llvm.add %5816, %5818 : !llvm.i64 + %5820 = llvm.getelementptr %5812[%5819] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5811, %5820 : !llvm.ptr + %5821 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5822 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5823 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5824 = llvm.mul %4879, %5823 : !llvm.i64 + %5825 = llvm.add %5822, %5824 : !llvm.i64 + %5826 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5827 = llvm.mul %974, %5826 : !llvm.i64 + %5828 = llvm.add %5825, %5827 : !llvm.i64 + %5829 = llvm.getelementptr %5821[%5828] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5830 = llvm.load %5829 : !llvm.ptr + %5831 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5832 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5833 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5834 = llvm.mul %4879, %5833 : !llvm.i64 + %5835 = llvm.add %5832, %5834 : !llvm.i64 + %5836 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5837 = llvm.mul %974, %5836 : !llvm.i64 + %5838 = llvm.add %5835, %5837 : !llvm.i64 + %5839 = llvm.getelementptr %5831[%5838] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5830, %5839 : !llvm.ptr + %5840 = llvm.add %56, %33 : !llvm.i64 + llvm.br 
^bb9(%5840 : !llvm.i64) + ^bb11: // pred: ^bb9 + %5841 = llvm.add %54, %36 : !llvm.i64 + llvm.br ^bb7(%5841 : !llvm.i64) + ^bb12: // pred: ^bb7 + %5842 = llvm.add %52, %30 : !llvm.i64 + llvm.br ^bb5(%5842 : !llvm.i64) + ^bb13: // pred: ^bb5 + %5843 = llvm.add %50, %38 : !llvm.i64 + llvm.br ^bb3(%5843 : !llvm.i64) + ^bb14: // pred: ^bb3 + llvm.br ^bb15(%32 : !llvm.i64) + ^bb15(%5844: !llvm.i64): // 2 preds: ^bb14, ^bb22 + %5845 = llvm.icmp "slt" %5844, %29 : !llvm.i64 + llvm.cond_br %5845, ^bb16, ^bb23 + ^bb16: // pred: ^bb15 + llvm.br ^bb17(%32 : !llvm.i64) + ^bb17(%5846: !llvm.i64): // 2 preds: ^bb16, ^bb21 + %5847 = llvm.icmp "slt" %5846, %31 : !llvm.i64 + llvm.cond_br %5847, ^bb18, ^bb22 + ^bb18: // pred: ^bb17 + llvm.br ^bb19(%32 : !llvm.i64) + ^bb19(%5848: !llvm.i64): // 2 preds: ^bb18, ^bb20 + %5849 = llvm.icmp "slt" %5848, %36 : !llvm.i64 + llvm.cond_br %5849, ^bb20, ^bb21 + ^bb20: // pred: ^bb19 + %5850 = llvm.add %48, %5844 : !llvm.i64 + %5851 = llvm.add %5846, %5848 : !llvm.i64 + %5852 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5854 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5855 = llvm.mul %28, %5854 : !llvm.i64 + %5856 = llvm.add %5853, %5855 : !llvm.i64 + %5857 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5858 = llvm.mul %5851, %5857 : !llvm.i64 + %5859 = llvm.add %5856, %5858 : !llvm.i64 + %5860 = llvm.getelementptr %5852[%5859] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5861 = llvm.load %5860 : !llvm.ptr + %5862 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5863 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5864 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5865 = llvm.mul %5851, %5864 : !llvm.i64 + %5866 = llvm.add %5863, %5865 : !llvm.i64 + %5867 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5868 = llvm.mul %5850, %5867 : !llvm.i64 + %5869 = llvm.add %5866, %5868 : !llvm.i64 + %5870 = llvm.getelementptr %5862[%5869] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5871 = llvm.load %5870 : !llvm.ptr + %5872 = llvm.fmul %5861, %5871 {RelaxedPrecision} : !llvm.float + %5873 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5874 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5876 = llvm.mul %28, %5875 : !llvm.i64 + %5877 = llvm.add %5874, %5876 : !llvm.i64 + %5878 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5879 = llvm.mul %5850, %5878 : !llvm.i64 + %5880 = llvm.add %5877, %5879 : !llvm.i64 + %5881 = llvm.getelementptr %5873[%5880] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5882 = llvm.load %5881 : !llvm.ptr + %5883 = llvm.fadd %5882, %5872 {RelaxedPrecision} : !llvm.float + %5884 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5885 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5886 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5887 = llvm.mul %28, %5886 : !llvm.i64 + %5888 = llvm.add %5885, %5887 : !llvm.i64 + %5889 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5890 = llvm.mul %5850, %5889 : !llvm.i64 + %5891 = llvm.add %5888, %5890 : !llvm.i64 + %5892 = llvm.getelementptr %5884[%5891] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5883, %5892 : !llvm.ptr + %5893 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5894 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5895 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5896 = 
llvm.mul %28, %5895 : !llvm.i64 + %5897 = llvm.add %5894, %5896 : !llvm.i64 + %5898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5899 = llvm.mul %5850, %5898 : !llvm.i64 + %5900 = llvm.add %5897, %5899 : !llvm.i64 + %5901 = llvm.getelementptr %5893[%5900] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5902 = llvm.load %5901 : !llvm.ptr + %5903 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5904 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5905 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5906 = llvm.mul %28, %5905 : !llvm.i64 + %5907 = llvm.add %5904, %5906 : !llvm.i64 + %5908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5909 = llvm.mul %5850, %5908 : !llvm.i64 + %5910 = llvm.add %5907, %5909 : !llvm.i64 + %5911 = llvm.getelementptr %5903[%5910] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5902, %5911 : !llvm.ptr + %5912 = llvm.add %5850, %33 : !llvm.i64 + %5913 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5915 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5916 = llvm.mul %28, %5915 : !llvm.i64 + %5917 = llvm.add %5914, %5916 : !llvm.i64 + %5918 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5919 = llvm.mul %5851, %5918 : !llvm.i64 + %5920 = llvm.add %5917, %5919 : !llvm.i64 + %5921 = llvm.getelementptr %5913[%5920] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5922 = llvm.load %5921 : !llvm.ptr + %5923 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5924 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5925 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5926 = llvm.mul %5851, %5925 : !llvm.i64 + %5927 = llvm.add %5924, %5926 : !llvm.i64 + %5928 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5929 = llvm.mul %5912, %5928 : !llvm.i64 + %5930 = llvm.add %5927, %5929 : !llvm.i64 + %5931 = llvm.getelementptr %5923[%5930] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5932 = llvm.load %5931 : !llvm.ptr + %5933 = llvm.fmul %5922, %5932 {RelaxedPrecision} : !llvm.float + %5934 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5935 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5936 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5937 = llvm.mul %28, %5936 : !llvm.i64 + %5938 = llvm.add %5935, %5937 : !llvm.i64 + %5939 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5940 = llvm.mul %5912, %5939 : !llvm.i64 + %5941 = llvm.add %5938, %5940 : !llvm.i64 + %5942 = llvm.getelementptr %5934[%5941] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5943 = llvm.load %5942 : !llvm.ptr + %5944 = llvm.fadd %5943, %5933 {RelaxedPrecision} : !llvm.float + %5945 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5946 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5947 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5948 = llvm.mul %28, %5947 : !llvm.i64 + %5949 = llvm.add %5946, %5948 : !llvm.i64 + %5950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5951 = llvm.mul %5912, %5950 : !llvm.i64 + %5952 = llvm.add %5949, %5951 : !llvm.i64 + %5953 = llvm.getelementptr %5945[%5952] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5944, %5953 : !llvm.ptr + %5954 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5955 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5956 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5957 = llvm.mul %28, %5956 : !llvm.i64 + %5958 = llvm.add %5955, %5957 : !llvm.i64 + %5959 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %5960 = llvm.mul %5912, %5959 : !llvm.i64 + %5961 = llvm.add %5958, %5960 : !llvm.i64 + %5962 = llvm.getelementptr %5954[%5961] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5963 = llvm.load %5962 : !llvm.ptr + %5964 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5966 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5967 = llvm.mul %28, %5966 : !llvm.i64 + %5968 = llvm.add %5965, %5967 : !llvm.i64 + %5969 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5970 = llvm.mul %5912, %5969 : !llvm.i64 + %5971 = llvm.add %5968, %5970 : !llvm.i64 + %5972 = llvm.getelementptr %5964[%5971] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %5963, %5972 : !llvm.ptr + %5973 = llvm.add %5850, %34 : !llvm.i64 + %5974 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5976 = llvm.mlir.constant(128 : index) : !llvm.i64 + %5977 = llvm.mul %28, %5976 : !llvm.i64 + %5978 = llvm.add %5975, %5977 : !llvm.i64 + %5979 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5980 = llvm.mul %5851, %5979 : !llvm.i64 + %5981 = llvm.add %5978, %5980 : !llvm.i64 + %5982 = llvm.getelementptr %5974[%5981] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5983 = llvm.load %5982 : !llvm.ptr + %5984 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5985 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5986 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5987 = llvm.mul %5851, %5986 : !llvm.i64 + %5988 = llvm.add %5985, %5987 : !llvm.i64 + %5989 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5990 = llvm.mul %5973, %5989 : !llvm.i64 + %5991 = llvm.add %5988, %5990 : !llvm.i64 + %5992 = llvm.getelementptr %5984[%5991] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5993 = llvm.load %5992 : !llvm.ptr + %5994 = llvm.fmul %5983, %5993 {RelaxedPrecision} : !llvm.float + %5995 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5996 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5997 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5998 = llvm.mul %28, %5997 : !llvm.i64 + %5999 = llvm.add %5996, %5998 : !llvm.i64 + %6000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6001 = llvm.mul %5973, %6000 : !llvm.i64 + %6002 = llvm.add %5999, %6001 : !llvm.i64 + %6003 = llvm.getelementptr %5995[%6002] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6004 = llvm.load %6003 : !llvm.ptr + %6005 = llvm.fadd %6004, %5994 {RelaxedPrecision} : !llvm.float + %6006 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6009 = llvm.mul %28, %6008 : !llvm.i64 + %6010 = llvm.add %6007, %6009 : !llvm.i64 + %6011 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6012 = llvm.mul %5973, %6011 : !llvm.i64 + %6013 = llvm.add %6010, %6012 : !llvm.i64 + %6014 = llvm.getelementptr %6006[%6013] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6005, %6014 : !llvm.ptr + %6015 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6016 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6017 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6018 = llvm.mul %28, %6017 : !llvm.i64 + %6019 = llvm.add %6016, %6018 : !llvm.i64 + %6020 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6021 = llvm.mul %5973, %6020 : !llvm.i64 
+ %6022 = llvm.add %6019, %6021 : !llvm.i64 + %6023 = llvm.getelementptr %6015[%6022] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6024 = llvm.load %6023 : !llvm.ptr + %6025 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6026 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6027 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6028 = llvm.mul %28, %6027 : !llvm.i64 + %6029 = llvm.add %6026, %6028 : !llvm.i64 + %6030 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6031 = llvm.mul %5973, %6030 : !llvm.i64 + %6032 = llvm.add %6029, %6031 : !llvm.i64 + %6033 = llvm.getelementptr %6025[%6032] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6024, %6033 : !llvm.ptr + %6034 = llvm.add %5850, %35 : !llvm.i64 + %6035 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6036 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6037 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6038 = llvm.mul %28, %6037 : !llvm.i64 + %6039 = llvm.add %6036, %6038 : !llvm.i64 + %6040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6041 = llvm.mul %5851, %6040 : !llvm.i64 + %6042 = llvm.add %6039, %6041 : !llvm.i64 + %6043 = llvm.getelementptr %6035[%6042] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6044 = llvm.load %6043 : !llvm.ptr + %6045 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6046 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6047 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6048 = llvm.mul %5851, %6047 : !llvm.i64 + %6049 = llvm.add %6046, %6048 : !llvm.i64 + %6050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6051 = llvm.mul %6034, %6050 : !llvm.i64 + %6052 = llvm.add %6049, %6051 : !llvm.i64 + %6053 = llvm.getelementptr %6045[%6052] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6054 = llvm.load %6053 : !llvm.ptr + %6055 = llvm.fmul %6044, %6054 {RelaxedPrecision} : !llvm.float + %6056 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6057 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6058 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6059 = llvm.mul %28, %6058 : !llvm.i64 + %6060 = llvm.add %6057, %6059 : !llvm.i64 + %6061 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6062 = llvm.mul %6034, %6061 : !llvm.i64 + %6063 = llvm.add %6060, %6062 : !llvm.i64 + %6064 = llvm.getelementptr %6056[%6063] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6065 = llvm.load %6064 : !llvm.ptr + %6066 = llvm.fadd %6065, %6055 {RelaxedPrecision} : !llvm.float + %6067 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6069 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6070 = llvm.mul %28, %6069 : !llvm.i64 + %6071 = llvm.add %6068, %6070 : !llvm.i64 + %6072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6073 = llvm.mul %6034, %6072 : !llvm.i64 + %6074 = llvm.add %6071, %6073 : !llvm.i64 + %6075 = llvm.getelementptr %6067[%6074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6066, %6075 : !llvm.ptr + %6076 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6077 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6078 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6079 = llvm.mul %28, %6078 : !llvm.i64 + %6080 = llvm.add %6077, %6079 : !llvm.i64 + %6081 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6082 = llvm.mul %6034, %6081 : !llvm.i64 + %6083 = llvm.add %6080, %6082 : !llvm.i64 + %6084 = llvm.getelementptr %6076[%6083] 
: (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6085 = llvm.load %6084 : !llvm.ptr + %6086 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6088 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6089 = llvm.mul %28, %6088 : !llvm.i64 + %6090 = llvm.add %6087, %6089 : !llvm.i64 + %6091 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6092 = llvm.mul %6034, %6091 : !llvm.i64 + %6093 = llvm.add %6090, %6092 : !llvm.i64 + %6094 = llvm.getelementptr %6086[%6093] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6085, %6094 : !llvm.ptr + %6095 = llvm.add %5850, %36 : !llvm.i64 + %6096 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6098 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6099 = llvm.mul %28, %6098 : !llvm.i64 + %6100 = llvm.add %6097, %6099 : !llvm.i64 + %6101 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6102 = llvm.mul %5851, %6101 : !llvm.i64 + %6103 = llvm.add %6100, %6102 : !llvm.i64 + %6104 = llvm.getelementptr %6096[%6103] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6105 = llvm.load %6104 : !llvm.ptr + %6106 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6108 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6109 = llvm.mul %5851, %6108 : !llvm.i64 + %6110 = llvm.add %6107, %6109 : !llvm.i64 + %6111 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6112 = llvm.mul %6095, %6111 : !llvm.i64 + %6113 = llvm.add %6110, %6112 : !llvm.i64 + %6114 = llvm.getelementptr %6106[%6113] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6115 = llvm.load %6114 : !llvm.ptr + %6116 = llvm.fmul %6105, %6115 {RelaxedPrecision} : !llvm.float + %6117 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6120 = llvm.mul %28, %6119 : !llvm.i64 + %6121 = llvm.add %6118, %6120 : !llvm.i64 + %6122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6123 = llvm.mul %6095, %6122 : !llvm.i64 + %6124 = llvm.add %6121, %6123 : !llvm.i64 + %6125 = llvm.getelementptr %6117[%6124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6126 = llvm.load %6125 : !llvm.ptr + %6127 = llvm.fadd %6126, %6116 {RelaxedPrecision} : !llvm.float + %6128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6131 = llvm.mul %28, %6130 : !llvm.i64 + %6132 = llvm.add %6129, %6131 : !llvm.i64 + %6133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6134 = llvm.mul %6095, %6133 : !llvm.i64 + %6135 = llvm.add %6132, %6134 : !llvm.i64 + %6136 = llvm.getelementptr %6128[%6135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6127, %6136 : !llvm.ptr + %6137 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6140 = llvm.mul %28, %6139 : !llvm.i64 + %6141 = llvm.add %6138, %6140 : !llvm.i64 + %6142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6143 = llvm.mul %6095, %6142 : !llvm.i64 + %6144 = llvm.add %6141, %6143 : !llvm.i64 + %6145 = llvm.getelementptr %6137[%6144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6146 = llvm.load %6145 : !llvm.ptr + %6147 = 
llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6149 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6150 = llvm.mul %28, %6149 : !llvm.i64 + %6151 = llvm.add %6148, %6150 : !llvm.i64 + %6152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6153 = llvm.mul %6095, %6152 : !llvm.i64 + %6154 = llvm.add %6151, %6153 : !llvm.i64 + %6155 = llvm.getelementptr %6147[%6154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6146, %6155 : !llvm.ptr + %6156 = llvm.add %5850, %37 : !llvm.i64 + %6157 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6159 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6160 = llvm.mul %28, %6159 : !llvm.i64 + %6161 = llvm.add %6158, %6160 : !llvm.i64 + %6162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6163 = llvm.mul %5851, %6162 : !llvm.i64 + %6164 = llvm.add %6161, %6163 : !llvm.i64 + %6165 = llvm.getelementptr %6157[%6164] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6166 = llvm.load %6165 : !llvm.ptr + %6167 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6169 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6170 = llvm.mul %5851, %6169 : !llvm.i64 + %6171 = llvm.add %6168, %6170 : !llvm.i64 + %6172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6173 = llvm.mul %6156, %6172 : !llvm.i64 + %6174 = llvm.add %6171, %6173 : !llvm.i64 + %6175 = llvm.getelementptr %6167[%6174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6176 = llvm.load %6175 : !llvm.ptr + %6177 = llvm.fmul %6166, %6176 {RelaxedPrecision} : !llvm.float + %6178 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6180 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6181 = llvm.mul %28, %6180 : !llvm.i64 + %6182 = llvm.add %6179, %6181 : !llvm.i64 + %6183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6184 = llvm.mul %6156, %6183 : !llvm.i64 + %6185 = llvm.add %6182, %6184 : !llvm.i64 + %6186 = llvm.getelementptr %6178[%6185] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6187 = llvm.load %6186 : !llvm.ptr + %6188 = llvm.fadd %6187, %6177 {RelaxedPrecision} : !llvm.float + %6189 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6191 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6192 = llvm.mul %28, %6191 : !llvm.i64 + %6193 = llvm.add %6190, %6192 : !llvm.i64 + %6194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6195 = llvm.mul %6156, %6194 : !llvm.i64 + %6196 = llvm.add %6193, %6195 : !llvm.i64 + %6197 = llvm.getelementptr %6189[%6196] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6188, %6197 : !llvm.ptr + %6198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6200 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6201 = llvm.mul %28, %6200 : !llvm.i64 + %6202 = llvm.add %6199, %6201 : !llvm.i64 + %6203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6204 = llvm.mul %6156, %6203 : !llvm.i64 + %6205 = llvm.add %6202, %6204 : !llvm.i64 + %6206 = llvm.getelementptr %6198[%6205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6207 = llvm.load %6206 : !llvm.ptr + %6208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x 
i64>)> + %6209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6211 = llvm.mul %28, %6210 : !llvm.i64 + %6212 = llvm.add %6209, %6211 : !llvm.i64 + %6213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6214 = llvm.mul %6156, %6213 : !llvm.i64 + %6215 = llvm.add %6212, %6214 : !llvm.i64 + %6216 = llvm.getelementptr %6208[%6215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6207, %6216 : !llvm.ptr + %6217 = llvm.add %5850, %38 : !llvm.i64 + %6218 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6220 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6221 = llvm.mul %28, %6220 : !llvm.i64 + %6222 = llvm.add %6219, %6221 : !llvm.i64 + %6223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6224 = llvm.mul %5851, %6223 : !llvm.i64 + %6225 = llvm.add %6222, %6224 : !llvm.i64 + %6226 = llvm.getelementptr %6218[%6225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6227 = llvm.load %6226 : !llvm.ptr + %6228 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6231 = llvm.mul %5851, %6230 : !llvm.i64 + %6232 = llvm.add %6229, %6231 : !llvm.i64 + %6233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6234 = llvm.mul %6217, %6233 : !llvm.i64 + %6235 = llvm.add %6232, %6234 : !llvm.i64 + %6236 = llvm.getelementptr %6228[%6235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6237 = llvm.load %6236 : !llvm.ptr + %6238 = llvm.fmul %6227, %6237 {RelaxedPrecision} : !llvm.float + %6239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6242 = llvm.mul %28, %6241 : !llvm.i64 + %6243 = llvm.add %6240, %6242 : !llvm.i64 + %6244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6245 = llvm.mul %6217, %6244 : !llvm.i64 + %6246 = llvm.add %6243, %6245 : !llvm.i64 + %6247 = llvm.getelementptr %6239[%6246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6248 = llvm.load %6247 : !llvm.ptr + %6249 = llvm.fadd %6248, %6238 {RelaxedPrecision} : !llvm.float + %6250 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6252 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6253 = llvm.mul %28, %6252 : !llvm.i64 + %6254 = llvm.add %6251, %6253 : !llvm.i64 + %6255 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6256 = llvm.mul %6217, %6255 : !llvm.i64 + %6257 = llvm.add %6254, %6256 : !llvm.i64 + %6258 = llvm.getelementptr %6250[%6257] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6249, %6258 : !llvm.ptr + %6259 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6260 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6261 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6262 = llvm.mul %28, %6261 : !llvm.i64 + %6263 = llvm.add %6260, %6262 : !llvm.i64 + %6264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6265 = llvm.mul %6217, %6264 : !llvm.i64 + %6266 = llvm.add %6263, %6265 : !llvm.i64 + %6267 = llvm.getelementptr %6259[%6266] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6268 = llvm.load %6267 : !llvm.ptr + %6269 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6270 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6271 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %6272 = llvm.mul %28, %6271 : !llvm.i64 + %6273 = llvm.add %6270, %6272 : !llvm.i64 + %6274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6275 = llvm.mul %6217, %6274 : !llvm.i64 + %6276 = llvm.add %6273, %6275 : !llvm.i64 + %6277 = llvm.getelementptr %6269[%6276] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6268, %6277 : !llvm.ptr + %6278 = llvm.add %5850, %39 : !llvm.i64 + %6279 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6281 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6282 = llvm.mul %28, %6281 : !llvm.i64 + %6283 = llvm.add %6280, %6282 : !llvm.i64 + %6284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6285 = llvm.mul %5851, %6284 : !llvm.i64 + %6286 = llvm.add %6283, %6285 : !llvm.i64 + %6287 = llvm.getelementptr %6279[%6286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6288 = llvm.load %6287 : !llvm.ptr + %6289 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6290 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6291 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6292 = llvm.mul %5851, %6291 : !llvm.i64 + %6293 = llvm.add %6290, %6292 : !llvm.i64 + %6294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6295 = llvm.mul %6278, %6294 : !llvm.i64 + %6296 = llvm.add %6293, %6295 : !llvm.i64 + %6297 = llvm.getelementptr %6289[%6296] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6298 = llvm.load %6297 : !llvm.ptr + %6299 = llvm.fmul %6288, %6298 {RelaxedPrecision} : !llvm.float + %6300 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6302 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6303 = llvm.mul %28, %6302 : !llvm.i64 + %6304 = llvm.add %6301, %6303 : !llvm.i64 + %6305 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6306 = llvm.mul %6278, %6305 : !llvm.i64 + %6307 = llvm.add %6304, %6306 : !llvm.i64 + %6308 = llvm.getelementptr %6300[%6307] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6309 = llvm.load %6308 : !llvm.ptr + %6310 = llvm.fadd %6309, %6299 {RelaxedPrecision} : !llvm.float + %6311 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6314 = llvm.mul %28, %6313 : !llvm.i64 + %6315 = llvm.add %6312, %6314 : !llvm.i64 + %6316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6317 = llvm.mul %6278, %6316 : !llvm.i64 + %6318 = llvm.add %6315, %6317 : !llvm.i64 + %6319 = llvm.getelementptr %6311[%6318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6310, %6319 : !llvm.ptr + %6320 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6321 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6322 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6323 = llvm.mul %28, %6322 : !llvm.i64 + %6324 = llvm.add %6321, %6323 : !llvm.i64 + %6325 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6326 = llvm.mul %6278, %6325 : !llvm.i64 + %6327 = llvm.add %6324, %6326 : !llvm.i64 + %6328 = llvm.getelementptr %6320[%6327] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6329 = llvm.load %6328 : !llvm.ptr + %6330 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6331 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6332 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6333 = llvm.mul %28, %6332 : !llvm.i64 
+ %6334 = llvm.add %6331, %6333 : !llvm.i64 + %6335 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6336 = llvm.mul %6278, %6335 : !llvm.i64 + %6337 = llvm.add %6334, %6336 : !llvm.i64 + %6338 = llvm.getelementptr %6330[%6337] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6329, %6338 : !llvm.ptr + %6339 = llvm.add %5850, %40 : !llvm.i64 + %6340 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6342 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6343 = llvm.mul %28, %6342 : !llvm.i64 + %6344 = llvm.add %6341, %6343 : !llvm.i64 + %6345 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6346 = llvm.mul %5851, %6345 : !llvm.i64 + %6347 = llvm.add %6344, %6346 : !llvm.i64 + %6348 = llvm.getelementptr %6340[%6347] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6349 = llvm.load %6348 : !llvm.ptr + %6350 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6351 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6352 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6353 = llvm.mul %5851, %6352 : !llvm.i64 + %6354 = llvm.add %6351, %6353 : !llvm.i64 + %6355 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6356 = llvm.mul %6339, %6355 : !llvm.i64 + %6357 = llvm.add %6354, %6356 : !llvm.i64 + %6358 = llvm.getelementptr %6350[%6357] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6359 = llvm.load %6358 : !llvm.ptr + %6360 = llvm.fmul %6349, %6359 {RelaxedPrecision} : !llvm.float + %6361 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6362 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6363 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6364 = llvm.mul %28, %6363 : !llvm.i64 + %6365 = llvm.add %6362, %6364 : !llvm.i64 + %6366 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6367 = llvm.mul %6339, %6366 : !llvm.i64 + %6368 = llvm.add %6365, %6367 : !llvm.i64 + %6369 = llvm.getelementptr %6361[%6368] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6370 = llvm.load %6369 : !llvm.ptr + %6371 = llvm.fadd %6370, %6360 {RelaxedPrecision} : !llvm.float + %6372 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6373 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6374 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6375 = llvm.mul %28, %6374 : !llvm.i64 + %6376 = llvm.add %6373, %6375 : !llvm.i64 + %6377 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6378 = llvm.mul %6339, %6377 : !llvm.i64 + %6379 = llvm.add %6376, %6378 : !llvm.i64 + %6380 = llvm.getelementptr %6372[%6379] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6371, %6380 : !llvm.ptr + %6381 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6383 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6384 = llvm.mul %28, %6383 : !llvm.i64 + %6385 = llvm.add %6382, %6384 : !llvm.i64 + %6386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6387 = llvm.mul %6339, %6386 : !llvm.i64 + %6388 = llvm.add %6385, %6387 : !llvm.i64 + %6389 = llvm.getelementptr %6381[%6388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6390 = llvm.load %6389 : !llvm.ptr + %6391 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6393 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6394 = llvm.mul %28, %6393 : !llvm.i64 + %6395 = llvm.add %6392, %6394 : !llvm.i64 + %6396 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %6397 = llvm.mul %6339, %6396 : !llvm.i64 + %6398 = llvm.add %6395, %6397 : !llvm.i64 + %6399 = llvm.getelementptr %6391[%6398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6390, %6399 : !llvm.ptr + %6400 = llvm.add %5850, %41 : !llvm.i64 + %6401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6404 = llvm.mul %28, %6403 : !llvm.i64 + %6405 = llvm.add %6402, %6404 : !llvm.i64 + %6406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6407 = llvm.mul %5851, %6406 : !llvm.i64 + %6408 = llvm.add %6405, %6407 : !llvm.i64 + %6409 = llvm.getelementptr %6401[%6408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6410 = llvm.load %6409 : !llvm.ptr + %6411 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6413 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6414 = llvm.mul %5851, %6413 : !llvm.i64 + %6415 = llvm.add %6412, %6414 : !llvm.i64 + %6416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6417 = llvm.mul %6400, %6416 : !llvm.i64 + %6418 = llvm.add %6415, %6417 : !llvm.i64 + %6419 = llvm.getelementptr %6411[%6418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6420 = llvm.load %6419 : !llvm.ptr + %6421 = llvm.fmul %6410, %6420 {RelaxedPrecision} : !llvm.float + %6422 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6423 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6424 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6425 = llvm.mul %28, %6424 : !llvm.i64 + %6426 = llvm.add %6423, %6425 : !llvm.i64 + %6427 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6428 = llvm.mul %6400, %6427 : !llvm.i64 + %6429 = llvm.add %6426, %6428 : !llvm.i64 + %6430 = llvm.getelementptr %6422[%6429] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6431 = llvm.load %6430 : !llvm.ptr + %6432 = llvm.fadd %6431, %6421 {RelaxedPrecision} : !llvm.float + %6433 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6434 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6435 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6436 = llvm.mul %28, %6435 : !llvm.i64 + %6437 = llvm.add %6434, %6436 : !llvm.i64 + %6438 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6439 = llvm.mul %6400, %6438 : !llvm.i64 + %6440 = llvm.add %6437, %6439 : !llvm.i64 + %6441 = llvm.getelementptr %6433[%6440] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6432, %6441 : !llvm.ptr + %6442 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6444 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6445 = llvm.mul %28, %6444 : !llvm.i64 + %6446 = llvm.add %6443, %6445 : !llvm.i64 + %6447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6448 = llvm.mul %6400, %6447 : !llvm.i64 + %6449 = llvm.add %6446, %6448 : !llvm.i64 + %6450 = llvm.getelementptr %6442[%6449] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6451 = llvm.load %6450 : !llvm.ptr + %6452 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6455 = llvm.mul %28, %6454 : !llvm.i64 + %6456 = llvm.add %6453, %6455 : !llvm.i64 + %6457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6458 = llvm.mul %6400, %6457 : !llvm.i64 + %6459 = llvm.add %6456, %6458 
: !llvm.i64 + %6460 = llvm.getelementptr %6452[%6459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6451, %6460 : !llvm.ptr + %6461 = llvm.add %5850, %42 : !llvm.i64 + %6462 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6463 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6464 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6465 = llvm.mul %28, %6464 : !llvm.i64 + %6466 = llvm.add %6463, %6465 : !llvm.i64 + %6467 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6468 = llvm.mul %5851, %6467 : !llvm.i64 + %6469 = llvm.add %6466, %6468 : !llvm.i64 + %6470 = llvm.getelementptr %6462[%6469] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6471 = llvm.load %6470 : !llvm.ptr + %6472 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6473 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6474 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6475 = llvm.mul %5851, %6474 : !llvm.i64 + %6476 = llvm.add %6473, %6475 : !llvm.i64 + %6477 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6478 = llvm.mul %6461, %6477 : !llvm.i64 + %6479 = llvm.add %6476, %6478 : !llvm.i64 + %6480 = llvm.getelementptr %6472[%6479] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6481 = llvm.load %6480 : !llvm.ptr + %6482 = llvm.fmul %6471, %6481 {RelaxedPrecision} : !llvm.float + %6483 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6484 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6485 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6486 = llvm.mul %28, %6485 : !llvm.i64 + %6487 = llvm.add %6484, %6486 : !llvm.i64 + %6488 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6489 = llvm.mul %6461, %6488 : !llvm.i64 + %6490 = llvm.add %6487, %6489 : !llvm.i64 + %6491 = llvm.getelementptr %6483[%6490] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6492 = llvm.load %6491 : !llvm.ptr + %6493 = llvm.fadd %6492, %6482 {RelaxedPrecision} : !llvm.float + %6494 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6495 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6496 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6497 = llvm.mul %28, %6496 : !llvm.i64 + %6498 = llvm.add %6495, %6497 : !llvm.i64 + %6499 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6500 = llvm.mul %6461, %6499 : !llvm.i64 + %6501 = llvm.add %6498, %6500 : !llvm.i64 + %6502 = llvm.getelementptr %6494[%6501] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6493, %6502 : !llvm.ptr + %6503 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6505 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6506 = llvm.mul %28, %6505 : !llvm.i64 + %6507 = llvm.add %6504, %6506 : !llvm.i64 + %6508 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6509 = llvm.mul %6461, %6508 : !llvm.i64 + %6510 = llvm.add %6507, %6509 : !llvm.i64 + %6511 = llvm.getelementptr %6503[%6510] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6512 = llvm.load %6511 : !llvm.ptr + %6513 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6514 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6515 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6516 = llvm.mul %28, %6515 : !llvm.i64 + %6517 = llvm.add %6514, %6516 : !llvm.i64 + %6518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6519 = llvm.mul %6461, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.getelementptr %6513[%6520] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + llvm.store %6512, %6521 : !llvm.ptr + %6522 = llvm.add %5850, %43 : !llvm.i64 + %6523 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6524 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6525 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6526 = llvm.mul %28, %6525 : !llvm.i64 + %6527 = llvm.add %6524, %6526 : !llvm.i64 + %6528 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6529 = llvm.mul %5851, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.getelementptr %6523[%6530] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6532 = llvm.load %6531 : !llvm.ptr + %6533 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6534 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6535 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6536 = llvm.mul %5851, %6535 : !llvm.i64 + %6537 = llvm.add %6534, %6536 : !llvm.i64 + %6538 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6539 = llvm.mul %6522, %6538 : !llvm.i64 + %6540 = llvm.add %6537, %6539 : !llvm.i64 + %6541 = llvm.getelementptr %6533[%6540] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6542 = llvm.load %6541 : !llvm.ptr + %6543 = llvm.fmul %6532, %6542 {RelaxedPrecision} : !llvm.float + %6544 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6545 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6546 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6547 = llvm.mul %28, %6546 : !llvm.i64 + %6548 = llvm.add %6545, %6547 : !llvm.i64 + %6549 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6550 = llvm.mul %6522, %6549 : !llvm.i64 + %6551 = llvm.add %6548, %6550 : !llvm.i64 + %6552 = llvm.getelementptr %6544[%6551] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6553 = llvm.load %6552 : !llvm.ptr + %6554 = llvm.fadd %6553, %6543 {RelaxedPrecision} : !llvm.float + %6555 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6556 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6557 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6558 = llvm.mul %28, %6557 : !llvm.i64 + %6559 = llvm.add %6556, %6558 : !llvm.i64 + %6560 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6561 = llvm.mul %6522, %6560 : !llvm.i64 + %6562 = llvm.add %6559, %6561 : !llvm.i64 + %6563 = llvm.getelementptr %6555[%6562] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6554, %6563 : !llvm.ptr + %6564 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6565 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6566 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6567 = llvm.mul %28, %6566 : !llvm.i64 + %6568 = llvm.add %6565, %6567 : !llvm.i64 + %6569 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6570 = llvm.mul %6522, %6569 : !llvm.i64 + %6571 = llvm.add %6568, %6570 : !llvm.i64 + %6572 = llvm.getelementptr %6564[%6571] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6573 = llvm.load %6572 : !llvm.ptr + %6574 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6575 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6576 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6577 = llvm.mul %28, %6576 : !llvm.i64 + %6578 = llvm.add %6575, %6577 : !llvm.i64 + %6579 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6580 = llvm.mul %6522, %6579 : !llvm.i64 + %6581 = llvm.add %6578, %6580 : !llvm.i64 + %6582 = llvm.getelementptr %6574[%6581] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6573, %6582 : !llvm.ptr + %6583 = llvm.add %5850, %44 : 
!llvm.i64 + %6584 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6585 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6586 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6587 = llvm.mul %28, %6586 : !llvm.i64 + %6588 = llvm.add %6585, %6587 : !llvm.i64 + %6589 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6590 = llvm.mul %5851, %6589 : !llvm.i64 + %6591 = llvm.add %6588, %6590 : !llvm.i64 + %6592 = llvm.getelementptr %6584[%6591] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6593 = llvm.load %6592 : !llvm.ptr + %6594 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6595 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6596 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6597 = llvm.mul %5851, %6596 : !llvm.i64 + %6598 = llvm.add %6595, %6597 : !llvm.i64 + %6599 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6600 = llvm.mul %6583, %6599 : !llvm.i64 + %6601 = llvm.add %6598, %6600 : !llvm.i64 + %6602 = llvm.getelementptr %6594[%6601] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6603 = llvm.load %6602 : !llvm.ptr + %6604 = llvm.fmul %6593, %6603 {RelaxedPrecision} : !llvm.float + %6605 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6606 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6607 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6608 = llvm.mul %28, %6607 : !llvm.i64 + %6609 = llvm.add %6606, %6608 : !llvm.i64 + %6610 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6611 = llvm.mul %6583, %6610 : !llvm.i64 + %6612 = llvm.add %6609, %6611 : !llvm.i64 + %6613 = llvm.getelementptr %6605[%6612] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6614 = llvm.load %6613 : !llvm.ptr + %6615 = llvm.fadd %6614, %6604 {RelaxedPrecision} : !llvm.float + %6616 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6617 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6618 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6619 = llvm.mul %28, %6618 : !llvm.i64 + %6620 = llvm.add %6617, %6619 : !llvm.i64 + %6621 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6622 = llvm.mul %6583, %6621 : !llvm.i64 + %6623 = llvm.add %6620, %6622 : !llvm.i64 + %6624 = llvm.getelementptr %6616[%6623] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6615, %6624 : !llvm.ptr + %6625 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6626 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6627 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6628 = llvm.mul %28, %6627 : !llvm.i64 + %6629 = llvm.add %6626, %6628 : !llvm.i64 + %6630 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6631 = llvm.mul %6583, %6630 : !llvm.i64 + %6632 = llvm.add %6629, %6631 : !llvm.i64 + %6633 = llvm.getelementptr %6625[%6632] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6634 = llvm.load %6633 : !llvm.ptr + %6635 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6638 = llvm.mul %28, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6641 = llvm.mul %6583, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.getelementptr %6635[%6642] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6634, %6643 : !llvm.ptr + %6644 = llvm.add %5850, %45 : !llvm.i64 + %6645 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %6646 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6647 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6648 = llvm.mul %28, %6647 : !llvm.i64 + %6649 = llvm.add %6646, %6648 : !llvm.i64 + %6650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6651 = llvm.mul %5851, %6650 : !llvm.i64 + %6652 = llvm.add %6649, %6651 : !llvm.i64 + %6653 = llvm.getelementptr %6645[%6652] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6654 = llvm.load %6653 : !llvm.ptr + %6655 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6656 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6657 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6658 = llvm.mul %5851, %6657 : !llvm.i64 + %6659 = llvm.add %6656, %6658 : !llvm.i64 + %6660 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6661 = llvm.mul %6644, %6660 : !llvm.i64 + %6662 = llvm.add %6659, %6661 : !llvm.i64 + %6663 = llvm.getelementptr %6655[%6662] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6664 = llvm.load %6663 : !llvm.ptr + %6665 = llvm.fmul %6654, %6664 {RelaxedPrecision} : !llvm.float + %6666 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6667 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6668 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6669 = llvm.mul %28, %6668 : !llvm.i64 + %6670 = llvm.add %6667, %6669 : !llvm.i64 + %6671 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6672 = llvm.mul %6644, %6671 : !llvm.i64 + %6673 = llvm.add %6670, %6672 : !llvm.i64 + %6674 = llvm.getelementptr %6666[%6673] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6675 = llvm.load %6674 : !llvm.ptr + %6676 = llvm.fadd %6675, %6665 {RelaxedPrecision} : !llvm.float + %6677 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6680 = llvm.mul %28, %6679 : !llvm.i64 + %6681 = llvm.add %6678, %6680 : !llvm.i64 + %6682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6683 = llvm.mul %6644, %6682 : !llvm.i64 + %6684 = llvm.add %6681, %6683 : !llvm.i64 + %6685 = llvm.getelementptr %6677[%6684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6676, %6685 : !llvm.ptr + %6686 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6687 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6688 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6689 = llvm.mul %28, %6688 : !llvm.i64 + %6690 = llvm.add %6687, %6689 : !llvm.i64 + %6691 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6692 = llvm.mul %6644, %6691 : !llvm.i64 + %6693 = llvm.add %6690, %6692 : !llvm.i64 + %6694 = llvm.getelementptr %6686[%6693] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6695 = llvm.load %6694 : !llvm.ptr + %6696 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6697 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6698 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6699 = llvm.mul %28, %6698 : !llvm.i64 + %6700 = llvm.add %6697, %6699 : !llvm.i64 + %6701 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6702 = llvm.mul %6644, %6701 : !llvm.i64 + %6703 = llvm.add %6700, %6702 : !llvm.i64 + %6704 = llvm.getelementptr %6696[%6703] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6695, %6704 : !llvm.ptr + %6705 = llvm.add %5850, %46 : !llvm.i64 + %6706 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6708 = 
llvm.mlir.constant(128 : index) : !llvm.i64 + %6709 = llvm.mul %28, %6708 : !llvm.i64 + %6710 = llvm.add %6707, %6709 : !llvm.i64 + %6711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6712 = llvm.mul %5851, %6711 : !llvm.i64 + %6713 = llvm.add %6710, %6712 : !llvm.i64 + %6714 = llvm.getelementptr %6706[%6713] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6715 = llvm.load %6714 : !llvm.ptr + %6716 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6717 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6718 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6719 = llvm.mul %5851, %6718 : !llvm.i64 + %6720 = llvm.add %6717, %6719 : !llvm.i64 + %6721 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6722 = llvm.mul %6705, %6721 : !llvm.i64 + %6723 = llvm.add %6720, %6722 : !llvm.i64 + %6724 = llvm.getelementptr %6716[%6723] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6725 = llvm.load %6724 : !llvm.ptr + %6726 = llvm.fmul %6715, %6725 {RelaxedPrecision} : !llvm.float + %6727 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6729 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6730 = llvm.mul %28, %6729 : !llvm.i64 + %6731 = llvm.add %6728, %6730 : !llvm.i64 + %6732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6733 = llvm.mul %6705, %6732 : !llvm.i64 + %6734 = llvm.add %6731, %6733 : !llvm.i64 + %6735 = llvm.getelementptr %6727[%6734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6736 = llvm.load %6735 : !llvm.ptr + %6737 = llvm.fadd %6736, %6726 {RelaxedPrecision} : !llvm.float + %6738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6741 = llvm.mul %28, %6740 : !llvm.i64 + %6742 = llvm.add %6739, %6741 : !llvm.i64 + %6743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6744 = llvm.mul %6705, %6743 : !llvm.i64 + %6745 = llvm.add %6742, %6744 : !llvm.i64 + %6746 = llvm.getelementptr %6738[%6745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6737, %6746 : !llvm.ptr + %6747 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6749 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6750 = llvm.mul %28, %6749 : !llvm.i64 + %6751 = llvm.add %6748, %6750 : !llvm.i64 + %6752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6753 = llvm.mul %6705, %6752 : !llvm.i64 + %6754 = llvm.add %6751, %6753 : !llvm.i64 + %6755 = llvm.getelementptr %6747[%6754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6756 = llvm.load %6755 : !llvm.ptr + %6757 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6758 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6759 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6760 = llvm.mul %28, %6759 : !llvm.i64 + %6761 = llvm.add %6758, %6760 : !llvm.i64 + %6762 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6763 = llvm.mul %6705, %6762 : !llvm.i64 + %6764 = llvm.add %6761, %6763 : !llvm.i64 + %6765 = llvm.getelementptr %6757[%6764] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6756, %6765 : !llvm.ptr + %6766 = llvm.add %5850, %47 : !llvm.i64 + %6767 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6768 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6769 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6770 = llvm.mul %28, %6769 : !llvm.i64 
+ %6771 = llvm.add %6768, %6770 : !llvm.i64 + %6772 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6773 = llvm.mul %5851, %6772 : !llvm.i64 + %6774 = llvm.add %6771, %6773 : !llvm.i64 + %6775 = llvm.getelementptr %6767[%6774] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6776 = llvm.load %6775 : !llvm.ptr + %6777 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6779 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6780 = llvm.mul %5851, %6779 : !llvm.i64 + %6781 = llvm.add %6778, %6780 : !llvm.i64 + %6782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6783 = llvm.mul %6766, %6782 : !llvm.i64 + %6784 = llvm.add %6781, %6783 : !llvm.i64 + %6785 = llvm.getelementptr %6777[%6784] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6786 = llvm.load %6785 : !llvm.ptr + %6787 = llvm.fmul %6776, %6786 {RelaxedPrecision} : !llvm.float + %6788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6791 = llvm.mul %28, %6790 : !llvm.i64 + %6792 = llvm.add %6789, %6791 : !llvm.i64 + %6793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6794 = llvm.mul %6766, %6793 : !llvm.i64 + %6795 = llvm.add %6792, %6794 : !llvm.i64 + %6796 = llvm.getelementptr %6788[%6795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6797 = llvm.load %6796 : !llvm.ptr + %6798 = llvm.fadd %6797, %6787 {RelaxedPrecision} : !llvm.float + %6799 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6800 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6801 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6802 = llvm.mul %28, %6801 : !llvm.i64 + %6803 = llvm.add %6800, %6802 : !llvm.i64 + %6804 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6805 = llvm.mul %6766, %6804 : !llvm.i64 + %6806 = llvm.add %6803, %6805 : !llvm.i64 + %6807 = llvm.getelementptr %6799[%6806] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6798, %6807 : !llvm.ptr + %6808 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6811 = llvm.mul %28, %6810 : !llvm.i64 + %6812 = llvm.add %6809, %6811 : !llvm.i64 + %6813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6814 = llvm.mul %6766, %6813 : !llvm.i64 + %6815 = llvm.add %6812, %6814 : !llvm.i64 + %6816 = llvm.getelementptr %6808[%6815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6817 = llvm.load %6816 : !llvm.ptr + %6818 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6819 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6820 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6821 = llvm.mul %28, %6820 : !llvm.i64 + %6822 = llvm.add %6819, %6821 : !llvm.i64 + %6823 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6824 = llvm.mul %6766, %6823 : !llvm.i64 + %6825 = llvm.add %6822, %6824 : !llvm.i64 + %6826 = llvm.getelementptr %6818[%6825] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6817, %6826 : !llvm.ptr + %6827 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6828 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6829 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6830 = llvm.mul %24, %6829 : !llvm.i64 + %6831 = llvm.add %6828, %6830 : !llvm.i64 + %6832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6833 = llvm.mul %5851, %6832 
: !llvm.i64 + %6834 = llvm.add %6831, %6833 : !llvm.i64 + %6835 = llvm.getelementptr %6827[%6834] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6836 = llvm.load %6835 : !llvm.ptr + %6837 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6838 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6839 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6840 = llvm.mul %5851, %6839 : !llvm.i64 + %6841 = llvm.add %6838, %6840 : !llvm.i64 + %6842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6843 = llvm.mul %5850, %6842 : !llvm.i64 + %6844 = llvm.add %6841, %6843 : !llvm.i64 + %6845 = llvm.getelementptr %6837[%6844] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6846 = llvm.load %6845 : !llvm.ptr + %6847 = llvm.fmul %6836, %6846 {RelaxedPrecision} : !llvm.float + %6848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6851 = llvm.mul %24, %6850 : !llvm.i64 + %6852 = llvm.add %6849, %6851 : !llvm.i64 + %6853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6854 = llvm.mul %5850, %6853 : !llvm.i64 + %6855 = llvm.add %6852, %6854 : !llvm.i64 + %6856 = llvm.getelementptr %6848[%6855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6857 = llvm.load %6856 : !llvm.ptr + %6858 = llvm.fadd %6857, %6847 {RelaxedPrecision} : !llvm.float + %6859 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6860 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6861 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6862 = llvm.mul %24, %6861 : !llvm.i64 + %6863 = llvm.add %6860, %6862 : !llvm.i64 + %6864 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6865 = llvm.mul %5850, %6864 : !llvm.i64 + %6866 = llvm.add %6863, %6865 : !llvm.i64 + %6867 = llvm.getelementptr %6859[%6866] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6858, %6867 : !llvm.ptr + %6868 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6871 = llvm.mul %24, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6874 = llvm.mul %5850, %6873 : !llvm.i64 + %6875 = llvm.add %6872, %6874 : !llvm.i64 + %6876 = llvm.getelementptr %6868[%6875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6877 = llvm.load %6876 : !llvm.ptr + %6878 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6880 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6881 = llvm.mul %24, %6880 : !llvm.i64 + %6882 = llvm.add %6879, %6881 : !llvm.i64 + %6883 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6884 = llvm.mul %5850, %6883 : !llvm.i64 + %6885 = llvm.add %6882, %6884 : !llvm.i64 + %6886 = llvm.getelementptr %6878[%6885] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6877, %6886 : !llvm.ptr + %6887 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6888 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6889 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6890 = llvm.mul %24, %6889 : !llvm.i64 + %6891 = llvm.add %6888, %6890 : !llvm.i64 + %6892 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6893 = llvm.mul %5851, %6892 : !llvm.i64 + %6894 = llvm.add %6891, %6893 : !llvm.i64 + %6895 = llvm.getelementptr %6887[%6894] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %6896 = llvm.load %6895 : !llvm.ptr + %6897 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6898 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6899 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6900 = llvm.mul %5851, %6899 : !llvm.i64 + %6901 = llvm.add %6898, %6900 : !llvm.i64 + %6902 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6903 = llvm.mul %5912, %6902 : !llvm.i64 + %6904 = llvm.add %6901, %6903 : !llvm.i64 + %6905 = llvm.getelementptr %6897[%6904] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6906 = llvm.load %6905 : !llvm.ptr + %6907 = llvm.fmul %6896, %6906 {RelaxedPrecision} : !llvm.float + %6908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6911 = llvm.mul %24, %6910 : !llvm.i64 + %6912 = llvm.add %6909, %6911 : !llvm.i64 + %6913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6914 = llvm.mul %5912, %6913 : !llvm.i64 + %6915 = llvm.add %6912, %6914 : !llvm.i64 + %6916 = llvm.getelementptr %6908[%6915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6917 = llvm.load %6916 : !llvm.ptr + %6918 = llvm.fadd %6917, %6907 {RelaxedPrecision} : !llvm.float + %6919 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6921 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6922 = llvm.mul %24, %6921 : !llvm.i64 + %6923 = llvm.add %6920, %6922 : !llvm.i64 + %6924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6925 = llvm.mul %5912, %6924 : !llvm.i64 + %6926 = llvm.add %6923, %6925 : !llvm.i64 + %6927 = llvm.getelementptr %6919[%6926] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6918, %6927 : !llvm.ptr + %6928 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6930 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6931 = llvm.mul %24, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6934 = llvm.mul %5912, %6933 : !llvm.i64 + %6935 = llvm.add %6932, %6934 : !llvm.i64 + %6936 = llvm.getelementptr %6928[%6935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6937 = llvm.load %6936 : !llvm.ptr + %6938 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6940 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6941 = llvm.mul %24, %6940 : !llvm.i64 + %6942 = llvm.add %6939, %6941 : !llvm.i64 + %6943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6944 = llvm.mul %5912, %6943 : !llvm.i64 + %6945 = llvm.add %6942, %6944 : !llvm.i64 + %6946 = llvm.getelementptr %6938[%6945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6937, %6946 : !llvm.ptr + %6947 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6948 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6949 = llvm.mlir.constant(128 : index) : !llvm.i64 + %6950 = llvm.mul %24, %6949 : !llvm.i64 + %6951 = llvm.add %6948, %6950 : !llvm.i64 + %6952 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6953 = llvm.mul %5851, %6952 : !llvm.i64 + %6954 = llvm.add %6951, %6953 : !llvm.i64 + %6955 = llvm.getelementptr %6947[%6954] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6956 = llvm.load %6955 : !llvm.ptr + %6957 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x 
i64>, array<2 x i64>)> + %6958 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6959 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6960 = llvm.mul %5851, %6959 : !llvm.i64 + %6961 = llvm.add %6958, %6960 : !llvm.i64 + %6962 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6963 = llvm.mul %5973, %6962 : !llvm.i64 + %6964 = llvm.add %6961, %6963 : !llvm.i64 + %6965 = llvm.getelementptr %6957[%6964] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6966 = llvm.load %6965 : !llvm.ptr + %6967 = llvm.fmul %6956, %6966 {RelaxedPrecision} : !llvm.float + %6968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6971 = llvm.mul %24, %6970 : !llvm.i64 + %6972 = llvm.add %6969, %6971 : !llvm.i64 + %6973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6974 = llvm.mul %5973, %6973 : !llvm.i64 + %6975 = llvm.add %6972, %6974 : !llvm.i64 + %6976 = llvm.getelementptr %6968[%6975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6977 = llvm.load %6976 : !llvm.ptr + %6978 = llvm.fadd %6977, %6967 {RelaxedPrecision} : !llvm.float + %6979 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6980 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6981 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6982 = llvm.mul %24, %6981 : !llvm.i64 + %6983 = llvm.add %6980, %6982 : !llvm.i64 + %6984 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6985 = llvm.mul %5973, %6984 : !llvm.i64 + %6986 = llvm.add %6983, %6985 : !llvm.i64 + %6987 = llvm.getelementptr %6979[%6986] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6978, %6987 : !llvm.ptr + %6988 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6991 = llvm.mul %24, %6990 : !llvm.i64 + %6992 = llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %5973, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6997 = llvm.load %6996 : !llvm.ptr + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %24, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %5973, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %6997, %7006 : !llvm.ptr + %7007 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7008 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7010 = llvm.mul %24, %7009 : !llvm.i64 + %7011 = llvm.add %7008, %7010 : !llvm.i64 + %7012 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7013 = llvm.mul %5851, %7012 : !llvm.i64 + %7014 = llvm.add %7011, %7013 : !llvm.i64 + %7015 = llvm.getelementptr %7007[%7014] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7016 = llvm.load %7015 : !llvm.ptr + %7017 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7018 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7019 = llvm.mlir.constant(512 : index) : 
!llvm.i64 + %7020 = llvm.mul %5851, %7019 : !llvm.i64 + %7021 = llvm.add %7018, %7020 : !llvm.i64 + %7022 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7023 = llvm.mul %6034, %7022 : !llvm.i64 + %7024 = llvm.add %7021, %7023 : !llvm.i64 + %7025 = llvm.getelementptr %7017[%7024] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7026 = llvm.load %7025 : !llvm.ptr + %7027 = llvm.fmul %7016, %7026 {RelaxedPrecision} : !llvm.float + %7028 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7031 = llvm.mul %24, %7030 : !llvm.i64 + %7032 = llvm.add %7029, %7031 : !llvm.i64 + %7033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7034 = llvm.mul %6034, %7033 : !llvm.i64 + %7035 = llvm.add %7032, %7034 : !llvm.i64 + %7036 = llvm.getelementptr %7028[%7035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7037 = llvm.load %7036 : !llvm.ptr + %7038 = llvm.fadd %7037, %7027 {RelaxedPrecision} : !llvm.float + %7039 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7040 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7041 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7042 = llvm.mul %24, %7041 : !llvm.i64 + %7043 = llvm.add %7040, %7042 : !llvm.i64 + %7044 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7045 = llvm.mul %6034, %7044 : !llvm.i64 + %7046 = llvm.add %7043, %7045 : !llvm.i64 + %7047 = llvm.getelementptr %7039[%7046] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7038, %7047 : !llvm.ptr + %7048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7051 = llvm.mul %24, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %6034, %7053 : !llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7057 = llvm.load %7056 : !llvm.ptr + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %24, %7060 : !llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %6034, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7057, %7066 : !llvm.ptr + %7067 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7070 = llvm.mul %24, %7069 : !llvm.i64 + %7071 = llvm.add %7068, %7070 : !llvm.i64 + %7072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7073 = llvm.mul %5851, %7072 : !llvm.i64 + %7074 = llvm.add %7071, %7073 : !llvm.i64 + %7075 = llvm.getelementptr %7067[%7074] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7076 = llvm.load %7075 : !llvm.ptr + %7077 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7078 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7079 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7080 = llvm.mul %5851, %7079 : !llvm.i64 + %7081 = llvm.add %7078, %7080 : !llvm.i64 + %7082 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %7083 = llvm.mul %6095, %7082 : !llvm.i64 + %7084 = llvm.add %7081, %7083 : !llvm.i64 + %7085 = llvm.getelementptr %7077[%7084] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7086 = llvm.load %7085 : !llvm.ptr + %7087 = llvm.fmul %7076, %7086 {RelaxedPrecision} : !llvm.float + %7088 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7091 = llvm.mul %24, %7090 : !llvm.i64 + %7092 = llvm.add %7089, %7091 : !llvm.i64 + %7093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7094 = llvm.mul %6095, %7093 : !llvm.i64 + %7095 = llvm.add %7092, %7094 : !llvm.i64 + %7096 = llvm.getelementptr %7088[%7095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7097 = llvm.load %7096 : !llvm.ptr + %7098 = llvm.fadd %7097, %7087 {RelaxedPrecision} : !llvm.float + %7099 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7100 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7101 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7102 = llvm.mul %24, %7101 : !llvm.i64 + %7103 = llvm.add %7100, %7102 : !llvm.i64 + %7104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7105 = llvm.mul %6095, %7104 : !llvm.i64 + %7106 = llvm.add %7103, %7105 : !llvm.i64 + %7107 = llvm.getelementptr %7099[%7106] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7098, %7107 : !llvm.ptr + %7108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7111 = llvm.mul %24, %7110 : !llvm.i64 + %7112 = llvm.add %7109, %7111 : !llvm.i64 + %7113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7114 = llvm.mul %6095, %7113 : !llvm.i64 + %7115 = llvm.add %7112, %7114 : !llvm.i64 + %7116 = llvm.getelementptr %7108[%7115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7117 = llvm.load %7116 : !llvm.ptr + %7118 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7119 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7120 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7121 = llvm.mul %24, %7120 : !llvm.i64 + %7122 = llvm.add %7119, %7121 : !llvm.i64 + %7123 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7124 = llvm.mul %6095, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 + %7126 = llvm.getelementptr %7118[%7125] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7117, %7126 : !llvm.ptr + %7127 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7128 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7129 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7130 = llvm.mul %24, %7129 : !llvm.i64 + %7131 = llvm.add %7128, %7130 : !llvm.i64 + %7132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7133 = llvm.mul %5851, %7132 : !llvm.i64 + %7134 = llvm.add %7131, %7133 : !llvm.i64 + %7135 = llvm.getelementptr %7127[%7134] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7136 = llvm.load %7135 : !llvm.ptr + %7137 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7138 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7139 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7140 = llvm.mul %5851, %7139 : !llvm.i64 + %7141 = llvm.add %7138, %7140 : !llvm.i64 + %7142 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7143 = llvm.mul %6156, %7142 : !llvm.i64 + %7144 = llvm.add %7141, %7143 : 
!llvm.i64 + %7145 = llvm.getelementptr %7137[%7144] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7146 = llvm.load %7145 : !llvm.ptr + %7147 = llvm.fmul %7136, %7146 {RelaxedPrecision} : !llvm.float + %7148 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7151 = llvm.mul %24, %7150 : !llvm.i64 + %7152 = llvm.add %7149, %7151 : !llvm.i64 + %7153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7154 = llvm.mul %6156, %7153 : !llvm.i64 + %7155 = llvm.add %7152, %7154 : !llvm.i64 + %7156 = llvm.getelementptr %7148[%7155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7157 = llvm.load %7156 : !llvm.ptr + %7158 = llvm.fadd %7157, %7147 {RelaxedPrecision} : !llvm.float + %7159 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7160 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7161 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7162 = llvm.mul %24, %7161 : !llvm.i64 + %7163 = llvm.add %7160, %7162 : !llvm.i64 + %7164 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7165 = llvm.mul %6156, %7164 : !llvm.i64 + %7166 = llvm.add %7163, %7165 : !llvm.i64 + %7167 = llvm.getelementptr %7159[%7166] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7158, %7167 : !llvm.ptr + %7168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7171 = llvm.mul %24, %7170 : !llvm.i64 + %7172 = llvm.add %7169, %7171 : !llvm.i64 + %7173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7174 = llvm.mul %6156, %7173 : !llvm.i64 + %7175 = llvm.add %7172, %7174 : !llvm.i64 + %7176 = llvm.getelementptr %7168[%7175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7177 = llvm.load %7176 : !llvm.ptr + %7178 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7180 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7181 = llvm.mul %24, %7180 : !llvm.i64 + %7182 = llvm.add %7179, %7181 : !llvm.i64 + %7183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7184 = llvm.mul %6156, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.getelementptr %7178[%7185] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7177, %7186 : !llvm.ptr + %7187 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7188 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7189 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7190 = llvm.mul %24, %7189 : !llvm.i64 + %7191 = llvm.add %7188, %7190 : !llvm.i64 + %7192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7193 = llvm.mul %5851, %7192 : !llvm.i64 + %7194 = llvm.add %7191, %7193 : !llvm.i64 + %7195 = llvm.getelementptr %7187[%7194] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7196 = llvm.load %7195 : !llvm.ptr + %7197 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7198 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7199 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7200 = llvm.mul %5851, %7199 : !llvm.i64 + %7201 = llvm.add %7198, %7200 : !llvm.i64 + %7202 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7203 = llvm.mul %6217, %7202 : !llvm.i64 + %7204 = llvm.add %7201, %7203 : !llvm.i64 + %7205 = llvm.getelementptr %7197[%7204] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7206 = llvm.load %7205 : !llvm.ptr 
+ %7207 = llvm.fmul %7196, %7206 {RelaxedPrecision} : !llvm.float + %7208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7211 = llvm.mul %24, %7210 : !llvm.i64 + %7212 = llvm.add %7209, %7211 : !llvm.i64 + %7213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7214 = llvm.mul %6217, %7213 : !llvm.i64 + %7215 = llvm.add %7212, %7214 : !llvm.i64 + %7216 = llvm.getelementptr %7208[%7215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7217 = llvm.load %7216 : !llvm.ptr + %7218 = llvm.fadd %7217, %7207 {RelaxedPrecision} : !llvm.float + %7219 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7220 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7221 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7222 = llvm.mul %24, %7221 : !llvm.i64 + %7223 = llvm.add %7220, %7222 : !llvm.i64 + %7224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7225 = llvm.mul %6217, %7224 : !llvm.i64 + %7226 = llvm.add %7223, %7225 : !llvm.i64 + %7227 = llvm.getelementptr %7219[%7226] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7218, %7227 : !llvm.ptr + %7228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7231 = llvm.mul %24, %7230 : !llvm.i64 + %7232 = llvm.add %7229, %7231 : !llvm.i64 + %7233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7234 = llvm.mul %6217, %7233 : !llvm.i64 + %7235 = llvm.add %7232, %7234 : !llvm.i64 + %7236 = llvm.getelementptr %7228[%7235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7237 = llvm.load %7236 : !llvm.ptr + %7238 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7239 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7240 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7241 = llvm.mul %24, %7240 : !llvm.i64 + %7242 = llvm.add %7239, %7241 : !llvm.i64 + %7243 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7244 = llvm.mul %6217, %7243 : !llvm.i64 + %7245 = llvm.add %7242, %7244 : !llvm.i64 + %7246 = llvm.getelementptr %7238[%7245] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7237, %7246 : !llvm.ptr + %7247 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7248 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7249 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7250 = llvm.mul %24, %7249 : !llvm.i64 + %7251 = llvm.add %7248, %7250 : !llvm.i64 + %7252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7253 = llvm.mul %5851, %7252 : !llvm.i64 + %7254 = llvm.add %7251, %7253 : !llvm.i64 + %7255 = llvm.getelementptr %7247[%7254] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7256 = llvm.load %7255 : !llvm.ptr + %7257 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7258 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7259 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7260 = llvm.mul %5851, %7259 : !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7263 = llvm.mul %6278, %7262 : !llvm.i64 + %7264 = llvm.add %7261, %7263 : !llvm.i64 + %7265 = llvm.getelementptr %7257[%7264] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7266 = llvm.load %7265 : !llvm.ptr + %7267 = llvm.fmul %7256, %7266 {RelaxedPrecision} : !llvm.float + %7268 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %7269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7271 = llvm.mul %24, %7270 : !llvm.i64 + %7272 = llvm.add %7269, %7271 : !llvm.i64 + %7273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7274 = llvm.mul %6278, %7273 : !llvm.i64 + %7275 = llvm.add %7272, %7274 : !llvm.i64 + %7276 = llvm.getelementptr %7268[%7275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7277 = llvm.load %7276 : !llvm.ptr + %7278 = llvm.fadd %7277, %7267 {RelaxedPrecision} : !llvm.float + %7279 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7280 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7281 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7282 = llvm.mul %24, %7281 : !llvm.i64 + %7283 = llvm.add %7280, %7282 : !llvm.i64 + %7284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7285 = llvm.mul %6278, %7284 : !llvm.i64 + %7286 = llvm.add %7283, %7285 : !llvm.i64 + %7287 = llvm.getelementptr %7279[%7286] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7278, %7287 : !llvm.ptr + %7288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7291 = llvm.mul %24, %7290 : !llvm.i64 + %7292 = llvm.add %7289, %7291 : !llvm.i64 + %7293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7294 = llvm.mul %6278, %7293 : !llvm.i64 + %7295 = llvm.add %7292, %7294 : !llvm.i64 + %7296 = llvm.getelementptr %7288[%7295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7297 = llvm.load %7296 : !llvm.ptr + %7298 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7300 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7301 = llvm.mul %24, %7300 : !llvm.i64 + %7302 = llvm.add %7299, %7301 : !llvm.i64 + %7303 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7304 = llvm.mul %6278, %7303 : !llvm.i64 + %7305 = llvm.add %7302, %7304 : !llvm.i64 + %7306 = llvm.getelementptr %7298[%7305] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7297, %7306 : !llvm.ptr + %7307 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7308 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7309 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7310 = llvm.mul %24, %7309 : !llvm.i64 + %7311 = llvm.add %7308, %7310 : !llvm.i64 + %7312 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7313 = llvm.mul %5851, %7312 : !llvm.i64 + %7314 = llvm.add %7311, %7313 : !llvm.i64 + %7315 = llvm.getelementptr %7307[%7314] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7316 = llvm.load %7315 : !llvm.ptr + %7317 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7318 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7319 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7320 = llvm.mul %5851, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7323 = llvm.mul %6339, %7322 : !llvm.i64 + %7324 = llvm.add %7321, %7323 : !llvm.i64 + %7325 = llvm.getelementptr %7317[%7324] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7326 = llvm.load %7325 : !llvm.ptr + %7327 = llvm.fmul %7316, %7326 {RelaxedPrecision} : !llvm.float + %7328 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7330 = llvm.mlir.constant(512 : 
index) : !llvm.i64 + %7331 = llvm.mul %24, %7330 : !llvm.i64 + %7332 = llvm.add %7329, %7331 : !llvm.i64 + %7333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7334 = llvm.mul %6339, %7333 : !llvm.i64 + %7335 = llvm.add %7332, %7334 : !llvm.i64 + %7336 = llvm.getelementptr %7328[%7335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7337 = llvm.load %7336 : !llvm.ptr + %7338 = llvm.fadd %7337, %7327 {RelaxedPrecision} : !llvm.float + %7339 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7340 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7341 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7342 = llvm.mul %24, %7341 : !llvm.i64 + %7343 = llvm.add %7340, %7342 : !llvm.i64 + %7344 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7345 = llvm.mul %6339, %7344 : !llvm.i64 + %7346 = llvm.add %7343, %7345 : !llvm.i64 + %7347 = llvm.getelementptr %7339[%7346] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7338, %7347 : !llvm.ptr + %7348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7351 = llvm.mul %24, %7350 : !llvm.i64 + %7352 = llvm.add %7349, %7351 : !llvm.i64 + %7353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7354 = llvm.mul %6339, %7353 : !llvm.i64 + %7355 = llvm.add %7352, %7354 : !llvm.i64 + %7356 = llvm.getelementptr %7348[%7355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7357 = llvm.load %7356 : !llvm.ptr + %7358 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7360 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7361 = llvm.mul %24, %7360 : !llvm.i64 + %7362 = llvm.add %7359, %7361 : !llvm.i64 + %7363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7364 = llvm.mul %6339, %7363 : !llvm.i64 + %7365 = llvm.add %7362, %7364 : !llvm.i64 + %7366 = llvm.getelementptr %7358[%7365] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7357, %7366 : !llvm.ptr + %7367 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7368 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7369 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7370 = llvm.mul %24, %7369 : !llvm.i64 + %7371 = llvm.add %7368, %7370 : !llvm.i64 + %7372 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7373 = llvm.mul %5851, %7372 : !llvm.i64 + %7374 = llvm.add %7371, %7373 : !llvm.i64 + %7375 = llvm.getelementptr %7367[%7374] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7376 = llvm.load %7375 : !llvm.ptr + %7377 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7378 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7379 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7380 = llvm.mul %5851, %7379 : !llvm.i64 + %7381 = llvm.add %7378, %7380 : !llvm.i64 + %7382 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7383 = llvm.mul %6400, %7382 : !llvm.i64 + %7384 = llvm.add %7381, %7383 : !llvm.i64 + %7385 = llvm.getelementptr %7377[%7384] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7386 = llvm.load %7385 : !llvm.ptr + %7387 = llvm.fmul %7376, %7386 {RelaxedPrecision} : !llvm.float + %7388 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7391 = llvm.mul %24, %7390 : !llvm.i64 + %7392 = llvm.add %7389, %7391 : !llvm.i64 + %7393 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %7394 = llvm.mul %6400, %7393 : !llvm.i64 + %7395 = llvm.add %7392, %7394 : !llvm.i64 + %7396 = llvm.getelementptr %7388[%7395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7397 = llvm.load %7396 : !llvm.ptr + %7398 = llvm.fadd %7397, %7387 {RelaxedPrecision} : !llvm.float + %7399 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7400 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7401 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7402 = llvm.mul %24, %7401 : !llvm.i64 + %7403 = llvm.add %7400, %7402 : !llvm.i64 + %7404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7405 = llvm.mul %6400, %7404 : !llvm.i64 + %7406 = llvm.add %7403, %7405 : !llvm.i64 + %7407 = llvm.getelementptr %7399[%7406] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7398, %7407 : !llvm.ptr + %7408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7411 = llvm.mul %24, %7410 : !llvm.i64 + %7412 = llvm.add %7409, %7411 : !llvm.i64 + %7413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7414 = llvm.mul %6400, %7413 : !llvm.i64 + %7415 = llvm.add %7412, %7414 : !llvm.i64 + %7416 = llvm.getelementptr %7408[%7415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7417 = llvm.load %7416 : !llvm.ptr + %7418 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mul %24, %7420 : !llvm.i64 + %7422 = llvm.add %7419, %7421 : !llvm.i64 + %7423 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7424 = llvm.mul %6400, %7423 : !llvm.i64 + %7425 = llvm.add %7422, %7424 : !llvm.i64 + %7426 = llvm.getelementptr %7418[%7425] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7417, %7426 : !llvm.ptr + %7427 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7428 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7429 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7430 = llvm.mul %24, %7429 : !llvm.i64 + %7431 = llvm.add %7428, %7430 : !llvm.i64 + %7432 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7433 = llvm.mul %5851, %7432 : !llvm.i64 + %7434 = llvm.add %7431, %7433 : !llvm.i64 + %7435 = llvm.getelementptr %7427[%7434] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7436 = llvm.load %7435 : !llvm.ptr + %7437 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7438 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7439 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7440 = llvm.mul %5851, %7439 : !llvm.i64 + %7441 = llvm.add %7438, %7440 : !llvm.i64 + %7442 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7443 = llvm.mul %6461, %7442 : !llvm.i64 + %7444 = llvm.add %7441, %7443 : !llvm.i64 + %7445 = llvm.getelementptr %7437[%7444] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7446 = llvm.load %7445 : !llvm.ptr + %7447 = llvm.fmul %7436, %7446 {RelaxedPrecision} : !llvm.float + %7448 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7451 = llvm.mul %24, %7450 : !llvm.i64 + %7452 = llvm.add %7449, %7451 : !llvm.i64 + %7453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7454 = llvm.mul %6461, %7453 : !llvm.i64 + %7455 = llvm.add %7452, %7454 : 
!llvm.i64 + %7456 = llvm.getelementptr %7448[%7455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7457 = llvm.load %7456 : !llvm.ptr + %7458 = llvm.fadd %7457, %7447 {RelaxedPrecision} : !llvm.float + %7459 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7460 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7461 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7462 = llvm.mul %24, %7461 : !llvm.i64 + %7463 = llvm.add %7460, %7462 : !llvm.i64 + %7464 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7465 = llvm.mul %6461, %7464 : !llvm.i64 + %7466 = llvm.add %7463, %7465 : !llvm.i64 + %7467 = llvm.getelementptr %7459[%7466] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7458, %7467 : !llvm.ptr + %7468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7471 = llvm.mul %24, %7470 : !llvm.i64 + %7472 = llvm.add %7469, %7471 : !llvm.i64 + %7473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7474 = llvm.mul %6461, %7473 : !llvm.i64 + %7475 = llvm.add %7472, %7474 : !llvm.i64 + %7476 = llvm.getelementptr %7468[%7475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7477 = llvm.load %7476 : !llvm.ptr + %7478 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7479 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7480 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7481 = llvm.mul %24, %7480 : !llvm.i64 + %7482 = llvm.add %7479, %7481 : !llvm.i64 + %7483 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7484 = llvm.mul %6461, %7483 : !llvm.i64 + %7485 = llvm.add %7482, %7484 : !llvm.i64 + %7486 = llvm.getelementptr %7478[%7485] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7477, %7486 : !llvm.ptr + %7487 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7488 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7489 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7490 = llvm.mul %24, %7489 : !llvm.i64 + %7491 = llvm.add %7488, %7490 : !llvm.i64 + %7492 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7493 = llvm.mul %5851, %7492 : !llvm.i64 + %7494 = llvm.add %7491, %7493 : !llvm.i64 + %7495 = llvm.getelementptr %7487[%7494] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7496 = llvm.load %7495 : !llvm.ptr + %7497 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7498 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7499 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7500 = llvm.mul %5851, %7499 : !llvm.i64 + %7501 = llvm.add %7498, %7500 : !llvm.i64 + %7502 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7503 = llvm.mul %6522, %7502 : !llvm.i64 + %7504 = llvm.add %7501, %7503 : !llvm.i64 + %7505 = llvm.getelementptr %7497[%7504] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7506 = llvm.load %7505 : !llvm.ptr + %7507 = llvm.fmul %7496, %7506 {RelaxedPrecision} : !llvm.float + %7508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7511 = llvm.mul %24, %7510 : !llvm.i64 + %7512 = llvm.add %7509, %7511 : !llvm.i64 + %7513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7514 = llvm.mul %6522, %7513 : !llvm.i64 + %7515 = llvm.add %7512, %7514 : !llvm.i64 + %7516 = llvm.getelementptr %7508[%7515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7517 = llvm.load %7516 : !llvm.ptr 
+ %7518 = llvm.fadd %7517, %7507 {RelaxedPrecision} : !llvm.float + %7519 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7520 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7521 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7522 = llvm.mul %24, %7521 : !llvm.i64 + %7523 = llvm.add %7520, %7522 : !llvm.i64 + %7524 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7525 = llvm.mul %6522, %7524 : !llvm.i64 + %7526 = llvm.add %7523, %7525 : !llvm.i64 + %7527 = llvm.getelementptr %7519[%7526] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7518, %7527 : !llvm.ptr + %7528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7531 = llvm.mul %24, %7530 : !llvm.i64 + %7532 = llvm.add %7529, %7531 : !llvm.i64 + %7533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7534 = llvm.mul %6522, %7533 : !llvm.i64 + %7535 = llvm.add %7532, %7534 : !llvm.i64 + %7536 = llvm.getelementptr %7528[%7535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7537 = llvm.load %7536 : !llvm.ptr + %7538 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7539 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7540 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7541 = llvm.mul %24, %7540 : !llvm.i64 + %7542 = llvm.add %7539, %7541 : !llvm.i64 + %7543 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7544 = llvm.mul %6522, %7543 : !llvm.i64 + %7545 = llvm.add %7542, %7544 : !llvm.i64 + %7546 = llvm.getelementptr %7538[%7545] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7537, %7546 : !llvm.ptr + %7547 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7548 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7549 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7550 = llvm.mul %24, %7549 : !llvm.i64 + %7551 = llvm.add %7548, %7550 : !llvm.i64 + %7552 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7553 = llvm.mul %5851, %7552 : !llvm.i64 + %7554 = llvm.add %7551, %7553 : !llvm.i64 + %7555 = llvm.getelementptr %7547[%7554] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7556 = llvm.load %7555 : !llvm.ptr + %7557 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7558 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7559 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7560 = llvm.mul %5851, %7559 : !llvm.i64 + %7561 = llvm.add %7558, %7560 : !llvm.i64 + %7562 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7563 = llvm.mul %6583, %7562 : !llvm.i64 + %7564 = llvm.add %7561, %7563 : !llvm.i64 + %7565 = llvm.getelementptr %7557[%7564] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7566 = llvm.load %7565 : !llvm.ptr + %7567 = llvm.fmul %7556, %7566 {RelaxedPrecision} : !llvm.float + %7568 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7571 = llvm.mul %24, %7570 : !llvm.i64 + %7572 = llvm.add %7569, %7571 : !llvm.i64 + %7573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7574 = llvm.mul %6583, %7573 : !llvm.i64 + %7575 = llvm.add %7572, %7574 : !llvm.i64 + %7576 = llvm.getelementptr %7568[%7575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7577 = llvm.load %7576 : !llvm.ptr + %7578 = llvm.fadd %7577, %7567 {RelaxedPrecision} : !llvm.float + %7579 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %7580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7581 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7582 = llvm.mul %24, %7581 : !llvm.i64 + %7583 = llvm.add %7580, %7582 : !llvm.i64 + %7584 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7585 = llvm.mul %6583, %7584 : !llvm.i64 + %7586 = llvm.add %7583, %7585 : !llvm.i64 + %7587 = llvm.getelementptr %7579[%7586] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7578, %7587 : !llvm.ptr + %7588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7591 = llvm.mul %24, %7590 : !llvm.i64 + %7592 = llvm.add %7589, %7591 : !llvm.i64 + %7593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7594 = llvm.mul %6583, %7593 : !llvm.i64 + %7595 = llvm.add %7592, %7594 : !llvm.i64 + %7596 = llvm.getelementptr %7588[%7595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7597 = llvm.load %7596 : !llvm.ptr + %7598 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7599 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7600 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7601 = llvm.mul %24, %7600 : !llvm.i64 + %7602 = llvm.add %7599, %7601 : !llvm.i64 + %7603 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7604 = llvm.mul %6583, %7603 : !llvm.i64 + %7605 = llvm.add %7602, %7604 : !llvm.i64 + %7606 = llvm.getelementptr %7598[%7605] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7597, %7606 : !llvm.ptr + %7607 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7608 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7609 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7610 = llvm.mul %24, %7609 : !llvm.i64 + %7611 = llvm.add %7608, %7610 : !llvm.i64 + %7612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7613 = llvm.mul %5851, %7612 : !llvm.i64 + %7614 = llvm.add %7611, %7613 : !llvm.i64 + %7615 = llvm.getelementptr %7607[%7614] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7616 = llvm.load %7615 : !llvm.ptr + %7617 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7618 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7619 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7620 = llvm.mul %5851, %7619 : !llvm.i64 + %7621 = llvm.add %7618, %7620 : !llvm.i64 + %7622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7623 = llvm.mul %6644, %7622 : !llvm.i64 + %7624 = llvm.add %7621, %7623 : !llvm.i64 + %7625 = llvm.getelementptr %7617[%7624] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7626 = llvm.load %7625 : !llvm.ptr + %7627 = llvm.fmul %7616, %7626 {RelaxedPrecision} : !llvm.float + %7628 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7631 = llvm.mul %24, %7630 : !llvm.i64 + %7632 = llvm.add %7629, %7631 : !llvm.i64 + %7633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7634 = llvm.mul %6644, %7633 : !llvm.i64 + %7635 = llvm.add %7632, %7634 : !llvm.i64 + %7636 = llvm.getelementptr %7628[%7635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7637 = llvm.load %7636 : !llvm.ptr + %7638 = llvm.fadd %7637, %7627 {RelaxedPrecision} : !llvm.float + %7639 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7640 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7641 = llvm.mlir.constant(512 : 
index) : !llvm.i64 + %7642 = llvm.mul %24, %7641 : !llvm.i64 + %7643 = llvm.add %7640, %7642 : !llvm.i64 + %7644 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7645 = llvm.mul %6644, %7644 : !llvm.i64 + %7646 = llvm.add %7643, %7645 : !llvm.i64 + %7647 = llvm.getelementptr %7639[%7646] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7638, %7647 : !llvm.ptr + %7648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7651 = llvm.mul %24, %7650 : !llvm.i64 + %7652 = llvm.add %7649, %7651 : !llvm.i64 + %7653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7654 = llvm.mul %6644, %7653 : !llvm.i64 + %7655 = llvm.add %7652, %7654 : !llvm.i64 + %7656 = llvm.getelementptr %7648[%7655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7657 = llvm.load %7656 : !llvm.ptr + %7658 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7659 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7660 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7661 = llvm.mul %24, %7660 : !llvm.i64 + %7662 = llvm.add %7659, %7661 : !llvm.i64 + %7663 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7664 = llvm.mul %6644, %7663 : !llvm.i64 + %7665 = llvm.add %7662, %7664 : !llvm.i64 + %7666 = llvm.getelementptr %7658[%7665] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7657, %7666 : !llvm.ptr + %7667 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7668 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7669 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7670 = llvm.mul %24, %7669 : !llvm.i64 + %7671 = llvm.add %7668, %7670 : !llvm.i64 + %7672 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7673 = llvm.mul %5851, %7672 : !llvm.i64 + %7674 = llvm.add %7671, %7673 : !llvm.i64 + %7675 = llvm.getelementptr %7667[%7674] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7676 = llvm.load %7675 : !llvm.ptr + %7677 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7678 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7679 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7680 = llvm.mul %5851, %7679 : !llvm.i64 + %7681 = llvm.add %7678, %7680 : !llvm.i64 + %7682 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7683 = llvm.mul %6705, %7682 : !llvm.i64 + %7684 = llvm.add %7681, %7683 : !llvm.i64 + %7685 = llvm.getelementptr %7677[%7684] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7686 = llvm.load %7685 : !llvm.ptr + %7687 = llvm.fmul %7676, %7686 {RelaxedPrecision} : !llvm.float + %7688 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7691 = llvm.mul %24, %7690 : !llvm.i64 + %7692 = llvm.add %7689, %7691 : !llvm.i64 + %7693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7694 = llvm.mul %6705, %7693 : !llvm.i64 + %7695 = llvm.add %7692, %7694 : !llvm.i64 + %7696 = llvm.getelementptr %7688[%7695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7697 = llvm.load %7696 : !llvm.ptr + %7698 = llvm.fadd %7697, %7687 {RelaxedPrecision} : !llvm.float + %7699 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7700 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7701 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7702 = llvm.mul %24, %7701 : !llvm.i64 + %7703 = llvm.add %7700, %7702 : !llvm.i64 + %7704 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %7705 = llvm.mul %6705, %7704 : !llvm.i64 + %7706 = llvm.add %7703, %7705 : !llvm.i64 + %7707 = llvm.getelementptr %7699[%7706] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7698, %7707 : !llvm.ptr + %7708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7711 = llvm.mul %24, %7710 : !llvm.i64 + %7712 = llvm.add %7709, %7711 : !llvm.i64 + %7713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7714 = llvm.mul %6705, %7713 : !llvm.i64 + %7715 = llvm.add %7712, %7714 : !llvm.i64 + %7716 = llvm.getelementptr %7708[%7715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7717 = llvm.load %7716 : !llvm.ptr + %7718 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7720 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7721 = llvm.mul %24, %7720 : !llvm.i64 + %7722 = llvm.add %7719, %7721 : !llvm.i64 + %7723 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7724 = llvm.mul %6705, %7723 : !llvm.i64 + %7725 = llvm.add %7722, %7724 : !llvm.i64 + %7726 = llvm.getelementptr %7718[%7725] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7717, %7726 : !llvm.ptr + %7727 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7728 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7729 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7730 = llvm.mul %24, %7729 : !llvm.i64 + %7731 = llvm.add %7728, %7730 : !llvm.i64 + %7732 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7733 = llvm.mul %5851, %7732 : !llvm.i64 + %7734 = llvm.add %7731, %7733 : !llvm.i64 + %7735 = llvm.getelementptr %7727[%7734] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7736 = llvm.load %7735 : !llvm.ptr + %7737 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7738 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7739 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7740 = llvm.mul %5851, %7739 : !llvm.i64 + %7741 = llvm.add %7738, %7740 : !llvm.i64 + %7742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7743 = llvm.mul %6766, %7742 : !llvm.i64 + %7744 = llvm.add %7741, %7743 : !llvm.i64 + %7745 = llvm.getelementptr %7737[%7744] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7746 = llvm.load %7745 : !llvm.ptr + %7747 = llvm.fmul %7736, %7746 {RelaxedPrecision} : !llvm.float + %7748 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7750 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7751 = llvm.mul %24, %7750 : !llvm.i64 + %7752 = llvm.add %7749, %7751 : !llvm.i64 + %7753 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7754 = llvm.mul %6766, %7753 : !llvm.i64 + %7755 = llvm.add %7752, %7754 : !llvm.i64 + %7756 = llvm.getelementptr %7748[%7755] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7757 = llvm.load %7756 : !llvm.ptr + %7758 = llvm.fadd %7757, %7747 {RelaxedPrecision} : !llvm.float + %7759 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7760 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7761 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7762 = llvm.mul %24, %7761 : !llvm.i64 + %7763 = llvm.add %7760, %7762 : !llvm.i64 + %7764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7765 = llvm.mul %6766, %7764 : !llvm.i64 + %7766 = llvm.add %7763, %7765 : 
!llvm.i64 + %7767 = llvm.getelementptr %7759[%7766] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7758, %7767 : !llvm.ptr + %7768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7771 = llvm.mul %24, %7770 : !llvm.i64 + %7772 = llvm.add %7769, %7771 : !llvm.i64 + %7773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7774 = llvm.mul %6766, %7773 : !llvm.i64 + %7775 = llvm.add %7772, %7774 : !llvm.i64 + %7776 = llvm.getelementptr %7768[%7775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7777 = llvm.load %7776 : !llvm.ptr + %7778 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7779 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7780 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7781 = llvm.mul %24, %7780 : !llvm.i64 + %7782 = llvm.add %7779, %7781 : !llvm.i64 + %7783 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7784 = llvm.mul %6766, %7783 : !llvm.i64 + %7785 = llvm.add %7782, %7784 : !llvm.i64 + %7786 = llvm.getelementptr %7778[%7785] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7777, %7786 : !llvm.ptr + %7787 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7788 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7789 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7790 = llvm.mul %25, %7789 : !llvm.i64 + %7791 = llvm.add %7788, %7790 : !llvm.i64 + %7792 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7793 = llvm.mul %5851, %7792 : !llvm.i64 + %7794 = llvm.add %7791, %7793 : !llvm.i64 + %7795 = llvm.getelementptr %7787[%7794] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7796 = llvm.load %7795 : !llvm.ptr + %7797 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7799 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7800 = llvm.mul %5851, %7799 : !llvm.i64 + %7801 = llvm.add %7798, %7800 : !llvm.i64 + %7802 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7803 = llvm.mul %5850, %7802 : !llvm.i64 + %7804 = llvm.add %7801, %7803 : !llvm.i64 + %7805 = llvm.getelementptr %7797[%7804] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7806 = llvm.load %7805 : !llvm.ptr + %7807 = llvm.fmul %7796, %7806 {RelaxedPrecision} : !llvm.float + %7808 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7810 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7811 = llvm.mul %25, %7810 : !llvm.i64 + %7812 = llvm.add %7809, %7811 : !llvm.i64 + %7813 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7814 = llvm.mul %5850, %7813 : !llvm.i64 + %7815 = llvm.add %7812, %7814 : !llvm.i64 + %7816 = llvm.getelementptr %7808[%7815] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7817 = llvm.load %7816 : !llvm.ptr + %7818 = llvm.fadd %7817, %7807 {RelaxedPrecision} : !llvm.float + %7819 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7820 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7821 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7822 = llvm.mul %25, %7821 : !llvm.i64 + %7823 = llvm.add %7820, %7822 : !llvm.i64 + %7824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7825 = llvm.mul %5850, %7824 : !llvm.i64 + %7826 = llvm.add %7823, %7825 : !llvm.i64 + %7827 = llvm.getelementptr %7819[%7826] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7818, %7827 : !llvm.ptr 
+ %7828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7831 = llvm.mul %25, %7830 : !llvm.i64 + %7832 = llvm.add %7829, %7831 : !llvm.i64 + %7833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7834 = llvm.mul %5850, %7833 : !llvm.i64 + %7835 = llvm.add %7832, %7834 : !llvm.i64 + %7836 = llvm.getelementptr %7828[%7835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7837 = llvm.load %7836 : !llvm.ptr + %7838 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7840 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7841 = llvm.mul %25, %7840 : !llvm.i64 + %7842 = llvm.add %7839, %7841 : !llvm.i64 + %7843 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7844 = llvm.mul %5850, %7843 : !llvm.i64 + %7845 = llvm.add %7842, %7844 : !llvm.i64 + %7846 = llvm.getelementptr %7838[%7845] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7837, %7846 : !llvm.ptr + %7847 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7848 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7849 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7850 = llvm.mul %25, %7849 : !llvm.i64 + %7851 = llvm.add %7848, %7850 : !llvm.i64 + %7852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7853 = llvm.mul %5851, %7852 : !llvm.i64 + %7854 = llvm.add %7851, %7853 : !llvm.i64 + %7855 = llvm.getelementptr %7847[%7854] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7856 = llvm.load %7855 : !llvm.ptr + %7857 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7858 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7859 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7860 = llvm.mul %5851, %7859 : !llvm.i64 + %7861 = llvm.add %7858, %7860 : !llvm.i64 + %7862 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7863 = llvm.mul %5912, %7862 : !llvm.i64 + %7864 = llvm.add %7861, %7863 : !llvm.i64 + %7865 = llvm.getelementptr %7857[%7864] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7866 = llvm.load %7865 : !llvm.ptr + %7867 = llvm.fmul %7856, %7866 {RelaxedPrecision} : !llvm.float + %7868 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7869 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7870 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7871 = llvm.mul %25, %7870 : !llvm.i64 + %7872 = llvm.add %7869, %7871 : !llvm.i64 + %7873 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7874 = llvm.mul %5912, %7873 : !llvm.i64 + %7875 = llvm.add %7872, %7874 : !llvm.i64 + %7876 = llvm.getelementptr %7868[%7875] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7877 = llvm.load %7876 : !llvm.ptr + %7878 = llvm.fadd %7877, %7867 {RelaxedPrecision} : !llvm.float + %7879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7882 = llvm.mul %25, %7881 : !llvm.i64 + %7883 = llvm.add %7880, %7882 : !llvm.i64 + %7884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7885 = llvm.mul %5912, %7884 : !llvm.i64 + %7886 = llvm.add %7883, %7885 : !llvm.i64 + %7887 = llvm.getelementptr %7879[%7886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7878, %7887 : !llvm.ptr + %7888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7889 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %7890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7891 = llvm.mul %25, %7890 : !llvm.i64 + %7892 = llvm.add %7889, %7891 : !llvm.i64 + %7893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7894 = llvm.mul %5912, %7893 : !llvm.i64 + %7895 = llvm.add %7892, %7894 : !llvm.i64 + %7896 = llvm.getelementptr %7888[%7895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7897 = llvm.load %7896 : !llvm.ptr + %7898 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7899 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7900 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7901 = llvm.mul %25, %7900 : !llvm.i64 + %7902 = llvm.add %7899, %7901 : !llvm.i64 + %7903 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7904 = llvm.mul %5912, %7903 : !llvm.i64 + %7905 = llvm.add %7902, %7904 : !llvm.i64 + %7906 = llvm.getelementptr %7898[%7905] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7897, %7906 : !llvm.ptr + %7907 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7908 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7909 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7910 = llvm.mul %25, %7909 : !llvm.i64 + %7911 = llvm.add %7908, %7910 : !llvm.i64 + %7912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7913 = llvm.mul %5851, %7912 : !llvm.i64 + %7914 = llvm.add %7911, %7913 : !llvm.i64 + %7915 = llvm.getelementptr %7907[%7914] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7916 = llvm.load %7915 : !llvm.ptr + %7917 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7918 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7919 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7920 = llvm.mul %5851, %7919 : !llvm.i64 + %7921 = llvm.add %7918, %7920 : !llvm.i64 + %7922 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7923 = llvm.mul %5973, %7922 : !llvm.i64 + %7924 = llvm.add %7921, %7923 : !llvm.i64 + %7925 = llvm.getelementptr %7917[%7924] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7926 = llvm.load %7925 : !llvm.ptr + %7927 = llvm.fmul %7916, %7926 {RelaxedPrecision} : !llvm.float + %7928 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7930 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7931 = llvm.mul %25, %7930 : !llvm.i64 + %7932 = llvm.add %7929, %7931 : !llvm.i64 + %7933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7934 = llvm.mul %5973, %7933 : !llvm.i64 + %7935 = llvm.add %7932, %7934 : !llvm.i64 + %7936 = llvm.getelementptr %7928[%7935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7937 = llvm.load %7936 : !llvm.ptr + %7938 = llvm.fadd %7937, %7927 {RelaxedPrecision} : !llvm.float + %7939 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7940 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7941 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7942 = llvm.mul %25, %7941 : !llvm.i64 + %7943 = llvm.add %7940, %7942 : !llvm.i64 + %7944 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7945 = llvm.mul %5973, %7944 : !llvm.i64 + %7946 = llvm.add %7943, %7945 : !llvm.i64 + %7947 = llvm.getelementptr %7939[%7946] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7938, %7947 : !llvm.ptr + %7948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7951 = llvm.mul %25, %7950 : 
!llvm.i64 + %7952 = llvm.add %7949, %7951 : !llvm.i64 + %7953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7954 = llvm.mul %5973, %7953 : !llvm.i64 + %7955 = llvm.add %7952, %7954 : !llvm.i64 + %7956 = llvm.getelementptr %7948[%7955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7957 = llvm.load %7956 : !llvm.ptr + %7958 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7960 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7961 = llvm.mul %25, %7960 : !llvm.i64 + %7962 = llvm.add %7959, %7961 : !llvm.i64 + %7963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7964 = llvm.mul %5973, %7963 : !llvm.i64 + %7965 = llvm.add %7962, %7964 : !llvm.i64 + %7966 = llvm.getelementptr %7958[%7965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7957, %7966 : !llvm.ptr + %7967 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7968 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7969 = llvm.mlir.constant(128 : index) : !llvm.i64 + %7970 = llvm.mul %25, %7969 : !llvm.i64 + %7971 = llvm.add %7968, %7970 : !llvm.i64 + %7972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7973 = llvm.mul %5851, %7972 : !llvm.i64 + %7974 = llvm.add %7971, %7973 : !llvm.i64 + %7975 = llvm.getelementptr %7967[%7974] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7976 = llvm.load %7975 : !llvm.ptr + %7977 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7978 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7979 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7980 = llvm.mul %5851, %7979 : !llvm.i64 + %7981 = llvm.add %7978, %7980 : !llvm.i64 + %7982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7983 = llvm.mul %6034, %7982 : !llvm.i64 + %7984 = llvm.add %7981, %7983 : !llvm.i64 + %7985 = llvm.getelementptr %7977[%7984] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7986 = llvm.load %7985 : !llvm.ptr + %7987 = llvm.fmul %7976, %7986 {RelaxedPrecision} : !llvm.float + %7988 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7990 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7991 = llvm.mul %25, %7990 : !llvm.i64 + %7992 = llvm.add %7989, %7991 : !llvm.i64 + %7993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7994 = llvm.mul %6034, %7993 : !llvm.i64 + %7995 = llvm.add %7992, %7994 : !llvm.i64 + %7996 = llvm.getelementptr %7988[%7995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7997 = llvm.load %7996 : !llvm.ptr + %7998 = llvm.fadd %7997, %7987 {RelaxedPrecision} : !llvm.float + %7999 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8000 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8001 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8002 = llvm.mul %25, %8001 : !llvm.i64 + %8003 = llvm.add %8000, %8002 : !llvm.i64 + %8004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8005 = llvm.mul %6034, %8004 : !llvm.i64 + %8006 = llvm.add %8003, %8005 : !llvm.i64 + %8007 = llvm.getelementptr %7999[%8006] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %7998, %8007 : !llvm.ptr + %8008 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8010 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8011 = llvm.mul %25, %8010 : !llvm.i64 + %8012 = llvm.add %8009, %8011 : !llvm.i64 + %8013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8014 = llvm.mul 
%6034, %8013 : !llvm.i64 + %8015 = llvm.add %8012, %8014 : !llvm.i64 + %8016 = llvm.getelementptr %8008[%8015] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8017 = llvm.load %8016 : !llvm.ptr + %8018 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8019 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8020 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8021 = llvm.mul %25, %8020 : !llvm.i64 + %8022 = llvm.add %8019, %8021 : !llvm.i64 + %8023 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8024 = llvm.mul %6034, %8023 : !llvm.i64 + %8025 = llvm.add %8022, %8024 : !llvm.i64 + %8026 = llvm.getelementptr %8018[%8025] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8017, %8026 : !llvm.ptr + %8027 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8028 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8029 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8030 = llvm.mul %25, %8029 : !llvm.i64 + %8031 = llvm.add %8028, %8030 : !llvm.i64 + %8032 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8033 = llvm.mul %5851, %8032 : !llvm.i64 + %8034 = llvm.add %8031, %8033 : !llvm.i64 + %8035 = llvm.getelementptr %8027[%8034] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8036 = llvm.load %8035 : !llvm.ptr + %8037 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8038 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8039 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8040 = llvm.mul %5851, %8039 : !llvm.i64 + %8041 = llvm.add %8038, %8040 : !llvm.i64 + %8042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8043 = llvm.mul %6095, %8042 : !llvm.i64 + %8044 = llvm.add %8041, %8043 : !llvm.i64 + %8045 = llvm.getelementptr %8037[%8044] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8046 = llvm.load %8045 : !llvm.ptr + %8047 = llvm.fmul %8036, %8046 {RelaxedPrecision} : !llvm.float + %8048 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8050 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8051 = llvm.mul %25, %8050 : !llvm.i64 + %8052 = llvm.add %8049, %8051 : !llvm.i64 + %8053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8054 = llvm.mul %6095, %8053 : !llvm.i64 + %8055 = llvm.add %8052, %8054 : !llvm.i64 + %8056 = llvm.getelementptr %8048[%8055] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8057 = llvm.load %8056 : !llvm.ptr + %8058 = llvm.fadd %8057, %8047 {RelaxedPrecision} : !llvm.float + %8059 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8060 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8061 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8062 = llvm.mul %25, %8061 : !llvm.i64 + %8063 = llvm.add %8060, %8062 : !llvm.i64 + %8064 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8065 = llvm.mul %6095, %8064 : !llvm.i64 + %8066 = llvm.add %8063, %8065 : !llvm.i64 + %8067 = llvm.getelementptr %8059[%8066] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8058, %8067 : !llvm.ptr + %8068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8070 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8071 = llvm.mul %25, %8070 : !llvm.i64 + %8072 = llvm.add %8069, %8071 : !llvm.i64 + %8073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8074 = llvm.mul %6095, %8073 : !llvm.i64 + %8075 = llvm.add %8072, %8074 : !llvm.i64 + %8076 = llvm.getelementptr %8068[%8075] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %8077 = llvm.load %8076 : !llvm.ptr + %8078 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8079 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8080 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8081 = llvm.mul %25, %8080 : !llvm.i64 + %8082 = llvm.add %8079, %8081 : !llvm.i64 + %8083 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8084 = llvm.mul %6095, %8083 : !llvm.i64 + %8085 = llvm.add %8082, %8084 : !llvm.i64 + %8086 = llvm.getelementptr %8078[%8085] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8077, %8086 : !llvm.ptr + %8087 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8088 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8089 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8090 = llvm.mul %25, %8089 : !llvm.i64 + %8091 = llvm.add %8088, %8090 : !llvm.i64 + %8092 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8093 = llvm.mul %5851, %8092 : !llvm.i64 + %8094 = llvm.add %8091, %8093 : !llvm.i64 + %8095 = llvm.getelementptr %8087[%8094] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8096 = llvm.load %8095 : !llvm.ptr + %8097 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8098 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8099 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8100 = llvm.mul %5851, %8099 : !llvm.i64 + %8101 = llvm.add %8098, %8100 : !llvm.i64 + %8102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8103 = llvm.mul %6156, %8102 : !llvm.i64 + %8104 = llvm.add %8101, %8103 : !llvm.i64 + %8105 = llvm.getelementptr %8097[%8104] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8106 = llvm.load %8105 : !llvm.ptr + %8107 = llvm.fmul %8096, %8106 {RelaxedPrecision} : !llvm.float + %8108 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8109 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8110 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8111 = llvm.mul %25, %8110 : !llvm.i64 + %8112 = llvm.add %8109, %8111 : !llvm.i64 + %8113 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8114 = llvm.mul %6156, %8113 : !llvm.i64 + %8115 = llvm.add %8112, %8114 : !llvm.i64 + %8116 = llvm.getelementptr %8108[%8115] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8117 = llvm.load %8116 : !llvm.ptr + %8118 = llvm.fadd %8117, %8107 {RelaxedPrecision} : !llvm.float + %8119 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8120 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8121 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8122 = llvm.mul %25, %8121 : !llvm.i64 + %8123 = llvm.add %8120, %8122 : !llvm.i64 + %8124 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8125 = llvm.mul %6156, %8124 : !llvm.i64 + %8126 = llvm.add %8123, %8125 : !llvm.i64 + %8127 = llvm.getelementptr %8119[%8126] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8118, %8127 : !llvm.ptr + %8128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8131 = llvm.mul %25, %8130 : !llvm.i64 + %8132 = llvm.add %8129, %8131 : !llvm.i64 + %8133 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8134 = llvm.mul %6156, %8133 : !llvm.i64 + %8135 = llvm.add %8132, %8134 : !llvm.i64 + %8136 = llvm.getelementptr %8128[%8135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8137 = llvm.load %8136 : !llvm.ptr + %8138 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %8139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8140 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8141 = llvm.mul %25, %8140 : !llvm.i64 + %8142 = llvm.add %8139, %8141 : !llvm.i64 + %8143 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8144 = llvm.mul %6156, %8143 : !llvm.i64 + %8145 = llvm.add %8142, %8144 : !llvm.i64 + %8146 = llvm.getelementptr %8138[%8145] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8137, %8146 : !llvm.ptr + %8147 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8148 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8149 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8150 = llvm.mul %25, %8149 : !llvm.i64 + %8151 = llvm.add %8148, %8150 : !llvm.i64 + %8152 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8153 = llvm.mul %5851, %8152 : !llvm.i64 + %8154 = llvm.add %8151, %8153 : !llvm.i64 + %8155 = llvm.getelementptr %8147[%8154] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8156 = llvm.load %8155 : !llvm.ptr + %8157 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8159 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8160 = llvm.mul %5851, %8159 : !llvm.i64 + %8161 = llvm.add %8158, %8160 : !llvm.i64 + %8162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8163 = llvm.mul %6217, %8162 : !llvm.i64 + %8164 = llvm.add %8161, %8163 : !llvm.i64 + %8165 = llvm.getelementptr %8157[%8164] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8166 = llvm.load %8165 : !llvm.ptr + %8167 = llvm.fmul %8156, %8166 {RelaxedPrecision} : !llvm.float + %8168 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8169 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8170 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8171 = llvm.mul %25, %8170 : !llvm.i64 + %8172 = llvm.add %8169, %8171 : !llvm.i64 + %8173 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8174 = llvm.mul %6217, %8173 : !llvm.i64 + %8175 = llvm.add %8172, %8174 : !llvm.i64 + %8176 = llvm.getelementptr %8168[%8175] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8177 = llvm.load %8176 : !llvm.ptr + %8178 = llvm.fadd %8177, %8167 {RelaxedPrecision} : !llvm.float + %8179 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8182 = llvm.mul %25, %8181 : !llvm.i64 + %8183 = llvm.add %8180, %8182 : !llvm.i64 + %8184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8185 = llvm.mul %6217, %8184 : !llvm.i64 + %8186 = llvm.add %8183, %8185 : !llvm.i64 + %8187 = llvm.getelementptr %8179[%8186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8178, %8187 : !llvm.ptr + %8188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8191 = llvm.mul %25, %8190 : !llvm.i64 + %8192 = llvm.add %8189, %8191 : !llvm.i64 + %8193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8194 = llvm.mul %6217, %8193 : !llvm.i64 + %8195 = llvm.add %8192, %8194 : !llvm.i64 + %8196 = llvm.getelementptr %8188[%8195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8197 = llvm.load %8196 : !llvm.ptr + %8198 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8200 = llvm.mlir.constant(512 : index) 
: !llvm.i64 + %8201 = llvm.mul %25, %8200 : !llvm.i64 + %8202 = llvm.add %8199, %8201 : !llvm.i64 + %8203 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8204 = llvm.mul %6217, %8203 : !llvm.i64 + %8205 = llvm.add %8202, %8204 : !llvm.i64 + %8206 = llvm.getelementptr %8198[%8205] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8197, %8206 : !llvm.ptr + %8207 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8208 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8209 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8210 = llvm.mul %25, %8209 : !llvm.i64 + %8211 = llvm.add %8208, %8210 : !llvm.i64 + %8212 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8213 = llvm.mul %5851, %8212 : !llvm.i64 + %8214 = llvm.add %8211, %8213 : !llvm.i64 + %8215 = llvm.getelementptr %8207[%8214] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8216 = llvm.load %8215 : !llvm.ptr + %8217 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8218 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8219 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8220 = llvm.mul %5851, %8219 : !llvm.i64 + %8221 = llvm.add %8218, %8220 : !llvm.i64 + %8222 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8223 = llvm.mul %6278, %8222 : !llvm.i64 + %8224 = llvm.add %8221, %8223 : !llvm.i64 + %8225 = llvm.getelementptr %8217[%8224] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8226 = llvm.load %8225 : !llvm.ptr + %8227 = llvm.fmul %8216, %8226 {RelaxedPrecision} : !llvm.float + %8228 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8229 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8230 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8231 = llvm.mul %25, %8230 : !llvm.i64 + %8232 = llvm.add %8229, %8231 : !llvm.i64 + %8233 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8234 = llvm.mul %6278, %8233 : !llvm.i64 + %8235 = llvm.add %8232, %8234 : !llvm.i64 + %8236 = llvm.getelementptr %8228[%8235] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8237 = llvm.load %8236 : !llvm.ptr + %8238 = llvm.fadd %8237, %8227 {RelaxedPrecision} : !llvm.float + %8239 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8242 = llvm.mul %25, %8241 : !llvm.i64 + %8243 = llvm.add %8240, %8242 : !llvm.i64 + %8244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8245 = llvm.mul %6278, %8244 : !llvm.i64 + %8246 = llvm.add %8243, %8245 : !llvm.i64 + %8247 = llvm.getelementptr %8239[%8246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8238, %8247 : !llvm.ptr + %8248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8251 = llvm.mul %25, %8250 : !llvm.i64 + %8252 = llvm.add %8249, %8251 : !llvm.i64 + %8253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8254 = llvm.mul %6278, %8253 : !llvm.i64 + %8255 = llvm.add %8252, %8254 : !llvm.i64 + %8256 = llvm.getelementptr %8248[%8255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8257 = llvm.load %8256 : !llvm.ptr + %8258 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8259 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8260 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8261 = llvm.mul %25, %8260 : !llvm.i64 + %8262 = llvm.add %8259, %8261 : !llvm.i64 + %8263 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %8264 = llvm.mul %6278, %8263 : !llvm.i64 + %8265 = llvm.add %8262, %8264 : !llvm.i64 + %8266 = llvm.getelementptr %8258[%8265] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8257, %8266 : !llvm.ptr + %8267 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8269 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8270 = llvm.mul %25, %8269 : !llvm.i64 + %8271 = llvm.add %8268, %8270 : !llvm.i64 + %8272 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8273 = llvm.mul %5851, %8272 : !llvm.i64 + %8274 = llvm.add %8271, %8273 : !llvm.i64 + %8275 = llvm.getelementptr %8267[%8274] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8276 = llvm.load %8275 : !llvm.ptr + %8277 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8278 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8279 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8280 = llvm.mul %5851, %8279 : !llvm.i64 + %8281 = llvm.add %8278, %8280 : !llvm.i64 + %8282 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8283 = llvm.mul %6339, %8282 : !llvm.i64 + %8284 = llvm.add %8281, %8283 : !llvm.i64 + %8285 = llvm.getelementptr %8277[%8284] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8286 = llvm.load %8285 : !llvm.ptr + %8287 = llvm.fmul %8276, %8286 {RelaxedPrecision} : !llvm.float + %8288 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8290 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8291 = llvm.mul %25, %8290 : !llvm.i64 + %8292 = llvm.add %8289, %8291 : !llvm.i64 + %8293 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8294 = llvm.mul %6339, %8293 : !llvm.i64 + %8295 = llvm.add %8292, %8294 : !llvm.i64 + %8296 = llvm.getelementptr %8288[%8295] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8297 = llvm.load %8296 : !llvm.ptr + %8298 = llvm.fadd %8297, %8287 {RelaxedPrecision} : !llvm.float + %8299 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8300 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8301 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8302 = llvm.mul %25, %8301 : !llvm.i64 + %8303 = llvm.add %8300, %8302 : !llvm.i64 + %8304 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8305 = llvm.mul %6339, %8304 : !llvm.i64 + %8306 = llvm.add %8303, %8305 : !llvm.i64 + %8307 = llvm.getelementptr %8299[%8306] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8298, %8307 : !llvm.ptr + %8308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8311 = llvm.mul %25, %8310 : !llvm.i64 + %8312 = llvm.add %8309, %8311 : !llvm.i64 + %8313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8314 = llvm.mul %6339, %8313 : !llvm.i64 + %8315 = llvm.add %8312, %8314 : !llvm.i64 + %8316 = llvm.getelementptr %8308[%8315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8317 = llvm.load %8316 : !llvm.ptr + %8318 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8320 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8321 = llvm.mul %25, %8320 : !llvm.i64 + %8322 = llvm.add %8319, %8321 : !llvm.i64 + %8323 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8324 = llvm.mul %6339, %8323 : !llvm.i64 + %8325 = llvm.add %8322, %8324 : 
!llvm.i64 + %8326 = llvm.getelementptr %8318[%8325] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8317, %8326 : !llvm.ptr + %8327 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8328 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8329 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8330 = llvm.mul %25, %8329 : !llvm.i64 + %8331 = llvm.add %8328, %8330 : !llvm.i64 + %8332 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8333 = llvm.mul %5851, %8332 : !llvm.i64 + %8334 = llvm.add %8331, %8333 : !llvm.i64 + %8335 = llvm.getelementptr %8327[%8334] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8336 = llvm.load %8335 : !llvm.ptr + %8337 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8339 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8340 = llvm.mul %5851, %8339 : !llvm.i64 + %8341 = llvm.add %8338, %8340 : !llvm.i64 + %8342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8343 = llvm.mul %6400, %8342 : !llvm.i64 + %8344 = llvm.add %8341, %8343 : !llvm.i64 + %8345 = llvm.getelementptr %8337[%8344] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8346 = llvm.load %8345 : !llvm.ptr + %8347 = llvm.fmul %8336, %8346 {RelaxedPrecision} : !llvm.float + %8348 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8349 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8350 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8351 = llvm.mul %25, %8350 : !llvm.i64 + %8352 = llvm.add %8349, %8351 : !llvm.i64 + %8353 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8354 = llvm.mul %6400, %8353 : !llvm.i64 + %8355 = llvm.add %8352, %8354 : !llvm.i64 + %8356 = llvm.getelementptr %8348[%8355] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8357 = llvm.load %8356 : !llvm.ptr + %8358 = llvm.fadd %8357, %8347 {RelaxedPrecision} : !llvm.float + %8359 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8360 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8361 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8362 = llvm.mul %25, %8361 : !llvm.i64 + %8363 = llvm.add %8360, %8362 : !llvm.i64 + %8364 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8365 = llvm.mul %6400, %8364 : !llvm.i64 + %8366 = llvm.add %8363, %8365 : !llvm.i64 + %8367 = llvm.getelementptr %8359[%8366] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8358, %8367 : !llvm.ptr + %8368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8371 = llvm.mul %25, %8370 : !llvm.i64 + %8372 = llvm.add %8369, %8371 : !llvm.i64 + %8373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8374 = llvm.mul %6400, %8373 : !llvm.i64 + %8375 = llvm.add %8372, %8374 : !llvm.i64 + %8376 = llvm.getelementptr %8368[%8375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8377 = llvm.load %8376 : !llvm.ptr + %8378 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8379 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8380 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8381 = llvm.mul %25, %8380 : !llvm.i64 + %8382 = llvm.add %8379, %8381 : !llvm.i64 + %8383 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8384 = llvm.mul %6400, %8383 : !llvm.i64 + %8385 = llvm.add %8382, %8384 : !llvm.i64 + %8386 = llvm.getelementptr %8378[%8385] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8377, %8386 : !llvm.ptr 
+ %8387 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8389 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8390 = llvm.mul %25, %8389 : !llvm.i64 + %8391 = llvm.add %8388, %8390 : !llvm.i64 + %8392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8393 = llvm.mul %5851, %8392 : !llvm.i64 + %8394 = llvm.add %8391, %8393 : !llvm.i64 + %8395 = llvm.getelementptr %8387[%8394] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8396 = llvm.load %8395 : !llvm.ptr + %8397 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8398 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8399 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8400 = llvm.mul %5851, %8399 : !llvm.i64 + %8401 = llvm.add %8398, %8400 : !llvm.i64 + %8402 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8403 = llvm.mul %6461, %8402 : !llvm.i64 + %8404 = llvm.add %8401, %8403 : !llvm.i64 + %8405 = llvm.getelementptr %8397[%8404] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8406 = llvm.load %8405 : !llvm.ptr + %8407 = llvm.fmul %8396, %8406 {RelaxedPrecision} : !llvm.float + %8408 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8409 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8410 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8411 = llvm.mul %25, %8410 : !llvm.i64 + %8412 = llvm.add %8409, %8411 : !llvm.i64 + %8413 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8414 = llvm.mul %6461, %8413 : !llvm.i64 + %8415 = llvm.add %8412, %8414 : !llvm.i64 + %8416 = llvm.getelementptr %8408[%8415] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8417 = llvm.load %8416 : !llvm.ptr + %8418 = llvm.fadd %8417, %8407 {RelaxedPrecision} : !llvm.float + %8419 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8422 = llvm.mul %25, %8421 : !llvm.i64 + %8423 = llvm.add %8420, %8422 : !llvm.i64 + %8424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8425 = llvm.mul %6461, %8424 : !llvm.i64 + %8426 = llvm.add %8423, %8425 : !llvm.i64 + %8427 = llvm.getelementptr %8419[%8426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8418, %8427 : !llvm.ptr + %8428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8431 = llvm.mul %25, %8430 : !llvm.i64 + %8432 = llvm.add %8429, %8431 : !llvm.i64 + %8433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8434 = llvm.mul %6461, %8433 : !llvm.i64 + %8435 = llvm.add %8432, %8434 : !llvm.i64 + %8436 = llvm.getelementptr %8428[%8435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8437 = llvm.load %8436 : !llvm.ptr + %8438 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8439 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8440 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8441 = llvm.mul %25, %8440 : !llvm.i64 + %8442 = llvm.add %8439, %8441 : !llvm.i64 + %8443 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8444 = llvm.mul %6461, %8443 : !llvm.i64 + %8445 = llvm.add %8442, %8444 : !llvm.i64 + %8446 = llvm.getelementptr %8438[%8445] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8437, %8446 : !llvm.ptr + %8447 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8448 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %8449 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8450 = llvm.mul %25, %8449 : !llvm.i64 + %8451 = llvm.add %8448, %8450 : !llvm.i64 + %8452 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8453 = llvm.mul %5851, %8452 : !llvm.i64 + %8454 = llvm.add %8451, %8453 : !llvm.i64 + %8455 = llvm.getelementptr %8447[%8454] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8456 = llvm.load %8455 : !llvm.ptr + %8457 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8458 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8459 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8460 = llvm.mul %5851, %8459 : !llvm.i64 + %8461 = llvm.add %8458, %8460 : !llvm.i64 + %8462 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8463 = llvm.mul %6522, %8462 : !llvm.i64 + %8464 = llvm.add %8461, %8463 : !llvm.i64 + %8465 = llvm.getelementptr %8457[%8464] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8466 = llvm.load %8465 : !llvm.ptr + %8467 = llvm.fmul %8456, %8466 {RelaxedPrecision} : !llvm.float + %8468 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8469 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8470 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8471 = llvm.mul %25, %8470 : !llvm.i64 + %8472 = llvm.add %8469, %8471 : !llvm.i64 + %8473 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8474 = llvm.mul %6522, %8473 : !llvm.i64 + %8475 = llvm.add %8472, %8474 : !llvm.i64 + %8476 = llvm.getelementptr %8468[%8475] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8477 = llvm.load %8476 : !llvm.ptr + %8478 = llvm.fadd %8477, %8467 {RelaxedPrecision} : !llvm.float + %8479 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8480 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8481 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8482 = llvm.mul %25, %8481 : !llvm.i64 + %8483 = llvm.add %8480, %8482 : !llvm.i64 + %8484 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8485 = llvm.mul %6522, %8484 : !llvm.i64 + %8486 = llvm.add %8483, %8485 : !llvm.i64 + %8487 = llvm.getelementptr %8479[%8486] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8478, %8487 : !llvm.ptr + %8488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8491 = llvm.mul %25, %8490 : !llvm.i64 + %8492 = llvm.add %8489, %8491 : !llvm.i64 + %8493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8494 = llvm.mul %6522, %8493 : !llvm.i64 + %8495 = llvm.add %8492, %8494 : !llvm.i64 + %8496 = llvm.getelementptr %8488[%8495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8497 = llvm.load %8496 : !llvm.ptr + %8498 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8499 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8500 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8501 = llvm.mul %25, %8500 : !llvm.i64 + %8502 = llvm.add %8499, %8501 : !llvm.i64 + %8503 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8504 = llvm.mul %6522, %8503 : !llvm.i64 + %8505 = llvm.add %8502, %8504 : !llvm.i64 + %8506 = llvm.getelementptr %8498[%8505] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8497, %8506 : !llvm.ptr + %8507 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8508 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8509 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8510 = llvm.mul %25, %8509 : 
!llvm.i64 + %8511 = llvm.add %8508, %8510 : !llvm.i64 + %8512 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8513 = llvm.mul %5851, %8512 : !llvm.i64 + %8514 = llvm.add %8511, %8513 : !llvm.i64 + %8515 = llvm.getelementptr %8507[%8514] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8516 = llvm.load %8515 : !llvm.ptr + %8517 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8519 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8520 = llvm.mul %5851, %8519 : !llvm.i64 + %8521 = llvm.add %8518, %8520 : !llvm.i64 + %8522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8523 = llvm.mul %6583, %8522 : !llvm.i64 + %8524 = llvm.add %8521, %8523 : !llvm.i64 + %8525 = llvm.getelementptr %8517[%8524] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8526 = llvm.load %8525 : !llvm.ptr + %8527 = llvm.fmul %8516, %8526 {RelaxedPrecision} : !llvm.float + %8528 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8529 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8530 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8531 = llvm.mul %25, %8530 : !llvm.i64 + %8532 = llvm.add %8529, %8531 : !llvm.i64 + %8533 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8534 = llvm.mul %6583, %8533 : !llvm.i64 + %8535 = llvm.add %8532, %8534 : !llvm.i64 + %8536 = llvm.getelementptr %8528[%8535] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8537 = llvm.load %8536 : !llvm.ptr + %8538 = llvm.fadd %8537, %8527 {RelaxedPrecision} : !llvm.float + %8539 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8540 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8541 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8542 = llvm.mul %25, %8541 : !llvm.i64 + %8543 = llvm.add %8540, %8542 : !llvm.i64 + %8544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8545 = llvm.mul %6583, %8544 : !llvm.i64 + %8546 = llvm.add %8543, %8545 : !llvm.i64 + %8547 = llvm.getelementptr %8539[%8546] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8538, %8547 : !llvm.ptr + %8548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8551 = llvm.mul %25, %8550 : !llvm.i64 + %8552 = llvm.add %8549, %8551 : !llvm.i64 + %8553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8554 = llvm.mul %6583, %8553 : !llvm.i64 + %8555 = llvm.add %8552, %8554 : !llvm.i64 + %8556 = llvm.getelementptr %8548[%8555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8557 = llvm.load %8556 : !llvm.ptr + %8558 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8559 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8560 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8561 = llvm.mul %25, %8560 : !llvm.i64 + %8562 = llvm.add %8559, %8561 : !llvm.i64 + %8563 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8564 = llvm.mul %6583, %8563 : !llvm.i64 + %8565 = llvm.add %8562, %8564 : !llvm.i64 + %8566 = llvm.getelementptr %8558[%8565] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8557, %8566 : !llvm.ptr + %8567 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8568 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8569 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8570 = llvm.mul %25, %8569 : !llvm.i64 + %8571 = llvm.add %8568, %8570 : !llvm.i64 + %8572 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8573 = llvm.mul 
%5851, %8572 : !llvm.i64 + %8574 = llvm.add %8571, %8573 : !llvm.i64 + %8575 = llvm.getelementptr %8567[%8574] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8576 = llvm.load %8575 : !llvm.ptr + %8577 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8578 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8579 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8580 = llvm.mul %5851, %8579 : !llvm.i64 + %8581 = llvm.add %8578, %8580 : !llvm.i64 + %8582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8583 = llvm.mul %6644, %8582 : !llvm.i64 + %8584 = llvm.add %8581, %8583 : !llvm.i64 + %8585 = llvm.getelementptr %8577[%8584] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8586 = llvm.load %8585 : !llvm.ptr + %8587 = llvm.fmul %8576, %8586 {RelaxedPrecision} : !llvm.float + %8588 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8590 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8591 = llvm.mul %25, %8590 : !llvm.i64 + %8592 = llvm.add %8589, %8591 : !llvm.i64 + %8593 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8594 = llvm.mul %6644, %8593 : !llvm.i64 + %8595 = llvm.add %8592, %8594 : !llvm.i64 + %8596 = llvm.getelementptr %8588[%8595] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8597 = llvm.load %8596 : !llvm.ptr + %8598 = llvm.fadd %8597, %8587 {RelaxedPrecision} : !llvm.float + %8599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8602 = llvm.mul %25, %8601 : !llvm.i64 + %8603 = llvm.add %8600, %8602 : !llvm.i64 + %8604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8605 = llvm.mul %6644, %8604 : !llvm.i64 + %8606 = llvm.add %8603, %8605 : !llvm.i64 + %8607 = llvm.getelementptr %8599[%8606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8598, %8607 : !llvm.ptr + %8608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8611 = llvm.mul %25, %8610 : !llvm.i64 + %8612 = llvm.add %8609, %8611 : !llvm.i64 + %8613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8614 = llvm.mul %6644, %8613 : !llvm.i64 + %8615 = llvm.add %8612, %8614 : !llvm.i64 + %8616 = llvm.getelementptr %8608[%8615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8617 = llvm.load %8616 : !llvm.ptr + %8618 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8620 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8621 = llvm.mul %25, %8620 : !llvm.i64 + %8622 = llvm.add %8619, %8621 : !llvm.i64 + %8623 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8624 = llvm.mul %6644, %8623 : !llvm.i64 + %8625 = llvm.add %8622, %8624 : !llvm.i64 + %8626 = llvm.getelementptr %8618[%8625] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8617, %8626 : !llvm.ptr + %8627 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8628 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8629 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8630 = llvm.mul %25, %8629 : !llvm.i64 + %8631 = llvm.add %8628, %8630 : !llvm.i64 + %8632 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8633 = llvm.mul %5851, %8632 : !llvm.i64 + %8634 = llvm.add %8631, %8633 : !llvm.i64 + %8635 = llvm.getelementptr %8627[%8634] : (!llvm.ptr, 
!llvm.i64) -> !llvm.ptr + %8636 = llvm.load %8635 : !llvm.ptr + %8637 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8638 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8639 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8640 = llvm.mul %5851, %8639 : !llvm.i64 + %8641 = llvm.add %8638, %8640 : !llvm.i64 + %8642 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8643 = llvm.mul %6705, %8642 : !llvm.i64 + %8644 = llvm.add %8641, %8643 : !llvm.i64 + %8645 = llvm.getelementptr %8637[%8644] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8646 = llvm.load %8645 : !llvm.ptr + %8647 = llvm.fmul %8636, %8646 {RelaxedPrecision} : !llvm.float + %8648 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8649 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8650 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8651 = llvm.mul %25, %8650 : !llvm.i64 + %8652 = llvm.add %8649, %8651 : !llvm.i64 + %8653 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8654 = llvm.mul %6705, %8653 : !llvm.i64 + %8655 = llvm.add %8652, %8654 : !llvm.i64 + %8656 = llvm.getelementptr %8648[%8655] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8657 = llvm.load %8656 : !llvm.ptr + %8658 = llvm.fadd %8657, %8647 {RelaxedPrecision} : !llvm.float + %8659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8662 = llvm.mul %25, %8661 : !llvm.i64 + %8663 = llvm.add %8660, %8662 : !llvm.i64 + %8664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8665 = llvm.mul %6705, %8664 : !llvm.i64 + %8666 = llvm.add %8663, %8665 : !llvm.i64 + %8667 = llvm.getelementptr %8659[%8666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8658, %8667 : !llvm.ptr + %8668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8671 = llvm.mul %25, %8670 : !llvm.i64 + %8672 = llvm.add %8669, %8671 : !llvm.i64 + %8673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8674 = llvm.mul %6705, %8673 : !llvm.i64 + %8675 = llvm.add %8672, %8674 : !llvm.i64 + %8676 = llvm.getelementptr %8668[%8675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8677 = llvm.load %8676 : !llvm.ptr + %8678 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8680 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8681 = llvm.mul %25, %8680 : !llvm.i64 + %8682 = llvm.add %8679, %8681 : !llvm.i64 + %8683 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8684 = llvm.mul %6705, %8683 : !llvm.i64 + %8685 = llvm.add %8682, %8684 : !llvm.i64 + %8686 = llvm.getelementptr %8678[%8685] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8677, %8686 : !llvm.ptr + %8687 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8688 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8689 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8690 = llvm.mul %25, %8689 : !llvm.i64 + %8691 = llvm.add %8688, %8690 : !llvm.i64 + %8692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8693 = llvm.mul %5851, %8692 : !llvm.i64 + %8694 = llvm.add %8691, %8693 : !llvm.i64 + %8695 = llvm.getelementptr %8687[%8694] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8696 = llvm.load %8695 : !llvm.ptr + %8697 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %8698 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8699 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8700 = llvm.mul %5851, %8699 : !llvm.i64 + %8701 = llvm.add %8698, %8700 : !llvm.i64 + %8702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8703 = llvm.mul %6766, %8702 : !llvm.i64 + %8704 = llvm.add %8701, %8703 : !llvm.i64 + %8705 = llvm.getelementptr %8697[%8704] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8706 = llvm.load %8705 : !llvm.ptr + %8707 = llvm.fmul %8696, %8706 {RelaxedPrecision} : !llvm.float + %8708 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8710 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8711 = llvm.mul %25, %8710 : !llvm.i64 + %8712 = llvm.add %8709, %8711 : !llvm.i64 + %8713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8714 = llvm.mul %6766, %8713 : !llvm.i64 + %8715 = llvm.add %8712, %8714 : !llvm.i64 + %8716 = llvm.getelementptr %8708[%8715] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8717 = llvm.load %8716 : !llvm.ptr + %8718 = llvm.fadd %8717, %8707 {RelaxedPrecision} : !llvm.float + %8719 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8720 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8721 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8722 = llvm.mul %25, %8721 : !llvm.i64 + %8723 = llvm.add %8720, %8722 : !llvm.i64 + %8724 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8725 = llvm.mul %6766, %8724 : !llvm.i64 + %8726 = llvm.add %8723, %8725 : !llvm.i64 + %8727 = llvm.getelementptr %8719[%8726] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8718, %8727 : !llvm.ptr + %8728 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8729 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8730 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8731 = llvm.mul %25, %8730 : !llvm.i64 + %8732 = llvm.add %8729, %8731 : !llvm.i64 + %8733 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8734 = llvm.mul %6766, %8733 : !llvm.i64 + %8735 = llvm.add %8732, %8734 : !llvm.i64 + %8736 = llvm.getelementptr %8728[%8735] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8737 = llvm.load %8736 : !llvm.ptr + %8738 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8739 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8740 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8741 = llvm.mul %25, %8740 : !llvm.i64 + %8742 = llvm.add %8739, %8741 : !llvm.i64 + %8743 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8744 = llvm.mul %6766, %8743 : !llvm.i64 + %8745 = llvm.add %8742, %8744 : !llvm.i64 + %8746 = llvm.getelementptr %8738[%8745] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8737, %8746 : !llvm.ptr + %8747 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8748 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8749 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8750 = llvm.mul %26, %8749 : !llvm.i64 + %8751 = llvm.add %8748, %8750 : !llvm.i64 + %8752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8753 = llvm.mul %5851, %8752 : !llvm.i64 + %8754 = llvm.add %8751, %8753 : !llvm.i64 + %8755 = llvm.getelementptr %8747[%8754] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8756 = llvm.load %8755 : !llvm.ptr + %8757 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8758 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8759 = llvm.mlir.constant(512 : index) 
: !llvm.i64 + %8760 = llvm.mul %5851, %8759 : !llvm.i64 + %8761 = llvm.add %8758, %8760 : !llvm.i64 + %8762 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8763 = llvm.mul %5850, %8762 : !llvm.i64 + %8764 = llvm.add %8761, %8763 : !llvm.i64 + %8765 = llvm.getelementptr %8757[%8764] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8766 = llvm.load %8765 : !llvm.ptr + %8767 = llvm.fmul %8756, %8766 {RelaxedPrecision} : !llvm.float + %8768 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8770 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8771 = llvm.mul %26, %8770 : !llvm.i64 + %8772 = llvm.add %8769, %8771 : !llvm.i64 + %8773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8774 = llvm.mul %5850, %8773 : !llvm.i64 + %8775 = llvm.add %8772, %8774 : !llvm.i64 + %8776 = llvm.getelementptr %8768[%8775] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8777 = llvm.load %8776 : !llvm.ptr + %8778 = llvm.fadd %8777, %8767 {RelaxedPrecision} : !llvm.float + %8779 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8780 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8781 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8782 = llvm.mul %26, %8781 : !llvm.i64 + %8783 = llvm.add %8780, %8782 : !llvm.i64 + %8784 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8785 = llvm.mul %5850, %8784 : !llvm.i64 + %8786 = llvm.add %8783, %8785 : !llvm.i64 + %8787 = llvm.getelementptr %8779[%8786] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8778, %8787 : !llvm.ptr + %8788 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8790 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8791 = llvm.mul %26, %8790 : !llvm.i64 + %8792 = llvm.add %8789, %8791 : !llvm.i64 + %8793 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8794 = llvm.mul %5850, %8793 : !llvm.i64 + %8795 = llvm.add %8792, %8794 : !llvm.i64 + %8796 = llvm.getelementptr %8788[%8795] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8797 = llvm.load %8796 : !llvm.ptr + %8798 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8799 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8800 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8801 = llvm.mul %26, %8800 : !llvm.i64 + %8802 = llvm.add %8799, %8801 : !llvm.i64 + %8803 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8804 = llvm.mul %5850, %8803 : !llvm.i64 + %8805 = llvm.add %8802, %8804 : !llvm.i64 + %8806 = llvm.getelementptr %8798[%8805] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8797, %8806 : !llvm.ptr + %8807 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8808 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8809 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8810 = llvm.mul %26, %8809 : !llvm.i64 + %8811 = llvm.add %8808, %8810 : !llvm.i64 + %8812 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8813 = llvm.mul %5851, %8812 : !llvm.i64 + %8814 = llvm.add %8811, %8813 : !llvm.i64 + %8815 = llvm.getelementptr %8807[%8814] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8816 = llvm.load %8815 : !llvm.ptr + %8817 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8818 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8819 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8820 = llvm.mul %5851, %8819 : !llvm.i64 + %8821 = llvm.add %8818, %8820 : !llvm.i64 + %8822 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %8823 = llvm.mul %5912, %8822 : !llvm.i64 + %8824 = llvm.add %8821, %8823 : !llvm.i64 + %8825 = llvm.getelementptr %8817[%8824] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8826 = llvm.load %8825 : !llvm.ptr + %8827 = llvm.fmul %8816, %8826 {RelaxedPrecision} : !llvm.float + %8828 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8829 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8830 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8831 = llvm.mul %26, %8830 : !llvm.i64 + %8832 = llvm.add %8829, %8831 : !llvm.i64 + %8833 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8834 = llvm.mul %5912, %8833 : !llvm.i64 + %8835 = llvm.add %8832, %8834 : !llvm.i64 + %8836 = llvm.getelementptr %8828[%8835] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8837 = llvm.load %8836 : !llvm.ptr + %8838 = llvm.fadd %8837, %8827 {RelaxedPrecision} : !llvm.float + %8839 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8840 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8841 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8842 = llvm.mul %26, %8841 : !llvm.i64 + %8843 = llvm.add %8840, %8842 : !llvm.i64 + %8844 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8845 = llvm.mul %5912, %8844 : !llvm.i64 + %8846 = llvm.add %8843, %8845 : !llvm.i64 + %8847 = llvm.getelementptr %8839[%8846] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8838, %8847 : !llvm.ptr + %8848 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8850 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8851 = llvm.mul %26, %8850 : !llvm.i64 + %8852 = llvm.add %8849, %8851 : !llvm.i64 + %8853 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8854 = llvm.mul %5912, %8853 : !llvm.i64 + %8855 = llvm.add %8852, %8854 : !llvm.i64 + %8856 = llvm.getelementptr %8848[%8855] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8857 = llvm.load %8856 : !llvm.ptr + %8858 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8860 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8861 = llvm.mul %26, %8860 : !llvm.i64 + %8862 = llvm.add %8859, %8861 : !llvm.i64 + %8863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8864 = llvm.mul %5912, %8863 : !llvm.i64 + %8865 = llvm.add %8862, %8864 : !llvm.i64 + %8866 = llvm.getelementptr %8858[%8865] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8857, %8866 : !llvm.ptr + %8867 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8868 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8869 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8870 = llvm.mul %26, %8869 : !llvm.i64 + %8871 = llvm.add %8868, %8870 : !llvm.i64 + %8872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8873 = llvm.mul %5851, %8872 : !llvm.i64 + %8874 = llvm.add %8871, %8873 : !llvm.i64 + %8875 = llvm.getelementptr %8867[%8874] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8876 = llvm.load %8875 : !llvm.ptr + %8877 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8878 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8879 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8880 = llvm.mul %5851, %8879 : !llvm.i64 + %8881 = llvm.add %8878, %8880 : !llvm.i64 + %8882 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8883 = llvm.mul %5973, %8882 : !llvm.i64 + %8884 = llvm.add %8881, %8883 : 
!llvm.i64 + %8885 = llvm.getelementptr %8877[%8884] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8886 = llvm.load %8885 : !llvm.ptr + %8887 = llvm.fmul %8876, %8886 {RelaxedPrecision} : !llvm.float + %8888 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8889 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8890 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8891 = llvm.mul %26, %8890 : !llvm.i64 + %8892 = llvm.add %8889, %8891 : !llvm.i64 + %8893 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8894 = llvm.mul %5973, %8893 : !llvm.i64 + %8895 = llvm.add %8892, %8894 : !llvm.i64 + %8896 = llvm.getelementptr %8888[%8895] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8897 = llvm.load %8896 : !llvm.ptr + %8898 = llvm.fadd %8897, %8887 {RelaxedPrecision} : !llvm.float + %8899 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8901 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8902 = llvm.mul %26, %8901 : !llvm.i64 + %8903 = llvm.add %8900, %8902 : !llvm.i64 + %8904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8905 = llvm.mul %5973, %8904 : !llvm.i64 + %8906 = llvm.add %8903, %8905 : !llvm.i64 + %8907 = llvm.getelementptr %8899[%8906] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8898, %8907 : !llvm.ptr + %8908 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8910 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8911 = llvm.mul %26, %8910 : !llvm.i64 + %8912 = llvm.add %8909, %8911 : !llvm.i64 + %8913 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8914 = llvm.mul %5973, %8913 : !llvm.i64 + %8915 = llvm.add %8912, %8914 : !llvm.i64 + %8916 = llvm.getelementptr %8908[%8915] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8917 = llvm.load %8916 : !llvm.ptr + %8918 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8920 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8921 = llvm.mul %26, %8920 : !llvm.i64 + %8922 = llvm.add %8919, %8921 : !llvm.i64 + %8923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8924 = llvm.mul %5973, %8923 : !llvm.i64 + %8925 = llvm.add %8922, %8924 : !llvm.i64 + %8926 = llvm.getelementptr %8918[%8925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8917, %8926 : !llvm.ptr + %8927 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8928 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8929 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8930 = llvm.mul %26, %8929 : !llvm.i64 + %8931 = llvm.add %8928, %8930 : !llvm.i64 + %8932 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8933 = llvm.mul %5851, %8932 : !llvm.i64 + %8934 = llvm.add %8931, %8933 : !llvm.i64 + %8935 = llvm.getelementptr %8927[%8934] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8936 = llvm.load %8935 : !llvm.ptr + %8937 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8938 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8939 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8940 = llvm.mul %5851, %8939 : !llvm.i64 + %8941 = llvm.add %8938, %8940 : !llvm.i64 + %8942 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8943 = llvm.mul %6034, %8942 : !llvm.i64 + %8944 = llvm.add %8941, %8943 : !llvm.i64 + %8945 = llvm.getelementptr %8937[%8944] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8946 = llvm.load %8945 : !llvm.ptr 
+ %8947 = llvm.fmul %8936, %8946 {RelaxedPrecision} : !llvm.float + %8948 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8950 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8951 = llvm.mul %26, %8950 : !llvm.i64 + %8952 = llvm.add %8949, %8951 : !llvm.i64 + %8953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8954 = llvm.mul %6034, %8953 : !llvm.i64 + %8955 = llvm.add %8952, %8954 : !llvm.i64 + %8956 = llvm.getelementptr %8948[%8955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8957 = llvm.load %8956 : !llvm.ptr + %8958 = llvm.fadd %8957, %8947 {RelaxedPrecision} : !llvm.float + %8959 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8960 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8961 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8962 = llvm.mul %26, %8961 : !llvm.i64 + %8963 = llvm.add %8960, %8962 : !llvm.i64 + %8964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8965 = llvm.mul %6034, %8964 : !llvm.i64 + %8966 = llvm.add %8963, %8965 : !llvm.i64 + %8967 = llvm.getelementptr %8959[%8966] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8958, %8967 : !llvm.ptr + %8968 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8970 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8971 = llvm.mul %26, %8970 : !llvm.i64 + %8972 = llvm.add %8969, %8971 : !llvm.i64 + %8973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8974 = llvm.mul %6034, %8973 : !llvm.i64 + %8975 = llvm.add %8972, %8974 : !llvm.i64 + %8976 = llvm.getelementptr %8968[%8975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8977 = llvm.load %8976 : !llvm.ptr + %8978 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8980 = llvm.mlir.constant(512 : index) : !llvm.i64 + %8981 = llvm.mul %26, %8980 : !llvm.i64 + %8982 = llvm.add %8979, %8981 : !llvm.i64 + %8983 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8984 = llvm.mul %6034, %8983 : !llvm.i64 + %8985 = llvm.add %8982, %8984 : !llvm.i64 + %8986 = llvm.getelementptr %8978[%8985] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %8977, %8986 : !llvm.ptr + %8987 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8988 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8989 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8990 = llvm.mul %26, %8989 : !llvm.i64 + %8991 = llvm.add %8988, %8990 : !llvm.i64 + %8992 = llvm.mlir.constant(1 : index) : !llvm.i64 + %8993 = llvm.mul %5851, %8992 : !llvm.i64 + %8994 = llvm.add %8991, %8993 : !llvm.i64 + %8995 = llvm.getelementptr %8987[%8994] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %8996 = llvm.load %8995 : !llvm.ptr + %8997 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8998 = llvm.mlir.constant(0 : index) : !llvm.i64 + %8999 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9000 = llvm.mul %5851, %8999 : !llvm.i64 + %9001 = llvm.add %8998, %9000 : !llvm.i64 + %9002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9003 = llvm.mul %6095, %9002 : !llvm.i64 + %9004 = llvm.add %9001, %9003 : !llvm.i64 + %9005 = llvm.getelementptr %8997[%9004] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9006 = llvm.load %9005 : !llvm.ptr + %9007 = llvm.fmul %8996, %9006 {RelaxedPrecision} : !llvm.float + %9008 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %9009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9010 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9011 = llvm.mul %26, %9010 : !llvm.i64 + %9012 = llvm.add %9009, %9011 : !llvm.i64 + %9013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9014 = llvm.mul %6095, %9013 : !llvm.i64 + %9015 = llvm.add %9012, %9014 : !llvm.i64 + %9016 = llvm.getelementptr %9008[%9015] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9017 = llvm.load %9016 : !llvm.ptr + %9018 = llvm.fadd %9017, %9007 {RelaxedPrecision} : !llvm.float + %9019 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9020 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9021 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9022 = llvm.mul %26, %9021 : !llvm.i64 + %9023 = llvm.add %9020, %9022 : !llvm.i64 + %9024 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9025 = llvm.mul %6095, %9024 : !llvm.i64 + %9026 = llvm.add %9023, %9025 : !llvm.i64 + %9027 = llvm.getelementptr %9019[%9026] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9018, %9027 : !llvm.ptr + %9028 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9030 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9031 = llvm.mul %26, %9030 : !llvm.i64 + %9032 = llvm.add %9029, %9031 : !llvm.i64 + %9033 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9034 = llvm.mul %6095, %9033 : !llvm.i64 + %9035 = llvm.add %9032, %9034 : !llvm.i64 + %9036 = llvm.getelementptr %9028[%9035] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9037 = llvm.load %9036 : !llvm.ptr + %9038 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9039 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9040 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9041 = llvm.mul %26, %9040 : !llvm.i64 + %9042 = llvm.add %9039, %9041 : !llvm.i64 + %9043 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9044 = llvm.mul %6095, %9043 : !llvm.i64 + %9045 = llvm.add %9042, %9044 : !llvm.i64 + %9046 = llvm.getelementptr %9038[%9045] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9037, %9046 : !llvm.ptr + %9047 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9048 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9049 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9050 = llvm.mul %26, %9049 : !llvm.i64 + %9051 = llvm.add %9048, %9050 : !llvm.i64 + %9052 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9053 = llvm.mul %5851, %9052 : !llvm.i64 + %9054 = llvm.add %9051, %9053 : !llvm.i64 + %9055 = llvm.getelementptr %9047[%9054] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9056 = llvm.load %9055 : !llvm.ptr + %9057 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9058 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9059 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9060 = llvm.mul %5851, %9059 : !llvm.i64 + %9061 = llvm.add %9058, %9060 : !llvm.i64 + %9062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9063 = llvm.mul %6156, %9062 : !llvm.i64 + %9064 = llvm.add %9061, %9063 : !llvm.i64 + %9065 = llvm.getelementptr %9057[%9064] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9066 = llvm.load %9065 : !llvm.ptr + %9067 = llvm.fmul %9056, %9066 {RelaxedPrecision} : !llvm.float + %9068 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9070 = llvm.mlir.constant(512 : 
index) : !llvm.i64 + %9071 = llvm.mul %26, %9070 : !llvm.i64 + %9072 = llvm.add %9069, %9071 : !llvm.i64 + %9073 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9074 = llvm.mul %6156, %9073 : !llvm.i64 + %9075 = llvm.add %9072, %9074 : !llvm.i64 + %9076 = llvm.getelementptr %9068[%9075] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9077 = llvm.load %9076 : !llvm.ptr + %9078 = llvm.fadd %9077, %9067 {RelaxedPrecision} : !llvm.float + %9079 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9080 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9081 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9082 = llvm.mul %26, %9081 : !llvm.i64 + %9083 = llvm.add %9080, %9082 : !llvm.i64 + %9084 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9085 = llvm.mul %6156, %9084 : !llvm.i64 + %9086 = llvm.add %9083, %9085 : !llvm.i64 + %9087 = llvm.getelementptr %9079[%9086] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9078, %9087 : !llvm.ptr + %9088 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9089 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9090 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9091 = llvm.mul %26, %9090 : !llvm.i64 + %9092 = llvm.add %9089, %9091 : !llvm.i64 + %9093 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9094 = llvm.mul %6156, %9093 : !llvm.i64 + %9095 = llvm.add %9092, %9094 : !llvm.i64 + %9096 = llvm.getelementptr %9088[%9095] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9097 = llvm.load %9096 : !llvm.ptr + %9098 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9100 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9101 = llvm.mul %26, %9100 : !llvm.i64 + %9102 = llvm.add %9099, %9101 : !llvm.i64 + %9103 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9104 = llvm.mul %6156, %9103 : !llvm.i64 + %9105 = llvm.add %9102, %9104 : !llvm.i64 + %9106 = llvm.getelementptr %9098[%9105] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9097, %9106 : !llvm.ptr + %9107 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9109 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9110 = llvm.mul %26, %9109 : !llvm.i64 + %9111 = llvm.add %9108, %9110 : !llvm.i64 + %9112 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9113 = llvm.mul %5851, %9112 : !llvm.i64 + %9114 = llvm.add %9111, %9113 : !llvm.i64 + %9115 = llvm.getelementptr %9107[%9114] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9116 = llvm.load %9115 : !llvm.ptr + %9117 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9118 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9119 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9120 = llvm.mul %5851, %9119 : !llvm.i64 + %9121 = llvm.add %9118, %9120 : !llvm.i64 + %9122 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9123 = llvm.mul %6217, %9122 : !llvm.i64 + %9124 = llvm.add %9121, %9123 : !llvm.i64 + %9125 = llvm.getelementptr %9117[%9124] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9126 = llvm.load %9125 : !llvm.ptr + %9127 = llvm.fmul %9116, %9126 {RelaxedPrecision} : !llvm.float + %9128 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9129 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9130 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9131 = llvm.mul %26, %9130 : !llvm.i64 + %9132 = llvm.add %9129, %9131 : !llvm.i64 + %9133 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %9134 = llvm.mul %6217, %9133 : !llvm.i64 + %9135 = llvm.add %9132, %9134 : !llvm.i64 + %9136 = llvm.getelementptr %9128[%9135] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9137 = llvm.load %9136 : !llvm.ptr + %9138 = llvm.fadd %9137, %9127 {RelaxedPrecision} : !llvm.float + %9139 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9140 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9142 = llvm.mul %26, %9141 : !llvm.i64 + %9143 = llvm.add %9140, %9142 : !llvm.i64 + %9144 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9145 = llvm.mul %6217, %9144 : !llvm.i64 + %9146 = llvm.add %9143, %9145 : !llvm.i64 + %9147 = llvm.getelementptr %9139[%9146] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9138, %9147 : !llvm.ptr + %9148 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9149 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9150 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9151 = llvm.mul %26, %9150 : !llvm.i64 + %9152 = llvm.add %9149, %9151 : !llvm.i64 + %9153 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9154 = llvm.mul %6217, %9153 : !llvm.i64 + %9155 = llvm.add %9152, %9154 : !llvm.i64 + %9156 = llvm.getelementptr %9148[%9155] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9157 = llvm.load %9156 : !llvm.ptr + %9158 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9161 = llvm.mul %26, %9160 : !llvm.i64 + %9162 = llvm.add %9159, %9161 : !llvm.i64 + %9163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9164 = llvm.mul %6217, %9163 : !llvm.i64 + %9165 = llvm.add %9162, %9164 : !llvm.i64 + %9166 = llvm.getelementptr %9158[%9165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9157, %9166 : !llvm.ptr + %9167 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9169 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9170 = llvm.mul %26, %9169 : !llvm.i64 + %9171 = llvm.add %9168, %9170 : !llvm.i64 + %9172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9173 = llvm.mul %5851, %9172 : !llvm.i64 + %9174 = llvm.add %9171, %9173 : !llvm.i64 + %9175 = llvm.getelementptr %9167[%9174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9176 = llvm.load %9175 : !llvm.ptr + %9177 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9179 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9180 = llvm.mul %5851, %9179 : !llvm.i64 + %9181 = llvm.add %9178, %9180 : !llvm.i64 + %9182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9183 = llvm.mul %6278, %9182 : !llvm.i64 + %9184 = llvm.add %9181, %9183 : !llvm.i64 + %9185 = llvm.getelementptr %9177[%9184] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9186 = llvm.load %9185 : !llvm.ptr + %9187 = llvm.fmul %9176, %9186 {RelaxedPrecision} : !llvm.float + %9188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9191 = llvm.mul %26, %9190 : !llvm.i64 + %9192 = llvm.add %9189, %9191 : !llvm.i64 + %9193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9194 = llvm.mul %6278, %9193 : !llvm.i64 + %9195 = llvm.add %9192, %9194 : 
!llvm.i64 + %9196 = llvm.getelementptr %9188[%9195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9197 = llvm.load %9196 : !llvm.ptr + %9198 = llvm.fadd %9197, %9187 {RelaxedPrecision} : !llvm.float + %9199 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9200 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9202 = llvm.mul %26, %9201 : !llvm.i64 + %9203 = llvm.add %9200, %9202 : !llvm.i64 + %9204 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9205 = llvm.mul %6278, %9204 : !llvm.i64 + %9206 = llvm.add %9203, %9205 : !llvm.i64 + %9207 = llvm.getelementptr %9199[%9206] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9198, %9207 : !llvm.ptr + %9208 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9209 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9210 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9211 = llvm.mul %26, %9210 : !llvm.i64 + %9212 = llvm.add %9209, %9211 : !llvm.i64 + %9213 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9214 = llvm.mul %6278, %9213 : !llvm.i64 + %9215 = llvm.add %9212, %9214 : !llvm.i64 + %9216 = llvm.getelementptr %9208[%9215] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9217 = llvm.load %9216 : !llvm.ptr + %9218 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9219 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9220 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9221 = llvm.mul %26, %9220 : !llvm.i64 + %9222 = llvm.add %9219, %9221 : !llvm.i64 + %9223 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9224 = llvm.mul %6278, %9223 : !llvm.i64 + %9225 = llvm.add %9222, %9224 : !llvm.i64 + %9226 = llvm.getelementptr %9218[%9225] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9217, %9226 : !llvm.ptr + %9227 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9228 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9229 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9230 = llvm.mul %26, %9229 : !llvm.i64 + %9231 = llvm.add %9228, %9230 : !llvm.i64 + %9232 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9233 = llvm.mul %5851, %9232 : !llvm.i64 + %9234 = llvm.add %9231, %9233 : !llvm.i64 + %9235 = llvm.getelementptr %9227[%9234] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9236 = llvm.load %9235 : !llvm.ptr + %9237 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9238 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9239 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9240 = llvm.mul %5851, %9239 : !llvm.i64 + %9241 = llvm.add %9238, %9240 : !llvm.i64 + %9242 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9243 = llvm.mul %6339, %9242 : !llvm.i64 + %9244 = llvm.add %9241, %9243 : !llvm.i64 + %9245 = llvm.getelementptr %9237[%9244] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9246 = llvm.load %9245 : !llvm.ptr + %9247 = llvm.fmul %9236, %9246 {RelaxedPrecision} : !llvm.float + %9248 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9249 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9250 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9251 = llvm.mul %26, %9250 : !llvm.i64 + %9252 = llvm.add %9249, %9251 : !llvm.i64 + %9253 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9254 = llvm.mul %6339, %9253 : !llvm.i64 + %9255 = llvm.add %9252, %9254 : !llvm.i64 + %9256 = llvm.getelementptr %9248[%9255] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9257 = llvm.load %9256 : !llvm.ptr 
+ %9258 = llvm.fadd %9257, %9247 {RelaxedPrecision} : !llvm.float + %9259 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9260 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9261 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9262 = llvm.mul %26, %9261 : !llvm.i64 + %9263 = llvm.add %9260, %9262 : !llvm.i64 + %9264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9265 = llvm.mul %6339, %9264 : !llvm.i64 + %9266 = llvm.add %9263, %9265 : !llvm.i64 + %9267 = llvm.getelementptr %9259[%9266] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9258, %9267 : !llvm.ptr + %9268 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9269 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9270 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9271 = llvm.mul %26, %9270 : !llvm.i64 + %9272 = llvm.add %9269, %9271 : !llvm.i64 + %9273 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9274 = llvm.mul %6339, %9273 : !llvm.i64 + %9275 = llvm.add %9272, %9274 : !llvm.i64 + %9276 = llvm.getelementptr %9268[%9275] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9277 = llvm.load %9276 : !llvm.ptr + %9278 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9279 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9280 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9281 = llvm.mul %26, %9280 : !llvm.i64 + %9282 = llvm.add %9279, %9281 : !llvm.i64 + %9283 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9284 = llvm.mul %6339, %9283 : !llvm.i64 + %9285 = llvm.add %9282, %9284 : !llvm.i64 + %9286 = llvm.getelementptr %9278[%9285] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9277, %9286 : !llvm.ptr + %9287 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9288 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9289 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9290 = llvm.mul %26, %9289 : !llvm.i64 + %9291 = llvm.add %9288, %9290 : !llvm.i64 + %9292 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9293 = llvm.mul %5851, %9292 : !llvm.i64 + %9294 = llvm.add %9291, %9293 : !llvm.i64 + %9295 = llvm.getelementptr %9287[%9294] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9296 = llvm.load %9295 : !llvm.ptr + %9297 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9298 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9299 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9300 = llvm.mul %5851, %9299 : !llvm.i64 + %9301 = llvm.add %9298, %9300 : !llvm.i64 + %9302 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9303 = llvm.mul %6400, %9302 : !llvm.i64 + %9304 = llvm.add %9301, %9303 : !llvm.i64 + %9305 = llvm.getelementptr %9297[%9304] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9306 = llvm.load %9305 : !llvm.ptr + %9307 = llvm.fmul %9296, %9306 {RelaxedPrecision} : !llvm.float + %9308 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9309 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9310 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9311 = llvm.mul %26, %9310 : !llvm.i64 + %9312 = llvm.add %9309, %9311 : !llvm.i64 + %9313 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9314 = llvm.mul %6400, %9313 : !llvm.i64 + %9315 = llvm.add %9312, %9314 : !llvm.i64 + %9316 = llvm.getelementptr %9308[%9315] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9317 = llvm.load %9316 : !llvm.ptr + %9318 = llvm.fadd %9317, %9307 {RelaxedPrecision} : !llvm.float + %9319 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, 
i64, array<2 x i64>, array<2 x i64>)> + %9320 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9321 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9322 = llvm.mul %26, %9321 : !llvm.i64 + %9323 = llvm.add %9320, %9322 : !llvm.i64 + %9324 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9325 = llvm.mul %6400, %9324 : !llvm.i64 + %9326 = llvm.add %9323, %9325 : !llvm.i64 + %9327 = llvm.getelementptr %9319[%9326] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9318, %9327 : !llvm.ptr + %9328 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9330 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9331 = llvm.mul %26, %9330 : !llvm.i64 + %9332 = llvm.add %9329, %9331 : !llvm.i64 + %9333 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9334 = llvm.mul %6400, %9333 : !llvm.i64 + %9335 = llvm.add %9332, %9334 : !llvm.i64 + %9336 = llvm.getelementptr %9328[%9335] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9337 = llvm.load %9336 : !llvm.ptr + %9338 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9339 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9340 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9341 = llvm.mul %26, %9340 : !llvm.i64 + %9342 = llvm.add %9339, %9341 : !llvm.i64 + %9343 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9344 = llvm.mul %6400, %9343 : !llvm.i64 + %9345 = llvm.add %9342, %9344 : !llvm.i64 + %9346 = llvm.getelementptr %9338[%9345] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9337, %9346 : !llvm.ptr + %9347 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9349 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9350 = llvm.mul %26, %9349 : !llvm.i64 + %9351 = llvm.add %9348, %9350 : !llvm.i64 + %9352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9353 = llvm.mul %5851, %9352 : !llvm.i64 + %9354 = llvm.add %9351, %9353 : !llvm.i64 + %9355 = llvm.getelementptr %9347[%9354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9356 = llvm.load %9355 : !llvm.ptr + %9357 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9358 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9359 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9360 = llvm.mul %5851, %9359 : !llvm.i64 + %9361 = llvm.add %9358, %9360 : !llvm.i64 + %9362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9363 = llvm.mul %6461, %9362 : !llvm.i64 + %9364 = llvm.add %9361, %9363 : !llvm.i64 + %9365 = llvm.getelementptr %9357[%9364] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9366 = llvm.load %9365 : !llvm.ptr + %9367 = llvm.fmul %9356, %9366 {RelaxedPrecision} : !llvm.float + %9368 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9371 = llvm.mul %26, %9370 : !llvm.i64 + %9372 = llvm.add %9369, %9371 : !llvm.i64 + %9373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9374 = llvm.mul %6461, %9373 : !llvm.i64 + %9375 = llvm.add %9372, %9374 : !llvm.i64 + %9376 = llvm.getelementptr %9368[%9375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9377 = llvm.load %9376 : !llvm.ptr + %9378 = llvm.fadd %9377, %9367 {RelaxedPrecision} : !llvm.float + %9379 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9381 = llvm.mlir.constant(512 : 
index) : !llvm.i64 + %9382 = llvm.mul %26, %9381 : !llvm.i64 + %9383 = llvm.add %9380, %9382 : !llvm.i64 + %9384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9385 = llvm.mul %6461, %9384 : !llvm.i64 + %9386 = llvm.add %9383, %9385 : !llvm.i64 + %9387 = llvm.getelementptr %9379[%9386] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9378, %9387 : !llvm.ptr + %9388 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9389 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9390 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9391 = llvm.mul %26, %9390 : !llvm.i64 + %9392 = llvm.add %9389, %9391 : !llvm.i64 + %9393 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9394 = llvm.mul %6461, %9393 : !llvm.i64 + %9395 = llvm.add %9392, %9394 : !llvm.i64 + %9396 = llvm.getelementptr %9388[%9395] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9397 = llvm.load %9396 : !llvm.ptr + %9398 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9399 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9401 = llvm.mul %26, %9400 : !llvm.i64 + %9402 = llvm.add %9399, %9401 : !llvm.i64 + %9403 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9404 = llvm.mul %6461, %9403 : !llvm.i64 + %9405 = llvm.add %9402, %9404 : !llvm.i64 + %9406 = llvm.getelementptr %9398[%9405] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9397, %9406 : !llvm.ptr + %9407 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9408 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9409 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9410 = llvm.mul %26, %9409 : !llvm.i64 + %9411 = llvm.add %9408, %9410 : !llvm.i64 + %9412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9413 = llvm.mul %5851, %9412 : !llvm.i64 + %9414 = llvm.add %9411, %9413 : !llvm.i64 + %9415 = llvm.getelementptr %9407[%9414] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9416 = llvm.load %9415 : !llvm.ptr + %9417 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9418 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9419 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9420 = llvm.mul %5851, %9419 : !llvm.i64 + %9421 = llvm.add %9418, %9420 : !llvm.i64 + %9422 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9423 = llvm.mul %6522, %9422 : !llvm.i64 + %9424 = llvm.add %9421, %9423 : !llvm.i64 + %9425 = llvm.getelementptr %9417[%9424] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9426 = llvm.load %9425 : !llvm.ptr + %9427 = llvm.fmul %9416, %9426 {RelaxedPrecision} : !llvm.float + %9428 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9429 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9430 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9431 = llvm.mul %26, %9430 : !llvm.i64 + %9432 = llvm.add %9429, %9431 : !llvm.i64 + %9433 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9434 = llvm.mul %6522, %9433 : !llvm.i64 + %9435 = llvm.add %9432, %9434 : !llvm.i64 + %9436 = llvm.getelementptr %9428[%9435] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9437 = llvm.load %9436 : !llvm.ptr + %9438 = llvm.fadd %9437, %9427 {RelaxedPrecision} : !llvm.float + %9439 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9440 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9441 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9442 = llvm.mul %26, %9441 : !llvm.i64 + %9443 = llvm.add %9440, %9442 : !llvm.i64 + %9444 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %9445 = llvm.mul %6522, %9444 : !llvm.i64 + %9446 = llvm.add %9443, %9445 : !llvm.i64 + %9447 = llvm.getelementptr %9439[%9446] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9438, %9447 : !llvm.ptr + %9448 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9450 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9451 = llvm.mul %26, %9450 : !llvm.i64 + %9452 = llvm.add %9449, %9451 : !llvm.i64 + %9453 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9454 = llvm.mul %6522, %9453 : !llvm.i64 + %9455 = llvm.add %9452, %9454 : !llvm.i64 + %9456 = llvm.getelementptr %9448[%9455] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9457 = llvm.load %9456 : !llvm.ptr + %9458 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9459 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9460 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9461 = llvm.mul %26, %9460 : !llvm.i64 + %9462 = llvm.add %9459, %9461 : !llvm.i64 + %9463 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9464 = llvm.mul %6522, %9463 : !llvm.i64 + %9465 = llvm.add %9462, %9464 : !llvm.i64 + %9466 = llvm.getelementptr %9458[%9465] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9457, %9466 : !llvm.ptr + %9467 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9468 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9469 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9470 = llvm.mul %26, %9469 : !llvm.i64 + %9471 = llvm.add %9468, %9470 : !llvm.i64 + %9472 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9473 = llvm.mul %5851, %9472 : !llvm.i64 + %9474 = llvm.add %9471, %9473 : !llvm.i64 + %9475 = llvm.getelementptr %9467[%9474] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9476 = llvm.load %9475 : !llvm.ptr + %9477 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9478 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9479 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9480 = llvm.mul %5851, %9479 : !llvm.i64 + %9481 = llvm.add %9478, %9480 : !llvm.i64 + %9482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9483 = llvm.mul %6583, %9482 : !llvm.i64 + %9484 = llvm.add %9481, %9483 : !llvm.i64 + %9485 = llvm.getelementptr %9477[%9484] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9486 = llvm.load %9485 : !llvm.ptr + %9487 = llvm.fmul %9476, %9486 {RelaxedPrecision} : !llvm.float + %9488 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9490 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9491 = llvm.mul %26, %9490 : !llvm.i64 + %9492 = llvm.add %9489, %9491 : !llvm.i64 + %9493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9494 = llvm.mul %6583, %9493 : !llvm.i64 + %9495 = llvm.add %9492, %9494 : !llvm.i64 + %9496 = llvm.getelementptr %9488[%9495] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9497 = llvm.load %9496 : !llvm.ptr + %9498 = llvm.fadd %9497, %9487 {RelaxedPrecision} : !llvm.float + %9499 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9500 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9501 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9502 = llvm.mul %26, %9501 : !llvm.i64 + %9503 = llvm.add %9500, %9502 : !llvm.i64 + %9504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9505 = llvm.mul %6583, %9504 : !llvm.i64 + %9506 = llvm.add %9503, %9505 : 
!llvm.i64 + %9507 = llvm.getelementptr %9499[%9506] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9498, %9507 : !llvm.ptr + %9508 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9509 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9510 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9511 = llvm.mul %26, %9510 : !llvm.i64 + %9512 = llvm.add %9509, %9511 : !llvm.i64 + %9513 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9514 = llvm.mul %6583, %9513 : !llvm.i64 + %9515 = llvm.add %9512, %9514 : !llvm.i64 + %9516 = llvm.getelementptr %9508[%9515] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9517 = llvm.load %9516 : !llvm.ptr + %9518 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9519 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9520 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9521 = llvm.mul %26, %9520 : !llvm.i64 + %9522 = llvm.add %9519, %9521 : !llvm.i64 + %9523 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9524 = llvm.mul %6583, %9523 : !llvm.i64 + %9525 = llvm.add %9522, %9524 : !llvm.i64 + %9526 = llvm.getelementptr %9518[%9525] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9517, %9526 : !llvm.ptr + %9527 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9528 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9529 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9530 = llvm.mul %26, %9529 : !llvm.i64 + %9531 = llvm.add %9528, %9530 : !llvm.i64 + %9532 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9533 = llvm.mul %5851, %9532 : !llvm.i64 + %9534 = llvm.add %9531, %9533 : !llvm.i64 + %9535 = llvm.getelementptr %9527[%9534] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9536 = llvm.load %9535 : !llvm.ptr + %9537 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9538 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9539 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9540 = llvm.mul %5851, %9539 : !llvm.i64 + %9541 = llvm.add %9538, %9540 : !llvm.i64 + %9542 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9543 = llvm.mul %6644, %9542 : !llvm.i64 + %9544 = llvm.add %9541, %9543 : !llvm.i64 + %9545 = llvm.getelementptr %9537[%9544] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9546 = llvm.load %9545 : !llvm.ptr + %9547 = llvm.fmul %9536, %9546 {RelaxedPrecision} : !llvm.float + %9548 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9550 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9551 = llvm.mul %26, %9550 : !llvm.i64 + %9552 = llvm.add %9549, %9551 : !llvm.i64 + %9553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9554 = llvm.mul %6644, %9553 : !llvm.i64 + %9555 = llvm.add %9552, %9554 : !llvm.i64 + %9556 = llvm.getelementptr %9548[%9555] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9557 = llvm.load %9556 : !llvm.ptr + %9558 = llvm.fadd %9557, %9547 {RelaxedPrecision} : !llvm.float + %9559 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9560 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9561 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9562 = llvm.mul %26, %9561 : !llvm.i64 + %9563 = llvm.add %9560, %9562 : !llvm.i64 + %9564 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9565 = llvm.mul %6644, %9564 : !llvm.i64 + %9566 = llvm.add %9563, %9565 : !llvm.i64 + %9567 = llvm.getelementptr %9559[%9566] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9558, %9567 : !llvm.ptr 
+ %9568 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9569 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9570 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9571 = llvm.mul %26, %9570 : !llvm.i64 + %9572 = llvm.add %9569, %9571 : !llvm.i64 + %9573 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9574 = llvm.mul %6644, %9573 : !llvm.i64 + %9575 = llvm.add %9572, %9574 : !llvm.i64 + %9576 = llvm.getelementptr %9568[%9575] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9577 = llvm.load %9576 : !llvm.ptr + %9578 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9580 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9581 = llvm.mul %26, %9580 : !llvm.i64 + %9582 = llvm.add %9579, %9581 : !llvm.i64 + %9583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9584 = llvm.mul %6644, %9583 : !llvm.i64 + %9585 = llvm.add %9582, %9584 : !llvm.i64 + %9586 = llvm.getelementptr %9578[%9585] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9577, %9586 : !llvm.ptr + %9587 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9589 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9590 = llvm.mul %26, %9589 : !llvm.i64 + %9591 = llvm.add %9588, %9590 : !llvm.i64 + %9592 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9593 = llvm.mul %5851, %9592 : !llvm.i64 + %9594 = llvm.add %9591, %9593 : !llvm.i64 + %9595 = llvm.getelementptr %9587[%9594] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9596 = llvm.load %9595 : !llvm.ptr + %9597 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9598 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9599 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9600 = llvm.mul %5851, %9599 : !llvm.i64 + %9601 = llvm.add %9598, %9600 : !llvm.i64 + %9602 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9603 = llvm.mul %6705, %9602 : !llvm.i64 + %9604 = llvm.add %9601, %9603 : !llvm.i64 + %9605 = llvm.getelementptr %9597[%9604] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9606 = llvm.load %9605 : !llvm.ptr + %9607 = llvm.fmul %9596, %9606 {RelaxedPrecision} : !llvm.float + %9608 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9609 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9610 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9611 = llvm.mul %26, %9610 : !llvm.i64 + %9612 = llvm.add %9609, %9611 : !llvm.i64 + %9613 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9614 = llvm.mul %6705, %9613 : !llvm.i64 + %9615 = llvm.add %9612, %9614 : !llvm.i64 + %9616 = llvm.getelementptr %9608[%9615] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9617 = llvm.load %9616 : !llvm.ptr + %9618 = llvm.fadd %9617, %9607 {RelaxedPrecision} : !llvm.float + %9619 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9620 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9621 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9622 = llvm.mul %26, %9621 : !llvm.i64 + %9623 = llvm.add %9620, %9622 : !llvm.i64 + %9624 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9625 = llvm.mul %6705, %9624 : !llvm.i64 + %9626 = llvm.add %9623, %9625 : !llvm.i64 + %9627 = llvm.getelementptr %9619[%9626] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9618, %9627 : !llvm.ptr + %9628 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9629 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %9630 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9631 = llvm.mul %26, %9630 : !llvm.i64 + %9632 = llvm.add %9629, %9631 : !llvm.i64 + %9633 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9634 = llvm.mul %6705, %9633 : !llvm.i64 + %9635 = llvm.add %9632, %9634 : !llvm.i64 + %9636 = llvm.getelementptr %9628[%9635] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9637 = llvm.load %9636 : !llvm.ptr + %9638 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9639 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9640 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9641 = llvm.mul %26, %9640 : !llvm.i64 + %9642 = llvm.add %9639, %9641 : !llvm.i64 + %9643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9644 = llvm.mul %6705, %9643 : !llvm.i64 + %9645 = llvm.add %9642, %9644 : !llvm.i64 + %9646 = llvm.getelementptr %9638[%9645] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9637, %9646 : !llvm.ptr + %9647 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9648 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9649 = llvm.mlir.constant(128 : index) : !llvm.i64 + %9650 = llvm.mul %26, %9649 : !llvm.i64 + %9651 = llvm.add %9648, %9650 : !llvm.i64 + %9652 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9653 = llvm.mul %5851, %9652 : !llvm.i64 + %9654 = llvm.add %9651, %9653 : !llvm.i64 + %9655 = llvm.getelementptr %9647[%9654] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9656 = llvm.load %9655 : !llvm.ptr + %9657 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9658 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9659 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9660 = llvm.mul %5851, %9659 : !llvm.i64 + %9661 = llvm.add %9658, %9660 : !llvm.i64 + %9662 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9663 = llvm.mul %6766, %9662 : !llvm.i64 + %9664 = llvm.add %9661, %9663 : !llvm.i64 + %9665 = llvm.getelementptr %9657[%9664] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9666 = llvm.load %9665 : !llvm.ptr + %9667 = llvm.fmul %9656, %9666 {RelaxedPrecision} : !llvm.float + %9668 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9670 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9671 = llvm.mul %26, %9670 : !llvm.i64 + %9672 = llvm.add %9669, %9671 : !llvm.i64 + %9673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9674 = llvm.mul %6766, %9673 : !llvm.i64 + %9675 = llvm.add %9672, %9674 : !llvm.i64 + %9676 = llvm.getelementptr %9668[%9675] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9677 = llvm.load %9676 : !llvm.ptr + %9678 = llvm.fadd %9677, %9667 {RelaxedPrecision} : !llvm.float + %9679 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9680 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9681 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9682 = llvm.mul %26, %9681 : !llvm.i64 + %9683 = llvm.add %9680, %9682 : !llvm.i64 + %9684 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9685 = llvm.mul %6766, %9684 : !llvm.i64 + %9686 = llvm.add %9683, %9685 : !llvm.i64 + %9687 = llvm.getelementptr %9679[%9686] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9678, %9687 : !llvm.ptr + %9688 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9690 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9691 = llvm.mul %26, %9690 : 
!llvm.i64 + %9692 = llvm.add %9689, %9691 : !llvm.i64 + %9693 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9694 = llvm.mul %6766, %9693 : !llvm.i64 + %9695 = llvm.add %9692, %9694 : !llvm.i64 + %9696 = llvm.getelementptr %9688[%9695] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %9697 = llvm.load %9696 : !llvm.ptr + %9698 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %9700 = llvm.mlir.constant(512 : index) : !llvm.i64 + %9701 = llvm.mul %26, %9700 : !llvm.i64 + %9702 = llvm.add %9699, %9701 : !llvm.i64 + %9703 = llvm.mlir.constant(1 : index) : !llvm.i64 + %9704 = llvm.mul %6766, %9703 : !llvm.i64 + %9705 = llvm.add %9702, %9704 : !llvm.i64 + %9706 = llvm.getelementptr %9698[%9705] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + llvm.store %9697, %9706 : !llvm.ptr + %9707 = llvm.add %5848, %33 : !llvm.i64 + llvm.br ^bb19(%9707 : !llvm.i64) + ^bb21: // pred: ^bb19 + %9708 = llvm.add %5846, %36 : !llvm.i64 + llvm.br ^bb17(%9708 : !llvm.i64) + ^bb22: // pred: ^bb17 + %9709 = llvm.add %5844, %30 : !llvm.i64 + llvm.br ^bb15(%9709 : !llvm.i64) + ^bb23: // pred: ^bb15 + %9710 = llvm.add %48, %29 : !llvm.i64 + llvm.br ^bb1(%9710 : !llvm.i64) + ^bb24: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, 
array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) 
-> () + llvm.return + } +} diff --git a/Tutorials/optimized_matmul/mlir/23_FunctionPointerResolution.mlir b/Tutorials/optimized_matmul/mlir/23_FunctionPointerResolution.mlir new file mode 100644 index 00000000..0531ddef --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/23_FunctionPointerResolution.mlir @@ -0,0 +1,7762 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.mlir.global internal @cache_17() : !llvm.array<4096 x vec<8 x float>> + llvm.mlir.global internal @cache_16() : !llvm.array<192 x vec<8 x float>> + llvm.func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, 
array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(0 : i64) : !llvm.i64 + %25 = llvm.mlir.constant(1 : i64) : !llvm.i64 + %26 = llvm.mlir.constant(2 : i64) : !llvm.i64 + %27 = llvm.mlir.constant(3 : i64) : !llvm.i64 + %28 = llvm.mlir.constant(4 : i64) : !llvm.i64 + %29 = llvm.mlir.constant(5 : i64) : !llvm.i64 + %30 = llvm.mlir.constant(6 : i64) : !llvm.i64 + %31 = llvm.mlir.constant(7 : i64) : !llvm.i64 + %32 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %33 = llvm.mlir.constant(10 : index) : !llvm.i64 + %34 = llvm.mlir.constant(12 : index) : !llvm.i64 + %35 = llvm.mlir.constant(14 : index) : !llvm.i64 + %36 = llvm.mlir.constant(512 : index) : !llvm.i64 + %37 = llvm.mlir.constant(784 : index) : !llvm.i64 + %38 = llvm.mlir.constant(256 : index) : !llvm.i64 + %39 = llvm.mlir.constant(128 : index) : !llvm.i64 + %40 = llvm.mlir.constant(true) : !llvm.i1 + %41 = llvm.mlir.constant(24 : index) : !llvm.i64 + %42 = llvm.mlir.constant(32 : index) : !llvm.i64 + %43 = llvm.mlir.constant(40 : index) : !llvm.i64 + %44 = llvm.mlir.constant(48 : index) : !llvm.i64 + %45 = llvm.mlir.constant(3 : index) : !llvm.i64 + %46 = llvm.mlir.constant(56 : index) : !llvm.i64 + %47 = llvm.mlir.constant(64 : index) : !llvm.i64 + %48 = llvm.mlir.constant(4 : index) : !llvm.i64 + %49 = llvm.mlir.constant(72 : index) : !llvm.i64 + %50 = llvm.mlir.constant(9 : index) : !llvm.i64 + %51 = llvm.mlir.constant(80 : index) : !llvm.i64 + %52 = llvm.mlir.constant(5 : index) : !llvm.i64 + %53 = llvm.mlir.constant(88 : index) : !llvm.i64 + %54 = llvm.mlir.constant(11 : index) : !llvm.i64 + %55 = llvm.mlir.constant(96 : index) : !llvm.i64 + %56 = llvm.mlir.constant(6 : index) : !llvm.i64 + %57 = llvm.mlir.constant(104 : index) : !llvm.i64 + %58 = llvm.mlir.constant(13 : index) : !llvm.i64 + %59 = llvm.mlir.constant(112 : index) : !llvm.i64 + %60 = llvm.mlir.constant(-16 : index) : !llvm.i64 + %61 = llvm.mlir.constant(7 : index) : !llvm.i64 + %62 = llvm.mlir.constant(120 : index) : !llvm.i64 + %63 = llvm.mlir.constant(2 : index) : !llvm.i64 + %64 = llvm.mlir.constant(-1 : index) : !llvm.i64 + %65 = llvm.mlir.constant(-2 : index) : !llvm.i64 + %66 = llvm.mlir.constant(15 : index) : !llvm.i64 + %67 = llvm.mlir.constant(0 : index) : !llvm.i64 + %68 = llvm.mlir.constant(16 : index) : !llvm.i64 + %69 = llvm.mlir.constant(1 : index) : !llvm.i64 + %70 = llvm.mlir.constant(8 : index) : !llvm.i64 + %71 = llvm.mlir.constant(1 : index) : !llvm.i64 + %72 = llvm.mlir.constant(16 : index) : !llvm.i64 + %73 = llvm.mul %71, %72 : !llvm.i64 + %74 = llvm.mlir.null : !llvm.ptr> + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.getelementptr %74[%75] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %77 = llvm.ptrtoint %76 : !llvm.ptr> to !llvm.i64 + %78 = llvm.mul %73, %77 : !llvm.i64 + %79 = llvm.alloca %78 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %80 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %81 = llvm.insertvalue %79, %80[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.insertvalue %79, %81[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %83 = llvm.mlir.constant(0 : index) : !llvm.i64 + %84 = llvm.insertvalue %83, %82[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %85 = llvm.mlir.constant(1 : index) : !llvm.i64 + %86 = llvm.mlir.constant(16 : index) : !llvm.i64 + %87 = llvm.insertvalue %71, %84[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x 
i64>, array<2 x i64>)> + %88 = llvm.insertvalue %86, %87[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.insertvalue %72, %88[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %90 = llvm.insertvalue %85, %89[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %91 = llvm.mlir.constant(1 : index) : !llvm.i64 + %92 = llvm.mlir.constant(16 : index) : !llvm.i64 + %93 = llvm.mul %91, %92 : !llvm.i64 + %94 = llvm.mlir.null : !llvm.ptr> + %95 = llvm.mlir.constant(1 : index) : !llvm.i64 + %96 = llvm.getelementptr %94[%95] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %97 = llvm.ptrtoint %96 : !llvm.ptr> to !llvm.i64 + %98 = llvm.mul %93, %97 : !llvm.i64 + %99 = llvm.alloca %98 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %100 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %101 = llvm.insertvalue %99, %100[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.insertvalue %99, %101[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %103 = llvm.mlir.constant(0 : index) : !llvm.i64 + %104 = llvm.insertvalue %103, %102[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %105 = llvm.mlir.constant(1 : index) : !llvm.i64 + %106 = llvm.mlir.constant(16 : index) : !llvm.i64 + %107 = llvm.insertvalue %91, %104[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %108 = llvm.insertvalue %106, %107[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.insertvalue %92, %108[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %110 = llvm.insertvalue %105, %109[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %111 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %112 = llvm.mlir.addressof @cache_16 : !llvm.ptr>> + %113 = llvm.getelementptr %112[%111, %111] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %114 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %115 = llvm.insertvalue %113, %114[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %116 = llvm.insertvalue %113, %115[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %117 = llvm.mlir.constant(0 : index) : !llvm.i64 + %118 = llvm.insertvalue %117, %116[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %119 = llvm.mlir.constant(16 : index) : !llvm.i64 + %120 = llvm.insertvalue %119, %118[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %121 = llvm.mlir.constant(12 : index) : !llvm.i64 + %122 = llvm.insertvalue %121, %120[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %123 = llvm.mlir.constant(6 : index) : !llvm.i64 + %124 = llvm.insertvalue %123, %122[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %126 = llvm.insertvalue %125, %124[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %127 = llvm.mlir.constant(2 : index) : !llvm.i64 + %128 = llvm.insertvalue %127, %126[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %129 = llvm.mlir.constant(1 : index) : !llvm.i64 + %130 = llvm.insertvalue %129, %128[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %131 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %132 = llvm.mlir.addressof @cache_17 : !llvm.ptr>> + %133 
= llvm.getelementptr %132[%131, %131] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %134 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %135 = llvm.insertvalue %133, %134[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %136 = llvm.insertvalue %133, %135[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %137 = llvm.mlir.constant(0 : index) : !llvm.i64 + %138 = llvm.insertvalue %137, %136[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %139 = llvm.mlir.constant(16 : index) : !llvm.i64 + %140 = llvm.insertvalue %139, %138[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.insertvalue %141, %140[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %143 = llvm.mlir.constant(128 : index) : !llvm.i64 + %144 = llvm.insertvalue %143, %142[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %145 = llvm.mlir.constant(2 : index) : !llvm.i64 + %146 = llvm.insertvalue %145, %144[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %147 = llvm.mlir.constant(2 : index) : !llvm.i64 + %148 = llvm.insertvalue %147, %146[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %150 = llvm.insertvalue %149, %148[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + llvm.br ^bb1(%67 : !llvm.i64) + ^bb1(%151: !llvm.i64): // 2 preds: ^bb0, ^bb52 + %152 = llvm.icmp "slt" %151, %36 : !llvm.i64 + llvm.cond_br %152, ^bb2, ^bb53 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%67 : !llvm.i64) + ^bb3(%153: !llvm.i64): // 2 preds: ^bb2, ^bb10 + %154 = llvm.icmp "slt" %153, %39 : !llvm.i64 + llvm.cond_br %154, ^bb4, ^bb11 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%67 : !llvm.i64) + ^bb5(%155: !llvm.i64): // 2 preds: ^bb4, ^bb9 + %156 = llvm.icmp "slt" %155, %38 : !llvm.i64 + llvm.cond_br %156, ^bb6, ^bb10 + ^bb6: // pred: ^bb5 + llvm.cond_br %40, ^bb7, ^bb8 + ^bb7: // pred: ^bb6 + %157 = llvm.add %151, %155 : !llvm.i64 + %158 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %161 = llvm.mul %153, %160 : !llvm.i64 + %162 = llvm.add %159, %161 : !llvm.i64 + %163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %164 = llvm.mul %157, %163 : !llvm.i64 + %165 = llvm.add %162, %164 : !llvm.i64 + %166 = llvm.getelementptr %158[%165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %167 = llvm.bitcast %166 : !llvm.ptr to !llvm.ptr> + %168 = llvm.load %167 {alignment = 4 : i64} : !llvm.ptr> + %169 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(16 : index) : !llvm.i64 + %172 = llvm.mul %67, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %67, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %168, %177 : !llvm.ptr> + %178 = llvm.add %157, %70 : !llvm.i64 + %179 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %182 = llvm.mul 
%153, %181 : !llvm.i64 + %183 = llvm.add %180, %182 : !llvm.i64 + %184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %185 = llvm.mul %178, %184 : !llvm.i64 + %186 = llvm.add %183, %185 : !llvm.i64 + %187 = llvm.getelementptr %179[%186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %188 = llvm.bitcast %187 : !llvm.ptr to !llvm.ptr> + %189 = llvm.load %188 {alignment = 4 : i64} : !llvm.ptr> + %190 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %192 = llvm.mlir.constant(16 : index) : !llvm.i64 + %193 = llvm.mul %67, %192 : !llvm.i64 + %194 = llvm.add %191, %193 : !llvm.i64 + %195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %196 = llvm.mul %69, %195 : !llvm.i64 + %197 = llvm.add %194, %196 : !llvm.i64 + %198 = llvm.getelementptr %190[%197] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %189, %198 : !llvm.ptr> + %199 = llvm.add %157, %68 : !llvm.i64 + %200 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %203 = llvm.mul %153, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %199, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.bitcast %208 : !llvm.ptr to !llvm.ptr> + %210 = llvm.load %209 {alignment = 4 : i64} : !llvm.ptr> + %211 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %212 = llvm.mlir.constant(0 : index) : !llvm.i64 + %213 = llvm.mlir.constant(16 : index) : !llvm.i64 + %214 = llvm.mul %67, %213 : !llvm.i64 + %215 = llvm.add %212, %214 : !llvm.i64 + %216 = llvm.mlir.constant(1 : index) : !llvm.i64 + %217 = llvm.mul %63, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.getelementptr %211[%218] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %210, %219 : !llvm.ptr> + %220 = llvm.add %157, %41 : !llvm.i64 + %221 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %224 = llvm.mul %153, %223 : !llvm.i64 + %225 = llvm.add %222, %224 : !llvm.i64 + %226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %227 = llvm.mul %220, %226 : !llvm.i64 + %228 = llvm.add %225, %227 : !llvm.i64 + %229 = llvm.getelementptr %221[%228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %230 = llvm.bitcast %229 : !llvm.ptr to !llvm.ptr> + %231 = llvm.load %230 {alignment = 4 : i64} : !llvm.ptr> + %232 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %233 = llvm.mlir.constant(0 : index) : !llvm.i64 + %234 = llvm.mlir.constant(16 : index) : !llvm.i64 + %235 = llvm.mul %67, %234 : !llvm.i64 + %236 = llvm.add %233, %235 : !llvm.i64 + %237 = llvm.mlir.constant(1 : index) : !llvm.i64 + %238 = llvm.mul %45, %237 : !llvm.i64 + %239 = llvm.add %236, %238 : !llvm.i64 + %240 = llvm.getelementptr %232[%239] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %231, %240 : !llvm.ptr> + %241 = llvm.add %157, %42 : !llvm.i64 + %242 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %243 = llvm.mlir.constant(0 : index) : !llvm.i64 + %244 = llvm.mlir.constant(512 : index) : !llvm.i64 + %245 = llvm.mul %153, %244 : 
!llvm.i64 + %246 = llvm.add %243, %245 : !llvm.i64 + %247 = llvm.mlir.constant(1 : index) : !llvm.i64 + %248 = llvm.mul %241, %247 : !llvm.i64 + %249 = llvm.add %246, %248 : !llvm.i64 + %250 = llvm.getelementptr %242[%249] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %251 = llvm.bitcast %250 : !llvm.ptr to !llvm.ptr> + %252 = llvm.load %251 {alignment = 4 : i64} : !llvm.ptr> + %253 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = llvm.mlir.constant(16 : index) : !llvm.i64 + %256 = llvm.mul %67, %255 : !llvm.i64 + %257 = llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = llvm.mul %48, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %252, %261 : !llvm.ptr> + %262 = llvm.add %157, %43 : !llvm.i64 + %263 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %264 = llvm.mlir.constant(0 : index) : !llvm.i64 + %265 = llvm.mlir.constant(512 : index) : !llvm.i64 + %266 = llvm.mul %153, %265 : !llvm.i64 + %267 = llvm.add %264, %266 : !llvm.i64 + %268 = llvm.mlir.constant(1 : index) : !llvm.i64 + %269 = llvm.mul %262, %268 : !llvm.i64 + %270 = llvm.add %267, %269 : !llvm.i64 + %271 = llvm.getelementptr %263[%270] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %272 = llvm.bitcast %271 : !llvm.ptr to !llvm.ptr> + %273 = llvm.load %272 {alignment = 4 : i64} : !llvm.ptr> + %274 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %275 = llvm.mlir.constant(0 : index) : !llvm.i64 + %276 = llvm.mlir.constant(16 : index) : !llvm.i64 + %277 = llvm.mul %67, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.mlir.constant(1 : index) : !llvm.i64 + %280 = llvm.mul %52, %279 : !llvm.i64 + %281 = llvm.add %278, %280 : !llvm.i64 + %282 = llvm.getelementptr %274[%281] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %273, %282 : !llvm.ptr> + %283 = llvm.add %157, %44 : !llvm.i64 + %284 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %153, %286 : !llvm.i64 + %288 = llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %283, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.bitcast %292 : !llvm.ptr to !llvm.ptr> + %294 = llvm.load %293 {alignment = 4 : i64} : !llvm.ptr> + %295 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %296 = llvm.mlir.constant(0 : index) : !llvm.i64 + %297 = llvm.mlir.constant(16 : index) : !llvm.i64 + %298 = llvm.mul %67, %297 : !llvm.i64 + %299 = llvm.add %296, %298 : !llvm.i64 + %300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %301 = llvm.mul %56, %300 : !llvm.i64 + %302 = llvm.add %299, %301 : !llvm.i64 + %303 = llvm.getelementptr %295[%302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %294, %303 : !llvm.ptr> + %304 = llvm.add %157, %46 : !llvm.i64 + %305 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %306 = llvm.mlir.constant(0 : index) : !llvm.i64 + %307 = llvm.mlir.constant(512 : index) : !llvm.i64 + %308 = llvm.mul %153, %307 : !llvm.i64 + %309 = 
llvm.add %306, %308 : !llvm.i64 + %310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %311 = llvm.mul %304, %310 : !llvm.i64 + %312 = llvm.add %309, %311 : !llvm.i64 + %313 = llvm.getelementptr %305[%312] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %314 = llvm.bitcast %313 : !llvm.ptr to !llvm.ptr> + %315 = llvm.load %314 {alignment = 4 : i64} : !llvm.ptr> + %316 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %318 = llvm.mlir.constant(16 : index) : !llvm.i64 + %319 = llvm.mul %67, %318 : !llvm.i64 + %320 = llvm.add %317, %319 : !llvm.i64 + %321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %322 = llvm.mul %61, %321 : !llvm.i64 + %323 = llvm.add %320, %322 : !llvm.i64 + %324 = llvm.getelementptr %316[%323] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %315, %324 : !llvm.ptr> + %325 = llvm.add %157, %47 : !llvm.i64 + %326 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %328 = llvm.mlir.constant(512 : index) : !llvm.i64 + %329 = llvm.mul %153, %328 : !llvm.i64 + %330 = llvm.add %327, %329 : !llvm.i64 + %331 = llvm.mlir.constant(1 : index) : !llvm.i64 + %332 = llvm.mul %325, %331 : !llvm.i64 + %333 = llvm.add %330, %332 : !llvm.i64 + %334 = llvm.getelementptr %326[%333] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %335 = llvm.bitcast %334 : !llvm.ptr to !llvm.ptr> + %336 = llvm.load %335 {alignment = 4 : i64} : !llvm.ptr> + %337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %340 = llvm.mul %67, %339 : !llvm.i64 + %341 = llvm.add %338, %340 : !llvm.i64 + %342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %343 = llvm.mul %70, %342 : !llvm.i64 + %344 = llvm.add %341, %343 : !llvm.i64 + %345 = llvm.getelementptr %337[%344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %336, %345 : !llvm.ptr> + %346 = llvm.add %157, %49 : !llvm.i64 + %347 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %350 = llvm.mul %153, %349 : !llvm.i64 + %351 = llvm.add %348, %350 : !llvm.i64 + %352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %353 = llvm.mul %346, %352 : !llvm.i64 + %354 = llvm.add %351, %353 : !llvm.i64 + %355 = llvm.getelementptr %347[%354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %356 = llvm.bitcast %355 : !llvm.ptr to !llvm.ptr> + %357 = llvm.load %356 {alignment = 4 : i64} : !llvm.ptr> + %358 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %360 = llvm.mlir.constant(16 : index) : !llvm.i64 + %361 = llvm.mul %67, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %364 = llvm.mul %50, %363 : !llvm.i64 + %365 = llvm.add %362, %364 : !llvm.i64 + %366 = llvm.getelementptr %358[%365] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %357, %366 : !llvm.ptr> + %367 = llvm.add %157, %51 : !llvm.i64 + %368 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %371 = llvm.mul %153, %370 : !llvm.i64 + %372 = llvm.add %369, %371 
: !llvm.i64 + %373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %374 = llvm.mul %367, %373 : !llvm.i64 + %375 = llvm.add %372, %374 : !llvm.i64 + %376 = llvm.getelementptr %368[%375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %377 = llvm.bitcast %376 : !llvm.ptr to !llvm.ptr> + %378 = llvm.load %377 {alignment = 4 : i64} : !llvm.ptr> + %379 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %381 = llvm.mlir.constant(16 : index) : !llvm.i64 + %382 = llvm.mul %67, %381 : !llvm.i64 + %383 = llvm.add %380, %382 : !llvm.i64 + %384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %385 = llvm.mul %33, %384 : !llvm.i64 + %386 = llvm.add %383, %385 : !llvm.i64 + %387 = llvm.getelementptr %379[%386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %378, %387 : !llvm.ptr> + %388 = llvm.add %157, %53 : !llvm.i64 + %389 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %392 = llvm.mul %153, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %395 = llvm.mul %388, %394 : !llvm.i64 + %396 = llvm.add %393, %395 : !llvm.i64 + %397 = llvm.getelementptr %389[%396] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %398 = llvm.bitcast %397 : !llvm.ptr to !llvm.ptr> + %399 = llvm.load %398 {alignment = 4 : i64} : !llvm.ptr> + %400 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %403 = llvm.mul %67, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %406 = llvm.mul %54, %405 : !llvm.i64 + %407 = llvm.add %404, %406 : !llvm.i64 + %408 = llvm.getelementptr %400[%407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %399, %408 : !llvm.ptr> + %409 = llvm.add %157, %55 : !llvm.i64 + %410 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %413 = llvm.mul %153, %412 : !llvm.i64 + %414 = llvm.add %411, %413 : !llvm.i64 + %415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %416 = llvm.mul %409, %415 : !llvm.i64 + %417 = llvm.add %414, %416 : !llvm.i64 + %418 = llvm.getelementptr %410[%417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %419 = llvm.bitcast %418 : !llvm.ptr to !llvm.ptr> + %420 = llvm.load %419 {alignment = 4 : i64} : !llvm.ptr> + %421 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %423 = llvm.mlir.constant(16 : index) : !llvm.i64 + %424 = llvm.mul %67, %423 : !llvm.i64 + %425 = llvm.add %422, %424 : !llvm.i64 + %426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %427 = llvm.mul %34, %426 : !llvm.i64 + %428 = llvm.add %425, %427 : !llvm.i64 + %429 = llvm.getelementptr %421[%428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %420, %429 : !llvm.ptr> + %430 = llvm.add %157, %57 : !llvm.i64 + %431 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %434 = llvm.mul %153, %433 : !llvm.i64 + %435 = llvm.add %432, %434 : !llvm.i64 + %436 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %437 = llvm.mul %430, %436 : !llvm.i64 + %438 = llvm.add %435, %437 : !llvm.i64 + %439 = llvm.getelementptr %431[%438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %440 = llvm.bitcast %439 : !llvm.ptr to !llvm.ptr> + %441 = llvm.load %440 {alignment = 4 : i64} : !llvm.ptr> + %442 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %444 = llvm.mlir.constant(16 : index) : !llvm.i64 + %445 = llvm.mul %67, %444 : !llvm.i64 + %446 = llvm.add %443, %445 : !llvm.i64 + %447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %448 = llvm.mul %58, %447 : !llvm.i64 + %449 = llvm.add %446, %448 : !llvm.i64 + %450 = llvm.getelementptr %442[%449] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %441, %450 : !llvm.ptr> + %451 = llvm.add %157, %59 : !llvm.i64 + %452 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %455 = llvm.mul %153, %454 : !llvm.i64 + %456 = llvm.add %453, %455 : !llvm.i64 + %457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %458 = llvm.mul %451, %457 : !llvm.i64 + %459 = llvm.add %456, %458 : !llvm.i64 + %460 = llvm.getelementptr %452[%459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %461 = llvm.bitcast %460 : !llvm.ptr to !llvm.ptr> + %462 = llvm.load %461 {alignment = 4 : i64} : !llvm.ptr> + %463 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %464 = llvm.mlir.constant(0 : index) : !llvm.i64 + %465 = llvm.mlir.constant(16 : index) : !llvm.i64 + %466 = llvm.mul %67, %465 : !llvm.i64 + %467 = llvm.add %464, %466 : !llvm.i64 + %468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %469 = llvm.mul %35, %468 : !llvm.i64 + %470 = llvm.add %467, %469 : !llvm.i64 + %471 = llvm.getelementptr %463[%470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %462, %471 : !llvm.ptr> + %472 = llvm.add %157, %62 : !llvm.i64 + %473 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %474 = llvm.mlir.constant(0 : index) : !llvm.i64 + %475 = llvm.mlir.constant(512 : index) : !llvm.i64 + %476 = llvm.mul %153, %475 : !llvm.i64 + %477 = llvm.add %474, %476 : !llvm.i64 + %478 = llvm.mlir.constant(1 : index) : !llvm.i64 + %479 = llvm.mul %472, %478 : !llvm.i64 + %480 = llvm.add %477, %479 : !llvm.i64 + %481 = llvm.getelementptr %473[%480] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %482 = llvm.bitcast %481 : !llvm.ptr to !llvm.ptr> + %483 = llvm.load %482 {alignment = 4 : i64} : !llvm.ptr> + %484 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %485 = llvm.mlir.constant(0 : index) : !llvm.i64 + %486 = llvm.mlir.constant(16 : index) : !llvm.i64 + %487 = llvm.mul %67, %486 : !llvm.i64 + %488 = llvm.add %485, %487 : !llvm.i64 + %489 = llvm.mlir.constant(1 : index) : !llvm.i64 + %490 = llvm.mul %66, %489 : !llvm.i64 + %491 = llvm.add %488, %490 : !llvm.i64 + %492 = llvm.getelementptr %484[%491] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %483, %492 : !llvm.ptr> + %493 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %494 = llvm.mlir.constant(0 : index) : !llvm.i64 + %495 = llvm.mlir.constant(16 : index) : !llvm.i64 + %496 = llvm.mul %67, %495 : !llvm.i64 + %497 = llvm.add %494, %496 : !llvm.i64 + %498 = llvm.mlir.constant(1 : index) : !llvm.i64 + %499 = llvm.mul 
%67, %498 : !llvm.i64 + %500 = llvm.add %497, %499 : !llvm.i64 + %501 = llvm.getelementptr %493[%500] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %502 = llvm.load %501 : !llvm.ptr> + %503 = llvm.icmp "slt" %155, %67 : !llvm.i64 + %504 = llvm.sub %64, %155 : !llvm.i64 + %505 = llvm.select %503, %504, %155 : !llvm.i1, !llvm.i64 + %506 = llvm.sdiv %505, %68 : !llvm.i64 + %507 = llvm.sub %64, %506 : !llvm.i64 + %508 = llvm.select %503, %507, %506 : !llvm.i1, !llvm.i64 + %509 = llvm.srem %508, %68 : !llvm.i64 + %510 = llvm.icmp "slt" %509, %67 : !llvm.i64 + %511 = llvm.add %509, %68 : !llvm.i64 + %512 = llvm.select %510, %511, %509 : !llvm.i1, !llvm.i64 + %513 = llvm.srem %153, %39 : !llvm.i64 + %514 = llvm.icmp "slt" %513, %67 : !llvm.i64 + %515 = llvm.add %513, %39 : !llvm.i64 + %516 = llvm.select %514, %515, %513 : !llvm.i1, !llvm.i64 + %517 = llvm.srem %155, %68 : !llvm.i64 + %518 = llvm.icmp "slt" %517, %67 : !llvm.i64 + %519 = llvm.add %517, %68 : !llvm.i64 + %520 = llvm.select %518, %519, %517 : !llvm.i1, !llvm.i64 + %521 = llvm.icmp "slt" %520, %67 : !llvm.i64 + %522 = llvm.sub %64, %520 : !llvm.i64 + %523 = llvm.select %521, %522, %520 : !llvm.i1, !llvm.i64 + %524 = llvm.sdiv %523, %70 : !llvm.i64 + %525 = llvm.sub %64, %524 : !llvm.i64 + %526 = llvm.select %521, %525, %524 : !llvm.i1, !llvm.i64 + %527 = llvm.srem %526, %63 : !llvm.i64 + %528 = llvm.icmp "slt" %527, %67 : !llvm.i64 + %529 = llvm.add %527, %63 : !llvm.i64 + %530 = llvm.select %528, %529, %527 : !llvm.i1, !llvm.i64 + %531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %534 = llvm.mul %512, %533 : !llvm.i64 + %535 = llvm.add %532, %534 : !llvm.i64 + %536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %537 = llvm.mul %516, %536 : !llvm.i64 + %538 = llvm.add %535, %537 : !llvm.i64 + %539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %540 = llvm.mul %530, %539 : !llvm.i64 + %541 = llvm.add %538, %540 : !llvm.i64 + %542 = llvm.getelementptr %531[%541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %502, %542 : !llvm.ptr> + %543 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %544 = llvm.mlir.constant(0 : index) : !llvm.i64 + %545 = llvm.mlir.constant(16 : index) : !llvm.i64 + %546 = llvm.mul %67, %545 : !llvm.i64 + %547 = llvm.add %544, %546 : !llvm.i64 + %548 = llvm.mlir.constant(1 : index) : !llvm.i64 + %549 = llvm.mul %69, %548 : !llvm.i64 + %550 = llvm.add %547, %549 : !llvm.i64 + %551 = llvm.getelementptr %543[%550] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %552 = llvm.load %551 : !llvm.ptr> + %553 = llvm.add %155, %70 : !llvm.i64 + %554 = llvm.icmp "slt" %553, %67 : !llvm.i64 + %555 = llvm.sub %64, %553 : !llvm.i64 + %556 = llvm.select %554, %555, %553 : !llvm.i1, !llvm.i64 + %557 = llvm.sdiv %556, %68 : !llvm.i64 + %558 = llvm.sub %64, %557 : !llvm.i64 + %559 = llvm.select %554, %558, %557 : !llvm.i1, !llvm.i64 + %560 = llvm.srem %559, %68 : !llvm.i64 + %561 = llvm.icmp "slt" %560, %67 : !llvm.i64 + %562 = llvm.add %560, %68 : !llvm.i64 + %563 = llvm.select %561, %562, %560 : !llvm.i1, !llvm.i64 + %564 = llvm.sdiv %505, %70 : !llvm.i64 + %565 = llvm.sub %64, %564 : !llvm.i64 + %566 = llvm.select %503, %565, %564 : !llvm.i1, !llvm.i64 + %567 = llvm.mul %559, %65 : !llvm.i64 + %568 = llvm.add %566, %567 : !llvm.i64 + %569 = llvm.add %568, %69 : !llvm.i64 + %570 = llvm.icmp "slt" %569, %67 : !llvm.i64 + %571 = llvm.sub 
%64, %569 : !llvm.i64 + %572 = llvm.select %570, %571, %569 : !llvm.i1, !llvm.i64 + %573 = llvm.sdiv %572, %63 : !llvm.i64 + %574 = llvm.sub %64, %573 : !llvm.i64 + %575 = llvm.select %570, %574, %573 : !llvm.i1, !llvm.i64 + %576 = llvm.mul %575, %65 : !llvm.i64 + %577 = llvm.add %568, %576 : !llvm.i64 + %578 = llvm.add %577, %69 : !llvm.i64 + %579 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %580 = llvm.mlir.constant(0 : index) : !llvm.i64 + %581 = llvm.mlir.constant(256 : index) : !llvm.i64 + %582 = llvm.mul %563, %581 : !llvm.i64 + %583 = llvm.add %580, %582 : !llvm.i64 + %584 = llvm.mlir.constant(2 : index) : !llvm.i64 + %585 = llvm.mul %516, %584 : !llvm.i64 + %586 = llvm.add %583, %585 : !llvm.i64 + %587 = llvm.mlir.constant(1 : index) : !llvm.i64 + %588 = llvm.mul %578, %587 : !llvm.i64 + %589 = llvm.add %586, %588 : !llvm.i64 + %590 = llvm.getelementptr %579[%589] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %552, %590 : !llvm.ptr> + %591 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %593 = llvm.mlir.constant(16 : index) : !llvm.i64 + %594 = llvm.mul %67, %593 : !llvm.i64 + %595 = llvm.add %592, %594 : !llvm.i64 + %596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %597 = llvm.mul %63, %596 : !llvm.i64 + %598 = llvm.add %595, %597 : !llvm.i64 + %599 = llvm.getelementptr %591[%598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %600 = llvm.load %599 : !llvm.ptr> + %601 = llvm.add %508, %69 : !llvm.i64 + %602 = llvm.icmp "slt" %601, %67 : !llvm.i64 + %603 = llvm.sub %64, %601 : !llvm.i64 + %604 = llvm.select %602, %603, %601 : !llvm.i1, !llvm.i64 + %605 = llvm.sdiv %604, %68 : !llvm.i64 + %606 = llvm.sub %64, %605 : !llvm.i64 + %607 = llvm.select %602, %606, %605 : !llvm.i1, !llvm.i64 + %608 = llvm.mul %607, %60 : !llvm.i64 + %609 = llvm.add %508, %608 : !llvm.i64 + %610 = llvm.add %609, %69 : !llvm.i64 + %611 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %612 = llvm.mlir.constant(0 : index) : !llvm.i64 + %613 = llvm.mlir.constant(256 : index) : !llvm.i64 + %614 = llvm.mul %610, %613 : !llvm.i64 + %615 = llvm.add %612, %614 : !llvm.i64 + %616 = llvm.mlir.constant(2 : index) : !llvm.i64 + %617 = llvm.mul %516, %616 : !llvm.i64 + %618 = llvm.add %615, %617 : !llvm.i64 + %619 = llvm.mlir.constant(1 : index) : !llvm.i64 + %620 = llvm.mul %530, %619 : !llvm.i64 + %621 = llvm.add %618, %620 : !llvm.i64 + %622 = llvm.getelementptr %611[%621] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %600, %622 : !llvm.ptr> + %623 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %624 = llvm.mlir.constant(0 : index) : !llvm.i64 + %625 = llvm.mlir.constant(16 : index) : !llvm.i64 + %626 = llvm.mul %67, %625 : !llvm.i64 + %627 = llvm.add %624, %626 : !llvm.i64 + %628 = llvm.mlir.constant(1 : index) : !llvm.i64 + %629 = llvm.mul %45, %628 : !llvm.i64 + %630 = llvm.add %627, %629 : !llvm.i64 + %631 = llvm.getelementptr %623[%630] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %632 = llvm.load %631 : !llvm.ptr> + %633 = llvm.add %155, %41 : !llvm.i64 + %634 = llvm.icmp "slt" %633, %67 : !llvm.i64 + %635 = llvm.sub %64, %633 : !llvm.i64 + %636 = llvm.select %634, %635, %633 : !llvm.i1, !llvm.i64 + %637 = llvm.sdiv %636, %68 : !llvm.i64 + %638 = llvm.sub %64, %637 : !llvm.i64 + %639 = llvm.select %634, %638, %637 : !llvm.i1, !llvm.i64 + %640 = llvm.srem %639, %68 : 
!llvm.i64 + %641 = llvm.icmp "slt" %640, %67 : !llvm.i64 + %642 = llvm.add %640, %68 : !llvm.i64 + %643 = llvm.select %641, %642, %640 : !llvm.i1, !llvm.i64 + %644 = llvm.mul %639, %65 : !llvm.i64 + %645 = llvm.add %566, %644 : !llvm.i64 + %646 = llvm.add %645, %45 : !llvm.i64 + %647 = llvm.icmp "slt" %646, %67 : !llvm.i64 + %648 = llvm.sub %64, %646 : !llvm.i64 + %649 = llvm.select %647, %648, %646 : !llvm.i1, !llvm.i64 + %650 = llvm.sdiv %649, %63 : !llvm.i64 + %651 = llvm.sub %64, %650 : !llvm.i64 + %652 = llvm.select %647, %651, %650 : !llvm.i1, !llvm.i64 + %653 = llvm.mul %652, %65 : !llvm.i64 + %654 = llvm.add %645, %653 : !llvm.i64 + %655 = llvm.add %654, %45 : !llvm.i64 + %656 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %658 = llvm.mlir.constant(256 : index) : !llvm.i64 + %659 = llvm.mul %643, %658 : !llvm.i64 + %660 = llvm.add %657, %659 : !llvm.i64 + %661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %662 = llvm.mul %516, %661 : !llvm.i64 + %663 = llvm.add %660, %662 : !llvm.i64 + %664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %665 = llvm.mul %655, %664 : !llvm.i64 + %666 = llvm.add %663, %665 : !llvm.i64 + %667 = llvm.getelementptr %656[%666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %632, %667 : !llvm.ptr> + %668 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %670 = llvm.mlir.constant(16 : index) : !llvm.i64 + %671 = llvm.mul %67, %670 : !llvm.i64 + %672 = llvm.add %669, %671 : !llvm.i64 + %673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %674 = llvm.mul %48, %673 : !llvm.i64 + %675 = llvm.add %672, %674 : !llvm.i64 + %676 = llvm.getelementptr %668[%675] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %677 = llvm.load %676 : !llvm.ptr> + %678 = llvm.add %508, %63 : !llvm.i64 + %679 = llvm.icmp "slt" %678, %67 : !llvm.i64 + %680 = llvm.sub %64, %678 : !llvm.i64 + %681 = llvm.select %679, %680, %678 : !llvm.i1, !llvm.i64 + %682 = llvm.sdiv %681, %68 : !llvm.i64 + %683 = llvm.sub %64, %682 : !llvm.i64 + %684 = llvm.select %679, %683, %682 : !llvm.i1, !llvm.i64 + %685 = llvm.mul %684, %60 : !llvm.i64 + %686 = llvm.add %508, %685 : !llvm.i64 + %687 = llvm.add %686, %63 : !llvm.i64 + %688 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %690 = llvm.mlir.constant(256 : index) : !llvm.i64 + %691 = llvm.mul %687, %690 : !llvm.i64 + %692 = llvm.add %689, %691 : !llvm.i64 + %693 = llvm.mlir.constant(2 : index) : !llvm.i64 + %694 = llvm.mul %516, %693 : !llvm.i64 + %695 = llvm.add %692, %694 : !llvm.i64 + %696 = llvm.mlir.constant(1 : index) : !llvm.i64 + %697 = llvm.mul %530, %696 : !llvm.i64 + %698 = llvm.add %695, %697 : !llvm.i64 + %699 = llvm.getelementptr %688[%698] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %677, %699 : !llvm.ptr> + %700 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %701 = llvm.mlir.constant(0 : index) : !llvm.i64 + %702 = llvm.mlir.constant(16 : index) : !llvm.i64 + %703 = llvm.mul %67, %702 : !llvm.i64 + %704 = llvm.add %701, %703 : !llvm.i64 + %705 = llvm.mlir.constant(1 : index) : !llvm.i64 + %706 = llvm.mul %52, %705 : !llvm.i64 + %707 = llvm.add %704, %706 : !llvm.i64 + %708 = llvm.getelementptr %700[%707] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %709 = llvm.load %708 : !llvm.ptr> + %710 = 
llvm.add %155, %43 : !llvm.i64 + %711 = llvm.icmp "slt" %710, %67 : !llvm.i64 + %712 = llvm.sub %64, %710 : !llvm.i64 + %713 = llvm.select %711, %712, %710 : !llvm.i1, !llvm.i64 + %714 = llvm.sdiv %713, %68 : !llvm.i64 + %715 = llvm.sub %64, %714 : !llvm.i64 + %716 = llvm.select %711, %715, %714 : !llvm.i1, !llvm.i64 + %717 = llvm.srem %716, %68 : !llvm.i64 + %718 = llvm.icmp "slt" %717, %67 : !llvm.i64 + %719 = llvm.add %717, %68 : !llvm.i64 + %720 = llvm.select %718, %719, %717 : !llvm.i1, !llvm.i64 + %721 = llvm.mul %716, %65 : !llvm.i64 + %722 = llvm.add %566, %721 : !llvm.i64 + %723 = llvm.add %722, %52 : !llvm.i64 + %724 = llvm.icmp "slt" %723, %67 : !llvm.i64 + %725 = llvm.sub %64, %723 : !llvm.i64 + %726 = llvm.select %724, %725, %723 : !llvm.i1, !llvm.i64 + %727 = llvm.sdiv %726, %63 : !llvm.i64 + %728 = llvm.sub %64, %727 : !llvm.i64 + %729 = llvm.select %724, %728, %727 : !llvm.i1, !llvm.i64 + %730 = llvm.mul %729, %65 : !llvm.i64 + %731 = llvm.add %722, %730 : !llvm.i64 + %732 = llvm.add %731, %52 : !llvm.i64 + %733 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %734 = llvm.mlir.constant(0 : index) : !llvm.i64 + %735 = llvm.mlir.constant(256 : index) : !llvm.i64 + %736 = llvm.mul %720, %735 : !llvm.i64 + %737 = llvm.add %734, %736 : !llvm.i64 + %738 = llvm.mlir.constant(2 : index) : !llvm.i64 + %739 = llvm.mul %516, %738 : !llvm.i64 + %740 = llvm.add %737, %739 : !llvm.i64 + %741 = llvm.mlir.constant(1 : index) : !llvm.i64 + %742 = llvm.mul %732, %741 : !llvm.i64 + %743 = llvm.add %740, %742 : !llvm.i64 + %744 = llvm.getelementptr %733[%743] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %709, %744 : !llvm.ptr> + %745 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %746 = llvm.mlir.constant(0 : index) : !llvm.i64 + %747 = llvm.mlir.constant(16 : index) : !llvm.i64 + %748 = llvm.mul %67, %747 : !llvm.i64 + %749 = llvm.add %746, %748 : !llvm.i64 + %750 = llvm.mlir.constant(1 : index) : !llvm.i64 + %751 = llvm.mul %56, %750 : !llvm.i64 + %752 = llvm.add %749, %751 : !llvm.i64 + %753 = llvm.getelementptr %745[%752] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %754 = llvm.load %753 : !llvm.ptr> + %755 = llvm.add %508, %45 : !llvm.i64 + %756 = llvm.icmp "slt" %755, %67 : !llvm.i64 + %757 = llvm.sub %64, %755 : !llvm.i64 + %758 = llvm.select %756, %757, %755 : !llvm.i1, !llvm.i64 + %759 = llvm.sdiv %758, %68 : !llvm.i64 + %760 = llvm.sub %64, %759 : !llvm.i64 + %761 = llvm.select %756, %760, %759 : !llvm.i1, !llvm.i64 + %762 = llvm.mul %761, %60 : !llvm.i64 + %763 = llvm.add %508, %762 : !llvm.i64 + %764 = llvm.add %763, %45 : !llvm.i64 + %765 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %766 = llvm.mlir.constant(0 : index) : !llvm.i64 + %767 = llvm.mlir.constant(256 : index) : !llvm.i64 + %768 = llvm.mul %764, %767 : !llvm.i64 + %769 = llvm.add %766, %768 : !llvm.i64 + %770 = llvm.mlir.constant(2 : index) : !llvm.i64 + %771 = llvm.mul %516, %770 : !llvm.i64 + %772 = llvm.add %769, %771 : !llvm.i64 + %773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %774 = llvm.mul %530, %773 : !llvm.i64 + %775 = llvm.add %772, %774 : !llvm.i64 + %776 = llvm.getelementptr %765[%775] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %754, %776 : !llvm.ptr> + %777 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %779 = llvm.mlir.constant(16 : index) : 
!llvm.i64 + %780 = llvm.mul %67, %779 : !llvm.i64 + %781 = llvm.add %778, %780 : !llvm.i64 + %782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %783 = llvm.mul %61, %782 : !llvm.i64 + %784 = llvm.add %781, %783 : !llvm.i64 + %785 = llvm.getelementptr %777[%784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %786 = llvm.load %785 : !llvm.ptr> + %787 = llvm.add %155, %46 : !llvm.i64 + %788 = llvm.icmp "slt" %787, %67 : !llvm.i64 + %789 = llvm.sub %64, %787 : !llvm.i64 + %790 = llvm.select %788, %789, %787 : !llvm.i1, !llvm.i64 + %791 = llvm.sdiv %790, %68 : !llvm.i64 + %792 = llvm.sub %64, %791 : !llvm.i64 + %793 = llvm.select %788, %792, %791 : !llvm.i1, !llvm.i64 + %794 = llvm.srem %793, %68 : !llvm.i64 + %795 = llvm.icmp "slt" %794, %67 : !llvm.i64 + %796 = llvm.add %794, %68 : !llvm.i64 + %797 = llvm.select %795, %796, %794 : !llvm.i1, !llvm.i64 + %798 = llvm.mul %793, %65 : !llvm.i64 + %799 = llvm.add %566, %798 : !llvm.i64 + %800 = llvm.add %799, %61 : !llvm.i64 + %801 = llvm.icmp "slt" %800, %67 : !llvm.i64 + %802 = llvm.sub %64, %800 : !llvm.i64 + %803 = llvm.select %801, %802, %800 : !llvm.i1, !llvm.i64 + %804 = llvm.sdiv %803, %63 : !llvm.i64 + %805 = llvm.sub %64, %804 : !llvm.i64 + %806 = llvm.select %801, %805, %804 : !llvm.i1, !llvm.i64 + %807 = llvm.mul %806, %65 : !llvm.i64 + %808 = llvm.add %799, %807 : !llvm.i64 + %809 = llvm.add %808, %61 : !llvm.i64 + %810 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %811 = llvm.mlir.constant(0 : index) : !llvm.i64 + %812 = llvm.mlir.constant(256 : index) : !llvm.i64 + %813 = llvm.mul %797, %812 : !llvm.i64 + %814 = llvm.add %811, %813 : !llvm.i64 + %815 = llvm.mlir.constant(2 : index) : !llvm.i64 + %816 = llvm.mul %516, %815 : !llvm.i64 + %817 = llvm.add %814, %816 : !llvm.i64 + %818 = llvm.mlir.constant(1 : index) : !llvm.i64 + %819 = llvm.mul %809, %818 : !llvm.i64 + %820 = llvm.add %817, %819 : !llvm.i64 + %821 = llvm.getelementptr %810[%820] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %786, %821 : !llvm.ptr> + %822 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %823 = llvm.mlir.constant(0 : index) : !llvm.i64 + %824 = llvm.mlir.constant(16 : index) : !llvm.i64 + %825 = llvm.mul %67, %824 : !llvm.i64 + %826 = llvm.add %823, %825 : !llvm.i64 + %827 = llvm.mlir.constant(1 : index) : !llvm.i64 + %828 = llvm.mul %70, %827 : !llvm.i64 + %829 = llvm.add %826, %828 : !llvm.i64 + %830 = llvm.getelementptr %822[%829] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %831 = llvm.load %830 : !llvm.ptr> + %832 = llvm.add %508, %48 : !llvm.i64 + %833 = llvm.icmp "slt" %832, %67 : !llvm.i64 + %834 = llvm.sub %64, %832 : !llvm.i64 + %835 = llvm.select %833, %834, %832 : !llvm.i1, !llvm.i64 + %836 = llvm.sdiv %835, %68 : !llvm.i64 + %837 = llvm.sub %64, %836 : !llvm.i64 + %838 = llvm.select %833, %837, %836 : !llvm.i1, !llvm.i64 + %839 = llvm.mul %838, %60 : !llvm.i64 + %840 = llvm.add %508, %839 : !llvm.i64 + %841 = llvm.add %840, %48 : !llvm.i64 + %842 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %843 = llvm.mlir.constant(0 : index) : !llvm.i64 + %844 = llvm.mlir.constant(256 : index) : !llvm.i64 + %845 = llvm.mul %841, %844 : !llvm.i64 + %846 = llvm.add %843, %845 : !llvm.i64 + %847 = llvm.mlir.constant(2 : index) : !llvm.i64 + %848 = llvm.mul %516, %847 : !llvm.i64 + %849 = llvm.add %846, %848 : !llvm.i64 + %850 = llvm.mlir.constant(1 : index) : !llvm.i64 + %851 = llvm.mul %530, %850 : !llvm.i64 + %852 
= llvm.add %849, %851 : !llvm.i64 + %853 = llvm.getelementptr %842[%852] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %831, %853 : !llvm.ptr> + %854 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %855 = llvm.mlir.constant(0 : index) : !llvm.i64 + %856 = llvm.mlir.constant(16 : index) : !llvm.i64 + %857 = llvm.mul %67, %856 : !llvm.i64 + %858 = llvm.add %855, %857 : !llvm.i64 + %859 = llvm.mlir.constant(1 : index) : !llvm.i64 + %860 = llvm.mul %50, %859 : !llvm.i64 + %861 = llvm.add %858, %860 : !llvm.i64 + %862 = llvm.getelementptr %854[%861] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %863 = llvm.load %862 : !llvm.ptr> + %864 = llvm.add %155, %49 : !llvm.i64 + %865 = llvm.icmp "slt" %864, %67 : !llvm.i64 + %866 = llvm.sub %64, %864 : !llvm.i64 + %867 = llvm.select %865, %866, %864 : !llvm.i1, !llvm.i64 + %868 = llvm.sdiv %867, %68 : !llvm.i64 + %869 = llvm.sub %64, %868 : !llvm.i64 + %870 = llvm.select %865, %869, %868 : !llvm.i1, !llvm.i64 + %871 = llvm.srem %870, %68 : !llvm.i64 + %872 = llvm.icmp "slt" %871, %67 : !llvm.i64 + %873 = llvm.add %871, %68 : !llvm.i64 + %874 = llvm.select %872, %873, %871 : !llvm.i1, !llvm.i64 + %875 = llvm.mul %870, %65 : !llvm.i64 + %876 = llvm.add %566, %875 : !llvm.i64 + %877 = llvm.add %876, %50 : !llvm.i64 + %878 = llvm.icmp "slt" %877, %67 : !llvm.i64 + %879 = llvm.sub %64, %877 : !llvm.i64 + %880 = llvm.select %878, %879, %877 : !llvm.i1, !llvm.i64 + %881 = llvm.sdiv %880, %63 : !llvm.i64 + %882 = llvm.sub %64, %881 : !llvm.i64 + %883 = llvm.select %878, %882, %881 : !llvm.i1, !llvm.i64 + %884 = llvm.mul %883, %65 : !llvm.i64 + %885 = llvm.add %876, %884 : !llvm.i64 + %886 = llvm.add %885, %50 : !llvm.i64 + %887 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %888 = llvm.mlir.constant(0 : index) : !llvm.i64 + %889 = llvm.mlir.constant(256 : index) : !llvm.i64 + %890 = llvm.mul %874, %889 : !llvm.i64 + %891 = llvm.add %888, %890 : !llvm.i64 + %892 = llvm.mlir.constant(2 : index) : !llvm.i64 + %893 = llvm.mul %516, %892 : !llvm.i64 + %894 = llvm.add %891, %893 : !llvm.i64 + %895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %896 = llvm.mul %886, %895 : !llvm.i64 + %897 = llvm.add %894, %896 : !llvm.i64 + %898 = llvm.getelementptr %887[%897] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %863, %898 : !llvm.ptr> + %899 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %901 = llvm.mlir.constant(16 : index) : !llvm.i64 + %902 = llvm.mul %67, %901 : !llvm.i64 + %903 = llvm.add %900, %902 : !llvm.i64 + %904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %905 = llvm.mul %33, %904 : !llvm.i64 + %906 = llvm.add %903, %905 : !llvm.i64 + %907 = llvm.getelementptr %899[%906] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %908 = llvm.load %907 : !llvm.ptr> + %909 = llvm.add %508, %52 : !llvm.i64 + %910 = llvm.icmp "slt" %909, %67 : !llvm.i64 + %911 = llvm.sub %64, %909 : !llvm.i64 + %912 = llvm.select %910, %911, %909 : !llvm.i1, !llvm.i64 + %913 = llvm.sdiv %912, %68 : !llvm.i64 + %914 = llvm.sub %64, %913 : !llvm.i64 + %915 = llvm.select %910, %914, %913 : !llvm.i1, !llvm.i64 + %916 = llvm.mul %915, %60 : !llvm.i64 + %917 = llvm.add %508, %916 : !llvm.i64 + %918 = llvm.add %917, %52 : !llvm.i64 + %919 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %921 = 
llvm.mlir.constant(256 : index) : !llvm.i64 + %922 = llvm.mul %918, %921 : !llvm.i64 + %923 = llvm.add %920, %922 : !llvm.i64 + %924 = llvm.mlir.constant(2 : index) : !llvm.i64 + %925 = llvm.mul %516, %924 : !llvm.i64 + %926 = llvm.add %923, %925 : !llvm.i64 + %927 = llvm.mlir.constant(1 : index) : !llvm.i64 + %928 = llvm.mul %530, %927 : !llvm.i64 + %929 = llvm.add %926, %928 : !llvm.i64 + %930 = llvm.getelementptr %919[%929] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %908, %930 : !llvm.ptr> + %931 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %932 = llvm.mlir.constant(0 : index) : !llvm.i64 + %933 = llvm.mlir.constant(16 : index) : !llvm.i64 + %934 = llvm.mul %67, %933 : !llvm.i64 + %935 = llvm.add %932, %934 : !llvm.i64 + %936 = llvm.mlir.constant(1 : index) : !llvm.i64 + %937 = llvm.mul %54, %936 : !llvm.i64 + %938 = llvm.add %935, %937 : !llvm.i64 + %939 = llvm.getelementptr %931[%938] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %940 = llvm.load %939 : !llvm.ptr> + %941 = llvm.add %155, %53 : !llvm.i64 + %942 = llvm.icmp "slt" %941, %67 : !llvm.i64 + %943 = llvm.sub %64, %941 : !llvm.i64 + %944 = llvm.select %942, %943, %941 : !llvm.i1, !llvm.i64 + %945 = llvm.sdiv %944, %68 : !llvm.i64 + %946 = llvm.sub %64, %945 : !llvm.i64 + %947 = llvm.select %942, %946, %945 : !llvm.i1, !llvm.i64 + %948 = llvm.srem %947, %68 : !llvm.i64 + %949 = llvm.icmp "slt" %948, %67 : !llvm.i64 + %950 = llvm.add %948, %68 : !llvm.i64 + %951 = llvm.select %949, %950, %948 : !llvm.i1, !llvm.i64 + %952 = llvm.mul %947, %65 : !llvm.i64 + %953 = llvm.add %566, %952 : !llvm.i64 + %954 = llvm.add %953, %54 : !llvm.i64 + %955 = llvm.icmp "slt" %954, %67 : !llvm.i64 + %956 = llvm.sub %64, %954 : !llvm.i64 + %957 = llvm.select %955, %956, %954 : !llvm.i1, !llvm.i64 + %958 = llvm.sdiv %957, %63 : !llvm.i64 + %959 = llvm.sub %64, %958 : !llvm.i64 + %960 = llvm.select %955, %959, %958 : !llvm.i1, !llvm.i64 + %961 = llvm.mul %960, %65 : !llvm.i64 + %962 = llvm.add %953, %961 : !llvm.i64 + %963 = llvm.add %962, %54 : !llvm.i64 + %964 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %966 = llvm.mlir.constant(256 : index) : !llvm.i64 + %967 = llvm.mul %951, %966 : !llvm.i64 + %968 = llvm.add %965, %967 : !llvm.i64 + %969 = llvm.mlir.constant(2 : index) : !llvm.i64 + %970 = llvm.mul %516, %969 : !llvm.i64 + %971 = llvm.add %968, %970 : !llvm.i64 + %972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %973 = llvm.mul %963, %972 : !llvm.i64 + %974 = llvm.add %971, %973 : !llvm.i64 + %975 = llvm.getelementptr %964[%974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %940, %975 : !llvm.ptr> + %976 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %977 = llvm.mlir.constant(0 : index) : !llvm.i64 + %978 = llvm.mlir.constant(16 : index) : !llvm.i64 + %979 = llvm.mul %67, %978 : !llvm.i64 + %980 = llvm.add %977, %979 : !llvm.i64 + %981 = llvm.mlir.constant(1 : index) : !llvm.i64 + %982 = llvm.mul %34, %981 : !llvm.i64 + %983 = llvm.add %980, %982 : !llvm.i64 + %984 = llvm.getelementptr %976[%983] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %985 = llvm.load %984 : !llvm.ptr> + %986 = llvm.add %508, %56 : !llvm.i64 + %987 = llvm.icmp "slt" %986, %67 : !llvm.i64 + %988 = llvm.sub %64, %986 : !llvm.i64 + %989 = llvm.select %987, %988, %986 : !llvm.i1, !llvm.i64 + %990 = llvm.sdiv %989, %68 : !llvm.i64 + %991 = llvm.sub %64, %990 : !llvm.i64 
+ %992 = llvm.select %987, %991, %990 : !llvm.i1, !llvm.i64 + %993 = llvm.mul %992, %60 : !llvm.i64 + %994 = llvm.add %508, %993 : !llvm.i64 + %995 = llvm.add %994, %56 : !llvm.i64 + %996 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %997 = llvm.mlir.constant(0 : index) : !llvm.i64 + %998 = llvm.mlir.constant(256 : index) : !llvm.i64 + %999 = llvm.mul %995, %998 : !llvm.i64 + %1000 = llvm.add %997, %999 : !llvm.i64 + %1001 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1002 = llvm.mul %516, %1001 : !llvm.i64 + %1003 = llvm.add %1000, %1002 : !llvm.i64 + %1004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1005 = llvm.mul %530, %1004 : !llvm.i64 + %1006 = llvm.add %1003, %1005 : !llvm.i64 + %1007 = llvm.getelementptr %996[%1006] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %985, %1007 : !llvm.ptr> + %1008 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1010 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1011 = llvm.mul %67, %1010 : !llvm.i64 + %1012 = llvm.add %1009, %1011 : !llvm.i64 + %1013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1014 = llvm.mul %58, %1013 : !llvm.i64 + %1015 = llvm.add %1012, %1014 : !llvm.i64 + %1016 = llvm.getelementptr %1008[%1015] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1017 = llvm.load %1016 : !llvm.ptr> + %1018 = llvm.add %155, %57 : !llvm.i64 + %1019 = llvm.icmp "slt" %1018, %67 : !llvm.i64 + %1020 = llvm.sub %64, %1018 : !llvm.i64 + %1021 = llvm.select %1019, %1020, %1018 : !llvm.i1, !llvm.i64 + %1022 = llvm.sdiv %1021, %68 : !llvm.i64 + %1023 = llvm.sub %64, %1022 : !llvm.i64 + %1024 = llvm.select %1019, %1023, %1022 : !llvm.i1, !llvm.i64 + %1025 = llvm.srem %1024, %68 : !llvm.i64 + %1026 = llvm.icmp "slt" %1025, %67 : !llvm.i64 + %1027 = llvm.add %1025, %68 : !llvm.i64 + %1028 = llvm.select %1026, %1027, %1025 : !llvm.i1, !llvm.i64 + %1029 = llvm.mul %1024, %65 : !llvm.i64 + %1030 = llvm.add %566, %1029 : !llvm.i64 + %1031 = llvm.add %1030, %58 : !llvm.i64 + %1032 = llvm.icmp "slt" %1031, %67 : !llvm.i64 + %1033 = llvm.sub %64, %1031 : !llvm.i64 + %1034 = llvm.select %1032, %1033, %1031 : !llvm.i1, !llvm.i64 + %1035 = llvm.sdiv %1034, %63 : !llvm.i64 + %1036 = llvm.sub %64, %1035 : !llvm.i64 + %1037 = llvm.select %1032, %1036, %1035 : !llvm.i1, !llvm.i64 + %1038 = llvm.mul %1037, %65 : !llvm.i64 + %1039 = llvm.add %1030, %1038 : !llvm.i64 + %1040 = llvm.add %1039, %58 : !llvm.i64 + %1041 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1042 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1043 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1044 = llvm.mul %1028, %1043 : !llvm.i64 + %1045 = llvm.add %1042, %1044 : !llvm.i64 + %1046 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1047 = llvm.mul %516, %1046 : !llvm.i64 + %1048 = llvm.add %1045, %1047 : !llvm.i64 + %1049 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1050 = llvm.mul %1040, %1049 : !llvm.i64 + %1051 = llvm.add %1048, %1050 : !llvm.i64 + %1052 = llvm.getelementptr %1041[%1051] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1017, %1052 : !llvm.ptr> + %1053 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1054 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1055 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1056 = llvm.mul %67, %1055 : !llvm.i64 + %1057 = llvm.add %1054, %1056 : !llvm.i64 + %1058 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %1059 = llvm.mul %35, %1058 : !llvm.i64 + %1060 = llvm.add %1057, %1059 : !llvm.i64 + %1061 = llvm.getelementptr %1053[%1060] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1062 = llvm.load %1061 : !llvm.ptr> + %1063 = llvm.add %508, %61 : !llvm.i64 + %1064 = llvm.icmp "slt" %1063, %67 : !llvm.i64 + %1065 = llvm.sub %64, %1063 : !llvm.i64 + %1066 = llvm.select %1064, %1065, %1063 : !llvm.i1, !llvm.i64 + %1067 = llvm.sdiv %1066, %68 : !llvm.i64 + %1068 = llvm.sub %64, %1067 : !llvm.i64 + %1069 = llvm.select %1064, %1068, %1067 : !llvm.i1, !llvm.i64 + %1070 = llvm.mul %1069, %60 : !llvm.i64 + %1071 = llvm.add %508, %1070 : !llvm.i64 + %1072 = llvm.add %1071, %61 : !llvm.i64 + %1073 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1074 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1075 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1076 = llvm.mul %1072, %1075 : !llvm.i64 + %1077 = llvm.add %1074, %1076 : !llvm.i64 + %1078 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1079 = llvm.mul %516, %1078 : !llvm.i64 + %1080 = llvm.add %1077, %1079 : !llvm.i64 + %1081 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1082 = llvm.mul %530, %1081 : !llvm.i64 + %1083 = llvm.add %1080, %1082 : !llvm.i64 + %1084 = llvm.getelementptr %1073[%1083] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1062, %1084 : !llvm.ptr> + %1085 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1086 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1087 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1088 = llvm.mul %67, %1087 : !llvm.i64 + %1089 = llvm.add %1086, %1088 : !llvm.i64 + %1090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1091 = llvm.mul %66, %1090 : !llvm.i64 + %1092 = llvm.add %1089, %1091 : !llvm.i64 + %1093 = llvm.getelementptr %1085[%1092] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1094 = llvm.load %1093 : !llvm.ptr> + %1095 = llvm.add %155, %62 : !llvm.i64 + %1096 = llvm.icmp "slt" %1095, %67 : !llvm.i64 + %1097 = llvm.sub %64, %1095 : !llvm.i64 + %1098 = llvm.select %1096, %1097, %1095 : !llvm.i1, !llvm.i64 + %1099 = llvm.sdiv %1098, %68 : !llvm.i64 + %1100 = llvm.sub %64, %1099 : !llvm.i64 + %1101 = llvm.select %1096, %1100, %1099 : !llvm.i1, !llvm.i64 + %1102 = llvm.srem %1101, %68 : !llvm.i64 + %1103 = llvm.icmp "slt" %1102, %67 : !llvm.i64 + %1104 = llvm.add %1102, %68 : !llvm.i64 + %1105 = llvm.select %1103, %1104, %1102 : !llvm.i1, !llvm.i64 + %1106 = llvm.mul %1101, %65 : !llvm.i64 + %1107 = llvm.add %566, %1106 : !llvm.i64 + %1108 = llvm.add %1107, %66 : !llvm.i64 + %1109 = llvm.icmp "slt" %1108, %67 : !llvm.i64 + %1110 = llvm.sub %64, %1108 : !llvm.i64 + %1111 = llvm.select %1109, %1110, %1108 : !llvm.i1, !llvm.i64 + %1112 = llvm.sdiv %1111, %63 : !llvm.i64 + %1113 = llvm.sub %64, %1112 : !llvm.i64 + %1114 = llvm.select %1109, %1113, %1112 : !llvm.i1, !llvm.i64 + %1115 = llvm.mul %1114, %65 : !llvm.i64 + %1116 = llvm.add %1107, %1115 : !llvm.i64 + %1117 = llvm.add %1116, %66 : !llvm.i64 + %1118 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1119 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1120 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1121 = llvm.mul %1105, %1120 : !llvm.i64 + %1122 = llvm.add %1119, %1121 : !llvm.i64 + %1123 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1124 = llvm.mul %516, %1123 : !llvm.i64 + %1125 = llvm.add %1122, %1124 : !llvm.i64 + %1126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1127 = llvm.mul %1117, 
%1126 : !llvm.i64 + %1128 = llvm.add %1125, %1127 : !llvm.i64 + %1129 = llvm.getelementptr %1118[%1128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1094, %1129 : !llvm.ptr> + llvm.br ^bb9 + ^bb8: // pred: ^bb6 + %1130 = llvm.add %151, %155 : !llvm.i64 + %1131 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1134 = llvm.mul %153, %1133 : !llvm.i64 + %1135 = llvm.add %1132, %1134 : !llvm.i64 + %1136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1137 = llvm.mul %1130, %1136 : !llvm.i64 + %1138 = llvm.add %1135, %1137 : !llvm.i64 + %1139 = llvm.getelementptr %1131[%1138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1140 = llvm.bitcast %1139 : !llvm.ptr to !llvm.ptr> + %1141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1143 = llvm.trunc %1130 : !llvm.i64 to !llvm.i32 + %1144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1145 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1146 = llvm.insertelement %1143, %1144[%1145 : !llvm.i32] : !llvm.vec<8 x i32> + %1147 = llvm.shufflevector %1146, %1144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1148 = llvm.add %1147, %1142 : !llvm.vec<8 x i32> + %1149 = llvm.trunc %1141 : !llvm.i64 to !llvm.i32 + %1150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1152 = llvm.insertelement %1149, %1150[%1151 : !llvm.i32] : !llvm.vec<8 x i32> + %1153 = llvm.shufflevector %1152, %1150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1154 = llvm.icmp "slt" %1148, %1153 : !llvm.vec<8 x i32> + %1155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1156 = llvm.intr.masked.load %1140, %1154, %1155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1157 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1159 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1160 = llvm.mul %67, %1159 : !llvm.i64 + %1161 = llvm.add %1158, %1160 : !llvm.i64 + %1162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1163 = llvm.mul %67, %1162 : !llvm.i64 + %1164 = llvm.add %1161, %1163 : !llvm.i64 + %1165 = llvm.getelementptr %1157[%1164] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1156, %1165 : !llvm.ptr> + %1166 = llvm.add %1130, %70 : !llvm.i64 + %1167 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1169 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1170 = llvm.mul %153, %1169 : !llvm.i64 + %1171 = llvm.add %1168, %1170 : !llvm.i64 + %1172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1173 = llvm.mul %1166, %1172 : !llvm.i64 + %1174 = llvm.add %1171, %1173 : !llvm.i64 + %1175 = llvm.getelementptr %1167[%1174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1176 = llvm.bitcast %1175 : !llvm.ptr to !llvm.ptr> + %1177 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1178 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1179 = llvm.trunc %1166 : !llvm.i64 to !llvm.i32 + %1180 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1181 = llvm.mlir.constant(0 
: i32) : !llvm.i32 + %1182 = llvm.insertelement %1179, %1180[%1181 : !llvm.i32] : !llvm.vec<8 x i32> + %1183 = llvm.shufflevector %1182, %1180 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1184 = llvm.add %1183, %1178 : !llvm.vec<8 x i32> + %1185 = llvm.trunc %1177 : !llvm.i64 to !llvm.i32 + %1186 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1187 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1188 = llvm.insertelement %1185, %1186[%1187 : !llvm.i32] : !llvm.vec<8 x i32> + %1189 = llvm.shufflevector %1188, %1186 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1190 = llvm.icmp "slt" %1184, %1189 : !llvm.vec<8 x i32> + %1191 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1192 = llvm.intr.masked.load %1176, %1190, %1191 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1193 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1194 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1195 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1196 = llvm.mul %67, %1195 : !llvm.i64 + %1197 = llvm.add %1194, %1196 : !llvm.i64 + %1198 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1199 = llvm.mul %69, %1198 : !llvm.i64 + %1200 = llvm.add %1197, %1199 : !llvm.i64 + %1201 = llvm.getelementptr %1193[%1200] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1192, %1201 : !llvm.ptr> + %1202 = llvm.add %1130, %68 : !llvm.i64 + %1203 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1204 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1205 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1206 = llvm.mul %153, %1205 : !llvm.i64 + %1207 = llvm.add %1204, %1206 : !llvm.i64 + %1208 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1209 = llvm.mul %1202, %1208 : !llvm.i64 + %1210 = llvm.add %1207, %1209 : !llvm.i64 + %1211 = llvm.getelementptr %1203[%1210] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1212 = llvm.bitcast %1211 : !llvm.ptr to !llvm.ptr> + %1213 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1214 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1215 = llvm.trunc %1202 : !llvm.i64 to !llvm.i32 + %1216 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1217 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1218 = llvm.insertelement %1215, %1216[%1217 : !llvm.i32] : !llvm.vec<8 x i32> + %1219 = llvm.shufflevector %1218, %1216 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1220 = llvm.add %1219, %1214 : !llvm.vec<8 x i32> + %1221 = llvm.trunc %1213 : !llvm.i64 to !llvm.i32 + %1222 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1223 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1224 = llvm.insertelement %1221, %1222[%1223 : !llvm.i32] : !llvm.vec<8 x i32> + %1225 = llvm.shufflevector %1224, %1222 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1226 = llvm.icmp "slt" %1220, %1225 : !llvm.vec<8 x i32> + %1227 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1228 = llvm.intr.masked.load %1212, %1226, %1227 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1229 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1230 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %1231 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1232 = llvm.mul %67, %1231 : !llvm.i64 + %1233 = llvm.add %1230, %1232 : !llvm.i64 + %1234 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1235 = llvm.mul %63, %1234 : !llvm.i64 + %1236 = llvm.add %1233, %1235 : !llvm.i64 + %1237 = llvm.getelementptr %1229[%1236] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1228, %1237 : !llvm.ptr> + %1238 = llvm.add %1130, %41 : !llvm.i64 + %1239 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1242 = llvm.mul %153, %1241 : !llvm.i64 + %1243 = llvm.add %1240, %1242 : !llvm.i64 + %1244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1245 = llvm.mul %1238, %1244 : !llvm.i64 + %1246 = llvm.add %1243, %1245 : !llvm.i64 + %1247 = llvm.getelementptr %1239[%1246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1248 = llvm.bitcast %1247 : !llvm.ptr to !llvm.ptr> + %1249 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1250 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1251 = llvm.trunc %1238 : !llvm.i64 to !llvm.i32 + %1252 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1253 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1254 = llvm.insertelement %1251, %1252[%1253 : !llvm.i32] : !llvm.vec<8 x i32> + %1255 = llvm.shufflevector %1254, %1252 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1256 = llvm.add %1255, %1250 : !llvm.vec<8 x i32> + %1257 = llvm.trunc %1249 : !llvm.i64 to !llvm.i32 + %1258 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1259 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1260 = llvm.insertelement %1257, %1258[%1259 : !llvm.i32] : !llvm.vec<8 x i32> + %1261 = llvm.shufflevector %1260, %1258 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1262 = llvm.icmp "slt" %1256, %1261 : !llvm.vec<8 x i32> + %1263 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1264 = llvm.intr.masked.load %1248, %1262, %1263 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1265 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1266 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1267 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1268 = llvm.mul %67, %1267 : !llvm.i64 + %1269 = llvm.add %1266, %1268 : !llvm.i64 + %1270 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1271 = llvm.mul %45, %1270 : !llvm.i64 + %1272 = llvm.add %1269, %1271 : !llvm.i64 + %1273 = llvm.getelementptr %1265[%1272] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1264, %1273 : !llvm.ptr> + %1274 = llvm.add %1130, %42 : !llvm.i64 + %1275 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1276 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1277 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1278 = llvm.mul %153, %1277 : !llvm.i64 + %1279 = llvm.add %1276, %1278 : !llvm.i64 + %1280 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1281 = llvm.mul %1274, %1280 : !llvm.i64 + %1282 = llvm.add %1279, %1281 : !llvm.i64 + %1283 = llvm.getelementptr %1275[%1282] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1284 = llvm.bitcast %1283 : !llvm.ptr to !llvm.ptr> + %1285 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1286 
= llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1287 = llvm.trunc %1274 : !llvm.i64 to !llvm.i32 + %1288 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1289 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1290 = llvm.insertelement %1287, %1288[%1289 : !llvm.i32] : !llvm.vec<8 x i32> + %1291 = llvm.shufflevector %1290, %1288 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1292 = llvm.add %1291, %1286 : !llvm.vec<8 x i32> + %1293 = llvm.trunc %1285 : !llvm.i64 to !llvm.i32 + %1294 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1295 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1296 = llvm.insertelement %1293, %1294[%1295 : !llvm.i32] : !llvm.vec<8 x i32> + %1297 = llvm.shufflevector %1296, %1294 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1298 = llvm.icmp "slt" %1292, %1297 : !llvm.vec<8 x i32> + %1299 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1300 = llvm.intr.masked.load %1284, %1298, %1299 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1301 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1302 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1303 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1304 = llvm.mul %67, %1303 : !llvm.i64 + %1305 = llvm.add %1302, %1304 : !llvm.i64 + %1306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1307 = llvm.mul %48, %1306 : !llvm.i64 + %1308 = llvm.add %1305, %1307 : !llvm.i64 + %1309 = llvm.getelementptr %1301[%1308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1300, %1309 : !llvm.ptr> + %1310 = llvm.add %1130, %43 : !llvm.i64 + %1311 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1314 = llvm.mul %153, %1313 : !llvm.i64 + %1315 = llvm.add %1312, %1314 : !llvm.i64 + %1316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1317 = llvm.mul %1310, %1316 : !llvm.i64 + %1318 = llvm.add %1315, %1317 : !llvm.i64 + %1319 = llvm.getelementptr %1311[%1318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1320 = llvm.bitcast %1319 : !llvm.ptr to !llvm.ptr> + %1321 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1322 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1323 = llvm.trunc %1310 : !llvm.i64 to !llvm.i32 + %1324 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1325 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1326 = llvm.insertelement %1323, %1324[%1325 : !llvm.i32] : !llvm.vec<8 x i32> + %1327 = llvm.shufflevector %1326, %1324 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1328 = llvm.add %1327, %1322 : !llvm.vec<8 x i32> + %1329 = llvm.trunc %1321 : !llvm.i64 to !llvm.i32 + %1330 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1331 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1332 = llvm.insertelement %1329, %1330[%1331 : !llvm.i32] : !llvm.vec<8 x i32> + %1333 = llvm.shufflevector %1332, %1330 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1334 = llvm.icmp "slt" %1328, %1333 : !llvm.vec<8 x i32> + %1335 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1336 = llvm.intr.masked.load %1320, 
%1334, %1335 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1340 = llvm.mul %67, %1339 : !llvm.i64 + %1341 = llvm.add %1338, %1340 : !llvm.i64 + %1342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1343 = llvm.mul %52, %1342 : !llvm.i64 + %1344 = llvm.add %1341, %1343 : !llvm.i64 + %1345 = llvm.getelementptr %1337[%1344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1336, %1345 : !llvm.ptr> + %1346 = llvm.add %1130, %44 : !llvm.i64 + %1347 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1350 = llvm.mul %153, %1349 : !llvm.i64 + %1351 = llvm.add %1348, %1350 : !llvm.i64 + %1352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1353 = llvm.mul %1346, %1352 : !llvm.i64 + %1354 = llvm.add %1351, %1353 : !llvm.i64 + %1355 = llvm.getelementptr %1347[%1354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1356 = llvm.bitcast %1355 : !llvm.ptr to !llvm.ptr> + %1357 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1358 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1359 = llvm.trunc %1346 : !llvm.i64 to !llvm.i32 + %1360 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1361 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1362 = llvm.insertelement %1359, %1360[%1361 : !llvm.i32] : !llvm.vec<8 x i32> + %1363 = llvm.shufflevector %1362, %1360 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1364 = llvm.add %1363, %1358 : !llvm.vec<8 x i32> + %1365 = llvm.trunc %1357 : !llvm.i64 to !llvm.i32 + %1366 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1367 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1368 = llvm.insertelement %1365, %1366[%1367 : !llvm.i32] : !llvm.vec<8 x i32> + %1369 = llvm.shufflevector %1368, %1366 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1370 = llvm.icmp "slt" %1364, %1369 : !llvm.vec<8 x i32> + %1371 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1372 = llvm.intr.masked.load %1356, %1370, %1371 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1373 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1375 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1376 = llvm.mul %67, %1375 : !llvm.i64 + %1377 = llvm.add %1374, %1376 : !llvm.i64 + %1378 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1379 = llvm.mul %56, %1378 : !llvm.i64 + %1380 = llvm.add %1377, %1379 : !llvm.i64 + %1381 = llvm.getelementptr %1373[%1380] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1372, %1381 : !llvm.ptr> + %1382 = llvm.add %1130, %46 : !llvm.i64 + %1383 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1384 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1385 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1386 = llvm.mul %153, %1385 : !llvm.i64 + %1387 = llvm.add %1384, %1386 : !llvm.i64 + %1388 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1389 = llvm.mul %1382, %1388 : !llvm.i64 + %1390 = 
llvm.add %1387, %1389 : !llvm.i64 + %1391 = llvm.getelementptr %1383[%1390] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1392 = llvm.bitcast %1391 : !llvm.ptr to !llvm.ptr> + %1393 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1394 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1395 = llvm.trunc %1382 : !llvm.i64 to !llvm.i32 + %1396 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1397 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1398 = llvm.insertelement %1395, %1396[%1397 : !llvm.i32] : !llvm.vec<8 x i32> + %1399 = llvm.shufflevector %1398, %1396 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1400 = llvm.add %1399, %1394 : !llvm.vec<8 x i32> + %1401 = llvm.trunc %1393 : !llvm.i64 to !llvm.i32 + %1402 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1403 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1404 = llvm.insertelement %1401, %1402[%1403 : !llvm.i32] : !llvm.vec<8 x i32> + %1405 = llvm.shufflevector %1404, %1402 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1406 = llvm.icmp "slt" %1400, %1405 : !llvm.vec<8 x i32> + %1407 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1408 = llvm.intr.masked.load %1392, %1406, %1407 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1409 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1410 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1411 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1412 = llvm.mul %67, %1411 : !llvm.i64 + %1413 = llvm.add %1410, %1412 : !llvm.i64 + %1414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1415 = llvm.mul %61, %1414 : !llvm.i64 + %1416 = llvm.add %1413, %1415 : !llvm.i64 + %1417 = llvm.getelementptr %1409[%1416] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1408, %1417 : !llvm.ptr> + %1418 = llvm.add %1130, %47 : !llvm.i64 + %1419 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1422 = llvm.mul %153, %1421 : !llvm.i64 + %1423 = llvm.add %1420, %1422 : !llvm.i64 + %1424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1425 = llvm.mul %1418, %1424 : !llvm.i64 + %1426 = llvm.add %1423, %1425 : !llvm.i64 + %1427 = llvm.getelementptr %1419[%1426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1428 = llvm.bitcast %1427 : !llvm.ptr to !llvm.ptr> + %1429 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1430 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1431 = llvm.trunc %1418 : !llvm.i64 to !llvm.i32 + %1432 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1433 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1434 = llvm.insertelement %1431, %1432[%1433 : !llvm.i32] : !llvm.vec<8 x i32> + %1435 = llvm.shufflevector %1434, %1432 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1436 = llvm.add %1435, %1430 : !llvm.vec<8 x i32> + %1437 = llvm.trunc %1429 : !llvm.i64 to !llvm.i32 + %1438 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1439 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1440 = llvm.insertelement %1437, %1438[%1439 : !llvm.i32] : !llvm.vec<8 x i32> + %1441 = llvm.shufflevector %1440, %1438 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : 
!llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1442 = llvm.icmp "slt" %1436, %1441 : !llvm.vec<8 x i32> + %1443 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1444 = llvm.intr.masked.load %1428, %1442, %1443 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1445 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1446 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1447 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1448 = llvm.mul %67, %1447 : !llvm.i64 + %1449 = llvm.add %1446, %1448 : !llvm.i64 + %1450 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1451 = llvm.mul %70, %1450 : !llvm.i64 + %1452 = llvm.add %1449, %1451 : !llvm.i64 + %1453 = llvm.getelementptr %1445[%1452] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1444, %1453 : !llvm.ptr> + %1454 = llvm.add %1130, %49 : !llvm.i64 + %1455 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1456 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1457 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1458 = llvm.mul %153, %1457 : !llvm.i64 + %1459 = llvm.add %1456, %1458 : !llvm.i64 + %1460 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1461 = llvm.mul %1454, %1460 : !llvm.i64 + %1462 = llvm.add %1459, %1461 : !llvm.i64 + %1463 = llvm.getelementptr %1455[%1462] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1464 = llvm.bitcast %1463 : !llvm.ptr to !llvm.ptr> + %1465 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1466 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1467 = llvm.trunc %1454 : !llvm.i64 to !llvm.i32 + %1468 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1469 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1470 = llvm.insertelement %1467, %1468[%1469 : !llvm.i32] : !llvm.vec<8 x i32> + %1471 = llvm.shufflevector %1470, %1468 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1472 = llvm.add %1471, %1466 : !llvm.vec<8 x i32> + %1473 = llvm.trunc %1465 : !llvm.i64 to !llvm.i32 + %1474 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1475 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1476 = llvm.insertelement %1473, %1474[%1475 : !llvm.i32] : !llvm.vec<8 x i32> + %1477 = llvm.shufflevector %1476, %1474 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1478 = llvm.icmp "slt" %1472, %1477 : !llvm.vec<8 x i32> + %1479 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1480 = llvm.intr.masked.load %1464, %1478, %1479 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1481 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1482 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1483 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1484 = llvm.mul %67, %1483 : !llvm.i64 + %1485 = llvm.add %1482, %1484 : !llvm.i64 + %1486 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1487 = llvm.mul %50, %1486 : !llvm.i64 + %1488 = llvm.add %1485, %1487 : !llvm.i64 + %1489 = llvm.getelementptr %1481[%1488] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1480, %1489 : !llvm.ptr> + %1490 = llvm.add %1130, %51 : !llvm.i64 + %1491 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1492 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1493 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %1494 = llvm.mul %153, %1493 : !llvm.i64 + %1495 = llvm.add %1492, %1494 : !llvm.i64 + %1496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1497 = llvm.mul %1490, %1496 : !llvm.i64 + %1498 = llvm.add %1495, %1497 : !llvm.i64 + %1499 = llvm.getelementptr %1491[%1498] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1500 = llvm.bitcast %1499 : !llvm.ptr to !llvm.ptr> + %1501 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1502 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1503 = llvm.trunc %1490 : !llvm.i64 to !llvm.i32 + %1504 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1505 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1506 = llvm.insertelement %1503, %1504[%1505 : !llvm.i32] : !llvm.vec<8 x i32> + %1507 = llvm.shufflevector %1506, %1504 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1508 = llvm.add %1507, %1502 : !llvm.vec<8 x i32> + %1509 = llvm.trunc %1501 : !llvm.i64 to !llvm.i32 + %1510 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1511 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1512 = llvm.insertelement %1509, %1510[%1511 : !llvm.i32] : !llvm.vec<8 x i32> + %1513 = llvm.shufflevector %1512, %1510 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1514 = llvm.icmp "slt" %1508, %1513 : !llvm.vec<8 x i32> + %1515 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1516 = llvm.intr.masked.load %1500, %1514, %1515 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1517 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1519 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1520 = llvm.mul %67, %1519 : !llvm.i64 + %1521 = llvm.add %1518, %1520 : !llvm.i64 + %1522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1523 = llvm.mul %33, %1522 : !llvm.i64 + %1524 = llvm.add %1521, %1523 : !llvm.i64 + %1525 = llvm.getelementptr %1517[%1524] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1516, %1525 : !llvm.ptr> + %1526 = llvm.add %1130, %53 : !llvm.i64 + %1527 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1528 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1529 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1530 = llvm.mul %153, %1529 : !llvm.i64 + %1531 = llvm.add %1528, %1530 : !llvm.i64 + %1532 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1533 = llvm.mul %1526, %1532 : !llvm.i64 + %1534 = llvm.add %1531, %1533 : !llvm.i64 + %1535 = llvm.getelementptr %1527[%1534] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1536 = llvm.bitcast %1535 : !llvm.ptr to !llvm.ptr> + %1537 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1538 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1539 = llvm.trunc %1526 : !llvm.i64 to !llvm.i32 + %1540 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1541 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1542 = llvm.insertelement %1539, %1540[%1541 : !llvm.i32] : !llvm.vec<8 x i32> + %1543 = llvm.shufflevector %1542, %1540 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1544 = llvm.add %1543, %1538 : !llvm.vec<8 x i32> + %1545 = llvm.trunc %1537 : !llvm.i64 to !llvm.i32 + %1546 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1547 = 
llvm.mlir.constant(0 : i32) : !llvm.i32 + %1548 = llvm.insertelement %1545, %1546[%1547 : !llvm.i32] : !llvm.vec<8 x i32> + %1549 = llvm.shufflevector %1548, %1546 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1550 = llvm.icmp "slt" %1544, %1549 : !llvm.vec<8 x i32> + %1551 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1552 = llvm.intr.masked.load %1536, %1550, %1551 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1553 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1554 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1555 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1556 = llvm.mul %67, %1555 : !llvm.i64 + %1557 = llvm.add %1554, %1556 : !llvm.i64 + %1558 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1559 = llvm.mul %54, %1558 : !llvm.i64 + %1560 = llvm.add %1557, %1559 : !llvm.i64 + %1561 = llvm.getelementptr %1553[%1560] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1552, %1561 : !llvm.ptr> + %1562 = llvm.add %1130, %55 : !llvm.i64 + %1563 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1564 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1565 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1566 = llvm.mul %153, %1565 : !llvm.i64 + %1567 = llvm.add %1564, %1566 : !llvm.i64 + %1568 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1569 = llvm.mul %1562, %1568 : !llvm.i64 + %1570 = llvm.add %1567, %1569 : !llvm.i64 + %1571 = llvm.getelementptr %1563[%1570] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1572 = llvm.bitcast %1571 : !llvm.ptr to !llvm.ptr> + %1573 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1574 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1575 = llvm.trunc %1562 : !llvm.i64 to !llvm.i32 + %1576 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1577 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1578 = llvm.insertelement %1575, %1576[%1577 : !llvm.i32] : !llvm.vec<8 x i32> + %1579 = llvm.shufflevector %1578, %1576 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1580 = llvm.add %1579, %1574 : !llvm.vec<8 x i32> + %1581 = llvm.trunc %1573 : !llvm.i64 to !llvm.i32 + %1582 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1583 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1584 = llvm.insertelement %1581, %1582[%1583 : !llvm.i32] : !llvm.vec<8 x i32> + %1585 = llvm.shufflevector %1584, %1582 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1586 = llvm.icmp "slt" %1580, %1585 : !llvm.vec<8 x i32> + %1587 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1588 = llvm.intr.masked.load %1572, %1586, %1587 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1589 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1591 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1592 = llvm.mul %67, %1591 : !llvm.i64 + %1593 = llvm.add %1590, %1592 : !llvm.i64 + %1594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1595 = llvm.mul %34, %1594 : !llvm.i64 + %1596 = llvm.add %1593, %1595 : !llvm.i64 + %1597 = llvm.getelementptr %1589[%1596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store 
%1588, %1597 : !llvm.ptr> + %1598 = llvm.add %1130, %57 : !llvm.i64 + %1599 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1602 = llvm.mul %153, %1601 : !llvm.i64 + %1603 = llvm.add %1600, %1602 : !llvm.i64 + %1604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1605 = llvm.mul %1598, %1604 : !llvm.i64 + %1606 = llvm.add %1603, %1605 : !llvm.i64 + %1607 = llvm.getelementptr %1599[%1606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1608 = llvm.bitcast %1607 : !llvm.ptr to !llvm.ptr> + %1609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1610 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1611 = llvm.trunc %1598 : !llvm.i64 to !llvm.i32 + %1612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1614 = llvm.insertelement %1611, %1612[%1613 : !llvm.i32] : !llvm.vec<8 x i32> + %1615 = llvm.shufflevector %1614, %1612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1616 = llvm.add %1615, %1610 : !llvm.vec<8 x i32> + %1617 = llvm.trunc %1609 : !llvm.i64 to !llvm.i32 + %1618 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1620 = llvm.insertelement %1617, %1618[%1619 : !llvm.i32] : !llvm.vec<8 x i32> + %1621 = llvm.shufflevector %1620, %1618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1622 = llvm.icmp "slt" %1616, %1621 : !llvm.vec<8 x i32> + %1623 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1624 = llvm.intr.masked.load %1608, %1622, %1623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1625 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1626 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1627 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1628 = llvm.mul %67, %1627 : !llvm.i64 + %1629 = llvm.add %1626, %1628 : !llvm.i64 + %1630 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1631 = llvm.mul %58, %1630 : !llvm.i64 + %1632 = llvm.add %1629, %1631 : !llvm.i64 + %1633 = llvm.getelementptr %1625[%1632] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1624, %1633 : !llvm.ptr> + %1634 = llvm.add %1130, %59 : !llvm.i64 + %1635 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1637 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1638 = llvm.mul %153, %1637 : !llvm.i64 + %1639 = llvm.add %1636, %1638 : !llvm.i64 + %1640 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1641 = llvm.mul %1634, %1640 : !llvm.i64 + %1642 = llvm.add %1639, %1641 : !llvm.i64 + %1643 = llvm.getelementptr %1635[%1642] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1644 = llvm.bitcast %1643 : !llvm.ptr to !llvm.ptr> + %1645 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1646 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1647 = llvm.trunc %1634 : !llvm.i64 to !llvm.i32 + %1648 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1649 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1650 = llvm.insertelement %1647, %1648[%1649 : !llvm.i32] : !llvm.vec<8 x i32> + %1651 = llvm.shufflevector %1650, %1648 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : 
i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1652 = llvm.add %1651, %1646 : !llvm.vec<8 x i32> + %1653 = llvm.trunc %1645 : !llvm.i64 to !llvm.i32 + %1654 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1655 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1656 = llvm.insertelement %1653, %1654[%1655 : !llvm.i32] : !llvm.vec<8 x i32> + %1657 = llvm.shufflevector %1656, %1654 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1658 = llvm.icmp "slt" %1652, %1657 : !llvm.vec<8 x i32> + %1659 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1660 = llvm.intr.masked.load %1644, %1658, %1659 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1661 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1662 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1663 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1664 = llvm.mul %67, %1663 : !llvm.i64 + %1665 = llvm.add %1662, %1664 : !llvm.i64 + %1666 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1667 = llvm.mul %35, %1666 : !llvm.i64 + %1668 = llvm.add %1665, %1667 : !llvm.i64 + %1669 = llvm.getelementptr %1661[%1668] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1660, %1669 : !llvm.ptr> + %1670 = llvm.add %1130, %62 : !llvm.i64 + %1671 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1672 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1673 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1674 = llvm.mul %153, %1673 : !llvm.i64 + %1675 = llvm.add %1672, %1674 : !llvm.i64 + %1676 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1677 = llvm.mul %1670, %1676 : !llvm.i64 + %1678 = llvm.add %1675, %1677 : !llvm.i64 + %1679 = llvm.getelementptr %1671[%1678] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1680 = llvm.bitcast %1679 : !llvm.ptr to !llvm.ptr> + %1681 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1682 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1683 = llvm.trunc %1670 : !llvm.i64 to !llvm.i32 + %1684 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1685 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1686 = llvm.insertelement %1683, %1684[%1685 : !llvm.i32] : !llvm.vec<8 x i32> + %1687 = llvm.shufflevector %1686, %1684 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1688 = llvm.add %1687, %1682 : !llvm.vec<8 x i32> + %1689 = llvm.trunc %1681 : !llvm.i64 to !llvm.i32 + %1690 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1691 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1692 = llvm.insertelement %1689, %1690[%1691 : !llvm.i32] : !llvm.vec<8 x i32> + %1693 = llvm.shufflevector %1692, %1690 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1694 = llvm.icmp "slt" %1688, %1693 : !llvm.vec<8 x i32> + %1695 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1696 = llvm.intr.masked.load %1680, %1694, %1695 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1697 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1698 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1699 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1700 = llvm.mul %67, %1699 : !llvm.i64 + %1701 = llvm.add %1698, %1700 : !llvm.i64 + %1702 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %1703 = llvm.mul %66, %1702 : !llvm.i64 + %1704 = llvm.add %1701, %1703 : !llvm.i64 + %1705 = llvm.getelementptr %1697[%1704] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1696, %1705 : !llvm.ptr> + %1706 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1708 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1709 = llvm.mul %67, %1708 : !llvm.i64 + %1710 = llvm.add %1707, %1709 : !llvm.i64 + %1711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1712 = llvm.mul %67, %1711 : !llvm.i64 + %1713 = llvm.add %1710, %1712 : !llvm.i64 + %1714 = llvm.getelementptr %1706[%1713] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1715 = llvm.load %1714 : !llvm.ptr> + %1716 = llvm.icmp "slt" %155, %67 : !llvm.i64 + %1717 = llvm.sub %64, %155 : !llvm.i64 + %1718 = llvm.select %1716, %1717, %155 : !llvm.i1, !llvm.i64 + %1719 = llvm.sdiv %1718, %68 : !llvm.i64 + %1720 = llvm.sub %64, %1719 : !llvm.i64 + %1721 = llvm.select %1716, %1720, %1719 : !llvm.i1, !llvm.i64 + %1722 = llvm.srem %1721, %68 : !llvm.i64 + %1723 = llvm.icmp "slt" %1722, %67 : !llvm.i64 + %1724 = llvm.add %1722, %68 : !llvm.i64 + %1725 = llvm.select %1723, %1724, %1722 : !llvm.i1, !llvm.i64 + %1726 = llvm.srem %153, %39 : !llvm.i64 + %1727 = llvm.icmp "slt" %1726, %67 : !llvm.i64 + %1728 = llvm.add %1726, %39 : !llvm.i64 + %1729 = llvm.select %1727, %1728, %1726 : !llvm.i1, !llvm.i64 + %1730 = llvm.srem %155, %68 : !llvm.i64 + %1731 = llvm.icmp "slt" %1730, %67 : !llvm.i64 + %1732 = llvm.add %1730, %68 : !llvm.i64 + %1733 = llvm.select %1731, %1732, %1730 : !llvm.i1, !llvm.i64 + %1734 = llvm.icmp "slt" %1733, %67 : !llvm.i64 + %1735 = llvm.sub %64, %1733 : !llvm.i64 + %1736 = llvm.select %1734, %1735, %1733 : !llvm.i1, !llvm.i64 + %1737 = llvm.sdiv %1736, %70 : !llvm.i64 + %1738 = llvm.sub %64, %1737 : !llvm.i64 + %1739 = llvm.select %1734, %1738, %1737 : !llvm.i1, !llvm.i64 + %1740 = llvm.srem %1739, %63 : !llvm.i64 + %1741 = llvm.icmp "slt" %1740, %67 : !llvm.i64 + %1742 = llvm.add %1740, %63 : !llvm.i64 + %1743 = llvm.select %1741, %1742, %1740 : !llvm.i1, !llvm.i64 + %1744 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1746 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1747 = llvm.mul %1725, %1746 : !llvm.i64 + %1748 = llvm.add %1745, %1747 : !llvm.i64 + %1749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1750 = llvm.mul %1729, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %1743, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1744[%1754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1715, %1755 : !llvm.ptr> + %1756 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1759 = llvm.mul %67, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %69, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1765 = llvm.load %1764 : !llvm.ptr> + %1766 = llvm.add %155, %70 : !llvm.i64 + %1767 = llvm.icmp "slt" %1766, %67 : !llvm.i64 + %1768 = llvm.sub 
%64, %1766 : !llvm.i64 + %1769 = llvm.select %1767, %1768, %1766 : !llvm.i1, !llvm.i64 + %1770 = llvm.sdiv %1769, %68 : !llvm.i64 + %1771 = llvm.sub %64, %1770 : !llvm.i64 + %1772 = llvm.select %1767, %1771, %1770 : !llvm.i1, !llvm.i64 + %1773 = llvm.srem %1772, %68 : !llvm.i64 + %1774 = llvm.icmp "slt" %1773, %67 : !llvm.i64 + %1775 = llvm.add %1773, %68 : !llvm.i64 + %1776 = llvm.select %1774, %1775, %1773 : !llvm.i1, !llvm.i64 + %1777 = llvm.sdiv %1718, %70 : !llvm.i64 + %1778 = llvm.sub %64, %1777 : !llvm.i64 + %1779 = llvm.select %1716, %1778, %1777 : !llvm.i1, !llvm.i64 + %1780 = llvm.mul %1772, %65 : !llvm.i64 + %1781 = llvm.add %1779, %1780 : !llvm.i64 + %1782 = llvm.add %1781, %69 : !llvm.i64 + %1783 = llvm.icmp "slt" %1782, %67 : !llvm.i64 + %1784 = llvm.sub %64, %1782 : !llvm.i64 + %1785 = llvm.select %1783, %1784, %1782 : !llvm.i1, !llvm.i64 + %1786 = llvm.sdiv %1785, %63 : !llvm.i64 + %1787 = llvm.sub %64, %1786 : !llvm.i64 + %1788 = llvm.select %1783, %1787, %1786 : !llvm.i1, !llvm.i64 + %1789 = llvm.mul %1788, %65 : !llvm.i64 + %1790 = llvm.add %1781, %1789 : !llvm.i64 + %1791 = llvm.add %1790, %69 : !llvm.i64 + %1792 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1794 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1795 = llvm.mul %1776, %1794 : !llvm.i64 + %1796 = llvm.add %1793, %1795 : !llvm.i64 + %1797 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1798 = llvm.mul %1729, %1797 : !llvm.i64 + %1799 = llvm.add %1796, %1798 : !llvm.i64 + %1800 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1801 = llvm.mul %1791, %1800 : !llvm.i64 + %1802 = llvm.add %1799, %1801 : !llvm.i64 + %1803 = llvm.getelementptr %1792[%1802] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1765, %1803 : !llvm.ptr> + %1804 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1805 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1806 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1807 = llvm.mul %67, %1806 : !llvm.i64 + %1808 = llvm.add %1805, %1807 : !llvm.i64 + %1809 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1810 = llvm.mul %63, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.getelementptr %1804[%1811] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1813 = llvm.load %1812 : !llvm.ptr> + %1814 = llvm.add %1721, %69 : !llvm.i64 + %1815 = llvm.icmp "slt" %1814, %67 : !llvm.i64 + %1816 = llvm.sub %64, %1814 : !llvm.i64 + %1817 = llvm.select %1815, %1816, %1814 : !llvm.i1, !llvm.i64 + %1818 = llvm.sdiv %1817, %68 : !llvm.i64 + %1819 = llvm.sub %64, %1818 : !llvm.i64 + %1820 = llvm.select %1815, %1819, %1818 : !llvm.i1, !llvm.i64 + %1821 = llvm.mul %1820, %60 : !llvm.i64 + %1822 = llvm.add %1721, %1821 : !llvm.i64 + %1823 = llvm.add %1822, %69 : !llvm.i64 + %1824 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1826 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1827 = llvm.mul %1823, %1826 : !llvm.i64 + %1828 = llvm.add %1825, %1827 : !llvm.i64 + %1829 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1830 = llvm.mul %1729, %1829 : !llvm.i64 + %1831 = llvm.add %1828, %1830 : !llvm.i64 + %1832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1833 = llvm.mul %1743, %1832 : !llvm.i64 + %1834 = llvm.add %1831, %1833 : !llvm.i64 + %1835 = llvm.getelementptr %1824[%1834] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1813, %1835 : 
!llvm.ptr> + %1836 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1837 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1838 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1839 = llvm.mul %67, %1838 : !llvm.i64 + %1840 = llvm.add %1837, %1839 : !llvm.i64 + %1841 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1842 = llvm.mul %45, %1841 : !llvm.i64 + %1843 = llvm.add %1840, %1842 : !llvm.i64 + %1844 = llvm.getelementptr %1836[%1843] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1845 = llvm.load %1844 : !llvm.ptr> + %1846 = llvm.add %155, %41 : !llvm.i64 + %1847 = llvm.icmp "slt" %1846, %67 : !llvm.i64 + %1848 = llvm.sub %64, %1846 : !llvm.i64 + %1849 = llvm.select %1847, %1848, %1846 : !llvm.i1, !llvm.i64 + %1850 = llvm.sdiv %1849, %68 : !llvm.i64 + %1851 = llvm.sub %64, %1850 : !llvm.i64 + %1852 = llvm.select %1847, %1851, %1850 : !llvm.i1, !llvm.i64 + %1853 = llvm.srem %1852, %68 : !llvm.i64 + %1854 = llvm.icmp "slt" %1853, %67 : !llvm.i64 + %1855 = llvm.add %1853, %68 : !llvm.i64 + %1856 = llvm.select %1854, %1855, %1853 : !llvm.i1, !llvm.i64 + %1857 = llvm.mul %1852, %65 : !llvm.i64 + %1858 = llvm.add %1779, %1857 : !llvm.i64 + %1859 = llvm.add %1858, %45 : !llvm.i64 + %1860 = llvm.icmp "slt" %1859, %67 : !llvm.i64 + %1861 = llvm.sub %64, %1859 : !llvm.i64 + %1862 = llvm.select %1860, %1861, %1859 : !llvm.i1, !llvm.i64 + %1863 = llvm.sdiv %1862, %63 : !llvm.i64 + %1864 = llvm.sub %64, %1863 : !llvm.i64 + %1865 = llvm.select %1860, %1864, %1863 : !llvm.i1, !llvm.i64 + %1866 = llvm.mul %1865, %65 : !llvm.i64 + %1867 = llvm.add %1858, %1866 : !llvm.i64 + %1868 = llvm.add %1867, %45 : !llvm.i64 + %1869 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1871 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1872 = llvm.mul %1856, %1871 : !llvm.i64 + %1873 = llvm.add %1870, %1872 : !llvm.i64 + %1874 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1875 = llvm.mul %1729, %1874 : !llvm.i64 + %1876 = llvm.add %1873, %1875 : !llvm.i64 + %1877 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1878 = llvm.mul %1868, %1877 : !llvm.i64 + %1879 = llvm.add %1876, %1878 : !llvm.i64 + %1880 = llvm.getelementptr %1869[%1879] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1845, %1880 : !llvm.ptr> + %1881 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1882 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1883 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1884 = llvm.mul %67, %1883 : !llvm.i64 + %1885 = llvm.add %1882, %1884 : !llvm.i64 + %1886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1887 = llvm.mul %48, %1886 : !llvm.i64 + %1888 = llvm.add %1885, %1887 : !llvm.i64 + %1889 = llvm.getelementptr %1881[%1888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1890 = llvm.load %1889 : !llvm.ptr> + %1891 = llvm.add %1721, %63 : !llvm.i64 + %1892 = llvm.icmp "slt" %1891, %67 : !llvm.i64 + %1893 = llvm.sub %64, %1891 : !llvm.i64 + %1894 = llvm.select %1892, %1893, %1891 : !llvm.i1, !llvm.i64 + %1895 = llvm.sdiv %1894, %68 : !llvm.i64 + %1896 = llvm.sub %64, %1895 : !llvm.i64 + %1897 = llvm.select %1892, %1896, %1895 : !llvm.i1, !llvm.i64 + %1898 = llvm.mul %1897, %60 : !llvm.i64 + %1899 = llvm.add %1721, %1898 : !llvm.i64 + %1900 = llvm.add %1899, %63 : !llvm.i64 + %1901 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1903 = 
llvm.mlir.constant(256 : index) : !llvm.i64 + %1904 = llvm.mul %1900, %1903 : !llvm.i64 + %1905 = llvm.add %1902, %1904 : !llvm.i64 + %1906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1907 = llvm.mul %1729, %1906 : !llvm.i64 + %1908 = llvm.add %1905, %1907 : !llvm.i64 + %1909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1910 = llvm.mul %1743, %1909 : !llvm.i64 + %1911 = llvm.add %1908, %1910 : !llvm.i64 + %1912 = llvm.getelementptr %1901[%1911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1890, %1912 : !llvm.ptr> + %1913 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1915 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1916 = llvm.mul %67, %1915 : !llvm.i64 + %1917 = llvm.add %1914, %1916 : !llvm.i64 + %1918 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1919 = llvm.mul %52, %1918 : !llvm.i64 + %1920 = llvm.add %1917, %1919 : !llvm.i64 + %1921 = llvm.getelementptr %1913[%1920] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1922 = llvm.load %1921 : !llvm.ptr> + %1923 = llvm.add %155, %43 : !llvm.i64 + %1924 = llvm.icmp "slt" %1923, %67 : !llvm.i64 + %1925 = llvm.sub %64, %1923 : !llvm.i64 + %1926 = llvm.select %1924, %1925, %1923 : !llvm.i1, !llvm.i64 + %1927 = llvm.sdiv %1926, %68 : !llvm.i64 + %1928 = llvm.sub %64, %1927 : !llvm.i64 + %1929 = llvm.select %1924, %1928, %1927 : !llvm.i1, !llvm.i64 + %1930 = llvm.srem %1929, %68 : !llvm.i64 + %1931 = llvm.icmp "slt" %1930, %67 : !llvm.i64 + %1932 = llvm.add %1930, %68 : !llvm.i64 + %1933 = llvm.select %1931, %1932, %1930 : !llvm.i1, !llvm.i64 + %1934 = llvm.mul %1929, %65 : !llvm.i64 + %1935 = llvm.add %1779, %1934 : !llvm.i64 + %1936 = llvm.add %1935, %52 : !llvm.i64 + %1937 = llvm.icmp "slt" %1936, %67 : !llvm.i64 + %1938 = llvm.sub %64, %1936 : !llvm.i64 + %1939 = llvm.select %1937, %1938, %1936 : !llvm.i1, !llvm.i64 + %1940 = llvm.sdiv %1939, %63 : !llvm.i64 + %1941 = llvm.sub %64, %1940 : !llvm.i64 + %1942 = llvm.select %1937, %1941, %1940 : !llvm.i1, !llvm.i64 + %1943 = llvm.mul %1942, %65 : !llvm.i64 + %1944 = llvm.add %1935, %1943 : !llvm.i64 + %1945 = llvm.add %1944, %52 : !llvm.i64 + %1946 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1949 = llvm.mul %1933, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1952 = llvm.mul %1729, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1955 = llvm.mul %1945, %1954 : !llvm.i64 + %1956 = llvm.add %1953, %1955 : !llvm.i64 + %1957 = llvm.getelementptr %1946[%1956] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1922, %1957 : !llvm.ptr> + %1958 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1960 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1961 = llvm.mul %67, %1960 : !llvm.i64 + %1962 = llvm.add %1959, %1961 : !llvm.i64 + %1963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1964 = llvm.mul %56, %1963 : !llvm.i64 + %1965 = llvm.add %1962, %1964 : !llvm.i64 + %1966 = llvm.getelementptr %1958[%1965] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1967 = llvm.load %1966 : !llvm.ptr> + %1968 = llvm.add %1721, %45 : !llvm.i64 + %1969 = llvm.icmp "slt" %1968, %67 : !llvm.i64 + %1970 = llvm.sub %64, 
%1968 : !llvm.i64 + %1971 = llvm.select %1969, %1970, %1968 : !llvm.i1, !llvm.i64 + %1972 = llvm.sdiv %1971, %68 : !llvm.i64 + %1973 = llvm.sub %64, %1972 : !llvm.i64 + %1974 = llvm.select %1969, %1973, %1972 : !llvm.i1, !llvm.i64 + %1975 = llvm.mul %1974, %60 : !llvm.i64 + %1976 = llvm.add %1721, %1975 : !llvm.i64 + %1977 = llvm.add %1976, %45 : !llvm.i64 + %1978 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1980 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1981 = llvm.mul %1977, %1980 : !llvm.i64 + %1982 = llvm.add %1979, %1981 : !llvm.i64 + %1983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1984 = llvm.mul %1729, %1983 : !llvm.i64 + %1985 = llvm.add %1982, %1984 : !llvm.i64 + %1986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1987 = llvm.mul %1743, %1986 : !llvm.i64 + %1988 = llvm.add %1985, %1987 : !llvm.i64 + %1989 = llvm.getelementptr %1978[%1988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1967, %1989 : !llvm.ptr> + %1990 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1992 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1993 = llvm.mul %67, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1996 = llvm.mul %61, %1995 : !llvm.i64 + %1997 = llvm.add %1994, %1996 : !llvm.i64 + %1998 = llvm.getelementptr %1990[%1997] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1999 = llvm.load %1998 : !llvm.ptr> + %2000 = llvm.add %155, %46 : !llvm.i64 + %2001 = llvm.icmp "slt" %2000, %67 : !llvm.i64 + %2002 = llvm.sub %64, %2000 : !llvm.i64 + %2003 = llvm.select %2001, %2002, %2000 : !llvm.i1, !llvm.i64 + %2004 = llvm.sdiv %2003, %68 : !llvm.i64 + %2005 = llvm.sub %64, %2004 : !llvm.i64 + %2006 = llvm.select %2001, %2005, %2004 : !llvm.i1, !llvm.i64 + %2007 = llvm.srem %2006, %68 : !llvm.i64 + %2008 = llvm.icmp "slt" %2007, %67 : !llvm.i64 + %2009 = llvm.add %2007, %68 : !llvm.i64 + %2010 = llvm.select %2008, %2009, %2007 : !llvm.i1, !llvm.i64 + %2011 = llvm.mul %2006, %65 : !llvm.i64 + %2012 = llvm.add %1779, %2011 : !llvm.i64 + %2013 = llvm.add %2012, %61 : !llvm.i64 + %2014 = llvm.icmp "slt" %2013, %67 : !llvm.i64 + %2015 = llvm.sub %64, %2013 : !llvm.i64 + %2016 = llvm.select %2014, %2015, %2013 : !llvm.i1, !llvm.i64 + %2017 = llvm.sdiv %2016, %63 : !llvm.i64 + %2018 = llvm.sub %64, %2017 : !llvm.i64 + %2019 = llvm.select %2014, %2018, %2017 : !llvm.i1, !llvm.i64 + %2020 = llvm.mul %2019, %65 : !llvm.i64 + %2021 = llvm.add %2012, %2020 : !llvm.i64 + %2022 = llvm.add %2021, %61 : !llvm.i64 + %2023 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2024 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2025 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2026 = llvm.mul %2010, %2025 : !llvm.i64 + %2027 = llvm.add %2024, %2026 : !llvm.i64 + %2028 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2029 = llvm.mul %1729, %2028 : !llvm.i64 + %2030 = llvm.add %2027, %2029 : !llvm.i64 + %2031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2032 = llvm.mul %2022, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.getelementptr %2023[%2033] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1999, %2034 : !llvm.ptr> + %2035 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2036 = llvm.mlir.constant(0 : index) : 
!llvm.i64 + %2037 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2038 = llvm.mul %67, %2037 : !llvm.i64 + %2039 = llvm.add %2036, %2038 : !llvm.i64 + %2040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2041 = llvm.mul %70, %2040 : !llvm.i64 + %2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.getelementptr %2035[%2042] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2044 = llvm.load %2043 : !llvm.ptr> + %2045 = llvm.add %1721, %48 : !llvm.i64 + %2046 = llvm.icmp "slt" %2045, %67 : !llvm.i64 + %2047 = llvm.sub %64, %2045 : !llvm.i64 + %2048 = llvm.select %2046, %2047, %2045 : !llvm.i1, !llvm.i64 + %2049 = llvm.sdiv %2048, %68 : !llvm.i64 + %2050 = llvm.sub %64, %2049 : !llvm.i64 + %2051 = llvm.select %2046, %2050, %2049 : !llvm.i1, !llvm.i64 + %2052 = llvm.mul %2051, %60 : !llvm.i64 + %2053 = llvm.add %1721, %2052 : !llvm.i64 + %2054 = llvm.add %2053, %48 : !llvm.i64 + %2055 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2056 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2057 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2058 = llvm.mul %2054, %2057 : !llvm.i64 + %2059 = llvm.add %2056, %2058 : !llvm.i64 + %2060 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2061 = llvm.mul %1729, %2060 : !llvm.i64 + %2062 = llvm.add %2059, %2061 : !llvm.i64 + %2063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2064 = llvm.mul %1743, %2063 : !llvm.i64 + %2065 = llvm.add %2062, %2064 : !llvm.i64 + %2066 = llvm.getelementptr %2055[%2065] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2044, %2066 : !llvm.ptr> + %2067 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2070 = llvm.mul %67, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %50, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2076 = llvm.load %2075 : !llvm.ptr> + %2077 = llvm.add %155, %49 : !llvm.i64 + %2078 = llvm.icmp "slt" %2077, %67 : !llvm.i64 + %2079 = llvm.sub %64, %2077 : !llvm.i64 + %2080 = llvm.select %2078, %2079, %2077 : !llvm.i1, !llvm.i64 + %2081 = llvm.sdiv %2080, %68 : !llvm.i64 + %2082 = llvm.sub %64, %2081 : !llvm.i64 + %2083 = llvm.select %2078, %2082, %2081 : !llvm.i1, !llvm.i64 + %2084 = llvm.srem %2083, %68 : !llvm.i64 + %2085 = llvm.icmp "slt" %2084, %67 : !llvm.i64 + %2086 = llvm.add %2084, %68 : !llvm.i64 + %2087 = llvm.select %2085, %2086, %2084 : !llvm.i1, !llvm.i64 + %2088 = llvm.mul %2083, %65 : !llvm.i64 + %2089 = llvm.add %1779, %2088 : !llvm.i64 + %2090 = llvm.add %2089, %50 : !llvm.i64 + %2091 = llvm.icmp "slt" %2090, %67 : !llvm.i64 + %2092 = llvm.sub %64, %2090 : !llvm.i64 + %2093 = llvm.select %2091, %2092, %2090 : !llvm.i1, !llvm.i64 + %2094 = llvm.sdiv %2093, %63 : !llvm.i64 + %2095 = llvm.sub %64, %2094 : !llvm.i64 + %2096 = llvm.select %2091, %2095, %2094 : !llvm.i1, !llvm.i64 + %2097 = llvm.mul %2096, %65 : !llvm.i64 + %2098 = llvm.add %2089, %2097 : !llvm.i64 + %2099 = llvm.add %2098, %50 : !llvm.i64 + %2100 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2101 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2102 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2103 = llvm.mul %2087, %2102 : !llvm.i64 + %2104 = llvm.add %2101, %2103 : !llvm.i64 + %2105 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %2106 = llvm.mul %1729, %2105 : !llvm.i64 + %2107 = llvm.add %2104, %2106 : !llvm.i64 + %2108 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2109 = llvm.mul %2099, %2108 : !llvm.i64 + %2110 = llvm.add %2107, %2109 : !llvm.i64 + %2111 = llvm.getelementptr %2100[%2110] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2076, %2111 : !llvm.ptr> + %2112 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2114 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2115 = llvm.mul %67, %2114 : !llvm.i64 + %2116 = llvm.add %2113, %2115 : !llvm.i64 + %2117 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2118 = llvm.mul %33, %2117 : !llvm.i64 + %2119 = llvm.add %2116, %2118 : !llvm.i64 + %2120 = llvm.getelementptr %2112[%2119] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2121 = llvm.load %2120 : !llvm.ptr> + %2122 = llvm.add %1721, %52 : !llvm.i64 + %2123 = llvm.icmp "slt" %2122, %67 : !llvm.i64 + %2124 = llvm.sub %64, %2122 : !llvm.i64 + %2125 = llvm.select %2123, %2124, %2122 : !llvm.i1, !llvm.i64 + %2126 = llvm.sdiv %2125, %68 : !llvm.i64 + %2127 = llvm.sub %64, %2126 : !llvm.i64 + %2128 = llvm.select %2123, %2127, %2126 : !llvm.i1, !llvm.i64 + %2129 = llvm.mul %2128, %60 : !llvm.i64 + %2130 = llvm.add %1721, %2129 : !llvm.i64 + %2131 = llvm.add %2130, %52 : !llvm.i64 + %2132 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2134 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2135 = llvm.mul %2131, %2134 : !llvm.i64 + %2136 = llvm.add %2133, %2135 : !llvm.i64 + %2137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2138 = llvm.mul %1729, %2137 : !llvm.i64 + %2139 = llvm.add %2136, %2138 : !llvm.i64 + %2140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2141 = llvm.mul %1743, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.getelementptr %2132[%2142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2121, %2143 : !llvm.ptr> + %2144 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2145 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2146 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2147 = llvm.mul %67, %2146 : !llvm.i64 + %2148 = llvm.add %2145, %2147 : !llvm.i64 + %2149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2150 = llvm.mul %54, %2149 : !llvm.i64 + %2151 = llvm.add %2148, %2150 : !llvm.i64 + %2152 = llvm.getelementptr %2144[%2151] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2153 = llvm.load %2152 : !llvm.ptr> + %2154 = llvm.add %155, %53 : !llvm.i64 + %2155 = llvm.icmp "slt" %2154, %67 : !llvm.i64 + %2156 = llvm.sub %64, %2154 : !llvm.i64 + %2157 = llvm.select %2155, %2156, %2154 : !llvm.i1, !llvm.i64 + %2158 = llvm.sdiv %2157, %68 : !llvm.i64 + %2159 = llvm.sub %64, %2158 : !llvm.i64 + %2160 = llvm.select %2155, %2159, %2158 : !llvm.i1, !llvm.i64 + %2161 = llvm.srem %2160, %68 : !llvm.i64 + %2162 = llvm.icmp "slt" %2161, %67 : !llvm.i64 + %2163 = llvm.add %2161, %68 : !llvm.i64 + %2164 = llvm.select %2162, %2163, %2161 : !llvm.i1, !llvm.i64 + %2165 = llvm.mul %2160, %65 : !llvm.i64 + %2166 = llvm.add %1779, %2165 : !llvm.i64 + %2167 = llvm.add %2166, %54 : !llvm.i64 + %2168 = llvm.icmp "slt" %2167, %67 : !llvm.i64 + %2169 = llvm.sub %64, %2167 : !llvm.i64 + %2170 = llvm.select %2168, %2169, %2167 : !llvm.i1, !llvm.i64 + %2171 = llvm.sdiv %2170, %63 : !llvm.i64 + %2172 = llvm.sub 
%64, %2171 : !llvm.i64 + %2173 = llvm.select %2168, %2172, %2171 : !llvm.i1, !llvm.i64 + %2174 = llvm.mul %2173, %65 : !llvm.i64 + %2175 = llvm.add %2166, %2174 : !llvm.i64 + %2176 = llvm.add %2175, %54 : !llvm.i64 + %2177 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2180 = llvm.mul %2164, %2179 : !llvm.i64 + %2181 = llvm.add %2178, %2180 : !llvm.i64 + %2182 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2183 = llvm.mul %1729, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2186 = llvm.mul %2176, %2185 : !llvm.i64 + %2187 = llvm.add %2184, %2186 : !llvm.i64 + %2188 = llvm.getelementptr %2177[%2187] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2153, %2188 : !llvm.ptr> + %2189 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2191 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2192 = llvm.mul %67, %2191 : !llvm.i64 + %2193 = llvm.add %2190, %2192 : !llvm.i64 + %2194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2195 = llvm.mul %34, %2194 : !llvm.i64 + %2196 = llvm.add %2193, %2195 : !llvm.i64 + %2197 = llvm.getelementptr %2189[%2196] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2198 = llvm.load %2197 : !llvm.ptr> + %2199 = llvm.add %1721, %56 : !llvm.i64 + %2200 = llvm.icmp "slt" %2199, %67 : !llvm.i64 + %2201 = llvm.sub %64, %2199 : !llvm.i64 + %2202 = llvm.select %2200, %2201, %2199 : !llvm.i1, !llvm.i64 + %2203 = llvm.sdiv %2202, %68 : !llvm.i64 + %2204 = llvm.sub %64, %2203 : !llvm.i64 + %2205 = llvm.select %2200, %2204, %2203 : !llvm.i1, !llvm.i64 + %2206 = llvm.mul %2205, %60 : !llvm.i64 + %2207 = llvm.add %1721, %2206 : !llvm.i64 + %2208 = llvm.add %2207, %56 : !llvm.i64 + %2209 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2211 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2212 = llvm.mul %2208, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2215 = llvm.mul %1729, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : !llvm.i64 + %2217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2218 = llvm.mul %1743, %2217 : !llvm.i64 + %2219 = llvm.add %2216, %2218 : !llvm.i64 + %2220 = llvm.getelementptr %2209[%2219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2198, %2220 : !llvm.ptr> + %2221 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2223 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2224 = llvm.mul %67, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2227 = llvm.mul %58, %2226 : !llvm.i64 + %2228 = llvm.add %2225, %2227 : !llvm.i64 + %2229 = llvm.getelementptr %2221[%2228] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2230 = llvm.load %2229 : !llvm.ptr> + %2231 = llvm.add %155, %57 : !llvm.i64 + %2232 = llvm.icmp "slt" %2231, %67 : !llvm.i64 + %2233 = llvm.sub %64, %2231 : !llvm.i64 + %2234 = llvm.select %2232, %2233, %2231 : !llvm.i1, !llvm.i64 + %2235 = llvm.sdiv %2234, %68 : !llvm.i64 + %2236 = llvm.sub %64, %2235 : !llvm.i64 + %2237 = llvm.select %2232, %2236, %2235 : !llvm.i1, !llvm.i64 + %2238 = llvm.srem 
%2237, %68 : !llvm.i64 + %2239 = llvm.icmp "slt" %2238, %67 : !llvm.i64 + %2240 = llvm.add %2238, %68 : !llvm.i64 + %2241 = llvm.select %2239, %2240, %2238 : !llvm.i1, !llvm.i64 + %2242 = llvm.mul %2237, %65 : !llvm.i64 + %2243 = llvm.add %1779, %2242 : !llvm.i64 + %2244 = llvm.add %2243, %58 : !llvm.i64 + %2245 = llvm.icmp "slt" %2244, %67 : !llvm.i64 + %2246 = llvm.sub %64, %2244 : !llvm.i64 + %2247 = llvm.select %2245, %2246, %2244 : !llvm.i1, !llvm.i64 + %2248 = llvm.sdiv %2247, %63 : !llvm.i64 + %2249 = llvm.sub %64, %2248 : !llvm.i64 + %2250 = llvm.select %2245, %2249, %2248 : !llvm.i1, !llvm.i64 + %2251 = llvm.mul %2250, %65 : !llvm.i64 + %2252 = llvm.add %2243, %2251 : !llvm.i64 + %2253 = llvm.add %2252, %58 : !llvm.i64 + %2254 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2256 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2257 = llvm.mul %2241, %2256 : !llvm.i64 + %2258 = llvm.add %2255, %2257 : !llvm.i64 + %2259 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2260 = llvm.mul %1729, %2259 : !llvm.i64 + %2261 = llvm.add %2258, %2260 : !llvm.i64 + %2262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2263 = llvm.mul %2253, %2262 : !llvm.i64 + %2264 = llvm.add %2261, %2263 : !llvm.i64 + %2265 = llvm.getelementptr %2254[%2264] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2230, %2265 : !llvm.ptr> + %2266 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2268 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2269 = llvm.mul %67, %2268 : !llvm.i64 + %2270 = llvm.add %2267, %2269 : !llvm.i64 + %2271 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2272 = llvm.mul %35, %2271 : !llvm.i64 + %2273 = llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.getelementptr %2266[%2273] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2275 = llvm.load %2274 : !llvm.ptr> + %2276 = llvm.add %1721, %61 : !llvm.i64 + %2277 = llvm.icmp "slt" %2276, %67 : !llvm.i64 + %2278 = llvm.sub %64, %2276 : !llvm.i64 + %2279 = llvm.select %2277, %2278, %2276 : !llvm.i1, !llvm.i64 + %2280 = llvm.sdiv %2279, %68 : !llvm.i64 + %2281 = llvm.sub %64, %2280 : !llvm.i64 + %2282 = llvm.select %2277, %2281, %2280 : !llvm.i1, !llvm.i64 + %2283 = llvm.mul %2282, %60 : !llvm.i64 + %2284 = llvm.add %1721, %2283 : !llvm.i64 + %2285 = llvm.add %2284, %61 : !llvm.i64 + %2286 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2288 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2289 = llvm.mul %2285, %2288 : !llvm.i64 + %2290 = llvm.add %2287, %2289 : !llvm.i64 + %2291 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2292 = llvm.mul %1729, %2291 : !llvm.i64 + %2293 = llvm.add %2290, %2292 : !llvm.i64 + %2294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2295 = llvm.mul %1743, %2294 : !llvm.i64 + %2296 = llvm.add %2293, %2295 : !llvm.i64 + %2297 = llvm.getelementptr %2286[%2296] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2275, %2297 : !llvm.ptr> + %2298 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2300 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2301 = llvm.mul %67, %2300 : !llvm.i64 + %2302 = llvm.add %2299, %2301 : !llvm.i64 + %2303 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2304 = llvm.mul %66, %2303 : !llvm.i64 + %2305 = 
llvm.add %2302, %2304 : !llvm.i64 + %2306 = llvm.getelementptr %2298[%2305] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2307 = llvm.load %2306 : !llvm.ptr> + %2308 = llvm.add %155, %62 : !llvm.i64 + %2309 = llvm.icmp "slt" %2308, %67 : !llvm.i64 + %2310 = llvm.sub %64, %2308 : !llvm.i64 + %2311 = llvm.select %2309, %2310, %2308 : !llvm.i1, !llvm.i64 + %2312 = llvm.sdiv %2311, %68 : !llvm.i64 + %2313 = llvm.sub %64, %2312 : !llvm.i64 + %2314 = llvm.select %2309, %2313, %2312 : !llvm.i1, !llvm.i64 + %2315 = llvm.srem %2314, %68 : !llvm.i64 + %2316 = llvm.icmp "slt" %2315, %67 : !llvm.i64 + %2317 = llvm.add %2315, %68 : !llvm.i64 + %2318 = llvm.select %2316, %2317, %2315 : !llvm.i1, !llvm.i64 + %2319 = llvm.mul %2314, %65 : !llvm.i64 + %2320 = llvm.add %1779, %2319 : !llvm.i64 + %2321 = llvm.add %2320, %66 : !llvm.i64 + %2322 = llvm.icmp "slt" %2321, %67 : !llvm.i64 + %2323 = llvm.sub %64, %2321 : !llvm.i64 + %2324 = llvm.select %2322, %2323, %2321 : !llvm.i1, !llvm.i64 + %2325 = llvm.sdiv %2324, %63 : !llvm.i64 + %2326 = llvm.sub %64, %2325 : !llvm.i64 + %2327 = llvm.select %2322, %2326, %2325 : !llvm.i1, !llvm.i64 + %2328 = llvm.mul %2327, %65 : !llvm.i64 + %2329 = llvm.add %2320, %2328 : !llvm.i64 + %2330 = llvm.add %2329, %66 : !llvm.i64 + %2331 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2332 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2333 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2334 = llvm.mul %2318, %2333 : !llvm.i64 + %2335 = llvm.add %2332, %2334 : !llvm.i64 + %2336 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2337 = llvm.mul %1729, %2336 : !llvm.i64 + %2338 = llvm.add %2335, %2337 : !llvm.i64 + %2339 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2340 = llvm.mul %2330, %2339 : !llvm.i64 + %2341 = llvm.add %2338, %2340 : !llvm.i64 + %2342 = llvm.getelementptr %2331[%2341] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2307, %2342 : !llvm.ptr> + llvm.br ^bb9 + ^bb9: // 2 preds: ^bb7, ^bb8 + %2343 = llvm.add %155, %39 : !llvm.i64 + llvm.br ^bb5(%2343 : !llvm.i64) + ^bb10: // pred: ^bb5 + %2344 = llvm.add %153, %69 : !llvm.i64 + llvm.br ^bb3(%2344 : !llvm.i64) + ^bb11: // pred: ^bb3 + llvm.br ^bb12(%67 : !llvm.i64) + ^bb12(%2345: !llvm.i64): // 2 preds: ^bb11, ^bb51 + %2346 = llvm.icmp "slt" %2345, %37 : !llvm.i64 + llvm.cond_br %2346, ^bb13, ^bb52 + ^bb13: // pred: ^bb12 + llvm.br ^bb14(%67 : !llvm.i64) + ^bb14(%2347: !llvm.i64): // 2 preds: ^bb13, ^bb21 + %2348 = llvm.icmp "slt" %2347, %68 : !llvm.i64 + llvm.cond_br %2348, ^bb15, ^bb22 + ^bb15: // pred: ^bb14 + llvm.br ^bb16(%67 : !llvm.i64) + ^bb16(%2349: !llvm.i64): // 2 preds: ^bb15, ^bb20 + %2350 = llvm.icmp "slt" %2349, %56 : !llvm.i64 + llvm.cond_br %2350, ^bb17, ^bb21 + ^bb17: // pred: ^bb16 + llvm.br ^bb18(%67 : !llvm.i64) + ^bb18(%2351: !llvm.i64): // 2 preds: ^bb17, ^bb19 + %2352 = llvm.icmp "slt" %2351, %63 : !llvm.i64 + llvm.cond_br %2352, ^bb19, ^bb20 + ^bb19: // pred: ^bb18 + %2353 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2354 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2355 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2356 = llvm.mul %2347, %2355 : !llvm.i64 + %2357 = llvm.add %2354, %2356 : !llvm.i64 + %2358 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2359 = llvm.mul %2349, %2358 : !llvm.i64 + %2360 = llvm.add %2357, %2359 : !llvm.i64 + %2361 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2362 = llvm.mul %2351, %2361 : !llvm.i64 + %2363 = llvm.add %2360, %2362 : !llvm.i64 + %2364 = 
llvm.getelementptr %2353[%2363] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %32, %2364 : !llvm.ptr> + %2365 = llvm.add %2351, %69 : !llvm.i64 + llvm.br ^bb18(%2365 : !llvm.i64) + ^bb20: // pred: ^bb18 + %2366 = llvm.add %2349, %69 : !llvm.i64 + llvm.br ^bb16(%2366 : !llvm.i64) + ^bb21: // pred: ^bb16 + %2367 = llvm.add %2347, %69 : !llvm.i64 + llvm.br ^bb14(%2367 : !llvm.i64) + ^bb22: // pred: ^bb14 + llvm.br ^bb23(%67 : !llvm.i64) + ^bb23(%2368: !llvm.i64): // 2 preds: ^bb22, ^bb39 + %2369 = llvm.icmp "slt" %2368, %38 : !llvm.i64 + llvm.cond_br %2369, ^bb24, ^bb40 + ^bb24: // pred: ^bb23 + llvm.br ^bb25(%67 : !llvm.i64) + ^bb25(%2370: !llvm.i64): // 2 preds: ^bb24, ^bb38 + %2371 = llvm.icmp "slt" %2370, %39 : !llvm.i64 + llvm.cond_br %2371, ^bb26, ^bb39 + ^bb26: // pred: ^bb25 + llvm.br ^bb27(%67 : !llvm.i64) + ^bb27(%2372: !llvm.i64): // 2 preds: ^bb26, ^bb34 + %2373 = llvm.icmp "slt" %2372, %67 : !llvm.i64 + llvm.cond_br %2373, ^bb28, ^bb35 + ^bb28: // pred: ^bb27 + llvm.br ^bb29(%67 : !llvm.i64) + ^bb29(%2374: !llvm.i64): // 2 preds: ^bb28, ^bb33 + %2375 = llvm.icmp "slt" %2374, %48 : !llvm.i64 + llvm.cond_br %2375, ^bb30, ^bb34 + ^bb30: // pred: ^bb29 + llvm.br ^bb31(%67 : !llvm.i64) + ^bb31(%2376: !llvm.i64): // 2 preds: ^bb30, ^bb32 + %2377 = llvm.icmp "slt" %2376, %67 : !llvm.i64 + llvm.cond_br %2377, ^bb32, ^bb33 + ^bb32: // pred: ^bb31 + %2378 = llvm.add %2345, %2372 : !llvm.i64 + %2379 = llvm.add %2378, %2376 : !llvm.i64 + %2380 = llvm.add %2370, %2374 : !llvm.i64 + %2381 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2383 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2384 = llvm.mul %2379, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2387 = llvm.mul %2380, %2386 : !llvm.i64 + %2388 = llvm.add %2385, %2387 : !llvm.i64 + %2389 = llvm.getelementptr %2381[%2388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2390 = llvm.load %2389 : !llvm.ptr + %2391 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2393 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2394 = llvm.mul %2379, %2393 : !llvm.i64 + %2395 = llvm.add %2392, %2394 : !llvm.i64 + %2396 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2397 = llvm.mul %2380, %2396 : !llvm.i64 + %2398 = llvm.add %2395, %2397 : !llvm.i64 + %2399 = llvm.getelementptr %2391[%2398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2400 = llvm.load %2399 : !llvm.ptr + %2401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2404 = llvm.mul %2379, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2407 = llvm.mul %2380, %2406 : !llvm.i64 + %2408 = llvm.add %2405, %2407 : !llvm.i64 + %2409 = llvm.getelementptr %2401[%2408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2410 = llvm.load %2409 : !llvm.ptr + %2411 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2413 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2414 = llvm.mul %2379, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : !llvm.i64 + %2416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2417 = llvm.mul %2380, %2416 : !llvm.i64 + %2418 
= llvm.add %2415, %2417 : !llvm.i64 + %2419 = llvm.getelementptr %2411[%2418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2420 = llvm.load %2419 : !llvm.ptr + %2421 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2423 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2424 = llvm.mul %2379, %2423 : !llvm.i64 + %2425 = llvm.add %2422, %2424 : !llvm.i64 + %2426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2427 = llvm.mul %2380, %2426 : !llvm.i64 + %2428 = llvm.add %2425, %2427 : !llvm.i64 + %2429 = llvm.getelementptr %2421[%2428] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2430 = llvm.load %2429 : !llvm.ptr + %2431 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2433 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2434 = llvm.mul %2379, %2433 : !llvm.i64 + %2435 = llvm.add %2432, %2434 : !llvm.i64 + %2436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2437 = llvm.mul %2380, %2436 : !llvm.i64 + %2438 = llvm.add %2435, %2437 : !llvm.i64 + %2439 = llvm.getelementptr %2431[%2438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2440 = llvm.load %2439 : !llvm.ptr + %2441 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2443 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2444 = llvm.mul %2379, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2447 = llvm.mul %2380, %2446 : !llvm.i64 + %2448 = llvm.add %2445, %2447 : !llvm.i64 + %2449 = llvm.getelementptr %2441[%2448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2450 = llvm.load %2449 : !llvm.ptr + %2451 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2452 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2453 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2454 = llvm.mul %2379, %2453 : !llvm.i64 + %2455 = llvm.add %2452, %2454 : !llvm.i64 + %2456 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2457 = llvm.mul %2380, %2456 : !llvm.i64 + %2458 = llvm.add %2455, %2457 : !llvm.i64 + %2459 = llvm.getelementptr %2451[%2458] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2460 = llvm.load %2459 : !llvm.ptr + %2461 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %2462 = llvm.sub %64, %2368 : !llvm.i64 + %2463 = llvm.select %2461, %2462, %2368 : !llvm.i1, !llvm.i64 + %2464 = llvm.sdiv %2463, %68 : !llvm.i64 + %2465 = llvm.sub %64, %2464 : !llvm.i64 + %2466 = llvm.select %2461, %2465, %2464 : !llvm.i1, !llvm.i64 + %2467 = llvm.srem %2466, %68 : !llvm.i64 + %2468 = llvm.icmp "slt" %2467, %67 : !llvm.i64 + %2469 = llvm.add %2467, %68 : !llvm.i64 + %2470 = llvm.select %2468, %2469, %2467 : !llvm.i1, !llvm.i64 + %2471 = llvm.srem %2380, %39 : !llvm.i64 + %2472 = llvm.icmp "slt" %2471, %67 : !llvm.i64 + %2473 = llvm.add %2471, %39 : !llvm.i64 + %2474 = llvm.select %2472, %2473, %2471 : !llvm.i1, !llvm.i64 + %2475 = llvm.srem %2368, %68 : !llvm.i64 + %2476 = llvm.icmp "slt" %2475, %67 : !llvm.i64 + %2477 = llvm.add %2475, %68 : !llvm.i64 + %2478 = llvm.select %2476, %2477, %2475 : !llvm.i1, !llvm.i64 + %2479 = llvm.icmp "slt" %2478, %67 : !llvm.i64 + %2480 = llvm.sub %64, %2478 : !llvm.i64 + %2481 = llvm.select %2479, %2480, %2478 : !llvm.i1, !llvm.i64 + %2482 = llvm.sdiv %2481, %70 : !llvm.i64 + %2483 = llvm.sub %64, %2482 : !llvm.i64 + %2484 = llvm.select %2479, %2483, %2482 : !llvm.i1, 
!llvm.i64 + %2485 = llvm.srem %2484, %63 : !llvm.i64 + %2486 = llvm.icmp "slt" %2485, %67 : !llvm.i64 + %2487 = llvm.add %2485, %63 : !llvm.i64 + %2488 = llvm.select %2486, %2487, %2485 : !llvm.i1, !llvm.i64 + %2489 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2491 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2492 = llvm.mul %2470, %2491 : !llvm.i64 + %2493 = llvm.add %2490, %2492 : !llvm.i64 + %2494 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2495 = llvm.mul %2474, %2494 : !llvm.i64 + %2496 = llvm.add %2493, %2495 : !llvm.i64 + %2497 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2498 = llvm.mul %2488, %2497 : !llvm.i64 + %2499 = llvm.add %2496, %2498 : !llvm.i64 + %2500 = llvm.getelementptr %2489[%2499] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2501 = llvm.load %2500 : !llvm.ptr> + %2502 = llvm.extractelement %2501[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2503 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2505 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2506 = llvm.mul %2470, %2505 : !llvm.i64 + %2507 = llvm.add %2504, %2506 : !llvm.i64 + %2508 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2509 = llvm.mul %2474, %2508 : !llvm.i64 + %2510 = llvm.add %2507, %2509 : !llvm.i64 + %2511 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2512 = llvm.mul %2488, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.getelementptr %2503[%2513] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2515 = llvm.load %2514 : !llvm.ptr> + %2516 = llvm.extractelement %2515[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2517 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2519 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2520 = llvm.mul %2470, %2519 : !llvm.i64 + %2521 = llvm.add %2518, %2520 : !llvm.i64 + %2522 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2523 = llvm.mul %2474, %2522 : !llvm.i64 + %2524 = llvm.add %2521, %2523 : !llvm.i64 + %2525 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2526 = llvm.mul %2488, %2525 : !llvm.i64 + %2527 = llvm.add %2524, %2526 : !llvm.i64 + %2528 = llvm.getelementptr %2517[%2527] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2529 = llvm.load %2528 : !llvm.ptr> + %2530 = llvm.extractelement %2529[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2534 = llvm.mul %2470, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2537 = llvm.mul %2474, %2536 : !llvm.i64 + %2538 = llvm.add %2535, %2537 : !llvm.i64 + %2539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2540 = llvm.mul %2488, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.getelementptr %2531[%2541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2543 = llvm.load %2542 : !llvm.ptr> + %2544 = llvm.extractelement %2543[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2545 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2546 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2547 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2548 = llvm.mul %2470, %2547 : !llvm.i64 + 
%2549 = llvm.add %2546, %2548 : !llvm.i64 + %2550 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2551 = llvm.mul %2474, %2550 : !llvm.i64 + %2552 = llvm.add %2549, %2551 : !llvm.i64 + %2553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2554 = llvm.mul %2488, %2553 : !llvm.i64 + %2555 = llvm.add %2552, %2554 : !llvm.i64 + %2556 = llvm.getelementptr %2545[%2555] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2557 = llvm.load %2556 : !llvm.ptr> + %2558 = llvm.extractelement %2557[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2559 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2560 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2561 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2562 = llvm.mul %2470, %2561 : !llvm.i64 + %2563 = llvm.add %2560, %2562 : !llvm.i64 + %2564 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2565 = llvm.mul %2474, %2564 : !llvm.i64 + %2566 = llvm.add %2563, %2565 : !llvm.i64 + %2567 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2568 = llvm.mul %2488, %2567 : !llvm.i64 + %2569 = llvm.add %2566, %2568 : !llvm.i64 + %2570 = llvm.getelementptr %2559[%2569] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2571 = llvm.load %2570 : !llvm.ptr> + %2572 = llvm.extractelement %2571[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2573 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2574 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2575 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2576 = llvm.mul %2470, %2575 : !llvm.i64 + %2577 = llvm.add %2574, %2576 : !llvm.i64 + %2578 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2579 = llvm.mul %2474, %2578 : !llvm.i64 + %2580 = llvm.add %2577, %2579 : !llvm.i64 + %2581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2582 = llvm.mul %2488, %2581 : !llvm.i64 + %2583 = llvm.add %2580, %2582 : !llvm.i64 + %2584 = llvm.getelementptr %2573[%2583] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2585 = llvm.load %2584 : !llvm.ptr> + %2586 = llvm.extractelement %2585[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2587 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2589 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2590 = llvm.mul %2470, %2589 : !llvm.i64 + %2591 = llvm.add %2588, %2590 : !llvm.i64 + %2592 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2593 = llvm.mul %2474, %2592 : !llvm.i64 + %2594 = llvm.add %2591, %2593 : !llvm.i64 + %2595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2596 = llvm.mul %2488, %2595 : !llvm.i64 + %2597 = llvm.add %2594, %2596 : !llvm.i64 + %2598 = llvm.getelementptr %2587[%2597] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2599 = llvm.load %2598 : !llvm.ptr> + %2600 = llvm.extractelement %2599[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2601 = llvm.fmul %2390, %2502 {RelaxedPrecision} : !llvm.float + %2602 = llvm.fmul %2400, %2516 {RelaxedPrecision} : !llvm.float + %2603 = llvm.fmul %2410, %2530 {RelaxedPrecision} : !llvm.float + %2604 = llvm.fmul %2420, %2544 {RelaxedPrecision} : !llvm.float + %2605 = llvm.fmul %2430, %2558 {RelaxedPrecision} : !llvm.float + %2606 = llvm.fmul %2440, %2572 {RelaxedPrecision} : !llvm.float + %2607 = llvm.fmul %2450, %2586 {RelaxedPrecision} : !llvm.float + %2608 = llvm.fmul %2460, %2600 {RelaxedPrecision} : !llvm.float + %2609 = llvm.add %2372, %2376 : !llvm.i64 + %2610 = llvm.srem %2609, %56 : !llvm.i64 + %2611 = llvm.icmp "slt" %2610, %67 : !llvm.i64 + %2612 = llvm.add %2610, %56 : !llvm.i64 + %2613 = 
llvm.select %2611, %2612, %2610 : !llvm.i1, !llvm.i64 + %2614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2617 = llvm.mul %2470, %2616 : !llvm.i64 + %2618 = llvm.add %2615, %2617 : !llvm.i64 + %2619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2620 = llvm.mul %2613, %2619 : !llvm.i64 + %2621 = llvm.add %2618, %2620 : !llvm.i64 + %2622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2623 = llvm.mul %2488, %2622 : !llvm.i64 + %2624 = llvm.add %2621, %2623 : !llvm.i64 + %2625 = llvm.getelementptr %2614[%2624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2626 = llvm.load %2625 : !llvm.ptr> + %2627 = llvm.extractelement %2626[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2628 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2630 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2631 = llvm.mul %2470, %2630 : !llvm.i64 + %2632 = llvm.add %2629, %2631 : !llvm.i64 + %2633 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2634 = llvm.mul %2613, %2633 : !llvm.i64 + %2635 = llvm.add %2632, %2634 : !llvm.i64 + %2636 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2637 = llvm.mul %2488, %2636 : !llvm.i64 + %2638 = llvm.add %2635, %2637 : !llvm.i64 + %2639 = llvm.getelementptr %2628[%2638] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2640 = llvm.load %2639 : !llvm.ptr> + %2641 = llvm.extractelement %2640[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2642 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2643 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2644 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2645 = llvm.mul %2470, %2644 : !llvm.i64 + %2646 = llvm.add %2643, %2645 : !llvm.i64 + %2647 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2648 = llvm.mul %2613, %2647 : !llvm.i64 + %2649 = llvm.add %2646, %2648 : !llvm.i64 + %2650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2651 = llvm.mul %2488, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.getelementptr %2642[%2652] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2654 = llvm.load %2653 : !llvm.ptr> + %2655 = llvm.extractelement %2654[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2656 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2659 = llvm.mul %2470, %2658 : !llvm.i64 + %2660 = llvm.add %2657, %2659 : !llvm.i64 + %2661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2662 = llvm.mul %2613, %2661 : !llvm.i64 + %2663 = llvm.add %2660, %2662 : !llvm.i64 + %2664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2665 = llvm.mul %2488, %2664 : !llvm.i64 + %2666 = llvm.add %2663, %2665 : !llvm.i64 + %2667 = llvm.getelementptr %2656[%2666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2668 = llvm.load %2667 : !llvm.ptr> + %2669 = llvm.extractelement %2668[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2673 = llvm.mul %2470, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2676 = llvm.mul %2613, %2675 : !llvm.i64 + %2677 = llvm.add 
%2674, %2676 : !llvm.i64 + %2678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2679 = llvm.mul %2488, %2678 : !llvm.i64 + %2680 = llvm.add %2677, %2679 : !llvm.i64 + %2681 = llvm.getelementptr %2670[%2680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2682 = llvm.load %2681 : !llvm.ptr> + %2683 = llvm.extractelement %2682[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2684 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2685 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2686 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2687 = llvm.mul %2470, %2686 : !llvm.i64 + %2688 = llvm.add %2685, %2687 : !llvm.i64 + %2689 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2690 = llvm.mul %2613, %2689 : !llvm.i64 + %2691 = llvm.add %2688, %2690 : !llvm.i64 + %2692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2693 = llvm.mul %2488, %2692 : !llvm.i64 + %2694 = llvm.add %2691, %2693 : !llvm.i64 + %2695 = llvm.getelementptr %2684[%2694] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2696 = llvm.load %2695 : !llvm.ptr> + %2697 = llvm.extractelement %2696[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2698 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2701 = llvm.mul %2470, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2704 = llvm.mul %2613, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2707 = llvm.mul %2488, %2706 : !llvm.i64 + %2708 = llvm.add %2705, %2707 : !llvm.i64 + %2709 = llvm.getelementptr %2698[%2708] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2710 = llvm.load %2709 : !llvm.ptr> + %2711 = llvm.extractelement %2710[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2712 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2714 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2715 = llvm.mul %2470, %2714 : !llvm.i64 + %2716 = llvm.add %2713, %2715 : !llvm.i64 + %2717 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2718 = llvm.mul %2613, %2717 : !llvm.i64 + %2719 = llvm.add %2716, %2718 : !llvm.i64 + %2720 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2721 = llvm.mul %2488, %2720 : !llvm.i64 + %2722 = llvm.add %2719, %2721 : !llvm.i64 + %2723 = llvm.getelementptr %2712[%2722] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2724 = llvm.load %2723 : !llvm.ptr> + %2725 = llvm.extractelement %2724[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2726 = llvm.fadd %2627, %2601 {RelaxedPrecision} : !llvm.float + %2727 = llvm.fadd %2641, %2602 {RelaxedPrecision} : !llvm.float + %2728 = llvm.fadd %2655, %2603 {RelaxedPrecision} : !llvm.float + %2729 = llvm.fadd %2669, %2604 {RelaxedPrecision} : !llvm.float + %2730 = llvm.fadd %2683, %2605 {RelaxedPrecision} : !llvm.float + %2731 = llvm.fadd %2697, %2606 {RelaxedPrecision} : !llvm.float + %2732 = llvm.fadd %2711, %2607 {RelaxedPrecision} : !llvm.float + %2733 = llvm.fadd %2725, %2608 {RelaxedPrecision} : !llvm.float + %2734 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2735 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2736 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2737 = llvm.mul %2470, %2736 : !llvm.i64 + %2738 = llvm.add %2735, %2737 : !llvm.i64 + %2739 = llvm.mlir.constant(2 : index) : !llvm.i64 + 
%2740 = llvm.mul %2613, %2739 : !llvm.i64 + %2741 = llvm.add %2738, %2740 : !llvm.i64 + %2742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2743 = llvm.mul %2488, %2742 : !llvm.i64 + %2744 = llvm.add %2741, %2743 : !llvm.i64 + %2745 = llvm.getelementptr %2734[%2744] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2746 = llvm.load %2745 : !llvm.ptr> + %2747 = llvm.insertelement %2726, %2746[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2748 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2750 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2751 = llvm.mul %2470, %2750 : !llvm.i64 + %2752 = llvm.add %2749, %2751 : !llvm.i64 + %2753 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2754 = llvm.mul %2613, %2753 : !llvm.i64 + %2755 = llvm.add %2752, %2754 : !llvm.i64 + %2756 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2757 = llvm.mul %2488, %2756 : !llvm.i64 + %2758 = llvm.add %2755, %2757 : !llvm.i64 + %2759 = llvm.getelementptr %2748[%2758] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2747, %2759 : !llvm.ptr> + %2760 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2761 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2762 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2763 = llvm.mul %2470, %2762 : !llvm.i64 + %2764 = llvm.add %2761, %2763 : !llvm.i64 + %2765 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2766 = llvm.mul %2613, %2765 : !llvm.i64 + %2767 = llvm.add %2764, %2766 : !llvm.i64 + %2768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2769 = llvm.mul %2488, %2768 : !llvm.i64 + %2770 = llvm.add %2767, %2769 : !llvm.i64 + %2771 = llvm.getelementptr %2760[%2770] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2772 = llvm.load %2771 : !llvm.ptr> + %2773 = llvm.insertelement %2727, %2772[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2774 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2775 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2776 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2777 = llvm.mul %2470, %2776 : !llvm.i64 + %2778 = llvm.add %2775, %2777 : !llvm.i64 + %2779 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2780 = llvm.mul %2613, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = llvm.mul %2488, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2774[%2784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2773, %2785 : !llvm.ptr> + %2786 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2787 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2788 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2789 = llvm.mul %2470, %2788 : !llvm.i64 + %2790 = llvm.add %2787, %2789 : !llvm.i64 + %2791 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2792 = llvm.mul %2613, %2791 : !llvm.i64 + %2793 = llvm.add %2790, %2792 : !llvm.i64 + %2794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2795 = llvm.mul %2488, %2794 : !llvm.i64 + %2796 = llvm.add %2793, %2795 : !llvm.i64 + %2797 = llvm.getelementptr %2786[%2796] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2798 = llvm.load %2797 : !llvm.ptr> + %2799 = llvm.insertelement %2728, %2798[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2800 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2802 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %2803 = llvm.mul %2470, %2802 : !llvm.i64 + %2804 = llvm.add %2801, %2803 : !llvm.i64 + %2805 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2806 = llvm.mul %2613, %2805 : !llvm.i64 + %2807 = llvm.add %2804, %2806 : !llvm.i64 + %2808 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2809 = llvm.mul %2488, %2808 : !llvm.i64 + %2810 = llvm.add %2807, %2809 : !llvm.i64 + %2811 = llvm.getelementptr %2800[%2810] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2799, %2811 : !llvm.ptr> + %2812 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2814 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2815 = llvm.mul %2470, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2818 = llvm.mul %2613, %2817 : !llvm.i64 + %2819 = llvm.add %2816, %2818 : !llvm.i64 + %2820 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2821 = llvm.mul %2488, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.getelementptr %2812[%2822] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2824 = llvm.load %2823 : !llvm.ptr> + %2825 = llvm.insertelement %2729, %2824[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2826 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2828 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2829 = llvm.mul %2470, %2828 : !llvm.i64 + %2830 = llvm.add %2827, %2829 : !llvm.i64 + %2831 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2832 = llvm.mul %2613, %2831 : !llvm.i64 + %2833 = llvm.add %2830, %2832 : !llvm.i64 + %2834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2835 = llvm.mul %2488, %2834 : !llvm.i64 + %2836 = llvm.add %2833, %2835 : !llvm.i64 + %2837 = llvm.getelementptr %2826[%2836] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2825, %2837 : !llvm.ptr> + %2838 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2840 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2841 = llvm.mul %2470, %2840 : !llvm.i64 + %2842 = llvm.add %2839, %2841 : !llvm.i64 + %2843 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2844 = llvm.mul %2613, %2843 : !llvm.i64 + %2845 = llvm.add %2842, %2844 : !llvm.i64 + %2846 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2847 = llvm.mul %2488, %2846 : !llvm.i64 + %2848 = llvm.add %2845, %2847 : !llvm.i64 + %2849 = llvm.getelementptr %2838[%2848] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2850 = llvm.load %2849 : !llvm.ptr> + %2851 = llvm.insertelement %2730, %2850[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2852 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2854 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2855 = llvm.mul %2470, %2854 : !llvm.i64 + %2856 = llvm.add %2853, %2855 : !llvm.i64 + %2857 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2858 = llvm.mul %2613, %2857 : !llvm.i64 + %2859 = llvm.add %2856, %2858 : !llvm.i64 + %2860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2861 = llvm.mul %2488, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.getelementptr %2852[%2862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2851, %2863 : !llvm.ptr> + %2864 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, 
array<3 x i64>, array<3 x i64>)> + %2865 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2866 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2867 = llvm.mul %2470, %2866 : !llvm.i64 + %2868 = llvm.add %2865, %2867 : !llvm.i64 + %2869 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2870 = llvm.mul %2613, %2869 : !llvm.i64 + %2871 = llvm.add %2868, %2870 : !llvm.i64 + %2872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2873 = llvm.mul %2488, %2872 : !llvm.i64 + %2874 = llvm.add %2871, %2873 : !llvm.i64 + %2875 = llvm.getelementptr %2864[%2874] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2876 = llvm.load %2875 : !llvm.ptr> + %2877 = llvm.insertelement %2731, %2876[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2878 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2880 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2881 = llvm.mul %2470, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2884 = llvm.mul %2613, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2887 = llvm.mul %2488, %2886 : !llvm.i64 + %2888 = llvm.add %2885, %2887 : !llvm.i64 + %2889 = llvm.getelementptr %2878[%2888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2877, %2889 : !llvm.ptr> + %2890 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2892 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2893 = llvm.mul %2470, %2892 : !llvm.i64 + %2894 = llvm.add %2891, %2893 : !llvm.i64 + %2895 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2896 = llvm.mul %2613, %2895 : !llvm.i64 + %2897 = llvm.add %2894, %2896 : !llvm.i64 + %2898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2899 = llvm.mul %2488, %2898 : !llvm.i64 + %2900 = llvm.add %2897, %2899 : !llvm.i64 + %2901 = llvm.getelementptr %2890[%2900] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2902 = llvm.load %2901 : !llvm.ptr> + %2903 = llvm.insertelement %2732, %2902[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2904 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2905 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2906 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2907 = llvm.mul %2470, %2906 : !llvm.i64 + %2908 = llvm.add %2905, %2907 : !llvm.i64 + %2909 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2910 = llvm.mul %2613, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : !llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %2488, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2904[%2914] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2903, %2915 : !llvm.ptr> + %2916 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2917 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2918 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2919 = llvm.mul %2470, %2918 : !llvm.i64 + %2920 = llvm.add %2917, %2919 : !llvm.i64 + %2921 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2922 = llvm.mul %2613, %2921 : !llvm.i64 + %2923 = llvm.add %2920, %2922 : !llvm.i64 + %2924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2925 = llvm.mul %2488, %2924 : !llvm.i64 + %2926 = llvm.add %2923, %2925 : !llvm.i64 + %2927 = llvm.getelementptr %2916[%2926] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2928 = 
llvm.load %2927 : !llvm.ptr> + %2929 = llvm.insertelement %2733, %2928[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2930 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2931 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2932 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2933 = llvm.mul %2470, %2932 : !llvm.i64 + %2934 = llvm.add %2931, %2933 : !llvm.i64 + %2935 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2936 = llvm.mul %2613, %2935 : !llvm.i64 + %2937 = llvm.add %2934, %2936 : !llvm.i64 + %2938 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2939 = llvm.mul %2488, %2938 : !llvm.i64 + %2940 = llvm.add %2937, %2939 : !llvm.i64 + %2941 = llvm.getelementptr %2930[%2940] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2929, %2941 : !llvm.ptr> + %2942 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2943 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2944 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2945 = llvm.mul %2470, %2944 : !llvm.i64 + %2946 = llvm.add %2943, %2945 : !llvm.i64 + %2947 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2948 = llvm.mul %2613, %2947 : !llvm.i64 + %2949 = llvm.add %2946, %2948 : !llvm.i64 + %2950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2951 = llvm.mul %2488, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.getelementptr %2942[%2952] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2954 = llvm.load %2953 : !llvm.ptr> + %2955 = llvm.insertelement %2726, %2954[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2956 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2957 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2958 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2959 = llvm.mul %2470, %2958 : !llvm.i64 + %2960 = llvm.add %2957, %2959 : !llvm.i64 + %2961 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2962 = llvm.mul %2613, %2961 : !llvm.i64 + %2963 = llvm.add %2960, %2962 : !llvm.i64 + %2964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2965 = llvm.mul %2488, %2964 : !llvm.i64 + %2966 = llvm.add %2963, %2965 : !llvm.i64 + %2967 = llvm.getelementptr %2956[%2966] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2955, %2967 : !llvm.ptr> + %2968 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2970 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2971 = llvm.mul %2470, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2974 = llvm.mul %2613, %2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2977 = llvm.mul %2488, %2976 : !llvm.i64 + %2978 = llvm.add %2975, %2977 : !llvm.i64 + %2979 = llvm.getelementptr %2968[%2978] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2980 = llvm.load %2979 : !llvm.ptr> + %2981 = llvm.insertelement %2727, %2980[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2982 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2984 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2985 = llvm.mul %2470, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2988 = llvm.mul %2613, %2987 : !llvm.i64 + %2989 = llvm.add %2986, %2988 : !llvm.i64 + %2990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2991 
= llvm.mul %2488, %2990 : !llvm.i64 + %2992 = llvm.add %2989, %2991 : !llvm.i64 + %2993 = llvm.getelementptr %2982[%2992] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2981, %2993 : !llvm.ptr> + %2994 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2995 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2996 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2997 = llvm.mul %2470, %2996 : !llvm.i64 + %2998 = llvm.add %2995, %2997 : !llvm.i64 + %2999 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3000 = llvm.mul %2613, %2999 : !llvm.i64 + %3001 = llvm.add %2998, %3000 : !llvm.i64 + %3002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3003 = llvm.mul %2488, %3002 : !llvm.i64 + %3004 = llvm.add %3001, %3003 : !llvm.i64 + %3005 = llvm.getelementptr %2994[%3004] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3006 = llvm.load %3005 : !llvm.ptr> + %3007 = llvm.insertelement %2728, %3006[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3008 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3010 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3011 = llvm.mul %2470, %3010 : !llvm.i64 + %3012 = llvm.add %3009, %3011 : !llvm.i64 + %3013 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3014 = llvm.mul %2613, %3013 : !llvm.i64 + %3015 = llvm.add %3012, %3014 : !llvm.i64 + %3016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3017 = llvm.mul %2488, %3016 : !llvm.i64 + %3018 = llvm.add %3015, %3017 : !llvm.i64 + %3019 = llvm.getelementptr %3008[%3018] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3007, %3019 : !llvm.ptr> + %3020 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3022 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3023 = llvm.mul %2470, %3022 : !llvm.i64 + %3024 = llvm.add %3021, %3023 : !llvm.i64 + %3025 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3026 = llvm.mul %2613, %3025 : !llvm.i64 + %3027 = llvm.add %3024, %3026 : !llvm.i64 + %3028 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3029 = llvm.mul %2488, %3028 : !llvm.i64 + %3030 = llvm.add %3027, %3029 : !llvm.i64 + %3031 = llvm.getelementptr %3020[%3030] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3032 = llvm.load %3031 : !llvm.ptr> + %3033 = llvm.insertelement %2729, %3032[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3037 = llvm.mul %2470, %3036 : !llvm.i64 + %3038 = llvm.add %3035, %3037 : !llvm.i64 + %3039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3040 = llvm.mul %2613, %3039 : !llvm.i64 + %3041 = llvm.add %3038, %3040 : !llvm.i64 + %3042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3043 = llvm.mul %2488, %3042 : !llvm.i64 + %3044 = llvm.add %3041, %3043 : !llvm.i64 + %3045 = llvm.getelementptr %3034[%3044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3033, %3045 : !llvm.ptr> + %3046 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3048 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3049 = llvm.mul %2470, %3048 : !llvm.i64 + %3050 = llvm.add %3047, %3049 : !llvm.i64 + %3051 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3052 = llvm.mul %2613, %3051 : !llvm.i64 + %3053 = 
llvm.add %3050, %3052 : !llvm.i64 + %3054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3055 = llvm.mul %2488, %3054 : !llvm.i64 + %3056 = llvm.add %3053, %3055 : !llvm.i64 + %3057 = llvm.getelementptr %3046[%3056] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3058 = llvm.load %3057 : !llvm.ptr> + %3059 = llvm.insertelement %2730, %3058[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3060 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3062 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3063 = llvm.mul %2470, %3062 : !llvm.i64 + %3064 = llvm.add %3061, %3063 : !llvm.i64 + %3065 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3066 = llvm.mul %2613, %3065 : !llvm.i64 + %3067 = llvm.add %3064, %3066 : !llvm.i64 + %3068 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3069 = llvm.mul %2488, %3068 : !llvm.i64 + %3070 = llvm.add %3067, %3069 : !llvm.i64 + %3071 = llvm.getelementptr %3060[%3070] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3059, %3071 : !llvm.ptr> + %3072 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3073 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3074 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3075 = llvm.mul %2470, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3078 = llvm.mul %2613, %3077 : !llvm.i64 + %3079 = llvm.add %3076, %3078 : !llvm.i64 + %3080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3081 = llvm.mul %2488, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.getelementptr %3072[%3082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3084 = llvm.load %3083 : !llvm.ptr> + %3085 = llvm.insertelement %2731, %3084[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3086 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3088 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3089 = llvm.mul %2470, %3088 : !llvm.i64 + %3090 = llvm.add %3087, %3089 : !llvm.i64 + %3091 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3092 = llvm.mul %2613, %3091 : !llvm.i64 + %3093 = llvm.add %3090, %3092 : !llvm.i64 + %3094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3095 = llvm.mul %2488, %3094 : !llvm.i64 + %3096 = llvm.add %3093, %3095 : !llvm.i64 + %3097 = llvm.getelementptr %3086[%3096] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3085, %3097 : !llvm.ptr> + %3098 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3100 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3101 = llvm.mul %2470, %3100 : !llvm.i64 + %3102 = llvm.add %3099, %3101 : !llvm.i64 + %3103 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3104 = llvm.mul %2613, %3103 : !llvm.i64 + %3105 = llvm.add %3102, %3104 : !llvm.i64 + %3106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3107 = llvm.mul %2488, %3106 : !llvm.i64 + %3108 = llvm.add %3105, %3107 : !llvm.i64 + %3109 = llvm.getelementptr %3098[%3108] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3110 = llvm.load %3109 : !llvm.ptr> + %3111 = llvm.insertelement %2732, %3110[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3112 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3114 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3115 = llvm.mul 
%2470, %3114 : !llvm.i64 + %3116 = llvm.add %3113, %3115 : !llvm.i64 + %3117 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3118 = llvm.mul %2613, %3117 : !llvm.i64 + %3119 = llvm.add %3116, %3118 : !llvm.i64 + %3120 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3121 = llvm.mul %2488, %3120 : !llvm.i64 + %3122 = llvm.add %3119, %3121 : !llvm.i64 + %3123 = llvm.getelementptr %3112[%3122] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3111, %3123 : !llvm.ptr> + %3124 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3126 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3127 = llvm.mul %2470, %3126 : !llvm.i64 + %3128 = llvm.add %3125, %3127 : !llvm.i64 + %3129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3130 = llvm.mul %2613, %3129 : !llvm.i64 + %3131 = llvm.add %3128, %3130 : !llvm.i64 + %3132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3133 = llvm.mul %2488, %3132 : !llvm.i64 + %3134 = llvm.add %3131, %3133 : !llvm.i64 + %3135 = llvm.getelementptr %3124[%3134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3136 = llvm.load %3135 : !llvm.ptr> + %3137 = llvm.insertelement %2733, %3136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3138 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3141 = llvm.mul %2470, %3140 : !llvm.i64 + %3142 = llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3144 = llvm.mul %2613, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3147 = llvm.mul %2488, %3146 : !llvm.i64 + %3148 = llvm.add %3145, %3147 : !llvm.i64 + %3149 = llvm.getelementptr %3138[%3148] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3137, %3149 : !llvm.ptr> + %3150 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3152 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3153 = llvm.mul %2379, %3152 : !llvm.i64 + %3154 = llvm.add %3151, %3153 : !llvm.i64 + %3155 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3156 = llvm.mul %2380, %3155 : !llvm.i64 + %3157 = llvm.add %3154, %3156 : !llvm.i64 + %3158 = llvm.getelementptr %3150[%3157] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3159 = llvm.load %3158 : !llvm.ptr + %3160 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3162 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3163 = llvm.mul %2379, %3162 : !llvm.i64 + %3164 = llvm.add %3161, %3163 : !llvm.i64 + %3165 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3166 = llvm.mul %2380, %3165 : !llvm.i64 + %3167 = llvm.add %3164, %3166 : !llvm.i64 + %3168 = llvm.getelementptr %3160[%3167] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3169 = llvm.load %3168 : !llvm.ptr + %3170 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3173 = llvm.mul %2379, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %2380, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> 
!llvm.ptr + %3179 = llvm.load %3178 : !llvm.ptr + %3180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3183 = llvm.mul %2379, %3182 : !llvm.i64 + %3184 = llvm.add %3181, %3183 : !llvm.i64 + %3185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3186 = llvm.mul %2380, %3185 : !llvm.i64 + %3187 = llvm.add %3184, %3186 : !llvm.i64 + %3188 = llvm.getelementptr %3180[%3187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3189 = llvm.load %3188 : !llvm.ptr + %3190 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3192 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3193 = llvm.mul %2379, %3192 : !llvm.i64 + %3194 = llvm.add %3191, %3193 : !llvm.i64 + %3195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3196 = llvm.mul %2380, %3195 : !llvm.i64 + %3197 = llvm.add %3194, %3196 : !llvm.i64 + %3198 = llvm.getelementptr %3190[%3197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3199 = llvm.load %3198 : !llvm.ptr + %3200 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3202 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3203 = llvm.mul %2379, %3202 : !llvm.i64 + %3204 = llvm.add %3201, %3203 : !llvm.i64 + %3205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3206 = llvm.mul %2380, %3205 : !llvm.i64 + %3207 = llvm.add %3204, %3206 : !llvm.i64 + %3208 = llvm.getelementptr %3200[%3207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3209 = llvm.load %3208 : !llvm.ptr + %3210 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3212 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3213 = llvm.mul %2379, %3212 : !llvm.i64 + %3214 = llvm.add %3211, %3213 : !llvm.i64 + %3215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3216 = llvm.mul %2380, %3215 : !llvm.i64 + %3217 = llvm.add %3214, %3216 : !llvm.i64 + %3218 = llvm.getelementptr %3210[%3217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3219 = llvm.load %3218 : !llvm.ptr + %3220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3222 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3223 = llvm.mul %2379, %3222 : !llvm.i64 + %3224 = llvm.add %3221, %3223 : !llvm.i64 + %3225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3226 = llvm.mul %2380, %3225 : !llvm.i64 + %3227 = llvm.add %3224, %3226 : !llvm.i64 + %3228 = llvm.getelementptr %3220[%3227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3229 = llvm.load %3228 : !llvm.ptr + %3230 = llvm.add %2368, %70 : !llvm.i64 + %3231 = llvm.icmp "slt" %3230, %67 : !llvm.i64 + %3232 = llvm.sub %64, %3230 : !llvm.i64 + %3233 = llvm.select %3231, %3232, %3230 : !llvm.i1, !llvm.i64 + %3234 = llvm.sdiv %3233, %68 : !llvm.i64 + %3235 = llvm.sub %64, %3234 : !llvm.i64 + %3236 = llvm.select %3231, %3235, %3234 : !llvm.i1, !llvm.i64 + %3237 = llvm.srem %3236, %68 : !llvm.i64 + %3238 = llvm.icmp "slt" %3237, %67 : !llvm.i64 + %3239 = llvm.add %3237, %68 : !llvm.i64 + %3240 = llvm.select %3238, %3239, %3237 : !llvm.i1, !llvm.i64 + %3241 = llvm.sdiv %2463, %70 : !llvm.i64 + %3242 = llvm.sub %64, %3241 : !llvm.i64 + %3243 = llvm.select %2461, %3242, %3241 : !llvm.i1, !llvm.i64 + %3244 = llvm.mul %3236, %65 : !llvm.i64 + %3245 = llvm.add 
%3243, %3244 : !llvm.i64 + %3246 = llvm.add %3245, %69 : !llvm.i64 + %3247 = llvm.icmp "slt" %3246, %67 : !llvm.i64 + %3248 = llvm.sub %64, %3246 : !llvm.i64 + %3249 = llvm.select %3247, %3248, %3246 : !llvm.i1, !llvm.i64 + %3250 = llvm.sdiv %3249, %63 : !llvm.i64 + %3251 = llvm.sub %64, %3250 : !llvm.i64 + %3252 = llvm.select %3247, %3251, %3250 : !llvm.i1, !llvm.i64 + %3253 = llvm.mul %3252, %65 : !llvm.i64 + %3254 = llvm.add %3245, %3253 : !llvm.i64 + %3255 = llvm.add %3254, %69 : !llvm.i64 + %3256 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3257 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3258 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3259 = llvm.mul %3240, %3258 : !llvm.i64 + %3260 = llvm.add %3257, %3259 : !llvm.i64 + %3261 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3262 = llvm.mul %2474, %3261 : !llvm.i64 + %3263 = llvm.add %3260, %3262 : !llvm.i64 + %3264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3265 = llvm.mul %3255, %3264 : !llvm.i64 + %3266 = llvm.add %3263, %3265 : !llvm.i64 + %3267 = llvm.getelementptr %3256[%3266] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3268 = llvm.load %3267 : !llvm.ptr> + %3269 = llvm.extractelement %3268[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3270 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3272 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3273 = llvm.mul %3240, %3272 : !llvm.i64 + %3274 = llvm.add %3271, %3273 : !llvm.i64 + %3275 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3276 = llvm.mul %2474, %3275 : !llvm.i64 + %3277 = llvm.add %3274, %3276 : !llvm.i64 + %3278 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3279 = llvm.mul %3255, %3278 : !llvm.i64 + %3280 = llvm.add %3277, %3279 : !llvm.i64 + %3281 = llvm.getelementptr %3270[%3280] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3282 = llvm.load %3281 : !llvm.ptr> + %3283 = llvm.extractelement %3282[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3284 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3286 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3287 = llvm.mul %3240, %3286 : !llvm.i64 + %3288 = llvm.add %3285, %3287 : !llvm.i64 + %3289 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3290 = llvm.mul %2474, %3289 : !llvm.i64 + %3291 = llvm.add %3288, %3290 : !llvm.i64 + %3292 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3293 = llvm.mul %3255, %3292 : !llvm.i64 + %3294 = llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.getelementptr %3284[%3294] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3296 = llvm.load %3295 : !llvm.ptr> + %3297 = llvm.extractelement %3296[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3298 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3300 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3301 = llvm.mul %3240, %3300 : !llvm.i64 + %3302 = llvm.add %3299, %3301 : !llvm.i64 + %3303 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3304 = llvm.mul %2474, %3303 : !llvm.i64 + %3305 = llvm.add %3302, %3304 : !llvm.i64 + %3306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3307 = llvm.mul %3255, %3306 : !llvm.i64 + %3308 = llvm.add %3305, %3307 : !llvm.i64 + %3309 = llvm.getelementptr %3298[%3308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3310 = llvm.load %3309 : !llvm.ptr> + %3311 = llvm.extractelement %3310[%27 
: !llvm.i64] : !llvm.vec<8 x float> + %3312 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3313 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3314 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3315 = llvm.mul %3240, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3318 = llvm.mul %2474, %3317 : !llvm.i64 + %3319 = llvm.add %3316, %3318 : !llvm.i64 + %3320 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3321 = llvm.mul %3255, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.getelementptr %3312[%3322] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3324 = llvm.load %3323 : !llvm.ptr> + %3325 = llvm.extractelement %3324[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3326 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3328 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3329 = llvm.mul %3240, %3328 : !llvm.i64 + %3330 = llvm.add %3327, %3329 : !llvm.i64 + %3331 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3332 = llvm.mul %2474, %3331 : !llvm.i64 + %3333 = llvm.add %3330, %3332 : !llvm.i64 + %3334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3335 = llvm.mul %3255, %3334 : !llvm.i64 + %3336 = llvm.add %3333, %3335 : !llvm.i64 + %3337 = llvm.getelementptr %3326[%3336] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3338 = llvm.load %3337 : !llvm.ptr> + %3339 = llvm.extractelement %3338[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3340 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3342 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3343 = llvm.mul %3240, %3342 : !llvm.i64 + %3344 = llvm.add %3341, %3343 : !llvm.i64 + %3345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3346 = llvm.mul %2474, %3345 : !llvm.i64 + %3347 = llvm.add %3344, %3346 : !llvm.i64 + %3348 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3349 = llvm.mul %3255, %3348 : !llvm.i64 + %3350 = llvm.add %3347, %3349 : !llvm.i64 + %3351 = llvm.getelementptr %3340[%3350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3352 = llvm.load %3351 : !llvm.ptr> + %3353 = llvm.extractelement %3352[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3354 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3356 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3357 = llvm.mul %3240, %3356 : !llvm.i64 + %3358 = llvm.add %3355, %3357 : !llvm.i64 + %3359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3360 = llvm.mul %2474, %3359 : !llvm.i64 + %3361 = llvm.add %3358, %3360 : !llvm.i64 + %3362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3363 = llvm.mul %3255, %3362 : !llvm.i64 + %3364 = llvm.add %3361, %3363 : !llvm.i64 + %3365 = llvm.getelementptr %3354[%3364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3366 = llvm.load %3365 : !llvm.ptr> + %3367 = llvm.extractelement %3366[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3368 = llvm.fmul %3159, %3269 {RelaxedPrecision} : !llvm.float + %3369 = llvm.fmul %3169, %3283 {RelaxedPrecision} : !llvm.float + %3370 = llvm.fmul %3179, %3297 {RelaxedPrecision} : !llvm.float + %3371 = llvm.fmul %3189, %3311 {RelaxedPrecision} : !llvm.float + %3372 = llvm.fmul %3199, %3325 {RelaxedPrecision} : !llvm.float + %3373 = llvm.fmul %3209, %3339 {RelaxedPrecision} : !llvm.float + %3374 = llvm.fmul %3219, 
%3353 {RelaxedPrecision} : !llvm.float + %3375 = llvm.fmul %3229, %3367 {RelaxedPrecision} : !llvm.float + %3376 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3377 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3378 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3379 = llvm.mul %3240, %3378 : !llvm.i64 + %3380 = llvm.add %3377, %3379 : !llvm.i64 + %3381 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3382 = llvm.mul %2613, %3381 : !llvm.i64 + %3383 = llvm.add %3380, %3382 : !llvm.i64 + %3384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3385 = llvm.mul %3255, %3384 : !llvm.i64 + %3386 = llvm.add %3383, %3385 : !llvm.i64 + %3387 = llvm.getelementptr %3376[%3386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3388 = llvm.load %3387 : !llvm.ptr> + %3389 = llvm.extractelement %3388[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3390 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3392 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3393 = llvm.mul %3240, %3392 : !llvm.i64 + %3394 = llvm.add %3391, %3393 : !llvm.i64 + %3395 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3396 = llvm.mul %2613, %3395 : !llvm.i64 + %3397 = llvm.add %3394, %3396 : !llvm.i64 + %3398 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3399 = llvm.mul %3255, %3398 : !llvm.i64 + %3400 = llvm.add %3397, %3399 : !llvm.i64 + %3401 = llvm.getelementptr %3390[%3400] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3402 = llvm.load %3401 : !llvm.ptr> + %3403 = llvm.extractelement %3402[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3404 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3405 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3406 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3407 = llvm.mul %3240, %3406 : !llvm.i64 + %3408 = llvm.add %3405, %3407 : !llvm.i64 + %3409 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3410 = llvm.mul %2613, %3409 : !llvm.i64 + %3411 = llvm.add %3408, %3410 : !llvm.i64 + %3412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3413 = llvm.mul %3255, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.getelementptr %3404[%3414] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3416 = llvm.load %3415 : !llvm.ptr> + %3417 = llvm.extractelement %3416[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3421 = llvm.mul %3240, %3420 : !llvm.i64 + %3422 = llvm.add %3419, %3421 : !llvm.i64 + %3423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3424 = llvm.mul %2613, %3423 : !llvm.i64 + %3425 = llvm.add %3422, %3424 : !llvm.i64 + %3426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3427 = llvm.mul %3255, %3426 : !llvm.i64 + %3428 = llvm.add %3425, %3427 : !llvm.i64 + %3429 = llvm.getelementptr %3418[%3428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3430 = llvm.load %3429 : !llvm.ptr> + %3431 = llvm.extractelement %3430[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3435 = llvm.mul %3240, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3438 = 
llvm.mul %2613, %3437 : !llvm.i64 + %3439 = llvm.add %3436, %3438 : !llvm.i64 + %3440 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3441 = llvm.mul %3255, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.getelementptr %3432[%3442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3444 = llvm.load %3443 : !llvm.ptr> + %3445 = llvm.extractelement %3444[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3446 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3447 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3448 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3449 = llvm.mul %3240, %3448 : !llvm.i64 + %3450 = llvm.add %3447, %3449 : !llvm.i64 + %3451 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3452 = llvm.mul %2613, %3451 : !llvm.i64 + %3453 = llvm.add %3450, %3452 : !llvm.i64 + %3454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3455 = llvm.mul %3255, %3454 : !llvm.i64 + %3456 = llvm.add %3453, %3455 : !llvm.i64 + %3457 = llvm.getelementptr %3446[%3456] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3458 = llvm.load %3457 : !llvm.ptr> + %3459 = llvm.extractelement %3458[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3460 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3461 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3462 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3463 = llvm.mul %3240, %3462 : !llvm.i64 + %3464 = llvm.add %3461, %3463 : !llvm.i64 + %3465 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3466 = llvm.mul %2613, %3465 : !llvm.i64 + %3467 = llvm.add %3464, %3466 : !llvm.i64 + %3468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3469 = llvm.mul %3255, %3468 : !llvm.i64 + %3470 = llvm.add %3467, %3469 : !llvm.i64 + %3471 = llvm.getelementptr %3460[%3470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3472 = llvm.load %3471 : !llvm.ptr> + %3473 = llvm.extractelement %3472[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3477 = llvm.mul %3240, %3476 : !llvm.i64 + %3478 = llvm.add %3475, %3477 : !llvm.i64 + %3479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3480 = llvm.mul %2613, %3479 : !llvm.i64 + %3481 = llvm.add %3478, %3480 : !llvm.i64 + %3482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3483 = llvm.mul %3255, %3482 : !llvm.i64 + %3484 = llvm.add %3481, %3483 : !llvm.i64 + %3485 = llvm.getelementptr %3474[%3484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3486 = llvm.load %3485 : !llvm.ptr> + %3487 = llvm.extractelement %3486[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3488 = llvm.fadd %3389, %3368 {RelaxedPrecision} : !llvm.float + %3489 = llvm.fadd %3403, %3369 {RelaxedPrecision} : !llvm.float + %3490 = llvm.fadd %3417, %3370 {RelaxedPrecision} : !llvm.float + %3491 = llvm.fadd %3431, %3371 {RelaxedPrecision} : !llvm.float + %3492 = llvm.fadd %3445, %3372 {RelaxedPrecision} : !llvm.float + %3493 = llvm.fadd %3459, %3373 {RelaxedPrecision} : !llvm.float + %3494 = llvm.fadd %3473, %3374 {RelaxedPrecision} : !llvm.float + %3495 = llvm.fadd %3487, %3375 {RelaxedPrecision} : !llvm.float + %3496 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3497 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3498 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3499 = llvm.mul %3240, %3498 : !llvm.i64 + %3500 = llvm.add %3497, %3499 : !llvm.i64 
+ %3501 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3502 = llvm.mul %2613, %3501 : !llvm.i64 + %3503 = llvm.add %3500, %3502 : !llvm.i64 + %3504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3505 = llvm.mul %3255, %3504 : !llvm.i64 + %3506 = llvm.add %3503, %3505 : !llvm.i64 + %3507 = llvm.getelementptr %3496[%3506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3508 = llvm.load %3507 : !llvm.ptr> + %3509 = llvm.insertelement %3488, %3508[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3510 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3512 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3513 = llvm.mul %3240, %3512 : !llvm.i64 + %3514 = llvm.add %3511, %3513 : !llvm.i64 + %3515 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3516 = llvm.mul %2613, %3515 : !llvm.i64 + %3517 = llvm.add %3514, %3516 : !llvm.i64 + %3518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3519 = llvm.mul %3255, %3518 : !llvm.i64 + %3520 = llvm.add %3517, %3519 : !llvm.i64 + %3521 = llvm.getelementptr %3510[%3520] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3509, %3521 : !llvm.ptr> + %3522 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3523 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3524 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3525 = llvm.mul %3240, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3528 = llvm.mul %2613, %3527 : !llvm.i64 + %3529 = llvm.add %3526, %3528 : !llvm.i64 + %3530 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3531 = llvm.mul %3255, %3530 : !llvm.i64 + %3532 = llvm.add %3529, %3531 : !llvm.i64 + %3533 = llvm.getelementptr %3522[%3532] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3534 = llvm.load %3533 : !llvm.ptr> + %3535 = llvm.insertelement %3489, %3534[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3536 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3537 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3538 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3539 = llvm.mul %3240, %3538 : !llvm.i64 + %3540 = llvm.add %3537, %3539 : !llvm.i64 + %3541 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3542 = llvm.mul %2613, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : !llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %3255, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3536[%3546] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3535, %3547 : !llvm.ptr> + %3548 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3550 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3551 = llvm.mul %3240, %3550 : !llvm.i64 + %3552 = llvm.add %3549, %3551 : !llvm.i64 + %3553 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3554 = llvm.mul %2613, %3553 : !llvm.i64 + %3555 = llvm.add %3552, %3554 : !llvm.i64 + %3556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3557 = llvm.mul %3255, %3556 : !llvm.i64 + %3558 = llvm.add %3555, %3557 : !llvm.i64 + %3559 = llvm.getelementptr %3548[%3558] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3560 = llvm.load %3559 : !llvm.ptr> + %3561 = llvm.insertelement %3490, %3560[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3562 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3563 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %3564 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3565 = llvm.mul %3240, %3564 : !llvm.i64 + %3566 = llvm.add %3563, %3565 : !llvm.i64 + %3567 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3568 = llvm.mul %2613, %3567 : !llvm.i64 + %3569 = llvm.add %3566, %3568 : !llvm.i64 + %3570 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3571 = llvm.mul %3255, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.getelementptr %3562[%3572] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3561, %3573 : !llvm.ptr> + %3574 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3575 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3576 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3577 = llvm.mul %3240, %3576 : !llvm.i64 + %3578 = llvm.add %3575, %3577 : !llvm.i64 + %3579 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3580 = llvm.mul %2613, %3579 : !llvm.i64 + %3581 = llvm.add %3578, %3580 : !llvm.i64 + %3582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3583 = llvm.mul %3255, %3582 : !llvm.i64 + %3584 = llvm.add %3581, %3583 : !llvm.i64 + %3585 = llvm.getelementptr %3574[%3584] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3586 = llvm.load %3585 : !llvm.ptr> + %3587 = llvm.insertelement %3491, %3586[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3588 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3590 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3591 = llvm.mul %3240, %3590 : !llvm.i64 + %3592 = llvm.add %3589, %3591 : !llvm.i64 + %3593 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3594 = llvm.mul %2613, %3593 : !llvm.i64 + %3595 = llvm.add %3592, %3594 : !llvm.i64 + %3596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3597 = llvm.mul %3255, %3596 : !llvm.i64 + %3598 = llvm.add %3595, %3597 : !llvm.i64 + %3599 = llvm.getelementptr %3588[%3598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3587, %3599 : !llvm.ptr> + %3600 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3602 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3603 = llvm.mul %3240, %3602 : !llvm.i64 + %3604 = llvm.add %3601, %3603 : !llvm.i64 + %3605 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3606 = llvm.mul %2613, %3605 : !llvm.i64 + %3607 = llvm.add %3604, %3606 : !llvm.i64 + %3608 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3609 = llvm.mul %3255, %3608 : !llvm.i64 + %3610 = llvm.add %3607, %3609 : !llvm.i64 + %3611 = llvm.getelementptr %3600[%3610] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3612 = llvm.load %3611 : !llvm.ptr> + %3613 = llvm.insertelement %3492, %3612[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3617 = llvm.mul %3240, %3616 : !llvm.i64 + %3618 = llvm.add %3615, %3617 : !llvm.i64 + %3619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3620 = llvm.mul %2613, %3619 : !llvm.i64 + %3621 = llvm.add %3618, %3620 : !llvm.i64 + %3622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3623 = llvm.mul %3255, %3622 : !llvm.i64 + %3624 = llvm.add %3621, %3623 : !llvm.i64 + %3625 = llvm.getelementptr %3614[%3624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3613, %3625 : !llvm.ptr> + %3626 = 
llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3627 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3628 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3629 = llvm.mul %3240, %3628 : !llvm.i64 + %3630 = llvm.add %3627, %3629 : !llvm.i64 + %3631 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3632 = llvm.mul %2613, %3631 : !llvm.i64 + %3633 = llvm.add %3630, %3632 : !llvm.i64 + %3634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3635 = llvm.mul %3255, %3634 : !llvm.i64 + %3636 = llvm.add %3633, %3635 : !llvm.i64 + %3637 = llvm.getelementptr %3626[%3636] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3638 = llvm.load %3637 : !llvm.ptr> + %3639 = llvm.insertelement %3493, %3638[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3640 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3641 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3642 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3643 = llvm.mul %3240, %3642 : !llvm.i64 + %3644 = llvm.add %3641, %3643 : !llvm.i64 + %3645 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3646 = llvm.mul %2613, %3645 : !llvm.i64 + %3647 = llvm.add %3644, %3646 : !llvm.i64 + %3648 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3649 = llvm.mul %3255, %3648 : !llvm.i64 + %3650 = llvm.add %3647, %3649 : !llvm.i64 + %3651 = llvm.getelementptr %3640[%3650] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3639, %3651 : !llvm.ptr> + %3652 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3653 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3654 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3655 = llvm.mul %3240, %3654 : !llvm.i64 + %3656 = llvm.add %3653, %3655 : !llvm.i64 + %3657 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3658 = llvm.mul %2613, %3657 : !llvm.i64 + %3659 = llvm.add %3656, %3658 : !llvm.i64 + %3660 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3661 = llvm.mul %3255, %3660 : !llvm.i64 + %3662 = llvm.add %3659, %3661 : !llvm.i64 + %3663 = llvm.getelementptr %3652[%3662] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3664 = llvm.load %3663 : !llvm.ptr> + %3665 = llvm.insertelement %3494, %3664[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3666 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3667 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3668 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3669 = llvm.mul %3240, %3668 : !llvm.i64 + %3670 = llvm.add %3667, %3669 : !llvm.i64 + %3671 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3672 = llvm.mul %2613, %3671 : !llvm.i64 + %3673 = llvm.add %3670, %3672 : !llvm.i64 + %3674 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3675 = llvm.mul %3255, %3674 : !llvm.i64 + %3676 = llvm.add %3673, %3675 : !llvm.i64 + %3677 = llvm.getelementptr %3666[%3676] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3665, %3677 : !llvm.ptr> + %3678 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3680 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3681 = llvm.mul %3240, %3680 : !llvm.i64 + %3682 = llvm.add %3679, %3681 : !llvm.i64 + %3683 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3684 = llvm.mul %2613, %3683 : !llvm.i64 + %3685 = llvm.add %3682, %3684 : !llvm.i64 + %3686 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3687 = llvm.mul %3255, %3686 : !llvm.i64 + %3688 = llvm.add %3685, %3687 : !llvm.i64 + %3689 = llvm.getelementptr %3678[%3688] 
: (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3690 = llvm.load %3689 : !llvm.ptr> + %3691 = llvm.insertelement %3495, %3690[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3692 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3694 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3695 = llvm.mul %3240, %3694 : !llvm.i64 + %3696 = llvm.add %3693, %3695 : !llvm.i64 + %3697 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3698 = llvm.mul %2613, %3697 : !llvm.i64 + %3699 = llvm.add %3696, %3698 : !llvm.i64 + %3700 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3701 = llvm.mul %3255, %3700 : !llvm.i64 + %3702 = llvm.add %3699, %3701 : !llvm.i64 + %3703 = llvm.getelementptr %3692[%3702] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3691, %3703 : !llvm.ptr> + %3704 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3705 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3706 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3707 = llvm.mul %3240, %3706 : !llvm.i64 + %3708 = llvm.add %3705, %3707 : !llvm.i64 + %3709 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3710 = llvm.mul %2613, %3709 : !llvm.i64 + %3711 = llvm.add %3708, %3710 : !llvm.i64 + %3712 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3713 = llvm.mul %3255, %3712 : !llvm.i64 + %3714 = llvm.add %3711, %3713 : !llvm.i64 + %3715 = llvm.getelementptr %3704[%3714] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3716 = llvm.load %3715 : !llvm.ptr> + %3717 = llvm.insertelement %3488, %3716[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3718 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3720 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3721 = llvm.mul %3240, %3720 : !llvm.i64 + %3722 = llvm.add %3719, %3721 : !llvm.i64 + %3723 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3724 = llvm.mul %2613, %3723 : !llvm.i64 + %3725 = llvm.add %3722, %3724 : !llvm.i64 + %3726 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3727 = llvm.mul %3255, %3726 : !llvm.i64 + %3728 = llvm.add %3725, %3727 : !llvm.i64 + %3729 = llvm.getelementptr %3718[%3728] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3717, %3729 : !llvm.ptr> + %3730 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3731 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3732 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3733 = llvm.mul %3240, %3732 : !llvm.i64 + %3734 = llvm.add %3731, %3733 : !llvm.i64 + %3735 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3736 = llvm.mul %2613, %3735 : !llvm.i64 + %3737 = llvm.add %3734, %3736 : !llvm.i64 + %3738 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3739 = llvm.mul %3255, %3738 : !llvm.i64 + %3740 = llvm.add %3737, %3739 : !llvm.i64 + %3741 = llvm.getelementptr %3730[%3740] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3742 = llvm.load %3741 : !llvm.ptr> + %3743 = llvm.insertelement %3489, %3742[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3744 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3746 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3747 = llvm.mul %3240, %3746 : !llvm.i64 + %3748 = llvm.add %3745, %3747 : !llvm.i64 + %3749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3750 = llvm.mul %2613, %3749 : !llvm.i64 + %3751 = llvm.add %3748, %3750 : !llvm.i64 + %3752 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %3753 = llvm.mul %3255, %3752 : !llvm.i64 + %3754 = llvm.add %3751, %3753 : !llvm.i64 + %3755 = llvm.getelementptr %3744[%3754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3743, %3755 : !llvm.ptr> + %3756 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3758 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3759 = llvm.mul %3240, %3758 : !llvm.i64 + %3760 = llvm.add %3757, %3759 : !llvm.i64 + %3761 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3762 = llvm.mul %2613, %3761 : !llvm.i64 + %3763 = llvm.add %3760, %3762 : !llvm.i64 + %3764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3765 = llvm.mul %3255, %3764 : !llvm.i64 + %3766 = llvm.add %3763, %3765 : !llvm.i64 + %3767 = llvm.getelementptr %3756[%3766] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3768 = llvm.load %3767 : !llvm.ptr> + %3769 = llvm.insertelement %3490, %3768[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3770 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3771 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3772 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3773 = llvm.mul %3240, %3772 : !llvm.i64 + %3774 = llvm.add %3771, %3773 : !llvm.i64 + %3775 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3776 = llvm.mul %2613, %3775 : !llvm.i64 + %3777 = llvm.add %3774, %3776 : !llvm.i64 + %3778 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3779 = llvm.mul %3255, %3778 : !llvm.i64 + %3780 = llvm.add %3777, %3779 : !llvm.i64 + %3781 = llvm.getelementptr %3770[%3780] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3769, %3781 : !llvm.ptr> + %3782 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3784 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3785 = llvm.mul %3240, %3784 : !llvm.i64 + %3786 = llvm.add %3783, %3785 : !llvm.i64 + %3787 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3788 = llvm.mul %2613, %3787 : !llvm.i64 + %3789 = llvm.add %3786, %3788 : !llvm.i64 + %3790 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3791 = llvm.mul %3255, %3790 : !llvm.i64 + %3792 = llvm.add %3789, %3791 : !llvm.i64 + %3793 = llvm.getelementptr %3782[%3792] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3794 = llvm.load %3793 : !llvm.ptr> + %3795 = llvm.insertelement %3491, %3794[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3796 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3797 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3798 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3799 = llvm.mul %3240, %3798 : !llvm.i64 + %3800 = llvm.add %3797, %3799 : !llvm.i64 + %3801 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3802 = llvm.mul %2613, %3801 : !llvm.i64 + %3803 = llvm.add %3800, %3802 : !llvm.i64 + %3804 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3805 = llvm.mul %3255, %3804 : !llvm.i64 + %3806 = llvm.add %3803, %3805 : !llvm.i64 + %3807 = llvm.getelementptr %3796[%3806] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3795, %3807 : !llvm.ptr> + %3808 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3810 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3811 = llvm.mul %3240, %3810 : !llvm.i64 + %3812 = llvm.add %3809, %3811 : !llvm.i64 + %3813 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3814 
= llvm.mul %2613, %3813 : !llvm.i64 + %3815 = llvm.add %3812, %3814 : !llvm.i64 + %3816 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3817 = llvm.mul %3255, %3816 : !llvm.i64 + %3818 = llvm.add %3815, %3817 : !llvm.i64 + %3819 = llvm.getelementptr %3808[%3818] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3820 = llvm.load %3819 : !llvm.ptr> + %3821 = llvm.insertelement %3492, %3820[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3822 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3823 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3824 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3825 = llvm.mul %3240, %3824 : !llvm.i64 + %3826 = llvm.add %3823, %3825 : !llvm.i64 + %3827 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3828 = llvm.mul %2613, %3827 : !llvm.i64 + %3829 = llvm.add %3826, %3828 : !llvm.i64 + %3830 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3831 = llvm.mul %3255, %3830 : !llvm.i64 + %3832 = llvm.add %3829, %3831 : !llvm.i64 + %3833 = llvm.getelementptr %3822[%3832] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3821, %3833 : !llvm.ptr> + %3834 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3835 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3836 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3837 = llvm.mul %3240, %3836 : !llvm.i64 + %3838 = llvm.add %3835, %3837 : !llvm.i64 + %3839 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3840 = llvm.mul %2613, %3839 : !llvm.i64 + %3841 = llvm.add %3838, %3840 : !llvm.i64 + %3842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3843 = llvm.mul %3255, %3842 : !llvm.i64 + %3844 = llvm.add %3841, %3843 : !llvm.i64 + %3845 = llvm.getelementptr %3834[%3844] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3846 = llvm.load %3845 : !llvm.ptr> + %3847 = llvm.insertelement %3493, %3846[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3848 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3850 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3851 = llvm.mul %3240, %3850 : !llvm.i64 + %3852 = llvm.add %3849, %3851 : !llvm.i64 + %3853 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3854 = llvm.mul %2613, %3853 : !llvm.i64 + %3855 = llvm.add %3852, %3854 : !llvm.i64 + %3856 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3857 = llvm.mul %3255, %3856 : !llvm.i64 + %3858 = llvm.add %3855, %3857 : !llvm.i64 + %3859 = llvm.getelementptr %3848[%3858] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3847, %3859 : !llvm.ptr> + %3860 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3861 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3862 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3863 = llvm.mul %3240, %3862 : !llvm.i64 + %3864 = llvm.add %3861, %3863 : !llvm.i64 + %3865 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3866 = llvm.mul %2613, %3865 : !llvm.i64 + %3867 = llvm.add %3864, %3866 : !llvm.i64 + %3868 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3869 = llvm.mul %3255, %3868 : !llvm.i64 + %3870 = llvm.add %3867, %3869 : !llvm.i64 + %3871 = llvm.getelementptr %3860[%3870] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3872 = llvm.load %3871 : !llvm.ptr> + %3873 = llvm.insertelement %3494, %3872[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3874 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3875 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3876 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %3877 = llvm.mul %3240, %3876 : !llvm.i64 + %3878 = llvm.add %3875, %3877 : !llvm.i64 + %3879 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3880 = llvm.mul %2613, %3879 : !llvm.i64 + %3881 = llvm.add %3878, %3880 : !llvm.i64 + %3882 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3883 = llvm.mul %3255, %3882 : !llvm.i64 + %3884 = llvm.add %3881, %3883 : !llvm.i64 + %3885 = llvm.getelementptr %3874[%3884] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3873, %3885 : !llvm.ptr> + %3886 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3888 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3889 = llvm.mul %3240, %3888 : !llvm.i64 + %3890 = llvm.add %3887, %3889 : !llvm.i64 + %3891 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3892 = llvm.mul %2613, %3891 : !llvm.i64 + %3893 = llvm.add %3890, %3892 : !llvm.i64 + %3894 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3895 = llvm.mul %3255, %3894 : !llvm.i64 + %3896 = llvm.add %3893, %3895 : !llvm.i64 + %3897 = llvm.getelementptr %3886[%3896] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3898 = llvm.load %3897 : !llvm.ptr> + %3899 = llvm.insertelement %3495, %3898[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3900 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3901 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3902 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3903 = llvm.mul %3240, %3902 : !llvm.i64 + %3904 = llvm.add %3901, %3903 : !llvm.i64 + %3905 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3906 = llvm.mul %2613, %3905 : !llvm.i64 + %3907 = llvm.add %3904, %3906 : !llvm.i64 + %3908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3909 = llvm.mul %3255, %3908 : !llvm.i64 + %3910 = llvm.add %3907, %3909 : !llvm.i64 + %3911 = llvm.getelementptr %3900[%3910] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3899, %3911 : !llvm.ptr> + %3912 = llvm.add %2376, %69 : !llvm.i64 + llvm.br ^bb31(%3912 : !llvm.i64) + ^bb33: // pred: ^bb31 + %3913 = llvm.add %2374, %69 : !llvm.i64 + llvm.br ^bb29(%3913 : !llvm.i64) + ^bb34: // pred: ^bb29 + %3914 = llvm.add %2372, %56 : !llvm.i64 + llvm.br ^bb27(%3914 : !llvm.i64) + ^bb35: // pred: ^bb27 + llvm.br ^bb36(%67 : !llvm.i64) + ^bb36(%3915: !llvm.i64): // 2 preds: ^bb35, ^bb37 + %3916 = llvm.icmp "slt" %3915, %48 : !llvm.i64 + llvm.cond_br %3916, ^bb37, ^bb38 + ^bb37: // pred: ^bb36 + %3917 = llvm.add %2370, %3915 : !llvm.i64 + %3918 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3920 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3921 = llvm.mul %2345, %3920 : !llvm.i64 + %3922 = llvm.add %3919, %3921 : !llvm.i64 + %3923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3924 = llvm.mul %3917, %3923 : !llvm.i64 + %3925 = llvm.add %3922, %3924 : !llvm.i64 + %3926 = llvm.getelementptr %3918[%3925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3927 = llvm.load %3926 : !llvm.ptr + %3928 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3930 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3931 = llvm.mul %2345, %3930 : !llvm.i64 + %3932 = llvm.add %3929, %3931 : !llvm.i64 + %3933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3934 = llvm.mul %3917, %3933 : !llvm.i64 + %3935 = llvm.add %3932, %3934 : !llvm.i64 + %3936 = 
llvm.getelementptr %3928[%3935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3937 = llvm.load %3936 : !llvm.ptr + %3938 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3940 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3941 = llvm.mul %2345, %3940 : !llvm.i64 + %3942 = llvm.add %3939, %3941 : !llvm.i64 + %3943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3944 = llvm.mul %3917, %3943 : !llvm.i64 + %3945 = llvm.add %3942, %3944 : !llvm.i64 + %3946 = llvm.getelementptr %3938[%3945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3947 = llvm.load %3946 : !llvm.ptr + %3948 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3950 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3951 = llvm.mul %2345, %3950 : !llvm.i64 + %3952 = llvm.add %3949, %3951 : !llvm.i64 + %3953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3954 = llvm.mul %3917, %3953 : !llvm.i64 + %3955 = llvm.add %3952, %3954 : !llvm.i64 + %3956 = llvm.getelementptr %3948[%3955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3957 = llvm.load %3956 : !llvm.ptr + %3958 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3960 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3961 = llvm.mul %2345, %3960 : !llvm.i64 + %3962 = llvm.add %3959, %3961 : !llvm.i64 + %3963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3964 = llvm.mul %3917, %3963 : !llvm.i64 + %3965 = llvm.add %3962, %3964 : !llvm.i64 + %3966 = llvm.getelementptr %3958[%3965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3967 = llvm.load %3966 : !llvm.ptr + %3968 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3970 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3971 = llvm.mul %2345, %3970 : !llvm.i64 + %3972 = llvm.add %3969, %3971 : !llvm.i64 + %3973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3974 = llvm.mul %3917, %3973 : !llvm.i64 + %3975 = llvm.add %3972, %3974 : !llvm.i64 + %3976 = llvm.getelementptr %3968[%3975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3977 = llvm.load %3976 : !llvm.ptr + %3978 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3980 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3981 = llvm.mul %2345, %3980 : !llvm.i64 + %3982 = llvm.add %3979, %3981 : !llvm.i64 + %3983 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3984 = llvm.mul %3917, %3983 : !llvm.i64 + %3985 = llvm.add %3982, %3984 : !llvm.i64 + %3986 = llvm.getelementptr %3978[%3985] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3987 = llvm.load %3986 : !llvm.ptr + %3988 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3990 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3991 = llvm.mul %2345, %3990 : !llvm.i64 + %3992 = llvm.add %3989, %3991 : !llvm.i64 + %3993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3994 = llvm.mul %3917, %3993 : !llvm.i64 + %3995 = llvm.add %3992, %3994 : !llvm.i64 + %3996 = llvm.getelementptr %3988[%3995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3997 = llvm.load %3996 : !llvm.ptr + %3998 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %3999 = llvm.sub %64, %2368 : !llvm.i64 + %4000 = llvm.select %3998, %3999, %2368 : !llvm.i1, 
!llvm.i64 + %4001 = llvm.sdiv %4000, %68 : !llvm.i64 + %4002 = llvm.sub %64, %4001 : !llvm.i64 + %4003 = llvm.select %3998, %4002, %4001 : !llvm.i1, !llvm.i64 + %4004 = llvm.srem %4003, %68 : !llvm.i64 + %4005 = llvm.icmp "slt" %4004, %67 : !llvm.i64 + %4006 = llvm.add %4004, %68 : !llvm.i64 + %4007 = llvm.select %4005, %4006, %4004 : !llvm.i1, !llvm.i64 + %4008 = llvm.srem %3917, %39 : !llvm.i64 + %4009 = llvm.icmp "slt" %4008, %67 : !llvm.i64 + %4010 = llvm.add %4008, %39 : !llvm.i64 + %4011 = llvm.select %4009, %4010, %4008 : !llvm.i1, !llvm.i64 + %4012 = llvm.srem %2368, %68 : !llvm.i64 + %4013 = llvm.icmp "slt" %4012, %67 : !llvm.i64 + %4014 = llvm.add %4012, %68 : !llvm.i64 + %4015 = llvm.select %4013, %4014, %4012 : !llvm.i1, !llvm.i64 + %4016 = llvm.icmp "slt" %4015, %67 : !llvm.i64 + %4017 = llvm.sub %64, %4015 : !llvm.i64 + %4018 = llvm.select %4016, %4017, %4015 : !llvm.i1, !llvm.i64 + %4019 = llvm.sdiv %4018, %70 : !llvm.i64 + %4020 = llvm.sub %64, %4019 : !llvm.i64 + %4021 = llvm.select %4016, %4020, %4019 : !llvm.i1, !llvm.i64 + %4022 = llvm.srem %4021, %63 : !llvm.i64 + %4023 = llvm.icmp "slt" %4022, %67 : !llvm.i64 + %4024 = llvm.add %4022, %63 : !llvm.i64 + %4025 = llvm.select %4023, %4024, %4022 : !llvm.i1, !llvm.i64 + %4026 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4027 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4028 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4029 = llvm.mul %4007, %4028 : !llvm.i64 + %4030 = llvm.add %4027, %4029 : !llvm.i64 + %4031 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4032 = llvm.mul %4011, %4031 : !llvm.i64 + %4033 = llvm.add %4030, %4032 : !llvm.i64 + %4034 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4035 = llvm.mul %4025, %4034 : !llvm.i64 + %4036 = llvm.add %4033, %4035 : !llvm.i64 + %4037 = llvm.getelementptr %4026[%4036] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4038 = llvm.load %4037 : !llvm.ptr> + %4039 = llvm.extractelement %4038[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4040 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4041 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4042 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4043 = llvm.mul %4007, %4042 : !llvm.i64 + %4044 = llvm.add %4041, %4043 : !llvm.i64 + %4045 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4046 = llvm.mul %4011, %4045 : !llvm.i64 + %4047 = llvm.add %4044, %4046 : !llvm.i64 + %4048 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4049 = llvm.mul %4025, %4048 : !llvm.i64 + %4050 = llvm.add %4047, %4049 : !llvm.i64 + %4051 = llvm.getelementptr %4040[%4050] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4052 = llvm.load %4051 : !llvm.ptr> + %4053 = llvm.extractelement %4052[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4054 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4056 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4057 = llvm.mul %4007, %4056 : !llvm.i64 + %4058 = llvm.add %4055, %4057 : !llvm.i64 + %4059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4060 = llvm.mul %4011, %4059 : !llvm.i64 + %4061 = llvm.add %4058, %4060 : !llvm.i64 + %4062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4063 = llvm.mul %4025, %4062 : !llvm.i64 + %4064 = llvm.add %4061, %4063 : !llvm.i64 + %4065 = llvm.getelementptr %4054[%4064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4066 = llvm.load %4065 : !llvm.ptr> + %4067 = llvm.extractelement %4066[%26 : !llvm.i64] : 
!llvm.vec<8 x float> + %4068 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4070 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4071 = llvm.mul %4007, %4070 : !llvm.i64 + %4072 = llvm.add %4069, %4071 : !llvm.i64 + %4073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4074 = llvm.mul %4011, %4073 : !llvm.i64 + %4075 = llvm.add %4072, %4074 : !llvm.i64 + %4076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4077 = llvm.mul %4025, %4076 : !llvm.i64 + %4078 = llvm.add %4075, %4077 : !llvm.i64 + %4079 = llvm.getelementptr %4068[%4078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4080 = llvm.load %4079 : !llvm.ptr> + %4081 = llvm.extractelement %4080[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4082 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4083 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4084 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4085 = llvm.mul %4007, %4084 : !llvm.i64 + %4086 = llvm.add %4083, %4085 : !llvm.i64 + %4087 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4088 = llvm.mul %4011, %4087 : !llvm.i64 + %4089 = llvm.add %4086, %4088 : !llvm.i64 + %4090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4091 = llvm.mul %4025, %4090 : !llvm.i64 + %4092 = llvm.add %4089, %4091 : !llvm.i64 + %4093 = llvm.getelementptr %4082[%4092] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4094 = llvm.load %4093 : !llvm.ptr> + %4095 = llvm.extractelement %4094[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4096 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4098 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4099 = llvm.mul %4007, %4098 : !llvm.i64 + %4100 = llvm.add %4097, %4099 : !llvm.i64 + %4101 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4102 = llvm.mul %4011, %4101 : !llvm.i64 + %4103 = llvm.add %4100, %4102 : !llvm.i64 + %4104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4105 = llvm.mul %4025, %4104 : !llvm.i64 + %4106 = llvm.add %4103, %4105 : !llvm.i64 + %4107 = llvm.getelementptr %4096[%4106] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4108 = llvm.load %4107 : !llvm.ptr> + %4109 = llvm.extractelement %4108[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4110 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4111 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4112 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4113 = llvm.mul %4007, %4112 : !llvm.i64 + %4114 = llvm.add %4111, %4113 : !llvm.i64 + %4115 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4116 = llvm.mul %4011, %4115 : !llvm.i64 + %4117 = llvm.add %4114, %4116 : !llvm.i64 + %4118 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4119 = llvm.mul %4025, %4118 : !llvm.i64 + %4120 = llvm.add %4117, %4119 : !llvm.i64 + %4121 = llvm.getelementptr %4110[%4120] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4122 = llvm.load %4121 : !llvm.ptr> + %4123 = llvm.extractelement %4122[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4124 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4126 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4127 = llvm.mul %4007, %4126 : !llvm.i64 + %4128 = llvm.add %4125, %4127 : !llvm.i64 + %4129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4130 = llvm.mul %4011, %4129 : !llvm.i64 + %4131 = llvm.add %4128, %4130 : !llvm.i64 + 
%4132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4133 = llvm.mul %4025, %4132 : !llvm.i64 + %4134 = llvm.add %4131, %4133 : !llvm.i64 + %4135 = llvm.getelementptr %4124[%4134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4136 = llvm.load %4135 : !llvm.ptr> + %4137 = llvm.extractelement %4136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4138 = llvm.fmul %3927, %4039 {RelaxedPrecision} : !llvm.float + %4139 = llvm.fmul %3937, %4053 {RelaxedPrecision} : !llvm.float + %4140 = llvm.fmul %3947, %4067 {RelaxedPrecision} : !llvm.float + %4141 = llvm.fmul %3957, %4081 {RelaxedPrecision} : !llvm.float + %4142 = llvm.fmul %3967, %4095 {RelaxedPrecision} : !llvm.float + %4143 = llvm.fmul %3977, %4109 {RelaxedPrecision} : !llvm.float + %4144 = llvm.fmul %3987, %4123 {RelaxedPrecision} : !llvm.float + %4145 = llvm.fmul %3997, %4137 {RelaxedPrecision} : !llvm.float + %4146 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4149 = llvm.mul %4007, %4148 : !llvm.i64 + %4150 = llvm.add %4147, %4149 : !llvm.i64 + %4151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4152 = llvm.mul %67, %4151 : !llvm.i64 + %4153 = llvm.add %4150, %4152 : !llvm.i64 + %4154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4155 = llvm.mul %4025, %4154 : !llvm.i64 + %4156 = llvm.add %4153, %4155 : !llvm.i64 + %4157 = llvm.getelementptr %4146[%4156] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4158 = llvm.load %4157 : !llvm.ptr> + %4159 = llvm.extractelement %4158[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4160 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4162 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4163 = llvm.mul %4007, %4162 : !llvm.i64 + %4164 = llvm.add %4161, %4163 : !llvm.i64 + %4165 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4166 = llvm.mul %67, %4165 : !llvm.i64 + %4167 = llvm.add %4164, %4166 : !llvm.i64 + %4168 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4169 = llvm.mul %4025, %4168 : !llvm.i64 + %4170 = llvm.add %4167, %4169 : !llvm.i64 + %4171 = llvm.getelementptr %4160[%4170] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4172 = llvm.load %4171 : !llvm.ptr> + %4173 = llvm.extractelement %4172[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4174 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4175 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4176 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4177 = llvm.mul %4007, %4176 : !llvm.i64 + %4178 = llvm.add %4175, %4177 : !llvm.i64 + %4179 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4180 = llvm.mul %67, %4179 : !llvm.i64 + %4181 = llvm.add %4178, %4180 : !llvm.i64 + %4182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4183 = llvm.mul %4025, %4182 : !llvm.i64 + %4184 = llvm.add %4181, %4183 : !llvm.i64 + %4185 = llvm.getelementptr %4174[%4184] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4186 = llvm.load %4185 : !llvm.ptr> + %4187 = llvm.extractelement %4186[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4188 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4190 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4191 = llvm.mul %4007, %4190 : !llvm.i64 + %4192 = llvm.add %4189, %4191 : !llvm.i64 + %4193 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4194 = llvm.mul %67, %4193 : 
!llvm.i64 + %4195 = llvm.add %4192, %4194 : !llvm.i64 + %4196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4197 = llvm.mul %4025, %4196 : !llvm.i64 + %4198 = llvm.add %4195, %4197 : !llvm.i64 + %4199 = llvm.getelementptr %4188[%4198] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4200 = llvm.load %4199 : !llvm.ptr> + %4201 = llvm.extractelement %4200[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4202 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4203 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4204 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4205 = llvm.mul %4007, %4204 : !llvm.i64 + %4206 = llvm.add %4203, %4205 : !llvm.i64 + %4207 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4208 = llvm.mul %67, %4207 : !llvm.i64 + %4209 = llvm.add %4206, %4208 : !llvm.i64 + %4210 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4211 = llvm.mul %4025, %4210 : !llvm.i64 + %4212 = llvm.add %4209, %4211 : !llvm.i64 + %4213 = llvm.getelementptr %4202[%4212] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4214 = llvm.load %4213 : !llvm.ptr> + %4215 = llvm.extractelement %4214[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4216 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4217 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4218 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4219 = llvm.mul %4007, %4218 : !llvm.i64 + %4220 = llvm.add %4217, %4219 : !llvm.i64 + %4221 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4222 = llvm.mul %67, %4221 : !llvm.i64 + %4223 = llvm.add %4220, %4222 : !llvm.i64 + %4224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4225 = llvm.mul %4025, %4224 : !llvm.i64 + %4226 = llvm.add %4223, %4225 : !llvm.i64 + %4227 = llvm.getelementptr %4216[%4226] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4228 = llvm.load %4227 : !llvm.ptr> + %4229 = llvm.extractelement %4228[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4230 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4232 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4233 = llvm.mul %4007, %4232 : !llvm.i64 + %4234 = llvm.add %4231, %4233 : !llvm.i64 + %4235 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4236 = llvm.mul %67, %4235 : !llvm.i64 + %4237 = llvm.add %4234, %4236 : !llvm.i64 + %4238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4239 = llvm.mul %4025, %4238 : !llvm.i64 + %4240 = llvm.add %4237, %4239 : !llvm.i64 + %4241 = llvm.getelementptr %4230[%4240] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4242 = llvm.load %4241 : !llvm.ptr> + %4243 = llvm.extractelement %4242[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4244 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4245 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4246 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4247 = llvm.mul %4007, %4246 : !llvm.i64 + %4248 = llvm.add %4245, %4247 : !llvm.i64 + %4249 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4250 = llvm.mul %67, %4249 : !llvm.i64 + %4251 = llvm.add %4248, %4250 : !llvm.i64 + %4252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4253 = llvm.mul %4025, %4252 : !llvm.i64 + %4254 = llvm.add %4251, %4253 : !llvm.i64 + %4255 = llvm.getelementptr %4244[%4254] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4256 = llvm.load %4255 : !llvm.ptr> + %4257 = llvm.extractelement %4256[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4258 = llvm.fadd %4159, %4138 {RelaxedPrecision} : !llvm.float + %4259 = 
llvm.fadd %4173, %4139 {RelaxedPrecision} : !llvm.float + %4260 = llvm.fadd %4187, %4140 {RelaxedPrecision} : !llvm.float + %4261 = llvm.fadd %4201, %4141 {RelaxedPrecision} : !llvm.float + %4262 = llvm.fadd %4215, %4142 {RelaxedPrecision} : !llvm.float + %4263 = llvm.fadd %4229, %4143 {RelaxedPrecision} : !llvm.float + %4264 = llvm.fadd %4243, %4144 {RelaxedPrecision} : !llvm.float + %4265 = llvm.fadd %4257, %4145 {RelaxedPrecision} : !llvm.float + %4266 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4268 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4269 = llvm.mul %4007, %4268 : !llvm.i64 + %4270 = llvm.add %4267, %4269 : !llvm.i64 + %4271 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4272 = llvm.mul %67, %4271 : !llvm.i64 + %4273 = llvm.add %4270, %4272 : !llvm.i64 + %4274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4275 = llvm.mul %4025, %4274 : !llvm.i64 + %4276 = llvm.add %4273, %4275 : !llvm.i64 + %4277 = llvm.getelementptr %4266[%4276] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4278 = llvm.load %4277 : !llvm.ptr> + %4279 = llvm.insertelement %4258, %4278[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4280 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4281 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4282 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4283 = llvm.mul %4007, %4282 : !llvm.i64 + %4284 = llvm.add %4281, %4283 : !llvm.i64 + %4285 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4286 = llvm.mul %67, %4285 : !llvm.i64 + %4287 = llvm.add %4284, %4286 : !llvm.i64 + %4288 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4289 = llvm.mul %4025, %4288 : !llvm.i64 + %4290 = llvm.add %4287, %4289 : !llvm.i64 + %4291 = llvm.getelementptr %4280[%4290] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4279, %4291 : !llvm.ptr> + %4292 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4293 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4294 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4295 = llvm.mul %4007, %4294 : !llvm.i64 + %4296 = llvm.add %4293, %4295 : !llvm.i64 + %4297 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4298 = llvm.mul %67, %4297 : !llvm.i64 + %4299 = llvm.add %4296, %4298 : !llvm.i64 + %4300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4301 = llvm.mul %4025, %4300 : !llvm.i64 + %4302 = llvm.add %4299, %4301 : !llvm.i64 + %4303 = llvm.getelementptr %4292[%4302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4304 = llvm.load %4303 : !llvm.ptr> + %4305 = llvm.insertelement %4259, %4304[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4306 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4307 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4308 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4309 = llvm.mul %4007, %4308 : !llvm.i64 + %4310 = llvm.add %4307, %4309 : !llvm.i64 + %4311 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4312 = llvm.mul %67, %4311 : !llvm.i64 + %4313 = llvm.add %4310, %4312 : !llvm.i64 + %4314 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4315 = llvm.mul %4025, %4314 : !llvm.i64 + %4316 = llvm.add %4313, %4315 : !llvm.i64 + %4317 = llvm.getelementptr %4306[%4316] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4305, %4317 : !llvm.ptr> + %4318 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4320 
= llvm.mlir.constant(12 : index) : !llvm.i64 + %4321 = llvm.mul %4007, %4320 : !llvm.i64 + %4322 = llvm.add %4319, %4321 : !llvm.i64 + %4323 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4324 = llvm.mul %67, %4323 : !llvm.i64 + %4325 = llvm.add %4322, %4324 : !llvm.i64 + %4326 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4327 = llvm.mul %4025, %4326 : !llvm.i64 + %4328 = llvm.add %4325, %4327 : !llvm.i64 + %4329 = llvm.getelementptr %4318[%4328] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4330 = llvm.load %4329 : !llvm.ptr> + %4331 = llvm.insertelement %4260, %4330[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4332 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4333 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4334 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4335 = llvm.mul %4007, %4334 : !llvm.i64 + %4336 = llvm.add %4333, %4335 : !llvm.i64 + %4337 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4338 = llvm.mul %67, %4337 : !llvm.i64 + %4339 = llvm.add %4336, %4338 : !llvm.i64 + %4340 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4341 = llvm.mul %4025, %4340 : !llvm.i64 + %4342 = llvm.add %4339, %4341 : !llvm.i64 + %4343 = llvm.getelementptr %4332[%4342] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4331, %4343 : !llvm.ptr> + %4344 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4345 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4346 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4347 = llvm.mul %4007, %4346 : !llvm.i64 + %4348 = llvm.add %4345, %4347 : !llvm.i64 + %4349 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4350 = llvm.mul %67, %4349 : !llvm.i64 + %4351 = llvm.add %4348, %4350 : !llvm.i64 + %4352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4353 = llvm.mul %4025, %4352 : !llvm.i64 + %4354 = llvm.add %4351, %4353 : !llvm.i64 + %4355 = llvm.getelementptr %4344[%4354] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4356 = llvm.load %4355 : !llvm.ptr> + %4357 = llvm.insertelement %4261, %4356[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4358 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4360 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4361 = llvm.mul %4007, %4360 : !llvm.i64 + %4362 = llvm.add %4359, %4361 : !llvm.i64 + %4363 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4364 = llvm.mul %67, %4363 : !llvm.i64 + %4365 = llvm.add %4362, %4364 : !llvm.i64 + %4366 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4367 = llvm.mul %4025, %4366 : !llvm.i64 + %4368 = llvm.add %4365, %4367 : !llvm.i64 + %4369 = llvm.getelementptr %4358[%4368] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4357, %4369 : !llvm.ptr> + %4370 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4371 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4372 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4373 = llvm.mul %4007, %4372 : !llvm.i64 + %4374 = llvm.add %4371, %4373 : !llvm.i64 + %4375 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4376 = llvm.mul %67, %4375 : !llvm.i64 + %4377 = llvm.add %4374, %4376 : !llvm.i64 + %4378 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4379 = llvm.mul %4025, %4378 : !llvm.i64 + %4380 = llvm.add %4377, %4379 : !llvm.i64 + %4381 = llvm.getelementptr %4370[%4380] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4382 = llvm.load %4381 : !llvm.ptr> + %4383 = llvm.insertelement %4262, %4382[%28 : !llvm.i64] : !llvm.vec<8 x float> 
+ %4384 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4385 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4386 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4387 = llvm.mul %4007, %4386 : !llvm.i64 + %4388 = llvm.add %4385, %4387 : !llvm.i64 + %4389 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4390 = llvm.mul %67, %4389 : !llvm.i64 + %4391 = llvm.add %4388, %4390 : !llvm.i64 + %4392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4393 = llvm.mul %4025, %4392 : !llvm.i64 + %4394 = llvm.add %4391, %4393 : !llvm.i64 + %4395 = llvm.getelementptr %4384[%4394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4383, %4395 : !llvm.ptr> + %4396 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4397 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4398 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4399 = llvm.mul %4007, %4398 : !llvm.i64 + %4400 = llvm.add %4397, %4399 : !llvm.i64 + %4401 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4402 = llvm.mul %67, %4401 : !llvm.i64 + %4403 = llvm.add %4400, %4402 : !llvm.i64 + %4404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4405 = llvm.mul %4025, %4404 : !llvm.i64 + %4406 = llvm.add %4403, %4405 : !llvm.i64 + %4407 = llvm.getelementptr %4396[%4406] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4408 = llvm.load %4407 : !llvm.ptr> + %4409 = llvm.insertelement %4263, %4408[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4410 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4412 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4413 = llvm.mul %4007, %4412 : !llvm.i64 + %4414 = llvm.add %4411, %4413 : !llvm.i64 + %4415 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4416 = llvm.mul %67, %4415 : !llvm.i64 + %4417 = llvm.add %4414, %4416 : !llvm.i64 + %4418 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4419 = llvm.mul %4025, %4418 : !llvm.i64 + %4420 = llvm.add %4417, %4419 : !llvm.i64 + %4421 = llvm.getelementptr %4410[%4420] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4409, %4421 : !llvm.ptr> + %4422 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4423 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4424 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4425 = llvm.mul %4007, %4424 : !llvm.i64 + %4426 = llvm.add %4423, %4425 : !llvm.i64 + %4427 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4428 = llvm.mul %67, %4427 : !llvm.i64 + %4429 = llvm.add %4426, %4428 : !llvm.i64 + %4430 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4431 = llvm.mul %4025, %4430 : !llvm.i64 + %4432 = llvm.add %4429, %4431 : !llvm.i64 + %4433 = llvm.getelementptr %4422[%4432] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4434 = llvm.load %4433 : !llvm.ptr> + %4435 = llvm.insertelement %4264, %4434[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4436 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4437 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4438 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4439 = llvm.mul %4007, %4438 : !llvm.i64 + %4440 = llvm.add %4437, %4439 : !llvm.i64 + %4441 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4442 = llvm.mul %67, %4441 : !llvm.i64 + %4443 = llvm.add %4440, %4442 : !llvm.i64 + %4444 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4445 = llvm.mul %4025, %4444 : !llvm.i64 + %4446 = llvm.add %4443, %4445 : !llvm.i64 + %4447 = llvm.getelementptr %4436[%4446] 
: (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4435, %4447 : !llvm.ptr> + %4448 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4450 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4451 = llvm.mul %4007, %4450 : !llvm.i64 + %4452 = llvm.add %4449, %4451 : !llvm.i64 + %4453 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4454 = llvm.mul %67, %4453 : !llvm.i64 + %4455 = llvm.add %4452, %4454 : !llvm.i64 + %4456 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4457 = llvm.mul %4025, %4456 : !llvm.i64 + %4458 = llvm.add %4455, %4457 : !llvm.i64 + %4459 = llvm.getelementptr %4448[%4458] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4460 = llvm.load %4459 : !llvm.ptr> + %4461 = llvm.insertelement %4265, %4460[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4462 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4463 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4464 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4465 = llvm.mul %4007, %4464 : !llvm.i64 + %4466 = llvm.add %4463, %4465 : !llvm.i64 + %4467 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4468 = llvm.mul %67, %4467 : !llvm.i64 + %4469 = llvm.add %4466, %4468 : !llvm.i64 + %4470 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4471 = llvm.mul %4025, %4470 : !llvm.i64 + %4472 = llvm.add %4469, %4471 : !llvm.i64 + %4473 = llvm.getelementptr %4462[%4472] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4461, %4473 : !llvm.ptr> + %4474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4477 = llvm.mul %4007, %4476 : !llvm.i64 + %4478 = llvm.add %4475, %4477 : !llvm.i64 + %4479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4480 = llvm.mul %67, %4479 : !llvm.i64 + %4481 = llvm.add %4478, %4480 : !llvm.i64 + %4482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4483 = llvm.mul %4025, %4482 : !llvm.i64 + %4484 = llvm.add %4481, %4483 : !llvm.i64 + %4485 = llvm.getelementptr %4474[%4484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4486 = llvm.load %4485 : !llvm.ptr> + %4487 = llvm.insertelement %4258, %4486[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4488 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4490 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4491 = llvm.mul %4007, %4490 : !llvm.i64 + %4492 = llvm.add %4489, %4491 : !llvm.i64 + %4493 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4494 = llvm.mul %67, %4493 : !llvm.i64 + %4495 = llvm.add %4492, %4494 : !llvm.i64 + %4496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4497 = llvm.mul %4025, %4496 : !llvm.i64 + %4498 = llvm.add %4495, %4497 : !llvm.i64 + %4499 = llvm.getelementptr %4488[%4498] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4487, %4499 : !llvm.ptr> + %4500 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4501 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4502 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4503 = llvm.mul %4007, %4502 : !llvm.i64 + %4504 = llvm.add %4501, %4503 : !llvm.i64 + %4505 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4506 = llvm.mul %67, %4505 : !llvm.i64 + %4507 = llvm.add %4504, %4506 : !llvm.i64 + %4508 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4509 = llvm.mul %4025, %4508 : !llvm.i64 + 
%4510 = llvm.add %4507, %4509 : !llvm.i64 + %4511 = llvm.getelementptr %4500[%4510] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4512 = llvm.load %4511 : !llvm.ptr> + %4513 = llvm.insertelement %4259, %4512[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4514 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4515 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4516 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4517 = llvm.mul %4007, %4516 : !llvm.i64 + %4518 = llvm.add %4515, %4517 : !llvm.i64 + %4519 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4520 = llvm.mul %67, %4519 : !llvm.i64 + %4521 = llvm.add %4518, %4520 : !llvm.i64 + %4522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4523 = llvm.mul %4025, %4522 : !llvm.i64 + %4524 = llvm.add %4521, %4523 : !llvm.i64 + %4525 = llvm.getelementptr %4514[%4524] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4513, %4525 : !llvm.ptr> + %4526 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4528 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4529 = llvm.mul %4007, %4528 : !llvm.i64 + %4530 = llvm.add %4527, %4529 : !llvm.i64 + %4531 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4532 = llvm.mul %67, %4531 : !llvm.i64 + %4533 = llvm.add %4530, %4532 : !llvm.i64 + %4534 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4535 = llvm.mul %4025, %4534 : !llvm.i64 + %4536 = llvm.add %4533, %4535 : !llvm.i64 + %4537 = llvm.getelementptr %4526[%4536] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4538 = llvm.load %4537 : !llvm.ptr> + %4539 = llvm.insertelement %4260, %4538[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4540 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4542 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4543 = llvm.mul %4007, %4542 : !llvm.i64 + %4544 = llvm.add %4541, %4543 : !llvm.i64 + %4545 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4546 = llvm.mul %67, %4545 : !llvm.i64 + %4547 = llvm.add %4544, %4546 : !llvm.i64 + %4548 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4549 = llvm.mul %4025, %4548 : !llvm.i64 + %4550 = llvm.add %4547, %4549 : !llvm.i64 + %4551 = llvm.getelementptr %4540[%4550] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4539, %4551 : !llvm.ptr> + %4552 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4553 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4554 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4555 = llvm.mul %4007, %4554 : !llvm.i64 + %4556 = llvm.add %4553, %4555 : !llvm.i64 + %4557 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4558 = llvm.mul %67, %4557 : !llvm.i64 + %4559 = llvm.add %4556, %4558 : !llvm.i64 + %4560 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4561 = llvm.mul %4025, %4560 : !llvm.i64 + %4562 = llvm.add %4559, %4561 : !llvm.i64 + %4563 = llvm.getelementptr %4552[%4562] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4564 = llvm.load %4563 : !llvm.ptr> + %4565 = llvm.insertelement %4261, %4564[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4566 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4567 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4568 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4569 = llvm.mul %4007, %4568 : !llvm.i64 + %4570 = llvm.add %4567, %4569 : !llvm.i64 + %4571 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4572 = llvm.mul 
%67, %4571 : !llvm.i64 + %4573 = llvm.add %4570, %4572 : !llvm.i64 + %4574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4575 = llvm.mul %4025, %4574 : !llvm.i64 + %4576 = llvm.add %4573, %4575 : !llvm.i64 + %4577 = llvm.getelementptr %4566[%4576] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4565, %4577 : !llvm.ptr> + %4578 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4580 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4581 = llvm.mul %4007, %4580 : !llvm.i64 + %4582 = llvm.add %4579, %4581 : !llvm.i64 + %4583 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4584 = llvm.mul %67, %4583 : !llvm.i64 + %4585 = llvm.add %4582, %4584 : !llvm.i64 + %4586 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4587 = llvm.mul %4025, %4586 : !llvm.i64 + %4588 = llvm.add %4585, %4587 : !llvm.i64 + %4589 = llvm.getelementptr %4578[%4588] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4590 = llvm.load %4589 : !llvm.ptr> + %4591 = llvm.insertelement %4262, %4590[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4592 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4593 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4594 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4595 = llvm.mul %4007, %4594 : !llvm.i64 + %4596 = llvm.add %4593, %4595 : !llvm.i64 + %4597 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4598 = llvm.mul %67, %4597 : !llvm.i64 + %4599 = llvm.add %4596, %4598 : !llvm.i64 + %4600 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4601 = llvm.mul %4025, %4600 : !llvm.i64 + %4602 = llvm.add %4599, %4601 : !llvm.i64 + %4603 = llvm.getelementptr %4592[%4602] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4591, %4603 : !llvm.ptr> + %4604 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4605 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4606 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4607 = llvm.mul %4007, %4606 : !llvm.i64 + %4608 = llvm.add %4605, %4607 : !llvm.i64 + %4609 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4610 = llvm.mul %67, %4609 : !llvm.i64 + %4611 = llvm.add %4608, %4610 : !llvm.i64 + %4612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4613 = llvm.mul %4025, %4612 : !llvm.i64 + %4614 = llvm.add %4611, %4613 : !llvm.i64 + %4615 = llvm.getelementptr %4604[%4614] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4616 = llvm.load %4615 : !llvm.ptr> + %4617 = llvm.insertelement %4263, %4616[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4618 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4620 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4621 = llvm.mul %4007, %4620 : !llvm.i64 + %4622 = llvm.add %4619, %4621 : !llvm.i64 + %4623 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4624 = llvm.mul %67, %4623 : !llvm.i64 + %4625 = llvm.add %4622, %4624 : !llvm.i64 + %4626 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4627 = llvm.mul %4025, %4626 : !llvm.i64 + %4628 = llvm.add %4625, %4627 : !llvm.i64 + %4629 = llvm.getelementptr %4618[%4628] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4617, %4629 : !llvm.ptr> + %4630 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4632 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4633 = llvm.mul %4007, %4632 : !llvm.i64 + %4634 = llvm.add %4631, %4633 : 
!llvm.i64 + %4635 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4636 = llvm.mul %67, %4635 : !llvm.i64 + %4637 = llvm.add %4634, %4636 : !llvm.i64 + %4638 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4639 = llvm.mul %4025, %4638 : !llvm.i64 + %4640 = llvm.add %4637, %4639 : !llvm.i64 + %4641 = llvm.getelementptr %4630[%4640] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4642 = llvm.load %4641 : !llvm.ptr> + %4643 = llvm.insertelement %4264, %4642[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4644 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4645 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4646 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4647 = llvm.mul %4007, %4646 : !llvm.i64 + %4648 = llvm.add %4645, %4647 : !llvm.i64 + %4649 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4650 = llvm.mul %67, %4649 : !llvm.i64 + %4651 = llvm.add %4648, %4650 : !llvm.i64 + %4652 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4653 = llvm.mul %4025, %4652 : !llvm.i64 + %4654 = llvm.add %4651, %4653 : !llvm.i64 + %4655 = llvm.getelementptr %4644[%4654] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4643, %4655 : !llvm.ptr> + %4656 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4659 = llvm.mul %4007, %4658 : !llvm.i64 + %4660 = llvm.add %4657, %4659 : !llvm.i64 + %4661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4662 = llvm.mul %67, %4661 : !llvm.i64 + %4663 = llvm.add %4660, %4662 : !llvm.i64 + %4664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4665 = llvm.mul %4025, %4664 : !llvm.i64 + %4666 = llvm.add %4663, %4665 : !llvm.i64 + %4667 = llvm.getelementptr %4656[%4666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4668 = llvm.load %4667 : !llvm.ptr> + %4669 = llvm.insertelement %4265, %4668[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4673 = llvm.mul %4007, %4672 : !llvm.i64 + %4674 = llvm.add %4671, %4673 : !llvm.i64 + %4675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4676 = llvm.mul %67, %4675 : !llvm.i64 + %4677 = llvm.add %4674, %4676 : !llvm.i64 + %4678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4679 = llvm.mul %4025, %4678 : !llvm.i64 + %4680 = llvm.add %4677, %4679 : !llvm.i64 + %4681 = llvm.getelementptr %4670[%4680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4669, %4681 : !llvm.ptr> + %4682 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4683 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4684 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4685 = llvm.mul %2345, %4684 : !llvm.i64 + %4686 = llvm.add %4683, %4685 : !llvm.i64 + %4687 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4688 = llvm.mul %3917, %4687 : !llvm.i64 + %4689 = llvm.add %4686, %4688 : !llvm.i64 + %4690 = llvm.getelementptr %4682[%4689] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4691 = llvm.load %4690 : !llvm.ptr + %4692 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4694 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4695 = llvm.mul %2345, %4694 : !llvm.i64 + %4696 = llvm.add %4693, %4695 : !llvm.i64 + %4697 = llvm.mlir.constant(1 : index) : !llvm.i64 + 
%4698 = llvm.mul %3917, %4697 : !llvm.i64 + %4699 = llvm.add %4696, %4698 : !llvm.i64 + %4700 = llvm.getelementptr %4692[%4699] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4701 = llvm.load %4700 : !llvm.ptr + %4702 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4703 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4704 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4705 = llvm.mul %2345, %4704 : !llvm.i64 + %4706 = llvm.add %4703, %4705 : !llvm.i64 + %4707 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4708 = llvm.mul %3917, %4707 : !llvm.i64 + %4709 = llvm.add %4706, %4708 : !llvm.i64 + %4710 = llvm.getelementptr %4702[%4709] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4711 = llvm.load %4710 : !llvm.ptr + %4712 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4714 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4715 = llvm.mul %2345, %4714 : !llvm.i64 + %4716 = llvm.add %4713, %4715 : !llvm.i64 + %4717 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4718 = llvm.mul %3917, %4717 : !llvm.i64 + %4719 = llvm.add %4716, %4718 : !llvm.i64 + %4720 = llvm.getelementptr %4712[%4719] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4721 = llvm.load %4720 : !llvm.ptr + %4722 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4724 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4725 = llvm.mul %2345, %4724 : !llvm.i64 + %4726 = llvm.add %4723, %4725 : !llvm.i64 + %4727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4728 = llvm.mul %3917, %4727 : !llvm.i64 + %4729 = llvm.add %4726, %4728 : !llvm.i64 + %4730 = llvm.getelementptr %4722[%4729] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4731 = llvm.load %4730 : !llvm.ptr + %4732 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4734 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4735 = llvm.mul %2345, %4734 : !llvm.i64 + %4736 = llvm.add %4733, %4735 : !llvm.i64 + %4737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4738 = llvm.mul %3917, %4737 : !llvm.i64 + %4739 = llvm.add %4736, %4738 : !llvm.i64 + %4740 = llvm.getelementptr %4732[%4739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4741 = llvm.load %4740 : !llvm.ptr + %4742 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4743 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4744 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4745 = llvm.mul %2345, %4744 : !llvm.i64 + %4746 = llvm.add %4743, %4745 : !llvm.i64 + %4747 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4748 = llvm.mul %3917, %4747 : !llvm.i64 + %4749 = llvm.add %4746, %4748 : !llvm.i64 + %4750 = llvm.getelementptr %4742[%4749] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4751 = llvm.load %4750 : !llvm.ptr + %4752 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4754 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4755 = llvm.mul %2345, %4754 : !llvm.i64 + %4756 = llvm.add %4753, %4755 : !llvm.i64 + %4757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4758 = llvm.mul %3917, %4757 : !llvm.i64 + %4759 = llvm.add %4756, %4758 : !llvm.i64 + %4760 = llvm.getelementptr %4752[%4759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4761 = llvm.load %4760 : !llvm.ptr + %4762 = llvm.add %2368, %70 : !llvm.i64 + %4763 
= llvm.icmp "slt" %4762, %67 : !llvm.i64 + %4764 = llvm.sub %64, %4762 : !llvm.i64 + %4765 = llvm.select %4763, %4764, %4762 : !llvm.i1, !llvm.i64 + %4766 = llvm.sdiv %4765, %68 : !llvm.i64 + %4767 = llvm.sub %64, %4766 : !llvm.i64 + %4768 = llvm.select %4763, %4767, %4766 : !llvm.i1, !llvm.i64 + %4769 = llvm.srem %4768, %68 : !llvm.i64 + %4770 = llvm.icmp "slt" %4769, %67 : !llvm.i64 + %4771 = llvm.add %4769, %68 : !llvm.i64 + %4772 = llvm.select %4770, %4771, %4769 : !llvm.i1, !llvm.i64 + %4773 = llvm.sdiv %4000, %70 : !llvm.i64 + %4774 = llvm.sub %64, %4773 : !llvm.i64 + %4775 = llvm.select %3998, %4774, %4773 : !llvm.i1, !llvm.i64 + %4776 = llvm.mul %4768, %65 : !llvm.i64 + %4777 = llvm.add %4775, %4776 : !llvm.i64 + %4778 = llvm.add %4777, %69 : !llvm.i64 + %4779 = llvm.icmp "slt" %4778, %67 : !llvm.i64 + %4780 = llvm.sub %64, %4778 : !llvm.i64 + %4781 = llvm.select %4779, %4780, %4778 : !llvm.i1, !llvm.i64 + %4782 = llvm.sdiv %4781, %63 : !llvm.i64 + %4783 = llvm.sub %64, %4782 : !llvm.i64 + %4784 = llvm.select %4779, %4783, %4782 : !llvm.i1, !llvm.i64 + %4785 = llvm.mul %4784, %65 : !llvm.i64 + %4786 = llvm.add %4777, %4785 : !llvm.i64 + %4787 = llvm.add %4786, %69 : !llvm.i64 + %4788 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4790 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4791 = llvm.mul %4772, %4790 : !llvm.i64 + %4792 = llvm.add %4789, %4791 : !llvm.i64 + %4793 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4794 = llvm.mul %4011, %4793 : !llvm.i64 + %4795 = llvm.add %4792, %4794 : !llvm.i64 + %4796 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4797 = llvm.mul %4787, %4796 : !llvm.i64 + %4798 = llvm.add %4795, %4797 : !llvm.i64 + %4799 = llvm.getelementptr %4788[%4798] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4800 = llvm.load %4799 : !llvm.ptr> + %4801 = llvm.extractelement %4800[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4802 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4803 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4804 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4805 = llvm.mul %4772, %4804 : !llvm.i64 + %4806 = llvm.add %4803, %4805 : !llvm.i64 + %4807 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4808 = llvm.mul %4011, %4807 : !llvm.i64 + %4809 = llvm.add %4806, %4808 : !llvm.i64 + %4810 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4811 = llvm.mul %4787, %4810 : !llvm.i64 + %4812 = llvm.add %4809, %4811 : !llvm.i64 + %4813 = llvm.getelementptr %4802[%4812] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4814 = llvm.load %4813 : !llvm.ptr> + %4815 = llvm.extractelement %4814[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4816 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4817 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4818 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4819 = llvm.mul %4772, %4818 : !llvm.i64 + %4820 = llvm.add %4817, %4819 : !llvm.i64 + %4821 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4822 = llvm.mul %4011, %4821 : !llvm.i64 + %4823 = llvm.add %4820, %4822 : !llvm.i64 + %4824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4825 = llvm.mul %4787, %4824 : !llvm.i64 + %4826 = llvm.add %4823, %4825 : !llvm.i64 + %4827 = llvm.getelementptr %4816[%4826] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4828 = llvm.load %4827 : !llvm.ptr> + %4829 = llvm.extractelement %4828[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4830 = llvm.extractvalue 
%150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4831 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4832 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4833 = llvm.mul %4772, %4832 : !llvm.i64 + %4834 = llvm.add %4831, %4833 : !llvm.i64 + %4835 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4836 = llvm.mul %4011, %4835 : !llvm.i64 + %4837 = llvm.add %4834, %4836 : !llvm.i64 + %4838 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4839 = llvm.mul %4787, %4838 : !llvm.i64 + %4840 = llvm.add %4837, %4839 : !llvm.i64 + %4841 = llvm.getelementptr %4830[%4840] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4842 = llvm.load %4841 : !llvm.ptr> + %4843 = llvm.extractelement %4842[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4844 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4845 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4846 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4847 = llvm.mul %4772, %4846 : !llvm.i64 + %4848 = llvm.add %4845, %4847 : !llvm.i64 + %4849 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4850 = llvm.mul %4011, %4849 : !llvm.i64 + %4851 = llvm.add %4848, %4850 : !llvm.i64 + %4852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4853 = llvm.mul %4787, %4852 : !llvm.i64 + %4854 = llvm.add %4851, %4853 : !llvm.i64 + %4855 = llvm.getelementptr %4844[%4854] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4856 = llvm.load %4855 : !llvm.ptr> + %4857 = llvm.extractelement %4856[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4858 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4860 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4861 = llvm.mul %4772, %4860 : !llvm.i64 + %4862 = llvm.add %4859, %4861 : !llvm.i64 + %4863 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4864 = llvm.mul %4011, %4863 : !llvm.i64 + %4865 = llvm.add %4862, %4864 : !llvm.i64 + %4866 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4867 = llvm.mul %4787, %4866 : !llvm.i64 + %4868 = llvm.add %4865, %4867 : !llvm.i64 + %4869 = llvm.getelementptr %4858[%4868] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4870 = llvm.load %4869 : !llvm.ptr> + %4871 = llvm.extractelement %4870[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4872 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4873 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4874 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4875 = llvm.mul %4772, %4874 : !llvm.i64 + %4876 = llvm.add %4873, %4875 : !llvm.i64 + %4877 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4878 = llvm.mul %4011, %4877 : !llvm.i64 + %4879 = llvm.add %4876, %4878 : !llvm.i64 + %4880 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4881 = llvm.mul %4787, %4880 : !llvm.i64 + %4882 = llvm.add %4879, %4881 : !llvm.i64 + %4883 = llvm.getelementptr %4872[%4882] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4884 = llvm.load %4883 : !llvm.ptr> + %4885 = llvm.extractelement %4884[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4886 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4888 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4889 = llvm.mul %4772, %4888 : !llvm.i64 + %4890 = llvm.add %4887, %4889 : !llvm.i64 + %4891 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4892 = llvm.mul %4011, %4891 : !llvm.i64 + %4893 = llvm.add %4890, %4892 : !llvm.i64 + %4894 = llvm.mlir.constant(1 : index) : !llvm.i64 + 
%4895 = llvm.mul %4787, %4894 : !llvm.i64 + %4896 = llvm.add %4893, %4895 : !llvm.i64 + %4897 = llvm.getelementptr %4886[%4896] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4898 = llvm.load %4897 : !llvm.ptr> + %4899 = llvm.extractelement %4898[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4900 = llvm.fmul %4691, %4801 {RelaxedPrecision} : !llvm.float + %4901 = llvm.fmul %4701, %4815 {RelaxedPrecision} : !llvm.float + %4902 = llvm.fmul %4711, %4829 {RelaxedPrecision} : !llvm.float + %4903 = llvm.fmul %4721, %4843 {RelaxedPrecision} : !llvm.float + %4904 = llvm.fmul %4731, %4857 {RelaxedPrecision} : !llvm.float + %4905 = llvm.fmul %4741, %4871 {RelaxedPrecision} : !llvm.float + %4906 = llvm.fmul %4751, %4885 {RelaxedPrecision} : !llvm.float + %4907 = llvm.fmul %4761, %4899 {RelaxedPrecision} : !llvm.float + %4908 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4910 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4911 = llvm.mul %4772, %4910 : !llvm.i64 + %4912 = llvm.add %4909, %4911 : !llvm.i64 + %4913 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4914 = llvm.mul %67, %4913 : !llvm.i64 + %4915 = llvm.add %4912, %4914 : !llvm.i64 + %4916 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4917 = llvm.mul %4787, %4916 : !llvm.i64 + %4918 = llvm.add %4915, %4917 : !llvm.i64 + %4919 = llvm.getelementptr %4908[%4918] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4920 = llvm.load %4919 : !llvm.ptr> + %4921 = llvm.extractelement %4920[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4922 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4923 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4924 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4925 = llvm.mul %4772, %4924 : !llvm.i64 + %4926 = llvm.add %4923, %4925 : !llvm.i64 + %4927 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4928 = llvm.mul %67, %4927 : !llvm.i64 + %4929 = llvm.add %4926, %4928 : !llvm.i64 + %4930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4931 = llvm.mul %4787, %4930 : !llvm.i64 + %4932 = llvm.add %4929, %4931 : !llvm.i64 + %4933 = llvm.getelementptr %4922[%4932] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4934 = llvm.load %4933 : !llvm.ptr> + %4935 = llvm.extractelement %4934[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4936 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4937 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4938 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4939 = llvm.mul %4772, %4938 : !llvm.i64 + %4940 = llvm.add %4937, %4939 : !llvm.i64 + %4941 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4942 = llvm.mul %67, %4941 : !llvm.i64 + %4943 = llvm.add %4940, %4942 : !llvm.i64 + %4944 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4945 = llvm.mul %4787, %4944 : !llvm.i64 + %4946 = llvm.add %4943, %4945 : !llvm.i64 + %4947 = llvm.getelementptr %4936[%4946] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4948 = llvm.load %4947 : !llvm.ptr> + %4949 = llvm.extractelement %4948[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4950 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4951 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4952 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4953 = llvm.mul %4772, %4952 : !llvm.i64 + %4954 = llvm.add %4951, %4953 : !llvm.i64 + %4955 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4956 = llvm.mul %67, %4955 : !llvm.i64 + %4957 = llvm.add %4954, %4956 : !llvm.i64 
+ %4958 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4959 = llvm.mul %4787, %4958 : !llvm.i64 + %4960 = llvm.add %4957, %4959 : !llvm.i64 + %4961 = llvm.getelementptr %4950[%4960] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4962 = llvm.load %4961 : !llvm.ptr> + %4963 = llvm.extractelement %4962[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4964 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4966 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4967 = llvm.mul %4772, %4966 : !llvm.i64 + %4968 = llvm.add %4965, %4967 : !llvm.i64 + %4969 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4970 = llvm.mul %67, %4969 : !llvm.i64 + %4971 = llvm.add %4968, %4970 : !llvm.i64 + %4972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4973 = llvm.mul %4787, %4972 : !llvm.i64 + %4974 = llvm.add %4971, %4973 : !llvm.i64 + %4975 = llvm.getelementptr %4964[%4974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4976 = llvm.load %4975 : !llvm.ptr> + %4977 = llvm.extractelement %4976[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4978 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4980 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4981 = llvm.mul %4772, %4980 : !llvm.i64 + %4982 = llvm.add %4979, %4981 : !llvm.i64 + %4983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4984 = llvm.mul %67, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %4787, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4978[%4988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4990 = llvm.load %4989 : !llvm.ptr> + %4991 = llvm.extractelement %4990[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4992 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4993 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4994 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4995 = llvm.mul %4772, %4994 : !llvm.i64 + %4996 = llvm.add %4993, %4995 : !llvm.i64 + %4997 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4998 = llvm.mul %67, %4997 : !llvm.i64 + %4999 = llvm.add %4996, %4998 : !llvm.i64 + %5000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5001 = llvm.mul %4787, %5000 : !llvm.i64 + %5002 = llvm.add %4999, %5001 : !llvm.i64 + %5003 = llvm.getelementptr %4992[%5002] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5004 = llvm.load %5003 : !llvm.ptr> + %5005 = llvm.extractelement %5004[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5006 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5008 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5009 = llvm.mul %4772, %5008 : !llvm.i64 + %5010 = llvm.add %5007, %5009 : !llvm.i64 + %5011 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5012 = llvm.mul %67, %5011 : !llvm.i64 + %5013 = llvm.add %5010, %5012 : !llvm.i64 + %5014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5015 = llvm.mul %4787, %5014 : !llvm.i64 + %5016 = llvm.add %5013, %5015 : !llvm.i64 + %5017 = llvm.getelementptr %5006[%5016] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5018 = llvm.load %5017 : !llvm.ptr> + %5019 = llvm.extractelement %5018[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5020 = llvm.fadd %4921, %4900 {RelaxedPrecision} : !llvm.float + %5021 = llvm.fadd %4935, %4901 {RelaxedPrecision} : !llvm.float + 
%5022 = llvm.fadd %4949, %4902 {RelaxedPrecision} : !llvm.float + %5023 = llvm.fadd %4963, %4903 {RelaxedPrecision} : !llvm.float + %5024 = llvm.fadd %4977, %4904 {RelaxedPrecision} : !llvm.float + %5025 = llvm.fadd %4991, %4905 {RelaxedPrecision} : !llvm.float + %5026 = llvm.fadd %5005, %4906 {RelaxedPrecision} : !llvm.float + %5027 = llvm.fadd %5019, %4907 {RelaxedPrecision} : !llvm.float + %5028 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5030 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5031 = llvm.mul %4772, %5030 : !llvm.i64 + %5032 = llvm.add %5029, %5031 : !llvm.i64 + %5033 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5034 = llvm.mul %67, %5033 : !llvm.i64 + %5035 = llvm.add %5032, %5034 : !llvm.i64 + %5036 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5037 = llvm.mul %4787, %5036 : !llvm.i64 + %5038 = llvm.add %5035, %5037 : !llvm.i64 + %5039 = llvm.getelementptr %5028[%5038] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5040 = llvm.load %5039 : !llvm.ptr> + %5041 = llvm.insertelement %5020, %5040[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5042 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5043 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5044 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5045 = llvm.mul %4772, %5044 : !llvm.i64 + %5046 = llvm.add %5043, %5045 : !llvm.i64 + %5047 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5048 = llvm.mul %67, %5047 : !llvm.i64 + %5049 = llvm.add %5046, %5048 : !llvm.i64 + %5050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5051 = llvm.mul %4787, %5050 : !llvm.i64 + %5052 = llvm.add %5049, %5051 : !llvm.i64 + %5053 = llvm.getelementptr %5042[%5052] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5041, %5053 : !llvm.ptr> + %5054 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5056 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5057 = llvm.mul %4772, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5060 = llvm.mul %67, %5059 : !llvm.i64 + %5061 = llvm.add %5058, %5060 : !llvm.i64 + %5062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5063 = llvm.mul %4787, %5062 : !llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.getelementptr %5054[%5064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5066 = llvm.load %5065 : !llvm.ptr> + %5067 = llvm.insertelement %5021, %5066[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5068 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5070 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5071 = llvm.mul %4772, %5070 : !llvm.i64 + %5072 = llvm.add %5069, %5071 : !llvm.i64 + %5073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5074 = llvm.mul %67, %5073 : !llvm.i64 + %5075 = llvm.add %5072, %5074 : !llvm.i64 + %5076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5077 = llvm.mul %4787, %5076 : !llvm.i64 + %5078 = llvm.add %5075, %5077 : !llvm.i64 + %5079 = llvm.getelementptr %5068[%5078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5067, %5079 : !llvm.ptr> + %5080 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5082 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5083 = 
llvm.mul %4772, %5082 : !llvm.i64 + %5084 = llvm.add %5081, %5083 : !llvm.i64 + %5085 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5086 = llvm.mul %67, %5085 : !llvm.i64 + %5087 = llvm.add %5084, %5086 : !llvm.i64 + %5088 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5089 = llvm.mul %4787, %5088 : !llvm.i64 + %5090 = llvm.add %5087, %5089 : !llvm.i64 + %5091 = llvm.getelementptr %5080[%5090] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5092 = llvm.load %5091 : !llvm.ptr> + %5093 = llvm.insertelement %5022, %5092[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5094 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5095 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5096 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5097 = llvm.mul %4772, %5096 : !llvm.i64 + %5098 = llvm.add %5095, %5097 : !llvm.i64 + %5099 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5100 = llvm.mul %67, %5099 : !llvm.i64 + %5101 = llvm.add %5098, %5100 : !llvm.i64 + %5102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5103 = llvm.mul %4787, %5102 : !llvm.i64 + %5104 = llvm.add %5101, %5103 : !llvm.i64 + %5105 = llvm.getelementptr %5094[%5104] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5093, %5105 : !llvm.ptr> + %5106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5109 = llvm.mul %4772, %5108 : !llvm.i64 + %5110 = llvm.add %5107, %5109 : !llvm.i64 + %5111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5112 = llvm.mul %67, %5111 : !llvm.i64 + %5113 = llvm.add %5110, %5112 : !llvm.i64 + %5114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5115 = llvm.mul %4787, %5114 : !llvm.i64 + %5116 = llvm.add %5113, %5115 : !llvm.i64 + %5117 = llvm.getelementptr %5106[%5116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5118 = llvm.load %5117 : !llvm.ptr> + %5119 = llvm.insertelement %5023, %5118[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5120 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5123 = llvm.mul %4772, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5126 = llvm.mul %67, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5129 = llvm.mul %4787, %5128 : !llvm.i64 + %5130 = llvm.add %5127, %5129 : !llvm.i64 + %5131 = llvm.getelementptr %5120[%5130] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5119, %5131 : !llvm.ptr> + %5132 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5134 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5135 = llvm.mul %4772, %5134 : !llvm.i64 + %5136 = llvm.add %5133, %5135 : !llvm.i64 + %5137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5138 = llvm.mul %67, %5137 : !llvm.i64 + %5139 = llvm.add %5136, %5138 : !llvm.i64 + %5140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5141 = llvm.mul %4787, %5140 : !llvm.i64 + %5142 = llvm.add %5139, %5141 : !llvm.i64 + %5143 = llvm.getelementptr %5132[%5142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5144 = llvm.load %5143 : !llvm.ptr> + %5145 = llvm.insertelement %5024, %5144[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5146 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5149 = llvm.mul %4772, %5148 : !llvm.i64 + %5150 = llvm.add %5147, %5149 : !llvm.i64 + %5151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5152 = llvm.mul %67, %5151 : !llvm.i64 + %5153 = llvm.add %5150, %5152 : !llvm.i64 + %5154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5155 = llvm.mul %4787, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 : !llvm.i64 + %5157 = llvm.getelementptr %5146[%5156] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5145, %5157 : !llvm.ptr> + %5158 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5160 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5161 = llvm.mul %4772, %5160 : !llvm.i64 + %5162 = llvm.add %5159, %5161 : !llvm.i64 + %5163 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5164 = llvm.mul %67, %5163 : !llvm.i64 + %5165 = llvm.add %5162, %5164 : !llvm.i64 + %5166 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5167 = llvm.mul %4787, %5166 : !llvm.i64 + %5168 = llvm.add %5165, %5167 : !llvm.i64 + %5169 = llvm.getelementptr %5158[%5168] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5170 = llvm.load %5169 : !llvm.ptr> + %5171 = llvm.insertelement %5025, %5170[%29 : !llvm.i64] : !llvm.vec<8 x float> + %5172 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5173 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5174 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5175 = llvm.mul %4772, %5174 : !llvm.i64 + %5176 = llvm.add %5173, %5175 : !llvm.i64 + %5177 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5178 = llvm.mul %67, %5177 : !llvm.i64 + %5179 = llvm.add %5176, %5178 : !llvm.i64 + %5180 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5181 = llvm.mul %4787, %5180 : !llvm.i64 + %5182 = llvm.add %5179, %5181 : !llvm.i64 + %5183 = llvm.getelementptr %5172[%5182] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5171, %5183 : !llvm.ptr> + %5184 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5185 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5186 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5187 = llvm.mul %4772, %5186 : !llvm.i64 + %5188 = llvm.add %5185, %5187 : !llvm.i64 + %5189 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5190 = llvm.mul %67, %5189 : !llvm.i64 + %5191 = llvm.add %5188, %5190 : !llvm.i64 + %5192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5193 = llvm.mul %4787, %5192 : !llvm.i64 + %5194 = llvm.add %5191, %5193 : !llvm.i64 + %5195 = llvm.getelementptr %5184[%5194] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5196 = llvm.load %5195 : !llvm.ptr> + %5197 = llvm.insertelement %5026, %5196[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5198 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5200 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5201 = llvm.mul %4772, %5200 : !llvm.i64 + %5202 = llvm.add %5199, %5201 : !llvm.i64 + %5203 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5204 = llvm.mul %67, %5203 : !llvm.i64 + %5205 = llvm.add %5202, %5204 : !llvm.i64 + %5206 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5207 = llvm.mul %4787, %5206 : !llvm.i64 + %5208 = llvm.add %5205, %5207 : !llvm.i64 + %5209 = llvm.getelementptr %5198[%5208] : (!llvm.ptr>, !llvm.i64) -> 
!llvm.ptr> + llvm.store %5197, %5209 : !llvm.ptr> + %5210 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5212 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5213 = llvm.mul %4772, %5212 : !llvm.i64 + %5214 = llvm.add %5211, %5213 : !llvm.i64 + %5215 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5216 = llvm.mul %67, %5215 : !llvm.i64 + %5217 = llvm.add %5214, %5216 : !llvm.i64 + %5218 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5219 = llvm.mul %4787, %5218 : !llvm.i64 + %5220 = llvm.add %5217, %5219 : !llvm.i64 + %5221 = llvm.getelementptr %5210[%5220] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5222 = llvm.load %5221 : !llvm.ptr> + %5223 = llvm.insertelement %5027, %5222[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5224 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5225 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5226 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5227 = llvm.mul %4772, %5226 : !llvm.i64 + %5228 = llvm.add %5225, %5227 : !llvm.i64 + %5229 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5230 = llvm.mul %67, %5229 : !llvm.i64 + %5231 = llvm.add %5228, %5230 : !llvm.i64 + %5232 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5233 = llvm.mul %4787, %5232 : !llvm.i64 + %5234 = llvm.add %5231, %5233 : !llvm.i64 + %5235 = llvm.getelementptr %5224[%5234] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5223, %5235 : !llvm.ptr> + %5236 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5237 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5238 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5239 = llvm.mul %4772, %5238 : !llvm.i64 + %5240 = llvm.add %5237, %5239 : !llvm.i64 + %5241 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5242 = llvm.mul %67, %5241 : !llvm.i64 + %5243 = llvm.add %5240, %5242 : !llvm.i64 + %5244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5245 = llvm.mul %4787, %5244 : !llvm.i64 + %5246 = llvm.add %5243, %5245 : !llvm.i64 + %5247 = llvm.getelementptr %5236[%5246] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5248 = llvm.load %5247 : !llvm.ptr> + %5249 = llvm.insertelement %5020, %5248[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5250 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5252 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5253 = llvm.mul %4772, %5252 : !llvm.i64 + %5254 = llvm.add %5251, %5253 : !llvm.i64 + %5255 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5256 = llvm.mul %67, %5255 : !llvm.i64 + %5257 = llvm.add %5254, %5256 : !llvm.i64 + %5258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5259 = llvm.mul %4787, %5258 : !llvm.i64 + %5260 = llvm.add %5257, %5259 : !llvm.i64 + %5261 = llvm.getelementptr %5250[%5260] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5249, %5261 : !llvm.ptr> + %5262 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5263 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5264 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5265 = llvm.mul %4772, %5264 : !llvm.i64 + %5266 = llvm.add %5263, %5265 : !llvm.i64 + %5267 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5268 = llvm.mul %67, %5267 : !llvm.i64 + %5269 = llvm.add %5266, %5268 : !llvm.i64 + %5270 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5271 = llvm.mul %4787, %5270 : !llvm.i64 + %5272 = llvm.add %5269, %5271 : 
!llvm.i64 + %5273 = llvm.getelementptr %5262[%5272] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5274 = llvm.load %5273 : !llvm.ptr> + %5275 = llvm.insertelement %5021, %5274[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5276 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5277 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5278 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5279 = llvm.mul %4772, %5278 : !llvm.i64 + %5280 = llvm.add %5277, %5279 : !llvm.i64 + %5281 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5282 = llvm.mul %67, %5281 : !llvm.i64 + %5283 = llvm.add %5280, %5282 : !llvm.i64 + %5284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5285 = llvm.mul %4787, %5284 : !llvm.i64 + %5286 = llvm.add %5283, %5285 : !llvm.i64 + %5287 = llvm.getelementptr %5276[%5286] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5275, %5287 : !llvm.ptr> + %5288 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5290 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5291 = llvm.mul %4772, %5290 : !llvm.i64 + %5292 = llvm.add %5289, %5291 : !llvm.i64 + %5293 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5294 = llvm.mul %67, %5293 : !llvm.i64 + %5295 = llvm.add %5292, %5294 : !llvm.i64 + %5296 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5297 = llvm.mul %4787, %5296 : !llvm.i64 + %5298 = llvm.add %5295, %5297 : !llvm.i64 + %5299 = llvm.getelementptr %5288[%5298] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5300 = llvm.load %5299 : !llvm.ptr> + %5301 = llvm.insertelement %5022, %5300[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5302 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5303 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5304 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5305 = llvm.mul %4772, %5304 : !llvm.i64 + %5306 = llvm.add %5303, %5305 : !llvm.i64 + %5307 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5308 = llvm.mul %67, %5307 : !llvm.i64 + %5309 = llvm.add %5306, %5308 : !llvm.i64 + %5310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5311 = llvm.mul %4787, %5310 : !llvm.i64 + %5312 = llvm.add %5309, %5311 : !llvm.i64 + %5313 = llvm.getelementptr %5302[%5312] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5301, %5313 : !llvm.ptr> + %5314 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5316 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5317 = llvm.mul %4772, %5316 : !llvm.i64 + %5318 = llvm.add %5315, %5317 : !llvm.i64 + %5319 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5320 = llvm.mul %67, %5319 : !llvm.i64 + %5321 = llvm.add %5318, %5320 : !llvm.i64 + %5322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5323 = llvm.mul %4787, %5322 : !llvm.i64 + %5324 = llvm.add %5321, %5323 : !llvm.i64 + %5325 = llvm.getelementptr %5314[%5324] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5326 = llvm.load %5325 : !llvm.ptr> + %5327 = llvm.insertelement %5023, %5326[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5328 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5330 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5331 = llvm.mul %4772, %5330 : !llvm.i64 + %5332 = llvm.add %5329, %5331 : !llvm.i64 + %5333 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5334 = llvm.mul %67, %5333 : !llvm.i64 + %5335 = 
llvm.add %5332, %5334 : !llvm.i64 + %5336 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5337 = llvm.mul %4787, %5336 : !llvm.i64 + %5338 = llvm.add %5335, %5337 : !llvm.i64 + %5339 = llvm.getelementptr %5328[%5338] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5327, %5339 : !llvm.ptr> + %5340 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5342 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5343 = llvm.mul %4772, %5342 : !llvm.i64 + %5344 = llvm.add %5341, %5343 : !llvm.i64 + %5345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5346 = llvm.mul %67, %5345 : !llvm.i64 + %5347 = llvm.add %5344, %5346 : !llvm.i64 + %5348 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5349 = llvm.mul %4787, %5348 : !llvm.i64 + %5350 = llvm.add %5347, %5349 : !llvm.i64 + %5351 = llvm.getelementptr %5340[%5350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5352 = llvm.load %5351 : !llvm.ptr> + %5353 = llvm.insertelement %5024, %5352[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5354 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5356 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5357 = llvm.mul %4772, %5356 : !llvm.i64 + %5358 = llvm.add %5355, %5357 : !llvm.i64 + %5359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5360 = llvm.mul %67, %5359 : !llvm.i64 + %5361 = llvm.add %5358, %5360 : !llvm.i64 + %5362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5363 = llvm.mul %4787, %5362 : !llvm.i64 + %5364 = llvm.add %5361, %5363 : !llvm.i64 + %5365 = llvm.getelementptr %5354[%5364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5353, %5365 : !llvm.ptr> + %5366 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5367 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5368 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5369 = llvm.mul %4772, %5368 : !llvm.i64 + %5370 = llvm.add %5367, %5369 : !llvm.i64 + %5371 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5372 = llvm.mul %67, %5371 : !llvm.i64 + %5373 = llvm.add %5370, %5372 : !llvm.i64 + %5374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5375 = llvm.mul %4787, %5374 : !llvm.i64 + %5376 = llvm.add %5373, %5375 : !llvm.i64 + %5377 = llvm.getelementptr %5366[%5376] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5378 = llvm.load %5377 : !llvm.ptr> + %5379 = llvm.insertelement %5025, %5378[%29 : !llvm.i64] : !llvm.vec<8 x float> + %5380 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5382 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5383 = llvm.mul %4772, %5382 : !llvm.i64 + %5384 = llvm.add %5381, %5383 : !llvm.i64 + %5385 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5386 = llvm.mul %67, %5385 : !llvm.i64 + %5387 = llvm.add %5384, %5386 : !llvm.i64 + %5388 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5389 = llvm.mul %4787, %5388 : !llvm.i64 + %5390 = llvm.add %5387, %5389 : !llvm.i64 + %5391 = llvm.getelementptr %5380[%5390] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5379, %5391 : !llvm.ptr> + %5392 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5393 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5394 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5395 = llvm.mul %4772, %5394 : !llvm.i64 + %5396 = llvm.add %5393, %5395 : !llvm.i64 + %5397 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %5398 = llvm.mul %67, %5397 : !llvm.i64 + %5399 = llvm.add %5396, %5398 : !llvm.i64 + %5400 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5401 = llvm.mul %4787, %5400 : !llvm.i64 + %5402 = llvm.add %5399, %5401 : !llvm.i64 + %5403 = llvm.getelementptr %5392[%5402] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5404 = llvm.load %5403 : !llvm.ptr> + %5405 = llvm.insertelement %5026, %5404[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5406 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5408 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5409 = llvm.mul %4772, %5408 : !llvm.i64 + %5410 = llvm.add %5407, %5409 : !llvm.i64 + %5411 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5412 = llvm.mul %67, %5411 : !llvm.i64 + %5413 = llvm.add %5410, %5412 : !llvm.i64 + %5414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5415 = llvm.mul %4787, %5414 : !llvm.i64 + %5416 = llvm.add %5413, %5415 : !llvm.i64 + %5417 = llvm.getelementptr %5406[%5416] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5405, %5417 : !llvm.ptr> + %5418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5421 = llvm.mul %4772, %5420 : !llvm.i64 + %5422 = llvm.add %5419, %5421 : !llvm.i64 + %5423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5424 = llvm.mul %67, %5423 : !llvm.i64 + %5425 = llvm.add %5422, %5424 : !llvm.i64 + %5426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5427 = llvm.mul %4787, %5426 : !llvm.i64 + %5428 = llvm.add %5425, %5427 : !llvm.i64 + %5429 = llvm.getelementptr %5418[%5428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5430 = llvm.load %5429 : !llvm.ptr> + %5431 = llvm.insertelement %5027, %5430[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5435 = llvm.mul %4772, %5434 : !llvm.i64 + %5436 = llvm.add %5433, %5435 : !llvm.i64 + %5437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5438 = llvm.mul %67, %5437 : !llvm.i64 + %5439 = llvm.add %5436, %5438 : !llvm.i64 + %5440 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5441 = llvm.mul %4787, %5440 : !llvm.i64 + %5442 = llvm.add %5439, %5441 : !llvm.i64 + %5443 = llvm.getelementptr %5432[%5442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5431, %5443 : !llvm.ptr> + %5444 = llvm.add %3915, %69 : !llvm.i64 + llvm.br ^bb36(%5444 : !llvm.i64) + ^bb38: // pred: ^bb36 + %5445 = llvm.add %2370, %48 : !llvm.i64 + llvm.br ^bb25(%5445 : !llvm.i64) + ^bb39: // pred: ^bb25 + %5446 = llvm.add %2368, %68 : !llvm.i64 + llvm.br ^bb23(%5446 : !llvm.i64) + ^bb40: // pred: ^bb23 + llvm.br ^bb41(%67 : !llvm.i64) + ^bb41(%5447: !llvm.i64): // 2 preds: ^bb40, ^bb50 + %5448 = llvm.icmp "slt" %5447, %38 : !llvm.i64 + llvm.cond_br %5448, ^bb42, ^bb51 + ^bb42: // pred: ^bb41 + llvm.cond_br %40, ^bb43, ^bb47 + ^bb43: // pred: ^bb42 + %5449 = llvm.add %151, %5447 : !llvm.i64 + %5450 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5451 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5452 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5453 = llvm.mul %2345, %5452 : !llvm.i64 + %5454 = llvm.add %5451, %5453 : !llvm.i64 + %5455 = llvm.mlir.constant(1 
: index) : !llvm.i64 + %5456 = llvm.mul %5449, %5455 : !llvm.i64 + %5457 = llvm.add %5454, %5456 : !llvm.i64 + %5458 = llvm.getelementptr %5450[%5457] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5459 = llvm.bitcast %5458 : !llvm.ptr to !llvm.ptr> + %5460 = llvm.load %5459 {alignment = 4 : i64} : !llvm.ptr> + %5461 = llvm.icmp "slt" %5447, %67 : !llvm.i64 + %5462 = llvm.sub %64, %5447 : !llvm.i64 + %5463 = llvm.select %5461, %5462, %5447 : !llvm.i1, !llvm.i64 + %5464 = llvm.sdiv %5463, %68 : !llvm.i64 + %5465 = llvm.sub %64, %5464 : !llvm.i64 + %5466 = llvm.select %5461, %5465, %5464 : !llvm.i1, !llvm.i64 + %5467 = llvm.srem %5466, %68 : !llvm.i64 + %5468 = llvm.icmp "slt" %5467, %67 : !llvm.i64 + %5469 = llvm.add %5467, %68 : !llvm.i64 + %5470 = llvm.select %5468, %5469, %5467 : !llvm.i1, !llvm.i64 + %5471 = llvm.srem %5447, %68 : !llvm.i64 + %5472 = llvm.icmp "slt" %5471, %67 : !llvm.i64 + %5473 = llvm.add %5471, %68 : !llvm.i64 + %5474 = llvm.select %5472, %5473, %5471 : !llvm.i1, !llvm.i64 + %5475 = llvm.icmp "slt" %5474, %67 : !llvm.i64 + %5476 = llvm.sub %64, %5474 : !llvm.i64 + %5477 = llvm.select %5475, %5476, %5474 : !llvm.i1, !llvm.i64 + %5478 = llvm.sdiv %5477, %70 : !llvm.i64 + %5479 = llvm.sub %64, %5478 : !llvm.i64 + %5480 = llvm.select %5475, %5479, %5478 : !llvm.i1, !llvm.i64 + %5481 = llvm.srem %5480, %63 : !llvm.i64 + %5482 = llvm.icmp "slt" %5481, %67 : !llvm.i64 + %5483 = llvm.add %5481, %63 : !llvm.i64 + %5484 = llvm.select %5482, %5483, %5481 : !llvm.i1, !llvm.i64 + %5485 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5486 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5487 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5488 = llvm.mul %5470, %5487 : !llvm.i64 + %5489 = llvm.add %5486, %5488 : !llvm.i64 + %5490 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5491 = llvm.mul %67, %5490 : !llvm.i64 + %5492 = llvm.add %5489, %5491 : !llvm.i64 + %5493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5494 = llvm.mul %5484, %5493 : !llvm.i64 + %5495 = llvm.add %5492, %5494 : !llvm.i64 + %5496 = llvm.getelementptr %5485[%5495] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5497 = llvm.load %5496 : !llvm.ptr> + %5498 = llvm.fadd %5460, %5497 : !llvm.vec<8 x float> + %5499 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5500 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5501 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5502 = llvm.mul %67, %5501 : !llvm.i64 + %5503 = llvm.add %5500, %5502 : !llvm.i64 + %5504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5505 = llvm.mul %67, %5504 : !llvm.i64 + %5506 = llvm.add %5503, %5505 : !llvm.i64 + %5507 = llvm.getelementptr %5499[%5506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5498, %5507 : !llvm.ptr> + %5508 = llvm.add %5449, %70 : !llvm.i64 + %5509 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5510 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5511 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5512 = llvm.mul %2345, %5511 : !llvm.i64 + %5513 = llvm.add %5510, %5512 : !llvm.i64 + %5514 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5515 = llvm.mul %5508, %5514 : !llvm.i64 + %5516 = llvm.add %5513, %5515 : !llvm.i64 + %5517 = llvm.getelementptr %5509[%5516] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5518 = llvm.bitcast %5517 : !llvm.ptr to !llvm.ptr> + %5519 = llvm.load %5518 {alignment = 4 : i64} : !llvm.ptr> + %5520 = llvm.add %5447, %70 : !llvm.i64 + %5521 = llvm.icmp "slt" %5520, 
%67 : !llvm.i64 + %5522 = llvm.sub %64, %5520 : !llvm.i64 + %5523 = llvm.select %5521, %5522, %5520 : !llvm.i1, !llvm.i64 + %5524 = llvm.sdiv %5523, %68 : !llvm.i64 + %5525 = llvm.sub %64, %5524 : !llvm.i64 + %5526 = llvm.select %5521, %5525, %5524 : !llvm.i1, !llvm.i64 + %5527 = llvm.srem %5526, %68 : !llvm.i64 + %5528 = llvm.icmp "slt" %5527, %67 : !llvm.i64 + %5529 = llvm.add %5527, %68 : !llvm.i64 + %5530 = llvm.select %5528, %5529, %5527 : !llvm.i1, !llvm.i64 + %5531 = llvm.sdiv %5463, %70 : !llvm.i64 + %5532 = llvm.sub %64, %5531 : !llvm.i64 + %5533 = llvm.select %5461, %5532, %5531 : !llvm.i1, !llvm.i64 + %5534 = llvm.mul %5526, %65 : !llvm.i64 + %5535 = llvm.add %5533, %5534 : !llvm.i64 + %5536 = llvm.add %5535, %69 : !llvm.i64 + %5537 = llvm.icmp "slt" %5536, %67 : !llvm.i64 + %5538 = llvm.sub %64, %5536 : !llvm.i64 + %5539 = llvm.select %5537, %5538, %5536 : !llvm.i1, !llvm.i64 + %5540 = llvm.sdiv %5539, %63 : !llvm.i64 + %5541 = llvm.sub %64, %5540 : !llvm.i64 + %5542 = llvm.select %5537, %5541, %5540 : !llvm.i1, !llvm.i64 + %5543 = llvm.mul %5542, %65 : !llvm.i64 + %5544 = llvm.add %5535, %5543 : !llvm.i64 + %5545 = llvm.add %5544, %69 : !llvm.i64 + %5546 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5547 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5548 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5549 = llvm.mul %5530, %5548 : !llvm.i64 + %5550 = llvm.add %5547, %5549 : !llvm.i64 + %5551 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5552 = llvm.mul %67, %5551 : !llvm.i64 + %5553 = llvm.add %5550, %5552 : !llvm.i64 + %5554 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5555 = llvm.mul %5545, %5554 : !llvm.i64 + %5556 = llvm.add %5553, %5555 : !llvm.i64 + %5557 = llvm.getelementptr %5546[%5556] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5558 = llvm.load %5557 : !llvm.ptr> + %5559 = llvm.fadd %5519, %5558 : !llvm.vec<8 x float> + %5560 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5561 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5562 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5563 = llvm.mul %67, %5562 : !llvm.i64 + %5564 = llvm.add %5561, %5563 : !llvm.i64 + %5565 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5566 = llvm.mul %69, %5565 : !llvm.i64 + %5567 = llvm.add %5564, %5566 : !llvm.i64 + %5568 = llvm.getelementptr %5560[%5567] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5559, %5568 : !llvm.ptr> + %5569 = llvm.add %5449, %68 : !llvm.i64 + %5570 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5571 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5572 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5573 = llvm.mul %2345, %5572 : !llvm.i64 + %5574 = llvm.add %5571, %5573 : !llvm.i64 + %5575 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5576 = llvm.mul %5569, %5575 : !llvm.i64 + %5577 = llvm.add %5574, %5576 : !llvm.i64 + %5578 = llvm.getelementptr %5570[%5577] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5579 = llvm.bitcast %5578 : !llvm.ptr to !llvm.ptr> + %5580 = llvm.load %5579 {alignment = 4 : i64} : !llvm.ptr> + %5581 = llvm.add %5466, %69 : !llvm.i64 + %5582 = llvm.icmp "slt" %5581, %67 : !llvm.i64 + %5583 = llvm.sub %64, %5581 : !llvm.i64 + %5584 = llvm.select %5582, %5583, %5581 : !llvm.i1, !llvm.i64 + %5585 = llvm.sdiv %5584, %68 : !llvm.i64 + %5586 = llvm.sub %64, %5585 : !llvm.i64 + %5587 = llvm.select %5582, %5586, %5585 : !llvm.i1, !llvm.i64 + %5588 = llvm.mul %5587, %60 : !llvm.i64 + %5589 = 
llvm.add %5466, %5588 : !llvm.i64 + %5590 = llvm.add %5589, %69 : !llvm.i64 + %5591 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5593 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5594 = llvm.mul %5590, %5593 : !llvm.i64 + %5595 = llvm.add %5592, %5594 : !llvm.i64 + %5596 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5597 = llvm.mul %67, %5596 : !llvm.i64 + %5598 = llvm.add %5595, %5597 : !llvm.i64 + %5599 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5600 = llvm.mul %5484, %5599 : !llvm.i64 + %5601 = llvm.add %5598, %5600 : !llvm.i64 + %5602 = llvm.getelementptr %5591[%5601] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5603 = llvm.load %5602 : !llvm.ptr> + %5604 = llvm.fadd %5580, %5603 : !llvm.vec<8 x float> + %5605 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5606 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5607 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5608 = llvm.mul %67, %5607 : !llvm.i64 + %5609 = llvm.add %5606, %5608 : !llvm.i64 + %5610 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5611 = llvm.mul %63, %5610 : !llvm.i64 + %5612 = llvm.add %5609, %5611 : !llvm.i64 + %5613 = llvm.getelementptr %5605[%5612] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5604, %5613 : !llvm.ptr> + %5614 = llvm.add %5449, %41 : !llvm.i64 + %5615 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5616 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5617 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5618 = llvm.mul %2345, %5617 : !llvm.i64 + %5619 = llvm.add %5616, %5618 : !llvm.i64 + %5620 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5621 = llvm.mul %5614, %5620 : !llvm.i64 + %5622 = llvm.add %5619, %5621 : !llvm.i64 + %5623 = llvm.getelementptr %5615[%5622] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5624 = llvm.bitcast %5623 : !llvm.ptr to !llvm.ptr> + %5625 = llvm.load %5624 {alignment = 4 : i64} : !llvm.ptr> + %5626 = llvm.add %5447, %41 : !llvm.i64 + %5627 = llvm.icmp "slt" %5626, %67 : !llvm.i64 + %5628 = llvm.sub %64, %5626 : !llvm.i64 + %5629 = llvm.select %5627, %5628, %5626 : !llvm.i1, !llvm.i64 + %5630 = llvm.sdiv %5629, %68 : !llvm.i64 + %5631 = llvm.sub %64, %5630 : !llvm.i64 + %5632 = llvm.select %5627, %5631, %5630 : !llvm.i1, !llvm.i64 + %5633 = llvm.srem %5632, %68 : !llvm.i64 + %5634 = llvm.icmp "slt" %5633, %67 : !llvm.i64 + %5635 = llvm.add %5633, %68 : !llvm.i64 + %5636 = llvm.select %5634, %5635, %5633 : !llvm.i1, !llvm.i64 + %5637 = llvm.mul %5632, %65 : !llvm.i64 + %5638 = llvm.add %5533, %5637 : !llvm.i64 + %5639 = llvm.add %5638, %45 : !llvm.i64 + %5640 = llvm.icmp "slt" %5639, %67 : !llvm.i64 + %5641 = llvm.sub %64, %5639 : !llvm.i64 + %5642 = llvm.select %5640, %5641, %5639 : !llvm.i1, !llvm.i64 + %5643 = llvm.sdiv %5642, %63 : !llvm.i64 + %5644 = llvm.sub %64, %5643 : !llvm.i64 + %5645 = llvm.select %5640, %5644, %5643 : !llvm.i1, !llvm.i64 + %5646 = llvm.mul %5645, %65 : !llvm.i64 + %5647 = llvm.add %5638, %5646 : !llvm.i64 + %5648 = llvm.add %5647, %45 : !llvm.i64 + %5649 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5651 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5652 = llvm.mul %5636, %5651 : !llvm.i64 + %5653 = llvm.add %5650, %5652 : !llvm.i64 + %5654 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5655 = llvm.mul %67, %5654 : !llvm.i64 + %5656 = 
llvm.add %5653, %5655 : !llvm.i64 + %5657 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5658 = llvm.mul %5648, %5657 : !llvm.i64 + %5659 = llvm.add %5656, %5658 : !llvm.i64 + %5660 = llvm.getelementptr %5649[%5659] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5661 = llvm.load %5660 : !llvm.ptr> + %5662 = llvm.fadd %5625, %5661 : !llvm.vec<8 x float> + %5663 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5664 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5665 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5666 = llvm.mul %67, %5665 : !llvm.i64 + %5667 = llvm.add %5664, %5666 : !llvm.i64 + %5668 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5669 = llvm.mul %45, %5668 : !llvm.i64 + %5670 = llvm.add %5667, %5669 : !llvm.i64 + %5671 = llvm.getelementptr %5663[%5670] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5662, %5671 : !llvm.ptr> + %5672 = llvm.add %5449, %42 : !llvm.i64 + %5673 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5674 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5675 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5676 = llvm.mul %2345, %5675 : !llvm.i64 + %5677 = llvm.add %5674, %5676 : !llvm.i64 + %5678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5679 = llvm.mul %5672, %5678 : !llvm.i64 + %5680 = llvm.add %5677, %5679 : !llvm.i64 + %5681 = llvm.getelementptr %5673[%5680] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5682 = llvm.bitcast %5681 : !llvm.ptr to !llvm.ptr> + %5683 = llvm.load %5682 {alignment = 4 : i64} : !llvm.ptr> + %5684 = llvm.add %5466, %63 : !llvm.i64 + %5685 = llvm.icmp "slt" %5684, %67 : !llvm.i64 + %5686 = llvm.sub %64, %5684 : !llvm.i64 + %5687 = llvm.select %5685, %5686, %5684 : !llvm.i1, !llvm.i64 + %5688 = llvm.sdiv %5687, %68 : !llvm.i64 + %5689 = llvm.sub %64, %5688 : !llvm.i64 + %5690 = llvm.select %5685, %5689, %5688 : !llvm.i1, !llvm.i64 + %5691 = llvm.mul %5690, %60 : !llvm.i64 + %5692 = llvm.add %5466, %5691 : !llvm.i64 + %5693 = llvm.add %5692, %63 : !llvm.i64 + %5694 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5695 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5696 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5697 = llvm.mul %5693, %5696 : !llvm.i64 + %5698 = llvm.add %5695, %5697 : !llvm.i64 + %5699 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5700 = llvm.mul %67, %5699 : !llvm.i64 + %5701 = llvm.add %5698, %5700 : !llvm.i64 + %5702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5703 = llvm.mul %5484, %5702 : !llvm.i64 + %5704 = llvm.add %5701, %5703 : !llvm.i64 + %5705 = llvm.getelementptr %5694[%5704] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5706 = llvm.load %5705 : !llvm.ptr> + %5707 = llvm.fadd %5683, %5706 : !llvm.vec<8 x float> + %5708 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5710 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5711 = llvm.mul %67, %5710 : !llvm.i64 + %5712 = llvm.add %5709, %5711 : !llvm.i64 + %5713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5714 = llvm.mul %48, %5713 : !llvm.i64 + %5715 = llvm.add %5712, %5714 : !llvm.i64 + %5716 = llvm.getelementptr %5708[%5715] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5707, %5716 : !llvm.ptr> + %5717 = llvm.add %5449, %43 : !llvm.i64 + %5718 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5720 = 
llvm.mlir.constant(512 : index) : !llvm.i64 + %5721 = llvm.mul %2345, %5720 : !llvm.i64 + %5722 = llvm.add %5719, %5721 : !llvm.i64 + %5723 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5724 = llvm.mul %5717, %5723 : !llvm.i64 + %5725 = llvm.add %5722, %5724 : !llvm.i64 + %5726 = llvm.getelementptr %5718[%5725] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5727 = llvm.bitcast %5726 : !llvm.ptr to !llvm.ptr> + %5728 = llvm.load %5727 {alignment = 4 : i64} : !llvm.ptr> + %5729 = llvm.add %5447, %43 : !llvm.i64 + %5730 = llvm.icmp "slt" %5729, %67 : !llvm.i64 + %5731 = llvm.sub %64, %5729 : !llvm.i64 + %5732 = llvm.select %5730, %5731, %5729 : !llvm.i1, !llvm.i64 + %5733 = llvm.sdiv %5732, %68 : !llvm.i64 + %5734 = llvm.sub %64, %5733 : !llvm.i64 + %5735 = llvm.select %5730, %5734, %5733 : !llvm.i1, !llvm.i64 + %5736 = llvm.srem %5735, %68 : !llvm.i64 + %5737 = llvm.icmp "slt" %5736, %67 : !llvm.i64 + %5738 = llvm.add %5736, %68 : !llvm.i64 + %5739 = llvm.select %5737, %5738, %5736 : !llvm.i1, !llvm.i64 + %5740 = llvm.mul %5735, %65 : !llvm.i64 + %5741 = llvm.add %5533, %5740 : !llvm.i64 + %5742 = llvm.add %5741, %52 : !llvm.i64 + %5743 = llvm.icmp "slt" %5742, %67 : !llvm.i64 + %5744 = llvm.sub %64, %5742 : !llvm.i64 + %5745 = llvm.select %5743, %5744, %5742 : !llvm.i1, !llvm.i64 + %5746 = llvm.sdiv %5745, %63 : !llvm.i64 + %5747 = llvm.sub %64, %5746 : !llvm.i64 + %5748 = llvm.select %5743, %5747, %5746 : !llvm.i1, !llvm.i64 + %5749 = llvm.mul %5748, %65 : !llvm.i64 + %5750 = llvm.add %5741, %5749 : !llvm.i64 + %5751 = llvm.add %5750, %52 : !llvm.i64 + %5752 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5754 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5755 = llvm.mul %5739, %5754 : !llvm.i64 + %5756 = llvm.add %5753, %5755 : !llvm.i64 + %5757 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5758 = llvm.mul %67, %5757 : !llvm.i64 + %5759 = llvm.add %5756, %5758 : !llvm.i64 + %5760 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5761 = llvm.mul %5751, %5760 : !llvm.i64 + %5762 = llvm.add %5759, %5761 : !llvm.i64 + %5763 = llvm.getelementptr %5752[%5762] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5764 = llvm.load %5763 : !llvm.ptr> + %5765 = llvm.fadd %5728, %5764 : !llvm.vec<8 x float> + %5766 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5767 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5768 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5769 = llvm.mul %67, %5768 : !llvm.i64 + %5770 = llvm.add %5767, %5769 : !llvm.i64 + %5771 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5772 = llvm.mul %52, %5771 : !llvm.i64 + %5773 = llvm.add %5770, %5772 : !llvm.i64 + %5774 = llvm.getelementptr %5766[%5773] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5765, %5774 : !llvm.ptr> + %5775 = llvm.add %5449, %44 : !llvm.i64 + %5776 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5777 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5778 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5779 = llvm.mul %2345, %5778 : !llvm.i64 + %5780 = llvm.add %5777, %5779 : !llvm.i64 + %5781 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5782 = llvm.mul %5775, %5781 : !llvm.i64 + %5783 = llvm.add %5780, %5782 : !llvm.i64 + %5784 = llvm.getelementptr %5776[%5783] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5785 = llvm.bitcast %5784 : !llvm.ptr to !llvm.ptr> + %5786 = llvm.load %5785 {alignment = 4 : i64} : !llvm.ptr> + %5787 = 
llvm.add %5466, %45 : !llvm.i64 + %5788 = llvm.icmp "slt" %5787, %67 : !llvm.i64 + %5789 = llvm.sub %64, %5787 : !llvm.i64 + %5790 = llvm.select %5788, %5789, %5787 : !llvm.i1, !llvm.i64 + %5791 = llvm.sdiv %5790, %68 : !llvm.i64 + %5792 = llvm.sub %64, %5791 : !llvm.i64 + %5793 = llvm.select %5788, %5792, %5791 : !llvm.i1, !llvm.i64 + %5794 = llvm.mul %5793, %60 : !llvm.i64 + %5795 = llvm.add %5466, %5794 : !llvm.i64 + %5796 = llvm.add %5795, %45 : !llvm.i64 + %5797 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5799 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5800 = llvm.mul %5796, %5799 : !llvm.i64 + %5801 = llvm.add %5798, %5800 : !llvm.i64 + %5802 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5803 = llvm.mul %67, %5802 : !llvm.i64 + %5804 = llvm.add %5801, %5803 : !llvm.i64 + %5805 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5806 = llvm.mul %5484, %5805 : !llvm.i64 + %5807 = llvm.add %5804, %5806 : !llvm.i64 + %5808 = llvm.getelementptr %5797[%5807] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5809 = llvm.load %5808 : !llvm.ptr> + %5810 = llvm.fadd %5786, %5809 : !llvm.vec<8 x float> + %5811 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5812 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5813 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5814 = llvm.mul %67, %5813 : !llvm.i64 + %5815 = llvm.add %5812, %5814 : !llvm.i64 + %5816 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5817 = llvm.mul %56, %5816 : !llvm.i64 + %5818 = llvm.add %5815, %5817 : !llvm.i64 + %5819 = llvm.getelementptr %5811[%5818] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5810, %5819 : !llvm.ptr> + %5820 = llvm.add %5449, %46 : !llvm.i64 + %5821 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5822 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5823 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5824 = llvm.mul %2345, %5823 : !llvm.i64 + %5825 = llvm.add %5822, %5824 : !llvm.i64 + %5826 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5827 = llvm.mul %5820, %5826 : !llvm.i64 + %5828 = llvm.add %5825, %5827 : !llvm.i64 + %5829 = llvm.getelementptr %5821[%5828] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5830 = llvm.bitcast %5829 : !llvm.ptr to !llvm.ptr> + %5831 = llvm.load %5830 {alignment = 4 : i64} : !llvm.ptr> + %5832 = llvm.add %5447, %46 : !llvm.i64 + %5833 = llvm.icmp "slt" %5832, %67 : !llvm.i64 + %5834 = llvm.sub %64, %5832 : !llvm.i64 + %5835 = llvm.select %5833, %5834, %5832 : !llvm.i1, !llvm.i64 + %5836 = llvm.sdiv %5835, %68 : !llvm.i64 + %5837 = llvm.sub %64, %5836 : !llvm.i64 + %5838 = llvm.select %5833, %5837, %5836 : !llvm.i1, !llvm.i64 + %5839 = llvm.srem %5838, %68 : !llvm.i64 + %5840 = llvm.icmp "slt" %5839, %67 : !llvm.i64 + %5841 = llvm.add %5839, %68 : !llvm.i64 + %5842 = llvm.select %5840, %5841, %5839 : !llvm.i1, !llvm.i64 + %5843 = llvm.mul %5838, %65 : !llvm.i64 + %5844 = llvm.add %5533, %5843 : !llvm.i64 + %5845 = llvm.add %5844, %61 : !llvm.i64 + %5846 = llvm.icmp "slt" %5845, %67 : !llvm.i64 + %5847 = llvm.sub %64, %5845 : !llvm.i64 + %5848 = llvm.select %5846, %5847, %5845 : !llvm.i1, !llvm.i64 + %5849 = llvm.sdiv %5848, %63 : !llvm.i64 + %5850 = llvm.sub %64, %5849 : !llvm.i64 + %5851 = llvm.select %5846, %5850, %5849 : !llvm.i1, !llvm.i64 + %5852 = llvm.mul %5851, %65 : !llvm.i64 + %5853 = llvm.add %5844, %5852 : !llvm.i64 + %5854 = llvm.add %5853, %61 : !llvm.i64 + %5855 = 
llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5857 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5858 = llvm.mul %5842, %5857 : !llvm.i64 + %5859 = llvm.add %5856, %5858 : !llvm.i64 + %5860 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5861 = llvm.mul %67, %5860 : !llvm.i64 + %5862 = llvm.add %5859, %5861 : !llvm.i64 + %5863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5864 = llvm.mul %5854, %5863 : !llvm.i64 + %5865 = llvm.add %5862, %5864 : !llvm.i64 + %5866 = llvm.getelementptr %5855[%5865] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5867 = llvm.load %5866 : !llvm.ptr> + %5868 = llvm.fadd %5831, %5867 : !llvm.vec<8 x float> + %5869 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5871 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5872 = llvm.mul %67, %5871 : !llvm.i64 + %5873 = llvm.add %5870, %5872 : !llvm.i64 + %5874 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5875 = llvm.mul %61, %5874 : !llvm.i64 + %5876 = llvm.add %5873, %5875 : !llvm.i64 + %5877 = llvm.getelementptr %5869[%5876] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5868, %5877 : !llvm.ptr> + %5878 = llvm.add %5449, %47 : !llvm.i64 + %5879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5882 = llvm.mul %2345, %5881 : !llvm.i64 + %5883 = llvm.add %5880, %5882 : !llvm.i64 + %5884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5885 = llvm.mul %5878, %5884 : !llvm.i64 + %5886 = llvm.add %5883, %5885 : !llvm.i64 + %5887 = llvm.getelementptr %5879[%5886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5888 = llvm.bitcast %5887 : !llvm.ptr to !llvm.ptr> + %5889 = llvm.load %5888 {alignment = 4 : i64} : !llvm.ptr> + %5890 = llvm.add %5466, %48 : !llvm.i64 + %5891 = llvm.icmp "slt" %5890, %67 : !llvm.i64 + %5892 = llvm.sub %64, %5890 : !llvm.i64 + %5893 = llvm.select %5891, %5892, %5890 : !llvm.i1, !llvm.i64 + %5894 = llvm.sdiv %5893, %68 : !llvm.i64 + %5895 = llvm.sub %64, %5894 : !llvm.i64 + %5896 = llvm.select %5891, %5895, %5894 : !llvm.i1, !llvm.i64 + %5897 = llvm.mul %5896, %60 : !llvm.i64 + %5898 = llvm.add %5466, %5897 : !llvm.i64 + %5899 = llvm.add %5898, %48 : !llvm.i64 + %5900 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5901 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5902 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5903 = llvm.mul %5899, %5902 : !llvm.i64 + %5904 = llvm.add %5901, %5903 : !llvm.i64 + %5905 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5906 = llvm.mul %67, %5905 : !llvm.i64 + %5907 = llvm.add %5904, %5906 : !llvm.i64 + %5908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5909 = llvm.mul %5484, %5908 : !llvm.i64 + %5910 = llvm.add %5907, %5909 : !llvm.i64 + %5911 = llvm.getelementptr %5900[%5910] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5912 = llvm.load %5911 : !llvm.ptr> + %5913 = llvm.fadd %5889, %5912 : !llvm.vec<8 x float> + %5914 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5915 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5916 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5917 = llvm.mul %67, %5916 : !llvm.i64 + %5918 = llvm.add %5915, %5917 : !llvm.i64 + %5919 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5920 = llvm.mul 
%70, %5919 : !llvm.i64 + %5921 = llvm.add %5918, %5920 : !llvm.i64 + %5922 = llvm.getelementptr %5914[%5921] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5913, %5922 : !llvm.ptr> + %5923 = llvm.add %5449, %49 : !llvm.i64 + %5924 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5925 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5926 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5927 = llvm.mul %2345, %5926 : !llvm.i64 + %5928 = llvm.add %5925, %5927 : !llvm.i64 + %5929 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5930 = llvm.mul %5923, %5929 : !llvm.i64 + %5931 = llvm.add %5928, %5930 : !llvm.i64 + %5932 = llvm.getelementptr %5924[%5931] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5933 = llvm.bitcast %5932 : !llvm.ptr to !llvm.ptr> + %5934 = llvm.load %5933 {alignment = 4 : i64} : !llvm.ptr> + %5935 = llvm.add %5447, %49 : !llvm.i64 + %5936 = llvm.icmp "slt" %5935, %67 : !llvm.i64 + %5937 = llvm.sub %64, %5935 : !llvm.i64 + %5938 = llvm.select %5936, %5937, %5935 : !llvm.i1, !llvm.i64 + %5939 = llvm.sdiv %5938, %68 : !llvm.i64 + %5940 = llvm.sub %64, %5939 : !llvm.i64 + %5941 = llvm.select %5936, %5940, %5939 : !llvm.i1, !llvm.i64 + %5942 = llvm.srem %5941, %68 : !llvm.i64 + %5943 = llvm.icmp "slt" %5942, %67 : !llvm.i64 + %5944 = llvm.add %5942, %68 : !llvm.i64 + %5945 = llvm.select %5943, %5944, %5942 : !llvm.i1, !llvm.i64 + %5946 = llvm.mul %5941, %65 : !llvm.i64 + %5947 = llvm.add %5533, %5946 : !llvm.i64 + %5948 = llvm.add %5947, %50 : !llvm.i64 + %5949 = llvm.icmp "slt" %5948, %67 : !llvm.i64 + %5950 = llvm.sub %64, %5948 : !llvm.i64 + %5951 = llvm.select %5949, %5950, %5948 : !llvm.i1, !llvm.i64 + %5952 = llvm.sdiv %5951, %63 : !llvm.i64 + %5953 = llvm.sub %64, %5952 : !llvm.i64 + %5954 = llvm.select %5949, %5953, %5952 : !llvm.i1, !llvm.i64 + %5955 = llvm.mul %5954, %65 : !llvm.i64 + %5956 = llvm.add %5947, %5955 : !llvm.i64 + %5957 = llvm.add %5956, %50 : !llvm.i64 + %5958 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5960 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5961 = llvm.mul %5945, %5960 : !llvm.i64 + %5962 = llvm.add %5959, %5961 : !llvm.i64 + %5963 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5964 = llvm.mul %67, %5963 : !llvm.i64 + %5965 = llvm.add %5962, %5964 : !llvm.i64 + %5966 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5967 = llvm.mul %5957, %5966 : !llvm.i64 + %5968 = llvm.add %5965, %5967 : !llvm.i64 + %5969 = llvm.getelementptr %5958[%5968] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5970 = llvm.load %5969 : !llvm.ptr> + %5971 = llvm.fadd %5934, %5970 : !llvm.vec<8 x float> + %5972 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5973 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5974 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5975 = llvm.mul %67, %5974 : !llvm.i64 + %5976 = llvm.add %5973, %5975 : !llvm.i64 + %5977 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5978 = llvm.mul %50, %5977 : !llvm.i64 + %5979 = llvm.add %5976, %5978 : !llvm.i64 + %5980 = llvm.getelementptr %5972[%5979] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5971, %5980 : !llvm.ptr> + %5981 = llvm.add %5449, %51 : !llvm.i64 + %5982 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5984 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5985 = llvm.mul %2345, %5984 : 
!llvm.i64 + %5986 = llvm.add %5983, %5985 : !llvm.i64 + %5987 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5988 = llvm.mul %5981, %5987 : !llvm.i64 + %5989 = llvm.add %5986, %5988 : !llvm.i64 + %5990 = llvm.getelementptr %5982[%5989] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5991 = llvm.bitcast %5990 : !llvm.ptr to !llvm.ptr> + %5992 = llvm.load %5991 {alignment = 4 : i64} : !llvm.ptr> + %5993 = llvm.add %5466, %52 : !llvm.i64 + %5994 = llvm.icmp "slt" %5993, %67 : !llvm.i64 + %5995 = llvm.sub %64, %5993 : !llvm.i64 + %5996 = llvm.select %5994, %5995, %5993 : !llvm.i1, !llvm.i64 + %5997 = llvm.sdiv %5996, %68 : !llvm.i64 + %5998 = llvm.sub %64, %5997 : !llvm.i64 + %5999 = llvm.select %5994, %5998, %5997 : !llvm.i1, !llvm.i64 + %6000 = llvm.mul %5999, %60 : !llvm.i64 + %6001 = llvm.add %5466, %6000 : !llvm.i64 + %6002 = llvm.add %6001, %52 : !llvm.i64 + %6003 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6004 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6005 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6006 = llvm.mul %6002, %6005 : !llvm.i64 + %6007 = llvm.add %6004, %6006 : !llvm.i64 + %6008 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6009 = llvm.mul %67, %6008 : !llvm.i64 + %6010 = llvm.add %6007, %6009 : !llvm.i64 + %6011 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6012 = llvm.mul %5484, %6011 : !llvm.i64 + %6013 = llvm.add %6010, %6012 : !llvm.i64 + %6014 = llvm.getelementptr %6003[%6013] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6015 = llvm.load %6014 : !llvm.ptr> + %6016 = llvm.fadd %5992, %6015 : !llvm.vec<8 x float> + %6017 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6018 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6019 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6020 = llvm.mul %67, %6019 : !llvm.i64 + %6021 = llvm.add %6018, %6020 : !llvm.i64 + %6022 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6023 = llvm.mul %33, %6022 : !llvm.i64 + %6024 = llvm.add %6021, %6023 : !llvm.i64 + %6025 = llvm.getelementptr %6017[%6024] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6016, %6025 : !llvm.ptr> + %6026 = llvm.add %5449, %53 : !llvm.i64 + %6027 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6028 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6029 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6030 = llvm.mul %2345, %6029 : !llvm.i64 + %6031 = llvm.add %6028, %6030 : !llvm.i64 + %6032 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6033 = llvm.mul %6026, %6032 : !llvm.i64 + %6034 = llvm.add %6031, %6033 : !llvm.i64 + %6035 = llvm.getelementptr %6027[%6034] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6036 = llvm.bitcast %6035 : !llvm.ptr to !llvm.ptr> + %6037 = llvm.load %6036 {alignment = 4 : i64} : !llvm.ptr> + %6038 = llvm.add %5447, %53 : !llvm.i64 + %6039 = llvm.icmp "slt" %6038, %67 : !llvm.i64 + %6040 = llvm.sub %64, %6038 : !llvm.i64 + %6041 = llvm.select %6039, %6040, %6038 : !llvm.i1, !llvm.i64 + %6042 = llvm.sdiv %6041, %68 : !llvm.i64 + %6043 = llvm.sub %64, %6042 : !llvm.i64 + %6044 = llvm.select %6039, %6043, %6042 : !llvm.i1, !llvm.i64 + %6045 = llvm.srem %6044, %68 : !llvm.i64 + %6046 = llvm.icmp "slt" %6045, %67 : !llvm.i64 + %6047 = llvm.add %6045, %68 : !llvm.i64 + %6048 = llvm.select %6046, %6047, %6045 : !llvm.i1, !llvm.i64 + %6049 = llvm.mul %6044, %65 : !llvm.i64 + %6050 = llvm.add %5533, %6049 : !llvm.i64 + %6051 = llvm.add %6050, %54 : !llvm.i64 + %6052 = llvm.icmp "slt" %6051, %67 : 
!llvm.i64 + %6053 = llvm.sub %64, %6051 : !llvm.i64 + %6054 = llvm.select %6052, %6053, %6051 : !llvm.i1, !llvm.i64 + %6055 = llvm.sdiv %6054, %63 : !llvm.i64 + %6056 = llvm.sub %64, %6055 : !llvm.i64 + %6057 = llvm.select %6052, %6056, %6055 : !llvm.i1, !llvm.i64 + %6058 = llvm.mul %6057, %65 : !llvm.i64 + %6059 = llvm.add %6050, %6058 : !llvm.i64 + %6060 = llvm.add %6059, %54 : !llvm.i64 + %6061 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6062 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6063 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6064 = llvm.mul %6048, %6063 : !llvm.i64 + %6065 = llvm.add %6062, %6064 : !llvm.i64 + %6066 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6067 = llvm.mul %67, %6066 : !llvm.i64 + %6068 = llvm.add %6065, %6067 : !llvm.i64 + %6069 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6070 = llvm.mul %6060, %6069 : !llvm.i64 + %6071 = llvm.add %6068, %6070 : !llvm.i64 + %6072 = llvm.getelementptr %6061[%6071] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6073 = llvm.load %6072 : !llvm.ptr> + %6074 = llvm.fadd %6037, %6073 : !llvm.vec<8 x float> + %6075 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6076 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6077 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6078 = llvm.mul %67, %6077 : !llvm.i64 + %6079 = llvm.add %6076, %6078 : !llvm.i64 + %6080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6081 = llvm.mul %54, %6080 : !llvm.i64 + %6082 = llvm.add %6079, %6081 : !llvm.i64 + %6083 = llvm.getelementptr %6075[%6082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6074, %6083 : !llvm.ptr> + %6084 = llvm.add %5449, %55 : !llvm.i64 + %6085 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6086 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6087 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6088 = llvm.mul %2345, %6087 : !llvm.i64 + %6089 = llvm.add %6086, %6088 : !llvm.i64 + %6090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6091 = llvm.mul %6084, %6090 : !llvm.i64 + %6092 = llvm.add %6089, %6091 : !llvm.i64 + %6093 = llvm.getelementptr %6085[%6092] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6094 = llvm.bitcast %6093 : !llvm.ptr to !llvm.ptr> + %6095 = llvm.load %6094 {alignment = 4 : i64} : !llvm.ptr> + %6096 = llvm.add %5466, %56 : !llvm.i64 + %6097 = llvm.icmp "slt" %6096, %67 : !llvm.i64 + %6098 = llvm.sub %64, %6096 : !llvm.i64 + %6099 = llvm.select %6097, %6098, %6096 : !llvm.i1, !llvm.i64 + %6100 = llvm.sdiv %6099, %68 : !llvm.i64 + %6101 = llvm.sub %64, %6100 : !llvm.i64 + %6102 = llvm.select %6097, %6101, %6100 : !llvm.i1, !llvm.i64 + %6103 = llvm.mul %6102, %60 : !llvm.i64 + %6104 = llvm.add %5466, %6103 : !llvm.i64 + %6105 = llvm.add %6104, %56 : !llvm.i64 + %6106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6109 = llvm.mul %6105, %6108 : !llvm.i64 + %6110 = llvm.add %6107, %6109 : !llvm.i64 + %6111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6112 = llvm.mul %67, %6111 : !llvm.i64 + %6113 = llvm.add %6110, %6112 : !llvm.i64 + %6114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6115 = llvm.mul %5484, %6114 : !llvm.i64 + %6116 = llvm.add %6113, %6115 : !llvm.i64 + %6117 = llvm.getelementptr %6106[%6116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6118 = llvm.load %6117 : !llvm.ptr> + %6119 = llvm.fadd 
%6095, %6118 : !llvm.vec<8 x float> + %6120 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6122 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6123 = llvm.mul %67, %6122 : !llvm.i64 + %6124 = llvm.add %6121, %6123 : !llvm.i64 + %6125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6126 = llvm.mul %34, %6125 : !llvm.i64 + %6127 = llvm.add %6124, %6126 : !llvm.i64 + %6128 = llvm.getelementptr %6120[%6127] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6119, %6128 : !llvm.ptr> + %6129 = llvm.add %5449, %57 : !llvm.i64 + %6130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6132 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6133 = llvm.mul %2345, %6132 : !llvm.i64 + %6134 = llvm.add %6131, %6133 : !llvm.i64 + %6135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6136 = llvm.mul %6129, %6135 : !llvm.i64 + %6137 = llvm.add %6134, %6136 : !llvm.i64 + %6138 = llvm.getelementptr %6130[%6137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6139 = llvm.bitcast %6138 : !llvm.ptr to !llvm.ptr> + %6140 = llvm.load %6139 {alignment = 4 : i64} : !llvm.ptr> + %6141 = llvm.add %5447, %57 : !llvm.i64 + %6142 = llvm.icmp "slt" %6141, %67 : !llvm.i64 + %6143 = llvm.sub %64, %6141 : !llvm.i64 + %6144 = llvm.select %6142, %6143, %6141 : !llvm.i1, !llvm.i64 + %6145 = llvm.sdiv %6144, %68 : !llvm.i64 + %6146 = llvm.sub %64, %6145 : !llvm.i64 + %6147 = llvm.select %6142, %6146, %6145 : !llvm.i1, !llvm.i64 + %6148 = llvm.srem %6147, %68 : !llvm.i64 + %6149 = llvm.icmp "slt" %6148, %67 : !llvm.i64 + %6150 = llvm.add %6148, %68 : !llvm.i64 + %6151 = llvm.select %6149, %6150, %6148 : !llvm.i1, !llvm.i64 + %6152 = llvm.mul %6147, %65 : !llvm.i64 + %6153 = llvm.add %5533, %6152 : !llvm.i64 + %6154 = llvm.add %6153, %58 : !llvm.i64 + %6155 = llvm.icmp "slt" %6154, %67 : !llvm.i64 + %6156 = llvm.sub %64, %6154 : !llvm.i64 + %6157 = llvm.select %6155, %6156, %6154 : !llvm.i1, !llvm.i64 + %6158 = llvm.sdiv %6157, %63 : !llvm.i64 + %6159 = llvm.sub %64, %6158 : !llvm.i64 + %6160 = llvm.select %6155, %6159, %6158 : !llvm.i1, !llvm.i64 + %6161 = llvm.mul %6160, %65 : !llvm.i64 + %6162 = llvm.add %6153, %6161 : !llvm.i64 + %6163 = llvm.add %6162, %58 : !llvm.i64 + %6164 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6165 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6166 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6167 = llvm.mul %6151, %6166 : !llvm.i64 + %6168 = llvm.add %6165, %6167 : !llvm.i64 + %6169 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6170 = llvm.mul %67, %6169 : !llvm.i64 + %6171 = llvm.add %6168, %6170 : !llvm.i64 + %6172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6173 = llvm.mul %6163, %6172 : !llvm.i64 + %6174 = llvm.add %6171, %6173 : !llvm.i64 + %6175 = llvm.getelementptr %6164[%6174] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6176 = llvm.load %6175 : !llvm.ptr> + %6177 = llvm.fadd %6140, %6176 : !llvm.vec<8 x float> + %6178 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6180 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6181 = llvm.mul %67, %6180 : !llvm.i64 + %6182 = llvm.add %6179, %6181 : !llvm.i64 + %6183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6184 = llvm.mul %58, %6183 : !llvm.i64 + %6185 = llvm.add %6182, %6184 : !llvm.i64 + %6186 = 
llvm.getelementptr %6178[%6185] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6177, %6186 : !llvm.ptr> + %6187 = llvm.add %5449, %59 : !llvm.i64 + %6188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6191 = llvm.mul %2345, %6190 : !llvm.i64 + %6192 = llvm.add %6189, %6191 : !llvm.i64 + %6193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6194 = llvm.mul %6187, %6193 : !llvm.i64 + %6195 = llvm.add %6192, %6194 : !llvm.i64 + %6196 = llvm.getelementptr %6188[%6195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6197 = llvm.bitcast %6196 : !llvm.ptr to !llvm.ptr> + %6198 = llvm.load %6197 {alignment = 4 : i64} : !llvm.ptr> + %6199 = llvm.add %5466, %61 : !llvm.i64 + %6200 = llvm.icmp "slt" %6199, %67 : !llvm.i64 + %6201 = llvm.sub %64, %6199 : !llvm.i64 + %6202 = llvm.select %6200, %6201, %6199 : !llvm.i1, !llvm.i64 + %6203 = llvm.sdiv %6202, %68 : !llvm.i64 + %6204 = llvm.sub %64, %6203 : !llvm.i64 + %6205 = llvm.select %6200, %6204, %6203 : !llvm.i1, !llvm.i64 + %6206 = llvm.mul %6205, %60 : !llvm.i64 + %6207 = llvm.add %5466, %6206 : !llvm.i64 + %6208 = llvm.add %6207, %61 : !llvm.i64 + %6209 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6211 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6212 = llvm.mul %6208, %6211 : !llvm.i64 + %6213 = llvm.add %6210, %6212 : !llvm.i64 + %6214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6215 = llvm.mul %67, %6214 : !llvm.i64 + %6216 = llvm.add %6213, %6215 : !llvm.i64 + %6217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6218 = llvm.mul %5484, %6217 : !llvm.i64 + %6219 = llvm.add %6216, %6218 : !llvm.i64 + %6220 = llvm.getelementptr %6209[%6219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6221 = llvm.load %6220 : !llvm.ptr> + %6222 = llvm.fadd %6198, %6221 : !llvm.vec<8 x float> + %6223 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6224 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6225 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6226 = llvm.mul %67, %6225 : !llvm.i64 + %6227 = llvm.add %6224, %6226 : !llvm.i64 + %6228 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6229 = llvm.mul %35, %6228 : !llvm.i64 + %6230 = llvm.add %6227, %6229 : !llvm.i64 + %6231 = llvm.getelementptr %6223[%6230] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6222, %6231 : !llvm.ptr> + %6232 = llvm.add %5449, %62 : !llvm.i64 + %6233 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6234 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6235 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6236 = llvm.mul %2345, %6235 : !llvm.i64 + %6237 = llvm.add %6234, %6236 : !llvm.i64 + %6238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6239 = llvm.mul %6232, %6238 : !llvm.i64 + %6240 = llvm.add %6237, %6239 : !llvm.i64 + %6241 = llvm.getelementptr %6233[%6240] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6242 = llvm.bitcast %6241 : !llvm.ptr to !llvm.ptr> + %6243 = llvm.load %6242 {alignment = 4 : i64} : !llvm.ptr> + %6244 = llvm.add %5447, %62 : !llvm.i64 + %6245 = llvm.icmp "slt" %6244, %67 : !llvm.i64 + %6246 = llvm.sub %64, %6244 : !llvm.i64 + %6247 = llvm.select %6245, %6246, %6244 : !llvm.i1, !llvm.i64 + %6248 = llvm.sdiv %6247, %68 : !llvm.i64 + %6249 = llvm.sub %64, %6248 : !llvm.i64 + %6250 = llvm.select %6245, %6249, %6248 : 
!llvm.i1, !llvm.i64 + %6251 = llvm.srem %6250, %68 : !llvm.i64 + %6252 = llvm.icmp "slt" %6251, %67 : !llvm.i64 + %6253 = llvm.add %6251, %68 : !llvm.i64 + %6254 = llvm.select %6252, %6253, %6251 : !llvm.i1, !llvm.i64 + %6255 = llvm.mul %6250, %65 : !llvm.i64 + %6256 = llvm.add %5533, %6255 : !llvm.i64 + %6257 = llvm.add %6256, %66 : !llvm.i64 + %6258 = llvm.icmp "slt" %6257, %67 : !llvm.i64 + %6259 = llvm.sub %64, %6257 : !llvm.i64 + %6260 = llvm.select %6258, %6259, %6257 : !llvm.i1, !llvm.i64 + %6261 = llvm.sdiv %6260, %63 : !llvm.i64 + %6262 = llvm.sub %64, %6261 : !llvm.i64 + %6263 = llvm.select %6258, %6262, %6261 : !llvm.i1, !llvm.i64 + %6264 = llvm.mul %6263, %65 : !llvm.i64 + %6265 = llvm.add %6256, %6264 : !llvm.i64 + %6266 = llvm.add %6265, %66 : !llvm.i64 + %6267 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6269 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6270 = llvm.mul %6254, %6269 : !llvm.i64 + %6271 = llvm.add %6268, %6270 : !llvm.i64 + %6272 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6273 = llvm.mul %67, %6272 : !llvm.i64 + %6274 = llvm.add %6271, %6273 : !llvm.i64 + %6275 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6276 = llvm.mul %6266, %6275 : !llvm.i64 + %6277 = llvm.add %6274, %6276 : !llvm.i64 + %6278 = llvm.getelementptr %6267[%6277] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6279 = llvm.load %6278 : !llvm.ptr> + %6280 = llvm.fadd %6243, %6279 : !llvm.vec<8 x float> + %6281 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6282 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6283 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6284 = llvm.mul %67, %6283 : !llvm.i64 + %6285 = llvm.add %6282, %6284 : !llvm.i64 + %6286 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6287 = llvm.mul %66, %6286 : !llvm.i64 + %6288 = llvm.add %6285, %6287 : !llvm.i64 + %6289 = llvm.getelementptr %6281[%6288] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6280, %6289 : !llvm.ptr> + llvm.br ^bb44(%67 : !llvm.i64) + ^bb44(%6290: !llvm.i64): // 2 preds: ^bb43, ^bb45 + %6291 = llvm.icmp "slt" %6290, %68 : !llvm.i64 + llvm.cond_br %6291, ^bb45, ^bb46 + ^bb45: // pred: ^bb44 + %6292 = llvm.mul %6290, %70 : !llvm.i64 + %6293 = llvm.add %5449, %6292 : !llvm.i64 + %6294 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6295 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6296 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6297 = llvm.mul %67, %6296 : !llvm.i64 + %6298 = llvm.add %6295, %6297 : !llvm.i64 + %6299 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6300 = llvm.mul %6290, %6299 : !llvm.i64 + %6301 = llvm.add %6298, %6300 : !llvm.i64 + %6302 = llvm.getelementptr %6294[%6301] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6303 = llvm.load %6302 : !llvm.ptr> + %6304 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6305 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6306 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6307 = llvm.mul %2345, %6306 : !llvm.i64 + %6308 = llvm.add %6305, %6307 : !llvm.i64 + %6309 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6310 = llvm.mul %6293, %6309 : !llvm.i64 + %6311 = llvm.add %6308, %6310 : !llvm.i64 + %6312 = llvm.getelementptr %6304[%6311] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6313 = llvm.bitcast %6312 : !llvm.ptr to !llvm.ptr> + llvm.store %6303, %6313 {alignment = 4 : i64} : !llvm.ptr> + %6314 = 
llvm.add %6290, %69 : !llvm.i64 + llvm.br ^bb44(%6314 : !llvm.i64) + ^bb46: // 2 preds: ^bb44, ^bb48 + llvm.br ^bb50 + ^bb47: // pred: ^bb42 + %6315 = llvm.add %151, %5447 : !llvm.i64 + %6316 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6318 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6319 = llvm.mul %2345, %6318 : !llvm.i64 + %6320 = llvm.add %6317, %6319 : !llvm.i64 + %6321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6322 = llvm.mul %6315, %6321 : !llvm.i64 + %6323 = llvm.add %6320, %6322 : !llvm.i64 + %6324 = llvm.getelementptr %6316[%6323] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6325 = llvm.bitcast %6324 : !llvm.ptr to !llvm.ptr> + %6326 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6327 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6328 = llvm.trunc %6315 : !llvm.i64 to !llvm.i32 + %6329 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6330 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6331 = llvm.insertelement %6328, %6329[%6330 : !llvm.i32] : !llvm.vec<8 x i32> + %6332 = llvm.shufflevector %6331, %6329 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6333 = llvm.add %6332, %6327 : !llvm.vec<8 x i32> + %6334 = llvm.trunc %6326 : !llvm.i64 to !llvm.i32 + %6335 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6336 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6337 = llvm.insertelement %6334, %6335[%6336 : !llvm.i32] : !llvm.vec<8 x i32> + %6338 = llvm.shufflevector %6337, %6335 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6339 = llvm.icmp "slt" %6333, %6338 : !llvm.vec<8 x i32> + %6340 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6341 = llvm.intr.masked.load %6325, %6339, %6340 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6342 = llvm.icmp "slt" %5447, %67 : !llvm.i64 + %6343 = llvm.sub %64, %5447 : !llvm.i64 + %6344 = llvm.select %6342, %6343, %5447 : !llvm.i1, !llvm.i64 + %6345 = llvm.sdiv %6344, %68 : !llvm.i64 + %6346 = llvm.sub %64, %6345 : !llvm.i64 + %6347 = llvm.select %6342, %6346, %6345 : !llvm.i1, !llvm.i64 + %6348 = llvm.srem %6347, %68 : !llvm.i64 + %6349 = llvm.icmp "slt" %6348, %67 : !llvm.i64 + %6350 = llvm.add %6348, %68 : !llvm.i64 + %6351 = llvm.select %6349, %6350, %6348 : !llvm.i1, !llvm.i64 + %6352 = llvm.srem %5447, %68 : !llvm.i64 + %6353 = llvm.icmp "slt" %6352, %67 : !llvm.i64 + %6354 = llvm.add %6352, %68 : !llvm.i64 + %6355 = llvm.select %6353, %6354, %6352 : !llvm.i1, !llvm.i64 + %6356 = llvm.icmp "slt" %6355, %67 : !llvm.i64 + %6357 = llvm.sub %64, %6355 : !llvm.i64 + %6358 = llvm.select %6356, %6357, %6355 : !llvm.i1, !llvm.i64 + %6359 = llvm.sdiv %6358, %70 : !llvm.i64 + %6360 = llvm.sub %64, %6359 : !llvm.i64 + %6361 = llvm.select %6356, %6360, %6359 : !llvm.i1, !llvm.i64 + %6362 = llvm.srem %6361, %63 : !llvm.i64 + %6363 = llvm.icmp "slt" %6362, %67 : !llvm.i64 + %6364 = llvm.add %6362, %63 : !llvm.i64 + %6365 = llvm.select %6363, %6364, %6362 : !llvm.i1, !llvm.i64 + %6366 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6367 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6368 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6369 = llvm.mul %6351, %6368 : !llvm.i64 + %6370 = llvm.add %6367, %6369 : !llvm.i64 + %6371 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %6372 = llvm.mul %67, %6371 : !llvm.i64 + %6373 = llvm.add %6370, %6372 : !llvm.i64 + %6374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6375 = llvm.mul %6365, %6374 : !llvm.i64 + %6376 = llvm.add %6373, %6375 : !llvm.i64 + %6377 = llvm.getelementptr %6366[%6376] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6378 = llvm.load %6377 : !llvm.ptr> + %6379 = llvm.fadd %6341, %6378 : !llvm.vec<8 x float> + %6380 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6382 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6383 = llvm.mul %67, %6382 : !llvm.i64 + %6384 = llvm.add %6381, %6383 : !llvm.i64 + %6385 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6386 = llvm.mul %67, %6385 : !llvm.i64 + %6387 = llvm.add %6384, %6386 : !llvm.i64 + %6388 = llvm.getelementptr %6380[%6387] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6379, %6388 : !llvm.ptr> + %6389 = llvm.add %6315, %70 : !llvm.i64 + %6390 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6392 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6393 = llvm.mul %2345, %6392 : !llvm.i64 + %6394 = llvm.add %6391, %6393 : !llvm.i64 + %6395 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6396 = llvm.mul %6389, %6395 : !llvm.i64 + %6397 = llvm.add %6394, %6396 : !llvm.i64 + %6398 = llvm.getelementptr %6390[%6397] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6399 = llvm.bitcast %6398 : !llvm.ptr to !llvm.ptr> + %6400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6401 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6402 = llvm.trunc %6389 : !llvm.i64 to !llvm.i32 + %6403 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6404 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6405 = llvm.insertelement %6402, %6403[%6404 : !llvm.i32] : !llvm.vec<8 x i32> + %6406 = llvm.shufflevector %6405, %6403 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6407 = llvm.add %6406, %6401 : !llvm.vec<8 x i32> + %6408 = llvm.trunc %6400 : !llvm.i64 to !llvm.i32 + %6409 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6410 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6411 = llvm.insertelement %6408, %6409[%6410 : !llvm.i32] : !llvm.vec<8 x i32> + %6412 = llvm.shufflevector %6411, %6409 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6413 = llvm.icmp "slt" %6407, %6412 : !llvm.vec<8 x i32> + %6414 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6415 = llvm.intr.masked.load %6399, %6413, %6414 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6416 = llvm.add %5447, %70 : !llvm.i64 + %6417 = llvm.icmp "slt" %6416, %67 : !llvm.i64 + %6418 = llvm.sub %64, %6416 : !llvm.i64 + %6419 = llvm.select %6417, %6418, %6416 : !llvm.i1, !llvm.i64 + %6420 = llvm.sdiv %6419, %68 : !llvm.i64 + %6421 = llvm.sub %64, %6420 : !llvm.i64 + %6422 = llvm.select %6417, %6421, %6420 : !llvm.i1, !llvm.i64 + %6423 = llvm.srem %6422, %68 : !llvm.i64 + %6424 = llvm.icmp "slt" %6423, %67 : !llvm.i64 + %6425 = llvm.add %6423, %68 : !llvm.i64 + %6426 = llvm.select %6424, %6425, %6423 : !llvm.i1, !llvm.i64 + %6427 = llvm.sdiv %6344, %70 : !llvm.i64 + %6428 = llvm.sub %64, %6427 : !llvm.i64 + %6429 = llvm.select %6342, %6428, 
%6427 : !llvm.i1, !llvm.i64 + %6430 = llvm.mul %6422, %65 : !llvm.i64 + %6431 = llvm.add %6429, %6430 : !llvm.i64 + %6432 = llvm.add %6431, %69 : !llvm.i64 + %6433 = llvm.icmp "slt" %6432, %67 : !llvm.i64 + %6434 = llvm.sub %64, %6432 : !llvm.i64 + %6435 = llvm.select %6433, %6434, %6432 : !llvm.i1, !llvm.i64 + %6436 = llvm.sdiv %6435, %63 : !llvm.i64 + %6437 = llvm.sub %64, %6436 : !llvm.i64 + %6438 = llvm.select %6433, %6437, %6436 : !llvm.i1, !llvm.i64 + %6439 = llvm.mul %6438, %65 : !llvm.i64 + %6440 = llvm.add %6431, %6439 : !llvm.i64 + %6441 = llvm.add %6440, %69 : !llvm.i64 + %6442 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6444 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6445 = llvm.mul %6426, %6444 : !llvm.i64 + %6446 = llvm.add %6443, %6445 : !llvm.i64 + %6447 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6448 = llvm.mul %67, %6447 : !llvm.i64 + %6449 = llvm.add %6446, %6448 : !llvm.i64 + %6450 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6451 = llvm.mul %6441, %6450 : !llvm.i64 + %6452 = llvm.add %6449, %6451 : !llvm.i64 + %6453 = llvm.getelementptr %6442[%6452] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6454 = llvm.load %6453 : !llvm.ptr> + %6455 = llvm.fadd %6415, %6454 : !llvm.vec<8 x float> + %6456 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6457 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6458 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6459 = llvm.mul %67, %6458 : !llvm.i64 + %6460 = llvm.add %6457, %6459 : !llvm.i64 + %6461 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6462 = llvm.mul %69, %6461 : !llvm.i64 + %6463 = llvm.add %6460, %6462 : !llvm.i64 + %6464 = llvm.getelementptr %6456[%6463] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6455, %6464 : !llvm.ptr> + %6465 = llvm.add %6315, %68 : !llvm.i64 + %6466 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6467 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6468 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6469 = llvm.mul %2345, %6468 : !llvm.i64 + %6470 = llvm.add %6467, %6469 : !llvm.i64 + %6471 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6472 = llvm.mul %6465, %6471 : !llvm.i64 + %6473 = llvm.add %6470, %6472 : !llvm.i64 + %6474 = llvm.getelementptr %6466[%6473] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6475 = llvm.bitcast %6474 : !llvm.ptr to !llvm.ptr> + %6476 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6477 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6478 = llvm.trunc %6465 : !llvm.i64 to !llvm.i32 + %6479 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6480 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6481 = llvm.insertelement %6478, %6479[%6480 : !llvm.i32] : !llvm.vec<8 x i32> + %6482 = llvm.shufflevector %6481, %6479 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6483 = llvm.add %6482, %6477 : !llvm.vec<8 x i32> + %6484 = llvm.trunc %6476 : !llvm.i64 to !llvm.i32 + %6485 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6486 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6487 = llvm.insertelement %6484, %6485[%6486 : !llvm.i32] : !llvm.vec<8 x i32> + %6488 = llvm.shufflevector %6487, %6485 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6489 = llvm.icmp "slt" %6483, %6488 : !llvm.vec<8 x i32> + %6490 = 
llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6491 = llvm.intr.masked.load %6475, %6489, %6490 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6492 = llvm.add %6347, %69 : !llvm.i64 + %6493 = llvm.icmp "slt" %6492, %67 : !llvm.i64 + %6494 = llvm.sub %64, %6492 : !llvm.i64 + %6495 = llvm.select %6493, %6494, %6492 : !llvm.i1, !llvm.i64 + %6496 = llvm.sdiv %6495, %68 : !llvm.i64 + %6497 = llvm.sub %64, %6496 : !llvm.i64 + %6498 = llvm.select %6493, %6497, %6496 : !llvm.i1, !llvm.i64 + %6499 = llvm.mul %6498, %60 : !llvm.i64 + %6500 = llvm.add %6347, %6499 : !llvm.i64 + %6501 = llvm.add %6500, %69 : !llvm.i64 + %6502 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6503 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6504 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6505 = llvm.mul %6501, %6504 : !llvm.i64 + %6506 = llvm.add %6503, %6505 : !llvm.i64 + %6507 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6508 = llvm.mul %67, %6507 : !llvm.i64 + %6509 = llvm.add %6506, %6508 : !llvm.i64 + %6510 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6511 = llvm.mul %6365, %6510 : !llvm.i64 + %6512 = llvm.add %6509, %6511 : !llvm.i64 + %6513 = llvm.getelementptr %6502[%6512] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6514 = llvm.load %6513 : !llvm.ptr> + %6515 = llvm.fadd %6491, %6514 : !llvm.vec<8 x float> + %6516 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6518 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6519 = llvm.mul %67, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6522 = llvm.mul %63, %6521 : !llvm.i64 + %6523 = llvm.add %6520, %6522 : !llvm.i64 + %6524 = llvm.getelementptr %6516[%6523] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6515, %6524 : !llvm.ptr> + %6525 = llvm.add %6315, %41 : !llvm.i64 + %6526 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6529 = llvm.mul %2345, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6532 = llvm.mul %6525, %6531 : !llvm.i64 + %6533 = llvm.add %6530, %6532 : !llvm.i64 + %6534 = llvm.getelementptr %6526[%6533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6535 = llvm.bitcast %6534 : !llvm.ptr to !llvm.ptr> + %6536 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6537 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6538 = llvm.trunc %6525 : !llvm.i64 to !llvm.i32 + %6539 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6540 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6541 = llvm.insertelement %6538, %6539[%6540 : !llvm.i32] : !llvm.vec<8 x i32> + %6542 = llvm.shufflevector %6541, %6539 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6543 = llvm.add %6542, %6537 : !llvm.vec<8 x i32> + %6544 = llvm.trunc %6536 : !llvm.i64 to !llvm.i32 + %6545 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6546 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6547 = llvm.insertelement %6544, %6545[%6546 : !llvm.i32] : !llvm.vec<8 x i32> + %6548 = llvm.shufflevector %6547, %6545 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : 
i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6549 = llvm.icmp "slt" %6543, %6548 : !llvm.vec<8 x i32> + %6550 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6551 = llvm.intr.masked.load %6535, %6549, %6550 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6552 = llvm.add %5447, %41 : !llvm.i64 + %6553 = llvm.icmp "slt" %6552, %67 : !llvm.i64 + %6554 = llvm.sub %64, %6552 : !llvm.i64 + %6555 = llvm.select %6553, %6554, %6552 : !llvm.i1, !llvm.i64 + %6556 = llvm.sdiv %6555, %68 : !llvm.i64 + %6557 = llvm.sub %64, %6556 : !llvm.i64 + %6558 = llvm.select %6553, %6557, %6556 : !llvm.i1, !llvm.i64 + %6559 = llvm.srem %6558, %68 : !llvm.i64 + %6560 = llvm.icmp "slt" %6559, %67 : !llvm.i64 + %6561 = llvm.add %6559, %68 : !llvm.i64 + %6562 = llvm.select %6560, %6561, %6559 : !llvm.i1, !llvm.i64 + %6563 = llvm.mul %6558, %65 : !llvm.i64 + %6564 = llvm.add %6429, %6563 : !llvm.i64 + %6565 = llvm.add %6564, %45 : !llvm.i64 + %6566 = llvm.icmp "slt" %6565, %67 : !llvm.i64 + %6567 = llvm.sub %64, %6565 : !llvm.i64 + %6568 = llvm.select %6566, %6567, %6565 : !llvm.i1, !llvm.i64 + %6569 = llvm.sdiv %6568, %63 : !llvm.i64 + %6570 = llvm.sub %64, %6569 : !llvm.i64 + %6571 = llvm.select %6566, %6570, %6569 : !llvm.i1, !llvm.i64 + %6572 = llvm.mul %6571, %65 : !llvm.i64 + %6573 = llvm.add %6564, %6572 : !llvm.i64 + %6574 = llvm.add %6573, %45 : !llvm.i64 + %6575 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6576 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6577 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6578 = llvm.mul %6562, %6577 : !llvm.i64 + %6579 = llvm.add %6576, %6578 : !llvm.i64 + %6580 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6581 = llvm.mul %67, %6580 : !llvm.i64 + %6582 = llvm.add %6579, %6581 : !llvm.i64 + %6583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6584 = llvm.mul %6574, %6583 : !llvm.i64 + %6585 = llvm.add %6582, %6584 : !llvm.i64 + %6586 = llvm.getelementptr %6575[%6585] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6587 = llvm.load %6586 : !llvm.ptr> + %6588 = llvm.fadd %6551, %6587 : !llvm.vec<8 x float> + %6589 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6591 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6592 = llvm.mul %67, %6591 : !llvm.i64 + %6593 = llvm.add %6590, %6592 : !llvm.i64 + %6594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6595 = llvm.mul %45, %6594 : !llvm.i64 + %6596 = llvm.add %6593, %6595 : !llvm.i64 + %6597 = llvm.getelementptr %6589[%6596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6588, %6597 : !llvm.ptr> + %6598 = llvm.add %6315, %42 : !llvm.i64 + %6599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6602 = llvm.mul %2345, %6601 : !llvm.i64 + %6603 = llvm.add %6600, %6602 : !llvm.i64 + %6604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6605 = llvm.mul %6598, %6604 : !llvm.i64 + %6606 = llvm.add %6603, %6605 : !llvm.i64 + %6607 = llvm.getelementptr %6599[%6606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6608 = llvm.bitcast %6607 : !llvm.ptr to !llvm.ptr> + %6609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6610 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6611 = llvm.trunc %6598 : 
!llvm.i64 to !llvm.i32 + %6612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6614 = llvm.insertelement %6611, %6612[%6613 : !llvm.i32] : !llvm.vec<8 x i32> + %6615 = llvm.shufflevector %6614, %6612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6616 = llvm.add %6615, %6610 : !llvm.vec<8 x i32> + %6617 = llvm.trunc %6609 : !llvm.i64 to !llvm.i32 + %6618 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6620 = llvm.insertelement %6617, %6618[%6619 : !llvm.i32] : !llvm.vec<8 x i32> + %6621 = llvm.shufflevector %6620, %6618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6622 = llvm.icmp "slt" %6616, %6621 : !llvm.vec<8 x i32> + %6623 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6624 = llvm.intr.masked.load %6608, %6622, %6623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6625 = llvm.add %6347, %63 : !llvm.i64 + %6626 = llvm.icmp "slt" %6625, %67 : !llvm.i64 + %6627 = llvm.sub %64, %6625 : !llvm.i64 + %6628 = llvm.select %6626, %6627, %6625 : !llvm.i1, !llvm.i64 + %6629 = llvm.sdiv %6628, %68 : !llvm.i64 + %6630 = llvm.sub %64, %6629 : !llvm.i64 + %6631 = llvm.select %6626, %6630, %6629 : !llvm.i1, !llvm.i64 + %6632 = llvm.mul %6631, %60 : !llvm.i64 + %6633 = llvm.add %6347, %6632 : !llvm.i64 + %6634 = llvm.add %6633, %63 : !llvm.i64 + %6635 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6638 = llvm.mul %6634, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6641 = llvm.mul %67, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6644 = llvm.mul %6365, %6643 : !llvm.i64 + %6645 = llvm.add %6642, %6644 : !llvm.i64 + %6646 = llvm.getelementptr %6635[%6645] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6647 = llvm.load %6646 : !llvm.ptr> + %6648 = llvm.fadd %6624, %6647 : !llvm.vec<8 x float> + %6649 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6651 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6652 = llvm.mul %67, %6651 : !llvm.i64 + %6653 = llvm.add %6650, %6652 : !llvm.i64 + %6654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6655 = llvm.mul %48, %6654 : !llvm.i64 + %6656 = llvm.add %6653, %6655 : !llvm.i64 + %6657 = llvm.getelementptr %6649[%6656] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6648, %6657 : !llvm.ptr> + %6658 = llvm.add %6315, %43 : !llvm.i64 + %6659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6662 = llvm.mul %2345, %6661 : !llvm.i64 + %6663 = llvm.add %6660, %6662 : !llvm.i64 + %6664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6665 = llvm.mul %6658, %6664 : !llvm.i64 + %6666 = llvm.add %6663, %6665 : !llvm.i64 + %6667 = llvm.getelementptr %6659[%6666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6668 = llvm.bitcast %6667 : !llvm.ptr to !llvm.ptr> + %6669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6670 = 
llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6671 = llvm.trunc %6658 : !llvm.i64 to !llvm.i32 + %6672 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6673 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6674 = llvm.insertelement %6671, %6672[%6673 : !llvm.i32] : !llvm.vec<8 x i32> + %6675 = llvm.shufflevector %6674, %6672 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6676 = llvm.add %6675, %6670 : !llvm.vec<8 x i32> + %6677 = llvm.trunc %6669 : !llvm.i64 to !llvm.i32 + %6678 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6679 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6680 = llvm.insertelement %6677, %6678[%6679 : !llvm.i32] : !llvm.vec<8 x i32> + %6681 = llvm.shufflevector %6680, %6678 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6682 = llvm.icmp "slt" %6676, %6681 : !llvm.vec<8 x i32> + %6683 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6684 = llvm.intr.masked.load %6668, %6682, %6683 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6685 = llvm.add %5447, %43 : !llvm.i64 + %6686 = llvm.icmp "slt" %6685, %67 : !llvm.i64 + %6687 = llvm.sub %64, %6685 : !llvm.i64 + %6688 = llvm.select %6686, %6687, %6685 : !llvm.i1, !llvm.i64 + %6689 = llvm.sdiv %6688, %68 : !llvm.i64 + %6690 = llvm.sub %64, %6689 : !llvm.i64 + %6691 = llvm.select %6686, %6690, %6689 : !llvm.i1, !llvm.i64 + %6692 = llvm.srem %6691, %68 : !llvm.i64 + %6693 = llvm.icmp "slt" %6692, %67 : !llvm.i64 + %6694 = llvm.add %6692, %68 : !llvm.i64 + %6695 = llvm.select %6693, %6694, %6692 : !llvm.i1, !llvm.i64 + %6696 = llvm.mul %6691, %65 : !llvm.i64 + %6697 = llvm.add %6429, %6696 : !llvm.i64 + %6698 = llvm.add %6697, %52 : !llvm.i64 + %6699 = llvm.icmp "slt" %6698, %67 : !llvm.i64 + %6700 = llvm.sub %64, %6698 : !llvm.i64 + %6701 = llvm.select %6699, %6700, %6698 : !llvm.i1, !llvm.i64 + %6702 = llvm.sdiv %6701, %63 : !llvm.i64 + %6703 = llvm.sub %64, %6702 : !llvm.i64 + %6704 = llvm.select %6699, %6703, %6702 : !llvm.i1, !llvm.i64 + %6705 = llvm.mul %6704, %65 : !llvm.i64 + %6706 = llvm.add %6697, %6705 : !llvm.i64 + %6707 = llvm.add %6706, %52 : !llvm.i64 + %6708 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6710 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6711 = llvm.mul %6695, %6710 : !llvm.i64 + %6712 = llvm.add %6709, %6711 : !llvm.i64 + %6713 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6714 = llvm.mul %67, %6713 : !llvm.i64 + %6715 = llvm.add %6712, %6714 : !llvm.i64 + %6716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6717 = llvm.mul %6707, %6716 : !llvm.i64 + %6718 = llvm.add %6715, %6717 : !llvm.i64 + %6719 = llvm.getelementptr %6708[%6718] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6720 = llvm.load %6719 : !llvm.ptr> + %6721 = llvm.fadd %6684, %6720 : !llvm.vec<8 x float> + %6722 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6724 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6725 = llvm.mul %67, %6724 : !llvm.i64 + %6726 = llvm.add %6723, %6725 : !llvm.i64 + %6727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6728 = llvm.mul %52, %6727 : !llvm.i64 + %6729 = llvm.add %6726, %6728 : !llvm.i64 + %6730 = llvm.getelementptr %6722[%6729] : 
(!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6721, %6730 : !llvm.ptr> + %6731 = llvm.add %6315, %44 : !llvm.i64 + %6732 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6734 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6735 = llvm.mul %2345, %6734 : !llvm.i64 + %6736 = llvm.add %6733, %6735 : !llvm.i64 + %6737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6738 = llvm.mul %6731, %6737 : !llvm.i64 + %6739 = llvm.add %6736, %6738 : !llvm.i64 + %6740 = llvm.getelementptr %6732[%6739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6741 = llvm.bitcast %6740 : !llvm.ptr to !llvm.ptr> + %6742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6743 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6744 = llvm.trunc %6731 : !llvm.i64 to !llvm.i32 + %6745 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6746 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6747 = llvm.insertelement %6744, %6745[%6746 : !llvm.i32] : !llvm.vec<8 x i32> + %6748 = llvm.shufflevector %6747, %6745 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6749 = llvm.add %6748, %6743 : !llvm.vec<8 x i32> + %6750 = llvm.trunc %6742 : !llvm.i64 to !llvm.i32 + %6751 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6752 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6753 = llvm.insertelement %6750, %6751[%6752 : !llvm.i32] : !llvm.vec<8 x i32> + %6754 = llvm.shufflevector %6753, %6751 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6755 = llvm.icmp "slt" %6749, %6754 : !llvm.vec<8 x i32> + %6756 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6757 = llvm.intr.masked.load %6741, %6755, %6756 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6758 = llvm.add %6347, %45 : !llvm.i64 + %6759 = llvm.icmp "slt" %6758, %67 : !llvm.i64 + %6760 = llvm.sub %64, %6758 : !llvm.i64 + %6761 = llvm.select %6759, %6760, %6758 : !llvm.i1, !llvm.i64 + %6762 = llvm.sdiv %6761, %68 : !llvm.i64 + %6763 = llvm.sub %64, %6762 : !llvm.i64 + %6764 = llvm.select %6759, %6763, %6762 : !llvm.i1, !llvm.i64 + %6765 = llvm.mul %6764, %60 : !llvm.i64 + %6766 = llvm.add %6347, %6765 : !llvm.i64 + %6767 = llvm.add %6766, %45 : !llvm.i64 + %6768 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6770 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6771 = llvm.mul %6767, %6770 : !llvm.i64 + %6772 = llvm.add %6769, %6771 : !llvm.i64 + %6773 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6774 = llvm.mul %67, %6773 : !llvm.i64 + %6775 = llvm.add %6772, %6774 : !llvm.i64 + %6776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6777 = llvm.mul %6365, %6776 : !llvm.i64 + %6778 = llvm.add %6775, %6777 : !llvm.i64 + %6779 = llvm.getelementptr %6768[%6778] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6780 = llvm.load %6779 : !llvm.ptr> + %6781 = llvm.fadd %6757, %6780 : !llvm.vec<8 x float> + %6782 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6784 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6785 = llvm.mul %67, %6784 : !llvm.i64 + %6786 = llvm.add %6783, %6785 : !llvm.i64 + %6787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6788 = 
llvm.mul %56, %6787 : !llvm.i64 + %6789 = llvm.add %6786, %6788 : !llvm.i64 + %6790 = llvm.getelementptr %6782[%6789] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6781, %6790 : !llvm.ptr> + %6791 = llvm.add %6315, %46 : !llvm.i64 + %6792 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6794 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6795 = llvm.mul %2345, %6794 : !llvm.i64 + %6796 = llvm.add %6793, %6795 : !llvm.i64 + %6797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6798 = llvm.mul %6791, %6797 : !llvm.i64 + %6799 = llvm.add %6796, %6798 : !llvm.i64 + %6800 = llvm.getelementptr %6792[%6799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6801 = llvm.bitcast %6800 : !llvm.ptr to !llvm.ptr> + %6802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6803 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6804 = llvm.trunc %6791 : !llvm.i64 to !llvm.i32 + %6805 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6806 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6807 = llvm.insertelement %6804, %6805[%6806 : !llvm.i32] : !llvm.vec<8 x i32> + %6808 = llvm.shufflevector %6807, %6805 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6809 = llvm.add %6808, %6803 : !llvm.vec<8 x i32> + %6810 = llvm.trunc %6802 : !llvm.i64 to !llvm.i32 + %6811 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6812 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6813 = llvm.insertelement %6810, %6811[%6812 : !llvm.i32] : !llvm.vec<8 x i32> + %6814 = llvm.shufflevector %6813, %6811 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6815 = llvm.icmp "slt" %6809, %6814 : !llvm.vec<8 x i32> + %6816 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6817 = llvm.intr.masked.load %6801, %6815, %6816 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6818 = llvm.add %5447, %46 : !llvm.i64 + %6819 = llvm.icmp "slt" %6818, %67 : !llvm.i64 + %6820 = llvm.sub %64, %6818 : !llvm.i64 + %6821 = llvm.select %6819, %6820, %6818 : !llvm.i1, !llvm.i64 + %6822 = llvm.sdiv %6821, %68 : !llvm.i64 + %6823 = llvm.sub %64, %6822 : !llvm.i64 + %6824 = llvm.select %6819, %6823, %6822 : !llvm.i1, !llvm.i64 + %6825 = llvm.srem %6824, %68 : !llvm.i64 + %6826 = llvm.icmp "slt" %6825, %67 : !llvm.i64 + %6827 = llvm.add %6825, %68 : !llvm.i64 + %6828 = llvm.select %6826, %6827, %6825 : !llvm.i1, !llvm.i64 + %6829 = llvm.mul %6824, %65 : !llvm.i64 + %6830 = llvm.add %6429, %6829 : !llvm.i64 + %6831 = llvm.add %6830, %61 : !llvm.i64 + %6832 = llvm.icmp "slt" %6831, %67 : !llvm.i64 + %6833 = llvm.sub %64, %6831 : !llvm.i64 + %6834 = llvm.select %6832, %6833, %6831 : !llvm.i1, !llvm.i64 + %6835 = llvm.sdiv %6834, %63 : !llvm.i64 + %6836 = llvm.sub %64, %6835 : !llvm.i64 + %6837 = llvm.select %6832, %6836, %6835 : !llvm.i1, !llvm.i64 + %6838 = llvm.mul %6837, %65 : !llvm.i64 + %6839 = llvm.add %6830, %6838 : !llvm.i64 + %6840 = llvm.add %6839, %61 : !llvm.i64 + %6841 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6842 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6843 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6844 = llvm.mul %6828, %6843 : !llvm.i64 + %6845 = llvm.add %6842, %6844 : !llvm.i64 + %6846 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6847 = 
llvm.mul %67, %6846 : !llvm.i64 + %6848 = llvm.add %6845, %6847 : !llvm.i64 + %6849 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6850 = llvm.mul %6840, %6849 : !llvm.i64 + %6851 = llvm.add %6848, %6850 : !llvm.i64 + %6852 = llvm.getelementptr %6841[%6851] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6853 = llvm.load %6852 : !llvm.ptr> + %6854 = llvm.fadd %6817, %6853 : !llvm.vec<8 x float> + %6855 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6857 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6858 = llvm.mul %67, %6857 : !llvm.i64 + %6859 = llvm.add %6856, %6858 : !llvm.i64 + %6860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6861 = llvm.mul %61, %6860 : !llvm.i64 + %6862 = llvm.add %6859, %6861 : !llvm.i64 + %6863 = llvm.getelementptr %6855[%6862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6854, %6863 : !llvm.ptr> + %6864 = llvm.add %6315, %47 : !llvm.i64 + %6865 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6866 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6867 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6868 = llvm.mul %2345, %6867 : !llvm.i64 + %6869 = llvm.add %6866, %6868 : !llvm.i64 + %6870 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6871 = llvm.mul %6864, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.getelementptr %6865[%6872] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6874 = llvm.bitcast %6873 : !llvm.ptr to !llvm.ptr> + %6875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6876 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6877 = llvm.trunc %6864 : !llvm.i64 to !llvm.i32 + %6878 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6879 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6880 = llvm.insertelement %6877, %6878[%6879 : !llvm.i32] : !llvm.vec<8 x i32> + %6881 = llvm.shufflevector %6880, %6878 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6882 = llvm.add %6881, %6876 : !llvm.vec<8 x i32> + %6883 = llvm.trunc %6875 : !llvm.i64 to !llvm.i32 + %6884 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6885 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6886 = llvm.insertelement %6883, %6884[%6885 : !llvm.i32] : !llvm.vec<8 x i32> + %6887 = llvm.shufflevector %6886, %6884 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6888 = llvm.icmp "slt" %6882, %6887 : !llvm.vec<8 x i32> + %6889 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6890 = llvm.intr.masked.load %6874, %6888, %6889 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6891 = llvm.add %6347, %48 : !llvm.i64 + %6892 = llvm.icmp "slt" %6891, %67 : !llvm.i64 + %6893 = llvm.sub %64, %6891 : !llvm.i64 + %6894 = llvm.select %6892, %6893, %6891 : !llvm.i1, !llvm.i64 + %6895 = llvm.sdiv %6894, %68 : !llvm.i64 + %6896 = llvm.sub %64, %6895 : !llvm.i64 + %6897 = llvm.select %6892, %6896, %6895 : !llvm.i1, !llvm.i64 + %6898 = llvm.mul %6897, %60 : !llvm.i64 + %6899 = llvm.add %6347, %6898 : !llvm.i64 + %6900 = llvm.add %6899, %48 : !llvm.i64 + %6901 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6903 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6904 = llvm.mul %6900, %6903 : 
!llvm.i64 + %6905 = llvm.add %6902, %6904 : !llvm.i64 + %6906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6907 = llvm.mul %67, %6906 : !llvm.i64 + %6908 = llvm.add %6905, %6907 : !llvm.i64 + %6909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6910 = llvm.mul %6365, %6909 : !llvm.i64 + %6911 = llvm.add %6908, %6910 : !llvm.i64 + %6912 = llvm.getelementptr %6901[%6911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6913 = llvm.load %6912 : !llvm.ptr> + %6914 = llvm.fadd %6890, %6913 : !llvm.vec<8 x float> + %6915 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6916 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6917 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6918 = llvm.mul %67, %6917 : !llvm.i64 + %6919 = llvm.add %6916, %6918 : !llvm.i64 + %6920 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6921 = llvm.mul %70, %6920 : !llvm.i64 + %6922 = llvm.add %6919, %6921 : !llvm.i64 + %6923 = llvm.getelementptr %6915[%6922] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6914, %6923 : !llvm.ptr> + %6924 = llvm.add %6315, %49 : !llvm.i64 + %6925 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6926 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6927 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6928 = llvm.mul %2345, %6927 : !llvm.i64 + %6929 = llvm.add %6926, %6928 : !llvm.i64 + %6930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6931 = llvm.mul %6924, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.getelementptr %6925[%6932] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6934 = llvm.bitcast %6933 : !llvm.ptr to !llvm.ptr> + %6935 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6936 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6937 = llvm.trunc %6924 : !llvm.i64 to !llvm.i32 + %6938 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6939 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6940 = llvm.insertelement %6937, %6938[%6939 : !llvm.i32] : !llvm.vec<8 x i32> + %6941 = llvm.shufflevector %6940, %6938 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6942 = llvm.add %6941, %6936 : !llvm.vec<8 x i32> + %6943 = llvm.trunc %6935 : !llvm.i64 to !llvm.i32 + %6944 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6945 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6946 = llvm.insertelement %6943, %6944[%6945 : !llvm.i32] : !llvm.vec<8 x i32> + %6947 = llvm.shufflevector %6946, %6944 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6948 = llvm.icmp "slt" %6942, %6947 : !llvm.vec<8 x i32> + %6949 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6950 = llvm.intr.masked.load %6934, %6948, %6949 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6951 = llvm.add %5447, %49 : !llvm.i64 + %6952 = llvm.icmp "slt" %6951, %67 : !llvm.i64 + %6953 = llvm.sub %64, %6951 : !llvm.i64 + %6954 = llvm.select %6952, %6953, %6951 : !llvm.i1, !llvm.i64 + %6955 = llvm.sdiv %6954, %68 : !llvm.i64 + %6956 = llvm.sub %64, %6955 : !llvm.i64 + %6957 = llvm.select %6952, %6956, %6955 : !llvm.i1, !llvm.i64 + %6958 = llvm.srem %6957, %68 : !llvm.i64 + %6959 = llvm.icmp "slt" %6958, %67 : !llvm.i64 + %6960 = llvm.add %6958, %68 : !llvm.i64 + %6961 = llvm.select %6959, %6960, %6958 : !llvm.i1, !llvm.i64 + %6962 = llvm.mul %6957, %65 : !llvm.i64 + %6963 = 
llvm.add %6429, %6962 : !llvm.i64 + %6964 = llvm.add %6963, %50 : !llvm.i64 + %6965 = llvm.icmp "slt" %6964, %67 : !llvm.i64 + %6966 = llvm.sub %64, %6964 : !llvm.i64 + %6967 = llvm.select %6965, %6966, %6964 : !llvm.i1, !llvm.i64 + %6968 = llvm.sdiv %6967, %63 : !llvm.i64 + %6969 = llvm.sub %64, %6968 : !llvm.i64 + %6970 = llvm.select %6965, %6969, %6968 : !llvm.i1, !llvm.i64 + %6971 = llvm.mul %6970, %65 : !llvm.i64 + %6972 = llvm.add %6963, %6971 : !llvm.i64 + %6973 = llvm.add %6972, %50 : !llvm.i64 + %6974 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6976 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6977 = llvm.mul %6961, %6976 : !llvm.i64 + %6978 = llvm.add %6975, %6977 : !llvm.i64 + %6979 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6980 = llvm.mul %67, %6979 : !llvm.i64 + %6981 = llvm.add %6978, %6980 : !llvm.i64 + %6982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6983 = llvm.mul %6973, %6982 : !llvm.i64 + %6984 = llvm.add %6981, %6983 : !llvm.i64 + %6985 = llvm.getelementptr %6974[%6984] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6986 = llvm.load %6985 : !llvm.ptr> + %6987 = llvm.fadd %6950, %6986 : !llvm.vec<8 x float> + %6988 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6991 = llvm.mul %67, %6990 : !llvm.i64 + %6992 = llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %50, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6987, %6996 : !llvm.ptr> + %6997 = llvm.add %6315, %51 : !llvm.i64 + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %2345, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %6997, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7007 = llvm.bitcast %7006 : !llvm.ptr to !llvm.ptr> + %7008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7010 = llvm.trunc %6997 : !llvm.i64 to !llvm.i32 + %7011 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7012 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7013 = llvm.insertelement %7010, %7011[%7012 : !llvm.i32] : !llvm.vec<8 x i32> + %7014 = llvm.shufflevector %7013, %7011 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7015 = llvm.add %7014, %7009 : !llvm.vec<8 x i32> + %7016 = llvm.trunc %7008 : !llvm.i64 to !llvm.i32 + %7017 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7018 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7019 = llvm.insertelement %7016, %7017[%7018 : !llvm.i32] : !llvm.vec<8 x i32> + %7020 = llvm.shufflevector %7019, %7017 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7021 = llvm.icmp "slt" %7015, %7020 : !llvm.vec<8 x i32> + %7022 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + 
%7023 = llvm.intr.masked.load %7007, %7021, %7022 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7024 = llvm.add %6347, %52 : !llvm.i64 + %7025 = llvm.icmp "slt" %7024, %67 : !llvm.i64 + %7026 = llvm.sub %64, %7024 : !llvm.i64 + %7027 = llvm.select %7025, %7026, %7024 : !llvm.i1, !llvm.i64 + %7028 = llvm.sdiv %7027, %68 : !llvm.i64 + %7029 = llvm.sub %64, %7028 : !llvm.i64 + %7030 = llvm.select %7025, %7029, %7028 : !llvm.i1, !llvm.i64 + %7031 = llvm.mul %7030, %60 : !llvm.i64 + %7032 = llvm.add %6347, %7031 : !llvm.i64 + %7033 = llvm.add %7032, %52 : !llvm.i64 + %7034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7037 = llvm.mul %7033, %7036 : !llvm.i64 + %7038 = llvm.add %7035, %7037 : !llvm.i64 + %7039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7040 = llvm.mul %67, %7039 : !llvm.i64 + %7041 = llvm.add %7038, %7040 : !llvm.i64 + %7042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7043 = llvm.mul %6365, %7042 : !llvm.i64 + %7044 = llvm.add %7041, %7043 : !llvm.i64 + %7045 = llvm.getelementptr %7034[%7044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7046 = llvm.load %7045 : !llvm.ptr> + %7047 = llvm.fadd %7023, %7046 : !llvm.vec<8 x float> + %7048 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7051 = llvm.mul %67, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %33, %7053 : !llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7047, %7056 : !llvm.ptr> + %7057 = llvm.add %6315, %53 : !llvm.i64 + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %2345, %7060 : !llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %7057, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7067 = llvm.bitcast %7066 : !llvm.ptr to !llvm.ptr> + %7068 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7070 = llvm.trunc %7057 : !llvm.i64 to !llvm.i32 + %7071 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7072 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7073 = llvm.insertelement %7070, %7071[%7072 : !llvm.i32] : !llvm.vec<8 x i32> + %7074 = llvm.shufflevector %7073, %7071 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7075 = llvm.add %7074, %7069 : !llvm.vec<8 x i32> + %7076 = llvm.trunc %7068 : !llvm.i64 to !llvm.i32 + %7077 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7078 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7079 = llvm.insertelement %7076, %7077[%7078 : !llvm.i32] : !llvm.vec<8 x i32> + %7080 = llvm.shufflevector %7079, %7077 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7081 = llvm.icmp "slt" %7075, 
%7080 : !llvm.vec<8 x i32> + %7082 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7083 = llvm.intr.masked.load %7067, %7081, %7082 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7084 = llvm.add %5447, %53 : !llvm.i64 + %7085 = llvm.icmp "slt" %7084, %67 : !llvm.i64 + %7086 = llvm.sub %64, %7084 : !llvm.i64 + %7087 = llvm.select %7085, %7086, %7084 : !llvm.i1, !llvm.i64 + %7088 = llvm.sdiv %7087, %68 : !llvm.i64 + %7089 = llvm.sub %64, %7088 : !llvm.i64 + %7090 = llvm.select %7085, %7089, %7088 : !llvm.i1, !llvm.i64 + %7091 = llvm.srem %7090, %68 : !llvm.i64 + %7092 = llvm.icmp "slt" %7091, %67 : !llvm.i64 + %7093 = llvm.add %7091, %68 : !llvm.i64 + %7094 = llvm.select %7092, %7093, %7091 : !llvm.i1, !llvm.i64 + %7095 = llvm.mul %7090, %65 : !llvm.i64 + %7096 = llvm.add %6429, %7095 : !llvm.i64 + %7097 = llvm.add %7096, %54 : !llvm.i64 + %7098 = llvm.icmp "slt" %7097, %67 : !llvm.i64 + %7099 = llvm.sub %64, %7097 : !llvm.i64 + %7100 = llvm.select %7098, %7099, %7097 : !llvm.i1, !llvm.i64 + %7101 = llvm.sdiv %7100, %63 : !llvm.i64 + %7102 = llvm.sub %64, %7101 : !llvm.i64 + %7103 = llvm.select %7098, %7102, %7101 : !llvm.i1, !llvm.i64 + %7104 = llvm.mul %7103, %65 : !llvm.i64 + %7105 = llvm.add %7096, %7104 : !llvm.i64 + %7106 = llvm.add %7105, %54 : !llvm.i64 + %7107 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7109 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7110 = llvm.mul %7094, %7109 : !llvm.i64 + %7111 = llvm.add %7108, %7110 : !llvm.i64 + %7112 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7113 = llvm.mul %67, %7112 : !llvm.i64 + %7114 = llvm.add %7111, %7113 : !llvm.i64 + %7115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7116 = llvm.mul %7106, %7115 : !llvm.i64 + %7117 = llvm.add %7114, %7116 : !llvm.i64 + %7118 = llvm.getelementptr %7107[%7117] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7119 = llvm.load %7118 : !llvm.ptr> + %7120 = llvm.fadd %7083, %7119 : !llvm.vec<8 x float> + %7121 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7123 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7124 = llvm.mul %67, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 + %7126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7127 = llvm.mul %54, %7126 : !llvm.i64 + %7128 = llvm.add %7125, %7127 : !llvm.i64 + %7129 = llvm.getelementptr %7121[%7128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7120, %7129 : !llvm.ptr> + %7130 = llvm.add %6315, %55 : !llvm.i64 + %7131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7134 = llvm.mul %2345, %7133 : !llvm.i64 + %7135 = llvm.add %7132, %7134 : !llvm.i64 + %7136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7137 = llvm.mul %7130, %7136 : !llvm.i64 + %7138 = llvm.add %7135, %7137 : !llvm.i64 + %7139 = llvm.getelementptr %7131[%7138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7140 = llvm.bitcast %7139 : !llvm.ptr to !llvm.ptr> + %7141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7143 = llvm.trunc %7130 : !llvm.i64 to !llvm.i32 + %7144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7145 = 
llvm.mlir.constant(0 : i32) : !llvm.i32 + %7146 = llvm.insertelement %7143, %7144[%7145 : !llvm.i32] : !llvm.vec<8 x i32> + %7147 = llvm.shufflevector %7146, %7144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7148 = llvm.add %7147, %7142 : !llvm.vec<8 x i32> + %7149 = llvm.trunc %7141 : !llvm.i64 to !llvm.i32 + %7150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7152 = llvm.insertelement %7149, %7150[%7151 : !llvm.i32] : !llvm.vec<8 x i32> + %7153 = llvm.shufflevector %7152, %7150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7154 = llvm.icmp "slt" %7148, %7153 : !llvm.vec<8 x i32> + %7155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7156 = llvm.intr.masked.load %7140, %7154, %7155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7157 = llvm.add %6347, %56 : !llvm.i64 + %7158 = llvm.icmp "slt" %7157, %67 : !llvm.i64 + %7159 = llvm.sub %64, %7157 : !llvm.i64 + %7160 = llvm.select %7158, %7159, %7157 : !llvm.i1, !llvm.i64 + %7161 = llvm.sdiv %7160, %68 : !llvm.i64 + %7162 = llvm.sub %64, %7161 : !llvm.i64 + %7163 = llvm.select %7158, %7162, %7161 : !llvm.i1, !llvm.i64 + %7164 = llvm.mul %7163, %60 : !llvm.i64 + %7165 = llvm.add %6347, %7164 : !llvm.i64 + %7166 = llvm.add %7165, %56 : !llvm.i64 + %7167 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7169 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7170 = llvm.mul %7166, %7169 : !llvm.i64 + %7171 = llvm.add %7168, %7170 : !llvm.i64 + %7172 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7173 = llvm.mul %67, %7172 : !llvm.i64 + %7174 = llvm.add %7171, %7173 : !llvm.i64 + %7175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7176 = llvm.mul %6365, %7175 : !llvm.i64 + %7177 = llvm.add %7174, %7176 : !llvm.i64 + %7178 = llvm.getelementptr %7167[%7177] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7179 = llvm.load %7178 : !llvm.ptr> + %7180 = llvm.fadd %7156, %7179 : !llvm.vec<8 x float> + %7181 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7182 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7183 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7184 = llvm.mul %67, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7187 = llvm.mul %34, %7186 : !llvm.i64 + %7188 = llvm.add %7185, %7187 : !llvm.i64 + %7189 = llvm.getelementptr %7181[%7188] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7180, %7189 : !llvm.ptr> + %7190 = llvm.add %6315, %57 : !llvm.i64 + %7191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7194 = llvm.mul %2345, %7193 : !llvm.i64 + %7195 = llvm.add %7192, %7194 : !llvm.i64 + %7196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7197 = llvm.mul %7190, %7196 : !llvm.i64 + %7198 = llvm.add %7195, %7197 : !llvm.i64 + %7199 = llvm.getelementptr %7191[%7198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7200 = llvm.bitcast %7199 : !llvm.ptr to !llvm.ptr> + %7201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7202 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x 
i32> + %7203 = llvm.trunc %7190 : !llvm.i64 to !llvm.i32 + %7204 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7205 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7206 = llvm.insertelement %7203, %7204[%7205 : !llvm.i32] : !llvm.vec<8 x i32> + %7207 = llvm.shufflevector %7206, %7204 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7208 = llvm.add %7207, %7202 : !llvm.vec<8 x i32> + %7209 = llvm.trunc %7201 : !llvm.i64 to !llvm.i32 + %7210 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7211 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7212 = llvm.insertelement %7209, %7210[%7211 : !llvm.i32] : !llvm.vec<8 x i32> + %7213 = llvm.shufflevector %7212, %7210 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7214 = llvm.icmp "slt" %7208, %7213 : !llvm.vec<8 x i32> + %7215 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7216 = llvm.intr.masked.load %7200, %7214, %7215 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7217 = llvm.add %5447, %57 : !llvm.i64 + %7218 = llvm.icmp "slt" %7217, %67 : !llvm.i64 + %7219 = llvm.sub %64, %7217 : !llvm.i64 + %7220 = llvm.select %7218, %7219, %7217 : !llvm.i1, !llvm.i64 + %7221 = llvm.sdiv %7220, %68 : !llvm.i64 + %7222 = llvm.sub %64, %7221 : !llvm.i64 + %7223 = llvm.select %7218, %7222, %7221 : !llvm.i1, !llvm.i64 + %7224 = llvm.srem %7223, %68 : !llvm.i64 + %7225 = llvm.icmp "slt" %7224, %67 : !llvm.i64 + %7226 = llvm.add %7224, %68 : !llvm.i64 + %7227 = llvm.select %7225, %7226, %7224 : !llvm.i1, !llvm.i64 + %7228 = llvm.mul %7223, %65 : !llvm.i64 + %7229 = llvm.add %6429, %7228 : !llvm.i64 + %7230 = llvm.add %7229, %58 : !llvm.i64 + %7231 = llvm.icmp "slt" %7230, %67 : !llvm.i64 + %7232 = llvm.sub %64, %7230 : !llvm.i64 + %7233 = llvm.select %7231, %7232, %7230 : !llvm.i1, !llvm.i64 + %7234 = llvm.sdiv %7233, %63 : !llvm.i64 + %7235 = llvm.sub %64, %7234 : !llvm.i64 + %7236 = llvm.select %7231, %7235, %7234 : !llvm.i1, !llvm.i64 + %7237 = llvm.mul %7236, %65 : !llvm.i64 + %7238 = llvm.add %7229, %7237 : !llvm.i64 + %7239 = llvm.add %7238, %58 : !llvm.i64 + %7240 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7242 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7243 = llvm.mul %7227, %7242 : !llvm.i64 + %7244 = llvm.add %7241, %7243 : !llvm.i64 + %7245 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7246 = llvm.mul %67, %7245 : !llvm.i64 + %7247 = llvm.add %7244, %7246 : !llvm.i64 + %7248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7249 = llvm.mul %7239, %7248 : !llvm.i64 + %7250 = llvm.add %7247, %7249 : !llvm.i64 + %7251 = llvm.getelementptr %7240[%7250] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7252 = llvm.load %7251 : !llvm.ptr> + %7253 = llvm.fadd %7216, %7252 : !llvm.vec<8 x float> + %7254 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7256 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7257 = llvm.mul %67, %7256 : !llvm.i64 + %7258 = llvm.add %7255, %7257 : !llvm.i64 + %7259 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7260 = llvm.mul %58, %7259 : !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.getelementptr %7254[%7261] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7253, %7262 : !llvm.ptr> + %7263 = 
llvm.add %6315, %59 : !llvm.i64 + %7264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7267 = llvm.mul %2345, %7266 : !llvm.i64 + %7268 = llvm.add %7265, %7267 : !llvm.i64 + %7269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7270 = llvm.mul %7263, %7269 : !llvm.i64 + %7271 = llvm.add %7268, %7270 : !llvm.i64 + %7272 = llvm.getelementptr %7264[%7271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7273 = llvm.bitcast %7272 : !llvm.ptr to !llvm.ptr> + %7274 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7275 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7276 = llvm.trunc %7263 : !llvm.i64 to !llvm.i32 + %7277 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7278 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7279 = llvm.insertelement %7276, %7277[%7278 : !llvm.i32] : !llvm.vec<8 x i32> + %7280 = llvm.shufflevector %7279, %7277 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7281 = llvm.add %7280, %7275 : !llvm.vec<8 x i32> + %7282 = llvm.trunc %7274 : !llvm.i64 to !llvm.i32 + %7283 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7284 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7285 = llvm.insertelement %7282, %7283[%7284 : !llvm.i32] : !llvm.vec<8 x i32> + %7286 = llvm.shufflevector %7285, %7283 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7287 = llvm.icmp "slt" %7281, %7286 : !llvm.vec<8 x i32> + %7288 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7289 = llvm.intr.masked.load %7273, %7287, %7288 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7290 = llvm.add %6347, %61 : !llvm.i64 + %7291 = llvm.icmp "slt" %7290, %67 : !llvm.i64 + %7292 = llvm.sub %64, %7290 : !llvm.i64 + %7293 = llvm.select %7291, %7292, %7290 : !llvm.i1, !llvm.i64 + %7294 = llvm.sdiv %7293, %68 : !llvm.i64 + %7295 = llvm.sub %64, %7294 : !llvm.i64 + %7296 = llvm.select %7291, %7295, %7294 : !llvm.i1, !llvm.i64 + %7297 = llvm.mul %7296, %60 : !llvm.i64 + %7298 = llvm.add %6347, %7297 : !llvm.i64 + %7299 = llvm.add %7298, %61 : !llvm.i64 + %7300 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7302 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7303 = llvm.mul %7299, %7302 : !llvm.i64 + %7304 = llvm.add %7301, %7303 : !llvm.i64 + %7305 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7306 = llvm.mul %67, %7305 : !llvm.i64 + %7307 = llvm.add %7304, %7306 : !llvm.i64 + %7308 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7309 = llvm.mul %6365, %7308 : !llvm.i64 + %7310 = llvm.add %7307, %7309 : !llvm.i64 + %7311 = llvm.getelementptr %7300[%7310] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7312 = llvm.load %7311 : !llvm.ptr> + %7313 = llvm.fadd %7289, %7312 : !llvm.vec<8 x float> + %7314 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7316 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7317 = llvm.mul %67, %7316 : !llvm.i64 + %7318 = llvm.add %7315, %7317 : !llvm.i64 + %7319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7320 = llvm.mul %35, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = 
llvm.getelementptr %7314[%7321] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7313, %7322 : !llvm.ptr> + %7323 = llvm.add %6315, %62 : !llvm.i64 + %7324 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7325 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7326 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7327 = llvm.mul %2345, %7326 : !llvm.i64 + %7328 = llvm.add %7325, %7327 : !llvm.i64 + %7329 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7330 = llvm.mul %7323, %7329 : !llvm.i64 + %7331 = llvm.add %7328, %7330 : !llvm.i64 + %7332 = llvm.getelementptr %7324[%7331] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7333 = llvm.bitcast %7332 : !llvm.ptr to !llvm.ptr> + %7334 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7335 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7336 = llvm.trunc %7323 : !llvm.i64 to !llvm.i32 + %7337 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7338 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7339 = llvm.insertelement %7336, %7337[%7338 : !llvm.i32] : !llvm.vec<8 x i32> + %7340 = llvm.shufflevector %7339, %7337 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7341 = llvm.add %7340, %7335 : !llvm.vec<8 x i32> + %7342 = llvm.trunc %7334 : !llvm.i64 to !llvm.i32 + %7343 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7344 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7345 = llvm.insertelement %7342, %7343[%7344 : !llvm.i32] : !llvm.vec<8 x i32> + %7346 = llvm.shufflevector %7345, %7343 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7347 = llvm.icmp "slt" %7341, %7346 : !llvm.vec<8 x i32> + %7348 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7349 = llvm.intr.masked.load %7333, %7347, %7348 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7350 = llvm.add %5447, %62 : !llvm.i64 + %7351 = llvm.icmp "slt" %7350, %67 : !llvm.i64 + %7352 = llvm.sub %64, %7350 : !llvm.i64 + %7353 = llvm.select %7351, %7352, %7350 : !llvm.i1, !llvm.i64 + %7354 = llvm.sdiv %7353, %68 : !llvm.i64 + %7355 = llvm.sub %64, %7354 : !llvm.i64 + %7356 = llvm.select %7351, %7355, %7354 : !llvm.i1, !llvm.i64 + %7357 = llvm.srem %7356, %68 : !llvm.i64 + %7358 = llvm.icmp "slt" %7357, %67 : !llvm.i64 + %7359 = llvm.add %7357, %68 : !llvm.i64 + %7360 = llvm.select %7358, %7359, %7357 : !llvm.i1, !llvm.i64 + %7361 = llvm.mul %7356, %65 : !llvm.i64 + %7362 = llvm.add %6429, %7361 : !llvm.i64 + %7363 = llvm.add %7362, %66 : !llvm.i64 + %7364 = llvm.icmp "slt" %7363, %67 : !llvm.i64 + %7365 = llvm.sub %64, %7363 : !llvm.i64 + %7366 = llvm.select %7364, %7365, %7363 : !llvm.i1, !llvm.i64 + %7367 = llvm.sdiv %7366, %63 : !llvm.i64 + %7368 = llvm.sub %64, %7367 : !llvm.i64 + %7369 = llvm.select %7364, %7368, %7367 : !llvm.i1, !llvm.i64 + %7370 = llvm.mul %7369, %65 : !llvm.i64 + %7371 = llvm.add %7362, %7370 : !llvm.i64 + %7372 = llvm.add %7371, %66 : !llvm.i64 + %7373 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7375 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7376 = llvm.mul %7360, %7375 : !llvm.i64 + %7377 = llvm.add %7374, %7376 : !llvm.i64 + %7378 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7379 = llvm.mul %67, %7378 : !llvm.i64 + %7380 = llvm.add %7377, %7379 : !llvm.i64 + %7381 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %7382 = llvm.mul %7372, %7381 : !llvm.i64 + %7383 = llvm.add %7380, %7382 : !llvm.i64 + %7384 = llvm.getelementptr %7373[%7383] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7385 = llvm.load %7384 : !llvm.ptr> + %7386 = llvm.fadd %7349, %7385 : !llvm.vec<8 x float> + %7387 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7389 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7390 = llvm.mul %67, %7389 : !llvm.i64 + %7391 = llvm.add %7388, %7390 : !llvm.i64 + %7392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7393 = llvm.mul %66, %7392 : !llvm.i64 + %7394 = llvm.add %7391, %7393 : !llvm.i64 + %7395 = llvm.getelementptr %7387[%7394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7386, %7395 : !llvm.ptr> + llvm.br ^bb48(%67 : !llvm.i64) + ^bb48(%7396: !llvm.i64): // 2 preds: ^bb47, ^bb49 + %7397 = llvm.icmp "slt" %7396, %68 : !llvm.i64 + llvm.cond_br %7397, ^bb49, ^bb46 + ^bb49: // pred: ^bb48 + %7398 = llvm.mul %7396, %70 : !llvm.i64 + %7399 = llvm.add %6315, %7398 : !llvm.i64 + %7400 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7403 = llvm.mul %67, %7402 : !llvm.i64 + %7404 = llvm.add %7401, %7403 : !llvm.i64 + %7405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7406 = llvm.mul %7396, %7405 : !llvm.i64 + %7407 = llvm.add %7404, %7406 : !llvm.i64 + %7408 = llvm.getelementptr %7400[%7407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7409 = llvm.load %7408 : !llvm.ptr> + %7410 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7413 = llvm.mul %2345, %7412 : !llvm.i64 + %7414 = llvm.add %7411, %7413 : !llvm.i64 + %7415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7416 = llvm.mul %7399, %7415 : !llvm.i64 + %7417 = llvm.add %7414, %7416 : !llvm.i64 + %7418 = llvm.getelementptr %7410[%7417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7419 = llvm.bitcast %7418 : !llvm.ptr to !llvm.ptr> + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7422 = llvm.trunc %7399 : !llvm.i64 to !llvm.i32 + %7423 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7424 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7425 = llvm.insertelement %7422, %7423[%7424 : !llvm.i32] : !llvm.vec<8 x i32> + %7426 = llvm.shufflevector %7425, %7423 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7427 = llvm.add %7426, %7421 : !llvm.vec<8 x i32> + %7428 = llvm.trunc %7420 : !llvm.i64 to !llvm.i32 + %7429 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7430 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7431 = llvm.insertelement %7428, %7429[%7430 : !llvm.i32] : !llvm.vec<8 x i32> + %7432 = llvm.shufflevector %7431, %7429 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7433 = llvm.icmp "slt" %7427, %7432 : !llvm.vec<8 x i32> + llvm.intr.masked.store %7409, %7419, %7433 {alignment = 4 : i32} : !llvm.vec<8 x float>, !llvm.vec<8 x i1> into !llvm.ptr> + %7434 = llvm.add %7396, %69 : !llvm.i64 + llvm.br ^bb48(%7434 : !llvm.i64) + ^bb50: // pred: ^bb46 + %7435 = llvm.add %5447, 
%39 : !llvm.i64 + llvm.br ^bb41(%7435 : !llvm.i64) + ^bb51: // pred: ^bb41 + %7436 = llvm.add %2345, %69 : !llvm.i64 + llvm.br ^bb12(%7436 : !llvm.i64) + ^bb52: // pred: ^bb12 + %7437 = llvm.add %151, %38 : !llvm.i64 + llvm.br ^bb1(%7437 : !llvm.i64) + ^bb53: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py_4a6286d9(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/optimized_matmul/mlir/24_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir b/Tutorials/optimized_matmul/mlir/24_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir new file mode 100644 index 00000000..0531ddef --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/24_ConvertVulkanLaunchFuncToVulkanCallsWithTiming.mlir @@ -0,0 +1,7762 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.mlir.global internal @cache_17() : !llvm.array<4096 x vec<8 x float>> + llvm.mlir.global internal @cache_16() : !llvm.array<192 x vec<8 x float>> + llvm.func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, 
%arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(0 : i64) : !llvm.i64 + %25 = llvm.mlir.constant(1 : i64) : !llvm.i64 + %26 = llvm.mlir.constant(2 : i64) : !llvm.i64 + %27 = llvm.mlir.constant(3 : i64) : !llvm.i64 + %28 = llvm.mlir.constant(4 : i64) : !llvm.i64 + %29 = llvm.mlir.constant(5 : i64) : !llvm.i64 + %30 = llvm.mlir.constant(6 : i64) : !llvm.i64 + %31 = llvm.mlir.constant(7 : i64) : !llvm.i64 + %32 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %33 = llvm.mlir.constant(10 : index) : !llvm.i64 + %34 = llvm.mlir.constant(12 : index) : !llvm.i64 + %35 = llvm.mlir.constant(14 : index) : !llvm.i64 + %36 = llvm.mlir.constant(512 : index) : !llvm.i64 + %37 = llvm.mlir.constant(784 : index) : !llvm.i64 + %38 = llvm.mlir.constant(256 : index) : !llvm.i64 + %39 = llvm.mlir.constant(128 : index) : 
!llvm.i64 + %40 = llvm.mlir.constant(true) : !llvm.i1 + %41 = llvm.mlir.constant(24 : index) : !llvm.i64 + %42 = llvm.mlir.constant(32 : index) : !llvm.i64 + %43 = llvm.mlir.constant(40 : index) : !llvm.i64 + %44 = llvm.mlir.constant(48 : index) : !llvm.i64 + %45 = llvm.mlir.constant(3 : index) : !llvm.i64 + %46 = llvm.mlir.constant(56 : index) : !llvm.i64 + %47 = llvm.mlir.constant(64 : index) : !llvm.i64 + %48 = llvm.mlir.constant(4 : index) : !llvm.i64 + %49 = llvm.mlir.constant(72 : index) : !llvm.i64 + %50 = llvm.mlir.constant(9 : index) : !llvm.i64 + %51 = llvm.mlir.constant(80 : index) : !llvm.i64 + %52 = llvm.mlir.constant(5 : index) : !llvm.i64 + %53 = llvm.mlir.constant(88 : index) : !llvm.i64 + %54 = llvm.mlir.constant(11 : index) : !llvm.i64 + %55 = llvm.mlir.constant(96 : index) : !llvm.i64 + %56 = llvm.mlir.constant(6 : index) : !llvm.i64 + %57 = llvm.mlir.constant(104 : index) : !llvm.i64 + %58 = llvm.mlir.constant(13 : index) : !llvm.i64 + %59 = llvm.mlir.constant(112 : index) : !llvm.i64 + %60 = llvm.mlir.constant(-16 : index) : !llvm.i64 + %61 = llvm.mlir.constant(7 : index) : !llvm.i64 + %62 = llvm.mlir.constant(120 : index) : !llvm.i64 + %63 = llvm.mlir.constant(2 : index) : !llvm.i64 + %64 = llvm.mlir.constant(-1 : index) : !llvm.i64 + %65 = llvm.mlir.constant(-2 : index) : !llvm.i64 + %66 = llvm.mlir.constant(15 : index) : !llvm.i64 + %67 = llvm.mlir.constant(0 : index) : !llvm.i64 + %68 = llvm.mlir.constant(16 : index) : !llvm.i64 + %69 = llvm.mlir.constant(1 : index) : !llvm.i64 + %70 = llvm.mlir.constant(8 : index) : !llvm.i64 + %71 = llvm.mlir.constant(1 : index) : !llvm.i64 + %72 = llvm.mlir.constant(16 : index) : !llvm.i64 + %73 = llvm.mul %71, %72 : !llvm.i64 + %74 = llvm.mlir.null : !llvm.ptr> + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.getelementptr %74[%75] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %77 = llvm.ptrtoint %76 : !llvm.ptr> to !llvm.i64 + %78 = llvm.mul %73, %77 : !llvm.i64 + %79 = llvm.alloca %78 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %80 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %81 = llvm.insertvalue %79, %80[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.insertvalue %79, %81[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %83 = llvm.mlir.constant(0 : index) : !llvm.i64 + %84 = llvm.insertvalue %83, %82[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %85 = llvm.mlir.constant(1 : index) : !llvm.i64 + %86 = llvm.mlir.constant(16 : index) : !llvm.i64 + %87 = llvm.insertvalue %71, %84[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %88 = llvm.insertvalue %86, %87[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.insertvalue %72, %88[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %90 = llvm.insertvalue %85, %89[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %91 = llvm.mlir.constant(1 : index) : !llvm.i64 + %92 = llvm.mlir.constant(16 : index) : !llvm.i64 + %93 = llvm.mul %91, %92 : !llvm.i64 + %94 = llvm.mlir.null : !llvm.ptr> + %95 = llvm.mlir.constant(1 : index) : !llvm.i64 + %96 = llvm.getelementptr %94[%95] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %97 = llvm.ptrtoint %96 : !llvm.ptr> to !llvm.i64 + %98 = llvm.mul %93, %97 : !llvm.i64 + %99 = llvm.alloca %98 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %100 = llvm.mlir.undef 
: !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %101 = llvm.insertvalue %99, %100[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.insertvalue %99, %101[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %103 = llvm.mlir.constant(0 : index) : !llvm.i64 + %104 = llvm.insertvalue %103, %102[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %105 = llvm.mlir.constant(1 : index) : !llvm.i64 + %106 = llvm.mlir.constant(16 : index) : !llvm.i64 + %107 = llvm.insertvalue %91, %104[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %108 = llvm.insertvalue %106, %107[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.insertvalue %92, %108[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %110 = llvm.insertvalue %105, %109[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %111 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %112 = llvm.mlir.addressof @cache_16 : !llvm.ptr>> + %113 = llvm.getelementptr %112[%111, %111] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %114 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %115 = llvm.insertvalue %113, %114[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %116 = llvm.insertvalue %113, %115[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %117 = llvm.mlir.constant(0 : index) : !llvm.i64 + %118 = llvm.insertvalue %117, %116[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %119 = llvm.mlir.constant(16 : index) : !llvm.i64 + %120 = llvm.insertvalue %119, %118[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %121 = llvm.mlir.constant(12 : index) : !llvm.i64 + %122 = llvm.insertvalue %121, %120[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %123 = llvm.mlir.constant(6 : index) : !llvm.i64 + %124 = llvm.insertvalue %123, %122[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %126 = llvm.insertvalue %125, %124[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %127 = llvm.mlir.constant(2 : index) : !llvm.i64 + %128 = llvm.insertvalue %127, %126[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %129 = llvm.mlir.constant(1 : index) : !llvm.i64 + %130 = llvm.insertvalue %129, %128[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %131 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %132 = llvm.mlir.addressof @cache_17 : !llvm.ptr>> + %133 = llvm.getelementptr %132[%131, %131] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %134 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %135 = llvm.insertvalue %133, %134[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %136 = llvm.insertvalue %133, %135[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %137 = llvm.mlir.constant(0 : index) : !llvm.i64 + %138 = llvm.insertvalue %137, %136[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %139 = llvm.mlir.constant(16 : index) : !llvm.i64 + %140 = llvm.insertvalue %139, %138[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.insertvalue %141, %140[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x 
i64>, array<3 x i64>)> + %143 = llvm.mlir.constant(128 : index) : !llvm.i64 + %144 = llvm.insertvalue %143, %142[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %145 = llvm.mlir.constant(2 : index) : !llvm.i64 + %146 = llvm.insertvalue %145, %144[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %147 = llvm.mlir.constant(2 : index) : !llvm.i64 + %148 = llvm.insertvalue %147, %146[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %150 = llvm.insertvalue %149, %148[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + llvm.br ^bb1(%67 : !llvm.i64) + ^bb1(%151: !llvm.i64): // 2 preds: ^bb0, ^bb52 + %152 = llvm.icmp "slt" %151, %36 : !llvm.i64 + llvm.cond_br %152, ^bb2, ^bb53 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%67 : !llvm.i64) + ^bb3(%153: !llvm.i64): // 2 preds: ^bb2, ^bb10 + %154 = llvm.icmp "slt" %153, %39 : !llvm.i64 + llvm.cond_br %154, ^bb4, ^bb11 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%67 : !llvm.i64) + ^bb5(%155: !llvm.i64): // 2 preds: ^bb4, ^bb9 + %156 = llvm.icmp "slt" %155, %38 : !llvm.i64 + llvm.cond_br %156, ^bb6, ^bb10 + ^bb6: // pred: ^bb5 + llvm.cond_br %40, ^bb7, ^bb8 + ^bb7: // pred: ^bb6 + %157 = llvm.add %151, %155 : !llvm.i64 + %158 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %161 = llvm.mul %153, %160 : !llvm.i64 + %162 = llvm.add %159, %161 : !llvm.i64 + %163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %164 = llvm.mul %157, %163 : !llvm.i64 + %165 = llvm.add %162, %164 : !llvm.i64 + %166 = llvm.getelementptr %158[%165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %167 = llvm.bitcast %166 : !llvm.ptr to !llvm.ptr> + %168 = llvm.load %167 {alignment = 4 : i64} : !llvm.ptr> + %169 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(16 : index) : !llvm.i64 + %172 = llvm.mul %67, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %67, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %168, %177 : !llvm.ptr> + %178 = llvm.add %157, %70 : !llvm.i64 + %179 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %182 = llvm.mul %153, %181 : !llvm.i64 + %183 = llvm.add %180, %182 : !llvm.i64 + %184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %185 = llvm.mul %178, %184 : !llvm.i64 + %186 = llvm.add %183, %185 : !llvm.i64 + %187 = llvm.getelementptr %179[%186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %188 = llvm.bitcast %187 : !llvm.ptr to !llvm.ptr> + %189 = llvm.load %188 {alignment = 4 : i64} : !llvm.ptr> + %190 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %192 = llvm.mlir.constant(16 : index) : !llvm.i64 + %193 = llvm.mul %67, %192 : !llvm.i64 + %194 = llvm.add %191, %193 : !llvm.i64 + %195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %196 = llvm.mul %69, %195 : !llvm.i64 + %197 = llvm.add %194, %196 : !llvm.i64 + %198 = llvm.getelementptr %190[%197] : 
(!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %189, %198 : !llvm.ptr> + %199 = llvm.add %157, %68 : !llvm.i64 + %200 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %203 = llvm.mul %153, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %199, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.bitcast %208 : !llvm.ptr to !llvm.ptr> + %210 = llvm.load %209 {alignment = 4 : i64} : !llvm.ptr> + %211 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %212 = llvm.mlir.constant(0 : index) : !llvm.i64 + %213 = llvm.mlir.constant(16 : index) : !llvm.i64 + %214 = llvm.mul %67, %213 : !llvm.i64 + %215 = llvm.add %212, %214 : !llvm.i64 + %216 = llvm.mlir.constant(1 : index) : !llvm.i64 + %217 = llvm.mul %63, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.getelementptr %211[%218] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %210, %219 : !llvm.ptr> + %220 = llvm.add %157, %41 : !llvm.i64 + %221 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %224 = llvm.mul %153, %223 : !llvm.i64 + %225 = llvm.add %222, %224 : !llvm.i64 + %226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %227 = llvm.mul %220, %226 : !llvm.i64 + %228 = llvm.add %225, %227 : !llvm.i64 + %229 = llvm.getelementptr %221[%228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %230 = llvm.bitcast %229 : !llvm.ptr to !llvm.ptr> + %231 = llvm.load %230 {alignment = 4 : i64} : !llvm.ptr> + %232 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %233 = llvm.mlir.constant(0 : index) : !llvm.i64 + %234 = llvm.mlir.constant(16 : index) : !llvm.i64 + %235 = llvm.mul %67, %234 : !llvm.i64 + %236 = llvm.add %233, %235 : !llvm.i64 + %237 = llvm.mlir.constant(1 : index) : !llvm.i64 + %238 = llvm.mul %45, %237 : !llvm.i64 + %239 = llvm.add %236, %238 : !llvm.i64 + %240 = llvm.getelementptr %232[%239] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %231, %240 : !llvm.ptr> + %241 = llvm.add %157, %42 : !llvm.i64 + %242 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %243 = llvm.mlir.constant(0 : index) : !llvm.i64 + %244 = llvm.mlir.constant(512 : index) : !llvm.i64 + %245 = llvm.mul %153, %244 : !llvm.i64 + %246 = llvm.add %243, %245 : !llvm.i64 + %247 = llvm.mlir.constant(1 : index) : !llvm.i64 + %248 = llvm.mul %241, %247 : !llvm.i64 + %249 = llvm.add %246, %248 : !llvm.i64 + %250 = llvm.getelementptr %242[%249] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %251 = llvm.bitcast %250 : !llvm.ptr to !llvm.ptr> + %252 = llvm.load %251 {alignment = 4 : i64} : !llvm.ptr> + %253 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = llvm.mlir.constant(16 : index) : !llvm.i64 + %256 = llvm.mul %67, %255 : !llvm.i64 + %257 = llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = llvm.mul %48, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr>, 
!llvm.i64) -> !llvm.ptr> + llvm.store %252, %261 : !llvm.ptr> + %262 = llvm.add %157, %43 : !llvm.i64 + %263 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %264 = llvm.mlir.constant(0 : index) : !llvm.i64 + %265 = llvm.mlir.constant(512 : index) : !llvm.i64 + %266 = llvm.mul %153, %265 : !llvm.i64 + %267 = llvm.add %264, %266 : !llvm.i64 + %268 = llvm.mlir.constant(1 : index) : !llvm.i64 + %269 = llvm.mul %262, %268 : !llvm.i64 + %270 = llvm.add %267, %269 : !llvm.i64 + %271 = llvm.getelementptr %263[%270] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %272 = llvm.bitcast %271 : !llvm.ptr to !llvm.ptr> + %273 = llvm.load %272 {alignment = 4 : i64} : !llvm.ptr> + %274 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %275 = llvm.mlir.constant(0 : index) : !llvm.i64 + %276 = llvm.mlir.constant(16 : index) : !llvm.i64 + %277 = llvm.mul %67, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.mlir.constant(1 : index) : !llvm.i64 + %280 = llvm.mul %52, %279 : !llvm.i64 + %281 = llvm.add %278, %280 : !llvm.i64 + %282 = llvm.getelementptr %274[%281] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %273, %282 : !llvm.ptr> + %283 = llvm.add %157, %44 : !llvm.i64 + %284 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %153, %286 : !llvm.i64 + %288 = llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %283, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.bitcast %292 : !llvm.ptr to !llvm.ptr> + %294 = llvm.load %293 {alignment = 4 : i64} : !llvm.ptr> + %295 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %296 = llvm.mlir.constant(0 : index) : !llvm.i64 + %297 = llvm.mlir.constant(16 : index) : !llvm.i64 + %298 = llvm.mul %67, %297 : !llvm.i64 + %299 = llvm.add %296, %298 : !llvm.i64 + %300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %301 = llvm.mul %56, %300 : !llvm.i64 + %302 = llvm.add %299, %301 : !llvm.i64 + %303 = llvm.getelementptr %295[%302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %294, %303 : !llvm.ptr> + %304 = llvm.add %157, %46 : !llvm.i64 + %305 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %306 = llvm.mlir.constant(0 : index) : !llvm.i64 + %307 = llvm.mlir.constant(512 : index) : !llvm.i64 + %308 = llvm.mul %153, %307 : !llvm.i64 + %309 = llvm.add %306, %308 : !llvm.i64 + %310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %311 = llvm.mul %304, %310 : !llvm.i64 + %312 = llvm.add %309, %311 : !llvm.i64 + %313 = llvm.getelementptr %305[%312] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %314 = llvm.bitcast %313 : !llvm.ptr to !llvm.ptr> + %315 = llvm.load %314 {alignment = 4 : i64} : !llvm.ptr> + %316 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %318 = llvm.mlir.constant(16 : index) : !llvm.i64 + %319 = llvm.mul %67, %318 : !llvm.i64 + %320 = llvm.add %317, %319 : !llvm.i64 + %321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %322 = llvm.mul %61, %321 : !llvm.i64 + %323 = llvm.add %320, %322 : !llvm.i64 + %324 = llvm.getelementptr %316[%323] : (!llvm.ptr>, !llvm.i64) -> 
!llvm.ptr> + llvm.store %315, %324 : !llvm.ptr> + %325 = llvm.add %157, %47 : !llvm.i64 + %326 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %328 = llvm.mlir.constant(512 : index) : !llvm.i64 + %329 = llvm.mul %153, %328 : !llvm.i64 + %330 = llvm.add %327, %329 : !llvm.i64 + %331 = llvm.mlir.constant(1 : index) : !llvm.i64 + %332 = llvm.mul %325, %331 : !llvm.i64 + %333 = llvm.add %330, %332 : !llvm.i64 + %334 = llvm.getelementptr %326[%333] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %335 = llvm.bitcast %334 : !llvm.ptr to !llvm.ptr> + %336 = llvm.load %335 {alignment = 4 : i64} : !llvm.ptr> + %337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %340 = llvm.mul %67, %339 : !llvm.i64 + %341 = llvm.add %338, %340 : !llvm.i64 + %342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %343 = llvm.mul %70, %342 : !llvm.i64 + %344 = llvm.add %341, %343 : !llvm.i64 + %345 = llvm.getelementptr %337[%344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %336, %345 : !llvm.ptr> + %346 = llvm.add %157, %49 : !llvm.i64 + %347 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %350 = llvm.mul %153, %349 : !llvm.i64 + %351 = llvm.add %348, %350 : !llvm.i64 + %352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %353 = llvm.mul %346, %352 : !llvm.i64 + %354 = llvm.add %351, %353 : !llvm.i64 + %355 = llvm.getelementptr %347[%354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %356 = llvm.bitcast %355 : !llvm.ptr to !llvm.ptr> + %357 = llvm.load %356 {alignment = 4 : i64} : !llvm.ptr> + %358 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %360 = llvm.mlir.constant(16 : index) : !llvm.i64 + %361 = llvm.mul %67, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %364 = llvm.mul %50, %363 : !llvm.i64 + %365 = llvm.add %362, %364 : !llvm.i64 + %366 = llvm.getelementptr %358[%365] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %357, %366 : !llvm.ptr> + %367 = llvm.add %157, %51 : !llvm.i64 + %368 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %371 = llvm.mul %153, %370 : !llvm.i64 + %372 = llvm.add %369, %371 : !llvm.i64 + %373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %374 = llvm.mul %367, %373 : !llvm.i64 + %375 = llvm.add %372, %374 : !llvm.i64 + %376 = llvm.getelementptr %368[%375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %377 = llvm.bitcast %376 : !llvm.ptr to !llvm.ptr> + %378 = llvm.load %377 {alignment = 4 : i64} : !llvm.ptr> + %379 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %381 = llvm.mlir.constant(16 : index) : !llvm.i64 + %382 = llvm.mul %67, %381 : !llvm.i64 + %383 = llvm.add %380, %382 : !llvm.i64 + %384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %385 = llvm.mul %33, %384 : !llvm.i64 + %386 = llvm.add %383, %385 : !llvm.i64 + %387 = llvm.getelementptr %379[%386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + 
llvm.store %378, %387 : !llvm.ptr> + %388 = llvm.add %157, %53 : !llvm.i64 + %389 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %392 = llvm.mul %153, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %395 = llvm.mul %388, %394 : !llvm.i64 + %396 = llvm.add %393, %395 : !llvm.i64 + %397 = llvm.getelementptr %389[%396] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %398 = llvm.bitcast %397 : !llvm.ptr to !llvm.ptr> + %399 = llvm.load %398 {alignment = 4 : i64} : !llvm.ptr> + %400 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %403 = llvm.mul %67, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %406 = llvm.mul %54, %405 : !llvm.i64 + %407 = llvm.add %404, %406 : !llvm.i64 + %408 = llvm.getelementptr %400[%407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %399, %408 : !llvm.ptr> + %409 = llvm.add %157, %55 : !llvm.i64 + %410 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %413 = llvm.mul %153, %412 : !llvm.i64 + %414 = llvm.add %411, %413 : !llvm.i64 + %415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %416 = llvm.mul %409, %415 : !llvm.i64 + %417 = llvm.add %414, %416 : !llvm.i64 + %418 = llvm.getelementptr %410[%417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %419 = llvm.bitcast %418 : !llvm.ptr to !llvm.ptr> + %420 = llvm.load %419 {alignment = 4 : i64} : !llvm.ptr> + %421 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %423 = llvm.mlir.constant(16 : index) : !llvm.i64 + %424 = llvm.mul %67, %423 : !llvm.i64 + %425 = llvm.add %422, %424 : !llvm.i64 + %426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %427 = llvm.mul %34, %426 : !llvm.i64 + %428 = llvm.add %425, %427 : !llvm.i64 + %429 = llvm.getelementptr %421[%428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %420, %429 : !llvm.ptr> + %430 = llvm.add %157, %57 : !llvm.i64 + %431 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %434 = llvm.mul %153, %433 : !llvm.i64 + %435 = llvm.add %432, %434 : !llvm.i64 + %436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %437 = llvm.mul %430, %436 : !llvm.i64 + %438 = llvm.add %435, %437 : !llvm.i64 + %439 = llvm.getelementptr %431[%438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %440 = llvm.bitcast %439 : !llvm.ptr to !llvm.ptr> + %441 = llvm.load %440 {alignment = 4 : i64} : !llvm.ptr> + %442 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %444 = llvm.mlir.constant(16 : index) : !llvm.i64 + %445 = llvm.mul %67, %444 : !llvm.i64 + %446 = llvm.add %443, %445 : !llvm.i64 + %447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %448 = llvm.mul %58, %447 : !llvm.i64 + %449 = llvm.add %446, %448 : !llvm.i64 + %450 = llvm.getelementptr %442[%449] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %441, 
+ ... (generated LLVM-dialect MLIR listing continues: index arithmetic via llvm.sdiv / llvm.srem / llvm.select, <8 x float> vector loads from the 512-column f32 input cached into a 16-element row buffer, stores into the 256-stride packed buffer, and, in the boundary block ^bb8, llvm.intr.masked.load sequences whose masks compare a broadcast base index plus lanes [0, 1, ..., 7] against the row width 512, with a 0.0 passthrough) ...
!llvm.i64 + %1717 = llvm.sub %64, %155 : !llvm.i64 + %1718 = llvm.select %1716, %1717, %155 : !llvm.i1, !llvm.i64 + %1719 = llvm.sdiv %1718, %68 : !llvm.i64 + %1720 = llvm.sub %64, %1719 : !llvm.i64 + %1721 = llvm.select %1716, %1720, %1719 : !llvm.i1, !llvm.i64 + %1722 = llvm.srem %1721, %68 : !llvm.i64 + %1723 = llvm.icmp "slt" %1722, %67 : !llvm.i64 + %1724 = llvm.add %1722, %68 : !llvm.i64 + %1725 = llvm.select %1723, %1724, %1722 : !llvm.i1, !llvm.i64 + %1726 = llvm.srem %153, %39 : !llvm.i64 + %1727 = llvm.icmp "slt" %1726, %67 : !llvm.i64 + %1728 = llvm.add %1726, %39 : !llvm.i64 + %1729 = llvm.select %1727, %1728, %1726 : !llvm.i1, !llvm.i64 + %1730 = llvm.srem %155, %68 : !llvm.i64 + %1731 = llvm.icmp "slt" %1730, %67 : !llvm.i64 + %1732 = llvm.add %1730, %68 : !llvm.i64 + %1733 = llvm.select %1731, %1732, %1730 : !llvm.i1, !llvm.i64 + %1734 = llvm.icmp "slt" %1733, %67 : !llvm.i64 + %1735 = llvm.sub %64, %1733 : !llvm.i64 + %1736 = llvm.select %1734, %1735, %1733 : !llvm.i1, !llvm.i64 + %1737 = llvm.sdiv %1736, %70 : !llvm.i64 + %1738 = llvm.sub %64, %1737 : !llvm.i64 + %1739 = llvm.select %1734, %1738, %1737 : !llvm.i1, !llvm.i64 + %1740 = llvm.srem %1739, %63 : !llvm.i64 + %1741 = llvm.icmp "slt" %1740, %67 : !llvm.i64 + %1742 = llvm.add %1740, %63 : !llvm.i64 + %1743 = llvm.select %1741, %1742, %1740 : !llvm.i1, !llvm.i64 + %1744 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1746 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1747 = llvm.mul %1725, %1746 : !llvm.i64 + %1748 = llvm.add %1745, %1747 : !llvm.i64 + %1749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1750 = llvm.mul %1729, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %1743, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1744[%1754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1715, %1755 : !llvm.ptr> + %1756 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1759 = llvm.mul %67, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %69, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1765 = llvm.load %1764 : !llvm.ptr> + %1766 = llvm.add %155, %70 : !llvm.i64 + %1767 = llvm.icmp "slt" %1766, %67 : !llvm.i64 + %1768 = llvm.sub %64, %1766 : !llvm.i64 + %1769 = llvm.select %1767, %1768, %1766 : !llvm.i1, !llvm.i64 + %1770 = llvm.sdiv %1769, %68 : !llvm.i64 + %1771 = llvm.sub %64, %1770 : !llvm.i64 + %1772 = llvm.select %1767, %1771, %1770 : !llvm.i1, !llvm.i64 + %1773 = llvm.srem %1772, %68 : !llvm.i64 + %1774 = llvm.icmp "slt" %1773, %67 : !llvm.i64 + %1775 = llvm.add %1773, %68 : !llvm.i64 + %1776 = llvm.select %1774, %1775, %1773 : !llvm.i1, !llvm.i64 + %1777 = llvm.sdiv %1718, %70 : !llvm.i64 + %1778 = llvm.sub %64, %1777 : !llvm.i64 + %1779 = llvm.select %1716, %1778, %1777 : !llvm.i1, !llvm.i64 + %1780 = llvm.mul %1772, %65 : !llvm.i64 + %1781 = llvm.add %1779, %1780 : !llvm.i64 + %1782 = llvm.add %1781, %69 : !llvm.i64 + %1783 = llvm.icmp "slt" %1782, %67 : !llvm.i64 + %1784 = llvm.sub %64, %1782 : !llvm.i64 + %1785 = llvm.select %1783, %1784, 
%1782 : !llvm.i1, !llvm.i64 + %1786 = llvm.sdiv %1785, %63 : !llvm.i64 + %1787 = llvm.sub %64, %1786 : !llvm.i64 + %1788 = llvm.select %1783, %1787, %1786 : !llvm.i1, !llvm.i64 + %1789 = llvm.mul %1788, %65 : !llvm.i64 + %1790 = llvm.add %1781, %1789 : !llvm.i64 + %1791 = llvm.add %1790, %69 : !llvm.i64 + %1792 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1794 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1795 = llvm.mul %1776, %1794 : !llvm.i64 + %1796 = llvm.add %1793, %1795 : !llvm.i64 + %1797 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1798 = llvm.mul %1729, %1797 : !llvm.i64 + %1799 = llvm.add %1796, %1798 : !llvm.i64 + %1800 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1801 = llvm.mul %1791, %1800 : !llvm.i64 + %1802 = llvm.add %1799, %1801 : !llvm.i64 + %1803 = llvm.getelementptr %1792[%1802] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1765, %1803 : !llvm.ptr> + %1804 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1805 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1806 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1807 = llvm.mul %67, %1806 : !llvm.i64 + %1808 = llvm.add %1805, %1807 : !llvm.i64 + %1809 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1810 = llvm.mul %63, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.getelementptr %1804[%1811] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1813 = llvm.load %1812 : !llvm.ptr> + %1814 = llvm.add %1721, %69 : !llvm.i64 + %1815 = llvm.icmp "slt" %1814, %67 : !llvm.i64 + %1816 = llvm.sub %64, %1814 : !llvm.i64 + %1817 = llvm.select %1815, %1816, %1814 : !llvm.i1, !llvm.i64 + %1818 = llvm.sdiv %1817, %68 : !llvm.i64 + %1819 = llvm.sub %64, %1818 : !llvm.i64 + %1820 = llvm.select %1815, %1819, %1818 : !llvm.i1, !llvm.i64 + %1821 = llvm.mul %1820, %60 : !llvm.i64 + %1822 = llvm.add %1721, %1821 : !llvm.i64 + %1823 = llvm.add %1822, %69 : !llvm.i64 + %1824 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1826 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1827 = llvm.mul %1823, %1826 : !llvm.i64 + %1828 = llvm.add %1825, %1827 : !llvm.i64 + %1829 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1830 = llvm.mul %1729, %1829 : !llvm.i64 + %1831 = llvm.add %1828, %1830 : !llvm.i64 + %1832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1833 = llvm.mul %1743, %1832 : !llvm.i64 + %1834 = llvm.add %1831, %1833 : !llvm.i64 + %1835 = llvm.getelementptr %1824[%1834] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1813, %1835 : !llvm.ptr> + %1836 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1837 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1838 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1839 = llvm.mul %67, %1838 : !llvm.i64 + %1840 = llvm.add %1837, %1839 : !llvm.i64 + %1841 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1842 = llvm.mul %45, %1841 : !llvm.i64 + %1843 = llvm.add %1840, %1842 : !llvm.i64 + %1844 = llvm.getelementptr %1836[%1843] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1845 = llvm.load %1844 : !llvm.ptr> + %1846 = llvm.add %155, %41 : !llvm.i64 + %1847 = llvm.icmp "slt" %1846, %67 : !llvm.i64 + %1848 = llvm.sub %64, %1846 : !llvm.i64 + %1849 = llvm.select %1847, %1848, %1846 : !llvm.i1, !llvm.i64 + %1850 = llvm.sdiv %1849, %68 : !llvm.i64 + %1851 = llvm.sub %64, %1850 : 
!llvm.i64 + %1852 = llvm.select %1847, %1851, %1850 : !llvm.i1, !llvm.i64 + %1853 = llvm.srem %1852, %68 : !llvm.i64 + %1854 = llvm.icmp "slt" %1853, %67 : !llvm.i64 + %1855 = llvm.add %1853, %68 : !llvm.i64 + %1856 = llvm.select %1854, %1855, %1853 : !llvm.i1, !llvm.i64 + %1857 = llvm.mul %1852, %65 : !llvm.i64 + %1858 = llvm.add %1779, %1857 : !llvm.i64 + %1859 = llvm.add %1858, %45 : !llvm.i64 + %1860 = llvm.icmp "slt" %1859, %67 : !llvm.i64 + %1861 = llvm.sub %64, %1859 : !llvm.i64 + %1862 = llvm.select %1860, %1861, %1859 : !llvm.i1, !llvm.i64 + %1863 = llvm.sdiv %1862, %63 : !llvm.i64 + %1864 = llvm.sub %64, %1863 : !llvm.i64 + %1865 = llvm.select %1860, %1864, %1863 : !llvm.i1, !llvm.i64 + %1866 = llvm.mul %1865, %65 : !llvm.i64 + %1867 = llvm.add %1858, %1866 : !llvm.i64 + %1868 = llvm.add %1867, %45 : !llvm.i64 + %1869 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1871 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1872 = llvm.mul %1856, %1871 : !llvm.i64 + %1873 = llvm.add %1870, %1872 : !llvm.i64 + %1874 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1875 = llvm.mul %1729, %1874 : !llvm.i64 + %1876 = llvm.add %1873, %1875 : !llvm.i64 + %1877 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1878 = llvm.mul %1868, %1877 : !llvm.i64 + %1879 = llvm.add %1876, %1878 : !llvm.i64 + %1880 = llvm.getelementptr %1869[%1879] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1845, %1880 : !llvm.ptr> + %1881 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1882 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1883 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1884 = llvm.mul %67, %1883 : !llvm.i64 + %1885 = llvm.add %1882, %1884 : !llvm.i64 + %1886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1887 = llvm.mul %48, %1886 : !llvm.i64 + %1888 = llvm.add %1885, %1887 : !llvm.i64 + %1889 = llvm.getelementptr %1881[%1888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1890 = llvm.load %1889 : !llvm.ptr> + %1891 = llvm.add %1721, %63 : !llvm.i64 + %1892 = llvm.icmp "slt" %1891, %67 : !llvm.i64 + %1893 = llvm.sub %64, %1891 : !llvm.i64 + %1894 = llvm.select %1892, %1893, %1891 : !llvm.i1, !llvm.i64 + %1895 = llvm.sdiv %1894, %68 : !llvm.i64 + %1896 = llvm.sub %64, %1895 : !llvm.i64 + %1897 = llvm.select %1892, %1896, %1895 : !llvm.i1, !llvm.i64 + %1898 = llvm.mul %1897, %60 : !llvm.i64 + %1899 = llvm.add %1721, %1898 : !llvm.i64 + %1900 = llvm.add %1899, %63 : !llvm.i64 + %1901 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1903 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1904 = llvm.mul %1900, %1903 : !llvm.i64 + %1905 = llvm.add %1902, %1904 : !llvm.i64 + %1906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1907 = llvm.mul %1729, %1906 : !llvm.i64 + %1908 = llvm.add %1905, %1907 : !llvm.i64 + %1909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1910 = llvm.mul %1743, %1909 : !llvm.i64 + %1911 = llvm.add %1908, %1910 : !llvm.i64 + %1912 = llvm.getelementptr %1901[%1911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1890, %1912 : !llvm.ptr> + %1913 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1915 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1916 = llvm.mul %67, %1915 : !llvm.i64 + %1917 = llvm.add %1914, %1916 : !llvm.i64 + %1918 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %1919 = llvm.mul %52, %1918 : !llvm.i64 + %1920 = llvm.add %1917, %1919 : !llvm.i64 + %1921 = llvm.getelementptr %1913[%1920] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1922 = llvm.load %1921 : !llvm.ptr> + %1923 = llvm.add %155, %43 : !llvm.i64 + %1924 = llvm.icmp "slt" %1923, %67 : !llvm.i64 + %1925 = llvm.sub %64, %1923 : !llvm.i64 + %1926 = llvm.select %1924, %1925, %1923 : !llvm.i1, !llvm.i64 + %1927 = llvm.sdiv %1926, %68 : !llvm.i64 + %1928 = llvm.sub %64, %1927 : !llvm.i64 + %1929 = llvm.select %1924, %1928, %1927 : !llvm.i1, !llvm.i64 + %1930 = llvm.srem %1929, %68 : !llvm.i64 + %1931 = llvm.icmp "slt" %1930, %67 : !llvm.i64 + %1932 = llvm.add %1930, %68 : !llvm.i64 + %1933 = llvm.select %1931, %1932, %1930 : !llvm.i1, !llvm.i64 + %1934 = llvm.mul %1929, %65 : !llvm.i64 + %1935 = llvm.add %1779, %1934 : !llvm.i64 + %1936 = llvm.add %1935, %52 : !llvm.i64 + %1937 = llvm.icmp "slt" %1936, %67 : !llvm.i64 + %1938 = llvm.sub %64, %1936 : !llvm.i64 + %1939 = llvm.select %1937, %1938, %1936 : !llvm.i1, !llvm.i64 + %1940 = llvm.sdiv %1939, %63 : !llvm.i64 + %1941 = llvm.sub %64, %1940 : !llvm.i64 + %1942 = llvm.select %1937, %1941, %1940 : !llvm.i1, !llvm.i64 + %1943 = llvm.mul %1942, %65 : !llvm.i64 + %1944 = llvm.add %1935, %1943 : !llvm.i64 + %1945 = llvm.add %1944, %52 : !llvm.i64 + %1946 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1949 = llvm.mul %1933, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1952 = llvm.mul %1729, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1955 = llvm.mul %1945, %1954 : !llvm.i64 + %1956 = llvm.add %1953, %1955 : !llvm.i64 + %1957 = llvm.getelementptr %1946[%1956] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1922, %1957 : !llvm.ptr> + %1958 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1960 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1961 = llvm.mul %67, %1960 : !llvm.i64 + %1962 = llvm.add %1959, %1961 : !llvm.i64 + %1963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1964 = llvm.mul %56, %1963 : !llvm.i64 + %1965 = llvm.add %1962, %1964 : !llvm.i64 + %1966 = llvm.getelementptr %1958[%1965] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1967 = llvm.load %1966 : !llvm.ptr> + %1968 = llvm.add %1721, %45 : !llvm.i64 + %1969 = llvm.icmp "slt" %1968, %67 : !llvm.i64 + %1970 = llvm.sub %64, %1968 : !llvm.i64 + %1971 = llvm.select %1969, %1970, %1968 : !llvm.i1, !llvm.i64 + %1972 = llvm.sdiv %1971, %68 : !llvm.i64 + %1973 = llvm.sub %64, %1972 : !llvm.i64 + %1974 = llvm.select %1969, %1973, %1972 : !llvm.i1, !llvm.i64 + %1975 = llvm.mul %1974, %60 : !llvm.i64 + %1976 = llvm.add %1721, %1975 : !llvm.i64 + %1977 = llvm.add %1976, %45 : !llvm.i64 + %1978 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1980 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1981 = llvm.mul %1977, %1980 : !llvm.i64 + %1982 = llvm.add %1979, %1981 : !llvm.i64 + %1983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1984 = llvm.mul %1729, %1983 : !llvm.i64 + %1985 = llvm.add %1982, %1984 : !llvm.i64 + %1986 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %1987 = llvm.mul %1743, %1986 : !llvm.i64 + %1988 = llvm.add %1985, %1987 : !llvm.i64 + %1989 = llvm.getelementptr %1978[%1988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1967, %1989 : !llvm.ptr> + %1990 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1992 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1993 = llvm.mul %67, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1996 = llvm.mul %61, %1995 : !llvm.i64 + %1997 = llvm.add %1994, %1996 : !llvm.i64 + %1998 = llvm.getelementptr %1990[%1997] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1999 = llvm.load %1998 : !llvm.ptr> + %2000 = llvm.add %155, %46 : !llvm.i64 + %2001 = llvm.icmp "slt" %2000, %67 : !llvm.i64 + %2002 = llvm.sub %64, %2000 : !llvm.i64 + %2003 = llvm.select %2001, %2002, %2000 : !llvm.i1, !llvm.i64 + %2004 = llvm.sdiv %2003, %68 : !llvm.i64 + %2005 = llvm.sub %64, %2004 : !llvm.i64 + %2006 = llvm.select %2001, %2005, %2004 : !llvm.i1, !llvm.i64 + %2007 = llvm.srem %2006, %68 : !llvm.i64 + %2008 = llvm.icmp "slt" %2007, %67 : !llvm.i64 + %2009 = llvm.add %2007, %68 : !llvm.i64 + %2010 = llvm.select %2008, %2009, %2007 : !llvm.i1, !llvm.i64 + %2011 = llvm.mul %2006, %65 : !llvm.i64 + %2012 = llvm.add %1779, %2011 : !llvm.i64 + %2013 = llvm.add %2012, %61 : !llvm.i64 + %2014 = llvm.icmp "slt" %2013, %67 : !llvm.i64 + %2015 = llvm.sub %64, %2013 : !llvm.i64 + %2016 = llvm.select %2014, %2015, %2013 : !llvm.i1, !llvm.i64 + %2017 = llvm.sdiv %2016, %63 : !llvm.i64 + %2018 = llvm.sub %64, %2017 : !llvm.i64 + %2019 = llvm.select %2014, %2018, %2017 : !llvm.i1, !llvm.i64 + %2020 = llvm.mul %2019, %65 : !llvm.i64 + %2021 = llvm.add %2012, %2020 : !llvm.i64 + %2022 = llvm.add %2021, %61 : !llvm.i64 + %2023 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2024 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2025 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2026 = llvm.mul %2010, %2025 : !llvm.i64 + %2027 = llvm.add %2024, %2026 : !llvm.i64 + %2028 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2029 = llvm.mul %1729, %2028 : !llvm.i64 + %2030 = llvm.add %2027, %2029 : !llvm.i64 + %2031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2032 = llvm.mul %2022, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.getelementptr %2023[%2033] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1999, %2034 : !llvm.ptr> + %2035 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2036 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2037 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2038 = llvm.mul %67, %2037 : !llvm.i64 + %2039 = llvm.add %2036, %2038 : !llvm.i64 + %2040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2041 = llvm.mul %70, %2040 : !llvm.i64 + %2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.getelementptr %2035[%2042] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2044 = llvm.load %2043 : !llvm.ptr> + %2045 = llvm.add %1721, %48 : !llvm.i64 + %2046 = llvm.icmp "slt" %2045, %67 : !llvm.i64 + %2047 = llvm.sub %64, %2045 : !llvm.i64 + %2048 = llvm.select %2046, %2047, %2045 : !llvm.i1, !llvm.i64 + %2049 = llvm.sdiv %2048, %68 : !llvm.i64 + %2050 = llvm.sub %64, %2049 : !llvm.i64 + %2051 = llvm.select %2046, %2050, %2049 : !llvm.i1, !llvm.i64 + %2052 = llvm.mul %2051, %60 : !llvm.i64 + %2053 = llvm.add %1721, %2052 : !llvm.i64 + 
%2054 = llvm.add %2053, %48 : !llvm.i64 + %2055 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2056 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2057 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2058 = llvm.mul %2054, %2057 : !llvm.i64 + %2059 = llvm.add %2056, %2058 : !llvm.i64 + %2060 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2061 = llvm.mul %1729, %2060 : !llvm.i64 + %2062 = llvm.add %2059, %2061 : !llvm.i64 + %2063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2064 = llvm.mul %1743, %2063 : !llvm.i64 + %2065 = llvm.add %2062, %2064 : !llvm.i64 + %2066 = llvm.getelementptr %2055[%2065] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2044, %2066 : !llvm.ptr> + %2067 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2070 = llvm.mul %67, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %50, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2076 = llvm.load %2075 : !llvm.ptr> + %2077 = llvm.add %155, %49 : !llvm.i64 + %2078 = llvm.icmp "slt" %2077, %67 : !llvm.i64 + %2079 = llvm.sub %64, %2077 : !llvm.i64 + %2080 = llvm.select %2078, %2079, %2077 : !llvm.i1, !llvm.i64 + %2081 = llvm.sdiv %2080, %68 : !llvm.i64 + %2082 = llvm.sub %64, %2081 : !llvm.i64 + %2083 = llvm.select %2078, %2082, %2081 : !llvm.i1, !llvm.i64 + %2084 = llvm.srem %2083, %68 : !llvm.i64 + %2085 = llvm.icmp "slt" %2084, %67 : !llvm.i64 + %2086 = llvm.add %2084, %68 : !llvm.i64 + %2087 = llvm.select %2085, %2086, %2084 : !llvm.i1, !llvm.i64 + %2088 = llvm.mul %2083, %65 : !llvm.i64 + %2089 = llvm.add %1779, %2088 : !llvm.i64 + %2090 = llvm.add %2089, %50 : !llvm.i64 + %2091 = llvm.icmp "slt" %2090, %67 : !llvm.i64 + %2092 = llvm.sub %64, %2090 : !llvm.i64 + %2093 = llvm.select %2091, %2092, %2090 : !llvm.i1, !llvm.i64 + %2094 = llvm.sdiv %2093, %63 : !llvm.i64 + %2095 = llvm.sub %64, %2094 : !llvm.i64 + %2096 = llvm.select %2091, %2095, %2094 : !llvm.i1, !llvm.i64 + %2097 = llvm.mul %2096, %65 : !llvm.i64 + %2098 = llvm.add %2089, %2097 : !llvm.i64 + %2099 = llvm.add %2098, %50 : !llvm.i64 + %2100 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2101 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2102 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2103 = llvm.mul %2087, %2102 : !llvm.i64 + %2104 = llvm.add %2101, %2103 : !llvm.i64 + %2105 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2106 = llvm.mul %1729, %2105 : !llvm.i64 + %2107 = llvm.add %2104, %2106 : !llvm.i64 + %2108 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2109 = llvm.mul %2099, %2108 : !llvm.i64 + %2110 = llvm.add %2107, %2109 : !llvm.i64 + %2111 = llvm.getelementptr %2100[%2110] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2076, %2111 : !llvm.ptr> + %2112 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2114 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2115 = llvm.mul %67, %2114 : !llvm.i64 + %2116 = llvm.add %2113, %2115 : !llvm.i64 + %2117 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2118 = llvm.mul %33, %2117 : !llvm.i64 + %2119 = llvm.add %2116, %2118 : !llvm.i64 + %2120 = llvm.getelementptr 
%2112[%2119] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2121 = llvm.load %2120 : !llvm.ptr> + %2122 = llvm.add %1721, %52 : !llvm.i64 + %2123 = llvm.icmp "slt" %2122, %67 : !llvm.i64 + %2124 = llvm.sub %64, %2122 : !llvm.i64 + %2125 = llvm.select %2123, %2124, %2122 : !llvm.i1, !llvm.i64 + %2126 = llvm.sdiv %2125, %68 : !llvm.i64 + %2127 = llvm.sub %64, %2126 : !llvm.i64 + %2128 = llvm.select %2123, %2127, %2126 : !llvm.i1, !llvm.i64 + %2129 = llvm.mul %2128, %60 : !llvm.i64 + %2130 = llvm.add %1721, %2129 : !llvm.i64 + %2131 = llvm.add %2130, %52 : !llvm.i64 + %2132 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2134 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2135 = llvm.mul %2131, %2134 : !llvm.i64 + %2136 = llvm.add %2133, %2135 : !llvm.i64 + %2137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2138 = llvm.mul %1729, %2137 : !llvm.i64 + %2139 = llvm.add %2136, %2138 : !llvm.i64 + %2140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2141 = llvm.mul %1743, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.getelementptr %2132[%2142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2121, %2143 : !llvm.ptr> + %2144 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2145 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2146 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2147 = llvm.mul %67, %2146 : !llvm.i64 + %2148 = llvm.add %2145, %2147 : !llvm.i64 + %2149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2150 = llvm.mul %54, %2149 : !llvm.i64 + %2151 = llvm.add %2148, %2150 : !llvm.i64 + %2152 = llvm.getelementptr %2144[%2151] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2153 = llvm.load %2152 : !llvm.ptr> + %2154 = llvm.add %155, %53 : !llvm.i64 + %2155 = llvm.icmp "slt" %2154, %67 : !llvm.i64 + %2156 = llvm.sub %64, %2154 : !llvm.i64 + %2157 = llvm.select %2155, %2156, %2154 : !llvm.i1, !llvm.i64 + %2158 = llvm.sdiv %2157, %68 : !llvm.i64 + %2159 = llvm.sub %64, %2158 : !llvm.i64 + %2160 = llvm.select %2155, %2159, %2158 : !llvm.i1, !llvm.i64 + %2161 = llvm.srem %2160, %68 : !llvm.i64 + %2162 = llvm.icmp "slt" %2161, %67 : !llvm.i64 + %2163 = llvm.add %2161, %68 : !llvm.i64 + %2164 = llvm.select %2162, %2163, %2161 : !llvm.i1, !llvm.i64 + %2165 = llvm.mul %2160, %65 : !llvm.i64 + %2166 = llvm.add %1779, %2165 : !llvm.i64 + %2167 = llvm.add %2166, %54 : !llvm.i64 + %2168 = llvm.icmp "slt" %2167, %67 : !llvm.i64 + %2169 = llvm.sub %64, %2167 : !llvm.i64 + %2170 = llvm.select %2168, %2169, %2167 : !llvm.i1, !llvm.i64 + %2171 = llvm.sdiv %2170, %63 : !llvm.i64 + %2172 = llvm.sub %64, %2171 : !llvm.i64 + %2173 = llvm.select %2168, %2172, %2171 : !llvm.i1, !llvm.i64 + %2174 = llvm.mul %2173, %65 : !llvm.i64 + %2175 = llvm.add %2166, %2174 : !llvm.i64 + %2176 = llvm.add %2175, %54 : !llvm.i64 + %2177 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2180 = llvm.mul %2164, %2179 : !llvm.i64 + %2181 = llvm.add %2178, %2180 : !llvm.i64 + %2182 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2183 = llvm.mul %1729, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2186 = llvm.mul %2176, %2185 : !llvm.i64 + %2187 = llvm.add %2184, %2186 : !llvm.i64 + %2188 = llvm.getelementptr %2177[%2187] : (!llvm.ptr>, 
!llvm.i64) -> !llvm.ptr> + llvm.store %2153, %2188 : !llvm.ptr> + %2189 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2191 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2192 = llvm.mul %67, %2191 : !llvm.i64 + %2193 = llvm.add %2190, %2192 : !llvm.i64 + %2194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2195 = llvm.mul %34, %2194 : !llvm.i64 + %2196 = llvm.add %2193, %2195 : !llvm.i64 + %2197 = llvm.getelementptr %2189[%2196] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2198 = llvm.load %2197 : !llvm.ptr> + %2199 = llvm.add %1721, %56 : !llvm.i64 + %2200 = llvm.icmp "slt" %2199, %67 : !llvm.i64 + %2201 = llvm.sub %64, %2199 : !llvm.i64 + %2202 = llvm.select %2200, %2201, %2199 : !llvm.i1, !llvm.i64 + %2203 = llvm.sdiv %2202, %68 : !llvm.i64 + %2204 = llvm.sub %64, %2203 : !llvm.i64 + %2205 = llvm.select %2200, %2204, %2203 : !llvm.i1, !llvm.i64 + %2206 = llvm.mul %2205, %60 : !llvm.i64 + %2207 = llvm.add %1721, %2206 : !llvm.i64 + %2208 = llvm.add %2207, %56 : !llvm.i64 + %2209 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2211 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2212 = llvm.mul %2208, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2215 = llvm.mul %1729, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : !llvm.i64 + %2217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2218 = llvm.mul %1743, %2217 : !llvm.i64 + %2219 = llvm.add %2216, %2218 : !llvm.i64 + %2220 = llvm.getelementptr %2209[%2219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2198, %2220 : !llvm.ptr> + %2221 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2223 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2224 = llvm.mul %67, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2227 = llvm.mul %58, %2226 : !llvm.i64 + %2228 = llvm.add %2225, %2227 : !llvm.i64 + %2229 = llvm.getelementptr %2221[%2228] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2230 = llvm.load %2229 : !llvm.ptr> + %2231 = llvm.add %155, %57 : !llvm.i64 + %2232 = llvm.icmp "slt" %2231, %67 : !llvm.i64 + %2233 = llvm.sub %64, %2231 : !llvm.i64 + %2234 = llvm.select %2232, %2233, %2231 : !llvm.i1, !llvm.i64 + %2235 = llvm.sdiv %2234, %68 : !llvm.i64 + %2236 = llvm.sub %64, %2235 : !llvm.i64 + %2237 = llvm.select %2232, %2236, %2235 : !llvm.i1, !llvm.i64 + %2238 = llvm.srem %2237, %68 : !llvm.i64 + %2239 = llvm.icmp "slt" %2238, %67 : !llvm.i64 + %2240 = llvm.add %2238, %68 : !llvm.i64 + %2241 = llvm.select %2239, %2240, %2238 : !llvm.i1, !llvm.i64 + %2242 = llvm.mul %2237, %65 : !llvm.i64 + %2243 = llvm.add %1779, %2242 : !llvm.i64 + %2244 = llvm.add %2243, %58 : !llvm.i64 + %2245 = llvm.icmp "slt" %2244, %67 : !llvm.i64 + %2246 = llvm.sub %64, %2244 : !llvm.i64 + %2247 = llvm.select %2245, %2246, %2244 : !llvm.i1, !llvm.i64 + %2248 = llvm.sdiv %2247, %63 : !llvm.i64 + %2249 = llvm.sub %64, %2248 : !llvm.i64 + %2250 = llvm.select %2245, %2249, %2248 : !llvm.i1, !llvm.i64 + %2251 = llvm.mul %2250, %65 : !llvm.i64 + %2252 = llvm.add %2243, %2251 : !llvm.i64 + %2253 = llvm.add %2252, %58 : !llvm.i64 + %2254 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2255 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %2256 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2257 = llvm.mul %2241, %2256 : !llvm.i64 + %2258 = llvm.add %2255, %2257 : !llvm.i64 + %2259 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2260 = llvm.mul %1729, %2259 : !llvm.i64 + %2261 = llvm.add %2258, %2260 : !llvm.i64 + %2262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2263 = llvm.mul %2253, %2262 : !llvm.i64 + %2264 = llvm.add %2261, %2263 : !llvm.i64 + %2265 = llvm.getelementptr %2254[%2264] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2230, %2265 : !llvm.ptr> + %2266 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2268 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2269 = llvm.mul %67, %2268 : !llvm.i64 + %2270 = llvm.add %2267, %2269 : !llvm.i64 + %2271 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2272 = llvm.mul %35, %2271 : !llvm.i64 + %2273 = llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.getelementptr %2266[%2273] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2275 = llvm.load %2274 : !llvm.ptr> + %2276 = llvm.add %1721, %61 : !llvm.i64 + %2277 = llvm.icmp "slt" %2276, %67 : !llvm.i64 + %2278 = llvm.sub %64, %2276 : !llvm.i64 + %2279 = llvm.select %2277, %2278, %2276 : !llvm.i1, !llvm.i64 + %2280 = llvm.sdiv %2279, %68 : !llvm.i64 + %2281 = llvm.sub %64, %2280 : !llvm.i64 + %2282 = llvm.select %2277, %2281, %2280 : !llvm.i1, !llvm.i64 + %2283 = llvm.mul %2282, %60 : !llvm.i64 + %2284 = llvm.add %1721, %2283 : !llvm.i64 + %2285 = llvm.add %2284, %61 : !llvm.i64 + %2286 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2288 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2289 = llvm.mul %2285, %2288 : !llvm.i64 + %2290 = llvm.add %2287, %2289 : !llvm.i64 + %2291 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2292 = llvm.mul %1729, %2291 : !llvm.i64 + %2293 = llvm.add %2290, %2292 : !llvm.i64 + %2294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2295 = llvm.mul %1743, %2294 : !llvm.i64 + %2296 = llvm.add %2293, %2295 : !llvm.i64 + %2297 = llvm.getelementptr %2286[%2296] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2275, %2297 : !llvm.ptr> + %2298 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2300 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2301 = llvm.mul %67, %2300 : !llvm.i64 + %2302 = llvm.add %2299, %2301 : !llvm.i64 + %2303 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2304 = llvm.mul %66, %2303 : !llvm.i64 + %2305 = llvm.add %2302, %2304 : !llvm.i64 + %2306 = llvm.getelementptr %2298[%2305] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2307 = llvm.load %2306 : !llvm.ptr> + %2308 = llvm.add %155, %62 : !llvm.i64 + %2309 = llvm.icmp "slt" %2308, %67 : !llvm.i64 + %2310 = llvm.sub %64, %2308 : !llvm.i64 + %2311 = llvm.select %2309, %2310, %2308 : !llvm.i1, !llvm.i64 + %2312 = llvm.sdiv %2311, %68 : !llvm.i64 + %2313 = llvm.sub %64, %2312 : !llvm.i64 + %2314 = llvm.select %2309, %2313, %2312 : !llvm.i1, !llvm.i64 + %2315 = llvm.srem %2314, %68 : !llvm.i64 + %2316 = llvm.icmp "slt" %2315, %67 : !llvm.i64 + %2317 = llvm.add %2315, %68 : !llvm.i64 + %2318 = llvm.select %2316, %2317, %2315 : !llvm.i1, !llvm.i64 + %2319 = llvm.mul %2314, %65 : !llvm.i64 + %2320 = llvm.add %1779, %2319 : !llvm.i64 + %2321 = llvm.add %2320, %66 : !llvm.i64 + %2322 = llvm.icmp "slt" 
%2321, %67 : !llvm.i64 + %2323 = llvm.sub %64, %2321 : !llvm.i64 + %2324 = llvm.select %2322, %2323, %2321 : !llvm.i1, !llvm.i64 + %2325 = llvm.sdiv %2324, %63 : !llvm.i64 + %2326 = llvm.sub %64, %2325 : !llvm.i64 + %2327 = llvm.select %2322, %2326, %2325 : !llvm.i1, !llvm.i64 + %2328 = llvm.mul %2327, %65 : !llvm.i64 + %2329 = llvm.add %2320, %2328 : !llvm.i64 + %2330 = llvm.add %2329, %66 : !llvm.i64 + %2331 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2332 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2333 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2334 = llvm.mul %2318, %2333 : !llvm.i64 + %2335 = llvm.add %2332, %2334 : !llvm.i64 + %2336 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2337 = llvm.mul %1729, %2336 : !llvm.i64 + %2338 = llvm.add %2335, %2337 : !llvm.i64 + %2339 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2340 = llvm.mul %2330, %2339 : !llvm.i64 + %2341 = llvm.add %2338, %2340 : !llvm.i64 + %2342 = llvm.getelementptr %2331[%2341] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2307, %2342 : !llvm.ptr> + llvm.br ^bb9 + ^bb9: // 2 preds: ^bb7, ^bb8 + %2343 = llvm.add %155, %39 : !llvm.i64 + llvm.br ^bb5(%2343 : !llvm.i64) + ^bb10: // pred: ^bb5 + %2344 = llvm.add %153, %69 : !llvm.i64 + llvm.br ^bb3(%2344 : !llvm.i64) + ^bb11: // pred: ^bb3 + llvm.br ^bb12(%67 : !llvm.i64) + ^bb12(%2345: !llvm.i64): // 2 preds: ^bb11, ^bb51 + %2346 = llvm.icmp "slt" %2345, %37 : !llvm.i64 + llvm.cond_br %2346, ^bb13, ^bb52 + ^bb13: // pred: ^bb12 + llvm.br ^bb14(%67 : !llvm.i64) + ^bb14(%2347: !llvm.i64): // 2 preds: ^bb13, ^bb21 + %2348 = llvm.icmp "slt" %2347, %68 : !llvm.i64 + llvm.cond_br %2348, ^bb15, ^bb22 + ^bb15: // pred: ^bb14 + llvm.br ^bb16(%67 : !llvm.i64) + ^bb16(%2349: !llvm.i64): // 2 preds: ^bb15, ^bb20 + %2350 = llvm.icmp "slt" %2349, %56 : !llvm.i64 + llvm.cond_br %2350, ^bb17, ^bb21 + ^bb17: // pred: ^bb16 + llvm.br ^bb18(%67 : !llvm.i64) + ^bb18(%2351: !llvm.i64): // 2 preds: ^bb17, ^bb19 + %2352 = llvm.icmp "slt" %2351, %63 : !llvm.i64 + llvm.cond_br %2352, ^bb19, ^bb20 + ^bb19: // pred: ^bb18 + %2353 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2354 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2355 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2356 = llvm.mul %2347, %2355 : !llvm.i64 + %2357 = llvm.add %2354, %2356 : !llvm.i64 + %2358 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2359 = llvm.mul %2349, %2358 : !llvm.i64 + %2360 = llvm.add %2357, %2359 : !llvm.i64 + %2361 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2362 = llvm.mul %2351, %2361 : !llvm.i64 + %2363 = llvm.add %2360, %2362 : !llvm.i64 + %2364 = llvm.getelementptr %2353[%2363] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %32, %2364 : !llvm.ptr> + %2365 = llvm.add %2351, %69 : !llvm.i64 + llvm.br ^bb18(%2365 : !llvm.i64) + ^bb20: // pred: ^bb18 + %2366 = llvm.add %2349, %69 : !llvm.i64 + llvm.br ^bb16(%2366 : !llvm.i64) + ^bb21: // pred: ^bb16 + %2367 = llvm.add %2347, %69 : !llvm.i64 + llvm.br ^bb14(%2367 : !llvm.i64) + ^bb22: // pred: ^bb14 + llvm.br ^bb23(%67 : !llvm.i64) + ^bb23(%2368: !llvm.i64): // 2 preds: ^bb22, ^bb39 + %2369 = llvm.icmp "slt" %2368, %38 : !llvm.i64 + llvm.cond_br %2369, ^bb24, ^bb40 + ^bb24: // pred: ^bb23 + llvm.br ^bb25(%67 : !llvm.i64) + ^bb25(%2370: !llvm.i64): // 2 preds: ^bb24, ^bb38 + %2371 = llvm.icmp "slt" %2370, %39 : !llvm.i64 + llvm.cond_br %2371, ^bb26, ^bb39 + ^bb26: // pred: ^bb25 + llvm.br ^bb27(%67 : !llvm.i64) + ^bb27(%2372: 
!llvm.i64): // 2 preds: ^bb26, ^bb34 + %2373 = llvm.icmp "slt" %2372, %67 : !llvm.i64 + llvm.cond_br %2373, ^bb28, ^bb35 + ^bb28: // pred: ^bb27 + llvm.br ^bb29(%67 : !llvm.i64) + ^bb29(%2374: !llvm.i64): // 2 preds: ^bb28, ^bb33 + %2375 = llvm.icmp "slt" %2374, %48 : !llvm.i64 + llvm.cond_br %2375, ^bb30, ^bb34 + ^bb30: // pred: ^bb29 + llvm.br ^bb31(%67 : !llvm.i64) + ^bb31(%2376: !llvm.i64): // 2 preds: ^bb30, ^bb32 + %2377 = llvm.icmp "slt" %2376, %67 : !llvm.i64 + llvm.cond_br %2377, ^bb32, ^bb33 + ^bb32: // pred: ^bb31 + %2378 = llvm.add %2345, %2372 : !llvm.i64 + %2379 = llvm.add %2378, %2376 : !llvm.i64 + %2380 = llvm.add %2370, %2374 : !llvm.i64 + %2381 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2383 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2384 = llvm.mul %2379, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2387 = llvm.mul %2380, %2386 : !llvm.i64 + %2388 = llvm.add %2385, %2387 : !llvm.i64 + %2389 = llvm.getelementptr %2381[%2388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2390 = llvm.load %2389 : !llvm.ptr + %2391 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2393 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2394 = llvm.mul %2379, %2393 : !llvm.i64 + %2395 = llvm.add %2392, %2394 : !llvm.i64 + %2396 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2397 = llvm.mul %2380, %2396 : !llvm.i64 + %2398 = llvm.add %2395, %2397 : !llvm.i64 + %2399 = llvm.getelementptr %2391[%2398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2400 = llvm.load %2399 : !llvm.ptr + %2401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2404 = llvm.mul %2379, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2407 = llvm.mul %2380, %2406 : !llvm.i64 + %2408 = llvm.add %2405, %2407 : !llvm.i64 + %2409 = llvm.getelementptr %2401[%2408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2410 = llvm.load %2409 : !llvm.ptr + %2411 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2413 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2414 = llvm.mul %2379, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : !llvm.i64 + %2416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2417 = llvm.mul %2380, %2416 : !llvm.i64 + %2418 = llvm.add %2415, %2417 : !llvm.i64 + %2419 = llvm.getelementptr %2411[%2418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2420 = llvm.load %2419 : !llvm.ptr + %2421 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2423 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2424 = llvm.mul %2379, %2423 : !llvm.i64 + %2425 = llvm.add %2422, %2424 : !llvm.i64 + %2426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2427 = llvm.mul %2380, %2426 : !llvm.i64 + %2428 = llvm.add %2425, %2427 : !llvm.i64 + %2429 = llvm.getelementptr %2421[%2428] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2430 = llvm.load %2429 : !llvm.ptr + %2431 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2432 = llvm.mlir.constant(0 : index) : 
!llvm.i64 + %2433 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2434 = llvm.mul %2379, %2433 : !llvm.i64 + %2435 = llvm.add %2432, %2434 : !llvm.i64 + %2436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2437 = llvm.mul %2380, %2436 : !llvm.i64 + %2438 = llvm.add %2435, %2437 : !llvm.i64 + %2439 = llvm.getelementptr %2431[%2438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2440 = llvm.load %2439 : !llvm.ptr + %2441 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2443 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2444 = llvm.mul %2379, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2447 = llvm.mul %2380, %2446 : !llvm.i64 + %2448 = llvm.add %2445, %2447 : !llvm.i64 + %2449 = llvm.getelementptr %2441[%2448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2450 = llvm.load %2449 : !llvm.ptr + %2451 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2452 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2453 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2454 = llvm.mul %2379, %2453 : !llvm.i64 + %2455 = llvm.add %2452, %2454 : !llvm.i64 + %2456 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2457 = llvm.mul %2380, %2456 : !llvm.i64 + %2458 = llvm.add %2455, %2457 : !llvm.i64 + %2459 = llvm.getelementptr %2451[%2458] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2460 = llvm.load %2459 : !llvm.ptr + %2461 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %2462 = llvm.sub %64, %2368 : !llvm.i64 + %2463 = llvm.select %2461, %2462, %2368 : !llvm.i1, !llvm.i64 + %2464 = llvm.sdiv %2463, %68 : !llvm.i64 + %2465 = llvm.sub %64, %2464 : !llvm.i64 + %2466 = llvm.select %2461, %2465, %2464 : !llvm.i1, !llvm.i64 + %2467 = llvm.srem %2466, %68 : !llvm.i64 + %2468 = llvm.icmp "slt" %2467, %67 : !llvm.i64 + %2469 = llvm.add %2467, %68 : !llvm.i64 + %2470 = llvm.select %2468, %2469, %2467 : !llvm.i1, !llvm.i64 + %2471 = llvm.srem %2380, %39 : !llvm.i64 + %2472 = llvm.icmp "slt" %2471, %67 : !llvm.i64 + %2473 = llvm.add %2471, %39 : !llvm.i64 + %2474 = llvm.select %2472, %2473, %2471 : !llvm.i1, !llvm.i64 + %2475 = llvm.srem %2368, %68 : !llvm.i64 + %2476 = llvm.icmp "slt" %2475, %67 : !llvm.i64 + %2477 = llvm.add %2475, %68 : !llvm.i64 + %2478 = llvm.select %2476, %2477, %2475 : !llvm.i1, !llvm.i64 + %2479 = llvm.icmp "slt" %2478, %67 : !llvm.i64 + %2480 = llvm.sub %64, %2478 : !llvm.i64 + %2481 = llvm.select %2479, %2480, %2478 : !llvm.i1, !llvm.i64 + %2482 = llvm.sdiv %2481, %70 : !llvm.i64 + %2483 = llvm.sub %64, %2482 : !llvm.i64 + %2484 = llvm.select %2479, %2483, %2482 : !llvm.i1, !llvm.i64 + %2485 = llvm.srem %2484, %63 : !llvm.i64 + %2486 = llvm.icmp "slt" %2485, %67 : !llvm.i64 + %2487 = llvm.add %2485, %63 : !llvm.i64 + %2488 = llvm.select %2486, %2487, %2485 : !llvm.i1, !llvm.i64 + %2489 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2491 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2492 = llvm.mul %2470, %2491 : !llvm.i64 + %2493 = llvm.add %2490, %2492 : !llvm.i64 + %2494 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2495 = llvm.mul %2474, %2494 : !llvm.i64 + %2496 = llvm.add %2493, %2495 : !llvm.i64 + %2497 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2498 = llvm.mul %2488, %2497 : !llvm.i64 + %2499 = llvm.add %2496, %2498 : !llvm.i64 + %2500 = llvm.getelementptr %2489[%2499] : (!llvm.ptr>, !llvm.i64) 
-> !llvm.ptr> + %2501 = llvm.load %2500 : !llvm.ptr> + %2502 = llvm.extractelement %2501[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2503 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2505 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2506 = llvm.mul %2470, %2505 : !llvm.i64 + %2507 = llvm.add %2504, %2506 : !llvm.i64 + %2508 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2509 = llvm.mul %2474, %2508 : !llvm.i64 + %2510 = llvm.add %2507, %2509 : !llvm.i64 + %2511 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2512 = llvm.mul %2488, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.getelementptr %2503[%2513] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2515 = llvm.load %2514 : !llvm.ptr> + %2516 = llvm.extractelement %2515[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2517 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2519 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2520 = llvm.mul %2470, %2519 : !llvm.i64 + %2521 = llvm.add %2518, %2520 : !llvm.i64 + %2522 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2523 = llvm.mul %2474, %2522 : !llvm.i64 + %2524 = llvm.add %2521, %2523 : !llvm.i64 + %2525 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2526 = llvm.mul %2488, %2525 : !llvm.i64 + %2527 = llvm.add %2524, %2526 : !llvm.i64 + %2528 = llvm.getelementptr %2517[%2527] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2529 = llvm.load %2528 : !llvm.ptr> + %2530 = llvm.extractelement %2529[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2534 = llvm.mul %2470, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2537 = llvm.mul %2474, %2536 : !llvm.i64 + %2538 = llvm.add %2535, %2537 : !llvm.i64 + %2539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2540 = llvm.mul %2488, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.getelementptr %2531[%2541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2543 = llvm.load %2542 : !llvm.ptr> + %2544 = llvm.extractelement %2543[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2545 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2546 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2547 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2548 = llvm.mul %2470, %2547 : !llvm.i64 + %2549 = llvm.add %2546, %2548 : !llvm.i64 + %2550 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2551 = llvm.mul %2474, %2550 : !llvm.i64 + %2552 = llvm.add %2549, %2551 : !llvm.i64 + %2553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2554 = llvm.mul %2488, %2553 : !llvm.i64 + %2555 = llvm.add %2552, %2554 : !llvm.i64 + %2556 = llvm.getelementptr %2545[%2555] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2557 = llvm.load %2556 : !llvm.ptr> + %2558 = llvm.extractelement %2557[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2559 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2560 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2561 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2562 = llvm.mul %2470, %2561 : !llvm.i64 + %2563 = llvm.add %2560, %2562 : !llvm.i64 + %2564 = llvm.mlir.constant(2 : 
index) : !llvm.i64 + %2565 = llvm.mul %2474, %2564 : !llvm.i64 + %2566 = llvm.add %2563, %2565 : !llvm.i64 + %2567 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2568 = llvm.mul %2488, %2567 : !llvm.i64 + %2569 = llvm.add %2566, %2568 : !llvm.i64 + %2570 = llvm.getelementptr %2559[%2569] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2571 = llvm.load %2570 : !llvm.ptr> + %2572 = llvm.extractelement %2571[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2573 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2574 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2575 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2576 = llvm.mul %2470, %2575 : !llvm.i64 + %2577 = llvm.add %2574, %2576 : !llvm.i64 + %2578 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2579 = llvm.mul %2474, %2578 : !llvm.i64 + %2580 = llvm.add %2577, %2579 : !llvm.i64 + %2581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2582 = llvm.mul %2488, %2581 : !llvm.i64 + %2583 = llvm.add %2580, %2582 : !llvm.i64 + %2584 = llvm.getelementptr %2573[%2583] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2585 = llvm.load %2584 : !llvm.ptr> + %2586 = llvm.extractelement %2585[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2587 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2589 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2590 = llvm.mul %2470, %2589 : !llvm.i64 + %2591 = llvm.add %2588, %2590 : !llvm.i64 + %2592 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2593 = llvm.mul %2474, %2592 : !llvm.i64 + %2594 = llvm.add %2591, %2593 : !llvm.i64 + %2595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2596 = llvm.mul %2488, %2595 : !llvm.i64 + %2597 = llvm.add %2594, %2596 : !llvm.i64 + %2598 = llvm.getelementptr %2587[%2597] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2599 = llvm.load %2598 : !llvm.ptr> + %2600 = llvm.extractelement %2599[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2601 = llvm.fmul %2390, %2502 {RelaxedPrecision} : !llvm.float + %2602 = llvm.fmul %2400, %2516 {RelaxedPrecision} : !llvm.float + %2603 = llvm.fmul %2410, %2530 {RelaxedPrecision} : !llvm.float + %2604 = llvm.fmul %2420, %2544 {RelaxedPrecision} : !llvm.float + %2605 = llvm.fmul %2430, %2558 {RelaxedPrecision} : !llvm.float + %2606 = llvm.fmul %2440, %2572 {RelaxedPrecision} : !llvm.float + %2607 = llvm.fmul %2450, %2586 {RelaxedPrecision} : !llvm.float + %2608 = llvm.fmul %2460, %2600 {RelaxedPrecision} : !llvm.float + %2609 = llvm.add %2372, %2376 : !llvm.i64 + %2610 = llvm.srem %2609, %56 : !llvm.i64 + %2611 = llvm.icmp "slt" %2610, %67 : !llvm.i64 + %2612 = llvm.add %2610, %56 : !llvm.i64 + %2613 = llvm.select %2611, %2612, %2610 : !llvm.i1, !llvm.i64 + %2614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2617 = llvm.mul %2470, %2616 : !llvm.i64 + %2618 = llvm.add %2615, %2617 : !llvm.i64 + %2619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2620 = llvm.mul %2613, %2619 : !llvm.i64 + %2621 = llvm.add %2618, %2620 : !llvm.i64 + %2622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2623 = llvm.mul %2488, %2622 : !llvm.i64 + %2624 = llvm.add %2621, %2623 : !llvm.i64 + %2625 = llvm.getelementptr %2614[%2624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2626 = llvm.load %2625 : !llvm.ptr> + %2627 = llvm.extractelement %2626[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2628 = 
llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2630 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2631 = llvm.mul %2470, %2630 : !llvm.i64 + %2632 = llvm.add %2629, %2631 : !llvm.i64 + %2633 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2634 = llvm.mul %2613, %2633 : !llvm.i64 + %2635 = llvm.add %2632, %2634 : !llvm.i64 + %2636 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2637 = llvm.mul %2488, %2636 : !llvm.i64 + %2638 = llvm.add %2635, %2637 : !llvm.i64 + %2639 = llvm.getelementptr %2628[%2638] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2640 = llvm.load %2639 : !llvm.ptr> + %2641 = llvm.extractelement %2640[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2642 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2643 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2644 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2645 = llvm.mul %2470, %2644 : !llvm.i64 + %2646 = llvm.add %2643, %2645 : !llvm.i64 + %2647 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2648 = llvm.mul %2613, %2647 : !llvm.i64 + %2649 = llvm.add %2646, %2648 : !llvm.i64 + %2650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2651 = llvm.mul %2488, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.getelementptr %2642[%2652] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2654 = llvm.load %2653 : !llvm.ptr> + %2655 = llvm.extractelement %2654[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2656 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2659 = llvm.mul %2470, %2658 : !llvm.i64 + %2660 = llvm.add %2657, %2659 : !llvm.i64 + %2661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2662 = llvm.mul %2613, %2661 : !llvm.i64 + %2663 = llvm.add %2660, %2662 : !llvm.i64 + %2664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2665 = llvm.mul %2488, %2664 : !llvm.i64 + %2666 = llvm.add %2663, %2665 : !llvm.i64 + %2667 = llvm.getelementptr %2656[%2666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2668 = llvm.load %2667 : !llvm.ptr> + %2669 = llvm.extractelement %2668[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2673 = llvm.mul %2470, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2676 = llvm.mul %2613, %2675 : !llvm.i64 + %2677 = llvm.add %2674, %2676 : !llvm.i64 + %2678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2679 = llvm.mul %2488, %2678 : !llvm.i64 + %2680 = llvm.add %2677, %2679 : !llvm.i64 + %2681 = llvm.getelementptr %2670[%2680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2682 = llvm.load %2681 : !llvm.ptr> + %2683 = llvm.extractelement %2682[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2684 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2685 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2686 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2687 = llvm.mul %2470, %2686 : !llvm.i64 + %2688 = llvm.add %2685, %2687 : !llvm.i64 + %2689 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2690 = llvm.mul %2613, %2689 : !llvm.i64 + %2691 = llvm.add %2688, %2690 : !llvm.i64 + %2692 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %2693 = llvm.mul %2488, %2692 : !llvm.i64 + %2694 = llvm.add %2691, %2693 : !llvm.i64 + %2695 = llvm.getelementptr %2684[%2694] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2696 = llvm.load %2695 : !llvm.ptr> + %2697 = llvm.extractelement %2696[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2698 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2701 = llvm.mul %2470, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2704 = llvm.mul %2613, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2707 = llvm.mul %2488, %2706 : !llvm.i64 + %2708 = llvm.add %2705, %2707 : !llvm.i64 + %2709 = llvm.getelementptr %2698[%2708] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2710 = llvm.load %2709 : !llvm.ptr> + %2711 = llvm.extractelement %2710[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2712 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2714 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2715 = llvm.mul %2470, %2714 : !llvm.i64 + %2716 = llvm.add %2713, %2715 : !llvm.i64 + %2717 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2718 = llvm.mul %2613, %2717 : !llvm.i64 + %2719 = llvm.add %2716, %2718 : !llvm.i64 + %2720 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2721 = llvm.mul %2488, %2720 : !llvm.i64 + %2722 = llvm.add %2719, %2721 : !llvm.i64 + %2723 = llvm.getelementptr %2712[%2722] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2724 = llvm.load %2723 : !llvm.ptr> + %2725 = llvm.extractelement %2724[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2726 = llvm.fadd %2627, %2601 {RelaxedPrecision} : !llvm.float + %2727 = llvm.fadd %2641, %2602 {RelaxedPrecision} : !llvm.float + %2728 = llvm.fadd %2655, %2603 {RelaxedPrecision} : !llvm.float + %2729 = llvm.fadd %2669, %2604 {RelaxedPrecision} : !llvm.float + %2730 = llvm.fadd %2683, %2605 {RelaxedPrecision} : !llvm.float + %2731 = llvm.fadd %2697, %2606 {RelaxedPrecision} : !llvm.float + %2732 = llvm.fadd %2711, %2607 {RelaxedPrecision} : !llvm.float + %2733 = llvm.fadd %2725, %2608 {RelaxedPrecision} : !llvm.float + %2734 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2735 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2736 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2737 = llvm.mul %2470, %2736 : !llvm.i64 + %2738 = llvm.add %2735, %2737 : !llvm.i64 + %2739 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2740 = llvm.mul %2613, %2739 : !llvm.i64 + %2741 = llvm.add %2738, %2740 : !llvm.i64 + %2742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2743 = llvm.mul %2488, %2742 : !llvm.i64 + %2744 = llvm.add %2741, %2743 : !llvm.i64 + %2745 = llvm.getelementptr %2734[%2744] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2746 = llvm.load %2745 : !llvm.ptr> + %2747 = llvm.insertelement %2726, %2746[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2748 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2750 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2751 = llvm.mul %2470, %2750 : !llvm.i64 + %2752 = llvm.add %2749, %2751 : !llvm.i64 + %2753 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2754 = llvm.mul %2613, %2753 : !llvm.i64 + %2755 = llvm.add 
%2752, %2754 : !llvm.i64 + %2756 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2757 = llvm.mul %2488, %2756 : !llvm.i64 + %2758 = llvm.add %2755, %2757 : !llvm.i64 + %2759 = llvm.getelementptr %2748[%2758] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2747, %2759 : !llvm.ptr> + %2760 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2761 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2762 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2763 = llvm.mul %2470, %2762 : !llvm.i64 + %2764 = llvm.add %2761, %2763 : !llvm.i64 + %2765 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2766 = llvm.mul %2613, %2765 : !llvm.i64 + %2767 = llvm.add %2764, %2766 : !llvm.i64 + %2768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2769 = llvm.mul %2488, %2768 : !llvm.i64 + %2770 = llvm.add %2767, %2769 : !llvm.i64 + %2771 = llvm.getelementptr %2760[%2770] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2772 = llvm.load %2771 : !llvm.ptr> + %2773 = llvm.insertelement %2727, %2772[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2774 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2775 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2776 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2777 = llvm.mul %2470, %2776 : !llvm.i64 + %2778 = llvm.add %2775, %2777 : !llvm.i64 + %2779 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2780 = llvm.mul %2613, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = llvm.mul %2488, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2774[%2784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2773, %2785 : !llvm.ptr> + %2786 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2787 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2788 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2789 = llvm.mul %2470, %2788 : !llvm.i64 + %2790 = llvm.add %2787, %2789 : !llvm.i64 + %2791 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2792 = llvm.mul %2613, %2791 : !llvm.i64 + %2793 = llvm.add %2790, %2792 : !llvm.i64 + %2794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2795 = llvm.mul %2488, %2794 : !llvm.i64 + %2796 = llvm.add %2793, %2795 : !llvm.i64 + %2797 = llvm.getelementptr %2786[%2796] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2798 = llvm.load %2797 : !llvm.ptr> + %2799 = llvm.insertelement %2728, %2798[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2800 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2802 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2803 = llvm.mul %2470, %2802 : !llvm.i64 + %2804 = llvm.add %2801, %2803 : !llvm.i64 + %2805 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2806 = llvm.mul %2613, %2805 : !llvm.i64 + %2807 = llvm.add %2804, %2806 : !llvm.i64 + %2808 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2809 = llvm.mul %2488, %2808 : !llvm.i64 + %2810 = llvm.add %2807, %2809 : !llvm.i64 + %2811 = llvm.getelementptr %2800[%2810] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2799, %2811 : !llvm.ptr> + %2812 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2814 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2815 = llvm.mul %2470, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %2818 = llvm.mul %2613, %2817 : !llvm.i64 + %2819 = llvm.add %2816, %2818 : !llvm.i64 + %2820 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2821 = llvm.mul %2488, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.getelementptr %2812[%2822] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2824 = llvm.load %2823 : !llvm.ptr> + %2825 = llvm.insertelement %2729, %2824[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2826 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2828 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2829 = llvm.mul %2470, %2828 : !llvm.i64 + %2830 = llvm.add %2827, %2829 : !llvm.i64 + %2831 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2832 = llvm.mul %2613, %2831 : !llvm.i64 + %2833 = llvm.add %2830, %2832 : !llvm.i64 + %2834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2835 = llvm.mul %2488, %2834 : !llvm.i64 + %2836 = llvm.add %2833, %2835 : !llvm.i64 + %2837 = llvm.getelementptr %2826[%2836] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2825, %2837 : !llvm.ptr> + %2838 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2840 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2841 = llvm.mul %2470, %2840 : !llvm.i64 + %2842 = llvm.add %2839, %2841 : !llvm.i64 + %2843 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2844 = llvm.mul %2613, %2843 : !llvm.i64 + %2845 = llvm.add %2842, %2844 : !llvm.i64 + %2846 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2847 = llvm.mul %2488, %2846 : !llvm.i64 + %2848 = llvm.add %2845, %2847 : !llvm.i64 + %2849 = llvm.getelementptr %2838[%2848] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2850 = llvm.load %2849 : !llvm.ptr> + %2851 = llvm.insertelement %2730, %2850[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2852 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2854 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2855 = llvm.mul %2470, %2854 : !llvm.i64 + %2856 = llvm.add %2853, %2855 : !llvm.i64 + %2857 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2858 = llvm.mul %2613, %2857 : !llvm.i64 + %2859 = llvm.add %2856, %2858 : !llvm.i64 + %2860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2861 = llvm.mul %2488, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.getelementptr %2852[%2862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2851, %2863 : !llvm.ptr> + %2864 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2865 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2866 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2867 = llvm.mul %2470, %2866 : !llvm.i64 + %2868 = llvm.add %2865, %2867 : !llvm.i64 + %2869 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2870 = llvm.mul %2613, %2869 : !llvm.i64 + %2871 = llvm.add %2868, %2870 : !llvm.i64 + %2872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2873 = llvm.mul %2488, %2872 : !llvm.i64 + %2874 = llvm.add %2871, %2873 : !llvm.i64 + %2875 = llvm.getelementptr %2864[%2874] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2876 = llvm.load %2875 : !llvm.ptr> + %2877 = llvm.insertelement %2731, %2876[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2878 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2879 = llvm.mlir.constant(0 
: index) : !llvm.i64 + %2880 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2881 = llvm.mul %2470, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2884 = llvm.mul %2613, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2887 = llvm.mul %2488, %2886 : !llvm.i64 + %2888 = llvm.add %2885, %2887 : !llvm.i64 + %2889 = llvm.getelementptr %2878[%2888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2877, %2889 : !llvm.ptr> + %2890 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2892 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2893 = llvm.mul %2470, %2892 : !llvm.i64 + %2894 = llvm.add %2891, %2893 : !llvm.i64 + %2895 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2896 = llvm.mul %2613, %2895 : !llvm.i64 + %2897 = llvm.add %2894, %2896 : !llvm.i64 + %2898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2899 = llvm.mul %2488, %2898 : !llvm.i64 + %2900 = llvm.add %2897, %2899 : !llvm.i64 + %2901 = llvm.getelementptr %2890[%2900] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2902 = llvm.load %2901 : !llvm.ptr> + %2903 = llvm.insertelement %2732, %2902[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2904 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2905 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2906 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2907 = llvm.mul %2470, %2906 : !llvm.i64 + %2908 = llvm.add %2905, %2907 : !llvm.i64 + %2909 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2910 = llvm.mul %2613, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : !llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %2488, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2904[%2914] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2903, %2915 : !llvm.ptr> + %2916 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2917 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2918 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2919 = llvm.mul %2470, %2918 : !llvm.i64 + %2920 = llvm.add %2917, %2919 : !llvm.i64 + %2921 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2922 = llvm.mul %2613, %2921 : !llvm.i64 + %2923 = llvm.add %2920, %2922 : !llvm.i64 + %2924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2925 = llvm.mul %2488, %2924 : !llvm.i64 + %2926 = llvm.add %2923, %2925 : !llvm.i64 + %2927 = llvm.getelementptr %2916[%2926] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2928 = llvm.load %2927 : !llvm.ptr> + %2929 = llvm.insertelement %2733, %2928[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2930 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2931 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2932 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2933 = llvm.mul %2470, %2932 : !llvm.i64 + %2934 = llvm.add %2931, %2933 : !llvm.i64 + %2935 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2936 = llvm.mul %2613, %2935 : !llvm.i64 + %2937 = llvm.add %2934, %2936 : !llvm.i64 + %2938 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2939 = llvm.mul %2488, %2938 : !llvm.i64 + %2940 = llvm.add %2937, %2939 : !llvm.i64 + %2941 = llvm.getelementptr %2930[%2940] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2929, %2941 : !llvm.ptr> + %2942 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2943 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2944 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2945 = llvm.mul %2470, %2944 : !llvm.i64 + %2946 = llvm.add %2943, %2945 : !llvm.i64 + %2947 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2948 = llvm.mul %2613, %2947 : !llvm.i64 + %2949 = llvm.add %2946, %2948 : !llvm.i64 + %2950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2951 = llvm.mul %2488, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.getelementptr %2942[%2952] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2954 = llvm.load %2953 : !llvm.ptr> + %2955 = llvm.insertelement %2726, %2954[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2956 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2957 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2958 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2959 = llvm.mul %2470, %2958 : !llvm.i64 + %2960 = llvm.add %2957, %2959 : !llvm.i64 + %2961 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2962 = llvm.mul %2613, %2961 : !llvm.i64 + %2963 = llvm.add %2960, %2962 : !llvm.i64 + %2964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2965 = llvm.mul %2488, %2964 : !llvm.i64 + %2966 = llvm.add %2963, %2965 : !llvm.i64 + %2967 = llvm.getelementptr %2956[%2966] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2955, %2967 : !llvm.ptr> + %2968 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2970 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2971 = llvm.mul %2470, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2974 = llvm.mul %2613, %2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2977 = llvm.mul %2488, %2976 : !llvm.i64 + %2978 = llvm.add %2975, %2977 : !llvm.i64 + %2979 = llvm.getelementptr %2968[%2978] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2980 = llvm.load %2979 : !llvm.ptr> + %2981 = llvm.insertelement %2727, %2980[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2982 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2984 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2985 = llvm.mul %2470, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2988 = llvm.mul %2613, %2987 : !llvm.i64 + %2989 = llvm.add %2986, %2988 : !llvm.i64 + %2990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2991 = llvm.mul %2488, %2990 : !llvm.i64 + %2992 = llvm.add %2989, %2991 : !llvm.i64 + %2993 = llvm.getelementptr %2982[%2992] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2981, %2993 : !llvm.ptr> + %2994 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2995 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2996 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2997 = llvm.mul %2470, %2996 : !llvm.i64 + %2998 = llvm.add %2995, %2997 : !llvm.i64 + %2999 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3000 = llvm.mul %2613, %2999 : !llvm.i64 + %3001 = llvm.add %2998, %3000 : !llvm.i64 + %3002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3003 = llvm.mul %2488, %3002 : !llvm.i64 + %3004 = llvm.add %3001, %3003 : !llvm.i64 + %3005 = llvm.getelementptr %2994[%3004] : (!llvm.ptr>, !llvm.i64) 
-> !llvm.ptr> + %3006 = llvm.load %3005 : !llvm.ptr> + %3007 = llvm.insertelement %2728, %3006[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3008 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3010 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3011 = llvm.mul %2470, %3010 : !llvm.i64 + %3012 = llvm.add %3009, %3011 : !llvm.i64 + %3013 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3014 = llvm.mul %2613, %3013 : !llvm.i64 + %3015 = llvm.add %3012, %3014 : !llvm.i64 + %3016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3017 = llvm.mul %2488, %3016 : !llvm.i64 + %3018 = llvm.add %3015, %3017 : !llvm.i64 + %3019 = llvm.getelementptr %3008[%3018] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3007, %3019 : !llvm.ptr> + %3020 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3022 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3023 = llvm.mul %2470, %3022 : !llvm.i64 + %3024 = llvm.add %3021, %3023 : !llvm.i64 + %3025 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3026 = llvm.mul %2613, %3025 : !llvm.i64 + %3027 = llvm.add %3024, %3026 : !llvm.i64 + %3028 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3029 = llvm.mul %2488, %3028 : !llvm.i64 + %3030 = llvm.add %3027, %3029 : !llvm.i64 + %3031 = llvm.getelementptr %3020[%3030] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3032 = llvm.load %3031 : !llvm.ptr> + %3033 = llvm.insertelement %2729, %3032[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3037 = llvm.mul %2470, %3036 : !llvm.i64 + %3038 = llvm.add %3035, %3037 : !llvm.i64 + %3039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3040 = llvm.mul %2613, %3039 : !llvm.i64 + %3041 = llvm.add %3038, %3040 : !llvm.i64 + %3042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3043 = llvm.mul %2488, %3042 : !llvm.i64 + %3044 = llvm.add %3041, %3043 : !llvm.i64 + %3045 = llvm.getelementptr %3034[%3044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3033, %3045 : !llvm.ptr> + %3046 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3048 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3049 = llvm.mul %2470, %3048 : !llvm.i64 + %3050 = llvm.add %3047, %3049 : !llvm.i64 + %3051 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3052 = llvm.mul %2613, %3051 : !llvm.i64 + %3053 = llvm.add %3050, %3052 : !llvm.i64 + %3054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3055 = llvm.mul %2488, %3054 : !llvm.i64 + %3056 = llvm.add %3053, %3055 : !llvm.i64 + %3057 = llvm.getelementptr %3046[%3056] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3058 = llvm.load %3057 : !llvm.ptr> + %3059 = llvm.insertelement %2730, %3058[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3060 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3061 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3062 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3063 = llvm.mul %2470, %3062 : !llvm.i64 + %3064 = llvm.add %3061, %3063 : !llvm.i64 + %3065 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3066 = llvm.mul %2613, %3065 : !llvm.i64 + %3067 = llvm.add %3064, %3066 : !llvm.i64 + %3068 = llvm.mlir.constant(1 : 
index) : !llvm.i64 + %3069 = llvm.mul %2488, %3068 : !llvm.i64 + %3070 = llvm.add %3067, %3069 : !llvm.i64 + %3071 = llvm.getelementptr %3060[%3070] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3059, %3071 : !llvm.ptr> + %3072 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3073 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3074 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3075 = llvm.mul %2470, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3078 = llvm.mul %2613, %3077 : !llvm.i64 + %3079 = llvm.add %3076, %3078 : !llvm.i64 + %3080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3081 = llvm.mul %2488, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.getelementptr %3072[%3082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3084 = llvm.load %3083 : !llvm.ptr> + %3085 = llvm.insertelement %2731, %3084[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3086 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3088 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3089 = llvm.mul %2470, %3088 : !llvm.i64 + %3090 = llvm.add %3087, %3089 : !llvm.i64 + %3091 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3092 = llvm.mul %2613, %3091 : !llvm.i64 + %3093 = llvm.add %3090, %3092 : !llvm.i64 + %3094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3095 = llvm.mul %2488, %3094 : !llvm.i64 + %3096 = llvm.add %3093, %3095 : !llvm.i64 + %3097 = llvm.getelementptr %3086[%3096] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3085, %3097 : !llvm.ptr> + %3098 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3100 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3101 = llvm.mul %2470, %3100 : !llvm.i64 + %3102 = llvm.add %3099, %3101 : !llvm.i64 + %3103 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3104 = llvm.mul %2613, %3103 : !llvm.i64 + %3105 = llvm.add %3102, %3104 : !llvm.i64 + %3106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3107 = llvm.mul %2488, %3106 : !llvm.i64 + %3108 = llvm.add %3105, %3107 : !llvm.i64 + %3109 = llvm.getelementptr %3098[%3108] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3110 = llvm.load %3109 : !llvm.ptr> + %3111 = llvm.insertelement %2732, %3110[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3112 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3114 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3115 = llvm.mul %2470, %3114 : !llvm.i64 + %3116 = llvm.add %3113, %3115 : !llvm.i64 + %3117 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3118 = llvm.mul %2613, %3117 : !llvm.i64 + %3119 = llvm.add %3116, %3118 : !llvm.i64 + %3120 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3121 = llvm.mul %2488, %3120 : !llvm.i64 + %3122 = llvm.add %3119, %3121 : !llvm.i64 + %3123 = llvm.getelementptr %3112[%3122] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3111, %3123 : !llvm.ptr> + %3124 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3126 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3127 = llvm.mul %2470, %3126 : !llvm.i64 + %3128 = llvm.add %3125, %3127 : !llvm.i64 + %3129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3130 = llvm.mul %2613, %3129 
: !llvm.i64 + %3131 = llvm.add %3128, %3130 : !llvm.i64 + %3132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3133 = llvm.mul %2488, %3132 : !llvm.i64 + %3134 = llvm.add %3131, %3133 : !llvm.i64 + %3135 = llvm.getelementptr %3124[%3134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3136 = llvm.load %3135 : !llvm.ptr> + %3137 = llvm.insertelement %2733, %3136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3138 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3141 = llvm.mul %2470, %3140 : !llvm.i64 + %3142 = llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3144 = llvm.mul %2613, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3147 = llvm.mul %2488, %3146 : !llvm.i64 + %3148 = llvm.add %3145, %3147 : !llvm.i64 + %3149 = llvm.getelementptr %3138[%3148] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3137, %3149 : !llvm.ptr> + %3150 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3152 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3153 = llvm.mul %2379, %3152 : !llvm.i64 + %3154 = llvm.add %3151, %3153 : !llvm.i64 + %3155 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3156 = llvm.mul %2380, %3155 : !llvm.i64 + %3157 = llvm.add %3154, %3156 : !llvm.i64 + %3158 = llvm.getelementptr %3150[%3157] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3159 = llvm.load %3158 : !llvm.ptr + %3160 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3162 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3163 = llvm.mul %2379, %3162 : !llvm.i64 + %3164 = llvm.add %3161, %3163 : !llvm.i64 + %3165 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3166 = llvm.mul %2380, %3165 : !llvm.i64 + %3167 = llvm.add %3164, %3166 : !llvm.i64 + %3168 = llvm.getelementptr %3160[%3167] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3169 = llvm.load %3168 : !llvm.ptr + %3170 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3173 = llvm.mul %2379, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %2380, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3179 = llvm.load %3178 : !llvm.ptr + %3180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3183 = llvm.mul %2379, %3182 : !llvm.i64 + %3184 = llvm.add %3181, %3183 : !llvm.i64 + %3185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3186 = llvm.mul %2380, %3185 : !llvm.i64 + %3187 = llvm.add %3184, %3186 : !llvm.i64 + %3188 = llvm.getelementptr %3180[%3187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3189 = llvm.load %3188 : !llvm.ptr + %3190 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3192 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3193 = llvm.mul %2379, %3192 : !llvm.i64 + %3194 = 
llvm.add %3191, %3193 : !llvm.i64 + %3195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3196 = llvm.mul %2380, %3195 : !llvm.i64 + %3197 = llvm.add %3194, %3196 : !llvm.i64 + %3198 = llvm.getelementptr %3190[%3197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3199 = llvm.load %3198 : !llvm.ptr + %3200 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3202 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3203 = llvm.mul %2379, %3202 : !llvm.i64 + %3204 = llvm.add %3201, %3203 : !llvm.i64 + %3205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3206 = llvm.mul %2380, %3205 : !llvm.i64 + %3207 = llvm.add %3204, %3206 : !llvm.i64 + %3208 = llvm.getelementptr %3200[%3207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3209 = llvm.load %3208 : !llvm.ptr + %3210 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3212 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3213 = llvm.mul %2379, %3212 : !llvm.i64 + %3214 = llvm.add %3211, %3213 : !llvm.i64 + %3215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3216 = llvm.mul %2380, %3215 : !llvm.i64 + %3217 = llvm.add %3214, %3216 : !llvm.i64 + %3218 = llvm.getelementptr %3210[%3217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3219 = llvm.load %3218 : !llvm.ptr + %3220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3222 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3223 = llvm.mul %2379, %3222 : !llvm.i64 + %3224 = llvm.add %3221, %3223 : !llvm.i64 + %3225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3226 = llvm.mul %2380, %3225 : !llvm.i64 + %3227 = llvm.add %3224, %3226 : !llvm.i64 + %3228 = llvm.getelementptr %3220[%3227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3229 = llvm.load %3228 : !llvm.ptr + %3230 = llvm.add %2368, %70 : !llvm.i64 + %3231 = llvm.icmp "slt" %3230, %67 : !llvm.i64 + %3232 = llvm.sub %64, %3230 : !llvm.i64 + %3233 = llvm.select %3231, %3232, %3230 : !llvm.i1, !llvm.i64 + %3234 = llvm.sdiv %3233, %68 : !llvm.i64 + %3235 = llvm.sub %64, %3234 : !llvm.i64 + %3236 = llvm.select %3231, %3235, %3234 : !llvm.i1, !llvm.i64 + %3237 = llvm.srem %3236, %68 : !llvm.i64 + %3238 = llvm.icmp "slt" %3237, %67 : !llvm.i64 + %3239 = llvm.add %3237, %68 : !llvm.i64 + %3240 = llvm.select %3238, %3239, %3237 : !llvm.i1, !llvm.i64 + %3241 = llvm.sdiv %2463, %70 : !llvm.i64 + %3242 = llvm.sub %64, %3241 : !llvm.i64 + %3243 = llvm.select %2461, %3242, %3241 : !llvm.i1, !llvm.i64 + %3244 = llvm.mul %3236, %65 : !llvm.i64 + %3245 = llvm.add %3243, %3244 : !llvm.i64 + %3246 = llvm.add %3245, %69 : !llvm.i64 + %3247 = llvm.icmp "slt" %3246, %67 : !llvm.i64 + %3248 = llvm.sub %64, %3246 : !llvm.i64 + %3249 = llvm.select %3247, %3248, %3246 : !llvm.i1, !llvm.i64 + %3250 = llvm.sdiv %3249, %63 : !llvm.i64 + %3251 = llvm.sub %64, %3250 : !llvm.i64 + %3252 = llvm.select %3247, %3251, %3250 : !llvm.i1, !llvm.i64 + %3253 = llvm.mul %3252, %65 : !llvm.i64 + %3254 = llvm.add %3245, %3253 : !llvm.i64 + %3255 = llvm.add %3254, %69 : !llvm.i64 + %3256 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3257 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3258 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3259 = llvm.mul %3240, %3258 : !llvm.i64 + %3260 = llvm.add %3257, %3259 : !llvm.i64 + %3261 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3262 = 
llvm.mul %2474, %3261 : !llvm.i64 + %3263 = llvm.add %3260, %3262 : !llvm.i64 + %3264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3265 = llvm.mul %3255, %3264 : !llvm.i64 + %3266 = llvm.add %3263, %3265 : !llvm.i64 + %3267 = llvm.getelementptr %3256[%3266] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3268 = llvm.load %3267 : !llvm.ptr> + %3269 = llvm.extractelement %3268[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3270 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3272 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3273 = llvm.mul %3240, %3272 : !llvm.i64 + %3274 = llvm.add %3271, %3273 : !llvm.i64 + %3275 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3276 = llvm.mul %2474, %3275 : !llvm.i64 + %3277 = llvm.add %3274, %3276 : !llvm.i64 + %3278 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3279 = llvm.mul %3255, %3278 : !llvm.i64 + %3280 = llvm.add %3277, %3279 : !llvm.i64 + %3281 = llvm.getelementptr %3270[%3280] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3282 = llvm.load %3281 : !llvm.ptr> + %3283 = llvm.extractelement %3282[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3284 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3286 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3287 = llvm.mul %3240, %3286 : !llvm.i64 + %3288 = llvm.add %3285, %3287 : !llvm.i64 + %3289 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3290 = llvm.mul %2474, %3289 : !llvm.i64 + %3291 = llvm.add %3288, %3290 : !llvm.i64 + %3292 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3293 = llvm.mul %3255, %3292 : !llvm.i64 + %3294 = llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.getelementptr %3284[%3294] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3296 = llvm.load %3295 : !llvm.ptr> + %3297 = llvm.extractelement %3296[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3298 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3300 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3301 = llvm.mul %3240, %3300 : !llvm.i64 + %3302 = llvm.add %3299, %3301 : !llvm.i64 + %3303 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3304 = llvm.mul %2474, %3303 : !llvm.i64 + %3305 = llvm.add %3302, %3304 : !llvm.i64 + %3306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3307 = llvm.mul %3255, %3306 : !llvm.i64 + %3308 = llvm.add %3305, %3307 : !llvm.i64 + %3309 = llvm.getelementptr %3298[%3308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3310 = llvm.load %3309 : !llvm.ptr> + %3311 = llvm.extractelement %3310[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3312 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3313 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3314 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3315 = llvm.mul %3240, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3318 = llvm.mul %2474, %3317 : !llvm.i64 + %3319 = llvm.add %3316, %3318 : !llvm.i64 + %3320 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3321 = llvm.mul %3255, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.getelementptr %3312[%3322] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3324 = llvm.load %3323 : !llvm.ptr> + %3325 = llvm.extractelement %3324[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3326 = llvm.extractvalue %150[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3328 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3329 = llvm.mul %3240, %3328 : !llvm.i64 + %3330 = llvm.add %3327, %3329 : !llvm.i64 + %3331 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3332 = llvm.mul %2474, %3331 : !llvm.i64 + %3333 = llvm.add %3330, %3332 : !llvm.i64 + %3334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3335 = llvm.mul %3255, %3334 : !llvm.i64 + %3336 = llvm.add %3333, %3335 : !llvm.i64 + %3337 = llvm.getelementptr %3326[%3336] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3338 = llvm.load %3337 : !llvm.ptr> + %3339 = llvm.extractelement %3338[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3340 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3342 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3343 = llvm.mul %3240, %3342 : !llvm.i64 + %3344 = llvm.add %3341, %3343 : !llvm.i64 + %3345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3346 = llvm.mul %2474, %3345 : !llvm.i64 + %3347 = llvm.add %3344, %3346 : !llvm.i64 + %3348 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3349 = llvm.mul %3255, %3348 : !llvm.i64 + %3350 = llvm.add %3347, %3349 : !llvm.i64 + %3351 = llvm.getelementptr %3340[%3350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3352 = llvm.load %3351 : !llvm.ptr> + %3353 = llvm.extractelement %3352[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3354 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3356 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3357 = llvm.mul %3240, %3356 : !llvm.i64 + %3358 = llvm.add %3355, %3357 : !llvm.i64 + %3359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3360 = llvm.mul %2474, %3359 : !llvm.i64 + %3361 = llvm.add %3358, %3360 : !llvm.i64 + %3362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3363 = llvm.mul %3255, %3362 : !llvm.i64 + %3364 = llvm.add %3361, %3363 : !llvm.i64 + %3365 = llvm.getelementptr %3354[%3364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3366 = llvm.load %3365 : !llvm.ptr> + %3367 = llvm.extractelement %3366[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3368 = llvm.fmul %3159, %3269 {RelaxedPrecision} : !llvm.float + %3369 = llvm.fmul %3169, %3283 {RelaxedPrecision} : !llvm.float + %3370 = llvm.fmul %3179, %3297 {RelaxedPrecision} : !llvm.float + %3371 = llvm.fmul %3189, %3311 {RelaxedPrecision} : !llvm.float + %3372 = llvm.fmul %3199, %3325 {RelaxedPrecision} : !llvm.float + %3373 = llvm.fmul %3209, %3339 {RelaxedPrecision} : !llvm.float + %3374 = llvm.fmul %3219, %3353 {RelaxedPrecision} : !llvm.float + %3375 = llvm.fmul %3229, %3367 {RelaxedPrecision} : !llvm.float + %3376 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3377 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3378 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3379 = llvm.mul %3240, %3378 : !llvm.i64 + %3380 = llvm.add %3377, %3379 : !llvm.i64 + %3381 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3382 = llvm.mul %2613, %3381 : !llvm.i64 + %3383 = llvm.add %3380, %3382 : !llvm.i64 + %3384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3385 = llvm.mul %3255, %3384 : !llvm.i64 + %3386 = llvm.add %3383, %3385 : !llvm.i64 + %3387 = llvm.getelementptr %3376[%3386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3388 = llvm.load %3387 : !llvm.ptr> + %3389 = llvm.extractelement %3388[%24 : 
!llvm.i64] : !llvm.vec<8 x float> + %3390 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3392 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3393 = llvm.mul %3240, %3392 : !llvm.i64 + %3394 = llvm.add %3391, %3393 : !llvm.i64 + %3395 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3396 = llvm.mul %2613, %3395 : !llvm.i64 + %3397 = llvm.add %3394, %3396 : !llvm.i64 + %3398 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3399 = llvm.mul %3255, %3398 : !llvm.i64 + %3400 = llvm.add %3397, %3399 : !llvm.i64 + %3401 = llvm.getelementptr %3390[%3400] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3402 = llvm.load %3401 : !llvm.ptr> + %3403 = llvm.extractelement %3402[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3404 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3405 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3406 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3407 = llvm.mul %3240, %3406 : !llvm.i64 + %3408 = llvm.add %3405, %3407 : !llvm.i64 + %3409 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3410 = llvm.mul %2613, %3409 : !llvm.i64 + %3411 = llvm.add %3408, %3410 : !llvm.i64 + %3412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3413 = llvm.mul %3255, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.getelementptr %3404[%3414] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3416 = llvm.load %3415 : !llvm.ptr> + %3417 = llvm.extractelement %3416[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3421 = llvm.mul %3240, %3420 : !llvm.i64 + %3422 = llvm.add %3419, %3421 : !llvm.i64 + %3423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3424 = llvm.mul %2613, %3423 : !llvm.i64 + %3425 = llvm.add %3422, %3424 : !llvm.i64 + %3426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3427 = llvm.mul %3255, %3426 : !llvm.i64 + %3428 = llvm.add %3425, %3427 : !llvm.i64 + %3429 = llvm.getelementptr %3418[%3428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3430 = llvm.load %3429 : !llvm.ptr> + %3431 = llvm.extractelement %3430[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3435 = llvm.mul %3240, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3438 = llvm.mul %2613, %3437 : !llvm.i64 + %3439 = llvm.add %3436, %3438 : !llvm.i64 + %3440 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3441 = llvm.mul %3255, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.getelementptr %3432[%3442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3444 = llvm.load %3443 : !llvm.ptr> + %3445 = llvm.extractelement %3444[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3446 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3447 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3448 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3449 = llvm.mul %3240, %3448 : !llvm.i64 + %3450 = llvm.add %3447, %3449 : !llvm.i64 + %3451 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3452 = llvm.mul %2613, %3451 : !llvm.i64 + %3453 = llvm.add %3450, %3452 : 
!llvm.i64 + %3454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3455 = llvm.mul %3255, %3454 : !llvm.i64 + %3456 = llvm.add %3453, %3455 : !llvm.i64 + %3457 = llvm.getelementptr %3446[%3456] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3458 = llvm.load %3457 : !llvm.ptr> + %3459 = llvm.extractelement %3458[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3460 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3461 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3462 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3463 = llvm.mul %3240, %3462 : !llvm.i64 + %3464 = llvm.add %3461, %3463 : !llvm.i64 + %3465 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3466 = llvm.mul %2613, %3465 : !llvm.i64 + %3467 = llvm.add %3464, %3466 : !llvm.i64 + %3468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3469 = llvm.mul %3255, %3468 : !llvm.i64 + %3470 = llvm.add %3467, %3469 : !llvm.i64 + %3471 = llvm.getelementptr %3460[%3470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3472 = llvm.load %3471 : !llvm.ptr> + %3473 = llvm.extractelement %3472[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3477 = llvm.mul %3240, %3476 : !llvm.i64 + %3478 = llvm.add %3475, %3477 : !llvm.i64 + %3479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3480 = llvm.mul %2613, %3479 : !llvm.i64 + %3481 = llvm.add %3478, %3480 : !llvm.i64 + %3482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3483 = llvm.mul %3255, %3482 : !llvm.i64 + %3484 = llvm.add %3481, %3483 : !llvm.i64 + %3485 = llvm.getelementptr %3474[%3484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3486 = llvm.load %3485 : !llvm.ptr> + %3487 = llvm.extractelement %3486[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3488 = llvm.fadd %3389, %3368 {RelaxedPrecision} : !llvm.float + %3489 = llvm.fadd %3403, %3369 {RelaxedPrecision} : !llvm.float + %3490 = llvm.fadd %3417, %3370 {RelaxedPrecision} : !llvm.float + %3491 = llvm.fadd %3431, %3371 {RelaxedPrecision} : !llvm.float + %3492 = llvm.fadd %3445, %3372 {RelaxedPrecision} : !llvm.float + %3493 = llvm.fadd %3459, %3373 {RelaxedPrecision} : !llvm.float + %3494 = llvm.fadd %3473, %3374 {RelaxedPrecision} : !llvm.float + %3495 = llvm.fadd %3487, %3375 {RelaxedPrecision} : !llvm.float + %3496 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3497 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3498 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3499 = llvm.mul %3240, %3498 : !llvm.i64 + %3500 = llvm.add %3497, %3499 : !llvm.i64 + %3501 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3502 = llvm.mul %2613, %3501 : !llvm.i64 + %3503 = llvm.add %3500, %3502 : !llvm.i64 + %3504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3505 = llvm.mul %3255, %3504 : !llvm.i64 + %3506 = llvm.add %3503, %3505 : !llvm.i64 + %3507 = llvm.getelementptr %3496[%3506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3508 = llvm.load %3507 : !llvm.ptr> + %3509 = llvm.insertelement %3488, %3508[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3510 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3512 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3513 = llvm.mul %3240, %3512 : !llvm.i64 + %3514 = llvm.add %3511, %3513 : !llvm.i64 + %3515 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3516 = 
llvm.mul %2613, %3515 : !llvm.i64 + %3517 = llvm.add %3514, %3516 : !llvm.i64 + %3518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3519 = llvm.mul %3255, %3518 : !llvm.i64 + %3520 = llvm.add %3517, %3519 : !llvm.i64 + %3521 = llvm.getelementptr %3510[%3520] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3509, %3521 : !llvm.ptr> + %3522 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3523 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3524 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3525 = llvm.mul %3240, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3528 = llvm.mul %2613, %3527 : !llvm.i64 + %3529 = llvm.add %3526, %3528 : !llvm.i64 + %3530 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3531 = llvm.mul %3255, %3530 : !llvm.i64 + %3532 = llvm.add %3529, %3531 : !llvm.i64 + %3533 = llvm.getelementptr %3522[%3532] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3534 = llvm.load %3533 : !llvm.ptr> + %3535 = llvm.insertelement %3489, %3534[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3536 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3537 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3538 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3539 = llvm.mul %3240, %3538 : !llvm.i64 + %3540 = llvm.add %3537, %3539 : !llvm.i64 + %3541 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3542 = llvm.mul %2613, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : !llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %3255, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3536[%3546] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3535, %3547 : !llvm.ptr> + %3548 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3550 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3551 = llvm.mul %3240, %3550 : !llvm.i64 + %3552 = llvm.add %3549, %3551 : !llvm.i64 + %3553 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3554 = llvm.mul %2613, %3553 : !llvm.i64 + %3555 = llvm.add %3552, %3554 : !llvm.i64 + %3556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3557 = llvm.mul %3255, %3556 : !llvm.i64 + %3558 = llvm.add %3555, %3557 : !llvm.i64 + %3559 = llvm.getelementptr %3548[%3558] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3560 = llvm.load %3559 : !llvm.ptr> + %3561 = llvm.insertelement %3490, %3560[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3562 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3563 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3564 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3565 = llvm.mul %3240, %3564 : !llvm.i64 + %3566 = llvm.add %3563, %3565 : !llvm.i64 + %3567 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3568 = llvm.mul %2613, %3567 : !llvm.i64 + %3569 = llvm.add %3566, %3568 : !llvm.i64 + %3570 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3571 = llvm.mul %3255, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.getelementptr %3562[%3572] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3561, %3573 : !llvm.ptr> + %3574 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3575 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3576 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3577 = llvm.mul %3240, %3576 : !llvm.i64 + %3578 = 
llvm.add %3575, %3577 : !llvm.i64 + %3579 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3580 = llvm.mul %2613, %3579 : !llvm.i64 + %3581 = llvm.add %3578, %3580 : !llvm.i64 + %3582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3583 = llvm.mul %3255, %3582 : !llvm.i64 + %3584 = llvm.add %3581, %3583 : !llvm.i64 + %3585 = llvm.getelementptr %3574[%3584] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3586 = llvm.load %3585 : !llvm.ptr> + %3587 = llvm.insertelement %3491, %3586[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3588 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3590 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3591 = llvm.mul %3240, %3590 : !llvm.i64 + %3592 = llvm.add %3589, %3591 : !llvm.i64 + %3593 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3594 = llvm.mul %2613, %3593 : !llvm.i64 + %3595 = llvm.add %3592, %3594 : !llvm.i64 + %3596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3597 = llvm.mul %3255, %3596 : !llvm.i64 + %3598 = llvm.add %3595, %3597 : !llvm.i64 + %3599 = llvm.getelementptr %3588[%3598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3587, %3599 : !llvm.ptr> + %3600 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3602 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3603 = llvm.mul %3240, %3602 : !llvm.i64 + %3604 = llvm.add %3601, %3603 : !llvm.i64 + %3605 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3606 = llvm.mul %2613, %3605 : !llvm.i64 + %3607 = llvm.add %3604, %3606 : !llvm.i64 + %3608 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3609 = llvm.mul %3255, %3608 : !llvm.i64 + %3610 = llvm.add %3607, %3609 : !llvm.i64 + %3611 = llvm.getelementptr %3600[%3610] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3612 = llvm.load %3611 : !llvm.ptr> + %3613 = llvm.insertelement %3492, %3612[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3617 = llvm.mul %3240, %3616 : !llvm.i64 + %3618 = llvm.add %3615, %3617 : !llvm.i64 + %3619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3620 = llvm.mul %2613, %3619 : !llvm.i64 + %3621 = llvm.add %3618, %3620 : !llvm.i64 + %3622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3623 = llvm.mul %3255, %3622 : !llvm.i64 + %3624 = llvm.add %3621, %3623 : !llvm.i64 + %3625 = llvm.getelementptr %3614[%3624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3613, %3625 : !llvm.ptr> + %3626 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3627 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3628 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3629 = llvm.mul %3240, %3628 : !llvm.i64 + %3630 = llvm.add %3627, %3629 : !llvm.i64 + %3631 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3632 = llvm.mul %2613, %3631 : !llvm.i64 + %3633 = llvm.add %3630, %3632 : !llvm.i64 + %3634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3635 = llvm.mul %3255, %3634 : !llvm.i64 + %3636 = llvm.add %3633, %3635 : !llvm.i64 + %3637 = llvm.getelementptr %3626[%3636] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3638 = llvm.load %3637 : !llvm.ptr> + %3639 = llvm.insertelement %3493, %3638[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3640 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, 
array<3 x i64>)>
+ // ... elided: unrolled insertelement/store sequence (%3641-%3911) writing scalars %3488-%3495 into lanes %24-%31 of the vec<8 x float> accumulator cache %130 ...
+ %3912 = llvm.add %2376, %69 : !llvm.i64
+ llvm.br ^bb31(%3912 : !llvm.i64)
+ ^bb33: // pred: ^bb31
+ %3913 = llvm.add %2374, %69 : !llvm.i64
+ llvm.br ^bb29(%3913 : !llvm.i64)
+ ^bb34: // pred: ^bb29
+ %3914 = llvm.add %2372, %56 : !llvm.i64
+ llvm.br ^bb27(%3914 : !llvm.i64)
+ ^bb35: // pred: ^bb27
+ llvm.br ^bb36(%67 : !llvm.i64)
+ ^bb36(%3915: !llvm.i64): // 2 preds: ^bb35, ^bb37
+ %3916 = llvm.icmp "slt" %3915, %48 : !llvm.i64
+ llvm.cond_br %3916, ^bb37, ^bb38
+ ^bb37: // pred: ^bb36
+ %3917 = llvm.add %2370, %3915 : !llvm.i64
+ %3918 = llvm.extractvalue %7[1] : !llvm.struct<(ptr<float>, ptr<float>, i64, array<2 x i64>, array<2 x i64>)>
+ %3919 = llvm.mlir.constant(0 : index) : !llvm.i64
+ %3920 = llvm.mlir.constant(128 : index) : !llvm.i64
+ %3921 = llvm.mul %2345, %3920 : !llvm.i64
+ %3922 = llvm.add %3919, %3921 : !llvm.i64
+ %3923 = llvm.mlir.constant(1 : index) : !llvm.i64
+ %3924 = llvm.mul %3917, %3923 : !llvm.i64
+ %3925 = llvm.add %3922, %3924 : !llvm.i64
+ %3926 = llvm.getelementptr %3918[%3925] : (!llvm.ptr<float>, !llvm.i64) -> !llvm.ptr<float>
+ %3927 = llvm.load %3926 : !llvm.ptr<float>
+ // ... elided: seven further scalar loads %3937-%3997 from the same row of %7, index arithmetic %3998-%4025 producing the cache coordinates %4007, %4011, %4025, and lane extracts %4039-%4137 from the vec<8 x float> input cache %150 ...
+ %4138 = llvm.fmul %3927, %4039 {RelaxedPrecision} : !llvm.float
+ %4139 = llvm.fmul %3937, %4053 {RelaxedPrecision} : !llvm.float
+ %4140 = llvm.fmul %3947, %4067 {RelaxedPrecision} : !llvm.float
+ %4141 = llvm.fmul %3957, %4081 {RelaxedPrecision} : !llvm.float
+ %4142 = llvm.fmul %3967, %4095 {RelaxedPrecision} : !llvm.float
+ %4143 = llvm.fmul %3977, %4109 {RelaxedPrecision} : !llvm.float
+ %4144 = llvm.fmul %3987, %4123 {RelaxedPrecision} : !llvm.float
+ %4145 = llvm.fmul %3997, %4137 {RelaxedPrecision} : !llvm.float
+ // ... elided: lane extracts %4159-%4257 from the vec<8 x float> accumulator cache %130 ...
+ %4258 = llvm.fadd %4159, %4138 {RelaxedPrecision} : !llvm.float
+ %4259 = llvm.fadd %4173, %4139 {RelaxedPrecision} : !llvm.float
+ %4260 = llvm.fadd %4187, %4140 {RelaxedPrecision} : !llvm.float
+ %4261 = llvm.fadd %4201, %4141 {RelaxedPrecision} : !llvm.float
+ %4262 = llvm.fadd %4215, %4142 {RelaxedPrecision} : !llvm.float
+ %4263 = llvm.fadd %4229, %4143 {RelaxedPrecision} : !llvm.float
+ %4264 = llvm.fadd %4243, %4144 {RelaxedPrecision} : !llvm.float
+ %4265 = llvm.fadd %4257, %4145 {RelaxedPrecision} : !llvm.float
+ // ... elided: insertelement/store of %4258-%4265 back into cache %130, then the second unrolled step (scalar loads %4691-%4761 from %7, lane extracts %4801-%4899 from %150, fmuls %4900-%4907, and accumulator extracts %4921-%4963) ...
+ %4964 = llvm.extractvalue %130[1] : !llvm.struct<(ptr<vec<8 x float>>, ptr<vec<8 x float>>, i64, array<3 x i64>, array<3 x i64>)>
+ %4965 = llvm.mlir.constant(0 : index) : !llvm.i64
+ %4966 = llvm.mlir.constant(12 : index) : !llvm.i64
+ %4967 = llvm.mul %4772, %4966 : !llvm.i64
+ %4968 = llvm.add %4965, %4967 : !llvm.i64
+ %4969 = llvm.mlir.constant(2 : index) : !llvm.i64
+ %4970 = llvm.mul %67, %4969 : !llvm.i64
+ %4971 = llvm.add %4968, %4970 : !llvm.i64
+ %4972 = llvm.mlir.constant(1 : index) : !llvm.i64
+ %4973 = llvm.mul %4787, %4972 : !llvm.i64 + %4974 = llvm.add %4971, %4973 : !llvm.i64 + %4975 = llvm.getelementptr %4964[%4974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4976 = llvm.load %4975 : !llvm.ptr> + %4977 = llvm.extractelement %4976[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4978 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4980 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4981 = llvm.mul %4772, %4980 : !llvm.i64 + %4982 = llvm.add %4979, %4981 : !llvm.i64 + %4983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4984 = llvm.mul %67, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %4787, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4978[%4988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4990 = llvm.load %4989 : !llvm.ptr> + %4991 = llvm.extractelement %4990[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4992 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4993 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4994 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4995 = llvm.mul %4772, %4994 : !llvm.i64 + %4996 = llvm.add %4993, %4995 : !llvm.i64 + %4997 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4998 = llvm.mul %67, %4997 : !llvm.i64 + %4999 = llvm.add %4996, %4998 : !llvm.i64 + %5000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5001 = llvm.mul %4787, %5000 : !llvm.i64 + %5002 = llvm.add %4999, %5001 : !llvm.i64 + %5003 = llvm.getelementptr %4992[%5002] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5004 = llvm.load %5003 : !llvm.ptr> + %5005 = llvm.extractelement %5004[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5006 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5008 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5009 = llvm.mul %4772, %5008 : !llvm.i64 + %5010 = llvm.add %5007, %5009 : !llvm.i64 + %5011 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5012 = llvm.mul %67, %5011 : !llvm.i64 + %5013 = llvm.add %5010, %5012 : !llvm.i64 + %5014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5015 = llvm.mul %4787, %5014 : !llvm.i64 + %5016 = llvm.add %5013, %5015 : !llvm.i64 + %5017 = llvm.getelementptr %5006[%5016] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5018 = llvm.load %5017 : !llvm.ptr> + %5019 = llvm.extractelement %5018[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5020 = llvm.fadd %4921, %4900 {RelaxedPrecision} : !llvm.float + %5021 = llvm.fadd %4935, %4901 {RelaxedPrecision} : !llvm.float + %5022 = llvm.fadd %4949, %4902 {RelaxedPrecision} : !llvm.float + %5023 = llvm.fadd %4963, %4903 {RelaxedPrecision} : !llvm.float + %5024 = llvm.fadd %4977, %4904 {RelaxedPrecision} : !llvm.float + %5025 = llvm.fadd %4991, %4905 {RelaxedPrecision} : !llvm.float + %5026 = llvm.fadd %5005, %4906 {RelaxedPrecision} : !llvm.float + %5027 = llvm.fadd %5019, %4907 {RelaxedPrecision} : !llvm.float + %5028 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5029 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5030 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5031 = llvm.mul %4772, %5030 : !llvm.i64 + %5032 = llvm.add %5029, %5031 : !llvm.i64 + %5033 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5034 = llvm.mul %67, %5033 : !llvm.i64 + %5035 = llvm.add %5032, %5034 : 
!llvm.i64 + %5036 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5037 = llvm.mul %4787, %5036 : !llvm.i64 + %5038 = llvm.add %5035, %5037 : !llvm.i64 + %5039 = llvm.getelementptr %5028[%5038] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5040 = llvm.load %5039 : !llvm.ptr> + %5041 = llvm.insertelement %5020, %5040[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5042 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5043 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5044 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5045 = llvm.mul %4772, %5044 : !llvm.i64 + %5046 = llvm.add %5043, %5045 : !llvm.i64 + %5047 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5048 = llvm.mul %67, %5047 : !llvm.i64 + %5049 = llvm.add %5046, %5048 : !llvm.i64 + %5050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5051 = llvm.mul %4787, %5050 : !llvm.i64 + %5052 = llvm.add %5049, %5051 : !llvm.i64 + %5053 = llvm.getelementptr %5042[%5052] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5041, %5053 : !llvm.ptr> + %5054 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5056 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5057 = llvm.mul %4772, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5060 = llvm.mul %67, %5059 : !llvm.i64 + %5061 = llvm.add %5058, %5060 : !llvm.i64 + %5062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5063 = llvm.mul %4787, %5062 : !llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.getelementptr %5054[%5064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5066 = llvm.load %5065 : !llvm.ptr> + %5067 = llvm.insertelement %5021, %5066[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5068 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5070 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5071 = llvm.mul %4772, %5070 : !llvm.i64 + %5072 = llvm.add %5069, %5071 : !llvm.i64 + %5073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5074 = llvm.mul %67, %5073 : !llvm.i64 + %5075 = llvm.add %5072, %5074 : !llvm.i64 + %5076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5077 = llvm.mul %4787, %5076 : !llvm.i64 + %5078 = llvm.add %5075, %5077 : !llvm.i64 + %5079 = llvm.getelementptr %5068[%5078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5067, %5079 : !llvm.ptr> + %5080 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5082 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5083 = llvm.mul %4772, %5082 : !llvm.i64 + %5084 = llvm.add %5081, %5083 : !llvm.i64 + %5085 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5086 = llvm.mul %67, %5085 : !llvm.i64 + %5087 = llvm.add %5084, %5086 : !llvm.i64 + %5088 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5089 = llvm.mul %4787, %5088 : !llvm.i64 + %5090 = llvm.add %5087, %5089 : !llvm.i64 + %5091 = llvm.getelementptr %5080[%5090] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5092 = llvm.load %5091 : !llvm.ptr> + %5093 = llvm.insertelement %5022, %5092[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5094 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5095 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5096 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5097 = llvm.mul %4772, %5096 : !llvm.i64 + %5098 = 
llvm.add %5095, %5097 : !llvm.i64 + %5099 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5100 = llvm.mul %67, %5099 : !llvm.i64 + %5101 = llvm.add %5098, %5100 : !llvm.i64 + %5102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5103 = llvm.mul %4787, %5102 : !llvm.i64 + %5104 = llvm.add %5101, %5103 : !llvm.i64 + %5105 = llvm.getelementptr %5094[%5104] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5093, %5105 : !llvm.ptr> + %5106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5109 = llvm.mul %4772, %5108 : !llvm.i64 + %5110 = llvm.add %5107, %5109 : !llvm.i64 + %5111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5112 = llvm.mul %67, %5111 : !llvm.i64 + %5113 = llvm.add %5110, %5112 : !llvm.i64 + %5114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5115 = llvm.mul %4787, %5114 : !llvm.i64 + %5116 = llvm.add %5113, %5115 : !llvm.i64 + %5117 = llvm.getelementptr %5106[%5116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5118 = llvm.load %5117 : !llvm.ptr> + %5119 = llvm.insertelement %5023, %5118[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5120 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5123 = llvm.mul %4772, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5126 = llvm.mul %67, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5129 = llvm.mul %4787, %5128 : !llvm.i64 + %5130 = llvm.add %5127, %5129 : !llvm.i64 + %5131 = llvm.getelementptr %5120[%5130] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5119, %5131 : !llvm.ptr> + %5132 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5134 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5135 = llvm.mul %4772, %5134 : !llvm.i64 + %5136 = llvm.add %5133, %5135 : !llvm.i64 + %5137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5138 = llvm.mul %67, %5137 : !llvm.i64 + %5139 = llvm.add %5136, %5138 : !llvm.i64 + %5140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5141 = llvm.mul %4787, %5140 : !llvm.i64 + %5142 = llvm.add %5139, %5141 : !llvm.i64 + %5143 = llvm.getelementptr %5132[%5142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5144 = llvm.load %5143 : !llvm.ptr> + %5145 = llvm.insertelement %5024, %5144[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5146 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5149 = llvm.mul %4772, %5148 : !llvm.i64 + %5150 = llvm.add %5147, %5149 : !llvm.i64 + %5151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5152 = llvm.mul %67, %5151 : !llvm.i64 + %5153 = llvm.add %5150, %5152 : !llvm.i64 + %5154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5155 = llvm.mul %4787, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 : !llvm.i64 + %5157 = llvm.getelementptr %5146[%5156] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5145, %5157 : !llvm.ptr> + %5158 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5160 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %5161 = llvm.mul %4772, %5160 : !llvm.i64 + %5162 = llvm.add %5159, %5161 : !llvm.i64 + %5163 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5164 = llvm.mul %67, %5163 : !llvm.i64 + %5165 = llvm.add %5162, %5164 : !llvm.i64 + %5166 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5167 = llvm.mul %4787, %5166 : !llvm.i64 + %5168 = llvm.add %5165, %5167 : !llvm.i64 + %5169 = llvm.getelementptr %5158[%5168] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5170 = llvm.load %5169 : !llvm.ptr> + %5171 = llvm.insertelement %5025, %5170[%29 : !llvm.i64] : !llvm.vec<8 x float> + %5172 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5173 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5174 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5175 = llvm.mul %4772, %5174 : !llvm.i64 + %5176 = llvm.add %5173, %5175 : !llvm.i64 + %5177 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5178 = llvm.mul %67, %5177 : !llvm.i64 + %5179 = llvm.add %5176, %5178 : !llvm.i64 + %5180 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5181 = llvm.mul %4787, %5180 : !llvm.i64 + %5182 = llvm.add %5179, %5181 : !llvm.i64 + %5183 = llvm.getelementptr %5172[%5182] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5171, %5183 : !llvm.ptr> + %5184 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5185 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5186 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5187 = llvm.mul %4772, %5186 : !llvm.i64 + %5188 = llvm.add %5185, %5187 : !llvm.i64 + %5189 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5190 = llvm.mul %67, %5189 : !llvm.i64 + %5191 = llvm.add %5188, %5190 : !llvm.i64 + %5192 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5193 = llvm.mul %4787, %5192 : !llvm.i64 + %5194 = llvm.add %5191, %5193 : !llvm.i64 + %5195 = llvm.getelementptr %5184[%5194] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5196 = llvm.load %5195 : !llvm.ptr> + %5197 = llvm.insertelement %5026, %5196[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5198 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5199 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5200 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5201 = llvm.mul %4772, %5200 : !llvm.i64 + %5202 = llvm.add %5199, %5201 : !llvm.i64 + %5203 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5204 = llvm.mul %67, %5203 : !llvm.i64 + %5205 = llvm.add %5202, %5204 : !llvm.i64 + %5206 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5207 = llvm.mul %4787, %5206 : !llvm.i64 + %5208 = llvm.add %5205, %5207 : !llvm.i64 + %5209 = llvm.getelementptr %5198[%5208] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5197, %5209 : !llvm.ptr> + %5210 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5212 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5213 = llvm.mul %4772, %5212 : !llvm.i64 + %5214 = llvm.add %5211, %5213 : !llvm.i64 + %5215 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5216 = llvm.mul %67, %5215 : !llvm.i64 + %5217 = llvm.add %5214, %5216 : !llvm.i64 + %5218 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5219 = llvm.mul %4787, %5218 : !llvm.i64 + %5220 = llvm.add %5217, %5219 : !llvm.i64 + %5221 = llvm.getelementptr %5210[%5220] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5222 = llvm.load %5221 : !llvm.ptr> + %5223 = llvm.insertelement %5027, %5222[%31 : !llvm.i64] : !llvm.vec<8 x float> + 
%5224 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5225 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5226 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5227 = llvm.mul %4772, %5226 : !llvm.i64 + %5228 = llvm.add %5225, %5227 : !llvm.i64 + %5229 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5230 = llvm.mul %67, %5229 : !llvm.i64 + %5231 = llvm.add %5228, %5230 : !llvm.i64 + %5232 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5233 = llvm.mul %4787, %5232 : !llvm.i64 + %5234 = llvm.add %5231, %5233 : !llvm.i64 + %5235 = llvm.getelementptr %5224[%5234] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5223, %5235 : !llvm.ptr> + %5236 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5237 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5238 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5239 = llvm.mul %4772, %5238 : !llvm.i64 + %5240 = llvm.add %5237, %5239 : !llvm.i64 + %5241 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5242 = llvm.mul %67, %5241 : !llvm.i64 + %5243 = llvm.add %5240, %5242 : !llvm.i64 + %5244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5245 = llvm.mul %4787, %5244 : !llvm.i64 + %5246 = llvm.add %5243, %5245 : !llvm.i64 + %5247 = llvm.getelementptr %5236[%5246] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5248 = llvm.load %5247 : !llvm.ptr> + %5249 = llvm.insertelement %5020, %5248[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5250 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5251 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5252 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5253 = llvm.mul %4772, %5252 : !llvm.i64 + %5254 = llvm.add %5251, %5253 : !llvm.i64 + %5255 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5256 = llvm.mul %67, %5255 : !llvm.i64 + %5257 = llvm.add %5254, %5256 : !llvm.i64 + %5258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5259 = llvm.mul %4787, %5258 : !llvm.i64 + %5260 = llvm.add %5257, %5259 : !llvm.i64 + %5261 = llvm.getelementptr %5250[%5260] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5249, %5261 : !llvm.ptr> + %5262 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5263 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5264 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5265 = llvm.mul %4772, %5264 : !llvm.i64 + %5266 = llvm.add %5263, %5265 : !llvm.i64 + %5267 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5268 = llvm.mul %67, %5267 : !llvm.i64 + %5269 = llvm.add %5266, %5268 : !llvm.i64 + %5270 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5271 = llvm.mul %4787, %5270 : !llvm.i64 + %5272 = llvm.add %5269, %5271 : !llvm.i64 + %5273 = llvm.getelementptr %5262[%5272] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5274 = llvm.load %5273 : !llvm.ptr> + %5275 = llvm.insertelement %5021, %5274[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5276 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5277 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5278 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5279 = llvm.mul %4772, %5278 : !llvm.i64 + %5280 = llvm.add %5277, %5279 : !llvm.i64 + %5281 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5282 = llvm.mul %67, %5281 : !llvm.i64 + %5283 = llvm.add %5280, %5282 : !llvm.i64 + %5284 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5285 = llvm.mul %4787, %5284 : !llvm.i64 + %5286 = llvm.add %5283, %5285 : !llvm.i64 + %5287 = llvm.getelementptr %5276[%5286] : 
(!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5275, %5287 : !llvm.ptr> + %5288 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5289 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5290 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5291 = llvm.mul %4772, %5290 : !llvm.i64 + %5292 = llvm.add %5289, %5291 : !llvm.i64 + %5293 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5294 = llvm.mul %67, %5293 : !llvm.i64 + %5295 = llvm.add %5292, %5294 : !llvm.i64 + %5296 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5297 = llvm.mul %4787, %5296 : !llvm.i64 + %5298 = llvm.add %5295, %5297 : !llvm.i64 + %5299 = llvm.getelementptr %5288[%5298] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5300 = llvm.load %5299 : !llvm.ptr> + %5301 = llvm.insertelement %5022, %5300[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5302 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5303 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5304 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5305 = llvm.mul %4772, %5304 : !llvm.i64 + %5306 = llvm.add %5303, %5305 : !llvm.i64 + %5307 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5308 = llvm.mul %67, %5307 : !llvm.i64 + %5309 = llvm.add %5306, %5308 : !llvm.i64 + %5310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5311 = llvm.mul %4787, %5310 : !llvm.i64 + %5312 = llvm.add %5309, %5311 : !llvm.i64 + %5313 = llvm.getelementptr %5302[%5312] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5301, %5313 : !llvm.ptr> + %5314 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5316 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5317 = llvm.mul %4772, %5316 : !llvm.i64 + %5318 = llvm.add %5315, %5317 : !llvm.i64 + %5319 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5320 = llvm.mul %67, %5319 : !llvm.i64 + %5321 = llvm.add %5318, %5320 : !llvm.i64 + %5322 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5323 = llvm.mul %4787, %5322 : !llvm.i64 + %5324 = llvm.add %5321, %5323 : !llvm.i64 + %5325 = llvm.getelementptr %5314[%5324] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5326 = llvm.load %5325 : !llvm.ptr> + %5327 = llvm.insertelement %5023, %5326[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5328 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5329 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5330 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5331 = llvm.mul %4772, %5330 : !llvm.i64 + %5332 = llvm.add %5329, %5331 : !llvm.i64 + %5333 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5334 = llvm.mul %67, %5333 : !llvm.i64 + %5335 = llvm.add %5332, %5334 : !llvm.i64 + %5336 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5337 = llvm.mul %4787, %5336 : !llvm.i64 + %5338 = llvm.add %5335, %5337 : !llvm.i64 + %5339 = llvm.getelementptr %5328[%5338] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5327, %5339 : !llvm.ptr> + %5340 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5342 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5343 = llvm.mul %4772, %5342 : !llvm.i64 + %5344 = llvm.add %5341, %5343 : !llvm.i64 + %5345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5346 = llvm.mul %67, %5345 : !llvm.i64 + %5347 = llvm.add %5344, %5346 : !llvm.i64 + %5348 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5349 = llvm.mul %4787, %5348 : !llvm.i64 + %5350 
= llvm.add %5347, %5349 : !llvm.i64 + %5351 = llvm.getelementptr %5340[%5350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5352 = llvm.load %5351 : !llvm.ptr> + %5353 = llvm.insertelement %5024, %5352[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5354 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5356 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5357 = llvm.mul %4772, %5356 : !llvm.i64 + %5358 = llvm.add %5355, %5357 : !llvm.i64 + %5359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5360 = llvm.mul %67, %5359 : !llvm.i64 + %5361 = llvm.add %5358, %5360 : !llvm.i64 + %5362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5363 = llvm.mul %4787, %5362 : !llvm.i64 + %5364 = llvm.add %5361, %5363 : !llvm.i64 + %5365 = llvm.getelementptr %5354[%5364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5353, %5365 : !llvm.ptr> + %5366 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5367 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5368 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5369 = llvm.mul %4772, %5368 : !llvm.i64 + %5370 = llvm.add %5367, %5369 : !llvm.i64 + %5371 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5372 = llvm.mul %67, %5371 : !llvm.i64 + %5373 = llvm.add %5370, %5372 : !llvm.i64 + %5374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5375 = llvm.mul %4787, %5374 : !llvm.i64 + %5376 = llvm.add %5373, %5375 : !llvm.i64 + %5377 = llvm.getelementptr %5366[%5376] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5378 = llvm.load %5377 : !llvm.ptr> + %5379 = llvm.insertelement %5025, %5378[%29 : !llvm.i64] : !llvm.vec<8 x float> + %5380 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5382 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5383 = llvm.mul %4772, %5382 : !llvm.i64 + %5384 = llvm.add %5381, %5383 : !llvm.i64 + %5385 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5386 = llvm.mul %67, %5385 : !llvm.i64 + %5387 = llvm.add %5384, %5386 : !llvm.i64 + %5388 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5389 = llvm.mul %4787, %5388 : !llvm.i64 + %5390 = llvm.add %5387, %5389 : !llvm.i64 + %5391 = llvm.getelementptr %5380[%5390] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5379, %5391 : !llvm.ptr> + %5392 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5393 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5394 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5395 = llvm.mul %4772, %5394 : !llvm.i64 + %5396 = llvm.add %5393, %5395 : !llvm.i64 + %5397 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5398 = llvm.mul %67, %5397 : !llvm.i64 + %5399 = llvm.add %5396, %5398 : !llvm.i64 + %5400 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5401 = llvm.mul %4787, %5400 : !llvm.i64 + %5402 = llvm.add %5399, %5401 : !llvm.i64 + %5403 = llvm.getelementptr %5392[%5402] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5404 = llvm.load %5403 : !llvm.ptr> + %5405 = llvm.insertelement %5026, %5404[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5406 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5407 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5408 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5409 = llvm.mul %4772, %5408 : !llvm.i64 + %5410 = llvm.add %5407, %5409 : !llvm.i64 + %5411 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5412 = llvm.mul %67, 
%5411 : !llvm.i64 + %5413 = llvm.add %5410, %5412 : !llvm.i64 + %5414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5415 = llvm.mul %4787, %5414 : !llvm.i64 + %5416 = llvm.add %5413, %5415 : !llvm.i64 + %5417 = llvm.getelementptr %5406[%5416] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5405, %5417 : !llvm.ptr> + %5418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5421 = llvm.mul %4772, %5420 : !llvm.i64 + %5422 = llvm.add %5419, %5421 : !llvm.i64 + %5423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5424 = llvm.mul %67, %5423 : !llvm.i64 + %5425 = llvm.add %5422, %5424 : !llvm.i64 + %5426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5427 = llvm.mul %4787, %5426 : !llvm.i64 + %5428 = llvm.add %5425, %5427 : !llvm.i64 + %5429 = llvm.getelementptr %5418[%5428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5430 = llvm.load %5429 : !llvm.ptr> + %5431 = llvm.insertelement %5027, %5430[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5435 = llvm.mul %4772, %5434 : !llvm.i64 + %5436 = llvm.add %5433, %5435 : !llvm.i64 + %5437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5438 = llvm.mul %67, %5437 : !llvm.i64 + %5439 = llvm.add %5436, %5438 : !llvm.i64 + %5440 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5441 = llvm.mul %4787, %5440 : !llvm.i64 + %5442 = llvm.add %5439, %5441 : !llvm.i64 + %5443 = llvm.getelementptr %5432[%5442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5431, %5443 : !llvm.ptr> + %5444 = llvm.add %3915, %69 : !llvm.i64 + llvm.br ^bb36(%5444 : !llvm.i64) + ^bb38: // pred: ^bb36 + %5445 = llvm.add %2370, %48 : !llvm.i64 + llvm.br ^bb25(%5445 : !llvm.i64) + ^bb39: // pred: ^bb25 + %5446 = llvm.add %2368, %68 : !llvm.i64 + llvm.br ^bb23(%5446 : !llvm.i64) + ^bb40: // pred: ^bb23 + llvm.br ^bb41(%67 : !llvm.i64) + ^bb41(%5447: !llvm.i64): // 2 preds: ^bb40, ^bb50 + %5448 = llvm.icmp "slt" %5447, %38 : !llvm.i64 + llvm.cond_br %5448, ^bb42, ^bb51 + ^bb42: // pred: ^bb41 + llvm.cond_br %40, ^bb43, ^bb47 + ^bb43: // pred: ^bb42 + %5449 = llvm.add %151, %5447 : !llvm.i64 + %5450 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5451 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5452 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5453 = llvm.mul %2345, %5452 : !llvm.i64 + %5454 = llvm.add %5451, %5453 : !llvm.i64 + %5455 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5456 = llvm.mul %5449, %5455 : !llvm.i64 + %5457 = llvm.add %5454, %5456 : !llvm.i64 + %5458 = llvm.getelementptr %5450[%5457] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5459 = llvm.bitcast %5458 : !llvm.ptr to !llvm.ptr> + %5460 = llvm.load %5459 {alignment = 4 : i64} : !llvm.ptr> + %5461 = llvm.icmp "slt" %5447, %67 : !llvm.i64 + %5462 = llvm.sub %64, %5447 : !llvm.i64 + %5463 = llvm.select %5461, %5462, %5447 : !llvm.i1, !llvm.i64 + %5464 = llvm.sdiv %5463, %68 : !llvm.i64 + %5465 = llvm.sub %64, %5464 : !llvm.i64 + %5466 = llvm.select %5461, %5465, %5464 : !llvm.i1, !llvm.i64 + %5467 = llvm.srem %5466, %68 : !llvm.i64 + %5468 = llvm.icmp "slt" %5467, %67 : !llvm.i64 + %5469 = llvm.add %5467, %68 : !llvm.i64 + %5470 = llvm.select %5468, %5469, %5467 : !llvm.i1, !llvm.i64 + %5471 = llvm.srem %5447, %68 : 
!llvm.i64 + %5472 = llvm.icmp "slt" %5471, %67 : !llvm.i64 + %5473 = llvm.add %5471, %68 : !llvm.i64 + %5474 = llvm.select %5472, %5473, %5471 : !llvm.i1, !llvm.i64 + %5475 = llvm.icmp "slt" %5474, %67 : !llvm.i64 + %5476 = llvm.sub %64, %5474 : !llvm.i64 + %5477 = llvm.select %5475, %5476, %5474 : !llvm.i1, !llvm.i64 + %5478 = llvm.sdiv %5477, %70 : !llvm.i64 + %5479 = llvm.sub %64, %5478 : !llvm.i64 + %5480 = llvm.select %5475, %5479, %5478 : !llvm.i1, !llvm.i64 + %5481 = llvm.srem %5480, %63 : !llvm.i64 + %5482 = llvm.icmp "slt" %5481, %67 : !llvm.i64 + %5483 = llvm.add %5481, %63 : !llvm.i64 + %5484 = llvm.select %5482, %5483, %5481 : !llvm.i1, !llvm.i64 + %5485 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5486 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5487 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5488 = llvm.mul %5470, %5487 : !llvm.i64 + %5489 = llvm.add %5486, %5488 : !llvm.i64 + %5490 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5491 = llvm.mul %67, %5490 : !llvm.i64 + %5492 = llvm.add %5489, %5491 : !llvm.i64 + %5493 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5494 = llvm.mul %5484, %5493 : !llvm.i64 + %5495 = llvm.add %5492, %5494 : !llvm.i64 + %5496 = llvm.getelementptr %5485[%5495] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5497 = llvm.load %5496 : !llvm.ptr> + %5498 = llvm.fadd %5460, %5497 : !llvm.vec<8 x float> + %5499 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5500 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5501 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5502 = llvm.mul %67, %5501 : !llvm.i64 + %5503 = llvm.add %5500, %5502 : !llvm.i64 + %5504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5505 = llvm.mul %67, %5504 : !llvm.i64 + %5506 = llvm.add %5503, %5505 : !llvm.i64 + %5507 = llvm.getelementptr %5499[%5506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5498, %5507 : !llvm.ptr> + %5508 = llvm.add %5449, %70 : !llvm.i64 + %5509 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5510 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5511 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5512 = llvm.mul %2345, %5511 : !llvm.i64 + %5513 = llvm.add %5510, %5512 : !llvm.i64 + %5514 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5515 = llvm.mul %5508, %5514 : !llvm.i64 + %5516 = llvm.add %5513, %5515 : !llvm.i64 + %5517 = llvm.getelementptr %5509[%5516] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5518 = llvm.bitcast %5517 : !llvm.ptr to !llvm.ptr> + %5519 = llvm.load %5518 {alignment = 4 : i64} : !llvm.ptr> + %5520 = llvm.add %5447, %70 : !llvm.i64 + %5521 = llvm.icmp "slt" %5520, %67 : !llvm.i64 + %5522 = llvm.sub %64, %5520 : !llvm.i64 + %5523 = llvm.select %5521, %5522, %5520 : !llvm.i1, !llvm.i64 + %5524 = llvm.sdiv %5523, %68 : !llvm.i64 + %5525 = llvm.sub %64, %5524 : !llvm.i64 + %5526 = llvm.select %5521, %5525, %5524 : !llvm.i1, !llvm.i64 + %5527 = llvm.srem %5526, %68 : !llvm.i64 + %5528 = llvm.icmp "slt" %5527, %67 : !llvm.i64 + %5529 = llvm.add %5527, %68 : !llvm.i64 + %5530 = llvm.select %5528, %5529, %5527 : !llvm.i1, !llvm.i64 + %5531 = llvm.sdiv %5463, %70 : !llvm.i64 + %5532 = llvm.sub %64, %5531 : !llvm.i64 + %5533 = llvm.select %5461, %5532, %5531 : !llvm.i1, !llvm.i64 + %5534 = llvm.mul %5526, %65 : !llvm.i64 + %5535 = llvm.add %5533, %5534 : !llvm.i64 + %5536 = llvm.add %5535, %69 : !llvm.i64 + %5537 = llvm.icmp "slt" %5536, %67 : !llvm.i64 + %5538 = llvm.sub %64, %5536 : !llvm.i64 + 
%5539 = llvm.select %5537, %5538, %5536 : !llvm.i1, !llvm.i64 + %5540 = llvm.sdiv %5539, %63 : !llvm.i64 + %5541 = llvm.sub %64, %5540 : !llvm.i64 + %5542 = llvm.select %5537, %5541, %5540 : !llvm.i1, !llvm.i64 + %5543 = llvm.mul %5542, %65 : !llvm.i64 + %5544 = llvm.add %5535, %5543 : !llvm.i64 + %5545 = llvm.add %5544, %69 : !llvm.i64 + %5546 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5547 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5548 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5549 = llvm.mul %5530, %5548 : !llvm.i64 + %5550 = llvm.add %5547, %5549 : !llvm.i64 + %5551 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5552 = llvm.mul %67, %5551 : !llvm.i64 + %5553 = llvm.add %5550, %5552 : !llvm.i64 + %5554 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5555 = llvm.mul %5545, %5554 : !llvm.i64 + %5556 = llvm.add %5553, %5555 : !llvm.i64 + %5557 = llvm.getelementptr %5546[%5556] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5558 = llvm.load %5557 : !llvm.ptr> + %5559 = llvm.fadd %5519, %5558 : !llvm.vec<8 x float> + %5560 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5561 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5562 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5563 = llvm.mul %67, %5562 : !llvm.i64 + %5564 = llvm.add %5561, %5563 : !llvm.i64 + %5565 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5566 = llvm.mul %69, %5565 : !llvm.i64 + %5567 = llvm.add %5564, %5566 : !llvm.i64 + %5568 = llvm.getelementptr %5560[%5567] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5559, %5568 : !llvm.ptr> + %5569 = llvm.add %5449, %68 : !llvm.i64 + %5570 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5571 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5572 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5573 = llvm.mul %2345, %5572 : !llvm.i64 + %5574 = llvm.add %5571, %5573 : !llvm.i64 + %5575 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5576 = llvm.mul %5569, %5575 : !llvm.i64 + %5577 = llvm.add %5574, %5576 : !llvm.i64 + %5578 = llvm.getelementptr %5570[%5577] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5579 = llvm.bitcast %5578 : !llvm.ptr to !llvm.ptr> + %5580 = llvm.load %5579 {alignment = 4 : i64} : !llvm.ptr> + %5581 = llvm.add %5466, %69 : !llvm.i64 + %5582 = llvm.icmp "slt" %5581, %67 : !llvm.i64 + %5583 = llvm.sub %64, %5581 : !llvm.i64 + %5584 = llvm.select %5582, %5583, %5581 : !llvm.i1, !llvm.i64 + %5585 = llvm.sdiv %5584, %68 : !llvm.i64 + %5586 = llvm.sub %64, %5585 : !llvm.i64 + %5587 = llvm.select %5582, %5586, %5585 : !llvm.i1, !llvm.i64 + %5588 = llvm.mul %5587, %60 : !llvm.i64 + %5589 = llvm.add %5466, %5588 : !llvm.i64 + %5590 = llvm.add %5589, %69 : !llvm.i64 + %5591 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5593 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5594 = llvm.mul %5590, %5593 : !llvm.i64 + %5595 = llvm.add %5592, %5594 : !llvm.i64 + %5596 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5597 = llvm.mul %67, %5596 : !llvm.i64 + %5598 = llvm.add %5595, %5597 : !llvm.i64 + %5599 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5600 = llvm.mul %5484, %5599 : !llvm.i64 + %5601 = llvm.add %5598, %5600 : !llvm.i64 + %5602 = llvm.getelementptr %5591[%5601] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5603 = llvm.load %5602 : !llvm.ptr> + %5604 = llvm.fadd %5580, %5603 : !llvm.vec<8 x float> + %5605 = 
llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5606 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5607 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5608 = llvm.mul %67, %5607 : !llvm.i64 + %5609 = llvm.add %5606, %5608 : !llvm.i64 + %5610 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5611 = llvm.mul %63, %5610 : !llvm.i64 + %5612 = llvm.add %5609, %5611 : !llvm.i64 + %5613 = llvm.getelementptr %5605[%5612] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5604, %5613 : !llvm.ptr> + %5614 = llvm.add %5449, %41 : !llvm.i64 + %5615 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5616 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5617 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5618 = llvm.mul %2345, %5617 : !llvm.i64 + %5619 = llvm.add %5616, %5618 : !llvm.i64 + %5620 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5621 = llvm.mul %5614, %5620 : !llvm.i64 + %5622 = llvm.add %5619, %5621 : !llvm.i64 + %5623 = llvm.getelementptr %5615[%5622] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5624 = llvm.bitcast %5623 : !llvm.ptr to !llvm.ptr> + %5625 = llvm.load %5624 {alignment = 4 : i64} : !llvm.ptr> + %5626 = llvm.add %5447, %41 : !llvm.i64 + %5627 = llvm.icmp "slt" %5626, %67 : !llvm.i64 + %5628 = llvm.sub %64, %5626 : !llvm.i64 + %5629 = llvm.select %5627, %5628, %5626 : !llvm.i1, !llvm.i64 + %5630 = llvm.sdiv %5629, %68 : !llvm.i64 + %5631 = llvm.sub %64, %5630 : !llvm.i64 + %5632 = llvm.select %5627, %5631, %5630 : !llvm.i1, !llvm.i64 + %5633 = llvm.srem %5632, %68 : !llvm.i64 + %5634 = llvm.icmp "slt" %5633, %67 : !llvm.i64 + %5635 = llvm.add %5633, %68 : !llvm.i64 + %5636 = llvm.select %5634, %5635, %5633 : !llvm.i1, !llvm.i64 + %5637 = llvm.mul %5632, %65 : !llvm.i64 + %5638 = llvm.add %5533, %5637 : !llvm.i64 + %5639 = llvm.add %5638, %45 : !llvm.i64 + %5640 = llvm.icmp "slt" %5639, %67 : !llvm.i64 + %5641 = llvm.sub %64, %5639 : !llvm.i64 + %5642 = llvm.select %5640, %5641, %5639 : !llvm.i1, !llvm.i64 + %5643 = llvm.sdiv %5642, %63 : !llvm.i64 + %5644 = llvm.sub %64, %5643 : !llvm.i64 + %5645 = llvm.select %5640, %5644, %5643 : !llvm.i1, !llvm.i64 + %5646 = llvm.mul %5645, %65 : !llvm.i64 + %5647 = llvm.add %5638, %5646 : !llvm.i64 + %5648 = llvm.add %5647, %45 : !llvm.i64 + %5649 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5651 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5652 = llvm.mul %5636, %5651 : !llvm.i64 + %5653 = llvm.add %5650, %5652 : !llvm.i64 + %5654 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5655 = llvm.mul %67, %5654 : !llvm.i64 + %5656 = llvm.add %5653, %5655 : !llvm.i64 + %5657 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5658 = llvm.mul %5648, %5657 : !llvm.i64 + %5659 = llvm.add %5656, %5658 : !llvm.i64 + %5660 = llvm.getelementptr %5649[%5659] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5661 = llvm.load %5660 : !llvm.ptr> + %5662 = llvm.fadd %5625, %5661 : !llvm.vec<8 x float> + %5663 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5664 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5665 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5666 = llvm.mul %67, %5665 : !llvm.i64 + %5667 = llvm.add %5664, %5666 : !llvm.i64 + %5668 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5669 = llvm.mul %45, %5668 : !llvm.i64 + %5670 = llvm.add %5667, %5669 : !llvm.i64 + %5671 = llvm.getelementptr %5663[%5670] : (!llvm.ptr>, 
!llvm.i64) -> !llvm.ptr> + llvm.store %5662, %5671 : !llvm.ptr> + %5672 = llvm.add %5449, %42 : !llvm.i64 + %5673 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5674 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5675 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5676 = llvm.mul %2345, %5675 : !llvm.i64 + %5677 = llvm.add %5674, %5676 : !llvm.i64 + %5678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5679 = llvm.mul %5672, %5678 : !llvm.i64 + %5680 = llvm.add %5677, %5679 : !llvm.i64 + %5681 = llvm.getelementptr %5673[%5680] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5682 = llvm.bitcast %5681 : !llvm.ptr to !llvm.ptr> + %5683 = llvm.load %5682 {alignment = 4 : i64} : !llvm.ptr> + %5684 = llvm.add %5466, %63 : !llvm.i64 + %5685 = llvm.icmp "slt" %5684, %67 : !llvm.i64 + %5686 = llvm.sub %64, %5684 : !llvm.i64 + %5687 = llvm.select %5685, %5686, %5684 : !llvm.i1, !llvm.i64 + %5688 = llvm.sdiv %5687, %68 : !llvm.i64 + %5689 = llvm.sub %64, %5688 : !llvm.i64 + %5690 = llvm.select %5685, %5689, %5688 : !llvm.i1, !llvm.i64 + %5691 = llvm.mul %5690, %60 : !llvm.i64 + %5692 = llvm.add %5466, %5691 : !llvm.i64 + %5693 = llvm.add %5692, %63 : !llvm.i64 + %5694 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5695 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5696 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5697 = llvm.mul %5693, %5696 : !llvm.i64 + %5698 = llvm.add %5695, %5697 : !llvm.i64 + %5699 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5700 = llvm.mul %67, %5699 : !llvm.i64 + %5701 = llvm.add %5698, %5700 : !llvm.i64 + %5702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5703 = llvm.mul %5484, %5702 : !llvm.i64 + %5704 = llvm.add %5701, %5703 : !llvm.i64 + %5705 = llvm.getelementptr %5694[%5704] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5706 = llvm.load %5705 : !llvm.ptr> + %5707 = llvm.fadd %5683, %5706 : !llvm.vec<8 x float> + %5708 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5710 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5711 = llvm.mul %67, %5710 : !llvm.i64 + %5712 = llvm.add %5709, %5711 : !llvm.i64 + %5713 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5714 = llvm.mul %48, %5713 : !llvm.i64 + %5715 = llvm.add %5712, %5714 : !llvm.i64 + %5716 = llvm.getelementptr %5708[%5715] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5707, %5716 : !llvm.ptr> + %5717 = llvm.add %5449, %43 : !llvm.i64 + %5718 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5720 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5721 = llvm.mul %2345, %5720 : !llvm.i64 + %5722 = llvm.add %5719, %5721 : !llvm.i64 + %5723 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5724 = llvm.mul %5717, %5723 : !llvm.i64 + %5725 = llvm.add %5722, %5724 : !llvm.i64 + %5726 = llvm.getelementptr %5718[%5725] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5727 = llvm.bitcast %5726 : !llvm.ptr to !llvm.ptr> + %5728 = llvm.load %5727 {alignment = 4 : i64} : !llvm.ptr> + %5729 = llvm.add %5447, %43 : !llvm.i64 + %5730 = llvm.icmp "slt" %5729, %67 : !llvm.i64 + %5731 = llvm.sub %64, %5729 : !llvm.i64 + %5732 = llvm.select %5730, %5731, %5729 : !llvm.i1, !llvm.i64 + %5733 = llvm.sdiv %5732, %68 : !llvm.i64 + %5734 = llvm.sub %64, %5733 : !llvm.i64 + %5735 = llvm.select %5730, %5734, %5733 : !llvm.i1, !llvm.i64 + %5736 = llvm.srem %5735, 
%68 : !llvm.i64 + %5737 = llvm.icmp "slt" %5736, %67 : !llvm.i64 + %5738 = llvm.add %5736, %68 : !llvm.i64 + %5739 = llvm.select %5737, %5738, %5736 : !llvm.i1, !llvm.i64 + %5740 = llvm.mul %5735, %65 : !llvm.i64 + %5741 = llvm.add %5533, %5740 : !llvm.i64 + %5742 = llvm.add %5741, %52 : !llvm.i64 + %5743 = llvm.icmp "slt" %5742, %67 : !llvm.i64 + %5744 = llvm.sub %64, %5742 : !llvm.i64 + %5745 = llvm.select %5743, %5744, %5742 : !llvm.i1, !llvm.i64 + %5746 = llvm.sdiv %5745, %63 : !llvm.i64 + %5747 = llvm.sub %64, %5746 : !llvm.i64 + %5748 = llvm.select %5743, %5747, %5746 : !llvm.i1, !llvm.i64 + %5749 = llvm.mul %5748, %65 : !llvm.i64 + %5750 = llvm.add %5741, %5749 : !llvm.i64 + %5751 = llvm.add %5750, %52 : !llvm.i64 + %5752 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5754 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5755 = llvm.mul %5739, %5754 : !llvm.i64 + %5756 = llvm.add %5753, %5755 : !llvm.i64 + %5757 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5758 = llvm.mul %67, %5757 : !llvm.i64 + %5759 = llvm.add %5756, %5758 : !llvm.i64 + %5760 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5761 = llvm.mul %5751, %5760 : !llvm.i64 + %5762 = llvm.add %5759, %5761 : !llvm.i64 + %5763 = llvm.getelementptr %5752[%5762] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5764 = llvm.load %5763 : !llvm.ptr> + %5765 = llvm.fadd %5728, %5764 : !llvm.vec<8 x float> + %5766 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5767 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5768 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5769 = llvm.mul %67, %5768 : !llvm.i64 + %5770 = llvm.add %5767, %5769 : !llvm.i64 + %5771 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5772 = llvm.mul %52, %5771 : !llvm.i64 + %5773 = llvm.add %5770, %5772 : !llvm.i64 + %5774 = llvm.getelementptr %5766[%5773] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5765, %5774 : !llvm.ptr> + %5775 = llvm.add %5449, %44 : !llvm.i64 + %5776 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5777 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5778 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5779 = llvm.mul %2345, %5778 : !llvm.i64 + %5780 = llvm.add %5777, %5779 : !llvm.i64 + %5781 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5782 = llvm.mul %5775, %5781 : !llvm.i64 + %5783 = llvm.add %5780, %5782 : !llvm.i64 + %5784 = llvm.getelementptr %5776[%5783] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5785 = llvm.bitcast %5784 : !llvm.ptr to !llvm.ptr> + %5786 = llvm.load %5785 {alignment = 4 : i64} : !llvm.ptr> + %5787 = llvm.add %5466, %45 : !llvm.i64 + %5788 = llvm.icmp "slt" %5787, %67 : !llvm.i64 + %5789 = llvm.sub %64, %5787 : !llvm.i64 + %5790 = llvm.select %5788, %5789, %5787 : !llvm.i1, !llvm.i64 + %5791 = llvm.sdiv %5790, %68 : !llvm.i64 + %5792 = llvm.sub %64, %5791 : !llvm.i64 + %5793 = llvm.select %5788, %5792, %5791 : !llvm.i1, !llvm.i64 + %5794 = llvm.mul %5793, %60 : !llvm.i64 + %5795 = llvm.add %5466, %5794 : !llvm.i64 + %5796 = llvm.add %5795, %45 : !llvm.i64 + %5797 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5798 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5799 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5800 = llvm.mul %5796, %5799 : !llvm.i64 + %5801 = llvm.add %5798, %5800 : !llvm.i64 + %5802 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5803 = llvm.mul %67, %5802 
: !llvm.i64 + %5804 = llvm.add %5801, %5803 : !llvm.i64 + %5805 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5806 = llvm.mul %5484, %5805 : !llvm.i64 + %5807 = llvm.add %5804, %5806 : !llvm.i64 + %5808 = llvm.getelementptr %5797[%5807] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5809 = llvm.load %5808 : !llvm.ptr> + %5810 = llvm.fadd %5786, %5809 : !llvm.vec<8 x float> + %5811 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5812 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5813 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5814 = llvm.mul %67, %5813 : !llvm.i64 + %5815 = llvm.add %5812, %5814 : !llvm.i64 + %5816 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5817 = llvm.mul %56, %5816 : !llvm.i64 + %5818 = llvm.add %5815, %5817 : !llvm.i64 + %5819 = llvm.getelementptr %5811[%5818] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5810, %5819 : !llvm.ptr> + %5820 = llvm.add %5449, %46 : !llvm.i64 + %5821 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5822 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5823 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5824 = llvm.mul %2345, %5823 : !llvm.i64 + %5825 = llvm.add %5822, %5824 : !llvm.i64 + %5826 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5827 = llvm.mul %5820, %5826 : !llvm.i64 + %5828 = llvm.add %5825, %5827 : !llvm.i64 + %5829 = llvm.getelementptr %5821[%5828] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5830 = llvm.bitcast %5829 : !llvm.ptr to !llvm.ptr> + %5831 = llvm.load %5830 {alignment = 4 : i64} : !llvm.ptr> + %5832 = llvm.add %5447, %46 : !llvm.i64 + %5833 = llvm.icmp "slt" %5832, %67 : !llvm.i64 + %5834 = llvm.sub %64, %5832 : !llvm.i64 + %5835 = llvm.select %5833, %5834, %5832 : !llvm.i1, !llvm.i64 + %5836 = llvm.sdiv %5835, %68 : !llvm.i64 + %5837 = llvm.sub %64, %5836 : !llvm.i64 + %5838 = llvm.select %5833, %5837, %5836 : !llvm.i1, !llvm.i64 + %5839 = llvm.srem %5838, %68 : !llvm.i64 + %5840 = llvm.icmp "slt" %5839, %67 : !llvm.i64 + %5841 = llvm.add %5839, %68 : !llvm.i64 + %5842 = llvm.select %5840, %5841, %5839 : !llvm.i1, !llvm.i64 + %5843 = llvm.mul %5838, %65 : !llvm.i64 + %5844 = llvm.add %5533, %5843 : !llvm.i64 + %5845 = llvm.add %5844, %61 : !llvm.i64 + %5846 = llvm.icmp "slt" %5845, %67 : !llvm.i64 + %5847 = llvm.sub %64, %5845 : !llvm.i64 + %5848 = llvm.select %5846, %5847, %5845 : !llvm.i1, !llvm.i64 + %5849 = llvm.sdiv %5848, %63 : !llvm.i64 + %5850 = llvm.sub %64, %5849 : !llvm.i64 + %5851 = llvm.select %5846, %5850, %5849 : !llvm.i1, !llvm.i64 + %5852 = llvm.mul %5851, %65 : !llvm.i64 + %5853 = llvm.add %5844, %5852 : !llvm.i64 + %5854 = llvm.add %5853, %61 : !llvm.i64 + %5855 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5857 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5858 = llvm.mul %5842, %5857 : !llvm.i64 + %5859 = llvm.add %5856, %5858 : !llvm.i64 + %5860 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5861 = llvm.mul %67, %5860 : !llvm.i64 + %5862 = llvm.add %5859, %5861 : !llvm.i64 + %5863 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5864 = llvm.mul %5854, %5863 : !llvm.i64 + %5865 = llvm.add %5862, %5864 : !llvm.i64 + %5866 = llvm.getelementptr %5855[%5865] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5867 = llvm.load %5866 : !llvm.ptr> + %5868 = llvm.fadd %5831, %5867 : !llvm.vec<8 x float> + %5869 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + 
%5870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5871 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5872 = llvm.mul %67, %5871 : !llvm.i64 + %5873 = llvm.add %5870, %5872 : !llvm.i64 + %5874 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5875 = llvm.mul %61, %5874 : !llvm.i64 + %5876 = llvm.add %5873, %5875 : !llvm.i64 + %5877 = llvm.getelementptr %5869[%5876] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5868, %5877 : !llvm.ptr> + %5878 = llvm.add %5449, %47 : !llvm.i64 + %5879 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5880 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5881 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5882 = llvm.mul %2345, %5881 : !llvm.i64 + %5883 = llvm.add %5880, %5882 : !llvm.i64 + %5884 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5885 = llvm.mul %5878, %5884 : !llvm.i64 + %5886 = llvm.add %5883, %5885 : !llvm.i64 + %5887 = llvm.getelementptr %5879[%5886] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5888 = llvm.bitcast %5887 : !llvm.ptr to !llvm.ptr> + %5889 = llvm.load %5888 {alignment = 4 : i64} : !llvm.ptr> + %5890 = llvm.add %5466, %48 : !llvm.i64 + %5891 = llvm.icmp "slt" %5890, %67 : !llvm.i64 + %5892 = llvm.sub %64, %5890 : !llvm.i64 + %5893 = llvm.select %5891, %5892, %5890 : !llvm.i1, !llvm.i64 + %5894 = llvm.sdiv %5893, %68 : !llvm.i64 + %5895 = llvm.sub %64, %5894 : !llvm.i64 + %5896 = llvm.select %5891, %5895, %5894 : !llvm.i1, !llvm.i64 + %5897 = llvm.mul %5896, %60 : !llvm.i64 + %5898 = llvm.add %5466, %5897 : !llvm.i64 + %5899 = llvm.add %5898, %48 : !llvm.i64 + %5900 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5901 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5902 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5903 = llvm.mul %5899, %5902 : !llvm.i64 + %5904 = llvm.add %5901, %5903 : !llvm.i64 + %5905 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5906 = llvm.mul %67, %5905 : !llvm.i64 + %5907 = llvm.add %5904, %5906 : !llvm.i64 + %5908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5909 = llvm.mul %5484, %5908 : !llvm.i64 + %5910 = llvm.add %5907, %5909 : !llvm.i64 + %5911 = llvm.getelementptr %5900[%5910] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5912 = llvm.load %5911 : !llvm.ptr> + %5913 = llvm.fadd %5889, %5912 : !llvm.vec<8 x float> + %5914 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5915 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5916 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5917 = llvm.mul %67, %5916 : !llvm.i64 + %5918 = llvm.add %5915, %5917 : !llvm.i64 + %5919 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5920 = llvm.mul %70, %5919 : !llvm.i64 + %5921 = llvm.add %5918, %5920 : !llvm.i64 + %5922 = llvm.getelementptr %5914[%5921] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5913, %5922 : !llvm.ptr> + %5923 = llvm.add %5449, %49 : !llvm.i64 + %5924 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5925 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5926 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5927 = llvm.mul %2345, %5926 : !llvm.i64 + %5928 = llvm.add %5925, %5927 : !llvm.i64 + %5929 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5930 = llvm.mul %5923, %5929 : !llvm.i64 + %5931 = llvm.add %5928, %5930 : !llvm.i64 + %5932 = llvm.getelementptr %5924[%5931] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5933 = llvm.bitcast %5932 : !llvm.ptr to !llvm.ptr> + %5934 = llvm.load %5933 {alignment = 4 : i64} : 
!llvm.ptr> + %5935 = llvm.add %5447, %49 : !llvm.i64 + %5936 = llvm.icmp "slt" %5935, %67 : !llvm.i64 + %5937 = llvm.sub %64, %5935 : !llvm.i64 + %5938 = llvm.select %5936, %5937, %5935 : !llvm.i1, !llvm.i64 + %5939 = llvm.sdiv %5938, %68 : !llvm.i64 + %5940 = llvm.sub %64, %5939 : !llvm.i64 + %5941 = llvm.select %5936, %5940, %5939 : !llvm.i1, !llvm.i64 + %5942 = llvm.srem %5941, %68 : !llvm.i64 + %5943 = llvm.icmp "slt" %5942, %67 : !llvm.i64 + %5944 = llvm.add %5942, %68 : !llvm.i64 + %5945 = llvm.select %5943, %5944, %5942 : !llvm.i1, !llvm.i64 + %5946 = llvm.mul %5941, %65 : !llvm.i64 + %5947 = llvm.add %5533, %5946 : !llvm.i64 + %5948 = llvm.add %5947, %50 : !llvm.i64 + %5949 = llvm.icmp "slt" %5948, %67 : !llvm.i64 + %5950 = llvm.sub %64, %5948 : !llvm.i64 + %5951 = llvm.select %5949, %5950, %5948 : !llvm.i1, !llvm.i64 + %5952 = llvm.sdiv %5951, %63 : !llvm.i64 + %5953 = llvm.sub %64, %5952 : !llvm.i64 + %5954 = llvm.select %5949, %5953, %5952 : !llvm.i1, !llvm.i64 + %5955 = llvm.mul %5954, %65 : !llvm.i64 + %5956 = llvm.add %5947, %5955 : !llvm.i64 + %5957 = llvm.add %5956, %50 : !llvm.i64 + %5958 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5960 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5961 = llvm.mul %5945, %5960 : !llvm.i64 + %5962 = llvm.add %5959, %5961 : !llvm.i64 + %5963 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5964 = llvm.mul %67, %5963 : !llvm.i64 + %5965 = llvm.add %5962, %5964 : !llvm.i64 + %5966 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5967 = llvm.mul %5957, %5966 : !llvm.i64 + %5968 = llvm.add %5965, %5967 : !llvm.i64 + %5969 = llvm.getelementptr %5958[%5968] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5970 = llvm.load %5969 : !llvm.ptr> + %5971 = llvm.fadd %5934, %5970 : !llvm.vec<8 x float> + %5972 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %5973 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5974 = llvm.mlir.constant(16 : index) : !llvm.i64 + %5975 = llvm.mul %67, %5974 : !llvm.i64 + %5976 = llvm.add %5973, %5975 : !llvm.i64 + %5977 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5978 = llvm.mul %50, %5977 : !llvm.i64 + %5979 = llvm.add %5976, %5978 : !llvm.i64 + %5980 = llvm.getelementptr %5972[%5979] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5971, %5980 : !llvm.ptr> + %5981 = llvm.add %5449, %51 : !llvm.i64 + %5982 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5984 = llvm.mlir.constant(512 : index) : !llvm.i64 + %5985 = llvm.mul %2345, %5984 : !llvm.i64 + %5986 = llvm.add %5983, %5985 : !llvm.i64 + %5987 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5988 = llvm.mul %5981, %5987 : !llvm.i64 + %5989 = llvm.add %5986, %5988 : !llvm.i64 + %5990 = llvm.getelementptr %5982[%5989] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %5991 = llvm.bitcast %5990 : !llvm.ptr to !llvm.ptr> + %5992 = llvm.load %5991 {alignment = 4 : i64} : !llvm.ptr> + %5993 = llvm.add %5466, %52 : !llvm.i64 + %5994 = llvm.icmp "slt" %5993, %67 : !llvm.i64 + %5995 = llvm.sub %64, %5993 : !llvm.i64 + %5996 = llvm.select %5994, %5995, %5993 : !llvm.i1, !llvm.i64 + %5997 = llvm.sdiv %5996, %68 : !llvm.i64 + %5998 = llvm.sub %64, %5997 : !llvm.i64 + %5999 = llvm.select %5994, %5998, %5997 : !llvm.i1, !llvm.i64 + %6000 = llvm.mul %5999, %60 : !llvm.i64 + %6001 = llvm.add %5466, %6000 : !llvm.i64 + %6002 = llvm.add %6001, %52 : 
!llvm.i64 + %6003 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6004 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6005 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6006 = llvm.mul %6002, %6005 : !llvm.i64 + %6007 = llvm.add %6004, %6006 : !llvm.i64 + %6008 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6009 = llvm.mul %67, %6008 : !llvm.i64 + %6010 = llvm.add %6007, %6009 : !llvm.i64 + %6011 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6012 = llvm.mul %5484, %6011 : !llvm.i64 + %6013 = llvm.add %6010, %6012 : !llvm.i64 + %6014 = llvm.getelementptr %6003[%6013] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6015 = llvm.load %6014 : !llvm.ptr> + %6016 = llvm.fadd %5992, %6015 : !llvm.vec<8 x float> + %6017 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6018 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6019 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6020 = llvm.mul %67, %6019 : !llvm.i64 + %6021 = llvm.add %6018, %6020 : !llvm.i64 + %6022 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6023 = llvm.mul %33, %6022 : !llvm.i64 + %6024 = llvm.add %6021, %6023 : !llvm.i64 + %6025 = llvm.getelementptr %6017[%6024] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6016, %6025 : !llvm.ptr> + %6026 = llvm.add %5449, %53 : !llvm.i64 + %6027 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6028 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6029 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6030 = llvm.mul %2345, %6029 : !llvm.i64 + %6031 = llvm.add %6028, %6030 : !llvm.i64 + %6032 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6033 = llvm.mul %6026, %6032 : !llvm.i64 + %6034 = llvm.add %6031, %6033 : !llvm.i64 + %6035 = llvm.getelementptr %6027[%6034] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6036 = llvm.bitcast %6035 : !llvm.ptr to !llvm.ptr> + %6037 = llvm.load %6036 {alignment = 4 : i64} : !llvm.ptr> + %6038 = llvm.add %5447, %53 : !llvm.i64 + %6039 = llvm.icmp "slt" %6038, %67 : !llvm.i64 + %6040 = llvm.sub %64, %6038 : !llvm.i64 + %6041 = llvm.select %6039, %6040, %6038 : !llvm.i1, !llvm.i64 + %6042 = llvm.sdiv %6041, %68 : !llvm.i64 + %6043 = llvm.sub %64, %6042 : !llvm.i64 + %6044 = llvm.select %6039, %6043, %6042 : !llvm.i1, !llvm.i64 + %6045 = llvm.srem %6044, %68 : !llvm.i64 + %6046 = llvm.icmp "slt" %6045, %67 : !llvm.i64 + %6047 = llvm.add %6045, %68 : !llvm.i64 + %6048 = llvm.select %6046, %6047, %6045 : !llvm.i1, !llvm.i64 + %6049 = llvm.mul %6044, %65 : !llvm.i64 + %6050 = llvm.add %5533, %6049 : !llvm.i64 + %6051 = llvm.add %6050, %54 : !llvm.i64 + %6052 = llvm.icmp "slt" %6051, %67 : !llvm.i64 + %6053 = llvm.sub %64, %6051 : !llvm.i64 + %6054 = llvm.select %6052, %6053, %6051 : !llvm.i1, !llvm.i64 + %6055 = llvm.sdiv %6054, %63 : !llvm.i64 + %6056 = llvm.sub %64, %6055 : !llvm.i64 + %6057 = llvm.select %6052, %6056, %6055 : !llvm.i1, !llvm.i64 + %6058 = llvm.mul %6057, %65 : !llvm.i64 + %6059 = llvm.add %6050, %6058 : !llvm.i64 + %6060 = llvm.add %6059, %54 : !llvm.i64 + %6061 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6062 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6063 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6064 = llvm.mul %6048, %6063 : !llvm.i64 + %6065 = llvm.add %6062, %6064 : !llvm.i64 + %6066 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6067 = llvm.mul %67, %6066 : !llvm.i64 + %6068 = llvm.add %6065, %6067 : !llvm.i64 + %6069 = llvm.mlir.constant(1 : 
index) : !llvm.i64 + %6070 = llvm.mul %6060, %6069 : !llvm.i64 + %6071 = llvm.add %6068, %6070 : !llvm.i64 + %6072 = llvm.getelementptr %6061[%6071] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6073 = llvm.load %6072 : !llvm.ptr> + %6074 = llvm.fadd %6037, %6073 : !llvm.vec<8 x float> + %6075 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6076 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6077 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6078 = llvm.mul %67, %6077 : !llvm.i64 + %6079 = llvm.add %6076, %6078 : !llvm.i64 + %6080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6081 = llvm.mul %54, %6080 : !llvm.i64 + %6082 = llvm.add %6079, %6081 : !llvm.i64 + %6083 = llvm.getelementptr %6075[%6082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6074, %6083 : !llvm.ptr> + %6084 = llvm.add %5449, %55 : !llvm.i64 + %6085 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6086 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6087 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6088 = llvm.mul %2345, %6087 : !llvm.i64 + %6089 = llvm.add %6086, %6088 : !llvm.i64 + %6090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6091 = llvm.mul %6084, %6090 : !llvm.i64 + %6092 = llvm.add %6089, %6091 : !llvm.i64 + %6093 = llvm.getelementptr %6085[%6092] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6094 = llvm.bitcast %6093 : !llvm.ptr to !llvm.ptr> + %6095 = llvm.load %6094 {alignment = 4 : i64} : !llvm.ptr> + %6096 = llvm.add %5466, %56 : !llvm.i64 + %6097 = llvm.icmp "slt" %6096, %67 : !llvm.i64 + %6098 = llvm.sub %64, %6096 : !llvm.i64 + %6099 = llvm.select %6097, %6098, %6096 : !llvm.i1, !llvm.i64 + %6100 = llvm.sdiv %6099, %68 : !llvm.i64 + %6101 = llvm.sub %64, %6100 : !llvm.i64 + %6102 = llvm.select %6097, %6101, %6100 : !llvm.i1, !llvm.i64 + %6103 = llvm.mul %6102, %60 : !llvm.i64 + %6104 = llvm.add %5466, %6103 : !llvm.i64 + %6105 = llvm.add %6104, %56 : !llvm.i64 + %6106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6109 = llvm.mul %6105, %6108 : !llvm.i64 + %6110 = llvm.add %6107, %6109 : !llvm.i64 + %6111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6112 = llvm.mul %67, %6111 : !llvm.i64 + %6113 = llvm.add %6110, %6112 : !llvm.i64 + %6114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6115 = llvm.mul %5484, %6114 : !llvm.i64 + %6116 = llvm.add %6113, %6115 : !llvm.i64 + %6117 = llvm.getelementptr %6106[%6116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6118 = llvm.load %6117 : !llvm.ptr> + %6119 = llvm.fadd %6095, %6118 : !llvm.vec<8 x float> + %6120 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6122 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6123 = llvm.mul %67, %6122 : !llvm.i64 + %6124 = llvm.add %6121, %6123 : !llvm.i64 + %6125 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6126 = llvm.mul %34, %6125 : !llvm.i64 + %6127 = llvm.add %6124, %6126 : !llvm.i64 + %6128 = llvm.getelementptr %6120[%6127] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6119, %6128 : !llvm.ptr> + %6129 = llvm.add %5449, %57 : !llvm.i64 + %6130 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6131 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6132 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6133 = llvm.mul 
%2345, %6132 : !llvm.i64 + %6134 = llvm.add %6131, %6133 : !llvm.i64 + %6135 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6136 = llvm.mul %6129, %6135 : !llvm.i64 + %6137 = llvm.add %6134, %6136 : !llvm.i64 + %6138 = llvm.getelementptr %6130[%6137] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6139 = llvm.bitcast %6138 : !llvm.ptr to !llvm.ptr> + %6140 = llvm.load %6139 {alignment = 4 : i64} : !llvm.ptr> + %6141 = llvm.add %5447, %57 : !llvm.i64 + %6142 = llvm.icmp "slt" %6141, %67 : !llvm.i64 + %6143 = llvm.sub %64, %6141 : !llvm.i64 + %6144 = llvm.select %6142, %6143, %6141 : !llvm.i1, !llvm.i64 + %6145 = llvm.sdiv %6144, %68 : !llvm.i64 + %6146 = llvm.sub %64, %6145 : !llvm.i64 + %6147 = llvm.select %6142, %6146, %6145 : !llvm.i1, !llvm.i64 + %6148 = llvm.srem %6147, %68 : !llvm.i64 + %6149 = llvm.icmp "slt" %6148, %67 : !llvm.i64 + %6150 = llvm.add %6148, %68 : !llvm.i64 + %6151 = llvm.select %6149, %6150, %6148 : !llvm.i1, !llvm.i64 + %6152 = llvm.mul %6147, %65 : !llvm.i64 + %6153 = llvm.add %5533, %6152 : !llvm.i64 + %6154 = llvm.add %6153, %58 : !llvm.i64 + %6155 = llvm.icmp "slt" %6154, %67 : !llvm.i64 + %6156 = llvm.sub %64, %6154 : !llvm.i64 + %6157 = llvm.select %6155, %6156, %6154 : !llvm.i1, !llvm.i64 + %6158 = llvm.sdiv %6157, %63 : !llvm.i64 + %6159 = llvm.sub %64, %6158 : !llvm.i64 + %6160 = llvm.select %6155, %6159, %6158 : !llvm.i1, !llvm.i64 + %6161 = llvm.mul %6160, %65 : !llvm.i64 + %6162 = llvm.add %6153, %6161 : !llvm.i64 + %6163 = llvm.add %6162, %58 : !llvm.i64 + %6164 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6165 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6166 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6167 = llvm.mul %6151, %6166 : !llvm.i64 + %6168 = llvm.add %6165, %6167 : !llvm.i64 + %6169 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6170 = llvm.mul %67, %6169 : !llvm.i64 + %6171 = llvm.add %6168, %6170 : !llvm.i64 + %6172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6173 = llvm.mul %6163, %6172 : !llvm.i64 + %6174 = llvm.add %6171, %6173 : !llvm.i64 + %6175 = llvm.getelementptr %6164[%6174] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6176 = llvm.load %6175 : !llvm.ptr> + %6177 = llvm.fadd %6140, %6176 : !llvm.vec<8 x float> + %6178 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6179 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6180 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6181 = llvm.mul %67, %6180 : !llvm.i64 + %6182 = llvm.add %6179, %6181 : !llvm.i64 + %6183 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6184 = llvm.mul %58, %6183 : !llvm.i64 + %6185 = llvm.add %6182, %6184 : !llvm.i64 + %6186 = llvm.getelementptr %6178[%6185] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6177, %6186 : !llvm.ptr> + %6187 = llvm.add %5449, %59 : !llvm.i64 + %6188 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6190 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6191 = llvm.mul %2345, %6190 : !llvm.i64 + %6192 = llvm.add %6189, %6191 : !llvm.i64 + %6193 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6194 = llvm.mul %6187, %6193 : !llvm.i64 + %6195 = llvm.add %6192, %6194 : !llvm.i64 + %6196 = llvm.getelementptr %6188[%6195] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6197 = llvm.bitcast %6196 : !llvm.ptr to !llvm.ptr> + %6198 = llvm.load %6197 {alignment = 4 : i64} : !llvm.ptr> + %6199 = llvm.add %5466, %61 : !llvm.i64 + %6200 = llvm.icmp "slt" 
%6199, %67 : !llvm.i64 + %6201 = llvm.sub %64, %6199 : !llvm.i64 + %6202 = llvm.select %6200, %6201, %6199 : !llvm.i1, !llvm.i64 + %6203 = llvm.sdiv %6202, %68 : !llvm.i64 + %6204 = llvm.sub %64, %6203 : !llvm.i64 + %6205 = llvm.select %6200, %6204, %6203 : !llvm.i1, !llvm.i64 + %6206 = llvm.mul %6205, %60 : !llvm.i64 + %6207 = llvm.add %5466, %6206 : !llvm.i64 + %6208 = llvm.add %6207, %61 : !llvm.i64 + %6209 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6211 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6212 = llvm.mul %6208, %6211 : !llvm.i64 + %6213 = llvm.add %6210, %6212 : !llvm.i64 + %6214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6215 = llvm.mul %67, %6214 : !llvm.i64 + %6216 = llvm.add %6213, %6215 : !llvm.i64 + %6217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6218 = llvm.mul %5484, %6217 : !llvm.i64 + %6219 = llvm.add %6216, %6218 : !llvm.i64 + %6220 = llvm.getelementptr %6209[%6219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6221 = llvm.load %6220 : !llvm.ptr> + %6222 = llvm.fadd %6198, %6221 : !llvm.vec<8 x float> + %6223 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6224 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6225 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6226 = llvm.mul %67, %6225 : !llvm.i64 + %6227 = llvm.add %6224, %6226 : !llvm.i64 + %6228 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6229 = llvm.mul %35, %6228 : !llvm.i64 + %6230 = llvm.add %6227, %6229 : !llvm.i64 + %6231 = llvm.getelementptr %6223[%6230] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6222, %6231 : !llvm.ptr> + %6232 = llvm.add %5449, %62 : !llvm.i64 + %6233 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6234 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6235 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6236 = llvm.mul %2345, %6235 : !llvm.i64 + %6237 = llvm.add %6234, %6236 : !llvm.i64 + %6238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6239 = llvm.mul %6232, %6238 : !llvm.i64 + %6240 = llvm.add %6237, %6239 : !llvm.i64 + %6241 = llvm.getelementptr %6233[%6240] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6242 = llvm.bitcast %6241 : !llvm.ptr to !llvm.ptr> + %6243 = llvm.load %6242 {alignment = 4 : i64} : !llvm.ptr> + %6244 = llvm.add %5447, %62 : !llvm.i64 + %6245 = llvm.icmp "slt" %6244, %67 : !llvm.i64 + %6246 = llvm.sub %64, %6244 : !llvm.i64 + %6247 = llvm.select %6245, %6246, %6244 : !llvm.i1, !llvm.i64 + %6248 = llvm.sdiv %6247, %68 : !llvm.i64 + %6249 = llvm.sub %64, %6248 : !llvm.i64 + %6250 = llvm.select %6245, %6249, %6248 : !llvm.i1, !llvm.i64 + %6251 = llvm.srem %6250, %68 : !llvm.i64 + %6252 = llvm.icmp "slt" %6251, %67 : !llvm.i64 + %6253 = llvm.add %6251, %68 : !llvm.i64 + %6254 = llvm.select %6252, %6253, %6251 : !llvm.i1, !llvm.i64 + %6255 = llvm.mul %6250, %65 : !llvm.i64 + %6256 = llvm.add %5533, %6255 : !llvm.i64 + %6257 = llvm.add %6256, %66 : !llvm.i64 + %6258 = llvm.icmp "slt" %6257, %67 : !llvm.i64 + %6259 = llvm.sub %64, %6257 : !llvm.i64 + %6260 = llvm.select %6258, %6259, %6257 : !llvm.i1, !llvm.i64 + %6261 = llvm.sdiv %6260, %63 : !llvm.i64 + %6262 = llvm.sub %64, %6261 : !llvm.i64 + %6263 = llvm.select %6258, %6262, %6261 : !llvm.i1, !llvm.i64 + %6264 = llvm.mul %6263, %65 : !llvm.i64 + %6265 = llvm.add %6256, %6264 : !llvm.i64 + %6266 = llvm.add %6265, %66 : !llvm.i64 + %6267 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, 
array<3 x i64>, array<3 x i64>)> + %6268 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6269 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6270 = llvm.mul %6254, %6269 : !llvm.i64 + %6271 = llvm.add %6268, %6270 : !llvm.i64 + %6272 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6273 = llvm.mul %67, %6272 : !llvm.i64 + %6274 = llvm.add %6271, %6273 : !llvm.i64 + %6275 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6276 = llvm.mul %6266, %6275 : !llvm.i64 + %6277 = llvm.add %6274, %6276 : !llvm.i64 + %6278 = llvm.getelementptr %6267[%6277] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6279 = llvm.load %6278 : !llvm.ptr> + %6280 = llvm.fadd %6243, %6279 : !llvm.vec<8 x float> + %6281 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6282 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6283 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6284 = llvm.mul %67, %6283 : !llvm.i64 + %6285 = llvm.add %6282, %6284 : !llvm.i64 + %6286 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6287 = llvm.mul %66, %6286 : !llvm.i64 + %6288 = llvm.add %6285, %6287 : !llvm.i64 + %6289 = llvm.getelementptr %6281[%6288] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6280, %6289 : !llvm.ptr> + llvm.br ^bb44(%67 : !llvm.i64) + ^bb44(%6290: !llvm.i64): // 2 preds: ^bb43, ^bb45 + %6291 = llvm.icmp "slt" %6290, %68 : !llvm.i64 + llvm.cond_br %6291, ^bb45, ^bb46 + ^bb45: // pred: ^bb44 + %6292 = llvm.mul %6290, %70 : !llvm.i64 + %6293 = llvm.add %5449, %6292 : !llvm.i64 + %6294 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6295 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6296 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6297 = llvm.mul %67, %6296 : !llvm.i64 + %6298 = llvm.add %6295, %6297 : !llvm.i64 + %6299 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6300 = llvm.mul %6290, %6299 : !llvm.i64 + %6301 = llvm.add %6298, %6300 : !llvm.i64 + %6302 = llvm.getelementptr %6294[%6301] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6303 = llvm.load %6302 : !llvm.ptr> + %6304 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6305 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6306 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6307 = llvm.mul %2345, %6306 : !llvm.i64 + %6308 = llvm.add %6305, %6307 : !llvm.i64 + %6309 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6310 = llvm.mul %6293, %6309 : !llvm.i64 + %6311 = llvm.add %6308, %6310 : !llvm.i64 + %6312 = llvm.getelementptr %6304[%6311] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6313 = llvm.bitcast %6312 : !llvm.ptr to !llvm.ptr> + llvm.store %6303, %6313 {alignment = 4 : i64} : !llvm.ptr> + %6314 = llvm.add %6290, %69 : !llvm.i64 + llvm.br ^bb44(%6314 : !llvm.i64) + ^bb46: // 2 preds: ^bb44, ^bb48 + llvm.br ^bb50 + ^bb47: // pred: ^bb42 + %6315 = llvm.add %151, %5447 : !llvm.i64 + %6316 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6318 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6319 = llvm.mul %2345, %6318 : !llvm.i64 + %6320 = llvm.add %6317, %6319 : !llvm.i64 + %6321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6322 = llvm.mul %6315, %6321 : !llvm.i64 + %6323 = llvm.add %6320, %6322 : !llvm.i64 + %6324 = llvm.getelementptr %6316[%6323] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6325 = llvm.bitcast %6324 : !llvm.ptr to !llvm.ptr> + %6326 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6327 = llvm.mlir.constant(dense<[0, 1, 
2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6328 = llvm.trunc %6315 : !llvm.i64 to !llvm.i32 + %6329 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6330 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6331 = llvm.insertelement %6328, %6329[%6330 : !llvm.i32] : !llvm.vec<8 x i32> + %6332 = llvm.shufflevector %6331, %6329 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6333 = llvm.add %6332, %6327 : !llvm.vec<8 x i32> + %6334 = llvm.trunc %6326 : !llvm.i64 to !llvm.i32 + %6335 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6336 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6337 = llvm.insertelement %6334, %6335[%6336 : !llvm.i32] : !llvm.vec<8 x i32> + %6338 = llvm.shufflevector %6337, %6335 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6339 = llvm.icmp "slt" %6333, %6338 : !llvm.vec<8 x i32> + %6340 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6341 = llvm.intr.masked.load %6325, %6339, %6340 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6342 = llvm.icmp "slt" %5447, %67 : !llvm.i64 + %6343 = llvm.sub %64, %5447 : !llvm.i64 + %6344 = llvm.select %6342, %6343, %5447 : !llvm.i1, !llvm.i64 + %6345 = llvm.sdiv %6344, %68 : !llvm.i64 + %6346 = llvm.sub %64, %6345 : !llvm.i64 + %6347 = llvm.select %6342, %6346, %6345 : !llvm.i1, !llvm.i64 + %6348 = llvm.srem %6347, %68 : !llvm.i64 + %6349 = llvm.icmp "slt" %6348, %67 : !llvm.i64 + %6350 = llvm.add %6348, %68 : !llvm.i64 + %6351 = llvm.select %6349, %6350, %6348 : !llvm.i1, !llvm.i64 + %6352 = llvm.srem %5447, %68 : !llvm.i64 + %6353 = llvm.icmp "slt" %6352, %67 : !llvm.i64 + %6354 = llvm.add %6352, %68 : !llvm.i64 + %6355 = llvm.select %6353, %6354, %6352 : !llvm.i1, !llvm.i64 + %6356 = llvm.icmp "slt" %6355, %67 : !llvm.i64 + %6357 = llvm.sub %64, %6355 : !llvm.i64 + %6358 = llvm.select %6356, %6357, %6355 : !llvm.i1, !llvm.i64 + %6359 = llvm.sdiv %6358, %70 : !llvm.i64 + %6360 = llvm.sub %64, %6359 : !llvm.i64 + %6361 = llvm.select %6356, %6360, %6359 : !llvm.i1, !llvm.i64 + %6362 = llvm.srem %6361, %63 : !llvm.i64 + %6363 = llvm.icmp "slt" %6362, %67 : !llvm.i64 + %6364 = llvm.add %6362, %63 : !llvm.i64 + %6365 = llvm.select %6363, %6364, %6362 : !llvm.i1, !llvm.i64 + %6366 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6367 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6368 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6369 = llvm.mul %6351, %6368 : !llvm.i64 + %6370 = llvm.add %6367, %6369 : !llvm.i64 + %6371 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6372 = llvm.mul %67, %6371 : !llvm.i64 + %6373 = llvm.add %6370, %6372 : !llvm.i64 + %6374 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6375 = llvm.mul %6365, %6374 : !llvm.i64 + %6376 = llvm.add %6373, %6375 : !llvm.i64 + %6377 = llvm.getelementptr %6366[%6376] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6378 = llvm.load %6377 : !llvm.ptr> + %6379 = llvm.fadd %6341, %6378 : !llvm.vec<8 x float> + %6380 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6381 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6382 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6383 = llvm.mul %67, %6382 : !llvm.i64 + %6384 = llvm.add %6381, %6383 : !llvm.i64 + %6385 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6386 = llvm.mul %67, %6385 : !llvm.i64 + %6387 = llvm.add %6384, 
%6386 : !llvm.i64 + %6388 = llvm.getelementptr %6380[%6387] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6379, %6388 : !llvm.ptr> + %6389 = llvm.add %6315, %70 : !llvm.i64 + %6390 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6392 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6393 = llvm.mul %2345, %6392 : !llvm.i64 + %6394 = llvm.add %6391, %6393 : !llvm.i64 + %6395 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6396 = llvm.mul %6389, %6395 : !llvm.i64 + %6397 = llvm.add %6394, %6396 : !llvm.i64 + %6398 = llvm.getelementptr %6390[%6397] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6399 = llvm.bitcast %6398 : !llvm.ptr to !llvm.ptr> + %6400 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6401 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6402 = llvm.trunc %6389 : !llvm.i64 to !llvm.i32 + %6403 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6404 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6405 = llvm.insertelement %6402, %6403[%6404 : !llvm.i32] : !llvm.vec<8 x i32> + %6406 = llvm.shufflevector %6405, %6403 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6407 = llvm.add %6406, %6401 : !llvm.vec<8 x i32> + %6408 = llvm.trunc %6400 : !llvm.i64 to !llvm.i32 + %6409 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6410 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6411 = llvm.insertelement %6408, %6409[%6410 : !llvm.i32] : !llvm.vec<8 x i32> + %6412 = llvm.shufflevector %6411, %6409 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6413 = llvm.icmp "slt" %6407, %6412 : !llvm.vec<8 x i32> + %6414 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6415 = llvm.intr.masked.load %6399, %6413, %6414 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6416 = llvm.add %5447, %70 : !llvm.i64 + %6417 = llvm.icmp "slt" %6416, %67 : !llvm.i64 + %6418 = llvm.sub %64, %6416 : !llvm.i64 + %6419 = llvm.select %6417, %6418, %6416 : !llvm.i1, !llvm.i64 + %6420 = llvm.sdiv %6419, %68 : !llvm.i64 + %6421 = llvm.sub %64, %6420 : !llvm.i64 + %6422 = llvm.select %6417, %6421, %6420 : !llvm.i1, !llvm.i64 + %6423 = llvm.srem %6422, %68 : !llvm.i64 + %6424 = llvm.icmp "slt" %6423, %67 : !llvm.i64 + %6425 = llvm.add %6423, %68 : !llvm.i64 + %6426 = llvm.select %6424, %6425, %6423 : !llvm.i1, !llvm.i64 + %6427 = llvm.sdiv %6344, %70 : !llvm.i64 + %6428 = llvm.sub %64, %6427 : !llvm.i64 + %6429 = llvm.select %6342, %6428, %6427 : !llvm.i1, !llvm.i64 + %6430 = llvm.mul %6422, %65 : !llvm.i64 + %6431 = llvm.add %6429, %6430 : !llvm.i64 + %6432 = llvm.add %6431, %69 : !llvm.i64 + %6433 = llvm.icmp "slt" %6432, %67 : !llvm.i64 + %6434 = llvm.sub %64, %6432 : !llvm.i64 + %6435 = llvm.select %6433, %6434, %6432 : !llvm.i1, !llvm.i64 + %6436 = llvm.sdiv %6435, %63 : !llvm.i64 + %6437 = llvm.sub %64, %6436 : !llvm.i64 + %6438 = llvm.select %6433, %6437, %6436 : !llvm.i1, !llvm.i64 + %6439 = llvm.mul %6438, %65 : !llvm.i64 + %6440 = llvm.add %6431, %6439 : !llvm.i64 + %6441 = llvm.add %6440, %69 : !llvm.i64 + %6442 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6444 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6445 = llvm.mul %6426, %6444 : !llvm.i64 + %6446 = llvm.add 
%6443, %6445 : !llvm.i64 + %6447 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6448 = llvm.mul %67, %6447 : !llvm.i64 + %6449 = llvm.add %6446, %6448 : !llvm.i64 + %6450 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6451 = llvm.mul %6441, %6450 : !llvm.i64 + %6452 = llvm.add %6449, %6451 : !llvm.i64 + %6453 = llvm.getelementptr %6442[%6452] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6454 = llvm.load %6453 : !llvm.ptr> + %6455 = llvm.fadd %6415, %6454 : !llvm.vec<8 x float> + %6456 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6457 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6458 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6459 = llvm.mul %67, %6458 : !llvm.i64 + %6460 = llvm.add %6457, %6459 : !llvm.i64 + %6461 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6462 = llvm.mul %69, %6461 : !llvm.i64 + %6463 = llvm.add %6460, %6462 : !llvm.i64 + %6464 = llvm.getelementptr %6456[%6463] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6455, %6464 : !llvm.ptr> + %6465 = llvm.add %6315, %68 : !llvm.i64 + %6466 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6467 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6468 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6469 = llvm.mul %2345, %6468 : !llvm.i64 + %6470 = llvm.add %6467, %6469 : !llvm.i64 + %6471 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6472 = llvm.mul %6465, %6471 : !llvm.i64 + %6473 = llvm.add %6470, %6472 : !llvm.i64 + %6474 = llvm.getelementptr %6466[%6473] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6475 = llvm.bitcast %6474 : !llvm.ptr to !llvm.ptr> + %6476 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6477 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6478 = llvm.trunc %6465 : !llvm.i64 to !llvm.i32 + %6479 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6480 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6481 = llvm.insertelement %6478, %6479[%6480 : !llvm.i32] : !llvm.vec<8 x i32> + %6482 = llvm.shufflevector %6481, %6479 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6483 = llvm.add %6482, %6477 : !llvm.vec<8 x i32> + %6484 = llvm.trunc %6476 : !llvm.i64 to !llvm.i32 + %6485 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6486 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6487 = llvm.insertelement %6484, %6485[%6486 : !llvm.i32] : !llvm.vec<8 x i32> + %6488 = llvm.shufflevector %6487, %6485 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6489 = llvm.icmp "slt" %6483, %6488 : !llvm.vec<8 x i32> + %6490 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6491 = llvm.intr.masked.load %6475, %6489, %6490 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6492 = llvm.add %6347, %69 : !llvm.i64 + %6493 = llvm.icmp "slt" %6492, %67 : !llvm.i64 + %6494 = llvm.sub %64, %6492 : !llvm.i64 + %6495 = llvm.select %6493, %6494, %6492 : !llvm.i1, !llvm.i64 + %6496 = llvm.sdiv %6495, %68 : !llvm.i64 + %6497 = llvm.sub %64, %6496 : !llvm.i64 + %6498 = llvm.select %6493, %6497, %6496 : !llvm.i1, !llvm.i64 + %6499 = llvm.mul %6498, %60 : !llvm.i64 + %6500 = llvm.add %6347, %6499 : !llvm.i64 + %6501 = llvm.add %6500, %69 : !llvm.i64 + %6502 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6503 = llvm.mlir.constant(0 : index) : !llvm.i64 + 
%6504 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6505 = llvm.mul %6501, %6504 : !llvm.i64 + %6506 = llvm.add %6503, %6505 : !llvm.i64 + %6507 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6508 = llvm.mul %67, %6507 : !llvm.i64 + %6509 = llvm.add %6506, %6508 : !llvm.i64 + %6510 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6511 = llvm.mul %6365, %6510 : !llvm.i64 + %6512 = llvm.add %6509, %6511 : !llvm.i64 + %6513 = llvm.getelementptr %6502[%6512] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6514 = llvm.load %6513 : !llvm.ptr> + %6515 = llvm.fadd %6491, %6514 : !llvm.vec<8 x float> + %6516 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6518 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6519 = llvm.mul %67, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6522 = llvm.mul %63, %6521 : !llvm.i64 + %6523 = llvm.add %6520, %6522 : !llvm.i64 + %6524 = llvm.getelementptr %6516[%6523] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6515, %6524 : !llvm.ptr> + %6525 = llvm.add %6315, %41 : !llvm.i64 + %6526 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6529 = llvm.mul %2345, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6532 = llvm.mul %6525, %6531 : !llvm.i64 + %6533 = llvm.add %6530, %6532 : !llvm.i64 + %6534 = llvm.getelementptr %6526[%6533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6535 = llvm.bitcast %6534 : !llvm.ptr to !llvm.ptr> + %6536 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6537 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6538 = llvm.trunc %6525 : !llvm.i64 to !llvm.i32 + %6539 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6540 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6541 = llvm.insertelement %6538, %6539[%6540 : !llvm.i32] : !llvm.vec<8 x i32> + %6542 = llvm.shufflevector %6541, %6539 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6543 = llvm.add %6542, %6537 : !llvm.vec<8 x i32> + %6544 = llvm.trunc %6536 : !llvm.i64 to !llvm.i32 + %6545 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6546 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6547 = llvm.insertelement %6544, %6545[%6546 : !llvm.i32] : !llvm.vec<8 x i32> + %6548 = llvm.shufflevector %6547, %6545 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6549 = llvm.icmp "slt" %6543, %6548 : !llvm.vec<8 x i32> + %6550 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6551 = llvm.intr.masked.load %6535, %6549, %6550 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6552 = llvm.add %5447, %41 : !llvm.i64 + %6553 = llvm.icmp "slt" %6552, %67 : !llvm.i64 + %6554 = llvm.sub %64, %6552 : !llvm.i64 + %6555 = llvm.select %6553, %6554, %6552 : !llvm.i1, !llvm.i64 + %6556 = llvm.sdiv %6555, %68 : !llvm.i64 + %6557 = llvm.sub %64, %6556 : !llvm.i64 + %6558 = llvm.select %6553, %6557, %6556 : !llvm.i1, !llvm.i64 + %6559 = llvm.srem %6558, %68 : !llvm.i64 + %6560 = llvm.icmp "slt" %6559, %67 : !llvm.i64 + %6561 = llvm.add %6559, %68 : !llvm.i64 + %6562 = llvm.select %6560, %6561, 
%6559 : !llvm.i1, !llvm.i64 + %6563 = llvm.mul %6558, %65 : !llvm.i64 + %6564 = llvm.add %6429, %6563 : !llvm.i64 + %6565 = llvm.add %6564, %45 : !llvm.i64 + %6566 = llvm.icmp "slt" %6565, %67 : !llvm.i64 + %6567 = llvm.sub %64, %6565 : !llvm.i64 + %6568 = llvm.select %6566, %6567, %6565 : !llvm.i1, !llvm.i64 + %6569 = llvm.sdiv %6568, %63 : !llvm.i64 + %6570 = llvm.sub %64, %6569 : !llvm.i64 + %6571 = llvm.select %6566, %6570, %6569 : !llvm.i1, !llvm.i64 + %6572 = llvm.mul %6571, %65 : !llvm.i64 + %6573 = llvm.add %6564, %6572 : !llvm.i64 + %6574 = llvm.add %6573, %45 : !llvm.i64 + %6575 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6576 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6577 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6578 = llvm.mul %6562, %6577 : !llvm.i64 + %6579 = llvm.add %6576, %6578 : !llvm.i64 + %6580 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6581 = llvm.mul %67, %6580 : !llvm.i64 + %6582 = llvm.add %6579, %6581 : !llvm.i64 + %6583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6584 = llvm.mul %6574, %6583 : !llvm.i64 + %6585 = llvm.add %6582, %6584 : !llvm.i64 + %6586 = llvm.getelementptr %6575[%6585] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6587 = llvm.load %6586 : !llvm.ptr> + %6588 = llvm.fadd %6551, %6587 : !llvm.vec<8 x float> + %6589 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6591 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6592 = llvm.mul %67, %6591 : !llvm.i64 + %6593 = llvm.add %6590, %6592 : !llvm.i64 + %6594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6595 = llvm.mul %45, %6594 : !llvm.i64 + %6596 = llvm.add %6593, %6595 : !llvm.i64 + %6597 = llvm.getelementptr %6589[%6596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6588, %6597 : !llvm.ptr> + %6598 = llvm.add %6315, %42 : !llvm.i64 + %6599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6602 = llvm.mul %2345, %6601 : !llvm.i64 + %6603 = llvm.add %6600, %6602 : !llvm.i64 + %6604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6605 = llvm.mul %6598, %6604 : !llvm.i64 + %6606 = llvm.add %6603, %6605 : !llvm.i64 + %6607 = llvm.getelementptr %6599[%6606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6608 = llvm.bitcast %6607 : !llvm.ptr to !llvm.ptr> + %6609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6610 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6611 = llvm.trunc %6598 : !llvm.i64 to !llvm.i32 + %6612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6614 = llvm.insertelement %6611, %6612[%6613 : !llvm.i32] : !llvm.vec<8 x i32> + %6615 = llvm.shufflevector %6614, %6612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6616 = llvm.add %6615, %6610 : !llvm.vec<8 x i32> + %6617 = llvm.trunc %6609 : !llvm.i64 to !llvm.i32 + %6618 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6620 = llvm.insertelement %6617, %6618[%6619 : !llvm.i32] : !llvm.vec<8 x i32> + %6621 = llvm.shufflevector %6620, %6618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6622 = llvm.icmp "slt" %6616, %6621 : !llvm.vec<8 x i32> + %6623 = 
llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6624 = llvm.intr.masked.load %6608, %6622, %6623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6625 = llvm.add %6347, %63 : !llvm.i64 + %6626 = llvm.icmp "slt" %6625, %67 : !llvm.i64 + %6627 = llvm.sub %64, %6625 : !llvm.i64 + %6628 = llvm.select %6626, %6627, %6625 : !llvm.i1, !llvm.i64 + %6629 = llvm.sdiv %6628, %68 : !llvm.i64 + %6630 = llvm.sub %64, %6629 : !llvm.i64 + %6631 = llvm.select %6626, %6630, %6629 : !llvm.i1, !llvm.i64 + %6632 = llvm.mul %6631, %60 : !llvm.i64 + %6633 = llvm.add %6347, %6632 : !llvm.i64 + %6634 = llvm.add %6633, %63 : !llvm.i64 + %6635 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6638 = llvm.mul %6634, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6641 = llvm.mul %67, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6644 = llvm.mul %6365, %6643 : !llvm.i64 + %6645 = llvm.add %6642, %6644 : !llvm.i64 + %6646 = llvm.getelementptr %6635[%6645] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6647 = llvm.load %6646 : !llvm.ptr> + %6648 = llvm.fadd %6624, %6647 : !llvm.vec<8 x float> + %6649 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6651 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6652 = llvm.mul %67, %6651 : !llvm.i64 + %6653 = llvm.add %6650, %6652 : !llvm.i64 + %6654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6655 = llvm.mul %48, %6654 : !llvm.i64 + %6656 = llvm.add %6653, %6655 : !llvm.i64 + %6657 = llvm.getelementptr %6649[%6656] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6648, %6657 : !llvm.ptr> + %6658 = llvm.add %6315, %43 : !llvm.i64 + %6659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6662 = llvm.mul %2345, %6661 : !llvm.i64 + %6663 = llvm.add %6660, %6662 : !llvm.i64 + %6664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6665 = llvm.mul %6658, %6664 : !llvm.i64 + %6666 = llvm.add %6663, %6665 : !llvm.i64 + %6667 = llvm.getelementptr %6659[%6666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6668 = llvm.bitcast %6667 : !llvm.ptr to !llvm.ptr> + %6669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6670 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6671 = llvm.trunc %6658 : !llvm.i64 to !llvm.i32 + %6672 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6673 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6674 = llvm.insertelement %6671, %6672[%6673 : !llvm.i32] : !llvm.vec<8 x i32> + %6675 = llvm.shufflevector %6674, %6672 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6676 = llvm.add %6675, %6670 : !llvm.vec<8 x i32> + %6677 = llvm.trunc %6669 : !llvm.i64 to !llvm.i32 + %6678 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6679 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6680 = llvm.insertelement %6677, %6678[%6679 : !llvm.i32] : !llvm.vec<8 x i32> + %6681 = llvm.shufflevector %6680, %6678 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : 
i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6682 = llvm.icmp "slt" %6676, %6681 : !llvm.vec<8 x i32> + %6683 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6684 = llvm.intr.masked.load %6668, %6682, %6683 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6685 = llvm.add %5447, %43 : !llvm.i64 + %6686 = llvm.icmp "slt" %6685, %67 : !llvm.i64 + %6687 = llvm.sub %64, %6685 : !llvm.i64 + %6688 = llvm.select %6686, %6687, %6685 : !llvm.i1, !llvm.i64 + %6689 = llvm.sdiv %6688, %68 : !llvm.i64 + %6690 = llvm.sub %64, %6689 : !llvm.i64 + %6691 = llvm.select %6686, %6690, %6689 : !llvm.i1, !llvm.i64 + %6692 = llvm.srem %6691, %68 : !llvm.i64 + %6693 = llvm.icmp "slt" %6692, %67 : !llvm.i64 + %6694 = llvm.add %6692, %68 : !llvm.i64 + %6695 = llvm.select %6693, %6694, %6692 : !llvm.i1, !llvm.i64 + %6696 = llvm.mul %6691, %65 : !llvm.i64 + %6697 = llvm.add %6429, %6696 : !llvm.i64 + %6698 = llvm.add %6697, %52 : !llvm.i64 + %6699 = llvm.icmp "slt" %6698, %67 : !llvm.i64 + %6700 = llvm.sub %64, %6698 : !llvm.i64 + %6701 = llvm.select %6699, %6700, %6698 : !llvm.i1, !llvm.i64 + %6702 = llvm.sdiv %6701, %63 : !llvm.i64 + %6703 = llvm.sub %64, %6702 : !llvm.i64 + %6704 = llvm.select %6699, %6703, %6702 : !llvm.i1, !llvm.i64 + %6705 = llvm.mul %6704, %65 : !llvm.i64 + %6706 = llvm.add %6697, %6705 : !llvm.i64 + %6707 = llvm.add %6706, %52 : !llvm.i64 + %6708 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6710 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6711 = llvm.mul %6695, %6710 : !llvm.i64 + %6712 = llvm.add %6709, %6711 : !llvm.i64 + %6713 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6714 = llvm.mul %67, %6713 : !llvm.i64 + %6715 = llvm.add %6712, %6714 : !llvm.i64 + %6716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6717 = llvm.mul %6707, %6716 : !llvm.i64 + %6718 = llvm.add %6715, %6717 : !llvm.i64 + %6719 = llvm.getelementptr %6708[%6718] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6720 = llvm.load %6719 : !llvm.ptr> + %6721 = llvm.fadd %6684, %6720 : !llvm.vec<8 x float> + %6722 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6724 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6725 = llvm.mul %67, %6724 : !llvm.i64 + %6726 = llvm.add %6723, %6725 : !llvm.i64 + %6727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6728 = llvm.mul %52, %6727 : !llvm.i64 + %6729 = llvm.add %6726, %6728 : !llvm.i64 + %6730 = llvm.getelementptr %6722[%6729] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6721, %6730 : !llvm.ptr> + %6731 = llvm.add %6315, %44 : !llvm.i64 + %6732 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6734 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6735 = llvm.mul %2345, %6734 : !llvm.i64 + %6736 = llvm.add %6733, %6735 : !llvm.i64 + %6737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6738 = llvm.mul %6731, %6737 : !llvm.i64 + %6739 = llvm.add %6736, %6738 : !llvm.i64 + %6740 = llvm.getelementptr %6732[%6739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6741 = llvm.bitcast %6740 : !llvm.ptr to !llvm.ptr> + %6742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6743 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6744 = llvm.trunc %6731 : 
!llvm.i64 to !llvm.i32 + %6745 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6746 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6747 = llvm.insertelement %6744, %6745[%6746 : !llvm.i32] : !llvm.vec<8 x i32> + %6748 = llvm.shufflevector %6747, %6745 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6749 = llvm.add %6748, %6743 : !llvm.vec<8 x i32> + %6750 = llvm.trunc %6742 : !llvm.i64 to !llvm.i32 + %6751 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6752 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6753 = llvm.insertelement %6750, %6751[%6752 : !llvm.i32] : !llvm.vec<8 x i32> + %6754 = llvm.shufflevector %6753, %6751 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6755 = llvm.icmp "slt" %6749, %6754 : !llvm.vec<8 x i32> + %6756 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6757 = llvm.intr.masked.load %6741, %6755, %6756 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6758 = llvm.add %6347, %45 : !llvm.i64 + %6759 = llvm.icmp "slt" %6758, %67 : !llvm.i64 + %6760 = llvm.sub %64, %6758 : !llvm.i64 + %6761 = llvm.select %6759, %6760, %6758 : !llvm.i1, !llvm.i64 + %6762 = llvm.sdiv %6761, %68 : !llvm.i64 + %6763 = llvm.sub %64, %6762 : !llvm.i64 + %6764 = llvm.select %6759, %6763, %6762 : !llvm.i1, !llvm.i64 + %6765 = llvm.mul %6764, %60 : !llvm.i64 + %6766 = llvm.add %6347, %6765 : !llvm.i64 + %6767 = llvm.add %6766, %45 : !llvm.i64 + %6768 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6770 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6771 = llvm.mul %6767, %6770 : !llvm.i64 + %6772 = llvm.add %6769, %6771 : !llvm.i64 + %6773 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6774 = llvm.mul %67, %6773 : !llvm.i64 + %6775 = llvm.add %6772, %6774 : !llvm.i64 + %6776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6777 = llvm.mul %6365, %6776 : !llvm.i64 + %6778 = llvm.add %6775, %6777 : !llvm.i64 + %6779 = llvm.getelementptr %6768[%6778] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6780 = llvm.load %6779 : !llvm.ptr> + %6781 = llvm.fadd %6757, %6780 : !llvm.vec<8 x float> + %6782 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6784 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6785 = llvm.mul %67, %6784 : !llvm.i64 + %6786 = llvm.add %6783, %6785 : !llvm.i64 + %6787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6788 = llvm.mul %56, %6787 : !llvm.i64 + %6789 = llvm.add %6786, %6788 : !llvm.i64 + %6790 = llvm.getelementptr %6782[%6789] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6781, %6790 : !llvm.ptr> + %6791 = llvm.add %6315, %46 : !llvm.i64 + %6792 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6794 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6795 = llvm.mul %2345, %6794 : !llvm.i64 + %6796 = llvm.add %6793, %6795 : !llvm.i64 + %6797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6798 = llvm.mul %6791, %6797 : !llvm.i64 + %6799 = llvm.add %6796, %6798 : !llvm.i64 + %6800 = llvm.getelementptr %6792[%6799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6801 = llvm.bitcast %6800 : !llvm.ptr to !llvm.ptr> + %6802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6803 = 
llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6804 = llvm.trunc %6791 : !llvm.i64 to !llvm.i32 + %6805 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6806 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6807 = llvm.insertelement %6804, %6805[%6806 : !llvm.i32] : !llvm.vec<8 x i32> + %6808 = llvm.shufflevector %6807, %6805 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6809 = llvm.add %6808, %6803 : !llvm.vec<8 x i32> + %6810 = llvm.trunc %6802 : !llvm.i64 to !llvm.i32 + %6811 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6812 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6813 = llvm.insertelement %6810, %6811[%6812 : !llvm.i32] : !llvm.vec<8 x i32> + %6814 = llvm.shufflevector %6813, %6811 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6815 = llvm.icmp "slt" %6809, %6814 : !llvm.vec<8 x i32> + %6816 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6817 = llvm.intr.masked.load %6801, %6815, %6816 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6818 = llvm.add %5447, %46 : !llvm.i64 + %6819 = llvm.icmp "slt" %6818, %67 : !llvm.i64 + %6820 = llvm.sub %64, %6818 : !llvm.i64 + %6821 = llvm.select %6819, %6820, %6818 : !llvm.i1, !llvm.i64 + %6822 = llvm.sdiv %6821, %68 : !llvm.i64 + %6823 = llvm.sub %64, %6822 : !llvm.i64 + %6824 = llvm.select %6819, %6823, %6822 : !llvm.i1, !llvm.i64 + %6825 = llvm.srem %6824, %68 : !llvm.i64 + %6826 = llvm.icmp "slt" %6825, %67 : !llvm.i64 + %6827 = llvm.add %6825, %68 : !llvm.i64 + %6828 = llvm.select %6826, %6827, %6825 : !llvm.i1, !llvm.i64 + %6829 = llvm.mul %6824, %65 : !llvm.i64 + %6830 = llvm.add %6429, %6829 : !llvm.i64 + %6831 = llvm.add %6830, %61 : !llvm.i64 + %6832 = llvm.icmp "slt" %6831, %67 : !llvm.i64 + %6833 = llvm.sub %64, %6831 : !llvm.i64 + %6834 = llvm.select %6832, %6833, %6831 : !llvm.i1, !llvm.i64 + %6835 = llvm.sdiv %6834, %63 : !llvm.i64 + %6836 = llvm.sub %64, %6835 : !llvm.i64 + %6837 = llvm.select %6832, %6836, %6835 : !llvm.i1, !llvm.i64 + %6838 = llvm.mul %6837, %65 : !llvm.i64 + %6839 = llvm.add %6830, %6838 : !llvm.i64 + %6840 = llvm.add %6839, %61 : !llvm.i64 + %6841 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6842 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6843 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6844 = llvm.mul %6828, %6843 : !llvm.i64 + %6845 = llvm.add %6842, %6844 : !llvm.i64 + %6846 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6847 = llvm.mul %67, %6846 : !llvm.i64 + %6848 = llvm.add %6845, %6847 : !llvm.i64 + %6849 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6850 = llvm.mul %6840, %6849 : !llvm.i64 + %6851 = llvm.add %6848, %6850 : !llvm.i64 + %6852 = llvm.getelementptr %6841[%6851] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6853 = llvm.load %6852 : !llvm.ptr> + %6854 = llvm.fadd %6817, %6853 : !llvm.vec<8 x float> + %6855 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6857 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6858 = llvm.mul %67, %6857 : !llvm.i64 + %6859 = llvm.add %6856, %6858 : !llvm.i64 + %6860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6861 = llvm.mul %61, %6860 : !llvm.i64 + %6862 = llvm.add %6859, %6861 : !llvm.i64 + %6863 = llvm.getelementptr %6855[%6862] : 
(!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6854, %6863 : !llvm.ptr> + %6864 = llvm.add %6315, %47 : !llvm.i64 + %6865 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6866 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6867 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6868 = llvm.mul %2345, %6867 : !llvm.i64 + %6869 = llvm.add %6866, %6868 : !llvm.i64 + %6870 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6871 = llvm.mul %6864, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.getelementptr %6865[%6872] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6874 = llvm.bitcast %6873 : !llvm.ptr to !llvm.ptr> + %6875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6876 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6877 = llvm.trunc %6864 : !llvm.i64 to !llvm.i32 + %6878 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6879 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6880 = llvm.insertelement %6877, %6878[%6879 : !llvm.i32] : !llvm.vec<8 x i32> + %6881 = llvm.shufflevector %6880, %6878 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6882 = llvm.add %6881, %6876 : !llvm.vec<8 x i32> + %6883 = llvm.trunc %6875 : !llvm.i64 to !llvm.i32 + %6884 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6885 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6886 = llvm.insertelement %6883, %6884[%6885 : !llvm.i32] : !llvm.vec<8 x i32> + %6887 = llvm.shufflevector %6886, %6884 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6888 = llvm.icmp "slt" %6882, %6887 : !llvm.vec<8 x i32> + %6889 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6890 = llvm.intr.masked.load %6874, %6888, %6889 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6891 = llvm.add %6347, %48 : !llvm.i64 + %6892 = llvm.icmp "slt" %6891, %67 : !llvm.i64 + %6893 = llvm.sub %64, %6891 : !llvm.i64 + %6894 = llvm.select %6892, %6893, %6891 : !llvm.i1, !llvm.i64 + %6895 = llvm.sdiv %6894, %68 : !llvm.i64 + %6896 = llvm.sub %64, %6895 : !llvm.i64 + %6897 = llvm.select %6892, %6896, %6895 : !llvm.i1, !llvm.i64 + %6898 = llvm.mul %6897, %60 : !llvm.i64 + %6899 = llvm.add %6347, %6898 : !llvm.i64 + %6900 = llvm.add %6899, %48 : !llvm.i64 + %6901 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6903 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6904 = llvm.mul %6900, %6903 : !llvm.i64 + %6905 = llvm.add %6902, %6904 : !llvm.i64 + %6906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6907 = llvm.mul %67, %6906 : !llvm.i64 + %6908 = llvm.add %6905, %6907 : !llvm.i64 + %6909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6910 = llvm.mul %6365, %6909 : !llvm.i64 + %6911 = llvm.add %6908, %6910 : !llvm.i64 + %6912 = llvm.getelementptr %6901[%6911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6913 = llvm.load %6912 : !llvm.ptr> + %6914 = llvm.fadd %6890, %6913 : !llvm.vec<8 x float> + %6915 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6916 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6917 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6918 = llvm.mul %67, %6917 : !llvm.i64 + %6919 = llvm.add %6916, %6918 : !llvm.i64 + %6920 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6921 = 
llvm.mul %70, %6920 : !llvm.i64 + %6922 = llvm.add %6919, %6921 : !llvm.i64 + %6923 = llvm.getelementptr %6915[%6922] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6914, %6923 : !llvm.ptr> + %6924 = llvm.add %6315, %49 : !llvm.i64 + %6925 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6926 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6927 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6928 = llvm.mul %2345, %6927 : !llvm.i64 + %6929 = llvm.add %6926, %6928 : !llvm.i64 + %6930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6931 = llvm.mul %6924, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.getelementptr %6925[%6932] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6934 = llvm.bitcast %6933 : !llvm.ptr to !llvm.ptr> + %6935 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6936 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6937 = llvm.trunc %6924 : !llvm.i64 to !llvm.i32 + %6938 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6939 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6940 = llvm.insertelement %6937, %6938[%6939 : !llvm.i32] : !llvm.vec<8 x i32> + %6941 = llvm.shufflevector %6940, %6938 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6942 = llvm.add %6941, %6936 : !llvm.vec<8 x i32> + %6943 = llvm.trunc %6935 : !llvm.i64 to !llvm.i32 + %6944 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6945 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6946 = llvm.insertelement %6943, %6944[%6945 : !llvm.i32] : !llvm.vec<8 x i32> + %6947 = llvm.shufflevector %6946, %6944 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6948 = llvm.icmp "slt" %6942, %6947 : !llvm.vec<8 x i32> + %6949 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6950 = llvm.intr.masked.load %6934, %6948, %6949 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6951 = llvm.add %5447, %49 : !llvm.i64 + %6952 = llvm.icmp "slt" %6951, %67 : !llvm.i64 + %6953 = llvm.sub %64, %6951 : !llvm.i64 + %6954 = llvm.select %6952, %6953, %6951 : !llvm.i1, !llvm.i64 + %6955 = llvm.sdiv %6954, %68 : !llvm.i64 + %6956 = llvm.sub %64, %6955 : !llvm.i64 + %6957 = llvm.select %6952, %6956, %6955 : !llvm.i1, !llvm.i64 + %6958 = llvm.srem %6957, %68 : !llvm.i64 + %6959 = llvm.icmp "slt" %6958, %67 : !llvm.i64 + %6960 = llvm.add %6958, %68 : !llvm.i64 + %6961 = llvm.select %6959, %6960, %6958 : !llvm.i1, !llvm.i64 + %6962 = llvm.mul %6957, %65 : !llvm.i64 + %6963 = llvm.add %6429, %6962 : !llvm.i64 + %6964 = llvm.add %6963, %50 : !llvm.i64 + %6965 = llvm.icmp "slt" %6964, %67 : !llvm.i64 + %6966 = llvm.sub %64, %6964 : !llvm.i64 + %6967 = llvm.select %6965, %6966, %6964 : !llvm.i1, !llvm.i64 + %6968 = llvm.sdiv %6967, %63 : !llvm.i64 + %6969 = llvm.sub %64, %6968 : !llvm.i64 + %6970 = llvm.select %6965, %6969, %6968 : !llvm.i1, !llvm.i64 + %6971 = llvm.mul %6970, %65 : !llvm.i64 + %6972 = llvm.add %6963, %6971 : !llvm.i64 + %6973 = llvm.add %6972, %50 : !llvm.i64 + %6974 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6976 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6977 = llvm.mul %6961, %6976 : !llvm.i64 + %6978 = llvm.add %6975, %6977 : !llvm.i64 + %6979 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6980 = 
llvm.mul %67, %6979 : !llvm.i64 + %6981 = llvm.add %6978, %6980 : !llvm.i64 + %6982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6983 = llvm.mul %6973, %6982 : !llvm.i64 + %6984 = llvm.add %6981, %6983 : !llvm.i64 + %6985 = llvm.getelementptr %6974[%6984] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6986 = llvm.load %6985 : !llvm.ptr> + %6987 = llvm.fadd %6950, %6986 : !llvm.vec<8 x float> + %6988 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6991 = llvm.mul %67, %6990 : !llvm.i64 + %6992 = llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %50, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6987, %6996 : !llvm.ptr> + %6997 = llvm.add %6315, %51 : !llvm.i64 + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %2345, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %6997, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7007 = llvm.bitcast %7006 : !llvm.ptr to !llvm.ptr> + %7008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7010 = llvm.trunc %6997 : !llvm.i64 to !llvm.i32 + %7011 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7012 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7013 = llvm.insertelement %7010, %7011[%7012 : !llvm.i32] : !llvm.vec<8 x i32> + %7014 = llvm.shufflevector %7013, %7011 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7015 = llvm.add %7014, %7009 : !llvm.vec<8 x i32> + %7016 = llvm.trunc %7008 : !llvm.i64 to !llvm.i32 + %7017 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7018 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7019 = llvm.insertelement %7016, %7017[%7018 : !llvm.i32] : !llvm.vec<8 x i32> + %7020 = llvm.shufflevector %7019, %7017 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7021 = llvm.icmp "slt" %7015, %7020 : !llvm.vec<8 x i32> + %7022 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7023 = llvm.intr.masked.load %7007, %7021, %7022 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7024 = llvm.add %6347, %52 : !llvm.i64 + %7025 = llvm.icmp "slt" %7024, %67 : !llvm.i64 + %7026 = llvm.sub %64, %7024 : !llvm.i64 + %7027 = llvm.select %7025, %7026, %7024 : !llvm.i1, !llvm.i64 + %7028 = llvm.sdiv %7027, %68 : !llvm.i64 + %7029 = llvm.sub %64, %7028 : !llvm.i64 + %7030 = llvm.select %7025, %7029, %7028 : !llvm.i1, !llvm.i64 + %7031 = llvm.mul %7030, %60 : !llvm.i64 + %7032 = llvm.add %6347, %7031 : !llvm.i64 + %7033 = llvm.add %7032, %52 : !llvm.i64 + %7034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7037 = llvm.mul %7033, %7036 : 
!llvm.i64 + %7038 = llvm.add %7035, %7037 : !llvm.i64 + %7039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7040 = llvm.mul %67, %7039 : !llvm.i64 + %7041 = llvm.add %7038, %7040 : !llvm.i64 + %7042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7043 = llvm.mul %6365, %7042 : !llvm.i64 + %7044 = llvm.add %7041, %7043 : !llvm.i64 + %7045 = llvm.getelementptr %7034[%7044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7046 = llvm.load %7045 : !llvm.ptr> + %7047 = llvm.fadd %7023, %7046 : !llvm.vec<8 x float> + %7048 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7051 = llvm.mul %67, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %33, %7053 : !llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7047, %7056 : !llvm.ptr> + %7057 = llvm.add %6315, %53 : !llvm.i64 + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %2345, %7060 : !llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %7057, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7067 = llvm.bitcast %7066 : !llvm.ptr to !llvm.ptr> + %7068 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7070 = llvm.trunc %7057 : !llvm.i64 to !llvm.i32 + %7071 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7072 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7073 = llvm.insertelement %7070, %7071[%7072 : !llvm.i32] : !llvm.vec<8 x i32> + %7074 = llvm.shufflevector %7073, %7071 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7075 = llvm.add %7074, %7069 : !llvm.vec<8 x i32> + %7076 = llvm.trunc %7068 : !llvm.i64 to !llvm.i32 + %7077 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7078 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7079 = llvm.insertelement %7076, %7077[%7078 : !llvm.i32] : !llvm.vec<8 x i32> + %7080 = llvm.shufflevector %7079, %7077 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7081 = llvm.icmp "slt" %7075, %7080 : !llvm.vec<8 x i32> + %7082 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7083 = llvm.intr.masked.load %7067, %7081, %7082 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7084 = llvm.add %5447, %53 : !llvm.i64 + %7085 = llvm.icmp "slt" %7084, %67 : !llvm.i64 + %7086 = llvm.sub %64, %7084 : !llvm.i64 + %7087 = llvm.select %7085, %7086, %7084 : !llvm.i1, !llvm.i64 + %7088 = llvm.sdiv %7087, %68 : !llvm.i64 + %7089 = llvm.sub %64, %7088 : !llvm.i64 + %7090 = llvm.select %7085, %7089, %7088 : !llvm.i1, !llvm.i64 + %7091 = llvm.srem %7090, %68 : !llvm.i64 + %7092 = llvm.icmp "slt" %7091, %67 : !llvm.i64 + %7093 = llvm.add %7091, %68 : !llvm.i64 + %7094 = llvm.select %7092, %7093, %7091 : !llvm.i1, !llvm.i64 + %7095 = llvm.mul %7090, %65 : !llvm.i64 + %7096 = 
llvm.add %6429, %7095 : !llvm.i64 + %7097 = llvm.add %7096, %54 : !llvm.i64 + %7098 = llvm.icmp "slt" %7097, %67 : !llvm.i64 + %7099 = llvm.sub %64, %7097 : !llvm.i64 + %7100 = llvm.select %7098, %7099, %7097 : !llvm.i1, !llvm.i64 + %7101 = llvm.sdiv %7100, %63 : !llvm.i64 + %7102 = llvm.sub %64, %7101 : !llvm.i64 + %7103 = llvm.select %7098, %7102, %7101 : !llvm.i1, !llvm.i64 + %7104 = llvm.mul %7103, %65 : !llvm.i64 + %7105 = llvm.add %7096, %7104 : !llvm.i64 + %7106 = llvm.add %7105, %54 : !llvm.i64 + %7107 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7109 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7110 = llvm.mul %7094, %7109 : !llvm.i64 + %7111 = llvm.add %7108, %7110 : !llvm.i64 + %7112 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7113 = llvm.mul %67, %7112 : !llvm.i64 + %7114 = llvm.add %7111, %7113 : !llvm.i64 + %7115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7116 = llvm.mul %7106, %7115 : !llvm.i64 + %7117 = llvm.add %7114, %7116 : !llvm.i64 + %7118 = llvm.getelementptr %7107[%7117] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7119 = llvm.load %7118 : !llvm.ptr> + %7120 = llvm.fadd %7083, %7119 : !llvm.vec<8 x float> + %7121 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7123 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7124 = llvm.mul %67, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 + %7126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7127 = llvm.mul %54, %7126 : !llvm.i64 + %7128 = llvm.add %7125, %7127 : !llvm.i64 + %7129 = llvm.getelementptr %7121[%7128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7120, %7129 : !llvm.ptr> + %7130 = llvm.add %6315, %55 : !llvm.i64 + %7131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7134 = llvm.mul %2345, %7133 : !llvm.i64 + %7135 = llvm.add %7132, %7134 : !llvm.i64 + %7136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7137 = llvm.mul %7130, %7136 : !llvm.i64 + %7138 = llvm.add %7135, %7137 : !llvm.i64 + %7139 = llvm.getelementptr %7131[%7138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7140 = llvm.bitcast %7139 : !llvm.ptr to !llvm.ptr> + %7141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7143 = llvm.trunc %7130 : !llvm.i64 to !llvm.i32 + %7144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7145 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7146 = llvm.insertelement %7143, %7144[%7145 : !llvm.i32] : !llvm.vec<8 x i32> + %7147 = llvm.shufflevector %7146, %7144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7148 = llvm.add %7147, %7142 : !llvm.vec<8 x i32> + %7149 = llvm.trunc %7141 : !llvm.i64 to !llvm.i32 + %7150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7152 = llvm.insertelement %7149, %7150[%7151 : !llvm.i32] : !llvm.vec<8 x i32> + %7153 = llvm.shufflevector %7152, %7150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7154 = llvm.icmp "slt" %7148, %7153 : !llvm.vec<8 x i32> + %7155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + 
%7156 = llvm.intr.masked.load %7140, %7154, %7155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7157 = llvm.add %6347, %56 : !llvm.i64 + %7158 = llvm.icmp "slt" %7157, %67 : !llvm.i64 + %7159 = llvm.sub %64, %7157 : !llvm.i64 + %7160 = llvm.select %7158, %7159, %7157 : !llvm.i1, !llvm.i64 + %7161 = llvm.sdiv %7160, %68 : !llvm.i64 + %7162 = llvm.sub %64, %7161 : !llvm.i64 + %7163 = llvm.select %7158, %7162, %7161 : !llvm.i1, !llvm.i64 + %7164 = llvm.mul %7163, %60 : !llvm.i64 + %7165 = llvm.add %6347, %7164 : !llvm.i64 + %7166 = llvm.add %7165, %56 : !llvm.i64 + %7167 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7169 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7170 = llvm.mul %7166, %7169 : !llvm.i64 + %7171 = llvm.add %7168, %7170 : !llvm.i64 + %7172 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7173 = llvm.mul %67, %7172 : !llvm.i64 + %7174 = llvm.add %7171, %7173 : !llvm.i64 + %7175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7176 = llvm.mul %6365, %7175 : !llvm.i64 + %7177 = llvm.add %7174, %7176 : !llvm.i64 + %7178 = llvm.getelementptr %7167[%7177] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7179 = llvm.load %7178 : !llvm.ptr> + %7180 = llvm.fadd %7156, %7179 : !llvm.vec<8 x float> + %7181 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7182 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7183 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7184 = llvm.mul %67, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7187 = llvm.mul %34, %7186 : !llvm.i64 + %7188 = llvm.add %7185, %7187 : !llvm.i64 + %7189 = llvm.getelementptr %7181[%7188] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7180, %7189 : !llvm.ptr> + %7190 = llvm.add %6315, %57 : !llvm.i64 + %7191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7194 = llvm.mul %2345, %7193 : !llvm.i64 + %7195 = llvm.add %7192, %7194 : !llvm.i64 + %7196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7197 = llvm.mul %7190, %7196 : !llvm.i64 + %7198 = llvm.add %7195, %7197 : !llvm.i64 + %7199 = llvm.getelementptr %7191[%7198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7200 = llvm.bitcast %7199 : !llvm.ptr to !llvm.ptr> + %7201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7202 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7203 = llvm.trunc %7190 : !llvm.i64 to !llvm.i32 + %7204 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7205 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7206 = llvm.insertelement %7203, %7204[%7205 : !llvm.i32] : !llvm.vec<8 x i32> + %7207 = llvm.shufflevector %7206, %7204 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7208 = llvm.add %7207, %7202 : !llvm.vec<8 x i32> + %7209 = llvm.trunc %7201 : !llvm.i64 to !llvm.i32 + %7210 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7211 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7212 = llvm.insertelement %7209, %7210[%7211 : !llvm.i32] : !llvm.vec<8 x i32> + %7213 = llvm.shufflevector %7212, %7210 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7214 = llvm.icmp "slt" %7208, 
%7213 : !llvm.vec<8 x i32> + %7215 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7216 = llvm.intr.masked.load %7200, %7214, %7215 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7217 = llvm.add %5447, %57 : !llvm.i64 + %7218 = llvm.icmp "slt" %7217, %67 : !llvm.i64 + %7219 = llvm.sub %64, %7217 : !llvm.i64 + %7220 = llvm.select %7218, %7219, %7217 : !llvm.i1, !llvm.i64 + %7221 = llvm.sdiv %7220, %68 : !llvm.i64 + %7222 = llvm.sub %64, %7221 : !llvm.i64 + %7223 = llvm.select %7218, %7222, %7221 : !llvm.i1, !llvm.i64 + %7224 = llvm.srem %7223, %68 : !llvm.i64 + %7225 = llvm.icmp "slt" %7224, %67 : !llvm.i64 + %7226 = llvm.add %7224, %68 : !llvm.i64 + %7227 = llvm.select %7225, %7226, %7224 : !llvm.i1, !llvm.i64 + %7228 = llvm.mul %7223, %65 : !llvm.i64 + %7229 = llvm.add %6429, %7228 : !llvm.i64 + %7230 = llvm.add %7229, %58 : !llvm.i64 + %7231 = llvm.icmp "slt" %7230, %67 : !llvm.i64 + %7232 = llvm.sub %64, %7230 : !llvm.i64 + %7233 = llvm.select %7231, %7232, %7230 : !llvm.i1, !llvm.i64 + %7234 = llvm.sdiv %7233, %63 : !llvm.i64 + %7235 = llvm.sub %64, %7234 : !llvm.i64 + %7236 = llvm.select %7231, %7235, %7234 : !llvm.i1, !llvm.i64 + %7237 = llvm.mul %7236, %65 : !llvm.i64 + %7238 = llvm.add %7229, %7237 : !llvm.i64 + %7239 = llvm.add %7238, %58 : !llvm.i64 + %7240 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7242 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7243 = llvm.mul %7227, %7242 : !llvm.i64 + %7244 = llvm.add %7241, %7243 : !llvm.i64 + %7245 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7246 = llvm.mul %67, %7245 : !llvm.i64 + %7247 = llvm.add %7244, %7246 : !llvm.i64 + %7248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7249 = llvm.mul %7239, %7248 : !llvm.i64 + %7250 = llvm.add %7247, %7249 : !llvm.i64 + %7251 = llvm.getelementptr %7240[%7250] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7252 = llvm.load %7251 : !llvm.ptr> + %7253 = llvm.fadd %7216, %7252 : !llvm.vec<8 x float> + %7254 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7256 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7257 = llvm.mul %67, %7256 : !llvm.i64 + %7258 = llvm.add %7255, %7257 : !llvm.i64 + %7259 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7260 = llvm.mul %58, %7259 : !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.getelementptr %7254[%7261] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7253, %7262 : !llvm.ptr> + %7263 = llvm.add %6315, %59 : !llvm.i64 + %7264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7267 = llvm.mul %2345, %7266 : !llvm.i64 + %7268 = llvm.add %7265, %7267 : !llvm.i64 + %7269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7270 = llvm.mul %7263, %7269 : !llvm.i64 + %7271 = llvm.add %7268, %7270 : !llvm.i64 + %7272 = llvm.getelementptr %7264[%7271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7273 = llvm.bitcast %7272 : !llvm.ptr to !llvm.ptr> + %7274 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7275 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7276 = llvm.trunc %7263 : !llvm.i64 to !llvm.i32 + %7277 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7278 = 
llvm.mlir.constant(0 : i32) : !llvm.i32 + %7279 = llvm.insertelement %7276, %7277[%7278 : !llvm.i32] : !llvm.vec<8 x i32> + %7280 = llvm.shufflevector %7279, %7277 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7281 = llvm.add %7280, %7275 : !llvm.vec<8 x i32> + %7282 = llvm.trunc %7274 : !llvm.i64 to !llvm.i32 + %7283 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7284 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7285 = llvm.insertelement %7282, %7283[%7284 : !llvm.i32] : !llvm.vec<8 x i32> + %7286 = llvm.shufflevector %7285, %7283 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7287 = llvm.icmp "slt" %7281, %7286 : !llvm.vec<8 x i32> + %7288 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7289 = llvm.intr.masked.load %7273, %7287, %7288 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7290 = llvm.add %6347, %61 : !llvm.i64 + %7291 = llvm.icmp "slt" %7290, %67 : !llvm.i64 + %7292 = llvm.sub %64, %7290 : !llvm.i64 + %7293 = llvm.select %7291, %7292, %7290 : !llvm.i1, !llvm.i64 + %7294 = llvm.sdiv %7293, %68 : !llvm.i64 + %7295 = llvm.sub %64, %7294 : !llvm.i64 + %7296 = llvm.select %7291, %7295, %7294 : !llvm.i1, !llvm.i64 + %7297 = llvm.mul %7296, %60 : !llvm.i64 + %7298 = llvm.add %6347, %7297 : !llvm.i64 + %7299 = llvm.add %7298, %61 : !llvm.i64 + %7300 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7302 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7303 = llvm.mul %7299, %7302 : !llvm.i64 + %7304 = llvm.add %7301, %7303 : !llvm.i64 + %7305 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7306 = llvm.mul %67, %7305 : !llvm.i64 + %7307 = llvm.add %7304, %7306 : !llvm.i64 + %7308 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7309 = llvm.mul %6365, %7308 : !llvm.i64 + %7310 = llvm.add %7307, %7309 : !llvm.i64 + %7311 = llvm.getelementptr %7300[%7310] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7312 = llvm.load %7311 : !llvm.ptr> + %7313 = llvm.fadd %7289, %7312 : !llvm.vec<8 x float> + %7314 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7316 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7317 = llvm.mul %67, %7316 : !llvm.i64 + %7318 = llvm.add %7315, %7317 : !llvm.i64 + %7319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7320 = llvm.mul %35, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = llvm.getelementptr %7314[%7321] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7313, %7322 : !llvm.ptr> + %7323 = llvm.add %6315, %62 : !llvm.i64 + %7324 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7325 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7326 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7327 = llvm.mul %2345, %7326 : !llvm.i64 + %7328 = llvm.add %7325, %7327 : !llvm.i64 + %7329 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7330 = llvm.mul %7323, %7329 : !llvm.i64 + %7331 = llvm.add %7328, %7330 : !llvm.i64 + %7332 = llvm.getelementptr %7324[%7331] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7333 = llvm.bitcast %7332 : !llvm.ptr to !llvm.ptr> + %7334 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7335 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x 
i32> + %7336 = llvm.trunc %7323 : !llvm.i64 to !llvm.i32 + %7337 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7338 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7339 = llvm.insertelement %7336, %7337[%7338 : !llvm.i32] : !llvm.vec<8 x i32> + %7340 = llvm.shufflevector %7339, %7337 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7341 = llvm.add %7340, %7335 : !llvm.vec<8 x i32> + %7342 = llvm.trunc %7334 : !llvm.i64 to !llvm.i32 + %7343 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7344 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7345 = llvm.insertelement %7342, %7343[%7344 : !llvm.i32] : !llvm.vec<8 x i32> + %7346 = llvm.shufflevector %7345, %7343 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7347 = llvm.icmp "slt" %7341, %7346 : !llvm.vec<8 x i32> + %7348 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7349 = llvm.intr.masked.load %7333, %7347, %7348 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7350 = llvm.add %5447, %62 : !llvm.i64 + %7351 = llvm.icmp "slt" %7350, %67 : !llvm.i64 + %7352 = llvm.sub %64, %7350 : !llvm.i64 + %7353 = llvm.select %7351, %7352, %7350 : !llvm.i1, !llvm.i64 + %7354 = llvm.sdiv %7353, %68 : !llvm.i64 + %7355 = llvm.sub %64, %7354 : !llvm.i64 + %7356 = llvm.select %7351, %7355, %7354 : !llvm.i1, !llvm.i64 + %7357 = llvm.srem %7356, %68 : !llvm.i64 + %7358 = llvm.icmp "slt" %7357, %67 : !llvm.i64 + %7359 = llvm.add %7357, %68 : !llvm.i64 + %7360 = llvm.select %7358, %7359, %7357 : !llvm.i1, !llvm.i64 + %7361 = llvm.mul %7356, %65 : !llvm.i64 + %7362 = llvm.add %6429, %7361 : !llvm.i64 + %7363 = llvm.add %7362, %66 : !llvm.i64 + %7364 = llvm.icmp "slt" %7363, %67 : !llvm.i64 + %7365 = llvm.sub %64, %7363 : !llvm.i64 + %7366 = llvm.select %7364, %7365, %7363 : !llvm.i1, !llvm.i64 + %7367 = llvm.sdiv %7366, %63 : !llvm.i64 + %7368 = llvm.sub %64, %7367 : !llvm.i64 + %7369 = llvm.select %7364, %7368, %7367 : !llvm.i1, !llvm.i64 + %7370 = llvm.mul %7369, %65 : !llvm.i64 + %7371 = llvm.add %7362, %7370 : !llvm.i64 + %7372 = llvm.add %7371, %66 : !llvm.i64 + %7373 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7375 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7376 = llvm.mul %7360, %7375 : !llvm.i64 + %7377 = llvm.add %7374, %7376 : !llvm.i64 + %7378 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7379 = llvm.mul %67, %7378 : !llvm.i64 + %7380 = llvm.add %7377, %7379 : !llvm.i64 + %7381 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7382 = llvm.mul %7372, %7381 : !llvm.i64 + %7383 = llvm.add %7380, %7382 : !llvm.i64 + %7384 = llvm.getelementptr %7373[%7383] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7385 = llvm.load %7384 : !llvm.ptr> + %7386 = llvm.fadd %7349, %7385 : !llvm.vec<8 x float> + %7387 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7389 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7390 = llvm.mul %67, %7389 : !llvm.i64 + %7391 = llvm.add %7388, %7390 : !llvm.i64 + %7392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7393 = llvm.mul %66, %7392 : !llvm.i64 + %7394 = llvm.add %7391, %7393 : !llvm.i64 + %7395 = llvm.getelementptr %7387[%7394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7386, %7395 : !llvm.ptr> + llvm.br 
^bb48(%67 : !llvm.i64) + ^bb48(%7396: !llvm.i64): // 2 preds: ^bb47, ^bb49 + %7397 = llvm.icmp "slt" %7396, %68 : !llvm.i64 + llvm.cond_br %7397, ^bb49, ^bb46 + ^bb49: // pred: ^bb48 + %7398 = llvm.mul %7396, %70 : !llvm.i64 + %7399 = llvm.add %6315, %7398 : !llvm.i64 + %7400 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7403 = llvm.mul %67, %7402 : !llvm.i64 + %7404 = llvm.add %7401, %7403 : !llvm.i64 + %7405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7406 = llvm.mul %7396, %7405 : !llvm.i64 + %7407 = llvm.add %7404, %7406 : !llvm.i64 + %7408 = llvm.getelementptr %7400[%7407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7409 = llvm.load %7408 : !llvm.ptr> + %7410 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7413 = llvm.mul %2345, %7412 : !llvm.i64 + %7414 = llvm.add %7411, %7413 : !llvm.i64 + %7415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7416 = llvm.mul %7399, %7415 : !llvm.i64 + %7417 = llvm.add %7414, %7416 : !llvm.i64 + %7418 = llvm.getelementptr %7410[%7417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7419 = llvm.bitcast %7418 : !llvm.ptr to !llvm.ptr> + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7422 = llvm.trunc %7399 : !llvm.i64 to !llvm.i32 + %7423 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7424 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7425 = llvm.insertelement %7422, %7423[%7424 : !llvm.i32] : !llvm.vec<8 x i32> + %7426 = llvm.shufflevector %7425, %7423 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7427 = llvm.add %7426, %7421 : !llvm.vec<8 x i32> + %7428 = llvm.trunc %7420 : !llvm.i64 to !llvm.i32 + %7429 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7430 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7431 = llvm.insertelement %7428, %7429[%7430 : !llvm.i32] : !llvm.vec<8 x i32> + %7432 = llvm.shufflevector %7431, %7429 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7433 = llvm.icmp "slt" %7427, %7432 : !llvm.vec<8 x i32> + llvm.intr.masked.store %7409, %7419, %7433 {alignment = 4 : i32} : !llvm.vec<8 x float>, !llvm.vec<8 x i1> into !llvm.ptr> + %7434 = llvm.add %7396, %69 : !llvm.i64 + llvm.br ^bb48(%7434 : !llvm.i64) + ^bb50: // pred: ^bb46 + %7435 = llvm.add %5447, %39 : !llvm.i64 + llvm.br ^bb41(%7435 : !llvm.i64) + ^bb51: // pred: ^bb41 + %7436 = llvm.add %2345, %69 : !llvm.i64 + llvm.br ^bb12(%7436 : !llvm.i64) + ^bb52: // pred: ^bb12 + %7437 = llvm.add %151, %38 : !llvm.i64 + llvm.br ^bb1(%7437 : !llvm.i64) + ^bb53: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py_4a6286d9(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = 
llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue 
%25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/optimized_matmul/mlir/2_LoopNestToValueFunc.mlir b/Tutorials/optimized_matmul/mlir/2_LoopNestToValueFunc.mlir new file mode 100644 index 00000000..d23d9108 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/2_LoopNestToValueFunc.mlir @@ -0,0 +1,476 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + accv.func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant dense<0.000000e+00> : vector<8xf32> + %cst_0 = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + "accv.lambda"() ( { + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 128 step 128 { + "accv.lambda"() ( { + affine.for %arg5 = 0 to 128 
{ + affine.for %arg6 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + "accv.lambda"() ( { + affine.for %arg7 = 0 to 1 { + affine.for %arg8 = 0 to 16 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg5, %arg7) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg6, %arg8) + %6 = vector.transfer_read %arg1[%4, %5], %cst_0 {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %0[%arg7, %arg8] : memref<1x16xvector<8xf32>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{i_1,24}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i_0,23}">, #accln<"index{i_1,24}">], subdomainSize = [1, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{i_0,23}">, subdomainIndexOrder = [#accln<"index{i_0,23}">, #accln<"index{i_1,24}">], subdomainSize = [1, 16]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_15", type = () -> ()} : () -> () + "accv.lambda"() ( { + affine.for %arg7 = 0 to 1 { + affine.for %arg8 = 0 to 16 { + %4 = load %0[%arg7, %arg8] : memref<1x16xvector<8xf32>> + affine.store %4, %3[((%arg6 + %arg8 * 8) floordiv 16) mod 16, (%arg5 + %arg7) mod 128, (((%arg6 + %arg8 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{i_1,26}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i_0,25}">, #accln<"index{i_1,26}">], subdomainSize = [1, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{i_0,25}">, subdomainIndexOrder = [#accln<"index{i_0,25}">, #accln<"index{i_1,26}">], subdomainSize = [1, 16]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_14", type = () -> ()} : () -> () + } else { + "accv.lambda"() ( { + affine.for %arg7 = 0 to 1 { + affine.for %arg8 = 0 to 16 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg5, %arg7) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg6, %arg8) + %6 = vector.transfer_read %arg1[%4, %5], %cst_0 : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %0[%arg7, %arg8] : memref<1x16xvector<8xf32>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{i_1,28}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i_0,27}">, #accln<"index{i_1,28}">], subdomainSize = [1, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{i_0,27}">, subdomainIndexOrder = [#accln<"index{i_0,27}">, #accln<"index{i_1,28}">], subdomainSize = [1, 16]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_13", type = () -> ()} : () -> () + "accv.lambda"() ( { + affine.for %arg7 = 0 to 1 { + affine.for %arg8 = 0 to 16 { + %4 = load %0[%arg7, %arg8] : memref<1x16xvector<8xf32>> + affine.store %4, %3[((%arg6 + %arg8 * 8) floordiv 16) mod 16, (%arg5 + %arg7) mod 128, (((%arg6 + %arg8 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{i_1,30}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i_0,29}">, #accln<"index{i_1,30}">], subdomainSize = [1, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{i_0,29}">, subdomainIndexOrder = [#accln<"index{i_0,29}">, #accln<"index{i_1,30}">], subdomainSize = [1, 16]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_12", type = () -> ()} : () -> () + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,21}">, subdomainIndexOrder = 
[#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i_o,19}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 256]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_7", type = () -> ()} : () -> () + affine.for %arg5 = 0 to 784 { + "accv.lambda"() ( { + affine.for %arg6 = 0 to 16 { + affine.for %arg7 = 0 to 6 { + affine.for %arg8 = 0 to 2 { + store %cst, %2[%arg6, %arg7, %arg8] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 2 : i64, index = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 6 : i64, index = #accln<"index{j_i_i_o,15}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 2]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i,14}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 6, 2]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_6", type = () -> ()} : () -> () + affine.for %arg6 = 0 to 256 step 16 { + affine.for %arg7 = 0 to 128 step 4 { + affine.for %arg8 = 0 to 0 step 6 { + affine.for %arg9 = 0 to 4 { + affine.for %arg10 = 0 to 0 { + affine.for %arg11 = 0 to 16 step 8 { + affine.for %arg12 = 0 to 8 step 8 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %12 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %21 = load %arg0[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + 
%27 = load %arg0[%10, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = load %arg0[%11, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = affine.load %3[((%12 - %arg3) floordiv 16) mod 16, (%13 - %arg4) mod 128, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %30 = vector.extractelement %29[%c0_i64 : i64] : vector<8xf32> + %31 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %32 = affine.load %3[((%31 - %arg3) floordiv 16) mod 16, (%14 - %arg4) mod 128, (((%31 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c1_i64 : i64] : vector<8xf32> + %34 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %35 = affine.load %3[((%34 - %arg3) floordiv 16) mod 16, (%15 - %arg4) mod 128, (((%34 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %36 = vector.extractelement %35[%c2_i64 : i64] : vector<8xf32> + %37 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %38 = affine.load %3[((%37 - %arg3) floordiv 16) mod 16, (%16 - %arg4) mod 128, (((%37 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c3_i64 : i64] : vector<8xf32> + %40 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %41 = affine.load %3[((%40 - %arg3) floordiv 16) mod 16, (%17 - %arg4) mod 128, (((%40 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c4_i64 : i64] : vector<8xf32> + %43 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %44 = affine.load %3[((%43 - %arg3) floordiv 16) mod 16, (%18 - %arg4) mod 128, (((%43 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = vector.extractelement %44[%c5_i64 : i64] : vector<8xf32> + %46 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %47 = affine.load %3[((%46 - %arg3) floordiv 16) mod 16, (%19 - %arg4) mod 128, (((%46 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c6_i64 : i64] : vector<8xf32> + %49 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %50 = affine.load %3[((%49 - %arg3) floordiv 16) mod 16, (%20 - %arg4) mod 128, (((%49 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = vector.extractelement %50[%c7_i64 : i64] : vector<8xf32> + %52 = "accv.bin_op"(%21, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %36) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %42) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %45) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = "accv.bin_op"(%27, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = "accv.bin_op"(%28, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %60 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c0_i64 : i64] : vector<8xf32> + %62 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + 
d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %63 = affine.load %2[((%62 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 6, (((%62 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = vector.extractelement %63[%c1_i64 : i64] : vector<8xf32> + %65 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %66 = affine.load %2[((%65 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%65 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c2_i64 : i64] : vector<8xf32> + %68 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %69 = affine.load %2[((%68 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%68 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c3_i64 : i64] : vector<8xf32> + %71 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %72 = affine.load %2[((%71 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%71 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c4_i64 : i64] : vector<8xf32> + %74 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %75 = affine.load %2[((%74 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%74 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %78 = affine.load %2[((%77 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%77 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c6_i64 : i64] : vector<8xf32> + %80 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %81 = affine.load %2[((%80 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%80 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = vector.extractelement %81[%c7_i64 : i64] : vector<8xf32> + %83 = "accv.bin_op"(%61, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%64, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%67, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%70, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%73, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%76, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = "accv.bin_op"(%79, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + %90 = "accv.bin_op"(%82, %59) {predicate = 0 : i64} : (f32, f32) -> f32 + %91 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %83, %91[%c0_i64 : i64] : vector<8xf32> + affine.store %92, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %94 = affine.load %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %84, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg5) 
mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %97 = affine.load %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c2_i64 : i64] : vector<8xf32> + affine.store %98, %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %100 = affine.load %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %86, %100[%c3_i64 : i64] : vector<8xf32> + affine.store %101, %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %103 = affine.load %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %87, %103[%c4_i64 : i64] : vector<8xf32> + affine.store %104, %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %106 = affine.load %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %88, %106[%c5_i64 : i64] : vector<8xf32> + affine.store %107, %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %109 = affine.load %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %89, %109[%c6_i64 : i64] : vector<8xf32> + affine.store %110, %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %112 = affine.load %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %90, %112[%c7_i64 : i64] : vector<8xf32> + affine.store %113, %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %83, %114[%c0_i64 : i64] : vector<8xf32> + affine.store %115, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %117 = affine.load %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 
6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %84, %117[%c1_i64 : i64] : vector<8xf32> + affine.store %118, %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %120 = affine.load %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %85, %120[%c2_i64 : i64] : vector<8xf32> + affine.store %121, %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %123 = affine.load %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %124 = vector.insertelement %86, %123[%c3_i64 : i64] : vector<8xf32> + affine.store %124, %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %126 = affine.load %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %87, %126[%c4_i64 : i64] : vector<8xf32> + affine.store %127, %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %129 = affine.load %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %130 = vector.insertelement %88, %129[%c5_i64 : i64] : vector<8xf32> + affine.store %130, %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %132 = affine.load %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %133 = vector.insertelement %89, %132[%c6_i64 : i64] : vector<8xf32> + affine.store %133, %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %135 = affine.load %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %136 = vector.insertelement %90, %135[%c7_i64 : i64] : vector<8xf32> + affine.store %136, %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 8 : i64, index = #accln<"index{j_i_i_i,16}">, scheduledIndex = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 1, 1]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i_o,15}">, 
accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 8, 1]} + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 4]} + affine.for %arg8 = 0 to 1 step 6 { + affine.for %arg9 = 0 to 4 { + affine.for %arg10 = 0 to 1 { + affine.for %arg11 = 0 to 16 step 8 { + affine.for %arg12 = 0 to 8 step 8 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg8, %arg10) + %12 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %21 = load %arg0[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg0[%10, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = load %arg0[%11, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = affine.load %3[((%12 - %arg3) floordiv 16) mod 16, (%13 - %arg4) mod 128, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %30 = vector.extractelement %29[%c0_i64 : i64] : vector<8xf32> + %31 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %32 = affine.load %3[((%31 - %arg3) floordiv 16) mod 16, (%14 - %arg4) mod 128, (((%31 - %arg3) mod 16) floordiv 8) mod 2] : 
memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c1_i64 : i64] : vector<8xf32> + %34 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %35 = affine.load %3[((%34 - %arg3) floordiv 16) mod 16, (%15 - %arg4) mod 128, (((%34 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %36 = vector.extractelement %35[%c2_i64 : i64] : vector<8xf32> + %37 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %38 = affine.load %3[((%37 - %arg3) floordiv 16) mod 16, (%16 - %arg4) mod 128, (((%37 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c3_i64 : i64] : vector<8xf32> + %40 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %41 = affine.load %3[((%40 - %arg3) floordiv 16) mod 16, (%17 - %arg4) mod 128, (((%40 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c4_i64 : i64] : vector<8xf32> + %43 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %44 = affine.load %3[((%43 - %arg3) floordiv 16) mod 16, (%18 - %arg4) mod 128, (((%43 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = vector.extractelement %44[%c5_i64 : i64] : vector<8xf32> + %46 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %47 = affine.load %3[((%46 - %arg3) floordiv 16) mod 16, (%19 - %arg4) mod 128, (((%46 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c6_i64 : i64] : vector<8xf32> + %49 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %50 = affine.load %3[((%49 - %arg3) floordiv 16) mod 16, (%20 - %arg4) mod 128, (((%49 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = vector.extractelement %50[%c7_i64 : i64] : vector<8xf32> + %52 = "accv.bin_op"(%21, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %36) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %42) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %45) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = "accv.bin_op"(%27, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = "accv.bin_op"(%28, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %60 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c0_i64 : i64] : vector<8xf32> + %62 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %63 = affine.load %2[((%62 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 6, (((%62 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = vector.extractelement %63[%c1_i64 : i64] : vector<8xf32> + %65 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %66 = affine.load %2[((%65 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%65 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c2_i64 : i64] : vector<8xf32> + %68 = affine.apply affine_map<(d0, d1, d2, d3) 
-> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %69 = affine.load %2[((%68 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%68 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c3_i64 : i64] : vector<8xf32> + %71 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %72 = affine.load %2[((%71 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%71 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c4_i64 : i64] : vector<8xf32> + %74 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %75 = affine.load %2[((%74 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%74 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %78 = affine.load %2[((%77 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%77 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c6_i64 : i64] : vector<8xf32> + %80 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %81 = affine.load %2[((%80 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%80 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = vector.extractelement %81[%c7_i64 : i64] : vector<8xf32> + %83 = "accv.bin_op"(%61, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%64, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%67, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%70, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%73, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%76, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = "accv.bin_op"(%79, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + %90 = "accv.bin_op"(%82, %59) {predicate = 0 : i64} : (f32, f32) -> f32 + %91 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %83, %91[%c0_i64 : i64] : vector<8xf32> + affine.store %92, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %94 = affine.load %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %84, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %97 = affine.load %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c2_i64 : i64] : vector<8xf32> + affine.store %98, %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + 
d3)>(%arg3, %arg6, %arg11, %arg12) + %100 = affine.load %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %86, %100[%c3_i64 : i64] : vector<8xf32> + affine.store %101, %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %103 = affine.load %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %87, %103[%c4_i64 : i64] : vector<8xf32> + affine.store %104, %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %106 = affine.load %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %88, %106[%c5_i64 : i64] : vector<8xf32> + affine.store %107, %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %109 = affine.load %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %89, %109[%c6_i64 : i64] : vector<8xf32> + affine.store %110, %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %112 = affine.load %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %90, %112[%c7_i64 : i64] : vector<8xf32> + affine.store %113, %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %83, %114[%c0_i64 : i64] : vector<8xf32> + affine.store %115, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg5) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %117 = affine.load %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %84, %117[%c1_i64 : i64] : vector<8xf32> + affine.store %118, %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg5) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %120 = affine.load %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %85, %120[%c2_i64 : i64] : 
vector<8xf32> + affine.store %121, %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg5) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %123 = affine.load %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %124 = vector.insertelement %86, %123[%c3_i64 : i64] : vector<8xf32> + affine.store %124, %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg5) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %126 = affine.load %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %87, %126[%c4_i64 : i64] : vector<8xf32> + affine.store %127, %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg5) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %129 = affine.load %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %130 = vector.insertelement %88, %129[%c5_i64 : i64] : vector<8xf32> + affine.store %130, %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg5) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %132 = affine.load %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %133 = vector.insertelement %89, %132[%c6_i64 : i64] : vector<8xf32> + affine.store %133, %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg5) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg6, %arg11, %arg12) + %135 = affine.load %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %136 = vector.insertelement %90, %135[%c7_i64 : i64] : vector<8xf32> + affine.store %136, %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg5) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 8 : i64, index = #accln<"index{j_i_i_i,16}">, scheduledIndex = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i_o,15}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 8, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 1]} + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = 
[#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 4]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 128]} + "accv.lambda"() ( { + affine.for %arg6 = 0 to 1 { + affine.for %arg7 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + "accv.lambda"() ( { + affine.for %arg8 = 0 to 1 { + affine.for %arg9 = 0 to 16 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg6, %arg8) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg7, %arg9) + %6 = vector.transfer_read %arg2[%4, %5], %cst_0 {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %7 = affine.load %2[((%arg7 + %arg9 * 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 6, (((%arg7 + %arg9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %8 = addf %6, %7 : vector<8xf32> + store %8, %1[%arg8, %arg9] : memref<1x16xvector<8xf32>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{i_o,7}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{k_i,6}">, #accln<"index{i_o,7}">], subdomainSize = [1, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{k_i,6}">, subdomainIndexOrder = [#accln<"index{k_i,6}">, #accln<"index{i_o,7}">], subdomainSize = [1, 16]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_11", type = () -> ()} : () -> () + "accv.lambda"() ( { + affine.for %arg8 = 0 to 1 { + affine.for %arg9 = 0 to 16 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg6, %arg8) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg7, %arg9) + %6 = load %1[%arg8, %arg9] : memref<1x16xvector<8xf32>> + vector.transfer_write %6, %arg2[%4, %5] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{i_i,8}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 16]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_10", type = () -> ()} : () -> () + } else { + "accv.lambda"() ( { + affine.for %arg8 = 0 to 1 { + affine.for %arg9 = 0 to 16 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg6, %arg8) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg7, %arg9) + %6 = vector.transfer_read %arg2[%4, %5], %cst_0 : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %7 = affine.load %2[((%arg7 + %arg9 * 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 6, (((%arg7 + %arg9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %8 = addf %6, %7 : vector<8xf32> + store %8, %1[%arg8, %arg9] : memref<1x16xvector<8xf32>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{k_i_i,10}">, #accln<"index{i_i_o,11}">], subdomainSize = [1, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = 
[#accln<"index{k_i_i,10}">, #accln<"index{i_i_o,11}">], subdomainSize = [1, 16]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_9", type = () -> ()} : () -> () + "accv.lambda"() ( { + affine.for %arg8 = 0 to 1 { + affine.for %arg9 = 0 to 16 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg5, %arg6, %arg8) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg7, %arg9) + %6 = load %1[%arg8, %arg9] : memref<1x16xvector<8xf32>> + vector.transfer_write %6, %arg2[%4, %5] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 16]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_8", type = () -> ()} : () -> () + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 1 : i64, index = #accln<"index{k,2}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 256]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_5", type = () -> ()} : () -> () + } {begin = 0 : i64, end = 784 : i64, index = #accln<"index{i_o,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_o,5}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + accv.return + }) {exec_target = 0 : i64, sym_name = "NestFunction_0", type = () -> ()} : () -> () + accv.return + } + accv.func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + accv.return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/3_ValueFuncToTarget.mlir b/Tutorials/optimized_matmul/mlir/3_ValueFuncToTarget.mlir new file mode 100644 index 00000000..8f0e2338 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/3_ValueFuncToTarget.mlir @@ -0,0 +1,3384 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func 
@optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %c0_1 = constant 0 : index + %c0_2 = constant 0 : index + %c0_3 = constant 0 : index + %c0_4 = constant 0 : index + %c0_5 = constant 0 : index + %c0_6 = constant 0 : index + %c0_7 = constant 0 : index + %c0_8 = constant 0 : index + %c0_9 = constant 0 : index + %c0_10 = constant 0 : index + %c0_11 = constant 0 : index + %c0_12 = constant 0 : index + %c0_13 = constant 0 : index + %c0_14 = constant 0 : index + %c0_15 = constant 0 : index + %c0_16 = constant 0 : index + %c0_17 = constant 0 : index + %c0_18 = constant 0 : index + %c0_19 = constant 0 : index + %c0_20 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_21 = constant dense<0.000000e+00> : vector<8xf32> + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 128 { + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %c0_20) + %6 = vector.transfer_read %arg1[%4, %5], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %0[%c0_19, %c0_20] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_20) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %7) + %10 = vector.transfer_read %arg1[%8, %9], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %10, %0[%c0_19, %7] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_20) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %11) + %14 = vector.transfer_read %arg1[%12, %13], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %0[%c0_19, %11] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_20) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %15) + %18 = vector.transfer_read %arg1[%16, %17], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %0[%c0_19, %15] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_20) + %20 = affine.apply affine_map<(d0, d1, d2) -> 
(d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %19) + %22 = vector.transfer_read %arg1[%20, %21], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %22, %0[%c0_19, %19] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_20) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %23) + %26 = vector.transfer_read %arg1[%24, %25], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %0[%c0_19, %23] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_20) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %27) + %30 = vector.transfer_read %arg1[%28, %29], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %0[%c0_19, %27] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_20) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %31) + %34 = vector.transfer_read %arg1[%32, %33], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %34, %0[%c0_19, %31] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_20) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %35) + %38 = vector.transfer_read %arg1[%36, %37], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %0[%c0_19, %35] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_20) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %39) + %42 = vector.transfer_read %arg1[%40, %41], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %0[%c0_19, %39] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_20) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %43) + %46 = vector.transfer_read %arg1[%44, %45], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %46, %0[%c0_19, %43] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_20) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %47) + %50 = vector.transfer_read %arg1[%48, %49], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %0[%c0_19, %47] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_20) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %53 = affine.apply 
affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %51) + %54 = vector.transfer_read %arg1[%52, %53], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %54, %0[%c0_19, %51] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_20) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %55) + %58 = vector.transfer_read %arg1[%56, %57], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %58, %0[%c0_19, %55] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_20) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %59) + %62 = vector.transfer_read %arg1[%60, %61], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %62, %0[%c0_19, %59] : memref<1x16xvector<8xf32>> + %63 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_20) + %64 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %65 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %63) + %66 = vector.transfer_read %arg1[%64, %65], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %66, %0[%c0_19, %63] : memref<1x16xvector<8xf32>> + %67 = load %0[%c0_17, %c0_18] : memref<1x16xvector<8xf32>> + affine.store %67, %3[((%arg5 + %c0_18 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %c0_18 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %68 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_18) + %69 = load %0[%c0_17, %68] : memref<1x16xvector<8xf32>> + affine.store %69, %3[((%arg5 + %68 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %68 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %70 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_18) + %71 = load %0[%c0_17, %70] : memref<1x16xvector<8xf32>> + affine.store %71, %3[((%arg5 + %70 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %70 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %72 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_18) + %73 = load %0[%c0_17, %72] : memref<1x16xvector<8xf32>> + affine.store %73, %3[((%arg5 + %72 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %72 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %74 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_18) + %75 = load %0[%c0_17, %74] : memref<1x16xvector<8xf32>> + affine.store %75, %3[((%arg5 + %74 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %74 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %76 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_18) + %77 = load %0[%c0_17, %76] : memref<1x16xvector<8xf32>> + affine.store %77, %3[((%arg5 + %76 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %76 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %78 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_18) + %79 = load %0[%c0_17, %78] : memref<1x16xvector<8xf32>> + affine.store %79, %3[((%arg5 + %78 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %78 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %80 = affine.apply 
affine_map<(d0) -> (d0 + 7)>(%c0_18) + %81 = load %0[%c0_17, %80] : memref<1x16xvector<8xf32>> + affine.store %81, %3[((%arg5 + %80 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %80 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %82 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_18) + %83 = load %0[%c0_17, %82] : memref<1x16xvector<8xf32>> + affine.store %83, %3[((%arg5 + %82 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %82 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %84 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_18) + %85 = load %0[%c0_17, %84] : memref<1x16xvector<8xf32>> + affine.store %85, %3[((%arg5 + %84 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %84 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %86 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_18) + %87 = load %0[%c0_17, %86] : memref<1x16xvector<8xf32>> + affine.store %87, %3[((%arg5 + %86 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %86 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_18) + %89 = load %0[%c0_17, %88] : memref<1x16xvector<8xf32>> + affine.store %89, %3[((%arg5 + %88 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %88 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %90 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_18) + %91 = load %0[%c0_17, %90] : memref<1x16xvector<8xf32>> + affine.store %91, %3[((%arg5 + %90 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %90 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %92 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_18) + %93 = load %0[%c0_17, %92] : memref<1x16xvector<8xf32>> + affine.store %93, %3[((%arg5 + %92 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %92 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_18) + %95 = load %0[%c0_17, %94] : memref<1x16xvector<8xf32>> + affine.store %95, %3[((%arg5 + %94 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %94 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_18) + %97 = load %0[%c0_17, %96] : memref<1x16xvector<8xf32>> + affine.store %97, %3[((%arg5 + %96 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %96 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } else { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %c0_16) + %6 = vector.transfer_read %arg1[%4, %5], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %0[%c0_15, %c0_16] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_16) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %7) + %10 = vector.transfer_read %arg1[%8, %9], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %10, %0[%c0_15, %7] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_16) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, 
%11) + %14 = vector.transfer_read %arg1[%12, %13], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %0[%c0_15, %11] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_16) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %15) + %18 = vector.transfer_read %arg1[%16, %17], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %0[%c0_15, %15] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_16) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %19) + %22 = vector.transfer_read %arg1[%20, %21], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %22, %0[%c0_15, %19] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_16) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %23) + %26 = vector.transfer_read %arg1[%24, %25], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %0[%c0_15, %23] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_16) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %27) + %30 = vector.transfer_read %arg1[%28, %29], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %0[%c0_15, %27] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_16) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %31) + %34 = vector.transfer_read %arg1[%32, %33], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %34, %0[%c0_15, %31] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_16) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %35) + %38 = vector.transfer_read %arg1[%36, %37], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %0[%c0_15, %35] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_16) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %39) + %42 = vector.transfer_read %arg1[%40, %41], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %0[%c0_15, %39] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_16) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %43) + %46 = vector.transfer_read %arg1[%44, %45], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %46, %0[%c0_15, %43] : memref<1x16xvector<8xf32>> + %47 = affine.apply 
affine_map<(d0) -> (d0 + 11)>(%c0_16) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %47) + %50 = vector.transfer_read %arg1[%48, %49], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %0[%c0_15, %47] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_16) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %51) + %54 = vector.transfer_read %arg1[%52, %53], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %54, %0[%c0_15, %51] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_16) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %55) + %58 = vector.transfer_read %arg1[%56, %57], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %58, %0[%c0_15, %55] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_16) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %59) + %62 = vector.transfer_read %arg1[%60, %61], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %62, %0[%c0_15, %59] : memref<1x16xvector<8xf32>> + %63 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_16) + %64 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %65 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %63) + %66 = vector.transfer_read %arg1[%64, %65], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %66, %0[%c0_15, %63] : memref<1x16xvector<8xf32>> + %67 = load %0[%c0_13, %c0_14] : memref<1x16xvector<8xf32>> + affine.store %67, %3[((%arg5 + %c0_14 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %c0_14 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %68 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_14) + %69 = load %0[%c0_13, %68] : memref<1x16xvector<8xf32>> + affine.store %69, %3[((%arg5 + %68 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %68 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %70 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_14) + %71 = load %0[%c0_13, %70] : memref<1x16xvector<8xf32>> + affine.store %71, %3[((%arg5 + %70 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %70 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %72 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_14) + %73 = load %0[%c0_13, %72] : memref<1x16xvector<8xf32>> + affine.store %73, %3[((%arg5 + %72 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %72 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %74 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_14) + %75 = load %0[%c0_13, %74] : memref<1x16xvector<8xf32>> + affine.store %75, %3[((%arg5 + %74 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %74 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %76 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_14) + %77 = load %0[%c0_13, %76] : 
memref<1x16xvector<8xf32>> + affine.store %77, %3[((%arg5 + %76 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %76 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %78 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_14) + %79 = load %0[%c0_13, %78] : memref<1x16xvector<8xf32>> + affine.store %79, %3[((%arg5 + %78 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %78 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %80 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_14) + %81 = load %0[%c0_13, %80] : memref<1x16xvector<8xf32>> + affine.store %81, %3[((%arg5 + %80 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %80 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %82 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_14) + %83 = load %0[%c0_13, %82] : memref<1x16xvector<8xf32>> + affine.store %83, %3[((%arg5 + %82 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %82 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %84 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_14) + %85 = load %0[%c0_13, %84] : memref<1x16xvector<8xf32>> + affine.store %85, %3[((%arg5 + %84 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %84 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %86 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_14) + %87 = load %0[%c0_13, %86] : memref<1x16xvector<8xf32>> + affine.store %87, %3[((%arg5 + %86 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %86 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_14) + %89 = load %0[%c0_13, %88] : memref<1x16xvector<8xf32>> + affine.store %89, %3[((%arg5 + %88 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %88 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %90 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_14) + %91 = load %0[%c0_13, %90] : memref<1x16xvector<8xf32>> + affine.store %91, %3[((%arg5 + %90 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %90 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %92 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_14) + %93 = load %0[%c0_13, %92] : memref<1x16xvector<8xf32>> + affine.store %93, %3[((%arg5 + %92 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %92 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_14) + %95 = load %0[%c0_13, %94] : memref<1x16xvector<8xf32>> + affine.store %95, %3[((%arg5 + %94 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %94 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_14) + %97 = load %0[%c0_13, %96] : memref<1x16xvector<8xf32>> + affine.store %97, %3[((%arg5 + %96 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %96 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,21}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i_o,19}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 256]} + affine.for %arg4 = 0 to 784 { + affine.for %arg5 = 0 to 16 { + affine.for %arg6 = 0 to 6 { + affine.for %arg7 = 0 to 2 { + store %cst_21, %2[%arg5, %arg6, 
%arg7] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 2 : i64, index = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 6 : i64, index = #accln<"index{j_i_i_o,15}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 2]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i,14}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 6, 2]} + affine.for %arg5 = 0 to 256 step 16 { + affine.for %arg6 = 0 to 128 step 4 { + affine.for %arg7 = 0 to 0 step 6 { + affine.for %arg8 = 0 to 4 { + affine.for %arg9 = 0 to 0 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %12 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %21 = load %arg0[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg0[%10, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = load %arg0[%11, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = affine.load %3[((%12 - %arg3) floordiv 16) mod 16, (%13 - %c0) mod 128, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %30 = vector.extractelement %29[%c0_i64 : i64] : vector<8xf32> + %31 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %32 = affine.load %3[((%31 - %arg3) floordiv 16) mod 16, (%14 - %c0) mod 128, (((%31 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c1_i64 : i64] : 
vector<8xf32> + %34 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %35 = affine.load %3[((%34 - %arg3) floordiv 16) mod 16, (%15 - %c0) mod 128, (((%34 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %36 = vector.extractelement %35[%c2_i64 : i64] : vector<8xf32> + %37 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %38 = affine.load %3[((%37 - %arg3) floordiv 16) mod 16, (%16 - %c0) mod 128, (((%37 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c3_i64 : i64] : vector<8xf32> + %40 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %41 = affine.load %3[((%40 - %arg3) floordiv 16) mod 16, (%17 - %c0) mod 128, (((%40 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c4_i64 : i64] : vector<8xf32> + %43 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %44 = affine.load %3[((%43 - %arg3) floordiv 16) mod 16, (%18 - %c0) mod 128, (((%43 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = vector.extractelement %44[%c5_i64 : i64] : vector<8xf32> + %46 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %47 = affine.load %3[((%46 - %arg3) floordiv 16) mod 16, (%19 - %c0) mod 128, (((%46 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c6_i64 : i64] : vector<8xf32> + %49 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %50 = affine.load %3[((%49 - %arg3) floordiv 16) mod 16, (%20 - %c0) mod 128, (((%49 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = vector.extractelement %50[%c7_i64 : i64] : vector<8xf32> + %52 = "accv.bin_op"(%21, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %36) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %42) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %45) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = "accv.bin_op"(%27, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = "accv.bin_op"(%28, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %60 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c0_i64 : i64] : vector<8xf32> + %62 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %63 = affine.load %2[((%62 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%62 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = vector.extractelement %63[%c1_i64 : i64] : vector<8xf32> + %65 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %66 = affine.load %2[((%65 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%65 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c2_i64 : i64] : vector<8xf32> + %68 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %69 = affine.load %2[((%68 - %arg3) 
floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%68 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c3_i64 : i64] : vector<8xf32> + %71 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %72 = affine.load %2[((%71 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%71 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c4_i64 : i64] : vector<8xf32> + %74 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %75 = affine.load %2[((%74 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%74 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %78 = affine.load %2[((%77 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%77 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c6_i64 : i64] : vector<8xf32> + %80 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %81 = affine.load %2[((%80 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%80 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = vector.extractelement %81[%c7_i64 : i64] : vector<8xf32> + %83 = "accv.bin_op"(%61, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%64, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%67, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%70, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%73, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%76, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = "accv.bin_op"(%79, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + %90 = "accv.bin_op"(%82, %59) {predicate = 0 : i64} : (f32, f32) -> f32 + %91 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %83, %91[%c0_i64 : i64] : vector<8xf32> + affine.store %92, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %94 = affine.load %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %84, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %97 = affine.load %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c2_i64 : i64] : vector<8xf32> + affine.store %98, %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %100 = affine.load %2[((%99 - %arg3) floordiv 16) mod 16, 
(%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %86, %100[%c3_i64 : i64] : vector<8xf32> + affine.store %101, %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %103 = affine.load %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %87, %103[%c4_i64 : i64] : vector<8xf32> + affine.store %104, %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %106 = affine.load %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %88, %106[%c5_i64 : i64] : vector<8xf32> + affine.store %107, %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %109 = affine.load %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %89, %109[%c6_i64 : i64] : vector<8xf32> + affine.store %110, %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %112 = affine.load %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %90, %112[%c7_i64 : i64] : vector<8xf32> + affine.store %113, %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %83, %114[%c0_i64 : i64] : vector<8xf32> + affine.store %115, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %117 = affine.load %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %84, %117[%c1_i64 : i64] : vector<8xf32> + affine.store %118, %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %120 = affine.load %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %85, %120[%c2_i64 : i64] : vector<8xf32> + affine.store %121, %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, 
(((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %123 = affine.load %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %124 = vector.insertelement %86, %123[%c3_i64 : i64] : vector<8xf32> + affine.store %124, %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %126 = affine.load %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %87, %126[%c4_i64 : i64] : vector<8xf32> + affine.store %127, %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %129 = affine.load %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %130 = vector.insertelement %88, %129[%c5_i64 : i64] : vector<8xf32> + affine.store %130, %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %132 = affine.load %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %133 = vector.insertelement %89, %132[%c6_i64 : i64] : vector<8xf32> + affine.store %133, %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %135 = affine.load %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %136 = vector.insertelement %90, %135[%c7_i64 : i64] : vector<8xf32> + affine.store %136, %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %137 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_11) + %138 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %139 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %140 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %141 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %142 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %143 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %144 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %145 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %146 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %147 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %148 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %149 = affine.apply 
affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %150 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %151 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %152 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %153 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %154 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %155 = load %arg0[%138, %147] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg0[%139, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg0[%140, %149] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %158 = load %arg0[%141, %150] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg0[%142, %151] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %160 = load %arg0[%143, %152] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = load %arg0[%144, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg0[%145, %154] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = affine.load %3[((%146 - %arg3) floordiv 16) mod 16, (%147 - %c0) mod 128, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c0_i64 : i64] : vector<8xf32> + %165 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %166 = affine.load %3[((%165 - %arg3) floordiv 16) mod 16, (%148 - %c0) mod 128, (((%165 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c1_i64 : i64] : vector<8xf32> + %168 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %169 = affine.load %3[((%168 - %arg3) floordiv 16) mod 16, (%149 - %c0) mod 128, (((%168 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c2_i64 : i64] : vector<8xf32> + %171 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %172 = affine.load %3[((%171 - %arg3) floordiv 16) mod 16, (%150 - %c0) mod 128, (((%171 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c3_i64 : i64] : vector<8xf32> + %174 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %175 = affine.load %3[((%174 - %arg3) floordiv 16) mod 16, (%151 - %c0) mod 128, (((%174 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %176 = vector.extractelement %175[%c4_i64 : i64] : vector<8xf32> + %177 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %178 = affine.load %3[((%177 - %arg3) floordiv 16) mod 16, (%152 - %c0) mod 128, (((%177 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %179 = vector.extractelement %178[%c5_i64 : i64] : vector<8xf32> + %180 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %181 = affine.load %3[((%180 - %arg3) floordiv 16) mod 16, (%153 - %c0) mod 128, (((%180 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %182 = vector.extractelement %181[%c6_i64 : i64] : vector<8xf32> + %183 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %184 = 
affine.load %3[((%183 - %arg3) floordiv 16) mod 16, (%154 - %c0) mod 128, (((%183 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %185 = vector.extractelement %184[%c7_i64 : i64] : vector<8xf32> + %186 = "accv.bin_op"(%155, %164) {predicate = 2 : i64} : (f32, f32) -> f32 + %187 = "accv.bin_op"(%156, %167) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = "accv.bin_op"(%157, %170) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = "accv.bin_op"(%158, %173) {predicate = 2 : i64} : (f32, f32) -> f32 + %190 = "accv.bin_op"(%159, %176) {predicate = 2 : i64} : (f32, f32) -> f32 + %191 = "accv.bin_op"(%160, %179) {predicate = 2 : i64} : (f32, f32) -> f32 + %192 = "accv.bin_op"(%161, %182) {predicate = 2 : i64} : (f32, f32) -> f32 + %193 = "accv.bin_op"(%162, %185) {predicate = 2 : i64} : (f32, f32) -> f32 + %194 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c0_i64 : i64] : vector<8xf32> + %196 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %197 = affine.load %2[((%196 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%196 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %198 = vector.extractelement %197[%c1_i64 : i64] : vector<8xf32> + %199 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %200 = affine.load %2[((%199 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%199 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c2_i64 : i64] : vector<8xf32> + %202 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %203 = affine.load %2[((%202 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%202 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %204 = vector.extractelement %203[%c3_i64 : i64] : vector<8xf32> + %205 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %206 = affine.load %2[((%205 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%205 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %207 = vector.extractelement %206[%c4_i64 : i64] : vector<8xf32> + %208 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %209 = affine.load %2[((%208 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%208 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %210 = vector.extractelement %209[%c5_i64 : i64] : vector<8xf32> + %211 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %212 = affine.load %2[((%211 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%211 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %213 = vector.extractelement %212[%c6_i64 : i64] : vector<8xf32> + %214 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %215 = affine.load %2[((%214 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%214 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %216 = vector.extractelement %215[%c7_i64 : i64] : vector<8xf32> + %217 = "accv.bin_op"(%195, %186) {predicate = 0 : i64} : (f32, f32) -> f32 + %218 = "accv.bin_op"(%198, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + %219 = "accv.bin_op"(%201, %188) {predicate = 0 : 
i64} : (f32, f32) -> f32 + %220 = "accv.bin_op"(%204, %189) {predicate = 0 : i64} : (f32, f32) -> f32 + %221 = "accv.bin_op"(%207, %190) {predicate = 0 : i64} : (f32, f32) -> f32 + %222 = "accv.bin_op"(%210, %191) {predicate = 0 : i64} : (f32, f32) -> f32 + %223 = "accv.bin_op"(%213, %192) {predicate = 0 : i64} : (f32, f32) -> f32 + %224 = "accv.bin_op"(%216, %193) {predicate = 0 : i64} : (f32, f32) -> f32 + %225 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %217, %225[%c0_i64 : i64] : vector<8xf32> + affine.store %226, %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %227 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %228 = affine.load %2[((%227 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%227 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %218, %228[%c1_i64 : i64] : vector<8xf32> + affine.store %229, %2[((%227 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%227 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %230 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %231 = affine.load %2[((%230 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%230 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %219, %231[%c2_i64 : i64] : vector<8xf32> + affine.store %232, %2[((%230 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%230 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %233 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %234 = affine.load %2[((%233 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %220, %234[%c3_i64 : i64] : vector<8xf32> + affine.store %235, %2[((%233 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %236 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %237 = affine.load %2[((%236 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %238 = vector.insertelement %221, %237[%c4_i64 : i64] : vector<8xf32> + affine.store %238, %2[((%236 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %239 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %240 = affine.load %2[((%239 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %222, %240[%c5_i64 : i64] : vector<8xf32> + affine.store %241, %2[((%239 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %242 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %243 = affine.load %2[((%242 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %244 = vector.insertelement %223, 
%243[%c6_i64 : i64] : vector<8xf32> + affine.store %244, %2[((%242 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %245 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %246 = affine.load %2[((%245 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = vector.insertelement %224, %246[%c7_i64 : i64] : vector<8xf32> + affine.store %247, %2[((%245 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %249 = vector.insertelement %217, %248[%c0_i64 : i64] : vector<8xf32> + affine.store %249, %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %250 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %251 = affine.load %2[((%250 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%250 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %252 = vector.insertelement %218, %251[%c1_i64 : i64] : vector<8xf32> + affine.store %252, %2[((%250 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%250 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %253 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %254 = affine.load %2[((%253 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%253 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %255 = vector.insertelement %219, %254[%c2_i64 : i64] : vector<8xf32> + affine.store %255, %2[((%253 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%253 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %256 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %257 = affine.load %2[((%256 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%256 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %258 = vector.insertelement %220, %257[%c3_i64 : i64] : vector<8xf32> + affine.store %258, %2[((%256 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%256 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %259 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %260 = affine.load %2[((%259 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%259 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %261 = vector.insertelement %221, %260[%c4_i64 : i64] : vector<8xf32> + affine.store %261, %2[((%259 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%259 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %262 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %263 = affine.load %2[((%262 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%262 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %264 = vector.insertelement %222, %263[%c5_i64 : i64] : vector<8xf32> + affine.store %264, %2[((%262 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%262 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %265 = affine.apply 
affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %266 = affine.load %2[((%265 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%265 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %267 = vector.insertelement %223, %266[%c6_i64 : i64] : vector<8xf32> + affine.store %267, %2[((%265 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%265 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %268 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %269 = affine.load %2[((%268 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%268 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %270 = vector.insertelement %224, %269[%c7_i64 : i64] : vector<8xf32> + affine.store %270, %2[((%268 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%268 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 4]} + affine.for %arg7 = 0 to 4 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %12 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %21 = load %arg0[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + 
d1)>> + %27 = load %arg0[%10, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = load %arg0[%11, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = affine.load %3[((%12 - %arg3) floordiv 16) mod 16, (%13 - %c0) mod 128, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %30 = vector.extractelement %29[%c0_i64 : i64] : vector<8xf32> + %31 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %32 = affine.load %3[((%31 - %arg3) floordiv 16) mod 16, (%14 - %c0) mod 128, (((%31 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c1_i64 : i64] : vector<8xf32> + %34 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %35 = affine.load %3[((%34 - %arg3) floordiv 16) mod 16, (%15 - %c0) mod 128, (((%34 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %36 = vector.extractelement %35[%c2_i64 : i64] : vector<8xf32> + %37 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %38 = affine.load %3[((%37 - %arg3) floordiv 16) mod 16, (%16 - %c0) mod 128, (((%37 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c3_i64 : i64] : vector<8xf32> + %40 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %41 = affine.load %3[((%40 - %arg3) floordiv 16) mod 16, (%17 - %c0) mod 128, (((%40 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c4_i64 : i64] : vector<8xf32> + %43 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %44 = affine.load %3[((%43 - %arg3) floordiv 16) mod 16, (%18 - %c0) mod 128, (((%43 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = vector.extractelement %44[%c5_i64 : i64] : vector<8xf32> + %46 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %47 = affine.load %3[((%46 - %arg3) floordiv 16) mod 16, (%19 - %c0) mod 128, (((%46 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c6_i64 : i64] : vector<8xf32> + %49 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %50 = affine.load %3[((%49 - %arg3) floordiv 16) mod 16, (%20 - %c0) mod 128, (((%49 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = vector.extractelement %50[%c7_i64 : i64] : vector<8xf32> + %52 = "accv.bin_op"(%21, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %36) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %42) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %45) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = "accv.bin_op"(%27, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = "accv.bin_op"(%28, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %60 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c0_i64 : i64] : vector<8xf32> + %62 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + 
d3)>(%arg3, %arg5, %c0_9, %c0_10) + %63 = affine.load %2[((%62 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%62 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = vector.extractelement %63[%c1_i64 : i64] : vector<8xf32> + %65 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %66 = affine.load %2[((%65 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%65 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c2_i64 : i64] : vector<8xf32> + %68 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %69 = affine.load %2[((%68 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%68 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c3_i64 : i64] : vector<8xf32> + %71 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %72 = affine.load %2[((%71 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%71 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c4_i64 : i64] : vector<8xf32> + %74 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %75 = affine.load %2[((%74 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%74 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %78 = affine.load %2[((%77 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%77 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c6_i64 : i64] : vector<8xf32> + %80 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %81 = affine.load %2[((%80 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%80 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = vector.extractelement %81[%c7_i64 : i64] : vector<8xf32> + %83 = "accv.bin_op"(%61, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%64, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%67, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%70, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%73, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%76, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = "accv.bin_op"(%79, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + %90 = "accv.bin_op"(%82, %59) {predicate = 0 : i64} : (f32, f32) -> f32 + %91 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %83, %91[%c0_i64 : i64] : vector<8xf32> + affine.store %92, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %94 = affine.load %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %84, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - 
%arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %97 = affine.load %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c2_i64 : i64] : vector<8xf32> + affine.store %98, %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %100 = affine.load %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %86, %100[%c3_i64 : i64] : vector<8xf32> + affine.store %101, %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %103 = affine.load %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %87, %103[%c4_i64 : i64] : vector<8xf32> + affine.store %104, %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %106 = affine.load %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %88, %106[%c5_i64 : i64] : vector<8xf32> + affine.store %107, %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %109 = affine.load %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %89, %109[%c6_i64 : i64] : vector<8xf32> + affine.store %110, %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %112 = affine.load %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %90, %112[%c7_i64 : i64] : vector<8xf32> + affine.store %113, %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %83, %114[%c0_i64 : i64] : vector<8xf32> + affine.store %115, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %117 = affine.load %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 
16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %84, %117[%c1_i64 : i64] : vector<8xf32> + affine.store %118, %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %120 = affine.load %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %85, %120[%c2_i64 : i64] : vector<8xf32> + affine.store %121, %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %123 = affine.load %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %124 = vector.insertelement %86, %123[%c3_i64 : i64] : vector<8xf32> + affine.store %124, %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %126 = affine.load %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %87, %126[%c4_i64 : i64] : vector<8xf32> + affine.store %127, %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %129 = affine.load %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %130 = vector.insertelement %88, %129[%c5_i64 : i64] : vector<8xf32> + affine.store %130, %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %132 = affine.load %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %133 = vector.insertelement %89, %132[%c6_i64 : i64] : vector<8xf32> + affine.store %133, %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %135 = affine.load %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %136 = vector.insertelement %90, %135[%c7_i64 : i64] : vector<8xf32> + affine.store %136, %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %137 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_9) + %138 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %139 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %140 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %141 = affine.apply affine_map<(d0, 
d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %142 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %143 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %144 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %145 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %146 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %147 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %148 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %149 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %150 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %151 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %152 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %153 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %154 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %155 = load %arg0[%138, %147] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg0[%139, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg0[%140, %149] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %158 = load %arg0[%141, %150] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg0[%142, %151] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %160 = load %arg0[%143, %152] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = load %arg0[%144, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg0[%145, %154] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = affine.load %3[((%146 - %arg3) floordiv 16) mod 16, (%147 - %c0) mod 128, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c0_i64 : i64] : vector<8xf32> + %165 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %166 = affine.load %3[((%165 - %arg3) floordiv 16) mod 16, (%148 - %c0) mod 128, (((%165 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c1_i64 : i64] : vector<8xf32> + %168 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %169 = affine.load %3[((%168 - %arg3) floordiv 16) mod 16, (%149 - %c0) mod 128, (((%168 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c2_i64 : i64] : vector<8xf32> + %171 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %172 = affine.load %3[((%171 - %arg3) floordiv 16) mod 16, (%150 - %c0) mod 128, (((%171 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c3_i64 : i64] : vector<8xf32> + %174 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %175 = affine.load %3[((%174 - %arg3) floordiv 16) mod 16, (%151 - %c0) mod 128, (((%174 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %176 = vector.extractelement %175[%c4_i64 : i64] : vector<8xf32> + %177 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, 
%arg5, %137, %c0_10) + %178 = affine.load %3[((%177 - %arg3) floordiv 16) mod 16, (%152 - %c0) mod 128, (((%177 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %179 = vector.extractelement %178[%c5_i64 : i64] : vector<8xf32> + %180 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %181 = affine.load %3[((%180 - %arg3) floordiv 16) mod 16, (%153 - %c0) mod 128, (((%180 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %182 = vector.extractelement %181[%c6_i64 : i64] : vector<8xf32> + %183 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %184 = affine.load %3[((%183 - %arg3) floordiv 16) mod 16, (%154 - %c0) mod 128, (((%183 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %185 = vector.extractelement %184[%c7_i64 : i64] : vector<8xf32> + %186 = "accv.bin_op"(%155, %164) {predicate = 2 : i64} : (f32, f32) -> f32 + %187 = "accv.bin_op"(%156, %167) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = "accv.bin_op"(%157, %170) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = "accv.bin_op"(%158, %173) {predicate = 2 : i64} : (f32, f32) -> f32 + %190 = "accv.bin_op"(%159, %176) {predicate = 2 : i64} : (f32, f32) -> f32 + %191 = "accv.bin_op"(%160, %179) {predicate = 2 : i64} : (f32, f32) -> f32 + %192 = "accv.bin_op"(%161, %182) {predicate = 2 : i64} : (f32, f32) -> f32 + %193 = "accv.bin_op"(%162, %185) {predicate = 2 : i64} : (f32, f32) -> f32 + %194 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c0_i64 : i64] : vector<8xf32> + %196 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %197 = affine.load %2[((%196 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%196 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %198 = vector.extractelement %197[%c1_i64 : i64] : vector<8xf32> + %199 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %200 = affine.load %2[((%199 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%199 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c2_i64 : i64] : vector<8xf32> + %202 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %203 = affine.load %2[((%202 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%202 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %204 = vector.extractelement %203[%c3_i64 : i64] : vector<8xf32> + %205 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %206 = affine.load %2[((%205 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%205 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %207 = vector.extractelement %206[%c4_i64 : i64] : vector<8xf32> + %208 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %209 = affine.load %2[((%208 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%208 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %210 = vector.extractelement %209[%c5_i64 : i64] : vector<8xf32> + %211 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %212 = affine.load %2[((%211 - %arg3) floordiv 16) mod 16, (%144 - 
%arg4) mod 6, (((%211 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %213 = vector.extractelement %212[%c6_i64 : i64] : vector<8xf32> + %214 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %215 = affine.load %2[((%214 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%214 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %216 = vector.extractelement %215[%c7_i64 : i64] : vector<8xf32> + %217 = "accv.bin_op"(%195, %186) {predicate = 0 : i64} : (f32, f32) -> f32 + %218 = "accv.bin_op"(%198, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + %219 = "accv.bin_op"(%201, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + %220 = "accv.bin_op"(%204, %189) {predicate = 0 : i64} : (f32, f32) -> f32 + %221 = "accv.bin_op"(%207, %190) {predicate = 0 : i64} : (f32, f32) -> f32 + %222 = "accv.bin_op"(%210, %191) {predicate = 0 : i64} : (f32, f32) -> f32 + %223 = "accv.bin_op"(%213, %192) {predicate = 0 : i64} : (f32, f32) -> f32 + %224 = "accv.bin_op"(%216, %193) {predicate = 0 : i64} : (f32, f32) -> f32 + %225 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %217, %225[%c0_i64 : i64] : vector<8xf32> + affine.store %226, %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %227 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %228 = affine.load %2[((%227 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%227 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %218, %228[%c1_i64 : i64] : vector<8xf32> + affine.store %229, %2[((%227 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%227 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %230 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %231 = affine.load %2[((%230 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%230 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %219, %231[%c2_i64 : i64] : vector<8xf32> + affine.store %232, %2[((%230 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%230 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %233 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %234 = affine.load %2[((%233 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %220, %234[%c3_i64 : i64] : vector<8xf32> + affine.store %235, %2[((%233 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %236 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %237 = affine.load %2[((%236 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %238 = vector.insertelement %221, %237[%c4_i64 : i64] : vector<8xf32> + affine.store %238, %2[((%236 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %239 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + 
%240 = affine.load %2[((%239 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %222, %240[%c5_i64 : i64] : vector<8xf32> + affine.store %241, %2[((%239 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %242 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %243 = affine.load %2[((%242 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %244 = vector.insertelement %223, %243[%c6_i64 : i64] : vector<8xf32> + affine.store %244, %2[((%242 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %245 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %246 = affine.load %2[((%245 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = vector.insertelement %224, %246[%c7_i64 : i64] : vector<8xf32> + affine.store %247, %2[((%245 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %249 = vector.insertelement %217, %248[%c0_i64 : i64] : vector<8xf32> + affine.store %249, %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %250 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %251 = affine.load %2[((%250 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%250 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %252 = vector.insertelement %218, %251[%c1_i64 : i64] : vector<8xf32> + affine.store %252, %2[((%250 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%250 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %253 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %254 = affine.load %2[((%253 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%253 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %255 = vector.insertelement %219, %254[%c2_i64 : i64] : vector<8xf32> + affine.store %255, %2[((%253 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%253 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %256 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %257 = affine.load %2[((%256 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%256 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %258 = vector.insertelement %220, %257[%c3_i64 : i64] : vector<8xf32> + affine.store %258, %2[((%256 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%256 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %259 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %260 = affine.load %2[((%259 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%259 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %261 = vector.insertelement %221, %260[%c4_i64 : i64] : 
vector<8xf32> + affine.store %261, %2[((%259 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%259 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %262 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %263 = affine.load %2[((%262 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%262 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %264 = vector.insertelement %222, %263[%c5_i64 : i64] : vector<8xf32> + affine.store %264, %2[((%262 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%262 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %265 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %266 = affine.load %2[((%265 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%265 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %267 = vector.insertelement %223, %266[%c6_i64 : i64] : vector<8xf32> + affine.store %267, %2[((%265 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%265 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %268 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %269 = affine.load %2[((%268 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%268 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %270 = vector.insertelement %224, %269[%c7_i64 : i64] : vector<8xf32> + affine.store %270, %2[((%268 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%268 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 128]} + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %c0_6) + %6 = vector.transfer_read %arg2[%4, %5], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %7 = affine.load %2[((%arg5 + %c0_6 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %c0_6 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %8 = addf %6, %7 : vector<8xf32> + store %8, %1[%c0_5, %c0_6] : memref<1x16xvector<8xf32>> + %9 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_6) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %9) + %12 = vector.transfer_read %arg2[%10, %11], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %13 = affine.load %2[((%arg5 + %9 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %14 = addf %12, %13 : vector<8xf32> + store %14, %1[%c0_5, %9] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 
2)>(%c0_6) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %15) + %18 = vector.transfer_read %arg2[%16, %17], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %19 = affine.load %2[((%arg5 + %15 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %15 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %20 = addf %18, %19 : vector<8xf32> + store %20, %1[%c0_5, %15] : memref<1x16xvector<8xf32>> + %21 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_6) + %22 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %23 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %21) + %24 = vector.transfer_read %arg2[%22, %23], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %25 = affine.load %2[((%arg5 + %21 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %21 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %26 = addf %24, %25 : vector<8xf32> + store %26, %1[%c0_5, %21] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_6) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %27) + %30 = vector.transfer_read %arg2[%28, %29], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %31 = affine.load %2[((%arg5 + %27 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %27 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %32 = addf %30, %31 : vector<8xf32> + store %32, %1[%c0_5, %27] : memref<1x16xvector<8xf32>> + %33 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_6) + %34 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %35 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %33) + %36 = vector.transfer_read %arg2[%34, %35], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %37 = affine.load %2[((%arg5 + %33 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %33 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %38 = addf %36, %37 : vector<8xf32> + store %38, %1[%c0_5, %33] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_6) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %39) + %42 = vector.transfer_read %arg2[%40, %41], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %43 = affine.load %2[((%arg5 + %39 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %39 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %44 = addf %42, %43 : vector<8xf32> + store %44, %1[%c0_5, %39] : memref<1x16xvector<8xf32>> + %45 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_6) + %46 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %47 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %45) + %48 = vector.transfer_read %arg2[%46, %47], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %49 = affine.load %2[((%arg5 + %45 * 8) 
floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %45 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %50 = addf %48, %49 : vector<8xf32> + store %50, %1[%c0_5, %45] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_6) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %51) + %54 = vector.transfer_read %arg2[%52, %53], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %55 = affine.load %2[((%arg5 + %51 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %51 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %56 = addf %54, %55 : vector<8xf32> + store %56, %1[%c0_5, %51] : memref<1x16xvector<8xf32>> + %57 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_6) + %58 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %59 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %57) + %60 = vector.transfer_read %arg2[%58, %59], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %61 = affine.load %2[((%arg5 + %57 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %57 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %62 = addf %60, %61 : vector<8xf32> + store %62, %1[%c0_5, %57] : memref<1x16xvector<8xf32>> + %63 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_6) + %64 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %65 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %63) + %66 = vector.transfer_read %arg2[%64, %65], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %67 = affine.load %2[((%arg5 + %63 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %63 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %68 = addf %66, %67 : vector<8xf32> + store %68, %1[%c0_5, %63] : memref<1x16xvector<8xf32>> + %69 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_6) + %70 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %71 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %69) + %72 = vector.transfer_read %arg2[%70, %71], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %73 = affine.load %2[((%arg5 + %69 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %69 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = addf %72, %73 : vector<8xf32> + store %74, %1[%c0_5, %69] : memref<1x16xvector<8xf32>> + %75 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_6) + %76 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %77 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %75) + %78 = vector.transfer_read %arg2[%76, %77], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %79 = affine.load %2[((%arg5 + %75 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %75 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %80 = addf %78, %79 : vector<8xf32> + store %80, %1[%c0_5, %75] : memref<1x16xvector<8xf32>> + %81 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_6) + %82 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %83 = 
affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %81) + %84 = vector.transfer_read %arg2[%82, %83], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %85 = affine.load %2[((%arg5 + %81 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %81 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %86 = addf %84, %85 : vector<8xf32> + store %86, %1[%c0_5, %81] : memref<1x16xvector<8xf32>> + %87 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_6) + %88 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %89 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %87) + %90 = vector.transfer_read %arg2[%88, %89], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %91 = affine.load %2[((%arg5 + %87 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %87 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = addf %90, %91 : vector<8xf32> + store %92, %1[%c0_5, %87] : memref<1x16xvector<8xf32>> + %93 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_6) + %94 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %95 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %93) + %96 = vector.transfer_read %arg2[%94, %95], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %97 = affine.load %2[((%arg5 + %93 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %93 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = addf %96, %97 : vector<8xf32> + store %98, %1[%c0_5, %93] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %99 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_4) + %100 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %101 = load %1[%c0_4, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %101, %arg2[%99, %100] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + } else { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %c0_3) + %6 = vector.transfer_read %arg2[%4, %5], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %7 = affine.load %2[((%arg5 + %c0_3 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %c0_3 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %8 = addf %6, %7 : vector<8xf32> + store %8, %1[%c0_2, %c0_3] : memref<1x16xvector<8xf32>> + %9 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_3) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %9) + %12 = vector.transfer_read %arg2[%10, %11], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %13 = affine.load %2[((%arg5 + %9 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %14 = addf %12, %13 : vector<8xf32> + store %14, %1[%c0_2, %9] : memref<1x16xvector<8xf32>> + %15 = affine.apply 
affine_map<(d0) -> (d0 + 2)>(%c0_3) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %15) + %18 = vector.transfer_read %arg2[%16, %17], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %19 = affine.load %2[((%arg5 + %15 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %15 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %20 = addf %18, %19 : vector<8xf32> + store %20, %1[%c0_2, %15] : memref<1x16xvector<8xf32>> + %21 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_3) + %22 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %23 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %21) + %24 = vector.transfer_read %arg2[%22, %23], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %25 = affine.load %2[((%arg5 + %21 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %21 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %26 = addf %24, %25 : vector<8xf32> + store %26, %1[%c0_2, %21] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_3) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %27) + %30 = vector.transfer_read %arg2[%28, %29], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %31 = affine.load %2[((%arg5 + %27 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %27 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %32 = addf %30, %31 : vector<8xf32> + store %32, %1[%c0_2, %27] : memref<1x16xvector<8xf32>> + %33 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_3) + %34 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %35 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %33) + %36 = vector.transfer_read %arg2[%34, %35], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %37 = affine.load %2[((%arg5 + %33 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %33 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %38 = addf %36, %37 : vector<8xf32> + store %38, %1[%c0_2, %33] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_3) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %39) + %42 = vector.transfer_read %arg2[%40, %41], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %43 = affine.load %2[((%arg5 + %39 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %39 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %44 = addf %42, %43 : vector<8xf32> + store %44, %1[%c0_2, %39] : memref<1x16xvector<8xf32>> + %45 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_3) + %46 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %47 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %45) + %48 = vector.transfer_read %arg2[%46, %47], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %49 = affine.load %2[((%arg5 + %45 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %45 * 8) mod 16) floordiv 8) mod 2] 
: memref<16x6x2xvector<8xf32>> + %50 = addf %48, %49 : vector<8xf32> + store %50, %1[%c0_2, %45] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_3) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %51) + %54 = vector.transfer_read %arg2[%52, %53], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %55 = affine.load %2[((%arg5 + %51 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %51 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %56 = addf %54, %55 : vector<8xf32> + store %56, %1[%c0_2, %51] : memref<1x16xvector<8xf32>> + %57 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_3) + %58 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %59 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %57) + %60 = vector.transfer_read %arg2[%58, %59], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %61 = affine.load %2[((%arg5 + %57 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %57 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %62 = addf %60, %61 : vector<8xf32> + store %62, %1[%c0_2, %57] : memref<1x16xvector<8xf32>> + %63 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_3) + %64 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %65 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %63) + %66 = vector.transfer_read %arg2[%64, %65], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %67 = affine.load %2[((%arg5 + %63 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %63 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %68 = addf %66, %67 : vector<8xf32> + store %68, %1[%c0_2, %63] : memref<1x16xvector<8xf32>> + %69 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_3) + %70 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %71 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %69) + %72 = vector.transfer_read %arg2[%70, %71], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %73 = affine.load %2[((%arg5 + %69 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %69 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = addf %72, %73 : vector<8xf32> + store %74, %1[%c0_2, %69] : memref<1x16xvector<8xf32>> + %75 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_3) + %76 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %77 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %75) + %78 = vector.transfer_read %arg2[%76, %77], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %79 = affine.load %2[((%arg5 + %75 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %75 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %80 = addf %78, %79 : vector<8xf32> + store %80, %1[%c0_2, %75] : memref<1x16xvector<8xf32>> + %81 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_3) + %82 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %83 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %81) + %84 = vector.transfer_read %arg2[%82, %83], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>>, vector<8xf32> + %85 = affine.load %2[((%arg5 + %81 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %81 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %86 = addf %84, %85 : vector<8xf32> + store %86, %1[%c0_2, %81] : memref<1x16xvector<8xf32>> + %87 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_3) + %88 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %89 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %87) + %90 = vector.transfer_read %arg2[%88, %89], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %91 = affine.load %2[((%arg5 + %87 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %87 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = addf %90, %91 : vector<8xf32> + store %92, %1[%c0_2, %87] : memref<1x16xvector<8xf32>> + %93 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_3) + %94 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %95 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %93) + %96 = vector.transfer_read %arg2[%94, %95], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %97 = affine.load %2[((%arg5 + %93 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %93 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = addf %96, %97 : vector<8xf32> + store %98, %1[%c0_2, %93] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %99 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_1) + %100 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %101 = load %1[%c0_1, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %101, %arg2[%99, %100] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 784 : i64, index = #accln<"index{i_o,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + func @NestFunction_8(%arg0: index, %arg1: index, %arg2: index, %arg3: index, %arg4: memref<1x16xvector<8xf32>>, %arg5: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes 
{exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + affine.for %arg6 = 0 to 16 { + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %arg6) + %2 = load %arg4[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %2, %arg5[%0, %1] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + return + } + func @NestFunction_9(%arg0: index, %arg1: index, %arg2: index, %arg3: index, %arg4: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg5: memref<16x6x2xvector<8xf32>>, %arg6: memref<1x16xvector<8xf32>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %c0_0) + %2 = vector.transfer_read %arg4[%0, %1], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %3 = affine.load %arg5[((%arg3 + %c0_0 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %c0_0 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %4 = addf %2, %3 : vector<8xf32> + store %4, %arg6[%c0, %c0_0] : memref<1x16xvector<8xf32>> + %5 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_0) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %5) + %8 = vector.transfer_read %arg4[%6, %7], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %9 = affine.load %arg5[((%arg3 + %5 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %10 = addf %8, %9 : vector<8xf32> + store %10, %arg6[%c0, %5] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_0) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %11) + %14 = vector.transfer_read %arg4[%12, %13], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %15 = affine.load %arg5[((%arg3 + %11 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %11 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %16 = addf %14, %15 : vector<8xf32> + store %16, %arg6[%c0, %11] : memref<1x16xvector<8xf32>> + %17 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_0) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %17) + %20 = vector.transfer_read %arg4[%18, %19], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %21 = affine.load %arg5[((%arg3 + %17 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %17 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %22 = addf %20, %21 : vector<8xf32> + store %22, %arg6[%c0, %17] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_0) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, 
%arg1, %c0) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %23) + %26 = vector.transfer_read %arg4[%24, %25], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %27 = affine.load %arg5[((%arg3 + %23 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %23 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %28 = addf %26, %27 : vector<8xf32> + store %28, %arg6[%c0, %23] : memref<1x16xvector<8xf32>> + %29 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_0) + %30 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %31 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %29) + %32 = vector.transfer_read %arg4[%30, %31], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %33 = affine.load %arg5[((%arg3 + %29 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %29 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %34 = addf %32, %33 : vector<8xf32> + store %34, %arg6[%c0, %29] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_0) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %35) + %38 = vector.transfer_read %arg4[%36, %37], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %39 = affine.load %arg5[((%arg3 + %35 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %35 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %40 = addf %38, %39 : vector<8xf32> + store %40, %arg6[%c0, %35] : memref<1x16xvector<8xf32>> + %41 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_0) + %42 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %43 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %41) + %44 = vector.transfer_read %arg4[%42, %43], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %45 = affine.load %arg5[((%arg3 + %41 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %41 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %46 = addf %44, %45 : vector<8xf32> + store %46, %arg6[%c0, %41] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_0) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %47) + %50 = vector.transfer_read %arg4[%48, %49], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %51 = affine.load %arg5[((%arg3 + %47 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %47 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %52 = addf %50, %51 : vector<8xf32> + store %52, %arg6[%c0, %47] : memref<1x16xvector<8xf32>> + %53 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_0) + %54 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %55 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %53) + %56 = vector.transfer_read %arg4[%54, %55], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %57 = affine.load %arg5[((%arg3 + %53 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %53 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %58 = addf %56, %57 : vector<8xf32> + store %58, %arg6[%c0, %53] : 
memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_0) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %59) + %62 = vector.transfer_read %arg4[%60, %61], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %63 = affine.load %arg5[((%arg3 + %59 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %59 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = addf %62, %63 : vector<8xf32> + store %64, %arg6[%c0, %59] : memref<1x16xvector<8xf32>> + %65 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_0) + %66 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %67 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %65) + %68 = vector.transfer_read %arg4[%66, %67], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %69 = affine.load %arg5[((%arg3 + %65 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %65 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = addf %68, %69 : vector<8xf32> + store %70, %arg6[%c0, %65] : memref<1x16xvector<8xf32>> + %71 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_0) + %72 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %73 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %71) + %74 = vector.transfer_read %arg4[%72, %73], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %75 = affine.load %arg5[((%arg3 + %71 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %71 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = addf %74, %75 : vector<8xf32> + store %76, %arg6[%c0, %71] : memref<1x16xvector<8xf32>> + %77 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_0) + %78 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %79 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %77) + %80 = vector.transfer_read %arg4[%78, %79], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %81 = affine.load %arg5[((%arg3 + %77 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %77 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = addf %80, %81 : vector<8xf32> + store %82, %arg6[%c0, %77] : memref<1x16xvector<8xf32>> + %83 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_0) + %84 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %85 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %83) + %86 = vector.transfer_read %arg4[%84, %85], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %87 = affine.load %arg5[((%arg3 + %83 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %83 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %88 = addf %86, %87 : vector<8xf32> + store %88, %arg6[%c0, %83] : memref<1x16xvector<8xf32>> + %89 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_0) + %90 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %91 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %89) + %92 = vector.transfer_read %arg4[%90, %91], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %93 = affine.load %arg5[((%arg3 + %89 * 8) floordiv 16) mod 16, (%arg1 + %c0) 
mod 6, (((%arg3 + %89 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = addf %92, %93 : vector<8xf32> + store %94, %arg6[%c0, %89] : memref<1x16xvector<8xf32>> + return + } + func @NestFunction_10(%arg0: index, %arg1: index, %arg2: index, %arg3: index, %arg4: memref<1x16xvector<8xf32>>, %arg5: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + affine.for %arg6 = 0 to 16 { + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %arg6) + %2 = load %arg4[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %2, %arg5[%0, %1] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + return + } + func @NestFunction_11(%arg0: index, %arg1: index, %arg2: index, %arg3: index, %arg4: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg5: memref<16x6x2xvector<8xf32>>, %arg6: memref<1x16xvector<8xf32>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %c0_0) + %2 = vector.transfer_read %arg4[%0, %1], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %3 = affine.load %arg5[((%arg3 + %c0_0 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %c0_0 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %4 = addf %2, %3 : vector<8xf32> + store %4, %arg6[%c0, %c0_0] : memref<1x16xvector<8xf32>> + %5 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_0) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %5) + %8 = vector.transfer_read %arg4[%6, %7], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %9 = affine.load %arg5[((%arg3 + %5 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %10 = addf %8, %9 : vector<8xf32> + store %10, %arg6[%c0, %5] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_0) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %11) + %14 = vector.transfer_read %arg4[%12, %13], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %15 = affine.load %arg5[((%arg3 + %11 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %11 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %16 = addf %14, %15 : vector<8xf32> + store %16, %arg6[%c0, %11] : memref<1x16xvector<8xf32>> + %17 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_0) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %17) + %20 = vector.transfer_read %arg4[%18, %19], %cst {masked = 
[false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %21 = affine.load %arg5[((%arg3 + %17 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %17 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %22 = addf %20, %21 : vector<8xf32> + store %22, %arg6[%c0, %17] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_0) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %23) + %26 = vector.transfer_read %arg4[%24, %25], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %27 = affine.load %arg5[((%arg3 + %23 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %23 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %28 = addf %26, %27 : vector<8xf32> + store %28, %arg6[%c0, %23] : memref<1x16xvector<8xf32>> + %29 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_0) + %30 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %31 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %29) + %32 = vector.transfer_read %arg4[%30, %31], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %33 = affine.load %arg5[((%arg3 + %29 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %29 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %34 = addf %32, %33 : vector<8xf32> + store %34, %arg6[%c0, %29] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_0) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %35) + %38 = vector.transfer_read %arg4[%36, %37], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %39 = affine.load %arg5[((%arg3 + %35 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %35 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %40 = addf %38, %39 : vector<8xf32> + store %40, %arg6[%c0, %35] : memref<1x16xvector<8xf32>> + %41 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_0) + %42 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %43 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %41) + %44 = vector.transfer_read %arg4[%42, %43], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %45 = affine.load %arg5[((%arg3 + %41 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %41 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %46 = addf %44, %45 : vector<8xf32> + store %46, %arg6[%c0, %41] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_0) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %47) + %50 = vector.transfer_read %arg4[%48, %49], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %51 = affine.load %arg5[((%arg3 + %47 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %47 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %52 = addf %50, %51 : vector<8xf32> + store %52, %arg6[%c0, %47] : memref<1x16xvector<8xf32>> + %53 = affine.apply 
affine_map<(d0) -> (d0 + 9)>(%c0_0) + %54 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %55 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %53) + %56 = vector.transfer_read %arg4[%54, %55], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %57 = affine.load %arg5[((%arg3 + %53 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %53 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %58 = addf %56, %57 : vector<8xf32> + store %58, %arg6[%c0, %53] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_0) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %59) + %62 = vector.transfer_read %arg4[%60, %61], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %63 = affine.load %arg5[((%arg3 + %59 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %59 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = addf %62, %63 : vector<8xf32> + store %64, %arg6[%c0, %59] : memref<1x16xvector<8xf32>> + %65 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_0) + %66 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %67 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %65) + %68 = vector.transfer_read %arg4[%66, %67], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %69 = affine.load %arg5[((%arg3 + %65 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %65 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = addf %68, %69 : vector<8xf32> + store %70, %arg6[%c0, %65] : memref<1x16xvector<8xf32>> + %71 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_0) + %72 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %73 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %71) + %74 = vector.transfer_read %arg4[%72, %73], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %75 = affine.load %arg5[((%arg3 + %71 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %71 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = addf %74, %75 : vector<8xf32> + store %76, %arg6[%c0, %71] : memref<1x16xvector<8xf32>> + %77 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_0) + %78 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %79 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %77) + %80 = vector.transfer_read %arg4[%78, %79], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %81 = affine.load %arg5[((%arg3 + %77 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %77 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = addf %80, %81 : vector<8xf32> + store %82, %arg6[%c0, %77] : memref<1x16xvector<8xf32>> + %83 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_0) + %84 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %85 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %83) + %86 = vector.transfer_read %arg4[%84, %85], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %87 = 
affine.load %arg5[((%arg3 + %83 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %83 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %88 = addf %86, %87 : vector<8xf32> + store %88, %arg6[%c0, %83] : memref<1x16xvector<8xf32>> + %89 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_0) + %90 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %91 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %89) + %92 = vector.transfer_read %arg4[%90, %91], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %93 = affine.load %arg5[((%arg3 + %89 * 8) floordiv 16) mod 16, (%arg1 + %c0) mod 6, (((%arg3 + %89 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = addf %92, %93 : vector<8xf32> + store %94, %arg6[%c0, %89] : memref<1x16xvector<8xf32>> + return + } + func @NestFunction_6(%arg0: memref<16x6x2xvector<8xf32>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %cst = constant dense<0.000000e+00> : vector<8xf32> + affine.for %arg1 = 0 to 16 { + affine.for %arg2 = 0 to 6 { + affine.for %arg3 = 0 to 2 { + store %cst, %arg0[%arg1, %arg2, %arg3] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 2 : i64, index = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 6 : i64, index = #accln<"index{j_i_i_o,15}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 2]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i,14}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 6, 2]} + return + } + func @NestFunction_12(%arg0: memref<1x16xvector<8xf32>>, %arg1: memref<16x128x2xvector<8xf32>>, %arg2: index, %arg3: index) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %0 = load %arg0[%c0, %c0_0] : memref<1x16xvector<8xf32>> + affine.store %0, %arg1[((%arg3 + %c0_0 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %c0_0 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %1 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_0) + %2 = load %arg0[%c0, %1] : memref<1x16xvector<8xf32>> + affine.store %2, %arg1[((%arg3 + %1 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %1 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %3 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_0) + %4 = load %arg0[%c0, %3] : memref<1x16xvector<8xf32>> + affine.store %4, %arg1[((%arg3 + %3 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %3 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %5 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_0) + %6 = load %arg0[%c0, %5] : memref<1x16xvector<8xf32>> + affine.store %6, %arg1[((%arg3 + %5 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %5 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_0) + %8 = load %arg0[%c0, %7] : memref<1x16xvector<8xf32>> + affine.store %8, %arg1[((%arg3 + %7 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %7 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %9 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_0) + %10 = load 
%arg0[%c0, %9] : memref<1x16xvector<8xf32>> + affine.store %10, %arg1[((%arg3 + %9 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %9 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_0) + %12 = load %arg0[%c0, %11] : memref<1x16xvector<8xf32>> + affine.store %12, %arg1[((%arg3 + %11 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %11 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %13 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_0) + %14 = load %arg0[%c0, %13] : memref<1x16xvector<8xf32>> + affine.store %14, %arg1[((%arg3 + %13 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %13 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_0) + %16 = load %arg0[%c0, %15] : memref<1x16xvector<8xf32>> + affine.store %16, %arg1[((%arg3 + %15 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %15 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %17 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_0) + %18 = load %arg0[%c0, %17] : memref<1x16xvector<8xf32>> + affine.store %18, %arg1[((%arg3 + %17 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %17 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_0) + %20 = load %arg0[%c0, %19] : memref<1x16xvector<8xf32>> + affine.store %20, %arg1[((%arg3 + %19 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %19 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %21 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_0) + %22 = load %arg0[%c0, %21] : memref<1x16xvector<8xf32>> + affine.store %22, %arg1[((%arg3 + %21 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %21 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_0) + %24 = load %arg0[%c0, %23] : memref<1x16xvector<8xf32>> + affine.store %24, %arg1[((%arg3 + %23 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %23 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %25 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_0) + %26 = load %arg0[%c0, %25] : memref<1x16xvector<8xf32>> + affine.store %26, %arg1[((%arg3 + %25 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %25 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_0) + %28 = load %arg0[%c0, %27] : memref<1x16xvector<8xf32>> + affine.store %28, %arg1[((%arg3 + %27 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %27 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %29 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_0) + %30 = load %arg0[%c0, %29] : memref<1x16xvector<8xf32>> + affine.store %30, %arg1[((%arg3 + %29 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %29 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + return + } + func @NestFunction_13(%arg0: index, %arg1: index, %arg2: index, %arg3: index, %arg4: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg5: memref<1x16xvector<8xf32>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 
* 8)>(%arg2, %arg3, %c0_0) + %2 = vector.transfer_read %arg4[%0, %1], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %2, %arg5[%c0, %c0_0] : memref<1x16xvector<8xf32>> + %3 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_0) + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %3) + %6 = vector.transfer_read %arg4[%4, %5], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %arg5[%c0, %3] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_0) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %7) + %10 = vector.transfer_read %arg4[%8, %9], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %10, %arg5[%c0, %7] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_0) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %11) + %14 = vector.transfer_read %arg4[%12, %13], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %arg5[%c0, %11] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_0) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %15) + %18 = vector.transfer_read %arg4[%16, %17], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %arg5[%c0, %15] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_0) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %19) + %22 = vector.transfer_read %arg4[%20, %21], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %22, %arg5[%c0, %19] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_0) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %23) + %26 = vector.transfer_read %arg4[%24, %25], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %arg5[%c0, %23] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_0) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %27) + %30 = vector.transfer_read %arg4[%28, %29], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %arg5[%c0, %27] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_0) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %31) + %34 = vector.transfer_read %arg4[%32, %33], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %34, %arg5[%c0, %31] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 
9)>(%c0_0) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %35) + %38 = vector.transfer_read %arg4[%36, %37], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %arg5[%c0, %35] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_0) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %39) + %42 = vector.transfer_read %arg4[%40, %41], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %arg5[%c0, %39] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_0) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %43) + %46 = vector.transfer_read %arg4[%44, %45], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %46, %arg5[%c0, %43] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_0) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %47) + %50 = vector.transfer_read %arg4[%48, %49], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %arg5[%c0, %47] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_0) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %51) + %54 = vector.transfer_read %arg4[%52, %53], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %54, %arg5[%c0, %51] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_0) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %55) + %58 = vector.transfer_read %arg4[%56, %57], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %58, %arg5[%c0, %55] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_0) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %59) + %62 = vector.transfer_read %arg4[%60, %61], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %62, %arg5[%c0, %59] : memref<1x16xvector<8xf32>> + return + } + func @NestFunction_14(%arg0: memref<1x16xvector<8xf32>>, %arg1: memref<16x128x2xvector<8xf32>>, %arg2: index, %arg3: index) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %0 = load %arg0[%c0, %c0_0] : memref<1x16xvector<8xf32>> + affine.store %0, %arg1[((%arg3 + %c0_0 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %c0_0 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %1 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_0) + %2 = load %arg0[%c0, %1] : memref<1x16xvector<8xf32>> + affine.store %2, %arg1[((%arg3 + %1 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, 
(((%arg3 + %1 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %3 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_0) + %4 = load %arg0[%c0, %3] : memref<1x16xvector<8xf32>> + affine.store %4, %arg1[((%arg3 + %3 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %3 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %5 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_0) + %6 = load %arg0[%c0, %5] : memref<1x16xvector<8xf32>> + affine.store %6, %arg1[((%arg3 + %5 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %5 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_0) + %8 = load %arg0[%c0, %7] : memref<1x16xvector<8xf32>> + affine.store %8, %arg1[((%arg3 + %7 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %7 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %9 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_0) + %10 = load %arg0[%c0, %9] : memref<1x16xvector<8xf32>> + affine.store %10, %arg1[((%arg3 + %9 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %9 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_0) + %12 = load %arg0[%c0, %11] : memref<1x16xvector<8xf32>> + affine.store %12, %arg1[((%arg3 + %11 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %11 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %13 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_0) + %14 = load %arg0[%c0, %13] : memref<1x16xvector<8xf32>> + affine.store %14, %arg1[((%arg3 + %13 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %13 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_0) + %16 = load %arg0[%c0, %15] : memref<1x16xvector<8xf32>> + affine.store %16, %arg1[((%arg3 + %15 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %15 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %17 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_0) + %18 = load %arg0[%c0, %17] : memref<1x16xvector<8xf32>> + affine.store %18, %arg1[((%arg3 + %17 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %17 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_0) + %20 = load %arg0[%c0, %19] : memref<1x16xvector<8xf32>> + affine.store %20, %arg1[((%arg3 + %19 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %19 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %21 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_0) + %22 = load %arg0[%c0, %21] : memref<1x16xvector<8xf32>> + affine.store %22, %arg1[((%arg3 + %21 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %21 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_0) + %24 = load %arg0[%c0, %23] : memref<1x16xvector<8xf32>> + affine.store %24, %arg1[((%arg3 + %23 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %23 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %25 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_0) + %26 = load %arg0[%c0, %25] : memref<1x16xvector<8xf32>> + affine.store %26, %arg1[((%arg3 + %25 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %25 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_0) + %28 = load %arg0[%c0, 
%27] : memref<1x16xvector<8xf32>> + affine.store %28, %arg1[((%arg3 + %27 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %27 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %29 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_0) + %30 = load %arg0[%c0, %29] : memref<1x16xvector<8xf32>> + affine.store %30, %arg1[((%arg3 + %29 * 8) floordiv 16) mod 16, (%arg2 + %c0) mod 128, (((%arg3 + %29 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + return + } + func @NestFunction_15(%arg0: index, %arg1: index, %arg2: index, %arg3: index, %arg4: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg5: memref<1x16xvector<8xf32>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %c0_0) + %2 = vector.transfer_read %arg4[%0, %1], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %2, %arg5[%c0, %c0_0] : memref<1x16xvector<8xf32>> + %3 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_0) + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %3) + %6 = vector.transfer_read %arg4[%4, %5], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %arg5[%c0, %3] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_0) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %7) + %10 = vector.transfer_read %arg4[%8, %9], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %10, %arg5[%c0, %7] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_0) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %11) + %14 = vector.transfer_read %arg4[%12, %13], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %arg5[%c0, %11] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_0) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %15) + %18 = vector.transfer_read %arg4[%16, %17], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %arg5[%c0, %15] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_0) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %19) + %22 = vector.transfer_read %arg4[%20, %21], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %22, %arg5[%c0, %19] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_0) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %25 = affine.apply affine_map<(d0, d1, d2) -> 
(d0 + d1 + d2 * 8)>(%arg2, %arg3, %23) + %26 = vector.transfer_read %arg4[%24, %25], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %arg5[%c0, %23] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_0) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %27) + %30 = vector.transfer_read %arg4[%28, %29], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %arg5[%c0, %27] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_0) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %31) + %34 = vector.transfer_read %arg4[%32, %33], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %34, %arg5[%c0, %31] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_0) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %35) + %38 = vector.transfer_read %arg4[%36, %37], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %arg5[%c0, %35] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_0) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %39) + %42 = vector.transfer_read %arg4[%40, %41], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %arg5[%c0, %39] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_0) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %43) + %46 = vector.transfer_read %arg4[%44, %45], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %46, %arg5[%c0, %43] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_0) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %47) + %50 = vector.transfer_read %arg4[%48, %49], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %arg5[%c0, %47] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_0) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %51) + %54 = vector.transfer_read %arg4[%52, %53], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %54, %arg5[%c0, %51] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_0) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %55) + %58 = vector.transfer_read %arg4[%56, %57], 
%cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %58, %arg5[%c0, %55] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_0) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg1, %c0) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg2, %arg3, %59) + %62 = vector.transfer_read %arg4[%60, %61], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %62, %arg5[%c0, %59] : memref<1x16xvector<8xf32>> + return + } + func @NestFunction_5(%arg0: index, %arg1: index, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg3: memref<16x6x2xvector<8xf32>>, %arg4: memref<1x16xvector<8xf32>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %c0_1 = constant 0 : index + %c0_2 = constant 0 : index + %c0_3 = constant 0 : index + %c0_4 = constant 0 : index + %c0_5 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %c0_5) + %2 = vector.transfer_read %arg2[%0, %1], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %3 = affine.load %arg3[((%arg5 + %c0_5 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %c0_5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %4 = addf %2, %3 : vector<8xf32> + store %4, %arg4[%c0_4, %c0_5] : memref<1x16xvector<8xf32>> + %5 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_5) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %5) + %8 = vector.transfer_read %arg2[%6, %7], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %9 = affine.load %arg3[((%arg5 + %5 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %10 = addf %8, %9 : vector<8xf32> + store %10, %arg4[%c0_4, %5] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_5) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %11) + %14 = vector.transfer_read %arg2[%12, %13], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %15 = affine.load %arg3[((%arg5 + %11 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %11 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %16 = addf %14, %15 : vector<8xf32> + store %16, %arg4[%c0_4, %11] : memref<1x16xvector<8xf32>> + %17 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_5) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %17) + %20 = vector.transfer_read %arg2[%18, %19], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %21 = affine.load %arg3[((%arg5 + %17 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %17 * 8) mod 16) floordiv 8) mod 2] : 
memref<16x6x2xvector<8xf32>> + %22 = addf %20, %21 : vector<8xf32> + store %22, %arg4[%c0_4, %17] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_5) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %23) + %26 = vector.transfer_read %arg2[%24, %25], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %27 = affine.load %arg3[((%arg5 + %23 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %23 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %28 = addf %26, %27 : vector<8xf32> + store %28, %arg4[%c0_4, %23] : memref<1x16xvector<8xf32>> + %29 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_5) + %30 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %31 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %29) + %32 = vector.transfer_read %arg2[%30, %31], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %33 = affine.load %arg3[((%arg5 + %29 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %29 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %34 = addf %32, %33 : vector<8xf32> + store %34, %arg4[%c0_4, %29] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_5) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %35) + %38 = vector.transfer_read %arg2[%36, %37], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %39 = affine.load %arg3[((%arg5 + %35 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %35 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %40 = addf %38, %39 : vector<8xf32> + store %40, %arg4[%c0_4, %35] : memref<1x16xvector<8xf32>> + %41 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_5) + %42 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %43 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %41) + %44 = vector.transfer_read %arg2[%42, %43], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %45 = affine.load %arg3[((%arg5 + %41 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %41 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %46 = addf %44, %45 : vector<8xf32> + store %46, %arg4[%c0_4, %41] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_5) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %47) + %50 = vector.transfer_read %arg2[%48, %49], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %51 = affine.load %arg3[((%arg5 + %47 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %47 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %52 = addf %50, %51 : vector<8xf32> + store %52, %arg4[%c0_4, %47] : memref<1x16xvector<8xf32>> + %53 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_5) + %54 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %55 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %53) + %56 = 
vector.transfer_read %arg2[%54, %55], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %57 = affine.load %arg3[((%arg5 + %53 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %53 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %58 = addf %56, %57 : vector<8xf32> + store %58, %arg4[%c0_4, %53] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_5) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %59) + %62 = vector.transfer_read %arg2[%60, %61], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %63 = affine.load %arg3[((%arg5 + %59 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %59 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = addf %62, %63 : vector<8xf32> + store %64, %arg4[%c0_4, %59] : memref<1x16xvector<8xf32>> + %65 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_5) + %66 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %67 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %65) + %68 = vector.transfer_read %arg2[%66, %67], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %69 = affine.load %arg3[((%arg5 + %65 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %65 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = addf %68, %69 : vector<8xf32> + store %70, %arg4[%c0_4, %65] : memref<1x16xvector<8xf32>> + %71 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_5) + %72 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %73 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %71) + %74 = vector.transfer_read %arg2[%72, %73], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %75 = affine.load %arg3[((%arg5 + %71 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %71 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = addf %74, %75 : vector<8xf32> + store %76, %arg4[%c0_4, %71] : memref<1x16xvector<8xf32>> + %77 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_5) + %78 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %79 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %77) + %80 = vector.transfer_read %arg2[%78, %79], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %81 = affine.load %arg3[((%arg5 + %77 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %77 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = addf %80, %81 : vector<8xf32> + store %82, %arg4[%c0_4, %77] : memref<1x16xvector<8xf32>> + %83 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_5) + %84 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %85 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %83) + %86 = vector.transfer_read %arg2[%84, %85], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %87 = affine.load %arg3[((%arg5 + %83 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %83 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %88 = addf %86, %87 : vector<8xf32> + store %88, 
%arg4[%c0_4, %83] : memref<1x16xvector<8xf32>> + %89 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_5) + %90 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_4) + %91 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %89) + %92 = vector.transfer_read %arg2[%90, %91], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %93 = affine.load %arg3[((%arg5 + %89 * 8) floordiv 16) mod 16, (%c0 + %c0_4) mod 6, (((%arg5 + %89 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = addf %92, %93 : vector<8xf32> + store %94, %arg4[%c0_4, %89] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %95 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_3) + %96 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %arg6) + %97 = load %arg4[%c0_3, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %97, %arg2[%95, %96] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + } else { + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %c0_2) + %2 = vector.transfer_read %arg2[%0, %1], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %3 = affine.load %arg3[((%arg5 + %c0_2 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %c0_2 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %4 = addf %2, %3 : vector<8xf32> + store %4, %arg4[%c0_1, %c0_2] : memref<1x16xvector<8xf32>> + %5 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_2) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %5) + %8 = vector.transfer_read %arg2[%6, %7], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %9 = affine.load %arg3[((%arg5 + %5 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %10 = addf %8, %9 : vector<8xf32> + store %10, %arg4[%c0_1, %5] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_2) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %11) + %14 = vector.transfer_read %arg2[%12, %13], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %15 = affine.load %arg3[((%arg5 + %11 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %11 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %16 = addf %14, %15 : vector<8xf32> + store %16, %arg4[%c0_1, %11] : memref<1x16xvector<8xf32>> + %17 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_2) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %17) + %20 = vector.transfer_read %arg2[%18, %19], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %21 = affine.load %arg3[((%arg5 + %17 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %17 * 8) mod 16) floordiv 8) mod 2] : 
memref<16x6x2xvector<8xf32>> + %22 = addf %20, %21 : vector<8xf32> + store %22, %arg4[%c0_1, %17] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_2) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %23) + %26 = vector.transfer_read %arg2[%24, %25], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %27 = affine.load %arg3[((%arg5 + %23 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %23 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %28 = addf %26, %27 : vector<8xf32> + store %28, %arg4[%c0_1, %23] : memref<1x16xvector<8xf32>> + %29 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_2) + %30 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %31 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %29) + %32 = vector.transfer_read %arg2[%30, %31], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %33 = affine.load %arg3[((%arg5 + %29 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %29 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %34 = addf %32, %33 : vector<8xf32> + store %34, %arg4[%c0_1, %29] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_2) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %35) + %38 = vector.transfer_read %arg2[%36, %37], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %39 = affine.load %arg3[((%arg5 + %35 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %35 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %40 = addf %38, %39 : vector<8xf32> + store %40, %arg4[%c0_1, %35] : memref<1x16xvector<8xf32>> + %41 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_2) + %42 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %43 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %41) + %44 = vector.transfer_read %arg2[%42, %43], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %45 = affine.load %arg3[((%arg5 + %41 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %41 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %46 = addf %44, %45 : vector<8xf32> + store %46, %arg4[%c0_1, %41] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_2) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %47) + %50 = vector.transfer_read %arg2[%48, %49], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %51 = affine.load %arg3[((%arg5 + %47 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %47 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %52 = addf %50, %51 : vector<8xf32> + store %52, %arg4[%c0_1, %47] : memref<1x16xvector<8xf32>> + %53 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_2) + %54 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %55 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %53) + %56 = vector.transfer_read %arg2[%54, %55], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 
+ d1)>>, vector<8xf32> + %57 = affine.load %arg3[((%arg5 + %53 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %53 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %58 = addf %56, %57 : vector<8xf32> + store %58, %arg4[%c0_1, %53] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_2) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %59) + %62 = vector.transfer_read %arg2[%60, %61], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %63 = affine.load %arg3[((%arg5 + %59 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %59 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = addf %62, %63 : vector<8xf32> + store %64, %arg4[%c0_1, %59] : memref<1x16xvector<8xf32>> + %65 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_2) + %66 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %67 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %65) + %68 = vector.transfer_read %arg2[%66, %67], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %69 = affine.load %arg3[((%arg5 + %65 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %65 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = addf %68, %69 : vector<8xf32> + store %70, %arg4[%c0_1, %65] : memref<1x16xvector<8xf32>> + %71 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_2) + %72 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %73 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %71) + %74 = vector.transfer_read %arg2[%72, %73], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %75 = affine.load %arg3[((%arg5 + %71 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %71 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = addf %74, %75 : vector<8xf32> + store %76, %arg4[%c0_1, %71] : memref<1x16xvector<8xf32>> + %77 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_2) + %78 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %79 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %77) + %80 = vector.transfer_read %arg2[%78, %79], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %81 = affine.load %arg3[((%arg5 + %77 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %77 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = addf %80, %81 : vector<8xf32> + store %82, %arg4[%c0_1, %77] : memref<1x16xvector<8xf32>> + %83 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_2) + %84 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %85 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %83) + %86 = vector.transfer_read %arg2[%84, %85], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %87 = affine.load %arg3[((%arg5 + %83 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %83 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %88 = addf %86, %87 : vector<8xf32> + store %88, %arg4[%c0_1, %83] : memref<1x16xvector<8xf32>> + %89 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_2) + %90 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_1) + %91 = affine.apply 
affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %89) + %92 = vector.transfer_read %arg2[%90, %91], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %93 = affine.load %arg3[((%arg5 + %89 * 8) floordiv 16) mod 16, (%c0 + %c0_1) mod 6, (((%arg5 + %89 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = addf %92, %93 : vector<8xf32> + store %94, %arg4[%c0_1, %89] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %95 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %c0, %c0_0) + %96 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg5, %arg6) + %97 = load %arg4[%c0_0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %97, %arg2[%95, %96] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 128]} + return + } + func @NestFunction_7(%arg0: index, %arg1: index, %arg2: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg3: memref<1x16xvector<8xf32>>, %arg4: memref<16x128x2xvector<8xf32>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %c0_1 = constant 0 : index + %c0_2 = constant 0 : index + %c0_3 = constant 0 : index + %c0_4 = constant 0 : index + %c0_5 = constant 0 : index + %c0_6 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + affine.for %arg5 = 0 to 128 { + affine.for %arg6 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %c0_6) + %2 = vector.transfer_read %arg2[%0, %1], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %2, %arg3[%c0_5, %c0_6] : memref<1x16xvector<8xf32>> + %3 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_6) + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %3) + %6 = vector.transfer_read %arg2[%4, %5], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %arg3[%c0_5, %3] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_6) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %7) + %10 = vector.transfer_read %arg2[%8, %9], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %10, %arg3[%c0_5, %7] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_6) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %11) + %14 = vector.transfer_read %arg2[%12, %13], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %arg3[%c0_5, %11] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) 
-> (d0 + 4)>(%c0_6) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %15) + %18 = vector.transfer_read %arg2[%16, %17], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %arg3[%c0_5, %15] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_6) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %19) + %22 = vector.transfer_read %arg2[%20, %21], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %22, %arg3[%c0_5, %19] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_6) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %23) + %26 = vector.transfer_read %arg2[%24, %25], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %arg3[%c0_5, %23] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_6) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %27) + %30 = vector.transfer_read %arg2[%28, %29], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %arg3[%c0_5, %27] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_6) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %31) + %34 = vector.transfer_read %arg2[%32, %33], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %34, %arg3[%c0_5, %31] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_6) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %35) + %38 = vector.transfer_read %arg2[%36, %37], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %arg3[%c0_5, %35] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_6) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %39) + %42 = vector.transfer_read %arg2[%40, %41], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %arg3[%c0_5, %39] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_6) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %43) + %46 = vector.transfer_read %arg2[%44, %45], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %46, %arg3[%c0_5, %43] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_6) + %48 = affine.apply 
affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %47) + %50 = vector.transfer_read %arg2[%48, %49], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %arg3[%c0_5, %47] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_6) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %51) + %54 = vector.transfer_read %arg2[%52, %53], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %54, %arg3[%c0_5, %51] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_6) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %55) + %58 = vector.transfer_read %arg2[%56, %57], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %58, %arg3[%c0_5, %55] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_6) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_5) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %59) + %62 = vector.transfer_read %arg2[%60, %61], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %62, %arg3[%c0_5, %59] : memref<1x16xvector<8xf32>> + %63 = load %arg3[%c0_3, %c0_4] : memref<1x16xvector<8xf32>> + affine.store %63, %arg4[((%arg6 + %c0_4 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %c0_4 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %64 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_4) + %65 = load %arg3[%c0_3, %64] : memref<1x16xvector<8xf32>> + affine.store %65, %arg4[((%arg6 + %64 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %64 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %66 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_4) + %67 = load %arg3[%c0_3, %66] : memref<1x16xvector<8xf32>> + affine.store %67, %arg4[((%arg6 + %66 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %66 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %68 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_4) + %69 = load %arg3[%c0_3, %68] : memref<1x16xvector<8xf32>> + affine.store %69, %arg4[((%arg6 + %68 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %68 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %70 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_4) + %71 = load %arg3[%c0_3, %70] : memref<1x16xvector<8xf32>> + affine.store %71, %arg4[((%arg6 + %70 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %70 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %72 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_4) + %73 = load %arg3[%c0_3, %72] : memref<1x16xvector<8xf32>> + affine.store %73, %arg4[((%arg6 + %72 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %72 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %74 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_4) + %75 = load %arg3[%c0_3, %74] : memref<1x16xvector<8xf32>> + affine.store %75, %arg4[((%arg6 + %74 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 
128, (((%arg6 + %74 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %76 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_4) + %77 = load %arg3[%c0_3, %76] : memref<1x16xvector<8xf32>> + affine.store %77, %arg4[((%arg6 + %76 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %76 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %78 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_4) + %79 = load %arg3[%c0_3, %78] : memref<1x16xvector<8xf32>> + affine.store %79, %arg4[((%arg6 + %78 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %78 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %80 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_4) + %81 = load %arg3[%c0_3, %80] : memref<1x16xvector<8xf32>> + affine.store %81, %arg4[((%arg6 + %80 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %80 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %82 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_4) + %83 = load %arg3[%c0_3, %82] : memref<1x16xvector<8xf32>> + affine.store %83, %arg4[((%arg6 + %82 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %82 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %84 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_4) + %85 = load %arg3[%c0_3, %84] : memref<1x16xvector<8xf32>> + affine.store %85, %arg4[((%arg6 + %84 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %84 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %86 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_4) + %87 = load %arg3[%c0_3, %86] : memref<1x16xvector<8xf32>> + affine.store %87, %arg4[((%arg6 + %86 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %86 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_4) + %89 = load %arg3[%c0_3, %88] : memref<1x16xvector<8xf32>> + affine.store %89, %arg4[((%arg6 + %88 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %88 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %90 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_4) + %91 = load %arg3[%c0_3, %90] : memref<1x16xvector<8xf32>> + affine.store %91, %arg4[((%arg6 + %90 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %90 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %92 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_4) + %93 = load %arg3[%c0_3, %92] : memref<1x16xvector<8xf32>> + affine.store %93, %arg4[((%arg6 + %92 * 8) floordiv 16) mod 16, (%arg5 + %c0_3) mod 128, (((%arg6 + %92 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } else { + %0 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %1 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %c0_2) + %2 = vector.transfer_read %arg2[%0, %1], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %2, %arg3[%c0_1, %c0_2] : memref<1x16xvector<8xf32>> + %3 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_2) + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %3) + %6 = vector.transfer_read %arg2[%4, %5], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %arg3[%c0_1, %3] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_2) + %8 = affine.apply affine_map<(d0, 
d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %7) + %10 = vector.transfer_read %arg2[%8, %9], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %10, %arg3[%c0_1, %7] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_2) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %11) + %14 = vector.transfer_read %arg2[%12, %13], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %arg3[%c0_1, %11] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_2) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %15) + %18 = vector.transfer_read %arg2[%16, %17], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %arg3[%c0_1, %15] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_2) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %19) + %22 = vector.transfer_read %arg2[%20, %21], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %22, %arg3[%c0_1, %19] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_2) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %23) + %26 = vector.transfer_read %arg2[%24, %25], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %arg3[%c0_1, %23] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_2) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %27) + %30 = vector.transfer_read %arg2[%28, %29], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %arg3[%c0_1, %27] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_2) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %31) + %34 = vector.transfer_read %arg2[%32, %33], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %34, %arg3[%c0_1, %31] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_2) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %35) + %38 = vector.transfer_read %arg2[%36, %37], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %arg3[%c0_1, %35] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_2) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %39) + %42 = vector.transfer_read %arg2[%40, %41], %cst : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %arg3[%c0_1, %39] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_2) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %43) + %46 = vector.transfer_read %arg2[%44, %45], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %46, %arg3[%c0_1, %43] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_2) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %47) + %50 = vector.transfer_read %arg2[%48, %49], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %arg3[%c0_1, %47] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_2) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %51) + %54 = vector.transfer_read %arg2[%52, %53], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %54, %arg3[%c0_1, %51] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_2) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %55) + %58 = vector.transfer_read %arg2[%56, %57], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %58, %arg3[%c0_1, %55] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_2) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %arg5, %c0_1) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg1, %arg6, %59) + %62 = vector.transfer_read %arg2[%60, %61], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %62, %arg3[%c0_1, %59] : memref<1x16xvector<8xf32>> + %63 = load %arg3[%c0, %c0_0] : memref<1x16xvector<8xf32>> + affine.store %63, %arg4[((%arg6 + %c0_0 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %c0_0 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %64 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_0) + %65 = load %arg3[%c0, %64] : memref<1x16xvector<8xf32>> + affine.store %65, %arg4[((%arg6 + %64 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %64 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %66 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_0) + %67 = load %arg3[%c0, %66] : memref<1x16xvector<8xf32>> + affine.store %67, %arg4[((%arg6 + %66 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %66 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %68 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_0) + %69 = load %arg3[%c0, %68] : memref<1x16xvector<8xf32>> + affine.store %69, %arg4[((%arg6 + %68 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %68 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %70 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_0) + %71 = load %arg3[%c0, %70] : memref<1x16xvector<8xf32>> + affine.store %71, %arg4[((%arg6 + %70 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %70 * 8) mod 16) floordiv 8) mod 2] : 
memref<16x128x2xvector<8xf32>> + %72 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_0) + %73 = load %arg3[%c0, %72] : memref<1x16xvector<8xf32>> + affine.store %73, %arg4[((%arg6 + %72 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %72 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %74 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_0) + %75 = load %arg3[%c0, %74] : memref<1x16xvector<8xf32>> + affine.store %75, %arg4[((%arg6 + %74 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %74 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %76 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_0) + %77 = load %arg3[%c0, %76] : memref<1x16xvector<8xf32>> + affine.store %77, %arg4[((%arg6 + %76 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %76 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %78 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_0) + %79 = load %arg3[%c0, %78] : memref<1x16xvector<8xf32>> + affine.store %79, %arg4[((%arg6 + %78 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %78 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %80 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_0) + %81 = load %arg3[%c0, %80] : memref<1x16xvector<8xf32>> + affine.store %81, %arg4[((%arg6 + %80 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %80 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %82 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_0) + %83 = load %arg3[%c0, %82] : memref<1x16xvector<8xf32>> + affine.store %83, %arg4[((%arg6 + %82 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %82 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %84 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_0) + %85 = load %arg3[%c0, %84] : memref<1x16xvector<8xf32>> + affine.store %85, %arg4[((%arg6 + %84 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %84 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %86 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_0) + %87 = load %arg3[%c0, %86] : memref<1x16xvector<8xf32>> + affine.store %87, %arg4[((%arg6 + %86 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %86 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_0) + %89 = load %arg3[%c0, %88] : memref<1x16xvector<8xf32>> + affine.store %89, %arg4[((%arg6 + %88 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %88 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %90 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_0) + %91 = load %arg3[%c0, %90] : memref<1x16xvector<8xf32>> + affine.store %91, %arg4[((%arg6 + %90 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %90 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %92 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_0) + %93 = load %arg3[%c0, %92] : memref<1x16xvector<8xf32>> + affine.store %93, %arg4[((%arg6 + %92 * 8) floordiv 16) mod 16, (%arg5 + %c0) mod 128, (((%arg6 + %92 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,21}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i_o,19}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 256]} + return + } + func @NestFunction_0(%arg0: 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg1: memref<1x16xvector<8xf32>>, %arg2: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg3: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg4: memref<1x16xvector<8xf32>>) attributes {exec_target = 0 : i64, sym_visibility = "private"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %c0_1 = constant 0 : index + %c0_2 = constant 0 : index + %c0_3 = constant 0 : index + %c0_4 = constant 0 : index + %c0_5 = constant 0 : index + %c0_6 = constant 0 : index + %c0_7 = constant 0 : index + %c0_8 = constant 0 : index + %c0_9 = constant 0 : index + %c0_10 = constant 0 : index + %c0_11 = constant 0 : index + %c0_12 = constant 0 : index + %c0_13 = constant 0 : index + %c0_14 = constant 0 : index + %c0_15 = constant 0 : index + %c0_16 = constant 0 : index + %c0_17 = constant 0 : index + %c0_18 = constant 0 : index + %c0_19 = constant 0 : index + %c0_20 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_21 = constant dense<0.000000e+00> : vector<8xf32> + %0 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %1 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + affine.for %arg5 = 0 to 512 step 256 { + affine.for %arg6 = 0 to 128 { + affine.for %arg7 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %2 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %3 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %c0_20) + %4 = vector.transfer_read %arg0[%2, %3], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %4, %arg1[%c0_19, %c0_20] : memref<1x16xvector<8xf32>> + %5 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_20) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %5) + %8 = vector.transfer_read %arg0[%6, %7], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %8, %arg1[%c0_19, %5] : memref<1x16xvector<8xf32>> + %9 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_20) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %9) + %12 = vector.transfer_read %arg0[%10, %11], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %12, %arg1[%c0_19, %9] : memref<1x16xvector<8xf32>> + %13 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_20) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %13) + %16 = vector.transfer_read %arg0[%14, %15], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %16, %arg1[%c0_19, %13] : memref<1x16xvector<8xf32>> + %17 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_20) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, 
%17) + %20 = vector.transfer_read %arg0[%18, %19], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %20, %arg1[%c0_19, %17] : memref<1x16xvector<8xf32>> + %21 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_20) + %22 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %23 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %21) + %24 = vector.transfer_read %arg0[%22, %23], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %24, %arg1[%c0_19, %21] : memref<1x16xvector<8xf32>> + %25 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_20) + %26 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %27 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %25) + %28 = vector.transfer_read %arg0[%26, %27], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %28, %arg1[%c0_19, %25] : memref<1x16xvector<8xf32>> + %29 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_20) + %30 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %31 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %29) + %32 = vector.transfer_read %arg0[%30, %31], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %32, %arg1[%c0_19, %29] : memref<1x16xvector<8xf32>> + %33 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_20) + %34 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %35 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %33) + %36 = vector.transfer_read %arg0[%34, %35], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %36, %arg1[%c0_19, %33] : memref<1x16xvector<8xf32>> + %37 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_20) + %38 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %39 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %37) + %40 = vector.transfer_read %arg0[%38, %39], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %40, %arg1[%c0_19, %37] : memref<1x16xvector<8xf32>> + %41 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_20) + %42 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %43 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %41) + %44 = vector.transfer_read %arg0[%42, %43], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %44, %arg1[%c0_19, %41] : memref<1x16xvector<8xf32>> + %45 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_20) + %46 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %47 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %45) + %48 = vector.transfer_read %arg0[%46, %47], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %48, %arg1[%c0_19, %45] : memref<1x16xvector<8xf32>> + %49 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_20) + %50 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %51 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %49) + %52 = vector.transfer_read %arg0[%50, 
%51], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %52, %arg1[%c0_19, %49] : memref<1x16xvector<8xf32>> + %53 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_20) + %54 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %55 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %53) + %56 = vector.transfer_read %arg0[%54, %55], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %56, %arg1[%c0_19, %53] : memref<1x16xvector<8xf32>> + %57 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_20) + %58 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %59 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %57) + %60 = vector.transfer_read %arg0[%58, %59], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %60, %arg1[%c0_19, %57] : memref<1x16xvector<8xf32>> + %61 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_20) + %62 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_19) + %63 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %61) + %64 = vector.transfer_read %arg0[%62, %63], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %64, %arg1[%c0_19, %61] : memref<1x16xvector<8xf32>> + %65 = load %arg1[%c0_17, %c0_18] : memref<1x16xvector<8xf32>> + affine.store %65, %1[((%arg7 + %c0_18 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %c0_18 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %66 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_18) + %67 = load %arg1[%c0_17, %66] : memref<1x16xvector<8xf32>> + affine.store %67, %1[((%arg7 + %66 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %66 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %68 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_18) + %69 = load %arg1[%c0_17, %68] : memref<1x16xvector<8xf32>> + affine.store %69, %1[((%arg7 + %68 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %68 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %70 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_18) + %71 = load %arg1[%c0_17, %70] : memref<1x16xvector<8xf32>> + affine.store %71, %1[((%arg7 + %70 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %70 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %72 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_18) + %73 = load %arg1[%c0_17, %72] : memref<1x16xvector<8xf32>> + affine.store %73, %1[((%arg7 + %72 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %72 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %74 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_18) + %75 = load %arg1[%c0_17, %74] : memref<1x16xvector<8xf32>> + affine.store %75, %1[((%arg7 + %74 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %74 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %76 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_18) + %77 = load %arg1[%c0_17, %76] : memref<1x16xvector<8xf32>> + affine.store %77, %1[((%arg7 + %76 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %76 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %78 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_18) + %79 = load %arg1[%c0_17, %78] : 
memref<1x16xvector<8xf32>> + affine.store %79, %1[((%arg7 + %78 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %78 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %80 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_18) + %81 = load %arg1[%c0_17, %80] : memref<1x16xvector<8xf32>> + affine.store %81, %1[((%arg7 + %80 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %80 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %82 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_18) + %83 = load %arg1[%c0_17, %82] : memref<1x16xvector<8xf32>> + affine.store %83, %1[((%arg7 + %82 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %82 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %84 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_18) + %85 = load %arg1[%c0_17, %84] : memref<1x16xvector<8xf32>> + affine.store %85, %1[((%arg7 + %84 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %84 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %86 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_18) + %87 = load %arg1[%c0_17, %86] : memref<1x16xvector<8xf32>> + affine.store %87, %1[((%arg7 + %86 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %86 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_18) + %89 = load %arg1[%c0_17, %88] : memref<1x16xvector<8xf32>> + affine.store %89, %1[((%arg7 + %88 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %88 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %90 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_18) + %91 = load %arg1[%c0_17, %90] : memref<1x16xvector<8xf32>> + affine.store %91, %1[((%arg7 + %90 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %90 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %92 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_18) + %93 = load %arg1[%c0_17, %92] : memref<1x16xvector<8xf32>> + affine.store %93, %1[((%arg7 + %92 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %92 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_18) + %95 = load %arg1[%c0_17, %94] : memref<1x16xvector<8xf32>> + affine.store %95, %1[((%arg7 + %94 * 8) floordiv 16) mod 16, (%arg6 + %c0_17) mod 128, (((%arg7 + %94 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } else { + %2 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %3 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %c0_16) + %4 = vector.transfer_read %arg0[%2, %3], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %4, %arg1[%c0_15, %c0_16] : memref<1x16xvector<8xf32>> + %5 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_16) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %5) + %8 = vector.transfer_read %arg0[%6, %7], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %8, %arg1[%c0_15, %5] : memref<1x16xvector<8xf32>> + %9 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_16) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %9) + %12 = vector.transfer_read %arg0[%10, 
%11], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %12, %arg1[%c0_15, %9] : memref<1x16xvector<8xf32>> + %13 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_16) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %13) + %16 = vector.transfer_read %arg0[%14, %15], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %16, %arg1[%c0_15, %13] : memref<1x16xvector<8xf32>> + %17 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_16) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %17) + %20 = vector.transfer_read %arg0[%18, %19], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %20, %arg1[%c0_15, %17] : memref<1x16xvector<8xf32>> + %21 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_16) + %22 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %23 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %21) + %24 = vector.transfer_read %arg0[%22, %23], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %24, %arg1[%c0_15, %21] : memref<1x16xvector<8xf32>> + %25 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_16) + %26 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %27 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %25) + %28 = vector.transfer_read %arg0[%26, %27], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %28, %arg1[%c0_15, %25] : memref<1x16xvector<8xf32>> + %29 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_16) + %30 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %31 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %29) + %32 = vector.transfer_read %arg0[%30, %31], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %32, %arg1[%c0_15, %29] : memref<1x16xvector<8xf32>> + %33 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_16) + %34 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %35 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %33) + %36 = vector.transfer_read %arg0[%34, %35], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %36, %arg1[%c0_15, %33] : memref<1x16xvector<8xf32>> + %37 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_16) + %38 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %39 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %37) + %40 = vector.transfer_read %arg0[%38, %39], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %40, %arg1[%c0_15, %37] : memref<1x16xvector<8xf32>> + %41 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_16) + %42 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %43 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %41) + %44 = vector.transfer_read %arg0[%42, %43], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %44, %arg1[%c0_15, %41] : memref<1x16xvector<8xf32>> + %45 = affine.apply affine_map<(d0) -> (d0 + 
11)>(%c0_16) + %46 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %47 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %45) + %48 = vector.transfer_read %arg0[%46, %47], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %48, %arg1[%c0_15, %45] : memref<1x16xvector<8xf32>> + %49 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_16) + %50 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %51 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %49) + %52 = vector.transfer_read %arg0[%50, %51], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %52, %arg1[%c0_15, %49] : memref<1x16xvector<8xf32>> + %53 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_16) + %54 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %55 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %53) + %56 = vector.transfer_read %arg0[%54, %55], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %56, %arg1[%c0_15, %53] : memref<1x16xvector<8xf32>> + %57 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_16) + %58 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %59 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %57) + %60 = vector.transfer_read %arg0[%58, %59], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %60, %arg1[%c0_15, %57] : memref<1x16xvector<8xf32>> + %61 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_16) + %62 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %c0_15) + %63 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %61) + %64 = vector.transfer_read %arg0[%62, %63], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %64, %arg1[%c0_15, %61] : memref<1x16xvector<8xf32>> + %65 = load %arg1[%c0_13, %c0_14] : memref<1x16xvector<8xf32>> + affine.store %65, %1[((%arg7 + %c0_14 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %c0_14 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %66 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_14) + %67 = load %arg1[%c0_13, %66] : memref<1x16xvector<8xf32>> + affine.store %67, %1[((%arg7 + %66 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %66 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %68 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_14) + %69 = load %arg1[%c0_13, %68] : memref<1x16xvector<8xf32>> + affine.store %69, %1[((%arg7 + %68 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %68 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %70 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_14) + %71 = load %arg1[%c0_13, %70] : memref<1x16xvector<8xf32>> + affine.store %71, %1[((%arg7 + %70 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %70 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %72 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_14) + %73 = load %arg1[%c0_13, %72] : memref<1x16xvector<8xf32>> + affine.store %73, %1[((%arg7 + %72 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %72 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %74 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_14) + %75 = load %arg1[%c0_13, %74] : 
memref<1x16xvector<8xf32>> + affine.store %75, %1[((%arg7 + %74 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %74 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %76 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_14) + %77 = load %arg1[%c0_13, %76] : memref<1x16xvector<8xf32>> + affine.store %77, %1[((%arg7 + %76 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %76 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %78 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_14) + %79 = load %arg1[%c0_13, %78] : memref<1x16xvector<8xf32>> + affine.store %79, %1[((%arg7 + %78 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %78 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %80 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_14) + %81 = load %arg1[%c0_13, %80] : memref<1x16xvector<8xf32>> + affine.store %81, %1[((%arg7 + %80 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %80 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %82 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_14) + %83 = load %arg1[%c0_13, %82] : memref<1x16xvector<8xf32>> + affine.store %83, %1[((%arg7 + %82 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %82 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %84 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_14) + %85 = load %arg1[%c0_13, %84] : memref<1x16xvector<8xf32>> + affine.store %85, %1[((%arg7 + %84 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %84 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %86 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_14) + %87 = load %arg1[%c0_13, %86] : memref<1x16xvector<8xf32>> + affine.store %87, %1[((%arg7 + %86 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %86 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_14) + %89 = load %arg1[%c0_13, %88] : memref<1x16xvector<8xf32>> + affine.store %89, %1[((%arg7 + %88 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %88 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %90 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_14) + %91 = load %arg1[%c0_13, %90] : memref<1x16xvector<8xf32>> + affine.store %91, %1[((%arg7 + %90 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %90 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %92 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_14) + %93 = load %arg1[%c0_13, %92] : memref<1x16xvector<8xf32>> + affine.store %93, %1[((%arg7 + %92 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %92 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_14) + %95 = load %arg1[%c0_13, %94] : memref<1x16xvector<8xf32>> + affine.store %95, %1[((%arg7 + %94 * 8) floordiv 16) mod 16, (%arg6 + %c0_13) mod 128, (((%arg7 + %94 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,21}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i_o,19}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 256]} + affine.for %arg6 = 0 to 784 { + affine.for %arg7 = 0 to 16 { + affine.for %arg8 = 0 to 6 { + affine.for %arg9 = 0 to 2 { + store 
%cst_21, %0[%arg7, %arg8, %arg9] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 2 : i64, index = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 6 : i64, index = #accln<"index{j_i_i_o,15}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 2]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i,14}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 6, 2]} + affine.for %arg7 = 0 to 256 step 16 { + affine.for %arg8 = 0 to 128 step 4 { + affine.for %arg9 = 0 to 0 step 6 { + affine.for %arg10 = 0 to 4 { + affine.for %arg11 = 0 to 0 { + %2 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %3 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %10 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %19 = load %arg2[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %20 = load %arg2[%3, %12] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg2[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg2[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg2[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg2[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg2[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg2[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = affine.load %1[((%10 - %arg5) floordiv 16) mod 16, (%11 - %c0) mod 128, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %28 = vector.extractelement %27[%c0_i64 : i64] : vector<8xf32> + %29 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %30 = affine.load %1[((%29 - %arg5) floordiv 16) mod 16, (%12 - %c0) mod 128, (((%29 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %31 = 
vector.extractelement %30[%c1_i64 : i64] : vector<8xf32> + %32 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %33 = affine.load %1[((%32 - %arg5) floordiv 16) mod 16, (%13 - %c0) mod 128, (((%32 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %34 = vector.extractelement %33[%c2_i64 : i64] : vector<8xf32> + %35 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %36 = affine.load %1[((%35 - %arg5) floordiv 16) mod 16, (%14 - %c0) mod 128, (((%35 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = vector.extractelement %36[%c3_i64 : i64] : vector<8xf32> + %38 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %39 = affine.load %1[((%38 - %arg5) floordiv 16) mod 16, (%15 - %c0) mod 128, (((%38 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %40 = vector.extractelement %39[%c4_i64 : i64] : vector<8xf32> + %41 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %42 = affine.load %1[((%41 - %arg5) floordiv 16) mod 16, (%16 - %c0) mod 128, (((%41 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = vector.extractelement %42[%c5_i64 : i64] : vector<8xf32> + %44 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %45 = affine.load %1[((%44 - %arg5) floordiv 16) mod 16, (%17 - %c0) mod 128, (((%44 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c6_i64 : i64] : vector<8xf32> + %47 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %48 = affine.load %1[((%47 - %arg5) floordiv 16) mod 16, (%18 - %c0) mod 128, (((%47 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %49 = vector.extractelement %48[%c7_i64 : i64] : vector<8xf32> + %50 = "accv.bin_op"(%19, %28) {predicate = 2 : i64} : (f32, f32) -> f32 + %51 = "accv.bin_op"(%20, %31) {predicate = 2 : i64} : (f32, f32) -> f32 + %52 = "accv.bin_op"(%21, %34) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %37) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %40) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %46) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %49) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = affine.load %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = vector.extractelement %58[%c0_i64 : i64] : vector<8xf32> + %60 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %61 = affine.load %0[((%60 - %arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%60 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %62 = vector.extractelement %61[%c1_i64 : i64] : vector<8xf32> + %63 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %64 = affine.load %0[((%63 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%63 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %65 = vector.extractelement %64[%c2_i64 : i64] : vector<8xf32> + %66 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, 
%c0_12) + %67 = affine.load %0[((%66 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%66 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c3_i64 : i64] : vector<8xf32> + %69 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %70 = affine.load %0[((%69 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%69 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %71 = vector.extractelement %70[%c4_i64 : i64] : vector<8xf32> + %72 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %73 = affine.load %0[((%72 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%72 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c5_i64 : i64] : vector<8xf32> + %75 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %76 = affine.load %0[((%75 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%75 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c6_i64 : i64] : vector<8xf32> + %78 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %79 = affine.load %0[((%78 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%78 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = "accv.bin_op"(%59, %50) {predicate = 0 : i64} : (f32, f32) -> f32 + %82 = "accv.bin_op"(%62, %51) {predicate = 0 : i64} : (f32, f32) -> f32 + %83 = "accv.bin_op"(%65, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%68, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%71, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%74, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%77, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%80, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = affine.load %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + affine.store %90, %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %91 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %92 = affine.load %0[((%91 - %arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%91 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = vector.insertelement %82, %92[%c1_i64 : i64] : vector<8xf32> + affine.store %93, %0[((%91 - %arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%91 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %95 = affine.load %0[((%94 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%94 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %83, %95[%c2_i64 : i64] : vector<8xf32> + affine.store %96, %0[((%94 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%94 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %97 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %98 = 
affine.load %0[((%97 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%97 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %84, %98[%c3_i64 : i64] : vector<8xf32> + affine.store %99, %0[((%97 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%97 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %100 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %101 = affine.load %0[((%100 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%100 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %85, %101[%c4_i64 : i64] : vector<8xf32> + affine.store %102, %0[((%100 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%100 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %103 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %104 = affine.load %0[((%103 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%103 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %86, %104[%c5_i64 : i64] : vector<8xf32> + affine.store %105, %0[((%103 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%103 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %106 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %107 = affine.load %0[((%106 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%106 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %87, %107[%c6_i64 : i64] : vector<8xf32> + affine.store %108, %0[((%106 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%106 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %109 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %110 = affine.load %0[((%109 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%109 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %88, %110[%c7_i64 : i64] : vector<8xf32> + affine.store %111, %0[((%109 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%109 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %112 = affine.load %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %81, %112[%c0_i64 : i64] : vector<8xf32> + affine.store %113, %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %115 = affine.load %0[((%114 - %arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%114 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %82, %115[%c1_i64 : i64] : vector<8xf32> + affine.store %116, %0[((%114 - %arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%114 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %117 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %118 = affine.load %0[((%117 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%117 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %83, %118[%c2_i64 : i64] : vector<8xf32> + affine.store %119, %0[((%117 - %arg5) 
floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%117 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %120 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %121 = affine.load %0[((%120 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%120 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = vector.insertelement %84, %121[%c3_i64 : i64] : vector<8xf32> + affine.store %122, %0[((%120 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%120 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %123 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %124 = affine.load %0[((%123 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%123 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %85, %124[%c4_i64 : i64] : vector<8xf32> + affine.store %125, %0[((%123 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%123 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %126 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %127 = affine.load %0[((%126 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%126 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = vector.insertelement %86, %127[%c5_i64 : i64] : vector<8xf32> + affine.store %128, %0[((%126 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%126 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %129 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %130 = affine.load %0[((%129 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%129 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = vector.insertelement %87, %130[%c6_i64 : i64] : vector<8xf32> + affine.store %131, %0[((%129 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%129 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %132 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_11, %c0_12) + %133 = affine.load %0[((%132 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%132 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = vector.insertelement %88, %133[%c7_i64 : i64] : vector<8xf32> + affine.store %134, %0[((%132 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%132 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %135 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_11) + %136 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %137 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %138 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %139 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %140 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %141 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %142 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %143 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %arg9, %arg11) + %144 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %145 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %146 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + 
d2)>(%c0, %arg8, %arg10) + %147 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %148 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %149 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %150 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %151 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %152 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg10) + %153 = load %arg2[%136, %145] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %154 = load %arg2[%137, %146] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg2[%138, %147] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg2[%139, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg2[%140, %149] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %158 = load %arg2[%141, %150] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg2[%142, %151] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %160 = load %arg2[%143, %152] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = affine.load %1[((%144 - %arg5) floordiv 16) mod 16, (%145 - %c0) mod 128, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c0_i64 : i64] : vector<8xf32> + %163 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %164 = affine.load %1[((%163 - %arg5) floordiv 16) mod 16, (%146 - %c0) mod 128, (((%163 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %167 = affine.load %1[((%166 - %arg5) floordiv 16) mod 16, (%147 - %c0) mod 128, (((%166 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c2_i64 : i64] : vector<8xf32> + %169 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %170 = affine.load %1[((%169 - %arg5) floordiv 16) mod 16, (%148 - %c0) mod 128, (((%169 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c3_i64 : i64] : vector<8xf32> + %172 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %173 = affine.load %1[((%172 - %arg5) floordiv 16) mod 16, (%149 - %c0) mod 128, (((%172 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %174 = vector.extractelement %173[%c4_i64 : i64] : vector<8xf32> + %175 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %176 = affine.load %1[((%175 - %arg5) floordiv 16) mod 16, (%150 - %c0) mod 128, (((%175 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c5_i64 : i64] : vector<8xf32> + %178 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %179 = affine.load %1[((%178 - %arg5) floordiv 16) mod 16, (%151 - %c0) mod 128, (((%178 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %180 = vector.extractelement %179[%c6_i64 : i64] : vector<8xf32> + %181 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + 
d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %182 = affine.load %1[((%181 - %arg5) floordiv 16) mod 16, (%152 - %c0) mod 128, (((%181 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %183 = vector.extractelement %182[%c7_i64 : i64] : vector<8xf32> + %184 = "accv.bin_op"(%153, %162) {predicate = 2 : i64} : (f32, f32) -> f32 + %185 = "accv.bin_op"(%154, %165) {predicate = 2 : i64} : (f32, f32) -> f32 + %186 = "accv.bin_op"(%155, %168) {predicate = 2 : i64} : (f32, f32) -> f32 + %187 = "accv.bin_op"(%156, %171) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = "accv.bin_op"(%157, %174) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = "accv.bin_op"(%158, %177) {predicate = 2 : i64} : (f32, f32) -> f32 + %190 = "accv.bin_op"(%159, %180) {predicate = 2 : i64} : (f32, f32) -> f32 + %191 = "accv.bin_op"(%160, %183) {predicate = 2 : i64} : (f32, f32) -> f32 + %192 = affine.load %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c0_i64 : i64] : vector<8xf32> + %194 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %195 = affine.load %0[((%194 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%194 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %196 = vector.extractelement %195[%c1_i64 : i64] : vector<8xf32> + %197 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %198 = affine.load %0[((%197 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%197 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c2_i64 : i64] : vector<8xf32> + %200 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %201 = affine.load %0[((%200 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%200 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %202 = vector.extractelement %201[%c3_i64 : i64] : vector<8xf32> + %203 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %204 = affine.load %0[((%203 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%203 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %205 = vector.extractelement %204[%c4_i64 : i64] : vector<8xf32> + %206 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %207 = affine.load %0[((%206 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%206 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %208 = vector.extractelement %207[%c5_i64 : i64] : vector<8xf32> + %209 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %210 = affine.load %0[((%209 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%209 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %211 = vector.extractelement %210[%c6_i64 : i64] : vector<8xf32> + %212 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %213 = affine.load %0[((%212 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%212 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %214 = vector.extractelement %213[%c7_i64 : i64] : vector<8xf32> + %215 = "accv.bin_op"(%193, %184) {predicate = 0 : i64} : (f32, f32) -> f32 + %216 = "accv.bin_op"(%196, %185) {predicate = 0 : i64} : (f32, f32) -> f32 
+ %217 = "accv.bin_op"(%199, %186) {predicate = 0 : i64} : (f32, f32) -> f32 + %218 = "accv.bin_op"(%202, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + %219 = "accv.bin_op"(%205, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + %220 = "accv.bin_op"(%208, %189) {predicate = 0 : i64} : (f32, f32) -> f32 + %221 = "accv.bin_op"(%211, %190) {predicate = 0 : i64} : (f32, f32) -> f32 + %222 = "accv.bin_op"(%214, %191) {predicate = 0 : i64} : (f32, f32) -> f32 + %223 = affine.load %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %215, %223[%c0_i64 : i64] : vector<8xf32> + affine.store %224, %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %225 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %226 = affine.load %0[((%225 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%225 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %216, %226[%c1_i64 : i64] : vector<8xf32> + affine.store %227, %0[((%225 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%225 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %228 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %229 = affine.load %0[((%228 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%228 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %217, %229[%c2_i64 : i64] : vector<8xf32> + affine.store %230, %0[((%228 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%228 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %231 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %232 = affine.load %0[((%231 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%231 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %218, %232[%c3_i64 : i64] : vector<8xf32> + affine.store %233, %0[((%231 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%231 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %234 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %235 = affine.load %0[((%234 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%234 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %236 = vector.insertelement %219, %235[%c4_i64 : i64] : vector<8xf32> + affine.store %236, %0[((%234 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%234 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %237 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %238 = affine.load %0[((%237 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%237 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %220, %238[%c5_i64 : i64] : vector<8xf32> + affine.store %239, %0[((%237 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%237 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %240 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %241 = affine.load %0[((%240 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%240 - %arg5) mod 16) floordiv 8) mod 2] : 
memref<16x6x2xvector<8xf32>> + %242 = vector.insertelement %221, %241[%c6_i64 : i64] : vector<8xf32> + affine.store %242, %0[((%240 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%240 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %243 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %244 = affine.load %0[((%243 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%243 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %245 = vector.insertelement %222, %244[%c7_i64 : i64] : vector<8xf32> + affine.store %245, %0[((%243 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%243 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %246 = affine.load %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = vector.insertelement %215, %246[%c0_i64 : i64] : vector<8xf32> + affine.store %247, %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %249 = affine.load %0[((%248 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%248 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %250 = vector.insertelement %216, %249[%c1_i64 : i64] : vector<8xf32> + affine.store %250, %0[((%248 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%248 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %251 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %252 = affine.load %0[((%251 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%251 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %253 = vector.insertelement %217, %252[%c2_i64 : i64] : vector<8xf32> + affine.store %253, %0[((%251 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%251 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %254 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %255 = affine.load %0[((%254 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%254 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %256 = vector.insertelement %218, %255[%c3_i64 : i64] : vector<8xf32> + affine.store %256, %0[((%254 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%254 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %257 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %258 = affine.load %0[((%257 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%257 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %259 = vector.insertelement %219, %258[%c4_i64 : i64] : vector<8xf32> + affine.store %259, %0[((%257 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%257 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %260 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %261 = affine.load %0[((%260 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%260 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %262 = vector.insertelement %220, %261[%c5_i64 : i64] : vector<8xf32> + affine.store %262, %0[((%260 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%260 - %arg5) mod 16) floordiv 8) mod 
2] : memref<16x6x2xvector<8xf32>> + %263 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %264 = affine.load %0[((%263 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%263 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %265 = vector.insertelement %221, %264[%c6_i64 : i64] : vector<8xf32> + affine.store %265, %0[((%263 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%263 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %266 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_12) + %267 = affine.load %0[((%266 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%266 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %268 = vector.insertelement %222, %267[%c7_i64 : i64] : vector<8xf32> + affine.store %268, %0[((%266 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%266 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 4]} + affine.for %arg9 = 0 to 4 { + %2 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %3 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %10 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %19 = load %arg2[%2, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %20 = load %arg2[%3, %12] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg2[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg2[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg2[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg2[%7, %16] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg2[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg2[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = affine.load %1[((%10 - %arg5) floordiv 16) mod 16, (%11 - %c0) mod 128, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %28 = vector.extractelement %27[%c0_i64 : i64] : vector<8xf32> + %29 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %30 = affine.load %1[((%29 - %arg5) floordiv 16) mod 16, (%12 - %c0) mod 128, (((%29 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %31 = vector.extractelement %30[%c1_i64 : i64] : vector<8xf32> + %32 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %33 = affine.load %1[((%32 - %arg5) floordiv 16) mod 16, (%13 - %c0) mod 128, (((%32 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %34 = vector.extractelement %33[%c2_i64 : i64] : vector<8xf32> + %35 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %36 = affine.load %1[((%35 - %arg5) floordiv 16) mod 16, (%14 - %c0) mod 128, (((%35 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = vector.extractelement %36[%c3_i64 : i64] : vector<8xf32> + %38 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %39 = affine.load %1[((%38 - %arg5) floordiv 16) mod 16, (%15 - %c0) mod 128, (((%38 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %40 = vector.extractelement %39[%c4_i64 : i64] : vector<8xf32> + %41 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %42 = affine.load %1[((%41 - %arg5) floordiv 16) mod 16, (%16 - %c0) mod 128, (((%41 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = vector.extractelement %42[%c5_i64 : i64] : vector<8xf32> + %44 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %45 = affine.load %1[((%44 - %arg5) floordiv 16) mod 16, (%17 - %c0) mod 128, (((%44 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %46 = vector.extractelement %45[%c6_i64 : i64] : vector<8xf32> + %47 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %48 = affine.load %1[((%47 - %arg5) floordiv 16) mod 16, (%18 - %c0) mod 128, (((%47 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %49 = vector.extractelement %48[%c7_i64 : i64] : vector<8xf32> + %50 = "accv.bin_op"(%19, %28) {predicate = 2 : i64} : (f32, f32) -> f32 + %51 = "accv.bin_op"(%20, %31) {predicate = 2 : i64} : (f32, f32) -> f32 + %52 = "accv.bin_op"(%21, %34) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %37) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %40) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %46) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %49) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = affine.load %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = vector.extractelement %58[%c0_i64 : i64] : vector<8xf32> + %60 = 
affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %61 = affine.load %0[((%60 - %arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%60 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %62 = vector.extractelement %61[%c1_i64 : i64] : vector<8xf32> + %63 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %64 = affine.load %0[((%63 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%63 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %65 = vector.extractelement %64[%c2_i64 : i64] : vector<8xf32> + %66 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %67 = affine.load %0[((%66 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%66 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c3_i64 : i64] : vector<8xf32> + %69 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %70 = affine.load %0[((%69 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%69 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %71 = vector.extractelement %70[%c4_i64 : i64] : vector<8xf32> + %72 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %73 = affine.load %0[((%72 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%72 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c5_i64 : i64] : vector<8xf32> + %75 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %76 = affine.load %0[((%75 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%75 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %77 = vector.extractelement %76[%c6_i64 : i64] : vector<8xf32> + %78 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %79 = affine.load %0[((%78 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%78 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %80 = vector.extractelement %79[%c7_i64 : i64] : vector<8xf32> + %81 = "accv.bin_op"(%59, %50) {predicate = 0 : i64} : (f32, f32) -> f32 + %82 = "accv.bin_op"(%62, %51) {predicate = 0 : i64} : (f32, f32) -> f32 + %83 = "accv.bin_op"(%65, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%68, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%71, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%74, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%77, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%80, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = affine.load %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %81, %89[%c0_i64 : i64] : vector<8xf32> + affine.store %90, %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %91 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %92 = affine.load %0[((%91 - %arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%91 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = vector.insertelement %82, %92[%c1_i64 : i64] : vector<8xf32> + affine.store %93, %0[((%91 - 
%arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%91 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %95 = affine.load %0[((%94 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%94 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %83, %95[%c2_i64 : i64] : vector<8xf32> + affine.store %96, %0[((%94 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%94 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %97 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %98 = affine.load %0[((%97 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%97 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %84, %98[%c3_i64 : i64] : vector<8xf32> + affine.store %99, %0[((%97 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%97 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %100 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %101 = affine.load %0[((%100 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%100 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %85, %101[%c4_i64 : i64] : vector<8xf32> + affine.store %102, %0[((%100 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%100 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %103 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %104 = affine.load %0[((%103 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%103 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %86, %104[%c5_i64 : i64] : vector<8xf32> + affine.store %105, %0[((%103 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%103 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %106 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %107 = affine.load %0[((%106 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%106 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = vector.insertelement %87, %107[%c6_i64 : i64] : vector<8xf32> + affine.store %108, %0[((%106 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%106 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %109 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %110 = affine.load %0[((%109 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%109 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = vector.insertelement %88, %110[%c7_i64 : i64] : vector<8xf32> + affine.store %111, %0[((%109 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%109 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %112 = affine.load %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %81, %112[%c0_i64 : i64] : vector<8xf32> + affine.store %113, %0[((%10 - %arg5) floordiv 16) mod 16, (%2 - %arg6) mod 6, (((%10 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %115 = affine.load %0[((%114 - %arg5) floordiv 16) 
mod 16, (%3 - %arg6) mod 6, (((%114 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %82, %115[%c1_i64 : i64] : vector<8xf32> + affine.store %116, %0[((%114 - %arg5) floordiv 16) mod 16, (%3 - %arg6) mod 6, (((%114 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %117 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %118 = affine.load %0[((%117 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%117 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %83, %118[%c2_i64 : i64] : vector<8xf32> + affine.store %119, %0[((%117 - %arg5) floordiv 16) mod 16, (%4 - %arg6) mod 6, (((%117 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %120 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %121 = affine.load %0[((%120 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%120 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = vector.insertelement %84, %121[%c3_i64 : i64] : vector<8xf32> + affine.store %122, %0[((%120 - %arg5) floordiv 16) mod 16, (%5 - %arg6) mod 6, (((%120 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %123 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %124 = affine.load %0[((%123 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%123 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %85, %124[%c4_i64 : i64] : vector<8xf32> + affine.store %125, %0[((%123 - %arg5) floordiv 16) mod 16, (%6 - %arg6) mod 6, (((%123 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %126 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %127 = affine.load %0[((%126 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%126 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = vector.insertelement %86, %127[%c5_i64 : i64] : vector<8xf32> + affine.store %128, %0[((%126 - %arg5) floordiv 16) mod 16, (%7 - %arg6) mod 6, (((%126 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %129 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %130 = affine.load %0[((%129 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%129 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = vector.insertelement %87, %130[%c6_i64 : i64] : vector<8xf32> + affine.store %131, %0[((%129 - %arg5) floordiv 16) mod 16, (%8 - %arg6) mod 6, (((%129 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %132 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %c0_9, %c0_10) + %133 = affine.load %0[((%132 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%132 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = vector.insertelement %88, %133[%c7_i64 : i64] : vector<8xf32> + affine.store %134, %0[((%132 - %arg5) floordiv 16) mod 16, (%9 - %arg6) mod 6, (((%132 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %135 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_9) + %136 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %137 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %138 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, 
%c0_8) + %139 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %140 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %141 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %142 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %143 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_7, %c0_8) + %144 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %145 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %146 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %147 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %148 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %149 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %150 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %151 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %152 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg8, %arg9) + %153 = load %arg2[%136, %145] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %154 = load %arg2[%137, %146] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg2[%138, %147] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg2[%139, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg2[%140, %149] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %158 = load %arg2[%141, %150] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg2[%142, %151] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %160 = load %arg2[%143, %152] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = affine.load %1[((%144 - %arg5) floordiv 16) mod 16, (%145 - %c0) mod 128, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c0_i64 : i64] : vector<8xf32> + %163 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %164 = affine.load %1[((%163 - %arg5) floordiv 16) mod 16, (%146 - %c0) mod 128, (((%163 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c1_i64 : i64] : vector<8xf32> + %166 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %167 = affine.load %1[((%166 - %arg5) floordiv 16) mod 16, (%147 - %c0) mod 128, (((%166 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c2_i64 : i64] : vector<8xf32> + %169 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %170 = affine.load %1[((%169 - %arg5) floordiv 16) mod 16, (%148 - %c0) mod 128, (((%169 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %171 = vector.extractelement %170[%c3_i64 : i64] : vector<8xf32> + %172 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %173 = affine.load %1[((%172 - %arg5) floordiv 16) mod 16, (%149 - %c0) mod 128, (((%172 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %174 = vector.extractelement %173[%c4_i64 : i64] : vector<8xf32> + %175 = affine.apply affine_map<(d0, 
d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %176 = affine.load %1[((%175 - %arg5) floordiv 16) mod 16, (%150 - %c0) mod 128, (((%175 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %177 = vector.extractelement %176[%c5_i64 : i64] : vector<8xf32> + %178 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %179 = affine.load %1[((%178 - %arg5) floordiv 16) mod 16, (%151 - %c0) mod 128, (((%178 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %180 = vector.extractelement %179[%c6_i64 : i64] : vector<8xf32> + %181 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %182 = affine.load %1[((%181 - %arg5) floordiv 16) mod 16, (%152 - %c0) mod 128, (((%181 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %183 = vector.extractelement %182[%c7_i64 : i64] : vector<8xf32> + %184 = "accv.bin_op"(%153, %162) {predicate = 2 : i64} : (f32, f32) -> f32 + %185 = "accv.bin_op"(%154, %165) {predicate = 2 : i64} : (f32, f32) -> f32 + %186 = "accv.bin_op"(%155, %168) {predicate = 2 : i64} : (f32, f32) -> f32 + %187 = "accv.bin_op"(%156, %171) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = "accv.bin_op"(%157, %174) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = "accv.bin_op"(%158, %177) {predicate = 2 : i64} : (f32, f32) -> f32 + %190 = "accv.bin_op"(%159, %180) {predicate = 2 : i64} : (f32, f32) -> f32 + %191 = "accv.bin_op"(%160, %183) {predicate = 2 : i64} : (f32, f32) -> f32 + %192 = affine.load %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c0_i64 : i64] : vector<8xf32> + %194 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %195 = affine.load %0[((%194 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%194 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %196 = vector.extractelement %195[%c1_i64 : i64] : vector<8xf32> + %197 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %198 = affine.load %0[((%197 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%197 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c2_i64 : i64] : vector<8xf32> + %200 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %201 = affine.load %0[((%200 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%200 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %202 = vector.extractelement %201[%c3_i64 : i64] : vector<8xf32> + %203 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %204 = affine.load %0[((%203 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%203 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %205 = vector.extractelement %204[%c4_i64 : i64] : vector<8xf32> + %206 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %207 = affine.load %0[((%206 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%206 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %208 = vector.extractelement %207[%c5_i64 : i64] : vector<8xf32> + %209 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %210 = affine.load 
%0[((%209 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%209 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %211 = vector.extractelement %210[%c6_i64 : i64] : vector<8xf32> + %212 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %213 = affine.load %0[((%212 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%212 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %214 = vector.extractelement %213[%c7_i64 : i64] : vector<8xf32> + %215 = "accv.bin_op"(%193, %184) {predicate = 0 : i64} : (f32, f32) -> f32 + %216 = "accv.bin_op"(%196, %185) {predicate = 0 : i64} : (f32, f32) -> f32 + %217 = "accv.bin_op"(%199, %186) {predicate = 0 : i64} : (f32, f32) -> f32 + %218 = "accv.bin_op"(%202, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + %219 = "accv.bin_op"(%205, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + %220 = "accv.bin_op"(%208, %189) {predicate = 0 : i64} : (f32, f32) -> f32 + %221 = "accv.bin_op"(%211, %190) {predicate = 0 : i64} : (f32, f32) -> f32 + %222 = "accv.bin_op"(%214, %191) {predicate = 0 : i64} : (f32, f32) -> f32 + %223 = affine.load %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %215, %223[%c0_i64 : i64] : vector<8xf32> + affine.store %224, %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %225 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %226 = affine.load %0[((%225 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%225 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %216, %226[%c1_i64 : i64] : vector<8xf32> + affine.store %227, %0[((%225 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%225 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %228 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %229 = affine.load %0[((%228 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%228 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %217, %229[%c2_i64 : i64] : vector<8xf32> + affine.store %230, %0[((%228 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%228 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %231 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %232 = affine.load %0[((%231 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%231 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %233 = vector.insertelement %218, %232[%c3_i64 : i64] : vector<8xf32> + affine.store %233, %0[((%231 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%231 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %234 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %235 = affine.load %0[((%234 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%234 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %236 = vector.insertelement %219, %235[%c4_i64 : i64] : vector<8xf32> + affine.store %236, %0[((%234 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%234 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %237 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + 
d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %238 = affine.load %0[((%237 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%237 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %239 = vector.insertelement %220, %238[%c5_i64 : i64] : vector<8xf32> + affine.store %239, %0[((%237 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%237 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %240 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %241 = affine.load %0[((%240 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%240 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %242 = vector.insertelement %221, %241[%c6_i64 : i64] : vector<8xf32> + affine.store %242, %0[((%240 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%240 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %243 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %244 = affine.load %0[((%243 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%243 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %245 = vector.insertelement %222, %244[%c7_i64 : i64] : vector<8xf32> + affine.store %245, %0[((%243 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%243 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %246 = affine.load %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = vector.insertelement %215, %246[%c0_i64 : i64] : vector<8xf32> + affine.store %247, %0[((%144 - %arg5) floordiv 16) mod 16, (%136 - %arg6) mod 6, (((%144 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %249 = affine.load %0[((%248 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%248 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %250 = vector.insertelement %216, %249[%c1_i64 : i64] : vector<8xf32> + affine.store %250, %0[((%248 - %arg5) floordiv 16) mod 16, (%137 - %arg6) mod 6, (((%248 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %251 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %252 = affine.load %0[((%251 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%251 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %253 = vector.insertelement %217, %252[%c2_i64 : i64] : vector<8xf32> + affine.store %253, %0[((%251 - %arg5) floordiv 16) mod 16, (%138 - %arg6) mod 6, (((%251 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %254 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %255 = affine.load %0[((%254 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%254 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %256 = vector.insertelement %218, %255[%c3_i64 : i64] : vector<8xf32> + affine.store %256, %0[((%254 - %arg5) floordiv 16) mod 16, (%139 - %arg6) mod 6, (((%254 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %257 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %258 = affine.load %0[((%257 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%257 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %259 = 
vector.insertelement %219, %258[%c4_i64 : i64] : vector<8xf32> + affine.store %259, %0[((%257 - %arg5) floordiv 16) mod 16, (%140 - %arg6) mod 6, (((%257 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %260 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %261 = affine.load %0[((%260 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%260 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %262 = vector.insertelement %220, %261[%c5_i64 : i64] : vector<8xf32> + affine.store %262, %0[((%260 - %arg5) floordiv 16) mod 16, (%141 - %arg6) mod 6, (((%260 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %263 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %264 = affine.load %0[((%263 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%263 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %265 = vector.insertelement %221, %264[%c6_i64 : i64] : vector<8xf32> + affine.store %265, %0[((%263 - %arg5) floordiv 16) mod 16, (%142 - %arg6) mod 6, (((%263 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %266 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg5, %arg7, %135, %c0_10) + %267 = affine.load %0[((%266 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%266 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %268 = vector.insertelement %222, %267[%c7_i64 : i64] : vector<8xf32> + affine.store %268, %0[((%266 - %arg5) floordiv 16) mod 16, (%143 - %arg6) mod 6, (((%266 - %arg5) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 128]} + affine.for %arg7 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %2 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %3 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %c0_6) + %4 = vector.transfer_read %arg3[%2, %3], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %5 = affine.load %0[((%arg7 + %c0_6 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %c0_6 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %6 = addf %4, %5 : vector<8xf32> + store %6, %arg4[%c0_5, %c0_6] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_6) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %7) + %10 = vector.transfer_read %arg3[%8, %9], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %11 = affine.load %0[((%arg7 + %7 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %7 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %12 = addf %10, %11 : vector<8xf32> + store %12, %arg4[%c0_5, %7] : 
memref<1x16xvector<8xf32>> + %13 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_6) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %13) + %16 = vector.transfer_read %arg3[%14, %15], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %17 = affine.load %0[((%arg7 + %13 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %13 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %18 = addf %16, %17 : vector<8xf32> + store %18, %arg4[%c0_5, %13] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_6) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %19) + %22 = vector.transfer_read %arg3[%20, %21], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %23 = affine.load %0[((%arg7 + %19 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %19 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %24 = addf %22, %23 : vector<8xf32> + store %24, %arg4[%c0_5, %19] : memref<1x16xvector<8xf32>> + %25 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_6) + %26 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %27 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %25) + %28 = vector.transfer_read %arg3[%26, %27], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %29 = affine.load %0[((%arg7 + %25 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %25 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %30 = addf %28, %29 : vector<8xf32> + store %30, %arg4[%c0_5, %25] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_6) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %31) + %34 = vector.transfer_read %arg3[%32, %33], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %35 = affine.load %0[((%arg7 + %31 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %31 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %36 = addf %34, %35 : vector<8xf32> + store %36, %arg4[%c0_5, %31] : memref<1x16xvector<8xf32>> + %37 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_6) + %38 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %39 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %37) + %40 = vector.transfer_read %arg3[%38, %39], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %41 = affine.load %0[((%arg7 + %37 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %37 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %42 = addf %40, %41 : vector<8xf32> + store %42, %arg4[%c0_5, %37] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_6) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %43) + %46 = vector.transfer_read %arg3[%44, %45], %cst {masked = [false]} : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %47 = affine.load %0[((%arg7 + %43 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %43 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %48 = addf %46, %47 : vector<8xf32> + store %48, %arg4[%c0_5, %43] : memref<1x16xvector<8xf32>> + %49 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_6) + %50 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %51 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %49) + %52 = vector.transfer_read %arg3[%50, %51], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %53 = affine.load %0[((%arg7 + %49 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %49 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %54 = addf %52, %53 : vector<8xf32> + store %54, %arg4[%c0_5, %49] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_6) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %55) + %58 = vector.transfer_read %arg3[%56, %57], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %59 = affine.load %0[((%arg7 + %55 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %55 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %60 = addf %58, %59 : vector<8xf32> + store %60, %arg4[%c0_5, %55] : memref<1x16xvector<8xf32>> + %61 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_6) + %62 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %63 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %61) + %64 = vector.transfer_read %arg3[%62, %63], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %65 = affine.load %0[((%arg7 + %61 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %61 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %66 = addf %64, %65 : vector<8xf32> + store %66, %arg4[%c0_5, %61] : memref<1x16xvector<8xf32>> + %67 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_6) + %68 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %69 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %67) + %70 = vector.transfer_read %arg3[%68, %69], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %71 = affine.load %0[((%arg7 + %67 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %67 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %72 = addf %70, %71 : vector<8xf32> + store %72, %arg4[%c0_5, %67] : memref<1x16xvector<8xf32>> + %73 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_6) + %74 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %75 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %73) + %76 = vector.transfer_read %arg3[%74, %75], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %77 = affine.load %0[((%arg7 + %73 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %73 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %78 = addf %76, %77 : vector<8xf32> + store %78, %arg4[%c0_5, %73] : memref<1x16xvector<8xf32>> + %79 = affine.apply affine_map<(d0) 
-> (d0 + 13)>(%c0_6) + %80 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %81 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %79) + %82 = vector.transfer_read %arg3[%80, %81], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %83 = affine.load %0[((%arg7 + %79 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %79 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %84 = addf %82, %83 : vector<8xf32> + store %84, %arg4[%c0_5, %79] : memref<1x16xvector<8xf32>> + %85 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_6) + %86 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %87 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %85) + %88 = vector.transfer_read %arg3[%86, %87], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %89 = affine.load %0[((%arg7 + %85 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %85 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %90 = addf %88, %89 : vector<8xf32> + store %90, %arg4[%c0_5, %85] : memref<1x16xvector<8xf32>> + %91 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_6) + %92 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_5) + %93 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %91) + %94 = vector.transfer_read %arg3[%92, %93], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %95 = affine.load %0[((%arg7 + %91 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg7 + %91 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = addf %94, %95 : vector<8xf32> + store %96, %arg4[%c0_5, %91] : memref<1x16xvector<8xf32>> + affine.for %arg8 = 0 to 16 { + %97 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_4) + %98 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %arg8) + %99 = load %arg4[%c0_4, %arg8] : memref<1x16xvector<8xf32>> + vector.transfer_write %99, %arg3[%97, %98] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + } else { + %2 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %3 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %c0_3) + %4 = vector.transfer_read %arg3[%2, %3], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %5 = affine.load %0[((%arg7 + %c0_3 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %c0_3 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %6 = addf %4, %5 : vector<8xf32> + store %6, %arg4[%c0_2, %c0_3] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_3) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %7) + %10 = vector.transfer_read %arg3[%8, %9], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %11 = affine.load %0[((%arg7 + %7 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %7 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %12 
= addf %10, %11 : vector<8xf32> + store %12, %arg4[%c0_2, %7] : memref<1x16xvector<8xf32>> + %13 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_3) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %13) + %16 = vector.transfer_read %arg3[%14, %15], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %17 = affine.load %0[((%arg7 + %13 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %13 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %18 = addf %16, %17 : vector<8xf32> + store %18, %arg4[%c0_2, %13] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_3) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %19) + %22 = vector.transfer_read %arg3[%20, %21], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %23 = affine.load %0[((%arg7 + %19 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %19 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %24 = addf %22, %23 : vector<8xf32> + store %24, %arg4[%c0_2, %19] : memref<1x16xvector<8xf32>> + %25 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_3) + %26 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %27 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %25) + %28 = vector.transfer_read %arg3[%26, %27], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %29 = affine.load %0[((%arg7 + %25 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %25 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %30 = addf %28, %29 : vector<8xf32> + store %30, %arg4[%c0_2, %25] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_3) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %31) + %34 = vector.transfer_read %arg3[%32, %33], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %35 = affine.load %0[((%arg7 + %31 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %31 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %36 = addf %34, %35 : vector<8xf32> + store %36, %arg4[%c0_2, %31] : memref<1x16xvector<8xf32>> + %37 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_3) + %38 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %39 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %37) + %40 = vector.transfer_read %arg3[%38, %39], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %41 = affine.load %0[((%arg7 + %37 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %37 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %42 = addf %40, %41 : vector<8xf32> + store %42, %arg4[%c0_2, %37] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_3) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %43) + %46 = vector.transfer_read %arg3[%44, %45], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %47 
= affine.load %0[((%arg7 + %43 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %43 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %48 = addf %46, %47 : vector<8xf32> + store %48, %arg4[%c0_2, %43] : memref<1x16xvector<8xf32>> + %49 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_3) + %50 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %51 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %49) + %52 = vector.transfer_read %arg3[%50, %51], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %53 = affine.load %0[((%arg7 + %49 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %49 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %54 = addf %52, %53 : vector<8xf32> + store %54, %arg4[%c0_2, %49] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_3) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %55) + %58 = vector.transfer_read %arg3[%56, %57], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %59 = affine.load %0[((%arg7 + %55 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %55 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %60 = addf %58, %59 : vector<8xf32> + store %60, %arg4[%c0_2, %55] : memref<1x16xvector<8xf32>> + %61 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_3) + %62 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %63 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %61) + %64 = vector.transfer_read %arg3[%62, %63], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %65 = affine.load %0[((%arg7 + %61 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %61 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %66 = addf %64, %65 : vector<8xf32> + store %66, %arg4[%c0_2, %61] : memref<1x16xvector<8xf32>> + %67 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_3) + %68 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %69 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %67) + %70 = vector.transfer_read %arg3[%68, %69], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %71 = affine.load %0[((%arg7 + %67 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %67 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %72 = addf %70, %71 : vector<8xf32> + store %72, %arg4[%c0_2, %67] : memref<1x16xvector<8xf32>> + %73 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_3) + %74 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %75 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %73) + %76 = vector.transfer_read %arg3[%74, %75], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %77 = affine.load %0[((%arg7 + %73 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %73 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %78 = addf %76, %77 : vector<8xf32> + store %78, %arg4[%c0_2, %73] : memref<1x16xvector<8xf32>> + %79 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_3) + %80 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %81 = affine.apply affine_map<(d0, d1, d2) -> (d0 
+ d1 + d2 * 8)>(%arg5, %arg7, %79) + %82 = vector.transfer_read %arg3[%80, %81], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %83 = affine.load %0[((%arg7 + %79 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %79 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %84 = addf %82, %83 : vector<8xf32> + store %84, %arg4[%c0_2, %79] : memref<1x16xvector<8xf32>> + %85 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_3) + %86 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %87 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %85) + %88 = vector.transfer_read %arg3[%86, %87], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %89 = affine.load %0[((%arg7 + %85 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %85 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %90 = addf %88, %89 : vector<8xf32> + store %90, %arg4[%c0_2, %85] : memref<1x16xvector<8xf32>> + %91 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_3) + %92 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_2) + %93 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %91) + %94 = vector.transfer_read %arg3[%92, %93], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %95 = affine.load %0[((%arg7 + %91 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg7 + %91 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = addf %94, %95 : vector<8xf32> + store %96, %arg4[%c0_2, %91] : memref<1x16xvector<8xf32>> + affine.for %arg8 = 0 to 16 { + %97 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg6, %c0_0, %c0_1) + %98 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg5, %arg7, %arg8) + %99 = load %arg4[%c0_1, %arg8] : memref<1x16xvector<8xf32>> + vector.transfer_write %99, %arg3[%97, %98] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 784 : i64, index = #accln<"index{i_o,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/4_LinalgLowerToAffineLoops.mlir b/Tutorials/optimized_matmul/mlir/4_LinalgLowerToAffineLoops.mlir new file mode 100644 index 00000000..d8cf6fcd --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/4_LinalgLowerToAffineLoops.mlir @@ -0,0 +1,1711 @@ +module @optimized_matmul { + accv.module "optimized_matmul" { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c780 = constant 780 : index + %c781 = constant 781 : index + 
%c782 = constant 782 : index + %c783 = constant 783 : index + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 780 step 6 { + affine.for %arg5 = 0 to 256 step 16 { + affine.for %arg6 = 0 to 128 step 4 { + affine.for %arg7 = 0 to 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = "accv.bin_op"(%2, %3) {predicate = 2 : i64} : (f32, f32) -> f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = "accv.bin_op"(%5, %4) {predicate = 0 : i64} : (f32, f32) -> f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %10 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg1[%9, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %12 = "accv.bin_op"(%10, %11) {predicate = 2 : i64} : (f32, f32) -> f32 + %13 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = "accv.bin_op"(%13, %12) {predicate = 0 : i64} : (f32, f32) -> f32 + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %15, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %18 = load %arg0[%arg4, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg1[%17, %16] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = "accv.bin_op"(%18, %19) {predicate = 2 : i64} : (f32, f32) -> f32 + %21 = load %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = "accv.bin_op"(%21, %20) {predicate = 0 : i64} : (f32, f32) -> f32 + store %22, %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %23 = load %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %23, %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %25 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %26 = load %arg0[%arg4, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg1[%25, %24] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = "accv.bin_op"(%26, %27) {predicate = 2 : i64} : (f32, f32) -> f32 + %29 = load %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %30 = "accv.bin_op"(%29, %28) {predicate = 0 : i64} : (f32, f32) -> f32 + store %30, %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = load %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %31, %arg2[%arg4, 
%24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %33 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %34 = load %arg0[%arg4, %33] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg1[%33, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = "accv.bin_op"(%34, %35) {predicate = 2 : i64} : (f32, f32) -> f32 + %37 = load %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %38 = "accv.bin_op"(%37, %36) {predicate = 0 : i64} : (f32, f32) -> f32 + store %38, %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = load %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %39, %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %41 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %42 = load %arg0[%arg4, %41] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %43 = load %arg1[%41, %40] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = "accv.bin_op"(%42, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = load %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = "accv.bin_op"(%45, %44) {predicate = 0 : i64} : (f32, f32) -> f32 + store %46, %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %47 = load %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %47, %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %49 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %50 = load %arg0[%arg4, %49] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %51 = load %arg1[%49, %48] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = "accv.bin_op"(%50, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = load %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %54 = "accv.bin_op"(%53, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + store %54, %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = load %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %55, %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %57 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %58 = load %arg0[%arg4, %57] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%57, %56] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = "accv.bin_op"(%58, %59) {predicate = 2 : i64} : (f32, f32) -> f32 + %61 = load %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = "accv.bin_op"(%61, %60) {predicate = 0 : i64} : (f32, f32) -> f32 + store %62, %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %65 
= affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %66 = load %arg0[%arg4, %65] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %67 = load %arg1[%65, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %68 = "accv.bin_op"(%66, %67) {predicate = 2 : i64} : (f32, f32) -> f32 + %69 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = "accv.bin_op"(%69, %68) {predicate = 0 : i64} : (f32, f32) -> f32 + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %71, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %72 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %73 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %74 = load %arg0[%arg4, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = "accv.bin_op"(%74, %75) {predicate = 2 : i64} : (f32, f32) -> f32 + %77 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = "accv.bin_op"(%77, %76) {predicate = 0 : i64} : (f32, f32) -> f32 + store %78, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %81 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %82 = load %arg0[%arg4, %81] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %83 = load %arg1[%81, %80] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = "accv.bin_op"(%82, %83) {predicate = 2 : i64} : (f32, f32) -> f32 + %85 = load %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %86 = "accv.bin_op"(%85, %84) {predicate = 0 : i64} : (f32, f32) -> f32 + store %86, %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = load %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %87, %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %89 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %90 = load %arg0[%arg4, %89] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %91 = load %arg1[%89, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = "accv.bin_op"(%90, %91) {predicate = 2 : i64} : (f32, f32) -> f32 + %93 = load %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = "accv.bin_op"(%93, %92) {predicate = 0 : i64} : (f32, f32) -> f32 + store %94, %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = load %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %95, %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %97 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %98 = load %arg0[%arg4, %97] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 
* 128 + d1)>> + %99 = load %arg1[%97, %96] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %100 = "accv.bin_op"(%98, %99) {predicate = 2 : i64} : (f32, f32) -> f32 + %101 = load %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = "accv.bin_op"(%101, %100) {predicate = 0 : i64} : (f32, f32) -> f32 + store %102, %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = load %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %103, %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %106 = load %arg0[%arg4, %105] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %107 = load %arg1[%105, %104] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %108 = "accv.bin_op"(%106, %107) {predicate = 2 : i64} : (f32, f32) -> f32 + %109 = load %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %110 = "accv.bin_op"(%109, %108) {predicate = 0 : i64} : (f32, f32) -> f32 + store %110, %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = load %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %111, %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %113 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %114 = load %arg0[%arg4, %113] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%113, %112] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = "accv.bin_op"(%114, %115) {predicate = 2 : i64} : (f32, f32) -> f32 + %117 = load %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = "accv.bin_op"(%117, %116) {predicate = 0 : i64} : (f32, f32) -> f32 + store %118, %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %121 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %122 = load %arg0[%arg4, %121] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg1[%121, %120] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = "accv.bin_op"(%122, %123) {predicate = 2 : i64} : (f32, f32) -> f32 + %125 = load %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = "accv.bin_op"(%125, %124) {predicate = 0 : i64} : (f32, f32) -> f32 + store %126, %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = load %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %127, %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %129 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %130 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %131 = load %arg0[%128, %130] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + 
%132 = load %arg1[%130, %129] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = "accv.bin_op"(%131, %132) {predicate = 2 : i64} : (f32, f32) -> f32 + %134 = load %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = "accv.bin_op"(%134, %133) {predicate = 0 : i64} : (f32, f32) -> f32 + store %135, %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %138 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %139 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %140 = load %arg0[%137, %139] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %141 = load %arg1[%139, %138] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = "accv.bin_op"(%140, %141) {predicate = 2 : i64} : (f32, f32) -> f32 + %143 = load %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = "accv.bin_op"(%143, %142) {predicate = 0 : i64} : (f32, f32) -> f32 + store %144, %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = load %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %145, %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %147 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %148 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %149 = load %arg0[%146, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%148, %147] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = "accv.bin_op"(%149, %150) {predicate = 2 : i64} : (f32, f32) -> f32 + %152 = load %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = "accv.bin_op"(%152, %151) {predicate = 0 : i64} : (f32, f32) -> f32 + store %153, %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %156 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %157 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %158 = load %arg0[%155, %157] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg1[%157, %156] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = "accv.bin_op"(%158, %159) {predicate = 2 : i64} : (f32, f32) -> f32 + %161 = load %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = "accv.bin_op"(%161, %160) {predicate = 0 : i64} : (f32, f32) -> f32 + store %162, %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = load %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %163, %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %165 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %166 = 
affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %167 = load %arg0[%164, %166] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%166, %165] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = "accv.bin_op"(%167, %168) {predicate = 2 : i64} : (f32, f32) -> f32 + %170 = load %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = "accv.bin_op"(%170, %169) {predicate = 0 : i64} : (f32, f32) -> f32 + store %171, %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %174 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %175 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %176 = load %arg0[%173, %175] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %177 = load %arg1[%175, %174] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = "accv.bin_op"(%176, %177) {predicate = 2 : i64} : (f32, f32) -> f32 + %179 = load %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = "accv.bin_op"(%179, %178) {predicate = 0 : i64} : (f32, f32) -> f32 + store %180, %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = load %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %181, %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %183 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %184 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %185 = load %arg0[%182, %184] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%184, %183] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = "accv.bin_op"(%185, %186) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = load %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = "accv.bin_op"(%188, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + store %189, %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %192 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %193 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %194 = load %arg0[%191, %193] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %195 = load %arg1[%193, %192] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = "accv.bin_op"(%194, %195) {predicate = 2 : i64} : (f32, f32) -> f32 + %197 = load %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = "accv.bin_op"(%197, %196) {predicate = 0 : i64} : (f32, f32) -> f32 + store %198, %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = load %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %199, %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 
+ d1)>> + %200 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %201 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %202 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %203 = load %arg0[%200, %202] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%202, %201] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = "accv.bin_op"(%203, %204) {predicate = 2 : i64} : (f32, f32) -> f32 + %206 = load %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = "accv.bin_op"(%206, %205) {predicate = 0 : i64} : (f32, f32) -> f32 + store %207, %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %210 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %211 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %212 = load %arg0[%209, %211] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %213 = load %arg1[%211, %210] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = "accv.bin_op"(%212, %213) {predicate = 2 : i64} : (f32, f32) -> f32 + %215 = load %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = "accv.bin_op"(%215, %214) {predicate = 0 : i64} : (f32, f32) -> f32 + store %216, %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %217, %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %218 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %219 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %220 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %221 = load %arg0[%218, %220] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%220, %219] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = "accv.bin_op"(%221, %222) {predicate = 2 : i64} : (f32, f32) -> f32 + %224 = load %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = "accv.bin_op"(%224, %223) {predicate = 0 : i64} : (f32, f32) -> f32 + store %225, %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %228 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %229 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %230 = load %arg0[%227, %229] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %231 = load %arg1[%229, %228] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = "accv.bin_op"(%230, %231) {predicate = 2 : i64} : (f32, f32) -> f32 + %233 = load %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = "accv.bin_op"(%233, %232) {predicate = 0 : i64} : (f32, f32) -> f32 + store %234, %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg2[%227, 
%228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %235, %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %236 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %237 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %238 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %239 = load %arg0[%236, %238] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%238, %237] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = "accv.bin_op"(%239, %240) {predicate = 2 : i64} : (f32, f32) -> f32 + %242 = load %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = "accv.bin_op"(%242, %241) {predicate = 0 : i64} : (f32, f32) -> f32 + store %243, %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %246 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %247 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %248 = load %arg0[%245, %247] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %249 = load %arg1[%247, %246] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = "accv.bin_op"(%248, %249) {predicate = 2 : i64} : (f32, f32) -> f32 + %251 = load %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = "accv.bin_op"(%251, %250) {predicate = 0 : i64} : (f32, f32) -> f32 + store %252, %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %253, %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %254 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %255 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %256 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %257 = load %arg0[%254, %256] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%256, %255] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = "accv.bin_op"(%257, %258) {predicate = 2 : i64} : (f32, f32) -> f32 + %260 = load %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = "accv.bin_op"(%260, %259) {predicate = 0 : i64} : (f32, f32) -> f32 + store %261, %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %264 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %265 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %266 = load %arg0[%263, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = "accv.bin_op"(%266, %267) {predicate = 2 : i64} : (f32, f32) -> f32 + %269 = load %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = "accv.bin_op"(%269, %268) 
{predicate = 0 : i64} : (f32, f32) -> f32 + store %270, %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %272 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %273 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %274 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %275 = load %arg0[%272, %274] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%274, %273] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = "accv.bin_op"(%275, %276) {predicate = 2 : i64} : (f32, f32) -> f32 + %278 = load %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = "accv.bin_op"(%278, %277) {predicate = 0 : i64} : (f32, f32) -> f32 + store %279, %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %282 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %283 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %284 = load %arg0[%281, %283] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %285 = load %arg1[%283, %282] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = "accv.bin_op"(%284, %285) {predicate = 2 : i64} : (f32, f32) -> f32 + %287 = load %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = "accv.bin_op"(%287, %286) {predicate = 0 : i64} : (f32, f32) -> f32 + store %288, %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %289, %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %291 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %292 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %293 = load %arg0[%290, %292] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%292, %291] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = "accv.bin_op"(%293, %294) {predicate = 2 : i64} : (f32, f32) -> f32 + %296 = load %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = "accv.bin_op"(%296, %295) {predicate = 0 : i64} : (f32, f32) -> f32 + store %297, %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %300 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %301 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %302 = load %arg0[%299, %301] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %303 = load %arg1[%301, %300] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = "accv.bin_op"(%302, %303) {predicate = 2 : i64} : 
(f32, f32) -> f32 + %305 = load %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = "accv.bin_op"(%305, %304) {predicate = 0 : i64} : (f32, f32) -> f32 + store %306, %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = load %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %307, %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %309 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %310 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %311 = load %arg0[%308, %310] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%310, %309] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = "accv.bin_op"(%311, %312) {predicate = 2 : i64} : (f32, f32) -> f32 + %314 = load %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = "accv.bin_op"(%314, %313) {predicate = 0 : i64} : (f32, f32) -> f32 + store %315, %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %318 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %319 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %320 = load %arg0[%317, %319] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%319, %318] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = "accv.bin_op"(%320, %321) {predicate = 2 : i64} : (f32, f32) -> f32 + %323 = load %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = "accv.bin_op"(%323, %322) {predicate = 0 : i64} : (f32, f32) -> f32 + store %324, %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %327 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %328 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %329 = load %arg0[%326, %328] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%328, %327] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = "accv.bin_op"(%329, %330) {predicate = 2 : i64} : (f32, f32) -> f32 + %332 = load %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = "accv.bin_op"(%332, %331) {predicate = 0 : i64} : (f32, f32) -> f32 + store %333, %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %336 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %337 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %338 = load %arg0[%335, %337] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + 
%339 = load %arg1[%337, %336] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = "accv.bin_op"(%338, %339) {predicate = 2 : i64} : (f32, f32) -> f32 + %341 = load %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = "accv.bin_op"(%341, %340) {predicate = 0 : i64} : (f32, f32) -> f32 + store %342, %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %345 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %346 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %347 = load %arg0[%344, %346] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%346, %345] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = "accv.bin_op"(%347, %348) {predicate = 2 : i64} : (f32, f32) -> f32 + %350 = load %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = "accv.bin_op"(%350, %349) {predicate = 0 : i64} : (f32, f32) -> f32 + store %351, %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %354 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %355 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %356 = load %arg0[%353, %355] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%355, %354] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = "accv.bin_op"(%356, %357) {predicate = 2 : i64} : (f32, f32) -> f32 + %359 = load %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = "accv.bin_op"(%359, %358) {predicate = 0 : i64} : (f32, f32) -> f32 + store %360, %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %363 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %364 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %365 = load %arg0[%362, %364] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%364, %363] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = "accv.bin_op"(%365, %366) {predicate = 2 : i64} : (f32, f32) -> f32 + %368 = load %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = "accv.bin_op"(%368, %367) {predicate = 0 : i64} : (f32, f32) -> f32 + store %369, %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %372 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %373 = 
affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %374 = load %arg0[%371, %373] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%373, %372] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = "accv.bin_op"(%374, %375) {predicate = 2 : i64} : (f32, f32) -> f32 + %377 = load %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = "accv.bin_op"(%377, %376) {predicate = 0 : i64} : (f32, f32) -> f32 + store %378, %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %381 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %382 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %383 = load %arg0[%380, %382] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%382, %381] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = "accv.bin_op"(%383, %384) {predicate = 2 : i64} : (f32, f32) -> f32 + %386 = load %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = "accv.bin_op"(%386, %385) {predicate = 0 : i64} : (f32, f32) -> f32 + store %387, %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %390 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %391 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %392 = load %arg0[%389, %391] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%391, %390] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = "accv.bin_op"(%392, %393) {predicate = 2 : i64} : (f32, f32) -> f32 + %395 = load %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = "accv.bin_op"(%395, %394) {predicate = 0 : i64} : (f32, f32) -> f32 + store %396, %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %399 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %400 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %401 = load %arg0[%398, %400] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %402 = load %arg1[%400, %399] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = "accv.bin_op"(%401, %402) {predicate = 2 : i64} : (f32, f32) -> f32 + %404 = load %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %405 = "accv.bin_op"(%404, %403) {predicate = 0 : i64} : (f32, f32) -> f32 + store %405, %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %406 = load %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %406, %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %407 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %408 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %409 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %410 = load %arg0[%407, %409] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %411 = load %arg1[%409, %408] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %412 = "accv.bin_op"(%410, %411) {predicate = 2 : i64} : (f32, f32) -> f32 + %413 = load %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %414 = "accv.bin_op"(%413, %412) {predicate = 0 : i64} : (f32, f32) -> f32 + store %414, %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = load %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %415, %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %417 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %418 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %419 = load %arg0[%416, %418] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %420 = load %arg1[%418, %417] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = "accv.bin_op"(%419, %420) {predicate = 2 : i64} : (f32, f32) -> f32 + %422 = load %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = "accv.bin_op"(%422, %421) {predicate = 0 : i64} : (f32, f32) -> f32 + store %423, %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %424 = load %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %424, %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %426 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %427 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %428 = load %arg0[%425, %427] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %429 = load %arg1[%427, %426] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = "accv.bin_op"(%428, %429) {predicate = 2 : i64} : (f32, f32) -> f32 + %431 = load %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %432 = "accv.bin_op"(%431, %430) {predicate = 0 : i64} : (f32, f32) -> f32 + store %432, %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = load %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %433, %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %435 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %436 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %437 = load %arg0[%434, %436] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %438 = load %arg1[%436, %435] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = "accv.bin_op"(%437, %438) {predicate = 2 : i64} : (f32, f32) -> f32 + %440 = load %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = "accv.bin_op"(%440, %439) {predicate = 0 : i64} : (f32, f32) -> f32 + store %441, %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %442 = load %arg2[%434, %435] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %442, %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %444 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %445 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %446 = load %arg0[%443, %445] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %447 = load %arg1[%445, %444] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %448 = "accv.bin_op"(%446, %447) {predicate = 2 : i64} : (f32, f32) -> f32 + %449 = load %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %450 = "accv.bin_op"(%449, %448) {predicate = 0 : i64} : (f32, f32) -> f32 + store %450, %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = load %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %451, %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %453 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %454 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %455 = load %arg0[%452, %454] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %456 = load %arg1[%454, %453] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = "accv.bin_op"(%455, %456) {predicate = 2 : i64} : (f32, f32) -> f32 + %458 = load %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = "accv.bin_op"(%458, %457) {predicate = 0 : i64} : (f32, f32) -> f32 + store %459, %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = load %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %460, %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %462 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %463 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %464 = load %arg0[%461, %463] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %465 = load %arg1[%463, %462] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %466 = "accv.bin_op"(%464, %465) {predicate = 2 : i64} : (f32, f32) -> f32 + %467 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = "accv.bin_op"(%467, %466) {predicate = 0 : i64} : (f32, f32) -> f32 + store %468, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %469, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %471 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %472 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %473 = load %arg0[%470, %472] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %474 = load %arg1[%472, %471] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = "accv.bin_op"(%473, %474) {predicate = 2 : i64} : (f32, f32) -> f32 + %476 = load %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = "accv.bin_op"(%476, %475) {predicate = 0 : 
i64} : (f32, f32) -> f32 + store %477, %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = load %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %478, %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %480 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %481 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %482 = load %arg0[%479, %481] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %483 = load %arg1[%481, %480] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %484 = "accv.bin_op"(%482, %483) {predicate = 2 : i64} : (f32, f32) -> f32 + %485 = load %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = "accv.bin_op"(%485, %484) {predicate = 0 : i64} : (f32, f32) -> f32 + store %486, %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = load %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %487, %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %489 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %490 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %491 = load %arg0[%488, %490] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %492 = load %arg1[%490, %489] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = "accv.bin_op"(%491, %492) {predicate = 2 : i64} : (f32, f32) -> f32 + %494 = load %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = "accv.bin_op"(%494, %493) {predicate = 0 : i64} : (f32, f32) -> f32 + store %495, %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = load %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %496, %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %498 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %499 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %500 = load %arg0[%497, %499] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %501 = load %arg1[%499, %498] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %502 = "accv.bin_op"(%500, %501) {predicate = 2 : i64} : (f32, f32) -> f32 + %503 = load %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = "accv.bin_op"(%503, %502) {predicate = 0 : i64} : (f32, f32) -> f32 + store %504, %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %505 = load %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %505, %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %507 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %508 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %509 = load %arg0[%506, %508] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %510 = load %arg1[%508, %507] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %511 = "accv.bin_op"(%509, %510) {predicate = 2 : i64} : (f32, f32) -> 
f32 + %512 = load %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = "accv.bin_op"(%512, %511) {predicate = 0 : i64} : (f32, f32) -> f32 + store %513, %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %514, %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %515 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %516 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %517 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %518 = load %arg0[%515, %517] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %519 = load %arg1[%517, %516] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = "accv.bin_op"(%518, %519) {predicate = 2 : i64} : (f32, f32) -> f32 + %521 = load %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = "accv.bin_op"(%521, %520) {predicate = 0 : i64} : (f32, f32) -> f32 + store %522, %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %523 = load %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %523, %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %525 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %526 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %527 = load %arg0[%524, %526] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %528 = load %arg1[%526, %525] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %529 = "accv.bin_op"(%527, %528) {predicate = 2 : i64} : (f32, f32) -> f32 + %530 = load %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = "accv.bin_op"(%530, %529) {predicate = 0 : i64} : (f32, f32) -> f32 + store %531, %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %532, %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %533 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %534 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %535 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %536 = load %arg0[%533, %535] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %537 = load %arg1[%535, %534] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = "accv.bin_op"(%536, %537) {predicate = 2 : i64} : (f32, f32) -> f32 + %539 = load %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = "accv.bin_op"(%539, %538) {predicate = 0 : i64} : (f32, f32) -> f32 + store %540, %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %541 = load %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %541, %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %543 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %544 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %545 = load %arg0[%542, %544] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %546 = load 
%arg1[%544, %543] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %547 = "accv.bin_op"(%545, %546) {predicate = 2 : i64} : (f32, f32) -> f32 + %548 = load %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = "accv.bin_op"(%548, %547) {predicate = 0 : i64} : (f32, f32) -> f32 + store %549, %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %550, %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %551 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %552 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %553 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %554 = load %arg0[%551, %553] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %555 = load %arg1[%553, %552] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = "accv.bin_op"(%554, %555) {predicate = 2 : i64} : (f32, f32) -> f32 + %557 = load %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = "accv.bin_op"(%557, %556) {predicate = 0 : i64} : (f32, f32) -> f32 + store %558, %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = load %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %559, %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %561 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %562 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %563 = load %arg0[%560, %562] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %564 = load %arg1[%562, %561] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %565 = "accv.bin_op"(%563, %564) {predicate = 2 : i64} : (f32, f32) -> f32 + %566 = load %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = "accv.bin_op"(%566, %565) {predicate = 0 : i64} : (f32, f32) -> f32 + store %567, %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %568, %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %569 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %570 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %571 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %572 = load %arg0[%569, %571] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %573 = load %arg1[%571, %570] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = "accv.bin_op"(%572, %573) {predicate = 2 : i64} : (f32, f32) -> f32 + %575 = load %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = "accv.bin_op"(%575, %574) {predicate = 0 : i64} : (f32, f32) -> f32 + store %576, %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %577 = load %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %577, %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %579 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %580 = affine.apply 
affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %581 = load %arg0[%578, %580] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %582 = load %arg1[%580, %579] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %583 = "accv.bin_op"(%581, %582) {predicate = 2 : i64} : (f32, f32) -> f32 + %584 = load %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = "accv.bin_op"(%584, %583) {predicate = 0 : i64} : (f32, f32) -> f32 + store %585, %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %586, %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %587 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %588 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %589 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %590 = load %arg0[%587, %589] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %591 = load %arg1[%589, %588] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = "accv.bin_op"(%590, %591) {predicate = 2 : i64} : (f32, f32) -> f32 + %593 = load %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = "accv.bin_op"(%593, %592) {predicate = 0 : i64} : (f32, f32) -> f32 + store %594, %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %595 = load %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %595, %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %597 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %598 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %599 = load %arg0[%596, %598] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %600 = load %arg1[%598, %597] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %601 = "accv.bin_op"(%599, %600) {predicate = 2 : i64} : (f32, f32) -> f32 + %602 = load %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %603 = "accv.bin_op"(%602, %601) {predicate = 0 : i64} : (f32, f32) -> f32 + store %603, %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %604 = load %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %604, %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %605 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %606 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %607 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %608 = load %arg0[%605, %607] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %609 = load %arg1[%607, %606] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %610 = "accv.bin_op"(%608, %609) {predicate = 2 : i64} : (f32, f32) -> f32 + %611 = load %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %612 = "accv.bin_op"(%611, %610) {predicate = 0 : i64} : (f32, f32) -> f32 + store %612, %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %613 = load %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %613, %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%614 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %615 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %616 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %617 = load %arg0[%614, %616] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %618 = load %arg1[%616, %615] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %619 = "accv.bin_op"(%617, %618) {predicate = 2 : i64} : (f32, f32) -> f32 + %620 = load %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %621 = "accv.bin_op"(%620, %619) {predicate = 0 : i64} : (f32, f32) -> f32 + store %621, %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %622 = load %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %622, %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %623 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %624 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %625 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %626 = load %arg0[%623, %625] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %627 = load %arg1[%625, %624] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %628 = "accv.bin_op"(%626, %627) {predicate = 2 : i64} : (f32, f32) -> f32 + %629 = load %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %630 = "accv.bin_op"(%629, %628) {predicate = 0 : i64} : (f32, f32) -> f32 + store %630, %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %631 = load %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %631, %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %632 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %633 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %634 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %635 = load %arg0[%632, %634] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %636 = load %arg1[%634, %633] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %637 = "accv.bin_op"(%635, %636) {predicate = 2 : i64} : (f32, f32) -> f32 + %638 = load %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %639 = "accv.bin_op"(%638, %637) {predicate = 0 : i64} : (f32, f32) -> f32 + store %639, %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %640 = load %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %640, %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %641 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %642 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %643 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %644 = load %arg0[%641, %643] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %645 = load %arg1[%643, %642] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %646 = "accv.bin_op"(%644, %645) {predicate = 2 : i64} : (f32, f32) -> f32 + %647 = load %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %648 = "accv.bin_op"(%647, %646) {predicate = 0 : i64} : (f32, f32) -> f32 + store %648, %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %649 = load %arg2[%641, %642] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %649, %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %650 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %651 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %652 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %653 = load %arg0[%650, %652] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %654 = load %arg1[%652, %651] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %655 = "accv.bin_op"(%653, %654) {predicate = 2 : i64} : (f32, f32) -> f32 + %656 = load %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %657 = "accv.bin_op"(%656, %655) {predicate = 0 : i64} : (f32, f32) -> f32 + store %657, %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %658 = load %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %658, %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %659 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %660 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %661 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %662 = load %arg0[%659, %661] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %663 = load %arg1[%661, %660] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %664 = "accv.bin_op"(%662, %663) {predicate = 2 : i64} : (f32, f32) -> f32 + %665 = load %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %666 = "accv.bin_op"(%665, %664) {predicate = 0 : i64} : (f32, f32) -> f32 + store %666, %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %667 = load %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %667, %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %668 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %669 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %670 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %671 = load %arg0[%668, %670] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %672 = load %arg1[%670, %669] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %673 = "accv.bin_op"(%671, %672) {predicate = 2 : i64} : (f32, f32) -> f32 + %674 = load %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %675 = "accv.bin_op"(%674, %673) {predicate = 0 : i64} : (f32, f32) -> f32 + store %675, %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %676 = load %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %676, %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %677 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %678 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %679 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %680 = load %arg0[%677, %679] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %681 = load %arg1[%679, %678] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %682 = "accv.bin_op"(%680, %681) {predicate = 2 : i64} : (f32, f32) -> f32 + %683 = load %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %684 = "accv.bin_op"(%683, %682) {predicate = 0 : 
i64} : (f32, f32) -> f32 + store %684, %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %685 = load %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %685, %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %686 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %687 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %688 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %689 = load %arg0[%686, %688] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %690 = load %arg1[%688, %687] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %691 = "accv.bin_op"(%689, %690) {predicate = 2 : i64} : (f32, f32) -> f32 + %692 = load %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %693 = "accv.bin_op"(%692, %691) {predicate = 0 : i64} : (f32, f32) -> f32 + store %693, %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %694 = load %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %694, %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %695 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %696 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %697 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %698 = load %arg0[%695, %697] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %699 = load %arg1[%697, %696] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %700 = "accv.bin_op"(%698, %699) {predicate = 2 : i64} : (f32, f32) -> f32 + %701 = load %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %702 = "accv.bin_op"(%701, %700) {predicate = 0 : i64} : (f32, f32) -> f32 + store %702, %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %703 = load %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %703, %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %704 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %705 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %706 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %707 = load %arg0[%704, %706] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %708 = load %arg1[%706, %705] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %709 = "accv.bin_op"(%707, %708) {predicate = 2 : i64} : (f32, f32) -> f32 + %710 = load %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %711 = "accv.bin_op"(%710, %709) {predicate = 0 : i64} : (f32, f32) -> f32 + store %711, %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %712 = load %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %712, %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %713 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %714 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %715 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %716 = load %arg0[%713, %715] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %717 = load %arg1[%715, %714] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %718 = "accv.bin_op"(%716, %717) {predicate = 2 : i64} : (f32, f32) -> f32 
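+ // Each group of ops in this unrolled kernel repeats the same sequence: two affine.apply ops form the row offset (i + 0..5) and column offset (j + 0..15), A[row, k] and B[k, col] are loaded, "accv.bin_op" with predicate = 2 (which appears to be multiply) forms the product, the current C[row, col] is loaded, "accv.bin_op" with predicate = 0 (which appears to be add) accumulates it, and the sum is stored back -- i.e. C[row, col] += A[row, k] * B[k, col] for every unrolled (row, col) pair.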
+ %719 = load %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %720 = "accv.bin_op"(%719, %718) {predicate = 0 : i64} : (f32, f32) -> f32 + store %720, %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %721 = load %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %721, %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %722 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %723 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %724 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %725 = load %arg0[%722, %724] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %726 = load %arg1[%724, %723] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %727 = "accv.bin_op"(%725, %726) {predicate = 2 : i64} : (f32, f32) -> f32 + %728 = load %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %729 = "accv.bin_op"(%728, %727) {predicate = 0 : i64} : (f32, f32) -> f32 + store %729, %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %730 = load %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %730, %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %731 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %732 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %733 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %734 = load %arg0[%731, %733] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %735 = load %arg1[%733, %732] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %736 = "accv.bin_op"(%734, %735) {predicate = 2 : i64} : (f32, f32) -> f32 + %737 = load %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %738 = "accv.bin_op"(%737, %736) {predicate = 0 : i64} : (f32, f32) -> f32 + store %738, %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %739 = load %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %739, %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %740 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %741 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %742 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %743 = load %arg0[%740, %742] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %744 = load %arg1[%742, %741] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %745 = "accv.bin_op"(%743, %744) {predicate = 2 : i64} : (f32, f32) -> f32 + %746 = load %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %747 = "accv.bin_op"(%746, %745) {predicate = 0 : i64} : (f32, f32) -> f32 + store %747, %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %748 = load %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %748, %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %749 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %750 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %751 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %752 = load %arg0[%749, %751] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %753 = load 
%arg1[%751, %750] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %754 = "accv.bin_op"(%752, %753) {predicate = 2 : i64} : (f32, f32) -> f32 + %755 = load %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %756 = "accv.bin_op"(%755, %754) {predicate = 0 : i64} : (f32, f32) -> f32 + store %756, %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %757 = load %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %757, %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %758 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %759 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %760 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %761 = load %arg0[%758, %760] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %762 = load %arg1[%760, %759] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %763 = "accv.bin_op"(%761, %762) {predicate = 2 : i64} : (f32, f32) -> f32 + %764 = load %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %765 = "accv.bin_op"(%764, %763) {predicate = 0 : i64} : (f32, f32) -> f32 + store %765, %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %766 = load %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %766, %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %767 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %768 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %769 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %770 = load %arg0[%767, %769] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %771 = load %arg1[%769, %768] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %772 = "accv.bin_op"(%770, %771) {predicate = 2 : i64} : (f32, f32) -> f32 + %773 = load %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %774 = "accv.bin_op"(%773, %772) {predicate = 0 : i64} : (f32, f32) -> f32 + store %774, %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %775 = load %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %775, %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %776 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %777 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %778 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %779 = load %arg0[%776, %778] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %780 = load %arg1[%778, %777] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %781 = "accv.bin_op"(%779, %780) {predicate = 2 : i64} : (f32, f32) -> f32 + %782 = load %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %783 = "accv.bin_op"(%782, %781) {predicate = 0 : i64} : (f32, f32) -> f32 + store %783, %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %784 = load %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %784, %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %785 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %786 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %787 = affine.apply 
affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %788 = load %arg0[%785, %787] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %789 = load %arg1[%787, %786] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %790 = "accv.bin_op"(%788, %789) {predicate = 2 : i64} : (f32, f32) -> f32 + %791 = load %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %792 = "accv.bin_op"(%791, %790) {predicate = 0 : i64} : (f32, f32) -> f32 + store %792, %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %793 = load %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %793, %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %794 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %795 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %796 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %797 = load %arg0[%794, %796] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %798 = load %arg1[%796, %795] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %799 = "accv.bin_op"(%797, %798) {predicate = 2 : i64} : (f32, f32) -> f32 + %800 = load %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %801 = "accv.bin_op"(%800, %799) {predicate = 0 : i64} : (f32, f32) -> f32 + store %801, %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %802 = load %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %802, %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %803 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %804 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %805 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %806 = load %arg0[%803, %805] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %807 = load %arg1[%805, %804] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %808 = "accv.bin_op"(%806, %807) {predicate = 2 : i64} : (f32, f32) -> f32 + %809 = load %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %810 = "accv.bin_op"(%809, %808) {predicate = 0 : i64} : (f32, f32) -> f32 + store %810, %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %811 = load %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %811, %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %812 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %813 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %814 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %815 = load %arg0[%812, %814] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %816 = load %arg1[%814, %813] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %817 = "accv.bin_op"(%815, %816) {predicate = 2 : i64} : (f32, f32) -> f32 + %818 = load %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %819 = "accv.bin_op"(%818, %817) {predicate = 0 : i64} : (f32, f32) -> f32 + store %819, %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %820 = load %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %820, %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%821 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %822 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %823 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %824 = load %arg0[%821, %823] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %825 = load %arg1[%823, %822] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %826 = "accv.bin_op"(%824, %825) {predicate = 2 : i64} : (f32, f32) -> f32 + %827 = load %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %828 = "accv.bin_op"(%827, %826) {predicate = 0 : i64} : (f32, f32) -> f32 + store %828, %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %829 = load %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %829, %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %830 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %831 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %832 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %833 = load %arg0[%830, %832] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %834 = load %arg1[%832, %831] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %835 = "accv.bin_op"(%833, %834) {predicate = 2 : i64} : (f32, f32) -> f32 + %836 = load %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %837 = "accv.bin_op"(%836, %835) {predicate = 0 : i64} : (f32, f32) -> f32 + store %837, %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %838 = load %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %838, %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %839 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %840 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %841 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %842 = load %arg0[%839, %841] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %843 = load %arg1[%841, %840] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %844 = "accv.bin_op"(%842, %843) {predicate = 2 : i64} : (f32, f32) -> f32 + %845 = load %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %846 = "accv.bin_op"(%845, %844) {predicate = 0 : i64} : (f32, f32) -> f32 + store %846, %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %847 = load %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %847, %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_4,14}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_3,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_3,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 128]} + } {begin = 0 : i64, end = 780 : i64, index = #accln<"index{i_1,11}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 256, 128]} + affine.for 
%arg4 = 0 to 256 step 16 { + affine.for %arg5 = 0 to 128 step 4 { + affine.for %arg6 = 0 to 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = "accv.bin_op"(%2, %3) {predicate = 2 : i64} : (f32, f32) -> f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = "accv.bin_op"(%5, %4) {predicate = 0 : i64} : (f32, f32) -> f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %10 = load %arg0[%c780, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg1[%9, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %12 = "accv.bin_op"(%10, %11) {predicate = 2 : i64} : (f32, f32) -> f32 + %13 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = "accv.bin_op"(%13, %12) {predicate = 0 : i64} : (f32, f32) -> f32 + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %15, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %18 = load %arg0[%c780, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg1[%17, %16] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = "accv.bin_op"(%18, %19) {predicate = 2 : i64} : (f32, f32) -> f32 + %21 = load %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = "accv.bin_op"(%21, %20) {predicate = 0 : i64} : (f32, f32) -> f32 + store %22, %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %23 = load %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %23, %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %25 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %26 = load %arg0[%c780, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg1[%25, %24] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = "accv.bin_op"(%26, %27) {predicate = 2 : i64} : (f32, f32) -> f32 + %29 = load %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %30 = "accv.bin_op"(%29, %28) {predicate = 0 : i64} : (f32, f32) -> f32 + store %30, %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = load %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %31, %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %33 = 
affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %34 = load %arg0[%c780, %33] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg1[%33, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = "accv.bin_op"(%34, %35) {predicate = 2 : i64} : (f32, f32) -> f32 + %37 = load %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %38 = "accv.bin_op"(%37, %36) {predicate = 0 : i64} : (f32, f32) -> f32 + store %38, %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = load %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %39, %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %41 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %42 = load %arg0[%c780, %41] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %43 = load %arg1[%41, %40] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = "accv.bin_op"(%42, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = load %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = "accv.bin_op"(%45, %44) {predicate = 0 : i64} : (f32, f32) -> f32 + store %46, %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %47 = load %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %47, %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %49 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %50 = load %arg0[%c780, %49] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %51 = load %arg1[%49, %48] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = "accv.bin_op"(%50, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = load %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %54 = "accv.bin_op"(%53, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + store %54, %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = load %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %55, %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %57 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %58 = load %arg0[%c780, %57] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%57, %56] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = "accv.bin_op"(%58, %59) {predicate = 2 : i64} : (f32, f32) -> f32 + %61 = load %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = "accv.bin_op"(%61, %60) {predicate = 0 : i64} : (f32, f32) -> f32 + store %62, %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %65 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %66 = load %arg0[%c780, %65] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 
+ d1)>> + %67 = load %arg1[%65, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %68 = "accv.bin_op"(%66, %67) {predicate = 2 : i64} : (f32, f32) -> f32 + %69 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = "accv.bin_op"(%69, %68) {predicate = 0 : i64} : (f32, f32) -> f32 + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %71, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %72 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %73 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %74 = load %arg0[%c780, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = "accv.bin_op"(%74, %75) {predicate = 2 : i64} : (f32, f32) -> f32 + %77 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = "accv.bin_op"(%77, %76) {predicate = 0 : i64} : (f32, f32) -> f32 + store %78, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %81 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %82 = load %arg0[%c780, %81] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %83 = load %arg1[%81, %80] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = "accv.bin_op"(%82, %83) {predicate = 2 : i64} : (f32, f32) -> f32 + %85 = load %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %86 = "accv.bin_op"(%85, %84) {predicate = 0 : i64} : (f32, f32) -> f32 + store %86, %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = load %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %87, %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %89 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %90 = load %arg0[%c780, %89] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %91 = load %arg1[%89, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = "accv.bin_op"(%90, %91) {predicate = 2 : i64} : (f32, f32) -> f32 + %93 = load %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = "accv.bin_op"(%93, %92) {predicate = 0 : i64} : (f32, f32) -> f32 + store %94, %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = load %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %95, %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %97 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %98 = load %arg0[%c780, %97] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %99 = load %arg1[%97, %96] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %100 = "accv.bin_op"(%98, %99) {predicate 
= 2 : i64} : (f32, f32) -> f32 + %101 = load %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = "accv.bin_op"(%101, %100) {predicate = 0 : i64} : (f32, f32) -> f32 + store %102, %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = load %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %103, %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %106 = load %arg0[%c780, %105] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %107 = load %arg1[%105, %104] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %108 = "accv.bin_op"(%106, %107) {predicate = 2 : i64} : (f32, f32) -> f32 + %109 = load %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %110 = "accv.bin_op"(%109, %108) {predicate = 0 : i64} : (f32, f32) -> f32 + store %110, %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = load %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %111, %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %113 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %114 = load %arg0[%c780, %113] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%113, %112] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = "accv.bin_op"(%114, %115) {predicate = 2 : i64} : (f32, f32) -> f32 + %117 = load %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = "accv.bin_op"(%117, %116) {predicate = 0 : i64} : (f32, f32) -> f32 + store %118, %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %121 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %122 = load %arg0[%c780, %121] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg1[%121, %120] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = "accv.bin_op"(%122, %123) {predicate = 2 : i64} : (f32, f32) -> f32 + %125 = load %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = "accv.bin_op"(%125, %124) {predicate = 0 : i64} : (f32, f32) -> f32 + store %126, %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = load %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %127, %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %129 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %130 = load %arg0[%c781, %129] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg1[%129, %128] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = "accv.bin_op"(%130, %131) {predicate = 2 : i64} : (f32, f32) -> f32 + %133 = load %arg2[%c781, %128] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = "accv.bin_op"(%133, %132) {predicate = 0 : i64} : (f32, f32) -> f32 + store %134, %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = load %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %135, %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %137 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %138 = load %arg0[%c781, %137] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%137, %136] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = "accv.bin_op"(%138, %139) {predicate = 2 : i64} : (f32, f32) -> f32 + %141 = load %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = "accv.bin_op"(%141, %140) {predicate = 0 : i64} : (f32, f32) -> f32 + store %142, %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %145 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %146 = load %arg0[%c781, %145] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %147 = load %arg1[%145, %144] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = "accv.bin_op"(%146, %147) {predicate = 2 : i64} : (f32, f32) -> f32 + %149 = load %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = "accv.bin_op"(%149, %148) {predicate = 0 : i64} : (f32, f32) -> f32 + store %150, %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = load %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %151, %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %153 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %154 = load %arg0[%c781, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg1[%153, %152] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = "accv.bin_op"(%154, %155) {predicate = 2 : i64} : (f32, f32) -> f32 + %157 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = "accv.bin_op"(%157, %156) {predicate = 0 : i64} : (f32, f32) -> f32 + store %158, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %159, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %161 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %162 = load %arg0[%c781, %161] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%161, %160] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = "accv.bin_op"(%162, %163) {predicate = 2 : i64} : (f32, f32) -> f32 + %165 = load %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%166 = "accv.bin_op"(%165, %164) {predicate = 0 : i64} : (f32, f32) -> f32 + store %166, %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %169 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %170 = load %arg0[%c781, %169] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %171 = load %arg1[%169, %168] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = "accv.bin_op"(%170, %171) {predicate = 2 : i64} : (f32, f32) -> f32 + %173 = load %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = "accv.bin_op"(%173, %172) {predicate = 0 : i64} : (f32, f32) -> f32 + store %174, %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = load %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %175, %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %177 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %178 = load %arg0[%c781, %177] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %179 = load %arg1[%177, %176] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = "accv.bin_op"(%178, %179) {predicate = 2 : i64} : (f32, f32) -> f32 + %181 = load %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = "accv.bin_op"(%181, %180) {predicate = 0 : i64} : (f32, f32) -> f32 + store %182, %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = load %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %183, %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %185 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %186 = load %arg0[%c781, %185] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%185, %184] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = "accv.bin_op"(%186, %187) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = load %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = "accv.bin_op"(%189, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + store %190, %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %193 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %194 = load %arg0[%c781, %193] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %195 = load %arg1[%193, %192] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = "accv.bin_op"(%194, %195) {predicate = 2 : i64} : (f32, f32) -> f32 + %197 = load %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = "accv.bin_op"(%197, %196) {predicate = 0 : i64} : (f32, 
f32) -> f32 + store %198, %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = load %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %199, %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %201 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %202 = load %arg0[%c781, %201] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %203 = load %arg1[%201, %200] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = "accv.bin_op"(%202, %203) {predicate = 2 : i64} : (f32, f32) -> f32 + %205 = load %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = "accv.bin_op"(%205, %204) {predicate = 0 : i64} : (f32, f32) -> f32 + store %206, %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = load %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %207, %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %209 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %210 = load %arg0[%c781, %209] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %211 = load %arg1[%209, %208] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = "accv.bin_op"(%210, %211) {predicate = 2 : i64} : (f32, f32) -> f32 + %213 = load %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = "accv.bin_op"(%213, %212) {predicate = 0 : i64} : (f32, f32) -> f32 + store %214, %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %215, %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %217 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %218 = load %arg0[%c781, %217] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %219 = load %arg1[%217, %216] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = "accv.bin_op"(%218, %219) {predicate = 2 : i64} : (f32, f32) -> f32 + %221 = load %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = "accv.bin_op"(%221, %220) {predicate = 0 : i64} : (f32, f32) -> f32 + store %222, %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %223, %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %224 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %225 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %226 = load %arg0[%c781, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = "accv.bin_op"(%226, %227) {predicate = 2 : i64} : (f32, f32) -> f32 + %229 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = "accv.bin_op"(%229, %228) {predicate = 0 : i64} : (f32, f32) -> f32 + store %230, %arg2[%c781, %224] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %233 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %234 = load %arg0[%c781, %233] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %235 = load %arg1[%233, %232] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %236 = "accv.bin_op"(%234, %235) {predicate = 2 : i64} : (f32, f32) -> f32 + %237 = load %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = "accv.bin_op"(%237, %236) {predicate = 0 : i64} : (f32, f32) -> f32 + store %238, %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %239, %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %241 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %242 = load %arg0[%c781, %241] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %243 = load %arg1[%241, %240] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = "accv.bin_op"(%242, %243) {predicate = 2 : i64} : (f32, f32) -> f32 + %245 = load %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = "accv.bin_op"(%245, %244) {predicate = 0 : i64} : (f32, f32) -> f32 + store %246, %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %247, %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %249 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %250 = load %arg0[%c781, %249] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %251 = load %arg1[%249, %248] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = "accv.bin_op"(%250, %251) {predicate = 2 : i64} : (f32, f32) -> f32 + %253 = load %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %254 = "accv.bin_op"(%253, %252) {predicate = 0 : i64} : (f32, f32) -> f32 + store %254, %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = load %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %255, %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %257 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %258 = load %arg0[%c782, %257] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %259 = load %arg1[%257, %256] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %260 = "accv.bin_op"(%258, %259) {predicate = 2 : i64} : (f32, f32) -> f32 + %261 = load %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = "accv.bin_op"(%261, %260) {predicate = 0 : i64} : (f32, f32) -> f32 + store %262, %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%263 = load %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %263, %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %265 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %266 = load %arg0[%c782, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = "accv.bin_op"(%266, %267) {predicate = 2 : i64} : (f32, f32) -> f32 + %269 = load %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = "accv.bin_op"(%269, %268) {predicate = 0 : i64} : (f32, f32) -> f32 + store %270, %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %272 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %273 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %274 = load %arg0[%c782, %273] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %275 = load %arg1[%273, %272] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = "accv.bin_op"(%274, %275) {predicate = 2 : i64} : (f32, f32) -> f32 + %277 = load %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %278 = "accv.bin_op"(%277, %276) {predicate = 0 : i64} : (f32, f32) -> f32 + store %278, %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = load %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %279, %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %281 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %282 = load %arg0[%c782, %281] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %283 = load %arg1[%281, %280] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %284 = "accv.bin_op"(%282, %283) {predicate = 2 : i64} : (f32, f32) -> f32 + %285 = load %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = "accv.bin_op"(%285, %284) {predicate = 0 : i64} : (f32, f32) -> f32 + store %286, %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %287, %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %289 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %290 = load %arg0[%c782, %289] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %291 = load %arg1[%289, %288] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = "accv.bin_op"(%290, %291) {predicate = 2 : i64} : (f32, f32) -> f32 + %293 = load %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = "accv.bin_op"(%293, %292) {predicate = 0 : i64} : (f32, f32) -> f32 + store %294, %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg2[%c782, %288] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %295, %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %296 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %297 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %298 = load %arg0[%c782, %297] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %299 = load %arg1[%297, %296] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = "accv.bin_op"(%298, %299) {predicate = 2 : i64} : (f32, f32) -> f32 + %301 = load %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %302 = "accv.bin_op"(%301, %300) {predicate = 0 : i64} : (f32, f32) -> f32 + store %302, %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = load %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %303, %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %305 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %306 = load %arg0[%c782, %305] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %307 = load %arg1[%305, %304] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = "accv.bin_op"(%306, %307) {predicate = 2 : i64} : (f32, f32) -> f32 + %309 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = "accv.bin_op"(%309, %308) {predicate = 0 : i64} : (f32, f32) -> f32 + store %310, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %311, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %313 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %314 = load %arg0[%c782, %313] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%313, %312] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = "accv.bin_op"(%314, %315) {predicate = 2 : i64} : (f32, f32) -> f32 + %317 = load %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = "accv.bin_op"(%317, %316) {predicate = 0 : i64} : (f32, f32) -> f32 + store %318, %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %321 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %322 = load %arg0[%c782, %321] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %323 = load %arg1[%321, %320] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = "accv.bin_op"(%322, %323) {predicate = 2 : i64} : (f32, f32) -> f32 + %325 = load %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = "accv.bin_op"(%325, %324) {predicate = 0 : i64} : (f32, f32) -> f32 + store %326, %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = load %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %327, 
%arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %329 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %330 = load %arg0[%c782, %329] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %331 = load %arg1[%329, %328] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = "accv.bin_op"(%330, %331) {predicate = 2 : i64} : (f32, f32) -> f32 + %333 = load %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = "accv.bin_op"(%333, %332) {predicate = 0 : i64} : (f32, f32) -> f32 + store %334, %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %335, %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %337 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %338 = load %arg0[%c782, %337] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%337, %336] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = "accv.bin_op"(%338, %339) {predicate = 2 : i64} : (f32, f32) -> f32 + %341 = load %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = "accv.bin_op"(%341, %340) {predicate = 0 : i64} : (f32, f32) -> f32 + store %342, %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %345 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %346 = load %arg0[%c782, %345] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %347 = load %arg1[%345, %344] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = "accv.bin_op"(%346, %347) {predicate = 2 : i64} : (f32, f32) -> f32 + %349 = load %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = "accv.bin_op"(%349, %348) {predicate = 0 : i64} : (f32, f32) -> f32 + store %350, %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = load %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %351, %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %353 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %354 = load %arg0[%c782, %353] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %355 = load %arg1[%353, %352] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = "accv.bin_op"(%354, %355) {predicate = 2 : i64} : (f32, f32) -> f32 + %357 = load %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = "accv.bin_op"(%357, %356) {predicate = 0 : i64} : (f32, f32) -> f32 + store %358, %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %359, %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + %360 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %361 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %362 = load %arg0[%c782, %361] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%361, %360] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = "accv.bin_op"(%362, %363) {predicate = 2 : i64} : (f32, f32) -> f32 + %365 = load %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = "accv.bin_op"(%365, %364) {predicate = 0 : i64} : (f32, f32) -> f32 + store %366, %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %369 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %370 = load %arg0[%c782, %369] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %371 = load %arg1[%369, %368] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = "accv.bin_op"(%370, %371) {predicate = 2 : i64} : (f32, f32) -> f32 + %373 = load %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = "accv.bin_op"(%373, %372) {predicate = 0 : i64} : (f32, f32) -> f32 + store %374, %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = load %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %375, %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %377 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %378 = load %arg0[%c782, %377] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %379 = load %arg1[%377, %376] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = "accv.bin_op"(%378, %379) {predicate = 2 : i64} : (f32, f32) -> f32 + %381 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = "accv.bin_op"(%381, %380) {predicate = 0 : i64} : (f32, f32) -> f32 + store %382, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %383, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %385 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %386 = load %arg0[%c783, %385] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%385, %384] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = "accv.bin_op"(%386, %387) {predicate = 2 : i64} : (f32, f32) -> f32 + %389 = load %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = "accv.bin_op"(%389, %388) {predicate = 0 : i64} : (f32, f32) -> f32 + store %390, %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = affine.apply affine_map<(d0, 
d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %393 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %394 = load %arg0[%c783, %393] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %395 = load %arg1[%393, %392] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = "accv.bin_op"(%394, %395) {predicate = 2 : i64} : (f32, f32) -> f32 + %397 = load %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = "accv.bin_op"(%397, %396) {predicate = 0 : i64} : (f32, f32) -> f32 + store %398, %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = load %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %399, %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %401 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %402 = load %arg0[%c783, %401] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %403 = load %arg1[%401, %400] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = "accv.bin_op"(%402, %403) {predicate = 2 : i64} : (f32, f32) -> f32 + %405 = load %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %406 = "accv.bin_op"(%405, %404) {predicate = 0 : i64} : (f32, f32) -> f32 + store %406, %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = load %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %407, %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %408 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %409 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %410 = load %arg0[%c783, %409] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %411 = load %arg1[%409, %408] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %412 = "accv.bin_op"(%410, %411) {predicate = 2 : i64} : (f32, f32) -> f32 + %413 = load %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %414 = "accv.bin_op"(%413, %412) {predicate = 0 : i64} : (f32, f32) -> f32 + store %414, %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = load %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %415, %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %417 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %418 = load %arg0[%c783, %417] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %419 = load %arg1[%417, %416] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = "accv.bin_op"(%418, %419) {predicate = 2 : i64} : (f32, f32) -> f32 + %421 = load %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = "accv.bin_op"(%421, %420) {predicate = 0 : i64} : (f32, f32) -> f32 + store %422, %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %423, %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %424 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %425 = affine.apply 
affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %426 = load %arg0[%c783, %425] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %427 = load %arg1[%425, %424] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = "accv.bin_op"(%426, %427) {predicate = 2 : i64} : (f32, f32) -> f32 + %429 = load %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = "accv.bin_op"(%429, %428) {predicate = 0 : i64} : (f32, f32) -> f32 + store %430, %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = load %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %431, %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %432 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %433 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %434 = load %arg0[%c783, %433] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %435 = load %arg1[%433, %432] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %436 = "accv.bin_op"(%434, %435) {predicate = 2 : i64} : (f32, f32) -> f32 + %437 = load %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %438 = "accv.bin_op"(%437, %436) {predicate = 0 : i64} : (f32, f32) -> f32 + store %438, %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = load %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %439, %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %441 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %442 = load %arg0[%c783, %441] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %443 = load %arg1[%441, %440] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %444 = "accv.bin_op"(%442, %443) {predicate = 2 : i64} : (f32, f32) -> f32 + %445 = load %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = "accv.bin_op"(%445, %444) {predicate = 0 : i64} : (f32, f32) -> f32 + store %446, %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %447, %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %448 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %449 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %450 = load %arg0[%c783, %449] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %451 = load %arg1[%449, %448] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = "accv.bin_op"(%450, %451) {predicate = 2 : i64} : (f32, f32) -> f32 + %453 = load %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %454 = "accv.bin_op"(%453, %452) {predicate = 0 : i64} : (f32, f32) -> f32 + store %454, %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = load %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %455, %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %456 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %457 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %458 = load 
%arg0[%c783, %457] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %459 = load %arg1[%457, %456] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = "accv.bin_op"(%458, %459) {predicate = 2 : i64} : (f32, f32) -> f32 + %461 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %462 = "accv.bin_op"(%461, %460) {predicate = 0 : i64} : (f32, f32) -> f32 + store %462, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %463, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %465 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %466 = load %arg0[%c783, %465] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %467 = load %arg1[%465, %464] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = "accv.bin_op"(%466, %467) {predicate = 2 : i64} : (f32, f32) -> f32 + %469 = load %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = "accv.bin_op"(%469, %468) {predicate = 0 : i64} : (f32, f32) -> f32 + store %470, %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %471, %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %472 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %473 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %474 = load %arg0[%c783, %473] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %475 = load %arg1[%473, %472] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = "accv.bin_op"(%474, %475) {predicate = 2 : i64} : (f32, f32) -> f32 + %477 = load %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = "accv.bin_op"(%477, %476) {predicate = 0 : i64} : (f32, f32) -> f32 + store %478, %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = load %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %479, %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %481 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %482 = load %arg0[%c783, %481] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %483 = load %arg1[%481, %480] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %484 = "accv.bin_op"(%482, %483) {predicate = 2 : i64} : (f32, f32) -> f32 + %485 = load %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = "accv.bin_op"(%485, %484) {predicate = 0 : i64} : (f32, f32) -> f32 + store %486, %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = load %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %487, %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %489 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %490 = load %arg0[%c783, %489] : memref<784x128xf32, affine_map<(d0, 
d1) -> (d0 * 128 + d1)>> + %491 = load %arg1[%489, %488] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %492 = "accv.bin_op"(%490, %491) {predicate = 2 : i64} : (f32, f32) -> f32 + %493 = load %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = "accv.bin_op"(%493, %492) {predicate = 0 : i64} : (f32, f32) -> f32 + store %494, %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %495, %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %497 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %498 = load %arg0[%c783, %497] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %499 = load %arg1[%497, %496] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = "accv.bin_op"(%498, %499) {predicate = 2 : i64} : (f32, f32) -> f32 + %501 = load %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %502 = "accv.bin_op"(%501, %500) {predicate = 0 : i64} : (f32, f32) -> f32 + store %502, %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %503 = load %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %503, %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %505 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %506 = load %arg0[%c783, %505] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %507 = load %arg1[%505, %504] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = "accv.bin_op"(%506, %507) {predicate = 2 : i64} : (f32, f32) -> f32 + %509 = load %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = "accv.bin_op"(%509, %508) {predicate = 0 : i64} : (f32, f32) -> f32 + store %510, %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %511 = load %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %511, %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_4,14}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_3,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_3,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_1,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func 
@optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/4_SymbolDCE.mlir b/Tutorials/optimized_matmul/mlir/4_SymbolDCE.mlir new file mode 100644 index 00000000..e36f1a6a --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/4_SymbolDCE.mlir @@ -0,0 +1,1168 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + %c0_0 = constant 0 : index + %c0_1 = constant 0 : index + %c0_2 = constant 0 : index + %c0_3 = constant 0 : index + %c0_4 = constant 0 : index + %c0_5 = constant 0 : index + %c0_6 = constant 0 : index + %c0_7 = constant 0 : index + %c0_8 = constant 0 : index + %c0_9 = constant 0 : index + %c0_10 = constant 0 : index + %c0_11 = constant 0 : index + %c0_12 = constant 0 : index + %c0_13 = constant 0 : index + %c0_14 = constant 0 : index + %c0_15 = constant 0 : index + %c0_16 = constant 0 : index + %c0_17 = constant 0 : index + %c0_18 = constant 0 : index + %c0_19 = constant 0 : index + %c0_20 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_21 = constant dense<0.000000e+00> : vector<8xf32> + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 128 { + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %c0_20) + %6 = vector.transfer_read %arg1[%4, %5], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %0[%c0_19, %c0_20] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_20) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %7) + %10 = vector.transfer_read %arg1[%8, %9], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %10, %0[%c0_19, %7] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_20) + %12 
= affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %11) + %14 = vector.transfer_read %arg1[%12, %13], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %0[%c0_19, %11] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_20) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %15) + %18 = vector.transfer_read %arg1[%16, %17], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %0[%c0_19, %15] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_20) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %19) + %22 = vector.transfer_read %arg1[%20, %21], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %22, %0[%c0_19, %19] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_20) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %23) + %26 = vector.transfer_read %arg1[%24, %25], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %0[%c0_19, %23] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_20) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %27) + %30 = vector.transfer_read %arg1[%28, %29], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %0[%c0_19, %27] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_20) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %31) + %34 = vector.transfer_read %arg1[%32, %33], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %34, %0[%c0_19, %31] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_20) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %35) + %38 = vector.transfer_read %arg1[%36, %37], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %0[%c0_19, %35] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_20) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %39) + %42 = vector.transfer_read %arg1[%40, %41], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %0[%c0_19, %39] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_20) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, 
%c0_19) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %43) + %46 = vector.transfer_read %arg1[%44, %45], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %46, %0[%c0_19, %43] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_20) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %47) + %50 = vector.transfer_read %arg1[%48, %49], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %0[%c0_19, %47] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_20) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %51) + %54 = vector.transfer_read %arg1[%52, %53], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %54, %0[%c0_19, %51] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_20) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %55) + %58 = vector.transfer_read %arg1[%56, %57], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %58, %0[%c0_19, %55] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_20) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %59) + %62 = vector.transfer_read %arg1[%60, %61], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %62, %0[%c0_19, %59] : memref<1x16xvector<8xf32>> + %63 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_20) + %64 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_19) + %65 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %63) + %66 = vector.transfer_read %arg1[%64, %65], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %66, %0[%c0_19, %63] : memref<1x16xvector<8xf32>> + %67 = load %0[%c0_17, %c0_18] : memref<1x16xvector<8xf32>> + affine.store %67, %3[((%arg5 + %c0_18 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %c0_18 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %68 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_18) + %69 = load %0[%c0_17, %68] : memref<1x16xvector<8xf32>> + affine.store %69, %3[((%arg5 + %68 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %68 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %70 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_18) + %71 = load %0[%c0_17, %70] : memref<1x16xvector<8xf32>> + affine.store %71, %3[((%arg5 + %70 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %70 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %72 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_18) + %73 = load %0[%c0_17, %72] : memref<1x16xvector<8xf32>> + affine.store %73, %3[((%arg5 + %72 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %72 * 8) mod 16) floordiv 8) mod 2] : 
memref<16x128x2xvector<8xf32>> + %74 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_18) + %75 = load %0[%c0_17, %74] : memref<1x16xvector<8xf32>> + affine.store %75, %3[((%arg5 + %74 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %74 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %76 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_18) + %77 = load %0[%c0_17, %76] : memref<1x16xvector<8xf32>> + affine.store %77, %3[((%arg5 + %76 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %76 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %78 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_18) + %79 = load %0[%c0_17, %78] : memref<1x16xvector<8xf32>> + affine.store %79, %3[((%arg5 + %78 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %78 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %80 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_18) + %81 = load %0[%c0_17, %80] : memref<1x16xvector<8xf32>> + affine.store %81, %3[((%arg5 + %80 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %80 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %82 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_18) + %83 = load %0[%c0_17, %82] : memref<1x16xvector<8xf32>> + affine.store %83, %3[((%arg5 + %82 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %82 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %84 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_18) + %85 = load %0[%c0_17, %84] : memref<1x16xvector<8xf32>> + affine.store %85, %3[((%arg5 + %84 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %84 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %86 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_18) + %87 = load %0[%c0_17, %86] : memref<1x16xvector<8xf32>> + affine.store %87, %3[((%arg5 + %86 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %86 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_18) + %89 = load %0[%c0_17, %88] : memref<1x16xvector<8xf32>> + affine.store %89, %3[((%arg5 + %88 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %88 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %90 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_18) + %91 = load %0[%c0_17, %90] : memref<1x16xvector<8xf32>> + affine.store %91, %3[((%arg5 + %90 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %90 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %92 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_18) + %93 = load %0[%c0_17, %92] : memref<1x16xvector<8xf32>> + affine.store %93, %3[((%arg5 + %92 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %92 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_18) + %95 = load %0[%c0_17, %94] : memref<1x16xvector<8xf32>> + affine.store %95, %3[((%arg5 + %94 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %94 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_18) + %97 = load %0[%c0_17, %96] : memref<1x16xvector<8xf32>> + affine.store %97, %3[((%arg5 + %96 * 8) floordiv 16) mod 16, (%arg4 + %c0_17) mod 128, (((%arg5 + %96 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } else { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + 
%5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %c0_16) + %6 = vector.transfer_read %arg1[%4, %5], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %0[%c0_15, %c0_16] : memref<1x16xvector<8xf32>> + %7 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_16) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %7) + %10 = vector.transfer_read %arg1[%8, %9], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %10, %0[%c0_15, %7] : memref<1x16xvector<8xf32>> + %11 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_16) + %12 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %11) + %14 = vector.transfer_read %arg1[%12, %13], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %0[%c0_15, %11] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_16) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %15) + %18 = vector.transfer_read %arg1[%16, %17], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %0[%c0_15, %15] : memref<1x16xvector<8xf32>> + %19 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_16) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %21 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %19) + %22 = vector.transfer_read %arg1[%20, %21], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %22, %0[%c0_15, %19] : memref<1x16xvector<8xf32>> + %23 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_16) + %24 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %25 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %23) + %26 = vector.transfer_read %arg1[%24, %25], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %0[%c0_15, %23] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_16) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %27) + %30 = vector.transfer_read %arg1[%28, %29], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %0[%c0_15, %27] : memref<1x16xvector<8xf32>> + %31 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_16) + %32 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %33 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %31) + %34 = vector.transfer_read %arg1[%32, %33], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %34, %0[%c0_15, %31] : memref<1x16xvector<8xf32>> + %35 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_16) + %36 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %37 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %35) + %38 = vector.transfer_read %arg1[%36, %37], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, 
%0[%c0_15, %35] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_16) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %39) + %42 = vector.transfer_read %arg1[%40, %41], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %0[%c0_15, %39] : memref<1x16xvector<8xf32>> + %43 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_16) + %44 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %45 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %43) + %46 = vector.transfer_read %arg1[%44, %45], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %46, %0[%c0_15, %43] : memref<1x16xvector<8xf32>> + %47 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_16) + %48 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %49 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %47) + %50 = vector.transfer_read %arg1[%48, %49], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %0[%c0_15, %47] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_16) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %51) + %54 = vector.transfer_read %arg1[%52, %53], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %54, %0[%c0_15, %51] : memref<1x16xvector<8xf32>> + %55 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_16) + %56 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %57 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %55) + %58 = vector.transfer_read %arg1[%56, %57], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %58, %0[%c0_15, %55] : memref<1x16xvector<8xf32>> + %59 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_16) + %60 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %61 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %59) + %62 = vector.transfer_read %arg1[%60, %61], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %62, %0[%c0_15, %59] : memref<1x16xvector<8xf32>> + %63 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_16) + %64 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg4, %c0_15) + %65 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %63) + %66 = vector.transfer_read %arg1[%64, %65], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %66, %0[%c0_15, %63] : memref<1x16xvector<8xf32>> + %67 = load %0[%c0_13, %c0_14] : memref<1x16xvector<8xf32>> + affine.store %67, %3[((%arg5 + %c0_14 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %c0_14 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %68 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_14) + %69 = load %0[%c0_13, %68] : memref<1x16xvector<8xf32>> + affine.store %69, %3[((%arg5 + %68 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %68 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %70 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_14) + %71 = load 
%0[%c0_13, %70] : memref<1x16xvector<8xf32>> + affine.store %71, %3[((%arg5 + %70 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %70 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %72 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_14) + %73 = load %0[%c0_13, %72] : memref<1x16xvector<8xf32>> + affine.store %73, %3[((%arg5 + %72 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %72 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %74 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_14) + %75 = load %0[%c0_13, %74] : memref<1x16xvector<8xf32>> + affine.store %75, %3[((%arg5 + %74 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %74 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %76 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_14) + %77 = load %0[%c0_13, %76] : memref<1x16xvector<8xf32>> + affine.store %77, %3[((%arg5 + %76 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %76 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %78 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_14) + %79 = load %0[%c0_13, %78] : memref<1x16xvector<8xf32>> + affine.store %79, %3[((%arg5 + %78 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %78 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %80 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_14) + %81 = load %0[%c0_13, %80] : memref<1x16xvector<8xf32>> + affine.store %81, %3[((%arg5 + %80 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %80 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %82 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_14) + %83 = load %0[%c0_13, %82] : memref<1x16xvector<8xf32>> + affine.store %83, %3[((%arg5 + %82 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %82 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %84 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_14) + %85 = load %0[%c0_13, %84] : memref<1x16xvector<8xf32>> + affine.store %85, %3[((%arg5 + %84 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %84 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %86 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_14) + %87 = load %0[%c0_13, %86] : memref<1x16xvector<8xf32>> + affine.store %87, %3[((%arg5 + %86 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %86 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_14) + %89 = load %0[%c0_13, %88] : memref<1x16xvector<8xf32>> + affine.store %89, %3[((%arg5 + %88 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %88 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %90 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_14) + %91 = load %0[%c0_13, %90] : memref<1x16xvector<8xf32>> + affine.store %91, %3[((%arg5 + %90 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %90 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %92 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_14) + %93 = load %0[%c0_13, %92] : memref<1x16xvector<8xf32>> + affine.store %93, %3[((%arg5 + %92 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %92 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_14) + %95 = load %0[%c0_13, %94] : memref<1x16xvector<8xf32>> + affine.store %95, %3[((%arg5 + %94 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 
128, (((%arg5 + %94 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_14) + %97 = load %0[%c0_13, %96] : memref<1x16xvector<8xf32>> + affine.store %97, %3[((%arg5 + %96 * 8) floordiv 16) mod 16, (%arg4 + %c0_13) mod 128, (((%arg5 + %96 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,21}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i_o,19}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 256]} + affine.for %arg4 = 0 to 784 { + affine.for %arg5 = 0 to 16 { + affine.for %arg6 = 0 to 6 { + affine.for %arg7 = 0 to 2 { + store %cst_21, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 2 : i64, index = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 6 : i64, index = #accln<"index{j_i_i_o,15}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 2]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i,14}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 6, 2]} + affine.for %arg5 = 0 to 256 step 16 { + affine.for %arg6 = 0 to 128 step 4 { + affine.for %arg7 = 0 to 0 step 6 { + affine.for %arg8 = 0 to 4 { + affine.for %arg9 = 0 to 0 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %12 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %21 = load %arg0[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%8, %17] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg0[%10, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = load %arg0[%11, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = affine.load %3[((%12 - %arg3) floordiv 16) mod 16, (%13 - %c0) mod 128, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %30 = vector.extractelement %29[%c0_i64 : i64] : vector<8xf32> + %31 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %32 = affine.load %3[((%31 - %arg3) floordiv 16) mod 16, (%14 - %c0) mod 128, (((%31 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c1_i64 : i64] : vector<8xf32> + %34 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %35 = affine.load %3[((%34 - %arg3) floordiv 16) mod 16, (%15 - %c0) mod 128, (((%34 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %36 = vector.extractelement %35[%c2_i64 : i64] : vector<8xf32> + %37 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %38 = affine.load %3[((%37 - %arg3) floordiv 16) mod 16, (%16 - %c0) mod 128, (((%37 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c3_i64 : i64] : vector<8xf32> + %40 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %41 = affine.load %3[((%40 - %arg3) floordiv 16) mod 16, (%17 - %c0) mod 128, (((%40 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c4_i64 : i64] : vector<8xf32> + %43 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %44 = affine.load %3[((%43 - %arg3) floordiv 16) mod 16, (%18 - %c0) mod 128, (((%43 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = vector.extractelement %44[%c5_i64 : i64] : vector<8xf32> + %46 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %47 = affine.load %3[((%46 - %arg3) floordiv 16) mod 16, (%19 - %c0) mod 128, (((%46 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c6_i64 : i64] : vector<8xf32> + %49 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %50 = affine.load %3[((%49 - %arg3) floordiv 16) mod 16, (%20 - %c0) mod 128, (((%49 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = vector.extractelement %50[%c7_i64 : i64] : vector<8xf32> + %52 = "accv.bin_op"(%21, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %36) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %42) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %45) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = "accv.bin_op"(%27, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = "accv.bin_op"(%28, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %60 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : 
memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c0_i64 : i64] : vector<8xf32> + %62 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %63 = affine.load %2[((%62 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%62 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = vector.extractelement %63[%c1_i64 : i64] : vector<8xf32> + %65 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %66 = affine.load %2[((%65 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%65 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c2_i64 : i64] : vector<8xf32> + %68 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %69 = affine.load %2[((%68 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%68 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c3_i64 : i64] : vector<8xf32> + %71 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %72 = affine.load %2[((%71 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%71 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c4_i64 : i64] : vector<8xf32> + %74 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %75 = affine.load %2[((%74 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%74 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %78 = affine.load %2[((%77 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%77 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c6_i64 : i64] : vector<8xf32> + %80 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %81 = affine.load %2[((%80 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%80 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = vector.extractelement %81[%c7_i64 : i64] : vector<8xf32> + %83 = "accv.bin_op"(%61, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%64, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%67, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%70, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%73, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%76, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = "accv.bin_op"(%79, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + %90 = "accv.bin_op"(%82, %59) {predicate = 0 : i64} : (f32, f32) -> f32 + %91 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %83, %91[%c0_i64 : i64] : vector<8xf32> + affine.store %92, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %94 = affine.load %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : 
memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %84, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %97 = affine.load %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c2_i64 : i64] : vector<8xf32> + affine.store %98, %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %100 = affine.load %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %86, %100[%c3_i64 : i64] : vector<8xf32> + affine.store %101, %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %103 = affine.load %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %87, %103[%c4_i64 : i64] : vector<8xf32> + affine.store %104, %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %106 = affine.load %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %88, %106[%c5_i64 : i64] : vector<8xf32> + affine.store %107, %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %109 = affine.load %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %89, %109[%c6_i64 : i64] : vector<8xf32> + affine.store %110, %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %112 = affine.load %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %90, %112[%c7_i64 : i64] : vector<8xf32> + affine.store %113, %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %83, %114[%c0_i64 : i64] : vector<8xf32> + affine.store %115, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = 
affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %117 = affine.load %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %84, %117[%c1_i64 : i64] : vector<8xf32> + affine.store %118, %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %120 = affine.load %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %85, %120[%c2_i64 : i64] : vector<8xf32> + affine.store %121, %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %123 = affine.load %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %124 = vector.insertelement %86, %123[%c3_i64 : i64] : vector<8xf32> + affine.store %124, %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %126 = affine.load %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %87, %126[%c4_i64 : i64] : vector<8xf32> + affine.store %127, %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %129 = affine.load %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %130 = vector.insertelement %88, %129[%c5_i64 : i64] : vector<8xf32> + affine.store %130, %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %132 = affine.load %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %133 = vector.insertelement %89, %132[%c6_i64 : i64] : vector<8xf32> + affine.store %133, %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_11, %c0_12) + %135 = affine.load %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %136 = vector.insertelement %90, %135[%c7_i64 : i64] : vector<8xf32> + affine.store %136, %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %137 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_11) + %138 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + 
%139 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %140 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %141 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %142 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %143 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %144 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %145 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %146 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %147 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %148 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %149 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %150 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %151 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %152 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %153 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %154 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg8) + %155 = load %arg0[%138, %147] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg0[%139, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg0[%140, %149] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %158 = load %arg0[%141, %150] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg0[%142, %151] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %160 = load %arg0[%143, %152] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = load %arg0[%144, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg0[%145, %154] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = affine.load %3[((%146 - %arg3) floordiv 16) mod 16, (%147 - %c0) mod 128, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c0_i64 : i64] : vector<8xf32> + %165 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %166 = affine.load %3[((%165 - %arg3) floordiv 16) mod 16, (%148 - %c0) mod 128, (((%165 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c1_i64 : i64] : vector<8xf32> + %168 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %169 = affine.load %3[((%168 - %arg3) floordiv 16) mod 16, (%149 - %c0) mod 128, (((%168 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c2_i64 : i64] : vector<8xf32> + %171 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %172 = affine.load %3[((%171 - %arg3) floordiv 16) mod 16, (%150 - %c0) mod 128, (((%171 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c3_i64 : i64] : vector<8xf32> + %174 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %175 = affine.load %3[((%174 - %arg3) floordiv 16) mod 16, (%151 - %c0) mod 128, (((%174 - %arg3) mod 
16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %176 = vector.extractelement %175[%c4_i64 : i64] : vector<8xf32> + %177 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %178 = affine.load %3[((%177 - %arg3) floordiv 16) mod 16, (%152 - %c0) mod 128, (((%177 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %179 = vector.extractelement %178[%c5_i64 : i64] : vector<8xf32> + %180 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %181 = affine.load %3[((%180 - %arg3) floordiv 16) mod 16, (%153 - %c0) mod 128, (((%180 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %182 = vector.extractelement %181[%c6_i64 : i64] : vector<8xf32> + %183 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %184 = affine.load %3[((%183 - %arg3) floordiv 16) mod 16, (%154 - %c0) mod 128, (((%183 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %185 = vector.extractelement %184[%c7_i64 : i64] : vector<8xf32> + %186 = "accv.bin_op"(%155, %164) {predicate = 2 : i64} : (f32, f32) -> f32 + %187 = "accv.bin_op"(%156, %167) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = "accv.bin_op"(%157, %170) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = "accv.bin_op"(%158, %173) {predicate = 2 : i64} : (f32, f32) -> f32 + %190 = "accv.bin_op"(%159, %176) {predicate = 2 : i64} : (f32, f32) -> f32 + %191 = "accv.bin_op"(%160, %179) {predicate = 2 : i64} : (f32, f32) -> f32 + %192 = "accv.bin_op"(%161, %182) {predicate = 2 : i64} : (f32, f32) -> f32 + %193 = "accv.bin_op"(%162, %185) {predicate = 2 : i64} : (f32, f32) -> f32 + %194 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c0_i64 : i64] : vector<8xf32> + %196 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %197 = affine.load %2[((%196 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%196 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %198 = vector.extractelement %197[%c1_i64 : i64] : vector<8xf32> + %199 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %200 = affine.load %2[((%199 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%199 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c2_i64 : i64] : vector<8xf32> + %202 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %203 = affine.load %2[((%202 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%202 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %204 = vector.extractelement %203[%c3_i64 : i64] : vector<8xf32> + %205 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %206 = affine.load %2[((%205 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%205 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %207 = vector.extractelement %206[%c4_i64 : i64] : vector<8xf32> + %208 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %209 = affine.load %2[((%208 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%208 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %210 = vector.extractelement 
%209[%c5_i64 : i64] : vector<8xf32> + %211 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %212 = affine.load %2[((%211 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%211 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %213 = vector.extractelement %212[%c6_i64 : i64] : vector<8xf32> + %214 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %215 = affine.load %2[((%214 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%214 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %216 = vector.extractelement %215[%c7_i64 : i64] : vector<8xf32> + %217 = "accv.bin_op"(%195, %186) {predicate = 0 : i64} : (f32, f32) -> f32 + %218 = "accv.bin_op"(%198, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + %219 = "accv.bin_op"(%201, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + %220 = "accv.bin_op"(%204, %189) {predicate = 0 : i64} : (f32, f32) -> f32 + %221 = "accv.bin_op"(%207, %190) {predicate = 0 : i64} : (f32, f32) -> f32 + %222 = "accv.bin_op"(%210, %191) {predicate = 0 : i64} : (f32, f32) -> f32 + %223 = "accv.bin_op"(%213, %192) {predicate = 0 : i64} : (f32, f32) -> f32 + %224 = "accv.bin_op"(%216, %193) {predicate = 0 : i64} : (f32, f32) -> f32 + %225 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %217, %225[%c0_i64 : i64] : vector<8xf32> + affine.store %226, %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %227 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %228 = affine.load %2[((%227 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%227 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %218, %228[%c1_i64 : i64] : vector<8xf32> + affine.store %229, %2[((%227 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%227 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %230 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %231 = affine.load %2[((%230 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%230 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %219, %231[%c2_i64 : i64] : vector<8xf32> + affine.store %232, %2[((%230 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%230 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %233 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %234 = affine.load %2[((%233 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %220, %234[%c3_i64 : i64] : vector<8xf32> + affine.store %235, %2[((%233 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %236 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %237 = affine.load %2[((%236 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %238 = vector.insertelement %221, %237[%c4_i64 : i64] : vector<8xf32> + affine.store %238, %2[((%236 - %arg3) floordiv 16) mod 
16, (%142 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %239 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %240 = affine.load %2[((%239 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %222, %240[%c5_i64 : i64] : vector<8xf32> + affine.store %241, %2[((%239 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %242 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %243 = affine.load %2[((%242 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %244 = vector.insertelement %223, %243[%c6_i64 : i64] : vector<8xf32> + affine.store %244, %2[((%242 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %245 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %246 = affine.load %2[((%245 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = vector.insertelement %224, %246[%c7_i64 : i64] : vector<8xf32> + affine.store %247, %2[((%245 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %249 = vector.insertelement %217, %248[%c0_i64 : i64] : vector<8xf32> + affine.store %249, %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %250 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %251 = affine.load %2[((%250 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%250 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %252 = vector.insertelement %218, %251[%c1_i64 : i64] : vector<8xf32> + affine.store %252, %2[((%250 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%250 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %253 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %254 = affine.load %2[((%253 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%253 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %255 = vector.insertelement %219, %254[%c2_i64 : i64] : vector<8xf32> + affine.store %255, %2[((%253 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%253 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %256 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %257 = affine.load %2[((%256 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%256 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %258 = vector.insertelement %220, %257[%c3_i64 : i64] : vector<8xf32> + affine.store %258, %2[((%256 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%256 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %259 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %260 = affine.load 
%2[((%259 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%259 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %261 = vector.insertelement %221, %260[%c4_i64 : i64] : vector<8xf32> + affine.store %261, %2[((%259 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%259 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %262 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %263 = affine.load %2[((%262 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%262 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %264 = vector.insertelement %222, %263[%c5_i64 : i64] : vector<8xf32> + affine.store %264, %2[((%262 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%262 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %265 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %266 = affine.load %2[((%265 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%265 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %267 = vector.insertelement %223, %266[%c6_i64 : i64] : vector<8xf32> + affine.store %267, %2[((%265 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%265 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %268 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_12) + %269 = affine.load %2[((%268 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%268 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %270 = vector.insertelement %224, %269[%c7_i64 : i64] : vector<8xf32> + affine.store %270, %2[((%268 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%268 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 4]} + affine.for %arg7 = 0 to 4 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %12 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %13 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %14 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %15 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, 
%arg7) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %18 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %19 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %20 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %21 = load %arg0[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg0[%10, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = load %arg0[%11, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = affine.load %3[((%12 - %arg3) floordiv 16) mod 16, (%13 - %c0) mod 128, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %30 = vector.extractelement %29[%c0_i64 : i64] : vector<8xf32> + %31 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %32 = affine.load %3[((%31 - %arg3) floordiv 16) mod 16, (%14 - %c0) mod 128, (((%31 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c1_i64 : i64] : vector<8xf32> + %34 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %35 = affine.load %3[((%34 - %arg3) floordiv 16) mod 16, (%15 - %c0) mod 128, (((%34 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %36 = vector.extractelement %35[%c2_i64 : i64] : vector<8xf32> + %37 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %38 = affine.load %3[((%37 - %arg3) floordiv 16) mod 16, (%16 - %c0) mod 128, (((%37 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c3_i64 : i64] : vector<8xf32> + %40 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %41 = affine.load %3[((%40 - %arg3) floordiv 16) mod 16, (%17 - %c0) mod 128, (((%40 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c4_i64 : i64] : vector<8xf32> + %43 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %44 = affine.load %3[((%43 - %arg3) floordiv 16) mod 16, (%18 - %c0) mod 128, (((%43 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = vector.extractelement %44[%c5_i64 : i64] : vector<8xf32> + %46 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %47 = affine.load %3[((%46 - %arg3) floordiv 16) mod 16, (%19 - %c0) mod 128, (((%46 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c6_i64 : i64] : vector<8xf32> + %49 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %50 = affine.load %3[((%49 - %arg3) floordiv 16) mod 16, (%20 - %c0) mod 128, (((%49 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = vector.extractelement %50[%c7_i64 : i64] : vector<8xf32> + 
%52 = "accv.bin_op"(%21, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %36) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %42) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %45) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = "accv.bin_op"(%27, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = "accv.bin_op"(%28, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %60 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c0_i64 : i64] : vector<8xf32> + %62 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %63 = affine.load %2[((%62 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%62 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = vector.extractelement %63[%c1_i64 : i64] : vector<8xf32> + %65 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %66 = affine.load %2[((%65 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%65 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c2_i64 : i64] : vector<8xf32> + %68 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %69 = affine.load %2[((%68 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%68 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c3_i64 : i64] : vector<8xf32> + %71 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %72 = affine.load %2[((%71 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%71 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c4_i64 : i64] : vector<8xf32> + %74 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %75 = affine.load %2[((%74 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%74 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %78 = affine.load %2[((%77 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%77 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c6_i64 : i64] : vector<8xf32> + %80 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %81 = affine.load %2[((%80 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%80 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = vector.extractelement %81[%c7_i64 : i64] : vector<8xf32> + %83 = "accv.bin_op"(%61, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%64, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%67, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%70, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%73, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%76, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = "accv.bin_op"(%79, %58) {predicate = 0 : i64} : 
(f32, f32) -> f32 + %90 = "accv.bin_op"(%82, %59) {predicate = 0 : i64} : (f32, f32) -> f32 + %91 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %83, %91[%c0_i64 : i64] : vector<8xf32> + affine.store %92, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %94 = affine.load %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %84, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %97 = affine.load %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c2_i64 : i64] : vector<8xf32> + affine.store %98, %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %100 = affine.load %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %86, %100[%c3_i64 : i64] : vector<8xf32> + affine.store %101, %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %103 = affine.load %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %87, %103[%c4_i64 : i64] : vector<8xf32> + affine.store %104, %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %106 = affine.load %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %88, %106[%c5_i64 : i64] : vector<8xf32> + affine.store %107, %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %109 = affine.load %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %89, %109[%c6_i64 : i64] : vector<8xf32> + affine.store %110, %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %112 = affine.load %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 
- %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %90, %112[%c7_i64 : i64] : vector<8xf32> + affine.store %113, %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %83, %114[%c0_i64 : i64] : vector<8xf32> + affine.store %115, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %117 = affine.load %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %84, %117[%c1_i64 : i64] : vector<8xf32> + affine.store %118, %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %120 = affine.load %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %85, %120[%c2_i64 : i64] : vector<8xf32> + affine.store %121, %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %123 = affine.load %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %124 = vector.insertelement %86, %123[%c3_i64 : i64] : vector<8xf32> + affine.store %124, %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %126 = affine.load %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %87, %126[%c4_i64 : i64] : vector<8xf32> + affine.store %127, %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %129 = affine.load %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %130 = vector.insertelement %88, %129[%c5_i64 : i64] : vector<8xf32> + affine.store %130, %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %132 = affine.load %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %133 = vector.insertelement %89, %132[%c6_i64 : i64] : vector<8xf32> + affine.store %133, %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) 
mod 2] : memref<16x6x2xvector<8xf32>> + %134 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %c0_9, %c0_10) + %135 = affine.load %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %136 = vector.insertelement %90, %135[%c7_i64 : i64] : vector<8xf32> + affine.store %136, %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %137 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_9) + %138 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %139 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %140 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %141 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %142 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %143 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %144 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %145 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_7, %c0_8) + %146 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %147 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %148 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %149 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %150 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %151 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %152 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %153 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %154 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%c0, %arg6, %arg7) + %155 = load %arg0[%138, %147] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg0[%139, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg0[%140, %149] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %158 = load %arg0[%141, %150] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg0[%142, %151] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %160 = load %arg0[%143, %152] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = load %arg0[%144, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = load %arg0[%145, %154] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = affine.load %3[((%146 - %arg3) floordiv 16) mod 16, (%147 - %c0) mod 128, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %164 = vector.extractelement %163[%c0_i64 : i64] : vector<8xf32> + %165 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %166 = affine.load %3[((%165 - %arg3) floordiv 16) mod 16, (%148 - %c0) mod 128, (((%165 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %167 = vector.extractelement %166[%c1_i64 : i64] : vector<8xf32> + %168 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %169 = affine.load %3[((%168 - %arg3) floordiv 16) mod 16, (%149 - %c0) mod 128, (((%168 
- %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %170 = vector.extractelement %169[%c2_i64 : i64] : vector<8xf32> + %171 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %172 = affine.load %3[((%171 - %arg3) floordiv 16) mod 16, (%150 - %c0) mod 128, (((%171 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %173 = vector.extractelement %172[%c3_i64 : i64] : vector<8xf32> + %174 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %175 = affine.load %3[((%174 - %arg3) floordiv 16) mod 16, (%151 - %c0) mod 128, (((%174 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %176 = vector.extractelement %175[%c4_i64 : i64] : vector<8xf32> + %177 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %178 = affine.load %3[((%177 - %arg3) floordiv 16) mod 16, (%152 - %c0) mod 128, (((%177 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %179 = vector.extractelement %178[%c5_i64 : i64] : vector<8xf32> + %180 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %181 = affine.load %3[((%180 - %arg3) floordiv 16) mod 16, (%153 - %c0) mod 128, (((%180 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %182 = vector.extractelement %181[%c6_i64 : i64] : vector<8xf32> + %183 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %184 = affine.load %3[((%183 - %arg3) floordiv 16) mod 16, (%154 - %c0) mod 128, (((%183 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %185 = vector.extractelement %184[%c7_i64 : i64] : vector<8xf32> + %186 = "accv.bin_op"(%155, %164) {predicate = 2 : i64} : (f32, f32) -> f32 + %187 = "accv.bin_op"(%156, %167) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = "accv.bin_op"(%157, %170) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = "accv.bin_op"(%158, %173) {predicate = 2 : i64} : (f32, f32) -> f32 + %190 = "accv.bin_op"(%159, %176) {predicate = 2 : i64} : (f32, f32) -> f32 + %191 = "accv.bin_op"(%160, %179) {predicate = 2 : i64} : (f32, f32) -> f32 + %192 = "accv.bin_op"(%161, %182) {predicate = 2 : i64} : (f32, f32) -> f32 + %193 = "accv.bin_op"(%162, %185) {predicate = 2 : i64} : (f32, f32) -> f32 + %194 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %195 = vector.extractelement %194[%c0_i64 : i64] : vector<8xf32> + %196 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %197 = affine.load %2[((%196 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%196 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %198 = vector.extractelement %197[%c1_i64 : i64] : vector<8xf32> + %199 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %200 = affine.load %2[((%199 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%199 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %201 = vector.extractelement %200[%c2_i64 : i64] : vector<8xf32> + %202 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %203 = affine.load %2[((%202 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%202 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %204 = 
vector.extractelement %203[%c3_i64 : i64] : vector<8xf32> + %205 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %206 = affine.load %2[((%205 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%205 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %207 = vector.extractelement %206[%c4_i64 : i64] : vector<8xf32> + %208 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %209 = affine.load %2[((%208 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%208 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %210 = vector.extractelement %209[%c5_i64 : i64] : vector<8xf32> + %211 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %212 = affine.load %2[((%211 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%211 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %213 = vector.extractelement %212[%c6_i64 : i64] : vector<8xf32> + %214 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %215 = affine.load %2[((%214 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%214 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %216 = vector.extractelement %215[%c7_i64 : i64] : vector<8xf32> + %217 = "accv.bin_op"(%195, %186) {predicate = 0 : i64} : (f32, f32) -> f32 + %218 = "accv.bin_op"(%198, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + %219 = "accv.bin_op"(%201, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + %220 = "accv.bin_op"(%204, %189) {predicate = 0 : i64} : (f32, f32) -> f32 + %221 = "accv.bin_op"(%207, %190) {predicate = 0 : i64} : (f32, f32) -> f32 + %222 = "accv.bin_op"(%210, %191) {predicate = 0 : i64} : (f32, f32) -> f32 + %223 = "accv.bin_op"(%213, %192) {predicate = 0 : i64} : (f32, f32) -> f32 + %224 = "accv.bin_op"(%216, %193) {predicate = 0 : i64} : (f32, f32) -> f32 + %225 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %226 = vector.insertelement %217, %225[%c0_i64 : i64] : vector<8xf32> + affine.store %226, %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %227 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %228 = affine.load %2[((%227 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%227 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %229 = vector.insertelement %218, %228[%c1_i64 : i64] : vector<8xf32> + affine.store %229, %2[((%227 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%227 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %230 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %231 = affine.load %2[((%230 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%230 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %219, %231[%c2_i64 : i64] : vector<8xf32> + affine.store %232, %2[((%230 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%230 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %233 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %234 = affine.load %2[((%233 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%233 - %arg3) 
mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %220, %234[%c3_i64 : i64] : vector<8xf32> + affine.store %235, %2[((%233 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %236 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %237 = affine.load %2[((%236 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %238 = vector.insertelement %221, %237[%c4_i64 : i64] : vector<8xf32> + affine.store %238, %2[((%236 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %239 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %240 = affine.load %2[((%239 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %222, %240[%c5_i64 : i64] : vector<8xf32> + affine.store %241, %2[((%239 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %242 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %243 = affine.load %2[((%242 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %244 = vector.insertelement %223, %243[%c6_i64 : i64] : vector<8xf32> + affine.store %244, %2[((%242 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %245 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %246 = affine.load %2[((%245 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = vector.insertelement %224, %246[%c7_i64 : i64] : vector<8xf32> + affine.store %247, %2[((%245 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = affine.load %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %249 = vector.insertelement %217, %248[%c0_i64 : i64] : vector<8xf32> + affine.store %249, %2[((%146 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%146 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %250 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %251 = affine.load %2[((%250 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%250 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %252 = vector.insertelement %218, %251[%c1_i64 : i64] : vector<8xf32> + affine.store %252, %2[((%250 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%250 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %253 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %254 = affine.load %2[((%253 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%253 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %255 = vector.insertelement %219, %254[%c2_i64 : i64] : vector<8xf32> + affine.store %255, %2[((%253 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%253 - 
%arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %256 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %257 = affine.load %2[((%256 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%256 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %258 = vector.insertelement %220, %257[%c3_i64 : i64] : vector<8xf32> + affine.store %258, %2[((%256 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%256 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %259 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %260 = affine.load %2[((%259 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%259 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %261 = vector.insertelement %221, %260[%c4_i64 : i64] : vector<8xf32> + affine.store %261, %2[((%259 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%259 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %262 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %263 = affine.load %2[((%262 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%262 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %264 = vector.insertelement %222, %263[%c5_i64 : i64] : vector<8xf32> + affine.store %264, %2[((%262 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%262 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %265 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %266 = affine.load %2[((%265 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%265 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %267 = vector.insertelement %223, %266[%c6_i64 : i64] : vector<8xf32> + affine.store %267, %2[((%265 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%265 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %268 = affine.apply affine_map<(d0, d1, d2, d3) -> (d0 + d1 + d2 + d3)>(%arg3, %arg5, %137, %c0_10) + %269 = affine.load %2[((%268 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%268 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %270 = vector.insertelement %224, %269[%c7_i64 : i64] : vector<8xf32> + affine.store %270, %2[((%268 - %arg3) floordiv 16) mod 16, (%145 - %arg4) mod 6, (((%268 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 128]} + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %c0_6) + %6 = vector.transfer_read %arg2[%4, %5], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %7 = affine.load %2[((%arg5 + %c0_6 * 8) floordiv 16) mod 16, 
(%c0_0 + %c0_5) mod 6, (((%arg5 + %c0_6 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %8 = addf %6, %7 : vector<8xf32> + store %8, %1[%c0_5, %c0_6] : memref<1x16xvector<8xf32>> + %9 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_6) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %9) + %12 = vector.transfer_read %arg2[%10, %11], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %13 = affine.load %2[((%arg5 + %9 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %14 = addf %12, %13 : vector<8xf32> + store %14, %1[%c0_5, %9] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_6) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %15) + %18 = vector.transfer_read %arg2[%16, %17], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %19 = affine.load %2[((%arg5 + %15 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %15 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %20 = addf %18, %19 : vector<8xf32> + store %20, %1[%c0_5, %15] : memref<1x16xvector<8xf32>> + %21 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_6) + %22 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %23 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %21) + %24 = vector.transfer_read %arg2[%22, %23], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %25 = affine.load %2[((%arg5 + %21 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %21 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %26 = addf %24, %25 : vector<8xf32> + store %26, %1[%c0_5, %21] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_6) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %27) + %30 = vector.transfer_read %arg2[%28, %29], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %31 = affine.load %2[((%arg5 + %27 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %27 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %32 = addf %30, %31 : vector<8xf32> + store %32, %1[%c0_5, %27] : memref<1x16xvector<8xf32>> + %33 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_6) + %34 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %35 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %33) + %36 = vector.transfer_read %arg2[%34, %35], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %37 = affine.load %2[((%arg5 + %33 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %33 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %38 = addf %36, %37 : vector<8xf32> + store %38, %1[%c0_5, %33] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_6) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %41 = affine.apply affine_map<(d0, d1, 
d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %39) + %42 = vector.transfer_read %arg2[%40, %41], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %43 = affine.load %2[((%arg5 + %39 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %39 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %44 = addf %42, %43 : vector<8xf32> + store %44, %1[%c0_5, %39] : memref<1x16xvector<8xf32>> + %45 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_6) + %46 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %47 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %45) + %48 = vector.transfer_read %arg2[%46, %47], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %49 = affine.load %2[((%arg5 + %45 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %45 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %50 = addf %48, %49 : vector<8xf32> + store %50, %1[%c0_5, %45] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_6) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %51) + %54 = vector.transfer_read %arg2[%52, %53], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %55 = affine.load %2[((%arg5 + %51 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %51 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %56 = addf %54, %55 : vector<8xf32> + store %56, %1[%c0_5, %51] : memref<1x16xvector<8xf32>> + %57 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_6) + %58 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %59 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %57) + %60 = vector.transfer_read %arg2[%58, %59], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %61 = affine.load %2[((%arg5 + %57 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %57 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %62 = addf %60, %61 : vector<8xf32> + store %62, %1[%c0_5, %57] : memref<1x16xvector<8xf32>> + %63 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_6) + %64 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %65 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %63) + %66 = vector.transfer_read %arg2[%64, %65], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %67 = affine.load %2[((%arg5 + %63 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %63 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %68 = addf %66, %67 : vector<8xf32> + store %68, %1[%c0_5, %63] : memref<1x16xvector<8xf32>> + %69 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_6) + %70 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %71 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %69) + %72 = vector.transfer_read %arg2[%70, %71], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %73 = affine.load %2[((%arg5 + %69 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %69 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = addf 
%72, %73 : vector<8xf32> + store %74, %1[%c0_5, %69] : memref<1x16xvector<8xf32>> + %75 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_6) + %76 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %77 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %75) + %78 = vector.transfer_read %arg2[%76, %77], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %79 = affine.load %2[((%arg5 + %75 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %75 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %80 = addf %78, %79 : vector<8xf32> + store %80, %1[%c0_5, %75] : memref<1x16xvector<8xf32>> + %81 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_6) + %82 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %83 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %81) + %84 = vector.transfer_read %arg2[%82, %83], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %85 = affine.load %2[((%arg5 + %81 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %81 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %86 = addf %84, %85 : vector<8xf32> + store %86, %1[%c0_5, %81] : memref<1x16xvector<8xf32>> + %87 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_6) + %88 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %89 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %87) + %90 = vector.transfer_read %arg2[%88, %89], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %91 = affine.load %2[((%arg5 + %87 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %87 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = addf %90, %91 : vector<8xf32> + store %92, %1[%c0_5, %87] : memref<1x16xvector<8xf32>> + %93 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_6) + %94 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_5) + %95 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %93) + %96 = vector.transfer_read %arg2[%94, %95], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %97 = affine.load %2[((%arg5 + %93 * 8) floordiv 16) mod 16, (%c0_0 + %c0_5) mod 6, (((%arg5 + %93 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = addf %96, %97 : vector<8xf32> + store %98, %1[%c0_5, %93] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %99 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_4) + %100 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %101 = load %1[%c0_4, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %101, %arg2[%99, %100] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + } else { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %c0_3) + %6 = vector.transfer_read %arg2[%4, %5], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %7 = affine.load %2[((%arg5 + 
%c0_3 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %c0_3 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %8 = addf %6, %7 : vector<8xf32> + store %8, %1[%c0_2, %c0_3] : memref<1x16xvector<8xf32>> + %9 = affine.apply affine_map<(d0) -> (d0 + 1)>(%c0_3) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %9) + %12 = vector.transfer_read %arg2[%10, %11], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %13 = affine.load %2[((%arg5 + %9 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %14 = addf %12, %13 : vector<8xf32> + store %14, %1[%c0_2, %9] : memref<1x16xvector<8xf32>> + %15 = affine.apply affine_map<(d0) -> (d0 + 2)>(%c0_3) + %16 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %17 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %15) + %18 = vector.transfer_read %arg2[%16, %17], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %19 = affine.load %2[((%arg5 + %15 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %15 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %20 = addf %18, %19 : vector<8xf32> + store %20, %1[%c0_2, %15] : memref<1x16xvector<8xf32>> + %21 = affine.apply affine_map<(d0) -> (d0 + 3)>(%c0_3) + %22 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %23 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %21) + %24 = vector.transfer_read %arg2[%22, %23], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %25 = affine.load %2[((%arg5 + %21 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %21 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %26 = addf %24, %25 : vector<8xf32> + store %26, %1[%c0_2, %21] : memref<1x16xvector<8xf32>> + %27 = affine.apply affine_map<(d0) -> (d0 + 4)>(%c0_3) + %28 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %29 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %27) + %30 = vector.transfer_read %arg2[%28, %29], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %31 = affine.load %2[((%arg5 + %27 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %27 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %32 = addf %30, %31 : vector<8xf32> + store %32, %1[%c0_2, %27] : memref<1x16xvector<8xf32>> + %33 = affine.apply affine_map<(d0) -> (d0 + 5)>(%c0_3) + %34 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %35 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %33) + %36 = vector.transfer_read %arg2[%34, %35], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %37 = affine.load %2[((%arg5 + %33 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %33 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %38 = addf %36, %37 : vector<8xf32> + store %38, %1[%c0_2, %33] : memref<1x16xvector<8xf32>> + %39 = affine.apply affine_map<(d0) -> (d0 + 6)>(%c0_3) + %40 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %41 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %39) + %42 = 
vector.transfer_read %arg2[%40, %41], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %43 = affine.load %2[((%arg5 + %39 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %39 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %44 = addf %42, %43 : vector<8xf32> + store %44, %1[%c0_2, %39] : memref<1x16xvector<8xf32>> + %45 = affine.apply affine_map<(d0) -> (d0 + 7)>(%c0_3) + %46 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %47 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %45) + %48 = vector.transfer_read %arg2[%46, %47], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %49 = affine.load %2[((%arg5 + %45 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %45 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %50 = addf %48, %49 : vector<8xf32> + store %50, %1[%c0_2, %45] : memref<1x16xvector<8xf32>> + %51 = affine.apply affine_map<(d0) -> (d0 + 8)>(%c0_3) + %52 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %53 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %51) + %54 = vector.transfer_read %arg2[%52, %53], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %55 = affine.load %2[((%arg5 + %51 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %51 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %56 = addf %54, %55 : vector<8xf32> + store %56, %1[%c0_2, %51] : memref<1x16xvector<8xf32>> + %57 = affine.apply affine_map<(d0) -> (d0 + 9)>(%c0_3) + %58 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %59 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %57) + %60 = vector.transfer_read %arg2[%58, %59], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %61 = affine.load %2[((%arg5 + %57 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %57 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %62 = addf %60, %61 : vector<8xf32> + store %62, %1[%c0_2, %57] : memref<1x16xvector<8xf32>> + %63 = affine.apply affine_map<(d0) -> (d0 + 10)>(%c0_3) + %64 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %65 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %63) + %66 = vector.transfer_read %arg2[%64, %65], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %67 = affine.load %2[((%arg5 + %63 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %63 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %68 = addf %66, %67 : vector<8xf32> + store %68, %1[%c0_2, %63] : memref<1x16xvector<8xf32>> + %69 = affine.apply affine_map<(d0) -> (d0 + 11)>(%c0_3) + %70 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %71 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %69) + %72 = vector.transfer_read %arg2[%70, %71], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %73 = affine.load %2[((%arg5 + %69 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %69 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = addf %72, %73 : vector<8xf32> + store %74, %1[%c0_2, %69] : memref<1x16xvector<8xf32>> + %75 = affine.apply affine_map<(d0) -> (d0 + 12)>(%c0_3) + %76 = affine.apply 
affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %77 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %75) + %78 = vector.transfer_read %arg2[%76, %77], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %79 = affine.load %2[((%arg5 + %75 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %75 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %80 = addf %78, %79 : vector<8xf32> + store %80, %1[%c0_2, %75] : memref<1x16xvector<8xf32>> + %81 = affine.apply affine_map<(d0) -> (d0 + 13)>(%c0_3) + %82 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %83 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %81) + %84 = vector.transfer_read %arg2[%82, %83], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %85 = affine.load %2[((%arg5 + %81 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %81 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %86 = addf %84, %85 : vector<8xf32> + store %86, %1[%c0_2, %81] : memref<1x16xvector<8xf32>> + %87 = affine.apply affine_map<(d0) -> (d0 + 14)>(%c0_3) + %88 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %89 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %87) + %90 = vector.transfer_read %arg2[%88, %89], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %91 = affine.load %2[((%arg5 + %87 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %87 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = addf %90, %91 : vector<8xf32> + store %92, %1[%c0_2, %87] : memref<1x16xvector<8xf32>> + %93 = affine.apply affine_map<(d0) -> (d0 + 15)>(%c0_3) + %94 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_2) + %95 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %93) + %96 = vector.transfer_read %arg2[%94, %95], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %97 = affine.load %2[((%arg5 + %93 * 8) floordiv 16) mod 16, (%c0_0 + %c0_2) mod 6, (((%arg5 + %93 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = addf %96, %97 : vector<8xf32> + store %98, %1[%c0_2, %93] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %99 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %c0_0, %c0_1) + %100 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %101 = load %1[%c0_1, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %101, %arg2[%99, %100] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 784 : i64, index = #accln<"index{i_o,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + func 
@optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/5_LinalgLowerToAffineLoops.mlir b/Tutorials/optimized_matmul/mlir/5_LinalgLowerToAffineLoops.mlir new file mode 100644 index 00000000..04929b29 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/5_LinalgLowerToAffineLoops.mlir @@ -0,0 +1,988 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 128 { + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %9 = 
vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + 
%34 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + affine.store %36, %3[((%arg5 + %c0 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c0 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + affine.store %37, %3[((%arg5 + %c1 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c1 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %38 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + affine.store %38, %3[((%arg5 + %c2 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c2 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + affine.store %39, %3[((%arg5 + %c3 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c3 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %40 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + affine.store %40, %3[((%arg5 + %c4 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c4 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %41 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + affine.store %41, %3[((%arg5 + %c5 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c5 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + affine.store %42, %3[((%arg5 + %c6 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c6 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + affine.store %43, %3[((%arg5 + %c7 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c7 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %44 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + affine.store %44, %3[((%arg5 + %c8 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c8 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + affine.store %45, %3[((%arg5 + %c9 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c9 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %46 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + affine.store %46, %3[((%arg5 + %c10 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c10 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %47 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + affine.store %47, %3[((%arg5 + %c11 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c11 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + affine.store %48, %3[((%arg5 + %c12 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c12 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %49 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + affine.store %49, %3[((%arg5 + %c13 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c13 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %50 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + affine.store %50, %3[((%arg5 + %c14 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c14 * 8) mod 
16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.store %51, %3[((%arg5 + %c15 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c15 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } else { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + affine.store %36, %3[((%arg5 + %c0 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c0 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + affine.store %37, %3[((%arg5 + %c1 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c1 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %38 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + affine.store %38, %3[((%arg5 + %c2 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c2 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + affine.store %39, %3[((%arg5 + %c3 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c3 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %40 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + affine.store %40, %3[((%arg5 + %c4 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c4 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %41 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + affine.store %41, %3[((%arg5 + %c5 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c5 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + affine.store %42, %3[((%arg5 + %c6 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c6 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + affine.store %43, %3[((%arg5 + %c7 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c7 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %44 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + affine.store %44, %3[((%arg5 + %c8 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c8 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + affine.store %45, %3[((%arg5 + %c9 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c9 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %46 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + affine.store %46, %3[((%arg5 + %c10 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c10 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %47 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + affine.store %47, %3[((%arg5 + %c11 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c11 * 8) mod 16) floordiv 8) mod 2] : 
memref<16x128x2xvector<8xf32>> + %48 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + affine.store %48, %3[((%arg5 + %c12 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c12 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %49 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + affine.store %49, %3[((%arg5 + %c13 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c13 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %50 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + affine.store %50, %3[((%arg5 + %c14 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c14 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.store %51, %3[((%arg5 + %c15 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c15 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,21}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i_o,19}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 256]} + affine.for %arg4 = 0 to 784 { + affine.for %arg5 = 0 to 16 { + affine.for %arg6 = 0 to 6 { + affine.for %arg7 = 0 to 2 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 2 : i64, index = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 6 : i64, index = #accln<"index{j_i_i_o,15}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 2]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i,14}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 6, 2]} + affine.for %arg5 = 0 to 256 step 16 { + affine.for %arg6 = 0 to 128 step 4 { + affine.for %arg7 = 0 to 0 step 6 { + affine.for %arg8 = 0 to 4 { + affine.for %arg9 = 0 to 0 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %13 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %15 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %19 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, 
%arg8) + %21 = load %arg0[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg0[%10, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = load %arg0[%11, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = affine.load %3[((%12 - %arg3) floordiv 16) mod 16, (%13 - %c0) mod 128, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %30 = vector.extractelement %29[%c0_i64 : i64] : vector<8xf32> + %31 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %32 = affine.load %3[((%31 - %arg3) floordiv 16) mod 16, (%14 - %c0) mod 128, (((%31 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c1_i64 : i64] : vector<8xf32> + %34 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %35 = affine.load %3[((%34 - %arg3) floordiv 16) mod 16, (%15 - %c0) mod 128, (((%34 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %36 = vector.extractelement %35[%c2_i64 : i64] : vector<8xf32> + %37 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %38 = affine.load %3[((%37 - %arg3) floordiv 16) mod 16, (%16 - %c0) mod 128, (((%37 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c3_i64 : i64] : vector<8xf32> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %41 = affine.load %3[((%40 - %arg3) floordiv 16) mod 16, (%17 - %c0) mod 128, (((%40 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c4_i64 : i64] : vector<8xf32> + %43 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %44 = affine.load %3[((%43 - %arg3) floordiv 16) mod 16, (%18 - %c0) mod 128, (((%43 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = vector.extractelement %44[%c5_i64 : i64] : vector<8xf32> + %46 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %47 = affine.load %3[((%46 - %arg3) floordiv 16) mod 16, (%19 - %c0) mod 128, (((%46 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c6_i64 : i64] : vector<8xf32> + %49 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %50 = affine.load %3[((%49 - %arg3) floordiv 16) mod 16, (%20 - %c0) mod 128, (((%49 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = vector.extractelement %50[%c7_i64 : i64] : vector<8xf32> + %52 = "accv.bin_op"(%21, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %36) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %42) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %45) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = "accv.bin_op"(%27, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = "accv.bin_op"(%28, %51) {predicate = 2 : 
i64} : (f32, f32) -> f32 + %60 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c0_i64 : i64] : vector<8xf32> + %62 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %63 = affine.load %2[((%62 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%62 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = vector.extractelement %63[%c1_i64 : i64] : vector<8xf32> + %65 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %66 = affine.load %2[((%65 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%65 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c2_i64 : i64] : vector<8xf32> + %68 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %69 = affine.load %2[((%68 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%68 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c3_i64 : i64] : vector<8xf32> + %71 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %72 = affine.load %2[((%71 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%71 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c4_i64 : i64] : vector<8xf32> + %74 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %75 = affine.load %2[((%74 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%74 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %78 = affine.load %2[((%77 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%77 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c6_i64 : i64] : vector<8xf32> + %80 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %81 = affine.load %2[((%80 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%80 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = vector.extractelement %81[%c7_i64 : i64] : vector<8xf32> + %83 = "accv.bin_op"(%61, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%64, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%67, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%70, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%73, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%76, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = "accv.bin_op"(%79, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + %90 = "accv.bin_op"(%82, %59) {predicate = 0 : i64} : (f32, f32) -> f32 + %91 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %83, %91[%c0_i64 : i64] : vector<8xf32> + affine.store %92, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %94 = affine.load %2[((%93 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %84, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[((%93 - %arg3) 
floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%93 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %97 = affine.load %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c2_i64 : i64] : vector<8xf32> + affine.store %98, %2[((%96 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%96 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %100 = affine.load %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %86, %100[%c3_i64 : i64] : vector<8xf32> + affine.store %101, %2[((%99 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%99 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %103 = affine.load %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %87, %103[%c4_i64 : i64] : vector<8xf32> + affine.store %104, %2[((%102 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%102 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %106 = affine.load %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %88, %106[%c5_i64 : i64] : vector<8xf32> + affine.store %107, %2[((%105 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%105 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %109 = affine.load %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %89, %109[%c6_i64 : i64] : vector<8xf32> + affine.store %110, %2[((%108 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %112 = affine.load %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %90, %112[%c7_i64 : i64] : vector<8xf32> + affine.store %113, %2[((%111 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.load %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %83, %114[%c0_i64 : i64] : vector<8xf32> + affine.store %115, %2[((%12 - %arg3) floordiv 16) mod 16, (%4 - %arg4) mod 6, (((%12 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %117 = affine.load %2[((%116 - %arg3) floordiv 16) mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %84, %117[%c1_i64 : i64] : vector<8xf32> + affine.store %118, %2[((%116 - %arg3) floordiv 16) 
mod 16, (%5 - %arg4) mod 6, (((%116 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %120 = affine.load %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %85, %120[%c2_i64 : i64] : vector<8xf32> + affine.store %121, %2[((%119 - %arg3) floordiv 16) mod 16, (%6 - %arg4) mod 6, (((%119 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %123 = affine.load %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %124 = vector.insertelement %86, %123[%c3_i64 : i64] : vector<8xf32> + affine.store %124, %2[((%122 - %arg3) floordiv 16) mod 16, (%7 - %arg4) mod 6, (((%122 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %126 = affine.load %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %87, %126[%c4_i64 : i64] : vector<8xf32> + affine.store %127, %2[((%125 - %arg3) floordiv 16) mod 16, (%8 - %arg4) mod 6, (((%125 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %129 = affine.load %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %130 = vector.insertelement %88, %129[%c5_i64 : i64] : vector<8xf32> + affine.store %130, %2[((%128 - %arg3) floordiv 16) mod 16, (%9 - %arg4) mod 6, (((%128 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %132 = affine.load %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %133 = vector.insertelement %89, %132[%c6_i64 : i64] : vector<8xf32> + affine.store %133, %2[((%131 - %arg3) floordiv 16) mod 16, (%10 - %arg4) mod 6, (((%131 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %135 = affine.load %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %136 = vector.insertelement %90, %135[%c7_i64 : i64] : vector<8xf32> + affine.store %136, %2[((%134 - %arg3) floordiv 16) mod 16, (%11 - %arg4) mod 6, (((%134 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %137 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %138 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %139 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %140 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %141 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %142 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %143 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %144 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %145 = affine.apply affine_map<(d0, d1) 
-> (d0 + d1 + 8)>(%arg3, %arg5) + %146 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %147 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %148 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %149 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %150 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %151 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %152 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %153 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %154 = load %arg0[%137, %146] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg0[%138, %147] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg0[%139, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg0[%140, %149] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %158 = load %arg0[%141, %150] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg0[%142, %151] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %160 = load %arg0[%143, %152] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = load %arg0[%144, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = affine.load %3[((%145 - %arg3) floordiv 16) mod 16, (%146 - %c0) mod 128, (((%145 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %165 = affine.load %3[((%164 - %arg3) floordiv 16) mod 16, (%147 - %c0) mod 128, (((%164 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c1_i64 : i64] : vector<8xf32> + %167 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %168 = affine.load %3[((%167 - %arg3) floordiv 16) mod 16, (%148 - %c0) mod 128, (((%167 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c2_i64 : i64] : vector<8xf32> + %170 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %171 = affine.load %3[((%170 - %arg3) floordiv 16) mod 16, (%149 - %c0) mod 128, (((%170 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %172 = vector.extractelement %171[%c3_i64 : i64] : vector<8xf32> + %173 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %174 = affine.load %3[((%173 - %arg3) floordiv 16) mod 16, (%150 - %c0) mod 128, (((%173 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c4_i64 : i64] : vector<8xf32> + %176 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %177 = affine.load %3[((%176 - %arg3) floordiv 16) mod 16, (%151 - %c0) mod 128, (((%176 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %178 = vector.extractelement %177[%c5_i64 : i64] : vector<8xf32> + %179 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %180 = affine.load %3[((%179 - %arg3) floordiv 16) mod 16, (%152 - %c0) mod 128, (((%179 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %181 = vector.extractelement %180[%c6_i64 : i64] : vector<8xf32> + %182 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %183 = affine.load %3[((%182 - %arg3) floordiv 16) mod 16, (%153 - %c0) mod 128, (((%182 - 
%arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %184 = vector.extractelement %183[%c7_i64 : i64] : vector<8xf32> + %185 = "accv.bin_op"(%154, %163) {predicate = 2 : i64} : (f32, f32) -> f32 + %186 = "accv.bin_op"(%155, %166) {predicate = 2 : i64} : (f32, f32) -> f32 + %187 = "accv.bin_op"(%156, %169) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = "accv.bin_op"(%157, %172) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = "accv.bin_op"(%158, %175) {predicate = 2 : i64} : (f32, f32) -> f32 + %190 = "accv.bin_op"(%159, %178) {predicate = 2 : i64} : (f32, f32) -> f32 + %191 = "accv.bin_op"(%160, %181) {predicate = 2 : i64} : (f32, f32) -> f32 + %192 = "accv.bin_op"(%161, %184) {predicate = 2 : i64} : (f32, f32) -> f32 + %193 = affine.load %2[((%145 - %arg3) floordiv 16) mod 16, (%137 - %arg4) mod 6, (((%145 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c0_i64 : i64] : vector<8xf32> + %195 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %196 = affine.load %2[((%195 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%195 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c1_i64 : i64] : vector<8xf32> + %198 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %199 = affine.load %2[((%198 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%198 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %200 = vector.extractelement %199[%c2_i64 : i64] : vector<8xf32> + %201 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %202 = affine.load %2[((%201 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%201 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %203 = vector.extractelement %202[%c3_i64 : i64] : vector<8xf32> + %204 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %205 = affine.load %2[((%204 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%204 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %206 = vector.extractelement %205[%c4_i64 : i64] : vector<8xf32> + %207 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %208 = affine.load %2[((%207 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%207 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %209 = vector.extractelement %208[%c5_i64 : i64] : vector<8xf32> + %210 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %211 = affine.load %2[((%210 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%210 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %212 = vector.extractelement %211[%c6_i64 : i64] : vector<8xf32> + %213 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %214 = affine.load %2[((%213 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%213 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %215 = vector.extractelement %214[%c7_i64 : i64] : vector<8xf32> + %216 = "accv.bin_op"(%194, %185) {predicate = 0 : i64} : (f32, f32) -> f32 + %217 = "accv.bin_op"(%197, %186) {predicate = 0 : i64} : (f32, f32) -> f32 + %218 = "accv.bin_op"(%200, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + %219 = "accv.bin_op"(%203, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + %220 = "accv.bin_op"(%206, %189) {predicate = 0 : i64} : (f32, f32) -> f32 + %221 = "accv.bin_op"(%209, %190) {predicate = 0 : i64} : (f32, f32) -> f32 + %222 = 
"accv.bin_op"(%212, %191) {predicate = 0 : i64} : (f32, f32) -> f32 + %223 = "accv.bin_op"(%215, %192) {predicate = 0 : i64} : (f32, f32) -> f32 + %224 = affine.load %2[((%145 - %arg3) floordiv 16) mod 16, (%137 - %arg4) mod 6, (((%145 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %216, %224[%c0_i64 : i64] : vector<8xf32> + affine.store %225, %2[((%145 - %arg3) floordiv 16) mod 16, (%137 - %arg4) mod 6, (((%145 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %226 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %227 = affine.load %2[((%226 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%226 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %217, %227[%c1_i64 : i64] : vector<8xf32> + affine.store %228, %2[((%226 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%226 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %229 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %230 = affine.load %2[((%229 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%229 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %218, %230[%c2_i64 : i64] : vector<8xf32> + affine.store %231, %2[((%229 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%229 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %232 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %233 = affine.load %2[((%232 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%232 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %219, %233[%c3_i64 : i64] : vector<8xf32> + affine.store %234, %2[((%232 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%232 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %235 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %236 = affine.load %2[((%235 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%235 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %220, %236[%c4_i64 : i64] : vector<8xf32> + affine.store %237, %2[((%235 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%235 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %238 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %239 = affine.load %2[((%238 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%238 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %240 = vector.insertelement %221, %239[%c5_i64 : i64] : vector<8xf32> + affine.store %240, %2[((%238 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%238 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %241 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %242 = affine.load %2[((%241 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%241 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %243 = vector.insertelement %222, %242[%c6_i64 : i64] : vector<8xf32> + affine.store %243, %2[((%241 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%241 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %244 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %245 = affine.load %2[((%244 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%244 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %246 = 
vector.insertelement %223, %245[%c7_i64 : i64] : vector<8xf32> + affine.store %246, %2[((%244 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%244 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = affine.load %2[((%145 - %arg3) floordiv 16) mod 16, (%137 - %arg4) mod 6, (((%145 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = vector.insertelement %216, %247[%c0_i64 : i64] : vector<8xf32> + affine.store %248, %2[((%145 - %arg3) floordiv 16) mod 16, (%137 - %arg4) mod 6, (((%145 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %249 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %250 = affine.load %2[((%249 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%249 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %251 = vector.insertelement %217, %250[%c1_i64 : i64] : vector<8xf32> + affine.store %251, %2[((%249 - %arg3) floordiv 16) mod 16, (%138 - %arg4) mod 6, (((%249 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %252 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %253 = affine.load %2[((%252 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%252 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %254 = vector.insertelement %218, %253[%c2_i64 : i64] : vector<8xf32> + affine.store %254, %2[((%252 - %arg3) floordiv 16) mod 16, (%139 - %arg4) mod 6, (((%252 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %255 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %256 = affine.load %2[((%255 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%255 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %257 = vector.insertelement %219, %256[%c3_i64 : i64] : vector<8xf32> + affine.store %257, %2[((%255 - %arg3) floordiv 16) mod 16, (%140 - %arg4) mod 6, (((%255 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %258 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %259 = affine.load %2[((%258 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%258 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %260 = vector.insertelement %220, %259[%c4_i64 : i64] : vector<8xf32> + affine.store %260, %2[((%258 - %arg3) floordiv 16) mod 16, (%141 - %arg4) mod 6, (((%258 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %261 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %262 = affine.load %2[((%261 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%261 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %263 = vector.insertelement %221, %262[%c5_i64 : i64] : vector<8xf32> + affine.store %263, %2[((%261 - %arg3) floordiv 16) mod 16, (%142 - %arg4) mod 6, (((%261 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %264 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %265 = affine.load %2[((%264 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%264 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %266 = vector.insertelement %222, %265[%c6_i64 : i64] : vector<8xf32> + affine.store %266, %2[((%264 - %arg3) floordiv 16) mod 16, (%143 - %arg4) mod 6, (((%264 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %267 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %268 = affine.load %2[((%267 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%267 - %arg3) 
mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %269 = vector.insertelement %223, %268[%c7_i64 : i64] : vector<8xf32> + affine.store %269, %2[((%267 - %arg3) floordiv 16) mod 16, (%144 - %arg4) mod 6, (((%267 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 4]} + affine.for %arg7 = 0 to 4 { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %7 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %11 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %13 = load %arg0[%arg4, %5] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%arg4, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = load %arg0[%arg4, %7] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %16 = load %arg0[%arg4, %8] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %18 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg0[%arg4, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %20 = load %arg0[%arg4, %12] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = affine.load %3[((%4 - %arg3) floordiv 16) mod 16, (%5 - %c0) mod 128, (((%4 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %22 = vector.extractelement %21[%c0_i64 : i64] : vector<8xf32> + %23 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %24 = affine.load %3[((%23 - %arg3) floordiv 16) mod 16, (%6 - %c0) mod 128, (((%23 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %25 = vector.extractelement %24[%c1_i64 : i64] : vector<8xf32> + %26 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %27 = affine.load %3[((%26 - %arg3) floordiv 16) mod 16, (%7 - %c0) mod 128, (((%26 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %28 = vector.extractelement %27[%c2_i64 : i64] : vector<8xf32> + %29 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %30 = affine.load %3[((%29 - %arg3) floordiv 16) mod 16, (%8 - %c0) mod 128, (((%29 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %31 = vector.extractelement %30[%c3_i64 : i64] : vector<8xf32> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %33 = affine.load %3[((%32 - %arg3) floordiv 16) mod 16, (%9 - %c0) mod 128, (((%32 - %arg3) mod 16) floordiv 8) mod 2] : 
memref<16x128x2xvector<8xf32>> + %34 = vector.extractelement %33[%c4_i64 : i64] : vector<8xf32> + %35 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %36 = affine.load %3[((%35 - %arg3) floordiv 16) mod 16, (%10 - %c0) mod 128, (((%35 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = vector.extractelement %36[%c5_i64 : i64] : vector<8xf32> + %38 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %39 = affine.load %3[((%38 - %arg3) floordiv 16) mod 16, (%11 - %c0) mod 128, (((%38 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %40 = vector.extractelement %39[%c6_i64 : i64] : vector<8xf32> + %41 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %42 = affine.load %3[((%41 - %arg3) floordiv 16) mod 16, (%12 - %c0) mod 128, (((%41 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = vector.extractelement %42[%c7_i64 : i64] : vector<8xf32> + %44 = "accv.bin_op"(%13, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = "accv.bin_op"(%14, %25) {predicate = 2 : i64} : (f32, f32) -> f32 + %46 = "accv.bin_op"(%15, %28) {predicate = 2 : i64} : (f32, f32) -> f32 + %47 = "accv.bin_op"(%16, %31) {predicate = 2 : i64} : (f32, f32) -> f32 + %48 = "accv.bin_op"(%17, %34) {predicate = 2 : i64} : (f32, f32) -> f32 + %49 = "accv.bin_op"(%18, %37) {predicate = 2 : i64} : (f32, f32) -> f32 + %50 = "accv.bin_op"(%19, %40) {predicate = 2 : i64} : (f32, f32) -> f32 + %51 = "accv.bin_op"(%20, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %52 = affine.load %2[((%4 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%4 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %53 = vector.extractelement %52[%c0_i64 : i64] : vector<8xf32> + %54 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %55 = affine.load %2[((%54 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%54 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %56 = vector.extractelement %55[%c1_i64 : i64] : vector<8xf32> + %57 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %58 = affine.load %2[((%57 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%57 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = vector.extractelement %58[%c2_i64 : i64] : vector<8xf32> + %60 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %61 = affine.load %2[((%60 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%60 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %62 = vector.extractelement %61[%c3_i64 : i64] : vector<8xf32> + %63 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %64 = affine.load %2[((%63 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%63 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %65 = vector.extractelement %64[%c4_i64 : i64] : vector<8xf32> + %66 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %67 = affine.load %2[((%66 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%66 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c5_i64 : i64] : vector<8xf32> + %69 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %70 = affine.load %2[((%69 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%69 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %71 = vector.extractelement %70[%c6_i64 : i64] : vector<8xf32> + %72 = affine.apply affine_map<(d0, 
d1) -> (d0 + d1)>(%arg3, %arg5) + %73 = affine.load %2[((%72 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%72 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c7_i64 : i64] : vector<8xf32> + %75 = "accv.bin_op"(%53, %44) {predicate = 0 : i64} : (f32, f32) -> f32 + %76 = "accv.bin_op"(%56, %45) {predicate = 0 : i64} : (f32, f32) -> f32 + %77 = "accv.bin_op"(%59, %46) {predicate = 0 : i64} : (f32, f32) -> f32 + %78 = "accv.bin_op"(%62, %47) {predicate = 0 : i64} : (f32, f32) -> f32 + %79 = "accv.bin_op"(%65, %48) {predicate = 0 : i64} : (f32, f32) -> f32 + %80 = "accv.bin_op"(%68, %49) {predicate = 0 : i64} : (f32, f32) -> f32 + %81 = "accv.bin_op"(%71, %50) {predicate = 0 : i64} : (f32, f32) -> f32 + %82 = "accv.bin_op"(%74, %51) {predicate = 0 : i64} : (f32, f32) -> f32 + %83 = affine.load %2[((%4 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%4 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %84 = vector.insertelement %75, %83[%c0_i64 : i64] : vector<8xf32> + affine.store %84, %2[((%4 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%4 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %85 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %86 = affine.load %2[((%85 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%85 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %87 = vector.insertelement %76, %86[%c1_i64 : i64] : vector<8xf32> + affine.store %87, %2[((%85 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%85 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %89 = affine.load %2[((%88 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%88 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %77, %89[%c2_i64 : i64] : vector<8xf32> + affine.store %90, %2[((%88 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%88 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %91 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %92 = affine.load %2[((%91 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%91 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = vector.insertelement %78, %92[%c3_i64 : i64] : vector<8xf32> + affine.store %93, %2[((%91 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%91 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %95 = affine.load %2[((%94 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%94 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %79, %95[%c4_i64 : i64] : vector<8xf32> + affine.store %96, %2[((%94 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%94 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %97 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %98 = affine.load %2[((%97 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%97 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %80, %98[%c5_i64 : i64] : vector<8xf32> + affine.store %99, %2[((%97 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%97 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %100 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %101 = 
affine.load %2[((%100 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%100 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %81, %101[%c6_i64 : i64] : vector<8xf32> + affine.store %102, %2[((%100 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%100 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %103 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %104 = affine.load %2[((%103 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%103 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %82, %104[%c7_i64 : i64] : vector<8xf32> + affine.store %105, %2[((%103 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%103 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %106 = affine.load %2[((%4 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%4 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %75, %106[%c0_i64 : i64] : vector<8xf32> + affine.store %107, %2[((%4 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%4 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %109 = affine.load %2[((%108 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %76, %109[%c1_i64 : i64] : vector<8xf32> + affine.store %110, %2[((%108 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%108 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %112 = affine.load %2[((%111 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %77, %112[%c2_i64 : i64] : vector<8xf32> + affine.store %113, %2[((%111 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%111 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %115 = affine.load %2[((%114 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%114 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %78, %115[%c3_i64 : i64] : vector<8xf32> + affine.store %116, %2[((%114 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%114 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %117 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %118 = affine.load %2[((%117 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%117 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %79, %118[%c4_i64 : i64] : vector<8xf32> + affine.store %119, %2[((%117 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%117 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %121 = affine.load %2[((%120 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%120 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = vector.insertelement %80, %121[%c5_i64 : i64] : vector<8xf32> + affine.store %122, %2[((%120 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%120 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %123 = affine.apply affine_map<(d0, 
d1) -> (d0 + d1)>(%arg3, %arg5) + %124 = affine.load %2[((%123 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%123 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %81, %124[%c6_i64 : i64] : vector<8xf32> + affine.store %125, %2[((%123 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%123 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %126 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %127 = affine.load %2[((%126 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%126 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = vector.insertelement %82, %127[%c7_i64 : i64] : vector<8xf32> + affine.store %128, %2[((%126 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%126 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %129 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %130 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %131 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %132 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %133 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %134 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %135 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %136 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %137 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %138 = load %arg0[%arg4, %130] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg0[%arg4, %131] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %140 = load %arg0[%arg4, %132] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %141 = load %arg0[%arg4, %133] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %142 = load %arg0[%arg4, %134] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %143 = load %arg0[%arg4, %135] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg0[%arg4, %136] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg0[%arg4, %137] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %146 = affine.load %3[((%129 - %arg3) floordiv 16) mod 16, (%130 - %c0) mod 128, (((%129 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %147 = vector.extractelement %146[%c0_i64 : i64] : vector<8xf32> + %148 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %149 = affine.load %3[((%148 - %arg3) floordiv 16) mod 16, (%131 - %c0) mod 128, (((%148 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %150 = vector.extractelement %149[%c1_i64 : i64] : vector<8xf32> + %151 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %152 = affine.load %3[((%151 - %arg3) floordiv 16) mod 16, (%132 - %c0) mod 128, (((%151 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %153 = vector.extractelement %152[%c2_i64 : i64] : vector<8xf32> + %154 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %155 = affine.load %3[((%154 - %arg3) floordiv 16) mod 16, (%133 - %c0) mod 128, (((%154 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c3_i64 : i64] : vector<8xf32> + %157 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %158 = affine.load %3[((%157 - %arg3) floordiv 16) mod 16, (%134 - 
%c0) mod 128, (((%157 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %159 = vector.extractelement %158[%c4_i64 : i64] : vector<8xf32> + %160 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %161 = affine.load %3[((%160 - %arg3) floordiv 16) mod 16, (%135 - %c0) mod 128, (((%160 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c5_i64 : i64] : vector<8xf32> + %163 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %164 = affine.load %3[((%163 - %arg3) floordiv 16) mod 16, (%136 - %c0) mod 128, (((%163 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c6_i64 : i64] : vector<8xf32> + %166 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %167 = affine.load %3[((%166 - %arg3) floordiv 16) mod 16, (%137 - %c0) mod 128, (((%166 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c7_i64 : i64] : vector<8xf32> + %169 = "accv.bin_op"(%138, %147) {predicate = 2 : i64} : (f32, f32) -> f32 + %170 = "accv.bin_op"(%139, %150) {predicate = 2 : i64} : (f32, f32) -> f32 + %171 = "accv.bin_op"(%140, %153) {predicate = 2 : i64} : (f32, f32) -> f32 + %172 = "accv.bin_op"(%141, %156) {predicate = 2 : i64} : (f32, f32) -> f32 + %173 = "accv.bin_op"(%142, %159) {predicate = 2 : i64} : (f32, f32) -> f32 + %174 = "accv.bin_op"(%143, %162) {predicate = 2 : i64} : (f32, f32) -> f32 + %175 = "accv.bin_op"(%144, %165) {predicate = 2 : i64} : (f32, f32) -> f32 + %176 = "accv.bin_op"(%145, %168) {predicate = 2 : i64} : (f32, f32) -> f32 + %177 = affine.load %2[((%129 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%129 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %178 = vector.extractelement %177[%c0_i64 : i64] : vector<8xf32> + %179 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %180 = affine.load %2[((%179 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%179 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %181 = vector.extractelement %180[%c1_i64 : i64] : vector<8xf32> + %182 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %183 = affine.load %2[((%182 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%182 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %186 = affine.load %2[((%185 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%185 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c3_i64 : i64] : vector<8xf32> + %188 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %189 = affine.load %2[((%188 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%188 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c4_i64 : i64] : vector<8xf32> + %191 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %192 = affine.load %2[((%191 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%191 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c5_i64 : i64] : vector<8xf32> + %194 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %195 = affine.load %2[((%194 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) 
mod 6, (((%194 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %196 = vector.extractelement %195[%c6_i64 : i64] : vector<8xf32> + %197 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %198 = affine.load %2[((%197 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%197 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c7_i64 : i64] : vector<8xf32> + %200 = "accv.bin_op"(%178, %169) {predicate = 0 : i64} : (f32, f32) -> f32 + %201 = "accv.bin_op"(%181, %170) {predicate = 0 : i64} : (f32, f32) -> f32 + %202 = "accv.bin_op"(%184, %171) {predicate = 0 : i64} : (f32, f32) -> f32 + %203 = "accv.bin_op"(%187, %172) {predicate = 0 : i64} : (f32, f32) -> f32 + %204 = "accv.bin_op"(%190, %173) {predicate = 0 : i64} : (f32, f32) -> f32 + %205 = "accv.bin_op"(%193, %174) {predicate = 0 : i64} : (f32, f32) -> f32 + %206 = "accv.bin_op"(%196, %175) {predicate = 0 : i64} : (f32, f32) -> f32 + %207 = "accv.bin_op"(%199, %176) {predicate = 0 : i64} : (f32, f32) -> f32 + %208 = affine.load %2[((%129 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%129 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %209 = vector.insertelement %200, %208[%c0_i64 : i64] : vector<8xf32> + affine.store %209, %2[((%129 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%129 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %210 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %211 = affine.load %2[((%210 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%210 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %201, %211[%c1_i64 : i64] : vector<8xf32> + affine.store %212, %2[((%210 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%210 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %213 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %214 = affine.load %2[((%213 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%213 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %202, %214[%c2_i64 : i64] : vector<8xf32> + affine.store %215, %2[((%213 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%213 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %216 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %217 = affine.load %2[((%216 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%216 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %203, %217[%c3_i64 : i64] : vector<8xf32> + affine.store %218, %2[((%216 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%216 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %219 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %220 = affine.load %2[((%219 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%219 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %204, %220[%c4_i64 : i64] : vector<8xf32> + affine.store %221, %2[((%219 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%219 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %222 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %223 = affine.load %2[((%222 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%222 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %224 = 
vector.insertelement %205, %223[%c5_i64 : i64] : vector<8xf32> + affine.store %224, %2[((%222 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%222 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %225 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %226 = affine.load %2[((%225 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%225 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %206, %226[%c6_i64 : i64] : vector<8xf32> + affine.store %227, %2[((%225 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%225 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %228 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %229 = affine.load %2[((%228 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%228 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %207, %229[%c7_i64 : i64] : vector<8xf32> + affine.store %230, %2[((%228 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%228 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %231 = affine.load %2[((%129 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%129 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %200, %231[%c0_i64 : i64] : vector<8xf32> + affine.store %232, %2[((%129 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%129 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %233 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %234 = affine.load %2[((%233 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %201, %234[%c1_i64 : i64] : vector<8xf32> + affine.store %235, %2[((%233 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%233 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %236 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %237 = affine.load %2[((%236 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %238 = vector.insertelement %202, %237[%c2_i64 : i64] : vector<8xf32> + affine.store %238, %2[((%236 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%236 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %239 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %240 = affine.load %2[((%239 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %203, %240[%c3_i64 : i64] : vector<8xf32> + affine.store %241, %2[((%239 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%239 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %242 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %243 = affine.load %2[((%242 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %244 = vector.insertelement %204, %243[%c4_i64 : i64] : vector<8xf32> + affine.store %244, %2[((%242 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%242 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %245 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %246 = affine.load %2[((%245 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, 
(((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = vector.insertelement %205, %246[%c5_i64 : i64] : vector<8xf32> + affine.store %247, %2[((%245 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%245 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %249 = affine.load %2[((%248 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%248 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %250 = vector.insertelement %206, %249[%c6_i64 : i64] : vector<8xf32> + affine.store %250, %2[((%248 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%248 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %251 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %252 = affine.load %2[((%251 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%251 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %253 = vector.insertelement %207, %252[%c7_i64 : i64] : vector<8xf32> + affine.store %253, %2[((%251 - %arg3) floordiv 16) mod 16, (%arg4 - %arg4) mod 6, (((%251 - %arg3) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 128]} + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = affine.load %2[((%arg5 + %c0 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c0 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %7 = addf %5, %6 : vector<8xf32> + store %7, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %9 = vector.transfer_read %arg2[%arg4, %8], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %10 = affine.load %2[((%arg5 + %c1 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c1 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %11 = addf %9, %10 : vector<8xf32> + store %11, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %13 = vector.transfer_read %arg2[%arg4, %12], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %14 = affine.load %2[((%arg5 + %c2 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c2 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %15 = addf %13, %14 : vector<8xf32> + store %15, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %17 = vector.transfer_read %arg2[%arg4, %16], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %18 = affine.load 
%2[((%arg5 + %c3 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c3 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %19 = addf %17, %18 : vector<8xf32> + store %19, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %21 = vector.transfer_read %arg2[%arg4, %20], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %22 = affine.load %2[((%arg5 + %c4 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c4 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %23 = addf %21, %22 : vector<8xf32> + store %23, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %25 = vector.transfer_read %arg2[%arg4, %24], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %26 = affine.load %2[((%arg5 + %c5 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %27 = addf %25, %26 : vector<8xf32> + store %27, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %29 = vector.transfer_read %arg2[%arg4, %28], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %30 = affine.load %2[((%arg5 + %c6 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c6 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %31 = addf %29, %30 : vector<8xf32> + store %31, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = affine.load %2[((%arg5 + %c7 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c7 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %35 = addf %33, %34 : vector<8xf32> + store %35, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %36 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %37 = vector.transfer_read %arg2[%arg4, %36], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %38 = affine.load %2[((%arg5 + %c8 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c8 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %39 = addf %37, %38 : vector<8xf32> + store %39, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %41 = vector.transfer_read %arg2[%arg4, %40], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %42 = affine.load %2[((%arg5 + %c9 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %43 = addf %41, %42 : vector<8xf32> + store %43, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %44 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %45 = vector.transfer_read %arg2[%arg4, %44], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %46 = affine.load %2[((%arg5 + %c10 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c10 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %47 = addf %45, %46 : vector<8xf32> + store %47, %1[%c0, %c10] : 
memref<1x16xvector<8xf32>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %49 = vector.transfer_read %arg2[%arg4, %48], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %50 = affine.load %2[((%arg5 + %c11 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c11 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %51 = addf %49, %50 : vector<8xf32> + store %51, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %52 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %53 = vector.transfer_read %arg2[%arg4, %52], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %54 = affine.load %2[((%arg5 + %c12 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c12 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %55 = addf %53, %54 : vector<8xf32> + store %55, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %57 = vector.transfer_read %arg2[%arg4, %56], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %58 = affine.load %2[((%arg5 + %c13 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c13 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = addf %57, %58 : vector<8xf32> + store %59, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %60 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %61 = vector.transfer_read %arg2[%arg4, %60], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %62 = affine.load %2[((%arg5 + %c14 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c14 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %63 = addf %61, %62 : vector<8xf32> + store %63, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %65 = vector.transfer_read %arg2[%arg4, %64], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %66 = affine.load %2[((%arg5 + %c15 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c15 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = addf %65, %66 : vector<8xf32> + store %67, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %68 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %69 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %69, %arg2[%arg4, %68] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + } else { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = affine.load %2[((%arg5 + %c0 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c0 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %7 = addf %5, %6 : vector<8xf32> + store %7, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %9 = vector.transfer_read %arg2[%arg4, %8], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 
+ d1)>>, vector<8xf32> + %10 = affine.load %2[((%arg5 + %c1 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c1 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %11 = addf %9, %10 : vector<8xf32> + store %11, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %13 = vector.transfer_read %arg2[%arg4, %12], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %14 = affine.load %2[((%arg5 + %c2 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c2 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %15 = addf %13, %14 : vector<8xf32> + store %15, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %17 = vector.transfer_read %arg2[%arg4, %16], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %18 = affine.load %2[((%arg5 + %c3 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c3 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %19 = addf %17, %18 : vector<8xf32> + store %19, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %21 = vector.transfer_read %arg2[%arg4, %20], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %22 = affine.load %2[((%arg5 + %c4 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c4 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %23 = addf %21, %22 : vector<8xf32> + store %23, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %25 = vector.transfer_read %arg2[%arg4, %24], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %26 = affine.load %2[((%arg5 + %c5 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %27 = addf %25, %26 : vector<8xf32> + store %27, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %29 = vector.transfer_read %arg2[%arg4, %28], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %30 = affine.load %2[((%arg5 + %c6 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c6 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %31 = addf %29, %30 : vector<8xf32> + store %31, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = affine.load %2[((%arg5 + %c7 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c7 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %35 = addf %33, %34 : vector<8xf32> + store %35, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %36 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %37 = vector.transfer_read %arg2[%arg4, %36], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %38 = affine.load %2[((%arg5 + %c8 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c8 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %39 = addf %37, %38 : vector<8xf32> + store %39, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) 
+ %41 = vector.transfer_read %arg2[%arg4, %40], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %42 = affine.load %2[((%arg5 + %c9 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %43 = addf %41, %42 : vector<8xf32> + store %43, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %44 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %45 = vector.transfer_read %arg2[%arg4, %44], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %46 = affine.load %2[((%arg5 + %c10 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c10 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %47 = addf %45, %46 : vector<8xf32> + store %47, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %49 = vector.transfer_read %arg2[%arg4, %48], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %50 = affine.load %2[((%arg5 + %c11 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c11 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %51 = addf %49, %50 : vector<8xf32> + store %51, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %52 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %53 = vector.transfer_read %arg2[%arg4, %52], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %54 = affine.load %2[((%arg5 + %c12 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c12 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %55 = addf %53, %54 : vector<8xf32> + store %55, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %57 = vector.transfer_read %arg2[%arg4, %56], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %58 = affine.load %2[((%arg5 + %c13 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c13 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = addf %57, %58 : vector<8xf32> + store %59, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %60 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %61 = vector.transfer_read %arg2[%arg4, %60], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %62 = affine.load %2[((%arg5 + %c14 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c14 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %63 = addf %61, %62 : vector<8xf32> + store %63, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %65 = vector.transfer_read %arg2[%arg4, %64], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %66 = affine.load %2[((%arg5 + %c15 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c15 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = addf %65, %66 : vector<8xf32> + store %67, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %68 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %69 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %69, %arg2[%arg4, %68] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = 
[#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 784 : i64, index = #accln<"index{i_o,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/5_SimplifyAffineStructures.mlir b/Tutorials/optimized_matmul/mlir/5_SimplifyAffineStructures.mlir new file mode 100644 index 00000000..d8cf6fcd --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/5_SimplifyAffineStructures.mlir @@ -0,0 +1,1711 @@ +module @optimized_matmul { + accv.module "optimized_matmul" { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c780 = constant 780 : index + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 780 step 6 { + affine.for %arg5 = 0 to 256 step 16 { + affine.for %arg6 = 0 to 128 step 4 { + affine.for %arg7 = 0 to 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = "accv.bin_op"(%2, %3) {predicate = 2 : i64} : (f32, f32) -> f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = "accv.bin_op"(%5, %4) {predicate = 0 : i64} : (f32, f32) -> f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %10 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg1[%9, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %12 = "accv.bin_op"(%10, %11) {predicate = 2 : i64} : 
(f32, f32) -> f32 + %13 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = "accv.bin_op"(%13, %12) {predicate = 0 : i64} : (f32, f32) -> f32 + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %15, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %18 = load %arg0[%arg4, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg1[%17, %16] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = "accv.bin_op"(%18, %19) {predicate = 2 : i64} : (f32, f32) -> f32 + %21 = load %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = "accv.bin_op"(%21, %20) {predicate = 0 : i64} : (f32, f32) -> f32 + store %22, %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %23 = load %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %23, %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %25 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %26 = load %arg0[%arg4, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg1[%25, %24] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = "accv.bin_op"(%26, %27) {predicate = 2 : i64} : (f32, f32) -> f32 + %29 = load %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %30 = "accv.bin_op"(%29, %28) {predicate = 0 : i64} : (f32, f32) -> f32 + store %30, %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = load %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %31, %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %33 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %34 = load %arg0[%arg4, %33] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg1[%33, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = "accv.bin_op"(%34, %35) {predicate = 2 : i64} : (f32, f32) -> f32 + %37 = load %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %38 = "accv.bin_op"(%37, %36) {predicate = 0 : i64} : (f32, f32) -> f32 + store %38, %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = load %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %39, %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %41 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %42 = load %arg0[%arg4, %41] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %43 = load %arg1[%41, %40] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = "accv.bin_op"(%42, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = load %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = "accv.bin_op"(%45, %44) 
{predicate = 0 : i64} : (f32, f32) -> f32 + store %46, %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %47 = load %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %47, %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %49 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %50 = load %arg0[%arg4, %49] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %51 = load %arg1[%49, %48] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = "accv.bin_op"(%50, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = load %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %54 = "accv.bin_op"(%53, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + store %54, %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = load %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %55, %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %57 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %58 = load %arg0[%arg4, %57] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%57, %56] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = "accv.bin_op"(%58, %59) {predicate = 2 : i64} : (f32, f32) -> f32 + %61 = load %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = "accv.bin_op"(%61, %60) {predicate = 0 : i64} : (f32, f32) -> f32 + store %62, %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %65 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %66 = load %arg0[%arg4, %65] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %67 = load %arg1[%65, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %68 = "accv.bin_op"(%66, %67) {predicate = 2 : i64} : (f32, f32) -> f32 + %69 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = "accv.bin_op"(%69, %68) {predicate = 0 : i64} : (f32, f32) -> f32 + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %71, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %72 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %73 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %74 = load %arg0[%arg4, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = "accv.bin_op"(%74, %75) {predicate = 2 : i64} : (f32, f32) -> f32 + %77 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = "accv.bin_op"(%77, %76) {predicate = 0 : i64} : (f32, f32) -> f32 + store %78, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = 
load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %81 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %82 = load %arg0[%arg4, %81] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %83 = load %arg1[%81, %80] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = "accv.bin_op"(%82, %83) {predicate = 2 : i64} : (f32, f32) -> f32 + %85 = load %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %86 = "accv.bin_op"(%85, %84) {predicate = 0 : i64} : (f32, f32) -> f32 + store %86, %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = load %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %87, %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %89 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %90 = load %arg0[%arg4, %89] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %91 = load %arg1[%89, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = "accv.bin_op"(%90, %91) {predicate = 2 : i64} : (f32, f32) -> f32 + %93 = load %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = "accv.bin_op"(%93, %92) {predicate = 0 : i64} : (f32, f32) -> f32 + store %94, %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = load %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %95, %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %97 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %98 = load %arg0[%arg4, %97] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %99 = load %arg1[%97, %96] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %100 = "accv.bin_op"(%98, %99) {predicate = 2 : i64} : (f32, f32) -> f32 + %101 = load %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = "accv.bin_op"(%101, %100) {predicate = 0 : i64} : (f32, f32) -> f32 + store %102, %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = load %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %103, %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %106 = load %arg0[%arg4, %105] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %107 = load %arg1[%105, %104] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %108 = "accv.bin_op"(%106, %107) {predicate = 2 : i64} : (f32, f32) -> f32 + %109 = load %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %110 = "accv.bin_op"(%109, %108) {predicate = 0 : i64} : (f32, f32) -> f32 + store %110, %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = load %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %111, %arg2[%arg4, 
%104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %113 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %114 = load %arg0[%arg4, %113] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%113, %112] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = "accv.bin_op"(%114, %115) {predicate = 2 : i64} : (f32, f32) -> f32 + %117 = load %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = "accv.bin_op"(%117, %116) {predicate = 0 : i64} : (f32, f32) -> f32 + store %118, %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %121 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %122 = load %arg0[%arg4, %121] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg1[%121, %120] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = "accv.bin_op"(%122, %123) {predicate = 2 : i64} : (f32, f32) -> f32 + %125 = load %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = "accv.bin_op"(%125, %124) {predicate = 0 : i64} : (f32, f32) -> f32 + store %126, %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = load %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %127, %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %129 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %130 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %131 = load %arg0[%128, %130] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%130, %129] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = "accv.bin_op"(%131, %132) {predicate = 2 : i64} : (f32, f32) -> f32 + %134 = load %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = "accv.bin_op"(%134, %133) {predicate = 0 : i64} : (f32, f32) -> f32 + store %135, %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %138 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %139 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %140 = load %arg0[%137, %139] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %141 = load %arg1[%139, %138] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = "accv.bin_op"(%140, %141) {predicate = 2 : i64} : (f32, f32) -> f32 + %143 = load %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = "accv.bin_op"(%143, %142) {predicate = 0 : i64} : (f32, f32) -> f32 + store %144, %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = load %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + store %145, %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %147 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %148 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %149 = load %arg0[%146, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%148, %147] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = "accv.bin_op"(%149, %150) {predicate = 2 : i64} : (f32, f32) -> f32 + %152 = load %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = "accv.bin_op"(%152, %151) {predicate = 0 : i64} : (f32, f32) -> f32 + store %153, %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %156 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %157 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %158 = load %arg0[%155, %157] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg1[%157, %156] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = "accv.bin_op"(%158, %159) {predicate = 2 : i64} : (f32, f32) -> f32 + %161 = load %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = "accv.bin_op"(%161, %160) {predicate = 0 : i64} : (f32, f32) -> f32 + store %162, %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = load %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %163, %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %165 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %166 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %167 = load %arg0[%164, %166] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%166, %165] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = "accv.bin_op"(%167, %168) {predicate = 2 : i64} : (f32, f32) -> f32 + %170 = load %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = "accv.bin_op"(%170, %169) {predicate = 0 : i64} : (f32, f32) -> f32 + store %171, %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %174 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %175 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %176 = load %arg0[%173, %175] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %177 = load %arg1[%175, %174] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = "accv.bin_op"(%176, %177) {predicate = 2 : i64} : (f32, f32) -> f32 + %179 = load %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = "accv.bin_op"(%179, %178) {predicate = 0 : i64} : (f32, f32) -> f32 + store %180, 
%arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = load %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %181, %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %183 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %184 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %185 = load %arg0[%182, %184] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%184, %183] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = "accv.bin_op"(%185, %186) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = load %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = "accv.bin_op"(%188, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + store %189, %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %192 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %193 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %194 = load %arg0[%191, %193] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %195 = load %arg1[%193, %192] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = "accv.bin_op"(%194, %195) {predicate = 2 : i64} : (f32, f32) -> f32 + %197 = load %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = "accv.bin_op"(%197, %196) {predicate = 0 : i64} : (f32, f32) -> f32 + store %198, %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = load %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %199, %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %201 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %202 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %203 = load %arg0[%200, %202] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%202, %201] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = "accv.bin_op"(%203, %204) {predicate = 2 : i64} : (f32, f32) -> f32 + %206 = load %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = "accv.bin_op"(%206, %205) {predicate = 0 : i64} : (f32, f32) -> f32 + store %207, %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %210 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %211 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %212 = load %arg0[%209, %211] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %213 = load %arg1[%211, %210] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = "accv.bin_op"(%212, %213) {predicate = 2 : i64} : (f32, f32) -> f32 + %215 = load %arg2[%209, %210] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = "accv.bin_op"(%215, %214) {predicate = 0 : i64} : (f32, f32) -> f32 + store %216, %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %217, %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %218 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %219 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %220 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %221 = load %arg0[%218, %220] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%220, %219] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = "accv.bin_op"(%221, %222) {predicate = 2 : i64} : (f32, f32) -> f32 + %224 = load %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = "accv.bin_op"(%224, %223) {predicate = 0 : i64} : (f32, f32) -> f32 + store %225, %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %228 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %229 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %230 = load %arg0[%227, %229] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %231 = load %arg1[%229, %228] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = "accv.bin_op"(%230, %231) {predicate = 2 : i64} : (f32, f32) -> f32 + %233 = load %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = "accv.bin_op"(%233, %232) {predicate = 0 : i64} : (f32, f32) -> f32 + store %234, %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %235, %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %236 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %237 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %238 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %239 = load %arg0[%236, %238] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%238, %237] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = "accv.bin_op"(%239, %240) {predicate = 2 : i64} : (f32, f32) -> f32 + %242 = load %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = "accv.bin_op"(%242, %241) {predicate = 0 : i64} : (f32, f32) -> f32 + store %243, %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %246 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %247 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %248 = load %arg0[%245, %247] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %249 = load %arg1[%247, %246] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = "accv.bin_op"(%248, %249) {predicate = 2 : i64} : (f32, f32) -> f32 + %251 = load %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = "accv.bin_op"(%251, %250) {predicate = 0 : i64} : (f32, f32) -> f32 + store %252, %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %253, %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %254 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %255 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %256 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %257 = load %arg0[%254, %256] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%256, %255] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = "accv.bin_op"(%257, %258) {predicate = 2 : i64} : (f32, f32) -> f32 + %260 = load %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = "accv.bin_op"(%260, %259) {predicate = 0 : i64} : (f32, f32) -> f32 + store %261, %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %264 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %265 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %266 = load %arg0[%263, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = "accv.bin_op"(%266, %267) {predicate = 2 : i64} : (f32, f32) -> f32 + %269 = load %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = "accv.bin_op"(%269, %268) {predicate = 0 : i64} : (f32, f32) -> f32 + store %270, %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %272 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %273 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %274 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %275 = load %arg0[%272, %274] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%274, %273] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = "accv.bin_op"(%275, %276) {predicate = 2 : i64} : (f32, f32) -> f32 + %278 = load %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = "accv.bin_op"(%278, %277) {predicate = 0 : i64} : (f32, f32) -> f32 + store %279, %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %282 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %283 = affine.apply affine_map<(d0, d1) -> (d0 + 
d1)>(%arg6, %arg7) + %284 = load %arg0[%281, %283] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %285 = load %arg1[%283, %282] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = "accv.bin_op"(%284, %285) {predicate = 2 : i64} : (f32, f32) -> f32 + %287 = load %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = "accv.bin_op"(%287, %286) {predicate = 0 : i64} : (f32, f32) -> f32 + store %288, %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %289, %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %291 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %292 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %293 = load %arg0[%290, %292] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%292, %291] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = "accv.bin_op"(%293, %294) {predicate = 2 : i64} : (f32, f32) -> f32 + %296 = load %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = "accv.bin_op"(%296, %295) {predicate = 0 : i64} : (f32, f32) -> f32 + store %297, %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %300 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %301 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %302 = load %arg0[%299, %301] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %303 = load %arg1[%301, %300] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = "accv.bin_op"(%302, %303) {predicate = 2 : i64} : (f32, f32) -> f32 + %305 = load %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = "accv.bin_op"(%305, %304) {predicate = 0 : i64} : (f32, f32) -> f32 + store %306, %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = load %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %307, %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %309 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %310 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %311 = load %arg0[%308, %310] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%310, %309] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = "accv.bin_op"(%311, %312) {predicate = 2 : i64} : (f32, f32) -> f32 + %314 = load %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = "accv.bin_op"(%314, %313) {predicate = 0 : i64} : (f32, f32) -> f32 + store %315, %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = affine.apply 
affine_map<(d0) -> (d0 + 2)>(%arg4) + %318 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %319 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %320 = load %arg0[%317, %319] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%319, %318] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = "accv.bin_op"(%320, %321) {predicate = 2 : i64} : (f32, f32) -> f32 + %323 = load %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = "accv.bin_op"(%323, %322) {predicate = 0 : i64} : (f32, f32) -> f32 + store %324, %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %327 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %328 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %329 = load %arg0[%326, %328] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%328, %327] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = "accv.bin_op"(%329, %330) {predicate = 2 : i64} : (f32, f32) -> f32 + %332 = load %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = "accv.bin_op"(%332, %331) {predicate = 0 : i64} : (f32, f32) -> f32 + store %333, %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %336 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %337 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %338 = load %arg0[%335, %337] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%337, %336] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = "accv.bin_op"(%338, %339) {predicate = 2 : i64} : (f32, f32) -> f32 + %341 = load %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = "accv.bin_op"(%341, %340) {predicate = 0 : i64} : (f32, f32) -> f32 + store %342, %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %345 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %346 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %347 = load %arg0[%344, %346] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%346, %345] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = "accv.bin_op"(%347, %348) {predicate = 2 : i64} : (f32, f32) -> f32 + %350 = load %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = "accv.bin_op"(%350, %349) {predicate = 0 : i64} : (f32, f32) -> f32 + store %351, %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%344, %345] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %354 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %355 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %356 = load %arg0[%353, %355] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%355, %354] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = "accv.bin_op"(%356, %357) {predicate = 2 : i64} : (f32, f32) -> f32 + %359 = load %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = "accv.bin_op"(%359, %358) {predicate = 0 : i64} : (f32, f32) -> f32 + store %360, %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %363 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %364 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %365 = load %arg0[%362, %364] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%364, %363] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = "accv.bin_op"(%365, %366) {predicate = 2 : i64} : (f32, f32) -> f32 + %368 = load %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = "accv.bin_op"(%368, %367) {predicate = 0 : i64} : (f32, f32) -> f32 + store %369, %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %372 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %373 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %374 = load %arg0[%371, %373] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%373, %372] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = "accv.bin_op"(%374, %375) {predicate = 2 : i64} : (f32, f32) -> f32 + %377 = load %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = "accv.bin_op"(%377, %376) {predicate = 0 : i64} : (f32, f32) -> f32 + store %378, %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %381 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %382 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %383 = load %arg0[%380, %382] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%382, %381] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = "accv.bin_op"(%383, %384) {predicate = 2 : i64} : (f32, f32) -> f32 + %386 = load %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = "accv.bin_op"(%386, %385) {predicate = 0 : i64} : (f32, f32) -> 
f32 + store %387, %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %390 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %391 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %392 = load %arg0[%389, %391] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%391, %390] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = "accv.bin_op"(%392, %393) {predicate = 2 : i64} : (f32, f32) -> f32 + %395 = load %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = "accv.bin_op"(%395, %394) {predicate = 0 : i64} : (f32, f32) -> f32 + store %396, %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %399 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %400 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %401 = load %arg0[%398, %400] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %402 = load %arg1[%400, %399] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = "accv.bin_op"(%401, %402) {predicate = 2 : i64} : (f32, f32) -> f32 + %404 = load %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %405 = "accv.bin_op"(%404, %403) {predicate = 0 : i64} : (f32, f32) -> f32 + store %405, %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %406 = load %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %406, %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %408 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %409 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %410 = load %arg0[%407, %409] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %411 = load %arg1[%409, %408] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %412 = "accv.bin_op"(%410, %411) {predicate = 2 : i64} : (f32, f32) -> f32 + %413 = load %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %414 = "accv.bin_op"(%413, %412) {predicate = 0 : i64} : (f32, f32) -> f32 + store %414, %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = load %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %415, %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %417 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %418 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %419 = load %arg0[%416, %418] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %420 = load %arg1[%418, %417] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = "accv.bin_op"(%419, %420) {predicate = 2 : i64} : (f32, f32) -> f32 + %422 = load 
%arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = "accv.bin_op"(%422, %421) {predicate = 0 : i64} : (f32, f32) -> f32 + store %423, %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %424 = load %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %424, %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %426 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %427 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %428 = load %arg0[%425, %427] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %429 = load %arg1[%427, %426] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = "accv.bin_op"(%428, %429) {predicate = 2 : i64} : (f32, f32) -> f32 + %431 = load %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %432 = "accv.bin_op"(%431, %430) {predicate = 0 : i64} : (f32, f32) -> f32 + store %432, %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = load %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %433, %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %435 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %436 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %437 = load %arg0[%434, %436] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %438 = load %arg1[%436, %435] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = "accv.bin_op"(%437, %438) {predicate = 2 : i64} : (f32, f32) -> f32 + %440 = load %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = "accv.bin_op"(%440, %439) {predicate = 0 : i64} : (f32, f32) -> f32 + store %441, %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %442 = load %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %442, %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %444 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %445 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %446 = load %arg0[%443, %445] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %447 = load %arg1[%445, %444] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %448 = "accv.bin_op"(%446, %447) {predicate = 2 : i64} : (f32, f32) -> f32 + %449 = load %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %450 = "accv.bin_op"(%449, %448) {predicate = 0 : i64} : (f32, f32) -> f32 + store %450, %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = load %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %451, %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %453 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %454 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %455 = load %arg0[%452, %454] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %456 = load %arg1[%454, %453] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = "accv.bin_op"(%455, %456) {predicate = 2 : i64} : (f32, f32) -> f32 + %458 = load %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = "accv.bin_op"(%458, %457) {predicate = 0 : i64} : (f32, f32) -> f32 + store %459, %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = load %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %460, %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %462 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %463 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %464 = load %arg0[%461, %463] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %465 = load %arg1[%463, %462] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %466 = "accv.bin_op"(%464, %465) {predicate = 2 : i64} : (f32, f32) -> f32 + %467 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = "accv.bin_op"(%467, %466) {predicate = 0 : i64} : (f32, f32) -> f32 + store %468, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %469, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %471 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %472 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %473 = load %arg0[%470, %472] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %474 = load %arg1[%472, %471] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = "accv.bin_op"(%473, %474) {predicate = 2 : i64} : (f32, f32) -> f32 + %476 = load %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = "accv.bin_op"(%476, %475) {predicate = 0 : i64} : (f32, f32) -> f32 + store %477, %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = load %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %478, %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %480 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %481 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %482 = load %arg0[%479, %481] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %483 = load %arg1[%481, %480] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %484 = "accv.bin_op"(%482, %483) {predicate = 2 : i64} : (f32, f32) -> f32 + %485 = load %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = "accv.bin_op"(%485, %484) {predicate = 0 : i64} : (f32, f32) -> f32 + store %486, %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = load %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %487, %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %489 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %490 = affine.apply affine_map<(d0, d1) -> (d0 
+ d1)>(%arg6, %arg7) + %491 = load %arg0[%488, %490] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %492 = load %arg1[%490, %489] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = "accv.bin_op"(%491, %492) {predicate = 2 : i64} : (f32, f32) -> f32 + %494 = load %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = "accv.bin_op"(%494, %493) {predicate = 0 : i64} : (f32, f32) -> f32 + store %495, %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = load %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %496, %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %498 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %499 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %500 = load %arg0[%497, %499] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %501 = load %arg1[%499, %498] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %502 = "accv.bin_op"(%500, %501) {predicate = 2 : i64} : (f32, f32) -> f32 + %503 = load %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = "accv.bin_op"(%503, %502) {predicate = 0 : i64} : (f32, f32) -> f32 + store %504, %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %505 = load %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %505, %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %507 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %508 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %509 = load %arg0[%506, %508] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %510 = load %arg1[%508, %507] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %511 = "accv.bin_op"(%509, %510) {predicate = 2 : i64} : (f32, f32) -> f32 + %512 = load %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = "accv.bin_op"(%512, %511) {predicate = 0 : i64} : (f32, f32) -> f32 + store %513, %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %514, %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %515 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %516 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %517 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %518 = load %arg0[%515, %517] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %519 = load %arg1[%517, %516] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = "accv.bin_op"(%518, %519) {predicate = 2 : i64} : (f32, f32) -> f32 + %521 = load %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = "accv.bin_op"(%521, %520) {predicate = 0 : i64} : (f32, f32) -> f32 + store %522, %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %523 = load %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %523, %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = affine.apply 
affine_map<(d0) -> (d0 + 3)>(%arg4) + %525 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %526 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %527 = load %arg0[%524, %526] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %528 = load %arg1[%526, %525] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %529 = "accv.bin_op"(%527, %528) {predicate = 2 : i64} : (f32, f32) -> f32 + %530 = load %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = "accv.bin_op"(%530, %529) {predicate = 0 : i64} : (f32, f32) -> f32 + store %531, %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %532, %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %533 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %534 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %535 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %536 = load %arg0[%533, %535] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %537 = load %arg1[%535, %534] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = "accv.bin_op"(%536, %537) {predicate = 2 : i64} : (f32, f32) -> f32 + %539 = load %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = "accv.bin_op"(%539, %538) {predicate = 0 : i64} : (f32, f32) -> f32 + store %540, %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %541 = load %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %541, %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %543 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %544 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %545 = load %arg0[%542, %544] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %546 = load %arg1[%544, %543] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %547 = "accv.bin_op"(%545, %546) {predicate = 2 : i64} : (f32, f32) -> f32 + %548 = load %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = "accv.bin_op"(%548, %547) {predicate = 0 : i64} : (f32, f32) -> f32 + store %549, %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %550, %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %551 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %552 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %553 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %554 = load %arg0[%551, %553] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %555 = load %arg1[%553, %552] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = "accv.bin_op"(%554, %555) {predicate = 2 : i64} : (f32, f32) -> f32 + %557 = load %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = "accv.bin_op"(%557, %556) {predicate = 0 : i64} : (f32, f32) -> f32 + store %558, %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = load %arg2[%551, %552] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %559, %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %561 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %562 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %563 = load %arg0[%560, %562] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %564 = load %arg1[%562, %561] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %565 = "accv.bin_op"(%563, %564) {predicate = 2 : i64} : (f32, f32) -> f32 + %566 = load %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = "accv.bin_op"(%566, %565) {predicate = 0 : i64} : (f32, f32) -> f32 + store %567, %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %568, %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %569 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %570 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %571 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %572 = load %arg0[%569, %571] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %573 = load %arg1[%571, %570] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = "accv.bin_op"(%572, %573) {predicate = 2 : i64} : (f32, f32) -> f32 + %575 = load %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = "accv.bin_op"(%575, %574) {predicate = 0 : i64} : (f32, f32) -> f32 + store %576, %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %577 = load %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %577, %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %579 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %580 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %581 = load %arg0[%578, %580] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %582 = load %arg1[%580, %579] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %583 = "accv.bin_op"(%581, %582) {predicate = 2 : i64} : (f32, f32) -> f32 + %584 = load %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = "accv.bin_op"(%584, %583) {predicate = 0 : i64} : (f32, f32) -> f32 + store %585, %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %586, %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %587 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %588 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %589 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %590 = load %arg0[%587, %589] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %591 = load %arg1[%589, %588] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = "accv.bin_op"(%590, %591) {predicate = 2 : i64} : (f32, f32) -> f32 + %593 = load %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = "accv.bin_op"(%593, %592) {predicate = 0 : i64} : (f32, f32) -> f32 + 
store %594, %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %595 = load %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %595, %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %597 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %598 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %599 = load %arg0[%596, %598] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %600 = load %arg1[%598, %597] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %601 = "accv.bin_op"(%599, %600) {predicate = 2 : i64} : (f32, f32) -> f32 + %602 = load %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %603 = "accv.bin_op"(%602, %601) {predicate = 0 : i64} : (f32, f32) -> f32 + store %603, %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %604 = load %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %604, %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %605 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %606 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %607 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %608 = load %arg0[%605, %607] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %609 = load %arg1[%607, %606] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %610 = "accv.bin_op"(%608, %609) {predicate = 2 : i64} : (f32, f32) -> f32 + %611 = load %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %612 = "accv.bin_op"(%611, %610) {predicate = 0 : i64} : (f32, f32) -> f32 + store %612, %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %613 = load %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %613, %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %614 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %615 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %616 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %617 = load %arg0[%614, %616] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %618 = load %arg1[%616, %615] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %619 = "accv.bin_op"(%617, %618) {predicate = 2 : i64} : (f32, f32) -> f32 + %620 = load %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %621 = "accv.bin_op"(%620, %619) {predicate = 0 : i64} : (f32, f32) -> f32 + store %621, %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %622 = load %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %622, %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %623 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %624 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %625 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %626 = load %arg0[%623, %625] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %627 = load %arg1[%625, %624] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %628 = "accv.bin_op"(%626, %627) {predicate = 2 : i64} : (f32, f32) -> f32 + %629 = load 
%arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %630 = "accv.bin_op"(%629, %628) {predicate = 0 : i64} : (f32, f32) -> f32 + store %630, %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %631 = load %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %631, %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %632 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %633 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %634 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %635 = load %arg0[%632, %634] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %636 = load %arg1[%634, %633] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %637 = "accv.bin_op"(%635, %636) {predicate = 2 : i64} : (f32, f32) -> f32 + %638 = load %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %639 = "accv.bin_op"(%638, %637) {predicate = 0 : i64} : (f32, f32) -> f32 + store %639, %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %640 = load %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %640, %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %641 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %642 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %643 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %644 = load %arg0[%641, %643] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %645 = load %arg1[%643, %642] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %646 = "accv.bin_op"(%644, %645) {predicate = 2 : i64} : (f32, f32) -> f32 + %647 = load %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %648 = "accv.bin_op"(%647, %646) {predicate = 0 : i64} : (f32, f32) -> f32 + store %648, %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %649 = load %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %649, %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %650 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %651 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %652 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %653 = load %arg0[%650, %652] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %654 = load %arg1[%652, %651] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %655 = "accv.bin_op"(%653, %654) {predicate = 2 : i64} : (f32, f32) -> f32 + %656 = load %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %657 = "accv.bin_op"(%656, %655) {predicate = 0 : i64} : (f32, f32) -> f32 + store %657, %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %658 = load %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %658, %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %659 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %660 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %661 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %662 = load %arg0[%659, %661] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %663 = load %arg1[%661, %660] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %664 = "accv.bin_op"(%662, %663) {predicate = 2 : i64} : (f32, f32) -> f32 + %665 = load %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %666 = "accv.bin_op"(%665, %664) {predicate = 0 : i64} : (f32, f32) -> f32 + store %666, %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %667 = load %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %667, %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %668 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %669 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %670 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %671 = load %arg0[%668, %670] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %672 = load %arg1[%670, %669] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %673 = "accv.bin_op"(%671, %672) {predicate = 2 : i64} : (f32, f32) -> f32 + %674 = load %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %675 = "accv.bin_op"(%674, %673) {predicate = 0 : i64} : (f32, f32) -> f32 + store %675, %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %676 = load %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %676, %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %677 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %678 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %679 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %680 = load %arg0[%677, %679] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %681 = load %arg1[%679, %678] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %682 = "accv.bin_op"(%680, %681) {predicate = 2 : i64} : (f32, f32) -> f32 + %683 = load %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %684 = "accv.bin_op"(%683, %682) {predicate = 0 : i64} : (f32, f32) -> f32 + store %684, %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %685 = load %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %685, %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %686 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %687 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %688 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %689 = load %arg0[%686, %688] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %690 = load %arg1[%688, %687] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %691 = "accv.bin_op"(%689, %690) {predicate = 2 : i64} : (f32, f32) -> f32 + %692 = load %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %693 = "accv.bin_op"(%692, %691) {predicate = 0 : i64} : (f32, f32) -> f32 + store %693, %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %694 = load %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %694, %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %695 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %696 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %697 = affine.apply affine_map<(d0, d1) -> 
(d0 + d1)>(%arg6, %arg7) + %698 = load %arg0[%695, %697] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %699 = load %arg1[%697, %696] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %700 = "accv.bin_op"(%698, %699) {predicate = 2 : i64} : (f32, f32) -> f32 + %701 = load %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %702 = "accv.bin_op"(%701, %700) {predicate = 0 : i64} : (f32, f32) -> f32 + store %702, %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %703 = load %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %703, %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %704 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %705 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %706 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %707 = load %arg0[%704, %706] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %708 = load %arg1[%706, %705] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %709 = "accv.bin_op"(%707, %708) {predicate = 2 : i64} : (f32, f32) -> f32 + %710 = load %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %711 = "accv.bin_op"(%710, %709) {predicate = 0 : i64} : (f32, f32) -> f32 + store %711, %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %712 = load %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %712, %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %713 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %714 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %715 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %716 = load %arg0[%713, %715] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %717 = load %arg1[%715, %714] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %718 = "accv.bin_op"(%716, %717) {predicate = 2 : i64} : (f32, f32) -> f32 + %719 = load %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %720 = "accv.bin_op"(%719, %718) {predicate = 0 : i64} : (f32, f32) -> f32 + store %720, %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %721 = load %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %721, %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %722 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %723 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %724 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %725 = load %arg0[%722, %724] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %726 = load %arg1[%724, %723] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %727 = "accv.bin_op"(%725, %726) {predicate = 2 : i64} : (f32, f32) -> f32 + %728 = load %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %729 = "accv.bin_op"(%728, %727) {predicate = 0 : i64} : (f32, f32) -> f32 + store %729, %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %730 = load %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %730, %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %731 = affine.apply 
affine_map<(d0) -> (d0 + 5)>(%arg4) + %732 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %733 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %734 = load %arg0[%731, %733] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %735 = load %arg1[%733, %732] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %736 = "accv.bin_op"(%734, %735) {predicate = 2 : i64} : (f32, f32) -> f32 + %737 = load %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %738 = "accv.bin_op"(%737, %736) {predicate = 0 : i64} : (f32, f32) -> f32 + store %738, %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %739 = load %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %739, %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %740 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %741 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %742 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %743 = load %arg0[%740, %742] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %744 = load %arg1[%742, %741] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %745 = "accv.bin_op"(%743, %744) {predicate = 2 : i64} : (f32, f32) -> f32 + %746 = load %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %747 = "accv.bin_op"(%746, %745) {predicate = 0 : i64} : (f32, f32) -> f32 + store %747, %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %748 = load %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %748, %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %749 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %750 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %751 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %752 = load %arg0[%749, %751] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %753 = load %arg1[%751, %750] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %754 = "accv.bin_op"(%752, %753) {predicate = 2 : i64} : (f32, f32) -> f32 + %755 = load %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %756 = "accv.bin_op"(%755, %754) {predicate = 0 : i64} : (f32, f32) -> f32 + store %756, %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %757 = load %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %757, %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %758 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %759 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %760 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %761 = load %arg0[%758, %760] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %762 = load %arg1[%760, %759] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %763 = "accv.bin_op"(%761, %762) {predicate = 2 : i64} : (f32, f32) -> f32 + %764 = load %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %765 = "accv.bin_op"(%764, %763) {predicate = 0 : i64} : (f32, f32) -> f32 + store %765, %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %766 = load %arg2[%758, %759] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %766, %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %767 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %768 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %769 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %770 = load %arg0[%767, %769] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %771 = load %arg1[%769, %768] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %772 = "accv.bin_op"(%770, %771) {predicate = 2 : i64} : (f32, f32) -> f32 + %773 = load %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %774 = "accv.bin_op"(%773, %772) {predicate = 0 : i64} : (f32, f32) -> f32 + store %774, %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %775 = load %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %775, %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %776 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %777 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %778 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %779 = load %arg0[%776, %778] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %780 = load %arg1[%778, %777] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %781 = "accv.bin_op"(%779, %780) {predicate = 2 : i64} : (f32, f32) -> f32 + %782 = load %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %783 = "accv.bin_op"(%782, %781) {predicate = 0 : i64} : (f32, f32) -> f32 + store %783, %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %784 = load %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %784, %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %785 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %786 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %787 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %788 = load %arg0[%785, %787] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %789 = load %arg1[%787, %786] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %790 = "accv.bin_op"(%788, %789) {predicate = 2 : i64} : (f32, f32) -> f32 + %791 = load %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %792 = "accv.bin_op"(%791, %790) {predicate = 0 : i64} : (f32, f32) -> f32 + store %792, %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %793 = load %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %793, %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %794 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %795 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %796 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %797 = load %arg0[%794, %796] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %798 = load %arg1[%796, %795] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %799 = "accv.bin_op"(%797, %798) {predicate = 2 : i64} : (f32, f32) -> f32 + %800 = load %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %801 = "accv.bin_op"(%800, %799) {predicate = 0 : i64} : (f32, f32) -> 
f32 + store %801, %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %802 = load %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %802, %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %803 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %804 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %805 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %806 = load %arg0[%803, %805] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %807 = load %arg1[%805, %804] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %808 = "accv.bin_op"(%806, %807) {predicate = 2 : i64} : (f32, f32) -> f32 + %809 = load %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %810 = "accv.bin_op"(%809, %808) {predicate = 0 : i64} : (f32, f32) -> f32 + store %810, %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %811 = load %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %811, %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %812 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %813 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %814 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %815 = load %arg0[%812, %814] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %816 = load %arg1[%814, %813] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %817 = "accv.bin_op"(%815, %816) {predicate = 2 : i64} : (f32, f32) -> f32 + %818 = load %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %819 = "accv.bin_op"(%818, %817) {predicate = 0 : i64} : (f32, f32) -> f32 + store %819, %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %820 = load %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %820, %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %821 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %822 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %823 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %824 = load %arg0[%821, %823] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %825 = load %arg1[%823, %822] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %826 = "accv.bin_op"(%824, %825) {predicate = 2 : i64} : (f32, f32) -> f32 + %827 = load %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %828 = "accv.bin_op"(%827, %826) {predicate = 0 : i64} : (f32, f32) -> f32 + store %828, %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %829 = load %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %829, %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %830 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %831 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %832 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %833 = load %arg0[%830, %832] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %834 = load %arg1[%832, %831] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %835 = "accv.bin_op"(%833, %834) {predicate = 2 : i64} : (f32, f32) -> f32 + %836 = load 
%arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %837 = "accv.bin_op"(%836, %835) {predicate = 0 : i64} : (f32, f32) -> f32 + store %837, %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %838 = load %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %838, %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %839 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %840 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %841 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %842 = load %arg0[%839, %841] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %843 = load %arg1[%841, %840] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %844 = "accv.bin_op"(%842, %843) {predicate = 2 : i64} : (f32, f32) -> f32 + %845 = load %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %846 = "accv.bin_op"(%845, %844) {predicate = 0 : i64} : (f32, f32) -> f32 + store %846, %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %847 = load %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %847, %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_4,14}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_3,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_3,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 128]} + } {begin = 0 : i64, end = 780 : i64, index = #accln<"index{i_1,11}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 256, 128]} + affine.for %arg4 = 0 to 256 step 16 { + affine.for %arg5 = 0 to 128 step 4 { + affine.for %arg6 = 0 to 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = "accv.bin_op"(%2, %3) {predicate = 2 : i64} : (f32, f32) -> f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = "accv.bin_op"(%5, %4) {predicate = 0 : i64} : (f32, f32) -> f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %10 = load %arg0[%c780, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg1[%9, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %12 = "accv.bin_op"(%10, %11) {predicate = 2 : i64} : (f32, f32) -> f32 + %13 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %14 = "accv.bin_op"(%13, %12) {predicate = 0 : i64} : (f32, f32) -> f32 + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %15, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %18 = load %arg0[%c780, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg1[%17, %16] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = "accv.bin_op"(%18, %19) {predicate = 2 : i64} : (f32, f32) -> f32 + %21 = load %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = "accv.bin_op"(%21, %20) {predicate = 0 : i64} : (f32, f32) -> f32 + store %22, %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %23 = load %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %23, %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %25 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %26 = load %arg0[%c780, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg1[%25, %24] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = "accv.bin_op"(%26, %27) {predicate = 2 : i64} : (f32, f32) -> f32 + %29 = load %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %30 = "accv.bin_op"(%29, %28) {predicate = 0 : i64} : (f32, f32) -> f32 + store %30, %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = load %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %31, %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %33 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %34 = load %arg0[%c780, %33] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg1[%33, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = "accv.bin_op"(%34, %35) {predicate = 2 : i64} : (f32, f32) -> f32 + %37 = load %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %38 = "accv.bin_op"(%37, %36) {predicate = 0 : i64} : (f32, f32) -> f32 + store %38, %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = load %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %39, %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %41 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %42 = load %arg0[%c780, %41] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %43 = load %arg1[%41, %40] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = "accv.bin_op"(%42, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = load %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = "accv.bin_op"(%45, %44) {predicate = 0 : i64} : (f32, f32) -> f32 + store %46, %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + %47 = load %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %47, %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %49 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %50 = load %arg0[%c780, %49] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %51 = load %arg1[%49, %48] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = "accv.bin_op"(%50, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = load %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %54 = "accv.bin_op"(%53, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + store %54, %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = load %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %55, %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %57 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %58 = load %arg0[%c780, %57] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%57, %56] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = "accv.bin_op"(%58, %59) {predicate = 2 : i64} : (f32, f32) -> f32 + %61 = load %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = "accv.bin_op"(%61, %60) {predicate = 0 : i64} : (f32, f32) -> f32 + store %62, %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %65 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %66 = load %arg0[%c780, %65] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %67 = load %arg1[%65, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %68 = "accv.bin_op"(%66, %67) {predicate = 2 : i64} : (f32, f32) -> f32 + %69 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = "accv.bin_op"(%69, %68) {predicate = 0 : i64} : (f32, f32) -> f32 + store %70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %71, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %72 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %73 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %74 = load %arg0[%c780, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = "accv.bin_op"(%74, %75) {predicate = 2 : i64} : (f32, f32) -> f32 + %77 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = "accv.bin_op"(%77, %76) {predicate = 0 : i64} : (f32, f32) -> f32 + store %78, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%c780, 
%72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %81 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %82 = load %arg0[%c780, %81] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %83 = load %arg1[%81, %80] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = "accv.bin_op"(%82, %83) {predicate = 2 : i64} : (f32, f32) -> f32 + %85 = load %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %86 = "accv.bin_op"(%85, %84) {predicate = 0 : i64} : (f32, f32) -> f32 + store %86, %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = load %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %87, %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %89 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %90 = load %arg0[%c780, %89] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %91 = load %arg1[%89, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = "accv.bin_op"(%90, %91) {predicate = 2 : i64} : (f32, f32) -> f32 + %93 = load %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = "accv.bin_op"(%93, %92) {predicate = 0 : i64} : (f32, f32) -> f32 + store %94, %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = load %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %95, %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %97 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %98 = load %arg0[%c780, %97] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %99 = load %arg1[%97, %96] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %100 = "accv.bin_op"(%98, %99) {predicate = 2 : i64} : (f32, f32) -> f32 + %101 = load %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = "accv.bin_op"(%101, %100) {predicate = 0 : i64} : (f32, f32) -> f32 + store %102, %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = load %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %103, %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %106 = load %arg0[%c780, %105] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %107 = load %arg1[%105, %104] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %108 = "accv.bin_op"(%106, %107) {predicate = 2 : i64} : (f32, f32) -> f32 + %109 = load %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %110 = "accv.bin_op"(%109, %108) {predicate = 0 : i64} : (f32, f32) -> f32 + store %110, %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = load %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %111, %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = affine.apply affine_map<(d0, d1) -> 
(d0 + d1 + 14)>(%arg3, %arg4) + %113 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %114 = load %arg0[%c780, %113] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%113, %112] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = "accv.bin_op"(%114, %115) {predicate = 2 : i64} : (f32, f32) -> f32 + %117 = load %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = "accv.bin_op"(%117, %116) {predicate = 0 : i64} : (f32, f32) -> f32 + store %118, %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %121 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %122 = load %arg0[%c780, %121] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg1[%121, %120] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = "accv.bin_op"(%122, %123) {predicate = 2 : i64} : (f32, f32) -> f32 + %125 = load %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = "accv.bin_op"(%125, %124) {predicate = 0 : i64} : (f32, f32) -> f32 + store %126, %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = load %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %127, %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %129 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %130 = load %arg0[%c781, %129] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg1[%129, %128] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = "accv.bin_op"(%130, %131) {predicate = 2 : i64} : (f32, f32) -> f32 + %133 = load %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = "accv.bin_op"(%133, %132) {predicate = 0 : i64} : (f32, f32) -> f32 + store %134, %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = load %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %135, %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %137 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %138 = load %arg0[%c781, %137] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%137, %136] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = "accv.bin_op"(%138, %139) {predicate = 2 : i64} : (f32, f32) -> f32 + %141 = load %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = "accv.bin_op"(%141, %140) {predicate = 0 : i64} : (f32, f32) -> f32 + store %142, %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %145 = affine.apply 
affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %146 = load %arg0[%c781, %145] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %147 = load %arg1[%145, %144] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = "accv.bin_op"(%146, %147) {predicate = 2 : i64} : (f32, f32) -> f32 + %149 = load %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = "accv.bin_op"(%149, %148) {predicate = 0 : i64} : (f32, f32) -> f32 + store %150, %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = load %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %151, %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %153 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %154 = load %arg0[%c781, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg1[%153, %152] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = "accv.bin_op"(%154, %155) {predicate = 2 : i64} : (f32, f32) -> f32 + %157 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = "accv.bin_op"(%157, %156) {predicate = 0 : i64} : (f32, f32) -> f32 + store %158, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %159, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %161 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %162 = load %arg0[%c781, %161] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%161, %160] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = "accv.bin_op"(%162, %163) {predicate = 2 : i64} : (f32, f32) -> f32 + %165 = load %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = "accv.bin_op"(%165, %164) {predicate = 0 : i64} : (f32, f32) -> f32 + store %166, %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %169 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %170 = load %arg0[%c781, %169] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %171 = load %arg1[%169, %168] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = "accv.bin_op"(%170, %171) {predicate = 2 : i64} : (f32, f32) -> f32 + %173 = load %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = "accv.bin_op"(%173, %172) {predicate = 0 : i64} : (f32, f32) -> f32 + store %174, %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = load %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %175, %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %177 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %178 = load 
%arg0[%c781, %177] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %179 = load %arg1[%177, %176] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = "accv.bin_op"(%178, %179) {predicate = 2 : i64} : (f32, f32) -> f32 + %181 = load %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = "accv.bin_op"(%181, %180) {predicate = 0 : i64} : (f32, f32) -> f32 + store %182, %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = load %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %183, %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %185 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %186 = load %arg0[%c781, %185] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%185, %184] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = "accv.bin_op"(%186, %187) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = load %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = "accv.bin_op"(%189, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + store %190, %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %193 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %194 = load %arg0[%c781, %193] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %195 = load %arg1[%193, %192] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = "accv.bin_op"(%194, %195) {predicate = 2 : i64} : (f32, f32) -> f32 + %197 = load %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = "accv.bin_op"(%197, %196) {predicate = 0 : i64} : (f32, f32) -> f32 + store %198, %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = load %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %199, %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %201 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %202 = load %arg0[%c781, %201] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %203 = load %arg1[%201, %200] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = "accv.bin_op"(%202, %203) {predicate = 2 : i64} : (f32, f32) -> f32 + %205 = load %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = "accv.bin_op"(%205, %204) {predicate = 0 : i64} : (f32, f32) -> f32 + store %206, %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = load %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %207, %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %209 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %210 = load %arg0[%c781, %209] : memref<784x128xf32, affine_map<(d0, d1) 
-> (d0 * 128 + d1)>> + %211 = load %arg1[%209, %208] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = "accv.bin_op"(%210, %211) {predicate = 2 : i64} : (f32, f32) -> f32 + %213 = load %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = "accv.bin_op"(%213, %212) {predicate = 0 : i64} : (f32, f32) -> f32 + store %214, %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %215, %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %217 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %218 = load %arg0[%c781, %217] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %219 = load %arg1[%217, %216] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = "accv.bin_op"(%218, %219) {predicate = 2 : i64} : (f32, f32) -> f32 + %221 = load %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = "accv.bin_op"(%221, %220) {predicate = 0 : i64} : (f32, f32) -> f32 + store %222, %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %223, %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %224 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %225 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %226 = load %arg0[%c781, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = "accv.bin_op"(%226, %227) {predicate = 2 : i64} : (f32, f32) -> f32 + %229 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = "accv.bin_op"(%229, %228) {predicate = 0 : i64} : (f32, f32) -> f32 + store %230, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %233 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %234 = load %arg0[%c781, %233] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %235 = load %arg1[%233, %232] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %236 = "accv.bin_op"(%234, %235) {predicate = 2 : i64} : (f32, f32) -> f32 + %237 = load %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = "accv.bin_op"(%237, %236) {predicate = 0 : i64} : (f32, f32) -> f32 + store %238, %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %239, %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %241 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %242 = load %arg0[%c781, %241] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %243 = load %arg1[%241, %240] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = "accv.bin_op"(%242, %243) {predicate = 2 : i64} : (f32, f32) -> f32 + %245 = load %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = "accv.bin_op"(%245, %244) {predicate = 0 : i64} : (f32, f32) -> f32 + store %246, %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %247, %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %249 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %250 = load %arg0[%c781, %249] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %251 = load %arg1[%249, %248] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = "accv.bin_op"(%250, %251) {predicate = 2 : i64} : (f32, f32) -> f32 + %253 = load %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %254 = "accv.bin_op"(%253, %252) {predicate = 0 : i64} : (f32, f32) -> f32 + store %254, %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = load %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %255, %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %257 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %258 = load %arg0[%c782, %257] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %259 = load %arg1[%257, %256] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %260 = "accv.bin_op"(%258, %259) {predicate = 2 : i64} : (f32, f32) -> f32 + %261 = load %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = "accv.bin_op"(%261, %260) {predicate = 0 : i64} : (f32, f32) -> f32 + store %262, %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %263, %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %265 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %266 = load %arg0[%c782, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = "accv.bin_op"(%266, %267) {predicate = 2 : i64} : (f32, f32) -> f32 + %269 = load %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = "accv.bin_op"(%269, %268) {predicate = 0 : i64} : (f32, f32) -> f32 + store %270, %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %272 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %273 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %274 = load %arg0[%c782, %273] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %275 = load %arg1[%273, %272] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%276 = "accv.bin_op"(%274, %275) {predicate = 2 : i64} : (f32, f32) -> f32 + %277 = load %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %278 = "accv.bin_op"(%277, %276) {predicate = 0 : i64} : (f32, f32) -> f32 + store %278, %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = load %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %279, %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %281 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %282 = load %arg0[%c782, %281] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %283 = load %arg1[%281, %280] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %284 = "accv.bin_op"(%282, %283) {predicate = 2 : i64} : (f32, f32) -> f32 + %285 = load %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = "accv.bin_op"(%285, %284) {predicate = 0 : i64} : (f32, f32) -> f32 + store %286, %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %287, %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %289 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %290 = load %arg0[%c782, %289] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %291 = load %arg1[%289, %288] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = "accv.bin_op"(%290, %291) {predicate = 2 : i64} : (f32, f32) -> f32 + %293 = load %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = "accv.bin_op"(%293, %292) {predicate = 0 : i64} : (f32, f32) -> f32 + store %294, %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %295, %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %296 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %297 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %298 = load %arg0[%c782, %297] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %299 = load %arg1[%297, %296] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = "accv.bin_op"(%298, %299) {predicate = 2 : i64} : (f32, f32) -> f32 + %301 = load %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %302 = "accv.bin_op"(%301, %300) {predicate = 0 : i64} : (f32, f32) -> f32 + store %302, %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = load %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %303, %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %305 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %306 = load %arg0[%c782, %305] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %307 = load %arg1[%305, %304] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = "accv.bin_op"(%306, %307) {predicate = 2 : i64} : (f32, 
f32) -> f32 + %309 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = "accv.bin_op"(%309, %308) {predicate = 0 : i64} : (f32, f32) -> f32 + store %310, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %311, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %313 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %314 = load %arg0[%c782, %313] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%313, %312] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = "accv.bin_op"(%314, %315) {predicate = 2 : i64} : (f32, f32) -> f32 + %317 = load %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = "accv.bin_op"(%317, %316) {predicate = 0 : i64} : (f32, f32) -> f32 + store %318, %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %321 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %322 = load %arg0[%c782, %321] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %323 = load %arg1[%321, %320] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = "accv.bin_op"(%322, %323) {predicate = 2 : i64} : (f32, f32) -> f32 + %325 = load %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = "accv.bin_op"(%325, %324) {predicate = 0 : i64} : (f32, f32) -> f32 + store %326, %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = load %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %327, %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %329 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %330 = load %arg0[%c782, %329] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %331 = load %arg1[%329, %328] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = "accv.bin_op"(%330, %331) {predicate = 2 : i64} : (f32, f32) -> f32 + %333 = load %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = "accv.bin_op"(%333, %332) {predicate = 0 : i64} : (f32, f32) -> f32 + store %334, %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %335, %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %337 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %338 = load %arg0[%c782, %337] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%337, %336] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = "accv.bin_op"(%338, %339) {predicate = 2 : i64} : (f32, f32) -> f32 + %341 = load %arg2[%c782, %336] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = "accv.bin_op"(%341, %340) {predicate = 0 : i64} : (f32, f32) -> f32 + store %342, %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %345 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %346 = load %arg0[%c782, %345] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %347 = load %arg1[%345, %344] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = "accv.bin_op"(%346, %347) {predicate = 2 : i64} : (f32, f32) -> f32 + %349 = load %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = "accv.bin_op"(%349, %348) {predicate = 0 : i64} : (f32, f32) -> f32 + store %350, %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = load %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %351, %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %353 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %354 = load %arg0[%c782, %353] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %355 = load %arg1[%353, %352] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = "accv.bin_op"(%354, %355) {predicate = 2 : i64} : (f32, f32) -> f32 + %357 = load %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = "accv.bin_op"(%357, %356) {predicate = 0 : i64} : (f32, f32) -> f32 + store %358, %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %359, %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %361 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %362 = load %arg0[%c782, %361] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%361, %360] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = "accv.bin_op"(%362, %363) {predicate = 2 : i64} : (f32, f32) -> f32 + %365 = load %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = "accv.bin_op"(%365, %364) {predicate = 0 : i64} : (f32, f32) -> f32 + store %366, %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %369 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %370 = load %arg0[%c782, %369] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %371 = load %arg1[%369, %368] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = "accv.bin_op"(%370, %371) {predicate = 2 : i64} : (f32, f32) -> f32 + %373 = load %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %374 = "accv.bin_op"(%373, %372) {predicate = 0 : i64} : (f32, f32) -> f32 + store %374, %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = load %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %375, %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %377 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %378 = load %arg0[%c782, %377] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %379 = load %arg1[%377, %376] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = "accv.bin_op"(%378, %379) {predicate = 2 : i64} : (f32, f32) -> f32 + %381 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = "accv.bin_op"(%381, %380) {predicate = 0 : i64} : (f32, f32) -> f32 + store %382, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %383, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %385 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %386 = load %arg0[%c783, %385] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%385, %384] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = "accv.bin_op"(%386, %387) {predicate = 2 : i64} : (f32, f32) -> f32 + %389 = load %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = "accv.bin_op"(%389, %388) {predicate = 0 : i64} : (f32, f32) -> f32 + store %390, %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %393 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %394 = load %arg0[%c783, %393] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %395 = load %arg1[%393, %392] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = "accv.bin_op"(%394, %395) {predicate = 2 : i64} : (f32, f32) -> f32 + %397 = load %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = "accv.bin_op"(%397, %396) {predicate = 0 : i64} : (f32, f32) -> f32 + store %398, %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = load %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %399, %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %401 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %402 = load %arg0[%c783, %401] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %403 = load %arg1[%401, %400] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = "accv.bin_op"(%402, %403) {predicate = 2 : i64} : (f32, f32) -> f32 + %405 = load %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %406 = "accv.bin_op"(%405, %404) {predicate = 0 : i64} : 
(f32, f32) -> f32 + store %406, %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = load %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %407, %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %408 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %409 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %410 = load %arg0[%c783, %409] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %411 = load %arg1[%409, %408] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %412 = "accv.bin_op"(%410, %411) {predicate = 2 : i64} : (f32, f32) -> f32 + %413 = load %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %414 = "accv.bin_op"(%413, %412) {predicate = 0 : i64} : (f32, f32) -> f32 + store %414, %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = load %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %415, %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %417 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %418 = load %arg0[%c783, %417] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %419 = load %arg1[%417, %416] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = "accv.bin_op"(%418, %419) {predicate = 2 : i64} : (f32, f32) -> f32 + %421 = load %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = "accv.bin_op"(%421, %420) {predicate = 0 : i64} : (f32, f32) -> f32 + store %422, %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %423, %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %424 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %425 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %426 = load %arg0[%c783, %425] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %427 = load %arg1[%425, %424] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = "accv.bin_op"(%426, %427) {predicate = 2 : i64} : (f32, f32) -> f32 + %429 = load %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = "accv.bin_op"(%429, %428) {predicate = 0 : i64} : (f32, f32) -> f32 + store %430, %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = load %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %431, %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %432 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %433 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %434 = load %arg0[%c783, %433] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %435 = load %arg1[%433, %432] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %436 = "accv.bin_op"(%434, %435) {predicate = 2 : i64} : (f32, f32) -> f32 + %437 = load %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %438 = "accv.bin_op"(%437, %436) {predicate = 0 : i64} : (f32, f32) -> f32 + store %438, %arg2[%c783, %432] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = load %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %439, %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %441 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %442 = load %arg0[%c783, %441] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %443 = load %arg1[%441, %440] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %444 = "accv.bin_op"(%442, %443) {predicate = 2 : i64} : (f32, f32) -> f32 + %445 = load %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = "accv.bin_op"(%445, %444) {predicate = 0 : i64} : (f32, f32) -> f32 + store %446, %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %447, %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %448 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %449 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %450 = load %arg0[%c783, %449] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %451 = load %arg1[%449, %448] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = "accv.bin_op"(%450, %451) {predicate = 2 : i64} : (f32, f32) -> f32 + %453 = load %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %454 = "accv.bin_op"(%453, %452) {predicate = 0 : i64} : (f32, f32) -> f32 + store %454, %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = load %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %455, %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %456 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %457 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %458 = load %arg0[%c783, %457] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %459 = load %arg1[%457, %456] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = "accv.bin_op"(%458, %459) {predicate = 2 : i64} : (f32, f32) -> f32 + %461 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %462 = "accv.bin_op"(%461, %460) {predicate = 0 : i64} : (f32, f32) -> f32 + store %462, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %463, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %465 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %466 = load %arg0[%c783, %465] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %467 = load %arg1[%465, %464] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = "accv.bin_op"(%466, %467) {predicate = 2 : i64} : (f32, f32) -> f32 + %469 = load %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = "accv.bin_op"(%469, %468) {predicate = 0 : i64} : (f32, f32) -> f32 + store %470, %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> 
+ %471 = load %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %471, %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %472 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %473 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %474 = load %arg0[%c783, %473] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %475 = load %arg1[%473, %472] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = "accv.bin_op"(%474, %475) {predicate = 2 : i64} : (f32, f32) -> f32 + %477 = load %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = "accv.bin_op"(%477, %476) {predicate = 0 : i64} : (f32, f32) -> f32 + store %478, %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = load %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %479, %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %481 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %482 = load %arg0[%c783, %481] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %483 = load %arg1[%481, %480] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %484 = "accv.bin_op"(%482, %483) {predicate = 2 : i64} : (f32, f32) -> f32 + %485 = load %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = "accv.bin_op"(%485, %484) {predicate = 0 : i64} : (f32, f32) -> f32 + store %486, %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = load %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %487, %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %489 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %490 = load %arg0[%c783, %489] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %491 = load %arg1[%489, %488] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %492 = "accv.bin_op"(%490, %491) {predicate = 2 : i64} : (f32, f32) -> f32 + %493 = load %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = "accv.bin_op"(%493, %492) {predicate = 0 : i64} : (f32, f32) -> f32 + store %494, %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %495, %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %497 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %498 = load %arg0[%c783, %497] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %499 = load %arg1[%497, %496] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = "accv.bin_op"(%498, %499) {predicate = 2 : i64} : (f32, f32) -> f32 + %501 = load %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %502 = "accv.bin_op"(%501, %500) {predicate = 0 : i64} : (f32, f32) -> f32 + store %502, %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %503 = load %arg2[%c783, %496] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %503, %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %505 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %506 = load %arg0[%c783, %505] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %507 = load %arg1[%505, %504] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = "accv.bin_op"(%506, %507) {predicate = 2 : i64} : (f32, f32) -> f32 + %509 = load %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = "accv.bin_op"(%509, %508) {predicate = 0 : i64} : (f32, f32) -> f32 + store %510, %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %511 = load %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %511, %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_4,14}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_3,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_3,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_1,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/6_Canonicalizer.mlir b/Tutorials/optimized_matmul/mlir/6_Canonicalizer.mlir new file mode 100644 index 00000000..d8cf6fcd --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/6_Canonicalizer.mlir @@ -0,0 +1,1711 @@ +module @optimized_matmul { + accv.module "optimized_matmul" { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c780 = constant 780 : index + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 780 step 6 { + affine.for %arg5 = 0 to 256 step 16 { + affine.for %arg6 = 0 to 128 step 4 { + affine.for %arg7 = 0 to 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %2 = load 
%arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = "accv.bin_op"(%2, %3) {predicate = 2 : i64} : (f32, f32) -> f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = "accv.bin_op"(%5, %4) {predicate = 0 : i64} : (f32, f32) -> f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %10 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg1[%9, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %12 = "accv.bin_op"(%10, %11) {predicate = 2 : i64} : (f32, f32) -> f32 + %13 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = "accv.bin_op"(%13, %12) {predicate = 0 : i64} : (f32, f32) -> f32 + store %14, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = load %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %15, %arg2[%arg4, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %18 = load %arg0[%arg4, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg1[%17, %16] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = "accv.bin_op"(%18, %19) {predicate = 2 : i64} : (f32, f32) -> f32 + %21 = load %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = "accv.bin_op"(%21, %20) {predicate = 0 : i64} : (f32, f32) -> f32 + store %22, %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %23 = load %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %23, %arg2[%arg4, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %25 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %26 = load %arg0[%arg4, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg1[%25, %24] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = "accv.bin_op"(%26, %27) {predicate = 2 : i64} : (f32, f32) -> f32 + %29 = load %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %30 = "accv.bin_op"(%29, %28) {predicate = 0 : i64} : (f32, f32) -> f32 + store %30, %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = load %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %31, %arg2[%arg4, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %33 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %34 = load %arg0[%arg4, %33] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg1[%33, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> 
+ %36 = "accv.bin_op"(%34, %35) {predicate = 2 : i64} : (f32, f32) -> f32 + %37 = load %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %38 = "accv.bin_op"(%37, %36) {predicate = 0 : i64} : (f32, f32) -> f32 + store %38, %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = load %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %39, %arg2[%arg4, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %41 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %42 = load %arg0[%arg4, %41] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %43 = load %arg1[%41, %40] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = "accv.bin_op"(%42, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = load %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = "accv.bin_op"(%45, %44) {predicate = 0 : i64} : (f32, f32) -> f32 + store %46, %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %47 = load %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %47, %arg2[%arg4, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %49 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %50 = load %arg0[%arg4, %49] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %51 = load %arg1[%49, %48] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = "accv.bin_op"(%50, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = load %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %54 = "accv.bin_op"(%53, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + store %54, %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = load %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %55, %arg2[%arg4, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %57 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %58 = load %arg0[%arg4, %57] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%57, %56] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = "accv.bin_op"(%58, %59) {predicate = 2 : i64} : (f32, f32) -> f32 + %61 = load %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = "accv.bin_op"(%61, %60) {predicate = 0 : i64} : (f32, f32) -> f32 + store %62, %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%arg4, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %65 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %66 = load %arg0[%arg4, %65] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %67 = load %arg1[%65, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %68 = "accv.bin_op"(%66, %67) {predicate = 2 : i64} : (f32, f32) -> f32 + %69 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %70 = "accv.bin_op"(%69, %68) {predicate = 0 : i64} : (f32, f32) -> f32 + store %70, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = load %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %71, %arg2[%arg4, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %72 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %73 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %74 = load %arg0[%arg4, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = "accv.bin_op"(%74, %75) {predicate = 2 : i64} : (f32, f32) -> f32 + %77 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = "accv.bin_op"(%77, %76) {predicate = 0 : i64} : (f32, f32) -> f32 + store %78, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %81 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %82 = load %arg0[%arg4, %81] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %83 = load %arg1[%81, %80] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = "accv.bin_op"(%82, %83) {predicate = 2 : i64} : (f32, f32) -> f32 + %85 = load %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %86 = "accv.bin_op"(%85, %84) {predicate = 0 : i64} : (f32, f32) -> f32 + store %86, %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = load %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %87, %arg2[%arg4, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %89 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %90 = load %arg0[%arg4, %89] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %91 = load %arg1[%89, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = "accv.bin_op"(%90, %91) {predicate = 2 : i64} : (f32, f32) -> f32 + %93 = load %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = "accv.bin_op"(%93, %92) {predicate = 0 : i64} : (f32, f32) -> f32 + store %94, %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = load %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %95, %arg2[%arg4, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %97 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %98 = load %arg0[%arg4, %97] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %99 = load %arg1[%97, %96] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %100 = "accv.bin_op"(%98, %99) {predicate = 2 : i64} : (f32, f32) -> f32 + %101 = load %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = "accv.bin_op"(%101, %100) {predicate = 0 : i64} : (f32, f32) -> f32 + store %102, %arg2[%arg4, %96] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = load %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %103, %arg2[%arg4, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %106 = load %arg0[%arg4, %105] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %107 = load %arg1[%105, %104] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %108 = "accv.bin_op"(%106, %107) {predicate = 2 : i64} : (f32, f32) -> f32 + %109 = load %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %110 = "accv.bin_op"(%109, %108) {predicate = 0 : i64} : (f32, f32) -> f32 + store %110, %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = load %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %111, %arg2[%arg4, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %113 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %114 = load %arg0[%arg4, %113] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%113, %112] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = "accv.bin_op"(%114, %115) {predicate = 2 : i64} : (f32, f32) -> f32 + %117 = load %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = "accv.bin_op"(%117, %116) {predicate = 0 : i64} : (f32, f32) -> f32 + store %118, %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%arg4, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %121 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %122 = load %arg0[%arg4, %121] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg1[%121, %120] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = "accv.bin_op"(%122, %123) {predicate = 2 : i64} : (f32, f32) -> f32 + %125 = load %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = "accv.bin_op"(%125, %124) {predicate = 0 : i64} : (f32, f32) -> f32 + store %126, %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = load %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %127, %arg2[%arg4, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %129 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %130 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %131 = load %arg0[%128, %130] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = load %arg1[%130, %129] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = "accv.bin_op"(%131, %132) {predicate = 2 : i64} : (f32, f32) -> f32 + %134 = load %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = "accv.bin_op"(%134, %133) {predicate = 0 : i64} : (f32, f32) -> f32 + store %135, %arg2[%128, %129] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = load %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %136, %arg2[%128, %129] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %137 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %138 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %139 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %140 = load %arg0[%137, %139] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %141 = load %arg1[%139, %138] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = "accv.bin_op"(%140, %141) {predicate = 2 : i64} : (f32, f32) -> f32 + %143 = load %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = "accv.bin_op"(%143, %142) {predicate = 0 : i64} : (f32, f32) -> f32 + store %144, %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %145 = load %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %145, %arg2[%137, %138] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %146 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %147 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %148 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %149 = load %arg0[%146, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %150 = load %arg1[%148, %147] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = "accv.bin_op"(%149, %150) {predicate = 2 : i64} : (f32, f32) -> f32 + %152 = load %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %153 = "accv.bin_op"(%152, %151) {predicate = 0 : i64} : (f32, f32) -> f32 + store %153, %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %154 = load %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %154, %arg2[%146, %147] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %155 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %156 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %157 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %158 = load %arg0[%155, %157] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg1[%157, %156] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = "accv.bin_op"(%158, %159) {predicate = 2 : i64} : (f32, f32) -> f32 + %161 = load %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = "accv.bin_op"(%161, %160) {predicate = 0 : i64} : (f32, f32) -> f32 + store %162, %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %163 = load %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %163, %arg2[%155, %156] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %165 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %166 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %167 = load %arg0[%164, %166] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %168 = load %arg1[%166, %165] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = "accv.bin_op"(%167, %168) {predicate = 2 : i64} : (f32, f32) -> f32 + %170 = load %arg2[%164, %165] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = "accv.bin_op"(%170, %169) {predicate = 0 : i64} : (f32, f32) -> f32 + store %171, %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = load %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %172, %arg2[%164, %165] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %173 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %174 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %175 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %176 = load %arg0[%173, %175] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %177 = load %arg1[%175, %174] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = "accv.bin_op"(%176, %177) {predicate = 2 : i64} : (f32, f32) -> f32 + %179 = load %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = "accv.bin_op"(%179, %178) {predicate = 0 : i64} : (f32, f32) -> f32 + store %180, %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = load %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %181, %arg2[%173, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %183 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %184 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %185 = load %arg0[%182, %184] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %186 = load %arg1[%184, %183] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = "accv.bin_op"(%185, %186) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = load %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %189 = "accv.bin_op"(%188, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + store %189, %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = load %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %190, %arg2[%182, %183] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %192 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %193 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %194 = load %arg0[%191, %193] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %195 = load %arg1[%193, %192] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = "accv.bin_op"(%194, %195) {predicate = 2 : i64} : (f32, f32) -> f32 + %197 = load %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = "accv.bin_op"(%197, %196) {predicate = 0 : i64} : (f32, f32) -> f32 + store %198, %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = load %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %199, %arg2[%191, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %201 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %202 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %203 = load %arg0[%200, %202] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %204 = load %arg1[%202, %201] : memref<128x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %205 = "accv.bin_op"(%203, %204) {predicate = 2 : i64} : (f32, f32) -> f32 + %206 = load %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = "accv.bin_op"(%206, %205) {predicate = 0 : i64} : (f32, f32) -> f32 + store %207, %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = load %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %208, %arg2[%200, %201] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %209 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %210 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %211 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %212 = load %arg0[%209, %211] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %213 = load %arg1[%211, %210] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = "accv.bin_op"(%212, %213) {predicate = 2 : i64} : (f32, f32) -> f32 + %215 = load %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = "accv.bin_op"(%215, %214) {predicate = 0 : i64} : (f32, f32) -> f32 + store %216, %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %217 = load %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %217, %arg2[%209, %210] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %218 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %219 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %220 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %221 = load %arg0[%218, %220] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %222 = load %arg1[%220, %219] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = "accv.bin_op"(%221, %222) {predicate = 2 : i64} : (f32, f32) -> f32 + %224 = load %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %225 = "accv.bin_op"(%224, %223) {predicate = 0 : i64} : (f32, f32) -> f32 + store %225, %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %226 = load %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %226, %arg2[%218, %219] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %227 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %228 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %229 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %230 = load %arg0[%227, %229] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %231 = load %arg1[%229, %228] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = "accv.bin_op"(%230, %231) {predicate = 2 : i64} : (f32, f32) -> f32 + %233 = load %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %234 = "accv.bin_op"(%233, %232) {predicate = 0 : i64} : (f32, f32) -> f32 + store %234, %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %235 = load %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %235, %arg2[%227, %228] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %236 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %237 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %238 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %239 = load 
%arg0[%236, %238] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %240 = load %arg1[%238, %237] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = "accv.bin_op"(%239, %240) {predicate = 2 : i64} : (f32, f32) -> f32 + %242 = load %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %243 = "accv.bin_op"(%242, %241) {predicate = 0 : i64} : (f32, f32) -> f32 + store %243, %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = load %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %244, %arg2[%236, %237] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %245 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %246 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %247 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %248 = load %arg0[%245, %247] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %249 = load %arg1[%247, %246] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = "accv.bin_op"(%248, %249) {predicate = 2 : i64} : (f32, f32) -> f32 + %251 = load %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = "accv.bin_op"(%251, %250) {predicate = 0 : i64} : (f32, f32) -> f32 + store %252, %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %253 = load %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %253, %arg2[%245, %246] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %254 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %255 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %256 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %257 = load %arg0[%254, %256] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %258 = load %arg1[%256, %255] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = "accv.bin_op"(%257, %258) {predicate = 2 : i64} : (f32, f32) -> f32 + %260 = load %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = "accv.bin_op"(%260, %259) {predicate = 0 : i64} : (f32, f32) -> f32 + store %261, %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = load %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %262, %arg2[%254, %255] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg4) + %264 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %265 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %266 = load %arg0[%263, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = "accv.bin_op"(%266, %267) {predicate = 2 : i64} : (f32, f32) -> f32 + %269 = load %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = "accv.bin_op"(%269, %268) {predicate = 0 : i64} : (f32, f32) -> f32 + store %270, %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%263, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %272 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + 
%273 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %274 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %275 = load %arg0[%272, %274] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %276 = load %arg1[%274, %273] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = "accv.bin_op"(%275, %276) {predicate = 2 : i64} : (f32, f32) -> f32 + %278 = load %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = "accv.bin_op"(%278, %277) {predicate = 0 : i64} : (f32, f32) -> f32 + store %279, %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = load %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %280, %arg2[%272, %273] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %282 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %283 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %284 = load %arg0[%281, %283] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %285 = load %arg1[%283, %282] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = "accv.bin_op"(%284, %285) {predicate = 2 : i64} : (f32, f32) -> f32 + %287 = load %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = "accv.bin_op"(%287, %286) {predicate = 0 : i64} : (f32, f32) -> f32 + store %288, %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %289 = load %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %289, %arg2[%281, %282] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %291 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %292 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %293 = load %arg0[%290, %292] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %294 = load %arg1[%292, %291] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = "accv.bin_op"(%293, %294) {predicate = 2 : i64} : (f32, f32) -> f32 + %296 = load %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %297 = "accv.bin_op"(%296, %295) {predicate = 0 : i64} : (f32, f32) -> f32 + store %297, %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = load %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %298, %arg2[%290, %291] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %300 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %301 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %302 = load %arg0[%299, %301] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %303 = load %arg1[%301, %300] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = "accv.bin_op"(%302, %303) {predicate = 2 : i64} : (f32, f32) -> f32 + %305 = load %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %306 = "accv.bin_op"(%305, %304) {predicate = 0 : i64} : (f32, f32) -> f32 + store %306, %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = load %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store 
%307, %arg2[%299, %300] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %309 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %310 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %311 = load %arg0[%308, %310] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %312 = load %arg1[%310, %309] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %313 = "accv.bin_op"(%311, %312) {predicate = 2 : i64} : (f32, f32) -> f32 + %314 = load %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %315 = "accv.bin_op"(%314, %313) {predicate = 0 : i64} : (f32, f32) -> f32 + store %315, %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = load %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %316, %arg2[%308, %309] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %318 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %319 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %320 = load %arg0[%317, %319] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %321 = load %arg1[%319, %318] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %322 = "accv.bin_op"(%320, %321) {predicate = 2 : i64} : (f32, f32) -> f32 + %323 = load %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = "accv.bin_op"(%323, %322) {predicate = 0 : i64} : (f32, f32) -> f32 + store %324, %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %325 = load %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %325, %arg2[%317, %318] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %327 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %328 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %329 = load %arg0[%326, %328] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %330 = load %arg1[%328, %327] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = "accv.bin_op"(%329, %330) {predicate = 2 : i64} : (f32, f32) -> f32 + %332 = load %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %333 = "accv.bin_op"(%332, %331) {predicate = 0 : i64} : (f32, f32) -> f32 + store %333, %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = load %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %334, %arg2[%326, %327] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %336 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %337 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %338 = load %arg0[%335, %337] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%337, %336] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = "accv.bin_op"(%338, %339) {predicate = 2 : i64} : (f32, f32) -> f32 + %341 = load %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = "accv.bin_op"(%341, %340) {predicate = 0 : i64} : (f32, f32) -> f32 + store %342, %arg2[%335, %336] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%335, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %345 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %346 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %347 = load %arg0[%344, %346] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %348 = load %arg1[%346, %345] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = "accv.bin_op"(%347, %348) {predicate = 2 : i64} : (f32, f32) -> f32 + %350 = load %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = "accv.bin_op"(%350, %349) {predicate = 0 : i64} : (f32, f32) -> f32 + store %351, %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = load %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %352, %arg2[%344, %345] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %354 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %355 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %356 = load %arg0[%353, %355] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %357 = load %arg1[%355, %354] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = "accv.bin_op"(%356, %357) {predicate = 2 : i64} : (f32, f32) -> f32 + %359 = load %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = "accv.bin_op"(%359, %358) {predicate = 0 : i64} : (f32, f32) -> f32 + store %360, %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = load %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %361, %arg2[%353, %354] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %363 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %364 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %365 = load %arg0[%362, %364] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%364, %363] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = "accv.bin_op"(%365, %366) {predicate = 2 : i64} : (f32, f32) -> f32 + %368 = load %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = "accv.bin_op"(%368, %367) {predicate = 0 : i64} : (f32, f32) -> f32 + store %369, %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%362, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %372 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %373 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %374 = load %arg0[%371, %373] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %375 = load %arg1[%373, %372] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = "accv.bin_op"(%374, %375) {predicate = 2 : i64} : (f32, f32) -> f32 + %377 = load %arg2[%371, %372] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %378 = "accv.bin_op"(%377, %376) {predicate = 0 : i64} : (f32, f32) -> f32 + store %378, %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = load %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %379, %arg2[%371, %372] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %381 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %382 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %383 = load %arg0[%380, %382] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %384 = load %arg1[%382, %381] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %385 = "accv.bin_op"(%383, %384) {predicate = 2 : i64} : (f32, f32) -> f32 + %386 = load %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = "accv.bin_op"(%386, %385) {predicate = 0 : i64} : (f32, f32) -> f32 + store %387, %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = load %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %388, %arg2[%380, %381] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %390 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %391 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %392 = load %arg0[%389, %391] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %393 = load %arg1[%391, %390] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %394 = "accv.bin_op"(%392, %393) {predicate = 2 : i64} : (f32, f32) -> f32 + %395 = load %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = "accv.bin_op"(%395, %394) {predicate = 0 : i64} : (f32, f32) -> f32 + store %396, %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = load %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %397, %arg2[%389, %390] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %399 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %400 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %401 = load %arg0[%398, %400] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %402 = load %arg1[%400, %399] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %403 = "accv.bin_op"(%401, %402) {predicate = 2 : i64} : (f32, f32) -> f32 + %404 = load %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %405 = "accv.bin_op"(%404, %403) {predicate = 0 : i64} : (f32, f32) -> f32 + store %405, %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %406 = load %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %406, %arg2[%398, %399] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = affine.apply affine_map<(d0) -> (d0 + 2)>(%arg4) + %408 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %409 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %410 = load %arg0[%407, %409] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %411 = load %arg1[%409, %408] : memref<128x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + %412 = "accv.bin_op"(%410, %411) {predicate = 2 : i64} : (f32, f32) -> f32 + %413 = load %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %414 = "accv.bin_op"(%413, %412) {predicate = 0 : i64} : (f32, f32) -> f32 + store %414, %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = load %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %415, %arg2[%407, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %417 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %418 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %419 = load %arg0[%416, %418] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %420 = load %arg1[%418, %417] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = "accv.bin_op"(%419, %420) {predicate = 2 : i64} : (f32, f32) -> f32 + %422 = load %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = "accv.bin_op"(%422, %421) {predicate = 0 : i64} : (f32, f32) -> f32 + store %423, %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %424 = load %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %424, %arg2[%416, %417] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %426 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %427 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %428 = load %arg0[%425, %427] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %429 = load %arg1[%427, %426] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = "accv.bin_op"(%428, %429) {predicate = 2 : i64} : (f32, f32) -> f32 + %431 = load %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %432 = "accv.bin_op"(%431, %430) {predicate = 0 : i64} : (f32, f32) -> f32 + store %432, %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = load %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %433, %arg2[%425, %426] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %434 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %435 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %436 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %437 = load %arg0[%434, %436] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %438 = load %arg1[%436, %435] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = "accv.bin_op"(%437, %438) {predicate = 2 : i64} : (f32, f32) -> f32 + %440 = load %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = "accv.bin_op"(%440, %439) {predicate = 0 : i64} : (f32, f32) -> f32 + store %441, %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %442 = load %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %442, %arg2[%434, %435] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %443 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %444 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %445 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %446 = load 
%arg0[%443, %445] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %447 = load %arg1[%445, %444] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %448 = "accv.bin_op"(%446, %447) {predicate = 2 : i64} : (f32, f32) -> f32 + %449 = load %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %450 = "accv.bin_op"(%449, %448) {predicate = 0 : i64} : (f32, f32) -> f32 + store %450, %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = load %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %451, %arg2[%443, %444] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %453 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %454 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %455 = load %arg0[%452, %454] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %456 = load %arg1[%454, %453] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = "accv.bin_op"(%455, %456) {predicate = 2 : i64} : (f32, f32) -> f32 + %458 = load %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = "accv.bin_op"(%458, %457) {predicate = 0 : i64} : (f32, f32) -> f32 + store %459, %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = load %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %460, %arg2[%452, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %462 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %463 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %464 = load %arg0[%461, %463] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %465 = load %arg1[%463, %462] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %466 = "accv.bin_op"(%464, %465) {predicate = 2 : i64} : (f32, f32) -> f32 + %467 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = "accv.bin_op"(%467, %466) {predicate = 0 : i64} : (f32, f32) -> f32 + store %468, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %469, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %471 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %472 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %473 = load %arg0[%470, %472] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %474 = load %arg1[%472, %471] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %475 = "accv.bin_op"(%473, %474) {predicate = 2 : i64} : (f32, f32) -> f32 + %476 = load %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %477 = "accv.bin_op"(%476, %475) {predicate = 0 : i64} : (f32, f32) -> f32 + store %477, %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = load %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %478, %arg2[%470, %471] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %480 = 
affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %481 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %482 = load %arg0[%479, %481] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %483 = load %arg1[%481, %480] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %484 = "accv.bin_op"(%482, %483) {predicate = 2 : i64} : (f32, f32) -> f32 + %485 = load %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = "accv.bin_op"(%485, %484) {predicate = 0 : i64} : (f32, f32) -> f32 + store %486, %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = load %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %487, %arg2[%479, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %489 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %490 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %491 = load %arg0[%488, %490] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %492 = load %arg1[%490, %489] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %493 = "accv.bin_op"(%491, %492) {predicate = 2 : i64} : (f32, f32) -> f32 + %494 = load %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = "accv.bin_op"(%494, %493) {predicate = 0 : i64} : (f32, f32) -> f32 + store %495, %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = load %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %496, %arg2[%488, %489] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %497 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %498 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %499 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %500 = load %arg0[%497, %499] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %501 = load %arg1[%499, %498] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %502 = "accv.bin_op"(%500, %501) {predicate = 2 : i64} : (f32, f32) -> f32 + %503 = load %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = "accv.bin_op"(%503, %502) {predicate = 0 : i64} : (f32, f32) -> f32 + store %504, %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %505 = load %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %505, %arg2[%497, %498] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %507 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %508 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %509 = load %arg0[%506, %508] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %510 = load %arg1[%508, %507] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %511 = "accv.bin_op"(%509, %510) {predicate = 2 : i64} : (f32, f32) -> f32 + %512 = load %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %513 = "accv.bin_op"(%512, %511) {predicate = 0 : i64} : (f32, f32) -> f32 + store %513, %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = load %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store 
%514, %arg2[%506, %507] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %515 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %516 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %517 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %518 = load %arg0[%515, %517] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %519 = load %arg1[%517, %516] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = "accv.bin_op"(%518, %519) {predicate = 2 : i64} : (f32, f32) -> f32 + %521 = load %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %522 = "accv.bin_op"(%521, %520) {predicate = 0 : i64} : (f32, f32) -> f32 + store %522, %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %523 = load %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %523, %arg2[%515, %516] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %524 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %525 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %526 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %527 = load %arg0[%524, %526] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %528 = load %arg1[%526, %525] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %529 = "accv.bin_op"(%527, %528) {predicate = 2 : i64} : (f32, f32) -> f32 + %530 = load %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %531 = "accv.bin_op"(%530, %529) {predicate = 0 : i64} : (f32, f32) -> f32 + store %531, %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = load %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %532, %arg2[%524, %525] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %533 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %534 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %535 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %536 = load %arg0[%533, %535] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %537 = load %arg1[%535, %534] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = "accv.bin_op"(%536, %537) {predicate = 2 : i64} : (f32, f32) -> f32 + %539 = load %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = "accv.bin_op"(%539, %538) {predicate = 0 : i64} : (f32, f32) -> f32 + store %540, %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %541 = load %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %541, %arg2[%533, %534] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %542 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %543 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %544 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %545 = load %arg0[%542, %544] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %546 = load %arg1[%544, %543] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %547 = "accv.bin_op"(%545, %546) {predicate = 2 : i64} : (f32, f32) -> f32 + %548 = load %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = "accv.bin_op"(%548, %547) {predicate = 0 : i64} : (f32, f32) -> f32 + store %549, %arg2[%542, %543] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = load %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %550, %arg2[%542, %543] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %551 = affine.apply affine_map<(d0) -> (d0 + 3)>(%arg4) + %552 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %553 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %554 = load %arg0[%551, %553] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %555 = load %arg1[%553, %552] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = "accv.bin_op"(%554, %555) {predicate = 2 : i64} : (f32, f32) -> f32 + %557 = load %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = "accv.bin_op"(%557, %556) {predicate = 0 : i64} : (f32, f32) -> f32 + store %558, %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = load %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %559, %arg2[%551, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %561 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %562 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %563 = load %arg0[%560, %562] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %564 = load %arg1[%562, %561] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %565 = "accv.bin_op"(%563, %564) {predicate = 2 : i64} : (f32, f32) -> f32 + %566 = load %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %567 = "accv.bin_op"(%566, %565) {predicate = 0 : i64} : (f32, f32) -> f32 + store %567, %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = load %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %568, %arg2[%560, %561] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %569 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %570 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %571 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %572 = load %arg0[%569, %571] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %573 = load %arg1[%571, %570] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %574 = "accv.bin_op"(%572, %573) {predicate = 2 : i64} : (f32, f32) -> f32 + %575 = load %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = "accv.bin_op"(%575, %574) {predicate = 0 : i64} : (f32, f32) -> f32 + store %576, %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %577 = load %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %577, %arg2[%569, %570] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %579 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %580 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %581 = load %arg0[%578, %580] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %582 = load %arg1[%580, %579] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %583 = "accv.bin_op"(%581, %582) {predicate = 2 : i64} : (f32, f32) -> f32 + %584 = load %arg2[%578, %579] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %585 = "accv.bin_op"(%584, %583) {predicate = 0 : i64} : (f32, f32) -> f32 + store %585, %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = load %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %586, %arg2[%578, %579] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %587 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %588 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %589 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %590 = load %arg0[%587, %589] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %591 = load %arg1[%589, %588] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %592 = "accv.bin_op"(%590, %591) {predicate = 2 : i64} : (f32, f32) -> f32 + %593 = load %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %594 = "accv.bin_op"(%593, %592) {predicate = 0 : i64} : (f32, f32) -> f32 + store %594, %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %595 = load %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %595, %arg2[%587, %588] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %597 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %598 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %599 = load %arg0[%596, %598] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %600 = load %arg1[%598, %597] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %601 = "accv.bin_op"(%599, %600) {predicate = 2 : i64} : (f32, f32) -> f32 + %602 = load %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %603 = "accv.bin_op"(%602, %601) {predicate = 0 : i64} : (f32, f32) -> f32 + store %603, %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %604 = load %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %604, %arg2[%596, %597] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %605 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %606 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %607 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %608 = load %arg0[%605, %607] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %609 = load %arg1[%607, %606] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %610 = "accv.bin_op"(%608, %609) {predicate = 2 : i64} : (f32, f32) -> f32 + %611 = load %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %612 = "accv.bin_op"(%611, %610) {predicate = 0 : i64} : (f32, f32) -> f32 + store %612, %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %613 = load %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %613, %arg2[%605, %606] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %614 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %615 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %616 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %617 = load %arg0[%614, %616] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %618 = load %arg1[%616, %615] : memref<128x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %619 = "accv.bin_op"(%617, %618) {predicate = 2 : i64} : (f32, f32) -> f32 + %620 = load %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %621 = "accv.bin_op"(%620, %619) {predicate = 0 : i64} : (f32, f32) -> f32 + store %621, %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %622 = load %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %622, %arg2[%614, %615] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %623 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %624 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %625 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %626 = load %arg0[%623, %625] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %627 = load %arg1[%625, %624] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %628 = "accv.bin_op"(%626, %627) {predicate = 2 : i64} : (f32, f32) -> f32 + %629 = load %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %630 = "accv.bin_op"(%629, %628) {predicate = 0 : i64} : (f32, f32) -> f32 + store %630, %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %631 = load %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %631, %arg2[%623, %624] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %632 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %633 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %634 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %635 = load %arg0[%632, %634] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %636 = load %arg1[%634, %633] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %637 = "accv.bin_op"(%635, %636) {predicate = 2 : i64} : (f32, f32) -> f32 + %638 = load %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %639 = "accv.bin_op"(%638, %637) {predicate = 0 : i64} : (f32, f32) -> f32 + store %639, %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %640 = load %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %640, %arg2[%632, %633] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %641 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %642 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %643 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %644 = load %arg0[%641, %643] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %645 = load %arg1[%643, %642] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %646 = "accv.bin_op"(%644, %645) {predicate = 2 : i64} : (f32, f32) -> f32 + %647 = load %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %648 = "accv.bin_op"(%647, %646) {predicate = 0 : i64} : (f32, f32) -> f32 + store %648, %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %649 = load %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %649, %arg2[%641, %642] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %650 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %651 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %652 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %653 = load 
%arg0[%650, %652] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %654 = load %arg1[%652, %651] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %655 = "accv.bin_op"(%653, %654) {predicate = 2 : i64} : (f32, f32) -> f32 + %656 = load %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %657 = "accv.bin_op"(%656, %655) {predicate = 0 : i64} : (f32, f32) -> f32 + store %657, %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %658 = load %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %658, %arg2[%650, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %659 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %660 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %661 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %662 = load %arg0[%659, %661] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %663 = load %arg1[%661, %660] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %664 = "accv.bin_op"(%662, %663) {predicate = 2 : i64} : (f32, f32) -> f32 + %665 = load %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %666 = "accv.bin_op"(%665, %664) {predicate = 0 : i64} : (f32, f32) -> f32 + store %666, %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %667 = load %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %667, %arg2[%659, %660] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %668 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %669 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %670 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %671 = load %arg0[%668, %670] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %672 = load %arg1[%670, %669] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %673 = "accv.bin_op"(%671, %672) {predicate = 2 : i64} : (f32, f32) -> f32 + %674 = load %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %675 = "accv.bin_op"(%674, %673) {predicate = 0 : i64} : (f32, f32) -> f32 + store %675, %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %676 = load %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %676, %arg2[%668, %669] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %677 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %678 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %679 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %680 = load %arg0[%677, %679] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %681 = load %arg1[%679, %678] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %682 = "accv.bin_op"(%680, %681) {predicate = 2 : i64} : (f32, f32) -> f32 + %683 = load %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %684 = "accv.bin_op"(%683, %682) {predicate = 0 : i64} : (f32, f32) -> f32 + store %684, %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %685 = load %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %685, %arg2[%677, %678] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %686 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + 
%687 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %688 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %689 = load %arg0[%686, %688] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %690 = load %arg1[%688, %687] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %691 = "accv.bin_op"(%689, %690) {predicate = 2 : i64} : (f32, f32) -> f32 + %692 = load %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %693 = "accv.bin_op"(%692, %691) {predicate = 0 : i64} : (f32, f32) -> f32 + store %693, %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %694 = load %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %694, %arg2[%686, %687] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %695 = affine.apply affine_map<(d0) -> (d0 + 4)>(%arg4) + %696 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %697 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %698 = load %arg0[%695, %697] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %699 = load %arg1[%697, %696] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %700 = "accv.bin_op"(%698, %699) {predicate = 2 : i64} : (f32, f32) -> f32 + %701 = load %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %702 = "accv.bin_op"(%701, %700) {predicate = 0 : i64} : (f32, f32) -> f32 + store %702, %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %703 = load %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %703, %arg2[%695, %696] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %704 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %705 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %706 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %707 = load %arg0[%704, %706] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %708 = load %arg1[%706, %705] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %709 = "accv.bin_op"(%707, %708) {predicate = 2 : i64} : (f32, f32) -> f32 + %710 = load %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %711 = "accv.bin_op"(%710, %709) {predicate = 0 : i64} : (f32, f32) -> f32 + store %711, %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %712 = load %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %712, %arg2[%704, %705] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %713 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %714 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg5) + %715 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %716 = load %arg0[%713, %715] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %717 = load %arg1[%715, %714] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %718 = "accv.bin_op"(%716, %717) {predicate = 2 : i64} : (f32, f32) -> f32 + %719 = load %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %720 = "accv.bin_op"(%719, %718) {predicate = 0 : i64} : (f32, f32) -> f32 + store %720, %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %721 = load %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store 
%721, %arg2[%713, %714] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %722 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %723 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg5) + %724 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %725 = load %arg0[%722, %724] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %726 = load %arg1[%724, %723] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %727 = "accv.bin_op"(%725, %726) {predicate = 2 : i64} : (f32, f32) -> f32 + %728 = load %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %729 = "accv.bin_op"(%728, %727) {predicate = 0 : i64} : (f32, f32) -> f32 + store %729, %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %730 = load %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %730, %arg2[%722, %723] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %731 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %732 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg5) + %733 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %734 = load %arg0[%731, %733] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %735 = load %arg1[%733, %732] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %736 = "accv.bin_op"(%734, %735) {predicate = 2 : i64} : (f32, f32) -> f32 + %737 = load %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %738 = "accv.bin_op"(%737, %736) {predicate = 0 : i64} : (f32, f32) -> f32 + store %738, %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %739 = load %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %739, %arg2[%731, %732] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %740 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %741 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg5) + %742 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %743 = load %arg0[%740, %742] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %744 = load %arg1[%742, %741] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %745 = "accv.bin_op"(%743, %744) {predicate = 2 : i64} : (f32, f32) -> f32 + %746 = load %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %747 = "accv.bin_op"(%746, %745) {predicate = 0 : i64} : (f32, f32) -> f32 + store %747, %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %748 = load %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %748, %arg2[%740, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %749 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %750 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg5) + %751 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %752 = load %arg0[%749, %751] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %753 = load %arg1[%751, %750] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %754 = "accv.bin_op"(%752, %753) {predicate = 2 : i64} : (f32, f32) -> f32 + %755 = load %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %756 = "accv.bin_op"(%755, %754) {predicate = 0 : i64} : (f32, f32) -> f32 + store %756, %arg2[%749, %750] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %757 = load %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %757, %arg2[%749, %750] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %758 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %759 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg5) + %760 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %761 = load %arg0[%758, %760] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %762 = load %arg1[%760, %759] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %763 = "accv.bin_op"(%761, %762) {predicate = 2 : i64} : (f32, f32) -> f32 + %764 = load %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %765 = "accv.bin_op"(%764, %763) {predicate = 0 : i64} : (f32, f32) -> f32 + store %765, %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %766 = load %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %766, %arg2[%758, %759] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %767 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %768 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg5) + %769 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %770 = load %arg0[%767, %769] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %771 = load %arg1[%769, %768] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %772 = "accv.bin_op"(%770, %771) {predicate = 2 : i64} : (f32, f32) -> f32 + %773 = load %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %774 = "accv.bin_op"(%773, %772) {predicate = 0 : i64} : (f32, f32) -> f32 + store %774, %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %775 = load %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %775, %arg2[%767, %768] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %776 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %777 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %778 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %779 = load %arg0[%776, %778] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %780 = load %arg1[%778, %777] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %781 = "accv.bin_op"(%779, %780) {predicate = 2 : i64} : (f32, f32) -> f32 + %782 = load %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %783 = "accv.bin_op"(%782, %781) {predicate = 0 : i64} : (f32, f32) -> f32 + store %783, %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %784 = load %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %784, %arg2[%776, %777] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %785 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %786 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg5) + %787 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %788 = load %arg0[%785, %787] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %789 = load %arg1[%787, %786] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %790 = "accv.bin_op"(%788, %789) {predicate = 2 : i64} : (f32, f32) -> f32 + %791 = load %arg2[%785, %786] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %792 = "accv.bin_op"(%791, %790) {predicate = 0 : i64} : (f32, f32) -> f32 + store %792, %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %793 = load %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %793, %arg2[%785, %786] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %794 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %795 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg5) + %796 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %797 = load %arg0[%794, %796] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %798 = load %arg1[%796, %795] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %799 = "accv.bin_op"(%797, %798) {predicate = 2 : i64} : (f32, f32) -> f32 + %800 = load %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %801 = "accv.bin_op"(%800, %799) {predicate = 0 : i64} : (f32, f32) -> f32 + store %801, %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %802 = load %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %802, %arg2[%794, %795] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %803 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %804 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg5) + %805 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %806 = load %arg0[%803, %805] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %807 = load %arg1[%805, %804] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %808 = "accv.bin_op"(%806, %807) {predicate = 2 : i64} : (f32, f32) -> f32 + %809 = load %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %810 = "accv.bin_op"(%809, %808) {predicate = 0 : i64} : (f32, f32) -> f32 + store %810, %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %811 = load %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %811, %arg2[%803, %804] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %812 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %813 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg5) + %814 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %815 = load %arg0[%812, %814] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %816 = load %arg1[%814, %813] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %817 = "accv.bin_op"(%815, %816) {predicate = 2 : i64} : (f32, f32) -> f32 + %818 = load %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %819 = "accv.bin_op"(%818, %817) {predicate = 0 : i64} : (f32, f32) -> f32 + store %819, %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %820 = load %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %820, %arg2[%812, %813] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %821 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %822 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg5) + %823 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %824 = load %arg0[%821, %823] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %825 = load %arg1[%823, %822] : memref<128x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + %826 = "accv.bin_op"(%824, %825) {predicate = 2 : i64} : (f32, f32) -> f32 + %827 = load %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %828 = "accv.bin_op"(%827, %826) {predicate = 0 : i64} : (f32, f32) -> f32 + store %828, %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %829 = load %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %829, %arg2[%821, %822] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %830 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %831 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg5) + %832 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %833 = load %arg0[%830, %832] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %834 = load %arg1[%832, %831] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %835 = "accv.bin_op"(%833, %834) {predicate = 2 : i64} : (f32, f32) -> f32 + %836 = load %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %837 = "accv.bin_op"(%836, %835) {predicate = 0 : i64} : (f32, f32) -> f32 + store %837, %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %838 = load %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %838, %arg2[%830, %831] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %839 = affine.apply affine_map<(d0) -> (d0 + 5)>(%arg4) + %840 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg5) + %841 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %842 = load %arg0[%839, %841] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %843 = load %arg1[%841, %840] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %844 = "accv.bin_op"(%842, %843) {predicate = 2 : i64} : (f32, f32) -> f32 + %845 = load %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %846 = "accv.bin_op"(%845, %844) {predicate = 0 : i64} : (f32, f32) -> f32 + store %846, %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %847 = load %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %847, %arg2[%839, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_4,14}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_3,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_3,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 16, 128]} + } {begin = 0 : i64, end = 780 : i64, index = #accln<"index{i_1,11}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [6, 256, 128]} + affine.for %arg4 = 0 to 256 step 16 { + affine.for %arg5 = 0 to 128 step 4 { + affine.for %arg6 = 0 to 4 { + %0 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = "accv.bin_op"(%2, %3) {predicate = 2 : i64} : (f32, f32) -> f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = "accv.bin_op"(%5, %4) {predicate = 0 : i64} : (f32, f32) -> f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %10 = load %arg0[%c780, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %11 = load %arg1[%9, %8] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %12 = "accv.bin_op"(%10, %11) {predicate = 2 : i64} : (f32, f32) -> f32 + %13 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %14 = "accv.bin_op"(%13, %12) {predicate = 0 : i64} : (f32, f32) -> f32 + store %14, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = load %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %15, %arg2[%c780, %8] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %18 = load %arg0[%c780, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg1[%17, %16] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %20 = "accv.bin_op"(%18, %19) {predicate = 2 : i64} : (f32, f32) -> f32 + %21 = load %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = "accv.bin_op"(%21, %20) {predicate = 0 : i64} : (f32, f32) -> f32 + store %22, %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %23 = load %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %23, %arg2[%c780, %16] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %25 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %26 = load %arg0[%c780, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg1[%25, %24] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %28 = "accv.bin_op"(%26, %27) {predicate = 2 : i64} : (f32, f32) -> f32 + %29 = load %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %30 = "accv.bin_op"(%29, %28) {predicate = 0 : i64} : (f32, f32) -> f32 + store %30, %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = load %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %31, %arg2[%c780, %24] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %33 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %34 = load %arg0[%c780, %33] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg1[%33, %32] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %36 = "accv.bin_op"(%34, %35) {predicate = 2 : i64} : (f32, f32) -> f32 + %37 = load %arg2[%c780, %32] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %38 = "accv.bin_op"(%37, %36) {predicate = 0 : i64} : (f32, f32) -> f32 + store %38, %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %39 = load %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %39, %arg2[%c780, %32] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %41 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %42 = load %arg0[%c780, %41] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %43 = load %arg1[%41, %40] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = "accv.bin_op"(%42, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = load %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %46 = "accv.bin_op"(%45, %44) {predicate = 0 : i64} : (f32, f32) -> f32 + store %46, %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %47 = load %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %47, %arg2[%c780, %40] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %49 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %50 = load %arg0[%c780, %49] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %51 = load %arg1[%49, %48] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = "accv.bin_op"(%50, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = load %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %54 = "accv.bin_op"(%53, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + store %54, %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %55 = load %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %55, %arg2[%c780, %48] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %57 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %58 = load %arg0[%c780, %57] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %59 = load %arg1[%57, %56] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = "accv.bin_op"(%58, %59) {predicate = 2 : i64} : (f32, f32) -> f32 + %61 = load %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = "accv.bin_op"(%61, %60) {predicate = 0 : i64} : (f32, f32) -> f32 + store %62, %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %63 = load %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %63, %arg2[%c780, %56] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %65 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %66 = load %arg0[%c780, %65] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %67 = load %arg1[%65, %64] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %68 = "accv.bin_op"(%66, %67) {predicate = 2 : i64} : (f32, f32) -> f32 + %69 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = "accv.bin_op"(%69, %68) {predicate = 0 : i64} : (f32, f32) -> f32 + store 
%70, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = load %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %71, %arg2[%c780, %64] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %72 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %73 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %74 = load %arg0[%c780, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = "accv.bin_op"(%74, %75) {predicate = 2 : i64} : (f32, f32) -> f32 + %77 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = "accv.bin_op"(%77, %76) {predicate = 0 : i64} : (f32, f32) -> f32 + store %78, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %81 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %82 = load %arg0[%c780, %81] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %83 = load %arg1[%81, %80] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %84 = "accv.bin_op"(%82, %83) {predicate = 2 : i64} : (f32, f32) -> f32 + %85 = load %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %86 = "accv.bin_op"(%85, %84) {predicate = 0 : i64} : (f32, f32) -> f32 + store %86, %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = load %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %87, %arg2[%c780, %80] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %89 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %90 = load %arg0[%c780, %89] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %91 = load %arg1[%89, %88] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %92 = "accv.bin_op"(%90, %91) {predicate = 2 : i64} : (f32, f32) -> f32 + %93 = load %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = "accv.bin_op"(%93, %92) {predicate = 0 : i64} : (f32, f32) -> f32 + store %94, %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %95 = load %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %95, %arg2[%c780, %88] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %97 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %98 = load %arg0[%c780, %97] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %99 = load %arg1[%97, %96] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %100 = "accv.bin_op"(%98, %99) {predicate = 2 : i64} : (f32, f32) -> f32 + %101 = load %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %102 = "accv.bin_op"(%101, %100) {predicate = 0 : i64} : (f32, f32) -> f32 + store %102, %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = load %arg2[%c780, %96] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %103, %arg2[%c780, %96] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %104 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %106 = load %arg0[%c780, %105] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %107 = load %arg1[%105, %104] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %108 = "accv.bin_op"(%106, %107) {predicate = 2 : i64} : (f32, f32) -> f32 + %109 = load %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %110 = "accv.bin_op"(%109, %108) {predicate = 0 : i64} : (f32, f32) -> f32 + store %110, %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %111 = load %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %111, %arg2[%c780, %104] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %113 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %114 = load %arg0[%c780, %113] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg1[%113, %112] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = "accv.bin_op"(%114, %115) {predicate = 2 : i64} : (f32, f32) -> f32 + %117 = load %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %118 = "accv.bin_op"(%117, %116) {predicate = 0 : i64} : (f32, f32) -> f32 + store %118, %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %119 = load %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %119, %arg2[%c780, %112] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %121 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %122 = load %arg0[%c780, %121] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %123 = load %arg1[%121, %120] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = "accv.bin_op"(%122, %123) {predicate = 2 : i64} : (f32, f32) -> f32 + %125 = load %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %126 = "accv.bin_op"(%125, %124) {predicate = 0 : i64} : (f32, f32) -> f32 + store %126, %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %127 = load %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %127, %arg2[%c780, %120] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %128 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %129 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %130 = load %arg0[%c781, %129] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg1[%129, %128] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = "accv.bin_op"(%130, %131) {predicate = 2 : i64} : (f32, f32) -> f32 + %133 = load %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = "accv.bin_op"(%133, %132) {predicate = 0 : i64} : (f32, f32) -> f32 + store %134, %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %135 = load %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
store %135, %arg2[%c781, %128] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %136 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %137 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %138 = load %arg0[%c781, %137] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg1[%137, %136] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %140 = "accv.bin_op"(%138, %139) {predicate = 2 : i64} : (f32, f32) -> f32 + %141 = load %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = "accv.bin_op"(%141, %140) {predicate = 0 : i64} : (f32, f32) -> f32 + store %142, %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = load %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %143, %arg2[%c781, %136] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %144 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %145 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %146 = load %arg0[%c781, %145] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %147 = load %arg1[%145, %144] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = "accv.bin_op"(%146, %147) {predicate = 2 : i64} : (f32, f32) -> f32 + %149 = load %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = "accv.bin_op"(%149, %148) {predicate = 0 : i64} : (f32, f32) -> f32 + store %150, %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = load %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %151, %arg2[%c781, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %153 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %154 = load %arg0[%c781, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg1[%153, %152] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = "accv.bin_op"(%154, %155) {predicate = 2 : i64} : (f32, f32) -> f32 + %157 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = "accv.bin_op"(%157, %156) {predicate = 0 : i64} : (f32, f32) -> f32 + store %158, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %159, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %161 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %162 = load %arg0[%c781, %161] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %163 = load %arg1[%161, %160] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %164 = "accv.bin_op"(%162, %163) {predicate = 2 : i64} : (f32, f32) -> f32 + %165 = load %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %166 = "accv.bin_op"(%165, %164) {predicate = 0 : i64} : (f32, f32) -> f32 + store %166, %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = load %arg2[%c781, %160] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %167, %arg2[%c781, %160] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %169 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %170 = load %arg0[%c781, %169] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %171 = load %arg1[%169, %168] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = "accv.bin_op"(%170, %171) {predicate = 2 : i64} : (f32, f32) -> f32 + %173 = load %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = "accv.bin_op"(%173, %172) {predicate = 0 : i64} : (f32, f32) -> f32 + store %174, %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %175 = load %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %175, %arg2[%c781, %168] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %177 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %178 = load %arg0[%c781, %177] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %179 = load %arg1[%177, %176] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = "accv.bin_op"(%178, %179) {predicate = 2 : i64} : (f32, f32) -> f32 + %181 = load %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = "accv.bin_op"(%181, %180) {predicate = 0 : i64} : (f32, f32) -> f32 + store %182, %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = load %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %183, %arg2[%c781, %176] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %184 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %185 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %186 = load %arg0[%c781, %185] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%185, %184] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = "accv.bin_op"(%186, %187) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = load %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = "accv.bin_op"(%189, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + store %190, %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%c781, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %193 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %194 = load %arg0[%c781, %193] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %195 = load %arg1[%193, %192] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = "accv.bin_op"(%194, %195) {predicate = 2 : i64} : (f32, f32) -> f32 + %197 = load %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = "accv.bin_op"(%197, %196) {predicate = 0 : i64} : (f32, f32) -> f32 + store %198, %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %199 = load %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %199, %arg2[%c781, %192] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = affine.apply 
affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %201 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %202 = load %arg0[%c781, %201] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %203 = load %arg1[%201, %200] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = "accv.bin_op"(%202, %203) {predicate = 2 : i64} : (f32, f32) -> f32 + %205 = load %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %206 = "accv.bin_op"(%205, %204) {predicate = 0 : i64} : (f32, f32) -> f32 + store %206, %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %207 = load %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %207, %arg2[%c781, %200] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %209 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %210 = load %arg0[%c781, %209] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %211 = load %arg1[%209, %208] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = "accv.bin_op"(%210, %211) {predicate = 2 : i64} : (f32, f32) -> f32 + %213 = load %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = "accv.bin_op"(%213, %212) {predicate = 0 : i64} : (f32, f32) -> f32 + store %214, %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %215 = load %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %215, %arg2[%c781, %208] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %216 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %217 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %218 = load %arg0[%c781, %217] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %219 = load %arg1[%217, %216] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = "accv.bin_op"(%218, %219) {predicate = 2 : i64} : (f32, f32) -> f32 + %221 = load %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = "accv.bin_op"(%221, %220) {predicate = 0 : i64} : (f32, f32) -> f32 + store %222, %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = load %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %223, %arg2[%c781, %216] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %224 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %225 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %226 = load %arg0[%c781, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = "accv.bin_op"(%226, %227) {predicate = 2 : i64} : (f32, f32) -> f32 + %229 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = "accv.bin_op"(%229, %228) {predicate = 0 : i64} : (f32, f32) -> f32 + store %230, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %233 
= affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %234 = load %arg0[%c781, %233] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %235 = load %arg1[%233, %232] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %236 = "accv.bin_op"(%234, %235) {predicate = 2 : i64} : (f32, f32) -> f32 + %237 = load %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = "accv.bin_op"(%237, %236) {predicate = 0 : i64} : (f32, f32) -> f32 + store %238, %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = load %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %239, %arg2[%c781, %232] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %241 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %242 = load %arg0[%c781, %241] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %243 = load %arg1[%241, %240] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %244 = "accv.bin_op"(%242, %243) {predicate = 2 : i64} : (f32, f32) -> f32 + %245 = load %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = "accv.bin_op"(%245, %244) {predicate = 0 : i64} : (f32, f32) -> f32 + store %246, %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %247 = load %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %247, %arg2[%c781, %240] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %249 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %250 = load %arg0[%c781, %249] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %251 = load %arg1[%249, %248] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = "accv.bin_op"(%250, %251) {predicate = 2 : i64} : (f32, f32) -> f32 + %253 = load %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %254 = "accv.bin_op"(%253, %252) {predicate = 0 : i64} : (f32, f32) -> f32 + store %254, %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = load %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %255, %arg2[%c781, %248] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %256 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %257 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %258 = load %arg0[%c782, %257] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %259 = load %arg1[%257, %256] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %260 = "accv.bin_op"(%258, %259) {predicate = 2 : i64} : (f32, f32) -> f32 + %261 = load %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = "accv.bin_op"(%261, %260) {predicate = 0 : i64} : (f32, f32) -> f32 + store %262, %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %263 = load %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %263, %arg2[%c782, %256] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %265 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + 
%266 = load %arg0[%c782, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = "accv.bin_op"(%266, %267) {predicate = 2 : i64} : (f32, f32) -> f32 + %269 = load %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = "accv.bin_op"(%269, %268) {predicate = 0 : i64} : (f32, f32) -> f32 + store %270, %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%c782, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %272 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %273 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %274 = load %arg0[%c782, %273] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %275 = load %arg1[%273, %272] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = "accv.bin_op"(%274, %275) {predicate = 2 : i64} : (f32, f32) -> f32 + %277 = load %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %278 = "accv.bin_op"(%277, %276) {predicate = 0 : i64} : (f32, f32) -> f32 + store %278, %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %279 = load %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %279, %arg2[%c782, %272] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %281 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %282 = load %arg0[%c782, %281] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %283 = load %arg1[%281, %280] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %284 = "accv.bin_op"(%282, %283) {predicate = 2 : i64} : (f32, f32) -> f32 + %285 = load %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = "accv.bin_op"(%285, %284) {predicate = 0 : i64} : (f32, f32) -> f32 + store %286, %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %287 = load %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %287, %arg2[%c782, %280] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %289 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %290 = load %arg0[%c782, %289] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %291 = load %arg1[%289, %288] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = "accv.bin_op"(%290, %291) {predicate = 2 : i64} : (f32, f32) -> f32 + %293 = load %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = "accv.bin_op"(%293, %292) {predicate = 0 : i64} : (f32, f32) -> f32 + store %294, %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %295 = load %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %295, %arg2[%c782, %288] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %296 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %297 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %298 = load %arg0[%c782, %297] : memref<784x128xf32, 
affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %299 = load %arg1[%297, %296] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = "accv.bin_op"(%298, %299) {predicate = 2 : i64} : (f32, f32) -> f32 + %301 = load %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %302 = "accv.bin_op"(%301, %300) {predicate = 0 : i64} : (f32, f32) -> f32 + store %302, %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = load %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %303, %arg2[%c782, %296] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %304 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %305 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %306 = load %arg0[%c782, %305] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %307 = load %arg1[%305, %304] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = "accv.bin_op"(%306, %307) {predicate = 2 : i64} : (f32, f32) -> f32 + %309 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = "accv.bin_op"(%309, %308) {predicate = 0 : i64} : (f32, f32) -> f32 + store %310, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %311, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %313 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %314 = load %arg0[%c782, %313] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %315 = load %arg1[%313, %312] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %316 = "accv.bin_op"(%314, %315) {predicate = 2 : i64} : (f32, f32) -> f32 + %317 = load %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %318 = "accv.bin_op"(%317, %316) {predicate = 0 : i64} : (f32, f32) -> f32 + store %318, %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = load %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %319, %arg2[%c782, %312] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %321 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %322 = load %arg0[%c782, %321] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %323 = load %arg1[%321, %320] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %324 = "accv.bin_op"(%322, %323) {predicate = 2 : i64} : (f32, f32) -> f32 + %325 = load %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = "accv.bin_op"(%325, %324) {predicate = 0 : i64} : (f32, f32) -> f32 + store %326, %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = load %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %327, %arg2[%c782, %320] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %329 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %330 = load %arg0[%c782, %329] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %331 = load 
%arg1[%329, %328] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %332 = "accv.bin_op"(%330, %331) {predicate = 2 : i64} : (f32, f32) -> f32 + %333 = load %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %334 = "accv.bin_op"(%333, %332) {predicate = 0 : i64} : (f32, f32) -> f32 + store %334, %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = load %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %335, %arg2[%c782, %328] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %336 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %337 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %338 = load %arg0[%c782, %337] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %339 = load %arg1[%337, %336] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = "accv.bin_op"(%338, %339) {predicate = 2 : i64} : (f32, f32) -> f32 + %341 = load %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %342 = "accv.bin_op"(%341, %340) {predicate = 0 : i64} : (f32, f32) -> f32 + store %342, %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %343 = load %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %343, %arg2[%c782, %336] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %345 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %346 = load %arg0[%c782, %345] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %347 = load %arg1[%345, %344] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = "accv.bin_op"(%346, %347) {predicate = 2 : i64} : (f32, f32) -> f32 + %349 = load %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = "accv.bin_op"(%349, %348) {predicate = 0 : i64} : (f32, f32) -> f32 + store %350, %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = load %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %351, %arg2[%c782, %344] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %352 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %353 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %354 = load %arg0[%c782, %353] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %355 = load %arg1[%353, %352] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = "accv.bin_op"(%354, %355) {predicate = 2 : i64} : (f32, f32) -> f32 + %357 = load %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %358 = "accv.bin_op"(%357, %356) {predicate = 0 : i64} : (f32, f32) -> f32 + store %358, %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = load %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %359, %arg2[%c782, %352] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %361 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %362 = load %arg0[%c782, %361] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %363 = load %arg1[%361, %360] : memref<128x512xf32, affine_map<(d0, 
d1) -> (d0 * 512 + d1)>> + %364 = "accv.bin_op"(%362, %363) {predicate = 2 : i64} : (f32, f32) -> f32 + %365 = load %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = "accv.bin_op"(%365, %364) {predicate = 0 : i64} : (f32, f32) -> f32 + store %366, %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = load %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %367, %arg2[%c782, %360] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %368 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %369 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %370 = load %arg0[%c782, %369] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %371 = load %arg1[%369, %368] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %372 = "accv.bin_op"(%370, %371) {predicate = 2 : i64} : (f32, f32) -> f32 + %373 = load %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = "accv.bin_op"(%373, %372) {predicate = 0 : i64} : (f32, f32) -> f32 + store %374, %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = load %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %375, %arg2[%c782, %368] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %376 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %377 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %378 = load %arg0[%c782, %377] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %379 = load %arg1[%377, %376] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = "accv.bin_op"(%378, %379) {predicate = 2 : i64} : (f32, f32) -> f32 + %381 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = "accv.bin_op"(%381, %380) {predicate = 0 : i64} : (f32, f32) -> f32 + store %382, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %383, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg4) + %385 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %386 = load %arg0[%c783, %385] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %387 = load %arg1[%385, %384] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %388 = "accv.bin_op"(%386, %387) {predicate = 2 : i64} : (f32, f32) -> f32 + %389 = load %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = "accv.bin_op"(%389, %388) {predicate = 0 : i64} : (f32, f32) -> f32 + store %390, %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = load %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %391, %arg2[%c783, %384] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 1)>(%arg3, %arg4) + %393 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %394 = load %arg0[%c783, %393] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %395 = load %arg1[%393, %392] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %396 = "accv.bin_op"(%394, %395) 
{predicate = 2 : i64} : (f32, f32) -> f32 + %397 = load %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = "accv.bin_op"(%397, %396) {predicate = 0 : i64} : (f32, f32) -> f32 + store %398, %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = load %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %399, %arg2[%c783, %392] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 2)>(%arg3, %arg4) + %401 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %402 = load %arg0[%c783, %401] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %403 = load %arg1[%401, %400] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %404 = "accv.bin_op"(%402, %403) {predicate = 2 : i64} : (f32, f32) -> f32 + %405 = load %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %406 = "accv.bin_op"(%405, %404) {predicate = 0 : i64} : (f32, f32) -> f32 + store %406, %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = load %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %407, %arg2[%c783, %400] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %408 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 3)>(%arg3, %arg4) + %409 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %410 = load %arg0[%c783, %409] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %411 = load %arg1[%409, %408] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %412 = "accv.bin_op"(%410, %411) {predicate = 2 : i64} : (f32, f32) -> f32 + %413 = load %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %414 = "accv.bin_op"(%413, %412) {predicate = 0 : i64} : (f32, f32) -> f32 + store %414, %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %415 = load %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %415, %arg2[%c783, %408] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 4)>(%arg3, %arg4) + %417 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %418 = load %arg0[%c783, %417] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %419 = load %arg1[%417, %416] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = "accv.bin_op"(%418, %419) {predicate = 2 : i64} : (f32, f32) -> f32 + %421 = load %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %422 = "accv.bin_op"(%421, %420) {predicate = 0 : i64} : (f32, f32) -> f32 + store %422, %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %423 = load %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %423, %arg2[%c783, %416] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %424 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 5)>(%arg3, %arg4) + %425 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %426 = load %arg0[%c783, %425] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %427 = load %arg1[%425, %424] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = "accv.bin_op"(%426, %427) {predicate = 2 : i64} : (f32, f32) -> f32 + %429 = load 
%arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = "accv.bin_op"(%429, %428) {predicate = 0 : i64} : (f32, f32) -> f32 + store %430, %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = load %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %431, %arg2[%c783, %424] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %432 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 6)>(%arg3, %arg4) + %433 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %434 = load %arg0[%c783, %433] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %435 = load %arg1[%433, %432] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %436 = "accv.bin_op"(%434, %435) {predicate = 2 : i64} : (f32, f32) -> f32 + %437 = load %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %438 = "accv.bin_op"(%437, %436) {predicate = 0 : i64} : (f32, f32) -> f32 + store %438, %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = load %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %439, %arg2[%c783, %432] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 7)>(%arg3, %arg4) + %441 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %442 = load %arg0[%c783, %441] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %443 = load %arg1[%441, %440] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %444 = "accv.bin_op"(%442, %443) {predicate = 2 : i64} : (f32, f32) -> f32 + %445 = load %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = "accv.bin_op"(%445, %444) {predicate = 0 : i64} : (f32, f32) -> f32 + store %446, %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = load %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %447, %arg2[%c783, %440] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %448 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg4) + %449 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %450 = load %arg0[%c783, %449] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %451 = load %arg1[%449, %448] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %452 = "accv.bin_op"(%450, %451) {predicate = 2 : i64} : (f32, f32) -> f32 + %453 = load %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %454 = "accv.bin_op"(%453, %452) {predicate = 0 : i64} : (f32, f32) -> f32 + store %454, %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = load %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %455, %arg2[%c783, %448] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %456 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 9)>(%arg3, %arg4) + %457 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %458 = load %arg0[%c783, %457] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %459 = load %arg1[%457, %456] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = "accv.bin_op"(%458, %459) {predicate = 2 : i64} : (f32, f32) -> f32 + %461 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %462 = "accv.bin_op"(%461, %460) {predicate = 0 : i64} : (f32, f32) -> f32 + store %462, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %463, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 10)>(%arg3, %arg4) + %465 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %466 = load %arg0[%c783, %465] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %467 = load %arg1[%465, %464] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = "accv.bin_op"(%466, %467) {predicate = 2 : i64} : (f32, f32) -> f32 + %469 = load %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = "accv.bin_op"(%469, %468) {predicate = 0 : i64} : (f32, f32) -> f32 + store %470, %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = load %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %471, %arg2[%c783, %464] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %472 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 11)>(%arg3, %arg4) + %473 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %474 = load %arg0[%c783, %473] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %475 = load %arg1[%473, %472] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = "accv.bin_op"(%474, %475) {predicate = 2 : i64} : (f32, f32) -> f32 + %477 = load %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = "accv.bin_op"(%477, %476) {predicate = 0 : i64} : (f32, f32) -> f32 + store %478, %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = load %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %479, %arg2[%c783, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 12)>(%arg3, %arg4) + %481 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %482 = load %arg0[%c783, %481] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %483 = load %arg1[%481, %480] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %484 = "accv.bin_op"(%482, %483) {predicate = 2 : i64} : (f32, f32) -> f32 + %485 = load %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = "accv.bin_op"(%485, %484) {predicate = 0 : i64} : (f32, f32) -> f32 + store %486, %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = load %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %487, %arg2[%c783, %480] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 13)>(%arg3, %arg4) + %489 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %490 = load %arg0[%c783, %489] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %491 = load %arg1[%489, %488] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %492 = "accv.bin_op"(%490, %491) {predicate = 2 : i64} : (f32, f32) -> f32 + %493 = load %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %494 = "accv.bin_op"(%493, %492) 
{predicate = 0 : i64} : (f32, f32) -> f32 + store %494, %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %495 = load %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %495, %arg2[%c783, %488] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 14)>(%arg3, %arg4) + %497 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %498 = load %arg0[%c783, %497] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %499 = load %arg1[%497, %496] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = "accv.bin_op"(%498, %499) {predicate = 2 : i64} : (f32, f32) -> f32 + %501 = load %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %502 = "accv.bin_op"(%501, %500) {predicate = 0 : i64} : (f32, f32) -> f32 + store %502, %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %503 = load %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %503, %arg2[%c783, %496] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %504 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 15)>(%arg3, %arg4) + %505 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg5, %arg6) + %506 = load %arg0[%c783, %505] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %507 = load %arg1[%505, %504] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = "accv.bin_op"(%506, %507) {predicate = 2 : i64} : (f32, f32) -> f32 + %509 = load %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = "accv.bin_op"(%509, %508) {predicate = 0 : i64} : (f32, f32) -> f32 + store %510, %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %511 = load %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %511, %arg2[%c783, %504] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_4,14}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_3,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_3,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [4, 16, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_1,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git 
a/Tutorials/optimized_matmul/mlir/6_SimplifyAffineStructures.mlir b/Tutorials/optimized_matmul/mlir/6_SimplifyAffineStructures.mlir new file mode 100644 index 00000000..56e88c48 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/6_SimplifyAffineStructures.mlir @@ -0,0 +1,988 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 128 { + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] 
: memref<1x16xvector<8xf32>> + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %31 = vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + affine.store %36, %3[((%arg5 + %c0 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c0 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + affine.store %37, %3[((%arg5 + %c1 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c1 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %38 
= load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + affine.store %38, %3[((%arg5 + %c2 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c2 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + affine.store %39, %3[((%arg5 + %c3 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c3 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %40 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + affine.store %40, %3[((%arg5 + %c4 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c4 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %41 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + affine.store %41, %3[((%arg5 + %c5 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c5 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + affine.store %42, %3[((%arg5 + %c6 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c6 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + affine.store %43, %3[((%arg5 + %c7 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c7 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %44 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + affine.store %44, %3[((%arg5 + %c8 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c8 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + affine.store %45, %3[((%arg5 + %c9 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c9 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %46 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + affine.store %46, %3[((%arg5 + %c10 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c10 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %47 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + affine.store %47, %3[((%arg5 + %c11 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c11 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + affine.store %48, %3[((%arg5 + %c12 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c12 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %49 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + affine.store %49, %3[((%arg5 + %c13 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c13 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %50 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + affine.store %50, %3[((%arg5 + %c14 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c14 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.store %51, %3[((%arg5 + %c15 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c15 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } else { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, 
vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %35 = vector.transfer_read 
%arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + affine.store %36, %3[((%arg5 + %c0 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c0 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + affine.store %37, %3[((%arg5 + %c1 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c1 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %38 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + affine.store %38, %3[((%arg5 + %c2 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c2 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + affine.store %39, %3[((%arg5 + %c3 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c3 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %40 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + affine.store %40, %3[((%arg5 + %c4 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c4 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %41 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + affine.store %41, %3[((%arg5 + %c5 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c5 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + affine.store %42, %3[((%arg5 + %c6 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c6 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + affine.store %43, %3[((%arg5 + %c7 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c7 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %44 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + affine.store %44, %3[((%arg5 + %c8 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c8 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + affine.store %45, %3[((%arg5 + %c9 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c9 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %46 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + affine.store %46, %3[((%arg5 + %c10 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c10 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %47 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + affine.store %47, %3[((%arg5 + %c11 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c11 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + affine.store %48, %3[((%arg5 + %c12 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c12 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %49 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + affine.store %49, %3[((%arg5 + %c13 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c13 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %50 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + affine.store %50, %3[((%arg5 + %c14 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c14 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + 
affine.store %51, %3[((%arg5 + %c15 * 8) floordiv 16) mod 16, (%arg4 + %c0) mod 128, (((%arg5 + %c15 * 8) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,21}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i_o,19}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 256]} + affine.for %arg4 = 0 to 784 { + affine.for %arg5 = 0 to 16 { + affine.for %arg6 = 0 to 6 { + affine.for %arg7 = 0 to 2 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 2 : i64, index = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 6 : i64, index = #accln<"index{j_i_i_o,15}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 2]} + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_i,14}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 6, 2]} + affine.for %arg5 = 0 to 256 step 16 { + affine.for %arg6 = 0 to 128 step 4 { + affine.for %arg7 = 0 to 0 step 6 { + affine.for %arg8 = 0 to 4 { + affine.for %arg9 = 0 to 0 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %13 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %15 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %19 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %21 = load %arg0[%4, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%5, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%6, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%7, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%8, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%9, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg0[%10, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = load %arg0[%11, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = affine.load 
%3[((-%arg3 + %12) floordiv 16) mod 16, (-%c0 + %13) mod 128, (((-%arg3 + %12) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %30 = vector.extractelement %29[%c0_i64 : i64] : vector<8xf32> + %31 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %32 = affine.load %3[((-%arg3 + %31) floordiv 16) mod 16, (-%c0 + %14) mod 128, (((-%arg3 + %31) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c1_i64 : i64] : vector<8xf32> + %34 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %35 = affine.load %3[((-%arg3 + %34) floordiv 16) mod 16, (-%c0 + %15) mod 128, (((-%arg3 + %34) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %36 = vector.extractelement %35[%c2_i64 : i64] : vector<8xf32> + %37 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %38 = affine.load %3[((-%arg3 + %37) floordiv 16) mod 16, (-%c0 + %16) mod 128, (((-%arg3 + %37) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c3_i64 : i64] : vector<8xf32> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %41 = affine.load %3[((-%arg3 + %40) floordiv 16) mod 16, (-%c0 + %17) mod 128, (((-%arg3 + %40) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %42 = vector.extractelement %41[%c4_i64 : i64] : vector<8xf32> + %43 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %44 = affine.load %3[((-%arg3 + %43) floordiv 16) mod 16, (-%c0 + %18) mod 128, (((-%arg3 + %43) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = vector.extractelement %44[%c5_i64 : i64] : vector<8xf32> + %46 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %47 = affine.load %3[((-%arg3 + %46) floordiv 16) mod 16, (-%c0 + %19) mod 128, (((-%arg3 + %46) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %48 = vector.extractelement %47[%c6_i64 : i64] : vector<8xf32> + %49 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %50 = affine.load %3[((-%arg3 + %49) floordiv 16) mod 16, (-%c0 + %20) mod 128, (((-%arg3 + %49) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = vector.extractelement %50[%c7_i64 : i64] : vector<8xf32> + %52 = "accv.bin_op"(%21, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %53 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %54 = "accv.bin_op"(%23, %36) {predicate = 2 : i64} : (f32, f32) -> f32 + %55 = "accv.bin_op"(%24, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %56 = "accv.bin_op"(%25, %42) {predicate = 2 : i64} : (f32, f32) -> f32 + %57 = "accv.bin_op"(%26, %45) {predicate = 2 : i64} : (f32, f32) -> f32 + %58 = "accv.bin_op"(%27, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = "accv.bin_op"(%28, %51) {predicate = 2 : i64} : (f32, f32) -> f32 + %60 = affine.load %2[((-%arg3 + %12) floordiv 16) mod 16, (-%arg4 + %4) mod 6, (((-%arg3 + %12) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c0_i64 : i64] : vector<8xf32> + %62 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %63 = affine.load %2[((-%arg3 + %62) floordiv 16) mod 16, (-%arg4 + %5) mod 6, (((-%arg3 + %62) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %64 = vector.extractelement %63[%c1_i64 : i64] : vector<8xf32> + %65 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %66 = affine.load %2[((-%arg3 + %65) floordiv 16) mod 16, (-%arg4 + %6) mod 6, (((-%arg3 + %65) mod 16) floordiv 8) mod 2] : 
memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c2_i64 : i64] : vector<8xf32> + %68 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %69 = affine.load %2[((-%arg3 + %68) floordiv 16) mod 16, (-%arg4 + %7) mod 6, (((-%arg3 + %68) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = vector.extractelement %69[%c3_i64 : i64] : vector<8xf32> + %71 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %72 = affine.load %2[((-%arg3 + %71) floordiv 16) mod 16, (-%arg4 + %8) mod 6, (((-%arg3 + %71) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.extractelement %72[%c4_i64 : i64] : vector<8xf32> + %74 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %75 = affine.load %2[((-%arg3 + %74) floordiv 16) mod 16, (-%arg4 + %9) mod 6, (((-%arg3 + %74) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = vector.extractelement %75[%c5_i64 : i64] : vector<8xf32> + %77 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %78 = affine.load %2[((-%arg3 + %77) floordiv 16) mod 16, (-%arg4 + %10) mod 6, (((-%arg3 + %77) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.extractelement %78[%c6_i64 : i64] : vector<8xf32> + %80 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %81 = affine.load %2[((-%arg3 + %80) floordiv 16) mod 16, (-%arg4 + %11) mod 6, (((-%arg3 + %80) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = vector.extractelement %81[%c7_i64 : i64] : vector<8xf32> + %83 = "accv.bin_op"(%61, %52) {predicate = 0 : i64} : (f32, f32) -> f32 + %84 = "accv.bin_op"(%64, %53) {predicate = 0 : i64} : (f32, f32) -> f32 + %85 = "accv.bin_op"(%67, %54) {predicate = 0 : i64} : (f32, f32) -> f32 + %86 = "accv.bin_op"(%70, %55) {predicate = 0 : i64} : (f32, f32) -> f32 + %87 = "accv.bin_op"(%73, %56) {predicate = 0 : i64} : (f32, f32) -> f32 + %88 = "accv.bin_op"(%76, %57) {predicate = 0 : i64} : (f32, f32) -> f32 + %89 = "accv.bin_op"(%79, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + %90 = "accv.bin_op"(%82, %59) {predicate = 0 : i64} : (f32, f32) -> f32 + %91 = affine.load %2[((-%arg3 + %12) floordiv 16) mod 16, (-%arg4 + %4) mod 6, (((-%arg3 + %12) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = vector.insertelement %83, %91[%c0_i64 : i64] : vector<8xf32> + affine.store %92, %2[((-%arg3 + %12) floordiv 16) mod 16, (-%arg4 + %4) mod 6, (((-%arg3 + %12) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %94 = affine.load %2[((-%arg3 + %93) floordiv 16) mod 16, (-%arg4 + %5) mod 6, (((-%arg3 + %93) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %84, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[((-%arg3 + %93) floordiv 16) mod 16, (-%arg4 + %5) mod 6, (((-%arg3 + %93) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %97 = affine.load %2[((-%arg3 + %96) floordiv 16) mod 16, (-%arg4 + %6) mod 6, (((-%arg3 + %96) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = vector.insertelement %85, %97[%c2_i64 : i64] : vector<8xf32> + affine.store %98, %2[((-%arg3 + %96) floordiv 16) mod 16, (-%arg4 + %6) mod 6, (((-%arg3 + %96) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %100 = affine.load %2[((-%arg3 + %99) floordiv 16) mod 16, 
(-%arg4 + %7) mod 6, (((-%arg3 + %99) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %86, %100[%c3_i64 : i64] : vector<8xf32> + affine.store %101, %2[((-%arg3 + %99) floordiv 16) mod 16, (-%arg4 + %7) mod 6, (((-%arg3 + %99) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %103 = affine.load %2[((-%arg3 + %102) floordiv 16) mod 16, (-%arg4 + %8) mod 6, (((-%arg3 + %102) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = vector.insertelement %87, %103[%c4_i64 : i64] : vector<8xf32> + affine.store %104, %2[((-%arg3 + %102) floordiv 16) mod 16, (-%arg4 + %8) mod 6, (((-%arg3 + %102) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %106 = affine.load %2[((-%arg3 + %105) floordiv 16) mod 16, (-%arg4 + %9) mod 6, (((-%arg3 + %105) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %88, %106[%c5_i64 : i64] : vector<8xf32> + affine.store %107, %2[((-%arg3 + %105) floordiv 16) mod 16, (-%arg4 + %9) mod 6, (((-%arg3 + %105) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %109 = affine.load %2[((-%arg3 + %108) floordiv 16) mod 16, (-%arg4 + %10) mod 6, (((-%arg3 + %108) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %89, %109[%c6_i64 : i64] : vector<8xf32> + affine.store %110, %2[((-%arg3 + %108) floordiv 16) mod 16, (-%arg4 + %10) mod 6, (((-%arg3 + %108) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %112 = affine.load %2[((-%arg3 + %111) floordiv 16) mod 16, (-%arg4 + %11) mod 6, (((-%arg3 + %111) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %90, %112[%c7_i64 : i64] : vector<8xf32> + affine.store %113, %2[((-%arg3 + %111) floordiv 16) mod 16, (-%arg4 + %11) mod 6, (((-%arg3 + %111) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.load %2[((-%arg3 + %12) floordiv 16) mod 16, (-%arg4 + %4) mod 6, (((-%arg3 + %12) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %115 = vector.insertelement %83, %114[%c0_i64 : i64] : vector<8xf32> + affine.store %115, %2[((-%arg3 + %12) floordiv 16) mod 16, (-%arg4 + %4) mod 6, (((-%arg3 + %12) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %117 = affine.load %2[((-%arg3 + %116) floordiv 16) mod 16, (-%arg4 + %5) mod 6, (((-%arg3 + %116) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %118 = vector.insertelement %84, %117[%c1_i64 : i64] : vector<8xf32> + affine.store %118, %2[((-%arg3 + %116) floordiv 16) mod 16, (-%arg4 + %5) mod 6, (((-%arg3 + %116) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %120 = affine.load %2[((-%arg3 + %119) floordiv 16) mod 16, (-%arg4 + %6) mod 6, (((-%arg3 + %119) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %121 = vector.insertelement %85, %120[%c2_i64 : i64] : vector<8xf32> + affine.store %121, %2[((-%arg3 + %119) floordiv 16) mod 16, (-%arg4 + %6) mod 6, (((-%arg3 + %119) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %123 = 
affine.load %2[((-%arg3 + %122) floordiv 16) mod 16, (-%arg4 + %7) mod 6, (((-%arg3 + %122) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %124 = vector.insertelement %86, %123[%c3_i64 : i64] : vector<8xf32> + affine.store %124, %2[((-%arg3 + %122) floordiv 16) mod 16, (-%arg4 + %7) mod 6, (((-%arg3 + %122) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %126 = affine.load %2[((-%arg3 + %125) floordiv 16) mod 16, (-%arg4 + %8) mod 6, (((-%arg3 + %125) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %127 = vector.insertelement %87, %126[%c4_i64 : i64] : vector<8xf32> + affine.store %127, %2[((-%arg3 + %125) floordiv 16) mod 16, (-%arg4 + %8) mod 6, (((-%arg3 + %125) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %129 = affine.load %2[((-%arg3 + %128) floordiv 16) mod 16, (-%arg4 + %9) mod 6, (((-%arg3 + %128) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %130 = vector.insertelement %88, %129[%c5_i64 : i64] : vector<8xf32> + affine.store %130, %2[((-%arg3 + %128) floordiv 16) mod 16, (-%arg4 + %9) mod 6, (((-%arg3 + %128) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %131 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %132 = affine.load %2[((-%arg3 + %131) floordiv 16) mod 16, (-%arg4 + %10) mod 6, (((-%arg3 + %131) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %133 = vector.insertelement %89, %132[%c6_i64 : i64] : vector<8xf32> + affine.store %133, %2[((-%arg3 + %131) floordiv 16) mod 16, (-%arg4 + %10) mod 6, (((-%arg3 + %131) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %134 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %135 = affine.load %2[((-%arg3 + %134) floordiv 16) mod 16, (-%arg4 + %11) mod 6, (((-%arg3 + %134) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %136 = vector.insertelement %90, %135[%c7_i64 : i64] : vector<8xf32> + affine.store %136, %2[((-%arg3 + %134) floordiv 16) mod 16, (-%arg4 + %11) mod 6, (((-%arg3 + %134) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %137 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %138 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %139 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %140 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %141 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %142 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %143 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %144 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %145 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %146 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %147 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %148 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %149 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %150 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %151 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %152 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %153 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %154 = 
load %arg0[%137, %146] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg0[%138, %147] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %156 = load %arg0[%139, %148] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg0[%140, %149] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %158 = load %arg0[%141, %150] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %159 = load %arg0[%142, %151] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %160 = load %arg0[%143, %152] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %161 = load %arg0[%144, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %162 = affine.load %3[((-%arg3 + %145) floordiv 16) mod 16, (-%c0 + %146) mod 128, (((-%arg3 + %145) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %163 = vector.extractelement %162[%c0_i64 : i64] : vector<8xf32> + %164 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %165 = affine.load %3[((-%arg3 + %164) floordiv 16) mod 16, (-%c0 + %147) mod 128, (((-%arg3 + %164) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %166 = vector.extractelement %165[%c1_i64 : i64] : vector<8xf32> + %167 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %168 = affine.load %3[((-%arg3 + %167) floordiv 16) mod 16, (-%c0 + %148) mod 128, (((-%arg3 + %167) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %169 = vector.extractelement %168[%c2_i64 : i64] : vector<8xf32> + %170 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %171 = affine.load %3[((-%arg3 + %170) floordiv 16) mod 16, (-%c0 + %149) mod 128, (((-%arg3 + %170) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %172 = vector.extractelement %171[%c3_i64 : i64] : vector<8xf32> + %173 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %174 = affine.load %3[((-%arg3 + %173) floordiv 16) mod 16, (-%c0 + %150) mod 128, (((-%arg3 + %173) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %175 = vector.extractelement %174[%c4_i64 : i64] : vector<8xf32> + %176 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %177 = affine.load %3[((-%arg3 + %176) floordiv 16) mod 16, (-%c0 + %151) mod 128, (((-%arg3 + %176) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %178 = vector.extractelement %177[%c5_i64 : i64] : vector<8xf32> + %179 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %180 = affine.load %3[((-%arg3 + %179) floordiv 16) mod 16, (-%c0 + %152) mod 128, (((-%arg3 + %179) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %181 = vector.extractelement %180[%c6_i64 : i64] : vector<8xf32> + %182 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %183 = affine.load %3[((-%arg3 + %182) floordiv 16) mod 16, (-%c0 + %153) mod 128, (((-%arg3 + %182) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %184 = vector.extractelement %183[%c7_i64 : i64] : vector<8xf32> + %185 = "accv.bin_op"(%154, %163) {predicate = 2 : i64} : (f32, f32) -> f32 + %186 = "accv.bin_op"(%155, %166) {predicate = 2 : i64} : (f32, f32) -> f32 + %187 = "accv.bin_op"(%156, %169) {predicate = 2 : i64} : (f32, f32) -> f32 + %188 = "accv.bin_op"(%157, %172) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = "accv.bin_op"(%158, %175) {predicate = 2 : i64} : (f32, f32) -> f32 + %190 = "accv.bin_op"(%159, %178) {predicate = 2 : i64} : 
(f32, f32) -> f32 + %191 = "accv.bin_op"(%160, %181) {predicate = 2 : i64} : (f32, f32) -> f32 + %192 = "accv.bin_op"(%161, %184) {predicate = 2 : i64} : (f32, f32) -> f32 + %193 = affine.load %2[((-%arg3 + %145) floordiv 16) mod 16, (-%arg4 + %137) mod 6, (((-%arg3 + %145) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %194 = vector.extractelement %193[%c0_i64 : i64] : vector<8xf32> + %195 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %196 = affine.load %2[((-%arg3 + %195) floordiv 16) mod 16, (-%arg4 + %138) mod 6, (((-%arg3 + %195) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %197 = vector.extractelement %196[%c1_i64 : i64] : vector<8xf32> + %198 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %199 = affine.load %2[((-%arg3 + %198) floordiv 16) mod 16, (-%arg4 + %139) mod 6, (((-%arg3 + %198) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %200 = vector.extractelement %199[%c2_i64 : i64] : vector<8xf32> + %201 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %202 = affine.load %2[((-%arg3 + %201) floordiv 16) mod 16, (-%arg4 + %140) mod 6, (((-%arg3 + %201) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %203 = vector.extractelement %202[%c3_i64 : i64] : vector<8xf32> + %204 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %205 = affine.load %2[((-%arg3 + %204) floordiv 16) mod 16, (-%arg4 + %141) mod 6, (((-%arg3 + %204) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %206 = vector.extractelement %205[%c4_i64 : i64] : vector<8xf32> + %207 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %208 = affine.load %2[((-%arg3 + %207) floordiv 16) mod 16, (-%arg4 + %142) mod 6, (((-%arg3 + %207) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %209 = vector.extractelement %208[%c5_i64 : i64] : vector<8xf32> + %210 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %211 = affine.load %2[((-%arg3 + %210) floordiv 16) mod 16, (-%arg4 + %143) mod 6, (((-%arg3 + %210) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %212 = vector.extractelement %211[%c6_i64 : i64] : vector<8xf32> + %213 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %214 = affine.load %2[((-%arg3 + %213) floordiv 16) mod 16, (-%arg4 + %144) mod 6, (((-%arg3 + %213) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %215 = vector.extractelement %214[%c7_i64 : i64] : vector<8xf32> + %216 = "accv.bin_op"(%194, %185) {predicate = 0 : i64} : (f32, f32) -> f32 + %217 = "accv.bin_op"(%197, %186) {predicate = 0 : i64} : (f32, f32) -> f32 + %218 = "accv.bin_op"(%200, %187) {predicate = 0 : i64} : (f32, f32) -> f32 + %219 = "accv.bin_op"(%203, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + %220 = "accv.bin_op"(%206, %189) {predicate = 0 : i64} : (f32, f32) -> f32 + %221 = "accv.bin_op"(%209, %190) {predicate = 0 : i64} : (f32, f32) -> f32 + %222 = "accv.bin_op"(%212, %191) {predicate = 0 : i64} : (f32, f32) -> f32 + %223 = "accv.bin_op"(%215, %192) {predicate = 0 : i64} : (f32, f32) -> f32 + %224 = affine.load %2[((-%arg3 + %145) floordiv 16) mod 16, (-%arg4 + %137) mod 6, (((-%arg3 + %145) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %225 = vector.insertelement %216, %224[%c0_i64 : i64] : vector<8xf32> + affine.store %225, %2[((-%arg3 + %145) floordiv 16) mod 16, (-%arg4 + %137) mod 6, (((-%arg3 + %145) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %226 = affine.apply 
affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %227 = affine.load %2[((-%arg3 + %226) floordiv 16) mod 16, (-%arg4 + %138) mod 6, (((-%arg3 + %226) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %228 = vector.insertelement %217, %227[%c1_i64 : i64] : vector<8xf32> + affine.store %228, %2[((-%arg3 + %226) floordiv 16) mod 16, (-%arg4 + %138) mod 6, (((-%arg3 + %226) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %229 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %230 = affine.load %2[((-%arg3 + %229) floordiv 16) mod 16, (-%arg4 + %139) mod 6, (((-%arg3 + %229) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %231 = vector.insertelement %218, %230[%c2_i64 : i64] : vector<8xf32> + affine.store %231, %2[((-%arg3 + %229) floordiv 16) mod 16, (-%arg4 + %139) mod 6, (((-%arg3 + %229) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %232 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %233 = affine.load %2[((-%arg3 + %232) floordiv 16) mod 16, (-%arg4 + %140) mod 6, (((-%arg3 + %232) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %234 = vector.insertelement %219, %233[%c3_i64 : i64] : vector<8xf32> + affine.store %234, %2[((-%arg3 + %232) floordiv 16) mod 16, (-%arg4 + %140) mod 6, (((-%arg3 + %232) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %235 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %236 = affine.load %2[((-%arg3 + %235) floordiv 16) mod 16, (-%arg4 + %141) mod 6, (((-%arg3 + %235) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %237 = vector.insertelement %220, %236[%c4_i64 : i64] : vector<8xf32> + affine.store %237, %2[((-%arg3 + %235) floordiv 16) mod 16, (-%arg4 + %141) mod 6, (((-%arg3 + %235) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %238 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %239 = affine.load %2[((-%arg3 + %238) floordiv 16) mod 16, (-%arg4 + %142) mod 6, (((-%arg3 + %238) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %240 = vector.insertelement %221, %239[%c5_i64 : i64] : vector<8xf32> + affine.store %240, %2[((-%arg3 + %238) floordiv 16) mod 16, (-%arg4 + %142) mod 6, (((-%arg3 + %238) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %241 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %242 = affine.load %2[((-%arg3 + %241) floordiv 16) mod 16, (-%arg4 + %143) mod 6, (((-%arg3 + %241) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %243 = vector.insertelement %222, %242[%c6_i64 : i64] : vector<8xf32> + affine.store %243, %2[((-%arg3 + %241) floordiv 16) mod 16, (-%arg4 + %143) mod 6, (((-%arg3 + %241) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %244 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %245 = affine.load %2[((-%arg3 + %244) floordiv 16) mod 16, (-%arg4 + %144) mod 6, (((-%arg3 + %244) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %246 = vector.insertelement %223, %245[%c7_i64 : i64] : vector<8xf32> + affine.store %246, %2[((-%arg3 + %244) floordiv 16) mod 16, (-%arg4 + %144) mod 6, (((-%arg3 + %244) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = affine.load %2[((-%arg3 + %145) floordiv 16) mod 16, (-%arg4 + %137) mod 6, (((-%arg3 + %145) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = vector.insertelement %216, %247[%c0_i64 : i64] : vector<8xf32> + affine.store %248, %2[((-%arg3 + %145) floordiv 16) mod 16, 
(-%arg4 + %137) mod 6, (((-%arg3 + %145) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %249 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %250 = affine.load %2[((-%arg3 + %249) floordiv 16) mod 16, (-%arg4 + %138) mod 6, (((-%arg3 + %249) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %251 = vector.insertelement %217, %250[%c1_i64 : i64] : vector<8xf32> + affine.store %251, %2[((-%arg3 + %249) floordiv 16) mod 16, (-%arg4 + %138) mod 6, (((-%arg3 + %249) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %252 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %253 = affine.load %2[((-%arg3 + %252) floordiv 16) mod 16, (-%arg4 + %139) mod 6, (((-%arg3 + %252) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %254 = vector.insertelement %218, %253[%c2_i64 : i64] : vector<8xf32> + affine.store %254, %2[((-%arg3 + %252) floordiv 16) mod 16, (-%arg4 + %139) mod 6, (((-%arg3 + %252) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %255 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %256 = affine.load %2[((-%arg3 + %255) floordiv 16) mod 16, (-%arg4 + %140) mod 6, (((-%arg3 + %255) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %257 = vector.insertelement %219, %256[%c3_i64 : i64] : vector<8xf32> + affine.store %257, %2[((-%arg3 + %255) floordiv 16) mod 16, (-%arg4 + %140) mod 6, (((-%arg3 + %255) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %258 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %259 = affine.load %2[((-%arg3 + %258) floordiv 16) mod 16, (-%arg4 + %141) mod 6, (((-%arg3 + %258) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %260 = vector.insertelement %220, %259[%c4_i64 : i64] : vector<8xf32> + affine.store %260, %2[((-%arg3 + %258) floordiv 16) mod 16, (-%arg4 + %141) mod 6, (((-%arg3 + %258) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %261 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %262 = affine.load %2[((-%arg3 + %261) floordiv 16) mod 16, (-%arg4 + %142) mod 6, (((-%arg3 + %261) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %263 = vector.insertelement %221, %262[%c5_i64 : i64] : vector<8xf32> + affine.store %263, %2[((-%arg3 + %261) floordiv 16) mod 16, (-%arg4 + %142) mod 6, (((-%arg3 + %261) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %264 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %265 = affine.load %2[((-%arg3 + %264) floordiv 16) mod 16, (-%arg4 + %143) mod 6, (((-%arg3 + %264) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %266 = vector.insertelement %222, %265[%c6_i64 : i64] : vector<8xf32> + affine.store %266, %2[((-%arg3 + %264) floordiv 16) mod 16, (-%arg4 + %143) mod 6, (((-%arg3 + %264) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %267 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %268 = affine.load %2[((-%arg3 + %267) floordiv 16) mod 16, (-%arg4 + %144) mod 6, (((-%arg3 + %267) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %269 = vector.insertelement %223, %268[%c7_i64 : i64] : vector<8xf32> + affine.store %269, %2[((-%arg3 + %267) floordiv 16) mod 16, (-%arg4 + %144) mod 6, (((-%arg3 + %267) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = 
[0, 16, 1]} + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 4]} + affine.for %arg7 = 0 to 4 { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %7 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %11 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %13 = load %arg0[%arg4, %5] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%arg4, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = load %arg0[%arg4, %7] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %16 = load %arg0[%arg4, %8] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %18 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg0[%arg4, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %20 = load %arg0[%arg4, %12] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = affine.load %3[((-%arg3 + %4) floordiv 16) mod 16, (-%c0 + %5) mod 128, (((-%arg3 + %4) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %22 = vector.extractelement %21[%c0_i64 : i64] : vector<8xf32> + %23 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %24 = affine.load %3[((-%arg3 + %23) floordiv 16) mod 16, (-%c0 + %6) mod 128, (((-%arg3 + %23) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %25 = vector.extractelement %24[%c1_i64 : i64] : vector<8xf32> + %26 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %27 = affine.load %3[((-%arg3 + %26) floordiv 16) mod 16, (-%c0 + %7) mod 128, (((-%arg3 + %26) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %28 = vector.extractelement %27[%c2_i64 : i64] : vector<8xf32> + %29 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %30 = affine.load %3[((-%arg3 + %29) floordiv 16) mod 16, (-%c0 + %8) mod 128, (((-%arg3 + %29) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %31 = vector.extractelement %30[%c3_i64 : i64] : vector<8xf32> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %33 = affine.load %3[((-%arg3 + %32) floordiv 16) mod 16, (-%c0 + %9) mod 128, (((-%arg3 + %32) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %34 = vector.extractelement %33[%c4_i64 : i64] : vector<8xf32> + %35 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %36 = affine.load %3[((-%arg3 + %35) floordiv 16) mod 16, (-%c0 + %10) mod 128, (((-%arg3 + %35) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = vector.extractelement %36[%c5_i64 : i64] : vector<8xf32> + %38 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + 
%39 = affine.load %3[((-%arg3 + %38) floordiv 16) mod 16, (-%c0 + %11) mod 128, (((-%arg3 + %38) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %40 = vector.extractelement %39[%c6_i64 : i64] : vector<8xf32> + %41 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %42 = affine.load %3[((-%arg3 + %41) floordiv 16) mod 16, (-%c0 + %12) mod 128, (((-%arg3 + %41) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = vector.extractelement %42[%c7_i64 : i64] : vector<8xf32> + %44 = "accv.bin_op"(%13, %22) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = "accv.bin_op"(%14, %25) {predicate = 2 : i64} : (f32, f32) -> f32 + %46 = "accv.bin_op"(%15, %28) {predicate = 2 : i64} : (f32, f32) -> f32 + %47 = "accv.bin_op"(%16, %31) {predicate = 2 : i64} : (f32, f32) -> f32 + %48 = "accv.bin_op"(%17, %34) {predicate = 2 : i64} : (f32, f32) -> f32 + %49 = "accv.bin_op"(%18, %37) {predicate = 2 : i64} : (f32, f32) -> f32 + %50 = "accv.bin_op"(%19, %40) {predicate = 2 : i64} : (f32, f32) -> f32 + %51 = "accv.bin_op"(%20, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %52 = affine.load %2[((-%arg3 + %4) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %4) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %53 = vector.extractelement %52[%c0_i64 : i64] : vector<8xf32> + %54 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %55 = affine.load %2[((-%arg3 + %54) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %54) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %56 = vector.extractelement %55[%c1_i64 : i64] : vector<8xf32> + %57 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %58 = affine.load %2[((-%arg3 + %57) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %57) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = vector.extractelement %58[%c2_i64 : i64] : vector<8xf32> + %60 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %61 = affine.load %2[((-%arg3 + %60) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %60) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %62 = vector.extractelement %61[%c3_i64 : i64] : vector<8xf32> + %63 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %64 = affine.load %2[((-%arg3 + %63) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %63) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %65 = vector.extractelement %64[%c4_i64 : i64] : vector<8xf32> + %66 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %67 = affine.load %2[((-%arg3 + %66) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %66) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %68 = vector.extractelement %67[%c5_i64 : i64] : vector<8xf32> + %69 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %70 = affine.load %2[((-%arg3 + %69) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %69) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %71 = vector.extractelement %70[%c6_i64 : i64] : vector<8xf32> + %72 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %73 = affine.load %2[((-%arg3 + %72) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %72) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = vector.extractelement %73[%c7_i64 : i64] : vector<8xf32> + %75 = "accv.bin_op"(%53, %44) {predicate = 0 : i64} : (f32, f32) -> f32 + %76 = "accv.bin_op"(%56, %45) {predicate = 0 : i64} : (f32, f32) -> f32 + %77 = "accv.bin_op"(%59, 
%46) {predicate = 0 : i64} : (f32, f32) -> f32 + %78 = "accv.bin_op"(%62, %47) {predicate = 0 : i64} : (f32, f32) -> f32 + %79 = "accv.bin_op"(%65, %48) {predicate = 0 : i64} : (f32, f32) -> f32 + %80 = "accv.bin_op"(%68, %49) {predicate = 0 : i64} : (f32, f32) -> f32 + %81 = "accv.bin_op"(%71, %50) {predicate = 0 : i64} : (f32, f32) -> f32 + %82 = "accv.bin_op"(%74, %51) {predicate = 0 : i64} : (f32, f32) -> f32 + %83 = affine.load %2[((-%arg3 + %4) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %4) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %84 = vector.insertelement %75, %83[%c0_i64 : i64] : vector<8xf32> + affine.store %84, %2[((-%arg3 + %4) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %4) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %85 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %86 = affine.load %2[((-%arg3 + %85) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %85) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %87 = vector.insertelement %76, %86[%c1_i64 : i64] : vector<8xf32> + affine.store %87, %2[((-%arg3 + %85) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %85) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %88 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %89 = affine.load %2[((-%arg3 + %88) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %88) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %90 = vector.insertelement %77, %89[%c2_i64 : i64] : vector<8xf32> + affine.store %90, %2[((-%arg3 + %88) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %88) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %91 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %92 = affine.load %2[((-%arg3 + %91) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %91) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = vector.insertelement %78, %92[%c3_i64 : i64] : vector<8xf32> + affine.store %93, %2[((-%arg3 + %91) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %91) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %95 = affine.load %2[((-%arg3 + %94) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %94) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = vector.insertelement %79, %95[%c4_i64 : i64] : vector<8xf32> + affine.store %96, %2[((-%arg3 + %94) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %94) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %97 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %98 = affine.load %2[((-%arg3 + %97) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %97) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %80, %98[%c5_i64 : i64] : vector<8xf32> + affine.store %99, %2[((-%arg3 + %97) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %97) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %100 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %101 = affine.load %2[((-%arg3 + %100) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %100) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = vector.insertelement %81, %101[%c6_i64 : i64] : vector<8xf32> + affine.store %102, %2[((-%arg3 + %100) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %100) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %103 = 
affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %104 = affine.load %2[((-%arg3 + %103) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %103) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %82, %104[%c7_i64 : i64] : vector<8xf32> + affine.store %105, %2[((-%arg3 + %103) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %103) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %106 = affine.load %2[((-%arg3 + %4) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %4) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %75, %106[%c0_i64 : i64] : vector<8xf32> + affine.store %107, %2[((-%arg3 + %4) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %4) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %109 = affine.load %2[((-%arg3 + %108) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %108) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %110 = vector.insertelement %76, %109[%c1_i64 : i64] : vector<8xf32> + affine.store %110, %2[((-%arg3 + %108) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %108) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %111 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %112 = affine.load %2[((-%arg3 + %111) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %111) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %113 = vector.insertelement %77, %112[%c2_i64 : i64] : vector<8xf32> + affine.store %113, %2[((-%arg3 + %111) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %111) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %114 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %115 = affine.load %2[((-%arg3 + %114) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %114) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %116 = vector.insertelement %78, %115[%c3_i64 : i64] : vector<8xf32> + affine.store %116, %2[((-%arg3 + %114) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %114) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %117 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %118 = affine.load %2[((-%arg3 + %117) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %117) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %119 = vector.insertelement %79, %118[%c4_i64 : i64] : vector<8xf32> + affine.store %119, %2[((-%arg3 + %117) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %117) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %121 = affine.load %2[((-%arg3 + %120) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %120) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %122 = vector.insertelement %80, %121[%c5_i64 : i64] : vector<8xf32> + affine.store %122, %2[((-%arg3 + %120) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %120) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %123 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %124 = affine.load %2[((-%arg3 + %123) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %123) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %125 = vector.insertelement %81, %124[%c6_i64 : i64] : vector<8xf32> + affine.store %125, %2[((-%arg3 + %123) floordiv 16) mod 16, (-%arg4 + %arg4) mod 
6, (((-%arg3 + %123) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %126 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %127 = affine.load %2[((-%arg3 + %126) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %126) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %128 = vector.insertelement %82, %127[%c7_i64 : i64] : vector<8xf32> + affine.store %128, %2[((-%arg3 + %126) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %126) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %129 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %130 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %131 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %132 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %133 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %134 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %135 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %136 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %137 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %138 = load %arg0[%arg4, %130] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %139 = load %arg0[%arg4, %131] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %140 = load %arg0[%arg4, %132] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %141 = load %arg0[%arg4, %133] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %142 = load %arg0[%arg4, %134] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %143 = load %arg0[%arg4, %135] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %144 = load %arg0[%arg4, %136] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %145 = load %arg0[%arg4, %137] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %146 = affine.load %3[((-%arg3 + %129) floordiv 16) mod 16, (-%c0 + %130) mod 128, (((-%arg3 + %129) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %147 = vector.extractelement %146[%c0_i64 : i64] : vector<8xf32> + %148 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %149 = affine.load %3[((-%arg3 + %148) floordiv 16) mod 16, (-%c0 + %131) mod 128, (((-%arg3 + %148) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %150 = vector.extractelement %149[%c1_i64 : i64] : vector<8xf32> + %151 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %152 = affine.load %3[((-%arg3 + %151) floordiv 16) mod 16, (-%c0 + %132) mod 128, (((-%arg3 + %151) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %153 = vector.extractelement %152[%c2_i64 : i64] : vector<8xf32> + %154 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %155 = affine.load %3[((-%arg3 + %154) floordiv 16) mod 16, (-%c0 + %133) mod 128, (((-%arg3 + %154) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %156 = vector.extractelement %155[%c3_i64 : i64] : vector<8xf32> + %157 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %158 = affine.load %3[((-%arg3 + %157) floordiv 16) mod 16, (-%c0 + %134) mod 128, (((-%arg3 + %157) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %159 = vector.extractelement %158[%c4_i64 : i64] : vector<8xf32> + %160 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %161 = affine.load %3[((-%arg3 + %160) floordiv 16) mod 16, (-%c0 + %135) mod 128, 
(((-%arg3 + %160) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %162 = vector.extractelement %161[%c5_i64 : i64] : vector<8xf32> + %163 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %164 = affine.load %3[((-%arg3 + %163) floordiv 16) mod 16, (-%c0 + %136) mod 128, (((-%arg3 + %163) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %165 = vector.extractelement %164[%c6_i64 : i64] : vector<8xf32> + %166 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %167 = affine.load %3[((-%arg3 + %166) floordiv 16) mod 16, (-%c0 + %137) mod 128, (((-%arg3 + %166) mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %168 = vector.extractelement %167[%c7_i64 : i64] : vector<8xf32> + %169 = "accv.bin_op"(%138, %147) {predicate = 2 : i64} : (f32, f32) -> f32 + %170 = "accv.bin_op"(%139, %150) {predicate = 2 : i64} : (f32, f32) -> f32 + %171 = "accv.bin_op"(%140, %153) {predicate = 2 : i64} : (f32, f32) -> f32 + %172 = "accv.bin_op"(%141, %156) {predicate = 2 : i64} : (f32, f32) -> f32 + %173 = "accv.bin_op"(%142, %159) {predicate = 2 : i64} : (f32, f32) -> f32 + %174 = "accv.bin_op"(%143, %162) {predicate = 2 : i64} : (f32, f32) -> f32 + %175 = "accv.bin_op"(%144, %165) {predicate = 2 : i64} : (f32, f32) -> f32 + %176 = "accv.bin_op"(%145, %168) {predicate = 2 : i64} : (f32, f32) -> f32 + %177 = affine.load %2[((-%arg3 + %129) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %129) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %178 = vector.extractelement %177[%c0_i64 : i64] : vector<8xf32> + %179 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %180 = affine.load %2[((-%arg3 + %179) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %179) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %181 = vector.extractelement %180[%c1_i64 : i64] : vector<8xf32> + %182 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %183 = affine.load %2[((-%arg3 + %182) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %182) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %184 = vector.extractelement %183[%c2_i64 : i64] : vector<8xf32> + %185 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %186 = affine.load %2[((-%arg3 + %185) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %185) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %187 = vector.extractelement %186[%c3_i64 : i64] : vector<8xf32> + %188 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %189 = affine.load %2[((-%arg3 + %188) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %188) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %190 = vector.extractelement %189[%c4_i64 : i64] : vector<8xf32> + %191 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %192 = affine.load %2[((-%arg3 + %191) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %191) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %193 = vector.extractelement %192[%c5_i64 : i64] : vector<8xf32> + %194 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %195 = affine.load %2[((-%arg3 + %194) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %194) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %196 = vector.extractelement %195[%c6_i64 : i64] : vector<8xf32> + %197 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %198 = affine.load %2[((-%arg3 + %197) floordiv 16) mod 16, 
(-%arg4 + %arg4) mod 6, (((-%arg3 + %197) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %199 = vector.extractelement %198[%c7_i64 : i64] : vector<8xf32> + %200 = "accv.bin_op"(%178, %169) {predicate = 0 : i64} : (f32, f32) -> f32 + %201 = "accv.bin_op"(%181, %170) {predicate = 0 : i64} : (f32, f32) -> f32 + %202 = "accv.bin_op"(%184, %171) {predicate = 0 : i64} : (f32, f32) -> f32 + %203 = "accv.bin_op"(%187, %172) {predicate = 0 : i64} : (f32, f32) -> f32 + %204 = "accv.bin_op"(%190, %173) {predicate = 0 : i64} : (f32, f32) -> f32 + %205 = "accv.bin_op"(%193, %174) {predicate = 0 : i64} : (f32, f32) -> f32 + %206 = "accv.bin_op"(%196, %175) {predicate = 0 : i64} : (f32, f32) -> f32 + %207 = "accv.bin_op"(%199, %176) {predicate = 0 : i64} : (f32, f32) -> f32 + %208 = affine.load %2[((-%arg3 + %129) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %129) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %209 = vector.insertelement %200, %208[%c0_i64 : i64] : vector<8xf32> + affine.store %209, %2[((-%arg3 + %129) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %129) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %210 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %211 = affine.load %2[((-%arg3 + %210) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %210) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %212 = vector.insertelement %201, %211[%c1_i64 : i64] : vector<8xf32> + affine.store %212, %2[((-%arg3 + %210) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %210) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %213 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %214 = affine.load %2[((-%arg3 + %213) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %213) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %215 = vector.insertelement %202, %214[%c2_i64 : i64] : vector<8xf32> + affine.store %215, %2[((-%arg3 + %213) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %213) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %216 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %217 = affine.load %2[((-%arg3 + %216) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %216) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %218 = vector.insertelement %203, %217[%c3_i64 : i64] : vector<8xf32> + affine.store %218, %2[((-%arg3 + %216) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %216) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %219 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %220 = affine.load %2[((-%arg3 + %219) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %219) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %221 = vector.insertelement %204, %220[%c4_i64 : i64] : vector<8xf32> + affine.store %221, %2[((-%arg3 + %219) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %219) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %222 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %223 = affine.load %2[((-%arg3 + %222) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %222) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %224 = vector.insertelement %205, %223[%c5_i64 : i64] : vector<8xf32> + affine.store %224, %2[((-%arg3 + %222) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %222) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %225 = affine.apply 
affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %226 = affine.load %2[((-%arg3 + %225) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %225) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %227 = vector.insertelement %206, %226[%c6_i64 : i64] : vector<8xf32> + affine.store %227, %2[((-%arg3 + %225) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %225) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %228 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %229 = affine.load %2[((-%arg3 + %228) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %228) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %230 = vector.insertelement %207, %229[%c7_i64 : i64] : vector<8xf32> + affine.store %230, %2[((-%arg3 + %228) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %228) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %231 = affine.load %2[((-%arg3 + %129) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %129) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %232 = vector.insertelement %200, %231[%c0_i64 : i64] : vector<8xf32> + affine.store %232, %2[((-%arg3 + %129) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %129) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %233 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %234 = affine.load %2[((-%arg3 + %233) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %233) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %235 = vector.insertelement %201, %234[%c1_i64 : i64] : vector<8xf32> + affine.store %235, %2[((-%arg3 + %233) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %233) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %236 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %237 = affine.load %2[((-%arg3 + %236) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %236) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %238 = vector.insertelement %202, %237[%c2_i64 : i64] : vector<8xf32> + affine.store %238, %2[((-%arg3 + %236) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %236) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %239 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %240 = affine.load %2[((-%arg3 + %239) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %239) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %241 = vector.insertelement %203, %240[%c3_i64 : i64] : vector<8xf32> + affine.store %241, %2[((-%arg3 + %239) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %239) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %242 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %243 = affine.load %2[((-%arg3 + %242) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %242) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %244 = vector.insertelement %204, %243[%c4_i64 : i64] : vector<8xf32> + affine.store %244, %2[((-%arg3 + %242) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %242) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %245 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %246 = affine.load %2[((-%arg3 + %245) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %245) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %247 = vector.insertelement %205, %246[%c5_i64 : i64] : vector<8xf32> + affine.store %247, %2[((-%arg3 + %245) floordiv 
16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %245) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %248 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %249 = affine.load %2[((-%arg3 + %248) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %248) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %250 = vector.insertelement %206, %249[%c6_i64 : i64] : vector<8xf32> + affine.store %250, %2[((-%arg3 + %248) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %248) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %251 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %252 = affine.load %2[((-%arg3 + %251) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %251) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %253 = vector.insertelement %207, %252[%c7_i64 : i64] : vector<8xf32> + affine.store %253, %2[((-%arg3 + %251) floordiv 16) mod 16, (-%arg4 + %arg4) mod 6, (((-%arg3 + %251) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 128]} + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = affine.load %2[((%arg5 + %c0 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c0 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %7 = addf %5, %6 : vector<8xf32> + store %7, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %9 = vector.transfer_read %arg2[%arg4, %8], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %10 = affine.load %2[((%arg5 + %c1 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c1 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %11 = addf %9, %10 : vector<8xf32> + store %11, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %13 = vector.transfer_read %arg2[%arg4, %12], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %14 = affine.load %2[((%arg5 + %c2 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c2 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %15 = addf %13, %14 : vector<8xf32> + store %15, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %17 = vector.transfer_read %arg2[%arg4, %16], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %18 = affine.load %2[((%arg5 + %c3 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c3 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %19 = addf %17, %18 : vector<8xf32> + store 
%19, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %21 = vector.transfer_read %arg2[%arg4, %20], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %22 = affine.load %2[((%arg5 + %c4 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c4 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %23 = addf %21, %22 : vector<8xf32> + store %23, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %25 = vector.transfer_read %arg2[%arg4, %24], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %26 = affine.load %2[((%arg5 + %c5 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %27 = addf %25, %26 : vector<8xf32> + store %27, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %29 = vector.transfer_read %arg2[%arg4, %28], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %30 = affine.load %2[((%arg5 + %c6 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c6 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %31 = addf %29, %30 : vector<8xf32> + store %31, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %33 = vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = affine.load %2[((%arg5 + %c7 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c7 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %35 = addf %33, %34 : vector<8xf32> + store %35, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %36 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %37 = vector.transfer_read %arg2[%arg4, %36], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %38 = affine.load %2[((%arg5 + %c8 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c8 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %39 = addf %37, %38 : vector<8xf32> + store %39, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %41 = vector.transfer_read %arg2[%arg4, %40], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %42 = affine.load %2[((%arg5 + %c9 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %43 = addf %41, %42 : vector<8xf32> + store %43, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %44 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %45 = vector.transfer_read %arg2[%arg4, %44], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %46 = affine.load %2[((%arg5 + %c10 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c10 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %47 = addf %45, %46 : vector<8xf32> + store %47, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %49 = vector.transfer_read %arg2[%arg4, %48], %cst {masked = [false]} : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %50 = affine.load %2[((%arg5 + %c11 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c11 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %51 = addf %49, %50 : vector<8xf32> + store %51, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %52 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %53 = vector.transfer_read %arg2[%arg4, %52], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %54 = affine.load %2[((%arg5 + %c12 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c12 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %55 = addf %53, %54 : vector<8xf32> + store %55, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %57 = vector.transfer_read %arg2[%arg4, %56], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %58 = affine.load %2[((%arg5 + %c13 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c13 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = addf %57, %58 : vector<8xf32> + store %59, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %60 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %61 = vector.transfer_read %arg2[%arg4, %60], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %62 = affine.load %2[((%arg5 + %c14 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c14 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %63 = addf %61, %62 : vector<8xf32> + store %63, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %65 = vector.transfer_read %arg2[%arg4, %64], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %66 = affine.load %2[((%arg5 + %c15 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c15 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = addf %65, %66 : vector<8xf32> + store %67, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %68 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %69 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %69, %arg2[%arg4, %68] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + } else { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = affine.load %2[((%arg5 + %c0 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c0 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %7 = addf %5, %6 : vector<8xf32> + store %7, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %9 = vector.transfer_read %arg2[%arg4, %8], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %10 = affine.load %2[((%arg5 + %c1 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c1 * 8) mod 16) floordiv 8) mod 2] : 
memref<16x6x2xvector<8xf32>> + %11 = addf %9, %10 : vector<8xf32> + store %11, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %13 = vector.transfer_read %arg2[%arg4, %12], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %14 = affine.load %2[((%arg5 + %c2 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c2 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %15 = addf %13, %14 : vector<8xf32> + store %15, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %17 = vector.transfer_read %arg2[%arg4, %16], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %18 = affine.load %2[((%arg5 + %c3 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c3 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %19 = addf %17, %18 : vector<8xf32> + store %19, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %21 = vector.transfer_read %arg2[%arg4, %20], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %22 = affine.load %2[((%arg5 + %c4 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c4 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %23 = addf %21, %22 : vector<8xf32> + store %23, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %25 = vector.transfer_read %arg2[%arg4, %24], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %26 = affine.load %2[((%arg5 + %c5 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c5 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %27 = addf %25, %26 : vector<8xf32> + store %27, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %29 = vector.transfer_read %arg2[%arg4, %28], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %30 = affine.load %2[((%arg5 + %c6 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c6 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %31 = addf %29, %30 : vector<8xf32> + store %31, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = affine.load %2[((%arg5 + %c7 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c7 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %35 = addf %33, %34 : vector<8xf32> + store %35, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %36 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %37 = vector.transfer_read %arg2[%arg4, %36], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %38 = affine.load %2[((%arg5 + %c8 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c8 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %39 = addf %37, %38 : vector<8xf32> + store %39, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %41 = vector.transfer_read %arg2[%arg4, %40], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %42 = affine.load 
%2[((%arg5 + %c9 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c9 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %43 = addf %41, %42 : vector<8xf32> + store %43, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %44 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %45 = vector.transfer_read %arg2[%arg4, %44], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %46 = affine.load %2[((%arg5 + %c10 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c10 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %47 = addf %45, %46 : vector<8xf32> + store %47, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %49 = vector.transfer_read %arg2[%arg4, %48], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %50 = affine.load %2[((%arg5 + %c11 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c11 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %51 = addf %49, %50 : vector<8xf32> + store %51, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %52 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %53 = vector.transfer_read %arg2[%arg4, %52], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %54 = affine.load %2[((%arg5 + %c12 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c12 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %55 = addf %53, %54 : vector<8xf32> + store %55, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %57 = vector.transfer_read %arg2[%arg4, %56], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %58 = affine.load %2[((%arg5 + %c13 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c13 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = addf %57, %58 : vector<8xf32> + store %59, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %60 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %61 = vector.transfer_read %arg2[%arg4, %60], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %62 = affine.load %2[((%arg5 + %c14 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c14 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %63 = addf %61, %62 : vector<8xf32> + store %63, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %65 = vector.transfer_read %arg2[%arg4, %64], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %66 = affine.load %2[((%arg5 + %c15 * 8) floordiv 16) mod 16, (%c0 + %c0) mod 6, (((%arg5 + %c15 * 8) mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = addf %65, %66 : vector<8xf32> + store %67, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %68 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %69 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %69, %arg2[%arg4, %68] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i,4}">, 
subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 784 : i64, index = #accln<"index{i_o,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/7_Canonicalizer.mlir b/Tutorials/optimized_matmul/mlir/7_Canonicalizer.mlir new file mode 100644 index 00000000..d77eefe0 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/7_Canonicalizer.mlir @@ -0,0 +1,872 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + affine.for %arg3 = 0 to 512 step 256 { + affine.for %arg4 = 0 to 128 { + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %7 = vector.transfer_read %arg1[%arg4, %6], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %13 = vector.transfer_read %arg1[%arg4, %12], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %19 = vector.transfer_read %arg1[%arg4, %18], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %21 = vector.transfer_read %arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %25 = vector.transfer_read %arg1[%arg4, %24], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %31 = 
vector.transfer_read %arg1[%arg4, %30], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + affine.store %36, %3[(%arg5 floordiv 16) mod 16, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + affine.store %37, %3[((%arg5 + 8) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %38 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + affine.store %38, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 1) floordiv 16) * 16 + 1, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + affine.store %39, %3[((%arg5 + 24) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 24) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 24) floordiv 16) * 2 + 3) floordiv 2) * 2 + 3] : memref<16x128x2xvector<8xf32>> + %40 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + affine.store %40, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 2) floordiv 16) * 16 + 2, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %41 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + affine.store %41, %3[((%arg5 + 40) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 40) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 40) floordiv 16) * 2 + 5) floordiv 2) * 2 + 5] : memref<16x128x2xvector<8xf32>> + %42 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + affine.store %42, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 3) floordiv 16) * 16 + 3, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + affine.store %43, %3[((%arg5 + 56) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 56) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 56) floordiv 16) * 2 + 7) floordiv 2) * 2 + 7] : memref<16x128x2xvector<8xf32>> + %44 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + affine.store %44, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 4) floordiv 16) * 16 + 4, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + affine.store %45, %3[((%arg5 + 72) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 72) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 72) floordiv 16) * 2 + 9) floordiv 2) * 2 + 9] : memref<16x128x2xvector<8xf32>> + %46 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + affine.store %46, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 5) floordiv 16) * 16 + 5, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %47 = 
load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + affine.store %47, %3[((%arg5 + 88) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 88) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 88) floordiv 16) * 2 + 11) floordiv 2) * 2 + 11] : memref<16x128x2xvector<8xf32>> + %48 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + affine.store %48, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 6) floordiv 16) * 16 + 6, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %49 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + affine.store %49, %3[((%arg5 + 104) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 104) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 104) floordiv 16) * 2 + 13) floordiv 2) * 2 + 13] : memref<16x128x2xvector<8xf32>> + %50 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + affine.store %50, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 7) floordiv 16) * 16 + 7, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.store %51, %3[((%arg5 + 120) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 120) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 120) floordiv 16) * 2 + 15) floordiv 2) * 2 + 15] : memref<16x128x2xvector<8xf32>> + } else { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %7 = vector.transfer_read %arg1[%arg4, %6], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %7, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %13 = vector.transfer_read %arg1[%arg4, %12], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %13, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %19 = vector.transfer_read %arg1[%arg4, %18], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %19, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %22 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %25 = vector.transfer_read %arg1[%arg4, %24], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %25, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %26 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %30 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %31 = vector.transfer_read %arg1[%arg4, %30], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %31, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %34 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %36 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + affine.store %36, %3[(%arg5 floordiv 16) mod 16, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + affine.store %37, %3[((%arg5 + 8) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %38 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + affine.store %38, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 1) floordiv 16) * 16 + 1, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + affine.store %39, %3[((%arg5 + 24) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 24) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 24) floordiv 16) * 2 + 3) floordiv 2) * 2 + 3] : memref<16x128x2xvector<8xf32>> + %40 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + affine.store %40, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 2) floordiv 16) * 16 + 2, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %41 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + affine.store %41, %3[((%arg5 + 40) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 40) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 40) floordiv 16) * 2 + 5) floordiv 2) * 2 + 5] : memref<16x128x2xvector<8xf32>> + %42 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + affine.store %42, 
%3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 3) floordiv 16) * 16 + 3, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + affine.store %43, %3[((%arg5 + 56) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 56) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 56) floordiv 16) * 2 + 7) floordiv 2) * 2 + 7] : memref<16x128x2xvector<8xf32>> + %44 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + affine.store %44, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 4) floordiv 16) * 16 + 4, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %45 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + affine.store %45, %3[((%arg5 + 72) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 72) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 72) floordiv 16) * 2 + 9) floordiv 2) * 2 + 9] : memref<16x128x2xvector<8xf32>> + %46 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + affine.store %46, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 5) floordiv 16) * 16 + 5, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %47 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + affine.store %47, %3[((%arg5 + 88) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 88) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 88) floordiv 16) * 2 + 11) floordiv 2) * 2 + 11] : memref<16x128x2xvector<8xf32>> + %48 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + affine.store %48, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 6) floordiv 16) * 16 + 6, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %49 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + affine.store %49, %3[((%arg5 + 104) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 104) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 104) floordiv 16) * 2 + 13) floordiv 2) * 2 + 13] : memref<16x128x2xvector<8xf32>> + %50 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + affine.store %50, %3[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 7) floordiv 16) * 16 + 7, %arg4 mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %51 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.store %51, %3[((%arg5 + 120) floordiv 16) mod 16, %arg4 mod 128, %arg5 floordiv 8 - ((%arg5 + 120) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 120) floordiv 16) * 2 + 15) floordiv 2) * 2 + 15] : memref<16x128x2xvector<8xf32>> + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_o,21}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{i_o,19}">, subdomainIndexOrder = [#accln<"index{i,17}">, #accln<"index{j,18}">], subdomainSize = [1, 256]} + affine.for %arg4 = 0 to 784 { + affine.for %arg5 = 0 to 16 { + affine.for %arg6 = 0 to 6 { + affine.for %arg7 = 0 to 2 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 2 : i64, index = #accln<"index{j_i_i_i,16}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 1]} + } {begin = 0 : i64, end = 6 : i64, index = #accln<"index{j_i_i_o,15}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 1, 2]} + } {begin = 0 : i64, end = 16 : i64, index = 
#accln<"index{j_i_i,14}">, subdomainIndexOrder = [#accln<"index{j_i_i,14}">, #accln<"index{j_i_i_o,15}">, #accln<"index{j_i_i_i,16}">], subdomainSize = [1, 6, 2]} + affine.for %arg5 = 0 to 256 step 16 { + affine.for %arg6 = 0 to 128 step 4 { + affine.for %arg7 = 0 to 0 step 6 { + affine.for %arg8 = 0 to 4 { + affine.for %arg9 = 0 to 0 { + %4 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %5 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %6 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %7 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %8 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %10 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %11 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %13 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %14 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %15 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %17 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %18 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %19 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %20 = load %arg0[%4, %12] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg0[%5, %13] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %22 = load %arg0[%6, %14] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %23 = load %arg0[%7, %15] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %24 = load %arg0[%8, %16] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %25 = load %arg0[%9, %17] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %26 = load %arg0[%10, %18] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %27 = load %arg0[%11, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %28 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg8) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %29 = vector.extractelement %28[%c0_i64 : i64] : vector<8xf32> + %30 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg8) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %31 = vector.extractelement %30[%c1_i64 : i64] : vector<8xf32> + %32 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg8) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c2_i64 : i64] : vector<8xf32> + %34 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg8) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %35 = vector.extractelement %34[%c3_i64 : i64] : vector<8xf32> + %36 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg8) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %37 = vector.extractelement %36[%c4_i64 : i64] : vector<8xf32> + %38 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg8) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %39 = vector.extractelement %38[%c5_i64 : i64] : vector<8xf32> + %40 
= affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg8) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %41 = vector.extractelement %40[%c6_i64 : i64] : vector<8xf32> + %42 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg8) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %43 = vector.extractelement %42[%c7_i64 : i64] : vector<8xf32> + %44 = "accv.bin_op"(%20, %29) {predicate = 2 : i64} : (f32, f32) -> f32 + %45 = "accv.bin_op"(%21, %31) {predicate = 2 : i64} : (f32, f32) -> f32 + %46 = "accv.bin_op"(%22, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %47 = "accv.bin_op"(%23, %35) {predicate = 2 : i64} : (f32, f32) -> f32 + %48 = "accv.bin_op"(%24, %37) {predicate = 2 : i64} : (f32, f32) -> f32 + %49 = "accv.bin_op"(%25, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %50 = "accv.bin_op"(%26, %41) {predicate = 2 : i64} : (f32, f32) -> f32 + %51 = "accv.bin_op"(%27, %43) {predicate = 2 : i64} : (f32, f32) -> f32 + %52 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %53 = vector.extractelement %52[%c0_i64 : i64] : vector<8xf32> + %54 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %55 = vector.extractelement %54[%c1_i64 : i64] : vector<8xf32> + %56 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %57 = vector.extractelement %56[%c2_i64 : i64] : vector<8xf32> + %58 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = vector.extractelement %58[%c3_i64 : i64] : vector<8xf32> + %60 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %61 = vector.extractelement %60[%c4_i64 : i64] : vector<8xf32> + %62 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %63 = vector.extractelement %62[%c5_i64 : i64] : vector<8xf32> + %64 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %65 = vector.extractelement %64[%c6_i64 : i64] : vector<8xf32> + %66 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %67 = vector.extractelement %66[%c7_i64 : i64] : vector<8xf32> + %68 = "accv.bin_op"(%53, %44) {predicate = 0 : i64} : (f32, f32) -> f32 + %69 = "accv.bin_op"(%55, %45) {predicate = 0 : i64} : (f32, f32) -> f32 + %70 = "accv.bin_op"(%57, %46) {predicate = 0 : i64} : (f32, f32) -> f32 + %71 = "accv.bin_op"(%59, %47) {predicate = 0 : i64} : (f32, f32) -> f32 + %72 = "accv.bin_op"(%61, %48) {predicate = 0 : i64} : (f32, f32) -> f32 + %73 = "accv.bin_op"(%63, %49) {predicate = 0 : i64} : (f32, f32) -> f32 + %74 = "accv.bin_op"(%65, %50) {predicate = 0 : i64} : (f32, f32) -> f32 + %75 = "accv.bin_op"(%67, %51) {predicate = 0 : i64} : (f32, f32) -> f32 + %76 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %77 = vector.insertelement %68, %76[%c0_i64 : i64] : vector<8xf32> + affine.store %77, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] 
: memref<16x6x2xvector<8xf32>> + %78 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.insertelement %69, %78[%c1_i64 : i64] : vector<8xf32> + affine.store %79, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %80 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %81 = vector.insertelement %70, %80[%c2_i64 : i64] : vector<8xf32> + affine.store %81, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %83 = vector.insertelement %71, %82[%c3_i64 : i64] : vector<8xf32> + affine.store %83, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %84 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %85 = vector.insertelement %72, %84[%c4_i64 : i64] : vector<8xf32> + affine.store %85, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %86 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %87 = vector.insertelement %73, %86[%c5_i64 : i64] : vector<8xf32> + affine.store %87, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %88 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %89 = vector.insertelement %74, %88[%c6_i64 : i64] : vector<8xf32> + affine.store %89, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %90 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %91 = vector.insertelement %75, %90[%c7_i64 : i64] : vector<8xf32> + affine.store %91, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = vector.insertelement %68, %92[%c0_i64 : i64] : vector<8xf32> + affine.store %93, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %69, %94[%c1_i64 : i64] : vector<8xf32> + affine.store %95, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %70, %96[%c2_i64 : i64] : vector<8xf32> + affine.store %97, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = affine.load %2[(%arg5 
floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %71, %98[%c3_i64 : i64] : vector<8xf32> + affine.store %99, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %100 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %101 = vector.insertelement %72, %100[%c4_i64 : i64] : vector<8xf32> + affine.store %101, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %102 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %103 = vector.insertelement %73, %102[%c5_i64 : i64] : vector<8xf32> + affine.store %103, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %104 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %105 = vector.insertelement %74, %104[%c6_i64 : i64] : vector<8xf32> + affine.store %105, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %106 = affine.load %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %107 = vector.insertelement %75, %106[%c7_i64 : i64] : vector<8xf32> + affine.store %107, %2[(%arg5 floordiv 16) mod 16, (%arg7 + %arg9) mod 6, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %108 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %109 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %110 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %111 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %112 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %113 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %114 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %115 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg4, %arg7, %arg9) + %116 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %117 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %118 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %119 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %120 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %121 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %122 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %123 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg8) + %124 = load %arg0[%108, %116] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %125 = load %arg0[%109, %117] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %126 = load %arg0[%110, %118] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %127 = load %arg0[%111, %119] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %128 = load %arg0[%112, %120] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg0[%113, %121] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + 
%130 = load %arg0[%114, %122] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %131 = load %arg0[%115, %123] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %132 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %133 = vector.extractelement %132[%c0_i64 : i64] : vector<8xf32> + %134 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %135 = vector.extractelement %134[%c1_i64 : i64] : vector<8xf32> + %136 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %137 = vector.extractelement %136[%c2_i64 : i64] : vector<8xf32> + %138 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %139 = vector.extractelement %138[%c3_i64 : i64] : vector<8xf32> + %140 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %141 = vector.extractelement %140[%c4_i64 : i64] : vector<8xf32> + %142 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %143 = vector.extractelement %142[%c5_i64 : i64] : vector<8xf32> + %144 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %145 = vector.extractelement %144[%c6_i64 : i64] : vector<8xf32> + %146 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg8) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %147 = vector.extractelement %146[%c7_i64 : i64] : vector<8xf32> + %148 = "accv.bin_op"(%124, %133) {predicate = 2 : i64} : (f32, f32) -> f32 + %149 = "accv.bin_op"(%125, %135) {predicate = 2 : i64} : (f32, f32) -> f32 + %150 = "accv.bin_op"(%126, %137) {predicate = 2 : i64} : (f32, f32) -> f32 + %151 = "accv.bin_op"(%127, %139) {predicate = 2 : i64} : (f32, f32) -> f32 + %152 = "accv.bin_op"(%128, %141) {predicate = 2 : i64} : (f32, f32) -> f32 + %153 = "accv.bin_op"(%129, %143) {predicate = 2 : i64} : (f32, f32) -> f32 + %154 = "accv.bin_op"(%130, %145) {predicate = 2 : i64} : (f32, f32) -> f32 + %155 = "accv.bin_op"(%131, %147) {predicate = 2 : i64} : (f32, f32) -> f32 + %156 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %157 = 
vector.extractelement %156[%c0_i64 : i64] : vector<8xf32> + %158 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %159 = vector.extractelement %158[%c1_i64 : i64] : vector<8xf32> + %160 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %161 = vector.extractelement %160[%c2_i64 : i64] : vector<8xf32> + %162 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %163 = vector.extractelement %162[%c3_i64 : i64] : vector<8xf32> + %164 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %165 = vector.extractelement %164[%c4_i64 : i64] : vector<8xf32> + %166 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %167 = vector.extractelement %166[%c5_i64 : i64] : vector<8xf32> + %168 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %169 = vector.extractelement %168[%c6_i64 : i64] : vector<8xf32> + %170 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %171 = vector.extractelement %170[%c7_i64 : i64] : vector<8xf32> + %172 = "accv.bin_op"(%157, %148) {predicate = 0 : i64} : (f32, f32) -> f32 + %173 = "accv.bin_op"(%159, %149) {predicate = 0 : i64} : (f32, f32) -> f32 + %174 = "accv.bin_op"(%161, %150) {predicate = 0 : i64} : (f32, f32) -> f32 + %175 = "accv.bin_op"(%163, %151) {predicate = 0 : i64} : (f32, f32) -> f32 + %176 = "accv.bin_op"(%165, %152) {predicate = 0 : i64} : (f32, f32) -> f32 + %177 = "accv.bin_op"(%167, %153) {predicate = 0 : i64} : (f32, f32) -> f32 + %178 = "accv.bin_op"(%169, %154) {predicate = 0 : i64} : (f32, f32) -> f32 + %179 = "accv.bin_op"(%171, %155) {predicate = 0 : i64} : (f32, f32) -> f32 + %180 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %181 = vector.insertelement %172, %180[%c0_i64 : i64] : vector<8xf32> + affine.store %181, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %182 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) 
* 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %183 = vector.insertelement %173, %182[%c1_i64 : i64] : vector<8xf32> + affine.store %183, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %184 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %185 = vector.insertelement %174, %184[%c2_i64 : i64] : vector<8xf32> + affine.store %185, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %186 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %187 = vector.insertelement %175, %186[%c3_i64 : i64] : vector<8xf32> + affine.store %187, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %188 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %189 = vector.insertelement %176, %188[%c4_i64 : i64] : vector<8xf32> + affine.store %189, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %190 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %191 = vector.insertelement %177, %190[%c5_i64 : i64] : vector<8xf32> + affine.store %191, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %192 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %193 = vector.insertelement %178, %192[%c6_i64 : i64] : vector<8xf32> + affine.store %193, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %194 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %195 = vector.insertelement %179, %194[%c7_i64 : i64] : vector<8xf32> + affine.store %195, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 
8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %196 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %197 = vector.insertelement %172, %196[%c0_i64 : i64] : vector<8xf32> + affine.store %197, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %198 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %199 = vector.insertelement %173, %198[%c1_i64 : i64] : vector<8xf32> + affine.store %199, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %200 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %201 = vector.insertelement %174, %200[%c2_i64 : i64] : vector<8xf32> + affine.store %201, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %202 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %203 = vector.insertelement %175, %202[%c3_i64 : i64] : vector<8xf32> + affine.store %203, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %204 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %205 = vector.insertelement %176, %204[%c4_i64 : i64] : vector<8xf32> + affine.store %205, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %206 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %207 = vector.insertelement %177, %206[%c5_i64 : i64] : vector<8xf32> + affine.store %207, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %208 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) 
floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %209 = vector.insertelement %178, %208[%c6_i64 : i64] : vector<8xf32> + affine.store %209, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %210 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %211 = vector.insertelement %179, %210[%c7_i64 : i64] : vector<8xf32> + affine.store %211, %2[((%arg5 + 8) floordiv 16) mod 16, (%arg7 + %arg9) mod 6, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_i,12}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 1]} + } {begin = 0 : i64, end = 0 : i64, index = #accln<"index{i_i_o,11}">, accv_unrolled, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [0, 16, 4]} + affine.for %arg7 = 0 to 4 { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %5 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %6 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %7 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %10 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %11 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%arg4, %5] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%arg4, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = load %arg0[%arg4, %7] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %16 = load %arg0[%arg4, %8] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %18 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg0[%arg4, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %20 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg7) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %21 = vector.extractelement %20[%c0_i64 : i64] : vector<8xf32> + %22 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg7) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %23 = vector.extractelement %22[%c1_i64 : i64] : vector<8xf32> + %24 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg7) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %25 = vector.extractelement %24[%c2_i64 : i64] : vector<8xf32> + %26 = affine.load %3[(%arg5 
floordiv 16) mod 16, (%arg6 + %arg7) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %27 = vector.extractelement %26[%c3_i64 : i64] : vector<8xf32> + %28 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg7) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %29 = vector.extractelement %28[%c4_i64 : i64] : vector<8xf32> + %30 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg7) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %31 = vector.extractelement %30[%c5_i64 : i64] : vector<8xf32> + %32 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg7) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %33 = vector.extractelement %32[%c6_i64 : i64] : vector<8xf32> + %34 = affine.load %3[(%arg5 floordiv 16) mod 16, (%arg6 + %arg7) mod 128, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x128x2xvector<8xf32>> + %35 = vector.extractelement %34[%c7_i64 : i64] : vector<8xf32> + %36 = "accv.bin_op"(%12, %21) {predicate = 2 : i64} : (f32, f32) -> f32 + %37 = "accv.bin_op"(%13, %23) {predicate = 2 : i64} : (f32, f32) -> f32 + %38 = "accv.bin_op"(%14, %25) {predicate = 2 : i64} : (f32, f32) -> f32 + %39 = "accv.bin_op"(%15, %27) {predicate = 2 : i64} : (f32, f32) -> f32 + %40 = "accv.bin_op"(%16, %29) {predicate = 2 : i64} : (f32, f32) -> f32 + %41 = "accv.bin_op"(%17, %31) {predicate = 2 : i64} : (f32, f32) -> f32 + %42 = "accv.bin_op"(%18, %33) {predicate = 2 : i64} : (f32, f32) -> f32 + %43 = "accv.bin_op"(%19, %35) {predicate = 2 : i64} : (f32, f32) -> f32 + %44 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %45 = vector.extractelement %44[%c0_i64 : i64] : vector<8xf32> + %46 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %47 = vector.extractelement %46[%c1_i64 : i64] : vector<8xf32> + %48 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %49 = vector.extractelement %48[%c2_i64 : i64] : vector<8xf32> + %50 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %51 = vector.extractelement %50[%c3_i64 : i64] : vector<8xf32> + %52 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %53 = vector.extractelement %52[%c4_i64 : i64] : vector<8xf32> + %54 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %55 = vector.extractelement %54[%c5_i64 : i64] : vector<8xf32> + %56 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %57 = vector.extractelement %56[%c6_i64 : i64] : vector<8xf32> + %58 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %59 = vector.extractelement %58[%c7_i64 : i64] : vector<8xf32> + %60 = "accv.bin_op"(%45, %36) {predicate = 0 : i64} : (f32, f32) -> f32 + %61 = "accv.bin_op"(%47, %37) {predicate = 0 : i64} : (f32, f32) -> f32 + %62 = "accv.bin_op"(%49, %38) {predicate = 0 : i64} : (f32, f32) -> f32 + %63 = "accv.bin_op"(%51, %39) {predicate = 0 : i64} : (f32, f32) -> f32 + %64 = "accv.bin_op"(%53, %40) {predicate = 0 : i64} : (f32, f32) -> f32 + %65 = "accv.bin_op"(%55, %41) {predicate = 0 : i64} : (f32, f32) -> f32 + %66 = 
"accv.bin_op"(%57, %42) {predicate = 0 : i64} : (f32, f32) -> f32 + %67 = "accv.bin_op"(%59, %43) {predicate = 0 : i64} : (f32, f32) -> f32 + %68 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %69 = vector.insertelement %60, %68[%c0_i64 : i64] : vector<8xf32> + affine.store %69, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %70 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %71 = vector.insertelement %61, %70[%c1_i64 : i64] : vector<8xf32> + affine.store %71, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %72 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %73 = vector.insertelement %62, %72[%c2_i64 : i64] : vector<8xf32> + affine.store %73, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %74 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %75 = vector.insertelement %63, %74[%c3_i64 : i64] : vector<8xf32> + affine.store %75, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %76 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %77 = vector.insertelement %64, %76[%c4_i64 : i64] : vector<8xf32> + affine.store %77, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %78 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %79 = vector.insertelement %65, %78[%c5_i64 : i64] : vector<8xf32> + affine.store %79, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %80 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %81 = vector.insertelement %66, %80[%c6_i64 : i64] : vector<8xf32> + affine.store %81, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %82 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %83 = vector.insertelement %67, %82[%c7_i64 : i64] : vector<8xf32> + affine.store %83, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %84 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %85 = vector.insertelement %60, %84[%c0_i64 : i64] : vector<8xf32> + affine.store %85, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %86 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %87 = vector.insertelement %61, %86[%c1_i64 : i64] : vector<8xf32> + affine.store %87, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %88 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %89 = vector.insertelement %62, %88[%c2_i64 : i64] : vector<8xf32> + affine.store %89, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %90 = 
affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %91 = vector.insertelement %63, %90[%c3_i64 : i64] : vector<8xf32> + affine.store %91, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %92 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %93 = vector.insertelement %64, %92[%c4_i64 : i64] : vector<8xf32> + affine.store %93, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %94 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %95 = vector.insertelement %65, %94[%c5_i64 : i64] : vector<8xf32> + affine.store %95, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %96 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %97 = vector.insertelement %66, %96[%c6_i64 : i64] : vector<8xf32> + affine.store %97, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %98 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %99 = vector.insertelement %67, %98[%c7_i64 : i64] : vector<8xf32> + affine.store %99, %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %100 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %101 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %102 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %103 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %104 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %105 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %106 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %107 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg6, %arg7) + %108 = load %arg0[%arg4, %100] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %109 = load %arg0[%arg4, %101] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %110 = load %arg0[%arg4, %102] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %111 = load %arg0[%arg4, %103] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %112 = load %arg0[%arg4, %104] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %113 = load %arg0[%arg4, %105] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %114 = load %arg0[%arg4, %106] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %115 = load %arg0[%arg4, %107] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %116 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg7) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %117 = vector.extractelement %116[%c0_i64 : i64] : vector<8xf32> + %118 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg7) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %119 = vector.extractelement %118[%c1_i64 : i64] : vector<8xf32> + %120 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + 
%arg7) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %121 = vector.extractelement %120[%c2_i64 : i64] : vector<8xf32> + %122 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg7) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %123 = vector.extractelement %122[%c3_i64 : i64] : vector<8xf32> + %124 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg7) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %125 = vector.extractelement %124[%c4_i64 : i64] : vector<8xf32> + %126 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg7) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %127 = vector.extractelement %126[%c5_i64 : i64] : vector<8xf32> + %128 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg7) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %129 = vector.extractelement %128[%c6_i64 : i64] : vector<8xf32> + %130 = affine.load %3[((%arg5 + 8) floordiv 16) mod 16, (%arg6 + %arg7) mod 128, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x128x2xvector<8xf32>> + %131 = vector.extractelement %130[%c7_i64 : i64] : vector<8xf32> + %132 = "accv.bin_op"(%108, %117) {predicate = 2 : i64} : (f32, f32) -> f32 + %133 = "accv.bin_op"(%109, %119) {predicate = 2 : i64} : (f32, f32) -> f32 + %134 = "accv.bin_op"(%110, %121) {predicate = 2 : i64} : (f32, f32) -> f32 + %135 = "accv.bin_op"(%111, %123) {predicate = 2 : i64} : (f32, f32) -> f32 + %136 = "accv.bin_op"(%112, %125) {predicate = 2 : i64} : (f32, f32) -> f32 + %137 = "accv.bin_op"(%113, %127) {predicate = 2 : i64} : (f32, f32) -> f32 + %138 = "accv.bin_op"(%114, %129) {predicate = 2 : i64} : (f32, f32) -> f32 + %139 = "accv.bin_op"(%115, %131) {predicate = 2 : i64} : (f32, f32) -> f32 + %140 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %141 = vector.extractelement %140[%c0_i64 : i64] : vector<8xf32> + %142 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %143 = vector.extractelement %142[%c1_i64 : i64] : vector<8xf32> + %144 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %145 = vector.extractelement %144[%c2_i64 : i64] : vector<8xf32> + %146 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %147 = vector.extractelement %146[%c3_i64 : i64] : 
vector<8xf32> + %148 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %149 = vector.extractelement %148[%c4_i64 : i64] : vector<8xf32> + %150 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %151 = vector.extractelement %150[%c5_i64 : i64] : vector<8xf32> + %152 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %153 = vector.extractelement %152[%c6_i64 : i64] : vector<8xf32> + %154 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %155 = vector.extractelement %154[%c7_i64 : i64] : vector<8xf32> + %156 = "accv.bin_op"(%141, %132) {predicate = 0 : i64} : (f32, f32) -> f32 + %157 = "accv.bin_op"(%143, %133) {predicate = 0 : i64} : (f32, f32) -> f32 + %158 = "accv.bin_op"(%145, %134) {predicate = 0 : i64} : (f32, f32) -> f32 + %159 = "accv.bin_op"(%147, %135) {predicate = 0 : i64} : (f32, f32) -> f32 + %160 = "accv.bin_op"(%149, %136) {predicate = 0 : i64} : (f32, f32) -> f32 + %161 = "accv.bin_op"(%151, %137) {predicate = 0 : i64} : (f32, f32) -> f32 + %162 = "accv.bin_op"(%153, %138) {predicate = 0 : i64} : (f32, f32) -> f32 + %163 = "accv.bin_op"(%155, %139) {predicate = 0 : i64} : (f32, f32) -> f32 + %164 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %165 = vector.insertelement %156, %164[%c0_i64 : i64] : vector<8xf32> + affine.store %165, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %166 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %167 = vector.insertelement %157, %166[%c1_i64 : i64] : vector<8xf32> + affine.store %167, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %168 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %169 = vector.insertelement %158, %168[%c2_i64 : i64] : vector<8xf32> + affine.store %169, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %170 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %171 = 
vector.insertelement %159, %170[%c3_i64 : i64] : vector<8xf32> + affine.store %171, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %172 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %173 = vector.insertelement %160, %172[%c4_i64 : i64] : vector<8xf32> + affine.store %173, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %174 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %175 = vector.insertelement %161, %174[%c5_i64 : i64] : vector<8xf32> + affine.store %175, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %176 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %177 = vector.insertelement %162, %176[%c6_i64 : i64] : vector<8xf32> + affine.store %177, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %178 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %179 = vector.insertelement %163, %178[%c7_i64 : i64] : vector<8xf32> + affine.store %179, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %180 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %181 = vector.insertelement %156, %180[%c0_i64 : i64] : vector<8xf32> + affine.store %181, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %182 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %183 = vector.insertelement %157, %182[%c1_i64 : i64] : vector<8xf32> + affine.store %183, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %184 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : 
memref<16x6x2xvector<8xf32>> + %185 = vector.insertelement %158, %184[%c2_i64 : i64] : vector<8xf32> + affine.store %185, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %186 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %187 = vector.insertelement %159, %186[%c3_i64 : i64] : vector<8xf32> + affine.store %187, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %188 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %189 = vector.insertelement %160, %188[%c4_i64 : i64] : vector<8xf32> + affine.store %189, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %190 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %191 = vector.insertelement %161, %190[%c5_i64 : i64] : vector<8xf32> + affine.store %191, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %192 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %193 = vector.insertelement %162, %192[%c6_i64 : i64] : vector<8xf32> + affine.store %193, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %194 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %195 = vector.insertelement %163, %194[%c7_i64 : i64] : vector<8xf32> + affine.store %195, %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + } {begin = 0 : i64, end = 4 : i64, index = #accln<"index{k_i_i,10}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 1]} + } {begin = 0 : i64, end = 128 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 4]} + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 16, 128]} + affine.for %arg5 = 0 to 256 step 128 { + affine.if affine_set<() : (0 == 0)>() { + %4 = 
affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %7 = addf %5, %6 : vector<8xf32> + store %7, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %9 = vector.transfer_read %arg2[%arg4, %8], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %10 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %11 = addf %9, %10 : vector<8xf32> + store %11, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %13 = vector.transfer_read %arg2[%arg4, %12], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %14 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 1) floordiv 16) * 16 + 1, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %15 = addf %13, %14 : vector<8xf32> + store %15, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %17 = vector.transfer_read %arg2[%arg4, %16], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %18 = affine.load %2[((%arg5 + 24) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 24) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 24) floordiv 16) * 2 + 3) floordiv 2) * 2 + 3] : memref<16x6x2xvector<8xf32>> + %19 = addf %17, %18 : vector<8xf32> + store %19, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %21 = vector.transfer_read %arg2[%arg4, %20], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %22 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 2) floordiv 16) * 16 + 2, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %23 = addf %21, %22 : vector<8xf32> + store %23, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %25 = vector.transfer_read %arg2[%arg4, %24], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %26 = affine.load %2[((%arg5 + 40) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 40) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 40) floordiv 16) * 2 + 5) floordiv 2) * 2 + 5] : memref<16x6x2xvector<8xf32>> + %27 = addf %25, %26 : vector<8xf32> + store %27, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %29 = vector.transfer_read %arg2[%arg4, %28], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %30 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 3) floordiv 16) * 16 + 3, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %31 = addf %29, %30 : vector<8xf32> + store %31, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %33 = 
vector.transfer_read %arg2[%arg4, %32], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = affine.load %2[((%arg5 + 56) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 56) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 56) floordiv 16) * 2 + 7) floordiv 2) * 2 + 7] : memref<16x6x2xvector<8xf32>> + %35 = addf %33, %34 : vector<8xf32> + store %35, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %36 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %37 = vector.transfer_read %arg2[%arg4, %36], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %38 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 4) floordiv 16) * 16 + 4, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %39 = addf %37, %38 : vector<8xf32> + store %39, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %41 = vector.transfer_read %arg2[%arg4, %40], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %42 = affine.load %2[((%arg5 + 72) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 72) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 72) floordiv 16) * 2 + 9) floordiv 2) * 2 + 9] : memref<16x6x2xvector<8xf32>> + %43 = addf %41, %42 : vector<8xf32> + store %43, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %44 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %45 = vector.transfer_read %arg2[%arg4, %44], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %46 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 5) floordiv 16) * 16 + 5, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %47 = addf %45, %46 : vector<8xf32> + store %47, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %49 = vector.transfer_read %arg2[%arg4, %48], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %50 = affine.load %2[((%arg5 + 88) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 88) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 88) floordiv 16) * 2 + 11) floordiv 2) * 2 + 11] : memref<16x6x2xvector<8xf32>> + %51 = addf %49, %50 : vector<8xf32> + store %51, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %52 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %53 = vector.transfer_read %arg2[%arg4, %52], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %54 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 6) floordiv 16) * 16 + 6, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %55 = addf %53, %54 : vector<8xf32> + store %55, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %57 = vector.transfer_read %arg2[%arg4, %56], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %58 = affine.load %2[((%arg5 + 104) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 104) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 104) floordiv 16) * 2 + 13) floordiv 2) * 2 + 13] : memref<16x6x2xvector<8xf32>> + %59 = addf %57, %58 : vector<8xf32> + store %59, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %60 = affine.apply affine_map<(d0, 
d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %61 = vector.transfer_read %arg2[%arg4, %60], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %62 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 7) floordiv 16) * 16 + 7, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %63 = addf %61, %62 : vector<8xf32> + store %63, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %65 = vector.transfer_read %arg2[%arg4, %64], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %66 = affine.load %2[((%arg5 + 120) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 120) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 120) floordiv 16) * 2 + 15) floordiv 2) * 2 + 15] : memref<16x6x2xvector<8xf32>> + %67 = addf %65, %66 : vector<8xf32> + store %67, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %68 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %69 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %69, %arg2[%arg4, %68] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{k_i_o,9}">, subdomainIndexOrder = [#accln<"index{i_i,8}">, #accln<"index{k_i_o,9}">], subdomainSize = [1, 1]} + } else { + %4 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg3, %arg5) + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = affine.load %2[(%arg5 floordiv 16) mod 16, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %7 = addf %5, %6 : vector<8xf32> + store %7, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %8 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 8)>(%arg3, %arg5) + %9 = vector.transfer_read %arg2[%arg4, %8], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %10 = affine.load %2[((%arg5 + 8) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 8) floordiv 16) * 2 + 1) floordiv 2) * 2 + 1] : memref<16x6x2xvector<8xf32>> + %11 = addf %9, %10 : vector<8xf32> + store %11, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 16)>(%arg3, %arg5) + %13 = vector.transfer_read %arg2[%arg4, %12], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %14 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 1) floordiv 16) * 16 + 1, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %15 = addf %13, %14 : vector<8xf32> + store %15, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %16 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 24)>(%arg3, %arg5) + %17 = vector.transfer_read %arg2[%arg4, %16], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %18 = affine.load %2[((%arg5 + 24) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 24) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 24) floordiv 16) * 2 + 3) floordiv 2) * 2 + 3] : memref<16x6x2xvector<8xf32>> + %19 = addf %17, %18 : vector<8xf32> + store %19, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %20 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 32)>(%arg3, %arg5) + %21 = vector.transfer_read %arg2[%arg4, %20], %cst : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>>, vector<8xf32> + %22 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 2) floordiv 16) * 16 + 2, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %23 = addf %21, %22 : vector<8xf32> + store %23, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %24 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 40)>(%arg3, %arg5) + %25 = vector.transfer_read %arg2[%arg4, %24], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %26 = affine.load %2[((%arg5 + 40) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 40) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 40) floordiv 16) * 2 + 5) floordiv 2) * 2 + 5] : memref<16x6x2xvector<8xf32>> + %27 = addf %25, %26 : vector<8xf32> + store %27, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %28 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 48)>(%arg3, %arg5) + %29 = vector.transfer_read %arg2[%arg4, %28], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %30 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 3) floordiv 16) * 16 + 3, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %31 = addf %29, %30 : vector<8xf32> + store %31, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %32 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 56)>(%arg3, %arg5) + %33 = vector.transfer_read %arg2[%arg4, %32], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %34 = affine.load %2[((%arg5 + 56) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 56) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 56) floordiv 16) * 2 + 7) floordiv 2) * 2 + 7] : memref<16x6x2xvector<8xf32>> + %35 = addf %33, %34 : vector<8xf32> + store %35, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %36 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 64)>(%arg3, %arg5) + %37 = vector.transfer_read %arg2[%arg4, %36], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %38 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 4) floordiv 16) * 16 + 4, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %39 = addf %37, %38 : vector<8xf32> + store %39, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %40 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 72)>(%arg3, %arg5) + %41 = vector.transfer_read %arg2[%arg4, %40], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %42 = affine.load %2[((%arg5 + 72) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 72) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 72) floordiv 16) * 2 + 9) floordiv 2) * 2 + 9] : memref<16x6x2xvector<8xf32>> + %43 = addf %41, %42 : vector<8xf32> + store %43, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %44 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 80)>(%arg3, %arg5) + %45 = vector.transfer_read %arg2[%arg4, %44], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %46 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 5) floordiv 16) * 16 + 5, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %47 = addf %45, %46 : vector<8xf32> + store %47, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %48 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 88)>(%arg3, %arg5) + %49 = vector.transfer_read %arg2[%arg4, %48], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %50 = affine.load %2[((%arg5 + 88) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 88) floordiv 16) * 2 - ((%arg5 
floordiv 8 - ((%arg5 + 88) floordiv 16) * 2 + 11) floordiv 2) * 2 + 11] : memref<16x6x2xvector<8xf32>> + %51 = addf %49, %50 : vector<8xf32> + store %51, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %52 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 96)>(%arg3, %arg5) + %53 = vector.transfer_read %arg2[%arg4, %52], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %54 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 6) floordiv 16) * 16 + 6, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %55 = addf %53, %54 : vector<8xf32> + store %55, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %56 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 104)>(%arg3, %arg5) + %57 = vector.transfer_read %arg2[%arg4, %56], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %58 = affine.load %2[((%arg5 + 104) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 104) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 104) floordiv 16) * 2 + 13) floordiv 2) * 2 + 13] : memref<16x6x2xvector<8xf32>> + %59 = addf %57, %58 : vector<8xf32> + store %59, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %60 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 112)>(%arg3, %arg5) + %61 = vector.transfer_read %arg2[%arg4, %60], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %62 = affine.load %2[%arg5 floordiv 16 - ((%arg5 floordiv 16 + 7) floordiv 16) * 16 + 7, 0, ((%arg5 mod 16) floordiv 8) mod 2] : memref<16x6x2xvector<8xf32>> + %63 = addf %61, %62 : vector<8xf32> + store %63, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %64 = affine.apply affine_map<(d0, d1) -> (d0 + d1 + 120)>(%arg3, %arg5) + %65 = vector.transfer_read %arg2[%arg4, %64], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %66 = affine.load %2[((%arg5 + 120) floordiv 16) mod 16, 0, %arg5 floordiv 8 - ((%arg5 + 120) floordiv 16) * 2 - ((%arg5 floordiv 8 - ((%arg5 + 120) floordiv 16) * 2 + 15) floordiv 2) * 2 + 15] : memref<16x6x2xvector<8xf32>> + %67 = addf %65, %66 : vector<8xf32> + store %67, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + affine.for %arg6 = 0 to 16 { + %68 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2 * 8)>(%arg3, %arg5, %arg6) + %69 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %69, %arg2[%arg4, %68] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } {begin = 0 : i64, end = 16 : i64, index = #accln<"index{j_i_o,13}">, subdomainIndexOrder = [#accln<"index{i_i_i,12}">, #accln<"index{j_i_o,13}">], subdomainSize = [1, 1]} + } + } {begin = 0 : i64, end = 256 : i64, index = #accln<"index{j_i,4}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">], subdomainSize = [1, 128]} + } {begin = 0 : i64, end = 784 : i64, index = #accln<"index{i_o,7}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [1, 256, 128]} + } {begin = 0 : i64, end = 512 : i64, index = #accln<"index{j_o,3}">, subdomainIndexOrder = [#accln<"index{i,0}">, #accln<"index{j,1}">, #accln<"index{k,2}">], subdomainSize = [784, 256, 128]} + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", 
accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/7_ConvertAffineToStandard.mlir b/Tutorials/optimized_matmul/mlir/7_ConvertAffineToStandard.mlir new file mode 100644 index 00000000..aa7cd201 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/7_ConvertAffineToStandard.mlir @@ -0,0 +1,2115 @@ +module @optimized_matmul { + accv.module "optimized_matmul" { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c780 = constant 780 : index + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c0 = constant 0 : index + %c512 = constant 512 : index + %c256 = constant 256 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + %c0_0 = constant 0 : index + %c780_1 = constant 780 : index + %c6 = constant 6 : index + scf.for %arg4 = %c0_0 to %c780_1 step %c6 { + %c0_4 = constant 0 : index + %c256_5 = constant 256 : index + %c16_6 = constant 16 : index + scf.for %arg5 = %c0_4 to %c256_5 step %c16_6 { + %c0_7 = constant 0 : index + %c128 = constant 128 : index + %c4 = constant 4 : index + scf.for %arg6 = %c0_7 to %c128 step %c4 { + %c0_8 = constant 0 : index + %c4_9 = constant 4 : index + %c1 = constant 1 : index + scf.for %arg7 = %c0_8 to %c4_9 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = "accv.bin_op"(%2, %3) {predicate = 2 : i64} : (f32, f32) -> f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = "accv.bin_op"(%5, %4) {predicate = 0 : i64} : (f32, f32) -> f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %arg3, %arg5 : index + %c1_10 = constant 1 : index + %9 = addi %8, %c1_10 : index + %10 = addi %arg6, %arg7 : index + %11 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg1[%10, %9] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = "accv.bin_op"(%11, %12) {predicate = 2 : i64} : (f32, f32) -> f32 + %14 = load %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = "accv.bin_op"(%14, %13) {predicate = 0 : i64} : (f32, f32) -> f32 + store %15, %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = load %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %16, %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %17 = addi %arg3, %arg5 : index + %c2 = constant 2 : index + %18 = addi %17, %c2 : index + %19 = addi 
%arg6, %arg7 : index + %20 = load %arg0[%arg4, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg1[%19, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = "accv.bin_op"(%20, %21) {predicate = 2 : i64} : (f32, f32) -> f32 + %23 = load %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = "accv.bin_op"(%23, %22) {predicate = 0 : i64} : (f32, f32) -> f32 + store %24, %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = load %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %25, %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %26 = addi %arg3, %arg5 : index + %c3 = constant 3 : index + %27 = addi %26, %c3 : index + %28 = addi %arg6, %arg7 : index + %29 = load %arg0[%arg4, %28] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg1[%28, %27] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = "accv.bin_op"(%29, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %32 = load %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %33 = "accv.bin_op"(%32, %31) {predicate = 0 : i64} : (f32, f32) -> f32 + store %33, %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = load %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %34, %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = addi %arg3, %arg5 : index + %c4_11 = constant 4 : index + %36 = addi %35, %c4_11 : index + %37 = addi %arg6, %arg7 : index + %38 = load %arg0[%arg4, %37] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %39 = load %arg1[%37, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = "accv.bin_op"(%38, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %41 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = "accv.bin_op"(%41, %40) {predicate = 0 : i64} : (f32, f32) -> f32 + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %43, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = addi %arg3, %arg5 : index + %c5 = constant 5 : index + %45 = addi %44, %c5 : index + %46 = addi %arg6, %arg7 : index + %47 = load %arg0[%arg4, %46] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %48 = load %arg1[%46, %45] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = "accv.bin_op"(%47, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %50 = load %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %51 = "accv.bin_op"(%50, %49) {predicate = 0 : i64} : (f32, f32) -> f32 + store %51, %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = load %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %52, %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = addi %arg3, %arg5 : index + %c6_12 = constant 6 : index + %54 = addi %53, %c6_12 : index + %55 = addi %arg6, %arg7 : index + %56 = load %arg0[%arg4, %55] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %57 = load %arg1[%55, %54] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %58 = "accv.bin_op"(%56, %57) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = load %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = "accv.bin_op"(%59, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + store %60, %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %61 = load %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %61, %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addi %arg3, %arg5 : index + %c7 = constant 7 : index + %63 = addi %62, %c7 : index + %64 = addi %arg6, %arg7 : index + %65 = load %arg0[%arg4, %64] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%64, %63] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = "accv.bin_op"(%65, %66) {predicate = 2 : i64} : (f32, f32) -> f32 + %68 = load %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = "accv.bin_op"(%68, %67) {predicate = 0 : i64} : (f32, f32) -> f32 + store %69, %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %arg3, %arg5 : index + %c8 = constant 8 : index + %72 = addi %71, %c8 : index + %73 = addi %arg6, %arg7 : index + %74 = load %arg0[%arg4, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = "accv.bin_op"(%74, %75) {predicate = 2 : i64} : (f32, f32) -> f32 + %77 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = "accv.bin_op"(%77, %76) {predicate = 0 : i64} : (f32, f32) -> f32 + store %78, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = addi %arg3, %arg5 : index + %c9 = constant 9 : index + %81 = addi %80, %c9 : index + %82 = addi %arg6, %arg7 : index + %83 = load %arg0[%arg4, %82] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %84 = load %arg1[%82, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = "accv.bin_op"(%83, %84) {predicate = 2 : i64} : (f32, f32) -> f32 + %86 = load %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = "accv.bin_op"(%86, %85) {predicate = 0 : i64} : (f32, f32) -> f32 + store %87, %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = load %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %88, %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %89 = addi %arg3, %arg5 : index + %c10 = constant 10 : index + %90 = addi %89, %c10 : index + %91 = addi %arg6, %arg7 : index + %92 = load %arg0[%arg4, %91] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %93 = load %arg1[%91, %90] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = "accv.bin_op"(%92, %93) {predicate = 2 : i64} : (f32, f32) -> f32 + %95 = load %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = "accv.bin_op"(%95, %94) 
{predicate = 0 : i64} : (f32, f32) -> f32 + store %96, %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = load %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %97, %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = addi %arg3, %arg5 : index + %c11 = constant 11 : index + %99 = addi %98, %c11 : index + %100 = addi %arg6, %arg7 : index + %101 = load %arg0[%arg4, %100] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %102 = load %arg1[%100, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = "accv.bin_op"(%101, %102) {predicate = 2 : i64} : (f32, f32) -> f32 + %104 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = "accv.bin_op"(%104, %103) {predicate = 0 : i64} : (f32, f32) -> f32 + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %106, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %107 = addi %arg3, %arg5 : index + %c12 = constant 12 : index + %108 = addi %107, %c12 : index + %109 = addi %arg6, %arg7 : index + %110 = load %arg0[%arg4, %109] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %111 = load %arg1[%109, %108] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = "accv.bin_op"(%110, %111) {predicate = 2 : i64} : (f32, f32) -> f32 + %113 = load %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %114 = "accv.bin_op"(%113, %112) {predicate = 0 : i64} : (f32, f32) -> f32 + store %114, %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = load %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %115, %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = addi %arg3, %arg5 : index + %c13 = constant 13 : index + %117 = addi %116, %c13 : index + %118 = addi %arg6, %arg7 : index + %119 = load %arg0[%arg4, %118] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%118, %117] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = "accv.bin_op"(%119, %120) {predicate = 2 : i64} : (f32, f32) -> f32 + %122 = load %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = "accv.bin_op"(%122, %121) {predicate = 0 : i64} : (f32, f32) -> f32 + store %123, %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = addi %arg3, %arg5 : index + %c14 = constant 14 : index + %126 = addi %125, %c14 : index + %127 = addi %arg6, %arg7 : index + %128 = load %arg0[%arg4, %127] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg1[%127, %126] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = "accv.bin_op"(%128, %129) {predicate = 2 : i64} : (f32, f32) -> f32 + %131 = load %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = "accv.bin_op"(%131, %130) {predicate = 0 : i64} : (f32, f32) -> f32 + store %132, %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 
* 512 + d1)>> + %133 = load %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %133, %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = addi %arg3, %arg5 : index + %c15 = constant 15 : index + %135 = addi %134, %c15 : index + %136 = addi %arg6, %arg7 : index + %137 = load %arg0[%arg4, %136] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%136, %135] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = "accv.bin_op"(%137, %138) {predicate = 2 : i64} : (f32, f32) -> f32 + %140 = load %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = "accv.bin_op"(%140, %139) {predicate = 0 : i64} : (f32, f32) -> f32 + store %141, %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_13 = constant 1 : index + %143 = addi %arg4, %c1_13 : index + %144 = addi %arg3, %arg5 : index + %145 = addi %arg6, %arg7 : index + %146 = load %arg0[%143, %145] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %147 = load %arg1[%145, %144] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = "accv.bin_op"(%146, %147) {predicate = 2 : i64} : (f32, f32) -> f32 + %149 = load %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = "accv.bin_op"(%149, %148) {predicate = 0 : i64} : (f32, f32) -> f32 + store %150, %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = load %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %151, %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_14 = constant 1 : index + %152 = addi %arg4, %c1_14 : index + %153 = addi %arg3, %arg5 : index + %c1_15 = constant 1 : index + %154 = addi %153, %c1_15 : index + %155 = addi %arg6, %arg7 : index + %156 = load %arg0[%152, %155] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%155, %154] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = "accv.bin_op"(%156, %157) {predicate = 2 : i64} : (f32, f32) -> f32 + %159 = load %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = "accv.bin_op"(%159, %158) {predicate = 0 : i64} : (f32, f32) -> f32 + store %160, %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_16 = constant 1 : index + %162 = addi %arg4, %c1_16 : index + %163 = addi %arg3, %arg5 : index + %c2_17 = constant 2 : index + %164 = addi %163, %c2_17 : index + %165 = addi %arg6, %arg7 : index + %166 = load %arg0[%162, %165] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %167 = load %arg1[%165, %164] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = "accv.bin_op"(%166, %167) {predicate = 2 : i64} : (f32, f32) -> f32 + %169 = load %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = "accv.bin_op"(%169, %168) {predicate = 0 : i64} : (f32, f32) -> f32 + store %170, %arg2[%162, %164] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = load %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %171, %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_18 = constant 1 : index + %172 = addi %arg4, %c1_18 : index + %173 = addi %arg3, %arg5 : index + %c3_19 = constant 3 : index + %174 = addi %173, %c3_19 : index + %175 = addi %arg6, %arg7 : index + %176 = load %arg0[%172, %175] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %177 = load %arg1[%175, %174] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = "accv.bin_op"(%176, %177) {predicate = 2 : i64} : (f32, f32) -> f32 + %179 = load %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = "accv.bin_op"(%179, %178) {predicate = 0 : i64} : (f32, f32) -> f32 + store %180, %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = load %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %181, %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_20 = constant 1 : index + %182 = addi %arg4, %c1_20 : index + %183 = addi %arg3, %arg5 : index + %c4_21 = constant 4 : index + %184 = addi %183, %c4_21 : index + %185 = addi %arg6, %arg7 : index + %186 = load %arg0[%182, %185] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%185, %184] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = "accv.bin_op"(%186, %187) {predicate = 2 : i64} : (f32, f32) -> f32 + %189 = load %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = "accv.bin_op"(%189, %188) {predicate = 0 : i64} : (f32, f32) -> f32 + store %190, %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_22 = constant 1 : index + %192 = addi %arg4, %c1_22 : index + %193 = addi %arg3, %arg5 : index + %c5_23 = constant 5 : index + %194 = addi %193, %c5_23 : index + %195 = addi %arg6, %arg7 : index + %196 = load %arg0[%192, %195] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %197 = load %arg1[%195, %194] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = "accv.bin_op"(%196, %197) {predicate = 2 : i64} : (f32, f32) -> f32 + %199 = load %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = "accv.bin_op"(%199, %198) {predicate = 0 : i64} : (f32, f32) -> f32 + store %200, %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = load %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %201, %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_24 = constant 1 : index + %202 = addi %arg4, %c1_24 : index + %203 = addi %arg3, %arg5 : index + %c6_25 = constant 6 : index + %204 = addi %203, %c6_25 : index + %205 = addi %arg6, %arg7 : index + %206 = load %arg0[%202, %205] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %207 = load %arg1[%205, %204] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = "accv.bin_op"(%206, %207) {predicate = 2 : i64} : (f32, f32) -> f32 + %209 = load %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 
* 512 + d1)>> + %210 = "accv.bin_op"(%209, %208) {predicate = 0 : i64} : (f32, f32) -> f32 + store %210, %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = load %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %211, %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_26 = constant 1 : index + %212 = addi %arg4, %c1_26 : index + %213 = addi %arg3, %arg5 : index + %c7_27 = constant 7 : index + %214 = addi %213, %c7_27 : index + %215 = addi %arg6, %arg7 : index + %216 = load %arg0[%212, %215] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %217 = load %arg1[%215, %214] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %218 = "accv.bin_op"(%216, %217) {predicate = 2 : i64} : (f32, f32) -> f32 + %219 = load %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = "accv.bin_op"(%219, %218) {predicate = 0 : i64} : (f32, f32) -> f32 + store %220, %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %221, %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_28 = constant 1 : index + %222 = addi %arg4, %c1_28 : index + %223 = addi %arg3, %arg5 : index + %c8_29 = constant 8 : index + %224 = addi %223, %c8_29 : index + %225 = addi %arg6, %arg7 : index + %226 = load %arg0[%222, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = "accv.bin_op"(%226, %227) {predicate = 2 : i64} : (f32, f32) -> f32 + %229 = load %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = "accv.bin_op"(%229, %228) {predicate = 0 : i64} : (f32, f32) -> f32 + store %230, %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_30 = constant 1 : index + %232 = addi %arg4, %c1_30 : index + %233 = addi %arg3, %arg5 : index + %c9_31 = constant 9 : index + %234 = addi %233, %c9_31 : index + %235 = addi %arg6, %arg7 : index + %236 = load %arg0[%232, %235] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %237 = load %arg1[%235, %234] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = "accv.bin_op"(%236, %237) {predicate = 2 : i64} : (f32, f32) -> f32 + %239 = load %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = "accv.bin_op"(%239, %238) {predicate = 0 : i64} : (f32, f32) -> f32 + store %240, %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %241, %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_32 = constant 1 : index + %242 = addi %arg4, %c1_32 : index + %243 = addi %arg3, %arg5 : index + %c10_33 = constant 10 : index + %244 = addi %243, %c10_33 : index + %245 = addi %arg6, %arg7 : index + %246 = load %arg0[%242, %245] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %247 = load %arg1[%245, %244] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = 
"accv.bin_op"(%246, %247) {predicate = 2 : i64} : (f32, f32) -> f32 + %249 = load %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = "accv.bin_op"(%249, %248) {predicate = 0 : i64} : (f32, f32) -> f32 + store %250, %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %251, %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_34 = constant 1 : index + %252 = addi %arg4, %c1_34 : index + %253 = addi %arg3, %arg5 : index + %c11_35 = constant 11 : index + %254 = addi %253, %c11_35 : index + %255 = addi %arg6, %arg7 : index + %256 = load %arg0[%252, %255] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %257 = load %arg1[%255, %254] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = "accv.bin_op"(%256, %257) {predicate = 2 : i64} : (f32, f32) -> f32 + %259 = load %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %260 = "accv.bin_op"(%259, %258) {predicate = 0 : i64} : (f32, f32) -> f32 + store %260, %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = load %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %261, %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_36 = constant 1 : index + %262 = addi %arg4, %c1_36 : index + %263 = addi %arg3, %arg5 : index + %c12_37 = constant 12 : index + %264 = addi %263, %c12_37 : index + %265 = addi %arg6, %arg7 : index + %266 = load %arg0[%262, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = "accv.bin_op"(%266, %267) {predicate = 2 : i64} : (f32, f32) -> f32 + %269 = load %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = "accv.bin_op"(%269, %268) {predicate = 0 : i64} : (f32, f32) -> f32 + store %270, %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_38 = constant 1 : index + %272 = addi %arg4, %c1_38 : index + %273 = addi %arg3, %arg5 : index + %c13_39 = constant 13 : index + %274 = addi %273, %c13_39 : index + %275 = addi %arg6, %arg7 : index + %276 = load %arg0[%272, %275] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %277 = load %arg1[%275, %274] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %278 = "accv.bin_op"(%276, %277) {predicate = 2 : i64} : (f32, f32) -> f32 + %279 = load %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = "accv.bin_op"(%279, %278) {predicate = 0 : i64} : (f32, f32) -> f32 + store %280, %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %281, %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_40 = constant 1 : index + %282 = addi %arg4, %c1_40 : index + %283 = addi %arg3, %arg5 : index + %c14_41 = constant 14 : index + %284 = addi %283, %c14_41 : index + %285 = addi %arg6, %arg7 : index + %286 = load %arg0[%282, %285] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %287 = load %arg1[%285, %284] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = "accv.bin_op"(%286, %287) {predicate = 2 : i64} : (f32, f32) -> f32 + %289 = load %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = "accv.bin_op"(%289, %288) {predicate = 0 : i64} : (f32, f32) -> f32 + store %290, %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = load %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %291, %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c1_42 = constant 1 : index + %292 = addi %arg4, %c1_42 : index + %293 = addi %arg3, %arg5 : index + %c15_43 = constant 15 : index + %294 = addi %293, %c15_43 : index + %295 = addi %arg6, %arg7 : index + %296 = load %arg0[%292, %295] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %297 = load %arg1[%295, %294] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = "accv.bin_op"(%296, %297) {predicate = 2 : i64} : (f32, f32) -> f32 + %299 = load %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = "accv.bin_op"(%299, %298) {predicate = 0 : i64} : (f32, f32) -> f32 + store %300, %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %301, %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_44 = constant 2 : index + %302 = addi %arg4, %c2_44 : index + %303 = addi %arg3, %arg5 : index + %304 = addi %arg6, %arg7 : index + %305 = load %arg0[%302, %304] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%304, %303] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = "accv.bin_op"(%305, %306) {predicate = 2 : i64} : (f32, f32) -> f32 + %308 = load %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = "accv.bin_op"(%308, %307) {predicate = 0 : i64} : (f32, f32) -> f32 + store %309, %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_45 = constant 2 : index + %311 = addi %arg4, %c2_45 : index + %312 = addi %arg3, %arg5 : index + %c1_46 = constant 1 : index + %313 = addi %312, %c1_46 : index + %314 = addi %arg6, %arg7 : index + %315 = load %arg0[%311, %314] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %316 = load %arg1[%314, %313] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = "accv.bin_op"(%315, %316) {predicate = 2 : i64} : (f32, f32) -> f32 + %318 = load %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = "accv.bin_op"(%318, %317) {predicate = 0 : i64} : (f32, f32) -> f32 + store %319, %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %320, %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_47 = constant 2 : index + %321 = addi %arg4, %c2_47 : index + %322 = addi %arg3, %arg5 : index + %c2_48 = constant 2 : index + %323 = addi %322, 
%c2_48 : index + %324 = addi %arg6, %arg7 : index + %325 = load %arg0[%321, %324] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %326 = load %arg1[%324, %323] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = "accv.bin_op"(%325, %326) {predicate = 2 : i64} : (f32, f32) -> f32 + %328 = load %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = "accv.bin_op"(%328, %327) {predicate = 0 : i64} : (f32, f32) -> f32 + store %329, %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = load %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %330, %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_49 = constant 2 : index + %331 = addi %arg4, %c2_49 : index + %332 = addi %arg3, %arg5 : index + %c3_50 = constant 3 : index + %333 = addi %332, %c3_50 : index + %334 = addi %arg6, %arg7 : index + %335 = load %arg0[%331, %334] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%334, %333] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = "accv.bin_op"(%335, %336) {predicate = 2 : i64} : (f32, f32) -> f32 + %338 = load %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = "accv.bin_op"(%338, %337) {predicate = 0 : i64} : (f32, f32) -> f32 + store %339, %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_51 = constant 2 : index + %341 = addi %arg4, %c2_51 : index + %342 = addi %arg3, %arg5 : index + %c4_52 = constant 4 : index + %343 = addi %342, %c4_52 : index + %344 = addi %arg6, %arg7 : index + %345 = load %arg0[%341, %344] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %346 = load %arg1[%344, %343] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = "accv.bin_op"(%345, %346) {predicate = 2 : i64} : (f32, f32) -> f32 + %348 = load %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = "accv.bin_op"(%348, %347) {predicate = 0 : i64} : (f32, f32) -> f32 + store %349, %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %350, %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_53 = constant 2 : index + %351 = addi %arg4, %c2_53 : index + %352 = addi %arg3, %arg5 : index + %c5_54 = constant 5 : index + %353 = addi %352, %c5_54 : index + %354 = addi %arg6, %arg7 : index + %355 = load %arg0[%351, %354] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %356 = load %arg1[%354, %353] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = "accv.bin_op"(%355, %356) {predicate = 2 : i64} : (f32, f32) -> f32 + %358 = load %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = "accv.bin_op"(%358, %357) {predicate = 0 : i64} : (f32, f32) -> f32 + store %359, %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = load %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %360, %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%c2_55 = constant 2 : index + %361 = addi %arg4, %c2_55 : index + %362 = addi %arg3, %arg5 : index + %c6_56 = constant 6 : index + %363 = addi %362, %c6_56 : index + %364 = addi %arg6, %arg7 : index + %365 = load %arg0[%361, %364] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%364, %363] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = "accv.bin_op"(%365, %366) {predicate = 2 : i64} : (f32, f32) -> f32 + %368 = load %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = "accv.bin_op"(%368, %367) {predicate = 0 : i64} : (f32, f32) -> f32 + store %369, %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_57 = constant 2 : index + %371 = addi %arg4, %c2_57 : index + %372 = addi %arg3, %arg5 : index + %c7_58 = constant 7 : index + %373 = addi %372, %c7_58 : index + %374 = addi %arg6, %arg7 : index + %375 = load %arg0[%371, %374] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %376 = load %arg1[%374, %373] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = "accv.bin_op"(%375, %376) {predicate = 2 : i64} : (f32, f32) -> f32 + %378 = load %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = "accv.bin_op"(%378, %377) {predicate = 0 : i64} : (f32, f32) -> f32 + store %379, %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %380, %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_59 = constant 2 : index + %381 = addi %arg4, %c2_59 : index + %382 = addi %arg3, %arg5 : index + %c8_60 = constant 8 : index + %383 = addi %382, %c8_60 : index + %384 = addi %arg6, %arg7 : index + %385 = load %arg0[%381, %384] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %386 = load %arg1[%384, %383] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = "accv.bin_op"(%385, %386) {predicate = 2 : i64} : (f32, f32) -> f32 + %388 = load %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = "accv.bin_op"(%388, %387) {predicate = 0 : i64} : (f32, f32) -> f32 + store %389, %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = load %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %390, %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_61 = constant 2 : index + %391 = addi %arg4, %c2_61 : index + %392 = addi %arg3, %arg5 : index + %c9_62 = constant 9 : index + %393 = addi %392, %c9_62 : index + %394 = addi %arg6, %arg7 : index + %395 = load %arg0[%391, %394] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%394, %393] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = "accv.bin_op"(%395, %396) {predicate = 2 : i64} : (f32, f32) -> f32 + %398 = load %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = "accv.bin_op"(%398, %397) {predicate = 0 : i64} : (f32, f32) -> f32 + store %399, %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%391, %393] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_63 = constant 2 : index + %401 = addi %arg4, %c2_63 : index + %402 = addi %arg3, %arg5 : index + %c10_64 = constant 10 : index + %403 = addi %402, %c10_64 : index + %404 = addi %arg6, %arg7 : index + %405 = load %arg0[%401, %404] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%404, %403] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = "accv.bin_op"(%405, %406) {predicate = 2 : i64} : (f32, f32) -> f32 + %408 = load %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = "accv.bin_op"(%408, %407) {predicate = 0 : i64} : (f32, f32) -> f32 + store %409, %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_65 = constant 2 : index + %411 = addi %arg4, %c2_65 : index + %412 = addi %arg3, %arg5 : index + %c11_66 = constant 11 : index + %413 = addi %412, %c11_66 : index + %414 = addi %arg6, %arg7 : index + %415 = load %arg0[%411, %414] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %416 = load %arg1[%414, %413] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = "accv.bin_op"(%415, %416) {predicate = 2 : i64} : (f32, f32) -> f32 + %418 = load %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = "accv.bin_op"(%418, %417) {predicate = 0 : i64} : (f32, f32) -> f32 + store %419, %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = load %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %420, %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_67 = constant 2 : index + %421 = addi %arg4, %c2_67 : index + %422 = addi %arg3, %arg5 : index + %c12_68 = constant 12 : index + %423 = addi %422, %c12_68 : index + %424 = addi %arg6, %arg7 : index + %425 = load %arg0[%421, %424] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %426 = load %arg1[%424, %423] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = "accv.bin_op"(%425, %426) {predicate = 2 : i64} : (f32, f32) -> f32 + %428 = load %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = "accv.bin_op"(%428, %427) {predicate = 0 : i64} : (f32, f32) -> f32 + store %429, %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = load %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %430, %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_69 = constant 2 : index + %431 = addi %arg4, %c2_69 : index + %432 = addi %arg3, %arg5 : index + %c13_70 = constant 13 : index + %433 = addi %432, %c13_70 : index + %434 = addi %arg6, %arg7 : index + %435 = load %arg0[%431, %434] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%434, %433] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = "accv.bin_op"(%435, %436) {predicate = 2 : i64} : (f32, f32) -> f32 + %438 = load %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = "accv.bin_op"(%438, %437) {predicate = 
0 : i64} : (f32, f32) -> f32 + store %439, %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_71 = constant 2 : index + %441 = addi %arg4, %c2_71 : index + %442 = addi %arg3, %arg5 : index + %c14_72 = constant 14 : index + %443 = addi %442, %c14_72 : index + %444 = addi %arg6, %arg7 : index + %445 = load %arg0[%441, %444] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %446 = load %arg1[%444, %443] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = "accv.bin_op"(%445, %446) {predicate = 2 : i64} : (f32, f32) -> f32 + %448 = load %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = "accv.bin_op"(%448, %447) {predicate = 0 : i64} : (f32, f32) -> f32 + store %449, %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %450 = load %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %450, %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c2_73 = constant 2 : index + %451 = addi %arg4, %c2_73 : index + %452 = addi %arg3, %arg5 : index + %c15_74 = constant 15 : index + %453 = addi %452, %c15_74 : index + %454 = addi %arg6, %arg7 : index + %455 = load %arg0[%451, %454] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %456 = load %arg1[%454, %453] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = "accv.bin_op"(%455, %456) {predicate = 2 : i64} : (f32, f32) -> f32 + %458 = load %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = "accv.bin_op"(%458, %457) {predicate = 0 : i64} : (f32, f32) -> f32 + store %459, %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = load %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %460, %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_75 = constant 3 : index + %461 = addi %arg4, %c3_75 : index + %462 = addi %arg3, %arg5 : index + %463 = addi %arg6, %arg7 : index + %464 = load %arg0[%461, %463] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %465 = load %arg1[%463, %462] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %466 = "accv.bin_op"(%464, %465) {predicate = 2 : i64} : (f32, f32) -> f32 + %467 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = "accv.bin_op"(%467, %466) {predicate = 0 : i64} : (f32, f32) -> f32 + store %468, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %469, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_76 = constant 3 : index + %470 = addi %arg4, %c3_76 : index + %471 = addi %arg3, %arg5 : index + %c1_77 = constant 1 : index + %472 = addi %471, %c1_77 : index + %473 = addi %arg6, %arg7 : index + %474 = load %arg0[%470, %473] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %475 = load %arg1[%473, %472] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = "accv.bin_op"(%474, %475) {predicate = 2 : i64} : (f32, f32) -> f32 + %477 = load %arg2[%470, %472] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = "accv.bin_op"(%477, %476) {predicate = 0 : i64} : (f32, f32) -> f32 + store %478, %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = load %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %479, %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_78 = constant 3 : index + %480 = addi %arg4, %c3_78 : index + %481 = addi %arg3, %arg5 : index + %c2_79 = constant 2 : index + %482 = addi %481, %c2_79 : index + %483 = addi %arg6, %arg7 : index + %484 = load %arg0[%480, %483] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %485 = load %arg1[%483, %482] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = "accv.bin_op"(%484, %485) {predicate = 2 : i64} : (f32, f32) -> f32 + %487 = load %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = "accv.bin_op"(%487, %486) {predicate = 0 : i64} : (f32, f32) -> f32 + store %488, %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %489, %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_80 = constant 3 : index + %490 = addi %arg4, %c3_80 : index + %491 = addi %arg3, %arg5 : index + %c3_81 = constant 3 : index + %492 = addi %491, %c3_81 : index + %493 = addi %arg6, %arg7 : index + %494 = load %arg0[%490, %493] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %495 = load %arg1[%493, %492] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = "accv.bin_op"(%494, %495) {predicate = 2 : i64} : (f32, f32) -> f32 + %497 = load %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %498 = "accv.bin_op"(%497, %496) {predicate = 0 : i64} : (f32, f32) -> f32 + store %498, %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = load %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %499, %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_82 = constant 3 : index + %500 = addi %arg4, %c3_82 : index + %501 = addi %arg3, %arg5 : index + %c4_83 = constant 4 : index + %502 = addi %501, %c4_83 : index + %503 = addi %arg6, %arg7 : index + %504 = load %arg0[%500, %503] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %505 = load %arg1[%503, %502] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = "accv.bin_op"(%504, %505) {predicate = 2 : i64} : (f32, f32) -> f32 + %507 = load %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = "accv.bin_op"(%507, %506) {predicate = 0 : i64} : (f32, f32) -> f32 + store %508, %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %509 = load %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %509, %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_84 = constant 3 : index + %510 = addi %arg4, %c3_84 : index + %511 = addi %arg3, %arg5 : index + %c5_85 = constant 5 : index + %512 = addi %511, %c5_85 : index + %513 = addi %arg6, %arg7 : index + %514 = load %arg0[%510, %513] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%513, %512] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 
* 512 + d1)>> + %516 = "accv.bin_op"(%514, %515) {predicate = 2 : i64} : (f32, f32) -> f32 + %517 = load %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = "accv.bin_op"(%517, %516) {predicate = 0 : i64} : (f32, f32) -> f32 + store %518, %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_86 = constant 3 : index + %520 = addi %arg4, %c3_86 : index + %521 = addi %arg3, %arg5 : index + %c6_87 = constant 6 : index + %522 = addi %521, %c6_87 : index + %523 = addi %arg6, %arg7 : index + %524 = load %arg0[%520, %523] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %525 = load %arg1[%523, %522] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = "accv.bin_op"(%524, %525) {predicate = 2 : i64} : (f32, f32) -> f32 + %527 = load %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = "accv.bin_op"(%527, %526) {predicate = 0 : i64} : (f32, f32) -> f32 + store %528, %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %529 = load %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %529, %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_88 = constant 3 : index + %530 = addi %arg4, %c3_88 : index + %531 = addi %arg3, %arg5 : index + %c7_89 = constant 7 : index + %532 = addi %531, %c7_89 : index + %533 = addi %arg6, %arg7 : index + %534 = load %arg0[%530, %533] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %535 = load %arg1[%533, %532] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = "accv.bin_op"(%534, %535) {predicate = 2 : i64} : (f32, f32) -> f32 + %537 = load %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = "accv.bin_op"(%537, %536) {predicate = 0 : i64} : (f32, f32) -> f32 + store %538, %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %539 = load %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %539, %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_90 = constant 3 : index + %540 = addi %arg4, %c3_90 : index + %541 = addi %arg3, %arg5 : index + %c8_91 = constant 8 : index + %542 = addi %541, %c8_91 : index + %543 = addi %arg6, %arg7 : index + %544 = load %arg0[%540, %543] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%543, %542] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = "accv.bin_op"(%544, %545) {predicate = 2 : i64} : (f32, f32) -> f32 + %547 = load %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = "accv.bin_op"(%547, %546) {predicate = 0 : i64} : (f32, f32) -> f32 + store %548, %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_92 = constant 3 : index + %550 = addi %arg4, %c3_92 : index + %551 = addi %arg3, %arg5 : index + %c9_93 = constant 9 : index + %552 = addi %551, %c9_93 : index + %553 = addi %arg6, %arg7 : index + %554 = load %arg0[%550, 
%553] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %555 = load %arg1[%553, %552] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = "accv.bin_op"(%554, %555) {predicate = 2 : i64} : (f32, f32) -> f32 + %557 = load %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = "accv.bin_op"(%557, %556) {predicate = 0 : i64} : (f32, f32) -> f32 + store %558, %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = load %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %559, %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_94 = constant 3 : index + %560 = addi %arg4, %c3_94 : index + %561 = addi %arg3, %arg5 : index + %c10_95 = constant 10 : index + %562 = addi %561, %c10_95 : index + %563 = addi %arg6, %arg7 : index + %564 = load %arg0[%560, %563] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %565 = load %arg1[%563, %562] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = "accv.bin_op"(%564, %565) {predicate = 2 : i64} : (f32, f32) -> f32 + %567 = load %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = "accv.bin_op"(%567, %566) {predicate = 0 : i64} : (f32, f32) -> f32 + store %568, %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %569 = load %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %569, %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_96 = constant 3 : index + %570 = addi %arg4, %c3_96 : index + %571 = addi %arg3, %arg5 : index + %c11_97 = constant 11 : index + %572 = addi %571, %c11_97 : index + %573 = addi %arg6, %arg7 : index + %574 = load %arg0[%570, %573] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%573, %572] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = "accv.bin_op"(%574, %575) {predicate = 2 : i64} : (f32, f32) -> f32 + %577 = load %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = "accv.bin_op"(%577, %576) {predicate = 0 : i64} : (f32, f32) -> f32 + store %578, %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_98 = constant 3 : index + %580 = addi %arg4, %c3_98 : index + %581 = addi %arg3, %arg5 : index + %c12_99 = constant 12 : index + %582 = addi %581, %c12_99 : index + %583 = addi %arg6, %arg7 : index + %584 = load %arg0[%580, %583] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %585 = load %arg1[%583, %582] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = "accv.bin_op"(%584, %585) {predicate = 2 : i64} : (f32, f32) -> f32 + %587 = load %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = "accv.bin_op"(%587, %586) {predicate = 0 : i64} : (f32, f32) -> f32 + store %588, %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %589 = load %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %589, %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_100 = constant 3 : index + %590 = addi %arg4, %c3_100 : index + 
%591 = addi %arg3, %arg5 : index + %c13_101 = constant 13 : index + %592 = addi %591, %c13_101 : index + %593 = addi %arg6, %arg7 : index + %594 = load %arg0[%590, %593] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %595 = load %arg1[%593, %592] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = "accv.bin_op"(%594, %595) {predicate = 2 : i64} : (f32, f32) -> f32 + %597 = load %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %598 = "accv.bin_op"(%597, %596) {predicate = 0 : i64} : (f32, f32) -> f32 + store %598, %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %599 = load %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %599, %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_102 = constant 3 : index + %600 = addi %arg4, %c3_102 : index + %601 = addi %arg3, %arg5 : index + %c14_103 = constant 14 : index + %602 = addi %601, %c14_103 : index + %603 = addi %arg6, %arg7 : index + %604 = load %arg0[%600, %603] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %605 = load %arg1[%603, %602] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %606 = "accv.bin_op"(%604, %605) {predicate = 2 : i64} : (f32, f32) -> f32 + %607 = load %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %608 = "accv.bin_op"(%607, %606) {predicate = 0 : i64} : (f32, f32) -> f32 + store %608, %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %609 = load %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %609, %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c3_104 = constant 3 : index + %610 = addi %arg4, %c3_104 : index + %611 = addi %arg3, %arg5 : index + %c15_105 = constant 15 : index + %612 = addi %611, %c15_105 : index + %613 = addi %arg6, %arg7 : index + %614 = load %arg0[%610, %613] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %615 = load %arg1[%613, %612] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %616 = "accv.bin_op"(%614, %615) {predicate = 2 : i64} : (f32, f32) -> f32 + %617 = load %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %618 = "accv.bin_op"(%617, %616) {predicate = 0 : i64} : (f32, f32) -> f32 + store %618, %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %619 = load %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %619, %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_106 = constant 4 : index + %620 = addi %arg4, %c4_106 : index + %621 = addi %arg3, %arg5 : index + %622 = addi %arg6, %arg7 : index + %623 = load %arg0[%620, %622] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %624 = load %arg1[%622, %621] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %625 = "accv.bin_op"(%623, %624) {predicate = 2 : i64} : (f32, f32) -> f32 + %626 = load %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %627 = "accv.bin_op"(%626, %625) {predicate = 0 : i64} : (f32, f32) -> f32 + store %627, %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %628 = load %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %628, %arg2[%620, %621] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_107 = constant 4 : index + %629 = addi %arg4, %c4_107 : index + %630 = addi %arg3, %arg5 : index + %c1_108 = constant 1 : index + %631 = addi %630, %c1_108 : index + %632 = addi %arg6, %arg7 : index + %633 = load %arg0[%629, %632] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %634 = load %arg1[%632, %631] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %635 = "accv.bin_op"(%633, %634) {predicate = 2 : i64} : (f32, f32) -> f32 + %636 = load %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %637 = "accv.bin_op"(%636, %635) {predicate = 0 : i64} : (f32, f32) -> f32 + store %637, %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %638 = load %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %638, %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_109 = constant 4 : index + %639 = addi %arg4, %c4_109 : index + %640 = addi %arg3, %arg5 : index + %c2_110 = constant 2 : index + %641 = addi %640, %c2_110 : index + %642 = addi %arg6, %arg7 : index + %643 = load %arg0[%639, %642] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %644 = load %arg1[%642, %641] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %645 = "accv.bin_op"(%643, %644) {predicate = 2 : i64} : (f32, f32) -> f32 + %646 = load %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %647 = "accv.bin_op"(%646, %645) {predicate = 0 : i64} : (f32, f32) -> f32 + store %647, %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %648 = load %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %648, %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_111 = constant 4 : index + %649 = addi %arg4, %c4_111 : index + %650 = addi %arg3, %arg5 : index + %c3_112 = constant 3 : index + %651 = addi %650, %c3_112 : index + %652 = addi %arg6, %arg7 : index + %653 = load %arg0[%649, %652] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %654 = load %arg1[%652, %651] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %655 = "accv.bin_op"(%653, %654) {predicate = 2 : i64} : (f32, f32) -> f32 + %656 = load %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %657 = "accv.bin_op"(%656, %655) {predicate = 0 : i64} : (f32, f32) -> f32 + store %657, %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %658 = load %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %658, %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_113 = constant 4 : index + %659 = addi %arg4, %c4_113 : index + %660 = addi %arg3, %arg5 : index + %c4_114 = constant 4 : index + %661 = addi %660, %c4_114 : index + %662 = addi %arg6, %arg7 : index + %663 = load %arg0[%659, %662] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %664 = load %arg1[%662, %661] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %665 = "accv.bin_op"(%663, %664) {predicate = 2 : i64} : (f32, f32) -> f32 + %666 = load %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %667 = "accv.bin_op"(%666, %665) {predicate = 0 : i64} : (f32, f32) -> f32 + store %667, %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %668 = load %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %668, %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_115 = constant 4 : index + %669 = addi %arg4, %c4_115 : index + %670 = addi %arg3, %arg5 : index + %c5_116 = constant 5 : index + %671 = addi %670, %c5_116 : index + %672 = addi %arg6, %arg7 : index + %673 = load %arg0[%669, %672] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %674 = load %arg1[%672, %671] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %675 = "accv.bin_op"(%673, %674) {predicate = 2 : i64} : (f32, f32) -> f32 + %676 = load %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %677 = "accv.bin_op"(%676, %675) {predicate = 0 : i64} : (f32, f32) -> f32 + store %677, %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %678 = load %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %678, %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_117 = constant 4 : index + %679 = addi %arg4, %c4_117 : index + %680 = addi %arg3, %arg5 : index + %c6_118 = constant 6 : index + %681 = addi %680, %c6_118 : index + %682 = addi %arg6, %arg7 : index + %683 = load %arg0[%679, %682] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %684 = load %arg1[%682, %681] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %685 = "accv.bin_op"(%683, %684) {predicate = 2 : i64} : (f32, f32) -> f32 + %686 = load %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %687 = "accv.bin_op"(%686, %685) {predicate = 0 : i64} : (f32, f32) -> f32 + store %687, %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %688 = load %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %688, %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_119 = constant 4 : index + %689 = addi %arg4, %c4_119 : index + %690 = addi %arg3, %arg5 : index + %c7_120 = constant 7 : index + %691 = addi %690, %c7_120 : index + %692 = addi %arg6, %arg7 : index + %693 = load %arg0[%689, %692] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %694 = load %arg1[%692, %691] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %695 = "accv.bin_op"(%693, %694) {predicate = 2 : i64} : (f32, f32) -> f32 + %696 = load %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %697 = "accv.bin_op"(%696, %695) {predicate = 0 : i64} : (f32, f32) -> f32 + store %697, %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %698 = load %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %698, %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_121 = constant 4 : index + %699 = addi %arg4, %c4_121 : index + %700 = addi %arg3, %arg5 : index + %c8_122 = constant 8 : index + %701 = addi %700, %c8_122 : index + %702 = addi %arg6, %arg7 : index + %703 = load %arg0[%699, %702] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %704 = load %arg1[%702, %701] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %705 = "accv.bin_op"(%703, %704) {predicate = 2 : i64} : (f32, f32) -> f32 + %706 = load %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %707 = "accv.bin_op"(%706, %705) {predicate = 0 : i64} : (f32, f32) -> f32 + store %707, %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %708 = load %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %708, %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_123 = constant 4 : index + %709 = addi %arg4, %c4_123 : index + %710 = addi %arg3, %arg5 : index + %c9_124 = constant 9 : index + %711 = addi %710, %c9_124 : index + %712 = addi %arg6, %arg7 : index + %713 = load %arg0[%709, %712] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %714 = load %arg1[%712, %711] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %715 = "accv.bin_op"(%713, %714) {predicate = 2 : i64} : (f32, f32) -> f32 + %716 = load %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %717 = "accv.bin_op"(%716, %715) {predicate = 0 : i64} : (f32, f32) -> f32 + store %717, %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %718 = load %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %718, %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_125 = constant 4 : index + %719 = addi %arg4, %c4_125 : index + %720 = addi %arg3, %arg5 : index + %c10_126 = constant 10 : index + %721 = addi %720, %c10_126 : index + %722 = addi %arg6, %arg7 : index + %723 = load %arg0[%719, %722] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %724 = load %arg1[%722, %721] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %725 = "accv.bin_op"(%723, %724) {predicate = 2 : i64} : (f32, f32) -> f32 + %726 = load %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %727 = "accv.bin_op"(%726, %725) {predicate = 0 : i64} : (f32, f32) -> f32 + store %727, %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %728 = load %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %728, %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_127 = constant 4 : index + %729 = addi %arg4, %c4_127 : index + %730 = addi %arg3, %arg5 : index + %c11_128 = constant 11 : index + %731 = addi %730, %c11_128 : index + %732 = addi %arg6, %arg7 : index + %733 = load %arg0[%729, %732] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %734 = load %arg1[%732, %731] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %735 = "accv.bin_op"(%733, %734) {predicate = 2 : i64} : (f32, f32) -> f32 + %736 = load %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %737 = "accv.bin_op"(%736, %735) {predicate = 0 : i64} : (f32, f32) -> f32 + store %737, %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %738 = load %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %738, %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_129 = constant 4 : index + %739 = addi %arg4, %c4_129 : index + %740 = addi %arg3, %arg5 : index + %c12_130 = constant 12 : index + %741 = addi %740, %c12_130 : index + %742 = addi %arg6, %arg7 : index + %743 = load %arg0[%739, %742] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %744 = load %arg1[%742, %741] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %745 = "accv.bin_op"(%743, %744) {predicate = 2 : i64} : (f32, f32) -> f32 + %746 = load %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %747 = "accv.bin_op"(%746, %745) {predicate = 0 : i64} : (f32, f32) -> f32 + store %747, %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %748 = load %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %748, %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_131 = constant 4 : index + %749 = addi %arg4, %c4_131 : index + %750 = addi %arg3, %arg5 : index + %c13_132 = constant 13 : index + %751 = addi %750, %c13_132 : index + %752 = addi %arg6, %arg7 : index + %753 = load %arg0[%749, %752] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %754 = load %arg1[%752, %751] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %755 = "accv.bin_op"(%753, %754) {predicate = 2 : i64} : (f32, f32) -> f32 + %756 = load %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %757 = "accv.bin_op"(%756, %755) {predicate = 0 : i64} : (f32, f32) -> f32 + store %757, %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %758 = load %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %758, %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_133 = constant 4 : index + %759 = addi %arg4, %c4_133 : index + %760 = addi %arg3, %arg5 : index + %c14_134 = constant 14 : index + %761 = addi %760, %c14_134 : index + %762 = addi %arg6, %arg7 : index + %763 = load %arg0[%759, %762] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %764 = load %arg1[%762, %761] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %765 = "accv.bin_op"(%763, %764) {predicate = 2 : i64} : (f32, f32) -> f32 + %766 = load %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %767 = "accv.bin_op"(%766, %765) {predicate = 0 : i64} : (f32, f32) -> f32 + store %767, %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %768 = load %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %768, %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c4_135 = constant 4 : index + %769 = addi %arg4, %c4_135 : index + %770 = addi %arg3, %arg5 : index + %c15_136 = constant 15 : index + %771 = addi %770, %c15_136 : index + %772 = addi %arg6, %arg7 : index + %773 = load %arg0[%769, %772] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %774 = load %arg1[%772, %771] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %775 = "accv.bin_op"(%773, %774) {predicate = 2 : i64} : (f32, f32) -> f32 + %776 = load %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %777 = "accv.bin_op"(%776, %775) {predicate = 0 : i64} : (f32, f32) -> f32 + store %777, %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %778 = load %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %778, %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_137 = constant 5 : index + %779 = addi %arg4, %c5_137 : index + %780 = addi %arg3, %arg5 : index + %781 = addi %arg6, %arg7 : index + %782 = load %arg0[%779, %781] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 
* 128 + d1)>> + %783 = load %arg1[%781, %780] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %784 = "accv.bin_op"(%782, %783) {predicate = 2 : i64} : (f32, f32) -> f32 + %785 = load %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %786 = "accv.bin_op"(%785, %784) {predicate = 0 : i64} : (f32, f32) -> f32 + store %786, %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %787 = load %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %787, %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_138 = constant 5 : index + %788 = addi %arg4, %c5_138 : index + %789 = addi %arg3, %arg5 : index + %c1_139 = constant 1 : index + %790 = addi %789, %c1_139 : index + %791 = addi %arg6, %arg7 : index + %792 = load %arg0[%788, %791] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %793 = load %arg1[%791, %790] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %794 = "accv.bin_op"(%792, %793) {predicate = 2 : i64} : (f32, f32) -> f32 + %795 = load %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %796 = "accv.bin_op"(%795, %794) {predicate = 0 : i64} : (f32, f32) -> f32 + store %796, %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %797 = load %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %797, %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_140 = constant 5 : index + %798 = addi %arg4, %c5_140 : index + %799 = addi %arg3, %arg5 : index + %c2_141 = constant 2 : index + %800 = addi %799, %c2_141 : index + %801 = addi %arg6, %arg7 : index + %802 = load %arg0[%798, %801] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %803 = load %arg1[%801, %800] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %804 = "accv.bin_op"(%802, %803) {predicate = 2 : i64} : (f32, f32) -> f32 + %805 = load %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %806 = "accv.bin_op"(%805, %804) {predicate = 0 : i64} : (f32, f32) -> f32 + store %806, %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %807 = load %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %807, %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_142 = constant 5 : index + %808 = addi %arg4, %c5_142 : index + %809 = addi %arg3, %arg5 : index + %c3_143 = constant 3 : index + %810 = addi %809, %c3_143 : index + %811 = addi %arg6, %arg7 : index + %812 = load %arg0[%808, %811] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %813 = load %arg1[%811, %810] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %814 = "accv.bin_op"(%812, %813) {predicate = 2 : i64} : (f32, f32) -> f32 + %815 = load %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %816 = "accv.bin_op"(%815, %814) {predicate = 0 : i64} : (f32, f32) -> f32 + store %816, %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %817 = load %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %817, %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_144 = constant 5 : index + %818 = addi %arg4, %c5_144 : index + %819 = addi %arg3, %arg5 : index + %c4_145 = constant 
4 : index + %820 = addi %819, %c4_145 : index + %821 = addi %arg6, %arg7 : index + %822 = load %arg0[%818, %821] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %823 = load %arg1[%821, %820] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %824 = "accv.bin_op"(%822, %823) {predicate = 2 : i64} : (f32, f32) -> f32 + %825 = load %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %826 = "accv.bin_op"(%825, %824) {predicate = 0 : i64} : (f32, f32) -> f32 + store %826, %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %827 = load %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %827, %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_146 = constant 5 : index + %828 = addi %arg4, %c5_146 : index + %829 = addi %arg3, %arg5 : index + %c5_147 = constant 5 : index + %830 = addi %829, %c5_147 : index + %831 = addi %arg6, %arg7 : index + %832 = load %arg0[%828, %831] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %833 = load %arg1[%831, %830] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %834 = "accv.bin_op"(%832, %833) {predicate = 2 : i64} : (f32, f32) -> f32 + %835 = load %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %836 = "accv.bin_op"(%835, %834) {predicate = 0 : i64} : (f32, f32) -> f32 + store %836, %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %837 = load %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %837, %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_148 = constant 5 : index + %838 = addi %arg4, %c5_148 : index + %839 = addi %arg3, %arg5 : index + %c6_149 = constant 6 : index + %840 = addi %839, %c6_149 : index + %841 = addi %arg6, %arg7 : index + %842 = load %arg0[%838, %841] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %843 = load %arg1[%841, %840] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %844 = "accv.bin_op"(%842, %843) {predicate = 2 : i64} : (f32, f32) -> f32 + %845 = load %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %846 = "accv.bin_op"(%845, %844) {predicate = 0 : i64} : (f32, f32) -> f32 + store %846, %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %847 = load %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %847, %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_150 = constant 5 : index + %848 = addi %arg4, %c5_150 : index + %849 = addi %arg3, %arg5 : index + %c7_151 = constant 7 : index + %850 = addi %849, %c7_151 : index + %851 = addi %arg6, %arg7 : index + %852 = load %arg0[%848, %851] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %853 = load %arg1[%851, %850] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %854 = "accv.bin_op"(%852, %853) {predicate = 2 : i64} : (f32, f32) -> f32 + %855 = load %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %856 = "accv.bin_op"(%855, %854) {predicate = 0 : i64} : (f32, f32) -> f32 + store %856, %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %857 = load %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %857, %arg2[%848, %850] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_152 = constant 5 : index + %858 = addi %arg4, %c5_152 : index + %859 = addi %arg3, %arg5 : index + %c8_153 = constant 8 : index + %860 = addi %859, %c8_153 : index + %861 = addi %arg6, %arg7 : index + %862 = load %arg0[%858, %861] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %863 = load %arg1[%861, %860] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %864 = "accv.bin_op"(%862, %863) {predicate = 2 : i64} : (f32, f32) -> f32 + %865 = load %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %866 = "accv.bin_op"(%865, %864) {predicate = 0 : i64} : (f32, f32) -> f32 + store %866, %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %867 = load %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %867, %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_154 = constant 5 : index + %868 = addi %arg4, %c5_154 : index + %869 = addi %arg3, %arg5 : index + %c9_155 = constant 9 : index + %870 = addi %869, %c9_155 : index + %871 = addi %arg6, %arg7 : index + %872 = load %arg0[%868, %871] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %873 = load %arg1[%871, %870] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %874 = "accv.bin_op"(%872, %873) {predicate = 2 : i64} : (f32, f32) -> f32 + %875 = load %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %876 = "accv.bin_op"(%875, %874) {predicate = 0 : i64} : (f32, f32) -> f32 + store %876, %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %877 = load %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %877, %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_156 = constant 5 : index + %878 = addi %arg4, %c5_156 : index + %879 = addi %arg3, %arg5 : index + %c10_157 = constant 10 : index + %880 = addi %879, %c10_157 : index + %881 = addi %arg6, %arg7 : index + %882 = load %arg0[%878, %881] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %883 = load %arg1[%881, %880] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %884 = "accv.bin_op"(%882, %883) {predicate = 2 : i64} : (f32, f32) -> f32 + %885 = load %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %886 = "accv.bin_op"(%885, %884) {predicate = 0 : i64} : (f32, f32) -> f32 + store %886, %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %887 = load %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %887, %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_158 = constant 5 : index + %888 = addi %arg4, %c5_158 : index + %889 = addi %arg3, %arg5 : index + %c11_159 = constant 11 : index + %890 = addi %889, %c11_159 : index + %891 = addi %arg6, %arg7 : index + %892 = load %arg0[%888, %891] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %893 = load %arg1[%891, %890] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %894 = "accv.bin_op"(%892, %893) {predicate = 2 : i64} : (f32, f32) -> f32 + %895 = load %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %896 = "accv.bin_op"(%895, %894) {predicate = 0 : i64} : (f32, f32) -> f32 + store %896, %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %897 = load %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %897, %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_160 = constant 5 : index + %898 = addi %arg4, %c5_160 : index + %899 = addi %arg3, %arg5 : index + %c12_161 = constant 12 : index + %900 = addi %899, %c12_161 : index + %901 = addi %arg6, %arg7 : index + %902 = load %arg0[%898, %901] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %903 = load %arg1[%901, %900] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %904 = "accv.bin_op"(%902, %903) {predicate = 2 : i64} : (f32, f32) -> f32 + %905 = load %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %906 = "accv.bin_op"(%905, %904) {predicate = 0 : i64} : (f32, f32) -> f32 + store %906, %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %907 = load %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %907, %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_162 = constant 5 : index + %908 = addi %arg4, %c5_162 : index + %909 = addi %arg3, %arg5 : index + %c13_163 = constant 13 : index + %910 = addi %909, %c13_163 : index + %911 = addi %arg6, %arg7 : index + %912 = load %arg0[%908, %911] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %913 = load %arg1[%911, %910] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %914 = "accv.bin_op"(%912, %913) {predicate = 2 : i64} : (f32, f32) -> f32 + %915 = load %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %916 = "accv.bin_op"(%915, %914) {predicate = 0 : i64} : (f32, f32) -> f32 + store %916, %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %917 = load %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %917, %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_164 = constant 5 : index + %918 = addi %arg4, %c5_164 : index + %919 = addi %arg3, %arg5 : index + %c14_165 = constant 14 : index + %920 = addi %919, %c14_165 : index + %921 = addi %arg6, %arg7 : index + %922 = load %arg0[%918, %921] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %923 = load %arg1[%921, %920] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %924 = "accv.bin_op"(%922, %923) {predicate = 2 : i64} : (f32, f32) -> f32 + %925 = load %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %926 = "accv.bin_op"(%925, %924) {predicate = 0 : i64} : (f32, f32) -> f32 + store %926, %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %927 = load %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %927, %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %c5_166 = constant 5 : index + %928 = addi %arg4, %c5_166 : index + %929 = addi %arg3, %arg5 : index + %c15_167 = constant 15 : index + %930 = addi %929, %c15_167 : index + %931 = addi %arg6, %arg7 : index + %932 = load %arg0[%928, %931] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %933 = load %arg1[%931, %930] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %934 = "accv.bin_op"(%932, %933) {predicate = 2 : i64} : (f32, f32) -> f32 + %935 = load %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + %936 = "accv.bin_op"(%935, %934) {predicate = 0 : i64} : (f32, f32) -> f32 + store %936, %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %937 = load %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %937, %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + %c0_2 = constant 0 : index + %c256_3 = constant 256 : index + %c16 = constant 16 : index + scf.for %arg4 = %c0_2 to %c256_3 step %c16 { + %c0_4 = constant 0 : index + %c128 = constant 128 : index + %c4 = constant 4 : index + scf.for %arg5 = %c0_4 to %c128 step %c4 { + %c0_5 = constant 0 : index + %c4_6 = constant 4 : index + %c1 = constant 1 : index + scf.for %arg6 = %c0_5 to %c4_6 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = "accv.bin_op"(%2, %3) {predicate = 2 : i64} : (f32, f32) -> f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = "accv.bin_op"(%5, %4) {predicate = 0 : i64} : (f32, f32) -> f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %arg3, %arg4 : index + %c1_7 = constant 1 : index + %9 = addi %8, %c1_7 : index + %10 = addi %arg5, %arg6 : index + %11 = load %arg0[%c780, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg1[%10, %9] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = "accv.bin_op"(%11, %12) {predicate = 2 : i64} : (f32, f32) -> f32 + %14 = load %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = "accv.bin_op"(%14, %13) {predicate = 0 : i64} : (f32, f32) -> f32 + store %15, %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = load %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %16, %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %17 = addi %arg3, %arg4 : index + %c2 = constant 2 : index + %18 = addi %17, %c2 : index + %19 = addi %arg5, %arg6 : index + %20 = load %arg0[%c780, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg1[%19, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = "accv.bin_op"(%20, %21) {predicate = 2 : i64} : (f32, f32) -> f32 + %23 = load %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = "accv.bin_op"(%23, %22) {predicate = 0 : i64} : (f32, f32) -> f32 + store %24, %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = load %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %25, %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %26 = addi %arg3, %arg4 : index + %c3 = constant 3 : index + %27 = addi %26, %c3 : index + %28 = addi %arg5, %arg6 : index + %29 = load %arg0[%c780, %28] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg1[%28, %27] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = 
"accv.bin_op"(%29, %30) {predicate = 2 : i64} : (f32, f32) -> f32 + %32 = load %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %33 = "accv.bin_op"(%32, %31) {predicate = 0 : i64} : (f32, f32) -> f32 + store %33, %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = load %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %34, %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = addi %arg3, %arg4 : index + %c4_8 = constant 4 : index + %36 = addi %35, %c4_8 : index + %37 = addi %arg5, %arg6 : index + %38 = load %arg0[%c780, %37] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %39 = load %arg1[%37, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = "accv.bin_op"(%38, %39) {predicate = 2 : i64} : (f32, f32) -> f32 + %41 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = "accv.bin_op"(%41, %40) {predicate = 0 : i64} : (f32, f32) -> f32 + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %43, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = addi %arg3, %arg4 : index + %c5 = constant 5 : index + %45 = addi %44, %c5 : index + %46 = addi %arg5, %arg6 : index + %47 = load %arg0[%c780, %46] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %48 = load %arg1[%46, %45] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = "accv.bin_op"(%47, %48) {predicate = 2 : i64} : (f32, f32) -> f32 + %50 = load %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %51 = "accv.bin_op"(%50, %49) {predicate = 0 : i64} : (f32, f32) -> f32 + store %51, %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = load %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %52, %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = addi %arg3, %arg4 : index + %c6_9 = constant 6 : index + %54 = addi %53, %c6_9 : index + %55 = addi %arg5, %arg6 : index + %56 = load %arg0[%c780, %55] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %57 = load %arg1[%55, %54] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %58 = "accv.bin_op"(%56, %57) {predicate = 2 : i64} : (f32, f32) -> f32 + %59 = load %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = "accv.bin_op"(%59, %58) {predicate = 0 : i64} : (f32, f32) -> f32 + store %60, %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %61 = load %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %61, %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addi %arg3, %arg4 : index + %c7 = constant 7 : index + %63 = addi %62, %c7 : index + %64 = addi %arg5, %arg6 : index + %65 = load %arg0[%c780, %64] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%64, %63] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = "accv.bin_op"(%65, %66) {predicate = 2 : i64} : (f32, f32) -> f32 + %68 = load %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = "accv.bin_op"(%68, %67) {predicate = 0 
: i64} : (f32, f32) -> f32 + store %69, %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %arg3, %arg4 : index + %c8 = constant 8 : index + %72 = addi %71, %c8 : index + %73 = addi %arg5, %arg6 : index + %74 = load %arg0[%c780, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = "accv.bin_op"(%74, %75) {predicate = 2 : i64} : (f32, f32) -> f32 + %77 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = "accv.bin_op"(%77, %76) {predicate = 0 : i64} : (f32, f32) -> f32 + store %78, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = addi %arg3, %arg4 : index + %c9 = constant 9 : index + %81 = addi %80, %c9 : index + %82 = addi %arg5, %arg6 : index + %83 = load %arg0[%c780, %82] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %84 = load %arg1[%82, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = "accv.bin_op"(%83, %84) {predicate = 2 : i64} : (f32, f32) -> f32 + %86 = load %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = "accv.bin_op"(%86, %85) {predicate = 0 : i64} : (f32, f32) -> f32 + store %87, %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = load %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %88, %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %89 = addi %arg3, %arg4 : index + %c10 = constant 10 : index + %90 = addi %89, %c10 : index + %91 = addi %arg5, %arg6 : index + %92 = load %arg0[%c780, %91] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %93 = load %arg1[%91, %90] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = "accv.bin_op"(%92, %93) {predicate = 2 : i64} : (f32, f32) -> f32 + %95 = load %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = "accv.bin_op"(%95, %94) {predicate = 0 : i64} : (f32, f32) -> f32 + store %96, %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = load %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %97, %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = addi %arg3, %arg4 : index + %c11 = constant 11 : index + %99 = addi %98, %c11 : index + %100 = addi %arg5, %arg6 : index + %101 = load %arg0[%c780, %100] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %102 = load %arg1[%100, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = "accv.bin_op"(%101, %102) {predicate = 2 : i64} : (f32, f32) -> f32 + %104 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = "accv.bin_op"(%104, %103) {predicate = 0 : i64} : (f32, f32) -> f32 + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) 
-> (d0 * 512 + d1)>> + store %106, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %107 = addi %arg3, %arg4 : index + %c12 = constant 12 : index + %108 = addi %107, %c12 : index + %109 = addi %arg5, %arg6 : index + %110 = load %arg0[%c780, %109] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %111 = load %arg1[%109, %108] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = "accv.bin_op"(%110, %111) {predicate = 2 : i64} : (f32, f32) -> f32 + %113 = load %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %114 = "accv.bin_op"(%113, %112) {predicate = 0 : i64} : (f32, f32) -> f32 + store %114, %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = load %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %115, %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = addi %arg3, %arg4 : index + %c13 = constant 13 : index + %117 = addi %116, %c13 : index + %118 = addi %arg5, %arg6 : index + %119 = load %arg0[%c780, %118] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%118, %117] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = "accv.bin_op"(%119, %120) {predicate = 2 : i64} : (f32, f32) -> f32 + %122 = load %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = "accv.bin_op"(%122, %121) {predicate = 0 : i64} : (f32, f32) -> f32 + store %123, %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = addi %arg3, %arg4 : index + %c14 = constant 14 : index + %126 = addi %125, %c14 : index + %127 = addi %arg5, %arg6 : index + %128 = load %arg0[%c780, %127] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg1[%127, %126] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = "accv.bin_op"(%128, %129) {predicate = 2 : i64} : (f32, f32) -> f32 + %131 = load %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = "accv.bin_op"(%131, %130) {predicate = 0 : i64} : (f32, f32) -> f32 + store %132, %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = load %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %133, %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = addi %arg3, %arg4 : index + %c15 = constant 15 : index + %135 = addi %134, %c15 : index + %136 = addi %arg5, %arg6 : index + %137 = load %arg0[%c780, %136] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%136, %135] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = "accv.bin_op"(%137, %138) {predicate = 2 : i64} : (f32, f32) -> f32 + %140 = load %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = "accv.bin_op"(%140, %139) {predicate = 0 : i64} : (f32, f32) -> f32 + store %141, %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %143 = addi %arg3, %arg4 : index + %144 = addi %arg5, %arg6 : index + %145 = load %arg0[%c781, %144] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %146 = load %arg1[%144, %143] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = "accv.bin_op"(%145, %146) {predicate = 2 : i64} : (f32, f32) -> f32 + %148 = load %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = "accv.bin_op"(%148, %147) {predicate = 0 : i64} : (f32, f32) -> f32 + store %149, %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %150, %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = addi %arg3, %arg4 : index + %c1_10 = constant 1 : index + %152 = addi %151, %c1_10 : index + %153 = addi %arg5, %arg6 : index + %154 = load %arg0[%c781, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg1[%153, %152] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = "accv.bin_op"(%154, %155) {predicate = 2 : i64} : (f32, f32) -> f32 + %157 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = "accv.bin_op"(%157, %156) {predicate = 0 : i64} : (f32, f32) -> f32 + store %158, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %159, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addi %arg3, %arg4 : index + %c2_11 = constant 2 : index + %161 = addi %160, %c2_11 : index + %162 = addi %arg5, %arg6 : index + %163 = load %arg0[%c781, %162] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %164 = load %arg1[%162, %161] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = "accv.bin_op"(%163, %164) {predicate = 2 : i64} : (f32, f32) -> f32 + %166 = load %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = "accv.bin_op"(%166, %165) {predicate = 0 : i64} : (f32, f32) -> f32 + store %167, %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %168, %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = addi %arg3, %arg4 : index + %c3_12 = constant 3 : index + %170 = addi %169, %c3_12 : index + %171 = addi %arg5, %arg6 : index + %172 = load %arg0[%c781, %171] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %173 = load %arg1[%171, %170] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = "accv.bin_op"(%172, %173) {predicate = 2 : i64} : (f32, f32) -> f32 + %175 = load %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = "accv.bin_op"(%175, %174) {predicate = 0 : i64} : (f32, f32) -> f32 + store %176, %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = load %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %177, %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addi %arg3, %arg4 : index + %c4_13 = constant 4 : index + %179 = addi %178, %c4_13 : index + %180 = addi %arg5, %arg6 : index + %181 = load 
%arg0[%c781, %180] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %182 = load %arg1[%180, %179] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = "accv.bin_op"(%181, %182) {predicate = 2 : i64} : (f32, f32) -> f32 + %184 = load %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = "accv.bin_op"(%184, %183) {predicate = 0 : i64} : (f32, f32) -> f32 + store %185, %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %186, %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = addi %arg3, %arg4 : index + %c5_14 = constant 5 : index + %188 = addi %187, %c5_14 : index + %189 = addi %arg5, %arg6 : index + %190 = load %arg0[%c781, %189] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %191 = load %arg1[%189, %188] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = "accv.bin_op"(%190, %191) {predicate = 2 : i64} : (f32, f32) -> f32 + %193 = load %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = "accv.bin_op"(%193, %192) {predicate = 0 : i64} : (f32, f32) -> f32 + store %194, %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = load %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %195, %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addi %arg3, %arg4 : index + %c6_15 = constant 6 : index + %197 = addi %196, %c6_15 : index + %198 = addi %arg5, %arg6 : index + %199 = load %arg0[%c781, %198] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %200 = load %arg1[%198, %197] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = "accv.bin_op"(%199, %200) {predicate = 2 : i64} : (f32, f32) -> f32 + %202 = load %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = "accv.bin_op"(%202, %201) {predicate = 0 : i64} : (f32, f32) -> f32 + store %203, %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %204, %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = addi %arg3, %arg4 : index + %c7_16 = constant 7 : index + %206 = addi %205, %c7_16 : index + %207 = addi %arg5, %arg6 : index + %208 = load %arg0[%c781, %207] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %209 = load %arg1[%207, %206] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = "accv.bin_op"(%208, %209) {predicate = 2 : i64} : (f32, f32) -> f32 + %211 = load %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = "accv.bin_op"(%211, %210) {predicate = 0 : i64} : (f32, f32) -> f32 + store %212, %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = load %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %213, %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = addi %arg3, %arg4 : index + %c8_17 = constant 8 : index + %215 = addi %214, %c8_17 : index + %216 = addi %arg5, %arg6 : index + %217 = load %arg0[%c781, %216] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load 
%arg1[%216, %215] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = "accv.bin_op"(%217, %218) {predicate = 2 : i64} : (f32, f32) -> f32 + %220 = load %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = "accv.bin_op"(%220, %219) {predicate = 0 : i64} : (f32, f32) -> f32 + store %221, %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = addi %arg3, %arg4 : index + %c9_18 = constant 9 : index + %224 = addi %223, %c9_18 : index + %225 = addi %arg5, %arg6 : index + %226 = load %arg0[%c781, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = "accv.bin_op"(%226, %227) {predicate = 2 : i64} : (f32, f32) -> f32 + %229 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = "accv.bin_op"(%229, %228) {predicate = 0 : i64} : (f32, f32) -> f32 + store %230, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = addi %arg3, %arg4 : index + %c10_19 = constant 10 : index + %233 = addi %232, %c10_19 : index + %234 = addi %arg5, %arg6 : index + %235 = load %arg0[%c781, %234] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%234, %233] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = "accv.bin_op"(%235, %236) {predicate = 2 : i64} : (f32, f32) -> f32 + %238 = load %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = "accv.bin_op"(%238, %237) {predicate = 0 : i64} : (f32, f32) -> f32 + store %239, %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = addi %arg3, %arg4 : index + %c11_20 = constant 11 : index + %242 = addi %241, %c11_20 : index + %243 = addi %arg5, %arg6 : index + %244 = load %arg0[%c781, %243] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %245 = load %arg1[%243, %242] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = "accv.bin_op"(%244, %245) {predicate = 2 : i64} : (f32, f32) -> f32 + %247 = load %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = "accv.bin_op"(%247, %246) {predicate = 0 : i64} : (f32, f32) -> f32 + store %248, %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = load %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %249, %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = addi %arg3, %arg4 : index + %c12_21 = constant 12 : index + %251 = addi %250, %c12_21 : index + %252 = addi %arg5, %arg6 : index + %253 = load %arg0[%c781, %252] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%252, %251] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = 
"accv.bin_op"(%253, %254) {predicate = 2 : i64} : (f32, f32) -> f32 + %256 = load %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = "accv.bin_op"(%256, %255) {predicate = 0 : i64} : (f32, f32) -> f32 + store %257, %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = addi %arg3, %arg4 : index + %c13_22 = constant 13 : index + %260 = addi %259, %c13_22 : index + %261 = addi %arg5, %arg6 : index + %262 = load %arg0[%c781, %261] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %263 = load %arg1[%261, %260] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = "accv.bin_op"(%262, %263) {predicate = 2 : i64} : (f32, f32) -> f32 + %265 = load %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %266 = "accv.bin_op"(%265, %264) {predicate = 0 : i64} : (f32, f32) -> f32 + store %266, %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = load %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %267, %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = addi %arg3, %arg4 : index + %c14_23 = constant 14 : index + %269 = addi %268, %c14_23 : index + %270 = addi %arg5, %arg6 : index + %271 = load %arg0[%c781, %270] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%270, %269] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = "accv.bin_op"(%271, %272) {predicate = 2 : i64} : (f32, f32) -> f32 + %274 = load %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = "accv.bin_op"(%274, %273) {predicate = 0 : i64} : (f32, f32) -> f32 + store %275, %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = addi %arg3, %arg4 : index + %c15_24 = constant 15 : index + %278 = addi %277, %c15_24 : index + %279 = addi %arg5, %arg6 : index + %280 = load %arg0[%c781, %279] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %281 = load %arg1[%279, %278] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = "accv.bin_op"(%280, %281) {predicate = 2 : i64} : (f32, f32) -> f32 + %283 = load %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %284 = "accv.bin_op"(%283, %282) {predicate = 0 : i64} : (f32, f32) -> f32 + store %284, %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = load %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %285, %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = addi %arg3, %arg4 : index + %287 = addi %arg5, %arg6 : index + %288 = load %arg0[%c782, %287] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %289 = load %arg1[%287, %286] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = "accv.bin_op"(%288, %289) {predicate = 2 : i64} : (f32, f32) -> f32 + %291 = load %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %292 = "accv.bin_op"(%291, %290) {predicate = 0 : i64} : (f32, f32) -> f32 + store %292, %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %293, %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = addi %arg3, %arg4 : index + %c1_25 = constant 1 : index + %295 = addi %294, %c1_25 : index + %296 = addi %arg5, %arg6 : index + %297 = load %arg0[%c782, %296] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %298 = load %arg1[%296, %295] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = "accv.bin_op"(%297, %298) {predicate = 2 : i64} : (f32, f32) -> f32 + %300 = load %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = "accv.bin_op"(%300, %299) {predicate = 0 : i64} : (f32, f32) -> f32 + store %301, %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %302 = load %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %302, %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addi %arg3, %arg4 : index + %c2_26 = constant 2 : index + %304 = addi %303, %c2_26 : index + %305 = addi %arg5, %arg6 : index + %306 = load %arg0[%c782, %305] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %307 = load %arg1[%305, %304] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = "accv.bin_op"(%306, %307) {predicate = 2 : i64} : (f32, f32) -> f32 + %309 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = "accv.bin_op"(%309, %308) {predicate = 0 : i64} : (f32, f32) -> f32 + store %310, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %311, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addi %arg3, %arg4 : index + %c3_27 = constant 3 : index + %313 = addi %312, %c3_27 : index + %314 = addi %arg5, %arg6 : index + %315 = load %arg0[%c782, %314] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %316 = load %arg1[%314, %313] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = "accv.bin_op"(%315, %316) {predicate = 2 : i64} : (f32, f32) -> f32 + %318 = load %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = "accv.bin_op"(%318, %317) {predicate = 0 : i64} : (f32, f32) -> f32 + store %319, %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %320, %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addi %arg3, %arg4 : index + %c4_28 = constant 4 : index + %322 = addi %321, %c4_28 : index + %323 = addi %arg5, %arg6 : index + %324 = load %arg0[%c782, %323] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %325 = load %arg1[%323, %322] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = "accv.bin_op"(%324, %325) {predicate = 2 : i64} : (f32, f32) -> f32 + %327 = load %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = "accv.bin_op"(%327, %326) {predicate = 0 : i64} : (f32, f32) -> f32 + store 
%328, %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %329, %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addi %arg3, %arg4 : index + %c5_29 = constant 5 : index + %331 = addi %330, %c5_29 : index + %332 = addi %arg5, %arg6 : index + %333 = load %arg0[%c782, %332] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %334 = load %arg1[%332, %331] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = "accv.bin_op"(%333, %334) {predicate = 2 : i64} : (f32, f32) -> f32 + %336 = load %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = "accv.bin_op"(%336, %335) {predicate = 0 : i64} : (f32, f32) -> f32 + store %337, %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %338, %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addi %arg3, %arg4 : index + %c6_30 = constant 6 : index + %340 = addi %339, %c6_30 : index + %341 = addi %arg5, %arg6 : index + %342 = load %arg0[%c782, %341] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %343 = load %arg1[%341, %340] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = "accv.bin_op"(%342, %343) {predicate = 2 : i64} : (f32, f32) -> f32 + %345 = load %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = "accv.bin_op"(%345, %344) {predicate = 0 : i64} : (f32, f32) -> f32 + store %346, %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %347, %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addi %arg3, %arg4 : index + %c7_31 = constant 7 : index + %349 = addi %348, %c7_31 : index + %350 = addi %arg5, %arg6 : index + %351 = load %arg0[%c782, %350] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %352 = load %arg1[%350, %349] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = "accv.bin_op"(%351, %352) {predicate = 2 : i64} : (f32, f32) -> f32 + %354 = load %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = "accv.bin_op"(%354, %353) {predicate = 0 : i64} : (f32, f32) -> f32 + store %355, %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %356, %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addi %arg3, %arg4 : index + %c8_32 = constant 8 : index + %358 = addi %357, %c8_32 : index + %359 = addi %arg5, %arg6 : index + %360 = load %arg0[%c782, %359] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %361 = load %arg1[%359, %358] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = "accv.bin_op"(%360, %361) {predicate = 2 : i64} : (f32, f32) -> f32 + %363 = load %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = "accv.bin_op"(%363, %362) {predicate = 0 : i64} : (f32, f32) -> f32 + store %364, %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = 
load %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %365, %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addi %arg3, %arg4 : index + %c9_33 = constant 9 : index + %367 = addi %366, %c9_33 : index + %368 = addi %arg5, %arg6 : index + %369 = load %arg0[%c782, %368] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %370 = load %arg1[%368, %367] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = "accv.bin_op"(%369, %370) {predicate = 2 : i64} : (f32, f32) -> f32 + %372 = load %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = "accv.bin_op"(%372, %371) {predicate = 0 : i64} : (f32, f32) -> f32 + store %373, %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %374, %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addi %arg3, %arg4 : index + %c10_34 = constant 10 : index + %376 = addi %375, %c10_34 : index + %377 = addi %arg5, %arg6 : index + %378 = load %arg0[%c782, %377] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %379 = load %arg1[%377, %376] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = "accv.bin_op"(%378, %379) {predicate = 2 : i64} : (f32, f32) -> f32 + %381 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = "accv.bin_op"(%381, %380) {predicate = 0 : i64} : (f32, f32) -> f32 + store %382, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %383, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addi %arg3, %arg4 : index + %c11_35 = constant 11 : index + %385 = addi %384, %c11_35 : index + %386 = addi %arg5, %arg6 : index + %387 = load %arg0[%c782, %386] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %388 = load %arg1[%386, %385] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = "accv.bin_op"(%387, %388) {predicate = 2 : i64} : (f32, f32) -> f32 + %390 = load %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = "accv.bin_op"(%390, %389) {predicate = 0 : i64} : (f32, f32) -> f32 + store %391, %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %392, %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addi %arg3, %arg4 : index + %c12_36 = constant 12 : index + %394 = addi %393, %c12_36 : index + %395 = addi %arg5, %arg6 : index + %396 = load %arg0[%c782, %395] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %397 = load %arg1[%395, %394] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = "accv.bin_op"(%396, %397) {predicate = 2 : i64} : (f32, f32) -> f32 + %399 = load %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = "accv.bin_op"(%399, %398) {predicate = 0 : i64} : (f32, f32) -> f32 + store %400, %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %401 = load %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
store %401, %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addi %arg3, %arg4 : index + %c13_37 = constant 13 : index + %403 = addi %402, %c13_37 : index + %404 = addi %arg5, %arg6 : index + %405 = load %arg0[%c782, %404] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%404, %403] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = "accv.bin_op"(%405, %406) {predicate = 2 : i64} : (f32, f32) -> f32 + %408 = load %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = "accv.bin_op"(%408, %407) {predicate = 0 : i64} : (f32, f32) -> f32 + store %409, %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = addi %arg3, %arg4 : index + %c14_38 = constant 14 : index + %412 = addi %411, %c14_38 : index + %413 = addi %arg5, %arg6 : index + %414 = load %arg0[%c782, %413] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %415 = load %arg1[%413, %412] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = "accv.bin_op"(%414, %415) {predicate = 2 : i64} : (f32, f32) -> f32 + %417 = load %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %418 = "accv.bin_op"(%417, %416) {predicate = 0 : i64} : (f32, f32) -> f32 + store %418, %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = load %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %419, %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = addi %arg3, %arg4 : index + %c15_39 = constant 15 : index + %421 = addi %420, %c15_39 : index + %422 = addi %arg5, %arg6 : index + %423 = load %arg0[%c782, %422] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%422, %421] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = "accv.bin_op"(%423, %424) {predicate = 2 : i64} : (f32, f32) -> f32 + %426 = load %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = "accv.bin_op"(%426, %425) {predicate = 0 : i64} : (f32, f32) -> f32 + store %427, %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = addi %arg3, %arg4 : index + %430 = addi %arg5, %arg6 : index + %431 = load %arg0[%c783, %430] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %432 = load %arg1[%430, %429] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = "accv.bin_op"(%431, %432) {predicate = 2 : i64} : (f32, f32) -> f32 + %434 = load %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = "accv.bin_op"(%434, %433) {predicate = 0 : i64} : (f32, f32) -> f32 + store %435, %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %436 = load %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %436, %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = addi %arg3, %arg4 : index + %c1_40 = constant 1 : 
index + %438 = addi %437, %c1_40 : index + %439 = addi %arg5, %arg6 : index + %440 = load %arg0[%c783, %439] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %441 = load %arg1[%439, %438] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %442 = "accv.bin_op"(%440, %441) {predicate = 2 : i64} : (f32, f32) -> f32 + %443 = load %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %444 = "accv.bin_op"(%443, %442) {predicate = 0 : i64} : (f32, f32) -> f32 + store %444, %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = load %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %445, %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = addi %arg3, %arg4 : index + %c2_41 = constant 2 : index + %447 = addi %446, %c2_41 : index + %448 = addi %arg5, %arg6 : index + %449 = load %arg0[%c783, %448] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %450 = load %arg1[%448, %447] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = "accv.bin_op"(%449, %450) {predicate = 2 : i64} : (f32, f32) -> f32 + %452 = load %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = "accv.bin_op"(%452, %451) {predicate = 0 : i64} : (f32, f32) -> f32 + store %453, %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %454 = load %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %454, %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = addi %arg3, %arg4 : index + %c3_42 = constant 3 : index + %456 = addi %455, %c3_42 : index + %457 = addi %arg5, %arg6 : index + %458 = load %arg0[%c783, %457] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %459 = load %arg1[%457, %456] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = "accv.bin_op"(%458, %459) {predicate = 2 : i64} : (f32, f32) -> f32 + %461 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %462 = "accv.bin_op"(%461, %460) {predicate = 0 : i64} : (f32, f32) -> f32 + store %462, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %463, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = addi %arg3, %arg4 : index + %c4_43 = constant 4 : index + %465 = addi %464, %c4_43 : index + %466 = addi %arg5, %arg6 : index + %467 = load %arg0[%c783, %466] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %468 = load %arg1[%466, %465] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = "accv.bin_op"(%467, %468) {predicate = 2 : i64} : (f32, f32) -> f32 + %470 = load %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = "accv.bin_op"(%470, %469) {predicate = 0 : i64} : (f32, f32) -> f32 + store %471, %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %472 = load %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %472, %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = addi %arg3, %arg4 : index + %c5_44 = constant 5 : index + %474 = addi %473, %c5_44 : index + %475 = addi %arg5, %arg6 : index + %476 = load 
%arg0[%c783, %475] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %477 = load %arg1[%475, %474] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = "accv.bin_op"(%476, %477) {predicate = 2 : i64} : (f32, f32) -> f32 + %479 = load %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = "accv.bin_op"(%479, %478) {predicate = 0 : i64} : (f32, f32) -> f32 + store %480, %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = load %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %481, %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = addi %arg3, %arg4 : index + %c6_45 = constant 6 : index + %483 = addi %482, %c6_45 : index + %484 = addi %arg5, %arg6 : index + %485 = load %arg0[%c783, %484] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %486 = load %arg1[%484, %483] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = "accv.bin_op"(%485, %486) {predicate = 2 : i64} : (f32, f32) -> f32 + %488 = load %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = "accv.bin_op"(%488, %487) {predicate = 0 : i64} : (f32, f32) -> f32 + store %489, %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %490 = load %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %490, %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = addi %arg3, %arg4 : index + %c7_46 = constant 7 : index + %492 = addi %491, %c7_46 : index + %493 = addi %arg5, %arg6 : index + %494 = load %arg0[%c783, %493] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %495 = load %arg1[%493, %492] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = "accv.bin_op"(%494, %495) {predicate = 2 : i64} : (f32, f32) -> f32 + %497 = load %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %498 = "accv.bin_op"(%497, %496) {predicate = 0 : i64} : (f32, f32) -> f32 + store %498, %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = load %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %499, %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = addi %arg3, %arg4 : index + %c8_47 = constant 8 : index + %501 = addi %500, %c8_47 : index + %502 = addi %arg5, %arg6 : index + %503 = load %arg0[%c783, %502] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %504 = load %arg1[%502, %501] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %505 = "accv.bin_op"(%503, %504) {predicate = 2 : i64} : (f32, f32) -> f32 + %506 = load %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = "accv.bin_op"(%506, %505) {predicate = 0 : i64} : (f32, f32) -> f32 + store %507, %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %508, %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %509 = addi %arg3, %arg4 : index + %c9_48 = constant 9 : index + %510 = addi %509, %c9_48 : index + %511 = addi %arg5, %arg6 : index + %512 = load %arg0[%c783, %511] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %513 = load 
%arg1[%511, %510] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = "accv.bin_op"(%512, %513) {predicate = 2 : i64} : (f32, f32) -> f32 + %515 = load %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = "accv.bin_op"(%515, %514) {predicate = 0 : i64} : (f32, f32) -> f32 + store %516, %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %517 = load %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %517, %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addi %arg3, %arg4 : index + %c10_49 = constant 10 : index + %519 = addi %518, %c10_49 : index + %520 = addi %arg5, %arg6 : index + %521 = load %arg0[%c783, %520] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %522 = load %arg1[%520, %519] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %523 = "accv.bin_op"(%521, %522) {predicate = 2 : i64} : (f32, f32) -> f32 + %524 = load %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = "accv.bin_op"(%524, %523) {predicate = 0 : i64} : (f32, f32) -> f32 + store %525, %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %526, %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %527 = addi %arg3, %arg4 : index + %c11_50 = constant 11 : index + %528 = addi %527, %c11_50 : index + %529 = addi %arg5, %arg6 : index + %530 = load %arg0[%c783, %529] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %531 = load %arg1[%529, %528] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = "accv.bin_op"(%530, %531) {predicate = 2 : i64} : (f32, f32) -> f32 + %533 = load %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = "accv.bin_op"(%533, %532) {predicate = 0 : i64} : (f32, f32) -> f32 + store %534, %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %535 = load %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %535, %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addi %arg3, %arg4 : index + %c12_51 = constant 12 : index + %537 = addi %536, %c12_51 : index + %538 = addi %arg5, %arg6 : index + %539 = load %arg0[%c783, %538] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %540 = load %arg1[%538, %537] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %541 = "accv.bin_op"(%539, %540) {predicate = 2 : i64} : (f32, f32) -> f32 + %542 = load %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = "accv.bin_op"(%542, %541) {predicate = 0 : i64} : (f32, f32) -> f32 + store %543, %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %544, %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %545 = addi %arg3, %arg4 : index + %c13_52 = constant 13 : index + %546 = addi %545, %c13_52 : index + %547 = addi %arg5, %arg6 : index + %548 = load %arg0[%c783, %547] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %549 = load %arg1[%547, %546] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%550 = "accv.bin_op"(%548, %549) {predicate = 2 : i64} : (f32, f32) -> f32 + %551 = load %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = "accv.bin_op"(%551, %550) {predicate = 0 : i64} : (f32, f32) -> f32 + store %552, %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %553 = load %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %553, %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addi %arg3, %arg4 : index + %c14_53 = constant 14 : index + %555 = addi %554, %c14_53 : index + %556 = addi %arg5, %arg6 : index + %557 = load %arg0[%c783, %556] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %558 = load %arg1[%556, %555] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = "accv.bin_op"(%557, %558) {predicate = 2 : i64} : (f32, f32) -> f32 + %560 = load %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = "accv.bin_op"(%560, %559) {predicate = 0 : i64} : (f32, f32) -> f32 + store %561, %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %562, %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %563 = addi %arg3, %arg4 : index + %c15_54 = constant 15 : index + %564 = addi %563, %c15_54 : index + %565 = addi %arg5, %arg6 : index + %566 = load %arg0[%c783, %565] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %567 = load %arg1[%565, %564] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = "accv.bin_op"(%566, %567) {predicate = 2 : i64} : (f32, f32) -> f32 + %569 = load %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = "accv.bin_op"(%569, %568) {predicate = 0 : i64} : (f32, f32) -> f32 + store %570, %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %571 = load %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %571, %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/8_ConvertAffineToStandard.mlir b/Tutorials/optimized_matmul/mlir/8_ConvertAffineToStandard.mlir new file mode 100644 index 00000000..96ef8b49 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/8_ConvertAffineToStandard.mlir @@ -0,0 +1,17408 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + accv.module "optimized_matmul" { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = 
memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c0 = constant 0 : index + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + %c0_1 = constant 0 : index + %c512 = constant 512 : index + %c256 = constant 256 : index + scf.for %arg3 = %c0_1 to %c512 step %c256 { + %c0_2 = constant 0 : index + %c128 = constant 128 : index + %c1_3 = constant 1 : index + scf.for %arg4 = %c0_2 to %c128 step %c1_3 { + %c0_6 = constant 0 : index + %c256_7 = constant 256 : index + %c128_8 = constant 128 : index + scf.for %arg5 = %c0_6 to %c256_7 step %c128_8 { + %c0_9 = constant 0 : index + %c0_10 = constant 0 : index + %4 = cmpi "eq", %c0_10, %c0_9 : index + scf.if %4 { + %5 = addi %arg3, %arg5 : index + %6 = vector.transfer_read %arg1[%arg4, %5], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %7 = addi %arg3, %arg5 : index + %c8_11 = constant 8 : index + %8 = addi %7, %c8_11 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %10 = addi %arg3, %arg5 : index + %c16 = constant 16 : index + %11 = addi %10, %c16 : index + %12 = vector.transfer_read %arg1[%arg4, %11], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %12, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %13 = addi %arg3, %arg5 : index + %c24 = constant 24 : index + %14 = addi %13, %c24 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %16 = addi %arg3, %arg5 : index + %c32 = constant 32 : index + %17 = addi %16, %c32 : index + %18 = vector.transfer_read %arg1[%arg4, %17], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %19 = addi %arg3, %arg5 : index + %c40 = constant 40 : index + %20 = addi %19, %c40 : index + %21 = vector.transfer_read 
%arg1[%arg4, %20], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %22 = addi %arg3, %arg5 : index + %c48 = constant 48 : index + %23 = addi %22, %c48 : index + %24 = vector.transfer_read %arg1[%arg4, %23], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %24, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %25 = addi %arg3, %arg5 : index + %c56 = constant 56 : index + %26 = addi %25, %c56 : index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %28 = addi %arg3, %arg5 : index + %c64 = constant 64 : index + %29 = addi %28, %c64 : index + %30 = vector.transfer_read %arg1[%arg4, %29], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %31 = addi %arg3, %arg5 : index + %c72 = constant 72 : index + %32 = addi %31, %c72 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %34 = addi %arg3, %arg5 : index + %c80 = constant 80 : index + %35 = addi %34, %c80 : index + %36 = vector.transfer_read %arg1[%arg4, %35], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %36, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %37 = addi %arg3, %arg5 : index + %c88 = constant 88 : index + %38 = addi %37, %c88 : index + %39 = vector.transfer_read %arg1[%arg4, %38], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %39, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %40 = addi %arg3, %arg5 : index + %c96 = constant 96 : index + %41 = addi %40, %c96 : index + %42 = vector.transfer_read %arg1[%arg4, %41], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %43 = addi %arg3, %arg5 : index + %c104 = constant 104 : index + %44 = addi %43, %c104 : index + %45 = vector.transfer_read %arg1[%arg4, %44], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %45, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %46 = addi %arg3, %arg5 : index + %c112 = constant 112 : index + %47 = addi %46, %c112 : index + %48 = vector.transfer_read %arg1[%arg4, %47], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %48, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %49 = addi %arg3, %arg5 : index + %c120 = constant 120 : index + %50 = addi %49, %c120 : index + %51 = vector.transfer_read %arg1[%arg4, %50], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %51, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %52 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %c16_12 = constant 16 : index + %c0_13 = constant 0 : index + %c-1 = constant -1 : index + %53 = cmpi "slt", %arg5, %c0_13 : index + %54 = subi %c-1, %arg5 : index + %55 = select %53, %54, %arg5 : index + %56 = divi_signed %55, %c16_12 : index + %57 = subi %c-1, %56 : index + %58 = select %53, %57, %56 : index + 
%c16_14 = constant 16 : index + %59 = remi_signed %58, %c16_14 : index + %c0_15 = constant 0 : index + %60 = cmpi "slt", %59, %c0_15 : index + %61 = addi %59, %c16_14 : index + %62 = select %60, %61, %59 : index + %c128_16 = constant 128 : index + %63 = remi_signed %arg4, %c128_16 : index + %c0_17 = constant 0 : index + %64 = cmpi "slt", %63, %c0_17 : index + %65 = addi %63, %c128_16 : index + %66 = select %64, %65, %63 : index + %c16_18 = constant 16 : index + %67 = remi_signed %arg5, %c16_18 : index + %c0_19 = constant 0 : index + %68 = cmpi "slt", %67, %c0_19 : index + %69 = addi %67, %c16_18 : index + %70 = select %68, %69, %67 : index + %c8_20 = constant 8 : index + %c0_21 = constant 0 : index + %c-1_22 = constant -1 : index + %71 = cmpi "slt", %70, %c0_21 : index + %72 = subi %c-1_22, %70 : index + %73 = select %71, %72, %70 : index + %74 = divi_signed %73, %c8_20 : index + %75 = subi %c-1_22, %74 : index + %76 = select %71, %75, %74 : index + %c2_23 = constant 2 : index + %77 = remi_signed %76, %c2_23 : index + %c0_24 = constant 0 : index + %78 = cmpi "slt", %77, %c0_24 : index + %79 = addi %77, %c2_23 : index + %80 = select %78, %79, %77 : index + store %52, %3[%62, %66, %80] : memref<16x128x2xvector<8xf32>> + %81 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %c8_25 = constant 8 : index + %82 = addi %arg5, %c8_25 : index + %c16_26 = constant 16 : index + %c0_27 = constant 0 : index + %c-1_28 = constant -1 : index + %83 = cmpi "slt", %82, %c0_27 : index + %84 = subi %c-1_28, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c16_26 : index + %87 = subi %c-1_28, %86 : index + %88 = select %83, %87, %86 : index + %c16_29 = constant 16 : index + %89 = remi_signed %88, %c16_29 : index + %c0_30 = constant 0 : index + %90 = cmpi "slt", %89, %c0_30 : index + %91 = addi %89, %c16_29 : index + %92 = select %90, %91, %89 : index + %c128_31 = constant 128 : index + %93 = remi_signed %arg4, %c128_31 : index + %c0_32 = constant 0 : index + %94 = cmpi "slt", %93, %c0_32 : index + %95 = addi %93, %c128_31 : index + %96 = select %94, %95, %93 : index + %c8_33 = constant 8 : index + %c0_34 = constant 0 : index + %c-1_35 = constant -1 : index + %97 = cmpi "slt", %arg5, %c0_34 : index + %98 = subi %c-1_35, %arg5 : index + %99 = select %97, %98, %arg5 : index + %100 = divi_signed %99, %c8_33 : index + %101 = subi %c-1_35, %100 : index + %102 = select %97, %101, %100 : index + %c8_36 = constant 8 : index + %103 = addi %arg5, %c8_36 : index + %c16_37 = constant 16 : index + %c0_38 = constant 0 : index + %c-1_39 = constant -1 : index + %104 = cmpi "slt", %103, %c0_38 : index + %105 = subi %c-1_39, %103 : index + %106 = select %104, %105, %103 : index + %107 = divi_signed %106, %c16_37 : index + %108 = subi %c-1_39, %107 : index + %109 = select %104, %108, %107 : index + %c-2 = constant -2 : index + %110 = muli %109, %c-2 : index + %111 = addi %102, %110 : index + %c8_40 = constant 8 : index + %c0_41 = constant 0 : index + %c-1_42 = constant -1 : index + %112 = cmpi "slt", %arg5, %c0_41 : index + %113 = subi %c-1_42, %arg5 : index + %114 = select %112, %113, %arg5 : index + %115 = divi_signed %114, %c8_40 : index + %116 = subi %c-1_42, %115 : index + %117 = select %112, %116, %115 : index + %c8_43 = constant 8 : index + %118 = addi %arg5, %c8_43 : index + %c16_44 = constant 16 : index + %c0_45 = constant 0 : index + %c-1_46 = constant -1 : index + %119 = cmpi "slt", %118, %c0_45 : index + %120 = subi %c-1_46, %118 : index + %121 = select %119, %120, %118 : index + %122 = 
divi_signed %121, %c16_44 : index + %123 = subi %c-1_46, %122 : index + %124 = select %119, %123, %122 : index + %c-2_47 = constant -2 : index + %125 = muli %124, %c-2_47 : index + %126 = addi %117, %125 : index + %c1_48 = constant 1 : index + %127 = addi %126, %c1_48 : index + %c2_49 = constant 2 : index + %c0_50 = constant 0 : index + %c-1_51 = constant -1 : index + %128 = cmpi "slt", %127, %c0_50 : index + %129 = subi %c-1_51, %127 : index + %130 = select %128, %129, %127 : index + %131 = divi_signed %130, %c2_49 : index + %132 = subi %c-1_51, %131 : index + %133 = select %128, %132, %131 : index + %c-2_52 = constant -2 : index + %134 = muli %133, %c-2_52 : index + %135 = addi %111, %134 : index + %c1_53 = constant 1 : index + %136 = addi %135, %c1_53 : index + store %81, %3[%92, %96, %136] : memref<16x128x2xvector<8xf32>> + %137 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %c16_54 = constant 16 : index + %c0_55 = constant 0 : index + %c-1_56 = constant -1 : index + %138 = cmpi "slt", %arg5, %c0_55 : index + %139 = subi %c-1_56, %arg5 : index + %140 = select %138, %139, %arg5 : index + %141 = divi_signed %140, %c16_54 : index + %142 = subi %c-1_56, %141 : index + %143 = select %138, %142, %141 : index + %c16_57 = constant 16 : index + %c0_58 = constant 0 : index + %c-1_59 = constant -1 : index + %144 = cmpi "slt", %arg5, %c0_58 : index + %145 = subi %c-1_59, %arg5 : index + %146 = select %144, %145, %arg5 : index + %147 = divi_signed %146, %c16_57 : index + %148 = subi %c-1_59, %147 : index + %149 = select %144, %148, %147 : index + %c1_60 = constant 1 : index + %150 = addi %149, %c1_60 : index + %c16_61 = constant 16 : index + %c0_62 = constant 0 : index + %c-1_63 = constant -1 : index + %151 = cmpi "slt", %150, %c0_62 : index + %152 = subi %c-1_63, %150 : index + %153 = select %151, %152, %150 : index + %154 = divi_signed %153, %c16_61 : index + %155 = subi %c-1_63, %154 : index + %156 = select %151, %155, %154 : index + %c-16 = constant -16 : index + %157 = muli %156, %c-16 : index + %158 = addi %143, %157 : index + %c1_64 = constant 1 : index + %159 = addi %158, %c1_64 : index + %c128_65 = constant 128 : index + %160 = remi_signed %arg4, %c128_65 : index + %c0_66 = constant 0 : index + %161 = cmpi "slt", %160, %c0_66 : index + %162 = addi %160, %c128_65 : index + %163 = select %161, %162, %160 : index + %c16_67 = constant 16 : index + %164 = remi_signed %arg5, %c16_67 : index + %c0_68 = constant 0 : index + %165 = cmpi "slt", %164, %c0_68 : index + %166 = addi %164, %c16_67 : index + %167 = select %165, %166, %164 : index + %c8_69 = constant 8 : index + %c0_70 = constant 0 : index + %c-1_71 = constant -1 : index + %168 = cmpi "slt", %167, %c0_70 : index + %169 = subi %c-1_71, %167 : index + %170 = select %168, %169, %167 : index + %171 = divi_signed %170, %c8_69 : index + %172 = subi %c-1_71, %171 : index + %173 = select %168, %172, %171 : index + %c2_72 = constant 2 : index + %174 = remi_signed %173, %c2_72 : index + %c0_73 = constant 0 : index + %175 = cmpi "slt", %174, %c0_73 : index + %176 = addi %174, %c2_72 : index + %177 = select %175, %176, %174 : index + store %137, %3[%159, %163, %177] : memref<16x128x2xvector<8xf32>> + %178 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %c24_74 = constant 24 : index + %179 = addi %arg5, %c24_74 : index + %c16_75 = constant 16 : index + %c0_76 = constant 0 : index + %c-1_77 = constant -1 : index + %180 = cmpi "slt", %179, %c0_76 : index + %181 = subi %c-1_77, %179 : index + %182 = select %180, %181, %179 : index + %183 = 
divi_signed %182, %c16_75 : index + %184 = subi %c-1_77, %183 : index + %185 = select %180, %184, %183 : index + %c16_78 = constant 16 : index + %186 = remi_signed %185, %c16_78 : index + %c0_79 = constant 0 : index + %187 = cmpi "slt", %186, %c0_79 : index + %188 = addi %186, %c16_78 : index + %189 = select %187, %188, %186 : index + %c128_80 = constant 128 : index + %190 = remi_signed %arg4, %c128_80 : index + %c0_81 = constant 0 : index + %191 = cmpi "slt", %190, %c0_81 : index + %192 = addi %190, %c128_80 : index + %193 = select %191, %192, %190 : index + %c8_82 = constant 8 : index + %c0_83 = constant 0 : index + %c-1_84 = constant -1 : index + %194 = cmpi "slt", %arg5, %c0_83 : index + %195 = subi %c-1_84, %arg5 : index + %196 = select %194, %195, %arg5 : index + %197 = divi_signed %196, %c8_82 : index + %198 = subi %c-1_84, %197 : index + %199 = select %194, %198, %197 : index + %c24_85 = constant 24 : index + %200 = addi %arg5, %c24_85 : index + %c16_86 = constant 16 : index + %c0_87 = constant 0 : index + %c-1_88 = constant -1 : index + %201 = cmpi "slt", %200, %c0_87 : index + %202 = subi %c-1_88, %200 : index + %203 = select %201, %202, %200 : index + %204 = divi_signed %203, %c16_86 : index + %205 = subi %c-1_88, %204 : index + %206 = select %201, %205, %204 : index + %c-2_89 = constant -2 : index + %207 = muli %206, %c-2_89 : index + %208 = addi %199, %207 : index + %c8_90 = constant 8 : index + %c0_91 = constant 0 : index + %c-1_92 = constant -1 : index + %209 = cmpi "slt", %arg5, %c0_91 : index + %210 = subi %c-1_92, %arg5 : index + %211 = select %209, %210, %arg5 : index + %212 = divi_signed %211, %c8_90 : index + %213 = subi %c-1_92, %212 : index + %214 = select %209, %213, %212 : index + %c24_93 = constant 24 : index + %215 = addi %arg5, %c24_93 : index + %c16_94 = constant 16 : index + %c0_95 = constant 0 : index + %c-1_96 = constant -1 : index + %216 = cmpi "slt", %215, %c0_95 : index + %217 = subi %c-1_96, %215 : index + %218 = select %216, %217, %215 : index + %219 = divi_signed %218, %c16_94 : index + %220 = subi %c-1_96, %219 : index + %221 = select %216, %220, %219 : index + %c-2_97 = constant -2 : index + %222 = muli %221, %c-2_97 : index + %223 = addi %214, %222 : index + %c3_98 = constant 3 : index + %224 = addi %223, %c3_98 : index + %c2_99 = constant 2 : index + %c0_100 = constant 0 : index + %c-1_101 = constant -1 : index + %225 = cmpi "slt", %224, %c0_100 : index + %226 = subi %c-1_101, %224 : index + %227 = select %225, %226, %224 : index + %228 = divi_signed %227, %c2_99 : index + %229 = subi %c-1_101, %228 : index + %230 = select %225, %229, %228 : index + %c-2_102 = constant -2 : index + %231 = muli %230, %c-2_102 : index + %232 = addi %208, %231 : index + %c3_103 = constant 3 : index + %233 = addi %232, %c3_103 : index + store %178, %3[%189, %193, %233] : memref<16x128x2xvector<8xf32>> + %234 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %c16_104 = constant 16 : index + %c0_105 = constant 0 : index + %c-1_106 = constant -1 : index + %235 = cmpi "slt", %arg5, %c0_105 : index + %236 = subi %c-1_106, %arg5 : index + %237 = select %235, %236, %arg5 : index + %238 = divi_signed %237, %c16_104 : index + %239 = subi %c-1_106, %238 : index + %240 = select %235, %239, %238 : index + %c16_107 = constant 16 : index + %c0_108 = constant 0 : index + %c-1_109 = constant -1 : index + %241 = cmpi "slt", %arg5, %c0_108 : index + %242 = subi %c-1_109, %arg5 : index + %243 = select %241, %242, %arg5 : index + %244 = divi_signed %243, %c16_107 : index + %245 = subi 
%c-1_109, %244 : index + %246 = select %241, %245, %244 : index + %c2_110 = constant 2 : index + %247 = addi %246, %c2_110 : index + %c16_111 = constant 16 : index + %c0_112 = constant 0 : index + %c-1_113 = constant -1 : index + %248 = cmpi "slt", %247, %c0_112 : index + %249 = subi %c-1_113, %247 : index + %250 = select %248, %249, %247 : index + %251 = divi_signed %250, %c16_111 : index + %252 = subi %c-1_113, %251 : index + %253 = select %248, %252, %251 : index + %c-16_114 = constant -16 : index + %254 = muli %253, %c-16_114 : index + %255 = addi %240, %254 : index + %c2_115 = constant 2 : index + %256 = addi %255, %c2_115 : index + %c128_116 = constant 128 : index + %257 = remi_signed %arg4, %c128_116 : index + %c0_117 = constant 0 : index + %258 = cmpi "slt", %257, %c0_117 : index + %259 = addi %257, %c128_116 : index + %260 = select %258, %259, %257 : index + %c16_118 = constant 16 : index + %261 = remi_signed %arg5, %c16_118 : index + %c0_119 = constant 0 : index + %262 = cmpi "slt", %261, %c0_119 : index + %263 = addi %261, %c16_118 : index + %264 = select %262, %263, %261 : index + %c8_120 = constant 8 : index + %c0_121 = constant 0 : index + %c-1_122 = constant -1 : index + %265 = cmpi "slt", %264, %c0_121 : index + %266 = subi %c-1_122, %264 : index + %267 = select %265, %266, %264 : index + %268 = divi_signed %267, %c8_120 : index + %269 = subi %c-1_122, %268 : index + %270 = select %265, %269, %268 : index + %c2_123 = constant 2 : index + %271 = remi_signed %270, %c2_123 : index + %c0_124 = constant 0 : index + %272 = cmpi "slt", %271, %c0_124 : index + %273 = addi %271, %c2_123 : index + %274 = select %272, %273, %271 : index + store %234, %3[%256, %260, %274] : memref<16x128x2xvector<8xf32>> + %275 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %c40_125 = constant 40 : index + %276 = addi %arg5, %c40_125 : index + %c16_126 = constant 16 : index + %c0_127 = constant 0 : index + %c-1_128 = constant -1 : index + %277 = cmpi "slt", %276, %c0_127 : index + %278 = subi %c-1_128, %276 : index + %279 = select %277, %278, %276 : index + %280 = divi_signed %279, %c16_126 : index + %281 = subi %c-1_128, %280 : index + %282 = select %277, %281, %280 : index + %c16_129 = constant 16 : index + %283 = remi_signed %282, %c16_129 : index + %c0_130 = constant 0 : index + %284 = cmpi "slt", %283, %c0_130 : index + %285 = addi %283, %c16_129 : index + %286 = select %284, %285, %283 : index + %c128_131 = constant 128 : index + %287 = remi_signed %arg4, %c128_131 : index + %c0_132 = constant 0 : index + %288 = cmpi "slt", %287, %c0_132 : index + %289 = addi %287, %c128_131 : index + %290 = select %288, %289, %287 : index + %c8_133 = constant 8 : index + %c0_134 = constant 0 : index + %c-1_135 = constant -1 : index + %291 = cmpi "slt", %arg5, %c0_134 : index + %292 = subi %c-1_135, %arg5 : index + %293 = select %291, %292, %arg5 : index + %294 = divi_signed %293, %c8_133 : index + %295 = subi %c-1_135, %294 : index + %296 = select %291, %295, %294 : index + %c40_136 = constant 40 : index + %297 = addi %arg5, %c40_136 : index + %c16_137 = constant 16 : index + %c0_138 = constant 0 : index + %c-1_139 = constant -1 : index + %298 = cmpi "slt", %297, %c0_138 : index + %299 = subi %c-1_139, %297 : index + %300 = select %298, %299, %297 : index + %301 = divi_signed %300, %c16_137 : index + %302 = subi %c-1_139, %301 : index + %303 = select %298, %302, %301 : index + %c-2_140 = constant -2 : index + %304 = muli %303, %c-2_140 : index + %305 = addi %296, %304 : index + %c8_141 = constant 8 : index + 
%c0_142 = constant 0 : index + %c-1_143 = constant -1 : index + %306 = cmpi "slt", %arg5, %c0_142 : index + %307 = subi %c-1_143, %arg5 : index + %308 = select %306, %307, %arg5 : index + %309 = divi_signed %308, %c8_141 : index + %310 = subi %c-1_143, %309 : index + %311 = select %306, %310, %309 : index + %c40_144 = constant 40 : index + %312 = addi %arg5, %c40_144 : index + %c16_145 = constant 16 : index + %c0_146 = constant 0 : index + %c-1_147 = constant -1 : index + %313 = cmpi "slt", %312, %c0_146 : index + %314 = subi %c-1_147, %312 : index + %315 = select %313, %314, %312 : index + %316 = divi_signed %315, %c16_145 : index + %317 = subi %c-1_147, %316 : index + %318 = select %313, %317, %316 : index + %c-2_148 = constant -2 : index + %319 = muli %318, %c-2_148 : index + %320 = addi %311, %319 : index + %c5_149 = constant 5 : index + %321 = addi %320, %c5_149 : index + %c2_150 = constant 2 : index + %c0_151 = constant 0 : index + %c-1_152 = constant -1 : index + %322 = cmpi "slt", %321, %c0_151 : index + %323 = subi %c-1_152, %321 : index + %324 = select %322, %323, %321 : index + %325 = divi_signed %324, %c2_150 : index + %326 = subi %c-1_152, %325 : index + %327 = select %322, %326, %325 : index + %c-2_153 = constant -2 : index + %328 = muli %327, %c-2_153 : index + %329 = addi %305, %328 : index + %c5_154 = constant 5 : index + %330 = addi %329, %c5_154 : index + store %275, %3[%286, %290, %330] : memref<16x128x2xvector<8xf32>> + %331 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %c16_155 = constant 16 : index + %c0_156 = constant 0 : index + %c-1_157 = constant -1 : index + %332 = cmpi "slt", %arg5, %c0_156 : index + %333 = subi %c-1_157, %arg5 : index + %334 = select %332, %333, %arg5 : index + %335 = divi_signed %334, %c16_155 : index + %336 = subi %c-1_157, %335 : index + %337 = select %332, %336, %335 : index + %c16_158 = constant 16 : index + %c0_159 = constant 0 : index + %c-1_160 = constant -1 : index + %338 = cmpi "slt", %arg5, %c0_159 : index + %339 = subi %c-1_160, %arg5 : index + %340 = select %338, %339, %arg5 : index + %341 = divi_signed %340, %c16_158 : index + %342 = subi %c-1_160, %341 : index + %343 = select %338, %342, %341 : index + %c3_161 = constant 3 : index + %344 = addi %343, %c3_161 : index + %c16_162 = constant 16 : index + %c0_163 = constant 0 : index + %c-1_164 = constant -1 : index + %345 = cmpi "slt", %344, %c0_163 : index + %346 = subi %c-1_164, %344 : index + %347 = select %345, %346, %344 : index + %348 = divi_signed %347, %c16_162 : index + %349 = subi %c-1_164, %348 : index + %350 = select %345, %349, %348 : index + %c-16_165 = constant -16 : index + %351 = muli %350, %c-16_165 : index + %352 = addi %337, %351 : index + %c3_166 = constant 3 : index + %353 = addi %352, %c3_166 : index + %c128_167 = constant 128 : index + %354 = remi_signed %arg4, %c128_167 : index + %c0_168 = constant 0 : index + %355 = cmpi "slt", %354, %c0_168 : index + %356 = addi %354, %c128_167 : index + %357 = select %355, %356, %354 : index + %c16_169 = constant 16 : index + %358 = remi_signed %arg5, %c16_169 : index + %c0_170 = constant 0 : index + %359 = cmpi "slt", %358, %c0_170 : index + %360 = addi %358, %c16_169 : index + %361 = select %359, %360, %358 : index + %c8_171 = constant 8 : index + %c0_172 = constant 0 : index + %c-1_173 = constant -1 : index + %362 = cmpi "slt", %361, %c0_172 : index + %363 = subi %c-1_173, %361 : index + %364 = select %362, %363, %361 : index + %365 = divi_signed %364, %c8_171 : index + %366 = subi %c-1_173, %365 : index + %367 = 
select %362, %366, %365 : index + %c2_174 = constant 2 : index + %368 = remi_signed %367, %c2_174 : index + %c0_175 = constant 0 : index + %369 = cmpi "slt", %368, %c0_175 : index + %370 = addi %368, %c2_174 : index + %371 = select %369, %370, %368 : index + store %331, %3[%353, %357, %371] : memref<16x128x2xvector<8xf32>> + %372 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %c56_176 = constant 56 : index + %373 = addi %arg5, %c56_176 : index + %c16_177 = constant 16 : index + %c0_178 = constant 0 : index + %c-1_179 = constant -1 : index + %374 = cmpi "slt", %373, %c0_178 : index + %375 = subi %c-1_179, %373 : index + %376 = select %374, %375, %373 : index + %377 = divi_signed %376, %c16_177 : index + %378 = subi %c-1_179, %377 : index + %379 = select %374, %378, %377 : index + %c16_180 = constant 16 : index + %380 = remi_signed %379, %c16_180 : index + %c0_181 = constant 0 : index + %381 = cmpi "slt", %380, %c0_181 : index + %382 = addi %380, %c16_180 : index + %383 = select %381, %382, %380 : index + %c128_182 = constant 128 : index + %384 = remi_signed %arg4, %c128_182 : index + %c0_183 = constant 0 : index + %385 = cmpi "slt", %384, %c0_183 : index + %386 = addi %384, %c128_182 : index + %387 = select %385, %386, %384 : index + %c8_184 = constant 8 : index + %c0_185 = constant 0 : index + %c-1_186 = constant -1 : index + %388 = cmpi "slt", %arg5, %c0_185 : index + %389 = subi %c-1_186, %arg5 : index + %390 = select %388, %389, %arg5 : index + %391 = divi_signed %390, %c8_184 : index + %392 = subi %c-1_186, %391 : index + %393 = select %388, %392, %391 : index + %c56_187 = constant 56 : index + %394 = addi %arg5, %c56_187 : index + %c16_188 = constant 16 : index + %c0_189 = constant 0 : index + %c-1_190 = constant -1 : index + %395 = cmpi "slt", %394, %c0_189 : index + %396 = subi %c-1_190, %394 : index + %397 = select %395, %396, %394 : index + %398 = divi_signed %397, %c16_188 : index + %399 = subi %c-1_190, %398 : index + %400 = select %395, %399, %398 : index + %c-2_191 = constant -2 : index + %401 = muli %400, %c-2_191 : index + %402 = addi %393, %401 : index + %c8_192 = constant 8 : index + %c0_193 = constant 0 : index + %c-1_194 = constant -1 : index + %403 = cmpi "slt", %arg5, %c0_193 : index + %404 = subi %c-1_194, %arg5 : index + %405 = select %403, %404, %arg5 : index + %406 = divi_signed %405, %c8_192 : index + %407 = subi %c-1_194, %406 : index + %408 = select %403, %407, %406 : index + %c56_195 = constant 56 : index + %409 = addi %arg5, %c56_195 : index + %c16_196 = constant 16 : index + %c0_197 = constant 0 : index + %c-1_198 = constant -1 : index + %410 = cmpi "slt", %409, %c0_197 : index + %411 = subi %c-1_198, %409 : index + %412 = select %410, %411, %409 : index + %413 = divi_signed %412, %c16_196 : index + %414 = subi %c-1_198, %413 : index + %415 = select %410, %414, %413 : index + %c-2_199 = constant -2 : index + %416 = muli %415, %c-2_199 : index + %417 = addi %408, %416 : index + %c7_200 = constant 7 : index + %418 = addi %417, %c7_200 : index + %c2_201 = constant 2 : index + %c0_202 = constant 0 : index + %c-1_203 = constant -1 : index + %419 = cmpi "slt", %418, %c0_202 : index + %420 = subi %c-1_203, %418 : index + %421 = select %419, %420, %418 : index + %422 = divi_signed %421, %c2_201 : index + %423 = subi %c-1_203, %422 : index + %424 = select %419, %423, %422 : index + %c-2_204 = constant -2 : index + %425 = muli %424, %c-2_204 : index + %426 = addi %402, %425 : index + %c7_205 = constant 7 : index + %427 = addi %426, %c7_205 : index + store %372, 
%3[%383, %387, %427] : memref<16x128x2xvector<8xf32>> + %428 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %c16_206 = constant 16 : index + %c0_207 = constant 0 : index + %c-1_208 = constant -1 : index + %429 = cmpi "slt", %arg5, %c0_207 : index + %430 = subi %c-1_208, %arg5 : index + %431 = select %429, %430, %arg5 : index + %432 = divi_signed %431, %c16_206 : index + %433 = subi %c-1_208, %432 : index + %434 = select %429, %433, %432 : index + %c16_209 = constant 16 : index + %c0_210 = constant 0 : index + %c-1_211 = constant -1 : index + %435 = cmpi "slt", %arg5, %c0_210 : index + %436 = subi %c-1_211, %arg5 : index + %437 = select %435, %436, %arg5 : index + %438 = divi_signed %437, %c16_209 : index + %439 = subi %c-1_211, %438 : index + %440 = select %435, %439, %438 : index + %c4_212 = constant 4 : index + %441 = addi %440, %c4_212 : index + %c16_213 = constant 16 : index + %c0_214 = constant 0 : index + %c-1_215 = constant -1 : index + %442 = cmpi "slt", %441, %c0_214 : index + %443 = subi %c-1_215, %441 : index + %444 = select %442, %443, %441 : index + %445 = divi_signed %444, %c16_213 : index + %446 = subi %c-1_215, %445 : index + %447 = select %442, %446, %445 : index + %c-16_216 = constant -16 : index + %448 = muli %447, %c-16_216 : index + %449 = addi %434, %448 : index + %c4_217 = constant 4 : index + %450 = addi %449, %c4_217 : index + %c128_218 = constant 128 : index + %451 = remi_signed %arg4, %c128_218 : index + %c0_219 = constant 0 : index + %452 = cmpi "slt", %451, %c0_219 : index + %453 = addi %451, %c128_218 : index + %454 = select %452, %453, %451 : index + %c16_220 = constant 16 : index + %455 = remi_signed %arg5, %c16_220 : index + %c0_221 = constant 0 : index + %456 = cmpi "slt", %455, %c0_221 : index + %457 = addi %455, %c16_220 : index + %458 = select %456, %457, %455 : index + %c8_222 = constant 8 : index + %c0_223 = constant 0 : index + %c-1_224 = constant -1 : index + %459 = cmpi "slt", %458, %c0_223 : index + %460 = subi %c-1_224, %458 : index + %461 = select %459, %460, %458 : index + %462 = divi_signed %461, %c8_222 : index + %463 = subi %c-1_224, %462 : index + %464 = select %459, %463, %462 : index + %c2_225 = constant 2 : index + %465 = remi_signed %464, %c2_225 : index + %c0_226 = constant 0 : index + %466 = cmpi "slt", %465, %c0_226 : index + %467 = addi %465, %c2_225 : index + %468 = select %466, %467, %465 : index + store %428, %3[%450, %454, %468] : memref<16x128x2xvector<8xf32>> + %469 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %c72_227 = constant 72 : index + %470 = addi %arg5, %c72_227 : index + %c16_228 = constant 16 : index + %c0_229 = constant 0 : index + %c-1_230 = constant -1 : index + %471 = cmpi "slt", %470, %c0_229 : index + %472 = subi %c-1_230, %470 : index + %473 = select %471, %472, %470 : index + %474 = divi_signed %473, %c16_228 : index + %475 = subi %c-1_230, %474 : index + %476 = select %471, %475, %474 : index + %c16_231 = constant 16 : index + %477 = remi_signed %476, %c16_231 : index + %c0_232 = constant 0 : index + %478 = cmpi "slt", %477, %c0_232 : index + %479 = addi %477, %c16_231 : index + %480 = select %478, %479, %477 : index + %c128_233 = constant 128 : index + %481 = remi_signed %arg4, %c128_233 : index + %c0_234 = constant 0 : index + %482 = cmpi "slt", %481, %c0_234 : index + %483 = addi %481, %c128_233 : index + %484 = select %482, %483, %481 : index + %c8_235 = constant 8 : index + %c0_236 = constant 0 : index + %c-1_237 = constant -1 : index + %485 = cmpi "slt", %arg5, %c0_236 : index + %486 = subi 
%c-1_237, %arg5 : index + %487 = select %485, %486, %arg5 : index + %488 = divi_signed %487, %c8_235 : index + %489 = subi %c-1_237, %488 : index + %490 = select %485, %489, %488 : index + %c72_238 = constant 72 : index + %491 = addi %arg5, %c72_238 : index + %c16_239 = constant 16 : index + %c0_240 = constant 0 : index + %c-1_241 = constant -1 : index + %492 = cmpi "slt", %491, %c0_240 : index + %493 = subi %c-1_241, %491 : index + %494 = select %492, %493, %491 : index + %495 = divi_signed %494, %c16_239 : index + %496 = subi %c-1_241, %495 : index + %497 = select %492, %496, %495 : index + %c-2_242 = constant -2 : index + %498 = muli %497, %c-2_242 : index + %499 = addi %490, %498 : index + %c8_243 = constant 8 : index + %c0_244 = constant 0 : index + %c-1_245 = constant -1 : index + %500 = cmpi "slt", %arg5, %c0_244 : index + %501 = subi %c-1_245, %arg5 : index + %502 = select %500, %501, %arg5 : index + %503 = divi_signed %502, %c8_243 : index + %504 = subi %c-1_245, %503 : index + %505 = select %500, %504, %503 : index + %c72_246 = constant 72 : index + %506 = addi %arg5, %c72_246 : index + %c16_247 = constant 16 : index + %c0_248 = constant 0 : index + %c-1_249 = constant -1 : index + %507 = cmpi "slt", %506, %c0_248 : index + %508 = subi %c-1_249, %506 : index + %509 = select %507, %508, %506 : index + %510 = divi_signed %509, %c16_247 : index + %511 = subi %c-1_249, %510 : index + %512 = select %507, %511, %510 : index + %c-2_250 = constant -2 : index + %513 = muli %512, %c-2_250 : index + %514 = addi %505, %513 : index + %c9_251 = constant 9 : index + %515 = addi %514, %c9_251 : index + %c2_252 = constant 2 : index + %c0_253 = constant 0 : index + %c-1_254 = constant -1 : index + %516 = cmpi "slt", %515, %c0_253 : index + %517 = subi %c-1_254, %515 : index + %518 = select %516, %517, %515 : index + %519 = divi_signed %518, %c2_252 : index + %520 = subi %c-1_254, %519 : index + %521 = select %516, %520, %519 : index + %c-2_255 = constant -2 : index + %522 = muli %521, %c-2_255 : index + %523 = addi %499, %522 : index + %c9_256 = constant 9 : index + %524 = addi %523, %c9_256 : index + store %469, %3[%480, %484, %524] : memref<16x128x2xvector<8xf32>> + %525 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %c16_257 = constant 16 : index + %c0_258 = constant 0 : index + %c-1_259 = constant -1 : index + %526 = cmpi "slt", %arg5, %c0_258 : index + %527 = subi %c-1_259, %arg5 : index + %528 = select %526, %527, %arg5 : index + %529 = divi_signed %528, %c16_257 : index + %530 = subi %c-1_259, %529 : index + %531 = select %526, %530, %529 : index + %c16_260 = constant 16 : index + %c0_261 = constant 0 : index + %c-1_262 = constant -1 : index + %532 = cmpi "slt", %arg5, %c0_261 : index + %533 = subi %c-1_262, %arg5 : index + %534 = select %532, %533, %arg5 : index + %535 = divi_signed %534, %c16_260 : index + %536 = subi %c-1_262, %535 : index + %537 = select %532, %536, %535 : index + %c5_263 = constant 5 : index + %538 = addi %537, %c5_263 : index + %c16_264 = constant 16 : index + %c0_265 = constant 0 : index + %c-1_266 = constant -1 : index + %539 = cmpi "slt", %538, %c0_265 : index + %540 = subi %c-1_266, %538 : index + %541 = select %539, %540, %538 : index + %542 = divi_signed %541, %c16_264 : index + %543 = subi %c-1_266, %542 : index + %544 = select %539, %543, %542 : index + %c-16_267 = constant -16 : index + %545 = muli %544, %c-16_267 : index + %546 = addi %531, %545 : index + %c5_268 = constant 5 : index + %547 = addi %546, %c5_268 : index + %c128_269 = constant 128 : index 
+ %548 = remi_signed %arg4, %c128_269 : index + %c0_270 = constant 0 : index + %549 = cmpi "slt", %548, %c0_270 : index + %550 = addi %548, %c128_269 : index + %551 = select %549, %550, %548 : index + %c16_271 = constant 16 : index + %552 = remi_signed %arg5, %c16_271 : index + %c0_272 = constant 0 : index + %553 = cmpi "slt", %552, %c0_272 : index + %554 = addi %552, %c16_271 : index + %555 = select %553, %554, %552 : index + %c8_273 = constant 8 : index + %c0_274 = constant 0 : index + %c-1_275 = constant -1 : index + %556 = cmpi "slt", %555, %c0_274 : index + %557 = subi %c-1_275, %555 : index + %558 = select %556, %557, %555 : index + %559 = divi_signed %558, %c8_273 : index + %560 = subi %c-1_275, %559 : index + %561 = select %556, %560, %559 : index + %c2_276 = constant 2 : index + %562 = remi_signed %561, %c2_276 : index + %c0_277 = constant 0 : index + %563 = cmpi "slt", %562, %c0_277 : index + %564 = addi %562, %c2_276 : index + %565 = select %563, %564, %562 : index + store %525, %3[%547, %551, %565] : memref<16x128x2xvector<8xf32>> + %566 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %c88_278 = constant 88 : index + %567 = addi %arg5, %c88_278 : index + %c16_279 = constant 16 : index + %c0_280 = constant 0 : index + %c-1_281 = constant -1 : index + %568 = cmpi "slt", %567, %c0_280 : index + %569 = subi %c-1_281, %567 : index + %570 = select %568, %569, %567 : index + %571 = divi_signed %570, %c16_279 : index + %572 = subi %c-1_281, %571 : index + %573 = select %568, %572, %571 : index + %c16_282 = constant 16 : index + %574 = remi_signed %573, %c16_282 : index + %c0_283 = constant 0 : index + %575 = cmpi "slt", %574, %c0_283 : index + %576 = addi %574, %c16_282 : index + %577 = select %575, %576, %574 : index + %c128_284 = constant 128 : index + %578 = remi_signed %arg4, %c128_284 : index + %c0_285 = constant 0 : index + %579 = cmpi "slt", %578, %c0_285 : index + %580 = addi %578, %c128_284 : index + %581 = select %579, %580, %578 : index + %c8_286 = constant 8 : index + %c0_287 = constant 0 : index + %c-1_288 = constant -1 : index + %582 = cmpi "slt", %arg5, %c0_287 : index + %583 = subi %c-1_288, %arg5 : index + %584 = select %582, %583, %arg5 : index + %585 = divi_signed %584, %c8_286 : index + %586 = subi %c-1_288, %585 : index + %587 = select %582, %586, %585 : index + %c88_289 = constant 88 : index + %588 = addi %arg5, %c88_289 : index + %c16_290 = constant 16 : index + %c0_291 = constant 0 : index + %c-1_292 = constant -1 : index + %589 = cmpi "slt", %588, %c0_291 : index + %590 = subi %c-1_292, %588 : index + %591 = select %589, %590, %588 : index + %592 = divi_signed %591, %c16_290 : index + %593 = subi %c-1_292, %592 : index + %594 = select %589, %593, %592 : index + %c-2_293 = constant -2 : index + %595 = muli %594, %c-2_293 : index + %596 = addi %587, %595 : index + %c8_294 = constant 8 : index + %c0_295 = constant 0 : index + %c-1_296 = constant -1 : index + %597 = cmpi "slt", %arg5, %c0_295 : index + %598 = subi %c-1_296, %arg5 : index + %599 = select %597, %598, %arg5 : index + %600 = divi_signed %599, %c8_294 : index + %601 = subi %c-1_296, %600 : index + %602 = select %597, %601, %600 : index + %c88_297 = constant 88 : index + %603 = addi %arg5, %c88_297 : index + %c16_298 = constant 16 : index + %c0_299 = constant 0 : index + %c-1_300 = constant -1 : index + %604 = cmpi "slt", %603, %c0_299 : index + %605 = subi %c-1_300, %603 : index + %606 = select %604, %605, %603 : index + %607 = divi_signed %606, %c16_298 : index + %608 = subi %c-1_300, %607 : index 
+ %609 = select %604, %608, %607 : index + %c-2_301 = constant -2 : index + %610 = muli %609, %c-2_301 : index + %611 = addi %602, %610 : index + %c11_302 = constant 11 : index + %612 = addi %611, %c11_302 : index + %c2_303 = constant 2 : index + %c0_304 = constant 0 : index + %c-1_305 = constant -1 : index + %613 = cmpi "slt", %612, %c0_304 : index + %614 = subi %c-1_305, %612 : index + %615 = select %613, %614, %612 : index + %616 = divi_signed %615, %c2_303 : index + %617 = subi %c-1_305, %616 : index + %618 = select %613, %617, %616 : index + %c-2_306 = constant -2 : index + %619 = muli %618, %c-2_306 : index + %620 = addi %596, %619 : index + %c11_307 = constant 11 : index + %621 = addi %620, %c11_307 : index + store %566, %3[%577, %581, %621] : memref<16x128x2xvector<8xf32>> + %622 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %c16_308 = constant 16 : index + %c0_309 = constant 0 : index + %c-1_310 = constant -1 : index + %623 = cmpi "slt", %arg5, %c0_309 : index + %624 = subi %c-1_310, %arg5 : index + %625 = select %623, %624, %arg5 : index + %626 = divi_signed %625, %c16_308 : index + %627 = subi %c-1_310, %626 : index + %628 = select %623, %627, %626 : index + %c16_311 = constant 16 : index + %c0_312 = constant 0 : index + %c-1_313 = constant -1 : index + %629 = cmpi "slt", %arg5, %c0_312 : index + %630 = subi %c-1_313, %arg5 : index + %631 = select %629, %630, %arg5 : index + %632 = divi_signed %631, %c16_311 : index + %633 = subi %c-1_313, %632 : index + %634 = select %629, %633, %632 : index + %c6_314 = constant 6 : index + %635 = addi %634, %c6_314 : index + %c16_315 = constant 16 : index + %c0_316 = constant 0 : index + %c-1_317 = constant -1 : index + %636 = cmpi "slt", %635, %c0_316 : index + %637 = subi %c-1_317, %635 : index + %638 = select %636, %637, %635 : index + %639 = divi_signed %638, %c16_315 : index + %640 = subi %c-1_317, %639 : index + %641 = select %636, %640, %639 : index + %c-16_318 = constant -16 : index + %642 = muli %641, %c-16_318 : index + %643 = addi %628, %642 : index + %c6_319 = constant 6 : index + %644 = addi %643, %c6_319 : index + %c128_320 = constant 128 : index + %645 = remi_signed %arg4, %c128_320 : index + %c0_321 = constant 0 : index + %646 = cmpi "slt", %645, %c0_321 : index + %647 = addi %645, %c128_320 : index + %648 = select %646, %647, %645 : index + %c16_322 = constant 16 : index + %649 = remi_signed %arg5, %c16_322 : index + %c0_323 = constant 0 : index + %650 = cmpi "slt", %649, %c0_323 : index + %651 = addi %649, %c16_322 : index + %652 = select %650, %651, %649 : index + %c8_324 = constant 8 : index + %c0_325 = constant 0 : index + %c-1_326 = constant -1 : index + %653 = cmpi "slt", %652, %c0_325 : index + %654 = subi %c-1_326, %652 : index + %655 = select %653, %654, %652 : index + %656 = divi_signed %655, %c8_324 : index + %657 = subi %c-1_326, %656 : index + %658 = select %653, %657, %656 : index + %c2_327 = constant 2 : index + %659 = remi_signed %658, %c2_327 : index + %c0_328 = constant 0 : index + %660 = cmpi "slt", %659, %c0_328 : index + %661 = addi %659, %c2_327 : index + %662 = select %660, %661, %659 : index + store %622, %3[%644, %648, %662] : memref<16x128x2xvector<8xf32>> + %663 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %c104_329 = constant 104 : index + %664 = addi %arg5, %c104_329 : index + %c16_330 = constant 16 : index + %c0_331 = constant 0 : index + %c-1_332 = constant -1 : index + %665 = cmpi "slt", %664, %c0_331 : index + %666 = subi %c-1_332, %664 : index + %667 = select %665, %666, %664 : 
index + %668 = divi_signed %667, %c16_330 : index + %669 = subi %c-1_332, %668 : index + %670 = select %665, %669, %668 : index + %c16_333 = constant 16 : index + %671 = remi_signed %670, %c16_333 : index + %c0_334 = constant 0 : index + %672 = cmpi "slt", %671, %c0_334 : index + %673 = addi %671, %c16_333 : index + %674 = select %672, %673, %671 : index + %c128_335 = constant 128 : index + %675 = remi_signed %arg4, %c128_335 : index + %c0_336 = constant 0 : index + %676 = cmpi "slt", %675, %c0_336 : index + %677 = addi %675, %c128_335 : index + %678 = select %676, %677, %675 : index + %c8_337 = constant 8 : index + %c0_338 = constant 0 : index + %c-1_339 = constant -1 : index + %679 = cmpi "slt", %arg5, %c0_338 : index + %680 = subi %c-1_339, %arg5 : index + %681 = select %679, %680, %arg5 : index + %682 = divi_signed %681, %c8_337 : index + %683 = subi %c-1_339, %682 : index + %684 = select %679, %683, %682 : index + %c104_340 = constant 104 : index + %685 = addi %arg5, %c104_340 : index + %c16_341 = constant 16 : index + %c0_342 = constant 0 : index + %c-1_343 = constant -1 : index + %686 = cmpi "slt", %685, %c0_342 : index + %687 = subi %c-1_343, %685 : index + %688 = select %686, %687, %685 : index + %689 = divi_signed %688, %c16_341 : index + %690 = subi %c-1_343, %689 : index + %691 = select %686, %690, %689 : index + %c-2_344 = constant -2 : index + %692 = muli %691, %c-2_344 : index + %693 = addi %684, %692 : index + %c8_345 = constant 8 : index + %c0_346 = constant 0 : index + %c-1_347 = constant -1 : index + %694 = cmpi "slt", %arg5, %c0_346 : index + %695 = subi %c-1_347, %arg5 : index + %696 = select %694, %695, %arg5 : index + %697 = divi_signed %696, %c8_345 : index + %698 = subi %c-1_347, %697 : index + %699 = select %694, %698, %697 : index + %c104_348 = constant 104 : index + %700 = addi %arg5, %c104_348 : index + %c16_349 = constant 16 : index + %c0_350 = constant 0 : index + %c-1_351 = constant -1 : index + %701 = cmpi "slt", %700, %c0_350 : index + %702 = subi %c-1_351, %700 : index + %703 = select %701, %702, %700 : index + %704 = divi_signed %703, %c16_349 : index + %705 = subi %c-1_351, %704 : index + %706 = select %701, %705, %704 : index + %c-2_352 = constant -2 : index + %707 = muli %706, %c-2_352 : index + %708 = addi %699, %707 : index + %c13_353 = constant 13 : index + %709 = addi %708, %c13_353 : index + %c2_354 = constant 2 : index + %c0_355 = constant 0 : index + %c-1_356 = constant -1 : index + %710 = cmpi "slt", %709, %c0_355 : index + %711 = subi %c-1_356, %709 : index + %712 = select %710, %711, %709 : index + %713 = divi_signed %712, %c2_354 : index + %714 = subi %c-1_356, %713 : index + %715 = select %710, %714, %713 : index + %c-2_357 = constant -2 : index + %716 = muli %715, %c-2_357 : index + %717 = addi %693, %716 : index + %c13_358 = constant 13 : index + %718 = addi %717, %c13_358 : index + store %663, %3[%674, %678, %718] : memref<16x128x2xvector<8xf32>> + %719 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %c16_359 = constant 16 : index + %c0_360 = constant 0 : index + %c-1_361 = constant -1 : index + %720 = cmpi "slt", %arg5, %c0_360 : index + %721 = subi %c-1_361, %arg5 : index + %722 = select %720, %721, %arg5 : index + %723 = divi_signed %722, %c16_359 : index + %724 = subi %c-1_361, %723 : index + %725 = select %720, %724, %723 : index + %c16_362 = constant 16 : index + %c0_363 = constant 0 : index + %c-1_364 = constant -1 : index + %726 = cmpi "slt", %arg5, %c0_363 : index + %727 = subi %c-1_364, %arg5 : index + %728 = select %726, 
%727, %arg5 : index + %729 = divi_signed %728, %c16_362 : index + %730 = subi %c-1_364, %729 : index + %731 = select %726, %730, %729 : index + %c7_365 = constant 7 : index + %732 = addi %731, %c7_365 : index + %c16_366 = constant 16 : index + %c0_367 = constant 0 : index + %c-1_368 = constant -1 : index + %733 = cmpi "slt", %732, %c0_367 : index + %734 = subi %c-1_368, %732 : index + %735 = select %733, %734, %732 : index + %736 = divi_signed %735, %c16_366 : index + %737 = subi %c-1_368, %736 : index + %738 = select %733, %737, %736 : index + %c-16_369 = constant -16 : index + %739 = muli %738, %c-16_369 : index + %740 = addi %725, %739 : index + %c7_370 = constant 7 : index + %741 = addi %740, %c7_370 : index + %c128_371 = constant 128 : index + %742 = remi_signed %arg4, %c128_371 : index + %c0_372 = constant 0 : index + %743 = cmpi "slt", %742, %c0_372 : index + %744 = addi %742, %c128_371 : index + %745 = select %743, %744, %742 : index + %c16_373 = constant 16 : index + %746 = remi_signed %arg5, %c16_373 : index + %c0_374 = constant 0 : index + %747 = cmpi "slt", %746, %c0_374 : index + %748 = addi %746, %c16_373 : index + %749 = select %747, %748, %746 : index + %c8_375 = constant 8 : index + %c0_376 = constant 0 : index + %c-1_377 = constant -1 : index + %750 = cmpi "slt", %749, %c0_376 : index + %751 = subi %c-1_377, %749 : index + %752 = select %750, %751, %749 : index + %753 = divi_signed %752, %c8_375 : index + %754 = subi %c-1_377, %753 : index + %755 = select %750, %754, %753 : index + %c2_378 = constant 2 : index + %756 = remi_signed %755, %c2_378 : index + %c0_379 = constant 0 : index + %757 = cmpi "slt", %756, %c0_379 : index + %758 = addi %756, %c2_378 : index + %759 = select %757, %758, %756 : index + store %719, %3[%741, %745, %759] : memref<16x128x2xvector<8xf32>> + %760 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %c120_380 = constant 120 : index + %761 = addi %arg5, %c120_380 : index + %c16_381 = constant 16 : index + %c0_382 = constant 0 : index + %c-1_383 = constant -1 : index + %762 = cmpi "slt", %761, %c0_382 : index + %763 = subi %c-1_383, %761 : index + %764 = select %762, %763, %761 : index + %765 = divi_signed %764, %c16_381 : index + %766 = subi %c-1_383, %765 : index + %767 = select %762, %766, %765 : index + %c16_384 = constant 16 : index + %768 = remi_signed %767, %c16_384 : index + %c0_385 = constant 0 : index + %769 = cmpi "slt", %768, %c0_385 : index + %770 = addi %768, %c16_384 : index + %771 = select %769, %770, %768 : index + %c128_386 = constant 128 : index + %772 = remi_signed %arg4, %c128_386 : index + %c0_387 = constant 0 : index + %773 = cmpi "slt", %772, %c0_387 : index + %774 = addi %772, %c128_386 : index + %775 = select %773, %774, %772 : index + %c8_388 = constant 8 : index + %c0_389 = constant 0 : index + %c-1_390 = constant -1 : index + %776 = cmpi "slt", %arg5, %c0_389 : index + %777 = subi %c-1_390, %arg5 : index + %778 = select %776, %777, %arg5 : index + %779 = divi_signed %778, %c8_388 : index + %780 = subi %c-1_390, %779 : index + %781 = select %776, %780, %779 : index + %c120_391 = constant 120 : index + %782 = addi %arg5, %c120_391 : index + %c16_392 = constant 16 : index + %c0_393 = constant 0 : index + %c-1_394 = constant -1 : index + %783 = cmpi "slt", %782, %c0_393 : index + %784 = subi %c-1_394, %782 : index + %785 = select %783, %784, %782 : index + %786 = divi_signed %785, %c16_392 : index + %787 = subi %c-1_394, %786 : index + %788 = select %783, %787, %786 : index + %c-2_395 = constant -2 : index + %789 = muli 
%788, %c-2_395 : index + %790 = addi %781, %789 : index + %c8_396 = constant 8 : index + %c0_397 = constant 0 : index + %c-1_398 = constant -1 : index + %791 = cmpi "slt", %arg5, %c0_397 : index + %792 = subi %c-1_398, %arg5 : index + %793 = select %791, %792, %arg5 : index + %794 = divi_signed %793, %c8_396 : index + %795 = subi %c-1_398, %794 : index + %796 = select %791, %795, %794 : index + %c120_399 = constant 120 : index + %797 = addi %arg5, %c120_399 : index + %c16_400 = constant 16 : index + %c0_401 = constant 0 : index + %c-1_402 = constant -1 : index + %798 = cmpi "slt", %797, %c0_401 : index + %799 = subi %c-1_402, %797 : index + %800 = select %798, %799, %797 : index + %801 = divi_signed %800, %c16_400 : index + %802 = subi %c-1_402, %801 : index + %803 = select %798, %802, %801 : index + %c-2_403 = constant -2 : index + %804 = muli %803, %c-2_403 : index + %805 = addi %796, %804 : index + %c15_404 = constant 15 : index + %806 = addi %805, %c15_404 : index + %c2_405 = constant 2 : index + %c0_406 = constant 0 : index + %c-1_407 = constant -1 : index + %807 = cmpi "slt", %806, %c0_406 : index + %808 = subi %c-1_407, %806 : index + %809 = select %807, %808, %806 : index + %810 = divi_signed %809, %c2_405 : index + %811 = subi %c-1_407, %810 : index + %812 = select %807, %811, %810 : index + %c-2_408 = constant -2 : index + %813 = muli %812, %c-2_408 : index + %814 = addi %790, %813 : index + %c15_409 = constant 15 : index + %815 = addi %814, %c15_409 : index + store %760, %3[%771, %775, %815] : memref<16x128x2xvector<8xf32>> + } else { + %5 = addi %arg3, %arg5 : index + %6 = vector.transfer_read %arg1[%arg4, %5], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %6, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %7 = addi %arg3, %arg5 : index + %c8_11 = constant 8 : index + %8 = addi %7, %c8_11 : index + %9 = vector.transfer_read %arg1[%arg4, %8], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %9, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %10 = addi %arg3, %arg5 : index + %c16 = constant 16 : index + %11 = addi %10, %c16 : index + %12 = vector.transfer_read %arg1[%arg4, %11], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %12, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %13 = addi %arg3, %arg5 : index + %c24 = constant 24 : index + %14 = addi %13, %c24 : index + %15 = vector.transfer_read %arg1[%arg4, %14], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %15, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %16 = addi %arg3, %arg5 : index + %c32 = constant 32 : index + %17 = addi %16, %c32 : index + %18 = vector.transfer_read %arg1[%arg4, %17], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %18, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %19 = addi %arg3, %arg5 : index + %c40 = constant 40 : index + %20 = addi %19, %c40 : index + %21 = vector.transfer_read %arg1[%arg4, %20], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %21, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %22 = addi %arg3, %arg5 : index + %c48 = constant 48 : index + %23 = addi %22, %c48 : index + %24 = vector.transfer_read %arg1[%arg4, %23], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %24, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %25 = addi %arg3, %arg5 : index + %c56 = constant 56 : index + %26 = addi %25, %c56 : 
index + %27 = vector.transfer_read %arg1[%arg4, %26], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %27, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %28 = addi %arg3, %arg5 : index + %c64 = constant 64 : index + %29 = addi %28, %c64 : index + %30 = vector.transfer_read %arg1[%arg4, %29], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %30, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %31 = addi %arg3, %arg5 : index + %c72 = constant 72 : index + %32 = addi %31, %c72 : index + %33 = vector.transfer_read %arg1[%arg4, %32], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %33, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %34 = addi %arg3, %arg5 : index + %c80 = constant 80 : index + %35 = addi %34, %c80 : index + %36 = vector.transfer_read %arg1[%arg4, %35], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %36, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %37 = addi %arg3, %arg5 : index + %c88 = constant 88 : index + %38 = addi %37, %c88 : index + %39 = vector.transfer_read %arg1[%arg4, %38], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %39, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %40 = addi %arg3, %arg5 : index + %c96 = constant 96 : index + %41 = addi %40, %c96 : index + %42 = vector.transfer_read %arg1[%arg4, %41], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %42, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %43 = addi %arg3, %arg5 : index + %c104 = constant 104 : index + %44 = addi %43, %c104 : index + %45 = vector.transfer_read %arg1[%arg4, %44], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %45, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %46 = addi %arg3, %arg5 : index + %c112 = constant 112 : index + %47 = addi %46, %c112 : index + %48 = vector.transfer_read %arg1[%arg4, %47], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %48, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %49 = addi %arg3, %arg5 : index + %c120 = constant 120 : index + %50 = addi %49, %c120 : index + %51 = vector.transfer_read %arg1[%arg4, %50], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %51, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %52 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %c16_12 = constant 16 : index + %c0_13 = constant 0 : index + %c-1 = constant -1 : index + %53 = cmpi "slt", %arg5, %c0_13 : index + %54 = subi %c-1, %arg5 : index + %55 = select %53, %54, %arg5 : index + %56 = divi_signed %55, %c16_12 : index + %57 = subi %c-1, %56 : index + %58 = select %53, %57, %56 : index + %c16_14 = constant 16 : index + %59 = remi_signed %58, %c16_14 : index + %c0_15 = constant 0 : index + %60 = cmpi "slt", %59, %c0_15 : index + %61 = addi %59, %c16_14 : index + %62 = select %60, %61, %59 : index + %c128_16 = constant 128 : index + %63 = remi_signed %arg4, %c128_16 : index + %c0_17 = constant 0 : index + %64 = cmpi "slt", %63, %c0_17 : index + %65 = addi %63, %c128_16 : index + %66 = select %64, %65, %63 : index + %c16_18 = constant 16 : index + %67 = remi_signed %arg5, %c16_18 : index + %c0_19 = constant 0 : index + %68 = cmpi "slt", %67, %c0_19 : index + %69 = addi %67, %c16_18 : index + %70 = select %68, %69, %67 : index + %c8_20 = constant 8 : index + %c0_21 = constant 0 : index + %c-1_22 = constant -1 : 
index + %71 = cmpi "slt", %70, %c0_21 : index + %72 = subi %c-1_22, %70 : index + %73 = select %71, %72, %70 : index + %74 = divi_signed %73, %c8_20 : index + %75 = subi %c-1_22, %74 : index + %76 = select %71, %75, %74 : index + %c2_23 = constant 2 : index + %77 = remi_signed %76, %c2_23 : index + %c0_24 = constant 0 : index + %78 = cmpi "slt", %77, %c0_24 : index + %79 = addi %77, %c2_23 : index + %80 = select %78, %79, %77 : index + store %52, %3[%62, %66, %80] : memref<16x128x2xvector<8xf32>> + %81 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %c8_25 = constant 8 : index + %82 = addi %arg5, %c8_25 : index + %c16_26 = constant 16 : index + %c0_27 = constant 0 : index + %c-1_28 = constant -1 : index + %83 = cmpi "slt", %82, %c0_27 : index + %84 = subi %c-1_28, %82 : index + %85 = select %83, %84, %82 : index + %86 = divi_signed %85, %c16_26 : index + %87 = subi %c-1_28, %86 : index + %88 = select %83, %87, %86 : index + %c16_29 = constant 16 : index + %89 = remi_signed %88, %c16_29 : index + %c0_30 = constant 0 : index + %90 = cmpi "slt", %89, %c0_30 : index + %91 = addi %89, %c16_29 : index + %92 = select %90, %91, %89 : index + %c128_31 = constant 128 : index + %93 = remi_signed %arg4, %c128_31 : index + %c0_32 = constant 0 : index + %94 = cmpi "slt", %93, %c0_32 : index + %95 = addi %93, %c128_31 : index + %96 = select %94, %95, %93 : index + %c8_33 = constant 8 : index + %c0_34 = constant 0 : index + %c-1_35 = constant -1 : index + %97 = cmpi "slt", %arg5, %c0_34 : index + %98 = subi %c-1_35, %arg5 : index + %99 = select %97, %98, %arg5 : index + %100 = divi_signed %99, %c8_33 : index + %101 = subi %c-1_35, %100 : index + %102 = select %97, %101, %100 : index + %c8_36 = constant 8 : index + %103 = addi %arg5, %c8_36 : index + %c16_37 = constant 16 : index + %c0_38 = constant 0 : index + %c-1_39 = constant -1 : index + %104 = cmpi "slt", %103, %c0_38 : index + %105 = subi %c-1_39, %103 : index + %106 = select %104, %105, %103 : index + %107 = divi_signed %106, %c16_37 : index + %108 = subi %c-1_39, %107 : index + %109 = select %104, %108, %107 : index + %c-2 = constant -2 : index + %110 = muli %109, %c-2 : index + %111 = addi %102, %110 : index + %c8_40 = constant 8 : index + %c0_41 = constant 0 : index + %c-1_42 = constant -1 : index + %112 = cmpi "slt", %arg5, %c0_41 : index + %113 = subi %c-1_42, %arg5 : index + %114 = select %112, %113, %arg5 : index + %115 = divi_signed %114, %c8_40 : index + %116 = subi %c-1_42, %115 : index + %117 = select %112, %116, %115 : index + %c8_43 = constant 8 : index + %118 = addi %arg5, %c8_43 : index + %c16_44 = constant 16 : index + %c0_45 = constant 0 : index + %c-1_46 = constant -1 : index + %119 = cmpi "slt", %118, %c0_45 : index + %120 = subi %c-1_46, %118 : index + %121 = select %119, %120, %118 : index + %122 = divi_signed %121, %c16_44 : index + %123 = subi %c-1_46, %122 : index + %124 = select %119, %123, %122 : index + %c-2_47 = constant -2 : index + %125 = muli %124, %c-2_47 : index + %126 = addi %117, %125 : index + %c1_48 = constant 1 : index + %127 = addi %126, %c1_48 : index + %c2_49 = constant 2 : index + %c0_50 = constant 0 : index + %c-1_51 = constant -1 : index + %128 = cmpi "slt", %127, %c0_50 : index + %129 = subi %c-1_51, %127 : index + %130 = select %128, %129, %127 : index + %131 = divi_signed %130, %c2_49 : index + %132 = subi %c-1_51, %131 : index + %133 = select %128, %132, %131 : index + %c-2_52 = constant -2 : index + %134 = muli %133, %c-2_52 : index + %135 = addi %111, %134 : index + %c1_53 = constant 1 : index + 
%136 = addi %135, %c1_53 : index + store %81, %3[%92, %96, %136] : memref<16x128x2xvector<8xf32>> + %137 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %c16_54 = constant 16 : index + %c0_55 = constant 0 : index + %c-1_56 = constant -1 : index + %138 = cmpi "slt", %arg5, %c0_55 : index + %139 = subi %c-1_56, %arg5 : index + %140 = select %138, %139, %arg5 : index + %141 = divi_signed %140, %c16_54 : index + %142 = subi %c-1_56, %141 : index + %143 = select %138, %142, %141 : index + %c16_57 = constant 16 : index + %c0_58 = constant 0 : index + %c-1_59 = constant -1 : index + %144 = cmpi "slt", %arg5, %c0_58 : index + %145 = subi %c-1_59, %arg5 : index + %146 = select %144, %145, %arg5 : index + %147 = divi_signed %146, %c16_57 : index + %148 = subi %c-1_59, %147 : index + %149 = select %144, %148, %147 : index + %c1_60 = constant 1 : index + %150 = addi %149, %c1_60 : index + %c16_61 = constant 16 : index + %c0_62 = constant 0 : index + %c-1_63 = constant -1 : index + %151 = cmpi "slt", %150, %c0_62 : index + %152 = subi %c-1_63, %150 : index + %153 = select %151, %152, %150 : index + %154 = divi_signed %153, %c16_61 : index + %155 = subi %c-1_63, %154 : index + %156 = select %151, %155, %154 : index + %c-16 = constant -16 : index + %157 = muli %156, %c-16 : index + %158 = addi %143, %157 : index + %c1_64 = constant 1 : index + %159 = addi %158, %c1_64 : index + %c128_65 = constant 128 : index + %160 = remi_signed %arg4, %c128_65 : index + %c0_66 = constant 0 : index + %161 = cmpi "slt", %160, %c0_66 : index + %162 = addi %160, %c128_65 : index + %163 = select %161, %162, %160 : index + %c16_67 = constant 16 : index + %164 = remi_signed %arg5, %c16_67 : index + %c0_68 = constant 0 : index + %165 = cmpi "slt", %164, %c0_68 : index + %166 = addi %164, %c16_67 : index + %167 = select %165, %166, %164 : index + %c8_69 = constant 8 : index + %c0_70 = constant 0 : index + %c-1_71 = constant -1 : index + %168 = cmpi "slt", %167, %c0_70 : index + %169 = subi %c-1_71, %167 : index + %170 = select %168, %169, %167 : index + %171 = divi_signed %170, %c8_69 : index + %172 = subi %c-1_71, %171 : index + %173 = select %168, %172, %171 : index + %c2_72 = constant 2 : index + %174 = remi_signed %173, %c2_72 : index + %c0_73 = constant 0 : index + %175 = cmpi "slt", %174, %c0_73 : index + %176 = addi %174, %c2_72 : index + %177 = select %175, %176, %174 : index + store %137, %3[%159, %163, %177] : memref<16x128x2xvector<8xf32>> + %178 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %c24_74 = constant 24 : index + %179 = addi %arg5, %c24_74 : index + %c16_75 = constant 16 : index + %c0_76 = constant 0 : index + %c-1_77 = constant -1 : index + %180 = cmpi "slt", %179, %c0_76 : index + %181 = subi %c-1_77, %179 : index + %182 = select %180, %181, %179 : index + %183 = divi_signed %182, %c16_75 : index + %184 = subi %c-1_77, %183 : index + %185 = select %180, %184, %183 : index + %c16_78 = constant 16 : index + %186 = remi_signed %185, %c16_78 : index + %c0_79 = constant 0 : index + %187 = cmpi "slt", %186, %c0_79 : index + %188 = addi %186, %c16_78 : index + %189 = select %187, %188, %186 : index + %c128_80 = constant 128 : index + %190 = remi_signed %arg4, %c128_80 : index + %c0_81 = constant 0 : index + %191 = cmpi "slt", %190, %c0_81 : index + %192 = addi %190, %c128_80 : index + %193 = select %191, %192, %190 : index + %c8_82 = constant 8 : index + %c0_83 = constant 0 : index + %c-1_84 = constant -1 : index + %194 = cmpi "slt", %arg5, %c0_83 : index + %195 = subi %c-1_84, %arg5 : index + %196 = 
select %194, %195, %arg5 : index + %197 = divi_signed %196, %c8_82 : index + %198 = subi %c-1_84, %197 : index + %199 = select %194, %198, %197 : index + %c24_85 = constant 24 : index + %200 = addi %arg5, %c24_85 : index + %c16_86 = constant 16 : index + %c0_87 = constant 0 : index + %c-1_88 = constant -1 : index + %201 = cmpi "slt", %200, %c0_87 : index + %202 = subi %c-1_88, %200 : index + %203 = select %201, %202, %200 : index + %204 = divi_signed %203, %c16_86 : index + %205 = subi %c-1_88, %204 : index + %206 = select %201, %205, %204 : index + %c-2_89 = constant -2 : index + %207 = muli %206, %c-2_89 : index + %208 = addi %199, %207 : index + %c8_90 = constant 8 : index + %c0_91 = constant 0 : index + %c-1_92 = constant -1 : index + %209 = cmpi "slt", %arg5, %c0_91 : index + %210 = subi %c-1_92, %arg5 : index + %211 = select %209, %210, %arg5 : index + %212 = divi_signed %211, %c8_90 : index + %213 = subi %c-1_92, %212 : index + %214 = select %209, %213, %212 : index + %c24_93 = constant 24 : index + %215 = addi %arg5, %c24_93 : index + %c16_94 = constant 16 : index + %c0_95 = constant 0 : index + %c-1_96 = constant -1 : index + %216 = cmpi "slt", %215, %c0_95 : index + %217 = subi %c-1_96, %215 : index + %218 = select %216, %217, %215 : index + %219 = divi_signed %218, %c16_94 : index + %220 = subi %c-1_96, %219 : index + %221 = select %216, %220, %219 : index + %c-2_97 = constant -2 : index + %222 = muli %221, %c-2_97 : index + %223 = addi %214, %222 : index + %c3_98 = constant 3 : index + %224 = addi %223, %c3_98 : index + %c2_99 = constant 2 : index + %c0_100 = constant 0 : index + %c-1_101 = constant -1 : index + %225 = cmpi "slt", %224, %c0_100 : index + %226 = subi %c-1_101, %224 : index + %227 = select %225, %226, %224 : index + %228 = divi_signed %227, %c2_99 : index + %229 = subi %c-1_101, %228 : index + %230 = select %225, %229, %228 : index + %c-2_102 = constant -2 : index + %231 = muli %230, %c-2_102 : index + %232 = addi %208, %231 : index + %c3_103 = constant 3 : index + %233 = addi %232, %c3_103 : index + store %178, %3[%189, %193, %233] : memref<16x128x2xvector<8xf32>> + %234 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %c16_104 = constant 16 : index + %c0_105 = constant 0 : index + %c-1_106 = constant -1 : index + %235 = cmpi "slt", %arg5, %c0_105 : index + %236 = subi %c-1_106, %arg5 : index + %237 = select %235, %236, %arg5 : index + %238 = divi_signed %237, %c16_104 : index + %239 = subi %c-1_106, %238 : index + %240 = select %235, %239, %238 : index + %c16_107 = constant 16 : index + %c0_108 = constant 0 : index + %c-1_109 = constant -1 : index + %241 = cmpi "slt", %arg5, %c0_108 : index + %242 = subi %c-1_109, %arg5 : index + %243 = select %241, %242, %arg5 : index + %244 = divi_signed %243, %c16_107 : index + %245 = subi %c-1_109, %244 : index + %246 = select %241, %245, %244 : index + %c2_110 = constant 2 : index + %247 = addi %246, %c2_110 : index + %c16_111 = constant 16 : index + %c0_112 = constant 0 : index + %c-1_113 = constant -1 : index + %248 = cmpi "slt", %247, %c0_112 : index + %249 = subi %c-1_113, %247 : index + %250 = select %248, %249, %247 : index + %251 = divi_signed %250, %c16_111 : index + %252 = subi %c-1_113, %251 : index + %253 = select %248, %252, %251 : index + %c-16_114 = constant -16 : index + %254 = muli %253, %c-16_114 : index + %255 = addi %240, %254 : index + %c2_115 = constant 2 : index + %256 = addi %255, %c2_115 : index + %c128_116 = constant 128 : index + %257 = remi_signed %arg4, %c128_116 : index + %c0_117 = constant 0 
: index + %258 = cmpi "slt", %257, %c0_117 : index + %259 = addi %257, %c128_116 : index + %260 = select %258, %259, %257 : index + %c16_118 = constant 16 : index + %261 = remi_signed %arg5, %c16_118 : index + %c0_119 = constant 0 : index + %262 = cmpi "slt", %261, %c0_119 : index + %263 = addi %261, %c16_118 : index + %264 = select %262, %263, %261 : index + %c8_120 = constant 8 : index + %c0_121 = constant 0 : index + %c-1_122 = constant -1 : index + %265 = cmpi "slt", %264, %c0_121 : index + %266 = subi %c-1_122, %264 : index + %267 = select %265, %266, %264 : index + %268 = divi_signed %267, %c8_120 : index + %269 = subi %c-1_122, %268 : index + %270 = select %265, %269, %268 : index + %c2_123 = constant 2 : index + %271 = remi_signed %270, %c2_123 : index + %c0_124 = constant 0 : index + %272 = cmpi "slt", %271, %c0_124 : index + %273 = addi %271, %c2_123 : index + %274 = select %272, %273, %271 : index + store %234, %3[%256, %260, %274] : memref<16x128x2xvector<8xf32>> + %275 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %c40_125 = constant 40 : index + %276 = addi %arg5, %c40_125 : index + %c16_126 = constant 16 : index + %c0_127 = constant 0 : index + %c-1_128 = constant -1 : index + %277 = cmpi "slt", %276, %c0_127 : index + %278 = subi %c-1_128, %276 : index + %279 = select %277, %278, %276 : index + %280 = divi_signed %279, %c16_126 : index + %281 = subi %c-1_128, %280 : index + %282 = select %277, %281, %280 : index + %c16_129 = constant 16 : index + %283 = remi_signed %282, %c16_129 : index + %c0_130 = constant 0 : index + %284 = cmpi "slt", %283, %c0_130 : index + %285 = addi %283, %c16_129 : index + %286 = select %284, %285, %283 : index + %c128_131 = constant 128 : index + %287 = remi_signed %arg4, %c128_131 : index + %c0_132 = constant 0 : index + %288 = cmpi "slt", %287, %c0_132 : index + %289 = addi %287, %c128_131 : index + %290 = select %288, %289, %287 : index + %c8_133 = constant 8 : index + %c0_134 = constant 0 : index + %c-1_135 = constant -1 : index + %291 = cmpi "slt", %arg5, %c0_134 : index + %292 = subi %c-1_135, %arg5 : index + %293 = select %291, %292, %arg5 : index + %294 = divi_signed %293, %c8_133 : index + %295 = subi %c-1_135, %294 : index + %296 = select %291, %295, %294 : index + %c40_136 = constant 40 : index + %297 = addi %arg5, %c40_136 : index + %c16_137 = constant 16 : index + %c0_138 = constant 0 : index + %c-1_139 = constant -1 : index + %298 = cmpi "slt", %297, %c0_138 : index + %299 = subi %c-1_139, %297 : index + %300 = select %298, %299, %297 : index + %301 = divi_signed %300, %c16_137 : index + %302 = subi %c-1_139, %301 : index + %303 = select %298, %302, %301 : index + %c-2_140 = constant -2 : index + %304 = muli %303, %c-2_140 : index + %305 = addi %296, %304 : index + %c8_141 = constant 8 : index + %c0_142 = constant 0 : index + %c-1_143 = constant -1 : index + %306 = cmpi "slt", %arg5, %c0_142 : index + %307 = subi %c-1_143, %arg5 : index + %308 = select %306, %307, %arg5 : index + %309 = divi_signed %308, %c8_141 : index + %310 = subi %c-1_143, %309 : index + %311 = select %306, %310, %309 : index + %c40_144 = constant 40 : index + %312 = addi %arg5, %c40_144 : index + %c16_145 = constant 16 : index + %c0_146 = constant 0 : index + %c-1_147 = constant -1 : index + %313 = cmpi "slt", %312, %c0_146 : index + %314 = subi %c-1_147, %312 : index + %315 = select %313, %314, %312 : index + %316 = divi_signed %315, %c16_145 : index + %317 = subi %c-1_147, %316 : index + %318 = select %313, %317, %316 : index + %c-2_148 = constant -2 : 
index + %319 = muli %318, %c-2_148 : index + %320 = addi %311, %319 : index + %c5_149 = constant 5 : index + %321 = addi %320, %c5_149 : index + %c2_150 = constant 2 : index + %c0_151 = constant 0 : index + %c-1_152 = constant -1 : index + %322 = cmpi "slt", %321, %c0_151 : index + %323 = subi %c-1_152, %321 : index + %324 = select %322, %323, %321 : index + %325 = divi_signed %324, %c2_150 : index + %326 = subi %c-1_152, %325 : index + %327 = select %322, %326, %325 : index + %c-2_153 = constant -2 : index + %328 = muli %327, %c-2_153 : index + %329 = addi %305, %328 : index + %c5_154 = constant 5 : index + %330 = addi %329, %c5_154 : index + store %275, %3[%286, %290, %330] : memref<16x128x2xvector<8xf32>> + %331 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %c16_155 = constant 16 : index + %c0_156 = constant 0 : index + %c-1_157 = constant -1 : index + %332 = cmpi "slt", %arg5, %c0_156 : index + %333 = subi %c-1_157, %arg5 : index + %334 = select %332, %333, %arg5 : index + %335 = divi_signed %334, %c16_155 : index + %336 = subi %c-1_157, %335 : index + %337 = select %332, %336, %335 : index + %c16_158 = constant 16 : index + %c0_159 = constant 0 : index + %c-1_160 = constant -1 : index + %338 = cmpi "slt", %arg5, %c0_159 : index + %339 = subi %c-1_160, %arg5 : index + %340 = select %338, %339, %arg5 : index + %341 = divi_signed %340, %c16_158 : index + %342 = subi %c-1_160, %341 : index + %343 = select %338, %342, %341 : index + %c3_161 = constant 3 : index + %344 = addi %343, %c3_161 : index + %c16_162 = constant 16 : index + %c0_163 = constant 0 : index + %c-1_164 = constant -1 : index + %345 = cmpi "slt", %344, %c0_163 : index + %346 = subi %c-1_164, %344 : index + %347 = select %345, %346, %344 : index + %348 = divi_signed %347, %c16_162 : index + %349 = subi %c-1_164, %348 : index + %350 = select %345, %349, %348 : index + %c-16_165 = constant -16 : index + %351 = muli %350, %c-16_165 : index + %352 = addi %337, %351 : index + %c3_166 = constant 3 : index + %353 = addi %352, %c3_166 : index + %c128_167 = constant 128 : index + %354 = remi_signed %arg4, %c128_167 : index + %c0_168 = constant 0 : index + %355 = cmpi "slt", %354, %c0_168 : index + %356 = addi %354, %c128_167 : index + %357 = select %355, %356, %354 : index + %c16_169 = constant 16 : index + %358 = remi_signed %arg5, %c16_169 : index + %c0_170 = constant 0 : index + %359 = cmpi "slt", %358, %c0_170 : index + %360 = addi %358, %c16_169 : index + %361 = select %359, %360, %358 : index + %c8_171 = constant 8 : index + %c0_172 = constant 0 : index + %c-1_173 = constant -1 : index + %362 = cmpi "slt", %361, %c0_172 : index + %363 = subi %c-1_173, %361 : index + %364 = select %362, %363, %361 : index + %365 = divi_signed %364, %c8_171 : index + %366 = subi %c-1_173, %365 : index + %367 = select %362, %366, %365 : index + %c2_174 = constant 2 : index + %368 = remi_signed %367, %c2_174 : index + %c0_175 = constant 0 : index + %369 = cmpi "slt", %368, %c0_175 : index + %370 = addi %368, %c2_174 : index + %371 = select %369, %370, %368 : index + store %331, %3[%353, %357, %371] : memref<16x128x2xvector<8xf32>> + %372 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %c56_176 = constant 56 : index + %373 = addi %arg5, %c56_176 : index + %c16_177 = constant 16 : index + %c0_178 = constant 0 : index + %c-1_179 = constant -1 : index + %374 = cmpi "slt", %373, %c0_178 : index + %375 = subi %c-1_179, %373 : index + %376 = select %374, %375, %373 : index + %377 = divi_signed %376, %c16_177 : index + %378 = subi %c-1_179, %377 : 
index + %379 = select %374, %378, %377 : index + %c16_180 = constant 16 : index + %380 = remi_signed %379, %c16_180 : index + %c0_181 = constant 0 : index + %381 = cmpi "slt", %380, %c0_181 : index + %382 = addi %380, %c16_180 : index + %383 = select %381, %382, %380 : index + %c128_182 = constant 128 : index + %384 = remi_signed %arg4, %c128_182 : index + %c0_183 = constant 0 : index + %385 = cmpi "slt", %384, %c0_183 : index + %386 = addi %384, %c128_182 : index + %387 = select %385, %386, %384 : index + %c8_184 = constant 8 : index + %c0_185 = constant 0 : index + %c-1_186 = constant -1 : index + %388 = cmpi "slt", %arg5, %c0_185 : index + %389 = subi %c-1_186, %arg5 : index + %390 = select %388, %389, %arg5 : index + %391 = divi_signed %390, %c8_184 : index + %392 = subi %c-1_186, %391 : index + %393 = select %388, %392, %391 : index + %c56_187 = constant 56 : index + %394 = addi %arg5, %c56_187 : index + %c16_188 = constant 16 : index + %c0_189 = constant 0 : index + %c-1_190 = constant -1 : index + %395 = cmpi "slt", %394, %c0_189 : index + %396 = subi %c-1_190, %394 : index + %397 = select %395, %396, %394 : index + %398 = divi_signed %397, %c16_188 : index + %399 = subi %c-1_190, %398 : index + %400 = select %395, %399, %398 : index + %c-2_191 = constant -2 : index + %401 = muli %400, %c-2_191 : index + %402 = addi %393, %401 : index + %c8_192 = constant 8 : index + %c0_193 = constant 0 : index + %c-1_194 = constant -1 : index + %403 = cmpi "slt", %arg5, %c0_193 : index + %404 = subi %c-1_194, %arg5 : index + %405 = select %403, %404, %arg5 : index + %406 = divi_signed %405, %c8_192 : index + %407 = subi %c-1_194, %406 : index + %408 = select %403, %407, %406 : index + %c56_195 = constant 56 : index + %409 = addi %arg5, %c56_195 : index + %c16_196 = constant 16 : index + %c0_197 = constant 0 : index + %c-1_198 = constant -1 : index + %410 = cmpi "slt", %409, %c0_197 : index + %411 = subi %c-1_198, %409 : index + %412 = select %410, %411, %409 : index + %413 = divi_signed %412, %c16_196 : index + %414 = subi %c-1_198, %413 : index + %415 = select %410, %414, %413 : index + %c-2_199 = constant -2 : index + %416 = muli %415, %c-2_199 : index + %417 = addi %408, %416 : index + %c7_200 = constant 7 : index + %418 = addi %417, %c7_200 : index + %c2_201 = constant 2 : index + %c0_202 = constant 0 : index + %c-1_203 = constant -1 : index + %419 = cmpi "slt", %418, %c0_202 : index + %420 = subi %c-1_203, %418 : index + %421 = select %419, %420, %418 : index + %422 = divi_signed %421, %c2_201 : index + %423 = subi %c-1_203, %422 : index + %424 = select %419, %423, %422 : index + %c-2_204 = constant -2 : index + %425 = muli %424, %c-2_204 : index + %426 = addi %402, %425 : index + %c7_205 = constant 7 : index + %427 = addi %426, %c7_205 : index + store %372, %3[%383, %387, %427] : memref<16x128x2xvector<8xf32>> + %428 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %c16_206 = constant 16 : index + %c0_207 = constant 0 : index + %c-1_208 = constant -1 : index + %429 = cmpi "slt", %arg5, %c0_207 : index + %430 = subi %c-1_208, %arg5 : index + %431 = select %429, %430, %arg5 : index + %432 = divi_signed %431, %c16_206 : index + %433 = subi %c-1_208, %432 : index + %434 = select %429, %433, %432 : index + %c16_209 = constant 16 : index + %c0_210 = constant 0 : index + %c-1_211 = constant -1 : index + %435 = cmpi "slt", %arg5, %c0_210 : index + %436 = subi %c-1_211, %arg5 : index + %437 = select %435, %436, %arg5 : index + %438 = divi_signed %437, %c16_209 : index + %439 = subi %c-1_211, %438 : 
index + %440 = select %435, %439, %438 : index + %c4_212 = constant 4 : index + %441 = addi %440, %c4_212 : index + %c16_213 = constant 16 : index + %c0_214 = constant 0 : index + %c-1_215 = constant -1 : index + %442 = cmpi "slt", %441, %c0_214 : index + %443 = subi %c-1_215, %441 : index + %444 = select %442, %443, %441 : index + %445 = divi_signed %444, %c16_213 : index + %446 = subi %c-1_215, %445 : index + %447 = select %442, %446, %445 : index + %c-16_216 = constant -16 : index + %448 = muli %447, %c-16_216 : index + %449 = addi %434, %448 : index + %c4_217 = constant 4 : index + %450 = addi %449, %c4_217 : index + %c128_218 = constant 128 : index + %451 = remi_signed %arg4, %c128_218 : index + %c0_219 = constant 0 : index + %452 = cmpi "slt", %451, %c0_219 : index + %453 = addi %451, %c128_218 : index + %454 = select %452, %453, %451 : index + %c16_220 = constant 16 : index + %455 = remi_signed %arg5, %c16_220 : index + %c0_221 = constant 0 : index + %456 = cmpi "slt", %455, %c0_221 : index + %457 = addi %455, %c16_220 : index + %458 = select %456, %457, %455 : index + %c8_222 = constant 8 : index + %c0_223 = constant 0 : index + %c-1_224 = constant -1 : index + %459 = cmpi "slt", %458, %c0_223 : index + %460 = subi %c-1_224, %458 : index + %461 = select %459, %460, %458 : index + %462 = divi_signed %461, %c8_222 : index + %463 = subi %c-1_224, %462 : index + %464 = select %459, %463, %462 : index + %c2_225 = constant 2 : index + %465 = remi_signed %464, %c2_225 : index + %c0_226 = constant 0 : index + %466 = cmpi "slt", %465, %c0_226 : index + %467 = addi %465, %c2_225 : index + %468 = select %466, %467, %465 : index + store %428, %3[%450, %454, %468] : memref<16x128x2xvector<8xf32>> + %469 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %c72_227 = constant 72 : index + %470 = addi %arg5, %c72_227 : index + %c16_228 = constant 16 : index + %c0_229 = constant 0 : index + %c-1_230 = constant -1 : index + %471 = cmpi "slt", %470, %c0_229 : index + %472 = subi %c-1_230, %470 : index + %473 = select %471, %472, %470 : index + %474 = divi_signed %473, %c16_228 : index + %475 = subi %c-1_230, %474 : index + %476 = select %471, %475, %474 : index + %c16_231 = constant 16 : index + %477 = remi_signed %476, %c16_231 : index + %c0_232 = constant 0 : index + %478 = cmpi "slt", %477, %c0_232 : index + %479 = addi %477, %c16_231 : index + %480 = select %478, %479, %477 : index + %c128_233 = constant 128 : index + %481 = remi_signed %arg4, %c128_233 : index + %c0_234 = constant 0 : index + %482 = cmpi "slt", %481, %c0_234 : index + %483 = addi %481, %c128_233 : index + %484 = select %482, %483, %481 : index + %c8_235 = constant 8 : index + %c0_236 = constant 0 : index + %c-1_237 = constant -1 : index + %485 = cmpi "slt", %arg5, %c0_236 : index + %486 = subi %c-1_237, %arg5 : index + %487 = select %485, %486, %arg5 : index + %488 = divi_signed %487, %c8_235 : index + %489 = subi %c-1_237, %488 : index + %490 = select %485, %489, %488 : index + %c72_238 = constant 72 : index + %491 = addi %arg5, %c72_238 : index + %c16_239 = constant 16 : index + %c0_240 = constant 0 : index + %c-1_241 = constant -1 : index + %492 = cmpi "slt", %491, %c0_240 : index + %493 = subi %c-1_241, %491 : index + %494 = select %492, %493, %491 : index + %495 = divi_signed %494, %c16_239 : index + %496 = subi %c-1_241, %495 : index + %497 = select %492, %496, %495 : index + %c-2_242 = constant -2 : index + %498 = muli %497, %c-2_242 : index + %499 = addi %490, %498 : index + %c8_243 = constant 8 : index + %c0_244 = 
constant 0 : index + %c-1_245 = constant -1 : index + %500 = cmpi "slt", %arg5, %c0_244 : index + %501 = subi %c-1_245, %arg5 : index + %502 = select %500, %501, %arg5 : index + %503 = divi_signed %502, %c8_243 : index + %504 = subi %c-1_245, %503 : index + %505 = select %500, %504, %503 : index + %c72_246 = constant 72 : index + %506 = addi %arg5, %c72_246 : index + %c16_247 = constant 16 : index + %c0_248 = constant 0 : index + %c-1_249 = constant -1 : index + %507 = cmpi "slt", %506, %c0_248 : index + %508 = subi %c-1_249, %506 : index + %509 = select %507, %508, %506 : index + %510 = divi_signed %509, %c16_247 : index + %511 = subi %c-1_249, %510 : index + %512 = select %507, %511, %510 : index + %c-2_250 = constant -2 : index + %513 = muli %512, %c-2_250 : index + %514 = addi %505, %513 : index + %c9_251 = constant 9 : index + %515 = addi %514, %c9_251 : index + %c2_252 = constant 2 : index + %c0_253 = constant 0 : index + %c-1_254 = constant -1 : index + %516 = cmpi "slt", %515, %c0_253 : index + %517 = subi %c-1_254, %515 : index + %518 = select %516, %517, %515 : index + %519 = divi_signed %518, %c2_252 : index + %520 = subi %c-1_254, %519 : index + %521 = select %516, %520, %519 : index + %c-2_255 = constant -2 : index + %522 = muli %521, %c-2_255 : index + %523 = addi %499, %522 : index + %c9_256 = constant 9 : index + %524 = addi %523, %c9_256 : index + store %469, %3[%480, %484, %524] : memref<16x128x2xvector<8xf32>> + %525 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %c16_257 = constant 16 : index + %c0_258 = constant 0 : index + %c-1_259 = constant -1 : index + %526 = cmpi "slt", %arg5, %c0_258 : index + %527 = subi %c-1_259, %arg5 : index + %528 = select %526, %527, %arg5 : index + %529 = divi_signed %528, %c16_257 : index + %530 = subi %c-1_259, %529 : index + %531 = select %526, %530, %529 : index + %c16_260 = constant 16 : index + %c0_261 = constant 0 : index + %c-1_262 = constant -1 : index + %532 = cmpi "slt", %arg5, %c0_261 : index + %533 = subi %c-1_262, %arg5 : index + %534 = select %532, %533, %arg5 : index + %535 = divi_signed %534, %c16_260 : index + %536 = subi %c-1_262, %535 : index + %537 = select %532, %536, %535 : index + %c5_263 = constant 5 : index + %538 = addi %537, %c5_263 : index + %c16_264 = constant 16 : index + %c0_265 = constant 0 : index + %c-1_266 = constant -1 : index + %539 = cmpi "slt", %538, %c0_265 : index + %540 = subi %c-1_266, %538 : index + %541 = select %539, %540, %538 : index + %542 = divi_signed %541, %c16_264 : index + %543 = subi %c-1_266, %542 : index + %544 = select %539, %543, %542 : index + %c-16_267 = constant -16 : index + %545 = muli %544, %c-16_267 : index + %546 = addi %531, %545 : index + %c5_268 = constant 5 : index + %547 = addi %546, %c5_268 : index + %c128_269 = constant 128 : index + %548 = remi_signed %arg4, %c128_269 : index + %c0_270 = constant 0 : index + %549 = cmpi "slt", %548, %c0_270 : index + %550 = addi %548, %c128_269 : index + %551 = select %549, %550, %548 : index + %c16_271 = constant 16 : index + %552 = remi_signed %arg5, %c16_271 : index + %c0_272 = constant 0 : index + %553 = cmpi "slt", %552, %c0_272 : index + %554 = addi %552, %c16_271 : index + %555 = select %553, %554, %552 : index + %c8_273 = constant 8 : index + %c0_274 = constant 0 : index + %c-1_275 = constant -1 : index + %556 = cmpi "slt", %555, %c0_274 : index + %557 = subi %c-1_275, %555 : index + %558 = select %556, %557, %555 : index + %559 = divi_signed %558, %c8_273 : index + %560 = subi %c-1_275, %559 : index + %561 = select 
%556, %560, %559 : index + %c2_276 = constant 2 : index + %562 = remi_signed %561, %c2_276 : index + %c0_277 = constant 0 : index + %563 = cmpi "slt", %562, %c0_277 : index + %564 = addi %562, %c2_276 : index + %565 = select %563, %564, %562 : index + store %525, %3[%547, %551, %565] : memref<16x128x2xvector<8xf32>> + %566 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %c88_278 = constant 88 : index + %567 = addi %arg5, %c88_278 : index + %c16_279 = constant 16 : index + %c0_280 = constant 0 : index + %c-1_281 = constant -1 : index + %568 = cmpi "slt", %567, %c0_280 : index + %569 = subi %c-1_281, %567 : index + %570 = select %568, %569, %567 : index + %571 = divi_signed %570, %c16_279 : index + %572 = subi %c-1_281, %571 : index + %573 = select %568, %572, %571 : index + %c16_282 = constant 16 : index + %574 = remi_signed %573, %c16_282 : index + %c0_283 = constant 0 : index + %575 = cmpi "slt", %574, %c0_283 : index + %576 = addi %574, %c16_282 : index + %577 = select %575, %576, %574 : index + %c128_284 = constant 128 : index + %578 = remi_signed %arg4, %c128_284 : index + %c0_285 = constant 0 : index + %579 = cmpi "slt", %578, %c0_285 : index + %580 = addi %578, %c128_284 : index + %581 = select %579, %580, %578 : index + %c8_286 = constant 8 : index + %c0_287 = constant 0 : index + %c-1_288 = constant -1 : index + %582 = cmpi "slt", %arg5, %c0_287 : index + %583 = subi %c-1_288, %arg5 : index + %584 = select %582, %583, %arg5 : index + %585 = divi_signed %584, %c8_286 : index + %586 = subi %c-1_288, %585 : index + %587 = select %582, %586, %585 : index + %c88_289 = constant 88 : index + %588 = addi %arg5, %c88_289 : index + %c16_290 = constant 16 : index + %c0_291 = constant 0 : index + %c-1_292 = constant -1 : index + %589 = cmpi "slt", %588, %c0_291 : index + %590 = subi %c-1_292, %588 : index + %591 = select %589, %590, %588 : index + %592 = divi_signed %591, %c16_290 : index + %593 = subi %c-1_292, %592 : index + %594 = select %589, %593, %592 : index + %c-2_293 = constant -2 : index + %595 = muli %594, %c-2_293 : index + %596 = addi %587, %595 : index + %c8_294 = constant 8 : index + %c0_295 = constant 0 : index + %c-1_296 = constant -1 : index + %597 = cmpi "slt", %arg5, %c0_295 : index + %598 = subi %c-1_296, %arg5 : index + %599 = select %597, %598, %arg5 : index + %600 = divi_signed %599, %c8_294 : index + %601 = subi %c-1_296, %600 : index + %602 = select %597, %601, %600 : index + %c88_297 = constant 88 : index + %603 = addi %arg5, %c88_297 : index + %c16_298 = constant 16 : index + %c0_299 = constant 0 : index + %c-1_300 = constant -1 : index + %604 = cmpi "slt", %603, %c0_299 : index + %605 = subi %c-1_300, %603 : index + %606 = select %604, %605, %603 : index + %607 = divi_signed %606, %c16_298 : index + %608 = subi %c-1_300, %607 : index + %609 = select %604, %608, %607 : index + %c-2_301 = constant -2 : index + %610 = muli %609, %c-2_301 : index + %611 = addi %602, %610 : index + %c11_302 = constant 11 : index + %612 = addi %611, %c11_302 : index + %c2_303 = constant 2 : index + %c0_304 = constant 0 : index + %c-1_305 = constant -1 : index + %613 = cmpi "slt", %612, %c0_304 : index + %614 = subi %c-1_305, %612 : index + %615 = select %613, %614, %612 : index + %616 = divi_signed %615, %c2_303 : index + %617 = subi %c-1_305, %616 : index + %618 = select %613, %617, %616 : index + %c-2_306 = constant -2 : index + %619 = muli %618, %c-2_306 : index + %620 = addi %596, %619 : index + %c11_307 = constant 11 : index + %621 = addi %620, %c11_307 : index + store %566, 
%3[%577, %581, %621] : memref<16x128x2xvector<8xf32>> + %622 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %c16_308 = constant 16 : index + %c0_309 = constant 0 : index + %c-1_310 = constant -1 : index + %623 = cmpi "slt", %arg5, %c0_309 : index + %624 = subi %c-1_310, %arg5 : index + %625 = select %623, %624, %arg5 : index + %626 = divi_signed %625, %c16_308 : index + %627 = subi %c-1_310, %626 : index + %628 = select %623, %627, %626 : index + %c16_311 = constant 16 : index + %c0_312 = constant 0 : index + %c-1_313 = constant -1 : index + %629 = cmpi "slt", %arg5, %c0_312 : index + %630 = subi %c-1_313, %arg5 : index + %631 = select %629, %630, %arg5 : index + %632 = divi_signed %631, %c16_311 : index + %633 = subi %c-1_313, %632 : index + %634 = select %629, %633, %632 : index + %c6_314 = constant 6 : index + %635 = addi %634, %c6_314 : index + %c16_315 = constant 16 : index + %c0_316 = constant 0 : index + %c-1_317 = constant -1 : index + %636 = cmpi "slt", %635, %c0_316 : index + %637 = subi %c-1_317, %635 : index + %638 = select %636, %637, %635 : index + %639 = divi_signed %638, %c16_315 : index + %640 = subi %c-1_317, %639 : index + %641 = select %636, %640, %639 : index + %c-16_318 = constant -16 : index + %642 = muli %641, %c-16_318 : index + %643 = addi %628, %642 : index + %c6_319 = constant 6 : index + %644 = addi %643, %c6_319 : index + %c128_320 = constant 128 : index + %645 = remi_signed %arg4, %c128_320 : index + %c0_321 = constant 0 : index + %646 = cmpi "slt", %645, %c0_321 : index + %647 = addi %645, %c128_320 : index + %648 = select %646, %647, %645 : index + %c16_322 = constant 16 : index + %649 = remi_signed %arg5, %c16_322 : index + %c0_323 = constant 0 : index + %650 = cmpi "slt", %649, %c0_323 : index + %651 = addi %649, %c16_322 : index + %652 = select %650, %651, %649 : index + %c8_324 = constant 8 : index + %c0_325 = constant 0 : index + %c-1_326 = constant -1 : index + %653 = cmpi "slt", %652, %c0_325 : index + %654 = subi %c-1_326, %652 : index + %655 = select %653, %654, %652 : index + %656 = divi_signed %655, %c8_324 : index + %657 = subi %c-1_326, %656 : index + %658 = select %653, %657, %656 : index + %c2_327 = constant 2 : index + %659 = remi_signed %658, %c2_327 : index + %c0_328 = constant 0 : index + %660 = cmpi "slt", %659, %c0_328 : index + %661 = addi %659, %c2_327 : index + %662 = select %660, %661, %659 : index + store %622, %3[%644, %648, %662] : memref<16x128x2xvector<8xf32>> + %663 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %c104_329 = constant 104 : index + %664 = addi %arg5, %c104_329 : index + %c16_330 = constant 16 : index + %c0_331 = constant 0 : index + %c-1_332 = constant -1 : index + %665 = cmpi "slt", %664, %c0_331 : index + %666 = subi %c-1_332, %664 : index + %667 = select %665, %666, %664 : index + %668 = divi_signed %667, %c16_330 : index + %669 = subi %c-1_332, %668 : index + %670 = select %665, %669, %668 : index + %c16_333 = constant 16 : index + %671 = remi_signed %670, %c16_333 : index + %c0_334 = constant 0 : index + %672 = cmpi "slt", %671, %c0_334 : index + %673 = addi %671, %c16_333 : index + %674 = select %672, %673, %671 : index + %c128_335 = constant 128 : index + %675 = remi_signed %arg4, %c128_335 : index + %c0_336 = constant 0 : index + %676 = cmpi "slt", %675, %c0_336 : index + %677 = addi %675, %c128_335 : index + %678 = select %676, %677, %675 : index + %c8_337 = constant 8 : index + %c0_338 = constant 0 : index + %c-1_339 = constant -1 : index + %679 = cmpi "slt", %arg5, %c0_338 : index + %680 = 
subi %c-1_339, %arg5 : index + %681 = select %679, %680, %arg5 : index + %682 = divi_signed %681, %c8_337 : index + %683 = subi %c-1_339, %682 : index + %684 = select %679, %683, %682 : index + %c104_340 = constant 104 : index + %685 = addi %arg5, %c104_340 : index + %c16_341 = constant 16 : index + %c0_342 = constant 0 : index + %c-1_343 = constant -1 : index + %686 = cmpi "slt", %685, %c0_342 : index + %687 = subi %c-1_343, %685 : index + %688 = select %686, %687, %685 : index + %689 = divi_signed %688, %c16_341 : index + %690 = subi %c-1_343, %689 : index + %691 = select %686, %690, %689 : index + %c-2_344 = constant -2 : index + %692 = muli %691, %c-2_344 : index + %693 = addi %684, %692 : index + %c8_345 = constant 8 : index + %c0_346 = constant 0 : index + %c-1_347 = constant -1 : index + %694 = cmpi "slt", %arg5, %c0_346 : index + %695 = subi %c-1_347, %arg5 : index + %696 = select %694, %695, %arg5 : index + %697 = divi_signed %696, %c8_345 : index + %698 = subi %c-1_347, %697 : index + %699 = select %694, %698, %697 : index + %c104_348 = constant 104 : index + %700 = addi %arg5, %c104_348 : index + %c16_349 = constant 16 : index + %c0_350 = constant 0 : index + %c-1_351 = constant -1 : index + %701 = cmpi "slt", %700, %c0_350 : index + %702 = subi %c-1_351, %700 : index + %703 = select %701, %702, %700 : index + %704 = divi_signed %703, %c16_349 : index + %705 = subi %c-1_351, %704 : index + %706 = select %701, %705, %704 : index + %c-2_352 = constant -2 : index + %707 = muli %706, %c-2_352 : index + %708 = addi %699, %707 : index + %c13_353 = constant 13 : index + %709 = addi %708, %c13_353 : index + %c2_354 = constant 2 : index + %c0_355 = constant 0 : index + %c-1_356 = constant -1 : index + %710 = cmpi "slt", %709, %c0_355 : index + %711 = subi %c-1_356, %709 : index + %712 = select %710, %711, %709 : index + %713 = divi_signed %712, %c2_354 : index + %714 = subi %c-1_356, %713 : index + %715 = select %710, %714, %713 : index + %c-2_357 = constant -2 : index + %716 = muli %715, %c-2_357 : index + %717 = addi %693, %716 : index + %c13_358 = constant 13 : index + %718 = addi %717, %c13_358 : index + store %663, %3[%674, %678, %718] : memref<16x128x2xvector<8xf32>> + %719 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %c16_359 = constant 16 : index + %c0_360 = constant 0 : index + %c-1_361 = constant -1 : index + %720 = cmpi "slt", %arg5, %c0_360 : index + %721 = subi %c-1_361, %arg5 : index + %722 = select %720, %721, %arg5 : index + %723 = divi_signed %722, %c16_359 : index + %724 = subi %c-1_361, %723 : index + %725 = select %720, %724, %723 : index + %c16_362 = constant 16 : index + %c0_363 = constant 0 : index + %c-1_364 = constant -1 : index + %726 = cmpi "slt", %arg5, %c0_363 : index + %727 = subi %c-1_364, %arg5 : index + %728 = select %726, %727, %arg5 : index + %729 = divi_signed %728, %c16_362 : index + %730 = subi %c-1_364, %729 : index + %731 = select %726, %730, %729 : index + %c7_365 = constant 7 : index + %732 = addi %731, %c7_365 : index + %c16_366 = constant 16 : index + %c0_367 = constant 0 : index + %c-1_368 = constant -1 : index + %733 = cmpi "slt", %732, %c0_367 : index + %734 = subi %c-1_368, %732 : index + %735 = select %733, %734, %732 : index + %736 = divi_signed %735, %c16_366 : index + %737 = subi %c-1_368, %736 : index + %738 = select %733, %737, %736 : index + %c-16_369 = constant -16 : index + %739 = muli %738, %c-16_369 : index + %740 = addi %725, %739 : index + %c7_370 = constant 7 : index + %741 = addi %740, %c7_370 : index + %c128_371 = 
constant 128 : index + %742 = remi_signed %arg4, %c128_371 : index + %c0_372 = constant 0 : index + %743 = cmpi "slt", %742, %c0_372 : index + %744 = addi %742, %c128_371 : index + %745 = select %743, %744, %742 : index + %c16_373 = constant 16 : index + %746 = remi_signed %arg5, %c16_373 : index + %c0_374 = constant 0 : index + %747 = cmpi "slt", %746, %c0_374 : index + %748 = addi %746, %c16_373 : index + %749 = select %747, %748, %746 : index + %c8_375 = constant 8 : index + %c0_376 = constant 0 : index + %c-1_377 = constant -1 : index + %750 = cmpi "slt", %749, %c0_376 : index + %751 = subi %c-1_377, %749 : index + %752 = select %750, %751, %749 : index + %753 = divi_signed %752, %c8_375 : index + %754 = subi %c-1_377, %753 : index + %755 = select %750, %754, %753 : index + %c2_378 = constant 2 : index + %756 = remi_signed %755, %c2_378 : index + %c0_379 = constant 0 : index + %757 = cmpi "slt", %756, %c0_379 : index + %758 = addi %756, %c2_378 : index + %759 = select %757, %758, %756 : index + store %719, %3[%741, %745, %759] : memref<16x128x2xvector<8xf32>> + %760 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %c120_380 = constant 120 : index + %761 = addi %arg5, %c120_380 : index + %c16_381 = constant 16 : index + %c0_382 = constant 0 : index + %c-1_383 = constant -1 : index + %762 = cmpi "slt", %761, %c0_382 : index + %763 = subi %c-1_383, %761 : index + %764 = select %762, %763, %761 : index + %765 = divi_signed %764, %c16_381 : index + %766 = subi %c-1_383, %765 : index + %767 = select %762, %766, %765 : index + %c16_384 = constant 16 : index + %768 = remi_signed %767, %c16_384 : index + %c0_385 = constant 0 : index + %769 = cmpi "slt", %768, %c0_385 : index + %770 = addi %768, %c16_384 : index + %771 = select %769, %770, %768 : index + %c128_386 = constant 128 : index + %772 = remi_signed %arg4, %c128_386 : index + %c0_387 = constant 0 : index + %773 = cmpi "slt", %772, %c0_387 : index + %774 = addi %772, %c128_386 : index + %775 = select %773, %774, %772 : index + %c8_388 = constant 8 : index + %c0_389 = constant 0 : index + %c-1_390 = constant -1 : index + %776 = cmpi "slt", %arg5, %c0_389 : index + %777 = subi %c-1_390, %arg5 : index + %778 = select %776, %777, %arg5 : index + %779 = divi_signed %778, %c8_388 : index + %780 = subi %c-1_390, %779 : index + %781 = select %776, %780, %779 : index + %c120_391 = constant 120 : index + %782 = addi %arg5, %c120_391 : index + %c16_392 = constant 16 : index + %c0_393 = constant 0 : index + %c-1_394 = constant -1 : index + %783 = cmpi "slt", %782, %c0_393 : index + %784 = subi %c-1_394, %782 : index + %785 = select %783, %784, %782 : index + %786 = divi_signed %785, %c16_392 : index + %787 = subi %c-1_394, %786 : index + %788 = select %783, %787, %786 : index + %c-2_395 = constant -2 : index + %789 = muli %788, %c-2_395 : index + %790 = addi %781, %789 : index + %c8_396 = constant 8 : index + %c0_397 = constant 0 : index + %c-1_398 = constant -1 : index + %791 = cmpi "slt", %arg5, %c0_397 : index + %792 = subi %c-1_398, %arg5 : index + %793 = select %791, %792, %arg5 : index + %794 = divi_signed %793, %c8_396 : index + %795 = subi %c-1_398, %794 : index + %796 = select %791, %795, %794 : index + %c120_399 = constant 120 : index + %797 = addi %arg5, %c120_399 : index + %c16_400 = constant 16 : index + %c0_401 = constant 0 : index + %c-1_402 = constant -1 : index + %798 = cmpi "slt", %797, %c0_401 : index + %799 = subi %c-1_402, %797 : index + %800 = select %798, %799, %797 : index + %801 = divi_signed %800, %c16_400 : index + %802 
= subi %c-1_402, %801 : index + %803 = select %798, %802, %801 : index + %c-2_403 = constant -2 : index + %804 = muli %803, %c-2_403 : index + %805 = addi %796, %804 : index + %c15_404 = constant 15 : index + %806 = addi %805, %c15_404 : index + %c2_405 = constant 2 : index + %c0_406 = constant 0 : index + %c-1_407 = constant -1 : index + %807 = cmpi "slt", %806, %c0_406 : index + %808 = subi %c-1_407, %806 : index + %809 = select %807, %808, %806 : index + %810 = divi_signed %809, %c2_405 : index + %811 = subi %c-1_407, %810 : index + %812 = select %807, %811, %810 : index + %c-2_408 = constant -2 : index + %813 = muli %812, %c-2_408 : index + %814 = addi %790, %813 : index + %c15_409 = constant 15 : index + %815 = addi %814, %c15_409 : index + store %760, %3[%771, %775, %815] : memref<16x128x2xvector<8xf32>> + } + } + } + %c0_4 = constant 0 : index + %c784 = constant 784 : index + %c1_5 = constant 1 : index + scf.for %arg4 = %c0_4 to %c784 step %c1_5 { + %c0_6 = constant 0 : index + %c16 = constant 16 : index + %c1_7 = constant 1 : index + scf.for %arg5 = %c0_6 to %c16 step %c1_7 { + %c0_14 = constant 0 : index + %c6_15 = constant 6 : index + %c1_16 = constant 1 : index + scf.for %arg6 = %c0_14 to %c6_15 step %c1_16 { + %c0_17 = constant 0 : index + %c2_18 = constant 2 : index + %c1_19 = constant 1 : index + scf.for %arg7 = %c0_17 to %c2_18 step %c1_19 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + %c0_8 = constant 0 : index + %c256_9 = constant 256 : index + %c16_10 = constant 16 : index + scf.for %arg5 = %c0_8 to %c256_9 step %c16_10 { + %c0_14 = constant 0 : index + %c128_15 = constant 128 : index + %c4_16 = constant 4 : index + scf.for %arg6 = %c0_14 to %c128_15 step %c4_16 { + %c0_17 = constant 0 : index + %c0_18 = constant 0 : index + %c6_19 = constant 6 : index + scf.for %arg7 = %c0_17 to %c0_18 step %c6_19 { + %c0_23 = constant 0 : index + %c4_24 = constant 4 : index + %c1_25 = constant 1 : index + scf.for %arg8 = %c0_23 to %c4_24 step %c1_25 { + %c0_26 = constant 0 : index + %c0_27 = constant 0 : index + %c1_28 = constant 1 : index + scf.for %arg9 = %c0_26 to %c0_27 step %c1_28 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg4, %arg7 : index + %7 = addi %6, %arg9 : index + %8 = addi %arg4, %arg7 : index + %9 = addi %8, %arg9 : index + %10 = addi %arg4, %arg7 : index + %11 = addi %10, %arg9 : index + %12 = addi %arg4, %arg7 : index + %13 = addi %12, %arg9 : index + %14 = addi %arg4, %arg7 : index + %15 = addi %14, %arg9 : index + %16 = addi %arg4, %arg7 : index + %17 = addi %16, %arg9 : index + %18 = addi %arg4, %arg7 : index + %19 = addi %18, %arg9 : index + %20 = addi %arg6, %arg8 : index + %21 = addi %arg6, %arg8 : index + %22 = addi %arg6, %arg8 : index + %23 = addi %arg6, %arg8 : index + %24 = addi %arg6, %arg8 : index + %25 = addi %arg6, %arg8 : index + %26 = addi %arg6, %arg8 : index + %27 = addi %arg6, %arg8 : index + %28 = load %arg0[%5, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = load %arg0[%7, %21] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg0[%9, %22] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg0[%11, %23] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %32 = load %arg0[%13, %24] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %33 = load %arg0[%15, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %34 = load %arg0[%17, %26] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg0[%19, %27] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %c16_29 = constant 16 : index + %c0_30 = constant 0 : index + %c-1 = constant -1 : index + %36 = cmpi "slt", %arg5, %c0_30 : index + %37 = subi %c-1, %arg5 : index + %38 = select %36, %37, %arg5 : index + %39 = divi_signed %38, %c16_29 : index + %40 = subi %c-1, %39 : index + %41 = select %36, %40, %39 : index + %c16_31 = constant 16 : index + %42 = remi_signed %41, %c16_31 : index + %c0_32 = constant 0 : index + %43 = cmpi "slt", %42, %c0_32 : index + %44 = addi %42, %c16_31 : index + %45 = select %43, %44, %42 : index + %46 = addi %arg6, %arg8 : index + %c128_33 = constant 128 : index + %47 = remi_signed %46, %c128_33 : index + %c0_34 = constant 0 : index + %48 = cmpi "slt", %47, %c0_34 : index + %49 = addi %47, %c128_33 : index + %50 = select %48, %49, %47 : index + %c16_35 = constant 16 : index + %51 = remi_signed %arg5, %c16_35 : index + %c0_36 = constant 0 : index + %52 = cmpi "slt", %51, %c0_36 : index + %53 = addi %51, %c16_35 : index + %54 = select %52, %53, %51 : index + %c8_37 = constant 8 : index + %c0_38 = constant 0 : index + %c-1_39 = constant -1 : index + %55 = cmpi "slt", %54, %c0_38 : index + %56 = subi %c-1_39, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8_37 : index + %59 = subi %c-1_39, %58 : index + %60 = select %55, %59, %58 : index + %c2_40 = constant 2 : index + %61 = remi_signed %60, %c2_40 : index + %c0_41 = constant 0 : index + %62 = cmpi "slt", %61, %c0_41 : index + %63 = addi %61, %c2_40 : index + %64 = select %62, %63, %61 : index + %65 = load %3[%45, %50, %64] : memref<16x128x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %c16_42 = constant 16 : index + %c0_43 = constant 0 : index + %c-1_44 = constant -1 : index + %67 = cmpi "slt", %arg5, %c0_43 : index + %68 = subi %c-1_44, %arg5 : index + %69 = select %67, %68, %arg5 : index + %70 = divi_signed %69, %c16_42 : index + %71 = subi %c-1_44, %70 : index + %72 = select %67, %71, %70 : index + %c16_45 = constant 16 : index + %73 = remi_signed %72, %c16_45 : index + %c0_46 = constant 0 : index + %74 = cmpi "slt", %73, %c0_46 : index + %75 = addi %73, %c16_45 : index + %76 = select %74, %75, %73 : index + %77 = addi %arg6, %arg8 : index + %c128_47 = constant 128 : index + %78 = remi_signed %77, %c128_47 : index + %c0_48 = constant 0 : index + %79 = cmpi "slt", %78, %c0_48 : index + %80 = addi %78, %c128_47 : index + %81 = select %79, %80, %78 : index + %c16_49 = constant 16 : index + %82 = remi_signed %arg5, %c16_49 : index + %c0_50 = constant 0 : index + %83 = cmpi "slt", %82, %c0_50 : index + %84 = addi %82, %c16_49 : index + %85 = select %83, %84, %82 : index + %c8_51 = constant 8 : index + %c0_52 = constant 0 : index + %c-1_53 = constant -1 : index + %86 = cmpi "slt", %85, %c0_52 : index + %87 = subi %c-1_53, %85 : index + %88 = select %86, %87, %85 : index + %89 = divi_signed %88, %c8_51 : index + %90 = subi %c-1_53, %89 : index + %91 = select %86, %90, %89 : index + %c2_54 = constant 2 : index + %92 = remi_signed %91, %c2_54 : index + %c0_55 = constant 0 : index + %93 = cmpi "slt", %92, %c0_55 : index + %94 = addi %92, %c2_54 : index + %95 = select %93, %94, %92 : index + %96 = load %3[%76, %81, %95] : memref<16x128x2xvector<8xf32>> + %97 = vector.extractelement %96[%c1_i64 : i64] : vector<8xf32> + %c16_56 = constant 16 : index + %c0_57 = constant 0 : index + %c-1_58 = constant -1 : index 
+ %98 = cmpi "slt", %arg5, %c0_57 : index + %99 = subi %c-1_58, %arg5 : index + %100 = select %98, %99, %arg5 : index + %101 = divi_signed %100, %c16_56 : index + %102 = subi %c-1_58, %101 : index + %103 = select %98, %102, %101 : index + %c16_59 = constant 16 : index + %104 = remi_signed %103, %c16_59 : index + %c0_60 = constant 0 : index + %105 = cmpi "slt", %104, %c0_60 : index + %106 = addi %104, %c16_59 : index + %107 = select %105, %106, %104 : index + %108 = addi %arg6, %arg8 : index + %c128_61 = constant 128 : index + %109 = remi_signed %108, %c128_61 : index + %c0_62 = constant 0 : index + %110 = cmpi "slt", %109, %c0_62 : index + %111 = addi %109, %c128_61 : index + %112 = select %110, %111, %109 : index + %c16_63 = constant 16 : index + %113 = remi_signed %arg5, %c16_63 : index + %c0_64 = constant 0 : index + %114 = cmpi "slt", %113, %c0_64 : index + %115 = addi %113, %c16_63 : index + %116 = select %114, %115, %113 : index + %c8_65 = constant 8 : index + %c0_66 = constant 0 : index + %c-1_67 = constant -1 : index + %117 = cmpi "slt", %116, %c0_66 : index + %118 = subi %c-1_67, %116 : index + %119 = select %117, %118, %116 : index + %120 = divi_signed %119, %c8_65 : index + %121 = subi %c-1_67, %120 : index + %122 = select %117, %121, %120 : index + %c2_68 = constant 2 : index + %123 = remi_signed %122, %c2_68 : index + %c0_69 = constant 0 : index + %124 = cmpi "slt", %123, %c0_69 : index + %125 = addi %123, %c2_68 : index + %126 = select %124, %125, %123 : index + %127 = load %3[%107, %112, %126] : memref<16x128x2xvector<8xf32>> + %128 = vector.extractelement %127[%c2_i64 : i64] : vector<8xf32> + %c16_70 = constant 16 : index + %c0_71 = constant 0 : index + %c-1_72 = constant -1 : index + %129 = cmpi "slt", %arg5, %c0_71 : index + %130 = subi %c-1_72, %arg5 : index + %131 = select %129, %130, %arg5 : index + %132 = divi_signed %131, %c16_70 : index + %133 = subi %c-1_72, %132 : index + %134 = select %129, %133, %132 : index + %c16_73 = constant 16 : index + %135 = remi_signed %134, %c16_73 : index + %c0_74 = constant 0 : index + %136 = cmpi "slt", %135, %c0_74 : index + %137 = addi %135, %c16_73 : index + %138 = select %136, %137, %135 : index + %139 = addi %arg6, %arg8 : index + %c128_75 = constant 128 : index + %140 = remi_signed %139, %c128_75 : index + %c0_76 = constant 0 : index + %141 = cmpi "slt", %140, %c0_76 : index + %142 = addi %140, %c128_75 : index + %143 = select %141, %142, %140 : index + %c16_77 = constant 16 : index + %144 = remi_signed %arg5, %c16_77 : index + %c0_78 = constant 0 : index + %145 = cmpi "slt", %144, %c0_78 : index + %146 = addi %144, %c16_77 : index + %147 = select %145, %146, %144 : index + %c8_79 = constant 8 : index + %c0_80 = constant 0 : index + %c-1_81 = constant -1 : index + %148 = cmpi "slt", %147, %c0_80 : index + %149 = subi %c-1_81, %147 : index + %150 = select %148, %149, %147 : index + %151 = divi_signed %150, %c8_79 : index + %152 = subi %c-1_81, %151 : index + %153 = select %148, %152, %151 : index + %c2_82 = constant 2 : index + %154 = remi_signed %153, %c2_82 : index + %c0_83 = constant 0 : index + %155 = cmpi "slt", %154, %c0_83 : index + %156 = addi %154, %c2_82 : index + %157 = select %155, %156, %154 : index + %158 = load %3[%138, %143, %157] : memref<16x128x2xvector<8xf32>> + %159 = vector.extractelement %158[%c3_i64 : i64] : vector<8xf32> + %c16_84 = constant 16 : index + %c0_85 = constant 0 : index + %c-1_86 = constant -1 : index + %160 = cmpi "slt", %arg5, %c0_85 : index + %161 = subi %c-1_86, %arg5 : index + %162 = 
select %160, %161, %arg5 : index + %163 = divi_signed %162, %c16_84 : index + %164 = subi %c-1_86, %163 : index + %165 = select %160, %164, %163 : index + %c16_87 = constant 16 : index + %166 = remi_signed %165, %c16_87 : index + %c0_88 = constant 0 : index + %167 = cmpi "slt", %166, %c0_88 : index + %168 = addi %166, %c16_87 : index + %169 = select %167, %168, %166 : index + %170 = addi %arg6, %arg8 : index + %c128_89 = constant 128 : index + %171 = remi_signed %170, %c128_89 : index + %c0_90 = constant 0 : index + %172 = cmpi "slt", %171, %c0_90 : index + %173 = addi %171, %c128_89 : index + %174 = select %172, %173, %171 : index + %c16_91 = constant 16 : index + %175 = remi_signed %arg5, %c16_91 : index + %c0_92 = constant 0 : index + %176 = cmpi "slt", %175, %c0_92 : index + %177 = addi %175, %c16_91 : index + %178 = select %176, %177, %175 : index + %c8_93 = constant 8 : index + %c0_94 = constant 0 : index + %c-1_95 = constant -1 : index + %179 = cmpi "slt", %178, %c0_94 : index + %180 = subi %c-1_95, %178 : index + %181 = select %179, %180, %178 : index + %182 = divi_signed %181, %c8_93 : index + %183 = subi %c-1_95, %182 : index + %184 = select %179, %183, %182 : index + %c2_96 = constant 2 : index + %185 = remi_signed %184, %c2_96 : index + %c0_97 = constant 0 : index + %186 = cmpi "slt", %185, %c0_97 : index + %187 = addi %185, %c2_96 : index + %188 = select %186, %187, %185 : index + %189 = load %3[%169, %174, %188] : memref<16x128x2xvector<8xf32>> + %190 = vector.extractelement %189[%c4_i64 : i64] : vector<8xf32> + %c16_98 = constant 16 : index + %c0_99 = constant 0 : index + %c-1_100 = constant -1 : index + %191 = cmpi "slt", %arg5, %c0_99 : index + %192 = subi %c-1_100, %arg5 : index + %193 = select %191, %192, %arg5 : index + %194 = divi_signed %193, %c16_98 : index + %195 = subi %c-1_100, %194 : index + %196 = select %191, %195, %194 : index + %c16_101 = constant 16 : index + %197 = remi_signed %196, %c16_101 : index + %c0_102 = constant 0 : index + %198 = cmpi "slt", %197, %c0_102 : index + %199 = addi %197, %c16_101 : index + %200 = select %198, %199, %197 : index + %201 = addi %arg6, %arg8 : index + %c128_103 = constant 128 : index + %202 = remi_signed %201, %c128_103 : index + %c0_104 = constant 0 : index + %203 = cmpi "slt", %202, %c0_104 : index + %204 = addi %202, %c128_103 : index + %205 = select %203, %204, %202 : index + %c16_105 = constant 16 : index + %206 = remi_signed %arg5, %c16_105 : index + %c0_106 = constant 0 : index + %207 = cmpi "slt", %206, %c0_106 : index + %208 = addi %206, %c16_105 : index + %209 = select %207, %208, %206 : index + %c8_107 = constant 8 : index + %c0_108 = constant 0 : index + %c-1_109 = constant -1 : index + %210 = cmpi "slt", %209, %c0_108 : index + %211 = subi %c-1_109, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c8_107 : index + %214 = subi %c-1_109, %213 : index + %215 = select %210, %214, %213 : index + %c2_110 = constant 2 : index + %216 = remi_signed %215, %c2_110 : index + %c0_111 = constant 0 : index + %217 = cmpi "slt", %216, %c0_111 : index + %218 = addi %216, %c2_110 : index + %219 = select %217, %218, %216 : index + %220 = load %3[%200, %205, %219] : memref<16x128x2xvector<8xf32>> + %221 = vector.extractelement %220[%c5_i64 : i64] : vector<8xf32> + %c16_112 = constant 16 : index + %c0_113 = constant 0 : index + %c-1_114 = constant -1 : index + %222 = cmpi "slt", %arg5, %c0_113 : index + %223 = subi %c-1_114, %arg5 : index + %224 = select %222, %223, %arg5 : index + %225 = divi_signed 
%224, %c16_112 : index + %226 = subi %c-1_114, %225 : index + %227 = select %222, %226, %225 : index + %c16_115 = constant 16 : index + %228 = remi_signed %227, %c16_115 : index + %c0_116 = constant 0 : index + %229 = cmpi "slt", %228, %c0_116 : index + %230 = addi %228, %c16_115 : index + %231 = select %229, %230, %228 : index + %232 = addi %arg6, %arg8 : index + %c128_117 = constant 128 : index + %233 = remi_signed %232, %c128_117 : index + %c0_118 = constant 0 : index + %234 = cmpi "slt", %233, %c0_118 : index + %235 = addi %233, %c128_117 : index + %236 = select %234, %235, %233 : index + %c16_119 = constant 16 : index + %237 = remi_signed %arg5, %c16_119 : index + %c0_120 = constant 0 : index + %238 = cmpi "slt", %237, %c0_120 : index + %239 = addi %237, %c16_119 : index + %240 = select %238, %239, %237 : index + %c8_121 = constant 8 : index + %c0_122 = constant 0 : index + %c-1_123 = constant -1 : index + %241 = cmpi "slt", %240, %c0_122 : index + %242 = subi %c-1_123, %240 : index + %243 = select %241, %242, %240 : index + %244 = divi_signed %243, %c8_121 : index + %245 = subi %c-1_123, %244 : index + %246 = select %241, %245, %244 : index + %c2_124 = constant 2 : index + %247 = remi_signed %246, %c2_124 : index + %c0_125 = constant 0 : index + %248 = cmpi "slt", %247, %c0_125 : index + %249 = addi %247, %c2_124 : index + %250 = select %248, %249, %247 : index + %251 = load %3[%231, %236, %250] : memref<16x128x2xvector<8xf32>> + %252 = vector.extractelement %251[%c6_i64 : i64] : vector<8xf32> + %c16_126 = constant 16 : index + %c0_127 = constant 0 : index + %c-1_128 = constant -1 : index + %253 = cmpi "slt", %arg5, %c0_127 : index + %254 = subi %c-1_128, %arg5 : index + %255 = select %253, %254, %arg5 : index + %256 = divi_signed %255, %c16_126 : index + %257 = subi %c-1_128, %256 : index + %258 = select %253, %257, %256 : index + %c16_129 = constant 16 : index + %259 = remi_signed %258, %c16_129 : index + %c0_130 = constant 0 : index + %260 = cmpi "slt", %259, %c0_130 : index + %261 = addi %259, %c16_129 : index + %262 = select %260, %261, %259 : index + %263 = addi %arg6, %arg8 : index + %c128_131 = constant 128 : index + %264 = remi_signed %263, %c128_131 : index + %c0_132 = constant 0 : index + %265 = cmpi "slt", %264, %c0_132 : index + %266 = addi %264, %c128_131 : index + %267 = select %265, %266, %264 : index + %c16_133 = constant 16 : index + %268 = remi_signed %arg5, %c16_133 : index + %c0_134 = constant 0 : index + %269 = cmpi "slt", %268, %c0_134 : index + %270 = addi %268, %c16_133 : index + %271 = select %269, %270, %268 : index + %c8_135 = constant 8 : index + %c0_136 = constant 0 : index + %c-1_137 = constant -1 : index + %272 = cmpi "slt", %271, %c0_136 : index + %273 = subi %c-1_137, %271 : index + %274 = select %272, %273, %271 : index + %275 = divi_signed %274, %c8_135 : index + %276 = subi %c-1_137, %275 : index + %277 = select %272, %276, %275 : index + %c2_138 = constant 2 : index + %278 = remi_signed %277, %c2_138 : index + %c0_139 = constant 0 : index + %279 = cmpi "slt", %278, %c0_139 : index + %280 = addi %278, %c2_138 : index + %281 = select %279, %280, %278 : index + %282 = load %3[%262, %267, %281] : memref<16x128x2xvector<8xf32>> + %283 = vector.extractelement %282[%c7_i64 : i64] : vector<8xf32> + %284 = "accv.bin_op"(%28, %66) {predicate = 2 : i64} : (f32, f32) -> f32 + %285 = "accv.bin_op"(%29, %97) {predicate = 2 : i64} : (f32, f32) -> f32 + %286 = "accv.bin_op"(%30, %128) {predicate = 2 : i64} : (f32, f32) -> f32 + %287 = "accv.bin_op"(%31, %159) 
{predicate = 2 : i64} : (f32, f32) -> f32 + %288 = "accv.bin_op"(%32, %190) {predicate = 2 : i64} : (f32, f32) -> f32 + %289 = "accv.bin_op"(%33, %221) {predicate = 2 : i64} : (f32, f32) -> f32 + %290 = "accv.bin_op"(%34, %252) {predicate = 2 : i64} : (f32, f32) -> f32 + %291 = "accv.bin_op"(%35, %283) {predicate = 2 : i64} : (f32, f32) -> f32 + %c16_140 = constant 16 : index + %c0_141 = constant 0 : index + %c-1_142 = constant -1 : index + %292 = cmpi "slt", %arg5, %c0_141 : index + %293 = subi %c-1_142, %arg5 : index + %294 = select %292, %293, %arg5 : index + %295 = divi_signed %294, %c16_140 : index + %296 = subi %c-1_142, %295 : index + %297 = select %292, %296, %295 : index + %c16_143 = constant 16 : index + %298 = remi_signed %297, %c16_143 : index + %c0_144 = constant 0 : index + %299 = cmpi "slt", %298, %c0_144 : index + %300 = addi %298, %c16_143 : index + %301 = select %299, %300, %298 : index + %302 = addi %arg7, %arg9 : index + %c6_145 = constant 6 : index + %303 = remi_signed %302, %c6_145 : index + %c0_146 = constant 0 : index + %304 = cmpi "slt", %303, %c0_146 : index + %305 = addi %303, %c6_145 : index + %306 = select %304, %305, %303 : index + %c16_147 = constant 16 : index + %307 = remi_signed %arg5, %c16_147 : index + %c0_148 = constant 0 : index + %308 = cmpi "slt", %307, %c0_148 : index + %309 = addi %307, %c16_147 : index + %310 = select %308, %309, %307 : index + %c8_149 = constant 8 : index + %c0_150 = constant 0 : index + %c-1_151 = constant -1 : index + %311 = cmpi "slt", %310, %c0_150 : index + %312 = subi %c-1_151, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c8_149 : index + %315 = subi %c-1_151, %314 : index + %316 = select %311, %315, %314 : index + %c2_152 = constant 2 : index + %317 = remi_signed %316, %c2_152 : index + %c0_153 = constant 0 : index + %318 = cmpi "slt", %317, %c0_153 : index + %319 = addi %317, %c2_152 : index + %320 = select %318, %319, %317 : index + %321 = load %2[%301, %306, %320] : memref<16x6x2xvector<8xf32>> + %322 = vector.extractelement %321[%c0_i64 : i64] : vector<8xf32> + %c16_154 = constant 16 : index + %c0_155 = constant 0 : index + %c-1_156 = constant -1 : index + %323 = cmpi "slt", %arg5, %c0_155 : index + %324 = subi %c-1_156, %arg5 : index + %325 = select %323, %324, %arg5 : index + %326 = divi_signed %325, %c16_154 : index + %327 = subi %c-1_156, %326 : index + %328 = select %323, %327, %326 : index + %c16_157 = constant 16 : index + %329 = remi_signed %328, %c16_157 : index + %c0_158 = constant 0 : index + %330 = cmpi "slt", %329, %c0_158 : index + %331 = addi %329, %c16_157 : index + %332 = select %330, %331, %329 : index + %333 = addi %arg7, %arg9 : index + %c6_159 = constant 6 : index + %334 = remi_signed %333, %c6_159 : index + %c0_160 = constant 0 : index + %335 = cmpi "slt", %334, %c0_160 : index + %336 = addi %334, %c6_159 : index + %337 = select %335, %336, %334 : index + %c16_161 = constant 16 : index + %338 = remi_signed %arg5, %c16_161 : index + %c0_162 = constant 0 : index + %339 = cmpi "slt", %338, %c0_162 : index + %340 = addi %338, %c16_161 : index + %341 = select %339, %340, %338 : index + %c8_163 = constant 8 : index + %c0_164 = constant 0 : index + %c-1_165 = constant -1 : index + %342 = cmpi "slt", %341, %c0_164 : index + %343 = subi %c-1_165, %341 : index + %344 = select %342, %343, %341 : index + %345 = divi_signed %344, %c8_163 : index + %346 = subi %c-1_165, %345 : index + %347 = select %342, %346, %345 : index + %c2_166 = constant 2 : index + %348 = remi_signed 
%347, %c2_166 : index + %c0_167 = constant 0 : index + %349 = cmpi "slt", %348, %c0_167 : index + %350 = addi %348, %c2_166 : index + %351 = select %349, %350, %348 : index + %352 = load %2[%332, %337, %351] : memref<16x6x2xvector<8xf32>> + %353 = vector.extractelement %352[%c1_i64 : i64] : vector<8xf32> + %c16_168 = constant 16 : index + %c0_169 = constant 0 : index + %c-1_170 = constant -1 : index + %354 = cmpi "slt", %arg5, %c0_169 : index + %355 = subi %c-1_170, %arg5 : index + %356 = select %354, %355, %arg5 : index + %357 = divi_signed %356, %c16_168 : index + %358 = subi %c-1_170, %357 : index + %359 = select %354, %358, %357 : index + %c16_171 = constant 16 : index + %360 = remi_signed %359, %c16_171 : index + %c0_172 = constant 0 : index + %361 = cmpi "slt", %360, %c0_172 : index + %362 = addi %360, %c16_171 : index + %363 = select %361, %362, %360 : index + %364 = addi %arg7, %arg9 : index + %c6_173 = constant 6 : index + %365 = remi_signed %364, %c6_173 : index + %c0_174 = constant 0 : index + %366 = cmpi "slt", %365, %c0_174 : index + %367 = addi %365, %c6_173 : index + %368 = select %366, %367, %365 : index + %c16_175 = constant 16 : index + %369 = remi_signed %arg5, %c16_175 : index + %c0_176 = constant 0 : index + %370 = cmpi "slt", %369, %c0_176 : index + %371 = addi %369, %c16_175 : index + %372 = select %370, %371, %369 : index + %c8_177 = constant 8 : index + %c0_178 = constant 0 : index + %c-1_179 = constant -1 : index + %373 = cmpi "slt", %372, %c0_178 : index + %374 = subi %c-1_179, %372 : index + %375 = select %373, %374, %372 : index + %376 = divi_signed %375, %c8_177 : index + %377 = subi %c-1_179, %376 : index + %378 = select %373, %377, %376 : index + %c2_180 = constant 2 : index + %379 = remi_signed %378, %c2_180 : index + %c0_181 = constant 0 : index + %380 = cmpi "slt", %379, %c0_181 : index + %381 = addi %379, %c2_180 : index + %382 = select %380, %381, %379 : index + %383 = load %2[%363, %368, %382] : memref<16x6x2xvector<8xf32>> + %384 = vector.extractelement %383[%c2_i64 : i64] : vector<8xf32> + %c16_182 = constant 16 : index + %c0_183 = constant 0 : index + %c-1_184 = constant -1 : index + %385 = cmpi "slt", %arg5, %c0_183 : index + %386 = subi %c-1_184, %arg5 : index + %387 = select %385, %386, %arg5 : index + %388 = divi_signed %387, %c16_182 : index + %389 = subi %c-1_184, %388 : index + %390 = select %385, %389, %388 : index + %c16_185 = constant 16 : index + %391 = remi_signed %390, %c16_185 : index + %c0_186 = constant 0 : index + %392 = cmpi "slt", %391, %c0_186 : index + %393 = addi %391, %c16_185 : index + %394 = select %392, %393, %391 : index + %395 = addi %arg7, %arg9 : index + %c6_187 = constant 6 : index + %396 = remi_signed %395, %c6_187 : index + %c0_188 = constant 0 : index + %397 = cmpi "slt", %396, %c0_188 : index + %398 = addi %396, %c6_187 : index + %399 = select %397, %398, %396 : index + %c16_189 = constant 16 : index + %400 = remi_signed %arg5, %c16_189 : index + %c0_190 = constant 0 : index + %401 = cmpi "slt", %400, %c0_190 : index + %402 = addi %400, %c16_189 : index + %403 = select %401, %402, %400 : index + %c8_191 = constant 8 : index + %c0_192 = constant 0 : index + %c-1_193 = constant -1 : index + %404 = cmpi "slt", %403, %c0_192 : index + %405 = subi %c-1_193, %403 : index + %406 = select %404, %405, %403 : index + %407 = divi_signed %406, %c8_191 : index + %408 = subi %c-1_193, %407 : index + %409 = select %404, %408, %407 : index + %c2_194 = constant 2 : index + %410 = remi_signed %409, %c2_194 : index + %c0_195 = 
constant 0 : index + %411 = cmpi "slt", %410, %c0_195 : index + %412 = addi %410, %c2_194 : index + %413 = select %411, %412, %410 : index + %414 = load %2[%394, %399, %413] : memref<16x6x2xvector<8xf32>> + %415 = vector.extractelement %414[%c3_i64 : i64] : vector<8xf32> + %c16_196 = constant 16 : index + %c0_197 = constant 0 : index + %c-1_198 = constant -1 : index + %416 = cmpi "slt", %arg5, %c0_197 : index + %417 = subi %c-1_198, %arg5 : index + %418 = select %416, %417, %arg5 : index + %419 = divi_signed %418, %c16_196 : index + %420 = subi %c-1_198, %419 : index + %421 = select %416, %420, %419 : index + %c16_199 = constant 16 : index + %422 = remi_signed %421, %c16_199 : index + %c0_200 = constant 0 : index + %423 = cmpi "slt", %422, %c0_200 : index + %424 = addi %422, %c16_199 : index + %425 = select %423, %424, %422 : index + %426 = addi %arg7, %arg9 : index + %c6_201 = constant 6 : index + %427 = remi_signed %426, %c6_201 : index + %c0_202 = constant 0 : index + %428 = cmpi "slt", %427, %c0_202 : index + %429 = addi %427, %c6_201 : index + %430 = select %428, %429, %427 : index + %c16_203 = constant 16 : index + %431 = remi_signed %arg5, %c16_203 : index + %c0_204 = constant 0 : index + %432 = cmpi "slt", %431, %c0_204 : index + %433 = addi %431, %c16_203 : index + %434 = select %432, %433, %431 : index + %c8_205 = constant 8 : index + %c0_206 = constant 0 : index + %c-1_207 = constant -1 : index + %435 = cmpi "slt", %434, %c0_206 : index + %436 = subi %c-1_207, %434 : index + %437 = select %435, %436, %434 : index + %438 = divi_signed %437, %c8_205 : index + %439 = subi %c-1_207, %438 : index + %440 = select %435, %439, %438 : index + %c2_208 = constant 2 : index + %441 = remi_signed %440, %c2_208 : index + %c0_209 = constant 0 : index + %442 = cmpi "slt", %441, %c0_209 : index + %443 = addi %441, %c2_208 : index + %444 = select %442, %443, %441 : index + %445 = load %2[%425, %430, %444] : memref<16x6x2xvector<8xf32>> + %446 = vector.extractelement %445[%c4_i64 : i64] : vector<8xf32> + %c16_210 = constant 16 : index + %c0_211 = constant 0 : index + %c-1_212 = constant -1 : index + %447 = cmpi "slt", %arg5, %c0_211 : index + %448 = subi %c-1_212, %arg5 : index + %449 = select %447, %448, %arg5 : index + %450 = divi_signed %449, %c16_210 : index + %451 = subi %c-1_212, %450 : index + %452 = select %447, %451, %450 : index + %c16_213 = constant 16 : index + %453 = remi_signed %452, %c16_213 : index + %c0_214 = constant 0 : index + %454 = cmpi "slt", %453, %c0_214 : index + %455 = addi %453, %c16_213 : index + %456 = select %454, %455, %453 : index + %457 = addi %arg7, %arg9 : index + %c6_215 = constant 6 : index + %458 = remi_signed %457, %c6_215 : index + %c0_216 = constant 0 : index + %459 = cmpi "slt", %458, %c0_216 : index + %460 = addi %458, %c6_215 : index + %461 = select %459, %460, %458 : index + %c16_217 = constant 16 : index + %462 = remi_signed %arg5, %c16_217 : index + %c0_218 = constant 0 : index + %463 = cmpi "slt", %462, %c0_218 : index + %464 = addi %462, %c16_217 : index + %465 = select %463, %464, %462 : index + %c8_219 = constant 8 : index + %c0_220 = constant 0 : index + %c-1_221 = constant -1 : index + %466 = cmpi "slt", %465, %c0_220 : index + %467 = subi %c-1_221, %465 : index + %468 = select %466, %467, %465 : index + %469 = divi_signed %468, %c8_219 : index + %470 = subi %c-1_221, %469 : index + %471 = select %466, %470, %469 : index + %c2_222 = constant 2 : index + %472 = remi_signed %471, %c2_222 : index + %c0_223 = constant 0 : index + %473 = cmpi "slt", 
%472, %c0_223 : index + %474 = addi %472, %c2_222 : index + %475 = select %473, %474, %472 : index + %476 = load %2[%456, %461, %475] : memref<16x6x2xvector<8xf32>> + %477 = vector.extractelement %476[%c5_i64 : i64] : vector<8xf32> + %c16_224 = constant 16 : index + %c0_225 = constant 0 : index + %c-1_226 = constant -1 : index + %478 = cmpi "slt", %arg5, %c0_225 : index + %479 = subi %c-1_226, %arg5 : index + %480 = select %478, %479, %arg5 : index + %481 = divi_signed %480, %c16_224 : index + %482 = subi %c-1_226, %481 : index + %483 = select %478, %482, %481 : index + %c16_227 = constant 16 : index + %484 = remi_signed %483, %c16_227 : index + %c0_228 = constant 0 : index + %485 = cmpi "slt", %484, %c0_228 : index + %486 = addi %484, %c16_227 : index + %487 = select %485, %486, %484 : index + %488 = addi %arg7, %arg9 : index + %c6_229 = constant 6 : index + %489 = remi_signed %488, %c6_229 : index + %c0_230 = constant 0 : index + %490 = cmpi "slt", %489, %c0_230 : index + %491 = addi %489, %c6_229 : index + %492 = select %490, %491, %489 : index + %c16_231 = constant 16 : index + %493 = remi_signed %arg5, %c16_231 : index + %c0_232 = constant 0 : index + %494 = cmpi "slt", %493, %c0_232 : index + %495 = addi %493, %c16_231 : index + %496 = select %494, %495, %493 : index + %c8_233 = constant 8 : index + %c0_234 = constant 0 : index + %c-1_235 = constant -1 : index + %497 = cmpi "slt", %496, %c0_234 : index + %498 = subi %c-1_235, %496 : index + %499 = select %497, %498, %496 : index + %500 = divi_signed %499, %c8_233 : index + %501 = subi %c-1_235, %500 : index + %502 = select %497, %501, %500 : index + %c2_236 = constant 2 : index + %503 = remi_signed %502, %c2_236 : index + %c0_237 = constant 0 : index + %504 = cmpi "slt", %503, %c0_237 : index + %505 = addi %503, %c2_236 : index + %506 = select %504, %505, %503 : index + %507 = load %2[%487, %492, %506] : memref<16x6x2xvector<8xf32>> + %508 = vector.extractelement %507[%c6_i64 : i64] : vector<8xf32> + %c16_238 = constant 16 : index + %c0_239 = constant 0 : index + %c-1_240 = constant -1 : index + %509 = cmpi "slt", %arg5, %c0_239 : index + %510 = subi %c-1_240, %arg5 : index + %511 = select %509, %510, %arg5 : index + %512 = divi_signed %511, %c16_238 : index + %513 = subi %c-1_240, %512 : index + %514 = select %509, %513, %512 : index + %c16_241 = constant 16 : index + %515 = remi_signed %514, %c16_241 : index + %c0_242 = constant 0 : index + %516 = cmpi "slt", %515, %c0_242 : index + %517 = addi %515, %c16_241 : index + %518 = select %516, %517, %515 : index + %519 = addi %arg7, %arg9 : index + %c6_243 = constant 6 : index + %520 = remi_signed %519, %c6_243 : index + %c0_244 = constant 0 : index + %521 = cmpi "slt", %520, %c0_244 : index + %522 = addi %520, %c6_243 : index + %523 = select %521, %522, %520 : index + %c16_245 = constant 16 : index + %524 = remi_signed %arg5, %c16_245 : index + %c0_246 = constant 0 : index + %525 = cmpi "slt", %524, %c0_246 : index + %526 = addi %524, %c16_245 : index + %527 = select %525, %526, %524 : index + %c8_247 = constant 8 : index + %c0_248 = constant 0 : index + %c-1_249 = constant -1 : index + %528 = cmpi "slt", %527, %c0_248 : index + %529 = subi %c-1_249, %527 : index + %530 = select %528, %529, %527 : index + %531 = divi_signed %530, %c8_247 : index + %532 = subi %c-1_249, %531 : index + %533 = select %528, %532, %531 : index + %c2_250 = constant 2 : index + %534 = remi_signed %533, %c2_250 : index + %c0_251 = constant 0 : index + %535 = cmpi "slt", %534, %c0_251 : index + %536 = addi 
%534, %c2_250 : index + %537 = select %535, %536, %534 : index + %538 = load %2[%518, %523, %537] : memref<16x6x2xvector<8xf32>> + %539 = vector.extractelement %538[%c7_i64 : i64] : vector<8xf32> + %540 = "accv.bin_op"(%322, %284) {predicate = 0 : i64} : (f32, f32) -> f32 + %541 = "accv.bin_op"(%353, %285) {predicate = 0 : i64} : (f32, f32) -> f32 + %542 = "accv.bin_op"(%384, %286) {predicate = 0 : i64} : (f32, f32) -> f32 + %543 = "accv.bin_op"(%415, %287) {predicate = 0 : i64} : (f32, f32) -> f32 + %544 = "accv.bin_op"(%446, %288) {predicate = 0 : i64} : (f32, f32) -> f32 + %545 = "accv.bin_op"(%477, %289) {predicate = 0 : i64} : (f32, f32) -> f32 + %546 = "accv.bin_op"(%508, %290) {predicate = 0 : i64} : (f32, f32) -> f32 + %547 = "accv.bin_op"(%539, %291) {predicate = 0 : i64} : (f32, f32) -> f32 + %c16_252 = constant 16 : index + %c0_253 = constant 0 : index + %c-1_254 = constant -1 : index + %548 = cmpi "slt", %arg5, %c0_253 : index + %549 = subi %c-1_254, %arg5 : index + %550 = select %548, %549, %arg5 : index + %551 = divi_signed %550, %c16_252 : index + %552 = subi %c-1_254, %551 : index + %553 = select %548, %552, %551 : index + %c16_255 = constant 16 : index + %554 = remi_signed %553, %c16_255 : index + %c0_256 = constant 0 : index + %555 = cmpi "slt", %554, %c0_256 : index + %556 = addi %554, %c16_255 : index + %557 = select %555, %556, %554 : index + %558 = addi %arg7, %arg9 : index + %c6_257 = constant 6 : index + %559 = remi_signed %558, %c6_257 : index + %c0_258 = constant 0 : index + %560 = cmpi "slt", %559, %c0_258 : index + %561 = addi %559, %c6_257 : index + %562 = select %560, %561, %559 : index + %c16_259 = constant 16 : index + %563 = remi_signed %arg5, %c16_259 : index + %c0_260 = constant 0 : index + %564 = cmpi "slt", %563, %c0_260 : index + %565 = addi %563, %c16_259 : index + %566 = select %564, %565, %563 : index + %c8_261 = constant 8 : index + %c0_262 = constant 0 : index + %c-1_263 = constant -1 : index + %567 = cmpi "slt", %566, %c0_262 : index + %568 = subi %c-1_263, %566 : index + %569 = select %567, %568, %566 : index + %570 = divi_signed %569, %c8_261 : index + %571 = subi %c-1_263, %570 : index + %572 = select %567, %571, %570 : index + %c2_264 = constant 2 : index + %573 = remi_signed %572, %c2_264 : index + %c0_265 = constant 0 : index + %574 = cmpi "slt", %573, %c0_265 : index + %575 = addi %573, %c2_264 : index + %576 = select %574, %575, %573 : index + %577 = load %2[%557, %562, %576] : memref<16x6x2xvector<8xf32>> + %578 = vector.insertelement %540, %577[%c0_i64 : i64] : vector<8xf32> + %c16_266 = constant 16 : index + %c0_267 = constant 0 : index + %c-1_268 = constant -1 : index + %579 = cmpi "slt", %arg5, %c0_267 : index + %580 = subi %c-1_268, %arg5 : index + %581 = select %579, %580, %arg5 : index + %582 = divi_signed %581, %c16_266 : index + %583 = subi %c-1_268, %582 : index + %584 = select %579, %583, %582 : index + %c16_269 = constant 16 : index + %585 = remi_signed %584, %c16_269 : index + %c0_270 = constant 0 : index + %586 = cmpi "slt", %585, %c0_270 : index + %587 = addi %585, %c16_269 : index + %588 = select %586, %587, %585 : index + %589 = addi %arg7, %arg9 : index + %c6_271 = constant 6 : index + %590 = remi_signed %589, %c6_271 : index + %c0_272 = constant 0 : index + %591 = cmpi "slt", %590, %c0_272 : index + %592 = addi %590, %c6_271 : index + %593 = select %591, %592, %590 : index + %c16_273 = constant 16 : index + %594 = remi_signed %arg5, %c16_273 : index + %c0_274 = constant 0 : index + %595 = cmpi "slt", %594, %c0_274 : 
index + %596 = addi %594, %c16_273 : index + %597 = select %595, %596, %594 : index + %c8_275 = constant 8 : index + %c0_276 = constant 0 : index + %c-1_277 = constant -1 : index + %598 = cmpi "slt", %597, %c0_276 : index + %599 = subi %c-1_277, %597 : index + %600 = select %598, %599, %597 : index + %601 = divi_signed %600, %c8_275 : index + %602 = subi %c-1_277, %601 : index + %603 = select %598, %602, %601 : index + %c2_278 = constant 2 : index + %604 = remi_signed %603, %c2_278 : index + %c0_279 = constant 0 : index + %605 = cmpi "slt", %604, %c0_279 : index + %606 = addi %604, %c2_278 : index + %607 = select %605, %606, %604 : index + store %578, %2[%588, %593, %607] : memref<16x6x2xvector<8xf32>> + %c16_280 = constant 16 : index + %c0_281 = constant 0 : index + %c-1_282 = constant -1 : index + %608 = cmpi "slt", %arg5, %c0_281 : index + %609 = subi %c-1_282, %arg5 : index + %610 = select %608, %609, %arg5 : index + %611 = divi_signed %610, %c16_280 : index + %612 = subi %c-1_282, %611 : index + %613 = select %608, %612, %611 : index + %c16_283 = constant 16 : index + %614 = remi_signed %613, %c16_283 : index + %c0_284 = constant 0 : index + %615 = cmpi "slt", %614, %c0_284 : index + %616 = addi %614, %c16_283 : index + %617 = select %615, %616, %614 : index + %618 = addi %arg7, %arg9 : index + %c6_285 = constant 6 : index + %619 = remi_signed %618, %c6_285 : index + %c0_286 = constant 0 : index + %620 = cmpi "slt", %619, %c0_286 : index + %621 = addi %619, %c6_285 : index + %622 = select %620, %621, %619 : index + %c16_287 = constant 16 : index + %623 = remi_signed %arg5, %c16_287 : index + %c0_288 = constant 0 : index + %624 = cmpi "slt", %623, %c0_288 : index + %625 = addi %623, %c16_287 : index + %626 = select %624, %625, %623 : index + %c8_289 = constant 8 : index + %c0_290 = constant 0 : index + %c-1_291 = constant -1 : index + %627 = cmpi "slt", %626, %c0_290 : index + %628 = subi %c-1_291, %626 : index + %629 = select %627, %628, %626 : index + %630 = divi_signed %629, %c8_289 : index + %631 = subi %c-1_291, %630 : index + %632 = select %627, %631, %630 : index + %c2_292 = constant 2 : index + %633 = remi_signed %632, %c2_292 : index + %c0_293 = constant 0 : index + %634 = cmpi "slt", %633, %c0_293 : index + %635 = addi %633, %c2_292 : index + %636 = select %634, %635, %633 : index + %637 = load %2[%617, %622, %636] : memref<16x6x2xvector<8xf32>> + %638 = vector.insertelement %541, %637[%c1_i64 : i64] : vector<8xf32> + %c16_294 = constant 16 : index + %c0_295 = constant 0 : index + %c-1_296 = constant -1 : index + %639 = cmpi "slt", %arg5, %c0_295 : index + %640 = subi %c-1_296, %arg5 : index + %641 = select %639, %640, %arg5 : index + %642 = divi_signed %641, %c16_294 : index + %643 = subi %c-1_296, %642 : index + %644 = select %639, %643, %642 : index + %c16_297 = constant 16 : index + %645 = remi_signed %644, %c16_297 : index + %c0_298 = constant 0 : index + %646 = cmpi "slt", %645, %c0_298 : index + %647 = addi %645, %c16_297 : index + %648 = select %646, %647, %645 : index + %649 = addi %arg7, %arg9 : index + %c6_299 = constant 6 : index + %650 = remi_signed %649, %c6_299 : index + %c0_300 = constant 0 : index + %651 = cmpi "slt", %650, %c0_300 : index + %652 = addi %650, %c6_299 : index + %653 = select %651, %652, %650 : index + %c16_301 = constant 16 : index + %654 = remi_signed %arg5, %c16_301 : index + %c0_302 = constant 0 : index + %655 = cmpi "slt", %654, %c0_302 : index + %656 = addi %654, %c16_301 : index + %657 = select %655, %656, %654 : index + %c8_303 = 
constant 8 : index + %c0_304 = constant 0 : index + %c-1_305 = constant -1 : index + %658 = cmpi "slt", %657, %c0_304 : index + %659 = subi %c-1_305, %657 : index + %660 = select %658, %659, %657 : index + %661 = divi_signed %660, %c8_303 : index + %662 = subi %c-1_305, %661 : index + %663 = select %658, %662, %661 : index + %c2_306 = constant 2 : index + %664 = remi_signed %663, %c2_306 : index + %c0_307 = constant 0 : index + %665 = cmpi "slt", %664, %c0_307 : index + %666 = addi %664, %c2_306 : index + %667 = select %665, %666, %664 : index + store %638, %2[%648, %653, %667] : memref<16x6x2xvector<8xf32>> + %c16_308 = constant 16 : index + %c0_309 = constant 0 : index + %c-1_310 = constant -1 : index + %668 = cmpi "slt", %arg5, %c0_309 : index + %669 = subi %c-1_310, %arg5 : index + %670 = select %668, %669, %arg5 : index + %671 = divi_signed %670, %c16_308 : index + %672 = subi %c-1_310, %671 : index + %673 = select %668, %672, %671 : index + %c16_311 = constant 16 : index + %674 = remi_signed %673, %c16_311 : index + %c0_312 = constant 0 : index + %675 = cmpi "slt", %674, %c0_312 : index + %676 = addi %674, %c16_311 : index + %677 = select %675, %676, %674 : index + %678 = addi %arg7, %arg9 : index + %c6_313 = constant 6 : index + %679 = remi_signed %678, %c6_313 : index + %c0_314 = constant 0 : index + %680 = cmpi "slt", %679, %c0_314 : index + %681 = addi %679, %c6_313 : index + %682 = select %680, %681, %679 : index + %c16_315 = constant 16 : index + %683 = remi_signed %arg5, %c16_315 : index + %c0_316 = constant 0 : index + %684 = cmpi "slt", %683, %c0_316 : index + %685 = addi %683, %c16_315 : index + %686 = select %684, %685, %683 : index + %c8_317 = constant 8 : index + %c0_318 = constant 0 : index + %c-1_319 = constant -1 : index + %687 = cmpi "slt", %686, %c0_318 : index + %688 = subi %c-1_319, %686 : index + %689 = select %687, %688, %686 : index + %690 = divi_signed %689, %c8_317 : index + %691 = subi %c-1_319, %690 : index + %692 = select %687, %691, %690 : index + %c2_320 = constant 2 : index + %693 = remi_signed %692, %c2_320 : index + %c0_321 = constant 0 : index + %694 = cmpi "slt", %693, %c0_321 : index + %695 = addi %693, %c2_320 : index + %696 = select %694, %695, %693 : index + %697 = load %2[%677, %682, %696] : memref<16x6x2xvector<8xf32>> + %698 = vector.insertelement %542, %697[%c2_i64 : i64] : vector<8xf32> + %c16_322 = constant 16 : index + %c0_323 = constant 0 : index + %c-1_324 = constant -1 : index + %699 = cmpi "slt", %arg5, %c0_323 : index + %700 = subi %c-1_324, %arg5 : index + %701 = select %699, %700, %arg5 : index + %702 = divi_signed %701, %c16_322 : index + %703 = subi %c-1_324, %702 : index + %704 = select %699, %703, %702 : index + %c16_325 = constant 16 : index + %705 = remi_signed %704, %c16_325 : index + %c0_326 = constant 0 : index + %706 = cmpi "slt", %705, %c0_326 : index + %707 = addi %705, %c16_325 : index + %708 = select %706, %707, %705 : index + %709 = addi %arg7, %arg9 : index + %c6_327 = constant 6 : index + %710 = remi_signed %709, %c6_327 : index + %c0_328 = constant 0 : index + %711 = cmpi "slt", %710, %c0_328 : index + %712 = addi %710, %c6_327 : index + %713 = select %711, %712, %710 : index + %c16_329 = constant 16 : index + %714 = remi_signed %arg5, %c16_329 : index + %c0_330 = constant 0 : index + %715 = cmpi "slt", %714, %c0_330 : index + %716 = addi %714, %c16_329 : index + %717 = select %715, %716, %714 : index + %c8_331 = constant 8 : index + %c0_332 = constant 0 : index + %c-1_333 = constant -1 : index + %718 = cmpi 
"slt", %717, %c0_332 : index + %719 = subi %c-1_333, %717 : index + %720 = select %718, %719, %717 : index + %721 = divi_signed %720, %c8_331 : index + %722 = subi %c-1_333, %721 : index + %723 = select %718, %722, %721 : index + %c2_334 = constant 2 : index + %724 = remi_signed %723, %c2_334 : index + %c0_335 = constant 0 : index + %725 = cmpi "slt", %724, %c0_335 : index + %726 = addi %724, %c2_334 : index + %727 = select %725, %726, %724 : index + store %698, %2[%708, %713, %727] : memref<16x6x2xvector<8xf32>> + %c16_336 = constant 16 : index + %c0_337 = constant 0 : index + %c-1_338 = constant -1 : index + %728 = cmpi "slt", %arg5, %c0_337 : index + %729 = subi %c-1_338, %arg5 : index + %730 = select %728, %729, %arg5 : index + %731 = divi_signed %730, %c16_336 : index + %732 = subi %c-1_338, %731 : index + %733 = select %728, %732, %731 : index + %c16_339 = constant 16 : index + %734 = remi_signed %733, %c16_339 : index + %c0_340 = constant 0 : index + %735 = cmpi "slt", %734, %c0_340 : index + %736 = addi %734, %c16_339 : index + %737 = select %735, %736, %734 : index + %738 = addi %arg7, %arg9 : index + %c6_341 = constant 6 : index + %739 = remi_signed %738, %c6_341 : index + %c0_342 = constant 0 : index + %740 = cmpi "slt", %739, %c0_342 : index + %741 = addi %739, %c6_341 : index + %742 = select %740, %741, %739 : index + %c16_343 = constant 16 : index + %743 = remi_signed %arg5, %c16_343 : index + %c0_344 = constant 0 : index + %744 = cmpi "slt", %743, %c0_344 : index + %745 = addi %743, %c16_343 : index + %746 = select %744, %745, %743 : index + %c8_345 = constant 8 : index + %c0_346 = constant 0 : index + %c-1_347 = constant -1 : index + %747 = cmpi "slt", %746, %c0_346 : index + %748 = subi %c-1_347, %746 : index + %749 = select %747, %748, %746 : index + %750 = divi_signed %749, %c8_345 : index + %751 = subi %c-1_347, %750 : index + %752 = select %747, %751, %750 : index + %c2_348 = constant 2 : index + %753 = remi_signed %752, %c2_348 : index + %c0_349 = constant 0 : index + %754 = cmpi "slt", %753, %c0_349 : index + %755 = addi %753, %c2_348 : index + %756 = select %754, %755, %753 : index + %757 = load %2[%737, %742, %756] : memref<16x6x2xvector<8xf32>> + %758 = vector.insertelement %543, %757[%c3_i64 : i64] : vector<8xf32> + %c16_350 = constant 16 : index + %c0_351 = constant 0 : index + %c-1_352 = constant -1 : index + %759 = cmpi "slt", %arg5, %c0_351 : index + %760 = subi %c-1_352, %arg5 : index + %761 = select %759, %760, %arg5 : index + %762 = divi_signed %761, %c16_350 : index + %763 = subi %c-1_352, %762 : index + %764 = select %759, %763, %762 : index + %c16_353 = constant 16 : index + %765 = remi_signed %764, %c16_353 : index + %c0_354 = constant 0 : index + %766 = cmpi "slt", %765, %c0_354 : index + %767 = addi %765, %c16_353 : index + %768 = select %766, %767, %765 : index + %769 = addi %arg7, %arg9 : index + %c6_355 = constant 6 : index + %770 = remi_signed %769, %c6_355 : index + %c0_356 = constant 0 : index + %771 = cmpi "slt", %770, %c0_356 : index + %772 = addi %770, %c6_355 : index + %773 = select %771, %772, %770 : index + %c16_357 = constant 16 : index + %774 = remi_signed %arg5, %c16_357 : index + %c0_358 = constant 0 : index + %775 = cmpi "slt", %774, %c0_358 : index + %776 = addi %774, %c16_357 : index + %777 = select %775, %776, %774 : index + %c8_359 = constant 8 : index + %c0_360 = constant 0 : index + %c-1_361 = constant -1 : index + %778 = cmpi "slt", %777, %c0_360 : index + %779 = subi %c-1_361, %777 : index + %780 = select %778, %779, %777 : 
index + %781 = divi_signed %780, %c8_359 : index + %782 = subi %c-1_361, %781 : index + %783 = select %778, %782, %781 : index + %c2_362 = constant 2 : index + %784 = remi_signed %783, %c2_362 : index + %c0_363 = constant 0 : index + %785 = cmpi "slt", %784, %c0_363 : index + %786 = addi %784, %c2_362 : index + %787 = select %785, %786, %784 : index + store %758, %2[%768, %773, %787] : memref<16x6x2xvector<8xf32>> + %c16_364 = constant 16 : index + %c0_365 = constant 0 : index + %c-1_366 = constant -1 : index + %788 = cmpi "slt", %arg5, %c0_365 : index + %789 = subi %c-1_366, %arg5 : index + %790 = select %788, %789, %arg5 : index + %791 = divi_signed %790, %c16_364 : index + %792 = subi %c-1_366, %791 : index + %793 = select %788, %792, %791 : index + %c16_367 = constant 16 : index + %794 = remi_signed %793, %c16_367 : index + %c0_368 = constant 0 : index + %795 = cmpi "slt", %794, %c0_368 : index + %796 = addi %794, %c16_367 : index + %797 = select %795, %796, %794 : index + %798 = addi %arg7, %arg9 : index + %c6_369 = constant 6 : index + %799 = remi_signed %798, %c6_369 : index + %c0_370 = constant 0 : index + %800 = cmpi "slt", %799, %c0_370 : index + %801 = addi %799, %c6_369 : index + %802 = select %800, %801, %799 : index + %c16_371 = constant 16 : index + %803 = remi_signed %arg5, %c16_371 : index + %c0_372 = constant 0 : index + %804 = cmpi "slt", %803, %c0_372 : index + %805 = addi %803, %c16_371 : index + %806 = select %804, %805, %803 : index + %c8_373 = constant 8 : index + %c0_374 = constant 0 : index + %c-1_375 = constant -1 : index + %807 = cmpi "slt", %806, %c0_374 : index + %808 = subi %c-1_375, %806 : index + %809 = select %807, %808, %806 : index + %810 = divi_signed %809, %c8_373 : index + %811 = subi %c-1_375, %810 : index + %812 = select %807, %811, %810 : index + %c2_376 = constant 2 : index + %813 = remi_signed %812, %c2_376 : index + %c0_377 = constant 0 : index + %814 = cmpi "slt", %813, %c0_377 : index + %815 = addi %813, %c2_376 : index + %816 = select %814, %815, %813 : index + %817 = load %2[%797, %802, %816] : memref<16x6x2xvector<8xf32>> + %818 = vector.insertelement %544, %817[%c4_i64 : i64] : vector<8xf32> + %c16_378 = constant 16 : index + %c0_379 = constant 0 : index + %c-1_380 = constant -1 : index + %819 = cmpi "slt", %arg5, %c0_379 : index + %820 = subi %c-1_380, %arg5 : index + %821 = select %819, %820, %arg5 : index + %822 = divi_signed %821, %c16_378 : index + %823 = subi %c-1_380, %822 : index + %824 = select %819, %823, %822 : index + %c16_381 = constant 16 : index + %825 = remi_signed %824, %c16_381 : index + %c0_382 = constant 0 : index + %826 = cmpi "slt", %825, %c0_382 : index + %827 = addi %825, %c16_381 : index + %828 = select %826, %827, %825 : index + %829 = addi %arg7, %arg9 : index + %c6_383 = constant 6 : index + %830 = remi_signed %829, %c6_383 : index + %c0_384 = constant 0 : index + %831 = cmpi "slt", %830, %c0_384 : index + %832 = addi %830, %c6_383 : index + %833 = select %831, %832, %830 : index + %c16_385 = constant 16 : index + %834 = remi_signed %arg5, %c16_385 : index + %c0_386 = constant 0 : index + %835 = cmpi "slt", %834, %c0_386 : index + %836 = addi %834, %c16_385 : index + %837 = select %835, %836, %834 : index + %c8_387 = constant 8 : index + %c0_388 = constant 0 : index + %c-1_389 = constant -1 : index + %838 = cmpi "slt", %837, %c0_388 : index + %839 = subi %c-1_389, %837 : index + %840 = select %838, %839, %837 : index + %841 = divi_signed %840, %c8_387 : index + %842 = subi %c-1_389, %841 : index + %843 = select 
%838, %842, %841 : index + %c2_390 = constant 2 : index + %844 = remi_signed %843, %c2_390 : index + %c0_391 = constant 0 : index + %845 = cmpi "slt", %844, %c0_391 : index + %846 = addi %844, %c2_390 : index + %847 = select %845, %846, %844 : index + store %818, %2[%828, %833, %847] : memref<16x6x2xvector<8xf32>> + %c16_392 = constant 16 : index + %c0_393 = constant 0 : index + %c-1_394 = constant -1 : index + %848 = cmpi "slt", %arg5, %c0_393 : index + %849 = subi %c-1_394, %arg5 : index + %850 = select %848, %849, %arg5 : index + %851 = divi_signed %850, %c16_392 : index + %852 = subi %c-1_394, %851 : index + %853 = select %848, %852, %851 : index + %c16_395 = constant 16 : index + %854 = remi_signed %853, %c16_395 : index + %c0_396 = constant 0 : index + %855 = cmpi "slt", %854, %c0_396 : index + %856 = addi %854, %c16_395 : index + %857 = select %855, %856, %854 : index + %858 = addi %arg7, %arg9 : index + %c6_397 = constant 6 : index + %859 = remi_signed %858, %c6_397 : index + %c0_398 = constant 0 : index + %860 = cmpi "slt", %859, %c0_398 : index + %861 = addi %859, %c6_397 : index + %862 = select %860, %861, %859 : index + %c16_399 = constant 16 : index + %863 = remi_signed %arg5, %c16_399 : index + %c0_400 = constant 0 : index + %864 = cmpi "slt", %863, %c0_400 : index + %865 = addi %863, %c16_399 : index + %866 = select %864, %865, %863 : index + %c8_401 = constant 8 : index + %c0_402 = constant 0 : index + %c-1_403 = constant -1 : index + %867 = cmpi "slt", %866, %c0_402 : index + %868 = subi %c-1_403, %866 : index + %869 = select %867, %868, %866 : index + %870 = divi_signed %869, %c8_401 : index + %871 = subi %c-1_403, %870 : index + %872 = select %867, %871, %870 : index + %c2_404 = constant 2 : index + %873 = remi_signed %872, %c2_404 : index + %c0_405 = constant 0 : index + %874 = cmpi "slt", %873, %c0_405 : index + %875 = addi %873, %c2_404 : index + %876 = select %874, %875, %873 : index + %877 = load %2[%857, %862, %876] : memref<16x6x2xvector<8xf32>> + %878 = vector.insertelement %545, %877[%c5_i64 : i64] : vector<8xf32> + %c16_406 = constant 16 : index + %c0_407 = constant 0 : index + %c-1_408 = constant -1 : index + %879 = cmpi "slt", %arg5, %c0_407 : index + %880 = subi %c-1_408, %arg5 : index + %881 = select %879, %880, %arg5 : index + %882 = divi_signed %881, %c16_406 : index + %883 = subi %c-1_408, %882 : index + %884 = select %879, %883, %882 : index + %c16_409 = constant 16 : index + %885 = remi_signed %884, %c16_409 : index + %c0_410 = constant 0 : index + %886 = cmpi "slt", %885, %c0_410 : index + %887 = addi %885, %c16_409 : index + %888 = select %886, %887, %885 : index + %889 = addi %arg7, %arg9 : index + %c6_411 = constant 6 : index + %890 = remi_signed %889, %c6_411 : index + %c0_412 = constant 0 : index + %891 = cmpi "slt", %890, %c0_412 : index + %892 = addi %890, %c6_411 : index + %893 = select %891, %892, %890 : index + %c16_413 = constant 16 : index + %894 = remi_signed %arg5, %c16_413 : index + %c0_414 = constant 0 : index + %895 = cmpi "slt", %894, %c0_414 : index + %896 = addi %894, %c16_413 : index + %897 = select %895, %896, %894 : index + %c8_415 = constant 8 : index + %c0_416 = constant 0 : index + %c-1_417 = constant -1 : index + %898 = cmpi "slt", %897, %c0_416 : index + %899 = subi %c-1_417, %897 : index + %900 = select %898, %899, %897 : index + %901 = divi_signed %900, %c8_415 : index + %902 = subi %c-1_417, %901 : index + %903 = select %898, %902, %901 : index + %c2_418 = constant 2 : index + %904 = remi_signed %903, %c2_418 : index + 
%c0_419 = constant 0 : index + %905 = cmpi "slt", %904, %c0_419 : index + %906 = addi %904, %c2_418 : index + %907 = select %905, %906, %904 : index + store %878, %2[%888, %893, %907] : memref<16x6x2xvector<8xf32>> + %c16_420 = constant 16 : index + %c0_421 = constant 0 : index + %c-1_422 = constant -1 : index + %908 = cmpi "slt", %arg5, %c0_421 : index + %909 = subi %c-1_422, %arg5 : index + %910 = select %908, %909, %arg5 : index + %911 = divi_signed %910, %c16_420 : index + %912 = subi %c-1_422, %911 : index + %913 = select %908, %912, %911 : index + %c16_423 = constant 16 : index + %914 = remi_signed %913, %c16_423 : index + %c0_424 = constant 0 : index + %915 = cmpi "slt", %914, %c0_424 : index + %916 = addi %914, %c16_423 : index + %917 = select %915, %916, %914 : index + %918 = addi %arg7, %arg9 : index + %c6_425 = constant 6 : index + %919 = remi_signed %918, %c6_425 : index + %c0_426 = constant 0 : index + %920 = cmpi "slt", %919, %c0_426 : index + %921 = addi %919, %c6_425 : index + %922 = select %920, %921, %919 : index + %c16_427 = constant 16 : index + %923 = remi_signed %arg5, %c16_427 : index + %c0_428 = constant 0 : index + %924 = cmpi "slt", %923, %c0_428 : index + %925 = addi %923, %c16_427 : index + %926 = select %924, %925, %923 : index + %c8_429 = constant 8 : index + %c0_430 = constant 0 : index + %c-1_431 = constant -1 : index + %927 = cmpi "slt", %926, %c0_430 : index + %928 = subi %c-1_431, %926 : index + %929 = select %927, %928, %926 : index + %930 = divi_signed %929, %c8_429 : index + %931 = subi %c-1_431, %930 : index + %932 = select %927, %931, %930 : index + %c2_432 = constant 2 : index + %933 = remi_signed %932, %c2_432 : index + %c0_433 = constant 0 : index + %934 = cmpi "slt", %933, %c0_433 : index + %935 = addi %933, %c2_432 : index + %936 = select %934, %935, %933 : index + %937 = load %2[%917, %922, %936] : memref<16x6x2xvector<8xf32>> + %938 = vector.insertelement %546, %937[%c6_i64 : i64] : vector<8xf32> + %c16_434 = constant 16 : index + %c0_435 = constant 0 : index + %c-1_436 = constant -1 : index + %939 = cmpi "slt", %arg5, %c0_435 : index + %940 = subi %c-1_436, %arg5 : index + %941 = select %939, %940, %arg5 : index + %942 = divi_signed %941, %c16_434 : index + %943 = subi %c-1_436, %942 : index + %944 = select %939, %943, %942 : index + %c16_437 = constant 16 : index + %945 = remi_signed %944, %c16_437 : index + %c0_438 = constant 0 : index + %946 = cmpi "slt", %945, %c0_438 : index + %947 = addi %945, %c16_437 : index + %948 = select %946, %947, %945 : index + %949 = addi %arg7, %arg9 : index + %c6_439 = constant 6 : index + %950 = remi_signed %949, %c6_439 : index + %c0_440 = constant 0 : index + %951 = cmpi "slt", %950, %c0_440 : index + %952 = addi %950, %c6_439 : index + %953 = select %951, %952, %950 : index + %c16_441 = constant 16 : index + %954 = remi_signed %arg5, %c16_441 : index + %c0_442 = constant 0 : index + %955 = cmpi "slt", %954, %c0_442 : index + %956 = addi %954, %c16_441 : index + %957 = select %955, %956, %954 : index + %c8_443 = constant 8 : index + %c0_444 = constant 0 : index + %c-1_445 = constant -1 : index + %958 = cmpi "slt", %957, %c0_444 : index + %959 = subi %c-1_445, %957 : index + %960 = select %958, %959, %957 : index + %961 = divi_signed %960, %c8_443 : index + %962 = subi %c-1_445, %961 : index + %963 = select %958, %962, %961 : index + %c2_446 = constant 2 : index + %964 = remi_signed %963, %c2_446 : index + %c0_447 = constant 0 : index + %965 = cmpi "slt", %964, %c0_447 : index + %966 = addi %964, %c2_446 : 
index + %967 = select %965, %966, %964 : index + store %938, %2[%948, %953, %967] : memref<16x6x2xvector<8xf32>> + %c16_448 = constant 16 : index + %c0_449 = constant 0 : index + %c-1_450 = constant -1 : index + %968 = cmpi "slt", %arg5, %c0_449 : index + %969 = subi %c-1_450, %arg5 : index + %970 = select %968, %969, %arg5 : index + %971 = divi_signed %970, %c16_448 : index + %972 = subi %c-1_450, %971 : index + %973 = select %968, %972, %971 : index + %c16_451 = constant 16 : index + %974 = remi_signed %973, %c16_451 : index + %c0_452 = constant 0 : index + %975 = cmpi "slt", %974, %c0_452 : index + %976 = addi %974, %c16_451 : index + %977 = select %975, %976, %974 : index + %978 = addi %arg7, %arg9 : index + %c6_453 = constant 6 : index + %979 = remi_signed %978, %c6_453 : index + %c0_454 = constant 0 : index + %980 = cmpi "slt", %979, %c0_454 : index + %981 = addi %979, %c6_453 : index + %982 = select %980, %981, %979 : index + %c16_455 = constant 16 : index + %983 = remi_signed %arg5, %c16_455 : index + %c0_456 = constant 0 : index + %984 = cmpi "slt", %983, %c0_456 : index + %985 = addi %983, %c16_455 : index + %986 = select %984, %985, %983 : index + %c8_457 = constant 8 : index + %c0_458 = constant 0 : index + %c-1_459 = constant -1 : index + %987 = cmpi "slt", %986, %c0_458 : index + %988 = subi %c-1_459, %986 : index + %989 = select %987, %988, %986 : index + %990 = divi_signed %989, %c8_457 : index + %991 = subi %c-1_459, %990 : index + %992 = select %987, %991, %990 : index + %c2_460 = constant 2 : index + %993 = remi_signed %992, %c2_460 : index + %c0_461 = constant 0 : index + %994 = cmpi "slt", %993, %c0_461 : index + %995 = addi %993, %c2_460 : index + %996 = select %994, %995, %993 : index + %997 = load %2[%977, %982, %996] : memref<16x6x2xvector<8xf32>> + %998 = vector.insertelement %547, %997[%c7_i64 : i64] : vector<8xf32> + %c16_462 = constant 16 : index + %c0_463 = constant 0 : index + %c-1_464 = constant -1 : index + %999 = cmpi "slt", %arg5, %c0_463 : index + %1000 = subi %c-1_464, %arg5 : index + %1001 = select %999, %1000, %arg5 : index + %1002 = divi_signed %1001, %c16_462 : index + %1003 = subi %c-1_464, %1002 : index + %1004 = select %999, %1003, %1002 : index + %c16_465 = constant 16 : index + %1005 = remi_signed %1004, %c16_465 : index + %c0_466 = constant 0 : index + %1006 = cmpi "slt", %1005, %c0_466 : index + %1007 = addi %1005, %c16_465 : index + %1008 = select %1006, %1007, %1005 : index + %1009 = addi %arg7, %arg9 : index + %c6_467 = constant 6 : index + %1010 = remi_signed %1009, %c6_467 : index + %c0_468 = constant 0 : index + %1011 = cmpi "slt", %1010, %c0_468 : index + %1012 = addi %1010, %c6_467 : index + %1013 = select %1011, %1012, %1010 : index + %c16_469 = constant 16 : index + %1014 = remi_signed %arg5, %c16_469 : index + %c0_470 = constant 0 : index + %1015 = cmpi "slt", %1014, %c0_470 : index + %1016 = addi %1014, %c16_469 : index + %1017 = select %1015, %1016, %1014 : index + %c8_471 = constant 8 : index + %c0_472 = constant 0 : index + %c-1_473 = constant -1 : index + %1018 = cmpi "slt", %1017, %c0_472 : index + %1019 = subi %c-1_473, %1017 : index + %1020 = select %1018, %1019, %1017 : index + %1021 = divi_signed %1020, %c8_471 : index + %1022 = subi %c-1_473, %1021 : index + %1023 = select %1018, %1022, %1021 : index + %c2_474 = constant 2 : index + %1024 = remi_signed %1023, %c2_474 : index + %c0_475 = constant 0 : index + %1025 = cmpi "slt", %1024, %c0_475 : index + %1026 = addi %1024, %c2_474 : index + %1027 = select %1025, %1026, 
%1024 : index + store %998, %2[%1008, %1013, %1027] : memref<16x6x2xvector<8xf32>> + %c16_476 = constant 16 : index + %c0_477 = constant 0 : index + %c-1_478 = constant -1 : index + %1028 = cmpi "slt", %arg5, %c0_477 : index + %1029 = subi %c-1_478, %arg5 : index + %1030 = select %1028, %1029, %arg5 : index + %1031 = divi_signed %1030, %c16_476 : index + %1032 = subi %c-1_478, %1031 : index + %1033 = select %1028, %1032, %1031 : index + %c16_479 = constant 16 : index + %1034 = remi_signed %1033, %c16_479 : index + %c0_480 = constant 0 : index + %1035 = cmpi "slt", %1034, %c0_480 : index + %1036 = addi %1034, %c16_479 : index + %1037 = select %1035, %1036, %1034 : index + %1038 = addi %arg7, %arg9 : index + %c6_481 = constant 6 : index + %1039 = remi_signed %1038, %c6_481 : index + %c0_482 = constant 0 : index + %1040 = cmpi "slt", %1039, %c0_482 : index + %1041 = addi %1039, %c6_481 : index + %1042 = select %1040, %1041, %1039 : index + %c16_483 = constant 16 : index + %1043 = remi_signed %arg5, %c16_483 : index + %c0_484 = constant 0 : index + %1044 = cmpi "slt", %1043, %c0_484 : index + %1045 = addi %1043, %c16_483 : index + %1046 = select %1044, %1045, %1043 : index + %c8_485 = constant 8 : index + %c0_486 = constant 0 : index + %c-1_487 = constant -1 : index + %1047 = cmpi "slt", %1046, %c0_486 : index + %1048 = subi %c-1_487, %1046 : index + %1049 = select %1047, %1048, %1046 : index + %1050 = divi_signed %1049, %c8_485 : index + %1051 = subi %c-1_487, %1050 : index + %1052 = select %1047, %1051, %1050 : index + %c2_488 = constant 2 : index + %1053 = remi_signed %1052, %c2_488 : index + %c0_489 = constant 0 : index + %1054 = cmpi "slt", %1053, %c0_489 : index + %1055 = addi %1053, %c2_488 : index + %1056 = select %1054, %1055, %1053 : index + %1057 = load %2[%1037, %1042, %1056] : memref<16x6x2xvector<8xf32>> + %1058 = vector.insertelement %540, %1057[%c0_i64 : i64] : vector<8xf32> + %c16_490 = constant 16 : index + %c0_491 = constant 0 : index + %c-1_492 = constant -1 : index + %1059 = cmpi "slt", %arg5, %c0_491 : index + %1060 = subi %c-1_492, %arg5 : index + %1061 = select %1059, %1060, %arg5 : index + %1062 = divi_signed %1061, %c16_490 : index + %1063 = subi %c-1_492, %1062 : index + %1064 = select %1059, %1063, %1062 : index + %c16_493 = constant 16 : index + %1065 = remi_signed %1064, %c16_493 : index + %c0_494 = constant 0 : index + %1066 = cmpi "slt", %1065, %c0_494 : index + %1067 = addi %1065, %c16_493 : index + %1068 = select %1066, %1067, %1065 : index + %1069 = addi %arg7, %arg9 : index + %c6_495 = constant 6 : index + %1070 = remi_signed %1069, %c6_495 : index + %c0_496 = constant 0 : index + %1071 = cmpi "slt", %1070, %c0_496 : index + %1072 = addi %1070, %c6_495 : index + %1073 = select %1071, %1072, %1070 : index + %c16_497 = constant 16 : index + %1074 = remi_signed %arg5, %c16_497 : index + %c0_498 = constant 0 : index + %1075 = cmpi "slt", %1074, %c0_498 : index + %1076 = addi %1074, %c16_497 : index + %1077 = select %1075, %1076, %1074 : index + %c8_499 = constant 8 : index + %c0_500 = constant 0 : index + %c-1_501 = constant -1 : index + %1078 = cmpi "slt", %1077, %c0_500 : index + %1079 = subi %c-1_501, %1077 : index + %1080 = select %1078, %1079, %1077 : index + %1081 = divi_signed %1080, %c8_499 : index + %1082 = subi %c-1_501, %1081 : index + %1083 = select %1078, %1082, %1081 : index + %c2_502 = constant 2 : index + %1084 = remi_signed %1083, %c2_502 : index + %c0_503 = constant 0 : index + %1085 = cmpi "slt", %1084, %c0_503 : index + %1086 = addi %1084, 
%c2_502 : index + %1087 = select %1085, %1086, %1084 : index + store %1058, %2[%1068, %1073, %1087] : memref<16x6x2xvector<8xf32>> + %c16_504 = constant 16 : index + %c0_505 = constant 0 : index + %c-1_506 = constant -1 : index + %1088 = cmpi "slt", %arg5, %c0_505 : index + %1089 = subi %c-1_506, %arg5 : index + %1090 = select %1088, %1089, %arg5 : index + %1091 = divi_signed %1090, %c16_504 : index + %1092 = subi %c-1_506, %1091 : index + %1093 = select %1088, %1092, %1091 : index + %c16_507 = constant 16 : index + %1094 = remi_signed %1093, %c16_507 : index + %c0_508 = constant 0 : index + %1095 = cmpi "slt", %1094, %c0_508 : index + %1096 = addi %1094, %c16_507 : index + %1097 = select %1095, %1096, %1094 : index + %1098 = addi %arg7, %arg9 : index + %c6_509 = constant 6 : index + %1099 = remi_signed %1098, %c6_509 : index + %c0_510 = constant 0 : index + %1100 = cmpi "slt", %1099, %c0_510 : index + %1101 = addi %1099, %c6_509 : index + %1102 = select %1100, %1101, %1099 : index + %c16_511 = constant 16 : index + %1103 = remi_signed %arg5, %c16_511 : index + %c0_512 = constant 0 : index + %1104 = cmpi "slt", %1103, %c0_512 : index + %1105 = addi %1103, %c16_511 : index + %1106 = select %1104, %1105, %1103 : index + %c8_513 = constant 8 : index + %c0_514 = constant 0 : index + %c-1_515 = constant -1 : index + %1107 = cmpi "slt", %1106, %c0_514 : index + %1108 = subi %c-1_515, %1106 : index + %1109 = select %1107, %1108, %1106 : index + %1110 = divi_signed %1109, %c8_513 : index + %1111 = subi %c-1_515, %1110 : index + %1112 = select %1107, %1111, %1110 : index + %c2_516 = constant 2 : index + %1113 = remi_signed %1112, %c2_516 : index + %c0_517 = constant 0 : index + %1114 = cmpi "slt", %1113, %c0_517 : index + %1115 = addi %1113, %c2_516 : index + %1116 = select %1114, %1115, %1113 : index + %1117 = load %2[%1097, %1102, %1116] : memref<16x6x2xvector<8xf32>> + %1118 = vector.insertelement %541, %1117[%c1_i64 : i64] : vector<8xf32> + %c16_518 = constant 16 : index + %c0_519 = constant 0 : index + %c-1_520 = constant -1 : index + %1119 = cmpi "slt", %arg5, %c0_519 : index + %1120 = subi %c-1_520, %arg5 : index + %1121 = select %1119, %1120, %arg5 : index + %1122 = divi_signed %1121, %c16_518 : index + %1123 = subi %c-1_520, %1122 : index + %1124 = select %1119, %1123, %1122 : index + %c16_521 = constant 16 : index + %1125 = remi_signed %1124, %c16_521 : index + %c0_522 = constant 0 : index + %1126 = cmpi "slt", %1125, %c0_522 : index + %1127 = addi %1125, %c16_521 : index + %1128 = select %1126, %1127, %1125 : index + %1129 = addi %arg7, %arg9 : index + %c6_523 = constant 6 : index + %1130 = remi_signed %1129, %c6_523 : index + %c0_524 = constant 0 : index + %1131 = cmpi "slt", %1130, %c0_524 : index + %1132 = addi %1130, %c6_523 : index + %1133 = select %1131, %1132, %1130 : index + %c16_525 = constant 16 : index + %1134 = remi_signed %arg5, %c16_525 : index + %c0_526 = constant 0 : index + %1135 = cmpi "slt", %1134, %c0_526 : index + %1136 = addi %1134, %c16_525 : index + %1137 = select %1135, %1136, %1134 : index + %c8_527 = constant 8 : index + %c0_528 = constant 0 : index + %c-1_529 = constant -1 : index + %1138 = cmpi "slt", %1137, %c0_528 : index + %1139 = subi %c-1_529, %1137 : index + %1140 = select %1138, %1139, %1137 : index + %1141 = divi_signed %1140, %c8_527 : index + %1142 = subi %c-1_529, %1141 : index + %1143 = select %1138, %1142, %1141 : index + %c2_530 = constant 2 : index + %1144 = remi_signed %1143, %c2_530 : index + %c0_531 = constant 0 : index + %1145 = cmpi "slt", 
%1144, %c0_531 : index + %1146 = addi %1144, %c2_530 : index + %1147 = select %1145, %1146, %1144 : index + store %1118, %2[%1128, %1133, %1147] : memref<16x6x2xvector<8xf32>> + %c16_532 = constant 16 : index + %c0_533 = constant 0 : index + %c-1_534 = constant -1 : index + %1148 = cmpi "slt", %arg5, %c0_533 : index + %1149 = subi %c-1_534, %arg5 : index + %1150 = select %1148, %1149, %arg5 : index + %1151 = divi_signed %1150, %c16_532 : index + %1152 = subi %c-1_534, %1151 : index + %1153 = select %1148, %1152, %1151 : index + %c16_535 = constant 16 : index + %1154 = remi_signed %1153, %c16_535 : index + %c0_536 = constant 0 : index + %1155 = cmpi "slt", %1154, %c0_536 : index + %1156 = addi %1154, %c16_535 : index + %1157 = select %1155, %1156, %1154 : index + %1158 = addi %arg7, %arg9 : index + %c6_537 = constant 6 : index + %1159 = remi_signed %1158, %c6_537 : index + %c0_538 = constant 0 : index + %1160 = cmpi "slt", %1159, %c0_538 : index + %1161 = addi %1159, %c6_537 : index + %1162 = select %1160, %1161, %1159 : index + %c16_539 = constant 16 : index + %1163 = remi_signed %arg5, %c16_539 : index + %c0_540 = constant 0 : index + %1164 = cmpi "slt", %1163, %c0_540 : index + %1165 = addi %1163, %c16_539 : index + %1166 = select %1164, %1165, %1163 : index + %c8_541 = constant 8 : index + %c0_542 = constant 0 : index + %c-1_543 = constant -1 : index + %1167 = cmpi "slt", %1166, %c0_542 : index + %1168 = subi %c-1_543, %1166 : index + %1169 = select %1167, %1168, %1166 : index + %1170 = divi_signed %1169, %c8_541 : index + %1171 = subi %c-1_543, %1170 : index + %1172 = select %1167, %1171, %1170 : index + %c2_544 = constant 2 : index + %1173 = remi_signed %1172, %c2_544 : index + %c0_545 = constant 0 : index + %1174 = cmpi "slt", %1173, %c0_545 : index + %1175 = addi %1173, %c2_544 : index + %1176 = select %1174, %1175, %1173 : index + %1177 = load %2[%1157, %1162, %1176] : memref<16x6x2xvector<8xf32>> + %1178 = vector.insertelement %542, %1177[%c2_i64 : i64] : vector<8xf32> + %c16_546 = constant 16 : index + %c0_547 = constant 0 : index + %c-1_548 = constant -1 : index + %1179 = cmpi "slt", %arg5, %c0_547 : index + %1180 = subi %c-1_548, %arg5 : index + %1181 = select %1179, %1180, %arg5 : index + %1182 = divi_signed %1181, %c16_546 : index + %1183 = subi %c-1_548, %1182 : index + %1184 = select %1179, %1183, %1182 : index + %c16_549 = constant 16 : index + %1185 = remi_signed %1184, %c16_549 : index + %c0_550 = constant 0 : index + %1186 = cmpi "slt", %1185, %c0_550 : index + %1187 = addi %1185, %c16_549 : index + %1188 = select %1186, %1187, %1185 : index + %1189 = addi %arg7, %arg9 : index + %c6_551 = constant 6 : index + %1190 = remi_signed %1189, %c6_551 : index + %c0_552 = constant 0 : index + %1191 = cmpi "slt", %1190, %c0_552 : index + %1192 = addi %1190, %c6_551 : index + %1193 = select %1191, %1192, %1190 : index + %c16_553 = constant 16 : index + %1194 = remi_signed %arg5, %c16_553 : index + %c0_554 = constant 0 : index + %1195 = cmpi "slt", %1194, %c0_554 : index + %1196 = addi %1194, %c16_553 : index + %1197 = select %1195, %1196, %1194 : index + %c8_555 = constant 8 : index + %c0_556 = constant 0 : index + %c-1_557 = constant -1 : index + %1198 = cmpi "slt", %1197, %c0_556 : index + %1199 = subi %c-1_557, %1197 : index + %1200 = select %1198, %1199, %1197 : index + %1201 = divi_signed %1200, %c8_555 : index + %1202 = subi %c-1_557, %1201 : index + %1203 = select %1198, %1202, %1201 : index + %c2_558 = constant 2 : index + %1204 = remi_signed %1203, %c2_558 : index + 
%c0_559 = constant 0 : index + %1205 = cmpi "slt", %1204, %c0_559 : index + %1206 = addi %1204, %c2_558 : index + %1207 = select %1205, %1206, %1204 : index + store %1178, %2[%1188, %1193, %1207] : memref<16x6x2xvector<8xf32>> + %c16_560 = constant 16 : index + %c0_561 = constant 0 : index + %c-1_562 = constant -1 : index + %1208 = cmpi "slt", %arg5, %c0_561 : index + %1209 = subi %c-1_562, %arg5 : index + %1210 = select %1208, %1209, %arg5 : index + %1211 = divi_signed %1210, %c16_560 : index + %1212 = subi %c-1_562, %1211 : index + %1213 = select %1208, %1212, %1211 : index + %c16_563 = constant 16 : index + %1214 = remi_signed %1213, %c16_563 : index + %c0_564 = constant 0 : index + %1215 = cmpi "slt", %1214, %c0_564 : index + %1216 = addi %1214, %c16_563 : index + %1217 = select %1215, %1216, %1214 : index + %1218 = addi %arg7, %arg9 : index + %c6_565 = constant 6 : index + %1219 = remi_signed %1218, %c6_565 : index + %c0_566 = constant 0 : index + %1220 = cmpi "slt", %1219, %c0_566 : index + %1221 = addi %1219, %c6_565 : index + %1222 = select %1220, %1221, %1219 : index + %c16_567 = constant 16 : index + %1223 = remi_signed %arg5, %c16_567 : index + %c0_568 = constant 0 : index + %1224 = cmpi "slt", %1223, %c0_568 : index + %1225 = addi %1223, %c16_567 : index + %1226 = select %1224, %1225, %1223 : index + %c8_569 = constant 8 : index + %c0_570 = constant 0 : index + %c-1_571 = constant -1 : index + %1227 = cmpi "slt", %1226, %c0_570 : index + %1228 = subi %c-1_571, %1226 : index + %1229 = select %1227, %1228, %1226 : index + %1230 = divi_signed %1229, %c8_569 : index + %1231 = subi %c-1_571, %1230 : index + %1232 = select %1227, %1231, %1230 : index + %c2_572 = constant 2 : index + %1233 = remi_signed %1232, %c2_572 : index + %c0_573 = constant 0 : index + %1234 = cmpi "slt", %1233, %c0_573 : index + %1235 = addi %1233, %c2_572 : index + %1236 = select %1234, %1235, %1233 : index + %1237 = load %2[%1217, %1222, %1236] : memref<16x6x2xvector<8xf32>> + %1238 = vector.insertelement %543, %1237[%c3_i64 : i64] : vector<8xf32> + %c16_574 = constant 16 : index + %c0_575 = constant 0 : index + %c-1_576 = constant -1 : index + %1239 = cmpi "slt", %arg5, %c0_575 : index + %1240 = subi %c-1_576, %arg5 : index + %1241 = select %1239, %1240, %arg5 : index + %1242 = divi_signed %1241, %c16_574 : index + %1243 = subi %c-1_576, %1242 : index + %1244 = select %1239, %1243, %1242 : index + %c16_577 = constant 16 : index + %1245 = remi_signed %1244, %c16_577 : index + %c0_578 = constant 0 : index + %1246 = cmpi "slt", %1245, %c0_578 : index + %1247 = addi %1245, %c16_577 : index + %1248 = select %1246, %1247, %1245 : index + %1249 = addi %arg7, %arg9 : index + %c6_579 = constant 6 : index + %1250 = remi_signed %1249, %c6_579 : index + %c0_580 = constant 0 : index + %1251 = cmpi "slt", %1250, %c0_580 : index + %1252 = addi %1250, %c6_579 : index + %1253 = select %1251, %1252, %1250 : index + %c16_581 = constant 16 : index + %1254 = remi_signed %arg5, %c16_581 : index + %c0_582 = constant 0 : index + %1255 = cmpi "slt", %1254, %c0_582 : index + %1256 = addi %1254, %c16_581 : index + %1257 = select %1255, %1256, %1254 : index + %c8_583 = constant 8 : index + %c0_584 = constant 0 : index + %c-1_585 = constant -1 : index + %1258 = cmpi "slt", %1257, %c0_584 : index + %1259 = subi %c-1_585, %1257 : index + %1260 = select %1258, %1259, %1257 : index + %1261 = divi_signed %1260, %c8_583 : index + %1262 = subi %c-1_585, %1261 : index + %1263 = select %1258, %1262, %1261 : index + %c2_586 = constant 2 : index + 
%1264 = remi_signed %1263, %c2_586 : index + %c0_587 = constant 0 : index + %1265 = cmpi "slt", %1264, %c0_587 : index + %1266 = addi %1264, %c2_586 : index + %1267 = select %1265, %1266, %1264 : index + store %1238, %2[%1248, %1253, %1267] : memref<16x6x2xvector<8xf32>> + %c16_588 = constant 16 : index + %c0_589 = constant 0 : index + %c-1_590 = constant -1 : index + %1268 = cmpi "slt", %arg5, %c0_589 : index + %1269 = subi %c-1_590, %arg5 : index + %1270 = select %1268, %1269, %arg5 : index + %1271 = divi_signed %1270, %c16_588 : index + %1272 = subi %c-1_590, %1271 : index + %1273 = select %1268, %1272, %1271 : index + %c16_591 = constant 16 : index + %1274 = remi_signed %1273, %c16_591 : index + %c0_592 = constant 0 : index + %1275 = cmpi "slt", %1274, %c0_592 : index + %1276 = addi %1274, %c16_591 : index + %1277 = select %1275, %1276, %1274 : index + %1278 = addi %arg7, %arg9 : index + %c6_593 = constant 6 : index + %1279 = remi_signed %1278, %c6_593 : index + %c0_594 = constant 0 : index + %1280 = cmpi "slt", %1279, %c0_594 : index + %1281 = addi %1279, %c6_593 : index + %1282 = select %1280, %1281, %1279 : index + %c16_595 = constant 16 : index + %1283 = remi_signed %arg5, %c16_595 : index + %c0_596 = constant 0 : index + %1284 = cmpi "slt", %1283, %c0_596 : index + %1285 = addi %1283, %c16_595 : index + %1286 = select %1284, %1285, %1283 : index + %c8_597 = constant 8 : index + %c0_598 = constant 0 : index + %c-1_599 = constant -1 : index + %1287 = cmpi "slt", %1286, %c0_598 : index + %1288 = subi %c-1_599, %1286 : index + %1289 = select %1287, %1288, %1286 : index + %1290 = divi_signed %1289, %c8_597 : index + %1291 = subi %c-1_599, %1290 : index + %1292 = select %1287, %1291, %1290 : index + %c2_600 = constant 2 : index + %1293 = remi_signed %1292, %c2_600 : index + %c0_601 = constant 0 : index + %1294 = cmpi "slt", %1293, %c0_601 : index + %1295 = addi %1293, %c2_600 : index + %1296 = select %1294, %1295, %1293 : index + %1297 = load %2[%1277, %1282, %1296] : memref<16x6x2xvector<8xf32>> + %1298 = vector.insertelement %544, %1297[%c4_i64 : i64] : vector<8xf32> + %c16_602 = constant 16 : index + %c0_603 = constant 0 : index + %c-1_604 = constant -1 : index + %1299 = cmpi "slt", %arg5, %c0_603 : index + %1300 = subi %c-1_604, %arg5 : index + %1301 = select %1299, %1300, %arg5 : index + %1302 = divi_signed %1301, %c16_602 : index + %1303 = subi %c-1_604, %1302 : index + %1304 = select %1299, %1303, %1302 : index + %c16_605 = constant 16 : index + %1305 = remi_signed %1304, %c16_605 : index + %c0_606 = constant 0 : index + %1306 = cmpi "slt", %1305, %c0_606 : index + %1307 = addi %1305, %c16_605 : index + %1308 = select %1306, %1307, %1305 : index + %1309 = addi %arg7, %arg9 : index + %c6_607 = constant 6 : index + %1310 = remi_signed %1309, %c6_607 : index + %c0_608 = constant 0 : index + %1311 = cmpi "slt", %1310, %c0_608 : index + %1312 = addi %1310, %c6_607 : index + %1313 = select %1311, %1312, %1310 : index + %c16_609 = constant 16 : index + %1314 = remi_signed %arg5, %c16_609 : index + %c0_610 = constant 0 : index + %1315 = cmpi "slt", %1314, %c0_610 : index + %1316 = addi %1314, %c16_609 : index + %1317 = select %1315, %1316, %1314 : index + %c8_611 = constant 8 : index + %c0_612 = constant 0 : index + %c-1_613 = constant -1 : index + %1318 = cmpi "slt", %1317, %c0_612 : index + %1319 = subi %c-1_613, %1317 : index + %1320 = select %1318, %1319, %1317 : index + %1321 = divi_signed %1320, %c8_611 : index + %1322 = subi %c-1_613, %1321 : index + %1323 = select %1318, %1322, 
%1321 : index + %c2_614 = constant 2 : index + %1324 = remi_signed %1323, %c2_614 : index + %c0_615 = constant 0 : index + %1325 = cmpi "slt", %1324, %c0_615 : index + %1326 = addi %1324, %c2_614 : index + %1327 = select %1325, %1326, %1324 : index + store %1298, %2[%1308, %1313, %1327] : memref<16x6x2xvector<8xf32>> + %c16_616 = constant 16 : index + %c0_617 = constant 0 : index + %c-1_618 = constant -1 : index + %1328 = cmpi "slt", %arg5, %c0_617 : index + %1329 = subi %c-1_618, %arg5 : index + %1330 = select %1328, %1329, %arg5 : index + %1331 = divi_signed %1330, %c16_616 : index + %1332 = subi %c-1_618, %1331 : index + %1333 = select %1328, %1332, %1331 : index + %c16_619 = constant 16 : index + %1334 = remi_signed %1333, %c16_619 : index + %c0_620 = constant 0 : index + %1335 = cmpi "slt", %1334, %c0_620 : index + %1336 = addi %1334, %c16_619 : index + %1337 = select %1335, %1336, %1334 : index + %1338 = addi %arg7, %arg9 : index + %c6_621 = constant 6 : index + %1339 = remi_signed %1338, %c6_621 : index + %c0_622 = constant 0 : index + %1340 = cmpi "slt", %1339, %c0_622 : index + %1341 = addi %1339, %c6_621 : index + %1342 = select %1340, %1341, %1339 : index + %c16_623 = constant 16 : index + %1343 = remi_signed %arg5, %c16_623 : index + %c0_624 = constant 0 : index + %1344 = cmpi "slt", %1343, %c0_624 : index + %1345 = addi %1343, %c16_623 : index + %1346 = select %1344, %1345, %1343 : index + %c8_625 = constant 8 : index + %c0_626 = constant 0 : index + %c-1_627 = constant -1 : index + %1347 = cmpi "slt", %1346, %c0_626 : index + %1348 = subi %c-1_627, %1346 : index + %1349 = select %1347, %1348, %1346 : index + %1350 = divi_signed %1349, %c8_625 : index + %1351 = subi %c-1_627, %1350 : index + %1352 = select %1347, %1351, %1350 : index + %c2_628 = constant 2 : index + %1353 = remi_signed %1352, %c2_628 : index + %c0_629 = constant 0 : index + %1354 = cmpi "slt", %1353, %c0_629 : index + %1355 = addi %1353, %c2_628 : index + %1356 = select %1354, %1355, %1353 : index + %1357 = load %2[%1337, %1342, %1356] : memref<16x6x2xvector<8xf32>> + %1358 = vector.insertelement %545, %1357[%c5_i64 : i64] : vector<8xf32> + %c16_630 = constant 16 : index + %c0_631 = constant 0 : index + %c-1_632 = constant -1 : index + %1359 = cmpi "slt", %arg5, %c0_631 : index + %1360 = subi %c-1_632, %arg5 : index + %1361 = select %1359, %1360, %arg5 : index + %1362 = divi_signed %1361, %c16_630 : index + %1363 = subi %c-1_632, %1362 : index + %1364 = select %1359, %1363, %1362 : index + %c16_633 = constant 16 : index + %1365 = remi_signed %1364, %c16_633 : index + %c0_634 = constant 0 : index + %1366 = cmpi "slt", %1365, %c0_634 : index + %1367 = addi %1365, %c16_633 : index + %1368 = select %1366, %1367, %1365 : index + %1369 = addi %arg7, %arg9 : index + %c6_635 = constant 6 : index + %1370 = remi_signed %1369, %c6_635 : index + %c0_636 = constant 0 : index + %1371 = cmpi "slt", %1370, %c0_636 : index + %1372 = addi %1370, %c6_635 : index + %1373 = select %1371, %1372, %1370 : index + %c16_637 = constant 16 : index + %1374 = remi_signed %arg5, %c16_637 : index + %c0_638 = constant 0 : index + %1375 = cmpi "slt", %1374, %c0_638 : index + %1376 = addi %1374, %c16_637 : index + %1377 = select %1375, %1376, %1374 : index + %c8_639 = constant 8 : index + %c0_640 = constant 0 : index + %c-1_641 = constant -1 : index + %1378 = cmpi "slt", %1377, %c0_640 : index + %1379 = subi %c-1_641, %1377 : index + %1380 = select %1378, %1379, %1377 : index + %1381 = divi_signed %1380, %c8_639 : index + %1382 = subi %c-1_641, 
%1381 : index + %1383 = select %1378, %1382, %1381 : index + %c2_642 = constant 2 : index + %1384 = remi_signed %1383, %c2_642 : index + %c0_643 = constant 0 : index + %1385 = cmpi "slt", %1384, %c0_643 : index + %1386 = addi %1384, %c2_642 : index + %1387 = select %1385, %1386, %1384 : index + store %1358, %2[%1368, %1373, %1387] : memref<16x6x2xvector<8xf32>> + %c16_644 = constant 16 : index + %c0_645 = constant 0 : index + %c-1_646 = constant -1 : index + %1388 = cmpi "slt", %arg5, %c0_645 : index + %1389 = subi %c-1_646, %arg5 : index + %1390 = select %1388, %1389, %arg5 : index + %1391 = divi_signed %1390, %c16_644 : index + %1392 = subi %c-1_646, %1391 : index + %1393 = select %1388, %1392, %1391 : index + %c16_647 = constant 16 : index + %1394 = remi_signed %1393, %c16_647 : index + %c0_648 = constant 0 : index + %1395 = cmpi "slt", %1394, %c0_648 : index + %1396 = addi %1394, %c16_647 : index + %1397 = select %1395, %1396, %1394 : index + %1398 = addi %arg7, %arg9 : index + %c6_649 = constant 6 : index + %1399 = remi_signed %1398, %c6_649 : index + %c0_650 = constant 0 : index + %1400 = cmpi "slt", %1399, %c0_650 : index + %1401 = addi %1399, %c6_649 : index + %1402 = select %1400, %1401, %1399 : index + %c16_651 = constant 16 : index + %1403 = remi_signed %arg5, %c16_651 : index + %c0_652 = constant 0 : index + %1404 = cmpi "slt", %1403, %c0_652 : index + %1405 = addi %1403, %c16_651 : index + %1406 = select %1404, %1405, %1403 : index + %c8_653 = constant 8 : index + %c0_654 = constant 0 : index + %c-1_655 = constant -1 : index + %1407 = cmpi "slt", %1406, %c0_654 : index + %1408 = subi %c-1_655, %1406 : index + %1409 = select %1407, %1408, %1406 : index + %1410 = divi_signed %1409, %c8_653 : index + %1411 = subi %c-1_655, %1410 : index + %1412 = select %1407, %1411, %1410 : index + %c2_656 = constant 2 : index + %1413 = remi_signed %1412, %c2_656 : index + %c0_657 = constant 0 : index + %1414 = cmpi "slt", %1413, %c0_657 : index + %1415 = addi %1413, %c2_656 : index + %1416 = select %1414, %1415, %1413 : index + %1417 = load %2[%1397, %1402, %1416] : memref<16x6x2xvector<8xf32>> + %1418 = vector.insertelement %546, %1417[%c6_i64 : i64] : vector<8xf32> + %c16_658 = constant 16 : index + %c0_659 = constant 0 : index + %c-1_660 = constant -1 : index + %1419 = cmpi "slt", %arg5, %c0_659 : index + %1420 = subi %c-1_660, %arg5 : index + %1421 = select %1419, %1420, %arg5 : index + %1422 = divi_signed %1421, %c16_658 : index + %1423 = subi %c-1_660, %1422 : index + %1424 = select %1419, %1423, %1422 : index + %c16_661 = constant 16 : index + %1425 = remi_signed %1424, %c16_661 : index + %c0_662 = constant 0 : index + %1426 = cmpi "slt", %1425, %c0_662 : index + %1427 = addi %1425, %c16_661 : index + %1428 = select %1426, %1427, %1425 : index + %1429 = addi %arg7, %arg9 : index + %c6_663 = constant 6 : index + %1430 = remi_signed %1429, %c6_663 : index + %c0_664 = constant 0 : index + %1431 = cmpi "slt", %1430, %c0_664 : index + %1432 = addi %1430, %c6_663 : index + %1433 = select %1431, %1432, %1430 : index + %c16_665 = constant 16 : index + %1434 = remi_signed %arg5, %c16_665 : index + %c0_666 = constant 0 : index + %1435 = cmpi "slt", %1434, %c0_666 : index + %1436 = addi %1434, %c16_665 : index + %1437 = select %1435, %1436, %1434 : index + %c8_667 = constant 8 : index + %c0_668 = constant 0 : index + %c-1_669 = constant -1 : index + %1438 = cmpi "slt", %1437, %c0_668 : index + %1439 = subi %c-1_669, %1437 : index + %1440 = select %1438, %1439, %1437 : index + %1441 = divi_signed 
%1440, %c8_667 : index + %1442 = subi %c-1_669, %1441 : index + %1443 = select %1438, %1442, %1441 : index + %c2_670 = constant 2 : index + %1444 = remi_signed %1443, %c2_670 : index + %c0_671 = constant 0 : index + %1445 = cmpi "slt", %1444, %c0_671 : index + %1446 = addi %1444, %c2_670 : index + %1447 = select %1445, %1446, %1444 : index + store %1418, %2[%1428, %1433, %1447] : memref<16x6x2xvector<8xf32>> + %c16_672 = constant 16 : index + %c0_673 = constant 0 : index + %c-1_674 = constant -1 : index + %1448 = cmpi "slt", %arg5, %c0_673 : index + %1449 = subi %c-1_674, %arg5 : index + %1450 = select %1448, %1449, %arg5 : index + %1451 = divi_signed %1450, %c16_672 : index + %1452 = subi %c-1_674, %1451 : index + %1453 = select %1448, %1452, %1451 : index + %c16_675 = constant 16 : index + %1454 = remi_signed %1453, %c16_675 : index + %c0_676 = constant 0 : index + %1455 = cmpi "slt", %1454, %c0_676 : index + %1456 = addi %1454, %c16_675 : index + %1457 = select %1455, %1456, %1454 : index + %1458 = addi %arg7, %arg9 : index + %c6_677 = constant 6 : index + %1459 = remi_signed %1458, %c6_677 : index + %c0_678 = constant 0 : index + %1460 = cmpi "slt", %1459, %c0_678 : index + %1461 = addi %1459, %c6_677 : index + %1462 = select %1460, %1461, %1459 : index + %c16_679 = constant 16 : index + %1463 = remi_signed %arg5, %c16_679 : index + %c0_680 = constant 0 : index + %1464 = cmpi "slt", %1463, %c0_680 : index + %1465 = addi %1463, %c16_679 : index + %1466 = select %1464, %1465, %1463 : index + %c8_681 = constant 8 : index + %c0_682 = constant 0 : index + %c-1_683 = constant -1 : index + %1467 = cmpi "slt", %1466, %c0_682 : index + %1468 = subi %c-1_683, %1466 : index + %1469 = select %1467, %1468, %1466 : index + %1470 = divi_signed %1469, %c8_681 : index + %1471 = subi %c-1_683, %1470 : index + %1472 = select %1467, %1471, %1470 : index + %c2_684 = constant 2 : index + %1473 = remi_signed %1472, %c2_684 : index + %c0_685 = constant 0 : index + %1474 = cmpi "slt", %1473, %c0_685 : index + %1475 = addi %1473, %c2_684 : index + %1476 = select %1474, %1475, %1473 : index + %1477 = load %2[%1457, %1462, %1476] : memref<16x6x2xvector<8xf32>> + %1478 = vector.insertelement %547, %1477[%c7_i64 : i64] : vector<8xf32> + %c16_686 = constant 16 : index + %c0_687 = constant 0 : index + %c-1_688 = constant -1 : index + %1479 = cmpi "slt", %arg5, %c0_687 : index + %1480 = subi %c-1_688, %arg5 : index + %1481 = select %1479, %1480, %arg5 : index + %1482 = divi_signed %1481, %c16_686 : index + %1483 = subi %c-1_688, %1482 : index + %1484 = select %1479, %1483, %1482 : index + %c16_689 = constant 16 : index + %1485 = remi_signed %1484, %c16_689 : index + %c0_690 = constant 0 : index + %1486 = cmpi "slt", %1485, %c0_690 : index + %1487 = addi %1485, %c16_689 : index + %1488 = select %1486, %1487, %1485 : index + %1489 = addi %arg7, %arg9 : index + %c6_691 = constant 6 : index + %1490 = remi_signed %1489, %c6_691 : index + %c0_692 = constant 0 : index + %1491 = cmpi "slt", %1490, %c0_692 : index + %1492 = addi %1490, %c6_691 : index + %1493 = select %1491, %1492, %1490 : index + %c16_693 = constant 16 : index + %1494 = remi_signed %arg5, %c16_693 : index + %c0_694 = constant 0 : index + %1495 = cmpi "slt", %1494, %c0_694 : index + %1496 = addi %1494, %c16_693 : index + %1497 = select %1495, %1496, %1494 : index + %c8_695 = constant 8 : index + %c0_696 = constant 0 : index + %c-1_697 = constant -1 : index + %1498 = cmpi "slt", %1497, %c0_696 : index + %1499 = subi %c-1_697, %1497 : index + %1500 = select 
%1498, %1499, %1497 : index + %1501 = divi_signed %1500, %c8_695 : index + %1502 = subi %c-1_697, %1501 : index + %1503 = select %1498, %1502, %1501 : index + %c2_698 = constant 2 : index + %1504 = remi_signed %1503, %c2_698 : index + %c0_699 = constant 0 : index + %1505 = cmpi "slt", %1504, %c0_699 : index + %1506 = addi %1504, %c2_698 : index + %1507 = select %1505, %1506, %1504 : index + store %1478, %2[%1488, %1493, %1507] : memref<16x6x2xvector<8xf32>> + %1508 = addi %arg4, %arg7 : index + %1509 = addi %1508, %arg9 : index + %1510 = addi %arg4, %arg7 : index + %1511 = addi %1510, %arg9 : index + %1512 = addi %arg4, %arg7 : index + %1513 = addi %1512, %arg9 : index + %1514 = addi %arg4, %arg7 : index + %1515 = addi %1514, %arg9 : index + %1516 = addi %arg4, %arg7 : index + %1517 = addi %1516, %arg9 : index + %1518 = addi %arg4, %arg7 : index + %1519 = addi %1518, %arg9 : index + %1520 = addi %arg4, %arg7 : index + %1521 = addi %1520, %arg9 : index + %1522 = addi %arg4, %arg7 : index + %1523 = addi %1522, %arg9 : index + %1524 = addi %arg6, %arg8 : index + %1525 = addi %arg6, %arg8 : index + %1526 = addi %arg6, %arg8 : index + %1527 = addi %arg6, %arg8 : index + %1528 = addi %arg6, %arg8 : index + %1529 = addi %arg6, %arg8 : index + %1530 = addi %arg6, %arg8 : index + %1531 = addi %arg6, %arg8 : index + %1532 = load %arg0[%1509, %1524] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1533 = load %arg0[%1511, %1525] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1534 = load %arg0[%1513, %1526] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1535 = load %arg0[%1515, %1527] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1536 = load %arg0[%1517, %1528] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1537 = load %arg0[%1519, %1529] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1538 = load %arg0[%1521, %1530] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1539 = load %arg0[%1523, %1531] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %c8_700 = constant 8 : index + %1540 = addi %arg5, %c8_700 : index + %c16_701 = constant 16 : index + %c0_702 = constant 0 : index + %c-1_703 = constant -1 : index + %1541 = cmpi "slt", %1540, %c0_702 : index + %1542 = subi %c-1_703, %1540 : index + %1543 = select %1541, %1542, %1540 : index + %1544 = divi_signed %1543, %c16_701 : index + %1545 = subi %c-1_703, %1544 : index + %1546 = select %1541, %1545, %1544 : index + %c16_704 = constant 16 : index + %1547 = remi_signed %1546, %c16_704 : index + %c0_705 = constant 0 : index + %1548 = cmpi "slt", %1547, %c0_705 : index + %1549 = addi %1547, %c16_704 : index + %1550 = select %1548, %1549, %1547 : index + %1551 = addi %arg6, %arg8 : index + %c128_706 = constant 128 : index + %1552 = remi_signed %1551, %c128_706 : index + %c0_707 = constant 0 : index + %1553 = cmpi "slt", %1552, %c0_707 : index + %1554 = addi %1552, %c128_706 : index + %1555 = select %1553, %1554, %1552 : index + %c8_708 = constant 8 : index + %c0_709 = constant 0 : index + %c-1_710 = constant -1 : index + %1556 = cmpi "slt", %arg5, %c0_709 : index + %1557 = subi %c-1_710, %arg5 : index + %1558 = select %1556, %1557, %arg5 : index + %1559 = divi_signed %1558, %c8_708 : index + %1560 = subi %c-1_710, %1559 : index + %1561 = select %1556, %1560, %1559 : index + %c8_711 = constant 8 : index + %1562 = addi %arg5, %c8_711 : index + %c16_712 = constant 16 : index + %c0_713 = constant 0 : index + %c-1_714 = 
constant -1 : index + %1563 = cmpi "slt", %1562, %c0_713 : index + %1564 = subi %c-1_714, %1562 : index + %1565 = select %1563, %1564, %1562 : index + %1566 = divi_signed %1565, %c16_712 : index + %1567 = subi %c-1_714, %1566 : index + %1568 = select %1563, %1567, %1566 : index + %c-2 = constant -2 : index + %1569 = muli %1568, %c-2 : index + %1570 = addi %1561, %1569 : index + %c8_715 = constant 8 : index + %c0_716 = constant 0 : index + %c-1_717 = constant -1 : index + %1571 = cmpi "slt", %arg5, %c0_716 : index + %1572 = subi %c-1_717, %arg5 : index + %1573 = select %1571, %1572, %arg5 : index + %1574 = divi_signed %1573, %c8_715 : index + %1575 = subi %c-1_717, %1574 : index + %1576 = select %1571, %1575, %1574 : index + %c8_718 = constant 8 : index + %1577 = addi %arg5, %c8_718 : index + %c16_719 = constant 16 : index + %c0_720 = constant 0 : index + %c-1_721 = constant -1 : index + %1578 = cmpi "slt", %1577, %c0_720 : index + %1579 = subi %c-1_721, %1577 : index + %1580 = select %1578, %1579, %1577 : index + %1581 = divi_signed %1580, %c16_719 : index + %1582 = subi %c-1_721, %1581 : index + %1583 = select %1578, %1582, %1581 : index + %c-2_722 = constant -2 : index + %1584 = muli %1583, %c-2_722 : index + %1585 = addi %1576, %1584 : index + %c1_723 = constant 1 : index + %1586 = addi %1585, %c1_723 : index + %c2_724 = constant 2 : index + %c0_725 = constant 0 : index + %c-1_726 = constant -1 : index + %1587 = cmpi "slt", %1586, %c0_725 : index + %1588 = subi %c-1_726, %1586 : index + %1589 = select %1587, %1588, %1586 : index + %1590 = divi_signed %1589, %c2_724 : index + %1591 = subi %c-1_726, %1590 : index + %1592 = select %1587, %1591, %1590 : index + %c-2_727 = constant -2 : index + %1593 = muli %1592, %c-2_727 : index + %1594 = addi %1570, %1593 : index + %c1_728 = constant 1 : index + %1595 = addi %1594, %c1_728 : index + %1596 = load %3[%1550, %1555, %1595] : memref<16x128x2xvector<8xf32>> + %1597 = vector.extractelement %1596[%c0_i64 : i64] : vector<8xf32> + %c8_729 = constant 8 : index + %1598 = addi %arg5, %c8_729 : index + %c16_730 = constant 16 : index + %c0_731 = constant 0 : index + %c-1_732 = constant -1 : index + %1599 = cmpi "slt", %1598, %c0_731 : index + %1600 = subi %c-1_732, %1598 : index + %1601 = select %1599, %1600, %1598 : index + %1602 = divi_signed %1601, %c16_730 : index + %1603 = subi %c-1_732, %1602 : index + %1604 = select %1599, %1603, %1602 : index + %c16_733 = constant 16 : index + %1605 = remi_signed %1604, %c16_733 : index + %c0_734 = constant 0 : index + %1606 = cmpi "slt", %1605, %c0_734 : index + %1607 = addi %1605, %c16_733 : index + %1608 = select %1606, %1607, %1605 : index + %1609 = addi %arg6, %arg8 : index + %c128_735 = constant 128 : index + %1610 = remi_signed %1609, %c128_735 : index + %c0_736 = constant 0 : index + %1611 = cmpi "slt", %1610, %c0_736 : index + %1612 = addi %1610, %c128_735 : index + %1613 = select %1611, %1612, %1610 : index + %c8_737 = constant 8 : index + %c0_738 = constant 0 : index + %c-1_739 = constant -1 : index + %1614 = cmpi "slt", %arg5, %c0_738 : index + %1615 = subi %c-1_739, %arg5 : index + %1616 = select %1614, %1615, %arg5 : index + %1617 = divi_signed %1616, %c8_737 : index + %1618 = subi %c-1_739, %1617 : index + %1619 = select %1614, %1618, %1617 : index + %c8_740 = constant 8 : index + %1620 = addi %arg5, %c8_740 : index + %c16_741 = constant 16 : index + %c0_742 = constant 0 : index + %c-1_743 = constant -1 : index + %1621 = cmpi "slt", %1620, %c0_742 : index + %1622 = subi %c-1_743, %1620 : index + 
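// --- Editor annotation (not part of the generated IR) ---
// The long runs of cmpi/subi/divi_signed/remi_signed/select before and after this
// point appear to normalize the loop indices (%arg5 .. %arg9) into non-negative
// floordiv/mod form so they can address the packed buffers:
//   %2 : memref<16x6x2xvector<8xf32>>   -- loaded, updated, and stored back, so it
//                                          likely holds the output accumulator tile
//   %3 : memref<16x128x2xvector<8xf32>> -- only read here, so it likely holds a
//                                          packed input panel
// The eight loads from %arg0 just above fetch scalars from the 784x128 operand for
// the current unrolled micro-tile.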
%1623 = select %1621, %1622, %1620 : index + %1624 = divi_signed %1623, %c16_741 : index + %1625 = subi %c-1_743, %1624 : index + %1626 = select %1621, %1625, %1624 : index + %c-2_744 = constant -2 : index + %1627 = muli %1626, %c-2_744 : index + %1628 = addi %1619, %1627 : index + %c8_745 = constant 8 : index + %c0_746 = constant 0 : index + %c-1_747 = constant -1 : index + %1629 = cmpi "slt", %arg5, %c0_746 : index + %1630 = subi %c-1_747, %arg5 : index + %1631 = select %1629, %1630, %arg5 : index + %1632 = divi_signed %1631, %c8_745 : index + %1633 = subi %c-1_747, %1632 : index + %1634 = select %1629, %1633, %1632 : index + %c8_748 = constant 8 : index + %1635 = addi %arg5, %c8_748 : index + %c16_749 = constant 16 : index + %c0_750 = constant 0 : index + %c-1_751 = constant -1 : index + %1636 = cmpi "slt", %1635, %c0_750 : index + %1637 = subi %c-1_751, %1635 : index + %1638 = select %1636, %1637, %1635 : index + %1639 = divi_signed %1638, %c16_749 : index + %1640 = subi %c-1_751, %1639 : index + %1641 = select %1636, %1640, %1639 : index + %c-2_752 = constant -2 : index + %1642 = muli %1641, %c-2_752 : index + %1643 = addi %1634, %1642 : index + %c1_753 = constant 1 : index + %1644 = addi %1643, %c1_753 : index + %c2_754 = constant 2 : index + %c0_755 = constant 0 : index + %c-1_756 = constant -1 : index + %1645 = cmpi "slt", %1644, %c0_755 : index + %1646 = subi %c-1_756, %1644 : index + %1647 = select %1645, %1646, %1644 : index + %1648 = divi_signed %1647, %c2_754 : index + %1649 = subi %c-1_756, %1648 : index + %1650 = select %1645, %1649, %1648 : index + %c-2_757 = constant -2 : index + %1651 = muli %1650, %c-2_757 : index + %1652 = addi %1628, %1651 : index + %c1_758 = constant 1 : index + %1653 = addi %1652, %c1_758 : index + %1654 = load %3[%1608, %1613, %1653] : memref<16x128x2xvector<8xf32>> + %1655 = vector.extractelement %1654[%c1_i64 : i64] : vector<8xf32> + %c8_759 = constant 8 : index + %1656 = addi %arg5, %c8_759 : index + %c16_760 = constant 16 : index + %c0_761 = constant 0 : index + %c-1_762 = constant -1 : index + %1657 = cmpi "slt", %1656, %c0_761 : index + %1658 = subi %c-1_762, %1656 : index + %1659 = select %1657, %1658, %1656 : index + %1660 = divi_signed %1659, %c16_760 : index + %1661 = subi %c-1_762, %1660 : index + %1662 = select %1657, %1661, %1660 : index + %c16_763 = constant 16 : index + %1663 = remi_signed %1662, %c16_763 : index + %c0_764 = constant 0 : index + %1664 = cmpi "slt", %1663, %c0_764 : index + %1665 = addi %1663, %c16_763 : index + %1666 = select %1664, %1665, %1663 : index + %1667 = addi %arg6, %arg8 : index + %c128_765 = constant 128 : index + %1668 = remi_signed %1667, %c128_765 : index + %c0_766 = constant 0 : index + %1669 = cmpi "slt", %1668, %c0_766 : index + %1670 = addi %1668, %c128_765 : index + %1671 = select %1669, %1670, %1668 : index + %c8_767 = constant 8 : index + %c0_768 = constant 0 : index + %c-1_769 = constant -1 : index + %1672 = cmpi "slt", %arg5, %c0_768 : index + %1673 = subi %c-1_769, %arg5 : index + %1674 = select %1672, %1673, %arg5 : index + %1675 = divi_signed %1674, %c8_767 : index + %1676 = subi %c-1_769, %1675 : index + %1677 = select %1672, %1676, %1675 : index + %c8_770 = constant 8 : index + %1678 = addi %arg5, %c8_770 : index + %c16_771 = constant 16 : index + %c0_772 = constant 0 : index + %c-1_773 = constant -1 : index + %1679 = cmpi "slt", %1678, %c0_772 : index + %1680 = subi %c-1_773, %1678 : index + %1681 = select %1679, %1680, %1678 : index + %1682 = divi_signed %1681, %c16_771 : index + %1683 = 
subi %c-1_773, %1682 : index + %1684 = select %1679, %1683, %1682 : index + %c-2_774 = constant -2 : index + %1685 = muli %1684, %c-2_774 : index + %1686 = addi %1677, %1685 : index + %c8_775 = constant 8 : index + %c0_776 = constant 0 : index + %c-1_777 = constant -1 : index + %1687 = cmpi "slt", %arg5, %c0_776 : index + %1688 = subi %c-1_777, %arg5 : index + %1689 = select %1687, %1688, %arg5 : index + %1690 = divi_signed %1689, %c8_775 : index + %1691 = subi %c-1_777, %1690 : index + %1692 = select %1687, %1691, %1690 : index + %c8_778 = constant 8 : index + %1693 = addi %arg5, %c8_778 : index + %c16_779 = constant 16 : index + %c0_780 = constant 0 : index + %c-1_781 = constant -1 : index + %1694 = cmpi "slt", %1693, %c0_780 : index + %1695 = subi %c-1_781, %1693 : index + %1696 = select %1694, %1695, %1693 : index + %1697 = divi_signed %1696, %c16_779 : index + %1698 = subi %c-1_781, %1697 : index + %1699 = select %1694, %1698, %1697 : index + %c-2_782 = constant -2 : index + %1700 = muli %1699, %c-2_782 : index + %1701 = addi %1692, %1700 : index + %c1_783 = constant 1 : index + %1702 = addi %1701, %c1_783 : index + %c2_784 = constant 2 : index + %c0_785 = constant 0 : index + %c-1_786 = constant -1 : index + %1703 = cmpi "slt", %1702, %c0_785 : index + %1704 = subi %c-1_786, %1702 : index + %1705 = select %1703, %1704, %1702 : index + %1706 = divi_signed %1705, %c2_784 : index + %1707 = subi %c-1_786, %1706 : index + %1708 = select %1703, %1707, %1706 : index + %c-2_787 = constant -2 : index + %1709 = muli %1708, %c-2_787 : index + %1710 = addi %1686, %1709 : index + %c1_788 = constant 1 : index + %1711 = addi %1710, %c1_788 : index + %1712 = load %3[%1666, %1671, %1711] : memref<16x128x2xvector<8xf32>> + %1713 = vector.extractelement %1712[%c2_i64 : i64] : vector<8xf32> + %c8_789 = constant 8 : index + %1714 = addi %arg5, %c8_789 : index + %c16_790 = constant 16 : index + %c0_791 = constant 0 : index + %c-1_792 = constant -1 : index + %1715 = cmpi "slt", %1714, %c0_791 : index + %1716 = subi %c-1_792, %1714 : index + %1717 = select %1715, %1716, %1714 : index + %1718 = divi_signed %1717, %c16_790 : index + %1719 = subi %c-1_792, %1718 : index + %1720 = select %1715, %1719, %1718 : index + %c16_793 = constant 16 : index + %1721 = remi_signed %1720, %c16_793 : index + %c0_794 = constant 0 : index + %1722 = cmpi "slt", %1721, %c0_794 : index + %1723 = addi %1721, %c16_793 : index + %1724 = select %1722, %1723, %1721 : index + %1725 = addi %arg6, %arg8 : index + %c128_795 = constant 128 : index + %1726 = remi_signed %1725, %c128_795 : index + %c0_796 = constant 0 : index + %1727 = cmpi "slt", %1726, %c0_796 : index + %1728 = addi %1726, %c128_795 : index + %1729 = select %1727, %1728, %1726 : index + %c8_797 = constant 8 : index + %c0_798 = constant 0 : index + %c-1_799 = constant -1 : index + %1730 = cmpi "slt", %arg5, %c0_798 : index + %1731 = subi %c-1_799, %arg5 : index + %1732 = select %1730, %1731, %arg5 : index + %1733 = divi_signed %1732, %c8_797 : index + %1734 = subi %c-1_799, %1733 : index + %1735 = select %1730, %1734, %1733 : index + %c8_800 = constant 8 : index + %1736 = addi %arg5, %c8_800 : index + %c16_801 = constant 16 : index + %c0_802 = constant 0 : index + %c-1_803 = constant -1 : index + %1737 = cmpi "slt", %1736, %c0_802 : index + %1738 = subi %c-1_803, %1736 : index + %1739 = select %1737, %1738, %1736 : index + %1740 = divi_signed %1739, %c16_801 : index + %1741 = subi %c-1_803, %1740 : index + %1742 = select %1737, %1741, %1740 : index + %c-2_804 = constant -2 
: index + %1743 = muli %1742, %c-2_804 : index + %1744 = addi %1735, %1743 : index + %c8_805 = constant 8 : index + %c0_806 = constant 0 : index + %c-1_807 = constant -1 : index + %1745 = cmpi "slt", %arg5, %c0_806 : index + %1746 = subi %c-1_807, %arg5 : index + %1747 = select %1745, %1746, %arg5 : index + %1748 = divi_signed %1747, %c8_805 : index + %1749 = subi %c-1_807, %1748 : index + %1750 = select %1745, %1749, %1748 : index + %c8_808 = constant 8 : index + %1751 = addi %arg5, %c8_808 : index + %c16_809 = constant 16 : index + %c0_810 = constant 0 : index + %c-1_811 = constant -1 : index + %1752 = cmpi "slt", %1751, %c0_810 : index + %1753 = subi %c-1_811, %1751 : index + %1754 = select %1752, %1753, %1751 : index + %1755 = divi_signed %1754, %c16_809 : index + %1756 = subi %c-1_811, %1755 : index + %1757 = select %1752, %1756, %1755 : index + %c-2_812 = constant -2 : index + %1758 = muli %1757, %c-2_812 : index + %1759 = addi %1750, %1758 : index + %c1_813 = constant 1 : index + %1760 = addi %1759, %c1_813 : index + %c2_814 = constant 2 : index + %c0_815 = constant 0 : index + %c-1_816 = constant -1 : index + %1761 = cmpi "slt", %1760, %c0_815 : index + %1762 = subi %c-1_816, %1760 : index + %1763 = select %1761, %1762, %1760 : index + %1764 = divi_signed %1763, %c2_814 : index + %1765 = subi %c-1_816, %1764 : index + %1766 = select %1761, %1765, %1764 : index + %c-2_817 = constant -2 : index + %1767 = muli %1766, %c-2_817 : index + %1768 = addi %1744, %1767 : index + %c1_818 = constant 1 : index + %1769 = addi %1768, %c1_818 : index + %1770 = load %3[%1724, %1729, %1769] : memref<16x128x2xvector<8xf32>> + %1771 = vector.extractelement %1770[%c3_i64 : i64] : vector<8xf32> + %c8_819 = constant 8 : index + %1772 = addi %arg5, %c8_819 : index + %c16_820 = constant 16 : index + %c0_821 = constant 0 : index + %c-1_822 = constant -1 : index + %1773 = cmpi "slt", %1772, %c0_821 : index + %1774 = subi %c-1_822, %1772 : index + %1775 = select %1773, %1774, %1772 : index + %1776 = divi_signed %1775, %c16_820 : index + %1777 = subi %c-1_822, %1776 : index + %1778 = select %1773, %1777, %1776 : index + %c16_823 = constant 16 : index + %1779 = remi_signed %1778, %c16_823 : index + %c0_824 = constant 0 : index + %1780 = cmpi "slt", %1779, %c0_824 : index + %1781 = addi %1779, %c16_823 : index + %1782 = select %1780, %1781, %1779 : index + %1783 = addi %arg6, %arg8 : index + %c128_825 = constant 128 : index + %1784 = remi_signed %1783, %c128_825 : index + %c0_826 = constant 0 : index + %1785 = cmpi "slt", %1784, %c0_826 : index + %1786 = addi %1784, %c128_825 : index + %1787 = select %1785, %1786, %1784 : index + %c8_827 = constant 8 : index + %c0_828 = constant 0 : index + %c-1_829 = constant -1 : index + %1788 = cmpi "slt", %arg5, %c0_828 : index + %1789 = subi %c-1_829, %arg5 : index + %1790 = select %1788, %1789, %arg5 : index + %1791 = divi_signed %1790, %c8_827 : index + %1792 = subi %c-1_829, %1791 : index + %1793 = select %1788, %1792, %1791 : index + %c8_830 = constant 8 : index + %1794 = addi %arg5, %c8_830 : index + %c16_831 = constant 16 : index + %c0_832 = constant 0 : index + %c-1_833 = constant -1 : index + %1795 = cmpi "slt", %1794, %c0_832 : index + %1796 = subi %c-1_833, %1794 : index + %1797 = select %1795, %1796, %1794 : index + %1798 = divi_signed %1797, %c16_831 : index + %1799 = subi %c-1_833, %1798 : index + %1800 = select %1795, %1799, %1798 : index + %c-2_834 = constant -2 : index + %1801 = muli %1800, %c-2_834 : index + %1802 = addi %1793, %1801 : index + %c8_835 = 
constant 8 : index + %c0_836 = constant 0 : index + %c-1_837 = constant -1 : index + %1803 = cmpi "slt", %arg5, %c0_836 : index + %1804 = subi %c-1_837, %arg5 : index + %1805 = select %1803, %1804, %arg5 : index + %1806 = divi_signed %1805, %c8_835 : index + %1807 = subi %c-1_837, %1806 : index + %1808 = select %1803, %1807, %1806 : index + %c8_838 = constant 8 : index + %1809 = addi %arg5, %c8_838 : index + %c16_839 = constant 16 : index + %c0_840 = constant 0 : index + %c-1_841 = constant -1 : index + %1810 = cmpi "slt", %1809, %c0_840 : index + %1811 = subi %c-1_841, %1809 : index + %1812 = select %1810, %1811, %1809 : index + %1813 = divi_signed %1812, %c16_839 : index + %1814 = subi %c-1_841, %1813 : index + %1815 = select %1810, %1814, %1813 : index + %c-2_842 = constant -2 : index + %1816 = muli %1815, %c-2_842 : index + %1817 = addi %1808, %1816 : index + %c1_843 = constant 1 : index + %1818 = addi %1817, %c1_843 : index + %c2_844 = constant 2 : index + %c0_845 = constant 0 : index + %c-1_846 = constant -1 : index + %1819 = cmpi "slt", %1818, %c0_845 : index + %1820 = subi %c-1_846, %1818 : index + %1821 = select %1819, %1820, %1818 : index + %1822 = divi_signed %1821, %c2_844 : index + %1823 = subi %c-1_846, %1822 : index + %1824 = select %1819, %1823, %1822 : index + %c-2_847 = constant -2 : index + %1825 = muli %1824, %c-2_847 : index + %1826 = addi %1802, %1825 : index + %c1_848 = constant 1 : index + %1827 = addi %1826, %c1_848 : index + %1828 = load %3[%1782, %1787, %1827] : memref<16x128x2xvector<8xf32>> + %1829 = vector.extractelement %1828[%c4_i64 : i64] : vector<8xf32> + %c8_849 = constant 8 : index + %1830 = addi %arg5, %c8_849 : index + %c16_850 = constant 16 : index + %c0_851 = constant 0 : index + %c-1_852 = constant -1 : index + %1831 = cmpi "slt", %1830, %c0_851 : index + %1832 = subi %c-1_852, %1830 : index + %1833 = select %1831, %1832, %1830 : index + %1834 = divi_signed %1833, %c16_850 : index + %1835 = subi %c-1_852, %1834 : index + %1836 = select %1831, %1835, %1834 : index + %c16_853 = constant 16 : index + %1837 = remi_signed %1836, %c16_853 : index + %c0_854 = constant 0 : index + %1838 = cmpi "slt", %1837, %c0_854 : index + %1839 = addi %1837, %c16_853 : index + %1840 = select %1838, %1839, %1837 : index + %1841 = addi %arg6, %arg8 : index + %c128_855 = constant 128 : index + %1842 = remi_signed %1841, %c128_855 : index + %c0_856 = constant 0 : index + %1843 = cmpi "slt", %1842, %c0_856 : index + %1844 = addi %1842, %c128_855 : index + %1845 = select %1843, %1844, %1842 : index + %c8_857 = constant 8 : index + %c0_858 = constant 0 : index + %c-1_859 = constant -1 : index + %1846 = cmpi "slt", %arg5, %c0_858 : index + %1847 = subi %c-1_859, %arg5 : index + %1848 = select %1846, %1847, %arg5 : index + %1849 = divi_signed %1848, %c8_857 : index + %1850 = subi %c-1_859, %1849 : index + %1851 = select %1846, %1850, %1849 : index + %c8_860 = constant 8 : index + %1852 = addi %arg5, %c8_860 : index + %c16_861 = constant 16 : index + %c0_862 = constant 0 : index + %c-1_863 = constant -1 : index + %1853 = cmpi "slt", %1852, %c0_862 : index + %1854 = subi %c-1_863, %1852 : index + %1855 = select %1853, %1854, %1852 : index + %1856 = divi_signed %1855, %c16_861 : index + %1857 = subi %c-1_863, %1856 : index + %1858 = select %1853, %1857, %1856 : index + %c-2_864 = constant -2 : index + %1859 = muli %1858, %c-2_864 : index + %1860 = addi %1851, %1859 : index + %c8_865 = constant 8 : index + %c0_866 = constant 0 : index + %c-1_867 = constant -1 : index + %1861 = cmpi 
"slt", %arg5, %c0_866 : index + %1862 = subi %c-1_867, %arg5 : index + %1863 = select %1861, %1862, %arg5 : index + %1864 = divi_signed %1863, %c8_865 : index + %1865 = subi %c-1_867, %1864 : index + %1866 = select %1861, %1865, %1864 : index + %c8_868 = constant 8 : index + %1867 = addi %arg5, %c8_868 : index + %c16_869 = constant 16 : index + %c0_870 = constant 0 : index + %c-1_871 = constant -1 : index + %1868 = cmpi "slt", %1867, %c0_870 : index + %1869 = subi %c-1_871, %1867 : index + %1870 = select %1868, %1869, %1867 : index + %1871 = divi_signed %1870, %c16_869 : index + %1872 = subi %c-1_871, %1871 : index + %1873 = select %1868, %1872, %1871 : index + %c-2_872 = constant -2 : index + %1874 = muli %1873, %c-2_872 : index + %1875 = addi %1866, %1874 : index + %c1_873 = constant 1 : index + %1876 = addi %1875, %c1_873 : index + %c2_874 = constant 2 : index + %c0_875 = constant 0 : index + %c-1_876 = constant -1 : index + %1877 = cmpi "slt", %1876, %c0_875 : index + %1878 = subi %c-1_876, %1876 : index + %1879 = select %1877, %1878, %1876 : index + %1880 = divi_signed %1879, %c2_874 : index + %1881 = subi %c-1_876, %1880 : index + %1882 = select %1877, %1881, %1880 : index + %c-2_877 = constant -2 : index + %1883 = muli %1882, %c-2_877 : index + %1884 = addi %1860, %1883 : index + %c1_878 = constant 1 : index + %1885 = addi %1884, %c1_878 : index + %1886 = load %3[%1840, %1845, %1885] : memref<16x128x2xvector<8xf32>> + %1887 = vector.extractelement %1886[%c5_i64 : i64] : vector<8xf32> + %c8_879 = constant 8 : index + %1888 = addi %arg5, %c8_879 : index + %c16_880 = constant 16 : index + %c0_881 = constant 0 : index + %c-1_882 = constant -1 : index + %1889 = cmpi "slt", %1888, %c0_881 : index + %1890 = subi %c-1_882, %1888 : index + %1891 = select %1889, %1890, %1888 : index + %1892 = divi_signed %1891, %c16_880 : index + %1893 = subi %c-1_882, %1892 : index + %1894 = select %1889, %1893, %1892 : index + %c16_883 = constant 16 : index + %1895 = remi_signed %1894, %c16_883 : index + %c0_884 = constant 0 : index + %1896 = cmpi "slt", %1895, %c0_884 : index + %1897 = addi %1895, %c16_883 : index + %1898 = select %1896, %1897, %1895 : index + %1899 = addi %arg6, %arg8 : index + %c128_885 = constant 128 : index + %1900 = remi_signed %1899, %c128_885 : index + %c0_886 = constant 0 : index + %1901 = cmpi "slt", %1900, %c0_886 : index + %1902 = addi %1900, %c128_885 : index + %1903 = select %1901, %1902, %1900 : index + %c8_887 = constant 8 : index + %c0_888 = constant 0 : index + %c-1_889 = constant -1 : index + %1904 = cmpi "slt", %arg5, %c0_888 : index + %1905 = subi %c-1_889, %arg5 : index + %1906 = select %1904, %1905, %arg5 : index + %1907 = divi_signed %1906, %c8_887 : index + %1908 = subi %c-1_889, %1907 : index + %1909 = select %1904, %1908, %1907 : index + %c8_890 = constant 8 : index + %1910 = addi %arg5, %c8_890 : index + %c16_891 = constant 16 : index + %c0_892 = constant 0 : index + %c-1_893 = constant -1 : index + %1911 = cmpi "slt", %1910, %c0_892 : index + %1912 = subi %c-1_893, %1910 : index + %1913 = select %1911, %1912, %1910 : index + %1914 = divi_signed %1913, %c16_891 : index + %1915 = subi %c-1_893, %1914 : index + %1916 = select %1911, %1915, %1914 : index + %c-2_894 = constant -2 : index + %1917 = muli %1916, %c-2_894 : index + %1918 = addi %1909, %1917 : index + %c8_895 = constant 8 : index + %c0_896 = constant 0 : index + %c-1_897 = constant -1 : index + %1919 = cmpi "slt", %arg5, %c0_896 : index + %1920 = subi %c-1_897, %arg5 : index + %1921 = select %1919, %1920, 
%arg5 : index + %1922 = divi_signed %1921, %c8_895 : index + %1923 = subi %c-1_897, %1922 : index + %1924 = select %1919, %1923, %1922 : index + %c8_898 = constant 8 : index + %1925 = addi %arg5, %c8_898 : index + %c16_899 = constant 16 : index + %c0_900 = constant 0 : index + %c-1_901 = constant -1 : index + %1926 = cmpi "slt", %1925, %c0_900 : index + %1927 = subi %c-1_901, %1925 : index + %1928 = select %1926, %1927, %1925 : index + %1929 = divi_signed %1928, %c16_899 : index + %1930 = subi %c-1_901, %1929 : index + %1931 = select %1926, %1930, %1929 : index + %c-2_902 = constant -2 : index + %1932 = muli %1931, %c-2_902 : index + %1933 = addi %1924, %1932 : index + %c1_903 = constant 1 : index + %1934 = addi %1933, %c1_903 : index + %c2_904 = constant 2 : index + %c0_905 = constant 0 : index + %c-1_906 = constant -1 : index + %1935 = cmpi "slt", %1934, %c0_905 : index + %1936 = subi %c-1_906, %1934 : index + %1937 = select %1935, %1936, %1934 : index + %1938 = divi_signed %1937, %c2_904 : index + %1939 = subi %c-1_906, %1938 : index + %1940 = select %1935, %1939, %1938 : index + %c-2_907 = constant -2 : index + %1941 = muli %1940, %c-2_907 : index + %1942 = addi %1918, %1941 : index + %c1_908 = constant 1 : index + %1943 = addi %1942, %c1_908 : index + %1944 = load %3[%1898, %1903, %1943] : memref<16x128x2xvector<8xf32>> + %1945 = vector.extractelement %1944[%c6_i64 : i64] : vector<8xf32> + %c8_909 = constant 8 : index + %1946 = addi %arg5, %c8_909 : index + %c16_910 = constant 16 : index + %c0_911 = constant 0 : index + %c-1_912 = constant -1 : index + %1947 = cmpi "slt", %1946, %c0_911 : index + %1948 = subi %c-1_912, %1946 : index + %1949 = select %1947, %1948, %1946 : index + %1950 = divi_signed %1949, %c16_910 : index + %1951 = subi %c-1_912, %1950 : index + %1952 = select %1947, %1951, %1950 : index + %c16_913 = constant 16 : index + %1953 = remi_signed %1952, %c16_913 : index + %c0_914 = constant 0 : index + %1954 = cmpi "slt", %1953, %c0_914 : index + %1955 = addi %1953, %c16_913 : index + %1956 = select %1954, %1955, %1953 : index + %1957 = addi %arg6, %arg8 : index + %c128_915 = constant 128 : index + %1958 = remi_signed %1957, %c128_915 : index + %c0_916 = constant 0 : index + %1959 = cmpi "slt", %1958, %c0_916 : index + %1960 = addi %1958, %c128_915 : index + %1961 = select %1959, %1960, %1958 : index + %c8_917 = constant 8 : index + %c0_918 = constant 0 : index + %c-1_919 = constant -1 : index + %1962 = cmpi "slt", %arg5, %c0_918 : index + %1963 = subi %c-1_919, %arg5 : index + %1964 = select %1962, %1963, %arg5 : index + %1965 = divi_signed %1964, %c8_917 : index + %1966 = subi %c-1_919, %1965 : index + %1967 = select %1962, %1966, %1965 : index + %c8_920 = constant 8 : index + %1968 = addi %arg5, %c8_920 : index + %c16_921 = constant 16 : index + %c0_922 = constant 0 : index + %c-1_923 = constant -1 : index + %1969 = cmpi "slt", %1968, %c0_922 : index + %1970 = subi %c-1_923, %1968 : index + %1971 = select %1969, %1970, %1968 : index + %1972 = divi_signed %1971, %c16_921 : index + %1973 = subi %c-1_923, %1972 : index + %1974 = select %1969, %1973, %1972 : index + %c-2_924 = constant -2 : index + %1975 = muli %1974, %c-2_924 : index + %1976 = addi %1967, %1975 : index + %c8_925 = constant 8 : index + %c0_926 = constant 0 : index + %c-1_927 = constant -1 : index + %1977 = cmpi "slt", %arg5, %c0_926 : index + %1978 = subi %c-1_927, %arg5 : index + %1979 = select %1977, %1978, %arg5 : index + %1980 = divi_signed %1979, %c8_925 : index + %1981 = subi %c-1_927, %1980 : index + 
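// Editor annotation: each load from %3 in this region is followed by a
// vector.extractelement with a fixed lane constant (%c0_i64 .. %c7_i64), pulling one
// scalar per lane of the 8-wide vectors; the address arithmetic is repeated verbatim
// for every lane, which suggests the innermost loop has been fully unrolled.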
%1982 = select %1977, %1981, %1980 : index + %c8_928 = constant 8 : index + %1983 = addi %arg5, %c8_928 : index + %c16_929 = constant 16 : index + %c0_930 = constant 0 : index + %c-1_931 = constant -1 : index + %1984 = cmpi "slt", %1983, %c0_930 : index + %1985 = subi %c-1_931, %1983 : index + %1986 = select %1984, %1985, %1983 : index + %1987 = divi_signed %1986, %c16_929 : index + %1988 = subi %c-1_931, %1987 : index + %1989 = select %1984, %1988, %1987 : index + %c-2_932 = constant -2 : index + %1990 = muli %1989, %c-2_932 : index + %1991 = addi %1982, %1990 : index + %c1_933 = constant 1 : index + %1992 = addi %1991, %c1_933 : index + %c2_934 = constant 2 : index + %c0_935 = constant 0 : index + %c-1_936 = constant -1 : index + %1993 = cmpi "slt", %1992, %c0_935 : index + %1994 = subi %c-1_936, %1992 : index + %1995 = select %1993, %1994, %1992 : index + %1996 = divi_signed %1995, %c2_934 : index + %1997 = subi %c-1_936, %1996 : index + %1998 = select %1993, %1997, %1996 : index + %c-2_937 = constant -2 : index + %1999 = muli %1998, %c-2_937 : index + %2000 = addi %1976, %1999 : index + %c1_938 = constant 1 : index + %2001 = addi %2000, %c1_938 : index + %2002 = load %3[%1956, %1961, %2001] : memref<16x128x2xvector<8xf32>> + %2003 = vector.extractelement %2002[%c7_i64 : i64] : vector<8xf32> + %2004 = "accv.bin_op"(%1532, %1597) {predicate = 2 : i64} : (f32, f32) -> f32 + %2005 = "accv.bin_op"(%1533, %1655) {predicate = 2 : i64} : (f32, f32) -> f32 + %2006 = "accv.bin_op"(%1534, %1713) {predicate = 2 : i64} : (f32, f32) -> f32 + %2007 = "accv.bin_op"(%1535, %1771) {predicate = 2 : i64} : (f32, f32) -> f32 + %2008 = "accv.bin_op"(%1536, %1829) {predicate = 2 : i64} : (f32, f32) -> f32 + %2009 = "accv.bin_op"(%1537, %1887) {predicate = 2 : i64} : (f32, f32) -> f32 + %2010 = "accv.bin_op"(%1538, %1945) {predicate = 2 : i64} : (f32, f32) -> f32 + %2011 = "accv.bin_op"(%1539, %2003) {predicate = 2 : i64} : (f32, f32) -> f32 + %c8_939 = constant 8 : index + %2012 = addi %arg5, %c8_939 : index + %c16_940 = constant 16 : index + %c0_941 = constant 0 : index + %c-1_942 = constant -1 : index + %2013 = cmpi "slt", %2012, %c0_941 : index + %2014 = subi %c-1_942, %2012 : index + %2015 = select %2013, %2014, %2012 : index + %2016 = divi_signed %2015, %c16_940 : index + %2017 = subi %c-1_942, %2016 : index + %2018 = select %2013, %2017, %2016 : index + %c16_943 = constant 16 : index + %2019 = remi_signed %2018, %c16_943 : index + %c0_944 = constant 0 : index + %2020 = cmpi "slt", %2019, %c0_944 : index + %2021 = addi %2019, %c16_943 : index + %2022 = select %2020, %2021, %2019 : index + %2023 = addi %arg7, %arg9 : index + %c6_945 = constant 6 : index + %2024 = remi_signed %2023, %c6_945 : index + %c0_946 = constant 0 : index + %2025 = cmpi "slt", %2024, %c0_946 : index + %2026 = addi %2024, %c6_945 : index + %2027 = select %2025, %2026, %2024 : index + %c8_947 = constant 8 : index + %c0_948 = constant 0 : index + %c-1_949 = constant -1 : index + %2028 = cmpi "slt", %arg5, %c0_948 : index + %2029 = subi %c-1_949, %arg5 : index + %2030 = select %2028, %2029, %arg5 : index + %2031 = divi_signed %2030, %c8_947 : index + %2032 = subi %c-1_949, %2031 : index + %2033 = select %2028, %2032, %2031 : index + %c8_950 = constant 8 : index + %2034 = addi %arg5, %c8_950 : index + %c16_951 = constant 16 : index + %c0_952 = constant 0 : index + %c-1_953 = constant -1 : index + %2035 = cmpi "slt", %2034, %c0_952 : index + %2036 = subi %c-1_953, %2034 : index + %2037 = select %2035, %2036, %2034 : index + %2038 = 
divi_signed %2037, %c16_951 : index + %2039 = subi %c-1_953, %2038 : index + %2040 = select %2035, %2039, %2038 : index + %c-2_954 = constant -2 : index + %2041 = muli %2040, %c-2_954 : index + %2042 = addi %2033, %2041 : index + %c8_955 = constant 8 : index + %c0_956 = constant 0 : index + %c-1_957 = constant -1 : index + %2043 = cmpi "slt", %arg5, %c0_956 : index + %2044 = subi %c-1_957, %arg5 : index + %2045 = select %2043, %2044, %arg5 : index + %2046 = divi_signed %2045, %c8_955 : index + %2047 = subi %c-1_957, %2046 : index + %2048 = select %2043, %2047, %2046 : index + %c8_958 = constant 8 : index + %2049 = addi %arg5, %c8_958 : index + %c16_959 = constant 16 : index + %c0_960 = constant 0 : index + %c-1_961 = constant -1 : index + %2050 = cmpi "slt", %2049, %c0_960 : index + %2051 = subi %c-1_961, %2049 : index + %2052 = select %2050, %2051, %2049 : index + %2053 = divi_signed %2052, %c16_959 : index + %2054 = subi %c-1_961, %2053 : index + %2055 = select %2050, %2054, %2053 : index + %c-2_962 = constant -2 : index + %2056 = muli %2055, %c-2_962 : index + %2057 = addi %2048, %2056 : index + %c1_963 = constant 1 : index + %2058 = addi %2057, %c1_963 : index + %c2_964 = constant 2 : index + %c0_965 = constant 0 : index + %c-1_966 = constant -1 : index + %2059 = cmpi "slt", %2058, %c0_965 : index + %2060 = subi %c-1_966, %2058 : index + %2061 = select %2059, %2060, %2058 : index + %2062 = divi_signed %2061, %c2_964 : index + %2063 = subi %c-1_966, %2062 : index + %2064 = select %2059, %2063, %2062 : index + %c-2_967 = constant -2 : index + %2065 = muli %2064, %c-2_967 : index + %2066 = addi %2042, %2065 : index + %c1_968 = constant 1 : index + %2067 = addi %2066, %c1_968 : index + %2068 = load %2[%2022, %2027, %2067] : memref<16x6x2xvector<8xf32>> + %2069 = vector.extractelement %2068[%c0_i64 : i64] : vector<8xf32> + %c8_969 = constant 8 : index + %2070 = addi %arg5, %c8_969 : index + %c16_970 = constant 16 : index + %c0_971 = constant 0 : index + %c-1_972 = constant -1 : index + %2071 = cmpi "slt", %2070, %c0_971 : index + %2072 = subi %c-1_972, %2070 : index + %2073 = select %2071, %2072, %2070 : index + %2074 = divi_signed %2073, %c16_970 : index + %2075 = subi %c-1_972, %2074 : index + %2076 = select %2071, %2075, %2074 : index + %c16_973 = constant 16 : index + %2077 = remi_signed %2076, %c16_973 : index + %c0_974 = constant 0 : index + %2078 = cmpi "slt", %2077, %c0_974 : index + %2079 = addi %2077, %c16_973 : index + %2080 = select %2078, %2079, %2077 : index + %2081 = addi %arg7, %arg9 : index + %c6_975 = constant 6 : index + %2082 = remi_signed %2081, %c6_975 : index + %c0_976 = constant 0 : index + %2083 = cmpi "slt", %2082, %c0_976 : index + %2084 = addi %2082, %c6_975 : index + %2085 = select %2083, %2084, %2082 : index + %c8_977 = constant 8 : index + %c0_978 = constant 0 : index + %c-1_979 = constant -1 : index + %2086 = cmpi "slt", %arg5, %c0_978 : index + %2087 = subi %c-1_979, %arg5 : index + %2088 = select %2086, %2087, %arg5 : index + %2089 = divi_signed %2088, %c8_977 : index + %2090 = subi %c-1_979, %2089 : index + %2091 = select %2086, %2090, %2089 : index + %c8_980 = constant 8 : index + %2092 = addi %arg5, %c8_980 : index + %c16_981 = constant 16 : index + %c0_982 = constant 0 : index + %c-1_983 = constant -1 : index + %2093 = cmpi "slt", %2092, %c0_982 : index + %2094 = subi %c-1_983, %2092 : index + %2095 = select %2093, %2094, %2092 : index + %2096 = divi_signed %2095, %c16_981 : index + %2097 = subi %c-1_983, %2096 : index + %2098 = select %2093, %2097, 
%2096 : index + %c-2_984 = constant -2 : index + %2099 = muli %2098, %c-2_984 : index + %2100 = addi %2091, %2099 : index + %c8_985 = constant 8 : index + %c0_986 = constant 0 : index + %c-1_987 = constant -1 : index + %2101 = cmpi "slt", %arg5, %c0_986 : index + %2102 = subi %c-1_987, %arg5 : index + %2103 = select %2101, %2102, %arg5 : index + %2104 = divi_signed %2103, %c8_985 : index + %2105 = subi %c-1_987, %2104 : index + %2106 = select %2101, %2105, %2104 : index + %c8_988 = constant 8 : index + %2107 = addi %arg5, %c8_988 : index + %c16_989 = constant 16 : index + %c0_990 = constant 0 : index + %c-1_991 = constant -1 : index + %2108 = cmpi "slt", %2107, %c0_990 : index + %2109 = subi %c-1_991, %2107 : index + %2110 = select %2108, %2109, %2107 : index + %2111 = divi_signed %2110, %c16_989 : index + %2112 = subi %c-1_991, %2111 : index + %2113 = select %2108, %2112, %2111 : index + %c-2_992 = constant -2 : index + %2114 = muli %2113, %c-2_992 : index + %2115 = addi %2106, %2114 : index + %c1_993 = constant 1 : index + %2116 = addi %2115, %c1_993 : index + %c2_994 = constant 2 : index + %c0_995 = constant 0 : index + %c-1_996 = constant -1 : index + %2117 = cmpi "slt", %2116, %c0_995 : index + %2118 = subi %c-1_996, %2116 : index + %2119 = select %2117, %2118, %2116 : index + %2120 = divi_signed %2119, %c2_994 : index + %2121 = subi %c-1_996, %2120 : index + %2122 = select %2117, %2121, %2120 : index + %c-2_997 = constant -2 : index + %2123 = muli %2122, %c-2_997 : index + %2124 = addi %2100, %2123 : index + %c1_998 = constant 1 : index + %2125 = addi %2124, %c1_998 : index + %2126 = load %2[%2080, %2085, %2125] : memref<16x6x2xvector<8xf32>> + %2127 = vector.extractelement %2126[%c1_i64 : i64] : vector<8xf32> + %c8_999 = constant 8 : index + %2128 = addi %arg5, %c8_999 : index + %c16_1000 = constant 16 : index + %c0_1001 = constant 0 : index + %c-1_1002 = constant -1 : index + %2129 = cmpi "slt", %2128, %c0_1001 : index + %2130 = subi %c-1_1002, %2128 : index + %2131 = select %2129, %2130, %2128 : index + %2132 = divi_signed %2131, %c16_1000 : index + %2133 = subi %c-1_1002, %2132 : index + %2134 = select %2129, %2133, %2132 : index + %c16_1003 = constant 16 : index + %2135 = remi_signed %2134, %c16_1003 : index + %c0_1004 = constant 0 : index + %2136 = cmpi "slt", %2135, %c0_1004 : index + %2137 = addi %2135, %c16_1003 : index + %2138 = select %2136, %2137, %2135 : index + %2139 = addi %arg7, %arg9 : index + %c6_1005 = constant 6 : index + %2140 = remi_signed %2139, %c6_1005 : index + %c0_1006 = constant 0 : index + %2141 = cmpi "slt", %2140, %c0_1006 : index + %2142 = addi %2140, %c6_1005 : index + %2143 = select %2141, %2142, %2140 : index + %c8_1007 = constant 8 : index + %c0_1008 = constant 0 : index + %c-1_1009 = constant -1 : index + %2144 = cmpi "slt", %arg5, %c0_1008 : index + %2145 = subi %c-1_1009, %arg5 : index + %2146 = select %2144, %2145, %arg5 : index + %2147 = divi_signed %2146, %c8_1007 : index + %2148 = subi %c-1_1009, %2147 : index + %2149 = select %2144, %2148, %2147 : index + %c8_1010 = constant 8 : index + %2150 = addi %arg5, %c8_1010 : index + %c16_1011 = constant 16 : index + %c0_1012 = constant 0 : index + %c-1_1013 = constant -1 : index + %2151 = cmpi "slt", %2150, %c0_1012 : index + %2152 = subi %c-1_1013, %2150 : index + %2153 = select %2151, %2152, %2150 : index + %2154 = divi_signed %2153, %c16_1011 : index + %2155 = subi %c-1_1013, %2154 : index + %2156 = select %2151, %2155, %2154 : index + %c-2_1014 = constant -2 : index + %2157 = muli %2156, 
%c-2_1014 : index + %2158 = addi %2149, %2157 : index + %c8_1015 = constant 8 : index + %c0_1016 = constant 0 : index + %c-1_1017 = constant -1 : index + %2159 = cmpi "slt", %arg5, %c0_1016 : index + %2160 = subi %c-1_1017, %arg5 : index + %2161 = select %2159, %2160, %arg5 : index + %2162 = divi_signed %2161, %c8_1015 : index + %2163 = subi %c-1_1017, %2162 : index + %2164 = select %2159, %2163, %2162 : index + %c8_1018 = constant 8 : index + %2165 = addi %arg5, %c8_1018 : index + %c16_1019 = constant 16 : index + %c0_1020 = constant 0 : index + %c-1_1021 = constant -1 : index + %2166 = cmpi "slt", %2165, %c0_1020 : index + %2167 = subi %c-1_1021, %2165 : index + %2168 = select %2166, %2167, %2165 : index + %2169 = divi_signed %2168, %c16_1019 : index + %2170 = subi %c-1_1021, %2169 : index + %2171 = select %2166, %2170, %2169 : index + %c-2_1022 = constant -2 : index + %2172 = muli %2171, %c-2_1022 : index + %2173 = addi %2164, %2172 : index + %c1_1023 = constant 1 : index + %2174 = addi %2173, %c1_1023 : index + %c2_1024 = constant 2 : index + %c0_1025 = constant 0 : index + %c-1_1026 = constant -1 : index + %2175 = cmpi "slt", %2174, %c0_1025 : index + %2176 = subi %c-1_1026, %2174 : index + %2177 = select %2175, %2176, %2174 : index + %2178 = divi_signed %2177, %c2_1024 : index + %2179 = subi %c-1_1026, %2178 : index + %2180 = select %2175, %2179, %2178 : index + %c-2_1027 = constant -2 : index + %2181 = muli %2180, %c-2_1027 : index + %2182 = addi %2158, %2181 : index + %c1_1028 = constant 1 : index + %2183 = addi %2182, %c1_1028 : index + %2184 = load %2[%2138, %2143, %2183] : memref<16x6x2xvector<8xf32>> + %2185 = vector.extractelement %2184[%c2_i64 : i64] : vector<8xf32> + %c8_1029 = constant 8 : index + %2186 = addi %arg5, %c8_1029 : index + %c16_1030 = constant 16 : index + %c0_1031 = constant 0 : index + %c-1_1032 = constant -1 : index + %2187 = cmpi "slt", %2186, %c0_1031 : index + %2188 = subi %c-1_1032, %2186 : index + %2189 = select %2187, %2188, %2186 : index + %2190 = divi_signed %2189, %c16_1030 : index + %2191 = subi %c-1_1032, %2190 : index + %2192 = select %2187, %2191, %2190 : index + %c16_1033 = constant 16 : index + %2193 = remi_signed %2192, %c16_1033 : index + %c0_1034 = constant 0 : index + %2194 = cmpi "slt", %2193, %c0_1034 : index + %2195 = addi %2193, %c16_1033 : index + %2196 = select %2194, %2195, %2193 : index + %2197 = addi %arg7, %arg9 : index + %c6_1035 = constant 6 : index + %2198 = remi_signed %2197, %c6_1035 : index + %c0_1036 = constant 0 : index + %2199 = cmpi "slt", %2198, %c0_1036 : index + %2200 = addi %2198, %c6_1035 : index + %2201 = select %2199, %2200, %2198 : index + %c8_1037 = constant 8 : index + %c0_1038 = constant 0 : index + %c-1_1039 = constant -1 : index + %2202 = cmpi "slt", %arg5, %c0_1038 : index + %2203 = subi %c-1_1039, %arg5 : index + %2204 = select %2202, %2203, %arg5 : index + %2205 = divi_signed %2204, %c8_1037 : index + %2206 = subi %c-1_1039, %2205 : index + %2207 = select %2202, %2206, %2205 : index + %c8_1040 = constant 8 : index + %2208 = addi %arg5, %c8_1040 : index + %c16_1041 = constant 16 : index + %c0_1042 = constant 0 : index + %c-1_1043 = constant -1 : index + %2209 = cmpi "slt", %2208, %c0_1042 : index + %2210 = subi %c-1_1043, %2208 : index + %2211 = select %2209, %2210, %2208 : index + %2212 = divi_signed %2211, %c16_1041 : index + %2213 = subi %c-1_1043, %2212 : index + %2214 = select %2209, %2213, %2212 : index + %c-2_1044 = constant -2 : index + %2215 = muli %2214, %c-2_1044 : index + %2216 = addi %2207, 
%2215 : index + %c8_1045 = constant 8 : index + %c0_1046 = constant 0 : index + %c-1_1047 = constant -1 : index + %2217 = cmpi "slt", %arg5, %c0_1046 : index + %2218 = subi %c-1_1047, %arg5 : index + %2219 = select %2217, %2218, %arg5 : index + %2220 = divi_signed %2219, %c8_1045 : index + %2221 = subi %c-1_1047, %2220 : index + %2222 = select %2217, %2221, %2220 : index + %c8_1048 = constant 8 : index + %2223 = addi %arg5, %c8_1048 : index + %c16_1049 = constant 16 : index + %c0_1050 = constant 0 : index + %c-1_1051 = constant -1 : index + %2224 = cmpi "slt", %2223, %c0_1050 : index + %2225 = subi %c-1_1051, %2223 : index + %2226 = select %2224, %2225, %2223 : index + %2227 = divi_signed %2226, %c16_1049 : index + %2228 = subi %c-1_1051, %2227 : index + %2229 = select %2224, %2228, %2227 : index + %c-2_1052 = constant -2 : index + %2230 = muli %2229, %c-2_1052 : index + %2231 = addi %2222, %2230 : index + %c1_1053 = constant 1 : index + %2232 = addi %2231, %c1_1053 : index + %c2_1054 = constant 2 : index + %c0_1055 = constant 0 : index + %c-1_1056 = constant -1 : index + %2233 = cmpi "slt", %2232, %c0_1055 : index + %2234 = subi %c-1_1056, %2232 : index + %2235 = select %2233, %2234, %2232 : index + %2236 = divi_signed %2235, %c2_1054 : index + %2237 = subi %c-1_1056, %2236 : index + %2238 = select %2233, %2237, %2236 : index + %c-2_1057 = constant -2 : index + %2239 = muli %2238, %c-2_1057 : index + %2240 = addi %2216, %2239 : index + %c1_1058 = constant 1 : index + %2241 = addi %2240, %c1_1058 : index + %2242 = load %2[%2196, %2201, %2241] : memref<16x6x2xvector<8xf32>> + %2243 = vector.extractelement %2242[%c3_i64 : i64] : vector<8xf32> + %c8_1059 = constant 8 : index + %2244 = addi %arg5, %c8_1059 : index + %c16_1060 = constant 16 : index + %c0_1061 = constant 0 : index + %c-1_1062 = constant -1 : index + %2245 = cmpi "slt", %2244, %c0_1061 : index + %2246 = subi %c-1_1062, %2244 : index + %2247 = select %2245, %2246, %2244 : index + %2248 = divi_signed %2247, %c16_1060 : index + %2249 = subi %c-1_1062, %2248 : index + %2250 = select %2245, %2249, %2248 : index + %c16_1063 = constant 16 : index + %2251 = remi_signed %2250, %c16_1063 : index + %c0_1064 = constant 0 : index + %2252 = cmpi "slt", %2251, %c0_1064 : index + %2253 = addi %2251, %c16_1063 : index + %2254 = select %2252, %2253, %2251 : index + %2255 = addi %arg7, %arg9 : index + %c6_1065 = constant 6 : index + %2256 = remi_signed %2255, %c6_1065 : index + %c0_1066 = constant 0 : index + %2257 = cmpi "slt", %2256, %c0_1066 : index + %2258 = addi %2256, %c6_1065 : index + %2259 = select %2257, %2258, %2256 : index + %c8_1067 = constant 8 : index + %c0_1068 = constant 0 : index + %c-1_1069 = constant -1 : index + %2260 = cmpi "slt", %arg5, %c0_1068 : index + %2261 = subi %c-1_1069, %arg5 : index + %2262 = select %2260, %2261, %arg5 : index + %2263 = divi_signed %2262, %c8_1067 : index + %2264 = subi %c-1_1069, %2263 : index + %2265 = select %2260, %2264, %2263 : index + %c8_1070 = constant 8 : index + %2266 = addi %arg5, %c8_1070 : index + %c16_1071 = constant 16 : index + %c0_1072 = constant 0 : index + %c-1_1073 = constant -1 : index + %2267 = cmpi "slt", %2266, %c0_1072 : index + %2268 = subi %c-1_1073, %2266 : index + %2269 = select %2267, %2268, %2266 : index + %2270 = divi_signed %2269, %c16_1071 : index + %2271 = subi %c-1_1073, %2270 : index + %2272 = select %2267, %2271, %2270 : index + %c-2_1074 = constant -2 : index + %2273 = muli %2272, %c-2_1074 : index + %2274 = addi %2265, %2273 : index + %c8_1075 = constant 8 : 
index + %c0_1076 = constant 0 : index + %c-1_1077 = constant -1 : index + %2275 = cmpi "slt", %arg5, %c0_1076 : index + %2276 = subi %c-1_1077, %arg5 : index + %2277 = select %2275, %2276, %arg5 : index + %2278 = divi_signed %2277, %c8_1075 : index + %2279 = subi %c-1_1077, %2278 : index + %2280 = select %2275, %2279, %2278 : index + %c8_1078 = constant 8 : index + %2281 = addi %arg5, %c8_1078 : index + %c16_1079 = constant 16 : index + %c0_1080 = constant 0 : index + %c-1_1081 = constant -1 : index + %2282 = cmpi "slt", %2281, %c0_1080 : index + %2283 = subi %c-1_1081, %2281 : index + %2284 = select %2282, %2283, %2281 : index + %2285 = divi_signed %2284, %c16_1079 : index + %2286 = subi %c-1_1081, %2285 : index + %2287 = select %2282, %2286, %2285 : index + %c-2_1082 = constant -2 : index + %2288 = muli %2287, %c-2_1082 : index + %2289 = addi %2280, %2288 : index + %c1_1083 = constant 1 : index + %2290 = addi %2289, %c1_1083 : index + %c2_1084 = constant 2 : index + %c0_1085 = constant 0 : index + %c-1_1086 = constant -1 : index + %2291 = cmpi "slt", %2290, %c0_1085 : index + %2292 = subi %c-1_1086, %2290 : index + %2293 = select %2291, %2292, %2290 : index + %2294 = divi_signed %2293, %c2_1084 : index + %2295 = subi %c-1_1086, %2294 : index + %2296 = select %2291, %2295, %2294 : index + %c-2_1087 = constant -2 : index + %2297 = muli %2296, %c-2_1087 : index + %2298 = addi %2274, %2297 : index + %c1_1088 = constant 1 : index + %2299 = addi %2298, %c1_1088 : index + %2300 = load %2[%2254, %2259, %2299] : memref<16x6x2xvector<8xf32>> + %2301 = vector.extractelement %2300[%c4_i64 : i64] : vector<8xf32> + %c8_1089 = constant 8 : index + %2302 = addi %arg5, %c8_1089 : index + %c16_1090 = constant 16 : index + %c0_1091 = constant 0 : index + %c-1_1092 = constant -1 : index + %2303 = cmpi "slt", %2302, %c0_1091 : index + %2304 = subi %c-1_1092, %2302 : index + %2305 = select %2303, %2304, %2302 : index + %2306 = divi_signed %2305, %c16_1090 : index + %2307 = subi %c-1_1092, %2306 : index + %2308 = select %2303, %2307, %2306 : index + %c16_1093 = constant 16 : index + %2309 = remi_signed %2308, %c16_1093 : index + %c0_1094 = constant 0 : index + %2310 = cmpi "slt", %2309, %c0_1094 : index + %2311 = addi %2309, %c16_1093 : index + %2312 = select %2310, %2311, %2309 : index + %2313 = addi %arg7, %arg9 : index + %c6_1095 = constant 6 : index + %2314 = remi_signed %2313, %c6_1095 : index + %c0_1096 = constant 0 : index + %2315 = cmpi "slt", %2314, %c0_1096 : index + %2316 = addi %2314, %c6_1095 : index + %2317 = select %2315, %2316, %2314 : index + %c8_1097 = constant 8 : index + %c0_1098 = constant 0 : index + %c-1_1099 = constant -1 : index + %2318 = cmpi "slt", %arg5, %c0_1098 : index + %2319 = subi %c-1_1099, %arg5 : index + %2320 = select %2318, %2319, %arg5 : index + %2321 = divi_signed %2320, %c8_1097 : index + %2322 = subi %c-1_1099, %2321 : index + %2323 = select %2318, %2322, %2321 : index + %c8_1100 = constant 8 : index + %2324 = addi %arg5, %c8_1100 : index + %c16_1101 = constant 16 : index + %c0_1102 = constant 0 : index + %c-1_1103 = constant -1 : index + %2325 = cmpi "slt", %2324, %c0_1102 : index + %2326 = subi %c-1_1103, %2324 : index + %2327 = select %2325, %2326, %2324 : index + %2328 = divi_signed %2327, %c16_1101 : index + %2329 = subi %c-1_1103, %2328 : index + %2330 = select %2325, %2329, %2328 : index + %c-2_1104 = constant -2 : index + %2331 = muli %2330, %c-2_1104 : index + %2332 = addi %2323, %2331 : index + %c8_1105 = constant 8 : index + %c0_1106 = constant 0 : index + 
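// Editor annotation: the eight accv.bin_op ops with {predicate = 2} above appear to
// multiply the %arg0 scalars with the lanes extracted from %3; the surrounding loads
// from %2 fetch the matching accumulator lanes, which are combined further below with
// {predicate = 0} (apparently addition) -- i.e. a scalar multiply-accumulate unrolled
// across the 8 lanes, whose results are presumably re-inserted and stored back into
// %2, as in the insertelement/store sequence earlier in this block.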
%c-1_1107 = constant -1 : index + %2333 = cmpi "slt", %arg5, %c0_1106 : index + %2334 = subi %c-1_1107, %arg5 : index + %2335 = select %2333, %2334, %arg5 : index + %2336 = divi_signed %2335, %c8_1105 : index + %2337 = subi %c-1_1107, %2336 : index + %2338 = select %2333, %2337, %2336 : index + %c8_1108 = constant 8 : index + %2339 = addi %arg5, %c8_1108 : index + %c16_1109 = constant 16 : index + %c0_1110 = constant 0 : index + %c-1_1111 = constant -1 : index + %2340 = cmpi "slt", %2339, %c0_1110 : index + %2341 = subi %c-1_1111, %2339 : index + %2342 = select %2340, %2341, %2339 : index + %2343 = divi_signed %2342, %c16_1109 : index + %2344 = subi %c-1_1111, %2343 : index + %2345 = select %2340, %2344, %2343 : index + %c-2_1112 = constant -2 : index + %2346 = muli %2345, %c-2_1112 : index + %2347 = addi %2338, %2346 : index + %c1_1113 = constant 1 : index + %2348 = addi %2347, %c1_1113 : index + %c2_1114 = constant 2 : index + %c0_1115 = constant 0 : index + %c-1_1116 = constant -1 : index + %2349 = cmpi "slt", %2348, %c0_1115 : index + %2350 = subi %c-1_1116, %2348 : index + %2351 = select %2349, %2350, %2348 : index + %2352 = divi_signed %2351, %c2_1114 : index + %2353 = subi %c-1_1116, %2352 : index + %2354 = select %2349, %2353, %2352 : index + %c-2_1117 = constant -2 : index + %2355 = muli %2354, %c-2_1117 : index + %2356 = addi %2332, %2355 : index + %c1_1118 = constant 1 : index + %2357 = addi %2356, %c1_1118 : index + %2358 = load %2[%2312, %2317, %2357] : memref<16x6x2xvector<8xf32>> + %2359 = vector.extractelement %2358[%c5_i64 : i64] : vector<8xf32> + %c8_1119 = constant 8 : index + %2360 = addi %arg5, %c8_1119 : index + %c16_1120 = constant 16 : index + %c0_1121 = constant 0 : index + %c-1_1122 = constant -1 : index + %2361 = cmpi "slt", %2360, %c0_1121 : index + %2362 = subi %c-1_1122, %2360 : index + %2363 = select %2361, %2362, %2360 : index + %2364 = divi_signed %2363, %c16_1120 : index + %2365 = subi %c-1_1122, %2364 : index + %2366 = select %2361, %2365, %2364 : index + %c16_1123 = constant 16 : index + %2367 = remi_signed %2366, %c16_1123 : index + %c0_1124 = constant 0 : index + %2368 = cmpi "slt", %2367, %c0_1124 : index + %2369 = addi %2367, %c16_1123 : index + %2370 = select %2368, %2369, %2367 : index + %2371 = addi %arg7, %arg9 : index + %c6_1125 = constant 6 : index + %2372 = remi_signed %2371, %c6_1125 : index + %c0_1126 = constant 0 : index + %2373 = cmpi "slt", %2372, %c0_1126 : index + %2374 = addi %2372, %c6_1125 : index + %2375 = select %2373, %2374, %2372 : index + %c8_1127 = constant 8 : index + %c0_1128 = constant 0 : index + %c-1_1129 = constant -1 : index + %2376 = cmpi "slt", %arg5, %c0_1128 : index + %2377 = subi %c-1_1129, %arg5 : index + %2378 = select %2376, %2377, %arg5 : index + %2379 = divi_signed %2378, %c8_1127 : index + %2380 = subi %c-1_1129, %2379 : index + %2381 = select %2376, %2380, %2379 : index + %c8_1130 = constant 8 : index + %2382 = addi %arg5, %c8_1130 : index + %c16_1131 = constant 16 : index + %c0_1132 = constant 0 : index + %c-1_1133 = constant -1 : index + %2383 = cmpi "slt", %2382, %c0_1132 : index + %2384 = subi %c-1_1133, %2382 : index + %2385 = select %2383, %2384, %2382 : index + %2386 = divi_signed %2385, %c16_1131 : index + %2387 = subi %c-1_1133, %2386 : index + %2388 = select %2383, %2387, %2386 : index + %c-2_1134 = constant -2 : index + %2389 = muli %2388, %c-2_1134 : index + %2390 = addi %2381, %2389 : index + %c8_1135 = constant 8 : index + %c0_1136 = constant 0 : index + %c-1_1137 = constant -1 : index + %2391 = 
cmpi "slt", %arg5, %c0_1136 : index + %2392 = subi %c-1_1137, %arg5 : index + %2393 = select %2391, %2392, %arg5 : index + %2394 = divi_signed %2393, %c8_1135 : index + %2395 = subi %c-1_1137, %2394 : index + %2396 = select %2391, %2395, %2394 : index + %c8_1138 = constant 8 : index + %2397 = addi %arg5, %c8_1138 : index + %c16_1139 = constant 16 : index + %c0_1140 = constant 0 : index + %c-1_1141 = constant -1 : index + %2398 = cmpi "slt", %2397, %c0_1140 : index + %2399 = subi %c-1_1141, %2397 : index + %2400 = select %2398, %2399, %2397 : index + %2401 = divi_signed %2400, %c16_1139 : index + %2402 = subi %c-1_1141, %2401 : index + %2403 = select %2398, %2402, %2401 : index + %c-2_1142 = constant -2 : index + %2404 = muli %2403, %c-2_1142 : index + %2405 = addi %2396, %2404 : index + %c1_1143 = constant 1 : index + %2406 = addi %2405, %c1_1143 : index + %c2_1144 = constant 2 : index + %c0_1145 = constant 0 : index + %c-1_1146 = constant -1 : index + %2407 = cmpi "slt", %2406, %c0_1145 : index + %2408 = subi %c-1_1146, %2406 : index + %2409 = select %2407, %2408, %2406 : index + %2410 = divi_signed %2409, %c2_1144 : index + %2411 = subi %c-1_1146, %2410 : index + %2412 = select %2407, %2411, %2410 : index + %c-2_1147 = constant -2 : index + %2413 = muli %2412, %c-2_1147 : index + %2414 = addi %2390, %2413 : index + %c1_1148 = constant 1 : index + %2415 = addi %2414, %c1_1148 : index + %2416 = load %2[%2370, %2375, %2415] : memref<16x6x2xvector<8xf32>> + %2417 = vector.extractelement %2416[%c6_i64 : i64] : vector<8xf32> + %c8_1149 = constant 8 : index + %2418 = addi %arg5, %c8_1149 : index + %c16_1150 = constant 16 : index + %c0_1151 = constant 0 : index + %c-1_1152 = constant -1 : index + %2419 = cmpi "slt", %2418, %c0_1151 : index + %2420 = subi %c-1_1152, %2418 : index + %2421 = select %2419, %2420, %2418 : index + %2422 = divi_signed %2421, %c16_1150 : index + %2423 = subi %c-1_1152, %2422 : index + %2424 = select %2419, %2423, %2422 : index + %c16_1153 = constant 16 : index + %2425 = remi_signed %2424, %c16_1153 : index + %c0_1154 = constant 0 : index + %2426 = cmpi "slt", %2425, %c0_1154 : index + %2427 = addi %2425, %c16_1153 : index + %2428 = select %2426, %2427, %2425 : index + %2429 = addi %arg7, %arg9 : index + %c6_1155 = constant 6 : index + %2430 = remi_signed %2429, %c6_1155 : index + %c0_1156 = constant 0 : index + %2431 = cmpi "slt", %2430, %c0_1156 : index + %2432 = addi %2430, %c6_1155 : index + %2433 = select %2431, %2432, %2430 : index + %c8_1157 = constant 8 : index + %c0_1158 = constant 0 : index + %c-1_1159 = constant -1 : index + %2434 = cmpi "slt", %arg5, %c0_1158 : index + %2435 = subi %c-1_1159, %arg5 : index + %2436 = select %2434, %2435, %arg5 : index + %2437 = divi_signed %2436, %c8_1157 : index + %2438 = subi %c-1_1159, %2437 : index + %2439 = select %2434, %2438, %2437 : index + %c8_1160 = constant 8 : index + %2440 = addi %arg5, %c8_1160 : index + %c16_1161 = constant 16 : index + %c0_1162 = constant 0 : index + %c-1_1163 = constant -1 : index + %2441 = cmpi "slt", %2440, %c0_1162 : index + %2442 = subi %c-1_1163, %2440 : index + %2443 = select %2441, %2442, %2440 : index + %2444 = divi_signed %2443, %c16_1161 : index + %2445 = subi %c-1_1163, %2444 : index + %2446 = select %2441, %2445, %2444 : index + %c-2_1164 = constant -2 : index + %2447 = muli %2446, %c-2_1164 : index + %2448 = addi %2439, %2447 : index + %c8_1165 = constant 8 : index + %c0_1166 = constant 0 : index + %c-1_1167 = constant -1 : index + %2449 = cmpi "slt", %arg5, %c0_1166 : index + 
%2450 = subi %c-1_1167, %arg5 : index + %2451 = select %2449, %2450, %arg5 : index + %2452 = divi_signed %2451, %c8_1165 : index + %2453 = subi %c-1_1167, %2452 : index + %2454 = select %2449, %2453, %2452 : index + %c8_1168 = constant 8 : index + %2455 = addi %arg5, %c8_1168 : index + %c16_1169 = constant 16 : index + %c0_1170 = constant 0 : index + %c-1_1171 = constant -1 : index + %2456 = cmpi "slt", %2455, %c0_1170 : index + %2457 = subi %c-1_1171, %2455 : index + %2458 = select %2456, %2457, %2455 : index + %2459 = divi_signed %2458, %c16_1169 : index + %2460 = subi %c-1_1171, %2459 : index + %2461 = select %2456, %2460, %2459 : index + %c-2_1172 = constant -2 : index + %2462 = muli %2461, %c-2_1172 : index + %2463 = addi %2454, %2462 : index + %c1_1173 = constant 1 : index + %2464 = addi %2463, %c1_1173 : index + %c2_1174 = constant 2 : index + %c0_1175 = constant 0 : index + %c-1_1176 = constant -1 : index + %2465 = cmpi "slt", %2464, %c0_1175 : index + %2466 = subi %c-1_1176, %2464 : index + %2467 = select %2465, %2466, %2464 : index + %2468 = divi_signed %2467, %c2_1174 : index + %2469 = subi %c-1_1176, %2468 : index + %2470 = select %2465, %2469, %2468 : index + %c-2_1177 = constant -2 : index + %2471 = muli %2470, %c-2_1177 : index + %2472 = addi %2448, %2471 : index + %c1_1178 = constant 1 : index + %2473 = addi %2472, %c1_1178 : index + %2474 = load %2[%2428, %2433, %2473] : memref<16x6x2xvector<8xf32>> + %2475 = vector.extractelement %2474[%c7_i64 : i64] : vector<8xf32> + %2476 = "accv.bin_op"(%2069, %2004) {predicate = 0 : i64} : (f32, f32) -> f32 + %2477 = "accv.bin_op"(%2127, %2005) {predicate = 0 : i64} : (f32, f32) -> f32 + %2478 = "accv.bin_op"(%2185, %2006) {predicate = 0 : i64} : (f32, f32) -> f32 + %2479 = "accv.bin_op"(%2243, %2007) {predicate = 0 : i64} : (f32, f32) -> f32 + %2480 = "accv.bin_op"(%2301, %2008) {predicate = 0 : i64} : (f32, f32) -> f32 + %2481 = "accv.bin_op"(%2359, %2009) {predicate = 0 : i64} : (f32, f32) -> f32 + %2482 = "accv.bin_op"(%2417, %2010) {predicate = 0 : i64} : (f32, f32) -> f32 + %2483 = "accv.bin_op"(%2475, %2011) {predicate = 0 : i64} : (f32, f32) -> f32 + %c8_1179 = constant 8 : index + %2484 = addi %arg5, %c8_1179 : index + %c16_1180 = constant 16 : index + %c0_1181 = constant 0 : index + %c-1_1182 = constant -1 : index + %2485 = cmpi "slt", %2484, %c0_1181 : index + %2486 = subi %c-1_1182, %2484 : index + %2487 = select %2485, %2486, %2484 : index + %2488 = divi_signed %2487, %c16_1180 : index + %2489 = subi %c-1_1182, %2488 : index + %2490 = select %2485, %2489, %2488 : index + %c16_1183 = constant 16 : index + %2491 = remi_signed %2490, %c16_1183 : index + %c0_1184 = constant 0 : index + %2492 = cmpi "slt", %2491, %c0_1184 : index + %2493 = addi %2491, %c16_1183 : index + %2494 = select %2492, %2493, %2491 : index + %2495 = addi %arg7, %arg9 : index + %c6_1185 = constant 6 : index + %2496 = remi_signed %2495, %c6_1185 : index + %c0_1186 = constant 0 : index + %2497 = cmpi "slt", %2496, %c0_1186 : index + %2498 = addi %2496, %c6_1185 : index + %2499 = select %2497, %2498, %2496 : index + %c8_1187 = constant 8 : index + %c0_1188 = constant 0 : index + %c-1_1189 = constant -1 : index + %2500 = cmpi "slt", %arg5, %c0_1188 : index + %2501 = subi %c-1_1189, %arg5 : index + %2502 = select %2500, %2501, %arg5 : index + %2503 = divi_signed %2502, %c8_1187 : index + %2504 = subi %c-1_1189, %2503 : index + %2505 = select %2500, %2504, %2503 : index + %c8_1190 = constant 8 : index + %2506 = addi %arg5, %c8_1190 : index + %c16_1191 = 
constant 16 : index + %c0_1192 = constant 0 : index + %c-1_1193 = constant -1 : index + %2507 = cmpi "slt", %2506, %c0_1192 : index + %2508 = subi %c-1_1193, %2506 : index + %2509 = select %2507, %2508, %2506 : index + %2510 = divi_signed %2509, %c16_1191 : index + %2511 = subi %c-1_1193, %2510 : index + %2512 = select %2507, %2511, %2510 : index + %c-2_1194 = constant -2 : index + %2513 = muli %2512, %c-2_1194 : index + %2514 = addi %2505, %2513 : index + %c8_1195 = constant 8 : index + %c0_1196 = constant 0 : index + %c-1_1197 = constant -1 : index + %2515 = cmpi "slt", %arg5, %c0_1196 : index + %2516 = subi %c-1_1197, %arg5 : index + %2517 = select %2515, %2516, %arg5 : index + %2518 = divi_signed %2517, %c8_1195 : index + %2519 = subi %c-1_1197, %2518 : index + %2520 = select %2515, %2519, %2518 : index + %c8_1198 = constant 8 : index + %2521 = addi %arg5, %c8_1198 : index + %c16_1199 = constant 16 : index + %c0_1200 = constant 0 : index + %c-1_1201 = constant -1 : index + %2522 = cmpi "slt", %2521, %c0_1200 : index + %2523 = subi %c-1_1201, %2521 : index + %2524 = select %2522, %2523, %2521 : index + %2525 = divi_signed %2524, %c16_1199 : index + %2526 = subi %c-1_1201, %2525 : index + %2527 = select %2522, %2526, %2525 : index + %c-2_1202 = constant -2 : index + %2528 = muli %2527, %c-2_1202 : index + %2529 = addi %2520, %2528 : index + %c1_1203 = constant 1 : index + %2530 = addi %2529, %c1_1203 : index + %c2_1204 = constant 2 : index + %c0_1205 = constant 0 : index + %c-1_1206 = constant -1 : index + %2531 = cmpi "slt", %2530, %c0_1205 : index + %2532 = subi %c-1_1206, %2530 : index + %2533 = select %2531, %2532, %2530 : index + %2534 = divi_signed %2533, %c2_1204 : index + %2535 = subi %c-1_1206, %2534 : index + %2536 = select %2531, %2535, %2534 : index + %c-2_1207 = constant -2 : index + %2537 = muli %2536, %c-2_1207 : index + %2538 = addi %2514, %2537 : index + %c1_1208 = constant 1 : index + %2539 = addi %2538, %c1_1208 : index + %2540 = load %2[%2494, %2499, %2539] : memref<16x6x2xvector<8xf32>> + %2541 = vector.insertelement %2476, %2540[%c0_i64 : i64] : vector<8xf32> + %c8_1209 = constant 8 : index + %2542 = addi %arg5, %c8_1209 : index + %c16_1210 = constant 16 : index + %c0_1211 = constant 0 : index + %c-1_1212 = constant -1 : index + %2543 = cmpi "slt", %2542, %c0_1211 : index + %2544 = subi %c-1_1212, %2542 : index + %2545 = select %2543, %2544, %2542 : index + %2546 = divi_signed %2545, %c16_1210 : index + %2547 = subi %c-1_1212, %2546 : index + %2548 = select %2543, %2547, %2546 : index + %c16_1213 = constant 16 : index + %2549 = remi_signed %2548, %c16_1213 : index + %c0_1214 = constant 0 : index + %2550 = cmpi "slt", %2549, %c0_1214 : index + %2551 = addi %2549, %c16_1213 : index + %2552 = select %2550, %2551, %2549 : index + %2553 = addi %arg7, %arg9 : index + %c6_1215 = constant 6 : index + %2554 = remi_signed %2553, %c6_1215 : index + %c0_1216 = constant 0 : index + %2555 = cmpi "slt", %2554, %c0_1216 : index + %2556 = addi %2554, %c6_1215 : index + %2557 = select %2555, %2556, %2554 : index + %c8_1217 = constant 8 : index + %c0_1218 = constant 0 : index + %c-1_1219 = constant -1 : index + %2558 = cmpi "slt", %arg5, %c0_1218 : index + %2559 = subi %c-1_1219, %arg5 : index + %2560 = select %2558, %2559, %arg5 : index + %2561 = divi_signed %2560, %c8_1217 : index + %2562 = subi %c-1_1219, %2561 : index + %2563 = select %2558, %2562, %2561 : index + %c8_1220 = constant 8 : index + %2564 = addi %arg5, %c8_1220 : index + %c16_1221 = constant 16 : index + %c0_1222 = 
constant 0 : index + %c-1_1223 = constant -1 : index + %2565 = cmpi "slt", %2564, %c0_1222 : index + %2566 = subi %c-1_1223, %2564 : index + %2567 = select %2565, %2566, %2564 : index + %2568 = divi_signed %2567, %c16_1221 : index + %2569 = subi %c-1_1223, %2568 : index + %2570 = select %2565, %2569, %2568 : index + %c-2_1224 = constant -2 : index + %2571 = muli %2570, %c-2_1224 : index + %2572 = addi %2563, %2571 : index + %c8_1225 = constant 8 : index + %c0_1226 = constant 0 : index + %c-1_1227 = constant -1 : index + %2573 = cmpi "slt", %arg5, %c0_1226 : index + %2574 = subi %c-1_1227, %arg5 : index + %2575 = select %2573, %2574, %arg5 : index + %2576 = divi_signed %2575, %c8_1225 : index + %2577 = subi %c-1_1227, %2576 : index + %2578 = select %2573, %2577, %2576 : index + %c8_1228 = constant 8 : index + %2579 = addi %arg5, %c8_1228 : index + %c16_1229 = constant 16 : index + %c0_1230 = constant 0 : index + %c-1_1231 = constant -1 : index + %2580 = cmpi "slt", %2579, %c0_1230 : index + %2581 = subi %c-1_1231, %2579 : index + %2582 = select %2580, %2581, %2579 : index + %2583 = divi_signed %2582, %c16_1229 : index + %2584 = subi %c-1_1231, %2583 : index + %2585 = select %2580, %2584, %2583 : index + %c-2_1232 = constant -2 : index + %2586 = muli %2585, %c-2_1232 : index + %2587 = addi %2578, %2586 : index + %c1_1233 = constant 1 : index + %2588 = addi %2587, %c1_1233 : index + %c2_1234 = constant 2 : index + %c0_1235 = constant 0 : index + %c-1_1236 = constant -1 : index + %2589 = cmpi "slt", %2588, %c0_1235 : index + %2590 = subi %c-1_1236, %2588 : index + %2591 = select %2589, %2590, %2588 : index + %2592 = divi_signed %2591, %c2_1234 : index + %2593 = subi %c-1_1236, %2592 : index + %2594 = select %2589, %2593, %2592 : index + %c-2_1237 = constant -2 : index + %2595 = muli %2594, %c-2_1237 : index + %2596 = addi %2572, %2595 : index + %c1_1238 = constant 1 : index + %2597 = addi %2596, %c1_1238 : index + store %2541, %2[%2552, %2557, %2597] : memref<16x6x2xvector<8xf32>> + %c8_1239 = constant 8 : index + %2598 = addi %arg5, %c8_1239 : index + %c16_1240 = constant 16 : index + %c0_1241 = constant 0 : index + %c-1_1242 = constant -1 : index + %2599 = cmpi "slt", %2598, %c0_1241 : index + %2600 = subi %c-1_1242, %2598 : index + %2601 = select %2599, %2600, %2598 : index + %2602 = divi_signed %2601, %c16_1240 : index + %2603 = subi %c-1_1242, %2602 : index + %2604 = select %2599, %2603, %2602 : index + %c16_1243 = constant 16 : index + %2605 = remi_signed %2604, %c16_1243 : index + %c0_1244 = constant 0 : index + %2606 = cmpi "slt", %2605, %c0_1244 : index + %2607 = addi %2605, %c16_1243 : index + %2608 = select %2606, %2607, %2605 : index + %2609 = addi %arg7, %arg9 : index + %c6_1245 = constant 6 : index + %2610 = remi_signed %2609, %c6_1245 : index + %c0_1246 = constant 0 : index + %2611 = cmpi "slt", %2610, %c0_1246 : index + %2612 = addi %2610, %c6_1245 : index + %2613 = select %2611, %2612, %2610 : index + %c8_1247 = constant 8 : index + %c0_1248 = constant 0 : index + %c-1_1249 = constant -1 : index + %2614 = cmpi "slt", %arg5, %c0_1248 : index + %2615 = subi %c-1_1249, %arg5 : index + %2616 = select %2614, %2615, %arg5 : index + %2617 = divi_signed %2616, %c8_1247 : index + %2618 = subi %c-1_1249, %2617 : index + %2619 = select %2614, %2618, %2617 : index + %c8_1250 = constant 8 : index + %2620 = addi %arg5, %c8_1250 : index + %c16_1251 = constant 16 : index + %c0_1252 = constant 0 : index + %c-1_1253 = constant -1 : index + %2621 = cmpi "slt", %2620, %c0_1252 : index + %2622 = 
subi %c-1_1253, %2620 : index + %2623 = select %2621, %2622, %2620 : index + %2624 = divi_signed %2623, %c16_1251 : index + %2625 = subi %c-1_1253, %2624 : index + %2626 = select %2621, %2625, %2624 : index + %c-2_1254 = constant -2 : index + %2627 = muli %2626, %c-2_1254 : index + %2628 = addi %2619, %2627 : index + %c8_1255 = constant 8 : index + %c0_1256 = constant 0 : index + %c-1_1257 = constant -1 : index + %2629 = cmpi "slt", %arg5, %c0_1256 : index + %2630 = subi %c-1_1257, %arg5 : index + %2631 = select %2629, %2630, %arg5 : index + %2632 = divi_signed %2631, %c8_1255 : index + %2633 = subi %c-1_1257, %2632 : index + %2634 = select %2629, %2633, %2632 : index + %c8_1258 = constant 8 : index + %2635 = addi %arg5, %c8_1258 : index + %c16_1259 = constant 16 : index + %c0_1260 = constant 0 : index + %c-1_1261 = constant -1 : index + %2636 = cmpi "slt", %2635, %c0_1260 : index + %2637 = subi %c-1_1261, %2635 : index + %2638 = select %2636, %2637, %2635 : index + %2639 = divi_signed %2638, %c16_1259 : index + %2640 = subi %c-1_1261, %2639 : index + %2641 = select %2636, %2640, %2639 : index + %c-2_1262 = constant -2 : index + %2642 = muli %2641, %c-2_1262 : index + %2643 = addi %2634, %2642 : index + %c1_1263 = constant 1 : index + %2644 = addi %2643, %c1_1263 : index + %c2_1264 = constant 2 : index + %c0_1265 = constant 0 : index + %c-1_1266 = constant -1 : index + %2645 = cmpi "slt", %2644, %c0_1265 : index + %2646 = subi %c-1_1266, %2644 : index + %2647 = select %2645, %2646, %2644 : index + %2648 = divi_signed %2647, %c2_1264 : index + %2649 = subi %c-1_1266, %2648 : index + %2650 = select %2645, %2649, %2648 : index + %c-2_1267 = constant -2 : index + %2651 = muli %2650, %c-2_1267 : index + %2652 = addi %2628, %2651 : index + %c1_1268 = constant 1 : index + %2653 = addi %2652, %c1_1268 : index + %2654 = load %2[%2608, %2613, %2653] : memref<16x6x2xvector<8xf32>> + %2655 = vector.insertelement %2477, %2654[%c1_i64 : i64] : vector<8xf32> + %c8_1269 = constant 8 : index + %2656 = addi %arg5, %c8_1269 : index + %c16_1270 = constant 16 : index + %c0_1271 = constant 0 : index + %c-1_1272 = constant -1 : index + %2657 = cmpi "slt", %2656, %c0_1271 : index + %2658 = subi %c-1_1272, %2656 : index + %2659 = select %2657, %2658, %2656 : index + %2660 = divi_signed %2659, %c16_1270 : index + %2661 = subi %c-1_1272, %2660 : index + %2662 = select %2657, %2661, %2660 : index + %c16_1273 = constant 16 : index + %2663 = remi_signed %2662, %c16_1273 : index + %c0_1274 = constant 0 : index + %2664 = cmpi "slt", %2663, %c0_1274 : index + %2665 = addi %2663, %c16_1273 : index + %2666 = select %2664, %2665, %2663 : index + %2667 = addi %arg7, %arg9 : index + %c6_1275 = constant 6 : index + %2668 = remi_signed %2667, %c6_1275 : index + %c0_1276 = constant 0 : index + %2669 = cmpi "slt", %2668, %c0_1276 : index + %2670 = addi %2668, %c6_1275 : index + %2671 = select %2669, %2670, %2668 : index + %c8_1277 = constant 8 : index + %c0_1278 = constant 0 : index + %c-1_1279 = constant -1 : index + %2672 = cmpi "slt", %arg5, %c0_1278 : index + %2673 = subi %c-1_1279, %arg5 : index + %2674 = select %2672, %2673, %arg5 : index + %2675 = divi_signed %2674, %c8_1277 : index + %2676 = subi %c-1_1279, %2675 : index + %2677 = select %2672, %2676, %2675 : index + %c8_1280 = constant 8 : index + %2678 = addi %arg5, %c8_1280 : index + %c16_1281 = constant 16 : index + %c0_1282 = constant 0 : index + %c-1_1283 = constant -1 : index + %2679 = cmpi "slt", %2678, %c0_1282 : index + %2680 = subi %c-1_1283, %2678 : index + 
%2681 = select %2679, %2680, %2678 : index + %2682 = divi_signed %2681, %c16_1281 : index + %2683 = subi %c-1_1283, %2682 : index + %2684 = select %2679, %2683, %2682 : index + %c-2_1284 = constant -2 : index + %2685 = muli %2684, %c-2_1284 : index + %2686 = addi %2677, %2685 : index + %c8_1285 = constant 8 : index + %c0_1286 = constant 0 : index + %c-1_1287 = constant -1 : index + %2687 = cmpi "slt", %arg5, %c0_1286 : index + %2688 = subi %c-1_1287, %arg5 : index + %2689 = select %2687, %2688, %arg5 : index + %2690 = divi_signed %2689, %c8_1285 : index + %2691 = subi %c-1_1287, %2690 : index + %2692 = select %2687, %2691, %2690 : index + %c8_1288 = constant 8 : index + %2693 = addi %arg5, %c8_1288 : index + %c16_1289 = constant 16 : index + %c0_1290 = constant 0 : index + %c-1_1291 = constant -1 : index + %2694 = cmpi "slt", %2693, %c0_1290 : index + %2695 = subi %c-1_1291, %2693 : index + %2696 = select %2694, %2695, %2693 : index + %2697 = divi_signed %2696, %c16_1289 : index + %2698 = subi %c-1_1291, %2697 : index + %2699 = select %2694, %2698, %2697 : index + %c-2_1292 = constant -2 : index + %2700 = muli %2699, %c-2_1292 : index + %2701 = addi %2692, %2700 : index + %c1_1293 = constant 1 : index + %2702 = addi %2701, %c1_1293 : index + %c2_1294 = constant 2 : index + %c0_1295 = constant 0 : index + %c-1_1296 = constant -1 : index + %2703 = cmpi "slt", %2702, %c0_1295 : index + %2704 = subi %c-1_1296, %2702 : index + %2705 = select %2703, %2704, %2702 : index + %2706 = divi_signed %2705, %c2_1294 : index + %2707 = subi %c-1_1296, %2706 : index + %2708 = select %2703, %2707, %2706 : index + %c-2_1297 = constant -2 : index + %2709 = muli %2708, %c-2_1297 : index + %2710 = addi %2686, %2709 : index + %c1_1298 = constant 1 : index + %2711 = addi %2710, %c1_1298 : index + store %2655, %2[%2666, %2671, %2711] : memref<16x6x2xvector<8xf32>> + %c8_1299 = constant 8 : index + %2712 = addi %arg5, %c8_1299 : index + %c16_1300 = constant 16 : index + %c0_1301 = constant 0 : index + %c-1_1302 = constant -1 : index + %2713 = cmpi "slt", %2712, %c0_1301 : index + %2714 = subi %c-1_1302, %2712 : index + %2715 = select %2713, %2714, %2712 : index + %2716 = divi_signed %2715, %c16_1300 : index + %2717 = subi %c-1_1302, %2716 : index + %2718 = select %2713, %2717, %2716 : index + %c16_1303 = constant 16 : index + %2719 = remi_signed %2718, %c16_1303 : index + %c0_1304 = constant 0 : index + %2720 = cmpi "slt", %2719, %c0_1304 : index + %2721 = addi %2719, %c16_1303 : index + %2722 = select %2720, %2721, %2719 : index + %2723 = addi %arg7, %arg9 : index + %c6_1305 = constant 6 : index + %2724 = remi_signed %2723, %c6_1305 : index + %c0_1306 = constant 0 : index + %2725 = cmpi "slt", %2724, %c0_1306 : index + %2726 = addi %2724, %c6_1305 : index + %2727 = select %2725, %2726, %2724 : index + %c8_1307 = constant 8 : index + %c0_1308 = constant 0 : index + %c-1_1309 = constant -1 : index + %2728 = cmpi "slt", %arg5, %c0_1308 : index + %2729 = subi %c-1_1309, %arg5 : index + %2730 = select %2728, %2729, %arg5 : index + %2731 = divi_signed %2730, %c8_1307 : index + %2732 = subi %c-1_1309, %2731 : index + %2733 = select %2728, %2732, %2731 : index + %c8_1310 = constant 8 : index + %2734 = addi %arg5, %c8_1310 : index + %c16_1311 = constant 16 : index + %c0_1312 = constant 0 : index + %c-1_1313 = constant -1 : index + %2735 = cmpi "slt", %2734, %c0_1312 : index + %2736 = subi %c-1_1313, %2734 : index + %2737 = select %2735, %2736, %2734 : index + %2738 = divi_signed %2737, %c16_1311 : index + %2739 = subi 
%c-1_1313, %2738 : index + %2740 = select %2735, %2739, %2738 : index + %c-2_1314 = constant -2 : index + %2741 = muli %2740, %c-2_1314 : index + %2742 = addi %2733, %2741 : index + %c8_1315 = constant 8 : index + %c0_1316 = constant 0 : index + %c-1_1317 = constant -1 : index + %2743 = cmpi "slt", %arg5, %c0_1316 : index + %2744 = subi %c-1_1317, %arg5 : index + %2745 = select %2743, %2744, %arg5 : index + %2746 = divi_signed %2745, %c8_1315 : index + %2747 = subi %c-1_1317, %2746 : index + %2748 = select %2743, %2747, %2746 : index + %c8_1318 = constant 8 : index + %2749 = addi %arg5, %c8_1318 : index + %c16_1319 = constant 16 : index + %c0_1320 = constant 0 : index + %c-1_1321 = constant -1 : index + %2750 = cmpi "slt", %2749, %c0_1320 : index + %2751 = subi %c-1_1321, %2749 : index + %2752 = select %2750, %2751, %2749 : index + %2753 = divi_signed %2752, %c16_1319 : index + %2754 = subi %c-1_1321, %2753 : index + %2755 = select %2750, %2754, %2753 : index + %c-2_1322 = constant -2 : index + %2756 = muli %2755, %c-2_1322 : index + %2757 = addi %2748, %2756 : index + %c1_1323 = constant 1 : index + %2758 = addi %2757, %c1_1323 : index + %c2_1324 = constant 2 : index + %c0_1325 = constant 0 : index + %c-1_1326 = constant -1 : index + %2759 = cmpi "slt", %2758, %c0_1325 : index + %2760 = subi %c-1_1326, %2758 : index + %2761 = select %2759, %2760, %2758 : index + %2762 = divi_signed %2761, %c2_1324 : index + %2763 = subi %c-1_1326, %2762 : index + %2764 = select %2759, %2763, %2762 : index + %c-2_1327 = constant -2 : index + %2765 = muli %2764, %c-2_1327 : index + %2766 = addi %2742, %2765 : index + %c1_1328 = constant 1 : index + %2767 = addi %2766, %c1_1328 : index + %2768 = load %2[%2722, %2727, %2767] : memref<16x6x2xvector<8xf32>> + %2769 = vector.insertelement %2478, %2768[%c2_i64 : i64] : vector<8xf32> + %c8_1329 = constant 8 : index + %2770 = addi %arg5, %c8_1329 : index + %c16_1330 = constant 16 : index + %c0_1331 = constant 0 : index + %c-1_1332 = constant -1 : index + %2771 = cmpi "slt", %2770, %c0_1331 : index + %2772 = subi %c-1_1332, %2770 : index + %2773 = select %2771, %2772, %2770 : index + %2774 = divi_signed %2773, %c16_1330 : index + %2775 = subi %c-1_1332, %2774 : index + %2776 = select %2771, %2775, %2774 : index + %c16_1333 = constant 16 : index + %2777 = remi_signed %2776, %c16_1333 : index + %c0_1334 = constant 0 : index + %2778 = cmpi "slt", %2777, %c0_1334 : index + %2779 = addi %2777, %c16_1333 : index + %2780 = select %2778, %2779, %2777 : index + %2781 = addi %arg7, %arg9 : index + %c6_1335 = constant 6 : index + %2782 = remi_signed %2781, %c6_1335 : index + %c0_1336 = constant 0 : index + %2783 = cmpi "slt", %2782, %c0_1336 : index + %2784 = addi %2782, %c6_1335 : index + %2785 = select %2783, %2784, %2782 : index + %c8_1337 = constant 8 : index + %c0_1338 = constant 0 : index + %c-1_1339 = constant -1 : index + %2786 = cmpi "slt", %arg5, %c0_1338 : index + %2787 = subi %c-1_1339, %arg5 : index + %2788 = select %2786, %2787, %arg5 : index + %2789 = divi_signed %2788, %c8_1337 : index + %2790 = subi %c-1_1339, %2789 : index + %2791 = select %2786, %2790, %2789 : index + %c8_1340 = constant 8 : index + %2792 = addi %arg5, %c8_1340 : index + %c16_1341 = constant 16 : index + %c0_1342 = constant 0 : index + %c-1_1343 = constant -1 : index + %2793 = cmpi "slt", %2792, %c0_1342 : index + %2794 = subi %c-1_1343, %2792 : index + %2795 = select %2793, %2794, %2792 : index + %2796 = divi_signed %2795, %c16_1341 : index + %2797 = subi %c-1_1343, %2796 : index + %2798 = 
select %2793, %2797, %2796 : index + %c-2_1344 = constant -2 : index + %2799 = muli %2798, %c-2_1344 : index + %2800 = addi %2791, %2799 : index + %c8_1345 = constant 8 : index + %c0_1346 = constant 0 : index + %c-1_1347 = constant -1 : index + %2801 = cmpi "slt", %arg5, %c0_1346 : index + %2802 = subi %c-1_1347, %arg5 : index + %2803 = select %2801, %2802, %arg5 : index + %2804 = divi_signed %2803, %c8_1345 : index + %2805 = subi %c-1_1347, %2804 : index + %2806 = select %2801, %2805, %2804 : index + %c8_1348 = constant 8 : index + %2807 = addi %arg5, %c8_1348 : index + %c16_1349 = constant 16 : index + %c0_1350 = constant 0 : index + %c-1_1351 = constant -1 : index + %2808 = cmpi "slt", %2807, %c0_1350 : index + %2809 = subi %c-1_1351, %2807 : index + %2810 = select %2808, %2809, %2807 : index + %2811 = divi_signed %2810, %c16_1349 : index + %2812 = subi %c-1_1351, %2811 : index + %2813 = select %2808, %2812, %2811 : index + %c-2_1352 = constant -2 : index + %2814 = muli %2813, %c-2_1352 : index + %2815 = addi %2806, %2814 : index + %c1_1353 = constant 1 : index + %2816 = addi %2815, %c1_1353 : index + %c2_1354 = constant 2 : index + %c0_1355 = constant 0 : index + %c-1_1356 = constant -1 : index + %2817 = cmpi "slt", %2816, %c0_1355 : index + %2818 = subi %c-1_1356, %2816 : index + %2819 = select %2817, %2818, %2816 : index + %2820 = divi_signed %2819, %c2_1354 : index + %2821 = subi %c-1_1356, %2820 : index + %2822 = select %2817, %2821, %2820 : index + %c-2_1357 = constant -2 : index + %2823 = muli %2822, %c-2_1357 : index + %2824 = addi %2800, %2823 : index + %c1_1358 = constant 1 : index + %2825 = addi %2824, %c1_1358 : index + store %2769, %2[%2780, %2785, %2825] : memref<16x6x2xvector<8xf32>> + %c8_1359 = constant 8 : index + %2826 = addi %arg5, %c8_1359 : index + %c16_1360 = constant 16 : index + %c0_1361 = constant 0 : index + %c-1_1362 = constant -1 : index + %2827 = cmpi "slt", %2826, %c0_1361 : index + %2828 = subi %c-1_1362, %2826 : index + %2829 = select %2827, %2828, %2826 : index + %2830 = divi_signed %2829, %c16_1360 : index + %2831 = subi %c-1_1362, %2830 : index + %2832 = select %2827, %2831, %2830 : index + %c16_1363 = constant 16 : index + %2833 = remi_signed %2832, %c16_1363 : index + %c0_1364 = constant 0 : index + %2834 = cmpi "slt", %2833, %c0_1364 : index + %2835 = addi %2833, %c16_1363 : index + %2836 = select %2834, %2835, %2833 : index + %2837 = addi %arg7, %arg9 : index + %c6_1365 = constant 6 : index + %2838 = remi_signed %2837, %c6_1365 : index + %c0_1366 = constant 0 : index + %2839 = cmpi "slt", %2838, %c0_1366 : index + %2840 = addi %2838, %c6_1365 : index + %2841 = select %2839, %2840, %2838 : index + %c8_1367 = constant 8 : index + %c0_1368 = constant 0 : index + %c-1_1369 = constant -1 : index + %2842 = cmpi "slt", %arg5, %c0_1368 : index + %2843 = subi %c-1_1369, %arg5 : index + %2844 = select %2842, %2843, %arg5 : index + %2845 = divi_signed %2844, %c8_1367 : index + %2846 = subi %c-1_1369, %2845 : index + %2847 = select %2842, %2846, %2845 : index + %c8_1370 = constant 8 : index + %2848 = addi %arg5, %c8_1370 : index + %c16_1371 = constant 16 : index + %c0_1372 = constant 0 : index + %c-1_1373 = constant -1 : index + %2849 = cmpi "slt", %2848, %c0_1372 : index + %2850 = subi %c-1_1373, %2848 : index + %2851 = select %2849, %2850, %2848 : index + %2852 = divi_signed %2851, %c16_1371 : index + %2853 = subi %c-1_1373, %2852 : index + %2854 = select %2849, %2853, %2852 : index + %c-2_1374 = constant -2 : index + %2855 = muli %2854, %c-2_1374 : index + 
%2856 = addi %2847, %2855 : index + %c8_1375 = constant 8 : index + %c0_1376 = constant 0 : index + %c-1_1377 = constant -1 : index + %2857 = cmpi "slt", %arg5, %c0_1376 : index + %2858 = subi %c-1_1377, %arg5 : index + %2859 = select %2857, %2858, %arg5 : index + %2860 = divi_signed %2859, %c8_1375 : index + %2861 = subi %c-1_1377, %2860 : index + %2862 = select %2857, %2861, %2860 : index + %c8_1378 = constant 8 : index + %2863 = addi %arg5, %c8_1378 : index + %c16_1379 = constant 16 : index + %c0_1380 = constant 0 : index + %c-1_1381 = constant -1 : index + %2864 = cmpi "slt", %2863, %c0_1380 : index + %2865 = subi %c-1_1381, %2863 : index + %2866 = select %2864, %2865, %2863 : index + %2867 = divi_signed %2866, %c16_1379 : index + %2868 = subi %c-1_1381, %2867 : index + %2869 = select %2864, %2868, %2867 : index + %c-2_1382 = constant -2 : index + %2870 = muli %2869, %c-2_1382 : index + %2871 = addi %2862, %2870 : index + %c1_1383 = constant 1 : index + %2872 = addi %2871, %c1_1383 : index + %c2_1384 = constant 2 : index + %c0_1385 = constant 0 : index + %c-1_1386 = constant -1 : index + %2873 = cmpi "slt", %2872, %c0_1385 : index + %2874 = subi %c-1_1386, %2872 : index + %2875 = select %2873, %2874, %2872 : index + %2876 = divi_signed %2875, %c2_1384 : index + %2877 = subi %c-1_1386, %2876 : index + %2878 = select %2873, %2877, %2876 : index + %c-2_1387 = constant -2 : index + %2879 = muli %2878, %c-2_1387 : index + %2880 = addi %2856, %2879 : index + %c1_1388 = constant 1 : index + %2881 = addi %2880, %c1_1388 : index + %2882 = load %2[%2836, %2841, %2881] : memref<16x6x2xvector<8xf32>> + %2883 = vector.insertelement %2479, %2882[%c3_i64 : i64] : vector<8xf32> + %c8_1389 = constant 8 : index + %2884 = addi %arg5, %c8_1389 : index + %c16_1390 = constant 16 : index + %c0_1391 = constant 0 : index + %c-1_1392 = constant -1 : index + %2885 = cmpi "slt", %2884, %c0_1391 : index + %2886 = subi %c-1_1392, %2884 : index + %2887 = select %2885, %2886, %2884 : index + %2888 = divi_signed %2887, %c16_1390 : index + %2889 = subi %c-1_1392, %2888 : index + %2890 = select %2885, %2889, %2888 : index + %c16_1393 = constant 16 : index + %2891 = remi_signed %2890, %c16_1393 : index + %c0_1394 = constant 0 : index + %2892 = cmpi "slt", %2891, %c0_1394 : index + %2893 = addi %2891, %c16_1393 : index + %2894 = select %2892, %2893, %2891 : index + %2895 = addi %arg7, %arg9 : index + %c6_1395 = constant 6 : index + %2896 = remi_signed %2895, %c6_1395 : index + %c0_1396 = constant 0 : index + %2897 = cmpi "slt", %2896, %c0_1396 : index + %2898 = addi %2896, %c6_1395 : index + %2899 = select %2897, %2898, %2896 : index + %c8_1397 = constant 8 : index + %c0_1398 = constant 0 : index + %c-1_1399 = constant -1 : index + %2900 = cmpi "slt", %arg5, %c0_1398 : index + %2901 = subi %c-1_1399, %arg5 : index + %2902 = select %2900, %2901, %arg5 : index + %2903 = divi_signed %2902, %c8_1397 : index + %2904 = subi %c-1_1399, %2903 : index + %2905 = select %2900, %2904, %2903 : index + %c8_1400 = constant 8 : index + %2906 = addi %arg5, %c8_1400 : index + %c16_1401 = constant 16 : index + %c0_1402 = constant 0 : index + %c-1_1403 = constant -1 : index + %2907 = cmpi "slt", %2906, %c0_1402 : index + %2908 = subi %c-1_1403, %2906 : index + %2909 = select %2907, %2908, %2906 : index + %2910 = divi_signed %2909, %c16_1401 : index + %2911 = subi %c-1_1403, %2910 : index + %2912 = select %2907, %2911, %2910 : index + %c-2_1404 = constant -2 : index + %2913 = muli %2912, %c-2_1404 : index + %2914 = addi %2905, %2913 : index + 
%c8_1405 = constant 8 : index + %c0_1406 = constant 0 : index + %c-1_1407 = constant -1 : index + %2915 = cmpi "slt", %arg5, %c0_1406 : index + %2916 = subi %c-1_1407, %arg5 : index + %2917 = select %2915, %2916, %arg5 : index + %2918 = divi_signed %2917, %c8_1405 : index + %2919 = subi %c-1_1407, %2918 : index + %2920 = select %2915, %2919, %2918 : index + %c8_1408 = constant 8 : index + %2921 = addi %arg5, %c8_1408 : index + %c16_1409 = constant 16 : index + %c0_1410 = constant 0 : index + %c-1_1411 = constant -1 : index + %2922 = cmpi "slt", %2921, %c0_1410 : index + %2923 = subi %c-1_1411, %2921 : index + %2924 = select %2922, %2923, %2921 : index + %2925 = divi_signed %2924, %c16_1409 : index + %2926 = subi %c-1_1411, %2925 : index + %2927 = select %2922, %2926, %2925 : index + %c-2_1412 = constant -2 : index + %2928 = muli %2927, %c-2_1412 : index + %2929 = addi %2920, %2928 : index + %c1_1413 = constant 1 : index + %2930 = addi %2929, %c1_1413 : index + %c2_1414 = constant 2 : index + %c0_1415 = constant 0 : index + %c-1_1416 = constant -1 : index + %2931 = cmpi "slt", %2930, %c0_1415 : index + %2932 = subi %c-1_1416, %2930 : index + %2933 = select %2931, %2932, %2930 : index + %2934 = divi_signed %2933, %c2_1414 : index + %2935 = subi %c-1_1416, %2934 : index + %2936 = select %2931, %2935, %2934 : index + %c-2_1417 = constant -2 : index + %2937 = muli %2936, %c-2_1417 : index + %2938 = addi %2914, %2937 : index + %c1_1418 = constant 1 : index + %2939 = addi %2938, %c1_1418 : index + store %2883, %2[%2894, %2899, %2939] : memref<16x6x2xvector<8xf32>> + %c8_1419 = constant 8 : index + %2940 = addi %arg5, %c8_1419 : index + %c16_1420 = constant 16 : index + %c0_1421 = constant 0 : index + %c-1_1422 = constant -1 : index + %2941 = cmpi "slt", %2940, %c0_1421 : index + %2942 = subi %c-1_1422, %2940 : index + %2943 = select %2941, %2942, %2940 : index + %2944 = divi_signed %2943, %c16_1420 : index + %2945 = subi %c-1_1422, %2944 : index + %2946 = select %2941, %2945, %2944 : index + %c16_1423 = constant 16 : index + %2947 = remi_signed %2946, %c16_1423 : index + %c0_1424 = constant 0 : index + %2948 = cmpi "slt", %2947, %c0_1424 : index + %2949 = addi %2947, %c16_1423 : index + %2950 = select %2948, %2949, %2947 : index + %2951 = addi %arg7, %arg9 : index + %c6_1425 = constant 6 : index + %2952 = remi_signed %2951, %c6_1425 : index + %c0_1426 = constant 0 : index + %2953 = cmpi "slt", %2952, %c0_1426 : index + %2954 = addi %2952, %c6_1425 : index + %2955 = select %2953, %2954, %2952 : index + %c8_1427 = constant 8 : index + %c0_1428 = constant 0 : index + %c-1_1429 = constant -1 : index + %2956 = cmpi "slt", %arg5, %c0_1428 : index + %2957 = subi %c-1_1429, %arg5 : index + %2958 = select %2956, %2957, %arg5 : index + %2959 = divi_signed %2958, %c8_1427 : index + %2960 = subi %c-1_1429, %2959 : index + %2961 = select %2956, %2960, %2959 : index + %c8_1430 = constant 8 : index + %2962 = addi %arg5, %c8_1430 : index + %c16_1431 = constant 16 : index + %c0_1432 = constant 0 : index + %c-1_1433 = constant -1 : index + %2963 = cmpi "slt", %2962, %c0_1432 : index + %2964 = subi %c-1_1433, %2962 : index + %2965 = select %2963, %2964, %2962 : index + %2966 = divi_signed %2965, %c16_1431 : index + %2967 = subi %c-1_1433, %2966 : index + %2968 = select %2963, %2967, %2966 : index + %c-2_1434 = constant -2 : index + %2969 = muli %2968, %c-2_1434 : index + %2970 = addi %2961, %2969 : index + %c8_1435 = constant 8 : index + %c0_1436 = constant 0 : index + %c-1_1437 = constant -1 : index + %2971 = cmpi 
"slt", %arg5, %c0_1436 : index + %2972 = subi %c-1_1437, %arg5 : index + %2973 = select %2971, %2972, %arg5 : index + %2974 = divi_signed %2973, %c8_1435 : index + %2975 = subi %c-1_1437, %2974 : index + %2976 = select %2971, %2975, %2974 : index + %c8_1438 = constant 8 : index + %2977 = addi %arg5, %c8_1438 : index + %c16_1439 = constant 16 : index + %c0_1440 = constant 0 : index + %c-1_1441 = constant -1 : index + %2978 = cmpi "slt", %2977, %c0_1440 : index + %2979 = subi %c-1_1441, %2977 : index + %2980 = select %2978, %2979, %2977 : index + %2981 = divi_signed %2980, %c16_1439 : index + %2982 = subi %c-1_1441, %2981 : index + %2983 = select %2978, %2982, %2981 : index + %c-2_1442 = constant -2 : index + %2984 = muli %2983, %c-2_1442 : index + %2985 = addi %2976, %2984 : index + %c1_1443 = constant 1 : index + %2986 = addi %2985, %c1_1443 : index + %c2_1444 = constant 2 : index + %c0_1445 = constant 0 : index + %c-1_1446 = constant -1 : index + %2987 = cmpi "slt", %2986, %c0_1445 : index + %2988 = subi %c-1_1446, %2986 : index + %2989 = select %2987, %2988, %2986 : index + %2990 = divi_signed %2989, %c2_1444 : index + %2991 = subi %c-1_1446, %2990 : index + %2992 = select %2987, %2991, %2990 : index + %c-2_1447 = constant -2 : index + %2993 = muli %2992, %c-2_1447 : index + %2994 = addi %2970, %2993 : index + %c1_1448 = constant 1 : index + %2995 = addi %2994, %c1_1448 : index + %2996 = load %2[%2950, %2955, %2995] : memref<16x6x2xvector<8xf32>> + %2997 = vector.insertelement %2480, %2996[%c4_i64 : i64] : vector<8xf32> + %c8_1449 = constant 8 : index + %2998 = addi %arg5, %c8_1449 : index + %c16_1450 = constant 16 : index + %c0_1451 = constant 0 : index + %c-1_1452 = constant -1 : index + %2999 = cmpi "slt", %2998, %c0_1451 : index + %3000 = subi %c-1_1452, %2998 : index + %3001 = select %2999, %3000, %2998 : index + %3002 = divi_signed %3001, %c16_1450 : index + %3003 = subi %c-1_1452, %3002 : index + %3004 = select %2999, %3003, %3002 : index + %c16_1453 = constant 16 : index + %3005 = remi_signed %3004, %c16_1453 : index + %c0_1454 = constant 0 : index + %3006 = cmpi "slt", %3005, %c0_1454 : index + %3007 = addi %3005, %c16_1453 : index + %3008 = select %3006, %3007, %3005 : index + %3009 = addi %arg7, %arg9 : index + %c6_1455 = constant 6 : index + %3010 = remi_signed %3009, %c6_1455 : index + %c0_1456 = constant 0 : index + %3011 = cmpi "slt", %3010, %c0_1456 : index + %3012 = addi %3010, %c6_1455 : index + %3013 = select %3011, %3012, %3010 : index + %c8_1457 = constant 8 : index + %c0_1458 = constant 0 : index + %c-1_1459 = constant -1 : index + %3014 = cmpi "slt", %arg5, %c0_1458 : index + %3015 = subi %c-1_1459, %arg5 : index + %3016 = select %3014, %3015, %arg5 : index + %3017 = divi_signed %3016, %c8_1457 : index + %3018 = subi %c-1_1459, %3017 : index + %3019 = select %3014, %3018, %3017 : index + %c8_1460 = constant 8 : index + %3020 = addi %arg5, %c8_1460 : index + %c16_1461 = constant 16 : index + %c0_1462 = constant 0 : index + %c-1_1463 = constant -1 : index + %3021 = cmpi "slt", %3020, %c0_1462 : index + %3022 = subi %c-1_1463, %3020 : index + %3023 = select %3021, %3022, %3020 : index + %3024 = divi_signed %3023, %c16_1461 : index + %3025 = subi %c-1_1463, %3024 : index + %3026 = select %3021, %3025, %3024 : index + %c-2_1464 = constant -2 : index + %3027 = muli %3026, %c-2_1464 : index + %3028 = addi %3019, %3027 : index + %c8_1465 = constant 8 : index + %c0_1466 = constant 0 : index + %c-1_1467 = constant -1 : index + %3029 = cmpi "slt", %arg5, %c0_1466 : index + 
%3030 = subi %c-1_1467, %arg5 : index + %3031 = select %3029, %3030, %arg5 : index + %3032 = divi_signed %3031, %c8_1465 : index + %3033 = subi %c-1_1467, %3032 : index + %3034 = select %3029, %3033, %3032 : index + %c8_1468 = constant 8 : index + %3035 = addi %arg5, %c8_1468 : index + %c16_1469 = constant 16 : index + %c0_1470 = constant 0 : index + %c-1_1471 = constant -1 : index + %3036 = cmpi "slt", %3035, %c0_1470 : index + %3037 = subi %c-1_1471, %3035 : index + %3038 = select %3036, %3037, %3035 : index + %3039 = divi_signed %3038, %c16_1469 : index + %3040 = subi %c-1_1471, %3039 : index + %3041 = select %3036, %3040, %3039 : index + %c-2_1472 = constant -2 : index + %3042 = muli %3041, %c-2_1472 : index + %3043 = addi %3034, %3042 : index + %c1_1473 = constant 1 : index + %3044 = addi %3043, %c1_1473 : index + %c2_1474 = constant 2 : index + %c0_1475 = constant 0 : index + %c-1_1476 = constant -1 : index + %3045 = cmpi "slt", %3044, %c0_1475 : index + %3046 = subi %c-1_1476, %3044 : index + %3047 = select %3045, %3046, %3044 : index + %3048 = divi_signed %3047, %c2_1474 : index + %3049 = subi %c-1_1476, %3048 : index + %3050 = select %3045, %3049, %3048 : index + %c-2_1477 = constant -2 : index + %3051 = muli %3050, %c-2_1477 : index + %3052 = addi %3028, %3051 : index + %c1_1478 = constant 1 : index + %3053 = addi %3052, %c1_1478 : index + store %2997, %2[%3008, %3013, %3053] : memref<16x6x2xvector<8xf32>> + %c8_1479 = constant 8 : index + %3054 = addi %arg5, %c8_1479 : index + %c16_1480 = constant 16 : index + %c0_1481 = constant 0 : index + %c-1_1482 = constant -1 : index + %3055 = cmpi "slt", %3054, %c0_1481 : index + %3056 = subi %c-1_1482, %3054 : index + %3057 = select %3055, %3056, %3054 : index + %3058 = divi_signed %3057, %c16_1480 : index + %3059 = subi %c-1_1482, %3058 : index + %3060 = select %3055, %3059, %3058 : index + %c16_1483 = constant 16 : index + %3061 = remi_signed %3060, %c16_1483 : index + %c0_1484 = constant 0 : index + %3062 = cmpi "slt", %3061, %c0_1484 : index + %3063 = addi %3061, %c16_1483 : index + %3064 = select %3062, %3063, %3061 : index + %3065 = addi %arg7, %arg9 : index + %c6_1485 = constant 6 : index + %3066 = remi_signed %3065, %c6_1485 : index + %c0_1486 = constant 0 : index + %3067 = cmpi "slt", %3066, %c0_1486 : index + %3068 = addi %3066, %c6_1485 : index + %3069 = select %3067, %3068, %3066 : index + %c8_1487 = constant 8 : index + %c0_1488 = constant 0 : index + %c-1_1489 = constant -1 : index + %3070 = cmpi "slt", %arg5, %c0_1488 : index + %3071 = subi %c-1_1489, %arg5 : index + %3072 = select %3070, %3071, %arg5 : index + %3073 = divi_signed %3072, %c8_1487 : index + %3074 = subi %c-1_1489, %3073 : index + %3075 = select %3070, %3074, %3073 : index + %c8_1490 = constant 8 : index + %3076 = addi %arg5, %c8_1490 : index + %c16_1491 = constant 16 : index + %c0_1492 = constant 0 : index + %c-1_1493 = constant -1 : index + %3077 = cmpi "slt", %3076, %c0_1492 : index + %3078 = subi %c-1_1493, %3076 : index + %3079 = select %3077, %3078, %3076 : index + %3080 = divi_signed %3079, %c16_1491 : index + %3081 = subi %c-1_1493, %3080 : index + %3082 = select %3077, %3081, %3080 : index + %c-2_1494 = constant -2 : index + %3083 = muli %3082, %c-2_1494 : index + %3084 = addi %3075, %3083 : index + %c8_1495 = constant 8 : index + %c0_1496 = constant 0 : index + %c-1_1497 = constant -1 : index + %3085 = cmpi "slt", %arg5, %c0_1496 : index + %3086 = subi %c-1_1497, %arg5 : index + %3087 = select %3085, %3086, %arg5 : index + %3088 = divi_signed 
%3087, %c8_1495 : index + %3089 = subi %c-1_1497, %3088 : index + %3090 = select %3085, %3089, %3088 : index + %c8_1498 = constant 8 : index + %3091 = addi %arg5, %c8_1498 : index + %c16_1499 = constant 16 : index + %c0_1500 = constant 0 : index + %c-1_1501 = constant -1 : index + %3092 = cmpi "slt", %3091, %c0_1500 : index + %3093 = subi %c-1_1501, %3091 : index + %3094 = select %3092, %3093, %3091 : index + %3095 = divi_signed %3094, %c16_1499 : index + %3096 = subi %c-1_1501, %3095 : index + %3097 = select %3092, %3096, %3095 : index + %c-2_1502 = constant -2 : index + %3098 = muli %3097, %c-2_1502 : index + %3099 = addi %3090, %3098 : index + %c1_1503 = constant 1 : index + %3100 = addi %3099, %c1_1503 : index + %c2_1504 = constant 2 : index + %c0_1505 = constant 0 : index + %c-1_1506 = constant -1 : index + %3101 = cmpi "slt", %3100, %c0_1505 : index + %3102 = subi %c-1_1506, %3100 : index + %3103 = select %3101, %3102, %3100 : index + %3104 = divi_signed %3103, %c2_1504 : index + %3105 = subi %c-1_1506, %3104 : index + %3106 = select %3101, %3105, %3104 : index + %c-2_1507 = constant -2 : index + %3107 = muli %3106, %c-2_1507 : index + %3108 = addi %3084, %3107 : index + %c1_1508 = constant 1 : index + %3109 = addi %3108, %c1_1508 : index + %3110 = load %2[%3064, %3069, %3109] : memref<16x6x2xvector<8xf32>> + %3111 = vector.insertelement %2481, %3110[%c5_i64 : i64] : vector<8xf32> + %c8_1509 = constant 8 : index + %3112 = addi %arg5, %c8_1509 : index + %c16_1510 = constant 16 : index + %c0_1511 = constant 0 : index + %c-1_1512 = constant -1 : index + %3113 = cmpi "slt", %3112, %c0_1511 : index + %3114 = subi %c-1_1512, %3112 : index + %3115 = select %3113, %3114, %3112 : index + %3116 = divi_signed %3115, %c16_1510 : index + %3117 = subi %c-1_1512, %3116 : index + %3118 = select %3113, %3117, %3116 : index + %c16_1513 = constant 16 : index + %3119 = remi_signed %3118, %c16_1513 : index + %c0_1514 = constant 0 : index + %3120 = cmpi "slt", %3119, %c0_1514 : index + %3121 = addi %3119, %c16_1513 : index + %3122 = select %3120, %3121, %3119 : index + %3123 = addi %arg7, %arg9 : index + %c6_1515 = constant 6 : index + %3124 = remi_signed %3123, %c6_1515 : index + %c0_1516 = constant 0 : index + %3125 = cmpi "slt", %3124, %c0_1516 : index + %3126 = addi %3124, %c6_1515 : index + %3127 = select %3125, %3126, %3124 : index + %c8_1517 = constant 8 : index + %c0_1518 = constant 0 : index + %c-1_1519 = constant -1 : index + %3128 = cmpi "slt", %arg5, %c0_1518 : index + %3129 = subi %c-1_1519, %arg5 : index + %3130 = select %3128, %3129, %arg5 : index + %3131 = divi_signed %3130, %c8_1517 : index + %3132 = subi %c-1_1519, %3131 : index + %3133 = select %3128, %3132, %3131 : index + %c8_1520 = constant 8 : index + %3134 = addi %arg5, %c8_1520 : index + %c16_1521 = constant 16 : index + %c0_1522 = constant 0 : index + %c-1_1523 = constant -1 : index + %3135 = cmpi "slt", %3134, %c0_1522 : index + %3136 = subi %c-1_1523, %3134 : index + %3137 = select %3135, %3136, %3134 : index + %3138 = divi_signed %3137, %c16_1521 : index + %3139 = subi %c-1_1523, %3138 : index + %3140 = select %3135, %3139, %3138 : index + %c-2_1524 = constant -2 : index + %3141 = muli %3140, %c-2_1524 : index + %3142 = addi %3133, %3141 : index + %c8_1525 = constant 8 : index + %c0_1526 = constant 0 : index + %c-1_1527 = constant -1 : index + %3143 = cmpi "slt", %arg5, %c0_1526 : index + %3144 = subi %c-1_1527, %arg5 : index + %3145 = select %3143, %3144, %arg5 : index + %3146 = divi_signed %3145, %c8_1525 : index + %3147 = 
subi %c-1_1527, %3146 : index + %3148 = select %3143, %3147, %3146 : index + %c8_1528 = constant 8 : index + %3149 = addi %arg5, %c8_1528 : index + %c16_1529 = constant 16 : index + %c0_1530 = constant 0 : index + %c-1_1531 = constant -1 : index + %3150 = cmpi "slt", %3149, %c0_1530 : index + %3151 = subi %c-1_1531, %3149 : index + %3152 = select %3150, %3151, %3149 : index + %3153 = divi_signed %3152, %c16_1529 : index + %3154 = subi %c-1_1531, %3153 : index + %3155 = select %3150, %3154, %3153 : index + %c-2_1532 = constant -2 : index + %3156 = muli %3155, %c-2_1532 : index + %3157 = addi %3148, %3156 : index + %c1_1533 = constant 1 : index + %3158 = addi %3157, %c1_1533 : index + %c2_1534 = constant 2 : index + %c0_1535 = constant 0 : index + %c-1_1536 = constant -1 : index + %3159 = cmpi "slt", %3158, %c0_1535 : index + %3160 = subi %c-1_1536, %3158 : index + %3161 = select %3159, %3160, %3158 : index + %3162 = divi_signed %3161, %c2_1534 : index + %3163 = subi %c-1_1536, %3162 : index + %3164 = select %3159, %3163, %3162 : index + %c-2_1537 = constant -2 : index + %3165 = muli %3164, %c-2_1537 : index + %3166 = addi %3142, %3165 : index + %c1_1538 = constant 1 : index + %3167 = addi %3166, %c1_1538 : index + store %3111, %2[%3122, %3127, %3167] : memref<16x6x2xvector<8xf32>> + %c8_1539 = constant 8 : index + %3168 = addi %arg5, %c8_1539 : index + %c16_1540 = constant 16 : index + %c0_1541 = constant 0 : index + %c-1_1542 = constant -1 : index + %3169 = cmpi "slt", %3168, %c0_1541 : index + %3170 = subi %c-1_1542, %3168 : index + %3171 = select %3169, %3170, %3168 : index + %3172 = divi_signed %3171, %c16_1540 : index + %3173 = subi %c-1_1542, %3172 : index + %3174 = select %3169, %3173, %3172 : index + %c16_1543 = constant 16 : index + %3175 = remi_signed %3174, %c16_1543 : index + %c0_1544 = constant 0 : index + %3176 = cmpi "slt", %3175, %c0_1544 : index + %3177 = addi %3175, %c16_1543 : index + %3178 = select %3176, %3177, %3175 : index + %3179 = addi %arg7, %arg9 : index + %c6_1545 = constant 6 : index + %3180 = remi_signed %3179, %c6_1545 : index + %c0_1546 = constant 0 : index + %3181 = cmpi "slt", %3180, %c0_1546 : index + %3182 = addi %3180, %c6_1545 : index + %3183 = select %3181, %3182, %3180 : index + %c8_1547 = constant 8 : index + %c0_1548 = constant 0 : index + %c-1_1549 = constant -1 : index + %3184 = cmpi "slt", %arg5, %c0_1548 : index + %3185 = subi %c-1_1549, %arg5 : index + %3186 = select %3184, %3185, %arg5 : index + %3187 = divi_signed %3186, %c8_1547 : index + %3188 = subi %c-1_1549, %3187 : index + %3189 = select %3184, %3188, %3187 : index + %c8_1550 = constant 8 : index + %3190 = addi %arg5, %c8_1550 : index + %c16_1551 = constant 16 : index + %c0_1552 = constant 0 : index + %c-1_1553 = constant -1 : index + %3191 = cmpi "slt", %3190, %c0_1552 : index + %3192 = subi %c-1_1553, %3190 : index + %3193 = select %3191, %3192, %3190 : index + %3194 = divi_signed %3193, %c16_1551 : index + %3195 = subi %c-1_1553, %3194 : index + %3196 = select %3191, %3195, %3194 : index + %c-2_1554 = constant -2 : index + %3197 = muli %3196, %c-2_1554 : index + %3198 = addi %3189, %3197 : index + %c8_1555 = constant 8 : index + %c0_1556 = constant 0 : index + %c-1_1557 = constant -1 : index + %3199 = cmpi "slt", %arg5, %c0_1556 : index + %3200 = subi %c-1_1557, %arg5 : index + %3201 = select %3199, %3200, %arg5 : index + %3202 = divi_signed %3201, %c8_1555 : index + %3203 = subi %c-1_1557, %3202 : index + %3204 = select %3199, %3203, %3202 : index + %c8_1558 = constant 8 : index + 
%3205 = addi %arg5, %c8_1558 : index + %c16_1559 = constant 16 : index + %c0_1560 = constant 0 : index + %c-1_1561 = constant -1 : index + %3206 = cmpi "slt", %3205, %c0_1560 : index + %3207 = subi %c-1_1561, %3205 : index + %3208 = select %3206, %3207, %3205 : index + %3209 = divi_signed %3208, %c16_1559 : index + %3210 = subi %c-1_1561, %3209 : index + %3211 = select %3206, %3210, %3209 : index + %c-2_1562 = constant -2 : index + %3212 = muli %3211, %c-2_1562 : index + %3213 = addi %3204, %3212 : index + %c1_1563 = constant 1 : index + %3214 = addi %3213, %c1_1563 : index + %c2_1564 = constant 2 : index + %c0_1565 = constant 0 : index + %c-1_1566 = constant -1 : index + %3215 = cmpi "slt", %3214, %c0_1565 : index + %3216 = subi %c-1_1566, %3214 : index + %3217 = select %3215, %3216, %3214 : index + %3218 = divi_signed %3217, %c2_1564 : index + %3219 = subi %c-1_1566, %3218 : index + %3220 = select %3215, %3219, %3218 : index + %c-2_1567 = constant -2 : index + %3221 = muli %3220, %c-2_1567 : index + %3222 = addi %3198, %3221 : index + %c1_1568 = constant 1 : index + %3223 = addi %3222, %c1_1568 : index + %3224 = load %2[%3178, %3183, %3223] : memref<16x6x2xvector<8xf32>> + %3225 = vector.insertelement %2482, %3224[%c6_i64 : i64] : vector<8xf32> + %c8_1569 = constant 8 : index + %3226 = addi %arg5, %c8_1569 : index + %c16_1570 = constant 16 : index + %c0_1571 = constant 0 : index + %c-1_1572 = constant -1 : index + %3227 = cmpi "slt", %3226, %c0_1571 : index + %3228 = subi %c-1_1572, %3226 : index + %3229 = select %3227, %3228, %3226 : index + %3230 = divi_signed %3229, %c16_1570 : index + %3231 = subi %c-1_1572, %3230 : index + %3232 = select %3227, %3231, %3230 : index + %c16_1573 = constant 16 : index + %3233 = remi_signed %3232, %c16_1573 : index + %c0_1574 = constant 0 : index + %3234 = cmpi "slt", %3233, %c0_1574 : index + %3235 = addi %3233, %c16_1573 : index + %3236 = select %3234, %3235, %3233 : index + %3237 = addi %arg7, %arg9 : index + %c6_1575 = constant 6 : index + %3238 = remi_signed %3237, %c6_1575 : index + %c0_1576 = constant 0 : index + %3239 = cmpi "slt", %3238, %c0_1576 : index + %3240 = addi %3238, %c6_1575 : index + %3241 = select %3239, %3240, %3238 : index + %c8_1577 = constant 8 : index + %c0_1578 = constant 0 : index + %c-1_1579 = constant -1 : index + %3242 = cmpi "slt", %arg5, %c0_1578 : index + %3243 = subi %c-1_1579, %arg5 : index + %3244 = select %3242, %3243, %arg5 : index + %3245 = divi_signed %3244, %c8_1577 : index + %3246 = subi %c-1_1579, %3245 : index + %3247 = select %3242, %3246, %3245 : index + %c8_1580 = constant 8 : index + %3248 = addi %arg5, %c8_1580 : index + %c16_1581 = constant 16 : index + %c0_1582 = constant 0 : index + %c-1_1583 = constant -1 : index + %3249 = cmpi "slt", %3248, %c0_1582 : index + %3250 = subi %c-1_1583, %3248 : index + %3251 = select %3249, %3250, %3248 : index + %3252 = divi_signed %3251, %c16_1581 : index + %3253 = subi %c-1_1583, %3252 : index + %3254 = select %3249, %3253, %3252 : index + %c-2_1584 = constant -2 : index + %3255 = muli %3254, %c-2_1584 : index + %3256 = addi %3247, %3255 : index + %c8_1585 = constant 8 : index + %c0_1586 = constant 0 : index + %c-1_1587 = constant -1 : index + %3257 = cmpi "slt", %arg5, %c0_1586 : index + %3258 = subi %c-1_1587, %arg5 : index + %3259 = select %3257, %3258, %arg5 : index + %3260 = divi_signed %3259, %c8_1585 : index + %3261 = subi %c-1_1587, %3260 : index + %3262 = select %3257, %3261, %3260 : index + %c8_1588 = constant 8 : index + %3263 = addi %arg5, %c8_1588 : 
index + %c16_1589 = constant 16 : index + %c0_1590 = constant 0 : index + %c-1_1591 = constant -1 : index + %3264 = cmpi "slt", %3263, %c0_1590 : index + %3265 = subi %c-1_1591, %3263 : index + %3266 = select %3264, %3265, %3263 : index + %3267 = divi_signed %3266, %c16_1589 : index + %3268 = subi %c-1_1591, %3267 : index + %3269 = select %3264, %3268, %3267 : index + %c-2_1592 = constant -2 : index + %3270 = muli %3269, %c-2_1592 : index + %3271 = addi %3262, %3270 : index + %c1_1593 = constant 1 : index + %3272 = addi %3271, %c1_1593 : index + %c2_1594 = constant 2 : index + %c0_1595 = constant 0 : index + %c-1_1596 = constant -1 : index + %3273 = cmpi "slt", %3272, %c0_1595 : index + %3274 = subi %c-1_1596, %3272 : index + %3275 = select %3273, %3274, %3272 : index + %3276 = divi_signed %3275, %c2_1594 : index + %3277 = subi %c-1_1596, %3276 : index + %3278 = select %3273, %3277, %3276 : index + %c-2_1597 = constant -2 : index + %3279 = muli %3278, %c-2_1597 : index + %3280 = addi %3256, %3279 : index + %c1_1598 = constant 1 : index + %3281 = addi %3280, %c1_1598 : index + store %3225, %2[%3236, %3241, %3281] : memref<16x6x2xvector<8xf32>> + %c8_1599 = constant 8 : index + %3282 = addi %arg5, %c8_1599 : index + %c16_1600 = constant 16 : index + %c0_1601 = constant 0 : index + %c-1_1602 = constant -1 : index + %3283 = cmpi "slt", %3282, %c0_1601 : index + %3284 = subi %c-1_1602, %3282 : index + %3285 = select %3283, %3284, %3282 : index + %3286 = divi_signed %3285, %c16_1600 : index + %3287 = subi %c-1_1602, %3286 : index + %3288 = select %3283, %3287, %3286 : index + %c16_1603 = constant 16 : index + %3289 = remi_signed %3288, %c16_1603 : index + %c0_1604 = constant 0 : index + %3290 = cmpi "slt", %3289, %c0_1604 : index + %3291 = addi %3289, %c16_1603 : index + %3292 = select %3290, %3291, %3289 : index + %3293 = addi %arg7, %arg9 : index + %c6_1605 = constant 6 : index + %3294 = remi_signed %3293, %c6_1605 : index + %c0_1606 = constant 0 : index + %3295 = cmpi "slt", %3294, %c0_1606 : index + %3296 = addi %3294, %c6_1605 : index + %3297 = select %3295, %3296, %3294 : index + %c8_1607 = constant 8 : index + %c0_1608 = constant 0 : index + %c-1_1609 = constant -1 : index + %3298 = cmpi "slt", %arg5, %c0_1608 : index + %3299 = subi %c-1_1609, %arg5 : index + %3300 = select %3298, %3299, %arg5 : index + %3301 = divi_signed %3300, %c8_1607 : index + %3302 = subi %c-1_1609, %3301 : index + %3303 = select %3298, %3302, %3301 : index + %c8_1610 = constant 8 : index + %3304 = addi %arg5, %c8_1610 : index + %c16_1611 = constant 16 : index + %c0_1612 = constant 0 : index + %c-1_1613 = constant -1 : index + %3305 = cmpi "slt", %3304, %c0_1612 : index + %3306 = subi %c-1_1613, %3304 : index + %3307 = select %3305, %3306, %3304 : index + %3308 = divi_signed %3307, %c16_1611 : index + %3309 = subi %c-1_1613, %3308 : index + %3310 = select %3305, %3309, %3308 : index + %c-2_1614 = constant -2 : index + %3311 = muli %3310, %c-2_1614 : index + %3312 = addi %3303, %3311 : index + %c8_1615 = constant 8 : index + %c0_1616 = constant 0 : index + %c-1_1617 = constant -1 : index + %3313 = cmpi "slt", %arg5, %c0_1616 : index + %3314 = subi %c-1_1617, %arg5 : index + %3315 = select %3313, %3314, %arg5 : index + %3316 = divi_signed %3315, %c8_1615 : index + %3317 = subi %c-1_1617, %3316 : index + %3318 = select %3313, %3317, %3316 : index + %c8_1618 = constant 8 : index + %3319 = addi %arg5, %c8_1618 : index + %c16_1619 = constant 16 : index + %c0_1620 = constant 0 : index + %c-1_1621 = constant -1 : index + 
%3320 = cmpi "slt", %3319, %c0_1620 : index + %3321 = subi %c-1_1621, %3319 : index + %3322 = select %3320, %3321, %3319 : index + %3323 = divi_signed %3322, %c16_1619 : index + %3324 = subi %c-1_1621, %3323 : index + %3325 = select %3320, %3324, %3323 : index + %c-2_1622 = constant -2 : index + %3326 = muli %3325, %c-2_1622 : index + %3327 = addi %3318, %3326 : index + %c1_1623 = constant 1 : index + %3328 = addi %3327, %c1_1623 : index + %c2_1624 = constant 2 : index + %c0_1625 = constant 0 : index + %c-1_1626 = constant -1 : index + %3329 = cmpi "slt", %3328, %c0_1625 : index + %3330 = subi %c-1_1626, %3328 : index + %3331 = select %3329, %3330, %3328 : index + %3332 = divi_signed %3331, %c2_1624 : index + %3333 = subi %c-1_1626, %3332 : index + %3334 = select %3329, %3333, %3332 : index + %c-2_1627 = constant -2 : index + %3335 = muli %3334, %c-2_1627 : index + %3336 = addi %3312, %3335 : index + %c1_1628 = constant 1 : index + %3337 = addi %3336, %c1_1628 : index + %3338 = load %2[%3292, %3297, %3337] : memref<16x6x2xvector<8xf32>> + %3339 = vector.insertelement %2483, %3338[%c7_i64 : i64] : vector<8xf32> + %c8_1629 = constant 8 : index + %3340 = addi %arg5, %c8_1629 : index + %c16_1630 = constant 16 : index + %c0_1631 = constant 0 : index + %c-1_1632 = constant -1 : index + %3341 = cmpi "slt", %3340, %c0_1631 : index + %3342 = subi %c-1_1632, %3340 : index + %3343 = select %3341, %3342, %3340 : index + %3344 = divi_signed %3343, %c16_1630 : index + %3345 = subi %c-1_1632, %3344 : index + %3346 = select %3341, %3345, %3344 : index + %c16_1633 = constant 16 : index + %3347 = remi_signed %3346, %c16_1633 : index + %c0_1634 = constant 0 : index + %3348 = cmpi "slt", %3347, %c0_1634 : index + %3349 = addi %3347, %c16_1633 : index + %3350 = select %3348, %3349, %3347 : index + %3351 = addi %arg7, %arg9 : index + %c6_1635 = constant 6 : index + %3352 = remi_signed %3351, %c6_1635 : index + %c0_1636 = constant 0 : index + %3353 = cmpi "slt", %3352, %c0_1636 : index + %3354 = addi %3352, %c6_1635 : index + %3355 = select %3353, %3354, %3352 : index + %c8_1637 = constant 8 : index + %c0_1638 = constant 0 : index + %c-1_1639 = constant -1 : index + %3356 = cmpi "slt", %arg5, %c0_1638 : index + %3357 = subi %c-1_1639, %arg5 : index + %3358 = select %3356, %3357, %arg5 : index + %3359 = divi_signed %3358, %c8_1637 : index + %3360 = subi %c-1_1639, %3359 : index + %3361 = select %3356, %3360, %3359 : index + %c8_1640 = constant 8 : index + %3362 = addi %arg5, %c8_1640 : index + %c16_1641 = constant 16 : index + %c0_1642 = constant 0 : index + %c-1_1643 = constant -1 : index + %3363 = cmpi "slt", %3362, %c0_1642 : index + %3364 = subi %c-1_1643, %3362 : index + %3365 = select %3363, %3364, %3362 : index + %3366 = divi_signed %3365, %c16_1641 : index + %3367 = subi %c-1_1643, %3366 : index + %3368 = select %3363, %3367, %3366 : index + %c-2_1644 = constant -2 : index + %3369 = muli %3368, %c-2_1644 : index + %3370 = addi %3361, %3369 : index + %c8_1645 = constant 8 : index + %c0_1646 = constant 0 : index + %c-1_1647 = constant -1 : index + %3371 = cmpi "slt", %arg5, %c0_1646 : index + %3372 = subi %c-1_1647, %arg5 : index + %3373 = select %3371, %3372, %arg5 : index + %3374 = divi_signed %3373, %c8_1645 : index + %3375 = subi %c-1_1647, %3374 : index + %3376 = select %3371, %3375, %3374 : index + %c8_1648 = constant 8 : index + %3377 = addi %arg5, %c8_1648 : index + %c16_1649 = constant 16 : index + %c0_1650 = constant 0 : index + %c-1_1651 = constant -1 : index + %3378 = cmpi "slt", %3377, %c0_1650 
: index + %3379 = subi %c-1_1651, %3377 : index + %3380 = select %3378, %3379, %3377 : index + %3381 = divi_signed %3380, %c16_1649 : index + %3382 = subi %c-1_1651, %3381 : index + %3383 = select %3378, %3382, %3381 : index + %c-2_1652 = constant -2 : index + %3384 = muli %3383, %c-2_1652 : index + %3385 = addi %3376, %3384 : index + %c1_1653 = constant 1 : index + %3386 = addi %3385, %c1_1653 : index + %c2_1654 = constant 2 : index + %c0_1655 = constant 0 : index + %c-1_1656 = constant -1 : index + %3387 = cmpi "slt", %3386, %c0_1655 : index + %3388 = subi %c-1_1656, %3386 : index + %3389 = select %3387, %3388, %3386 : index + %3390 = divi_signed %3389, %c2_1654 : index + %3391 = subi %c-1_1656, %3390 : index + %3392 = select %3387, %3391, %3390 : index + %c-2_1657 = constant -2 : index + %3393 = muli %3392, %c-2_1657 : index + %3394 = addi %3370, %3393 : index + %c1_1658 = constant 1 : index + %3395 = addi %3394, %c1_1658 : index + store %3339, %2[%3350, %3355, %3395] : memref<16x6x2xvector<8xf32>> + %c8_1659 = constant 8 : index + %3396 = addi %arg5, %c8_1659 : index + %c16_1660 = constant 16 : index + %c0_1661 = constant 0 : index + %c-1_1662 = constant -1 : index + %3397 = cmpi "slt", %3396, %c0_1661 : index + %3398 = subi %c-1_1662, %3396 : index + %3399 = select %3397, %3398, %3396 : index + %3400 = divi_signed %3399, %c16_1660 : index + %3401 = subi %c-1_1662, %3400 : index + %3402 = select %3397, %3401, %3400 : index + %c16_1663 = constant 16 : index + %3403 = remi_signed %3402, %c16_1663 : index + %c0_1664 = constant 0 : index + %3404 = cmpi "slt", %3403, %c0_1664 : index + %3405 = addi %3403, %c16_1663 : index + %3406 = select %3404, %3405, %3403 : index + %3407 = addi %arg7, %arg9 : index + %c6_1665 = constant 6 : index + %3408 = remi_signed %3407, %c6_1665 : index + %c0_1666 = constant 0 : index + %3409 = cmpi "slt", %3408, %c0_1666 : index + %3410 = addi %3408, %c6_1665 : index + %3411 = select %3409, %3410, %3408 : index + %c8_1667 = constant 8 : index + %c0_1668 = constant 0 : index + %c-1_1669 = constant -1 : index + %3412 = cmpi "slt", %arg5, %c0_1668 : index + %3413 = subi %c-1_1669, %arg5 : index + %3414 = select %3412, %3413, %arg5 : index + %3415 = divi_signed %3414, %c8_1667 : index + %3416 = subi %c-1_1669, %3415 : index + %3417 = select %3412, %3416, %3415 : index + %c8_1670 = constant 8 : index + %3418 = addi %arg5, %c8_1670 : index + %c16_1671 = constant 16 : index + %c0_1672 = constant 0 : index + %c-1_1673 = constant -1 : index + %3419 = cmpi "slt", %3418, %c0_1672 : index + %3420 = subi %c-1_1673, %3418 : index + %3421 = select %3419, %3420, %3418 : index + %3422 = divi_signed %3421, %c16_1671 : index + %3423 = subi %c-1_1673, %3422 : index + %3424 = select %3419, %3423, %3422 : index + %c-2_1674 = constant -2 : index + %3425 = muli %3424, %c-2_1674 : index + %3426 = addi %3417, %3425 : index + %c8_1675 = constant 8 : index + %c0_1676 = constant 0 : index + %c-1_1677 = constant -1 : index + %3427 = cmpi "slt", %arg5, %c0_1676 : index + %3428 = subi %c-1_1677, %arg5 : index + %3429 = select %3427, %3428, %arg5 : index + %3430 = divi_signed %3429, %c8_1675 : index + %3431 = subi %c-1_1677, %3430 : index + %3432 = select %3427, %3431, %3430 : index + %c8_1678 = constant 8 : index + %3433 = addi %arg5, %c8_1678 : index + %c16_1679 = constant 16 : index + %c0_1680 = constant 0 : index + %c-1_1681 = constant -1 : index + %3434 = cmpi "slt", %3433, %c0_1680 : index + %3435 = subi %c-1_1681, %3433 : index + %3436 = select %3434, %3435, %3433 : index + %3437 = 
divi_signed %3436, %c16_1679 : index + %3438 = subi %c-1_1681, %3437 : index + %3439 = select %3434, %3438, %3437 : index + %c-2_1682 = constant -2 : index + %3440 = muli %3439, %c-2_1682 : index + %3441 = addi %3432, %3440 : index + %c1_1683 = constant 1 : index + %3442 = addi %3441, %c1_1683 : index + %c2_1684 = constant 2 : index + %c0_1685 = constant 0 : index + %c-1_1686 = constant -1 : index + %3443 = cmpi "slt", %3442, %c0_1685 : index + %3444 = subi %c-1_1686, %3442 : index + %3445 = select %3443, %3444, %3442 : index + %3446 = divi_signed %3445, %c2_1684 : index + %3447 = subi %c-1_1686, %3446 : index + %3448 = select %3443, %3447, %3446 : index + %c-2_1687 = constant -2 : index + %3449 = muli %3448, %c-2_1687 : index + %3450 = addi %3426, %3449 : index + %c1_1688 = constant 1 : index + %3451 = addi %3450, %c1_1688 : index + %3452 = load %2[%3406, %3411, %3451] : memref<16x6x2xvector<8xf32>> + %3453 = vector.insertelement %2476, %3452[%c0_i64 : i64] : vector<8xf32> + %c8_1689 = constant 8 : index + %3454 = addi %arg5, %c8_1689 : index + %c16_1690 = constant 16 : index + %c0_1691 = constant 0 : index + %c-1_1692 = constant -1 : index + %3455 = cmpi "slt", %3454, %c0_1691 : index + %3456 = subi %c-1_1692, %3454 : index + %3457 = select %3455, %3456, %3454 : index + %3458 = divi_signed %3457, %c16_1690 : index + %3459 = subi %c-1_1692, %3458 : index + %3460 = select %3455, %3459, %3458 : index + %c16_1693 = constant 16 : index + %3461 = remi_signed %3460, %c16_1693 : index + %c0_1694 = constant 0 : index + %3462 = cmpi "slt", %3461, %c0_1694 : index + %3463 = addi %3461, %c16_1693 : index + %3464 = select %3462, %3463, %3461 : index + %3465 = addi %arg7, %arg9 : index + %c6_1695 = constant 6 : index + %3466 = remi_signed %3465, %c6_1695 : index + %c0_1696 = constant 0 : index + %3467 = cmpi "slt", %3466, %c0_1696 : index + %3468 = addi %3466, %c6_1695 : index + %3469 = select %3467, %3468, %3466 : index + %c8_1697 = constant 8 : index + %c0_1698 = constant 0 : index + %c-1_1699 = constant -1 : index + %3470 = cmpi "slt", %arg5, %c0_1698 : index + %3471 = subi %c-1_1699, %arg5 : index + %3472 = select %3470, %3471, %arg5 : index + %3473 = divi_signed %3472, %c8_1697 : index + %3474 = subi %c-1_1699, %3473 : index + %3475 = select %3470, %3474, %3473 : index + %c8_1700 = constant 8 : index + %3476 = addi %arg5, %c8_1700 : index + %c16_1701 = constant 16 : index + %c0_1702 = constant 0 : index + %c-1_1703 = constant -1 : index + %3477 = cmpi "slt", %3476, %c0_1702 : index + %3478 = subi %c-1_1703, %3476 : index + %3479 = select %3477, %3478, %3476 : index + %3480 = divi_signed %3479, %c16_1701 : index + %3481 = subi %c-1_1703, %3480 : index + %3482 = select %3477, %3481, %3480 : index + %c-2_1704 = constant -2 : index + %3483 = muli %3482, %c-2_1704 : index + %3484 = addi %3475, %3483 : index + %c8_1705 = constant 8 : index + %c0_1706 = constant 0 : index + %c-1_1707 = constant -1 : index + %3485 = cmpi "slt", %arg5, %c0_1706 : index + %3486 = subi %c-1_1707, %arg5 : index + %3487 = select %3485, %3486, %arg5 : index + %3488 = divi_signed %3487, %c8_1705 : index + %3489 = subi %c-1_1707, %3488 : index + %3490 = select %3485, %3489, %3488 : index + %c8_1708 = constant 8 : index + %3491 = addi %arg5, %c8_1708 : index + %c16_1709 = constant 16 : index + %c0_1710 = constant 0 : index + %c-1_1711 = constant -1 : index + %3492 = cmpi "slt", %3491, %c0_1710 : index + %3493 = subi %c-1_1711, %3491 : index + %3494 = select %3492, %3493, %3491 : index + %3495 = divi_signed %3494, %c16_1709 : 
index + %3496 = subi %c-1_1711, %3495 : index + %3497 = select %3492, %3496, %3495 : index + %c-2_1712 = constant -2 : index + %3498 = muli %3497, %c-2_1712 : index + %3499 = addi %3490, %3498 : index + %c1_1713 = constant 1 : index + %3500 = addi %3499, %c1_1713 : index + %c2_1714 = constant 2 : index + %c0_1715 = constant 0 : index + %c-1_1716 = constant -1 : index + %3501 = cmpi "slt", %3500, %c0_1715 : index + %3502 = subi %c-1_1716, %3500 : index + %3503 = select %3501, %3502, %3500 : index + %3504 = divi_signed %3503, %c2_1714 : index + %3505 = subi %c-1_1716, %3504 : index + %3506 = select %3501, %3505, %3504 : index + %c-2_1717 = constant -2 : index + %3507 = muli %3506, %c-2_1717 : index + %3508 = addi %3484, %3507 : index + %c1_1718 = constant 1 : index + %3509 = addi %3508, %c1_1718 : index + store %3453, %2[%3464, %3469, %3509] : memref<16x6x2xvector<8xf32>> + %c8_1719 = constant 8 : index + %3510 = addi %arg5, %c8_1719 : index + %c16_1720 = constant 16 : index + %c0_1721 = constant 0 : index + %c-1_1722 = constant -1 : index + %3511 = cmpi "slt", %3510, %c0_1721 : index + %3512 = subi %c-1_1722, %3510 : index + %3513 = select %3511, %3512, %3510 : index + %3514 = divi_signed %3513, %c16_1720 : index + %3515 = subi %c-1_1722, %3514 : index + %3516 = select %3511, %3515, %3514 : index + %c16_1723 = constant 16 : index + %3517 = remi_signed %3516, %c16_1723 : index + %c0_1724 = constant 0 : index + %3518 = cmpi "slt", %3517, %c0_1724 : index + %3519 = addi %3517, %c16_1723 : index + %3520 = select %3518, %3519, %3517 : index + %3521 = addi %arg7, %arg9 : index + %c6_1725 = constant 6 : index + %3522 = remi_signed %3521, %c6_1725 : index + %c0_1726 = constant 0 : index + %3523 = cmpi "slt", %3522, %c0_1726 : index + %3524 = addi %3522, %c6_1725 : index + %3525 = select %3523, %3524, %3522 : index + %c8_1727 = constant 8 : index + %c0_1728 = constant 0 : index + %c-1_1729 = constant -1 : index + %3526 = cmpi "slt", %arg5, %c0_1728 : index + %3527 = subi %c-1_1729, %arg5 : index + %3528 = select %3526, %3527, %arg5 : index + %3529 = divi_signed %3528, %c8_1727 : index + %3530 = subi %c-1_1729, %3529 : index + %3531 = select %3526, %3530, %3529 : index + %c8_1730 = constant 8 : index + %3532 = addi %arg5, %c8_1730 : index + %c16_1731 = constant 16 : index + %c0_1732 = constant 0 : index + %c-1_1733 = constant -1 : index + %3533 = cmpi "slt", %3532, %c0_1732 : index + %3534 = subi %c-1_1733, %3532 : index + %3535 = select %3533, %3534, %3532 : index + %3536 = divi_signed %3535, %c16_1731 : index + %3537 = subi %c-1_1733, %3536 : index + %3538 = select %3533, %3537, %3536 : index + %c-2_1734 = constant -2 : index + %3539 = muli %3538, %c-2_1734 : index + %3540 = addi %3531, %3539 : index + %c8_1735 = constant 8 : index + %c0_1736 = constant 0 : index + %c-1_1737 = constant -1 : index + %3541 = cmpi "slt", %arg5, %c0_1736 : index + %3542 = subi %c-1_1737, %arg5 : index + %3543 = select %3541, %3542, %arg5 : index + %3544 = divi_signed %3543, %c8_1735 : index + %3545 = subi %c-1_1737, %3544 : index + %3546 = select %3541, %3545, %3544 : index + %c8_1738 = constant 8 : index + %3547 = addi %arg5, %c8_1738 : index + %c16_1739 = constant 16 : index + %c0_1740 = constant 0 : index + %c-1_1741 = constant -1 : index + %3548 = cmpi "slt", %3547, %c0_1740 : index + %3549 = subi %c-1_1741, %3547 : index + %3550 = select %3548, %3549, %3547 : index + %3551 = divi_signed %3550, %c16_1739 : index + %3552 = subi %c-1_1741, %3551 : index + %3553 = select %3548, %3552, %3551 : index + %c-2_1742 = 
constant -2 : index + %3554 = muli %3553, %c-2_1742 : index + %3555 = addi %3546, %3554 : index + %c1_1743 = constant 1 : index + %3556 = addi %3555, %c1_1743 : index + %c2_1744 = constant 2 : index + %c0_1745 = constant 0 : index + %c-1_1746 = constant -1 : index + %3557 = cmpi "slt", %3556, %c0_1745 : index + %3558 = subi %c-1_1746, %3556 : index + %3559 = select %3557, %3558, %3556 : index + %3560 = divi_signed %3559, %c2_1744 : index + %3561 = subi %c-1_1746, %3560 : index + %3562 = select %3557, %3561, %3560 : index + %c-2_1747 = constant -2 : index + %3563 = muli %3562, %c-2_1747 : index + %3564 = addi %3540, %3563 : index + %c1_1748 = constant 1 : index + %3565 = addi %3564, %c1_1748 : index + %3566 = load %2[%3520, %3525, %3565] : memref<16x6x2xvector<8xf32>> + %3567 = vector.insertelement %2477, %3566[%c1_i64 : i64] : vector<8xf32> + %c8_1749 = constant 8 : index + %3568 = addi %arg5, %c8_1749 : index + %c16_1750 = constant 16 : index + %c0_1751 = constant 0 : index + %c-1_1752 = constant -1 : index + %3569 = cmpi "slt", %3568, %c0_1751 : index + %3570 = subi %c-1_1752, %3568 : index + %3571 = select %3569, %3570, %3568 : index + %3572 = divi_signed %3571, %c16_1750 : index + %3573 = subi %c-1_1752, %3572 : index + %3574 = select %3569, %3573, %3572 : index + %c16_1753 = constant 16 : index + %3575 = remi_signed %3574, %c16_1753 : index + %c0_1754 = constant 0 : index + %3576 = cmpi "slt", %3575, %c0_1754 : index + %3577 = addi %3575, %c16_1753 : index + %3578 = select %3576, %3577, %3575 : index + %3579 = addi %arg7, %arg9 : index + %c6_1755 = constant 6 : index + %3580 = remi_signed %3579, %c6_1755 : index + %c0_1756 = constant 0 : index + %3581 = cmpi "slt", %3580, %c0_1756 : index + %3582 = addi %3580, %c6_1755 : index + %3583 = select %3581, %3582, %3580 : index + %c8_1757 = constant 8 : index + %c0_1758 = constant 0 : index + %c-1_1759 = constant -1 : index + %3584 = cmpi "slt", %arg5, %c0_1758 : index + %3585 = subi %c-1_1759, %arg5 : index + %3586 = select %3584, %3585, %arg5 : index + %3587 = divi_signed %3586, %c8_1757 : index + %3588 = subi %c-1_1759, %3587 : index + %3589 = select %3584, %3588, %3587 : index + %c8_1760 = constant 8 : index + %3590 = addi %arg5, %c8_1760 : index + %c16_1761 = constant 16 : index + %c0_1762 = constant 0 : index + %c-1_1763 = constant -1 : index + %3591 = cmpi "slt", %3590, %c0_1762 : index + %3592 = subi %c-1_1763, %3590 : index + %3593 = select %3591, %3592, %3590 : index + %3594 = divi_signed %3593, %c16_1761 : index + %3595 = subi %c-1_1763, %3594 : index + %3596 = select %3591, %3595, %3594 : index + %c-2_1764 = constant -2 : index + %3597 = muli %3596, %c-2_1764 : index + %3598 = addi %3589, %3597 : index + %c8_1765 = constant 8 : index + %c0_1766 = constant 0 : index + %c-1_1767 = constant -1 : index + %3599 = cmpi "slt", %arg5, %c0_1766 : index + %3600 = subi %c-1_1767, %arg5 : index + %3601 = select %3599, %3600, %arg5 : index + %3602 = divi_signed %3601, %c8_1765 : index + %3603 = subi %c-1_1767, %3602 : index + %3604 = select %3599, %3603, %3602 : index + %c8_1768 = constant 8 : index + %3605 = addi %arg5, %c8_1768 : index + %c16_1769 = constant 16 : index + %c0_1770 = constant 0 : index + %c-1_1771 = constant -1 : index + %3606 = cmpi "slt", %3605, %c0_1770 : index + %3607 = subi %c-1_1771, %3605 : index + %3608 = select %3606, %3607, %3605 : index + %3609 = divi_signed %3608, %c16_1769 : index + %3610 = subi %c-1_1771, %3609 : index + %3611 = select %3606, %3610, %3609 : index + %c-2_1772 = constant -2 : index + %3612 = muli 
%3611, %c-2_1772 : index + %3613 = addi %3604, %3612 : index + %c1_1773 = constant 1 : index + %3614 = addi %3613, %c1_1773 : index + %c2_1774 = constant 2 : index + %c0_1775 = constant 0 : index + %c-1_1776 = constant -1 : index + %3615 = cmpi "slt", %3614, %c0_1775 : index + %3616 = subi %c-1_1776, %3614 : index + %3617 = select %3615, %3616, %3614 : index + %3618 = divi_signed %3617, %c2_1774 : index + %3619 = subi %c-1_1776, %3618 : index + %3620 = select %3615, %3619, %3618 : index + %c-2_1777 = constant -2 : index + %3621 = muli %3620, %c-2_1777 : index + %3622 = addi %3598, %3621 : index + %c1_1778 = constant 1 : index + %3623 = addi %3622, %c1_1778 : index + store %3567, %2[%3578, %3583, %3623] : memref<16x6x2xvector<8xf32>> + %c8_1779 = constant 8 : index + %3624 = addi %arg5, %c8_1779 : index + %c16_1780 = constant 16 : index + %c0_1781 = constant 0 : index + %c-1_1782 = constant -1 : index + %3625 = cmpi "slt", %3624, %c0_1781 : index + %3626 = subi %c-1_1782, %3624 : index + %3627 = select %3625, %3626, %3624 : index + %3628 = divi_signed %3627, %c16_1780 : index + %3629 = subi %c-1_1782, %3628 : index + %3630 = select %3625, %3629, %3628 : index + %c16_1783 = constant 16 : index + %3631 = remi_signed %3630, %c16_1783 : index + %c0_1784 = constant 0 : index + %3632 = cmpi "slt", %3631, %c0_1784 : index + %3633 = addi %3631, %c16_1783 : index + %3634 = select %3632, %3633, %3631 : index + %3635 = addi %arg7, %arg9 : index + %c6_1785 = constant 6 : index + %3636 = remi_signed %3635, %c6_1785 : index + %c0_1786 = constant 0 : index + %3637 = cmpi "slt", %3636, %c0_1786 : index + %3638 = addi %3636, %c6_1785 : index + %3639 = select %3637, %3638, %3636 : index + %c8_1787 = constant 8 : index + %c0_1788 = constant 0 : index + %c-1_1789 = constant -1 : index + %3640 = cmpi "slt", %arg5, %c0_1788 : index + %3641 = subi %c-1_1789, %arg5 : index + %3642 = select %3640, %3641, %arg5 : index + %3643 = divi_signed %3642, %c8_1787 : index + %3644 = subi %c-1_1789, %3643 : index + %3645 = select %3640, %3644, %3643 : index + %c8_1790 = constant 8 : index + %3646 = addi %arg5, %c8_1790 : index + %c16_1791 = constant 16 : index + %c0_1792 = constant 0 : index + %c-1_1793 = constant -1 : index + %3647 = cmpi "slt", %3646, %c0_1792 : index + %3648 = subi %c-1_1793, %3646 : index + %3649 = select %3647, %3648, %3646 : index + %3650 = divi_signed %3649, %c16_1791 : index + %3651 = subi %c-1_1793, %3650 : index + %3652 = select %3647, %3651, %3650 : index + %c-2_1794 = constant -2 : index + %3653 = muli %3652, %c-2_1794 : index + %3654 = addi %3645, %3653 : index + %c8_1795 = constant 8 : index + %c0_1796 = constant 0 : index + %c-1_1797 = constant -1 : index + %3655 = cmpi "slt", %arg5, %c0_1796 : index + %3656 = subi %c-1_1797, %arg5 : index + %3657 = select %3655, %3656, %arg5 : index + %3658 = divi_signed %3657, %c8_1795 : index + %3659 = subi %c-1_1797, %3658 : index + %3660 = select %3655, %3659, %3658 : index + %c8_1798 = constant 8 : index + %3661 = addi %arg5, %c8_1798 : index + %c16_1799 = constant 16 : index + %c0_1800 = constant 0 : index + %c-1_1801 = constant -1 : index + %3662 = cmpi "slt", %3661, %c0_1800 : index + %3663 = subi %c-1_1801, %3661 : index + %3664 = select %3662, %3663, %3661 : index + %3665 = divi_signed %3664, %c16_1799 : index + %3666 = subi %c-1_1801, %3665 : index + %3667 = select %3662, %3666, %3665 : index + %c-2_1802 = constant -2 : index + %3668 = muli %3667, %c-2_1802 : index + %3669 = addi %3660, %3668 : index + %c1_1803 = constant 1 : index + %3670 = addi 
%3669, %c1_1803 : index + %c2_1804 = constant 2 : index + %c0_1805 = constant 0 : index + %c-1_1806 = constant -1 : index + %3671 = cmpi "slt", %3670, %c0_1805 : index + %3672 = subi %c-1_1806, %3670 : index + %3673 = select %3671, %3672, %3670 : index + %3674 = divi_signed %3673, %c2_1804 : index + %3675 = subi %c-1_1806, %3674 : index + %3676 = select %3671, %3675, %3674 : index + %c-2_1807 = constant -2 : index + %3677 = muli %3676, %c-2_1807 : index + %3678 = addi %3654, %3677 : index + %c1_1808 = constant 1 : index + %3679 = addi %3678, %c1_1808 : index + %3680 = load %2[%3634, %3639, %3679] : memref<16x6x2xvector<8xf32>> + %3681 = vector.insertelement %2478, %3680[%c2_i64 : i64] : vector<8xf32> + %c8_1809 = constant 8 : index + %3682 = addi %arg5, %c8_1809 : index + %c16_1810 = constant 16 : index + %c0_1811 = constant 0 : index + %c-1_1812 = constant -1 : index + %3683 = cmpi "slt", %3682, %c0_1811 : index + %3684 = subi %c-1_1812, %3682 : index + %3685 = select %3683, %3684, %3682 : index + %3686 = divi_signed %3685, %c16_1810 : index + %3687 = subi %c-1_1812, %3686 : index + %3688 = select %3683, %3687, %3686 : index + %c16_1813 = constant 16 : index + %3689 = remi_signed %3688, %c16_1813 : index + %c0_1814 = constant 0 : index + %3690 = cmpi "slt", %3689, %c0_1814 : index + %3691 = addi %3689, %c16_1813 : index + %3692 = select %3690, %3691, %3689 : index + %3693 = addi %arg7, %arg9 : index + %c6_1815 = constant 6 : index + %3694 = remi_signed %3693, %c6_1815 : index + %c0_1816 = constant 0 : index + %3695 = cmpi "slt", %3694, %c0_1816 : index + %3696 = addi %3694, %c6_1815 : index + %3697 = select %3695, %3696, %3694 : index + %c8_1817 = constant 8 : index + %c0_1818 = constant 0 : index + %c-1_1819 = constant -1 : index + %3698 = cmpi "slt", %arg5, %c0_1818 : index + %3699 = subi %c-1_1819, %arg5 : index + %3700 = select %3698, %3699, %arg5 : index + %3701 = divi_signed %3700, %c8_1817 : index + %3702 = subi %c-1_1819, %3701 : index + %3703 = select %3698, %3702, %3701 : index + %c8_1820 = constant 8 : index + %3704 = addi %arg5, %c8_1820 : index + %c16_1821 = constant 16 : index + %c0_1822 = constant 0 : index + %c-1_1823 = constant -1 : index + %3705 = cmpi "slt", %3704, %c0_1822 : index + %3706 = subi %c-1_1823, %3704 : index + %3707 = select %3705, %3706, %3704 : index + %3708 = divi_signed %3707, %c16_1821 : index + %3709 = subi %c-1_1823, %3708 : index + %3710 = select %3705, %3709, %3708 : index + %c-2_1824 = constant -2 : index + %3711 = muli %3710, %c-2_1824 : index + %3712 = addi %3703, %3711 : index + %c8_1825 = constant 8 : index + %c0_1826 = constant 0 : index + %c-1_1827 = constant -1 : index + %3713 = cmpi "slt", %arg5, %c0_1826 : index + %3714 = subi %c-1_1827, %arg5 : index + %3715 = select %3713, %3714, %arg5 : index + %3716 = divi_signed %3715, %c8_1825 : index + %3717 = subi %c-1_1827, %3716 : index + %3718 = select %3713, %3717, %3716 : index + %c8_1828 = constant 8 : index + %3719 = addi %arg5, %c8_1828 : index + %c16_1829 = constant 16 : index + %c0_1830 = constant 0 : index + %c-1_1831 = constant -1 : index + %3720 = cmpi "slt", %3719, %c0_1830 : index + %3721 = subi %c-1_1831, %3719 : index + %3722 = select %3720, %3721, %3719 : index + %3723 = divi_signed %3722, %c16_1829 : index + %3724 = subi %c-1_1831, %3723 : index + %3725 = select %3720, %3724, %3723 : index + %c-2_1832 = constant -2 : index + %3726 = muli %3725, %c-2_1832 : index + %3727 = addi %3718, %3726 : index + %c1_1833 = constant 1 : index + %3728 = addi %3727, %c1_1833 : index + %c2_1834 
= constant 2 : index + %c0_1835 = constant 0 : index + %c-1_1836 = constant -1 : index + %3729 = cmpi "slt", %3728, %c0_1835 : index + %3730 = subi %c-1_1836, %3728 : index + %3731 = select %3729, %3730, %3728 : index + %3732 = divi_signed %3731, %c2_1834 : index + %3733 = subi %c-1_1836, %3732 : index + %3734 = select %3729, %3733, %3732 : index + %c-2_1837 = constant -2 : index + %3735 = muli %3734, %c-2_1837 : index + %3736 = addi %3712, %3735 : index + %c1_1838 = constant 1 : index + %3737 = addi %3736, %c1_1838 : index + store %3681, %2[%3692, %3697, %3737] : memref<16x6x2xvector<8xf32>> + %c8_1839 = constant 8 : index + %3738 = addi %arg5, %c8_1839 : index + %c16_1840 = constant 16 : index + %c0_1841 = constant 0 : index + %c-1_1842 = constant -1 : index + %3739 = cmpi "slt", %3738, %c0_1841 : index + %3740 = subi %c-1_1842, %3738 : index + %3741 = select %3739, %3740, %3738 : index + %3742 = divi_signed %3741, %c16_1840 : index + %3743 = subi %c-1_1842, %3742 : index + %3744 = select %3739, %3743, %3742 : index + %c16_1843 = constant 16 : index + %3745 = remi_signed %3744, %c16_1843 : index + %c0_1844 = constant 0 : index + %3746 = cmpi "slt", %3745, %c0_1844 : index + %3747 = addi %3745, %c16_1843 : index + %3748 = select %3746, %3747, %3745 : index + %3749 = addi %arg7, %arg9 : index + %c6_1845 = constant 6 : index + %3750 = remi_signed %3749, %c6_1845 : index + %c0_1846 = constant 0 : index + %3751 = cmpi "slt", %3750, %c0_1846 : index + %3752 = addi %3750, %c6_1845 : index + %3753 = select %3751, %3752, %3750 : index + %c8_1847 = constant 8 : index + %c0_1848 = constant 0 : index + %c-1_1849 = constant -1 : index + %3754 = cmpi "slt", %arg5, %c0_1848 : index + %3755 = subi %c-1_1849, %arg5 : index + %3756 = select %3754, %3755, %arg5 : index + %3757 = divi_signed %3756, %c8_1847 : index + %3758 = subi %c-1_1849, %3757 : index + %3759 = select %3754, %3758, %3757 : index + %c8_1850 = constant 8 : index + %3760 = addi %arg5, %c8_1850 : index + %c16_1851 = constant 16 : index + %c0_1852 = constant 0 : index + %c-1_1853 = constant -1 : index + %3761 = cmpi "slt", %3760, %c0_1852 : index + %3762 = subi %c-1_1853, %3760 : index + %3763 = select %3761, %3762, %3760 : index + %3764 = divi_signed %3763, %c16_1851 : index + %3765 = subi %c-1_1853, %3764 : index + %3766 = select %3761, %3765, %3764 : index + %c-2_1854 = constant -2 : index + %3767 = muli %3766, %c-2_1854 : index + %3768 = addi %3759, %3767 : index + %c8_1855 = constant 8 : index + %c0_1856 = constant 0 : index + %c-1_1857 = constant -1 : index + %3769 = cmpi "slt", %arg5, %c0_1856 : index + %3770 = subi %c-1_1857, %arg5 : index + %3771 = select %3769, %3770, %arg5 : index + %3772 = divi_signed %3771, %c8_1855 : index + %3773 = subi %c-1_1857, %3772 : index + %3774 = select %3769, %3773, %3772 : index + %c8_1858 = constant 8 : index + %3775 = addi %arg5, %c8_1858 : index + %c16_1859 = constant 16 : index + %c0_1860 = constant 0 : index + %c-1_1861 = constant -1 : index + %3776 = cmpi "slt", %3775, %c0_1860 : index + %3777 = subi %c-1_1861, %3775 : index + %3778 = select %3776, %3777, %3775 : index + %3779 = divi_signed %3778, %c16_1859 : index + %3780 = subi %c-1_1861, %3779 : index + %3781 = select %3776, %3780, %3779 : index + %c-2_1862 = constant -2 : index + %3782 = muli %3781, %c-2_1862 : index + %3783 = addi %3774, %3782 : index + %c1_1863 = constant 1 : index + %3784 = addi %3783, %c1_1863 : index + %c2_1864 = constant 2 : index + %c0_1865 = constant 0 : index + %c-1_1866 = constant -1 : index + %3785 = cmpi "slt", 
%3784, %c0_1865 : index + %3786 = subi %c-1_1866, %3784 : index + %3787 = select %3785, %3786, %3784 : index + %3788 = divi_signed %3787, %c2_1864 : index + %3789 = subi %c-1_1866, %3788 : index + %3790 = select %3785, %3789, %3788 : index + %c-2_1867 = constant -2 : index + %3791 = muli %3790, %c-2_1867 : index + %3792 = addi %3768, %3791 : index + %c1_1868 = constant 1 : index + %3793 = addi %3792, %c1_1868 : index + %3794 = load %2[%3748, %3753, %3793] : memref<16x6x2xvector<8xf32>> + %3795 = vector.insertelement %2479, %3794[%c3_i64 : i64] : vector<8xf32> + %c8_1869 = constant 8 : index + %3796 = addi %arg5, %c8_1869 : index + %c16_1870 = constant 16 : index + %c0_1871 = constant 0 : index + %c-1_1872 = constant -1 : index + %3797 = cmpi "slt", %3796, %c0_1871 : index + %3798 = subi %c-1_1872, %3796 : index + %3799 = select %3797, %3798, %3796 : index + %3800 = divi_signed %3799, %c16_1870 : index + %3801 = subi %c-1_1872, %3800 : index + %3802 = select %3797, %3801, %3800 : index + %c16_1873 = constant 16 : index + %3803 = remi_signed %3802, %c16_1873 : index + %c0_1874 = constant 0 : index + %3804 = cmpi "slt", %3803, %c0_1874 : index + %3805 = addi %3803, %c16_1873 : index + %3806 = select %3804, %3805, %3803 : index + %3807 = addi %arg7, %arg9 : index + %c6_1875 = constant 6 : index + %3808 = remi_signed %3807, %c6_1875 : index + %c0_1876 = constant 0 : index + %3809 = cmpi "slt", %3808, %c0_1876 : index + %3810 = addi %3808, %c6_1875 : index + %3811 = select %3809, %3810, %3808 : index + %c8_1877 = constant 8 : index + %c0_1878 = constant 0 : index + %c-1_1879 = constant -1 : index + %3812 = cmpi "slt", %arg5, %c0_1878 : index + %3813 = subi %c-1_1879, %arg5 : index + %3814 = select %3812, %3813, %arg5 : index + %3815 = divi_signed %3814, %c8_1877 : index + %3816 = subi %c-1_1879, %3815 : index + %3817 = select %3812, %3816, %3815 : index + %c8_1880 = constant 8 : index + %3818 = addi %arg5, %c8_1880 : index + %c16_1881 = constant 16 : index + %c0_1882 = constant 0 : index + %c-1_1883 = constant -1 : index + %3819 = cmpi "slt", %3818, %c0_1882 : index + %3820 = subi %c-1_1883, %3818 : index + %3821 = select %3819, %3820, %3818 : index + %3822 = divi_signed %3821, %c16_1881 : index + %3823 = subi %c-1_1883, %3822 : index + %3824 = select %3819, %3823, %3822 : index + %c-2_1884 = constant -2 : index + %3825 = muli %3824, %c-2_1884 : index + %3826 = addi %3817, %3825 : index + %c8_1885 = constant 8 : index + %c0_1886 = constant 0 : index + %c-1_1887 = constant -1 : index + %3827 = cmpi "slt", %arg5, %c0_1886 : index + %3828 = subi %c-1_1887, %arg5 : index + %3829 = select %3827, %3828, %arg5 : index + %3830 = divi_signed %3829, %c8_1885 : index + %3831 = subi %c-1_1887, %3830 : index + %3832 = select %3827, %3831, %3830 : index + %c8_1888 = constant 8 : index + %3833 = addi %arg5, %c8_1888 : index + %c16_1889 = constant 16 : index + %c0_1890 = constant 0 : index + %c-1_1891 = constant -1 : index + %3834 = cmpi "slt", %3833, %c0_1890 : index + %3835 = subi %c-1_1891, %3833 : index + %3836 = select %3834, %3835, %3833 : index + %3837 = divi_signed %3836, %c16_1889 : index + %3838 = subi %c-1_1891, %3837 : index + %3839 = select %3834, %3838, %3837 : index + %c-2_1892 = constant -2 : index + %3840 = muli %3839, %c-2_1892 : index + %3841 = addi %3832, %3840 : index + %c1_1893 = constant 1 : index + %3842 = addi %3841, %c1_1893 : index + %c2_1894 = constant 2 : index + %c0_1895 = constant 0 : index + %c-1_1896 = constant -1 : index + %3843 = cmpi "slt", %3842, %c0_1895 : index + %3844 = 
subi %c-1_1896, %3842 : index + %3845 = select %3843, %3844, %3842 : index + %3846 = divi_signed %3845, %c2_1894 : index + %3847 = subi %c-1_1896, %3846 : index + %3848 = select %3843, %3847, %3846 : index + %c-2_1897 = constant -2 : index + %3849 = muli %3848, %c-2_1897 : index + %3850 = addi %3826, %3849 : index + %c1_1898 = constant 1 : index + %3851 = addi %3850, %c1_1898 : index + store %3795, %2[%3806, %3811, %3851] : memref<16x6x2xvector<8xf32>> + %c8_1899 = constant 8 : index + %3852 = addi %arg5, %c8_1899 : index + %c16_1900 = constant 16 : index + %c0_1901 = constant 0 : index + %c-1_1902 = constant -1 : index + %3853 = cmpi "slt", %3852, %c0_1901 : index + %3854 = subi %c-1_1902, %3852 : index + %3855 = select %3853, %3854, %3852 : index + %3856 = divi_signed %3855, %c16_1900 : index + %3857 = subi %c-1_1902, %3856 : index + %3858 = select %3853, %3857, %3856 : index + %c16_1903 = constant 16 : index + %3859 = remi_signed %3858, %c16_1903 : index + %c0_1904 = constant 0 : index + %3860 = cmpi "slt", %3859, %c0_1904 : index + %3861 = addi %3859, %c16_1903 : index + %3862 = select %3860, %3861, %3859 : index + %3863 = addi %arg7, %arg9 : index + %c6_1905 = constant 6 : index + %3864 = remi_signed %3863, %c6_1905 : index + %c0_1906 = constant 0 : index + %3865 = cmpi "slt", %3864, %c0_1906 : index + %3866 = addi %3864, %c6_1905 : index + %3867 = select %3865, %3866, %3864 : index + %c8_1907 = constant 8 : index + %c0_1908 = constant 0 : index + %c-1_1909 = constant -1 : index + %3868 = cmpi "slt", %arg5, %c0_1908 : index + %3869 = subi %c-1_1909, %arg5 : index + %3870 = select %3868, %3869, %arg5 : index + %3871 = divi_signed %3870, %c8_1907 : index + %3872 = subi %c-1_1909, %3871 : index + %3873 = select %3868, %3872, %3871 : index + %c8_1910 = constant 8 : index + %3874 = addi %arg5, %c8_1910 : index + %c16_1911 = constant 16 : index + %c0_1912 = constant 0 : index + %c-1_1913 = constant -1 : index + %3875 = cmpi "slt", %3874, %c0_1912 : index + %3876 = subi %c-1_1913, %3874 : index + %3877 = select %3875, %3876, %3874 : index + %3878 = divi_signed %3877, %c16_1911 : index + %3879 = subi %c-1_1913, %3878 : index + %3880 = select %3875, %3879, %3878 : index + %c-2_1914 = constant -2 : index + %3881 = muli %3880, %c-2_1914 : index + %3882 = addi %3873, %3881 : index + %c8_1915 = constant 8 : index + %c0_1916 = constant 0 : index + %c-1_1917 = constant -1 : index + %3883 = cmpi "slt", %arg5, %c0_1916 : index + %3884 = subi %c-1_1917, %arg5 : index + %3885 = select %3883, %3884, %arg5 : index + %3886 = divi_signed %3885, %c8_1915 : index + %3887 = subi %c-1_1917, %3886 : index + %3888 = select %3883, %3887, %3886 : index + %c8_1918 = constant 8 : index + %3889 = addi %arg5, %c8_1918 : index + %c16_1919 = constant 16 : index + %c0_1920 = constant 0 : index + %c-1_1921 = constant -1 : index + %3890 = cmpi "slt", %3889, %c0_1920 : index + %3891 = subi %c-1_1921, %3889 : index + %3892 = select %3890, %3891, %3889 : index + %3893 = divi_signed %3892, %c16_1919 : index + %3894 = subi %c-1_1921, %3893 : index + %3895 = select %3890, %3894, %3893 : index + %c-2_1922 = constant -2 : index + %3896 = muli %3895, %c-2_1922 : index + %3897 = addi %3888, %3896 : index + %c1_1923 = constant 1 : index + %3898 = addi %3897, %c1_1923 : index + %c2_1924 = constant 2 : index + %c0_1925 = constant 0 : index + %c-1_1926 = constant -1 : index + %3899 = cmpi "slt", %3898, %c0_1925 : index + %3900 = subi %c-1_1926, %3898 : index + %3901 = select %3899, %3900, %3898 : index + %3902 = divi_signed %3901, 
%c2_1924 : index + %3903 = subi %c-1_1926, %3902 : index + %3904 = select %3899, %3903, %3902 : index + %c-2_1927 = constant -2 : index + %3905 = muli %3904, %c-2_1927 : index + %3906 = addi %3882, %3905 : index + %c1_1928 = constant 1 : index + %3907 = addi %3906, %c1_1928 : index + %3908 = load %2[%3862, %3867, %3907] : memref<16x6x2xvector<8xf32>> + %3909 = vector.insertelement %2480, %3908[%c4_i64 : i64] : vector<8xf32> + %c8_1929 = constant 8 : index + %3910 = addi %arg5, %c8_1929 : index + %c16_1930 = constant 16 : index + %c0_1931 = constant 0 : index + %c-1_1932 = constant -1 : index + %3911 = cmpi "slt", %3910, %c0_1931 : index + %3912 = subi %c-1_1932, %3910 : index + %3913 = select %3911, %3912, %3910 : index + %3914 = divi_signed %3913, %c16_1930 : index + %3915 = subi %c-1_1932, %3914 : index + %3916 = select %3911, %3915, %3914 : index + %c16_1933 = constant 16 : index + %3917 = remi_signed %3916, %c16_1933 : index + %c0_1934 = constant 0 : index + %3918 = cmpi "slt", %3917, %c0_1934 : index + %3919 = addi %3917, %c16_1933 : index + %3920 = select %3918, %3919, %3917 : index + %3921 = addi %arg7, %arg9 : index + %c6_1935 = constant 6 : index + %3922 = remi_signed %3921, %c6_1935 : index + %c0_1936 = constant 0 : index + %3923 = cmpi "slt", %3922, %c0_1936 : index + %3924 = addi %3922, %c6_1935 : index + %3925 = select %3923, %3924, %3922 : index + %c8_1937 = constant 8 : index + %c0_1938 = constant 0 : index + %c-1_1939 = constant -1 : index + %3926 = cmpi "slt", %arg5, %c0_1938 : index + %3927 = subi %c-1_1939, %arg5 : index + %3928 = select %3926, %3927, %arg5 : index + %3929 = divi_signed %3928, %c8_1937 : index + %3930 = subi %c-1_1939, %3929 : index + %3931 = select %3926, %3930, %3929 : index + %c8_1940 = constant 8 : index + %3932 = addi %arg5, %c8_1940 : index + %c16_1941 = constant 16 : index + %c0_1942 = constant 0 : index + %c-1_1943 = constant -1 : index + %3933 = cmpi "slt", %3932, %c0_1942 : index + %3934 = subi %c-1_1943, %3932 : index + %3935 = select %3933, %3934, %3932 : index + %3936 = divi_signed %3935, %c16_1941 : index + %3937 = subi %c-1_1943, %3936 : index + %3938 = select %3933, %3937, %3936 : index + %c-2_1944 = constant -2 : index + %3939 = muli %3938, %c-2_1944 : index + %3940 = addi %3931, %3939 : index + %c8_1945 = constant 8 : index + %c0_1946 = constant 0 : index + %c-1_1947 = constant -1 : index + %3941 = cmpi "slt", %arg5, %c0_1946 : index + %3942 = subi %c-1_1947, %arg5 : index + %3943 = select %3941, %3942, %arg5 : index + %3944 = divi_signed %3943, %c8_1945 : index + %3945 = subi %c-1_1947, %3944 : index + %3946 = select %3941, %3945, %3944 : index + %c8_1948 = constant 8 : index + %3947 = addi %arg5, %c8_1948 : index + %c16_1949 = constant 16 : index + %c0_1950 = constant 0 : index + %c-1_1951 = constant -1 : index + %3948 = cmpi "slt", %3947, %c0_1950 : index + %3949 = subi %c-1_1951, %3947 : index + %3950 = select %3948, %3949, %3947 : index + %3951 = divi_signed %3950, %c16_1949 : index + %3952 = subi %c-1_1951, %3951 : index + %3953 = select %3948, %3952, %3951 : index + %c-2_1952 = constant -2 : index + %3954 = muli %3953, %c-2_1952 : index + %3955 = addi %3946, %3954 : index + %c1_1953 = constant 1 : index + %3956 = addi %3955, %c1_1953 : index + %c2_1954 = constant 2 : index + %c0_1955 = constant 0 : index + %c-1_1956 = constant -1 : index + %3957 = cmpi "slt", %3956, %c0_1955 : index + %3958 = subi %c-1_1956, %3956 : index + %3959 = select %3957, %3958, %3956 : index + %3960 = divi_signed %3959, %c2_1954 : index + %3961 = subi 
%c-1_1956, %3960 : index + %3962 = select %3957, %3961, %3960 : index + %c-2_1957 = constant -2 : index + %3963 = muli %3962, %c-2_1957 : index + %3964 = addi %3940, %3963 : index + %c1_1958 = constant 1 : index + %3965 = addi %3964, %c1_1958 : index + store %3909, %2[%3920, %3925, %3965] : memref<16x6x2xvector<8xf32>> + %c8_1959 = constant 8 : index + %3966 = addi %arg5, %c8_1959 : index + %c16_1960 = constant 16 : index + %c0_1961 = constant 0 : index + %c-1_1962 = constant -1 : index + %3967 = cmpi "slt", %3966, %c0_1961 : index + %3968 = subi %c-1_1962, %3966 : index + %3969 = select %3967, %3968, %3966 : index + %3970 = divi_signed %3969, %c16_1960 : index + %3971 = subi %c-1_1962, %3970 : index + %3972 = select %3967, %3971, %3970 : index + %c16_1963 = constant 16 : index + %3973 = remi_signed %3972, %c16_1963 : index + %c0_1964 = constant 0 : index + %3974 = cmpi "slt", %3973, %c0_1964 : index + %3975 = addi %3973, %c16_1963 : index + %3976 = select %3974, %3975, %3973 : index + %3977 = addi %arg7, %arg9 : index + %c6_1965 = constant 6 : index + %3978 = remi_signed %3977, %c6_1965 : index + %c0_1966 = constant 0 : index + %3979 = cmpi "slt", %3978, %c0_1966 : index + %3980 = addi %3978, %c6_1965 : index + %3981 = select %3979, %3980, %3978 : index + %c8_1967 = constant 8 : index + %c0_1968 = constant 0 : index + %c-1_1969 = constant -1 : index + %3982 = cmpi "slt", %arg5, %c0_1968 : index + %3983 = subi %c-1_1969, %arg5 : index + %3984 = select %3982, %3983, %arg5 : index + %3985 = divi_signed %3984, %c8_1967 : index + %3986 = subi %c-1_1969, %3985 : index + %3987 = select %3982, %3986, %3985 : index + %c8_1970 = constant 8 : index + %3988 = addi %arg5, %c8_1970 : index + %c16_1971 = constant 16 : index + %c0_1972 = constant 0 : index + %c-1_1973 = constant -1 : index + %3989 = cmpi "slt", %3988, %c0_1972 : index + %3990 = subi %c-1_1973, %3988 : index + %3991 = select %3989, %3990, %3988 : index + %3992 = divi_signed %3991, %c16_1971 : index + %3993 = subi %c-1_1973, %3992 : index + %3994 = select %3989, %3993, %3992 : index + %c-2_1974 = constant -2 : index + %3995 = muli %3994, %c-2_1974 : index + %3996 = addi %3987, %3995 : index + %c8_1975 = constant 8 : index + %c0_1976 = constant 0 : index + %c-1_1977 = constant -1 : index + %3997 = cmpi "slt", %arg5, %c0_1976 : index + %3998 = subi %c-1_1977, %arg5 : index + %3999 = select %3997, %3998, %arg5 : index + %4000 = divi_signed %3999, %c8_1975 : index + %4001 = subi %c-1_1977, %4000 : index + %4002 = select %3997, %4001, %4000 : index + %c8_1978 = constant 8 : index + %4003 = addi %arg5, %c8_1978 : index + %c16_1979 = constant 16 : index + %c0_1980 = constant 0 : index + %c-1_1981 = constant -1 : index + %4004 = cmpi "slt", %4003, %c0_1980 : index + %4005 = subi %c-1_1981, %4003 : index + %4006 = select %4004, %4005, %4003 : index + %4007 = divi_signed %4006, %c16_1979 : index + %4008 = subi %c-1_1981, %4007 : index + %4009 = select %4004, %4008, %4007 : index + %c-2_1982 = constant -2 : index + %4010 = muli %4009, %c-2_1982 : index + %4011 = addi %4002, %4010 : index + %c1_1983 = constant 1 : index + %4012 = addi %4011, %c1_1983 : index + %c2_1984 = constant 2 : index + %c0_1985 = constant 0 : index + %c-1_1986 = constant -1 : index + %4013 = cmpi "slt", %4012, %c0_1985 : index + %4014 = subi %c-1_1986, %4012 : index + %4015 = select %4013, %4014, %4012 : index + %4016 = divi_signed %4015, %c2_1984 : index + %4017 = subi %c-1_1986, %4016 : index + %4018 = select %4013, %4017, %4016 : index + %c-2_1987 = constant -2 : index + 
%4019 = muli %4018, %c-2_1987 : index + %4020 = addi %3996, %4019 : index + %c1_1988 = constant 1 : index + %4021 = addi %4020, %c1_1988 : index + %4022 = load %2[%3976, %3981, %4021] : memref<16x6x2xvector<8xf32>> + %4023 = vector.insertelement %2481, %4022[%c5_i64 : i64] : vector<8xf32> + %c8_1989 = constant 8 : index + %4024 = addi %arg5, %c8_1989 : index + %c16_1990 = constant 16 : index + %c0_1991 = constant 0 : index + %c-1_1992 = constant -1 : index + %4025 = cmpi "slt", %4024, %c0_1991 : index + %4026 = subi %c-1_1992, %4024 : index + %4027 = select %4025, %4026, %4024 : index + %4028 = divi_signed %4027, %c16_1990 : index + %4029 = subi %c-1_1992, %4028 : index + %4030 = select %4025, %4029, %4028 : index + %c16_1993 = constant 16 : index + %4031 = remi_signed %4030, %c16_1993 : index + %c0_1994 = constant 0 : index + %4032 = cmpi "slt", %4031, %c0_1994 : index + %4033 = addi %4031, %c16_1993 : index + %4034 = select %4032, %4033, %4031 : index + %4035 = addi %arg7, %arg9 : index + %c6_1995 = constant 6 : index + %4036 = remi_signed %4035, %c6_1995 : index + %c0_1996 = constant 0 : index + %4037 = cmpi "slt", %4036, %c0_1996 : index + %4038 = addi %4036, %c6_1995 : index + %4039 = select %4037, %4038, %4036 : index + %c8_1997 = constant 8 : index + %c0_1998 = constant 0 : index + %c-1_1999 = constant -1 : index + %4040 = cmpi "slt", %arg5, %c0_1998 : index + %4041 = subi %c-1_1999, %arg5 : index + %4042 = select %4040, %4041, %arg5 : index + %4043 = divi_signed %4042, %c8_1997 : index + %4044 = subi %c-1_1999, %4043 : index + %4045 = select %4040, %4044, %4043 : index + %c8_2000 = constant 8 : index + %4046 = addi %arg5, %c8_2000 : index + %c16_2001 = constant 16 : index + %c0_2002 = constant 0 : index + %c-1_2003 = constant -1 : index + %4047 = cmpi "slt", %4046, %c0_2002 : index + %4048 = subi %c-1_2003, %4046 : index + %4049 = select %4047, %4048, %4046 : index + %4050 = divi_signed %4049, %c16_2001 : index + %4051 = subi %c-1_2003, %4050 : index + %4052 = select %4047, %4051, %4050 : index + %c-2_2004 = constant -2 : index + %4053 = muli %4052, %c-2_2004 : index + %4054 = addi %4045, %4053 : index + %c8_2005 = constant 8 : index + %c0_2006 = constant 0 : index + %c-1_2007 = constant -1 : index + %4055 = cmpi "slt", %arg5, %c0_2006 : index + %4056 = subi %c-1_2007, %arg5 : index + %4057 = select %4055, %4056, %arg5 : index + %4058 = divi_signed %4057, %c8_2005 : index + %4059 = subi %c-1_2007, %4058 : index + %4060 = select %4055, %4059, %4058 : index + %c8_2008 = constant 8 : index + %4061 = addi %arg5, %c8_2008 : index + %c16_2009 = constant 16 : index + %c0_2010 = constant 0 : index + %c-1_2011 = constant -1 : index + %4062 = cmpi "slt", %4061, %c0_2010 : index + %4063 = subi %c-1_2011, %4061 : index + %4064 = select %4062, %4063, %4061 : index + %4065 = divi_signed %4064, %c16_2009 : index + %4066 = subi %c-1_2011, %4065 : index + %4067 = select %4062, %4066, %4065 : index + %c-2_2012 = constant -2 : index + %4068 = muli %4067, %c-2_2012 : index + %4069 = addi %4060, %4068 : index + %c1_2013 = constant 1 : index + %4070 = addi %4069, %c1_2013 : index + %c2_2014 = constant 2 : index + %c0_2015 = constant 0 : index + %c-1_2016 = constant -1 : index + %4071 = cmpi "slt", %4070, %c0_2015 : index + %4072 = subi %c-1_2016, %4070 : index + %4073 = select %4071, %4072, %4070 : index + %4074 = divi_signed %4073, %c2_2014 : index + %4075 = subi %c-1_2016, %4074 : index + %4076 = select %4071, %4075, %4074 : index + %c-2_2017 = constant -2 : index + %4077 = muli %4076, %c-2_2017 : 
index + %4078 = addi %4054, %4077 : index + %c1_2018 = constant 1 : index + %4079 = addi %4078, %c1_2018 : index + store %4023, %2[%4034, %4039, %4079] : memref<16x6x2xvector<8xf32>> + %c8_2019 = constant 8 : index + %4080 = addi %arg5, %c8_2019 : index + %c16_2020 = constant 16 : index + %c0_2021 = constant 0 : index + %c-1_2022 = constant -1 : index + %4081 = cmpi "slt", %4080, %c0_2021 : index + %4082 = subi %c-1_2022, %4080 : index + %4083 = select %4081, %4082, %4080 : index + %4084 = divi_signed %4083, %c16_2020 : index + %4085 = subi %c-1_2022, %4084 : index + %4086 = select %4081, %4085, %4084 : index + %c16_2023 = constant 16 : index + %4087 = remi_signed %4086, %c16_2023 : index + %c0_2024 = constant 0 : index + %4088 = cmpi "slt", %4087, %c0_2024 : index + %4089 = addi %4087, %c16_2023 : index + %4090 = select %4088, %4089, %4087 : index + %4091 = addi %arg7, %arg9 : index + %c6_2025 = constant 6 : index + %4092 = remi_signed %4091, %c6_2025 : index + %c0_2026 = constant 0 : index + %4093 = cmpi "slt", %4092, %c0_2026 : index + %4094 = addi %4092, %c6_2025 : index + %4095 = select %4093, %4094, %4092 : index + %c8_2027 = constant 8 : index + %c0_2028 = constant 0 : index + %c-1_2029 = constant -1 : index + %4096 = cmpi "slt", %arg5, %c0_2028 : index + %4097 = subi %c-1_2029, %arg5 : index + %4098 = select %4096, %4097, %arg5 : index + %4099 = divi_signed %4098, %c8_2027 : index + %4100 = subi %c-1_2029, %4099 : index + %4101 = select %4096, %4100, %4099 : index + %c8_2030 = constant 8 : index + %4102 = addi %arg5, %c8_2030 : index + %c16_2031 = constant 16 : index + %c0_2032 = constant 0 : index + %c-1_2033 = constant -1 : index + %4103 = cmpi "slt", %4102, %c0_2032 : index + %4104 = subi %c-1_2033, %4102 : index + %4105 = select %4103, %4104, %4102 : index + %4106 = divi_signed %4105, %c16_2031 : index + %4107 = subi %c-1_2033, %4106 : index + %4108 = select %4103, %4107, %4106 : index + %c-2_2034 = constant -2 : index + %4109 = muli %4108, %c-2_2034 : index + %4110 = addi %4101, %4109 : index + %c8_2035 = constant 8 : index + %c0_2036 = constant 0 : index + %c-1_2037 = constant -1 : index + %4111 = cmpi "slt", %arg5, %c0_2036 : index + %4112 = subi %c-1_2037, %arg5 : index + %4113 = select %4111, %4112, %arg5 : index + %4114 = divi_signed %4113, %c8_2035 : index + %4115 = subi %c-1_2037, %4114 : index + %4116 = select %4111, %4115, %4114 : index + %c8_2038 = constant 8 : index + %4117 = addi %arg5, %c8_2038 : index + %c16_2039 = constant 16 : index + %c0_2040 = constant 0 : index + %c-1_2041 = constant -1 : index + %4118 = cmpi "slt", %4117, %c0_2040 : index + %4119 = subi %c-1_2041, %4117 : index + %4120 = select %4118, %4119, %4117 : index + %4121 = divi_signed %4120, %c16_2039 : index + %4122 = subi %c-1_2041, %4121 : index + %4123 = select %4118, %4122, %4121 : index + %c-2_2042 = constant -2 : index + %4124 = muli %4123, %c-2_2042 : index + %4125 = addi %4116, %4124 : index + %c1_2043 = constant 1 : index + %4126 = addi %4125, %c1_2043 : index + %c2_2044 = constant 2 : index + %c0_2045 = constant 0 : index + %c-1_2046 = constant -1 : index + %4127 = cmpi "slt", %4126, %c0_2045 : index + %4128 = subi %c-1_2046, %4126 : index + %4129 = select %4127, %4128, %4126 : index + %4130 = divi_signed %4129, %c2_2044 : index + %4131 = subi %c-1_2046, %4130 : index + %4132 = select %4127, %4131, %4130 : index + %c-2_2047 = constant -2 : index + %4133 = muli %4132, %c-2_2047 : index + %4134 = addi %4110, %4133 : index + %c1_2048 = constant 1 : index + %4135 = addi %4134, %c1_2048 : 
index + %4136 = load %2[%4090, %4095, %4135] : memref<16x6x2xvector<8xf32>> + %4137 = vector.insertelement %2482, %4136[%c6_i64 : i64] : vector<8xf32> + %c8_2049 = constant 8 : index + %4138 = addi %arg5, %c8_2049 : index + %c16_2050 = constant 16 : index + %c0_2051 = constant 0 : index + %c-1_2052 = constant -1 : index + %4139 = cmpi "slt", %4138, %c0_2051 : index + %4140 = subi %c-1_2052, %4138 : index + %4141 = select %4139, %4140, %4138 : index + %4142 = divi_signed %4141, %c16_2050 : index + %4143 = subi %c-1_2052, %4142 : index + %4144 = select %4139, %4143, %4142 : index + %c16_2053 = constant 16 : index + %4145 = remi_signed %4144, %c16_2053 : index + %c0_2054 = constant 0 : index + %4146 = cmpi "slt", %4145, %c0_2054 : index + %4147 = addi %4145, %c16_2053 : index + %4148 = select %4146, %4147, %4145 : index + %4149 = addi %arg7, %arg9 : index + %c6_2055 = constant 6 : index + %4150 = remi_signed %4149, %c6_2055 : index + %c0_2056 = constant 0 : index + %4151 = cmpi "slt", %4150, %c0_2056 : index + %4152 = addi %4150, %c6_2055 : index + %4153 = select %4151, %4152, %4150 : index + %c8_2057 = constant 8 : index + %c0_2058 = constant 0 : index + %c-1_2059 = constant -1 : index + %4154 = cmpi "slt", %arg5, %c0_2058 : index + %4155 = subi %c-1_2059, %arg5 : index + %4156 = select %4154, %4155, %arg5 : index + %4157 = divi_signed %4156, %c8_2057 : index + %4158 = subi %c-1_2059, %4157 : index + %4159 = select %4154, %4158, %4157 : index + %c8_2060 = constant 8 : index + %4160 = addi %arg5, %c8_2060 : index + %c16_2061 = constant 16 : index + %c0_2062 = constant 0 : index + %c-1_2063 = constant -1 : index + %4161 = cmpi "slt", %4160, %c0_2062 : index + %4162 = subi %c-1_2063, %4160 : index + %4163 = select %4161, %4162, %4160 : index + %4164 = divi_signed %4163, %c16_2061 : index + %4165 = subi %c-1_2063, %4164 : index + %4166 = select %4161, %4165, %4164 : index + %c-2_2064 = constant -2 : index + %4167 = muli %4166, %c-2_2064 : index + %4168 = addi %4159, %4167 : index + %c8_2065 = constant 8 : index + %c0_2066 = constant 0 : index + %c-1_2067 = constant -1 : index + %4169 = cmpi "slt", %arg5, %c0_2066 : index + %4170 = subi %c-1_2067, %arg5 : index + %4171 = select %4169, %4170, %arg5 : index + %4172 = divi_signed %4171, %c8_2065 : index + %4173 = subi %c-1_2067, %4172 : index + %4174 = select %4169, %4173, %4172 : index + %c8_2068 = constant 8 : index + %4175 = addi %arg5, %c8_2068 : index + %c16_2069 = constant 16 : index + %c0_2070 = constant 0 : index + %c-1_2071 = constant -1 : index + %4176 = cmpi "slt", %4175, %c0_2070 : index + %4177 = subi %c-1_2071, %4175 : index + %4178 = select %4176, %4177, %4175 : index + %4179 = divi_signed %4178, %c16_2069 : index + %4180 = subi %c-1_2071, %4179 : index + %4181 = select %4176, %4180, %4179 : index + %c-2_2072 = constant -2 : index + %4182 = muli %4181, %c-2_2072 : index + %4183 = addi %4174, %4182 : index + %c1_2073 = constant 1 : index + %4184 = addi %4183, %c1_2073 : index + %c2_2074 = constant 2 : index + %c0_2075 = constant 0 : index + %c-1_2076 = constant -1 : index + %4185 = cmpi "slt", %4184, %c0_2075 : index + %4186 = subi %c-1_2076, %4184 : index + %4187 = select %4185, %4186, %4184 : index + %4188 = divi_signed %4187, %c2_2074 : index + %4189 = subi %c-1_2076, %4188 : index + %4190 = select %4185, %4189, %4188 : index + %c-2_2077 = constant -2 : index + %4191 = muli %4190, %c-2_2077 : index + %4192 = addi %4168, %4191 : index + %c1_2078 = constant 1 : index + %4193 = addi %4192, %c1_2078 : index + store %4137, %2[%4148, 
%4153, %4193] : memref<16x6x2xvector<8xf32>> + %c8_2079 = constant 8 : index + %4194 = addi %arg5, %c8_2079 : index + %c16_2080 = constant 16 : index + %c0_2081 = constant 0 : index + %c-1_2082 = constant -1 : index + %4195 = cmpi "slt", %4194, %c0_2081 : index + %4196 = subi %c-1_2082, %4194 : index + %4197 = select %4195, %4196, %4194 : index + %4198 = divi_signed %4197, %c16_2080 : index + %4199 = subi %c-1_2082, %4198 : index + %4200 = select %4195, %4199, %4198 : index + %c16_2083 = constant 16 : index + %4201 = remi_signed %4200, %c16_2083 : index + %c0_2084 = constant 0 : index + %4202 = cmpi "slt", %4201, %c0_2084 : index + %4203 = addi %4201, %c16_2083 : index + %4204 = select %4202, %4203, %4201 : index + %4205 = addi %arg7, %arg9 : index + %c6_2085 = constant 6 : index + %4206 = remi_signed %4205, %c6_2085 : index + %c0_2086 = constant 0 : index + %4207 = cmpi "slt", %4206, %c0_2086 : index + %4208 = addi %4206, %c6_2085 : index + %4209 = select %4207, %4208, %4206 : index + %c8_2087 = constant 8 : index + %c0_2088 = constant 0 : index + %c-1_2089 = constant -1 : index + %4210 = cmpi "slt", %arg5, %c0_2088 : index + %4211 = subi %c-1_2089, %arg5 : index + %4212 = select %4210, %4211, %arg5 : index + %4213 = divi_signed %4212, %c8_2087 : index + %4214 = subi %c-1_2089, %4213 : index + %4215 = select %4210, %4214, %4213 : index + %c8_2090 = constant 8 : index + %4216 = addi %arg5, %c8_2090 : index + %c16_2091 = constant 16 : index + %c0_2092 = constant 0 : index + %c-1_2093 = constant -1 : index + %4217 = cmpi "slt", %4216, %c0_2092 : index + %4218 = subi %c-1_2093, %4216 : index + %4219 = select %4217, %4218, %4216 : index + %4220 = divi_signed %4219, %c16_2091 : index + %4221 = subi %c-1_2093, %4220 : index + %4222 = select %4217, %4221, %4220 : index + %c-2_2094 = constant -2 : index + %4223 = muli %4222, %c-2_2094 : index + %4224 = addi %4215, %4223 : index + %c8_2095 = constant 8 : index + %c0_2096 = constant 0 : index + %c-1_2097 = constant -1 : index + %4225 = cmpi "slt", %arg5, %c0_2096 : index + %4226 = subi %c-1_2097, %arg5 : index + %4227 = select %4225, %4226, %arg5 : index + %4228 = divi_signed %4227, %c8_2095 : index + %4229 = subi %c-1_2097, %4228 : index + %4230 = select %4225, %4229, %4228 : index + %c8_2098 = constant 8 : index + %4231 = addi %arg5, %c8_2098 : index + %c16_2099 = constant 16 : index + %c0_2100 = constant 0 : index + %c-1_2101 = constant -1 : index + %4232 = cmpi "slt", %4231, %c0_2100 : index + %4233 = subi %c-1_2101, %4231 : index + %4234 = select %4232, %4233, %4231 : index + %4235 = divi_signed %4234, %c16_2099 : index + %4236 = subi %c-1_2101, %4235 : index + %4237 = select %4232, %4236, %4235 : index + %c-2_2102 = constant -2 : index + %4238 = muli %4237, %c-2_2102 : index + %4239 = addi %4230, %4238 : index + %c1_2103 = constant 1 : index + %4240 = addi %4239, %c1_2103 : index + %c2_2104 = constant 2 : index + %c0_2105 = constant 0 : index + %c-1_2106 = constant -1 : index + %4241 = cmpi "slt", %4240, %c0_2105 : index + %4242 = subi %c-1_2106, %4240 : index + %4243 = select %4241, %4242, %4240 : index + %4244 = divi_signed %4243, %c2_2104 : index + %4245 = subi %c-1_2106, %4244 : index + %4246 = select %4241, %4245, %4244 : index + %c-2_2107 = constant -2 : index + %4247 = muli %4246, %c-2_2107 : index + %4248 = addi %4224, %4247 : index + %c1_2108 = constant 1 : index + %4249 = addi %4248, %c1_2108 : index + %4250 = load %2[%4204, %4209, %4249] : memref<16x6x2xvector<8xf32>> + %4251 = vector.insertelement %2483, %4250[%c7_i64 : i64] : 
vector<8xf32> + %c8_2109 = constant 8 : index + %4252 = addi %arg5, %c8_2109 : index + %c16_2110 = constant 16 : index + %c0_2111 = constant 0 : index + %c-1_2112 = constant -1 : index + %4253 = cmpi "slt", %4252, %c0_2111 : index + %4254 = subi %c-1_2112, %4252 : index + %4255 = select %4253, %4254, %4252 : index + %4256 = divi_signed %4255, %c16_2110 : index + %4257 = subi %c-1_2112, %4256 : index + %4258 = select %4253, %4257, %4256 : index + %c16_2113 = constant 16 : index + %4259 = remi_signed %4258, %c16_2113 : index + %c0_2114 = constant 0 : index + %4260 = cmpi "slt", %4259, %c0_2114 : index + %4261 = addi %4259, %c16_2113 : index + %4262 = select %4260, %4261, %4259 : index + %4263 = addi %arg7, %arg9 : index + %c6_2115 = constant 6 : index + %4264 = remi_signed %4263, %c6_2115 : index + %c0_2116 = constant 0 : index + %4265 = cmpi "slt", %4264, %c0_2116 : index + %4266 = addi %4264, %c6_2115 : index + %4267 = select %4265, %4266, %4264 : index + %c8_2117 = constant 8 : index + %c0_2118 = constant 0 : index + %c-1_2119 = constant -1 : index + %4268 = cmpi "slt", %arg5, %c0_2118 : index + %4269 = subi %c-1_2119, %arg5 : index + %4270 = select %4268, %4269, %arg5 : index + %4271 = divi_signed %4270, %c8_2117 : index + %4272 = subi %c-1_2119, %4271 : index + %4273 = select %4268, %4272, %4271 : index + %c8_2120 = constant 8 : index + %4274 = addi %arg5, %c8_2120 : index + %c16_2121 = constant 16 : index + %c0_2122 = constant 0 : index + %c-1_2123 = constant -1 : index + %4275 = cmpi "slt", %4274, %c0_2122 : index + %4276 = subi %c-1_2123, %4274 : index + %4277 = select %4275, %4276, %4274 : index + %4278 = divi_signed %4277, %c16_2121 : index + %4279 = subi %c-1_2123, %4278 : index + %4280 = select %4275, %4279, %4278 : index + %c-2_2124 = constant -2 : index + %4281 = muli %4280, %c-2_2124 : index + %4282 = addi %4273, %4281 : index + %c8_2125 = constant 8 : index + %c0_2126 = constant 0 : index + %c-1_2127 = constant -1 : index + %4283 = cmpi "slt", %arg5, %c0_2126 : index + %4284 = subi %c-1_2127, %arg5 : index + %4285 = select %4283, %4284, %arg5 : index + %4286 = divi_signed %4285, %c8_2125 : index + %4287 = subi %c-1_2127, %4286 : index + %4288 = select %4283, %4287, %4286 : index + %c8_2128 = constant 8 : index + %4289 = addi %arg5, %c8_2128 : index + %c16_2129 = constant 16 : index + %c0_2130 = constant 0 : index + %c-1_2131 = constant -1 : index + %4290 = cmpi "slt", %4289, %c0_2130 : index + %4291 = subi %c-1_2131, %4289 : index + %4292 = select %4290, %4291, %4289 : index + %4293 = divi_signed %4292, %c16_2129 : index + %4294 = subi %c-1_2131, %4293 : index + %4295 = select %4290, %4294, %4293 : index + %c-2_2132 = constant -2 : index + %4296 = muli %4295, %c-2_2132 : index + %4297 = addi %4288, %4296 : index + %c1_2133 = constant 1 : index + %4298 = addi %4297, %c1_2133 : index + %c2_2134 = constant 2 : index + %c0_2135 = constant 0 : index + %c-1_2136 = constant -1 : index + %4299 = cmpi "slt", %4298, %c0_2135 : index + %4300 = subi %c-1_2136, %4298 : index + %4301 = select %4299, %4300, %4298 : index + %4302 = divi_signed %4301, %c2_2134 : index + %4303 = subi %c-1_2136, %4302 : index + %4304 = select %4299, %4303, %4302 : index + %c-2_2137 = constant -2 : index + %4305 = muli %4304, %c-2_2137 : index + %4306 = addi %4282, %4305 : index + %c1_2138 = constant 1 : index + %4307 = addi %4306, %c1_2138 : index + store %4251, %2[%4262, %4267, %4307] : memref<16x6x2xvector<8xf32>> + } + } + } + %c0_20 = constant 0 : index + %c4_21 = constant 4 : index + %c1_22 = constant 1 : 
index + scf.for %arg7 = %c0_20 to %c4_21 step %c1_22 { + %4 = addi %arg6, %arg7 : index + %5 = addi %arg6, %arg7 : index + %6 = addi %arg6, %arg7 : index + %7 = addi %arg6, %arg7 : index + %8 = addi %arg6, %arg7 : index + %9 = addi %arg6, %arg7 : index + %10 = addi %arg6, %arg7 : index + %11 = addi %arg6, %arg7 : index + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%arg4, %5] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%arg4, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = load %arg0[%arg4, %7] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %16 = load %arg0[%arg4, %8] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %18 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg0[%arg4, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %c16_23 = constant 16 : index + %c0_24 = constant 0 : index + %c-1 = constant -1 : index + %20 = cmpi "slt", %arg5, %c0_24 : index + %21 = subi %c-1, %arg5 : index + %22 = select %20, %21, %arg5 : index + %23 = divi_signed %22, %c16_23 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %c16_25 = constant 16 : index + %26 = remi_signed %25, %c16_25 : index + %c0_26 = constant 0 : index + %27 = cmpi "slt", %26, %c0_26 : index + %28 = addi %26, %c16_25 : index + %29 = select %27, %28, %26 : index + %30 = addi %arg6, %arg7 : index + %c128_27 = constant 128 : index + %31 = remi_signed %30, %c128_27 : index + %c0_28 = constant 0 : index + %32 = cmpi "slt", %31, %c0_28 : index + %33 = addi %31, %c128_27 : index + %34 = select %32, %33, %31 : index + %c16_29 = constant 16 : index + %35 = remi_signed %arg5, %c16_29 : index + %c0_30 = constant 0 : index + %36 = cmpi "slt", %35, %c0_30 : index + %37 = addi %35, %c16_29 : index + %38 = select %36, %37, %35 : index + %c8_31 = constant 8 : index + %c0_32 = constant 0 : index + %c-1_33 = constant -1 : index + %39 = cmpi "slt", %38, %c0_32 : index + %40 = subi %c-1_33, %38 : index + %41 = select %39, %40, %38 : index + %42 = divi_signed %41, %c8_31 : index + %43 = subi %c-1_33, %42 : index + %44 = select %39, %43, %42 : index + %c2_34 = constant 2 : index + %45 = remi_signed %44, %c2_34 : index + %c0_35 = constant 0 : index + %46 = cmpi "slt", %45, %c0_35 : index + %47 = addi %45, %c2_34 : index + %48 = select %46, %47, %45 : index + %49 = load %3[%29, %34, %48] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement %49[%c0_i64 : i64] : vector<8xf32> + %c16_36 = constant 16 : index + %c0_37 = constant 0 : index + %c-1_38 = constant -1 : index + %51 = cmpi "slt", %arg5, %c0_37 : index + %52 = subi %c-1_38, %arg5 : index + %53 = select %51, %52, %arg5 : index + %54 = divi_signed %53, %c16_36 : index + %55 = subi %c-1_38, %54 : index + %56 = select %51, %55, %54 : index + %c16_39 = constant 16 : index + %57 = remi_signed %56, %c16_39 : index + %c0_40 = constant 0 : index + %58 = cmpi "slt", %57, %c0_40 : index + %59 = addi %57, %c16_39 : index + %60 = select %58, %59, %57 : index + %61 = addi %arg6, %arg7 : index + %c128_41 = constant 128 : index + %62 = remi_signed %61, %c128_41 : index + %c0_42 = constant 0 : index + %63 = cmpi "slt", %62, %c0_42 : index + %64 = addi %62, %c128_41 : index + %65 = select %63, %64, %62 : index + %c16_43 = constant 16 : index + %66 = 
remi_signed %arg5, %c16_43 : index + %c0_44 = constant 0 : index + %67 = cmpi "slt", %66, %c0_44 : index + %68 = addi %66, %c16_43 : index + %69 = select %67, %68, %66 : index + %c8_45 = constant 8 : index + %c0_46 = constant 0 : index + %c-1_47 = constant -1 : index + %70 = cmpi "slt", %69, %c0_46 : index + %71 = subi %c-1_47, %69 : index + %72 = select %70, %71, %69 : index + %73 = divi_signed %72, %c8_45 : index + %74 = subi %c-1_47, %73 : index + %75 = select %70, %74, %73 : index + %c2_48 = constant 2 : index + %76 = remi_signed %75, %c2_48 : index + %c0_49 = constant 0 : index + %77 = cmpi "slt", %76, %c0_49 : index + %78 = addi %76, %c2_48 : index + %79 = select %77, %78, %76 : index + %80 = load %3[%60, %65, %79] : memref<16x128x2xvector<8xf32>> + %81 = vector.extractelement %80[%c1_i64 : i64] : vector<8xf32> + %c16_50 = constant 16 : index + %c0_51 = constant 0 : index + %c-1_52 = constant -1 : index + %82 = cmpi "slt", %arg5, %c0_51 : index + %83 = subi %c-1_52, %arg5 : index + %84 = select %82, %83, %arg5 : index + %85 = divi_signed %84, %c16_50 : index + %86 = subi %c-1_52, %85 : index + %87 = select %82, %86, %85 : index + %c16_53 = constant 16 : index + %88 = remi_signed %87, %c16_53 : index + %c0_54 = constant 0 : index + %89 = cmpi "slt", %88, %c0_54 : index + %90 = addi %88, %c16_53 : index + %91 = select %89, %90, %88 : index + %92 = addi %arg6, %arg7 : index + %c128_55 = constant 128 : index + %93 = remi_signed %92, %c128_55 : index + %c0_56 = constant 0 : index + %94 = cmpi "slt", %93, %c0_56 : index + %95 = addi %93, %c128_55 : index + %96 = select %94, %95, %93 : index + %c16_57 = constant 16 : index + %97 = remi_signed %arg5, %c16_57 : index + %c0_58 = constant 0 : index + %98 = cmpi "slt", %97, %c0_58 : index + %99 = addi %97, %c16_57 : index + %100 = select %98, %99, %97 : index + %c8_59 = constant 8 : index + %c0_60 = constant 0 : index + %c-1_61 = constant -1 : index + %101 = cmpi "slt", %100, %c0_60 : index + %102 = subi %c-1_61, %100 : index + %103 = select %101, %102, %100 : index + %104 = divi_signed %103, %c8_59 : index + %105 = subi %c-1_61, %104 : index + %106 = select %101, %105, %104 : index + %c2_62 = constant 2 : index + %107 = remi_signed %106, %c2_62 : index + %c0_63 = constant 0 : index + %108 = cmpi "slt", %107, %c0_63 : index + %109 = addi %107, %c2_62 : index + %110 = select %108, %109, %107 : index + %111 = load %3[%91, %96, %110] : memref<16x128x2xvector<8xf32>> + %112 = vector.extractelement %111[%c2_i64 : i64] : vector<8xf32> + %c16_64 = constant 16 : index + %c0_65 = constant 0 : index + %c-1_66 = constant -1 : index + %113 = cmpi "slt", %arg5, %c0_65 : index + %114 = subi %c-1_66, %arg5 : index + %115 = select %113, %114, %arg5 : index + %116 = divi_signed %115, %c16_64 : index + %117 = subi %c-1_66, %116 : index + %118 = select %113, %117, %116 : index + %c16_67 = constant 16 : index + %119 = remi_signed %118, %c16_67 : index + %c0_68 = constant 0 : index + %120 = cmpi "slt", %119, %c0_68 : index + %121 = addi %119, %c16_67 : index + %122 = select %120, %121, %119 : index + %123 = addi %arg6, %arg7 : index + %c128_69 = constant 128 : index + %124 = remi_signed %123, %c128_69 : index + %c0_70 = constant 0 : index + %125 = cmpi "slt", %124, %c0_70 : index + %126 = addi %124, %c128_69 : index + %127 = select %125, %126, %124 : index + %c16_71 = constant 16 : index + %128 = remi_signed %arg5, %c16_71 : index + %c0_72 = constant 0 : index + %129 = cmpi "slt", %128, %c0_72 : index + %130 = addi %128, %c16_71 : index + %131 = select %129, %130, 
%128 : index + %c8_73 = constant 8 : index + %c0_74 = constant 0 : index + %c-1_75 = constant -1 : index + %132 = cmpi "slt", %131, %c0_74 : index + %133 = subi %c-1_75, %131 : index + %134 = select %132, %133, %131 : index + %135 = divi_signed %134, %c8_73 : index + %136 = subi %c-1_75, %135 : index + %137 = select %132, %136, %135 : index + %c2_76 = constant 2 : index + %138 = remi_signed %137, %c2_76 : index + %c0_77 = constant 0 : index + %139 = cmpi "slt", %138, %c0_77 : index + %140 = addi %138, %c2_76 : index + %141 = select %139, %140, %138 : index + %142 = load %3[%122, %127, %141] : memref<16x128x2xvector<8xf32>> + %143 = vector.extractelement %142[%c3_i64 : i64] : vector<8xf32> + %c16_78 = constant 16 : index + %c0_79 = constant 0 : index + %c-1_80 = constant -1 : index + %144 = cmpi "slt", %arg5, %c0_79 : index + %145 = subi %c-1_80, %arg5 : index + %146 = select %144, %145, %arg5 : index + %147 = divi_signed %146, %c16_78 : index + %148 = subi %c-1_80, %147 : index + %149 = select %144, %148, %147 : index + %c16_81 = constant 16 : index + %150 = remi_signed %149, %c16_81 : index + %c0_82 = constant 0 : index + %151 = cmpi "slt", %150, %c0_82 : index + %152 = addi %150, %c16_81 : index + %153 = select %151, %152, %150 : index + %154 = addi %arg6, %arg7 : index + %c128_83 = constant 128 : index + %155 = remi_signed %154, %c128_83 : index + %c0_84 = constant 0 : index + %156 = cmpi "slt", %155, %c0_84 : index + %157 = addi %155, %c128_83 : index + %158 = select %156, %157, %155 : index + %c16_85 = constant 16 : index + %159 = remi_signed %arg5, %c16_85 : index + %c0_86 = constant 0 : index + %160 = cmpi "slt", %159, %c0_86 : index + %161 = addi %159, %c16_85 : index + %162 = select %160, %161, %159 : index + %c8_87 = constant 8 : index + %c0_88 = constant 0 : index + %c-1_89 = constant -1 : index + %163 = cmpi "slt", %162, %c0_88 : index + %164 = subi %c-1_89, %162 : index + %165 = select %163, %164, %162 : index + %166 = divi_signed %165, %c8_87 : index + %167 = subi %c-1_89, %166 : index + %168 = select %163, %167, %166 : index + %c2_90 = constant 2 : index + %169 = remi_signed %168, %c2_90 : index + %c0_91 = constant 0 : index + %170 = cmpi "slt", %169, %c0_91 : index + %171 = addi %169, %c2_90 : index + %172 = select %170, %171, %169 : index + %173 = load %3[%153, %158, %172] : memref<16x128x2xvector<8xf32>> + %174 = vector.extractelement %173[%c4_i64 : i64] : vector<8xf32> + %c16_92 = constant 16 : index + %c0_93 = constant 0 : index + %c-1_94 = constant -1 : index + %175 = cmpi "slt", %arg5, %c0_93 : index + %176 = subi %c-1_94, %arg5 : index + %177 = select %175, %176, %arg5 : index + %178 = divi_signed %177, %c16_92 : index + %179 = subi %c-1_94, %178 : index + %180 = select %175, %179, %178 : index + %c16_95 = constant 16 : index + %181 = remi_signed %180, %c16_95 : index + %c0_96 = constant 0 : index + %182 = cmpi "slt", %181, %c0_96 : index + %183 = addi %181, %c16_95 : index + %184 = select %182, %183, %181 : index + %185 = addi %arg6, %arg7 : index + %c128_97 = constant 128 : index + %186 = remi_signed %185, %c128_97 : index + %c0_98 = constant 0 : index + %187 = cmpi "slt", %186, %c0_98 : index + %188 = addi %186, %c128_97 : index + %189 = select %187, %188, %186 : index + %c16_99 = constant 16 : index + %190 = remi_signed %arg5, %c16_99 : index + %c0_100 = constant 0 : index + %191 = cmpi "slt", %190, %c0_100 : index + %192 = addi %190, %c16_99 : index + %193 = select %191, %192, %190 : index + %c8_101 = constant 8 : index + %c0_102 = constant 0 : index + %c-1_103 
= constant -1 : index + %194 = cmpi "slt", %193, %c0_102 : index + %195 = subi %c-1_103, %193 : index + %196 = select %194, %195, %193 : index + %197 = divi_signed %196, %c8_101 : index + %198 = subi %c-1_103, %197 : index + %199 = select %194, %198, %197 : index + %c2_104 = constant 2 : index + %200 = remi_signed %199, %c2_104 : index + %c0_105 = constant 0 : index + %201 = cmpi "slt", %200, %c0_105 : index + %202 = addi %200, %c2_104 : index + %203 = select %201, %202, %200 : index + %204 = load %3[%184, %189, %203] : memref<16x128x2xvector<8xf32>> + %205 = vector.extractelement %204[%c5_i64 : i64] : vector<8xf32> + %c16_106 = constant 16 : index + %c0_107 = constant 0 : index + %c-1_108 = constant -1 : index + %206 = cmpi "slt", %arg5, %c0_107 : index + %207 = subi %c-1_108, %arg5 : index + %208 = select %206, %207, %arg5 : index + %209 = divi_signed %208, %c16_106 : index + %210 = subi %c-1_108, %209 : index + %211 = select %206, %210, %209 : index + %c16_109 = constant 16 : index + %212 = remi_signed %211, %c16_109 : index + %c0_110 = constant 0 : index + %213 = cmpi "slt", %212, %c0_110 : index + %214 = addi %212, %c16_109 : index + %215 = select %213, %214, %212 : index + %216 = addi %arg6, %arg7 : index + %c128_111 = constant 128 : index + %217 = remi_signed %216, %c128_111 : index + %c0_112 = constant 0 : index + %218 = cmpi "slt", %217, %c0_112 : index + %219 = addi %217, %c128_111 : index + %220 = select %218, %219, %217 : index + %c16_113 = constant 16 : index + %221 = remi_signed %arg5, %c16_113 : index + %c0_114 = constant 0 : index + %222 = cmpi "slt", %221, %c0_114 : index + %223 = addi %221, %c16_113 : index + %224 = select %222, %223, %221 : index + %c8_115 = constant 8 : index + %c0_116 = constant 0 : index + %c-1_117 = constant -1 : index + %225 = cmpi "slt", %224, %c0_116 : index + %226 = subi %c-1_117, %224 : index + %227 = select %225, %226, %224 : index + %228 = divi_signed %227, %c8_115 : index + %229 = subi %c-1_117, %228 : index + %230 = select %225, %229, %228 : index + %c2_118 = constant 2 : index + %231 = remi_signed %230, %c2_118 : index + %c0_119 = constant 0 : index + %232 = cmpi "slt", %231, %c0_119 : index + %233 = addi %231, %c2_118 : index + %234 = select %232, %233, %231 : index + %235 = load %3[%215, %220, %234] : memref<16x128x2xvector<8xf32>> + %236 = vector.extractelement %235[%c6_i64 : i64] : vector<8xf32> + %c16_120 = constant 16 : index + %c0_121 = constant 0 : index + %c-1_122 = constant -1 : index + %237 = cmpi "slt", %arg5, %c0_121 : index + %238 = subi %c-1_122, %arg5 : index + %239 = select %237, %238, %arg5 : index + %240 = divi_signed %239, %c16_120 : index + %241 = subi %c-1_122, %240 : index + %242 = select %237, %241, %240 : index + %c16_123 = constant 16 : index + %243 = remi_signed %242, %c16_123 : index + %c0_124 = constant 0 : index + %244 = cmpi "slt", %243, %c0_124 : index + %245 = addi %243, %c16_123 : index + %246 = select %244, %245, %243 : index + %247 = addi %arg6, %arg7 : index + %c128_125 = constant 128 : index + %248 = remi_signed %247, %c128_125 : index + %c0_126 = constant 0 : index + %249 = cmpi "slt", %248, %c0_126 : index + %250 = addi %248, %c128_125 : index + %251 = select %249, %250, %248 : index + %c16_127 = constant 16 : index + %252 = remi_signed %arg5, %c16_127 : index + %c0_128 = constant 0 : index + %253 = cmpi "slt", %252, %c0_128 : index + %254 = addi %252, %c16_127 : index + %255 = select %253, %254, %252 : index + %c8_129 = constant 8 : index + %c0_130 = constant 0 : index + %c-1_131 = constant -1 : 
index + %256 = cmpi "slt", %255, %c0_130 : index + %257 = subi %c-1_131, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c8_129 : index + %260 = subi %c-1_131, %259 : index + %261 = select %256, %260, %259 : index + %c2_132 = constant 2 : index + %262 = remi_signed %261, %c2_132 : index + %c0_133 = constant 0 : index + %263 = cmpi "slt", %262, %c0_133 : index + %264 = addi %262, %c2_132 : index + %265 = select %263, %264, %262 : index + %266 = load %3[%246, %251, %265] : memref<16x128x2xvector<8xf32>> + %267 = vector.extractelement %266[%c7_i64 : i64] : vector<8xf32> + %268 = "accv.bin_op"(%12, %50) {predicate = 2 : i64} : (f32, f32) -> f32 + %269 = "accv.bin_op"(%13, %81) {predicate = 2 : i64} : (f32, f32) -> f32 + %270 = "accv.bin_op"(%14, %112) {predicate = 2 : i64} : (f32, f32) -> f32 + %271 = "accv.bin_op"(%15, %143) {predicate = 2 : i64} : (f32, f32) -> f32 + %272 = "accv.bin_op"(%16, %174) {predicate = 2 : i64} : (f32, f32) -> f32 + %273 = "accv.bin_op"(%17, %205) {predicate = 2 : i64} : (f32, f32) -> f32 + %274 = "accv.bin_op"(%18, %236) {predicate = 2 : i64} : (f32, f32) -> f32 + %275 = "accv.bin_op"(%19, %267) {predicate = 2 : i64} : (f32, f32) -> f32 + %c16_134 = constant 16 : index + %c0_135 = constant 0 : index + %c-1_136 = constant -1 : index + %276 = cmpi "slt", %arg5, %c0_135 : index + %277 = subi %c-1_136, %arg5 : index + %278 = select %276, %277, %arg5 : index + %279 = divi_signed %278, %c16_134 : index + %280 = subi %c-1_136, %279 : index + %281 = select %276, %280, %279 : index + %c16_137 = constant 16 : index + %282 = remi_signed %281, %c16_137 : index + %c0_138 = constant 0 : index + %283 = cmpi "slt", %282, %c0_138 : index + %284 = addi %282, %c16_137 : index + %285 = select %283, %284, %282 : index + %c0_139 = constant 0 : index + %c16_140 = constant 16 : index + %286 = remi_signed %arg5, %c16_140 : index + %c0_141 = constant 0 : index + %287 = cmpi "slt", %286, %c0_141 : index + %288 = addi %286, %c16_140 : index + %289 = select %287, %288, %286 : index + %c8_142 = constant 8 : index + %c0_143 = constant 0 : index + %c-1_144 = constant -1 : index + %290 = cmpi "slt", %289, %c0_143 : index + %291 = subi %c-1_144, %289 : index + %292 = select %290, %291, %289 : index + %293 = divi_signed %292, %c8_142 : index + %294 = subi %c-1_144, %293 : index + %295 = select %290, %294, %293 : index + %c2_145 = constant 2 : index + %296 = remi_signed %295, %c2_145 : index + %c0_146 = constant 0 : index + %297 = cmpi "slt", %296, %c0_146 : index + %298 = addi %296, %c2_145 : index + %299 = select %297, %298, %296 : index + %300 = load %2[%285, %c0_139, %299] : memref<16x6x2xvector<8xf32>> + %301 = vector.extractelement %300[%c0_i64 : i64] : vector<8xf32> + %c16_147 = constant 16 : index + %c0_148 = constant 0 : index + %c-1_149 = constant -1 : index + %302 = cmpi "slt", %arg5, %c0_148 : index + %303 = subi %c-1_149, %arg5 : index + %304 = select %302, %303, %arg5 : index + %305 = divi_signed %304, %c16_147 : index + %306 = subi %c-1_149, %305 : index + %307 = select %302, %306, %305 : index + %c16_150 = constant 16 : index + %308 = remi_signed %307, %c16_150 : index + %c0_151 = constant 0 : index + %309 = cmpi "slt", %308, %c0_151 : index + %310 = addi %308, %c16_150 : index + %311 = select %309, %310, %308 : index + %c0_152 = constant 0 : index + %c16_153 = constant 16 : index + %312 = remi_signed %arg5, %c16_153 : index + %c0_154 = constant 0 : index + %313 = cmpi "slt", %312, %c0_154 : index + %314 = addi %312, %c16_153 : index + %315 = select 
%313, %314, %312 : index + %c8_155 = constant 8 : index + %c0_156 = constant 0 : index + %c-1_157 = constant -1 : index + %316 = cmpi "slt", %315, %c0_156 : index + %317 = subi %c-1_157, %315 : index + %318 = select %316, %317, %315 : index + %319 = divi_signed %318, %c8_155 : index + %320 = subi %c-1_157, %319 : index + %321 = select %316, %320, %319 : index + %c2_158 = constant 2 : index + %322 = remi_signed %321, %c2_158 : index + %c0_159 = constant 0 : index + %323 = cmpi "slt", %322, %c0_159 : index + %324 = addi %322, %c2_158 : index + %325 = select %323, %324, %322 : index + %326 = load %2[%311, %c0_152, %325] : memref<16x6x2xvector<8xf32>> + %327 = vector.extractelement %326[%c1_i64 : i64] : vector<8xf32> + %c16_160 = constant 16 : index + %c0_161 = constant 0 : index + %c-1_162 = constant -1 : index + %328 = cmpi "slt", %arg5, %c0_161 : index + %329 = subi %c-1_162, %arg5 : index + %330 = select %328, %329, %arg5 : index + %331 = divi_signed %330, %c16_160 : index + %332 = subi %c-1_162, %331 : index + %333 = select %328, %332, %331 : index + %c16_163 = constant 16 : index + %334 = remi_signed %333, %c16_163 : index + %c0_164 = constant 0 : index + %335 = cmpi "slt", %334, %c0_164 : index + %336 = addi %334, %c16_163 : index + %337 = select %335, %336, %334 : index + %c0_165 = constant 0 : index + %c16_166 = constant 16 : index + %338 = remi_signed %arg5, %c16_166 : index + %c0_167 = constant 0 : index + %339 = cmpi "slt", %338, %c0_167 : index + %340 = addi %338, %c16_166 : index + %341 = select %339, %340, %338 : index + %c8_168 = constant 8 : index + %c0_169 = constant 0 : index + %c-1_170 = constant -1 : index + %342 = cmpi "slt", %341, %c0_169 : index + %343 = subi %c-1_170, %341 : index + %344 = select %342, %343, %341 : index + %345 = divi_signed %344, %c8_168 : index + %346 = subi %c-1_170, %345 : index + %347 = select %342, %346, %345 : index + %c2_171 = constant 2 : index + %348 = remi_signed %347, %c2_171 : index + %c0_172 = constant 0 : index + %349 = cmpi "slt", %348, %c0_172 : index + %350 = addi %348, %c2_171 : index + %351 = select %349, %350, %348 : index + %352 = load %2[%337, %c0_165, %351] : memref<16x6x2xvector<8xf32>> + %353 = vector.extractelement %352[%c2_i64 : i64] : vector<8xf32> + %c16_173 = constant 16 : index + %c0_174 = constant 0 : index + %c-1_175 = constant -1 : index + %354 = cmpi "slt", %arg5, %c0_174 : index + %355 = subi %c-1_175, %arg5 : index + %356 = select %354, %355, %arg5 : index + %357 = divi_signed %356, %c16_173 : index + %358 = subi %c-1_175, %357 : index + %359 = select %354, %358, %357 : index + %c16_176 = constant 16 : index + %360 = remi_signed %359, %c16_176 : index + %c0_177 = constant 0 : index + %361 = cmpi "slt", %360, %c0_177 : index + %362 = addi %360, %c16_176 : index + %363 = select %361, %362, %360 : index + %c0_178 = constant 0 : index + %c16_179 = constant 16 : index + %364 = remi_signed %arg5, %c16_179 : index + %c0_180 = constant 0 : index + %365 = cmpi "slt", %364, %c0_180 : index + %366 = addi %364, %c16_179 : index + %367 = select %365, %366, %364 : index + %c8_181 = constant 8 : index + %c0_182 = constant 0 : index + %c-1_183 = constant -1 : index + %368 = cmpi "slt", %367, %c0_182 : index + %369 = subi %c-1_183, %367 : index + %370 = select %368, %369, %367 : index + %371 = divi_signed %370, %c8_181 : index + %372 = subi %c-1_183, %371 : index + %373 = select %368, %372, %371 : index + %c2_184 = constant 2 : index + %374 = remi_signed %373, %c2_184 : index + %c0_185 = constant 0 : index + %375 = cmpi "slt", 
%374, %c0_185 : index + %376 = addi %374, %c2_184 : index + %377 = select %375, %376, %374 : index + %378 = load %2[%363, %c0_178, %377] : memref<16x6x2xvector<8xf32>> + %379 = vector.extractelement %378[%c3_i64 : i64] : vector<8xf32> + %c16_186 = constant 16 : index + %c0_187 = constant 0 : index + %c-1_188 = constant -1 : index + %380 = cmpi "slt", %arg5, %c0_187 : index + %381 = subi %c-1_188, %arg5 : index + %382 = select %380, %381, %arg5 : index + %383 = divi_signed %382, %c16_186 : index + %384 = subi %c-1_188, %383 : index + %385 = select %380, %384, %383 : index + %c16_189 = constant 16 : index + %386 = remi_signed %385, %c16_189 : index + %c0_190 = constant 0 : index + %387 = cmpi "slt", %386, %c0_190 : index + %388 = addi %386, %c16_189 : index + %389 = select %387, %388, %386 : index + %c0_191 = constant 0 : index + %c16_192 = constant 16 : index + %390 = remi_signed %arg5, %c16_192 : index + %c0_193 = constant 0 : index + %391 = cmpi "slt", %390, %c0_193 : index + %392 = addi %390, %c16_192 : index + %393 = select %391, %392, %390 : index + %c8_194 = constant 8 : index + %c0_195 = constant 0 : index + %c-1_196 = constant -1 : index + %394 = cmpi "slt", %393, %c0_195 : index + %395 = subi %c-1_196, %393 : index + %396 = select %394, %395, %393 : index + %397 = divi_signed %396, %c8_194 : index + %398 = subi %c-1_196, %397 : index + %399 = select %394, %398, %397 : index + %c2_197 = constant 2 : index + %400 = remi_signed %399, %c2_197 : index + %c0_198 = constant 0 : index + %401 = cmpi "slt", %400, %c0_198 : index + %402 = addi %400, %c2_197 : index + %403 = select %401, %402, %400 : index + %404 = load %2[%389, %c0_191, %403] : memref<16x6x2xvector<8xf32>> + %405 = vector.extractelement %404[%c4_i64 : i64] : vector<8xf32> + %c16_199 = constant 16 : index + %c0_200 = constant 0 : index + %c-1_201 = constant -1 : index + %406 = cmpi "slt", %arg5, %c0_200 : index + %407 = subi %c-1_201, %arg5 : index + %408 = select %406, %407, %arg5 : index + %409 = divi_signed %408, %c16_199 : index + %410 = subi %c-1_201, %409 : index + %411 = select %406, %410, %409 : index + %c16_202 = constant 16 : index + %412 = remi_signed %411, %c16_202 : index + %c0_203 = constant 0 : index + %413 = cmpi "slt", %412, %c0_203 : index + %414 = addi %412, %c16_202 : index + %415 = select %413, %414, %412 : index + %c0_204 = constant 0 : index + %c16_205 = constant 16 : index + %416 = remi_signed %arg5, %c16_205 : index + %c0_206 = constant 0 : index + %417 = cmpi "slt", %416, %c0_206 : index + %418 = addi %416, %c16_205 : index + %419 = select %417, %418, %416 : index + %c8_207 = constant 8 : index + %c0_208 = constant 0 : index + %c-1_209 = constant -1 : index + %420 = cmpi "slt", %419, %c0_208 : index + %421 = subi %c-1_209, %419 : index + %422 = select %420, %421, %419 : index + %423 = divi_signed %422, %c8_207 : index + %424 = subi %c-1_209, %423 : index + %425 = select %420, %424, %423 : index + %c2_210 = constant 2 : index + %426 = remi_signed %425, %c2_210 : index + %c0_211 = constant 0 : index + %427 = cmpi "slt", %426, %c0_211 : index + %428 = addi %426, %c2_210 : index + %429 = select %427, %428, %426 : index + %430 = load %2[%415, %c0_204, %429] : memref<16x6x2xvector<8xf32>> + %431 = vector.extractelement %430[%c5_i64 : i64] : vector<8xf32> + %c16_212 = constant 16 : index + %c0_213 = constant 0 : index + %c-1_214 = constant -1 : index + %432 = cmpi "slt", %arg5, %c0_213 : index + %433 = subi %c-1_214, %arg5 : index + %434 = select %432, %433, %arg5 : index + %435 = divi_signed %434, %c16_212 
: index + %436 = subi %c-1_214, %435 : index + %437 = select %432, %436, %435 : index + %c16_215 = constant 16 : index + %438 = remi_signed %437, %c16_215 : index + %c0_216 = constant 0 : index + %439 = cmpi "slt", %438, %c0_216 : index + %440 = addi %438, %c16_215 : index + %441 = select %439, %440, %438 : index + %c0_217 = constant 0 : index + %c16_218 = constant 16 : index + %442 = remi_signed %arg5, %c16_218 : index + %c0_219 = constant 0 : index + %443 = cmpi "slt", %442, %c0_219 : index + %444 = addi %442, %c16_218 : index + %445 = select %443, %444, %442 : index + %c8_220 = constant 8 : index + %c0_221 = constant 0 : index + %c-1_222 = constant -1 : index + %446 = cmpi "slt", %445, %c0_221 : index + %447 = subi %c-1_222, %445 : index + %448 = select %446, %447, %445 : index + %449 = divi_signed %448, %c8_220 : index + %450 = subi %c-1_222, %449 : index + %451 = select %446, %450, %449 : index + %c2_223 = constant 2 : index + %452 = remi_signed %451, %c2_223 : index + %c0_224 = constant 0 : index + %453 = cmpi "slt", %452, %c0_224 : index + %454 = addi %452, %c2_223 : index + %455 = select %453, %454, %452 : index + %456 = load %2[%441, %c0_217, %455] : memref<16x6x2xvector<8xf32>> + %457 = vector.extractelement %456[%c6_i64 : i64] : vector<8xf32> + %c16_225 = constant 16 : index + %c0_226 = constant 0 : index + %c-1_227 = constant -1 : index + %458 = cmpi "slt", %arg5, %c0_226 : index + %459 = subi %c-1_227, %arg5 : index + %460 = select %458, %459, %arg5 : index + %461 = divi_signed %460, %c16_225 : index + %462 = subi %c-1_227, %461 : index + %463 = select %458, %462, %461 : index + %c16_228 = constant 16 : index + %464 = remi_signed %463, %c16_228 : index + %c0_229 = constant 0 : index + %465 = cmpi "slt", %464, %c0_229 : index + %466 = addi %464, %c16_228 : index + %467 = select %465, %466, %464 : index + %c0_230 = constant 0 : index + %c16_231 = constant 16 : index + %468 = remi_signed %arg5, %c16_231 : index + %c0_232 = constant 0 : index + %469 = cmpi "slt", %468, %c0_232 : index + %470 = addi %468, %c16_231 : index + %471 = select %469, %470, %468 : index + %c8_233 = constant 8 : index + %c0_234 = constant 0 : index + %c-1_235 = constant -1 : index + %472 = cmpi "slt", %471, %c0_234 : index + %473 = subi %c-1_235, %471 : index + %474 = select %472, %473, %471 : index + %475 = divi_signed %474, %c8_233 : index + %476 = subi %c-1_235, %475 : index + %477 = select %472, %476, %475 : index + %c2_236 = constant 2 : index + %478 = remi_signed %477, %c2_236 : index + %c0_237 = constant 0 : index + %479 = cmpi "slt", %478, %c0_237 : index + %480 = addi %478, %c2_236 : index + %481 = select %479, %480, %478 : index + %482 = load %2[%467, %c0_230, %481] : memref<16x6x2xvector<8xf32>> + %483 = vector.extractelement %482[%c7_i64 : i64] : vector<8xf32> + %484 = "accv.bin_op"(%301, %268) {predicate = 0 : i64} : (f32, f32) -> f32 + %485 = "accv.bin_op"(%327, %269) {predicate = 0 : i64} : (f32, f32) -> f32 + %486 = "accv.bin_op"(%353, %270) {predicate = 0 : i64} : (f32, f32) -> f32 + %487 = "accv.bin_op"(%379, %271) {predicate = 0 : i64} : (f32, f32) -> f32 + %488 = "accv.bin_op"(%405, %272) {predicate = 0 : i64} : (f32, f32) -> f32 + %489 = "accv.bin_op"(%431, %273) {predicate = 0 : i64} : (f32, f32) -> f32 + %490 = "accv.bin_op"(%457, %274) {predicate = 0 : i64} : (f32, f32) -> f32 + %491 = "accv.bin_op"(%483, %275) {predicate = 0 : i64} : (f32, f32) -> f32 + %c16_238 = constant 16 : index + %c0_239 = constant 0 : index + %c-1_240 = constant -1 : index + %492 = cmpi "slt", %arg5, %c0_239 
: index + %493 = subi %c-1_240, %arg5 : index + %494 = select %492, %493, %arg5 : index + %495 = divi_signed %494, %c16_238 : index + %496 = subi %c-1_240, %495 : index + %497 = select %492, %496, %495 : index + %c16_241 = constant 16 : index + %498 = remi_signed %497, %c16_241 : index + %c0_242 = constant 0 : index + %499 = cmpi "slt", %498, %c0_242 : index + %500 = addi %498, %c16_241 : index + %501 = select %499, %500, %498 : index + %c0_243 = constant 0 : index + %c16_244 = constant 16 : index + %502 = remi_signed %arg5, %c16_244 : index + %c0_245 = constant 0 : index + %503 = cmpi "slt", %502, %c0_245 : index + %504 = addi %502, %c16_244 : index + %505 = select %503, %504, %502 : index + %c8_246 = constant 8 : index + %c0_247 = constant 0 : index + %c-1_248 = constant -1 : index + %506 = cmpi "slt", %505, %c0_247 : index + %507 = subi %c-1_248, %505 : index + %508 = select %506, %507, %505 : index + %509 = divi_signed %508, %c8_246 : index + %510 = subi %c-1_248, %509 : index + %511 = select %506, %510, %509 : index + %c2_249 = constant 2 : index + %512 = remi_signed %511, %c2_249 : index + %c0_250 = constant 0 : index + %513 = cmpi "slt", %512, %c0_250 : index + %514 = addi %512, %c2_249 : index + %515 = select %513, %514, %512 : index + %516 = load %2[%501, %c0_243, %515] : memref<16x6x2xvector<8xf32>> + %517 = vector.insertelement %484, %516[%c0_i64 : i64] : vector<8xf32> + %c16_251 = constant 16 : index + %c0_252 = constant 0 : index + %c-1_253 = constant -1 : index + %518 = cmpi "slt", %arg5, %c0_252 : index + %519 = subi %c-1_253, %arg5 : index + %520 = select %518, %519, %arg5 : index + %521 = divi_signed %520, %c16_251 : index + %522 = subi %c-1_253, %521 : index + %523 = select %518, %522, %521 : index + %c16_254 = constant 16 : index + %524 = remi_signed %523, %c16_254 : index + %c0_255 = constant 0 : index + %525 = cmpi "slt", %524, %c0_255 : index + %526 = addi %524, %c16_254 : index + %527 = select %525, %526, %524 : index + %c0_256 = constant 0 : index + %c16_257 = constant 16 : index + %528 = remi_signed %arg5, %c16_257 : index + %c0_258 = constant 0 : index + %529 = cmpi "slt", %528, %c0_258 : index + %530 = addi %528, %c16_257 : index + %531 = select %529, %530, %528 : index + %c8_259 = constant 8 : index + %c0_260 = constant 0 : index + %c-1_261 = constant -1 : index + %532 = cmpi "slt", %531, %c0_260 : index + %533 = subi %c-1_261, %531 : index + %534 = select %532, %533, %531 : index + %535 = divi_signed %534, %c8_259 : index + %536 = subi %c-1_261, %535 : index + %537 = select %532, %536, %535 : index + %c2_262 = constant 2 : index + %538 = remi_signed %537, %c2_262 : index + %c0_263 = constant 0 : index + %539 = cmpi "slt", %538, %c0_263 : index + %540 = addi %538, %c2_262 : index + %541 = select %539, %540, %538 : index + store %517, %2[%527, %c0_256, %541] : memref<16x6x2xvector<8xf32>> + %c16_264 = constant 16 : index + %c0_265 = constant 0 : index + %c-1_266 = constant -1 : index + %542 = cmpi "slt", %arg5, %c0_265 : index + %543 = subi %c-1_266, %arg5 : index + %544 = select %542, %543, %arg5 : index + %545 = divi_signed %544, %c16_264 : index + %546 = subi %c-1_266, %545 : index + %547 = select %542, %546, %545 : index + %c16_267 = constant 16 : index + %548 = remi_signed %547, %c16_267 : index + %c0_268 = constant 0 : index + %549 = cmpi "slt", %548, %c0_268 : index + %550 = addi %548, %c16_267 : index + %551 = select %549, %550, %548 : index + %c0_269 = constant 0 : index + %c16_270 = constant 16 : index + %552 = remi_signed %arg5, %c16_270 : index + 
%c0_271 = constant 0 : index + %553 = cmpi "slt", %552, %c0_271 : index + %554 = addi %552, %c16_270 : index + %555 = select %553, %554, %552 : index + %c8_272 = constant 8 : index + %c0_273 = constant 0 : index + %c-1_274 = constant -1 : index + %556 = cmpi "slt", %555, %c0_273 : index + %557 = subi %c-1_274, %555 : index + %558 = select %556, %557, %555 : index + %559 = divi_signed %558, %c8_272 : index + %560 = subi %c-1_274, %559 : index + %561 = select %556, %560, %559 : index + %c2_275 = constant 2 : index + %562 = remi_signed %561, %c2_275 : index + %c0_276 = constant 0 : index + %563 = cmpi "slt", %562, %c0_276 : index + %564 = addi %562, %c2_275 : index + %565 = select %563, %564, %562 : index + %566 = load %2[%551, %c0_269, %565] : memref<16x6x2xvector<8xf32>> + %567 = vector.insertelement %485, %566[%c1_i64 : i64] : vector<8xf32> + %c16_277 = constant 16 : index + %c0_278 = constant 0 : index + %c-1_279 = constant -1 : index + %568 = cmpi "slt", %arg5, %c0_278 : index + %569 = subi %c-1_279, %arg5 : index + %570 = select %568, %569, %arg5 : index + %571 = divi_signed %570, %c16_277 : index + %572 = subi %c-1_279, %571 : index + %573 = select %568, %572, %571 : index + %c16_280 = constant 16 : index + %574 = remi_signed %573, %c16_280 : index + %c0_281 = constant 0 : index + %575 = cmpi "slt", %574, %c0_281 : index + %576 = addi %574, %c16_280 : index + %577 = select %575, %576, %574 : index + %c0_282 = constant 0 : index + %c16_283 = constant 16 : index + %578 = remi_signed %arg5, %c16_283 : index + %c0_284 = constant 0 : index + %579 = cmpi "slt", %578, %c0_284 : index + %580 = addi %578, %c16_283 : index + %581 = select %579, %580, %578 : index + %c8_285 = constant 8 : index + %c0_286 = constant 0 : index + %c-1_287 = constant -1 : index + %582 = cmpi "slt", %581, %c0_286 : index + %583 = subi %c-1_287, %581 : index + %584 = select %582, %583, %581 : index + %585 = divi_signed %584, %c8_285 : index + %586 = subi %c-1_287, %585 : index + %587 = select %582, %586, %585 : index + %c2_288 = constant 2 : index + %588 = remi_signed %587, %c2_288 : index + %c0_289 = constant 0 : index + %589 = cmpi "slt", %588, %c0_289 : index + %590 = addi %588, %c2_288 : index + %591 = select %589, %590, %588 : index + store %567, %2[%577, %c0_282, %591] : memref<16x6x2xvector<8xf32>> + %c16_290 = constant 16 : index + %c0_291 = constant 0 : index + %c-1_292 = constant -1 : index + %592 = cmpi "slt", %arg5, %c0_291 : index + %593 = subi %c-1_292, %arg5 : index + %594 = select %592, %593, %arg5 : index + %595 = divi_signed %594, %c16_290 : index + %596 = subi %c-1_292, %595 : index + %597 = select %592, %596, %595 : index + %c16_293 = constant 16 : index + %598 = remi_signed %597, %c16_293 : index + %c0_294 = constant 0 : index + %599 = cmpi "slt", %598, %c0_294 : index + %600 = addi %598, %c16_293 : index + %601 = select %599, %600, %598 : index + %c0_295 = constant 0 : index + %c16_296 = constant 16 : index + %602 = remi_signed %arg5, %c16_296 : index + %c0_297 = constant 0 : index + %603 = cmpi "slt", %602, %c0_297 : index + %604 = addi %602, %c16_296 : index + %605 = select %603, %604, %602 : index + %c8_298 = constant 8 : index + %c0_299 = constant 0 : index + %c-1_300 = constant -1 : index + %606 = cmpi "slt", %605, %c0_299 : index + %607 = subi %c-1_300, %605 : index + %608 = select %606, %607, %605 : index + %609 = divi_signed %608, %c8_298 : index + %610 = subi %c-1_300, %609 : index + %611 = select %606, %610, %609 : index + %c2_301 = constant 2 : index + %612 = remi_signed %611, %c2_301 : 
index + %c0_302 = constant 0 : index + %613 = cmpi "slt", %612, %c0_302 : index + %614 = addi %612, %c2_301 : index + %615 = select %613, %614, %612 : index + %616 = load %2[%601, %c0_295, %615] : memref<16x6x2xvector<8xf32>> + %617 = vector.insertelement %486, %616[%c2_i64 : i64] : vector<8xf32> + %c16_303 = constant 16 : index + %c0_304 = constant 0 : index + %c-1_305 = constant -1 : index + %618 = cmpi "slt", %arg5, %c0_304 : index + %619 = subi %c-1_305, %arg5 : index + %620 = select %618, %619, %arg5 : index + %621 = divi_signed %620, %c16_303 : index + %622 = subi %c-1_305, %621 : index + %623 = select %618, %622, %621 : index + %c16_306 = constant 16 : index + %624 = remi_signed %623, %c16_306 : index + %c0_307 = constant 0 : index + %625 = cmpi "slt", %624, %c0_307 : index + %626 = addi %624, %c16_306 : index + %627 = select %625, %626, %624 : index + %c0_308 = constant 0 : index + %c16_309 = constant 16 : index + %628 = remi_signed %arg5, %c16_309 : index + %c0_310 = constant 0 : index + %629 = cmpi "slt", %628, %c0_310 : index + %630 = addi %628, %c16_309 : index + %631 = select %629, %630, %628 : index + %c8_311 = constant 8 : index + %c0_312 = constant 0 : index + %c-1_313 = constant -1 : index + %632 = cmpi "slt", %631, %c0_312 : index + %633 = subi %c-1_313, %631 : index + %634 = select %632, %633, %631 : index + %635 = divi_signed %634, %c8_311 : index + %636 = subi %c-1_313, %635 : index + %637 = select %632, %636, %635 : index + %c2_314 = constant 2 : index + %638 = remi_signed %637, %c2_314 : index + %c0_315 = constant 0 : index + %639 = cmpi "slt", %638, %c0_315 : index + %640 = addi %638, %c2_314 : index + %641 = select %639, %640, %638 : index + store %617, %2[%627, %c0_308, %641] : memref<16x6x2xvector<8xf32>> + %c16_316 = constant 16 : index + %c0_317 = constant 0 : index + %c-1_318 = constant -1 : index + %642 = cmpi "slt", %arg5, %c0_317 : index + %643 = subi %c-1_318, %arg5 : index + %644 = select %642, %643, %arg5 : index + %645 = divi_signed %644, %c16_316 : index + %646 = subi %c-1_318, %645 : index + %647 = select %642, %646, %645 : index + %c16_319 = constant 16 : index + %648 = remi_signed %647, %c16_319 : index + %c0_320 = constant 0 : index + %649 = cmpi "slt", %648, %c0_320 : index + %650 = addi %648, %c16_319 : index + %651 = select %649, %650, %648 : index + %c0_321 = constant 0 : index + %c16_322 = constant 16 : index + %652 = remi_signed %arg5, %c16_322 : index + %c0_323 = constant 0 : index + %653 = cmpi "slt", %652, %c0_323 : index + %654 = addi %652, %c16_322 : index + %655 = select %653, %654, %652 : index + %c8_324 = constant 8 : index + %c0_325 = constant 0 : index + %c-1_326 = constant -1 : index + %656 = cmpi "slt", %655, %c0_325 : index + %657 = subi %c-1_326, %655 : index + %658 = select %656, %657, %655 : index + %659 = divi_signed %658, %c8_324 : index + %660 = subi %c-1_326, %659 : index + %661 = select %656, %660, %659 : index + %c2_327 = constant 2 : index + %662 = remi_signed %661, %c2_327 : index + %c0_328 = constant 0 : index + %663 = cmpi "slt", %662, %c0_328 : index + %664 = addi %662, %c2_327 : index + %665 = select %663, %664, %662 : index + %666 = load %2[%651, %c0_321, %665] : memref<16x6x2xvector<8xf32>> + %667 = vector.insertelement %487, %666[%c3_i64 : i64] : vector<8xf32> + %c16_329 = constant 16 : index + %c0_330 = constant 0 : index + %c-1_331 = constant -1 : index + %668 = cmpi "slt", %arg5, %c0_330 : index + %669 = subi %c-1_331, %arg5 : index + %670 = select %668, %669, %arg5 : index + %671 = divi_signed %670, %c16_329 
: index + %672 = subi %c-1_331, %671 : index + %673 = select %668, %672, %671 : index + %c16_332 = constant 16 : index + %674 = remi_signed %673, %c16_332 : index + %c0_333 = constant 0 : index + %675 = cmpi "slt", %674, %c0_333 : index + %676 = addi %674, %c16_332 : index + %677 = select %675, %676, %674 : index + %c0_334 = constant 0 : index + %c16_335 = constant 16 : index + %678 = remi_signed %arg5, %c16_335 : index + %c0_336 = constant 0 : index + %679 = cmpi "slt", %678, %c0_336 : index + %680 = addi %678, %c16_335 : index + %681 = select %679, %680, %678 : index + %c8_337 = constant 8 : index + %c0_338 = constant 0 : index + %c-1_339 = constant -1 : index + %682 = cmpi "slt", %681, %c0_338 : index + %683 = subi %c-1_339, %681 : index + %684 = select %682, %683, %681 : index + %685 = divi_signed %684, %c8_337 : index + %686 = subi %c-1_339, %685 : index + %687 = select %682, %686, %685 : index + %c2_340 = constant 2 : index + %688 = remi_signed %687, %c2_340 : index + %c0_341 = constant 0 : index + %689 = cmpi "slt", %688, %c0_341 : index + %690 = addi %688, %c2_340 : index + %691 = select %689, %690, %688 : index + store %667, %2[%677, %c0_334, %691] : memref<16x6x2xvector<8xf32>> + %c16_342 = constant 16 : index + %c0_343 = constant 0 : index + %c-1_344 = constant -1 : index + %692 = cmpi "slt", %arg5, %c0_343 : index + %693 = subi %c-1_344, %arg5 : index + %694 = select %692, %693, %arg5 : index + %695 = divi_signed %694, %c16_342 : index + %696 = subi %c-1_344, %695 : index + %697 = select %692, %696, %695 : index + %c16_345 = constant 16 : index + %698 = remi_signed %697, %c16_345 : index + %c0_346 = constant 0 : index + %699 = cmpi "slt", %698, %c0_346 : index + %700 = addi %698, %c16_345 : index + %701 = select %699, %700, %698 : index + %c0_347 = constant 0 : index + %c16_348 = constant 16 : index + %702 = remi_signed %arg5, %c16_348 : index + %c0_349 = constant 0 : index + %703 = cmpi "slt", %702, %c0_349 : index + %704 = addi %702, %c16_348 : index + %705 = select %703, %704, %702 : index + %c8_350 = constant 8 : index + %c0_351 = constant 0 : index + %c-1_352 = constant -1 : index + %706 = cmpi "slt", %705, %c0_351 : index + %707 = subi %c-1_352, %705 : index + %708 = select %706, %707, %705 : index + %709 = divi_signed %708, %c8_350 : index + %710 = subi %c-1_352, %709 : index + %711 = select %706, %710, %709 : index + %c2_353 = constant 2 : index + %712 = remi_signed %711, %c2_353 : index + %c0_354 = constant 0 : index + %713 = cmpi "slt", %712, %c0_354 : index + %714 = addi %712, %c2_353 : index + %715 = select %713, %714, %712 : index + %716 = load %2[%701, %c0_347, %715] : memref<16x6x2xvector<8xf32>> + %717 = vector.insertelement %488, %716[%c4_i64 : i64] : vector<8xf32> + %c16_355 = constant 16 : index + %c0_356 = constant 0 : index + %c-1_357 = constant -1 : index + %718 = cmpi "slt", %arg5, %c0_356 : index + %719 = subi %c-1_357, %arg5 : index + %720 = select %718, %719, %arg5 : index + %721 = divi_signed %720, %c16_355 : index + %722 = subi %c-1_357, %721 : index + %723 = select %718, %722, %721 : index + %c16_358 = constant 16 : index + %724 = remi_signed %723, %c16_358 : index + %c0_359 = constant 0 : index + %725 = cmpi "slt", %724, %c0_359 : index + %726 = addi %724, %c16_358 : index + %727 = select %725, %726, %724 : index + %c0_360 = constant 0 : index + %c16_361 = constant 16 : index + %728 = remi_signed %arg5, %c16_361 : index + %c0_362 = constant 0 : index + %729 = cmpi "slt", %728, %c0_362 : index + %730 = addi %728, %c16_361 : index + %731 = select 
%729, %730, %728 : index + %c8_363 = constant 8 : index + %c0_364 = constant 0 : index + %c-1_365 = constant -1 : index + %732 = cmpi "slt", %731, %c0_364 : index + %733 = subi %c-1_365, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c8_363 : index + %736 = subi %c-1_365, %735 : index + %737 = select %732, %736, %735 : index + %c2_366 = constant 2 : index + %738 = remi_signed %737, %c2_366 : index + %c0_367 = constant 0 : index + %739 = cmpi "slt", %738, %c0_367 : index + %740 = addi %738, %c2_366 : index + %741 = select %739, %740, %738 : index + store %717, %2[%727, %c0_360, %741] : memref<16x6x2xvector<8xf32>> + %c16_368 = constant 16 : index + %c0_369 = constant 0 : index + %c-1_370 = constant -1 : index + %742 = cmpi "slt", %arg5, %c0_369 : index + %743 = subi %c-1_370, %arg5 : index + %744 = select %742, %743, %arg5 : index + %745 = divi_signed %744, %c16_368 : index + %746 = subi %c-1_370, %745 : index + %747 = select %742, %746, %745 : index + %c16_371 = constant 16 : index + %748 = remi_signed %747, %c16_371 : index + %c0_372 = constant 0 : index + %749 = cmpi "slt", %748, %c0_372 : index + %750 = addi %748, %c16_371 : index + %751 = select %749, %750, %748 : index + %c0_373 = constant 0 : index + %c16_374 = constant 16 : index + %752 = remi_signed %arg5, %c16_374 : index + %c0_375 = constant 0 : index + %753 = cmpi "slt", %752, %c0_375 : index + %754 = addi %752, %c16_374 : index + %755 = select %753, %754, %752 : index + %c8_376 = constant 8 : index + %c0_377 = constant 0 : index + %c-1_378 = constant -1 : index + %756 = cmpi "slt", %755, %c0_377 : index + %757 = subi %c-1_378, %755 : index + %758 = select %756, %757, %755 : index + %759 = divi_signed %758, %c8_376 : index + %760 = subi %c-1_378, %759 : index + %761 = select %756, %760, %759 : index + %c2_379 = constant 2 : index + %762 = remi_signed %761, %c2_379 : index + %c0_380 = constant 0 : index + %763 = cmpi "slt", %762, %c0_380 : index + %764 = addi %762, %c2_379 : index + %765 = select %763, %764, %762 : index + %766 = load %2[%751, %c0_373, %765] : memref<16x6x2xvector<8xf32>> + %767 = vector.insertelement %489, %766[%c5_i64 : i64] : vector<8xf32> + %c16_381 = constant 16 : index + %c0_382 = constant 0 : index + %c-1_383 = constant -1 : index + %768 = cmpi "slt", %arg5, %c0_382 : index + %769 = subi %c-1_383, %arg5 : index + %770 = select %768, %769, %arg5 : index + %771 = divi_signed %770, %c16_381 : index + %772 = subi %c-1_383, %771 : index + %773 = select %768, %772, %771 : index + %c16_384 = constant 16 : index + %774 = remi_signed %773, %c16_384 : index + %c0_385 = constant 0 : index + %775 = cmpi "slt", %774, %c0_385 : index + %776 = addi %774, %c16_384 : index + %777 = select %775, %776, %774 : index + %c0_386 = constant 0 : index + %c16_387 = constant 16 : index + %778 = remi_signed %arg5, %c16_387 : index + %c0_388 = constant 0 : index + %779 = cmpi "slt", %778, %c0_388 : index + %780 = addi %778, %c16_387 : index + %781 = select %779, %780, %778 : index + %c8_389 = constant 8 : index + %c0_390 = constant 0 : index + %c-1_391 = constant -1 : index + %782 = cmpi "slt", %781, %c0_390 : index + %783 = subi %c-1_391, %781 : index + %784 = select %782, %783, %781 : index + %785 = divi_signed %784, %c8_389 : index + %786 = subi %c-1_391, %785 : index + %787 = select %782, %786, %785 : index + %c2_392 = constant 2 : index + %788 = remi_signed %787, %c2_392 : index + %c0_393 = constant 0 : index + %789 = cmpi "slt", %788, %c0_393 : index + %790 = addi %788, %c2_392 : index + %791 = 
select %789, %790, %788 : index + store %767, %2[%777, %c0_386, %791] : memref<16x6x2xvector<8xf32>> + %c16_394 = constant 16 : index + %c0_395 = constant 0 : index + %c-1_396 = constant -1 : index + %792 = cmpi "slt", %arg5, %c0_395 : index + %793 = subi %c-1_396, %arg5 : index + %794 = select %792, %793, %arg5 : index + %795 = divi_signed %794, %c16_394 : index + %796 = subi %c-1_396, %795 : index + %797 = select %792, %796, %795 : index + %c16_397 = constant 16 : index + %798 = remi_signed %797, %c16_397 : index + %c0_398 = constant 0 : index + %799 = cmpi "slt", %798, %c0_398 : index + %800 = addi %798, %c16_397 : index + %801 = select %799, %800, %798 : index + %c0_399 = constant 0 : index + %c16_400 = constant 16 : index + %802 = remi_signed %arg5, %c16_400 : index + %c0_401 = constant 0 : index + %803 = cmpi "slt", %802, %c0_401 : index + %804 = addi %802, %c16_400 : index + %805 = select %803, %804, %802 : index + %c8_402 = constant 8 : index + %c0_403 = constant 0 : index + %c-1_404 = constant -1 : index + %806 = cmpi "slt", %805, %c0_403 : index + %807 = subi %c-1_404, %805 : index + %808 = select %806, %807, %805 : index + %809 = divi_signed %808, %c8_402 : index + %810 = subi %c-1_404, %809 : index + %811 = select %806, %810, %809 : index + %c2_405 = constant 2 : index + %812 = remi_signed %811, %c2_405 : index + %c0_406 = constant 0 : index + %813 = cmpi "slt", %812, %c0_406 : index + %814 = addi %812, %c2_405 : index + %815 = select %813, %814, %812 : index + %816 = load %2[%801, %c0_399, %815] : memref<16x6x2xvector<8xf32>> + %817 = vector.insertelement %490, %816[%c6_i64 : i64] : vector<8xf32> + %c16_407 = constant 16 : index + %c0_408 = constant 0 : index + %c-1_409 = constant -1 : index + %818 = cmpi "slt", %arg5, %c0_408 : index + %819 = subi %c-1_409, %arg5 : index + %820 = select %818, %819, %arg5 : index + %821 = divi_signed %820, %c16_407 : index + %822 = subi %c-1_409, %821 : index + %823 = select %818, %822, %821 : index + %c16_410 = constant 16 : index + %824 = remi_signed %823, %c16_410 : index + %c0_411 = constant 0 : index + %825 = cmpi "slt", %824, %c0_411 : index + %826 = addi %824, %c16_410 : index + %827 = select %825, %826, %824 : index + %c0_412 = constant 0 : index + %c16_413 = constant 16 : index + %828 = remi_signed %arg5, %c16_413 : index + %c0_414 = constant 0 : index + %829 = cmpi "slt", %828, %c0_414 : index + %830 = addi %828, %c16_413 : index + %831 = select %829, %830, %828 : index + %c8_415 = constant 8 : index + %c0_416 = constant 0 : index + %c-1_417 = constant -1 : index + %832 = cmpi "slt", %831, %c0_416 : index + %833 = subi %c-1_417, %831 : index + %834 = select %832, %833, %831 : index + %835 = divi_signed %834, %c8_415 : index + %836 = subi %c-1_417, %835 : index + %837 = select %832, %836, %835 : index + %c2_418 = constant 2 : index + %838 = remi_signed %837, %c2_418 : index + %c0_419 = constant 0 : index + %839 = cmpi "slt", %838, %c0_419 : index + %840 = addi %838, %c2_418 : index + %841 = select %839, %840, %838 : index + store %817, %2[%827, %c0_412, %841] : memref<16x6x2xvector<8xf32>> + %c16_420 = constant 16 : index + %c0_421 = constant 0 : index + %c-1_422 = constant -1 : index + %842 = cmpi "slt", %arg5, %c0_421 : index + %843 = subi %c-1_422, %arg5 : index + %844 = select %842, %843, %arg5 : index + %845 = divi_signed %844, %c16_420 : index + %846 = subi %c-1_422, %845 : index + %847 = select %842, %846, %845 : index + %c16_423 = constant 16 : index + %848 = remi_signed %847, %c16_423 : index + %c0_424 = constant 0 : index + 
%849 = cmpi "slt", %848, %c0_424 : index + %850 = addi %848, %c16_423 : index + %851 = select %849, %850, %848 : index + %c0_425 = constant 0 : index + %c16_426 = constant 16 : index + %852 = remi_signed %arg5, %c16_426 : index + %c0_427 = constant 0 : index + %853 = cmpi "slt", %852, %c0_427 : index + %854 = addi %852, %c16_426 : index + %855 = select %853, %854, %852 : index + %c8_428 = constant 8 : index + %c0_429 = constant 0 : index + %c-1_430 = constant -1 : index + %856 = cmpi "slt", %855, %c0_429 : index + %857 = subi %c-1_430, %855 : index + %858 = select %856, %857, %855 : index + %859 = divi_signed %858, %c8_428 : index + %860 = subi %c-1_430, %859 : index + %861 = select %856, %860, %859 : index + %c2_431 = constant 2 : index + %862 = remi_signed %861, %c2_431 : index + %c0_432 = constant 0 : index + %863 = cmpi "slt", %862, %c0_432 : index + %864 = addi %862, %c2_431 : index + %865 = select %863, %864, %862 : index + %866 = load %2[%851, %c0_425, %865] : memref<16x6x2xvector<8xf32>> + %867 = vector.insertelement %491, %866[%c7_i64 : i64] : vector<8xf32> + %c16_433 = constant 16 : index + %c0_434 = constant 0 : index + %c-1_435 = constant -1 : index + %868 = cmpi "slt", %arg5, %c0_434 : index + %869 = subi %c-1_435, %arg5 : index + %870 = select %868, %869, %arg5 : index + %871 = divi_signed %870, %c16_433 : index + %872 = subi %c-1_435, %871 : index + %873 = select %868, %872, %871 : index + %c16_436 = constant 16 : index + %874 = remi_signed %873, %c16_436 : index + %c0_437 = constant 0 : index + %875 = cmpi "slt", %874, %c0_437 : index + %876 = addi %874, %c16_436 : index + %877 = select %875, %876, %874 : index + %c0_438 = constant 0 : index + %c16_439 = constant 16 : index + %878 = remi_signed %arg5, %c16_439 : index + %c0_440 = constant 0 : index + %879 = cmpi "slt", %878, %c0_440 : index + %880 = addi %878, %c16_439 : index + %881 = select %879, %880, %878 : index + %c8_441 = constant 8 : index + %c0_442 = constant 0 : index + %c-1_443 = constant -1 : index + %882 = cmpi "slt", %881, %c0_442 : index + %883 = subi %c-1_443, %881 : index + %884 = select %882, %883, %881 : index + %885 = divi_signed %884, %c8_441 : index + %886 = subi %c-1_443, %885 : index + %887 = select %882, %886, %885 : index + %c2_444 = constant 2 : index + %888 = remi_signed %887, %c2_444 : index + %c0_445 = constant 0 : index + %889 = cmpi "slt", %888, %c0_445 : index + %890 = addi %888, %c2_444 : index + %891 = select %889, %890, %888 : index + store %867, %2[%877, %c0_438, %891] : memref<16x6x2xvector<8xf32>> + %c16_446 = constant 16 : index + %c0_447 = constant 0 : index + %c-1_448 = constant -1 : index + %892 = cmpi "slt", %arg5, %c0_447 : index + %893 = subi %c-1_448, %arg5 : index + %894 = select %892, %893, %arg5 : index + %895 = divi_signed %894, %c16_446 : index + %896 = subi %c-1_448, %895 : index + %897 = select %892, %896, %895 : index + %c16_449 = constant 16 : index + %898 = remi_signed %897, %c16_449 : index + %c0_450 = constant 0 : index + %899 = cmpi "slt", %898, %c0_450 : index + %900 = addi %898, %c16_449 : index + %901 = select %899, %900, %898 : index + %c0_451 = constant 0 : index + %c16_452 = constant 16 : index + %902 = remi_signed %arg5, %c16_452 : index + %c0_453 = constant 0 : index + %903 = cmpi "slt", %902, %c0_453 : index + %904 = addi %902, %c16_452 : index + %905 = select %903, %904, %902 : index + %c8_454 = constant 8 : index + %c0_455 = constant 0 : index + %c-1_456 = constant -1 : index + %906 = cmpi "slt", %905, %c0_455 : index + %907 = subi %c-1_456, %905 : index 
+ %908 = select %906, %907, %905 : index + %909 = divi_signed %908, %c8_454 : index + %910 = subi %c-1_456, %909 : index + %911 = select %906, %910, %909 : index + %c2_457 = constant 2 : index + %912 = remi_signed %911, %c2_457 : index + %c0_458 = constant 0 : index + %913 = cmpi "slt", %912, %c0_458 : index + %914 = addi %912, %c2_457 : index + %915 = select %913, %914, %912 : index + %916 = load %2[%901, %c0_451, %915] : memref<16x6x2xvector<8xf32>> + %917 = vector.insertelement %484, %916[%c0_i64 : i64] : vector<8xf32> + %c16_459 = constant 16 : index + %c0_460 = constant 0 : index + %c-1_461 = constant -1 : index + %918 = cmpi "slt", %arg5, %c0_460 : index + %919 = subi %c-1_461, %arg5 : index + %920 = select %918, %919, %arg5 : index + %921 = divi_signed %920, %c16_459 : index + %922 = subi %c-1_461, %921 : index + %923 = select %918, %922, %921 : index + %c16_462 = constant 16 : index + %924 = remi_signed %923, %c16_462 : index + %c0_463 = constant 0 : index + %925 = cmpi "slt", %924, %c0_463 : index + %926 = addi %924, %c16_462 : index + %927 = select %925, %926, %924 : index + %c0_464 = constant 0 : index + %c16_465 = constant 16 : index + %928 = remi_signed %arg5, %c16_465 : index + %c0_466 = constant 0 : index + %929 = cmpi "slt", %928, %c0_466 : index + %930 = addi %928, %c16_465 : index + %931 = select %929, %930, %928 : index + %c8_467 = constant 8 : index + %c0_468 = constant 0 : index + %c-1_469 = constant -1 : index + %932 = cmpi "slt", %931, %c0_468 : index + %933 = subi %c-1_469, %931 : index + %934 = select %932, %933, %931 : index + %935 = divi_signed %934, %c8_467 : index + %936 = subi %c-1_469, %935 : index + %937 = select %932, %936, %935 : index + %c2_470 = constant 2 : index + %938 = remi_signed %937, %c2_470 : index + %c0_471 = constant 0 : index + %939 = cmpi "slt", %938, %c0_471 : index + %940 = addi %938, %c2_470 : index + %941 = select %939, %940, %938 : index + store %917, %2[%927, %c0_464, %941] : memref<16x6x2xvector<8xf32>> + %c16_472 = constant 16 : index + %c0_473 = constant 0 : index + %c-1_474 = constant -1 : index + %942 = cmpi "slt", %arg5, %c0_473 : index + %943 = subi %c-1_474, %arg5 : index + %944 = select %942, %943, %arg5 : index + %945 = divi_signed %944, %c16_472 : index + %946 = subi %c-1_474, %945 : index + %947 = select %942, %946, %945 : index + %c16_475 = constant 16 : index + %948 = remi_signed %947, %c16_475 : index + %c0_476 = constant 0 : index + %949 = cmpi "slt", %948, %c0_476 : index + %950 = addi %948, %c16_475 : index + %951 = select %949, %950, %948 : index + %c0_477 = constant 0 : index + %c16_478 = constant 16 : index + %952 = remi_signed %arg5, %c16_478 : index + %c0_479 = constant 0 : index + %953 = cmpi "slt", %952, %c0_479 : index + %954 = addi %952, %c16_478 : index + %955 = select %953, %954, %952 : index + %c8_480 = constant 8 : index + %c0_481 = constant 0 : index + %c-1_482 = constant -1 : index + %956 = cmpi "slt", %955, %c0_481 : index + %957 = subi %c-1_482, %955 : index + %958 = select %956, %957, %955 : index + %959 = divi_signed %958, %c8_480 : index + %960 = subi %c-1_482, %959 : index + %961 = select %956, %960, %959 : index + %c2_483 = constant 2 : index + %962 = remi_signed %961, %c2_483 : index + %c0_484 = constant 0 : index + %963 = cmpi "slt", %962, %c0_484 : index + %964 = addi %962, %c2_483 : index + %965 = select %963, %964, %962 : index + %966 = load %2[%951, %c0_477, %965] : memref<16x6x2xvector<8xf32>> + %967 = vector.insertelement %485, %966[%c1_i64 : i64] : vector<8xf32> + %c16_485 = constant 16 : 
index + %c0_486 = constant 0 : index + %c-1_487 = constant -1 : index + %968 = cmpi "slt", %arg5, %c0_486 : index + %969 = subi %c-1_487, %arg5 : index + %970 = select %968, %969, %arg5 : index + %971 = divi_signed %970, %c16_485 : index + %972 = subi %c-1_487, %971 : index + %973 = select %968, %972, %971 : index + %c16_488 = constant 16 : index + %974 = remi_signed %973, %c16_488 : index + %c0_489 = constant 0 : index + %975 = cmpi "slt", %974, %c0_489 : index + %976 = addi %974, %c16_488 : index + %977 = select %975, %976, %974 : index + %c0_490 = constant 0 : index + %c16_491 = constant 16 : index + %978 = remi_signed %arg5, %c16_491 : index + %c0_492 = constant 0 : index + %979 = cmpi "slt", %978, %c0_492 : index + %980 = addi %978, %c16_491 : index + %981 = select %979, %980, %978 : index + %c8_493 = constant 8 : index + %c0_494 = constant 0 : index + %c-1_495 = constant -1 : index + %982 = cmpi "slt", %981, %c0_494 : index + %983 = subi %c-1_495, %981 : index + %984 = select %982, %983, %981 : index + %985 = divi_signed %984, %c8_493 : index + %986 = subi %c-1_495, %985 : index + %987 = select %982, %986, %985 : index + %c2_496 = constant 2 : index + %988 = remi_signed %987, %c2_496 : index + %c0_497 = constant 0 : index + %989 = cmpi "slt", %988, %c0_497 : index + %990 = addi %988, %c2_496 : index + %991 = select %989, %990, %988 : index + store %967, %2[%977, %c0_490, %991] : memref<16x6x2xvector<8xf32>> + %c16_498 = constant 16 : index + %c0_499 = constant 0 : index + %c-1_500 = constant -1 : index + %992 = cmpi "slt", %arg5, %c0_499 : index + %993 = subi %c-1_500, %arg5 : index + %994 = select %992, %993, %arg5 : index + %995 = divi_signed %994, %c16_498 : index + %996 = subi %c-1_500, %995 : index + %997 = select %992, %996, %995 : index + %c16_501 = constant 16 : index + %998 = remi_signed %997, %c16_501 : index + %c0_502 = constant 0 : index + %999 = cmpi "slt", %998, %c0_502 : index + %1000 = addi %998, %c16_501 : index + %1001 = select %999, %1000, %998 : index + %c0_503 = constant 0 : index + %c16_504 = constant 16 : index + %1002 = remi_signed %arg5, %c16_504 : index + %c0_505 = constant 0 : index + %1003 = cmpi "slt", %1002, %c0_505 : index + %1004 = addi %1002, %c16_504 : index + %1005 = select %1003, %1004, %1002 : index + %c8_506 = constant 8 : index + %c0_507 = constant 0 : index + %c-1_508 = constant -1 : index + %1006 = cmpi "slt", %1005, %c0_507 : index + %1007 = subi %c-1_508, %1005 : index + %1008 = select %1006, %1007, %1005 : index + %1009 = divi_signed %1008, %c8_506 : index + %1010 = subi %c-1_508, %1009 : index + %1011 = select %1006, %1010, %1009 : index + %c2_509 = constant 2 : index + %1012 = remi_signed %1011, %c2_509 : index + %c0_510 = constant 0 : index + %1013 = cmpi "slt", %1012, %c0_510 : index + %1014 = addi %1012, %c2_509 : index + %1015 = select %1013, %1014, %1012 : index + %1016 = load %2[%1001, %c0_503, %1015] : memref<16x6x2xvector<8xf32>> + %1017 = vector.insertelement %486, %1016[%c2_i64 : i64] : vector<8xf32> + %c16_511 = constant 16 : index + %c0_512 = constant 0 : index + %c-1_513 = constant -1 : index + %1018 = cmpi "slt", %arg5, %c0_512 : index + %1019 = subi %c-1_513, %arg5 : index + %1020 = select %1018, %1019, %arg5 : index + %1021 = divi_signed %1020, %c16_511 : index + %1022 = subi %c-1_513, %1021 : index + %1023 = select %1018, %1022, %1021 : index + %c16_514 = constant 16 : index + %1024 = remi_signed %1023, %c16_514 : index + %c0_515 = constant 0 : index + %1025 = cmpi "slt", %1024, %c0_515 : index + %1026 = addi %1024, 
%c16_514 : index + %1027 = select %1025, %1026, %1024 : index + %c0_516 = constant 0 : index + %c16_517 = constant 16 : index + %1028 = remi_signed %arg5, %c16_517 : index + %c0_518 = constant 0 : index + %1029 = cmpi "slt", %1028, %c0_518 : index + %1030 = addi %1028, %c16_517 : index + %1031 = select %1029, %1030, %1028 : index + %c8_519 = constant 8 : index + %c0_520 = constant 0 : index + %c-1_521 = constant -1 : index + %1032 = cmpi "slt", %1031, %c0_520 : index + %1033 = subi %c-1_521, %1031 : index + %1034 = select %1032, %1033, %1031 : index + %1035 = divi_signed %1034, %c8_519 : index + %1036 = subi %c-1_521, %1035 : index + %1037 = select %1032, %1036, %1035 : index + %c2_522 = constant 2 : index + %1038 = remi_signed %1037, %c2_522 : index + %c0_523 = constant 0 : index + %1039 = cmpi "slt", %1038, %c0_523 : index + %1040 = addi %1038, %c2_522 : index + %1041 = select %1039, %1040, %1038 : index + store %1017, %2[%1027, %c0_516, %1041] : memref<16x6x2xvector<8xf32>> + %c16_524 = constant 16 : index + %c0_525 = constant 0 : index + %c-1_526 = constant -1 : index + %1042 = cmpi "slt", %arg5, %c0_525 : index + %1043 = subi %c-1_526, %arg5 : index + %1044 = select %1042, %1043, %arg5 : index + %1045 = divi_signed %1044, %c16_524 : index + %1046 = subi %c-1_526, %1045 : index + %1047 = select %1042, %1046, %1045 : index + %c16_527 = constant 16 : index + %1048 = remi_signed %1047, %c16_527 : index + %c0_528 = constant 0 : index + %1049 = cmpi "slt", %1048, %c0_528 : index + %1050 = addi %1048, %c16_527 : index + %1051 = select %1049, %1050, %1048 : index + %c0_529 = constant 0 : index + %c16_530 = constant 16 : index + %1052 = remi_signed %arg5, %c16_530 : index + %c0_531 = constant 0 : index + %1053 = cmpi "slt", %1052, %c0_531 : index + %1054 = addi %1052, %c16_530 : index + %1055 = select %1053, %1054, %1052 : index + %c8_532 = constant 8 : index + %c0_533 = constant 0 : index + %c-1_534 = constant -1 : index + %1056 = cmpi "slt", %1055, %c0_533 : index + %1057 = subi %c-1_534, %1055 : index + %1058 = select %1056, %1057, %1055 : index + %1059 = divi_signed %1058, %c8_532 : index + %1060 = subi %c-1_534, %1059 : index + %1061 = select %1056, %1060, %1059 : index + %c2_535 = constant 2 : index + %1062 = remi_signed %1061, %c2_535 : index + %c0_536 = constant 0 : index + %1063 = cmpi "slt", %1062, %c0_536 : index + %1064 = addi %1062, %c2_535 : index + %1065 = select %1063, %1064, %1062 : index + %1066 = load %2[%1051, %c0_529, %1065] : memref<16x6x2xvector<8xf32>> + %1067 = vector.insertelement %487, %1066[%c3_i64 : i64] : vector<8xf32> + %c16_537 = constant 16 : index + %c0_538 = constant 0 : index + %c-1_539 = constant -1 : index + %1068 = cmpi "slt", %arg5, %c0_538 : index + %1069 = subi %c-1_539, %arg5 : index + %1070 = select %1068, %1069, %arg5 : index + %1071 = divi_signed %1070, %c16_537 : index + %1072 = subi %c-1_539, %1071 : index + %1073 = select %1068, %1072, %1071 : index + %c16_540 = constant 16 : index + %1074 = remi_signed %1073, %c16_540 : index + %c0_541 = constant 0 : index + %1075 = cmpi "slt", %1074, %c0_541 : index + %1076 = addi %1074, %c16_540 : index + %1077 = select %1075, %1076, %1074 : index + %c0_542 = constant 0 : index + %c16_543 = constant 16 : index + %1078 = remi_signed %arg5, %c16_543 : index + %c0_544 = constant 0 : index + %1079 = cmpi "slt", %1078, %c0_544 : index + %1080 = addi %1078, %c16_543 : index + %1081 = select %1079, %1080, %1078 : index + %c8_545 = constant 8 : index + %c0_546 = constant 0 : index + %c-1_547 = constant -1 : index + 
%1082 = cmpi "slt", %1081, %c0_546 : index + %1083 = subi %c-1_547, %1081 : index + %1084 = select %1082, %1083, %1081 : index + %1085 = divi_signed %1084, %c8_545 : index + %1086 = subi %c-1_547, %1085 : index + %1087 = select %1082, %1086, %1085 : index + %c2_548 = constant 2 : index + %1088 = remi_signed %1087, %c2_548 : index + %c0_549 = constant 0 : index + %1089 = cmpi "slt", %1088, %c0_549 : index + %1090 = addi %1088, %c2_548 : index + %1091 = select %1089, %1090, %1088 : index + store %1067, %2[%1077, %c0_542, %1091] : memref<16x6x2xvector<8xf32>> + %c16_550 = constant 16 : index + %c0_551 = constant 0 : index + %c-1_552 = constant -1 : index + %1092 = cmpi "slt", %arg5, %c0_551 : index + %1093 = subi %c-1_552, %arg5 : index + %1094 = select %1092, %1093, %arg5 : index + %1095 = divi_signed %1094, %c16_550 : index + %1096 = subi %c-1_552, %1095 : index + %1097 = select %1092, %1096, %1095 : index + %c16_553 = constant 16 : index + %1098 = remi_signed %1097, %c16_553 : index + %c0_554 = constant 0 : index + %1099 = cmpi "slt", %1098, %c0_554 : index + %1100 = addi %1098, %c16_553 : index + %1101 = select %1099, %1100, %1098 : index + %c0_555 = constant 0 : index + %c16_556 = constant 16 : index + %1102 = remi_signed %arg5, %c16_556 : index + %c0_557 = constant 0 : index + %1103 = cmpi "slt", %1102, %c0_557 : index + %1104 = addi %1102, %c16_556 : index + %1105 = select %1103, %1104, %1102 : index + %c8_558 = constant 8 : index + %c0_559 = constant 0 : index + %c-1_560 = constant -1 : index + %1106 = cmpi "slt", %1105, %c0_559 : index + %1107 = subi %c-1_560, %1105 : index + %1108 = select %1106, %1107, %1105 : index + %1109 = divi_signed %1108, %c8_558 : index + %1110 = subi %c-1_560, %1109 : index + %1111 = select %1106, %1110, %1109 : index + %c2_561 = constant 2 : index + %1112 = remi_signed %1111, %c2_561 : index + %c0_562 = constant 0 : index + %1113 = cmpi "slt", %1112, %c0_562 : index + %1114 = addi %1112, %c2_561 : index + %1115 = select %1113, %1114, %1112 : index + %1116 = load %2[%1101, %c0_555, %1115] : memref<16x6x2xvector<8xf32>> + %1117 = vector.insertelement %488, %1116[%c4_i64 : i64] : vector<8xf32> + %c16_563 = constant 16 : index + %c0_564 = constant 0 : index + %c-1_565 = constant -1 : index + %1118 = cmpi "slt", %arg5, %c0_564 : index + %1119 = subi %c-1_565, %arg5 : index + %1120 = select %1118, %1119, %arg5 : index + %1121 = divi_signed %1120, %c16_563 : index + %1122 = subi %c-1_565, %1121 : index + %1123 = select %1118, %1122, %1121 : index + %c16_566 = constant 16 : index + %1124 = remi_signed %1123, %c16_566 : index + %c0_567 = constant 0 : index + %1125 = cmpi "slt", %1124, %c0_567 : index + %1126 = addi %1124, %c16_566 : index + %1127 = select %1125, %1126, %1124 : index + %c0_568 = constant 0 : index + %c16_569 = constant 16 : index + %1128 = remi_signed %arg5, %c16_569 : index + %c0_570 = constant 0 : index + %1129 = cmpi "slt", %1128, %c0_570 : index + %1130 = addi %1128, %c16_569 : index + %1131 = select %1129, %1130, %1128 : index + %c8_571 = constant 8 : index + %c0_572 = constant 0 : index + %c-1_573 = constant -1 : index + %1132 = cmpi "slt", %1131, %c0_572 : index + %1133 = subi %c-1_573, %1131 : index + %1134 = select %1132, %1133, %1131 : index + %1135 = divi_signed %1134, %c8_571 : index + %1136 = subi %c-1_573, %1135 : index + %1137 = select %1132, %1136, %1135 : index + %c2_574 = constant 2 : index + %1138 = remi_signed %1137, %c2_574 : index + %c0_575 = constant 0 : index + %1139 = cmpi "slt", %1138, %c0_575 : index + %1140 = addi %1138, 
%c2_574 : index + %1141 = select %1139, %1140, %1138 : index + store %1117, %2[%1127, %c0_568, %1141] : memref<16x6x2xvector<8xf32>> + %c16_576 = constant 16 : index + %c0_577 = constant 0 : index + %c-1_578 = constant -1 : index + %1142 = cmpi "slt", %arg5, %c0_577 : index + %1143 = subi %c-1_578, %arg5 : index + %1144 = select %1142, %1143, %arg5 : index + %1145 = divi_signed %1144, %c16_576 : index + %1146 = subi %c-1_578, %1145 : index + %1147 = select %1142, %1146, %1145 : index + %c16_579 = constant 16 : index + %1148 = remi_signed %1147, %c16_579 : index + %c0_580 = constant 0 : index + %1149 = cmpi "slt", %1148, %c0_580 : index + %1150 = addi %1148, %c16_579 : index + %1151 = select %1149, %1150, %1148 : index + %c0_581 = constant 0 : index + %c16_582 = constant 16 : index + %1152 = remi_signed %arg5, %c16_582 : index + %c0_583 = constant 0 : index + %1153 = cmpi "slt", %1152, %c0_583 : index + %1154 = addi %1152, %c16_582 : index + %1155 = select %1153, %1154, %1152 : index + %c8_584 = constant 8 : index + %c0_585 = constant 0 : index + %c-1_586 = constant -1 : index + %1156 = cmpi "slt", %1155, %c0_585 : index + %1157 = subi %c-1_586, %1155 : index + %1158 = select %1156, %1157, %1155 : index + %1159 = divi_signed %1158, %c8_584 : index + %1160 = subi %c-1_586, %1159 : index + %1161 = select %1156, %1160, %1159 : index + %c2_587 = constant 2 : index + %1162 = remi_signed %1161, %c2_587 : index + %c0_588 = constant 0 : index + %1163 = cmpi "slt", %1162, %c0_588 : index + %1164 = addi %1162, %c2_587 : index + %1165 = select %1163, %1164, %1162 : index + %1166 = load %2[%1151, %c0_581, %1165] : memref<16x6x2xvector<8xf32>> + %1167 = vector.insertelement %489, %1166[%c5_i64 : i64] : vector<8xf32> + %c16_589 = constant 16 : index + %c0_590 = constant 0 : index + %c-1_591 = constant -1 : index + %1168 = cmpi "slt", %arg5, %c0_590 : index + %1169 = subi %c-1_591, %arg5 : index + %1170 = select %1168, %1169, %arg5 : index + %1171 = divi_signed %1170, %c16_589 : index + %1172 = subi %c-1_591, %1171 : index + %1173 = select %1168, %1172, %1171 : index + %c16_592 = constant 16 : index + %1174 = remi_signed %1173, %c16_592 : index + %c0_593 = constant 0 : index + %1175 = cmpi "slt", %1174, %c0_593 : index + %1176 = addi %1174, %c16_592 : index + %1177 = select %1175, %1176, %1174 : index + %c0_594 = constant 0 : index + %c16_595 = constant 16 : index + %1178 = remi_signed %arg5, %c16_595 : index + %c0_596 = constant 0 : index + %1179 = cmpi "slt", %1178, %c0_596 : index + %1180 = addi %1178, %c16_595 : index + %1181 = select %1179, %1180, %1178 : index + %c8_597 = constant 8 : index + %c0_598 = constant 0 : index + %c-1_599 = constant -1 : index + %1182 = cmpi "slt", %1181, %c0_598 : index + %1183 = subi %c-1_599, %1181 : index + %1184 = select %1182, %1183, %1181 : index + %1185 = divi_signed %1184, %c8_597 : index + %1186 = subi %c-1_599, %1185 : index + %1187 = select %1182, %1186, %1185 : index + %c2_600 = constant 2 : index + %1188 = remi_signed %1187, %c2_600 : index + %c0_601 = constant 0 : index + %1189 = cmpi "slt", %1188, %c0_601 : index + %1190 = addi %1188, %c2_600 : index + %1191 = select %1189, %1190, %1188 : index + store %1167, %2[%1177, %c0_594, %1191] : memref<16x6x2xvector<8xf32>> + %c16_602 = constant 16 : index + %c0_603 = constant 0 : index + %c-1_604 = constant -1 : index + %1192 = cmpi "slt", %arg5, %c0_603 : index + %1193 = subi %c-1_604, %arg5 : index + %1194 = select %1192, %1193, %arg5 : index + %1195 = divi_signed %1194, %c16_602 : index + %1196 = subi %c-1_604, 
%1195 : index + %1197 = select %1192, %1196, %1195 : index + %c16_605 = constant 16 : index + %1198 = remi_signed %1197, %c16_605 : index + %c0_606 = constant 0 : index + %1199 = cmpi "slt", %1198, %c0_606 : index + %1200 = addi %1198, %c16_605 : index + %1201 = select %1199, %1200, %1198 : index + %c0_607 = constant 0 : index + %c16_608 = constant 16 : index + %1202 = remi_signed %arg5, %c16_608 : index + %c0_609 = constant 0 : index + %1203 = cmpi "slt", %1202, %c0_609 : index + %1204 = addi %1202, %c16_608 : index + %1205 = select %1203, %1204, %1202 : index + %c8_610 = constant 8 : index + %c0_611 = constant 0 : index + %c-1_612 = constant -1 : index + %1206 = cmpi "slt", %1205, %c0_611 : index + %1207 = subi %c-1_612, %1205 : index + %1208 = select %1206, %1207, %1205 : index + %1209 = divi_signed %1208, %c8_610 : index + %1210 = subi %c-1_612, %1209 : index + %1211 = select %1206, %1210, %1209 : index + %c2_613 = constant 2 : index + %1212 = remi_signed %1211, %c2_613 : index + %c0_614 = constant 0 : index + %1213 = cmpi "slt", %1212, %c0_614 : index + %1214 = addi %1212, %c2_613 : index + %1215 = select %1213, %1214, %1212 : index + %1216 = load %2[%1201, %c0_607, %1215] : memref<16x6x2xvector<8xf32>> + %1217 = vector.insertelement %490, %1216[%c6_i64 : i64] : vector<8xf32> + %c16_615 = constant 16 : index + %c0_616 = constant 0 : index + %c-1_617 = constant -1 : index + %1218 = cmpi "slt", %arg5, %c0_616 : index + %1219 = subi %c-1_617, %arg5 : index + %1220 = select %1218, %1219, %arg5 : index + %1221 = divi_signed %1220, %c16_615 : index + %1222 = subi %c-1_617, %1221 : index + %1223 = select %1218, %1222, %1221 : index + %c16_618 = constant 16 : index + %1224 = remi_signed %1223, %c16_618 : index + %c0_619 = constant 0 : index + %1225 = cmpi "slt", %1224, %c0_619 : index + %1226 = addi %1224, %c16_618 : index + %1227 = select %1225, %1226, %1224 : index + %c0_620 = constant 0 : index + %c16_621 = constant 16 : index + %1228 = remi_signed %arg5, %c16_621 : index + %c0_622 = constant 0 : index + %1229 = cmpi "slt", %1228, %c0_622 : index + %1230 = addi %1228, %c16_621 : index + %1231 = select %1229, %1230, %1228 : index + %c8_623 = constant 8 : index + %c0_624 = constant 0 : index + %c-1_625 = constant -1 : index + %1232 = cmpi "slt", %1231, %c0_624 : index + %1233 = subi %c-1_625, %1231 : index + %1234 = select %1232, %1233, %1231 : index + %1235 = divi_signed %1234, %c8_623 : index + %1236 = subi %c-1_625, %1235 : index + %1237 = select %1232, %1236, %1235 : index + %c2_626 = constant 2 : index + %1238 = remi_signed %1237, %c2_626 : index + %c0_627 = constant 0 : index + %1239 = cmpi "slt", %1238, %c0_627 : index + %1240 = addi %1238, %c2_626 : index + %1241 = select %1239, %1240, %1238 : index + store %1217, %2[%1227, %c0_620, %1241] : memref<16x6x2xvector<8xf32>> + %c16_628 = constant 16 : index + %c0_629 = constant 0 : index + %c-1_630 = constant -1 : index + %1242 = cmpi "slt", %arg5, %c0_629 : index + %1243 = subi %c-1_630, %arg5 : index + %1244 = select %1242, %1243, %arg5 : index + %1245 = divi_signed %1244, %c16_628 : index + %1246 = subi %c-1_630, %1245 : index + %1247 = select %1242, %1246, %1245 : index + %c16_631 = constant 16 : index + %1248 = remi_signed %1247, %c16_631 : index + %c0_632 = constant 0 : index + %1249 = cmpi "slt", %1248, %c0_632 : index + %1250 = addi %1248, %c16_631 : index + %1251 = select %1249, %1250, %1248 : index + %c0_633 = constant 0 : index + %c16_634 = constant 16 : index + %1252 = remi_signed %arg5, %c16_634 : index + %c0_635 = constant 0 
: index + %1253 = cmpi "slt", %1252, %c0_635 : index + %1254 = addi %1252, %c16_634 : index + %1255 = select %1253, %1254, %1252 : index + %c8_636 = constant 8 : index + %c0_637 = constant 0 : index + %c-1_638 = constant -1 : index + %1256 = cmpi "slt", %1255, %c0_637 : index + %1257 = subi %c-1_638, %1255 : index + %1258 = select %1256, %1257, %1255 : index + %1259 = divi_signed %1258, %c8_636 : index + %1260 = subi %c-1_638, %1259 : index + %1261 = select %1256, %1260, %1259 : index + %c2_639 = constant 2 : index + %1262 = remi_signed %1261, %c2_639 : index + %c0_640 = constant 0 : index + %1263 = cmpi "slt", %1262, %c0_640 : index + %1264 = addi %1262, %c2_639 : index + %1265 = select %1263, %1264, %1262 : index + %1266 = load %2[%1251, %c0_633, %1265] : memref<16x6x2xvector<8xf32>> + %1267 = vector.insertelement %491, %1266[%c7_i64 : i64] : vector<8xf32> + %c16_641 = constant 16 : index + %c0_642 = constant 0 : index + %c-1_643 = constant -1 : index + %1268 = cmpi "slt", %arg5, %c0_642 : index + %1269 = subi %c-1_643, %arg5 : index + %1270 = select %1268, %1269, %arg5 : index + %1271 = divi_signed %1270, %c16_641 : index + %1272 = subi %c-1_643, %1271 : index + %1273 = select %1268, %1272, %1271 : index + %c16_644 = constant 16 : index + %1274 = remi_signed %1273, %c16_644 : index + %c0_645 = constant 0 : index + %1275 = cmpi "slt", %1274, %c0_645 : index + %1276 = addi %1274, %c16_644 : index + %1277 = select %1275, %1276, %1274 : index + %c0_646 = constant 0 : index + %c16_647 = constant 16 : index + %1278 = remi_signed %arg5, %c16_647 : index + %c0_648 = constant 0 : index + %1279 = cmpi "slt", %1278, %c0_648 : index + %1280 = addi %1278, %c16_647 : index + %1281 = select %1279, %1280, %1278 : index + %c8_649 = constant 8 : index + %c0_650 = constant 0 : index + %c-1_651 = constant -1 : index + %1282 = cmpi "slt", %1281, %c0_650 : index + %1283 = subi %c-1_651, %1281 : index + %1284 = select %1282, %1283, %1281 : index + %1285 = divi_signed %1284, %c8_649 : index + %1286 = subi %c-1_651, %1285 : index + %1287 = select %1282, %1286, %1285 : index + %c2_652 = constant 2 : index + %1288 = remi_signed %1287, %c2_652 : index + %c0_653 = constant 0 : index + %1289 = cmpi "slt", %1288, %c0_653 : index + %1290 = addi %1288, %c2_652 : index + %1291 = select %1289, %1290, %1288 : index + store %1267, %2[%1277, %c0_646, %1291] : memref<16x6x2xvector<8xf32>> + %1292 = addi %arg6, %arg7 : index + %1293 = addi %arg6, %arg7 : index + %1294 = addi %arg6, %arg7 : index + %1295 = addi %arg6, %arg7 : index + %1296 = addi %arg6, %arg7 : index + %1297 = addi %arg6, %arg7 : index + %1298 = addi %arg6, %arg7 : index + %1299 = addi %arg6, %arg7 : index + %1300 = load %arg0[%arg4, %1292] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1301 = load %arg0[%arg4, %1293] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1302 = load %arg0[%arg4, %1294] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1303 = load %arg0[%arg4, %1295] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1304 = load %arg0[%arg4, %1296] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1305 = load %arg0[%arg4, %1297] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1306 = load %arg0[%arg4, %1298] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1307 = load %arg0[%arg4, %1299] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %c8_654 = constant 8 : index + %1308 = addi %arg5, %c8_654 : index + %c16_655 = 
constant 16 : index + %c0_656 = constant 0 : index + %c-1_657 = constant -1 : index + %1309 = cmpi "slt", %1308, %c0_656 : index + %1310 = subi %c-1_657, %1308 : index + %1311 = select %1309, %1310, %1308 : index + %1312 = divi_signed %1311, %c16_655 : index + %1313 = subi %c-1_657, %1312 : index + %1314 = select %1309, %1313, %1312 : index + %c16_658 = constant 16 : index + %1315 = remi_signed %1314, %c16_658 : index + %c0_659 = constant 0 : index + %1316 = cmpi "slt", %1315, %c0_659 : index + %1317 = addi %1315, %c16_658 : index + %1318 = select %1316, %1317, %1315 : index + %1319 = addi %arg6, %arg7 : index + %c128_660 = constant 128 : index + %1320 = remi_signed %1319, %c128_660 : index + %c0_661 = constant 0 : index + %1321 = cmpi "slt", %1320, %c0_661 : index + %1322 = addi %1320, %c128_660 : index + %1323 = select %1321, %1322, %1320 : index + %c8_662 = constant 8 : index + %c0_663 = constant 0 : index + %c-1_664 = constant -1 : index + %1324 = cmpi "slt", %arg5, %c0_663 : index + %1325 = subi %c-1_664, %arg5 : index + %1326 = select %1324, %1325, %arg5 : index + %1327 = divi_signed %1326, %c8_662 : index + %1328 = subi %c-1_664, %1327 : index + %1329 = select %1324, %1328, %1327 : index + %c8_665 = constant 8 : index + %1330 = addi %arg5, %c8_665 : index + %c16_666 = constant 16 : index + %c0_667 = constant 0 : index + %c-1_668 = constant -1 : index + %1331 = cmpi "slt", %1330, %c0_667 : index + %1332 = subi %c-1_668, %1330 : index + %1333 = select %1331, %1332, %1330 : index + %1334 = divi_signed %1333, %c16_666 : index + %1335 = subi %c-1_668, %1334 : index + %1336 = select %1331, %1335, %1334 : index + %c-2 = constant -2 : index + %1337 = muli %1336, %c-2 : index + %1338 = addi %1329, %1337 : index + %c8_669 = constant 8 : index + %c0_670 = constant 0 : index + %c-1_671 = constant -1 : index + %1339 = cmpi "slt", %arg5, %c0_670 : index + %1340 = subi %c-1_671, %arg5 : index + %1341 = select %1339, %1340, %arg5 : index + %1342 = divi_signed %1341, %c8_669 : index + %1343 = subi %c-1_671, %1342 : index + %1344 = select %1339, %1343, %1342 : index + %c8_672 = constant 8 : index + %1345 = addi %arg5, %c8_672 : index + %c16_673 = constant 16 : index + %c0_674 = constant 0 : index + %c-1_675 = constant -1 : index + %1346 = cmpi "slt", %1345, %c0_674 : index + %1347 = subi %c-1_675, %1345 : index + %1348 = select %1346, %1347, %1345 : index + %1349 = divi_signed %1348, %c16_673 : index + %1350 = subi %c-1_675, %1349 : index + %1351 = select %1346, %1350, %1349 : index + %c-2_676 = constant -2 : index + %1352 = muli %1351, %c-2_676 : index + %1353 = addi %1344, %1352 : index + %c1_677 = constant 1 : index + %1354 = addi %1353, %c1_677 : index + %c2_678 = constant 2 : index + %c0_679 = constant 0 : index + %c-1_680 = constant -1 : index + %1355 = cmpi "slt", %1354, %c0_679 : index + %1356 = subi %c-1_680, %1354 : index + %1357 = select %1355, %1356, %1354 : index + %1358 = divi_signed %1357, %c2_678 : index + %1359 = subi %c-1_680, %1358 : index + %1360 = select %1355, %1359, %1358 : index + %c-2_681 = constant -2 : index + %1361 = muli %1360, %c-2_681 : index + %1362 = addi %1338, %1361 : index + %c1_682 = constant 1 : index + %1363 = addi %1362, %c1_682 : index + %1364 = load %3[%1318, %1323, %1363] : memref<16x128x2xvector<8xf32>> + %1365 = vector.extractelement %1364[%c0_i64 : i64] : vector<8xf32> + %c8_683 = constant 8 : index + %1366 = addi %arg5, %c8_683 : index + %c16_684 = constant 16 : index + %c0_685 = constant 0 : index + %c-1_686 = constant -1 : index + %1367 = cmpi "slt", 
%1366, %c0_685 : index + %1368 = subi %c-1_686, %1366 : index + %1369 = select %1367, %1368, %1366 : index + %1370 = divi_signed %1369, %c16_684 : index + %1371 = subi %c-1_686, %1370 : index + %1372 = select %1367, %1371, %1370 : index + %c16_687 = constant 16 : index + %1373 = remi_signed %1372, %c16_687 : index + %c0_688 = constant 0 : index + %1374 = cmpi "slt", %1373, %c0_688 : index + %1375 = addi %1373, %c16_687 : index + %1376 = select %1374, %1375, %1373 : index + %1377 = addi %arg6, %arg7 : index + %c128_689 = constant 128 : index + %1378 = remi_signed %1377, %c128_689 : index + %c0_690 = constant 0 : index + %1379 = cmpi "slt", %1378, %c0_690 : index + %1380 = addi %1378, %c128_689 : index + %1381 = select %1379, %1380, %1378 : index + %c8_691 = constant 8 : index + %c0_692 = constant 0 : index + %c-1_693 = constant -1 : index + %1382 = cmpi "slt", %arg5, %c0_692 : index + %1383 = subi %c-1_693, %arg5 : index + %1384 = select %1382, %1383, %arg5 : index + %1385 = divi_signed %1384, %c8_691 : index + %1386 = subi %c-1_693, %1385 : index + %1387 = select %1382, %1386, %1385 : index + %c8_694 = constant 8 : index + %1388 = addi %arg5, %c8_694 : index + %c16_695 = constant 16 : index + %c0_696 = constant 0 : index + %c-1_697 = constant -1 : index + %1389 = cmpi "slt", %1388, %c0_696 : index + %1390 = subi %c-1_697, %1388 : index + %1391 = select %1389, %1390, %1388 : index + %1392 = divi_signed %1391, %c16_695 : index + %1393 = subi %c-1_697, %1392 : index + %1394 = select %1389, %1393, %1392 : index + %c-2_698 = constant -2 : index + %1395 = muli %1394, %c-2_698 : index + %1396 = addi %1387, %1395 : index + %c8_699 = constant 8 : index + %c0_700 = constant 0 : index + %c-1_701 = constant -1 : index + %1397 = cmpi "slt", %arg5, %c0_700 : index + %1398 = subi %c-1_701, %arg5 : index + %1399 = select %1397, %1398, %arg5 : index + %1400 = divi_signed %1399, %c8_699 : index + %1401 = subi %c-1_701, %1400 : index + %1402 = select %1397, %1401, %1400 : index + %c8_702 = constant 8 : index + %1403 = addi %arg5, %c8_702 : index + %c16_703 = constant 16 : index + %c0_704 = constant 0 : index + %c-1_705 = constant -1 : index + %1404 = cmpi "slt", %1403, %c0_704 : index + %1405 = subi %c-1_705, %1403 : index + %1406 = select %1404, %1405, %1403 : index + %1407 = divi_signed %1406, %c16_703 : index + %1408 = subi %c-1_705, %1407 : index + %1409 = select %1404, %1408, %1407 : index + %c-2_706 = constant -2 : index + %1410 = muli %1409, %c-2_706 : index + %1411 = addi %1402, %1410 : index + %c1_707 = constant 1 : index + %1412 = addi %1411, %c1_707 : index + %c2_708 = constant 2 : index + %c0_709 = constant 0 : index + %c-1_710 = constant -1 : index + %1413 = cmpi "slt", %1412, %c0_709 : index + %1414 = subi %c-1_710, %1412 : index + %1415 = select %1413, %1414, %1412 : index + %1416 = divi_signed %1415, %c2_708 : index + %1417 = subi %c-1_710, %1416 : index + %1418 = select %1413, %1417, %1416 : index + %c-2_711 = constant -2 : index + %1419 = muli %1418, %c-2_711 : index + %1420 = addi %1396, %1419 : index + %c1_712 = constant 1 : index + %1421 = addi %1420, %c1_712 : index + %1422 = load %3[%1376, %1381, %1421] : memref<16x128x2xvector<8xf32>> + %1423 = vector.extractelement %1422[%c1_i64 : i64] : vector<8xf32> + %c8_713 = constant 8 : index + %1424 = addi %arg5, %c8_713 : index + %c16_714 = constant 16 : index + %c0_715 = constant 0 : index + %c-1_716 = constant -1 : index + %1425 = cmpi "slt", %1424, %c0_715 : index + %1426 = subi %c-1_716, %1424 : index + %1427 = select %1425, %1426, %1424 
: index + %1428 = divi_signed %1427, %c16_714 : index + %1429 = subi %c-1_716, %1428 : index + %1430 = select %1425, %1429, %1428 : index + %c16_717 = constant 16 : index + %1431 = remi_signed %1430, %c16_717 : index + %c0_718 = constant 0 : index + %1432 = cmpi "slt", %1431, %c0_718 : index + %1433 = addi %1431, %c16_717 : index + %1434 = select %1432, %1433, %1431 : index + %1435 = addi %arg6, %arg7 : index + %c128_719 = constant 128 : index + %1436 = remi_signed %1435, %c128_719 : index + %c0_720 = constant 0 : index + %1437 = cmpi "slt", %1436, %c0_720 : index + %1438 = addi %1436, %c128_719 : index + %1439 = select %1437, %1438, %1436 : index + %c8_721 = constant 8 : index + %c0_722 = constant 0 : index + %c-1_723 = constant -1 : index + %1440 = cmpi "slt", %arg5, %c0_722 : index + %1441 = subi %c-1_723, %arg5 : index + %1442 = select %1440, %1441, %arg5 : index + %1443 = divi_signed %1442, %c8_721 : index + %1444 = subi %c-1_723, %1443 : index + %1445 = select %1440, %1444, %1443 : index + %c8_724 = constant 8 : index + %1446 = addi %arg5, %c8_724 : index + %c16_725 = constant 16 : index + %c0_726 = constant 0 : index + %c-1_727 = constant -1 : index + %1447 = cmpi "slt", %1446, %c0_726 : index + %1448 = subi %c-1_727, %1446 : index + %1449 = select %1447, %1448, %1446 : index + %1450 = divi_signed %1449, %c16_725 : index + %1451 = subi %c-1_727, %1450 : index + %1452 = select %1447, %1451, %1450 : index + %c-2_728 = constant -2 : index + %1453 = muli %1452, %c-2_728 : index + %1454 = addi %1445, %1453 : index + %c8_729 = constant 8 : index + %c0_730 = constant 0 : index + %c-1_731 = constant -1 : index + %1455 = cmpi "slt", %arg5, %c0_730 : index + %1456 = subi %c-1_731, %arg5 : index + %1457 = select %1455, %1456, %arg5 : index + %1458 = divi_signed %1457, %c8_729 : index + %1459 = subi %c-1_731, %1458 : index + %1460 = select %1455, %1459, %1458 : index + %c8_732 = constant 8 : index + %1461 = addi %arg5, %c8_732 : index + %c16_733 = constant 16 : index + %c0_734 = constant 0 : index + %c-1_735 = constant -1 : index + %1462 = cmpi "slt", %1461, %c0_734 : index + %1463 = subi %c-1_735, %1461 : index + %1464 = select %1462, %1463, %1461 : index + %1465 = divi_signed %1464, %c16_733 : index + %1466 = subi %c-1_735, %1465 : index + %1467 = select %1462, %1466, %1465 : index + %c-2_736 = constant -2 : index + %1468 = muli %1467, %c-2_736 : index + %1469 = addi %1460, %1468 : index + %c1_737 = constant 1 : index + %1470 = addi %1469, %c1_737 : index + %c2_738 = constant 2 : index + %c0_739 = constant 0 : index + %c-1_740 = constant -1 : index + %1471 = cmpi "slt", %1470, %c0_739 : index + %1472 = subi %c-1_740, %1470 : index + %1473 = select %1471, %1472, %1470 : index + %1474 = divi_signed %1473, %c2_738 : index + %1475 = subi %c-1_740, %1474 : index + %1476 = select %1471, %1475, %1474 : index + %c-2_741 = constant -2 : index + %1477 = muli %1476, %c-2_741 : index + %1478 = addi %1454, %1477 : index + %c1_742 = constant 1 : index + %1479 = addi %1478, %c1_742 : index + %1480 = load %3[%1434, %1439, %1479] : memref<16x128x2xvector<8xf32>> + %1481 = vector.extractelement %1480[%c2_i64 : i64] : vector<8xf32> + %c8_743 = constant 8 : index + %1482 = addi %arg5, %c8_743 : index + %c16_744 = constant 16 : index + %c0_745 = constant 0 : index + %c-1_746 = constant -1 : index + %1483 = cmpi "slt", %1482, %c0_745 : index + %1484 = subi %c-1_746, %1482 : index + %1485 = select %1483, %1484, %1482 : index + %1486 = divi_signed %1485, %c16_744 : index + %1487 = subi %c-1_746, %1486 : index + 
%1488 = select %1483, %1487, %1486 : index + %c16_747 = constant 16 : index + %1489 = remi_signed %1488, %c16_747 : index + %c0_748 = constant 0 : index + %1490 = cmpi "slt", %1489, %c0_748 : index + %1491 = addi %1489, %c16_747 : index + %1492 = select %1490, %1491, %1489 : index + %1493 = addi %arg6, %arg7 : index + %c128_749 = constant 128 : index + %1494 = remi_signed %1493, %c128_749 : index + %c0_750 = constant 0 : index + %1495 = cmpi "slt", %1494, %c0_750 : index + %1496 = addi %1494, %c128_749 : index + %1497 = select %1495, %1496, %1494 : index + %c8_751 = constant 8 : index + %c0_752 = constant 0 : index + %c-1_753 = constant -1 : index + %1498 = cmpi "slt", %arg5, %c0_752 : index + %1499 = subi %c-1_753, %arg5 : index + %1500 = select %1498, %1499, %arg5 : index + %1501 = divi_signed %1500, %c8_751 : index + %1502 = subi %c-1_753, %1501 : index + %1503 = select %1498, %1502, %1501 : index + %c8_754 = constant 8 : index + %1504 = addi %arg5, %c8_754 : index + %c16_755 = constant 16 : index + %c0_756 = constant 0 : index + %c-1_757 = constant -1 : index + %1505 = cmpi "slt", %1504, %c0_756 : index + %1506 = subi %c-1_757, %1504 : index + %1507 = select %1505, %1506, %1504 : index + %1508 = divi_signed %1507, %c16_755 : index + %1509 = subi %c-1_757, %1508 : index + %1510 = select %1505, %1509, %1508 : index + %c-2_758 = constant -2 : index + %1511 = muli %1510, %c-2_758 : index + %1512 = addi %1503, %1511 : index + %c8_759 = constant 8 : index + %c0_760 = constant 0 : index + %c-1_761 = constant -1 : index + %1513 = cmpi "slt", %arg5, %c0_760 : index + %1514 = subi %c-1_761, %arg5 : index + %1515 = select %1513, %1514, %arg5 : index + %1516 = divi_signed %1515, %c8_759 : index + %1517 = subi %c-1_761, %1516 : index + %1518 = select %1513, %1517, %1516 : index + %c8_762 = constant 8 : index + %1519 = addi %arg5, %c8_762 : index + %c16_763 = constant 16 : index + %c0_764 = constant 0 : index + %c-1_765 = constant -1 : index + %1520 = cmpi "slt", %1519, %c0_764 : index + %1521 = subi %c-1_765, %1519 : index + %1522 = select %1520, %1521, %1519 : index + %1523 = divi_signed %1522, %c16_763 : index + %1524 = subi %c-1_765, %1523 : index + %1525 = select %1520, %1524, %1523 : index + %c-2_766 = constant -2 : index + %1526 = muli %1525, %c-2_766 : index + %1527 = addi %1518, %1526 : index + %c1_767 = constant 1 : index + %1528 = addi %1527, %c1_767 : index + %c2_768 = constant 2 : index + %c0_769 = constant 0 : index + %c-1_770 = constant -1 : index + %1529 = cmpi "slt", %1528, %c0_769 : index + %1530 = subi %c-1_770, %1528 : index + %1531 = select %1529, %1530, %1528 : index + %1532 = divi_signed %1531, %c2_768 : index + %1533 = subi %c-1_770, %1532 : index + %1534 = select %1529, %1533, %1532 : index + %c-2_771 = constant -2 : index + %1535 = muli %1534, %c-2_771 : index + %1536 = addi %1512, %1535 : index + %c1_772 = constant 1 : index + %1537 = addi %1536, %c1_772 : index + %1538 = load %3[%1492, %1497, %1537] : memref<16x128x2xvector<8xf32>> + %1539 = vector.extractelement %1538[%c3_i64 : i64] : vector<8xf32> + %c8_773 = constant 8 : index + %1540 = addi %arg5, %c8_773 : index + %c16_774 = constant 16 : index + %c0_775 = constant 0 : index + %c-1_776 = constant -1 : index + %1541 = cmpi "slt", %1540, %c0_775 : index + %1542 = subi %c-1_776, %1540 : index + %1543 = select %1541, %1542, %1540 : index + %1544 = divi_signed %1543, %c16_774 : index + %1545 = subi %c-1_776, %1544 : index + %1546 = select %1541, %1545, %1544 : index + %c16_777 = constant 16 : index + %1547 = remi_signed 
%1546, %c16_777 : index + %c0_778 = constant 0 : index + %1548 = cmpi "slt", %1547, %c0_778 : index + %1549 = addi %1547, %c16_777 : index + %1550 = select %1548, %1549, %1547 : index + %1551 = addi %arg6, %arg7 : index + %c128_779 = constant 128 : index + %1552 = remi_signed %1551, %c128_779 : index + %c0_780 = constant 0 : index + %1553 = cmpi "slt", %1552, %c0_780 : index + %1554 = addi %1552, %c128_779 : index + %1555 = select %1553, %1554, %1552 : index + %c8_781 = constant 8 : index + %c0_782 = constant 0 : index + %c-1_783 = constant -1 : index + %1556 = cmpi "slt", %arg5, %c0_782 : index + %1557 = subi %c-1_783, %arg5 : index + %1558 = select %1556, %1557, %arg5 : index + %1559 = divi_signed %1558, %c8_781 : index + %1560 = subi %c-1_783, %1559 : index + %1561 = select %1556, %1560, %1559 : index + %c8_784 = constant 8 : index + %1562 = addi %arg5, %c8_784 : index + %c16_785 = constant 16 : index + %c0_786 = constant 0 : index + %c-1_787 = constant -1 : index + %1563 = cmpi "slt", %1562, %c0_786 : index + %1564 = subi %c-1_787, %1562 : index + %1565 = select %1563, %1564, %1562 : index + %1566 = divi_signed %1565, %c16_785 : index + %1567 = subi %c-1_787, %1566 : index + %1568 = select %1563, %1567, %1566 : index + %c-2_788 = constant -2 : index + %1569 = muli %1568, %c-2_788 : index + %1570 = addi %1561, %1569 : index + %c8_789 = constant 8 : index + %c0_790 = constant 0 : index + %c-1_791 = constant -1 : index + %1571 = cmpi "slt", %arg5, %c0_790 : index + %1572 = subi %c-1_791, %arg5 : index + %1573 = select %1571, %1572, %arg5 : index + %1574 = divi_signed %1573, %c8_789 : index + %1575 = subi %c-1_791, %1574 : index + %1576 = select %1571, %1575, %1574 : index + %c8_792 = constant 8 : index + %1577 = addi %arg5, %c8_792 : index + %c16_793 = constant 16 : index + %c0_794 = constant 0 : index + %c-1_795 = constant -1 : index + %1578 = cmpi "slt", %1577, %c0_794 : index + %1579 = subi %c-1_795, %1577 : index + %1580 = select %1578, %1579, %1577 : index + %1581 = divi_signed %1580, %c16_793 : index + %1582 = subi %c-1_795, %1581 : index + %1583 = select %1578, %1582, %1581 : index + %c-2_796 = constant -2 : index + %1584 = muli %1583, %c-2_796 : index + %1585 = addi %1576, %1584 : index + %c1_797 = constant 1 : index + %1586 = addi %1585, %c1_797 : index + %c2_798 = constant 2 : index + %c0_799 = constant 0 : index + %c-1_800 = constant -1 : index + %1587 = cmpi "slt", %1586, %c0_799 : index + %1588 = subi %c-1_800, %1586 : index + %1589 = select %1587, %1588, %1586 : index + %1590 = divi_signed %1589, %c2_798 : index + %1591 = subi %c-1_800, %1590 : index + %1592 = select %1587, %1591, %1590 : index + %c-2_801 = constant -2 : index + %1593 = muli %1592, %c-2_801 : index + %1594 = addi %1570, %1593 : index + %c1_802 = constant 1 : index + %1595 = addi %1594, %c1_802 : index + %1596 = load %3[%1550, %1555, %1595] : memref<16x128x2xvector<8xf32>> + %1597 = vector.extractelement %1596[%c4_i64 : i64] : vector<8xf32> + %c8_803 = constant 8 : index + %1598 = addi %arg5, %c8_803 : index + %c16_804 = constant 16 : index + %c0_805 = constant 0 : index + %c-1_806 = constant -1 : index + %1599 = cmpi "slt", %1598, %c0_805 : index + %1600 = subi %c-1_806, %1598 : index + %1601 = select %1599, %1600, %1598 : index + %1602 = divi_signed %1601, %c16_804 : index + %1603 = subi %c-1_806, %1602 : index + %1604 = select %1599, %1603, %1602 : index + %c16_807 = constant 16 : index + %1605 = remi_signed %1604, %c16_807 : index + %c0_808 = constant 0 : index + %1606 = cmpi "slt", %1605, %c0_808 : index 
+ %1607 = addi %1605, %c16_807 : index + %1608 = select %1606, %1607, %1605 : index + %1609 = addi %arg6, %arg7 : index + %c128_809 = constant 128 : index + %1610 = remi_signed %1609, %c128_809 : index + %c0_810 = constant 0 : index + %1611 = cmpi "slt", %1610, %c0_810 : index + %1612 = addi %1610, %c128_809 : index + %1613 = select %1611, %1612, %1610 : index + %c8_811 = constant 8 : index + %c0_812 = constant 0 : index + %c-1_813 = constant -1 : index + %1614 = cmpi "slt", %arg5, %c0_812 : index + %1615 = subi %c-1_813, %arg5 : index + %1616 = select %1614, %1615, %arg5 : index + %1617 = divi_signed %1616, %c8_811 : index + %1618 = subi %c-1_813, %1617 : index + %1619 = select %1614, %1618, %1617 : index + %c8_814 = constant 8 : index + %1620 = addi %arg5, %c8_814 : index + %c16_815 = constant 16 : index + %c0_816 = constant 0 : index + %c-1_817 = constant -1 : index + %1621 = cmpi "slt", %1620, %c0_816 : index + %1622 = subi %c-1_817, %1620 : index + %1623 = select %1621, %1622, %1620 : index + %1624 = divi_signed %1623, %c16_815 : index + %1625 = subi %c-1_817, %1624 : index + %1626 = select %1621, %1625, %1624 : index + %c-2_818 = constant -2 : index + %1627 = muli %1626, %c-2_818 : index + %1628 = addi %1619, %1627 : index + %c8_819 = constant 8 : index + %c0_820 = constant 0 : index + %c-1_821 = constant -1 : index + %1629 = cmpi "slt", %arg5, %c0_820 : index + %1630 = subi %c-1_821, %arg5 : index + %1631 = select %1629, %1630, %arg5 : index + %1632 = divi_signed %1631, %c8_819 : index + %1633 = subi %c-1_821, %1632 : index + %1634 = select %1629, %1633, %1632 : index + %c8_822 = constant 8 : index + %1635 = addi %arg5, %c8_822 : index + %c16_823 = constant 16 : index + %c0_824 = constant 0 : index + %c-1_825 = constant -1 : index + %1636 = cmpi "slt", %1635, %c0_824 : index + %1637 = subi %c-1_825, %1635 : index + %1638 = select %1636, %1637, %1635 : index + %1639 = divi_signed %1638, %c16_823 : index + %1640 = subi %c-1_825, %1639 : index + %1641 = select %1636, %1640, %1639 : index + %c-2_826 = constant -2 : index + %1642 = muli %1641, %c-2_826 : index + %1643 = addi %1634, %1642 : index + %c1_827 = constant 1 : index + %1644 = addi %1643, %c1_827 : index + %c2_828 = constant 2 : index + %c0_829 = constant 0 : index + %c-1_830 = constant -1 : index + %1645 = cmpi "slt", %1644, %c0_829 : index + %1646 = subi %c-1_830, %1644 : index + %1647 = select %1645, %1646, %1644 : index + %1648 = divi_signed %1647, %c2_828 : index + %1649 = subi %c-1_830, %1648 : index + %1650 = select %1645, %1649, %1648 : index + %c-2_831 = constant -2 : index + %1651 = muli %1650, %c-2_831 : index + %1652 = addi %1628, %1651 : index + %c1_832 = constant 1 : index + %1653 = addi %1652, %c1_832 : index + %1654 = load %3[%1608, %1613, %1653] : memref<16x128x2xvector<8xf32>> + %1655 = vector.extractelement %1654[%c5_i64 : i64] : vector<8xf32> + %c8_833 = constant 8 : index + %1656 = addi %arg5, %c8_833 : index + %c16_834 = constant 16 : index + %c0_835 = constant 0 : index + %c-1_836 = constant -1 : index + %1657 = cmpi "slt", %1656, %c0_835 : index + %1658 = subi %c-1_836, %1656 : index + %1659 = select %1657, %1658, %1656 : index + %1660 = divi_signed %1659, %c16_834 : index + %1661 = subi %c-1_836, %1660 : index + %1662 = select %1657, %1661, %1660 : index + %c16_837 = constant 16 : index + %1663 = remi_signed %1662, %c16_837 : index + %c0_838 = constant 0 : index + %1664 = cmpi "slt", %1663, %c0_838 : index + %1665 = addi %1663, %c16_837 : index + %1666 = select %1664, %1665, %1663 : index + %1667 = addi 
%arg6, %arg7 : index + %c128_839 = constant 128 : index + %1668 = remi_signed %1667, %c128_839 : index + %c0_840 = constant 0 : index + %1669 = cmpi "slt", %1668, %c0_840 : index + %1670 = addi %1668, %c128_839 : index + %1671 = select %1669, %1670, %1668 : index + %c8_841 = constant 8 : index + %c0_842 = constant 0 : index + %c-1_843 = constant -1 : index + %1672 = cmpi "slt", %arg5, %c0_842 : index + %1673 = subi %c-1_843, %arg5 : index + %1674 = select %1672, %1673, %arg5 : index + %1675 = divi_signed %1674, %c8_841 : index + %1676 = subi %c-1_843, %1675 : index + %1677 = select %1672, %1676, %1675 : index + %c8_844 = constant 8 : index + %1678 = addi %arg5, %c8_844 : index + %c16_845 = constant 16 : index + %c0_846 = constant 0 : index + %c-1_847 = constant -1 : index + %1679 = cmpi "slt", %1678, %c0_846 : index + %1680 = subi %c-1_847, %1678 : index + %1681 = select %1679, %1680, %1678 : index + %1682 = divi_signed %1681, %c16_845 : index + %1683 = subi %c-1_847, %1682 : index + %1684 = select %1679, %1683, %1682 : index + %c-2_848 = constant -2 : index + %1685 = muli %1684, %c-2_848 : index + %1686 = addi %1677, %1685 : index + %c8_849 = constant 8 : index + %c0_850 = constant 0 : index + %c-1_851 = constant -1 : index + %1687 = cmpi "slt", %arg5, %c0_850 : index + %1688 = subi %c-1_851, %arg5 : index + %1689 = select %1687, %1688, %arg5 : index + %1690 = divi_signed %1689, %c8_849 : index + %1691 = subi %c-1_851, %1690 : index + %1692 = select %1687, %1691, %1690 : index + %c8_852 = constant 8 : index + %1693 = addi %arg5, %c8_852 : index + %c16_853 = constant 16 : index + %c0_854 = constant 0 : index + %c-1_855 = constant -1 : index + %1694 = cmpi "slt", %1693, %c0_854 : index + %1695 = subi %c-1_855, %1693 : index + %1696 = select %1694, %1695, %1693 : index + %1697 = divi_signed %1696, %c16_853 : index + %1698 = subi %c-1_855, %1697 : index + %1699 = select %1694, %1698, %1697 : index + %c-2_856 = constant -2 : index + %1700 = muli %1699, %c-2_856 : index + %1701 = addi %1692, %1700 : index + %c1_857 = constant 1 : index + %1702 = addi %1701, %c1_857 : index + %c2_858 = constant 2 : index + %c0_859 = constant 0 : index + %c-1_860 = constant -1 : index + %1703 = cmpi "slt", %1702, %c0_859 : index + %1704 = subi %c-1_860, %1702 : index + %1705 = select %1703, %1704, %1702 : index + %1706 = divi_signed %1705, %c2_858 : index + %1707 = subi %c-1_860, %1706 : index + %1708 = select %1703, %1707, %1706 : index + %c-2_861 = constant -2 : index + %1709 = muli %1708, %c-2_861 : index + %1710 = addi %1686, %1709 : index + %c1_862 = constant 1 : index + %1711 = addi %1710, %c1_862 : index + %1712 = load %3[%1666, %1671, %1711] : memref<16x128x2xvector<8xf32>> + %1713 = vector.extractelement %1712[%c6_i64 : i64] : vector<8xf32> + %c8_863 = constant 8 : index + %1714 = addi %arg5, %c8_863 : index + %c16_864 = constant 16 : index + %c0_865 = constant 0 : index + %c-1_866 = constant -1 : index + %1715 = cmpi "slt", %1714, %c0_865 : index + %1716 = subi %c-1_866, %1714 : index + %1717 = select %1715, %1716, %1714 : index + %1718 = divi_signed %1717, %c16_864 : index + %1719 = subi %c-1_866, %1718 : index + %1720 = select %1715, %1719, %1718 : index + %c16_867 = constant 16 : index + %1721 = remi_signed %1720, %c16_867 : index + %c0_868 = constant 0 : index + %1722 = cmpi "slt", %1721, %c0_868 : index + %1723 = addi %1721, %c16_867 : index + %1724 = select %1722, %1723, %1721 : index + %1725 = addi %arg6, %arg7 : index + %c128_869 = constant 128 : index + %1726 = remi_signed %1725, %c128_869 : 
index + %c0_870 = constant 0 : index + %1727 = cmpi "slt", %1726, %c0_870 : index + %1728 = addi %1726, %c128_869 : index + %1729 = select %1727, %1728, %1726 : index + %c8_871 = constant 8 : index + %c0_872 = constant 0 : index + %c-1_873 = constant -1 : index + %1730 = cmpi "slt", %arg5, %c0_872 : index + %1731 = subi %c-1_873, %arg5 : index + %1732 = select %1730, %1731, %arg5 : index + %1733 = divi_signed %1732, %c8_871 : index + %1734 = subi %c-1_873, %1733 : index + %1735 = select %1730, %1734, %1733 : index + %c8_874 = constant 8 : index + %1736 = addi %arg5, %c8_874 : index + %c16_875 = constant 16 : index + %c0_876 = constant 0 : index + %c-1_877 = constant -1 : index + %1737 = cmpi "slt", %1736, %c0_876 : index + %1738 = subi %c-1_877, %1736 : index + %1739 = select %1737, %1738, %1736 : index + %1740 = divi_signed %1739, %c16_875 : index + %1741 = subi %c-1_877, %1740 : index + %1742 = select %1737, %1741, %1740 : index + %c-2_878 = constant -2 : index + %1743 = muli %1742, %c-2_878 : index + %1744 = addi %1735, %1743 : index + %c8_879 = constant 8 : index + %c0_880 = constant 0 : index + %c-1_881 = constant -1 : index + %1745 = cmpi "slt", %arg5, %c0_880 : index + %1746 = subi %c-1_881, %arg5 : index + %1747 = select %1745, %1746, %arg5 : index + %1748 = divi_signed %1747, %c8_879 : index + %1749 = subi %c-1_881, %1748 : index + %1750 = select %1745, %1749, %1748 : index + %c8_882 = constant 8 : index + %1751 = addi %arg5, %c8_882 : index + %c16_883 = constant 16 : index + %c0_884 = constant 0 : index + %c-1_885 = constant -1 : index + %1752 = cmpi "slt", %1751, %c0_884 : index + %1753 = subi %c-1_885, %1751 : index + %1754 = select %1752, %1753, %1751 : index + %1755 = divi_signed %1754, %c16_883 : index + %1756 = subi %c-1_885, %1755 : index + %1757 = select %1752, %1756, %1755 : index + %c-2_886 = constant -2 : index + %1758 = muli %1757, %c-2_886 : index + %1759 = addi %1750, %1758 : index + %c1_887 = constant 1 : index + %1760 = addi %1759, %c1_887 : index + %c2_888 = constant 2 : index + %c0_889 = constant 0 : index + %c-1_890 = constant -1 : index + %1761 = cmpi "slt", %1760, %c0_889 : index + %1762 = subi %c-1_890, %1760 : index + %1763 = select %1761, %1762, %1760 : index + %1764 = divi_signed %1763, %c2_888 : index + %1765 = subi %c-1_890, %1764 : index + %1766 = select %1761, %1765, %1764 : index + %c-2_891 = constant -2 : index + %1767 = muli %1766, %c-2_891 : index + %1768 = addi %1744, %1767 : index + %c1_892 = constant 1 : index + %1769 = addi %1768, %c1_892 : index + %1770 = load %3[%1724, %1729, %1769] : memref<16x128x2xvector<8xf32>> + %1771 = vector.extractelement %1770[%c7_i64 : i64] : vector<8xf32> + %1772 = "accv.bin_op"(%1300, %1365) {predicate = 2 : i64} : (f32, f32) -> f32 + %1773 = "accv.bin_op"(%1301, %1423) {predicate = 2 : i64} : (f32, f32) -> f32 + %1774 = "accv.bin_op"(%1302, %1481) {predicate = 2 : i64} : (f32, f32) -> f32 + %1775 = "accv.bin_op"(%1303, %1539) {predicate = 2 : i64} : (f32, f32) -> f32 + %1776 = "accv.bin_op"(%1304, %1597) {predicate = 2 : i64} : (f32, f32) -> f32 + %1777 = "accv.bin_op"(%1305, %1655) {predicate = 2 : i64} : (f32, f32) -> f32 + %1778 = "accv.bin_op"(%1306, %1713) {predicate = 2 : i64} : (f32, f32) -> f32 + %1779 = "accv.bin_op"(%1307, %1771) {predicate = 2 : i64} : (f32, f32) -> f32 + %c8_893 = constant 8 : index + %1780 = addi %arg5, %c8_893 : index + %c16_894 = constant 16 : index + %c0_895 = constant 0 : index + %c-1_896 = constant -1 : index + %1781 = cmpi "slt", %1780, %c0_895 : index + %1782 = subi %c-1_896, 
%1780 : index + %1783 = select %1781, %1782, %1780 : index + %1784 = divi_signed %1783, %c16_894 : index + %1785 = subi %c-1_896, %1784 : index + %1786 = select %1781, %1785, %1784 : index + %c16_897 = constant 16 : index + %1787 = remi_signed %1786, %c16_897 : index + %c0_898 = constant 0 : index + %1788 = cmpi "slt", %1787, %c0_898 : index + %1789 = addi %1787, %c16_897 : index + %1790 = select %1788, %1789, %1787 : index + %c0_899 = constant 0 : index + %c8_900 = constant 8 : index + %c0_901 = constant 0 : index + %c-1_902 = constant -1 : index + %1791 = cmpi "slt", %arg5, %c0_901 : index + %1792 = subi %c-1_902, %arg5 : index + %1793 = select %1791, %1792, %arg5 : index + %1794 = divi_signed %1793, %c8_900 : index + %1795 = subi %c-1_902, %1794 : index + %1796 = select %1791, %1795, %1794 : index + %c8_903 = constant 8 : index + %1797 = addi %arg5, %c8_903 : index + %c16_904 = constant 16 : index + %c0_905 = constant 0 : index + %c-1_906 = constant -1 : index + %1798 = cmpi "slt", %1797, %c0_905 : index + %1799 = subi %c-1_906, %1797 : index + %1800 = select %1798, %1799, %1797 : index + %1801 = divi_signed %1800, %c16_904 : index + %1802 = subi %c-1_906, %1801 : index + %1803 = select %1798, %1802, %1801 : index + %c-2_907 = constant -2 : index + %1804 = muli %1803, %c-2_907 : index + %1805 = addi %1796, %1804 : index + %c8_908 = constant 8 : index + %c0_909 = constant 0 : index + %c-1_910 = constant -1 : index + %1806 = cmpi "slt", %arg5, %c0_909 : index + %1807 = subi %c-1_910, %arg5 : index + %1808 = select %1806, %1807, %arg5 : index + %1809 = divi_signed %1808, %c8_908 : index + %1810 = subi %c-1_910, %1809 : index + %1811 = select %1806, %1810, %1809 : index + %c8_911 = constant 8 : index + %1812 = addi %arg5, %c8_911 : index + %c16_912 = constant 16 : index + %c0_913 = constant 0 : index + %c-1_914 = constant -1 : index + %1813 = cmpi "slt", %1812, %c0_913 : index + %1814 = subi %c-1_914, %1812 : index + %1815 = select %1813, %1814, %1812 : index + %1816 = divi_signed %1815, %c16_912 : index + %1817 = subi %c-1_914, %1816 : index + %1818 = select %1813, %1817, %1816 : index + %c-2_915 = constant -2 : index + %1819 = muli %1818, %c-2_915 : index + %1820 = addi %1811, %1819 : index + %c1_916 = constant 1 : index + %1821 = addi %1820, %c1_916 : index + %c2_917 = constant 2 : index + %c0_918 = constant 0 : index + %c-1_919 = constant -1 : index + %1822 = cmpi "slt", %1821, %c0_918 : index + %1823 = subi %c-1_919, %1821 : index + %1824 = select %1822, %1823, %1821 : index + %1825 = divi_signed %1824, %c2_917 : index + %1826 = subi %c-1_919, %1825 : index + %1827 = select %1822, %1826, %1825 : index + %c-2_920 = constant -2 : index + %1828 = muli %1827, %c-2_920 : index + %1829 = addi %1805, %1828 : index + %c1_921 = constant 1 : index + %1830 = addi %1829, %c1_921 : index + %1831 = load %2[%1790, %c0_899, %1830] : memref<16x6x2xvector<8xf32>> + %1832 = vector.extractelement %1831[%c0_i64 : i64] : vector<8xf32> + %c8_922 = constant 8 : index + %1833 = addi %arg5, %c8_922 : index + %c16_923 = constant 16 : index + %c0_924 = constant 0 : index + %c-1_925 = constant -1 : index + %1834 = cmpi "slt", %1833, %c0_924 : index + %1835 = subi %c-1_925, %1833 : index + %1836 = select %1834, %1835, %1833 : index + %1837 = divi_signed %1836, %c16_923 : index + %1838 = subi %c-1_925, %1837 : index + %1839 = select %1834, %1838, %1837 : index + %c16_926 = constant 16 : index + %1840 = remi_signed %1839, %c16_926 : index + %c0_927 = constant 0 : index + %1841 = cmpi "slt", %1840, %c0_927 : index + 
%1842 = addi %1840, %c16_926 : index + %1843 = select %1841, %1842, %1840 : index + %c0_928 = constant 0 : index + %c8_929 = constant 8 : index + %c0_930 = constant 0 : index + %c-1_931 = constant -1 : index + %1844 = cmpi "slt", %arg5, %c0_930 : index + %1845 = subi %c-1_931, %arg5 : index + %1846 = select %1844, %1845, %arg5 : index + %1847 = divi_signed %1846, %c8_929 : index + %1848 = subi %c-1_931, %1847 : index + %1849 = select %1844, %1848, %1847 : index + %c8_932 = constant 8 : index + %1850 = addi %arg5, %c8_932 : index + %c16_933 = constant 16 : index + %c0_934 = constant 0 : index + %c-1_935 = constant -1 : index + %1851 = cmpi "slt", %1850, %c0_934 : index + %1852 = subi %c-1_935, %1850 : index + %1853 = select %1851, %1852, %1850 : index + %1854 = divi_signed %1853, %c16_933 : index + %1855 = subi %c-1_935, %1854 : index + %1856 = select %1851, %1855, %1854 : index + %c-2_936 = constant -2 : index + %1857 = muli %1856, %c-2_936 : index + %1858 = addi %1849, %1857 : index + %c8_937 = constant 8 : index + %c0_938 = constant 0 : index + %c-1_939 = constant -1 : index + %1859 = cmpi "slt", %arg5, %c0_938 : index + %1860 = subi %c-1_939, %arg5 : index + %1861 = select %1859, %1860, %arg5 : index + %1862 = divi_signed %1861, %c8_937 : index + %1863 = subi %c-1_939, %1862 : index + %1864 = select %1859, %1863, %1862 : index + %c8_940 = constant 8 : index + %1865 = addi %arg5, %c8_940 : index + %c16_941 = constant 16 : index + %c0_942 = constant 0 : index + %c-1_943 = constant -1 : index + %1866 = cmpi "slt", %1865, %c0_942 : index + %1867 = subi %c-1_943, %1865 : index + %1868 = select %1866, %1867, %1865 : index + %1869 = divi_signed %1868, %c16_941 : index + %1870 = subi %c-1_943, %1869 : index + %1871 = select %1866, %1870, %1869 : index + %c-2_944 = constant -2 : index + %1872 = muli %1871, %c-2_944 : index + %1873 = addi %1864, %1872 : index + %c1_945 = constant 1 : index + %1874 = addi %1873, %c1_945 : index + %c2_946 = constant 2 : index + %c0_947 = constant 0 : index + %c-1_948 = constant -1 : index + %1875 = cmpi "slt", %1874, %c0_947 : index + %1876 = subi %c-1_948, %1874 : index + %1877 = select %1875, %1876, %1874 : index + %1878 = divi_signed %1877, %c2_946 : index + %1879 = subi %c-1_948, %1878 : index + %1880 = select %1875, %1879, %1878 : index + %c-2_949 = constant -2 : index + %1881 = muli %1880, %c-2_949 : index + %1882 = addi %1858, %1881 : index + %c1_950 = constant 1 : index + %1883 = addi %1882, %c1_950 : index + %1884 = load %2[%1843, %c0_928, %1883] : memref<16x6x2xvector<8xf32>> + %1885 = vector.extractelement %1884[%c1_i64 : i64] : vector<8xf32> + %c8_951 = constant 8 : index + %1886 = addi %arg5, %c8_951 : index + %c16_952 = constant 16 : index + %c0_953 = constant 0 : index + %c-1_954 = constant -1 : index + %1887 = cmpi "slt", %1886, %c0_953 : index + %1888 = subi %c-1_954, %1886 : index + %1889 = select %1887, %1888, %1886 : index + %1890 = divi_signed %1889, %c16_952 : index + %1891 = subi %c-1_954, %1890 : index + %1892 = select %1887, %1891, %1890 : index + %c16_955 = constant 16 : index + %1893 = remi_signed %1892, %c16_955 : index + %c0_956 = constant 0 : index + %1894 = cmpi "slt", %1893, %c0_956 : index + %1895 = addi %1893, %c16_955 : index + %1896 = select %1894, %1895, %1893 : index + %c0_957 = constant 0 : index + %c8_958 = constant 8 : index + %c0_959 = constant 0 : index + %c-1_960 = constant -1 : index + %1897 = cmpi "slt", %arg5, %c0_959 : index + %1898 = subi %c-1_960, %arg5 : index + %1899 = select %1897, %1898, %arg5 : index + %1900 = 
divi_signed %1899, %c8_958 : index + %1901 = subi %c-1_960, %1900 : index + %1902 = select %1897, %1901, %1900 : index + %c8_961 = constant 8 : index + %1903 = addi %arg5, %c8_961 : index + %c16_962 = constant 16 : index + %c0_963 = constant 0 : index + %c-1_964 = constant -1 : index + %1904 = cmpi "slt", %1903, %c0_963 : index + %1905 = subi %c-1_964, %1903 : index + %1906 = select %1904, %1905, %1903 : index + %1907 = divi_signed %1906, %c16_962 : index + %1908 = subi %c-1_964, %1907 : index + %1909 = select %1904, %1908, %1907 : index + %c-2_965 = constant -2 : index + %1910 = muli %1909, %c-2_965 : index + %1911 = addi %1902, %1910 : index + %c8_966 = constant 8 : index + %c0_967 = constant 0 : index + %c-1_968 = constant -1 : index + %1912 = cmpi "slt", %arg5, %c0_967 : index + %1913 = subi %c-1_968, %arg5 : index + %1914 = select %1912, %1913, %arg5 : index + %1915 = divi_signed %1914, %c8_966 : index + %1916 = subi %c-1_968, %1915 : index + %1917 = select %1912, %1916, %1915 : index + %c8_969 = constant 8 : index + %1918 = addi %arg5, %c8_969 : index + %c16_970 = constant 16 : index + %c0_971 = constant 0 : index + %c-1_972 = constant -1 : index + %1919 = cmpi "slt", %1918, %c0_971 : index + %1920 = subi %c-1_972, %1918 : index + %1921 = select %1919, %1920, %1918 : index + %1922 = divi_signed %1921, %c16_970 : index + %1923 = subi %c-1_972, %1922 : index + %1924 = select %1919, %1923, %1922 : index + %c-2_973 = constant -2 : index + %1925 = muli %1924, %c-2_973 : index + %1926 = addi %1917, %1925 : index + %c1_974 = constant 1 : index + %1927 = addi %1926, %c1_974 : index + %c2_975 = constant 2 : index + %c0_976 = constant 0 : index + %c-1_977 = constant -1 : index + %1928 = cmpi "slt", %1927, %c0_976 : index + %1929 = subi %c-1_977, %1927 : index + %1930 = select %1928, %1929, %1927 : index + %1931 = divi_signed %1930, %c2_975 : index + %1932 = subi %c-1_977, %1931 : index + %1933 = select %1928, %1932, %1931 : index + %c-2_978 = constant -2 : index + %1934 = muli %1933, %c-2_978 : index + %1935 = addi %1911, %1934 : index + %c1_979 = constant 1 : index + %1936 = addi %1935, %c1_979 : index + %1937 = load %2[%1896, %c0_957, %1936] : memref<16x6x2xvector<8xf32>> + %1938 = vector.extractelement %1937[%c2_i64 : i64] : vector<8xf32> + %c8_980 = constant 8 : index + %1939 = addi %arg5, %c8_980 : index + %c16_981 = constant 16 : index + %c0_982 = constant 0 : index + %c-1_983 = constant -1 : index + %1940 = cmpi "slt", %1939, %c0_982 : index + %1941 = subi %c-1_983, %1939 : index + %1942 = select %1940, %1941, %1939 : index + %1943 = divi_signed %1942, %c16_981 : index + %1944 = subi %c-1_983, %1943 : index + %1945 = select %1940, %1944, %1943 : index + %c16_984 = constant 16 : index + %1946 = remi_signed %1945, %c16_984 : index + %c0_985 = constant 0 : index + %1947 = cmpi "slt", %1946, %c0_985 : index + %1948 = addi %1946, %c16_984 : index + %1949 = select %1947, %1948, %1946 : index + %c0_986 = constant 0 : index + %c8_987 = constant 8 : index + %c0_988 = constant 0 : index + %c-1_989 = constant -1 : index + %1950 = cmpi "slt", %arg5, %c0_988 : index + %1951 = subi %c-1_989, %arg5 : index + %1952 = select %1950, %1951, %arg5 : index + %1953 = divi_signed %1952, %c8_987 : index + %1954 = subi %c-1_989, %1953 : index + %1955 = select %1950, %1954, %1953 : index + %c8_990 = constant 8 : index + %1956 = addi %arg5, %c8_990 : index + %c16_991 = constant 16 : index + %c0_992 = constant 0 : index + %c-1_993 = constant -1 : index + %1957 = cmpi "slt", %1956, %c0_992 : index + %1958 = subi 
%c-1_993, %1956 : index + %1959 = select %1957, %1958, %1956 : index + %1960 = divi_signed %1959, %c16_991 : index + %1961 = subi %c-1_993, %1960 : index + %1962 = select %1957, %1961, %1960 : index + %c-2_994 = constant -2 : index + %1963 = muli %1962, %c-2_994 : index + %1964 = addi %1955, %1963 : index + %c8_995 = constant 8 : index + %c0_996 = constant 0 : index + %c-1_997 = constant -1 : index + %1965 = cmpi "slt", %arg5, %c0_996 : index + %1966 = subi %c-1_997, %arg5 : index + %1967 = select %1965, %1966, %arg5 : index + %1968 = divi_signed %1967, %c8_995 : index + %1969 = subi %c-1_997, %1968 : index + %1970 = select %1965, %1969, %1968 : index + %c8_998 = constant 8 : index + %1971 = addi %arg5, %c8_998 : index + %c16_999 = constant 16 : index + %c0_1000 = constant 0 : index + %c-1_1001 = constant -1 : index + %1972 = cmpi "slt", %1971, %c0_1000 : index + %1973 = subi %c-1_1001, %1971 : index + %1974 = select %1972, %1973, %1971 : index + %1975 = divi_signed %1974, %c16_999 : index + %1976 = subi %c-1_1001, %1975 : index + %1977 = select %1972, %1976, %1975 : index + %c-2_1002 = constant -2 : index + %1978 = muli %1977, %c-2_1002 : index + %1979 = addi %1970, %1978 : index + %c1_1003 = constant 1 : index + %1980 = addi %1979, %c1_1003 : index + %c2_1004 = constant 2 : index + %c0_1005 = constant 0 : index + %c-1_1006 = constant -1 : index + %1981 = cmpi "slt", %1980, %c0_1005 : index + %1982 = subi %c-1_1006, %1980 : index + %1983 = select %1981, %1982, %1980 : index + %1984 = divi_signed %1983, %c2_1004 : index + %1985 = subi %c-1_1006, %1984 : index + %1986 = select %1981, %1985, %1984 : index + %c-2_1007 = constant -2 : index + %1987 = muli %1986, %c-2_1007 : index + %1988 = addi %1964, %1987 : index + %c1_1008 = constant 1 : index + %1989 = addi %1988, %c1_1008 : index + %1990 = load %2[%1949, %c0_986, %1989] : memref<16x6x2xvector<8xf32>> + %1991 = vector.extractelement %1990[%c3_i64 : i64] : vector<8xf32> + %c8_1009 = constant 8 : index + %1992 = addi %arg5, %c8_1009 : index + %c16_1010 = constant 16 : index + %c0_1011 = constant 0 : index + %c-1_1012 = constant -1 : index + %1993 = cmpi "slt", %1992, %c0_1011 : index + %1994 = subi %c-1_1012, %1992 : index + %1995 = select %1993, %1994, %1992 : index + %1996 = divi_signed %1995, %c16_1010 : index + %1997 = subi %c-1_1012, %1996 : index + %1998 = select %1993, %1997, %1996 : index + %c16_1013 = constant 16 : index + %1999 = remi_signed %1998, %c16_1013 : index + %c0_1014 = constant 0 : index + %2000 = cmpi "slt", %1999, %c0_1014 : index + %2001 = addi %1999, %c16_1013 : index + %2002 = select %2000, %2001, %1999 : index + %c0_1015 = constant 0 : index + %c8_1016 = constant 8 : index + %c0_1017 = constant 0 : index + %c-1_1018 = constant -1 : index + %2003 = cmpi "slt", %arg5, %c0_1017 : index + %2004 = subi %c-1_1018, %arg5 : index + %2005 = select %2003, %2004, %arg5 : index + %2006 = divi_signed %2005, %c8_1016 : index + %2007 = subi %c-1_1018, %2006 : index + %2008 = select %2003, %2007, %2006 : index + %c8_1019 = constant 8 : index + %2009 = addi %arg5, %c8_1019 : index + %c16_1020 = constant 16 : index + %c0_1021 = constant 0 : index + %c-1_1022 = constant -1 : index + %2010 = cmpi "slt", %2009, %c0_1021 : index + %2011 = subi %c-1_1022, %2009 : index + %2012 = select %2010, %2011, %2009 : index + %2013 = divi_signed %2012, %c16_1020 : index + %2014 = subi %c-1_1022, %2013 : index + %2015 = select %2010, %2014, %2013 : index + %c-2_1023 = constant -2 : index + %2016 = muli %2015, %c-2_1023 : index + %2017 = addi %2008, 
%2016 : index + %c8_1024 = constant 8 : index + %c0_1025 = constant 0 : index + %c-1_1026 = constant -1 : index + %2018 = cmpi "slt", %arg5, %c0_1025 : index + %2019 = subi %c-1_1026, %arg5 : index + %2020 = select %2018, %2019, %arg5 : index + %2021 = divi_signed %2020, %c8_1024 : index + %2022 = subi %c-1_1026, %2021 : index + %2023 = select %2018, %2022, %2021 : index + %c8_1027 = constant 8 : index + %2024 = addi %arg5, %c8_1027 : index + %c16_1028 = constant 16 : index + %c0_1029 = constant 0 : index + %c-1_1030 = constant -1 : index + %2025 = cmpi "slt", %2024, %c0_1029 : index + %2026 = subi %c-1_1030, %2024 : index + %2027 = select %2025, %2026, %2024 : index + %2028 = divi_signed %2027, %c16_1028 : index + %2029 = subi %c-1_1030, %2028 : index + %2030 = select %2025, %2029, %2028 : index + %c-2_1031 = constant -2 : index + %2031 = muli %2030, %c-2_1031 : index + %2032 = addi %2023, %2031 : index + %c1_1032 = constant 1 : index + %2033 = addi %2032, %c1_1032 : index + %c2_1033 = constant 2 : index + %c0_1034 = constant 0 : index + %c-1_1035 = constant -1 : index + %2034 = cmpi "slt", %2033, %c0_1034 : index + %2035 = subi %c-1_1035, %2033 : index + %2036 = select %2034, %2035, %2033 : index + %2037 = divi_signed %2036, %c2_1033 : index + %2038 = subi %c-1_1035, %2037 : index + %2039 = select %2034, %2038, %2037 : index + %c-2_1036 = constant -2 : index + %2040 = muli %2039, %c-2_1036 : index + %2041 = addi %2017, %2040 : index + %c1_1037 = constant 1 : index + %2042 = addi %2041, %c1_1037 : index + %2043 = load %2[%2002, %c0_1015, %2042] : memref<16x6x2xvector<8xf32>> + %2044 = vector.extractelement %2043[%c4_i64 : i64] : vector<8xf32> + %c8_1038 = constant 8 : index + %2045 = addi %arg5, %c8_1038 : index + %c16_1039 = constant 16 : index + %c0_1040 = constant 0 : index + %c-1_1041 = constant -1 : index + %2046 = cmpi "slt", %2045, %c0_1040 : index + %2047 = subi %c-1_1041, %2045 : index + %2048 = select %2046, %2047, %2045 : index + %2049 = divi_signed %2048, %c16_1039 : index + %2050 = subi %c-1_1041, %2049 : index + %2051 = select %2046, %2050, %2049 : index + %c16_1042 = constant 16 : index + %2052 = remi_signed %2051, %c16_1042 : index + %c0_1043 = constant 0 : index + %2053 = cmpi "slt", %2052, %c0_1043 : index + %2054 = addi %2052, %c16_1042 : index + %2055 = select %2053, %2054, %2052 : index + %c0_1044 = constant 0 : index + %c8_1045 = constant 8 : index + %c0_1046 = constant 0 : index + %c-1_1047 = constant -1 : index + %2056 = cmpi "slt", %arg5, %c0_1046 : index + %2057 = subi %c-1_1047, %arg5 : index + %2058 = select %2056, %2057, %arg5 : index + %2059 = divi_signed %2058, %c8_1045 : index + %2060 = subi %c-1_1047, %2059 : index + %2061 = select %2056, %2060, %2059 : index + %c8_1048 = constant 8 : index + %2062 = addi %arg5, %c8_1048 : index + %c16_1049 = constant 16 : index + %c0_1050 = constant 0 : index + %c-1_1051 = constant -1 : index + %2063 = cmpi "slt", %2062, %c0_1050 : index + %2064 = subi %c-1_1051, %2062 : index + %2065 = select %2063, %2064, %2062 : index + %2066 = divi_signed %2065, %c16_1049 : index + %2067 = subi %c-1_1051, %2066 : index + %2068 = select %2063, %2067, %2066 : index + %c-2_1052 = constant -2 : index + %2069 = muli %2068, %c-2_1052 : index + %2070 = addi %2061, %2069 : index + %c8_1053 = constant 8 : index + %c0_1054 = constant 0 : index + %c-1_1055 = constant -1 : index + %2071 = cmpi "slt", %arg5, %c0_1054 : index + %2072 = subi %c-1_1055, %arg5 : index + %2073 = select %2071, %2072, %arg5 : index + %2074 = divi_signed %2073, %c8_1053 : 
index + %2075 = subi %c-1_1055, %2074 : index + %2076 = select %2071, %2075, %2074 : index + %c8_1056 = constant 8 : index + %2077 = addi %arg5, %c8_1056 : index + %c16_1057 = constant 16 : index + %c0_1058 = constant 0 : index + %c-1_1059 = constant -1 : index + %2078 = cmpi "slt", %2077, %c0_1058 : index + %2079 = subi %c-1_1059, %2077 : index + %2080 = select %2078, %2079, %2077 : index + %2081 = divi_signed %2080, %c16_1057 : index + %2082 = subi %c-1_1059, %2081 : index + %2083 = select %2078, %2082, %2081 : index + %c-2_1060 = constant -2 : index + %2084 = muli %2083, %c-2_1060 : index + %2085 = addi %2076, %2084 : index + %c1_1061 = constant 1 : index + %2086 = addi %2085, %c1_1061 : index + %c2_1062 = constant 2 : index + %c0_1063 = constant 0 : index + %c-1_1064 = constant -1 : index + %2087 = cmpi "slt", %2086, %c0_1063 : index + %2088 = subi %c-1_1064, %2086 : index + %2089 = select %2087, %2088, %2086 : index + %2090 = divi_signed %2089, %c2_1062 : index + %2091 = subi %c-1_1064, %2090 : index + %2092 = select %2087, %2091, %2090 : index + %c-2_1065 = constant -2 : index + %2093 = muli %2092, %c-2_1065 : index + %2094 = addi %2070, %2093 : index + %c1_1066 = constant 1 : index + %2095 = addi %2094, %c1_1066 : index + %2096 = load %2[%2055, %c0_1044, %2095] : memref<16x6x2xvector<8xf32>> + %2097 = vector.extractelement %2096[%c5_i64 : i64] : vector<8xf32> + %c8_1067 = constant 8 : index + %2098 = addi %arg5, %c8_1067 : index + %c16_1068 = constant 16 : index + %c0_1069 = constant 0 : index + %c-1_1070 = constant -1 : index + %2099 = cmpi "slt", %2098, %c0_1069 : index + %2100 = subi %c-1_1070, %2098 : index + %2101 = select %2099, %2100, %2098 : index + %2102 = divi_signed %2101, %c16_1068 : index + %2103 = subi %c-1_1070, %2102 : index + %2104 = select %2099, %2103, %2102 : index + %c16_1071 = constant 16 : index + %2105 = remi_signed %2104, %c16_1071 : index + %c0_1072 = constant 0 : index + %2106 = cmpi "slt", %2105, %c0_1072 : index + %2107 = addi %2105, %c16_1071 : index + %2108 = select %2106, %2107, %2105 : index + %c0_1073 = constant 0 : index + %c8_1074 = constant 8 : index + %c0_1075 = constant 0 : index + %c-1_1076 = constant -1 : index + %2109 = cmpi "slt", %arg5, %c0_1075 : index + %2110 = subi %c-1_1076, %arg5 : index + %2111 = select %2109, %2110, %arg5 : index + %2112 = divi_signed %2111, %c8_1074 : index + %2113 = subi %c-1_1076, %2112 : index + %2114 = select %2109, %2113, %2112 : index + %c8_1077 = constant 8 : index + %2115 = addi %arg5, %c8_1077 : index + %c16_1078 = constant 16 : index + %c0_1079 = constant 0 : index + %c-1_1080 = constant -1 : index + %2116 = cmpi "slt", %2115, %c0_1079 : index + %2117 = subi %c-1_1080, %2115 : index + %2118 = select %2116, %2117, %2115 : index + %2119 = divi_signed %2118, %c16_1078 : index + %2120 = subi %c-1_1080, %2119 : index + %2121 = select %2116, %2120, %2119 : index + %c-2_1081 = constant -2 : index + %2122 = muli %2121, %c-2_1081 : index + %2123 = addi %2114, %2122 : index + %c8_1082 = constant 8 : index + %c0_1083 = constant 0 : index + %c-1_1084 = constant -1 : index + %2124 = cmpi "slt", %arg5, %c0_1083 : index + %2125 = subi %c-1_1084, %arg5 : index + %2126 = select %2124, %2125, %arg5 : index + %2127 = divi_signed %2126, %c8_1082 : index + %2128 = subi %c-1_1084, %2127 : index + %2129 = select %2124, %2128, %2127 : index + %c8_1085 = constant 8 : index + %2130 = addi %arg5, %c8_1085 : index + %c16_1086 = constant 16 : index + %c0_1087 = constant 0 : index + %c-1_1088 = constant -1 : index + %2131 = cmpi 
"slt", %2130, %c0_1087 : index + %2132 = subi %c-1_1088, %2130 : index + %2133 = select %2131, %2132, %2130 : index + %2134 = divi_signed %2133, %c16_1086 : index + %2135 = subi %c-1_1088, %2134 : index + %2136 = select %2131, %2135, %2134 : index + %c-2_1089 = constant -2 : index + %2137 = muli %2136, %c-2_1089 : index + %2138 = addi %2129, %2137 : index + %c1_1090 = constant 1 : index + %2139 = addi %2138, %c1_1090 : index + %c2_1091 = constant 2 : index + %c0_1092 = constant 0 : index + %c-1_1093 = constant -1 : index + %2140 = cmpi "slt", %2139, %c0_1092 : index + %2141 = subi %c-1_1093, %2139 : index + %2142 = select %2140, %2141, %2139 : index + %2143 = divi_signed %2142, %c2_1091 : index + %2144 = subi %c-1_1093, %2143 : index + %2145 = select %2140, %2144, %2143 : index + %c-2_1094 = constant -2 : index + %2146 = muli %2145, %c-2_1094 : index + %2147 = addi %2123, %2146 : index + %c1_1095 = constant 1 : index + %2148 = addi %2147, %c1_1095 : index + %2149 = load %2[%2108, %c0_1073, %2148] : memref<16x6x2xvector<8xf32>> + %2150 = vector.extractelement %2149[%c6_i64 : i64] : vector<8xf32> + %c8_1096 = constant 8 : index + %2151 = addi %arg5, %c8_1096 : index + %c16_1097 = constant 16 : index + %c0_1098 = constant 0 : index + %c-1_1099 = constant -1 : index + %2152 = cmpi "slt", %2151, %c0_1098 : index + %2153 = subi %c-1_1099, %2151 : index + %2154 = select %2152, %2153, %2151 : index + %2155 = divi_signed %2154, %c16_1097 : index + %2156 = subi %c-1_1099, %2155 : index + %2157 = select %2152, %2156, %2155 : index + %c16_1100 = constant 16 : index + %2158 = remi_signed %2157, %c16_1100 : index + %c0_1101 = constant 0 : index + %2159 = cmpi "slt", %2158, %c0_1101 : index + %2160 = addi %2158, %c16_1100 : index + %2161 = select %2159, %2160, %2158 : index + %c0_1102 = constant 0 : index + %c8_1103 = constant 8 : index + %c0_1104 = constant 0 : index + %c-1_1105 = constant -1 : index + %2162 = cmpi "slt", %arg5, %c0_1104 : index + %2163 = subi %c-1_1105, %arg5 : index + %2164 = select %2162, %2163, %arg5 : index + %2165 = divi_signed %2164, %c8_1103 : index + %2166 = subi %c-1_1105, %2165 : index + %2167 = select %2162, %2166, %2165 : index + %c8_1106 = constant 8 : index + %2168 = addi %arg5, %c8_1106 : index + %c16_1107 = constant 16 : index + %c0_1108 = constant 0 : index + %c-1_1109 = constant -1 : index + %2169 = cmpi "slt", %2168, %c0_1108 : index + %2170 = subi %c-1_1109, %2168 : index + %2171 = select %2169, %2170, %2168 : index + %2172 = divi_signed %2171, %c16_1107 : index + %2173 = subi %c-1_1109, %2172 : index + %2174 = select %2169, %2173, %2172 : index + %c-2_1110 = constant -2 : index + %2175 = muli %2174, %c-2_1110 : index + %2176 = addi %2167, %2175 : index + %c8_1111 = constant 8 : index + %c0_1112 = constant 0 : index + %c-1_1113 = constant -1 : index + %2177 = cmpi "slt", %arg5, %c0_1112 : index + %2178 = subi %c-1_1113, %arg5 : index + %2179 = select %2177, %2178, %arg5 : index + %2180 = divi_signed %2179, %c8_1111 : index + %2181 = subi %c-1_1113, %2180 : index + %2182 = select %2177, %2181, %2180 : index + %c8_1114 = constant 8 : index + %2183 = addi %arg5, %c8_1114 : index + %c16_1115 = constant 16 : index + %c0_1116 = constant 0 : index + %c-1_1117 = constant -1 : index + %2184 = cmpi "slt", %2183, %c0_1116 : index + %2185 = subi %c-1_1117, %2183 : index + %2186 = select %2184, %2185, %2183 : index + %2187 = divi_signed %2186, %c16_1115 : index + %2188 = subi %c-1_1117, %2187 : index + %2189 = select %2184, %2188, %2187 : index + %c-2_1118 = constant -2 : index 
+ %2190 = muli %2189, %c-2_1118 : index + %2191 = addi %2182, %2190 : index + %c1_1119 = constant 1 : index + %2192 = addi %2191, %c1_1119 : index + %c2_1120 = constant 2 : index + %c0_1121 = constant 0 : index + %c-1_1122 = constant -1 : index + %2193 = cmpi "slt", %2192, %c0_1121 : index + %2194 = subi %c-1_1122, %2192 : index + %2195 = select %2193, %2194, %2192 : index + %2196 = divi_signed %2195, %c2_1120 : index + %2197 = subi %c-1_1122, %2196 : index + %2198 = select %2193, %2197, %2196 : index + %c-2_1123 = constant -2 : index + %2199 = muli %2198, %c-2_1123 : index + %2200 = addi %2176, %2199 : index + %c1_1124 = constant 1 : index + %2201 = addi %2200, %c1_1124 : index + %2202 = load %2[%2161, %c0_1102, %2201] : memref<16x6x2xvector<8xf32>> + %2203 = vector.extractelement %2202[%c7_i64 : i64] : vector<8xf32> + %2204 = "accv.bin_op"(%1832, %1772) {predicate = 0 : i64} : (f32, f32) -> f32 + %2205 = "accv.bin_op"(%1885, %1773) {predicate = 0 : i64} : (f32, f32) -> f32 + %2206 = "accv.bin_op"(%1938, %1774) {predicate = 0 : i64} : (f32, f32) -> f32 + %2207 = "accv.bin_op"(%1991, %1775) {predicate = 0 : i64} : (f32, f32) -> f32 + %2208 = "accv.bin_op"(%2044, %1776) {predicate = 0 : i64} : (f32, f32) -> f32 + %2209 = "accv.bin_op"(%2097, %1777) {predicate = 0 : i64} : (f32, f32) -> f32 + %2210 = "accv.bin_op"(%2150, %1778) {predicate = 0 : i64} : (f32, f32) -> f32 + %2211 = "accv.bin_op"(%2203, %1779) {predicate = 0 : i64} : (f32, f32) -> f32 + %c8_1125 = constant 8 : index + %2212 = addi %arg5, %c8_1125 : index + %c16_1126 = constant 16 : index + %c0_1127 = constant 0 : index + %c-1_1128 = constant -1 : index + %2213 = cmpi "slt", %2212, %c0_1127 : index + %2214 = subi %c-1_1128, %2212 : index + %2215 = select %2213, %2214, %2212 : index + %2216 = divi_signed %2215, %c16_1126 : index + %2217 = subi %c-1_1128, %2216 : index + %2218 = select %2213, %2217, %2216 : index + %c16_1129 = constant 16 : index + %2219 = remi_signed %2218, %c16_1129 : index + %c0_1130 = constant 0 : index + %2220 = cmpi "slt", %2219, %c0_1130 : index + %2221 = addi %2219, %c16_1129 : index + %2222 = select %2220, %2221, %2219 : index + %c0_1131 = constant 0 : index + %c8_1132 = constant 8 : index + %c0_1133 = constant 0 : index + %c-1_1134 = constant -1 : index + %2223 = cmpi "slt", %arg5, %c0_1133 : index + %2224 = subi %c-1_1134, %arg5 : index + %2225 = select %2223, %2224, %arg5 : index + %2226 = divi_signed %2225, %c8_1132 : index + %2227 = subi %c-1_1134, %2226 : index + %2228 = select %2223, %2227, %2226 : index + %c8_1135 = constant 8 : index + %2229 = addi %arg5, %c8_1135 : index + %c16_1136 = constant 16 : index + %c0_1137 = constant 0 : index + %c-1_1138 = constant -1 : index + %2230 = cmpi "slt", %2229, %c0_1137 : index + %2231 = subi %c-1_1138, %2229 : index + %2232 = select %2230, %2231, %2229 : index + %2233 = divi_signed %2232, %c16_1136 : index + %2234 = subi %c-1_1138, %2233 : index + %2235 = select %2230, %2234, %2233 : index + %c-2_1139 = constant -2 : index + %2236 = muli %2235, %c-2_1139 : index + %2237 = addi %2228, %2236 : index + %c8_1140 = constant 8 : index + %c0_1141 = constant 0 : index + %c-1_1142 = constant -1 : index + %2238 = cmpi "slt", %arg5, %c0_1141 : index + %2239 = subi %c-1_1142, %arg5 : index + %2240 = select %2238, %2239, %arg5 : index + %2241 = divi_signed %2240, %c8_1140 : index + %2242 = subi %c-1_1142, %2241 : index + %2243 = select %2238, %2242, %2241 : index + %c8_1143 = constant 8 : index + %2244 = addi %arg5, %c8_1143 : index + %c16_1144 = constant 16 : index + 
%c0_1145 = constant 0 : index + %c-1_1146 = constant -1 : index + %2245 = cmpi "slt", %2244, %c0_1145 : index + %2246 = subi %c-1_1146, %2244 : index + %2247 = select %2245, %2246, %2244 : index + %2248 = divi_signed %2247, %c16_1144 : index + %2249 = subi %c-1_1146, %2248 : index + %2250 = select %2245, %2249, %2248 : index + %c-2_1147 = constant -2 : index + %2251 = muli %2250, %c-2_1147 : index + %2252 = addi %2243, %2251 : index + %c1_1148 = constant 1 : index + %2253 = addi %2252, %c1_1148 : index + %c2_1149 = constant 2 : index + %c0_1150 = constant 0 : index + %c-1_1151 = constant -1 : index + %2254 = cmpi "slt", %2253, %c0_1150 : index + %2255 = subi %c-1_1151, %2253 : index + %2256 = select %2254, %2255, %2253 : index + %2257 = divi_signed %2256, %c2_1149 : index + %2258 = subi %c-1_1151, %2257 : index + %2259 = select %2254, %2258, %2257 : index + %c-2_1152 = constant -2 : index + %2260 = muli %2259, %c-2_1152 : index + %2261 = addi %2237, %2260 : index + %c1_1153 = constant 1 : index + %2262 = addi %2261, %c1_1153 : index + %2263 = load %2[%2222, %c0_1131, %2262] : memref<16x6x2xvector<8xf32>> + %2264 = vector.insertelement %2204, %2263[%c0_i64 : i64] : vector<8xf32> + %c8_1154 = constant 8 : index + %2265 = addi %arg5, %c8_1154 : index + %c16_1155 = constant 16 : index + %c0_1156 = constant 0 : index + %c-1_1157 = constant -1 : index + %2266 = cmpi "slt", %2265, %c0_1156 : index + %2267 = subi %c-1_1157, %2265 : index + %2268 = select %2266, %2267, %2265 : index + %2269 = divi_signed %2268, %c16_1155 : index + %2270 = subi %c-1_1157, %2269 : index + %2271 = select %2266, %2270, %2269 : index + %c16_1158 = constant 16 : index + %2272 = remi_signed %2271, %c16_1158 : index + %c0_1159 = constant 0 : index + %2273 = cmpi "slt", %2272, %c0_1159 : index + %2274 = addi %2272, %c16_1158 : index + %2275 = select %2273, %2274, %2272 : index + %c0_1160 = constant 0 : index + %c8_1161 = constant 8 : index + %c0_1162 = constant 0 : index + %c-1_1163 = constant -1 : index + %2276 = cmpi "slt", %arg5, %c0_1162 : index + %2277 = subi %c-1_1163, %arg5 : index + %2278 = select %2276, %2277, %arg5 : index + %2279 = divi_signed %2278, %c8_1161 : index + %2280 = subi %c-1_1163, %2279 : index + %2281 = select %2276, %2280, %2279 : index + %c8_1164 = constant 8 : index + %2282 = addi %arg5, %c8_1164 : index + %c16_1165 = constant 16 : index + %c0_1166 = constant 0 : index + %c-1_1167 = constant -1 : index + %2283 = cmpi "slt", %2282, %c0_1166 : index + %2284 = subi %c-1_1167, %2282 : index + %2285 = select %2283, %2284, %2282 : index + %2286 = divi_signed %2285, %c16_1165 : index + %2287 = subi %c-1_1167, %2286 : index + %2288 = select %2283, %2287, %2286 : index + %c-2_1168 = constant -2 : index + %2289 = muli %2288, %c-2_1168 : index + %2290 = addi %2281, %2289 : index + %c8_1169 = constant 8 : index + %c0_1170 = constant 0 : index + %c-1_1171 = constant -1 : index + %2291 = cmpi "slt", %arg5, %c0_1170 : index + %2292 = subi %c-1_1171, %arg5 : index + %2293 = select %2291, %2292, %arg5 : index + %2294 = divi_signed %2293, %c8_1169 : index + %2295 = subi %c-1_1171, %2294 : index + %2296 = select %2291, %2295, %2294 : index + %c8_1172 = constant 8 : index + %2297 = addi %arg5, %c8_1172 : index + %c16_1173 = constant 16 : index + %c0_1174 = constant 0 : index + %c-1_1175 = constant -1 : index + %2298 = cmpi "slt", %2297, %c0_1174 : index + %2299 = subi %c-1_1175, %2297 : index + %2300 = select %2298, %2299, %2297 : index + %2301 = divi_signed %2300, %c16_1173 : index + %2302 = subi %c-1_1175, %2301 : 
index + %2303 = select %2298, %2302, %2301 : index + %c-2_1176 = constant -2 : index + %2304 = muli %2303, %c-2_1176 : index + %2305 = addi %2296, %2304 : index + %c1_1177 = constant 1 : index + %2306 = addi %2305, %c1_1177 : index + %c2_1178 = constant 2 : index + %c0_1179 = constant 0 : index + %c-1_1180 = constant -1 : index + %2307 = cmpi "slt", %2306, %c0_1179 : index + %2308 = subi %c-1_1180, %2306 : index + %2309 = select %2307, %2308, %2306 : index + %2310 = divi_signed %2309, %c2_1178 : index + %2311 = subi %c-1_1180, %2310 : index + %2312 = select %2307, %2311, %2310 : index + %c-2_1181 = constant -2 : index + %2313 = muli %2312, %c-2_1181 : index + %2314 = addi %2290, %2313 : index + %c1_1182 = constant 1 : index + %2315 = addi %2314, %c1_1182 : index + store %2264, %2[%2275, %c0_1160, %2315] : memref<16x6x2xvector<8xf32>> + %c8_1183 = constant 8 : index + %2316 = addi %arg5, %c8_1183 : index + %c16_1184 = constant 16 : index + %c0_1185 = constant 0 : index + %c-1_1186 = constant -1 : index + %2317 = cmpi "slt", %2316, %c0_1185 : index + %2318 = subi %c-1_1186, %2316 : index + %2319 = select %2317, %2318, %2316 : index + %2320 = divi_signed %2319, %c16_1184 : index + %2321 = subi %c-1_1186, %2320 : index + %2322 = select %2317, %2321, %2320 : index + %c16_1187 = constant 16 : index + %2323 = remi_signed %2322, %c16_1187 : index + %c0_1188 = constant 0 : index + %2324 = cmpi "slt", %2323, %c0_1188 : index + %2325 = addi %2323, %c16_1187 : index + %2326 = select %2324, %2325, %2323 : index + %c0_1189 = constant 0 : index + %c8_1190 = constant 8 : index + %c0_1191 = constant 0 : index + %c-1_1192 = constant -1 : index + %2327 = cmpi "slt", %arg5, %c0_1191 : index + %2328 = subi %c-1_1192, %arg5 : index + %2329 = select %2327, %2328, %arg5 : index + %2330 = divi_signed %2329, %c8_1190 : index + %2331 = subi %c-1_1192, %2330 : index + %2332 = select %2327, %2331, %2330 : index + %c8_1193 = constant 8 : index + %2333 = addi %arg5, %c8_1193 : index + %c16_1194 = constant 16 : index + %c0_1195 = constant 0 : index + %c-1_1196 = constant -1 : index + %2334 = cmpi "slt", %2333, %c0_1195 : index + %2335 = subi %c-1_1196, %2333 : index + %2336 = select %2334, %2335, %2333 : index + %2337 = divi_signed %2336, %c16_1194 : index + %2338 = subi %c-1_1196, %2337 : index + %2339 = select %2334, %2338, %2337 : index + %c-2_1197 = constant -2 : index + %2340 = muli %2339, %c-2_1197 : index + %2341 = addi %2332, %2340 : index + %c8_1198 = constant 8 : index + %c0_1199 = constant 0 : index + %c-1_1200 = constant -1 : index + %2342 = cmpi "slt", %arg5, %c0_1199 : index + %2343 = subi %c-1_1200, %arg5 : index + %2344 = select %2342, %2343, %arg5 : index + %2345 = divi_signed %2344, %c8_1198 : index + %2346 = subi %c-1_1200, %2345 : index + %2347 = select %2342, %2346, %2345 : index + %c8_1201 = constant 8 : index + %2348 = addi %arg5, %c8_1201 : index + %c16_1202 = constant 16 : index + %c0_1203 = constant 0 : index + %c-1_1204 = constant -1 : index + %2349 = cmpi "slt", %2348, %c0_1203 : index + %2350 = subi %c-1_1204, %2348 : index + %2351 = select %2349, %2350, %2348 : index + %2352 = divi_signed %2351, %c16_1202 : index + %2353 = subi %c-1_1204, %2352 : index + %2354 = select %2349, %2353, %2352 : index + %c-2_1205 = constant -2 : index + %2355 = muli %2354, %c-2_1205 : index + %2356 = addi %2347, %2355 : index + %c1_1206 = constant 1 : index + %2357 = addi %2356, %c1_1206 : index + %c2_1207 = constant 2 : index + %c0_1208 = constant 0 : index + %c-1_1209 = constant -1 : index + %2358 = cmpi "slt", 
%2357, %c0_1208 : index + %2359 = subi %c-1_1209, %2357 : index + %2360 = select %2358, %2359, %2357 : index + %2361 = divi_signed %2360, %c2_1207 : index + %2362 = subi %c-1_1209, %2361 : index + %2363 = select %2358, %2362, %2361 : index + %c-2_1210 = constant -2 : index + %2364 = muli %2363, %c-2_1210 : index + %2365 = addi %2341, %2364 : index + %c1_1211 = constant 1 : index + %2366 = addi %2365, %c1_1211 : index + %2367 = load %2[%2326, %c0_1189, %2366] : memref<16x6x2xvector<8xf32>> + %2368 = vector.insertelement %2205, %2367[%c1_i64 : i64] : vector<8xf32> + %c8_1212 = constant 8 : index + %2369 = addi %arg5, %c8_1212 : index + %c16_1213 = constant 16 : index + %c0_1214 = constant 0 : index + %c-1_1215 = constant -1 : index + %2370 = cmpi "slt", %2369, %c0_1214 : index + %2371 = subi %c-1_1215, %2369 : index + %2372 = select %2370, %2371, %2369 : index + %2373 = divi_signed %2372, %c16_1213 : index + %2374 = subi %c-1_1215, %2373 : index + %2375 = select %2370, %2374, %2373 : index + %c16_1216 = constant 16 : index + %2376 = remi_signed %2375, %c16_1216 : index + %c0_1217 = constant 0 : index + %2377 = cmpi "slt", %2376, %c0_1217 : index + %2378 = addi %2376, %c16_1216 : index + %2379 = select %2377, %2378, %2376 : index + %c0_1218 = constant 0 : index + %c8_1219 = constant 8 : index + %c0_1220 = constant 0 : index + %c-1_1221 = constant -1 : index + %2380 = cmpi "slt", %arg5, %c0_1220 : index + %2381 = subi %c-1_1221, %arg5 : index + %2382 = select %2380, %2381, %arg5 : index + %2383 = divi_signed %2382, %c8_1219 : index + %2384 = subi %c-1_1221, %2383 : index + %2385 = select %2380, %2384, %2383 : index + %c8_1222 = constant 8 : index + %2386 = addi %arg5, %c8_1222 : index + %c16_1223 = constant 16 : index + %c0_1224 = constant 0 : index + %c-1_1225 = constant -1 : index + %2387 = cmpi "slt", %2386, %c0_1224 : index + %2388 = subi %c-1_1225, %2386 : index + %2389 = select %2387, %2388, %2386 : index + %2390 = divi_signed %2389, %c16_1223 : index + %2391 = subi %c-1_1225, %2390 : index + %2392 = select %2387, %2391, %2390 : index + %c-2_1226 = constant -2 : index + %2393 = muli %2392, %c-2_1226 : index + %2394 = addi %2385, %2393 : index + %c8_1227 = constant 8 : index + %c0_1228 = constant 0 : index + %c-1_1229 = constant -1 : index + %2395 = cmpi "slt", %arg5, %c0_1228 : index + %2396 = subi %c-1_1229, %arg5 : index + %2397 = select %2395, %2396, %arg5 : index + %2398 = divi_signed %2397, %c8_1227 : index + %2399 = subi %c-1_1229, %2398 : index + %2400 = select %2395, %2399, %2398 : index + %c8_1230 = constant 8 : index + %2401 = addi %arg5, %c8_1230 : index + %c16_1231 = constant 16 : index + %c0_1232 = constant 0 : index + %c-1_1233 = constant -1 : index + %2402 = cmpi "slt", %2401, %c0_1232 : index + %2403 = subi %c-1_1233, %2401 : index + %2404 = select %2402, %2403, %2401 : index + %2405 = divi_signed %2404, %c16_1231 : index + %2406 = subi %c-1_1233, %2405 : index + %2407 = select %2402, %2406, %2405 : index + %c-2_1234 = constant -2 : index + %2408 = muli %2407, %c-2_1234 : index + %2409 = addi %2400, %2408 : index + %c1_1235 = constant 1 : index + %2410 = addi %2409, %c1_1235 : index + %c2_1236 = constant 2 : index + %c0_1237 = constant 0 : index + %c-1_1238 = constant -1 : index + %2411 = cmpi "slt", %2410, %c0_1237 : index + %2412 = subi %c-1_1238, %2410 : index + %2413 = select %2411, %2412, %2410 : index + %2414 = divi_signed %2413, %c2_1236 : index + %2415 = subi %c-1_1238, %2414 : index + %2416 = select %2411, %2415, %2414 : index + %c-2_1239 = constant -2 : index + 
%2417 = muli %2416, %c-2_1239 : index + %2418 = addi %2394, %2417 : index + %c1_1240 = constant 1 : index + %2419 = addi %2418, %c1_1240 : index + store %2368, %2[%2379, %c0_1218, %2419] : memref<16x6x2xvector<8xf32>> + %c8_1241 = constant 8 : index + %2420 = addi %arg5, %c8_1241 : index + %c16_1242 = constant 16 : index + %c0_1243 = constant 0 : index + %c-1_1244 = constant -1 : index + %2421 = cmpi "slt", %2420, %c0_1243 : index + %2422 = subi %c-1_1244, %2420 : index + %2423 = select %2421, %2422, %2420 : index + %2424 = divi_signed %2423, %c16_1242 : index + %2425 = subi %c-1_1244, %2424 : index + %2426 = select %2421, %2425, %2424 : index + %c16_1245 = constant 16 : index + %2427 = remi_signed %2426, %c16_1245 : index + %c0_1246 = constant 0 : index + %2428 = cmpi "slt", %2427, %c0_1246 : index + %2429 = addi %2427, %c16_1245 : index + %2430 = select %2428, %2429, %2427 : index + %c0_1247 = constant 0 : index + %c8_1248 = constant 8 : index + %c0_1249 = constant 0 : index + %c-1_1250 = constant -1 : index + %2431 = cmpi "slt", %arg5, %c0_1249 : index + %2432 = subi %c-1_1250, %arg5 : index + %2433 = select %2431, %2432, %arg5 : index + %2434 = divi_signed %2433, %c8_1248 : index + %2435 = subi %c-1_1250, %2434 : index + %2436 = select %2431, %2435, %2434 : index + %c8_1251 = constant 8 : index + %2437 = addi %arg5, %c8_1251 : index + %c16_1252 = constant 16 : index + %c0_1253 = constant 0 : index + %c-1_1254 = constant -1 : index + %2438 = cmpi "slt", %2437, %c0_1253 : index + %2439 = subi %c-1_1254, %2437 : index + %2440 = select %2438, %2439, %2437 : index + %2441 = divi_signed %2440, %c16_1252 : index + %2442 = subi %c-1_1254, %2441 : index + %2443 = select %2438, %2442, %2441 : index + %c-2_1255 = constant -2 : index + %2444 = muli %2443, %c-2_1255 : index + %2445 = addi %2436, %2444 : index + %c8_1256 = constant 8 : index + %c0_1257 = constant 0 : index + %c-1_1258 = constant -1 : index + %2446 = cmpi "slt", %arg5, %c0_1257 : index + %2447 = subi %c-1_1258, %arg5 : index + %2448 = select %2446, %2447, %arg5 : index + %2449 = divi_signed %2448, %c8_1256 : index + %2450 = subi %c-1_1258, %2449 : index + %2451 = select %2446, %2450, %2449 : index + %c8_1259 = constant 8 : index + %2452 = addi %arg5, %c8_1259 : index + %c16_1260 = constant 16 : index + %c0_1261 = constant 0 : index + %c-1_1262 = constant -1 : index + %2453 = cmpi "slt", %2452, %c0_1261 : index + %2454 = subi %c-1_1262, %2452 : index + %2455 = select %2453, %2454, %2452 : index + %2456 = divi_signed %2455, %c16_1260 : index + %2457 = subi %c-1_1262, %2456 : index + %2458 = select %2453, %2457, %2456 : index + %c-2_1263 = constant -2 : index + %2459 = muli %2458, %c-2_1263 : index + %2460 = addi %2451, %2459 : index + %c1_1264 = constant 1 : index + %2461 = addi %2460, %c1_1264 : index + %c2_1265 = constant 2 : index + %c0_1266 = constant 0 : index + %c-1_1267 = constant -1 : index + %2462 = cmpi "slt", %2461, %c0_1266 : index + %2463 = subi %c-1_1267, %2461 : index + %2464 = select %2462, %2463, %2461 : index + %2465 = divi_signed %2464, %c2_1265 : index + %2466 = subi %c-1_1267, %2465 : index + %2467 = select %2462, %2466, %2465 : index + %c-2_1268 = constant -2 : index + %2468 = muli %2467, %c-2_1268 : index + %2469 = addi %2445, %2468 : index + %c1_1269 = constant 1 : index + %2470 = addi %2469, %c1_1269 : index + %2471 = load %2[%2430, %c0_1247, %2470] : memref<16x6x2xvector<8xf32>> + %2472 = vector.insertelement %2206, %2471[%c2_i64 : i64] : vector<8xf32> + %c8_1270 = constant 8 : index + %2473 = addi %arg5, 
%c8_1270 : index + %c16_1271 = constant 16 : index + %c0_1272 = constant 0 : index + %c-1_1273 = constant -1 : index + %2474 = cmpi "slt", %2473, %c0_1272 : index + %2475 = subi %c-1_1273, %2473 : index + %2476 = select %2474, %2475, %2473 : index + %2477 = divi_signed %2476, %c16_1271 : index + %2478 = subi %c-1_1273, %2477 : index + %2479 = select %2474, %2478, %2477 : index + %c16_1274 = constant 16 : index + %2480 = remi_signed %2479, %c16_1274 : index + %c0_1275 = constant 0 : index + %2481 = cmpi "slt", %2480, %c0_1275 : index + %2482 = addi %2480, %c16_1274 : index + %2483 = select %2481, %2482, %2480 : index + %c0_1276 = constant 0 : index + %c8_1277 = constant 8 : index + %c0_1278 = constant 0 : index + %c-1_1279 = constant -1 : index + %2484 = cmpi "slt", %arg5, %c0_1278 : index + %2485 = subi %c-1_1279, %arg5 : index + %2486 = select %2484, %2485, %arg5 : index + %2487 = divi_signed %2486, %c8_1277 : index + %2488 = subi %c-1_1279, %2487 : index + %2489 = select %2484, %2488, %2487 : index + %c8_1280 = constant 8 : index + %2490 = addi %arg5, %c8_1280 : index + %c16_1281 = constant 16 : index + %c0_1282 = constant 0 : index + %c-1_1283 = constant -1 : index + %2491 = cmpi "slt", %2490, %c0_1282 : index + %2492 = subi %c-1_1283, %2490 : index + %2493 = select %2491, %2492, %2490 : index + %2494 = divi_signed %2493, %c16_1281 : index + %2495 = subi %c-1_1283, %2494 : index + %2496 = select %2491, %2495, %2494 : index + %c-2_1284 = constant -2 : index + %2497 = muli %2496, %c-2_1284 : index + %2498 = addi %2489, %2497 : index + %c8_1285 = constant 8 : index + %c0_1286 = constant 0 : index + %c-1_1287 = constant -1 : index + %2499 = cmpi "slt", %arg5, %c0_1286 : index + %2500 = subi %c-1_1287, %arg5 : index + %2501 = select %2499, %2500, %arg5 : index + %2502 = divi_signed %2501, %c8_1285 : index + %2503 = subi %c-1_1287, %2502 : index + %2504 = select %2499, %2503, %2502 : index + %c8_1288 = constant 8 : index + %2505 = addi %arg5, %c8_1288 : index + %c16_1289 = constant 16 : index + %c0_1290 = constant 0 : index + %c-1_1291 = constant -1 : index + %2506 = cmpi "slt", %2505, %c0_1290 : index + %2507 = subi %c-1_1291, %2505 : index + %2508 = select %2506, %2507, %2505 : index + %2509 = divi_signed %2508, %c16_1289 : index + %2510 = subi %c-1_1291, %2509 : index + %2511 = select %2506, %2510, %2509 : index + %c-2_1292 = constant -2 : index + %2512 = muli %2511, %c-2_1292 : index + %2513 = addi %2504, %2512 : index + %c1_1293 = constant 1 : index + %2514 = addi %2513, %c1_1293 : index + %c2_1294 = constant 2 : index + %c0_1295 = constant 0 : index + %c-1_1296 = constant -1 : index + %2515 = cmpi "slt", %2514, %c0_1295 : index + %2516 = subi %c-1_1296, %2514 : index + %2517 = select %2515, %2516, %2514 : index + %2518 = divi_signed %2517, %c2_1294 : index + %2519 = subi %c-1_1296, %2518 : index + %2520 = select %2515, %2519, %2518 : index + %c-2_1297 = constant -2 : index + %2521 = muli %2520, %c-2_1297 : index + %2522 = addi %2498, %2521 : index + %c1_1298 = constant 1 : index + %2523 = addi %2522, %c1_1298 : index + store %2472, %2[%2483, %c0_1276, %2523] : memref<16x6x2xvector<8xf32>> + %c8_1299 = constant 8 : index + %2524 = addi %arg5, %c8_1299 : index + %c16_1300 = constant 16 : index + %c0_1301 = constant 0 : index + %c-1_1302 = constant -1 : index + %2525 = cmpi "slt", %2524, %c0_1301 : index + %2526 = subi %c-1_1302, %2524 : index + %2527 = select %2525, %2526, %2524 : index + %2528 = divi_signed %2527, %c16_1300 : index + %2529 = subi %c-1_1302, %2528 : index + %2530 = select 
%2525, %2529, %2528 : index + %c16_1303 = constant 16 : index + %2531 = remi_signed %2530, %c16_1303 : index + %c0_1304 = constant 0 : index + %2532 = cmpi "slt", %2531, %c0_1304 : index + %2533 = addi %2531, %c16_1303 : index + %2534 = select %2532, %2533, %2531 : index + %c0_1305 = constant 0 : index + %c8_1306 = constant 8 : index + %c0_1307 = constant 0 : index + %c-1_1308 = constant -1 : index + %2535 = cmpi "slt", %arg5, %c0_1307 : index + %2536 = subi %c-1_1308, %arg5 : index + %2537 = select %2535, %2536, %arg5 : index + %2538 = divi_signed %2537, %c8_1306 : index + %2539 = subi %c-1_1308, %2538 : index + %2540 = select %2535, %2539, %2538 : index + %c8_1309 = constant 8 : index + %2541 = addi %arg5, %c8_1309 : index + %c16_1310 = constant 16 : index + %c0_1311 = constant 0 : index + %c-1_1312 = constant -1 : index + %2542 = cmpi "slt", %2541, %c0_1311 : index + %2543 = subi %c-1_1312, %2541 : index + %2544 = select %2542, %2543, %2541 : index + %2545 = divi_signed %2544, %c16_1310 : index + %2546 = subi %c-1_1312, %2545 : index + %2547 = select %2542, %2546, %2545 : index + %c-2_1313 = constant -2 : index + %2548 = muli %2547, %c-2_1313 : index + %2549 = addi %2540, %2548 : index + %c8_1314 = constant 8 : index + %c0_1315 = constant 0 : index + %c-1_1316 = constant -1 : index + %2550 = cmpi "slt", %arg5, %c0_1315 : index + %2551 = subi %c-1_1316, %arg5 : index + %2552 = select %2550, %2551, %arg5 : index + %2553 = divi_signed %2552, %c8_1314 : index + %2554 = subi %c-1_1316, %2553 : index + %2555 = select %2550, %2554, %2553 : index + %c8_1317 = constant 8 : index + %2556 = addi %arg5, %c8_1317 : index + %c16_1318 = constant 16 : index + %c0_1319 = constant 0 : index + %c-1_1320 = constant -1 : index + %2557 = cmpi "slt", %2556, %c0_1319 : index + %2558 = subi %c-1_1320, %2556 : index + %2559 = select %2557, %2558, %2556 : index + %2560 = divi_signed %2559, %c16_1318 : index + %2561 = subi %c-1_1320, %2560 : index + %2562 = select %2557, %2561, %2560 : index + %c-2_1321 = constant -2 : index + %2563 = muli %2562, %c-2_1321 : index + %2564 = addi %2555, %2563 : index + %c1_1322 = constant 1 : index + %2565 = addi %2564, %c1_1322 : index + %c2_1323 = constant 2 : index + %c0_1324 = constant 0 : index + %c-1_1325 = constant -1 : index + %2566 = cmpi "slt", %2565, %c0_1324 : index + %2567 = subi %c-1_1325, %2565 : index + %2568 = select %2566, %2567, %2565 : index + %2569 = divi_signed %2568, %c2_1323 : index + %2570 = subi %c-1_1325, %2569 : index + %2571 = select %2566, %2570, %2569 : index + %c-2_1326 = constant -2 : index + %2572 = muli %2571, %c-2_1326 : index + %2573 = addi %2549, %2572 : index + %c1_1327 = constant 1 : index + %2574 = addi %2573, %c1_1327 : index + %2575 = load %2[%2534, %c0_1305, %2574] : memref<16x6x2xvector<8xf32>> + %2576 = vector.insertelement %2207, %2575[%c3_i64 : i64] : vector<8xf32> + %c8_1328 = constant 8 : index + %2577 = addi %arg5, %c8_1328 : index + %c16_1329 = constant 16 : index + %c0_1330 = constant 0 : index + %c-1_1331 = constant -1 : index + %2578 = cmpi "slt", %2577, %c0_1330 : index + %2579 = subi %c-1_1331, %2577 : index + %2580 = select %2578, %2579, %2577 : index + %2581 = divi_signed %2580, %c16_1329 : index + %2582 = subi %c-1_1331, %2581 : index + %2583 = select %2578, %2582, %2581 : index + %c16_1332 = constant 16 : index + %2584 = remi_signed %2583, %c16_1332 : index + %c0_1333 = constant 0 : index + %2585 = cmpi "slt", %2584, %c0_1333 : index + %2586 = addi %2584, %c16_1332 : index + %2587 = select %2585, %2586, %2584 : index + 
%c0_1334 = constant 0 : index + %c8_1335 = constant 8 : index + %c0_1336 = constant 0 : index + %c-1_1337 = constant -1 : index + %2588 = cmpi "slt", %arg5, %c0_1336 : index + %2589 = subi %c-1_1337, %arg5 : index + %2590 = select %2588, %2589, %arg5 : index + %2591 = divi_signed %2590, %c8_1335 : index + %2592 = subi %c-1_1337, %2591 : index + %2593 = select %2588, %2592, %2591 : index + %c8_1338 = constant 8 : index + %2594 = addi %arg5, %c8_1338 : index + %c16_1339 = constant 16 : index + %c0_1340 = constant 0 : index + %c-1_1341 = constant -1 : index + %2595 = cmpi "slt", %2594, %c0_1340 : index + %2596 = subi %c-1_1341, %2594 : index + %2597 = select %2595, %2596, %2594 : index + %2598 = divi_signed %2597, %c16_1339 : index + %2599 = subi %c-1_1341, %2598 : index + %2600 = select %2595, %2599, %2598 : index + %c-2_1342 = constant -2 : index + %2601 = muli %2600, %c-2_1342 : index + %2602 = addi %2593, %2601 : index + %c8_1343 = constant 8 : index + %c0_1344 = constant 0 : index + %c-1_1345 = constant -1 : index + %2603 = cmpi "slt", %arg5, %c0_1344 : index + %2604 = subi %c-1_1345, %arg5 : index + %2605 = select %2603, %2604, %arg5 : index + %2606 = divi_signed %2605, %c8_1343 : index + %2607 = subi %c-1_1345, %2606 : index + %2608 = select %2603, %2607, %2606 : index + %c8_1346 = constant 8 : index + %2609 = addi %arg5, %c8_1346 : index + %c16_1347 = constant 16 : index + %c0_1348 = constant 0 : index + %c-1_1349 = constant -1 : index + %2610 = cmpi "slt", %2609, %c0_1348 : index + %2611 = subi %c-1_1349, %2609 : index + %2612 = select %2610, %2611, %2609 : index + %2613 = divi_signed %2612, %c16_1347 : index + %2614 = subi %c-1_1349, %2613 : index + %2615 = select %2610, %2614, %2613 : index + %c-2_1350 = constant -2 : index + %2616 = muli %2615, %c-2_1350 : index + %2617 = addi %2608, %2616 : index + %c1_1351 = constant 1 : index + %2618 = addi %2617, %c1_1351 : index + %c2_1352 = constant 2 : index + %c0_1353 = constant 0 : index + %c-1_1354 = constant -1 : index + %2619 = cmpi "slt", %2618, %c0_1353 : index + %2620 = subi %c-1_1354, %2618 : index + %2621 = select %2619, %2620, %2618 : index + %2622 = divi_signed %2621, %c2_1352 : index + %2623 = subi %c-1_1354, %2622 : index + %2624 = select %2619, %2623, %2622 : index + %c-2_1355 = constant -2 : index + %2625 = muli %2624, %c-2_1355 : index + %2626 = addi %2602, %2625 : index + %c1_1356 = constant 1 : index + %2627 = addi %2626, %c1_1356 : index + store %2576, %2[%2587, %c0_1334, %2627] : memref<16x6x2xvector<8xf32>> + %c8_1357 = constant 8 : index + %2628 = addi %arg5, %c8_1357 : index + %c16_1358 = constant 16 : index + %c0_1359 = constant 0 : index + %c-1_1360 = constant -1 : index + %2629 = cmpi "slt", %2628, %c0_1359 : index + %2630 = subi %c-1_1360, %2628 : index + %2631 = select %2629, %2630, %2628 : index + %2632 = divi_signed %2631, %c16_1358 : index + %2633 = subi %c-1_1360, %2632 : index + %2634 = select %2629, %2633, %2632 : index + %c16_1361 = constant 16 : index + %2635 = remi_signed %2634, %c16_1361 : index + %c0_1362 = constant 0 : index + %2636 = cmpi "slt", %2635, %c0_1362 : index + %2637 = addi %2635, %c16_1361 : index + %2638 = select %2636, %2637, %2635 : index + %c0_1363 = constant 0 : index + %c8_1364 = constant 8 : index + %c0_1365 = constant 0 : index + %c-1_1366 = constant -1 : index + %2639 = cmpi "slt", %arg5, %c0_1365 : index + %2640 = subi %c-1_1366, %arg5 : index + %2641 = select %2639, %2640, %arg5 : index + %2642 = divi_signed %2641, %c8_1364 : index + %2643 = subi %c-1_1366, %2642 : index + 
%2644 = select %2639, %2643, %2642 : index + %c8_1367 = constant 8 : index + %2645 = addi %arg5, %c8_1367 : index + %c16_1368 = constant 16 : index + %c0_1369 = constant 0 : index + %c-1_1370 = constant -1 : index + %2646 = cmpi "slt", %2645, %c0_1369 : index + %2647 = subi %c-1_1370, %2645 : index + %2648 = select %2646, %2647, %2645 : index + %2649 = divi_signed %2648, %c16_1368 : index + %2650 = subi %c-1_1370, %2649 : index + %2651 = select %2646, %2650, %2649 : index + %c-2_1371 = constant -2 : index + %2652 = muli %2651, %c-2_1371 : index + %2653 = addi %2644, %2652 : index + %c8_1372 = constant 8 : index + %c0_1373 = constant 0 : index + %c-1_1374 = constant -1 : index + %2654 = cmpi "slt", %arg5, %c0_1373 : index + %2655 = subi %c-1_1374, %arg5 : index + %2656 = select %2654, %2655, %arg5 : index + %2657 = divi_signed %2656, %c8_1372 : index + %2658 = subi %c-1_1374, %2657 : index + %2659 = select %2654, %2658, %2657 : index + %c8_1375 = constant 8 : index + %2660 = addi %arg5, %c8_1375 : index + %c16_1376 = constant 16 : index + %c0_1377 = constant 0 : index + %c-1_1378 = constant -1 : index + %2661 = cmpi "slt", %2660, %c0_1377 : index + %2662 = subi %c-1_1378, %2660 : index + %2663 = select %2661, %2662, %2660 : index + %2664 = divi_signed %2663, %c16_1376 : index + %2665 = subi %c-1_1378, %2664 : index + %2666 = select %2661, %2665, %2664 : index + %c-2_1379 = constant -2 : index + %2667 = muli %2666, %c-2_1379 : index + %2668 = addi %2659, %2667 : index + %c1_1380 = constant 1 : index + %2669 = addi %2668, %c1_1380 : index + %c2_1381 = constant 2 : index + %c0_1382 = constant 0 : index + %c-1_1383 = constant -1 : index + %2670 = cmpi "slt", %2669, %c0_1382 : index + %2671 = subi %c-1_1383, %2669 : index + %2672 = select %2670, %2671, %2669 : index + %2673 = divi_signed %2672, %c2_1381 : index + %2674 = subi %c-1_1383, %2673 : index + %2675 = select %2670, %2674, %2673 : index + %c-2_1384 = constant -2 : index + %2676 = muli %2675, %c-2_1384 : index + %2677 = addi %2653, %2676 : index + %c1_1385 = constant 1 : index + %2678 = addi %2677, %c1_1385 : index + %2679 = load %2[%2638, %c0_1363, %2678] : memref<16x6x2xvector<8xf32>> + %2680 = vector.insertelement %2208, %2679[%c4_i64 : i64] : vector<8xf32> + %c8_1386 = constant 8 : index + %2681 = addi %arg5, %c8_1386 : index + %c16_1387 = constant 16 : index + %c0_1388 = constant 0 : index + %c-1_1389 = constant -1 : index + %2682 = cmpi "slt", %2681, %c0_1388 : index + %2683 = subi %c-1_1389, %2681 : index + %2684 = select %2682, %2683, %2681 : index + %2685 = divi_signed %2684, %c16_1387 : index + %2686 = subi %c-1_1389, %2685 : index + %2687 = select %2682, %2686, %2685 : index + %c16_1390 = constant 16 : index + %2688 = remi_signed %2687, %c16_1390 : index + %c0_1391 = constant 0 : index + %2689 = cmpi "slt", %2688, %c0_1391 : index + %2690 = addi %2688, %c16_1390 : index + %2691 = select %2689, %2690, %2688 : index + %c0_1392 = constant 0 : index + %c8_1393 = constant 8 : index + %c0_1394 = constant 0 : index + %c-1_1395 = constant -1 : index + %2692 = cmpi "slt", %arg5, %c0_1394 : index + %2693 = subi %c-1_1395, %arg5 : index + %2694 = select %2692, %2693, %arg5 : index + %2695 = divi_signed %2694, %c8_1393 : index + %2696 = subi %c-1_1395, %2695 : index + %2697 = select %2692, %2696, %2695 : index + %c8_1396 = constant 8 : index + %2698 = addi %arg5, %c8_1396 : index + %c16_1397 = constant 16 : index + %c0_1398 = constant 0 : index + %c-1_1399 = constant -1 : index + %2699 = cmpi "slt", %2698, %c0_1398 : index + %2700 = subi 
%c-1_1399, %2698 : index + %2701 = select %2699, %2700, %2698 : index + %2702 = divi_signed %2701, %c16_1397 : index + %2703 = subi %c-1_1399, %2702 : index + %2704 = select %2699, %2703, %2702 : index + %c-2_1400 = constant -2 : index + %2705 = muli %2704, %c-2_1400 : index + %2706 = addi %2697, %2705 : index + %c8_1401 = constant 8 : index + %c0_1402 = constant 0 : index + %c-1_1403 = constant -1 : index + %2707 = cmpi "slt", %arg5, %c0_1402 : index + %2708 = subi %c-1_1403, %arg5 : index + %2709 = select %2707, %2708, %arg5 : index + %2710 = divi_signed %2709, %c8_1401 : index + %2711 = subi %c-1_1403, %2710 : index + %2712 = select %2707, %2711, %2710 : index + %c8_1404 = constant 8 : index + %2713 = addi %arg5, %c8_1404 : index + %c16_1405 = constant 16 : index + %c0_1406 = constant 0 : index + %c-1_1407 = constant -1 : index + %2714 = cmpi "slt", %2713, %c0_1406 : index + %2715 = subi %c-1_1407, %2713 : index + %2716 = select %2714, %2715, %2713 : index + %2717 = divi_signed %2716, %c16_1405 : index + %2718 = subi %c-1_1407, %2717 : index + %2719 = select %2714, %2718, %2717 : index + %c-2_1408 = constant -2 : index + %2720 = muli %2719, %c-2_1408 : index + %2721 = addi %2712, %2720 : index + %c1_1409 = constant 1 : index + %2722 = addi %2721, %c1_1409 : index + %c2_1410 = constant 2 : index + %c0_1411 = constant 0 : index + %c-1_1412 = constant -1 : index + %2723 = cmpi "slt", %2722, %c0_1411 : index + %2724 = subi %c-1_1412, %2722 : index + %2725 = select %2723, %2724, %2722 : index + %2726 = divi_signed %2725, %c2_1410 : index + %2727 = subi %c-1_1412, %2726 : index + %2728 = select %2723, %2727, %2726 : index + %c-2_1413 = constant -2 : index + %2729 = muli %2728, %c-2_1413 : index + %2730 = addi %2706, %2729 : index + %c1_1414 = constant 1 : index + %2731 = addi %2730, %c1_1414 : index + store %2680, %2[%2691, %c0_1392, %2731] : memref<16x6x2xvector<8xf32>> + %c8_1415 = constant 8 : index + %2732 = addi %arg5, %c8_1415 : index + %c16_1416 = constant 16 : index + %c0_1417 = constant 0 : index + %c-1_1418 = constant -1 : index + %2733 = cmpi "slt", %2732, %c0_1417 : index + %2734 = subi %c-1_1418, %2732 : index + %2735 = select %2733, %2734, %2732 : index + %2736 = divi_signed %2735, %c16_1416 : index + %2737 = subi %c-1_1418, %2736 : index + %2738 = select %2733, %2737, %2736 : index + %c16_1419 = constant 16 : index + %2739 = remi_signed %2738, %c16_1419 : index + %c0_1420 = constant 0 : index + %2740 = cmpi "slt", %2739, %c0_1420 : index + %2741 = addi %2739, %c16_1419 : index + %2742 = select %2740, %2741, %2739 : index + %c0_1421 = constant 0 : index + %c8_1422 = constant 8 : index + %c0_1423 = constant 0 : index + %c-1_1424 = constant -1 : index + %2743 = cmpi "slt", %arg5, %c0_1423 : index + %2744 = subi %c-1_1424, %arg5 : index + %2745 = select %2743, %2744, %arg5 : index + %2746 = divi_signed %2745, %c8_1422 : index + %2747 = subi %c-1_1424, %2746 : index + %2748 = select %2743, %2747, %2746 : index + %c8_1425 = constant 8 : index + %2749 = addi %arg5, %c8_1425 : index + %c16_1426 = constant 16 : index + %c0_1427 = constant 0 : index + %c-1_1428 = constant -1 : index + %2750 = cmpi "slt", %2749, %c0_1427 : index + %2751 = subi %c-1_1428, %2749 : index + %2752 = select %2750, %2751, %2749 : index + %2753 = divi_signed %2752, %c16_1426 : index + %2754 = subi %c-1_1428, %2753 : index + %2755 = select %2750, %2754, %2753 : index + %c-2_1429 = constant -2 : index + %2756 = muli %2755, %c-2_1429 : index + %2757 = addi %2748, %2756 : index + %c8_1430 = constant 8 : index + 
%c0_1431 = constant 0 : index + %c-1_1432 = constant -1 : index + %2758 = cmpi "slt", %arg5, %c0_1431 : index + %2759 = subi %c-1_1432, %arg5 : index + %2760 = select %2758, %2759, %arg5 : index + %2761 = divi_signed %2760, %c8_1430 : index + %2762 = subi %c-1_1432, %2761 : index + %2763 = select %2758, %2762, %2761 : index + %c8_1433 = constant 8 : index + %2764 = addi %arg5, %c8_1433 : index + %c16_1434 = constant 16 : index + %c0_1435 = constant 0 : index + %c-1_1436 = constant -1 : index + %2765 = cmpi "slt", %2764, %c0_1435 : index + %2766 = subi %c-1_1436, %2764 : index + %2767 = select %2765, %2766, %2764 : index + %2768 = divi_signed %2767, %c16_1434 : index + %2769 = subi %c-1_1436, %2768 : index + %2770 = select %2765, %2769, %2768 : index + %c-2_1437 = constant -2 : index + %2771 = muli %2770, %c-2_1437 : index + %2772 = addi %2763, %2771 : index + %c1_1438 = constant 1 : index + %2773 = addi %2772, %c1_1438 : index + %c2_1439 = constant 2 : index + %c0_1440 = constant 0 : index + %c-1_1441 = constant -1 : index + %2774 = cmpi "slt", %2773, %c0_1440 : index + %2775 = subi %c-1_1441, %2773 : index + %2776 = select %2774, %2775, %2773 : index + %2777 = divi_signed %2776, %c2_1439 : index + %2778 = subi %c-1_1441, %2777 : index + %2779 = select %2774, %2778, %2777 : index + %c-2_1442 = constant -2 : index + %2780 = muli %2779, %c-2_1442 : index + %2781 = addi %2757, %2780 : index + %c1_1443 = constant 1 : index + %2782 = addi %2781, %c1_1443 : index + %2783 = load %2[%2742, %c0_1421, %2782] : memref<16x6x2xvector<8xf32>> + %2784 = vector.insertelement %2209, %2783[%c5_i64 : i64] : vector<8xf32> + %c8_1444 = constant 8 : index + %2785 = addi %arg5, %c8_1444 : index + %c16_1445 = constant 16 : index + %c0_1446 = constant 0 : index + %c-1_1447 = constant -1 : index + %2786 = cmpi "slt", %2785, %c0_1446 : index + %2787 = subi %c-1_1447, %2785 : index + %2788 = select %2786, %2787, %2785 : index + %2789 = divi_signed %2788, %c16_1445 : index + %2790 = subi %c-1_1447, %2789 : index + %2791 = select %2786, %2790, %2789 : index + %c16_1448 = constant 16 : index + %2792 = remi_signed %2791, %c16_1448 : index + %c0_1449 = constant 0 : index + %2793 = cmpi "slt", %2792, %c0_1449 : index + %2794 = addi %2792, %c16_1448 : index + %2795 = select %2793, %2794, %2792 : index + %c0_1450 = constant 0 : index + %c8_1451 = constant 8 : index + %c0_1452 = constant 0 : index + %c-1_1453 = constant -1 : index + %2796 = cmpi "slt", %arg5, %c0_1452 : index + %2797 = subi %c-1_1453, %arg5 : index + %2798 = select %2796, %2797, %arg5 : index + %2799 = divi_signed %2798, %c8_1451 : index + %2800 = subi %c-1_1453, %2799 : index + %2801 = select %2796, %2800, %2799 : index + %c8_1454 = constant 8 : index + %2802 = addi %arg5, %c8_1454 : index + %c16_1455 = constant 16 : index + %c0_1456 = constant 0 : index + %c-1_1457 = constant -1 : index + %2803 = cmpi "slt", %2802, %c0_1456 : index + %2804 = subi %c-1_1457, %2802 : index + %2805 = select %2803, %2804, %2802 : index + %2806 = divi_signed %2805, %c16_1455 : index + %2807 = subi %c-1_1457, %2806 : index + %2808 = select %2803, %2807, %2806 : index + %c-2_1458 = constant -2 : index + %2809 = muli %2808, %c-2_1458 : index + %2810 = addi %2801, %2809 : index + %c8_1459 = constant 8 : index + %c0_1460 = constant 0 : index + %c-1_1461 = constant -1 : index + %2811 = cmpi "slt", %arg5, %c0_1460 : index + %2812 = subi %c-1_1461, %arg5 : index + %2813 = select %2811, %2812, %arg5 : index + %2814 = divi_signed %2813, %c8_1459 : index + %2815 = subi %c-1_1461, %2814 : 
index + %2816 = select %2811, %2815, %2814 : index + %c8_1462 = constant 8 : index + %2817 = addi %arg5, %c8_1462 : index + %c16_1463 = constant 16 : index + %c0_1464 = constant 0 : index + %c-1_1465 = constant -1 : index + %2818 = cmpi "slt", %2817, %c0_1464 : index + %2819 = subi %c-1_1465, %2817 : index + %2820 = select %2818, %2819, %2817 : index + %2821 = divi_signed %2820, %c16_1463 : index + %2822 = subi %c-1_1465, %2821 : index + %2823 = select %2818, %2822, %2821 : index + %c-2_1466 = constant -2 : index + %2824 = muli %2823, %c-2_1466 : index + %2825 = addi %2816, %2824 : index + %c1_1467 = constant 1 : index + %2826 = addi %2825, %c1_1467 : index + %c2_1468 = constant 2 : index + %c0_1469 = constant 0 : index + %c-1_1470 = constant -1 : index + %2827 = cmpi "slt", %2826, %c0_1469 : index + %2828 = subi %c-1_1470, %2826 : index + %2829 = select %2827, %2828, %2826 : index + %2830 = divi_signed %2829, %c2_1468 : index + %2831 = subi %c-1_1470, %2830 : index + %2832 = select %2827, %2831, %2830 : index + %c-2_1471 = constant -2 : index + %2833 = muli %2832, %c-2_1471 : index + %2834 = addi %2810, %2833 : index + %c1_1472 = constant 1 : index + %2835 = addi %2834, %c1_1472 : index + store %2784, %2[%2795, %c0_1450, %2835] : memref<16x6x2xvector<8xf32>> + %c8_1473 = constant 8 : index + %2836 = addi %arg5, %c8_1473 : index + %c16_1474 = constant 16 : index + %c0_1475 = constant 0 : index + %c-1_1476 = constant -1 : index + %2837 = cmpi "slt", %2836, %c0_1475 : index + %2838 = subi %c-1_1476, %2836 : index + %2839 = select %2837, %2838, %2836 : index + %2840 = divi_signed %2839, %c16_1474 : index + %2841 = subi %c-1_1476, %2840 : index + %2842 = select %2837, %2841, %2840 : index + %c16_1477 = constant 16 : index + %2843 = remi_signed %2842, %c16_1477 : index + %c0_1478 = constant 0 : index + %2844 = cmpi "slt", %2843, %c0_1478 : index + %2845 = addi %2843, %c16_1477 : index + %2846 = select %2844, %2845, %2843 : index + %c0_1479 = constant 0 : index + %c8_1480 = constant 8 : index + %c0_1481 = constant 0 : index + %c-1_1482 = constant -1 : index + %2847 = cmpi "slt", %arg5, %c0_1481 : index + %2848 = subi %c-1_1482, %arg5 : index + %2849 = select %2847, %2848, %arg5 : index + %2850 = divi_signed %2849, %c8_1480 : index + %2851 = subi %c-1_1482, %2850 : index + %2852 = select %2847, %2851, %2850 : index + %c8_1483 = constant 8 : index + %2853 = addi %arg5, %c8_1483 : index + %c16_1484 = constant 16 : index + %c0_1485 = constant 0 : index + %c-1_1486 = constant -1 : index + %2854 = cmpi "slt", %2853, %c0_1485 : index + %2855 = subi %c-1_1486, %2853 : index + %2856 = select %2854, %2855, %2853 : index + %2857 = divi_signed %2856, %c16_1484 : index + %2858 = subi %c-1_1486, %2857 : index + %2859 = select %2854, %2858, %2857 : index + %c-2_1487 = constant -2 : index + %2860 = muli %2859, %c-2_1487 : index + %2861 = addi %2852, %2860 : index + %c8_1488 = constant 8 : index + %c0_1489 = constant 0 : index + %c-1_1490 = constant -1 : index + %2862 = cmpi "slt", %arg5, %c0_1489 : index + %2863 = subi %c-1_1490, %arg5 : index + %2864 = select %2862, %2863, %arg5 : index + %2865 = divi_signed %2864, %c8_1488 : index + %2866 = subi %c-1_1490, %2865 : index + %2867 = select %2862, %2866, %2865 : index + %c8_1491 = constant 8 : index + %2868 = addi %arg5, %c8_1491 : index + %c16_1492 = constant 16 : index + %c0_1493 = constant 0 : index + %c-1_1494 = constant -1 : index + %2869 = cmpi "slt", %2868, %c0_1493 : index + %2870 = subi %c-1_1494, %2868 : index + %2871 = select %2869, %2870, %2868 : 
index + %2872 = divi_signed %2871, %c16_1492 : index + %2873 = subi %c-1_1494, %2872 : index + %2874 = select %2869, %2873, %2872 : index + %c-2_1495 = constant -2 : index + %2875 = muli %2874, %c-2_1495 : index + %2876 = addi %2867, %2875 : index + %c1_1496 = constant 1 : index + %2877 = addi %2876, %c1_1496 : index + %c2_1497 = constant 2 : index + %c0_1498 = constant 0 : index + %c-1_1499 = constant -1 : index + %2878 = cmpi "slt", %2877, %c0_1498 : index + %2879 = subi %c-1_1499, %2877 : index + %2880 = select %2878, %2879, %2877 : index + %2881 = divi_signed %2880, %c2_1497 : index + %2882 = subi %c-1_1499, %2881 : index + %2883 = select %2878, %2882, %2881 : index + %c-2_1500 = constant -2 : index + %2884 = muli %2883, %c-2_1500 : index + %2885 = addi %2861, %2884 : index + %c1_1501 = constant 1 : index + %2886 = addi %2885, %c1_1501 : index + %2887 = load %2[%2846, %c0_1479, %2886] : memref<16x6x2xvector<8xf32>> + %2888 = vector.insertelement %2210, %2887[%c6_i64 : i64] : vector<8xf32> + %c8_1502 = constant 8 : index + %2889 = addi %arg5, %c8_1502 : index + %c16_1503 = constant 16 : index + %c0_1504 = constant 0 : index + %c-1_1505 = constant -1 : index + %2890 = cmpi "slt", %2889, %c0_1504 : index + %2891 = subi %c-1_1505, %2889 : index + %2892 = select %2890, %2891, %2889 : index + %2893 = divi_signed %2892, %c16_1503 : index + %2894 = subi %c-1_1505, %2893 : index + %2895 = select %2890, %2894, %2893 : index + %c16_1506 = constant 16 : index + %2896 = remi_signed %2895, %c16_1506 : index + %c0_1507 = constant 0 : index + %2897 = cmpi "slt", %2896, %c0_1507 : index + %2898 = addi %2896, %c16_1506 : index + %2899 = select %2897, %2898, %2896 : index + %c0_1508 = constant 0 : index + %c8_1509 = constant 8 : index + %c0_1510 = constant 0 : index + %c-1_1511 = constant -1 : index + %2900 = cmpi "slt", %arg5, %c0_1510 : index + %2901 = subi %c-1_1511, %arg5 : index + %2902 = select %2900, %2901, %arg5 : index + %2903 = divi_signed %2902, %c8_1509 : index + %2904 = subi %c-1_1511, %2903 : index + %2905 = select %2900, %2904, %2903 : index + %c8_1512 = constant 8 : index + %2906 = addi %arg5, %c8_1512 : index + %c16_1513 = constant 16 : index + %c0_1514 = constant 0 : index + %c-1_1515 = constant -1 : index + %2907 = cmpi "slt", %2906, %c0_1514 : index + %2908 = subi %c-1_1515, %2906 : index + %2909 = select %2907, %2908, %2906 : index + %2910 = divi_signed %2909, %c16_1513 : index + %2911 = subi %c-1_1515, %2910 : index + %2912 = select %2907, %2911, %2910 : index + %c-2_1516 = constant -2 : index + %2913 = muli %2912, %c-2_1516 : index + %2914 = addi %2905, %2913 : index + %c8_1517 = constant 8 : index + %c0_1518 = constant 0 : index + %c-1_1519 = constant -1 : index + %2915 = cmpi "slt", %arg5, %c0_1518 : index + %2916 = subi %c-1_1519, %arg5 : index + %2917 = select %2915, %2916, %arg5 : index + %2918 = divi_signed %2917, %c8_1517 : index + %2919 = subi %c-1_1519, %2918 : index + %2920 = select %2915, %2919, %2918 : index + %c8_1520 = constant 8 : index + %2921 = addi %arg5, %c8_1520 : index + %c16_1521 = constant 16 : index + %c0_1522 = constant 0 : index + %c-1_1523 = constant -1 : index + %2922 = cmpi "slt", %2921, %c0_1522 : index + %2923 = subi %c-1_1523, %2921 : index + %2924 = select %2922, %2923, %2921 : index + %2925 = divi_signed %2924, %c16_1521 : index + %2926 = subi %c-1_1523, %2925 : index + %2927 = select %2922, %2926, %2925 : index + %c-2_1524 = constant -2 : index + %2928 = muli %2927, %c-2_1524 : index + %2929 = addi %2920, %2928 : index + %c1_1525 = constant 1 : 
index + %2930 = addi %2929, %c1_1525 : index + %c2_1526 = constant 2 : index + %c0_1527 = constant 0 : index + %c-1_1528 = constant -1 : index + %2931 = cmpi "slt", %2930, %c0_1527 : index + %2932 = subi %c-1_1528, %2930 : index + %2933 = select %2931, %2932, %2930 : index + %2934 = divi_signed %2933, %c2_1526 : index + %2935 = subi %c-1_1528, %2934 : index + %2936 = select %2931, %2935, %2934 : index + %c-2_1529 = constant -2 : index + %2937 = muli %2936, %c-2_1529 : index + %2938 = addi %2914, %2937 : index + %c1_1530 = constant 1 : index + %2939 = addi %2938, %c1_1530 : index + store %2888, %2[%2899, %c0_1508, %2939] : memref<16x6x2xvector<8xf32>> + %c8_1531 = constant 8 : index + %2940 = addi %arg5, %c8_1531 : index + %c16_1532 = constant 16 : index + %c0_1533 = constant 0 : index + %c-1_1534 = constant -1 : index + %2941 = cmpi "slt", %2940, %c0_1533 : index + %2942 = subi %c-1_1534, %2940 : index + %2943 = select %2941, %2942, %2940 : index + %2944 = divi_signed %2943, %c16_1532 : index + %2945 = subi %c-1_1534, %2944 : index + %2946 = select %2941, %2945, %2944 : index + %c16_1535 = constant 16 : index + %2947 = remi_signed %2946, %c16_1535 : index + %c0_1536 = constant 0 : index + %2948 = cmpi "slt", %2947, %c0_1536 : index + %2949 = addi %2947, %c16_1535 : index + %2950 = select %2948, %2949, %2947 : index + %c0_1537 = constant 0 : index + %c8_1538 = constant 8 : index + %c0_1539 = constant 0 : index + %c-1_1540 = constant -1 : index + %2951 = cmpi "slt", %arg5, %c0_1539 : index + %2952 = subi %c-1_1540, %arg5 : index + %2953 = select %2951, %2952, %arg5 : index + %2954 = divi_signed %2953, %c8_1538 : index + %2955 = subi %c-1_1540, %2954 : index + %2956 = select %2951, %2955, %2954 : index + %c8_1541 = constant 8 : index + %2957 = addi %arg5, %c8_1541 : index + %c16_1542 = constant 16 : index + %c0_1543 = constant 0 : index + %c-1_1544 = constant -1 : index + %2958 = cmpi "slt", %2957, %c0_1543 : index + %2959 = subi %c-1_1544, %2957 : index + %2960 = select %2958, %2959, %2957 : index + %2961 = divi_signed %2960, %c16_1542 : index + %2962 = subi %c-1_1544, %2961 : index + %2963 = select %2958, %2962, %2961 : index + %c-2_1545 = constant -2 : index + %2964 = muli %2963, %c-2_1545 : index + %2965 = addi %2956, %2964 : index + %c8_1546 = constant 8 : index + %c0_1547 = constant 0 : index + %c-1_1548 = constant -1 : index + %2966 = cmpi "slt", %arg5, %c0_1547 : index + %2967 = subi %c-1_1548, %arg5 : index + %2968 = select %2966, %2967, %arg5 : index + %2969 = divi_signed %2968, %c8_1546 : index + %2970 = subi %c-1_1548, %2969 : index + %2971 = select %2966, %2970, %2969 : index + %c8_1549 = constant 8 : index + %2972 = addi %arg5, %c8_1549 : index + %c16_1550 = constant 16 : index + %c0_1551 = constant 0 : index + %c-1_1552 = constant -1 : index + %2973 = cmpi "slt", %2972, %c0_1551 : index + %2974 = subi %c-1_1552, %2972 : index + %2975 = select %2973, %2974, %2972 : index + %2976 = divi_signed %2975, %c16_1550 : index + %2977 = subi %c-1_1552, %2976 : index + %2978 = select %2973, %2977, %2976 : index + %c-2_1553 = constant -2 : index + %2979 = muli %2978, %c-2_1553 : index + %2980 = addi %2971, %2979 : index + %c1_1554 = constant 1 : index + %2981 = addi %2980, %c1_1554 : index + %c2_1555 = constant 2 : index + %c0_1556 = constant 0 : index + %c-1_1557 = constant -1 : index + %2982 = cmpi "slt", %2981, %c0_1556 : index + %2983 = subi %c-1_1557, %2981 : index + %2984 = select %2982, %2983, %2981 : index + %2985 = divi_signed %2984, %c2_1555 : index + %2986 = subi %c-1_1557, %2985 
: index + %2987 = select %2982, %2986, %2985 : index + %c-2_1558 = constant -2 : index + %2988 = muli %2987, %c-2_1558 : index + %2989 = addi %2965, %2988 : index + %c1_1559 = constant 1 : index + %2990 = addi %2989, %c1_1559 : index + %2991 = load %2[%2950, %c0_1537, %2990] : memref<16x6x2xvector<8xf32>> + %2992 = vector.insertelement %2211, %2991[%c7_i64 : i64] : vector<8xf32> + %c8_1560 = constant 8 : index + %2993 = addi %arg5, %c8_1560 : index + %c16_1561 = constant 16 : index + %c0_1562 = constant 0 : index + %c-1_1563 = constant -1 : index + %2994 = cmpi "slt", %2993, %c0_1562 : index + %2995 = subi %c-1_1563, %2993 : index + %2996 = select %2994, %2995, %2993 : index + %2997 = divi_signed %2996, %c16_1561 : index + %2998 = subi %c-1_1563, %2997 : index + %2999 = select %2994, %2998, %2997 : index + %c16_1564 = constant 16 : index + %3000 = remi_signed %2999, %c16_1564 : index + %c0_1565 = constant 0 : index + %3001 = cmpi "slt", %3000, %c0_1565 : index + %3002 = addi %3000, %c16_1564 : index + %3003 = select %3001, %3002, %3000 : index + %c0_1566 = constant 0 : index + %c8_1567 = constant 8 : index + %c0_1568 = constant 0 : index + %c-1_1569 = constant -1 : index + %3004 = cmpi "slt", %arg5, %c0_1568 : index + %3005 = subi %c-1_1569, %arg5 : index + %3006 = select %3004, %3005, %arg5 : index + %3007 = divi_signed %3006, %c8_1567 : index + %3008 = subi %c-1_1569, %3007 : index + %3009 = select %3004, %3008, %3007 : index + %c8_1570 = constant 8 : index + %3010 = addi %arg5, %c8_1570 : index + %c16_1571 = constant 16 : index + %c0_1572 = constant 0 : index + %c-1_1573 = constant -1 : index + %3011 = cmpi "slt", %3010, %c0_1572 : index + %3012 = subi %c-1_1573, %3010 : index + %3013 = select %3011, %3012, %3010 : index + %3014 = divi_signed %3013, %c16_1571 : index + %3015 = subi %c-1_1573, %3014 : index + %3016 = select %3011, %3015, %3014 : index + %c-2_1574 = constant -2 : index + %3017 = muli %3016, %c-2_1574 : index + %3018 = addi %3009, %3017 : index + %c8_1575 = constant 8 : index + %c0_1576 = constant 0 : index + %c-1_1577 = constant -1 : index + %3019 = cmpi "slt", %arg5, %c0_1576 : index + %3020 = subi %c-1_1577, %arg5 : index + %3021 = select %3019, %3020, %arg5 : index + %3022 = divi_signed %3021, %c8_1575 : index + %3023 = subi %c-1_1577, %3022 : index + %3024 = select %3019, %3023, %3022 : index + %c8_1578 = constant 8 : index + %3025 = addi %arg5, %c8_1578 : index + %c16_1579 = constant 16 : index + %c0_1580 = constant 0 : index + %c-1_1581 = constant -1 : index + %3026 = cmpi "slt", %3025, %c0_1580 : index + %3027 = subi %c-1_1581, %3025 : index + %3028 = select %3026, %3027, %3025 : index + %3029 = divi_signed %3028, %c16_1579 : index + %3030 = subi %c-1_1581, %3029 : index + %3031 = select %3026, %3030, %3029 : index + %c-2_1582 = constant -2 : index + %3032 = muli %3031, %c-2_1582 : index + %3033 = addi %3024, %3032 : index + %c1_1583 = constant 1 : index + %3034 = addi %3033, %c1_1583 : index + %c2_1584 = constant 2 : index + %c0_1585 = constant 0 : index + %c-1_1586 = constant -1 : index + %3035 = cmpi "slt", %3034, %c0_1585 : index + %3036 = subi %c-1_1586, %3034 : index + %3037 = select %3035, %3036, %3034 : index + %3038 = divi_signed %3037, %c2_1584 : index + %3039 = subi %c-1_1586, %3038 : index + %3040 = select %3035, %3039, %3038 : index + %c-2_1587 = constant -2 : index + %3041 = muli %3040, %c-2_1587 : index + %3042 = addi %3018, %3041 : index + %c1_1588 = constant 1 : index + %3043 = addi %3042, %c1_1588 : index + store %2992, %2[%3003, %c0_1566, %3043] 
: memref<16x6x2xvector<8xf32>> + %c8_1589 = constant 8 : index + %3044 = addi %arg5, %c8_1589 : index + %c16_1590 = constant 16 : index + %c0_1591 = constant 0 : index + %c-1_1592 = constant -1 : index + %3045 = cmpi "slt", %3044, %c0_1591 : index + %3046 = subi %c-1_1592, %3044 : index + %3047 = select %3045, %3046, %3044 : index + %3048 = divi_signed %3047, %c16_1590 : index + %3049 = subi %c-1_1592, %3048 : index + %3050 = select %3045, %3049, %3048 : index + %c16_1593 = constant 16 : index + %3051 = remi_signed %3050, %c16_1593 : index + %c0_1594 = constant 0 : index + %3052 = cmpi "slt", %3051, %c0_1594 : index + %3053 = addi %3051, %c16_1593 : index + %3054 = select %3052, %3053, %3051 : index + %c0_1595 = constant 0 : index + %c8_1596 = constant 8 : index + %c0_1597 = constant 0 : index + %c-1_1598 = constant -1 : index + %3055 = cmpi "slt", %arg5, %c0_1597 : index + %3056 = subi %c-1_1598, %arg5 : index + %3057 = select %3055, %3056, %arg5 : index + %3058 = divi_signed %3057, %c8_1596 : index + %3059 = subi %c-1_1598, %3058 : index + %3060 = select %3055, %3059, %3058 : index + %c8_1599 = constant 8 : index + %3061 = addi %arg5, %c8_1599 : index + %c16_1600 = constant 16 : index + %c0_1601 = constant 0 : index + %c-1_1602 = constant -1 : index + %3062 = cmpi "slt", %3061, %c0_1601 : index + %3063 = subi %c-1_1602, %3061 : index + %3064 = select %3062, %3063, %3061 : index + %3065 = divi_signed %3064, %c16_1600 : index + %3066 = subi %c-1_1602, %3065 : index + %3067 = select %3062, %3066, %3065 : index + %c-2_1603 = constant -2 : index + %3068 = muli %3067, %c-2_1603 : index + %3069 = addi %3060, %3068 : index + %c8_1604 = constant 8 : index + %c0_1605 = constant 0 : index + %c-1_1606 = constant -1 : index + %3070 = cmpi "slt", %arg5, %c0_1605 : index + %3071 = subi %c-1_1606, %arg5 : index + %3072 = select %3070, %3071, %arg5 : index + %3073 = divi_signed %3072, %c8_1604 : index + %3074 = subi %c-1_1606, %3073 : index + %3075 = select %3070, %3074, %3073 : index + %c8_1607 = constant 8 : index + %3076 = addi %arg5, %c8_1607 : index + %c16_1608 = constant 16 : index + %c0_1609 = constant 0 : index + %c-1_1610 = constant -1 : index + %3077 = cmpi "slt", %3076, %c0_1609 : index + %3078 = subi %c-1_1610, %3076 : index + %3079 = select %3077, %3078, %3076 : index + %3080 = divi_signed %3079, %c16_1608 : index + %3081 = subi %c-1_1610, %3080 : index + %3082 = select %3077, %3081, %3080 : index + %c-2_1611 = constant -2 : index + %3083 = muli %3082, %c-2_1611 : index + %3084 = addi %3075, %3083 : index + %c1_1612 = constant 1 : index + %3085 = addi %3084, %c1_1612 : index + %c2_1613 = constant 2 : index + %c0_1614 = constant 0 : index + %c-1_1615 = constant -1 : index + %3086 = cmpi "slt", %3085, %c0_1614 : index + %3087 = subi %c-1_1615, %3085 : index + %3088 = select %3086, %3087, %3085 : index + %3089 = divi_signed %3088, %c2_1613 : index + %3090 = subi %c-1_1615, %3089 : index + %3091 = select %3086, %3090, %3089 : index + %c-2_1616 = constant -2 : index + %3092 = muli %3091, %c-2_1616 : index + %3093 = addi %3069, %3092 : index + %c1_1617 = constant 1 : index + %3094 = addi %3093, %c1_1617 : index + %3095 = load %2[%3054, %c0_1595, %3094] : memref<16x6x2xvector<8xf32>> + %3096 = vector.insertelement %2204, %3095[%c0_i64 : i64] : vector<8xf32> + %c8_1618 = constant 8 : index + %3097 = addi %arg5, %c8_1618 : index + %c16_1619 = constant 16 : index + %c0_1620 = constant 0 : index + %c-1_1621 = constant -1 : index + %3098 = cmpi "slt", %3097, %c0_1620 : index + %3099 = subi %c-1_1621, 
%3097 : index + %3100 = select %3098, %3099, %3097 : index + %3101 = divi_signed %3100, %c16_1619 : index + %3102 = subi %c-1_1621, %3101 : index + %3103 = select %3098, %3102, %3101 : index + %c16_1622 = constant 16 : index + %3104 = remi_signed %3103, %c16_1622 : index + %c0_1623 = constant 0 : index + %3105 = cmpi "slt", %3104, %c0_1623 : index + %3106 = addi %3104, %c16_1622 : index + %3107 = select %3105, %3106, %3104 : index + %c0_1624 = constant 0 : index + %c8_1625 = constant 8 : index + %c0_1626 = constant 0 : index + %c-1_1627 = constant -1 : index + %3108 = cmpi "slt", %arg5, %c0_1626 : index + %3109 = subi %c-1_1627, %arg5 : index + %3110 = select %3108, %3109, %arg5 : index + %3111 = divi_signed %3110, %c8_1625 : index + %3112 = subi %c-1_1627, %3111 : index + %3113 = select %3108, %3112, %3111 : index + %c8_1628 = constant 8 : index + %3114 = addi %arg5, %c8_1628 : index + %c16_1629 = constant 16 : index + %c0_1630 = constant 0 : index + %c-1_1631 = constant -1 : index + %3115 = cmpi "slt", %3114, %c0_1630 : index + %3116 = subi %c-1_1631, %3114 : index + %3117 = select %3115, %3116, %3114 : index + %3118 = divi_signed %3117, %c16_1629 : index + %3119 = subi %c-1_1631, %3118 : index + %3120 = select %3115, %3119, %3118 : index + %c-2_1632 = constant -2 : index + %3121 = muli %3120, %c-2_1632 : index + %3122 = addi %3113, %3121 : index + %c8_1633 = constant 8 : index + %c0_1634 = constant 0 : index + %c-1_1635 = constant -1 : index + %3123 = cmpi "slt", %arg5, %c0_1634 : index + %3124 = subi %c-1_1635, %arg5 : index + %3125 = select %3123, %3124, %arg5 : index + %3126 = divi_signed %3125, %c8_1633 : index + %3127 = subi %c-1_1635, %3126 : index + %3128 = select %3123, %3127, %3126 : index + %c8_1636 = constant 8 : index + %3129 = addi %arg5, %c8_1636 : index + %c16_1637 = constant 16 : index + %c0_1638 = constant 0 : index + %c-1_1639 = constant -1 : index + %3130 = cmpi "slt", %3129, %c0_1638 : index + %3131 = subi %c-1_1639, %3129 : index + %3132 = select %3130, %3131, %3129 : index + %3133 = divi_signed %3132, %c16_1637 : index + %3134 = subi %c-1_1639, %3133 : index + %3135 = select %3130, %3134, %3133 : index + %c-2_1640 = constant -2 : index + %3136 = muli %3135, %c-2_1640 : index + %3137 = addi %3128, %3136 : index + %c1_1641 = constant 1 : index + %3138 = addi %3137, %c1_1641 : index + %c2_1642 = constant 2 : index + %c0_1643 = constant 0 : index + %c-1_1644 = constant -1 : index + %3139 = cmpi "slt", %3138, %c0_1643 : index + %3140 = subi %c-1_1644, %3138 : index + %3141 = select %3139, %3140, %3138 : index + %3142 = divi_signed %3141, %c2_1642 : index + %3143 = subi %c-1_1644, %3142 : index + %3144 = select %3139, %3143, %3142 : index + %c-2_1645 = constant -2 : index + %3145 = muli %3144, %c-2_1645 : index + %3146 = addi %3122, %3145 : index + %c1_1646 = constant 1 : index + %3147 = addi %3146, %c1_1646 : index + store %3096, %2[%3107, %c0_1624, %3147] : memref<16x6x2xvector<8xf32>> + %c8_1647 = constant 8 : index + %3148 = addi %arg5, %c8_1647 : index + %c16_1648 = constant 16 : index + %c0_1649 = constant 0 : index + %c-1_1650 = constant -1 : index + %3149 = cmpi "slt", %3148, %c0_1649 : index + %3150 = subi %c-1_1650, %3148 : index + %3151 = select %3149, %3150, %3148 : index + %3152 = divi_signed %3151, %c16_1648 : index + %3153 = subi %c-1_1650, %3152 : index + %3154 = select %3149, %3153, %3152 : index + %c16_1651 = constant 16 : index + %3155 = remi_signed %3154, %c16_1651 : index + %c0_1652 = constant 0 : index + %3156 = cmpi "slt", %3155, %c0_1652 : index + 
%3157 = addi %3155, %c16_1651 : index + %3158 = select %3156, %3157, %3155 : index + %c0_1653 = constant 0 : index + %c8_1654 = constant 8 : index + %c0_1655 = constant 0 : index + %c-1_1656 = constant -1 : index + %3159 = cmpi "slt", %arg5, %c0_1655 : index + %3160 = subi %c-1_1656, %arg5 : index + %3161 = select %3159, %3160, %arg5 : index + %3162 = divi_signed %3161, %c8_1654 : index + %3163 = subi %c-1_1656, %3162 : index + %3164 = select %3159, %3163, %3162 : index + %c8_1657 = constant 8 : index + %3165 = addi %arg5, %c8_1657 : index + %c16_1658 = constant 16 : index + %c0_1659 = constant 0 : index + %c-1_1660 = constant -1 : index + %3166 = cmpi "slt", %3165, %c0_1659 : index + %3167 = subi %c-1_1660, %3165 : index + %3168 = select %3166, %3167, %3165 : index + %3169 = divi_signed %3168, %c16_1658 : index + %3170 = subi %c-1_1660, %3169 : index + %3171 = select %3166, %3170, %3169 : index + %c-2_1661 = constant -2 : index + %3172 = muli %3171, %c-2_1661 : index + %3173 = addi %3164, %3172 : index + %c8_1662 = constant 8 : index + %c0_1663 = constant 0 : index + %c-1_1664 = constant -1 : index + %3174 = cmpi "slt", %arg5, %c0_1663 : index + %3175 = subi %c-1_1664, %arg5 : index + %3176 = select %3174, %3175, %arg5 : index + %3177 = divi_signed %3176, %c8_1662 : index + %3178 = subi %c-1_1664, %3177 : index + %3179 = select %3174, %3178, %3177 : index + %c8_1665 = constant 8 : index + %3180 = addi %arg5, %c8_1665 : index + %c16_1666 = constant 16 : index + %c0_1667 = constant 0 : index + %c-1_1668 = constant -1 : index + %3181 = cmpi "slt", %3180, %c0_1667 : index + %3182 = subi %c-1_1668, %3180 : index + %3183 = select %3181, %3182, %3180 : index + %3184 = divi_signed %3183, %c16_1666 : index + %3185 = subi %c-1_1668, %3184 : index + %3186 = select %3181, %3185, %3184 : index + %c-2_1669 = constant -2 : index + %3187 = muli %3186, %c-2_1669 : index + %3188 = addi %3179, %3187 : index + %c1_1670 = constant 1 : index + %3189 = addi %3188, %c1_1670 : index + %c2_1671 = constant 2 : index + %c0_1672 = constant 0 : index + %c-1_1673 = constant -1 : index + %3190 = cmpi "slt", %3189, %c0_1672 : index + %3191 = subi %c-1_1673, %3189 : index + %3192 = select %3190, %3191, %3189 : index + %3193 = divi_signed %3192, %c2_1671 : index + %3194 = subi %c-1_1673, %3193 : index + %3195 = select %3190, %3194, %3193 : index + %c-2_1674 = constant -2 : index + %3196 = muli %3195, %c-2_1674 : index + %3197 = addi %3173, %3196 : index + %c1_1675 = constant 1 : index + %3198 = addi %3197, %c1_1675 : index + %3199 = load %2[%3158, %c0_1653, %3198] : memref<16x6x2xvector<8xf32>> + %3200 = vector.insertelement %2205, %3199[%c1_i64 : i64] : vector<8xf32> + %c8_1676 = constant 8 : index + %3201 = addi %arg5, %c8_1676 : index + %c16_1677 = constant 16 : index + %c0_1678 = constant 0 : index + %c-1_1679 = constant -1 : index + %3202 = cmpi "slt", %3201, %c0_1678 : index + %3203 = subi %c-1_1679, %3201 : index + %3204 = select %3202, %3203, %3201 : index + %3205 = divi_signed %3204, %c16_1677 : index + %3206 = subi %c-1_1679, %3205 : index + %3207 = select %3202, %3206, %3205 : index + %c16_1680 = constant 16 : index + %3208 = remi_signed %3207, %c16_1680 : index + %c0_1681 = constant 0 : index + %3209 = cmpi "slt", %3208, %c0_1681 : index + %3210 = addi %3208, %c16_1680 : index + %3211 = select %3209, %3210, %3208 : index + %c0_1682 = constant 0 : index + %c8_1683 = constant 8 : index + %c0_1684 = constant 0 : index + %c-1_1685 = constant -1 : index + %3212 = cmpi "slt", %arg5, %c0_1684 : index + %3213 = subi 
%c-1_1685, %arg5 : index + %3214 = select %3212, %3213, %arg5 : index + %3215 = divi_signed %3214, %c8_1683 : index + %3216 = subi %c-1_1685, %3215 : index + %3217 = select %3212, %3216, %3215 : index + %c8_1686 = constant 8 : index + %3218 = addi %arg5, %c8_1686 : index + %c16_1687 = constant 16 : index + %c0_1688 = constant 0 : index + %c-1_1689 = constant -1 : index + %3219 = cmpi "slt", %3218, %c0_1688 : index + %3220 = subi %c-1_1689, %3218 : index + %3221 = select %3219, %3220, %3218 : index + %3222 = divi_signed %3221, %c16_1687 : index + %3223 = subi %c-1_1689, %3222 : index + %3224 = select %3219, %3223, %3222 : index + %c-2_1690 = constant -2 : index + %3225 = muli %3224, %c-2_1690 : index + %3226 = addi %3217, %3225 : index + %c8_1691 = constant 8 : index + %c0_1692 = constant 0 : index + %c-1_1693 = constant -1 : index + %3227 = cmpi "slt", %arg5, %c0_1692 : index + %3228 = subi %c-1_1693, %arg5 : index + %3229 = select %3227, %3228, %arg5 : index + %3230 = divi_signed %3229, %c8_1691 : index + %3231 = subi %c-1_1693, %3230 : index + %3232 = select %3227, %3231, %3230 : index + %c8_1694 = constant 8 : index + %3233 = addi %arg5, %c8_1694 : index + %c16_1695 = constant 16 : index + %c0_1696 = constant 0 : index + %c-1_1697 = constant -1 : index + %3234 = cmpi "slt", %3233, %c0_1696 : index + %3235 = subi %c-1_1697, %3233 : index + %3236 = select %3234, %3235, %3233 : index + %3237 = divi_signed %3236, %c16_1695 : index + %3238 = subi %c-1_1697, %3237 : index + %3239 = select %3234, %3238, %3237 : index + %c-2_1698 = constant -2 : index + %3240 = muli %3239, %c-2_1698 : index + %3241 = addi %3232, %3240 : index + %c1_1699 = constant 1 : index + %3242 = addi %3241, %c1_1699 : index + %c2_1700 = constant 2 : index + %c0_1701 = constant 0 : index + %c-1_1702 = constant -1 : index + %3243 = cmpi "slt", %3242, %c0_1701 : index + %3244 = subi %c-1_1702, %3242 : index + %3245 = select %3243, %3244, %3242 : index + %3246 = divi_signed %3245, %c2_1700 : index + %3247 = subi %c-1_1702, %3246 : index + %3248 = select %3243, %3247, %3246 : index + %c-2_1703 = constant -2 : index + %3249 = muli %3248, %c-2_1703 : index + %3250 = addi %3226, %3249 : index + %c1_1704 = constant 1 : index + %3251 = addi %3250, %c1_1704 : index + store %3200, %2[%3211, %c0_1682, %3251] : memref<16x6x2xvector<8xf32>> + %c8_1705 = constant 8 : index + %3252 = addi %arg5, %c8_1705 : index + %c16_1706 = constant 16 : index + %c0_1707 = constant 0 : index + %c-1_1708 = constant -1 : index + %3253 = cmpi "slt", %3252, %c0_1707 : index + %3254 = subi %c-1_1708, %3252 : index + %3255 = select %3253, %3254, %3252 : index + %3256 = divi_signed %3255, %c16_1706 : index + %3257 = subi %c-1_1708, %3256 : index + %3258 = select %3253, %3257, %3256 : index + %c16_1709 = constant 16 : index + %3259 = remi_signed %3258, %c16_1709 : index + %c0_1710 = constant 0 : index + %3260 = cmpi "slt", %3259, %c0_1710 : index + %3261 = addi %3259, %c16_1709 : index + %3262 = select %3260, %3261, %3259 : index + %c0_1711 = constant 0 : index + %c8_1712 = constant 8 : index + %c0_1713 = constant 0 : index + %c-1_1714 = constant -1 : index + %3263 = cmpi "slt", %arg5, %c0_1713 : index + %3264 = subi %c-1_1714, %arg5 : index + %3265 = select %3263, %3264, %arg5 : index + %3266 = divi_signed %3265, %c8_1712 : index + %3267 = subi %c-1_1714, %3266 : index + %3268 = select %3263, %3267, %3266 : index + %c8_1715 = constant 8 : index + %3269 = addi %arg5, %c8_1715 : index + %c16_1716 = constant 16 : index + %c0_1717 = constant 0 : index + %c-1_1718 = 
constant -1 : index + %3270 = cmpi "slt", %3269, %c0_1717 : index + %3271 = subi %c-1_1718, %3269 : index + %3272 = select %3270, %3271, %3269 : index + %3273 = divi_signed %3272, %c16_1716 : index + %3274 = subi %c-1_1718, %3273 : index + %3275 = select %3270, %3274, %3273 : index + %c-2_1719 = constant -2 : index + %3276 = muli %3275, %c-2_1719 : index + %3277 = addi %3268, %3276 : index + %c8_1720 = constant 8 : index + %c0_1721 = constant 0 : index + %c-1_1722 = constant -1 : index + %3278 = cmpi "slt", %arg5, %c0_1721 : index + %3279 = subi %c-1_1722, %arg5 : index + %3280 = select %3278, %3279, %arg5 : index + %3281 = divi_signed %3280, %c8_1720 : index + %3282 = subi %c-1_1722, %3281 : index + %3283 = select %3278, %3282, %3281 : index + %c8_1723 = constant 8 : index + %3284 = addi %arg5, %c8_1723 : index + %c16_1724 = constant 16 : index + %c0_1725 = constant 0 : index + %c-1_1726 = constant -1 : index + %3285 = cmpi "slt", %3284, %c0_1725 : index + %3286 = subi %c-1_1726, %3284 : index + %3287 = select %3285, %3286, %3284 : index + %3288 = divi_signed %3287, %c16_1724 : index + %3289 = subi %c-1_1726, %3288 : index + %3290 = select %3285, %3289, %3288 : index + %c-2_1727 = constant -2 : index + %3291 = muli %3290, %c-2_1727 : index + %3292 = addi %3283, %3291 : index + %c1_1728 = constant 1 : index + %3293 = addi %3292, %c1_1728 : index + %c2_1729 = constant 2 : index + %c0_1730 = constant 0 : index + %c-1_1731 = constant -1 : index + %3294 = cmpi "slt", %3293, %c0_1730 : index + %3295 = subi %c-1_1731, %3293 : index + %3296 = select %3294, %3295, %3293 : index + %3297 = divi_signed %3296, %c2_1729 : index + %3298 = subi %c-1_1731, %3297 : index + %3299 = select %3294, %3298, %3297 : index + %c-2_1732 = constant -2 : index + %3300 = muli %3299, %c-2_1732 : index + %3301 = addi %3277, %3300 : index + %c1_1733 = constant 1 : index + %3302 = addi %3301, %c1_1733 : index + %3303 = load %2[%3262, %c0_1711, %3302] : memref<16x6x2xvector<8xf32>> + %3304 = vector.insertelement %2206, %3303[%c2_i64 : i64] : vector<8xf32> + %c8_1734 = constant 8 : index + %3305 = addi %arg5, %c8_1734 : index + %c16_1735 = constant 16 : index + %c0_1736 = constant 0 : index + %c-1_1737 = constant -1 : index + %3306 = cmpi "slt", %3305, %c0_1736 : index + %3307 = subi %c-1_1737, %3305 : index + %3308 = select %3306, %3307, %3305 : index + %3309 = divi_signed %3308, %c16_1735 : index + %3310 = subi %c-1_1737, %3309 : index + %3311 = select %3306, %3310, %3309 : index + %c16_1738 = constant 16 : index + %3312 = remi_signed %3311, %c16_1738 : index + %c0_1739 = constant 0 : index + %3313 = cmpi "slt", %3312, %c0_1739 : index + %3314 = addi %3312, %c16_1738 : index + %3315 = select %3313, %3314, %3312 : index + %c0_1740 = constant 0 : index + %c8_1741 = constant 8 : index + %c0_1742 = constant 0 : index + %c-1_1743 = constant -1 : index + %3316 = cmpi "slt", %arg5, %c0_1742 : index + %3317 = subi %c-1_1743, %arg5 : index + %3318 = select %3316, %3317, %arg5 : index + %3319 = divi_signed %3318, %c8_1741 : index + %3320 = subi %c-1_1743, %3319 : index + %3321 = select %3316, %3320, %3319 : index + %c8_1744 = constant 8 : index + %3322 = addi %arg5, %c8_1744 : index + %c16_1745 = constant 16 : index + %c0_1746 = constant 0 : index + %c-1_1747 = constant -1 : index + %3323 = cmpi "slt", %3322, %c0_1746 : index + %3324 = subi %c-1_1747, %3322 : index + %3325 = select %3323, %3324, %3322 : index + %3326 = divi_signed %3325, %c16_1745 : index + %3327 = subi %c-1_1747, %3326 : index + %3328 = select %3323, %3327, %3326 : 
index + %c-2_1748 = constant -2 : index + %3329 = muli %3328, %c-2_1748 : index + %3330 = addi %3321, %3329 : index + %c8_1749 = constant 8 : index + %c0_1750 = constant 0 : index + %c-1_1751 = constant -1 : index + %3331 = cmpi "slt", %arg5, %c0_1750 : index + %3332 = subi %c-1_1751, %arg5 : index + %3333 = select %3331, %3332, %arg5 : index + %3334 = divi_signed %3333, %c8_1749 : index + %3335 = subi %c-1_1751, %3334 : index + %3336 = select %3331, %3335, %3334 : index + %c8_1752 = constant 8 : index + %3337 = addi %arg5, %c8_1752 : index + %c16_1753 = constant 16 : index + %c0_1754 = constant 0 : index + %c-1_1755 = constant -1 : index + %3338 = cmpi "slt", %3337, %c0_1754 : index + %3339 = subi %c-1_1755, %3337 : index + %3340 = select %3338, %3339, %3337 : index + %3341 = divi_signed %3340, %c16_1753 : index + %3342 = subi %c-1_1755, %3341 : index + %3343 = select %3338, %3342, %3341 : index + %c-2_1756 = constant -2 : index + %3344 = muli %3343, %c-2_1756 : index + %3345 = addi %3336, %3344 : index + %c1_1757 = constant 1 : index + %3346 = addi %3345, %c1_1757 : index + %c2_1758 = constant 2 : index + %c0_1759 = constant 0 : index + %c-1_1760 = constant -1 : index + %3347 = cmpi "slt", %3346, %c0_1759 : index + %3348 = subi %c-1_1760, %3346 : index + %3349 = select %3347, %3348, %3346 : index + %3350 = divi_signed %3349, %c2_1758 : index + %3351 = subi %c-1_1760, %3350 : index + %3352 = select %3347, %3351, %3350 : index + %c-2_1761 = constant -2 : index + %3353 = muli %3352, %c-2_1761 : index + %3354 = addi %3330, %3353 : index + %c1_1762 = constant 1 : index + %3355 = addi %3354, %c1_1762 : index + store %3304, %2[%3315, %c0_1740, %3355] : memref<16x6x2xvector<8xf32>> + %c8_1763 = constant 8 : index + %3356 = addi %arg5, %c8_1763 : index + %c16_1764 = constant 16 : index + %c0_1765 = constant 0 : index + %c-1_1766 = constant -1 : index + %3357 = cmpi "slt", %3356, %c0_1765 : index + %3358 = subi %c-1_1766, %3356 : index + %3359 = select %3357, %3358, %3356 : index + %3360 = divi_signed %3359, %c16_1764 : index + %3361 = subi %c-1_1766, %3360 : index + %3362 = select %3357, %3361, %3360 : index + %c16_1767 = constant 16 : index + %3363 = remi_signed %3362, %c16_1767 : index + %c0_1768 = constant 0 : index + %3364 = cmpi "slt", %3363, %c0_1768 : index + %3365 = addi %3363, %c16_1767 : index + %3366 = select %3364, %3365, %3363 : index + %c0_1769 = constant 0 : index + %c8_1770 = constant 8 : index + %c0_1771 = constant 0 : index + %c-1_1772 = constant -1 : index + %3367 = cmpi "slt", %arg5, %c0_1771 : index + %3368 = subi %c-1_1772, %arg5 : index + %3369 = select %3367, %3368, %arg5 : index + %3370 = divi_signed %3369, %c8_1770 : index + %3371 = subi %c-1_1772, %3370 : index + %3372 = select %3367, %3371, %3370 : index + %c8_1773 = constant 8 : index + %3373 = addi %arg5, %c8_1773 : index + %c16_1774 = constant 16 : index + %c0_1775 = constant 0 : index + %c-1_1776 = constant -1 : index + %3374 = cmpi "slt", %3373, %c0_1775 : index + %3375 = subi %c-1_1776, %3373 : index + %3376 = select %3374, %3375, %3373 : index + %3377 = divi_signed %3376, %c16_1774 : index + %3378 = subi %c-1_1776, %3377 : index + %3379 = select %3374, %3378, %3377 : index + %c-2_1777 = constant -2 : index + %3380 = muli %3379, %c-2_1777 : index + %3381 = addi %3372, %3380 : index + %c8_1778 = constant 8 : index + %c0_1779 = constant 0 : index + %c-1_1780 = constant -1 : index + %3382 = cmpi "slt", %arg5, %c0_1779 : index + %3383 = subi %c-1_1780, %arg5 : index + %3384 = select %3382, %3383, %arg5 : index + 
%3385 = divi_signed %3384, %c8_1778 : index + %3386 = subi %c-1_1780, %3385 : index + %3387 = select %3382, %3386, %3385 : index + %c8_1781 = constant 8 : index + %3388 = addi %arg5, %c8_1781 : index + %c16_1782 = constant 16 : index + %c0_1783 = constant 0 : index + %c-1_1784 = constant -1 : index + %3389 = cmpi "slt", %3388, %c0_1783 : index + %3390 = subi %c-1_1784, %3388 : index + %3391 = select %3389, %3390, %3388 : index + %3392 = divi_signed %3391, %c16_1782 : index + %3393 = subi %c-1_1784, %3392 : index + %3394 = select %3389, %3393, %3392 : index + %c-2_1785 = constant -2 : index + %3395 = muli %3394, %c-2_1785 : index + %3396 = addi %3387, %3395 : index + %c1_1786 = constant 1 : index + %3397 = addi %3396, %c1_1786 : index + %c2_1787 = constant 2 : index + %c0_1788 = constant 0 : index + %c-1_1789 = constant -1 : index + %3398 = cmpi "slt", %3397, %c0_1788 : index + %3399 = subi %c-1_1789, %3397 : index + %3400 = select %3398, %3399, %3397 : index + %3401 = divi_signed %3400, %c2_1787 : index + %3402 = subi %c-1_1789, %3401 : index + %3403 = select %3398, %3402, %3401 : index + %c-2_1790 = constant -2 : index + %3404 = muli %3403, %c-2_1790 : index + %3405 = addi %3381, %3404 : index + %c1_1791 = constant 1 : index + %3406 = addi %3405, %c1_1791 : index + %3407 = load %2[%3366, %c0_1769, %3406] : memref<16x6x2xvector<8xf32>> + %3408 = vector.insertelement %2207, %3407[%c3_i64 : i64] : vector<8xf32> + %c8_1792 = constant 8 : index + %3409 = addi %arg5, %c8_1792 : index + %c16_1793 = constant 16 : index + %c0_1794 = constant 0 : index + %c-1_1795 = constant -1 : index + %3410 = cmpi "slt", %3409, %c0_1794 : index + %3411 = subi %c-1_1795, %3409 : index + %3412 = select %3410, %3411, %3409 : index + %3413 = divi_signed %3412, %c16_1793 : index + %3414 = subi %c-1_1795, %3413 : index + %3415 = select %3410, %3414, %3413 : index + %c16_1796 = constant 16 : index + %3416 = remi_signed %3415, %c16_1796 : index + %c0_1797 = constant 0 : index + %3417 = cmpi "slt", %3416, %c0_1797 : index + %3418 = addi %3416, %c16_1796 : index + %3419 = select %3417, %3418, %3416 : index + %c0_1798 = constant 0 : index + %c8_1799 = constant 8 : index + %c0_1800 = constant 0 : index + %c-1_1801 = constant -1 : index + %3420 = cmpi "slt", %arg5, %c0_1800 : index + %3421 = subi %c-1_1801, %arg5 : index + %3422 = select %3420, %3421, %arg5 : index + %3423 = divi_signed %3422, %c8_1799 : index + %3424 = subi %c-1_1801, %3423 : index + %3425 = select %3420, %3424, %3423 : index + %c8_1802 = constant 8 : index + %3426 = addi %arg5, %c8_1802 : index + %c16_1803 = constant 16 : index + %c0_1804 = constant 0 : index + %c-1_1805 = constant -1 : index + %3427 = cmpi "slt", %3426, %c0_1804 : index + %3428 = subi %c-1_1805, %3426 : index + %3429 = select %3427, %3428, %3426 : index + %3430 = divi_signed %3429, %c16_1803 : index + %3431 = subi %c-1_1805, %3430 : index + %3432 = select %3427, %3431, %3430 : index + %c-2_1806 = constant -2 : index + %3433 = muli %3432, %c-2_1806 : index + %3434 = addi %3425, %3433 : index + %c8_1807 = constant 8 : index + %c0_1808 = constant 0 : index + %c-1_1809 = constant -1 : index + %3435 = cmpi "slt", %arg5, %c0_1808 : index + %3436 = subi %c-1_1809, %arg5 : index + %3437 = select %3435, %3436, %arg5 : index + %3438 = divi_signed %3437, %c8_1807 : index + %3439 = subi %c-1_1809, %3438 : index + %3440 = select %3435, %3439, %3438 : index + %c8_1810 = constant 8 : index + %3441 = addi %arg5, %c8_1810 : index + %c16_1811 = constant 16 : index + %c0_1812 = constant 0 : index + 
%c-1_1813 = constant -1 : index + %3442 = cmpi "slt", %3441, %c0_1812 : index + %3443 = subi %c-1_1813, %3441 : index + %3444 = select %3442, %3443, %3441 : index + %3445 = divi_signed %3444, %c16_1811 : index + %3446 = subi %c-1_1813, %3445 : index + %3447 = select %3442, %3446, %3445 : index + %c-2_1814 = constant -2 : index + %3448 = muli %3447, %c-2_1814 : index + %3449 = addi %3440, %3448 : index + %c1_1815 = constant 1 : index + %3450 = addi %3449, %c1_1815 : index + %c2_1816 = constant 2 : index + %c0_1817 = constant 0 : index + %c-1_1818 = constant -1 : index + %3451 = cmpi "slt", %3450, %c0_1817 : index + %3452 = subi %c-1_1818, %3450 : index + %3453 = select %3451, %3452, %3450 : index + %3454 = divi_signed %3453, %c2_1816 : index + %3455 = subi %c-1_1818, %3454 : index + %3456 = select %3451, %3455, %3454 : index + %c-2_1819 = constant -2 : index + %3457 = muli %3456, %c-2_1819 : index + %3458 = addi %3434, %3457 : index + %c1_1820 = constant 1 : index + %3459 = addi %3458, %c1_1820 : index + store %3408, %2[%3419, %c0_1798, %3459] : memref<16x6x2xvector<8xf32>> + %c8_1821 = constant 8 : index + %3460 = addi %arg5, %c8_1821 : index + %c16_1822 = constant 16 : index + %c0_1823 = constant 0 : index + %c-1_1824 = constant -1 : index + %3461 = cmpi "slt", %3460, %c0_1823 : index + %3462 = subi %c-1_1824, %3460 : index + %3463 = select %3461, %3462, %3460 : index + %3464 = divi_signed %3463, %c16_1822 : index + %3465 = subi %c-1_1824, %3464 : index + %3466 = select %3461, %3465, %3464 : index + %c16_1825 = constant 16 : index + %3467 = remi_signed %3466, %c16_1825 : index + %c0_1826 = constant 0 : index + %3468 = cmpi "slt", %3467, %c0_1826 : index + %3469 = addi %3467, %c16_1825 : index + %3470 = select %3468, %3469, %3467 : index + %c0_1827 = constant 0 : index + %c8_1828 = constant 8 : index + %c0_1829 = constant 0 : index + %c-1_1830 = constant -1 : index + %3471 = cmpi "slt", %arg5, %c0_1829 : index + %3472 = subi %c-1_1830, %arg5 : index + %3473 = select %3471, %3472, %arg5 : index + %3474 = divi_signed %3473, %c8_1828 : index + %3475 = subi %c-1_1830, %3474 : index + %3476 = select %3471, %3475, %3474 : index + %c8_1831 = constant 8 : index + %3477 = addi %arg5, %c8_1831 : index + %c16_1832 = constant 16 : index + %c0_1833 = constant 0 : index + %c-1_1834 = constant -1 : index + %3478 = cmpi "slt", %3477, %c0_1833 : index + %3479 = subi %c-1_1834, %3477 : index + %3480 = select %3478, %3479, %3477 : index + %3481 = divi_signed %3480, %c16_1832 : index + %3482 = subi %c-1_1834, %3481 : index + %3483 = select %3478, %3482, %3481 : index + %c-2_1835 = constant -2 : index + %3484 = muli %3483, %c-2_1835 : index + %3485 = addi %3476, %3484 : index + %c8_1836 = constant 8 : index + %c0_1837 = constant 0 : index + %c-1_1838 = constant -1 : index + %3486 = cmpi "slt", %arg5, %c0_1837 : index + %3487 = subi %c-1_1838, %arg5 : index + %3488 = select %3486, %3487, %arg5 : index + %3489 = divi_signed %3488, %c8_1836 : index + %3490 = subi %c-1_1838, %3489 : index + %3491 = select %3486, %3490, %3489 : index + %c8_1839 = constant 8 : index + %3492 = addi %arg5, %c8_1839 : index + %c16_1840 = constant 16 : index + %c0_1841 = constant 0 : index + %c-1_1842 = constant -1 : index + %3493 = cmpi "slt", %3492, %c0_1841 : index + %3494 = subi %c-1_1842, %3492 : index + %3495 = select %3493, %3494, %3492 : index + %3496 = divi_signed %3495, %c16_1840 : index + %3497 = subi %c-1_1842, %3496 : index + %3498 = select %3493, %3497, %3496 : index + %c-2_1843 = constant -2 : index + %3499 = muli %3498, 
%c-2_1843 : index + %3500 = addi %3491, %3499 : index + %c1_1844 = constant 1 : index + %3501 = addi %3500, %c1_1844 : index + %c2_1845 = constant 2 : index + %c0_1846 = constant 0 : index + %c-1_1847 = constant -1 : index + %3502 = cmpi "slt", %3501, %c0_1846 : index + %3503 = subi %c-1_1847, %3501 : index + %3504 = select %3502, %3503, %3501 : index + %3505 = divi_signed %3504, %c2_1845 : index + %3506 = subi %c-1_1847, %3505 : index + %3507 = select %3502, %3506, %3505 : index + %c-2_1848 = constant -2 : index + %3508 = muli %3507, %c-2_1848 : index + %3509 = addi %3485, %3508 : index + %c1_1849 = constant 1 : index + %3510 = addi %3509, %c1_1849 : index + %3511 = load %2[%3470, %c0_1827, %3510] : memref<16x6x2xvector<8xf32>> + %3512 = vector.insertelement %2208, %3511[%c4_i64 : i64] : vector<8xf32> + %c8_1850 = constant 8 : index + %3513 = addi %arg5, %c8_1850 : index + %c16_1851 = constant 16 : index + %c0_1852 = constant 0 : index + %c-1_1853 = constant -1 : index + %3514 = cmpi "slt", %3513, %c0_1852 : index + %3515 = subi %c-1_1853, %3513 : index + %3516 = select %3514, %3515, %3513 : index + %3517 = divi_signed %3516, %c16_1851 : index + %3518 = subi %c-1_1853, %3517 : index + %3519 = select %3514, %3518, %3517 : index + %c16_1854 = constant 16 : index + %3520 = remi_signed %3519, %c16_1854 : index + %c0_1855 = constant 0 : index + %3521 = cmpi "slt", %3520, %c0_1855 : index + %3522 = addi %3520, %c16_1854 : index + %3523 = select %3521, %3522, %3520 : index + %c0_1856 = constant 0 : index + %c8_1857 = constant 8 : index + %c0_1858 = constant 0 : index + %c-1_1859 = constant -1 : index + %3524 = cmpi "slt", %arg5, %c0_1858 : index + %3525 = subi %c-1_1859, %arg5 : index + %3526 = select %3524, %3525, %arg5 : index + %3527 = divi_signed %3526, %c8_1857 : index + %3528 = subi %c-1_1859, %3527 : index + %3529 = select %3524, %3528, %3527 : index + %c8_1860 = constant 8 : index + %3530 = addi %arg5, %c8_1860 : index + %c16_1861 = constant 16 : index + %c0_1862 = constant 0 : index + %c-1_1863 = constant -1 : index + %3531 = cmpi "slt", %3530, %c0_1862 : index + %3532 = subi %c-1_1863, %3530 : index + %3533 = select %3531, %3532, %3530 : index + %3534 = divi_signed %3533, %c16_1861 : index + %3535 = subi %c-1_1863, %3534 : index + %3536 = select %3531, %3535, %3534 : index + %c-2_1864 = constant -2 : index + %3537 = muli %3536, %c-2_1864 : index + %3538 = addi %3529, %3537 : index + %c8_1865 = constant 8 : index + %c0_1866 = constant 0 : index + %c-1_1867 = constant -1 : index + %3539 = cmpi "slt", %arg5, %c0_1866 : index + %3540 = subi %c-1_1867, %arg5 : index + %3541 = select %3539, %3540, %arg5 : index + %3542 = divi_signed %3541, %c8_1865 : index + %3543 = subi %c-1_1867, %3542 : index + %3544 = select %3539, %3543, %3542 : index + %c8_1868 = constant 8 : index + %3545 = addi %arg5, %c8_1868 : index + %c16_1869 = constant 16 : index + %c0_1870 = constant 0 : index + %c-1_1871 = constant -1 : index + %3546 = cmpi "slt", %3545, %c0_1870 : index + %3547 = subi %c-1_1871, %3545 : index + %3548 = select %3546, %3547, %3545 : index + %3549 = divi_signed %3548, %c16_1869 : index + %3550 = subi %c-1_1871, %3549 : index + %3551 = select %3546, %3550, %3549 : index + %c-2_1872 = constant -2 : index + %3552 = muli %3551, %c-2_1872 : index + %3553 = addi %3544, %3552 : index + %c1_1873 = constant 1 : index + %3554 = addi %3553, %c1_1873 : index + %c2_1874 = constant 2 : index + %c0_1875 = constant 0 : index + %c-1_1876 = constant -1 : index + %3555 = cmpi "slt", %3554, %c0_1875 : index + %3556 
= subi %c-1_1876, %3554 : index + %3557 = select %3555, %3556, %3554 : index + %3558 = divi_signed %3557, %c2_1874 : index + %3559 = subi %c-1_1876, %3558 : index + %3560 = select %3555, %3559, %3558 : index + %c-2_1877 = constant -2 : index + %3561 = muli %3560, %c-2_1877 : index + %3562 = addi %3538, %3561 : index + %c1_1878 = constant 1 : index + %3563 = addi %3562, %c1_1878 : index + store %3512, %2[%3523, %c0_1856, %3563] : memref<16x6x2xvector<8xf32>> + %c8_1879 = constant 8 : index + %3564 = addi %arg5, %c8_1879 : index + %c16_1880 = constant 16 : index + %c0_1881 = constant 0 : index + %c-1_1882 = constant -1 : index + %3565 = cmpi "slt", %3564, %c0_1881 : index + %3566 = subi %c-1_1882, %3564 : index + %3567 = select %3565, %3566, %3564 : index + %3568 = divi_signed %3567, %c16_1880 : index + %3569 = subi %c-1_1882, %3568 : index + %3570 = select %3565, %3569, %3568 : index + %c16_1883 = constant 16 : index + %3571 = remi_signed %3570, %c16_1883 : index + %c0_1884 = constant 0 : index + %3572 = cmpi "slt", %3571, %c0_1884 : index + %3573 = addi %3571, %c16_1883 : index + %3574 = select %3572, %3573, %3571 : index + %c0_1885 = constant 0 : index + %c8_1886 = constant 8 : index + %c0_1887 = constant 0 : index + %c-1_1888 = constant -1 : index + %3575 = cmpi "slt", %arg5, %c0_1887 : index + %3576 = subi %c-1_1888, %arg5 : index + %3577 = select %3575, %3576, %arg5 : index + %3578 = divi_signed %3577, %c8_1886 : index + %3579 = subi %c-1_1888, %3578 : index + %3580 = select %3575, %3579, %3578 : index + %c8_1889 = constant 8 : index + %3581 = addi %arg5, %c8_1889 : index + %c16_1890 = constant 16 : index + %c0_1891 = constant 0 : index + %c-1_1892 = constant -1 : index + %3582 = cmpi "slt", %3581, %c0_1891 : index + %3583 = subi %c-1_1892, %3581 : index + %3584 = select %3582, %3583, %3581 : index + %3585 = divi_signed %3584, %c16_1890 : index + %3586 = subi %c-1_1892, %3585 : index + %3587 = select %3582, %3586, %3585 : index + %c-2_1893 = constant -2 : index + %3588 = muli %3587, %c-2_1893 : index + %3589 = addi %3580, %3588 : index + %c8_1894 = constant 8 : index + %c0_1895 = constant 0 : index + %c-1_1896 = constant -1 : index + %3590 = cmpi "slt", %arg5, %c0_1895 : index + %3591 = subi %c-1_1896, %arg5 : index + %3592 = select %3590, %3591, %arg5 : index + %3593 = divi_signed %3592, %c8_1894 : index + %3594 = subi %c-1_1896, %3593 : index + %3595 = select %3590, %3594, %3593 : index + %c8_1897 = constant 8 : index + %3596 = addi %arg5, %c8_1897 : index + %c16_1898 = constant 16 : index + %c0_1899 = constant 0 : index + %c-1_1900 = constant -1 : index + %3597 = cmpi "slt", %3596, %c0_1899 : index + %3598 = subi %c-1_1900, %3596 : index + %3599 = select %3597, %3598, %3596 : index + %3600 = divi_signed %3599, %c16_1898 : index + %3601 = subi %c-1_1900, %3600 : index + %3602 = select %3597, %3601, %3600 : index + %c-2_1901 = constant -2 : index + %3603 = muli %3602, %c-2_1901 : index + %3604 = addi %3595, %3603 : index + %c1_1902 = constant 1 : index + %3605 = addi %3604, %c1_1902 : index + %c2_1903 = constant 2 : index + %c0_1904 = constant 0 : index + %c-1_1905 = constant -1 : index + %3606 = cmpi "slt", %3605, %c0_1904 : index + %3607 = subi %c-1_1905, %3605 : index + %3608 = select %3606, %3607, %3605 : index + %3609 = divi_signed %3608, %c2_1903 : index + %3610 = subi %c-1_1905, %3609 : index + %3611 = select %3606, %3610, %3609 : index + %c-2_1906 = constant -2 : index + %3612 = muli %3611, %c-2_1906 : index + %3613 = addi %3589, %3612 : index + %c1_1907 = constant 1 : index + 
%3614 = addi %3613, %c1_1907 : index + %3615 = load %2[%3574, %c0_1885, %3614] : memref<16x6x2xvector<8xf32>> + %3616 = vector.insertelement %2209, %3615[%c5_i64 : i64] : vector<8xf32> + %c8_1908 = constant 8 : index + %3617 = addi %arg5, %c8_1908 : index + %c16_1909 = constant 16 : index + %c0_1910 = constant 0 : index + %c-1_1911 = constant -1 : index + %3618 = cmpi "slt", %3617, %c0_1910 : index + %3619 = subi %c-1_1911, %3617 : index + %3620 = select %3618, %3619, %3617 : index + %3621 = divi_signed %3620, %c16_1909 : index + %3622 = subi %c-1_1911, %3621 : index + %3623 = select %3618, %3622, %3621 : index + %c16_1912 = constant 16 : index + %3624 = remi_signed %3623, %c16_1912 : index + %c0_1913 = constant 0 : index + %3625 = cmpi "slt", %3624, %c0_1913 : index + %3626 = addi %3624, %c16_1912 : index + %3627 = select %3625, %3626, %3624 : index + %c0_1914 = constant 0 : index + %c8_1915 = constant 8 : index + %c0_1916 = constant 0 : index + %c-1_1917 = constant -1 : index + %3628 = cmpi "slt", %arg5, %c0_1916 : index + %3629 = subi %c-1_1917, %arg5 : index + %3630 = select %3628, %3629, %arg5 : index + %3631 = divi_signed %3630, %c8_1915 : index + %3632 = subi %c-1_1917, %3631 : index + %3633 = select %3628, %3632, %3631 : index + %c8_1918 = constant 8 : index + %3634 = addi %arg5, %c8_1918 : index + %c16_1919 = constant 16 : index + %c0_1920 = constant 0 : index + %c-1_1921 = constant -1 : index + %3635 = cmpi "slt", %3634, %c0_1920 : index + %3636 = subi %c-1_1921, %3634 : index + %3637 = select %3635, %3636, %3634 : index + %3638 = divi_signed %3637, %c16_1919 : index + %3639 = subi %c-1_1921, %3638 : index + %3640 = select %3635, %3639, %3638 : index + %c-2_1922 = constant -2 : index + %3641 = muli %3640, %c-2_1922 : index + %3642 = addi %3633, %3641 : index + %c8_1923 = constant 8 : index + %c0_1924 = constant 0 : index + %c-1_1925 = constant -1 : index + %3643 = cmpi "slt", %arg5, %c0_1924 : index + %3644 = subi %c-1_1925, %arg5 : index + %3645 = select %3643, %3644, %arg5 : index + %3646 = divi_signed %3645, %c8_1923 : index + %3647 = subi %c-1_1925, %3646 : index + %3648 = select %3643, %3647, %3646 : index + %c8_1926 = constant 8 : index + %3649 = addi %arg5, %c8_1926 : index + %c16_1927 = constant 16 : index + %c0_1928 = constant 0 : index + %c-1_1929 = constant -1 : index + %3650 = cmpi "slt", %3649, %c0_1928 : index + %3651 = subi %c-1_1929, %3649 : index + %3652 = select %3650, %3651, %3649 : index + %3653 = divi_signed %3652, %c16_1927 : index + %3654 = subi %c-1_1929, %3653 : index + %3655 = select %3650, %3654, %3653 : index + %c-2_1930 = constant -2 : index + %3656 = muli %3655, %c-2_1930 : index + %3657 = addi %3648, %3656 : index + %c1_1931 = constant 1 : index + %3658 = addi %3657, %c1_1931 : index + %c2_1932 = constant 2 : index + %c0_1933 = constant 0 : index + %c-1_1934 = constant -1 : index + %3659 = cmpi "slt", %3658, %c0_1933 : index + %3660 = subi %c-1_1934, %3658 : index + %3661 = select %3659, %3660, %3658 : index + %3662 = divi_signed %3661, %c2_1932 : index + %3663 = subi %c-1_1934, %3662 : index + %3664 = select %3659, %3663, %3662 : index + %c-2_1935 = constant -2 : index + %3665 = muli %3664, %c-2_1935 : index + %3666 = addi %3642, %3665 : index + %c1_1936 = constant 1 : index + %3667 = addi %3666, %c1_1936 : index + store %3616, %2[%3627, %c0_1914, %3667] : memref<16x6x2xvector<8xf32>> + %c8_1937 = constant 8 : index + %3668 = addi %arg5, %c8_1937 : index + %c16_1938 = constant 16 : index + %c0_1939 = constant 0 : index + %c-1_1940 = constant -1 : 
index + %3669 = cmpi "slt", %3668, %c0_1939 : index + %3670 = subi %c-1_1940, %3668 : index + %3671 = select %3669, %3670, %3668 : index + %3672 = divi_signed %3671, %c16_1938 : index + %3673 = subi %c-1_1940, %3672 : index + %3674 = select %3669, %3673, %3672 : index + %c16_1941 = constant 16 : index + %3675 = remi_signed %3674, %c16_1941 : index + %c0_1942 = constant 0 : index + %3676 = cmpi "slt", %3675, %c0_1942 : index + %3677 = addi %3675, %c16_1941 : index + %3678 = select %3676, %3677, %3675 : index + %c0_1943 = constant 0 : index + %c8_1944 = constant 8 : index + %c0_1945 = constant 0 : index + %c-1_1946 = constant -1 : index + %3679 = cmpi "slt", %arg5, %c0_1945 : index + %3680 = subi %c-1_1946, %arg5 : index + %3681 = select %3679, %3680, %arg5 : index + %3682 = divi_signed %3681, %c8_1944 : index + %3683 = subi %c-1_1946, %3682 : index + %3684 = select %3679, %3683, %3682 : index + %c8_1947 = constant 8 : index + %3685 = addi %arg5, %c8_1947 : index + %c16_1948 = constant 16 : index + %c0_1949 = constant 0 : index + %c-1_1950 = constant -1 : index + %3686 = cmpi "slt", %3685, %c0_1949 : index + %3687 = subi %c-1_1950, %3685 : index + %3688 = select %3686, %3687, %3685 : index + %3689 = divi_signed %3688, %c16_1948 : index + %3690 = subi %c-1_1950, %3689 : index + %3691 = select %3686, %3690, %3689 : index + %c-2_1951 = constant -2 : index + %3692 = muli %3691, %c-2_1951 : index + %3693 = addi %3684, %3692 : index + %c8_1952 = constant 8 : index + %c0_1953 = constant 0 : index + %c-1_1954 = constant -1 : index + %3694 = cmpi "slt", %arg5, %c0_1953 : index + %3695 = subi %c-1_1954, %arg5 : index + %3696 = select %3694, %3695, %arg5 : index + %3697 = divi_signed %3696, %c8_1952 : index + %3698 = subi %c-1_1954, %3697 : index + %3699 = select %3694, %3698, %3697 : index + %c8_1955 = constant 8 : index + %3700 = addi %arg5, %c8_1955 : index + %c16_1956 = constant 16 : index + %c0_1957 = constant 0 : index + %c-1_1958 = constant -1 : index + %3701 = cmpi "slt", %3700, %c0_1957 : index + %3702 = subi %c-1_1958, %3700 : index + %3703 = select %3701, %3702, %3700 : index + %3704 = divi_signed %3703, %c16_1956 : index + %3705 = subi %c-1_1958, %3704 : index + %3706 = select %3701, %3705, %3704 : index + %c-2_1959 = constant -2 : index + %3707 = muli %3706, %c-2_1959 : index + %3708 = addi %3699, %3707 : index + %c1_1960 = constant 1 : index + %3709 = addi %3708, %c1_1960 : index + %c2_1961 = constant 2 : index + %c0_1962 = constant 0 : index + %c-1_1963 = constant -1 : index + %3710 = cmpi "slt", %3709, %c0_1962 : index + %3711 = subi %c-1_1963, %3709 : index + %3712 = select %3710, %3711, %3709 : index + %3713 = divi_signed %3712, %c2_1961 : index + %3714 = subi %c-1_1963, %3713 : index + %3715 = select %3710, %3714, %3713 : index + %c-2_1964 = constant -2 : index + %3716 = muli %3715, %c-2_1964 : index + %3717 = addi %3693, %3716 : index + %c1_1965 = constant 1 : index + %3718 = addi %3717, %c1_1965 : index + %3719 = load %2[%3678, %c0_1943, %3718] : memref<16x6x2xvector<8xf32>> + %3720 = vector.insertelement %2210, %3719[%c6_i64 : i64] : vector<8xf32> + %c8_1966 = constant 8 : index + %3721 = addi %arg5, %c8_1966 : index + %c16_1967 = constant 16 : index + %c0_1968 = constant 0 : index + %c-1_1969 = constant -1 : index + %3722 = cmpi "slt", %3721, %c0_1968 : index + %3723 = subi %c-1_1969, %3721 : index + %3724 = select %3722, %3723, %3721 : index + %3725 = divi_signed %3724, %c16_1967 : index + %3726 = subi %c-1_1969, %3725 : index + %3727 = select %3722, %3726, %3725 : index + 
%c16_1970 = constant 16 : index + %3728 = remi_signed %3727, %c16_1970 : index + %c0_1971 = constant 0 : index + %3729 = cmpi "slt", %3728, %c0_1971 : index + %3730 = addi %3728, %c16_1970 : index + %3731 = select %3729, %3730, %3728 : index + %c0_1972 = constant 0 : index + %c8_1973 = constant 8 : index + %c0_1974 = constant 0 : index + %c-1_1975 = constant -1 : index + %3732 = cmpi "slt", %arg5, %c0_1974 : index + %3733 = subi %c-1_1975, %arg5 : index + %3734 = select %3732, %3733, %arg5 : index + %3735 = divi_signed %3734, %c8_1973 : index + %3736 = subi %c-1_1975, %3735 : index + %3737 = select %3732, %3736, %3735 : index + %c8_1976 = constant 8 : index + %3738 = addi %arg5, %c8_1976 : index + %c16_1977 = constant 16 : index + %c0_1978 = constant 0 : index + %c-1_1979 = constant -1 : index + %3739 = cmpi "slt", %3738, %c0_1978 : index + %3740 = subi %c-1_1979, %3738 : index + %3741 = select %3739, %3740, %3738 : index + %3742 = divi_signed %3741, %c16_1977 : index + %3743 = subi %c-1_1979, %3742 : index + %3744 = select %3739, %3743, %3742 : index + %c-2_1980 = constant -2 : index + %3745 = muli %3744, %c-2_1980 : index + %3746 = addi %3737, %3745 : index + %c8_1981 = constant 8 : index + %c0_1982 = constant 0 : index + %c-1_1983 = constant -1 : index + %3747 = cmpi "slt", %arg5, %c0_1982 : index + %3748 = subi %c-1_1983, %arg5 : index + %3749 = select %3747, %3748, %arg5 : index + %3750 = divi_signed %3749, %c8_1981 : index + %3751 = subi %c-1_1983, %3750 : index + %3752 = select %3747, %3751, %3750 : index + %c8_1984 = constant 8 : index + %3753 = addi %arg5, %c8_1984 : index + %c16_1985 = constant 16 : index + %c0_1986 = constant 0 : index + %c-1_1987 = constant -1 : index + %3754 = cmpi "slt", %3753, %c0_1986 : index + %3755 = subi %c-1_1987, %3753 : index + %3756 = select %3754, %3755, %3753 : index + %3757 = divi_signed %3756, %c16_1985 : index + %3758 = subi %c-1_1987, %3757 : index + %3759 = select %3754, %3758, %3757 : index + %c-2_1988 = constant -2 : index + %3760 = muli %3759, %c-2_1988 : index + %3761 = addi %3752, %3760 : index + %c1_1989 = constant 1 : index + %3762 = addi %3761, %c1_1989 : index + %c2_1990 = constant 2 : index + %c0_1991 = constant 0 : index + %c-1_1992 = constant -1 : index + %3763 = cmpi "slt", %3762, %c0_1991 : index + %3764 = subi %c-1_1992, %3762 : index + %3765 = select %3763, %3764, %3762 : index + %3766 = divi_signed %3765, %c2_1990 : index + %3767 = subi %c-1_1992, %3766 : index + %3768 = select %3763, %3767, %3766 : index + %c-2_1993 = constant -2 : index + %3769 = muli %3768, %c-2_1993 : index + %3770 = addi %3746, %3769 : index + %c1_1994 = constant 1 : index + %3771 = addi %3770, %c1_1994 : index + store %3720, %2[%3731, %c0_1972, %3771] : memref<16x6x2xvector<8xf32>> + %c8_1995 = constant 8 : index + %3772 = addi %arg5, %c8_1995 : index + %c16_1996 = constant 16 : index + %c0_1997 = constant 0 : index + %c-1_1998 = constant -1 : index + %3773 = cmpi "slt", %3772, %c0_1997 : index + %3774 = subi %c-1_1998, %3772 : index + %3775 = select %3773, %3774, %3772 : index + %3776 = divi_signed %3775, %c16_1996 : index + %3777 = subi %c-1_1998, %3776 : index + %3778 = select %3773, %3777, %3776 : index + %c16_1999 = constant 16 : index + %3779 = remi_signed %3778, %c16_1999 : index + %c0_2000 = constant 0 : index + %3780 = cmpi "slt", %3779, %c0_2000 : index + %3781 = addi %3779, %c16_1999 : index + %3782 = select %3780, %3781, %3779 : index + %c0_2001 = constant 0 : index + %c8_2002 = constant 8 : index + %c0_2003 = constant 0 : index + %c-1_2004 = 
constant -1 : index + %3783 = cmpi "slt", %arg5, %c0_2003 : index + %3784 = subi %c-1_2004, %arg5 : index + %3785 = select %3783, %3784, %arg5 : index + %3786 = divi_signed %3785, %c8_2002 : index + %3787 = subi %c-1_2004, %3786 : index + %3788 = select %3783, %3787, %3786 : index + %c8_2005 = constant 8 : index + %3789 = addi %arg5, %c8_2005 : index + %c16_2006 = constant 16 : index + %c0_2007 = constant 0 : index + %c-1_2008 = constant -1 : index + %3790 = cmpi "slt", %3789, %c0_2007 : index + %3791 = subi %c-1_2008, %3789 : index + %3792 = select %3790, %3791, %3789 : index + %3793 = divi_signed %3792, %c16_2006 : index + %3794 = subi %c-1_2008, %3793 : index + %3795 = select %3790, %3794, %3793 : index + %c-2_2009 = constant -2 : index + %3796 = muli %3795, %c-2_2009 : index + %3797 = addi %3788, %3796 : index + %c8_2010 = constant 8 : index + %c0_2011 = constant 0 : index + %c-1_2012 = constant -1 : index + %3798 = cmpi "slt", %arg5, %c0_2011 : index + %3799 = subi %c-1_2012, %arg5 : index + %3800 = select %3798, %3799, %arg5 : index + %3801 = divi_signed %3800, %c8_2010 : index + %3802 = subi %c-1_2012, %3801 : index + %3803 = select %3798, %3802, %3801 : index + %c8_2013 = constant 8 : index + %3804 = addi %arg5, %c8_2013 : index + %c16_2014 = constant 16 : index + %c0_2015 = constant 0 : index + %c-1_2016 = constant -1 : index + %3805 = cmpi "slt", %3804, %c0_2015 : index + %3806 = subi %c-1_2016, %3804 : index + %3807 = select %3805, %3806, %3804 : index + %3808 = divi_signed %3807, %c16_2014 : index + %3809 = subi %c-1_2016, %3808 : index + %3810 = select %3805, %3809, %3808 : index + %c-2_2017 = constant -2 : index + %3811 = muli %3810, %c-2_2017 : index + %3812 = addi %3803, %3811 : index + %c1_2018 = constant 1 : index + %3813 = addi %3812, %c1_2018 : index + %c2_2019 = constant 2 : index + %c0_2020 = constant 0 : index + %c-1_2021 = constant -1 : index + %3814 = cmpi "slt", %3813, %c0_2020 : index + %3815 = subi %c-1_2021, %3813 : index + %3816 = select %3814, %3815, %3813 : index + %3817 = divi_signed %3816, %c2_2019 : index + %3818 = subi %c-1_2021, %3817 : index + %3819 = select %3814, %3818, %3817 : index + %c-2_2022 = constant -2 : index + %3820 = muli %3819, %c-2_2022 : index + %3821 = addi %3797, %3820 : index + %c1_2023 = constant 1 : index + %3822 = addi %3821, %c1_2023 : index + %3823 = load %2[%3782, %c0_2001, %3822] : memref<16x6x2xvector<8xf32>> + %3824 = vector.insertelement %2211, %3823[%c7_i64 : i64] : vector<8xf32> + %c8_2024 = constant 8 : index + %3825 = addi %arg5, %c8_2024 : index + %c16_2025 = constant 16 : index + %c0_2026 = constant 0 : index + %c-1_2027 = constant -1 : index + %3826 = cmpi "slt", %3825, %c0_2026 : index + %3827 = subi %c-1_2027, %3825 : index + %3828 = select %3826, %3827, %3825 : index + %3829 = divi_signed %3828, %c16_2025 : index + %3830 = subi %c-1_2027, %3829 : index + %3831 = select %3826, %3830, %3829 : index + %c16_2028 = constant 16 : index + %3832 = remi_signed %3831, %c16_2028 : index + %c0_2029 = constant 0 : index + %3833 = cmpi "slt", %3832, %c0_2029 : index + %3834 = addi %3832, %c16_2028 : index + %3835 = select %3833, %3834, %3832 : index + %c0_2030 = constant 0 : index + %c8_2031 = constant 8 : index + %c0_2032 = constant 0 : index + %c-1_2033 = constant -1 : index + %3836 = cmpi "slt", %arg5, %c0_2032 : index + %3837 = subi %c-1_2033, %arg5 : index + %3838 = select %3836, %3837, %arg5 : index + %3839 = divi_signed %3838, %c8_2031 : index + %3840 = subi %c-1_2033, %3839 : index + %3841 = select %3836, %3840, %3839 : 
index + %c8_2034 = constant 8 : index + %3842 = addi %arg5, %c8_2034 : index + %c16_2035 = constant 16 : index + %c0_2036 = constant 0 : index + %c-1_2037 = constant -1 : index + %3843 = cmpi "slt", %3842, %c0_2036 : index + %3844 = subi %c-1_2037, %3842 : index + %3845 = select %3843, %3844, %3842 : index + %3846 = divi_signed %3845, %c16_2035 : index + %3847 = subi %c-1_2037, %3846 : index + %3848 = select %3843, %3847, %3846 : index + %c-2_2038 = constant -2 : index + %3849 = muli %3848, %c-2_2038 : index + %3850 = addi %3841, %3849 : index + %c8_2039 = constant 8 : index + %c0_2040 = constant 0 : index + %c-1_2041 = constant -1 : index + %3851 = cmpi "slt", %arg5, %c0_2040 : index + %3852 = subi %c-1_2041, %arg5 : index + %3853 = select %3851, %3852, %arg5 : index + %3854 = divi_signed %3853, %c8_2039 : index + %3855 = subi %c-1_2041, %3854 : index + %3856 = select %3851, %3855, %3854 : index + %c8_2042 = constant 8 : index + %3857 = addi %arg5, %c8_2042 : index + %c16_2043 = constant 16 : index + %c0_2044 = constant 0 : index + %c-1_2045 = constant -1 : index + %3858 = cmpi "slt", %3857, %c0_2044 : index + %3859 = subi %c-1_2045, %3857 : index + %3860 = select %3858, %3859, %3857 : index + %3861 = divi_signed %3860, %c16_2043 : index + %3862 = subi %c-1_2045, %3861 : index + %3863 = select %3858, %3862, %3861 : index + %c-2_2046 = constant -2 : index + %3864 = muli %3863, %c-2_2046 : index + %3865 = addi %3856, %3864 : index + %c1_2047 = constant 1 : index + %3866 = addi %3865, %c1_2047 : index + %c2_2048 = constant 2 : index + %c0_2049 = constant 0 : index + %c-1_2050 = constant -1 : index + %3867 = cmpi "slt", %3866, %c0_2049 : index + %3868 = subi %c-1_2050, %3866 : index + %3869 = select %3867, %3868, %3866 : index + %3870 = divi_signed %3869, %c2_2048 : index + %3871 = subi %c-1_2050, %3870 : index + %3872 = select %3867, %3871, %3870 : index + %c-2_2051 = constant -2 : index + %3873 = muli %3872, %c-2_2051 : index + %3874 = addi %3850, %3873 : index + %c1_2052 = constant 1 : index + %3875 = addi %3874, %c1_2052 : index + store %3824, %2[%3835, %c0_2030, %3875] : memref<16x6x2xvector<8xf32>> + } + } + } + %c0_11 = constant 0 : index + %c256_12 = constant 256 : index + %c128_13 = constant 128 : index + scf.for %arg5 = %c0_11 to %c256_12 step %c128_13 { + %c0_14 = constant 0 : index + %c0_15 = constant 0 : index + %4 = cmpi "eq", %c0_15, %c0_14 : index + scf.if %4 { + %5 = addi %arg3, %arg5 : index + %6 = vector.transfer_read %arg2[%arg4, %5], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_16 = constant 16 : index + %c0_17 = constant 0 : index + %c-1 = constant -1 : index + %7 = cmpi "slt", %arg5, %c0_17 : index + %8 = subi %c-1, %arg5 : index + %9 = select %7, %8, %arg5 : index + %10 = divi_signed %9, %c16_16 : index + %11 = subi %c-1, %10 : index + %12 = select %7, %11, %10 : index + %c16_18 = constant 16 : index + %13 = remi_signed %12, %c16_18 : index + %c0_19 = constant 0 : index + %14 = cmpi "slt", %13, %c0_19 : index + %15 = addi %13, %c16_18 : index + %16 = select %14, %15, %13 : index + %c0_20 = constant 0 : index + %c16_21 = constant 16 : index + %17 = remi_signed %arg5, %c16_21 : index + %c0_22 = constant 0 : index + %18 = cmpi "slt", %17, %c0_22 : index + %19 = addi %17, %c16_21 : index + %20 = select %18, %19, %17 : index + %c8_23 = constant 8 : index + %c0_24 = constant 0 : index + %c-1_25 = constant -1 : index + %21 = cmpi "slt", %20, %c0_24 : index + %22 = subi %c-1_25, %20 : index + %23 = select %21, 
%22, %20 : index + %24 = divi_signed %23, %c8_23 : index + %25 = subi %c-1_25, %24 : index + %26 = select %21, %25, %24 : index + %c2_26 = constant 2 : index + %27 = remi_signed %26, %c2_26 : index + %c0_27 = constant 0 : index + %28 = cmpi "slt", %27, %c0_27 : index + %29 = addi %27, %c2_26 : index + %30 = select %28, %29, %27 : index + %31 = load %2[%16, %c0_20, %30] : memref<16x6x2xvector<8xf32>> + %32 = addf %6, %31 : vector<8xf32> + store %32, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %33 = addi %arg3, %arg5 : index + %c8_28 = constant 8 : index + %34 = addi %33, %c8_28 : index + %35 = vector.transfer_read %arg2[%arg4, %34], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c8_29 = constant 8 : index + %36 = addi %arg5, %c8_29 : index + %c16_30 = constant 16 : index + %c0_31 = constant 0 : index + %c-1_32 = constant -1 : index + %37 = cmpi "slt", %36, %c0_31 : index + %38 = subi %c-1_32, %36 : index + %39 = select %37, %38, %36 : index + %40 = divi_signed %39, %c16_30 : index + %41 = subi %c-1_32, %40 : index + %42 = select %37, %41, %40 : index + %c16_33 = constant 16 : index + %43 = remi_signed %42, %c16_33 : index + %c0_34 = constant 0 : index + %44 = cmpi "slt", %43, %c0_34 : index + %45 = addi %43, %c16_33 : index + %46 = select %44, %45, %43 : index + %c0_35 = constant 0 : index + %c8_36 = constant 8 : index + %c0_37 = constant 0 : index + %c-1_38 = constant -1 : index + %47 = cmpi "slt", %arg5, %c0_37 : index + %48 = subi %c-1_38, %arg5 : index + %49 = select %47, %48, %arg5 : index + %50 = divi_signed %49, %c8_36 : index + %51 = subi %c-1_38, %50 : index + %52 = select %47, %51, %50 : index + %c8_39 = constant 8 : index + %53 = addi %arg5, %c8_39 : index + %c16_40 = constant 16 : index + %c0_41 = constant 0 : index + %c-1_42 = constant -1 : index + %54 = cmpi "slt", %53, %c0_41 : index + %55 = subi %c-1_42, %53 : index + %56 = select %54, %55, %53 : index + %57 = divi_signed %56, %c16_40 : index + %58 = subi %c-1_42, %57 : index + %59 = select %54, %58, %57 : index + %c-2 = constant -2 : index + %60 = muli %59, %c-2 : index + %61 = addi %52, %60 : index + %c8_43 = constant 8 : index + %c0_44 = constant 0 : index + %c-1_45 = constant -1 : index + %62 = cmpi "slt", %arg5, %c0_44 : index + %63 = subi %c-1_45, %arg5 : index + %64 = select %62, %63, %arg5 : index + %65 = divi_signed %64, %c8_43 : index + %66 = subi %c-1_45, %65 : index + %67 = select %62, %66, %65 : index + %c8_46 = constant 8 : index + %68 = addi %arg5, %c8_46 : index + %c16_47 = constant 16 : index + %c0_48 = constant 0 : index + %c-1_49 = constant -1 : index + %69 = cmpi "slt", %68, %c0_48 : index + %70 = subi %c-1_49, %68 : index + %71 = select %69, %70, %68 : index + %72 = divi_signed %71, %c16_47 : index + %73 = subi %c-1_49, %72 : index + %74 = select %69, %73, %72 : index + %c-2_50 = constant -2 : index + %75 = muli %74, %c-2_50 : index + %76 = addi %67, %75 : index + %c1_51 = constant 1 : index + %77 = addi %76, %c1_51 : index + %c2_52 = constant 2 : index + %c0_53 = constant 0 : index + %c-1_54 = constant -1 : index + %78 = cmpi "slt", %77, %c0_53 : index + %79 = subi %c-1_54, %77 : index + %80 = select %78, %79, %77 : index + %81 = divi_signed %80, %c2_52 : index + %82 = subi %c-1_54, %81 : index + %83 = select %78, %82, %81 : index + %c-2_55 = constant -2 : index + %84 = muli %83, %c-2_55 : index + %85 = addi %61, %84 : index + %c1_56 = constant 1 : index + %86 = addi %85, %c1_56 : index + %87 = load %2[%46, %c0_35, %86] : 
memref<16x6x2xvector<8xf32>> + %88 = addf %35, %87 : vector<8xf32> + store %88, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %89 = addi %arg3, %arg5 : index + %c16_57 = constant 16 : index + %90 = addi %89, %c16_57 : index + %91 = vector.transfer_read %arg2[%arg4, %90], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_58 = constant 16 : index + %c0_59 = constant 0 : index + %c-1_60 = constant -1 : index + %92 = cmpi "slt", %arg5, %c0_59 : index + %93 = subi %c-1_60, %arg5 : index + %94 = select %92, %93, %arg5 : index + %95 = divi_signed %94, %c16_58 : index + %96 = subi %c-1_60, %95 : index + %97 = select %92, %96, %95 : index + %c16_61 = constant 16 : index + %c0_62 = constant 0 : index + %c-1_63 = constant -1 : index + %98 = cmpi "slt", %arg5, %c0_62 : index + %99 = subi %c-1_63, %arg5 : index + %100 = select %98, %99, %arg5 : index + %101 = divi_signed %100, %c16_61 : index + %102 = subi %c-1_63, %101 : index + %103 = select %98, %102, %101 : index + %c1_64 = constant 1 : index + %104 = addi %103, %c1_64 : index + %c16_65 = constant 16 : index + %c0_66 = constant 0 : index + %c-1_67 = constant -1 : index + %105 = cmpi "slt", %104, %c0_66 : index + %106 = subi %c-1_67, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16_65 : index + %109 = subi %c-1_67, %108 : index + %110 = select %105, %109, %108 : index + %c-16 = constant -16 : index + %111 = muli %110, %c-16 : index + %112 = addi %97, %111 : index + %c1_68 = constant 1 : index + %113 = addi %112, %c1_68 : index + %c0_69 = constant 0 : index + %c16_70 = constant 16 : index + %114 = remi_signed %arg5, %c16_70 : index + %c0_71 = constant 0 : index + %115 = cmpi "slt", %114, %c0_71 : index + %116 = addi %114, %c16_70 : index + %117 = select %115, %116, %114 : index + %c8_72 = constant 8 : index + %c0_73 = constant 0 : index + %c-1_74 = constant -1 : index + %118 = cmpi "slt", %117, %c0_73 : index + %119 = subi %c-1_74, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c8_72 : index + %122 = subi %c-1_74, %121 : index + %123 = select %118, %122, %121 : index + %c2_75 = constant 2 : index + %124 = remi_signed %123, %c2_75 : index + %c0_76 = constant 0 : index + %125 = cmpi "slt", %124, %c0_76 : index + %126 = addi %124, %c2_75 : index + %127 = select %125, %126, %124 : index + %128 = load %2[%113, %c0_69, %127] : memref<16x6x2xvector<8xf32>> + %129 = addf %91, %128 : vector<8xf32> + store %129, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %130 = addi %arg3, %arg5 : index + %c24 = constant 24 : index + %131 = addi %130, %c24 : index + %132 = vector.transfer_read %arg2[%arg4, %131], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c24_77 = constant 24 : index + %133 = addi %arg5, %c24_77 : index + %c16_78 = constant 16 : index + %c0_79 = constant 0 : index + %c-1_80 = constant -1 : index + %134 = cmpi "slt", %133, %c0_79 : index + %135 = subi %c-1_80, %133 : index + %136 = select %134, %135, %133 : index + %137 = divi_signed %136, %c16_78 : index + %138 = subi %c-1_80, %137 : index + %139 = select %134, %138, %137 : index + %c16_81 = constant 16 : index + %140 = remi_signed %139, %c16_81 : index + %c0_82 = constant 0 : index + %141 = cmpi "slt", %140, %c0_82 : index + %142 = addi %140, %c16_81 : index + %143 = select %141, %142, %140 : index + %c0_83 = constant 0 : index + %c8_84 = constant 8 : index + %c0_85 = constant 0 : index + %c-1_86 = constant -1 : index + 
%144 = cmpi "slt", %arg5, %c0_85 : index + %145 = subi %c-1_86, %arg5 : index + %146 = select %144, %145, %arg5 : index + %147 = divi_signed %146, %c8_84 : index + %148 = subi %c-1_86, %147 : index + %149 = select %144, %148, %147 : index + %c24_87 = constant 24 : index + %150 = addi %arg5, %c24_87 : index + %c16_88 = constant 16 : index + %c0_89 = constant 0 : index + %c-1_90 = constant -1 : index + %151 = cmpi "slt", %150, %c0_89 : index + %152 = subi %c-1_90, %150 : index + %153 = select %151, %152, %150 : index + %154 = divi_signed %153, %c16_88 : index + %155 = subi %c-1_90, %154 : index + %156 = select %151, %155, %154 : index + %c-2_91 = constant -2 : index + %157 = muli %156, %c-2_91 : index + %158 = addi %149, %157 : index + %c8_92 = constant 8 : index + %c0_93 = constant 0 : index + %c-1_94 = constant -1 : index + %159 = cmpi "slt", %arg5, %c0_93 : index + %160 = subi %c-1_94, %arg5 : index + %161 = select %159, %160, %arg5 : index + %162 = divi_signed %161, %c8_92 : index + %163 = subi %c-1_94, %162 : index + %164 = select %159, %163, %162 : index + %c24_95 = constant 24 : index + %165 = addi %arg5, %c24_95 : index + %c16_96 = constant 16 : index + %c0_97 = constant 0 : index + %c-1_98 = constant -1 : index + %166 = cmpi "slt", %165, %c0_97 : index + %167 = subi %c-1_98, %165 : index + %168 = select %166, %167, %165 : index + %169 = divi_signed %168, %c16_96 : index + %170 = subi %c-1_98, %169 : index + %171 = select %166, %170, %169 : index + %c-2_99 = constant -2 : index + %172 = muli %171, %c-2_99 : index + %173 = addi %164, %172 : index + %c3_100 = constant 3 : index + %174 = addi %173, %c3_100 : index + %c2_101 = constant 2 : index + %c0_102 = constant 0 : index + %c-1_103 = constant -1 : index + %175 = cmpi "slt", %174, %c0_102 : index + %176 = subi %c-1_103, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c2_101 : index + %179 = subi %c-1_103, %178 : index + %180 = select %175, %179, %178 : index + %c-2_104 = constant -2 : index + %181 = muli %180, %c-2_104 : index + %182 = addi %158, %181 : index + %c3_105 = constant 3 : index + %183 = addi %182, %c3_105 : index + %184 = load %2[%143, %c0_83, %183] : memref<16x6x2xvector<8xf32>> + %185 = addf %132, %184 : vector<8xf32> + store %185, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %186 = addi %arg3, %arg5 : index + %c32 = constant 32 : index + %187 = addi %186, %c32 : index + %188 = vector.transfer_read %arg2[%arg4, %187], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_106 = constant 16 : index + %c0_107 = constant 0 : index + %c-1_108 = constant -1 : index + %189 = cmpi "slt", %arg5, %c0_107 : index + %190 = subi %c-1_108, %arg5 : index + %191 = select %189, %190, %arg5 : index + %192 = divi_signed %191, %c16_106 : index + %193 = subi %c-1_108, %192 : index + %194 = select %189, %193, %192 : index + %c16_109 = constant 16 : index + %c0_110 = constant 0 : index + %c-1_111 = constant -1 : index + %195 = cmpi "slt", %arg5, %c0_110 : index + %196 = subi %c-1_111, %arg5 : index + %197 = select %195, %196, %arg5 : index + %198 = divi_signed %197, %c16_109 : index + %199 = subi %c-1_111, %198 : index + %200 = select %195, %199, %198 : index + %c2_112 = constant 2 : index + %201 = addi %200, %c2_112 : index + %c16_113 = constant 16 : index + %c0_114 = constant 0 : index + %c-1_115 = constant -1 : index + %202 = cmpi "slt", %201, %c0_114 : index + %203 = subi %c-1_115, %201 : index + %204 = select %202, %203, %201 : index + %205 = 
divi_signed %204, %c16_113 : index + %206 = subi %c-1_115, %205 : index + %207 = select %202, %206, %205 : index + %c-16_116 = constant -16 : index + %208 = muli %207, %c-16_116 : index + %209 = addi %194, %208 : index + %c2_117 = constant 2 : index + %210 = addi %209, %c2_117 : index + %c0_118 = constant 0 : index + %c16_119 = constant 16 : index + %211 = remi_signed %arg5, %c16_119 : index + %c0_120 = constant 0 : index + %212 = cmpi "slt", %211, %c0_120 : index + %213 = addi %211, %c16_119 : index + %214 = select %212, %213, %211 : index + %c8_121 = constant 8 : index + %c0_122 = constant 0 : index + %c-1_123 = constant -1 : index + %215 = cmpi "slt", %214, %c0_122 : index + %216 = subi %c-1_123, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c8_121 : index + %219 = subi %c-1_123, %218 : index + %220 = select %215, %219, %218 : index + %c2_124 = constant 2 : index + %221 = remi_signed %220, %c2_124 : index + %c0_125 = constant 0 : index + %222 = cmpi "slt", %221, %c0_125 : index + %223 = addi %221, %c2_124 : index + %224 = select %222, %223, %221 : index + %225 = load %2[%210, %c0_118, %224] : memref<16x6x2xvector<8xf32>> + %226 = addf %188, %225 : vector<8xf32> + store %226, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %227 = addi %arg3, %arg5 : index + %c40 = constant 40 : index + %228 = addi %227, %c40 : index + %229 = vector.transfer_read %arg2[%arg4, %228], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c40_126 = constant 40 : index + %230 = addi %arg5, %c40_126 : index + %c16_127 = constant 16 : index + %c0_128 = constant 0 : index + %c-1_129 = constant -1 : index + %231 = cmpi "slt", %230, %c0_128 : index + %232 = subi %c-1_129, %230 : index + %233 = select %231, %232, %230 : index + %234 = divi_signed %233, %c16_127 : index + %235 = subi %c-1_129, %234 : index + %236 = select %231, %235, %234 : index + %c16_130 = constant 16 : index + %237 = remi_signed %236, %c16_130 : index + %c0_131 = constant 0 : index + %238 = cmpi "slt", %237, %c0_131 : index + %239 = addi %237, %c16_130 : index + %240 = select %238, %239, %237 : index + %c0_132 = constant 0 : index + %c8_133 = constant 8 : index + %c0_134 = constant 0 : index + %c-1_135 = constant -1 : index + %241 = cmpi "slt", %arg5, %c0_134 : index + %242 = subi %c-1_135, %arg5 : index + %243 = select %241, %242, %arg5 : index + %244 = divi_signed %243, %c8_133 : index + %245 = subi %c-1_135, %244 : index + %246 = select %241, %245, %244 : index + %c40_136 = constant 40 : index + %247 = addi %arg5, %c40_136 : index + %c16_137 = constant 16 : index + %c0_138 = constant 0 : index + %c-1_139 = constant -1 : index + %248 = cmpi "slt", %247, %c0_138 : index + %249 = subi %c-1_139, %247 : index + %250 = select %248, %249, %247 : index + %251 = divi_signed %250, %c16_137 : index + %252 = subi %c-1_139, %251 : index + %253 = select %248, %252, %251 : index + %c-2_140 = constant -2 : index + %254 = muli %253, %c-2_140 : index + %255 = addi %246, %254 : index + %c8_141 = constant 8 : index + %c0_142 = constant 0 : index + %c-1_143 = constant -1 : index + %256 = cmpi "slt", %arg5, %c0_142 : index + %257 = subi %c-1_143, %arg5 : index + %258 = select %256, %257, %arg5 : index + %259 = divi_signed %258, %c8_141 : index + %260 = subi %c-1_143, %259 : index + %261 = select %256, %260, %259 : index + %c40_144 = constant 40 : index + %262 = addi %arg5, %c40_144 : index + %c16_145 = constant 16 : index + %c0_146 = constant 0 : index + %c-1_147 = constant -1 : 
index + %263 = cmpi "slt", %262, %c0_146 : index + %264 = subi %c-1_147, %262 : index + %265 = select %263, %264, %262 : index + %266 = divi_signed %265, %c16_145 : index + %267 = subi %c-1_147, %266 : index + %268 = select %263, %267, %266 : index + %c-2_148 = constant -2 : index + %269 = muli %268, %c-2_148 : index + %270 = addi %261, %269 : index + %c5_149 = constant 5 : index + %271 = addi %270, %c5_149 : index + %c2_150 = constant 2 : index + %c0_151 = constant 0 : index + %c-1_152 = constant -1 : index + %272 = cmpi "slt", %271, %c0_151 : index + %273 = subi %c-1_152, %271 : index + %274 = select %272, %273, %271 : index + %275 = divi_signed %274, %c2_150 : index + %276 = subi %c-1_152, %275 : index + %277 = select %272, %276, %275 : index + %c-2_153 = constant -2 : index + %278 = muli %277, %c-2_153 : index + %279 = addi %255, %278 : index + %c5_154 = constant 5 : index + %280 = addi %279, %c5_154 : index + %281 = load %2[%240, %c0_132, %280] : memref<16x6x2xvector<8xf32>> + %282 = addf %229, %281 : vector<8xf32> + store %282, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %283 = addi %arg3, %arg5 : index + %c48 = constant 48 : index + %284 = addi %283, %c48 : index + %285 = vector.transfer_read %arg2[%arg4, %284], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_155 = constant 16 : index + %c0_156 = constant 0 : index + %c-1_157 = constant -1 : index + %286 = cmpi "slt", %arg5, %c0_156 : index + %287 = subi %c-1_157, %arg5 : index + %288 = select %286, %287, %arg5 : index + %289 = divi_signed %288, %c16_155 : index + %290 = subi %c-1_157, %289 : index + %291 = select %286, %290, %289 : index + %c16_158 = constant 16 : index + %c0_159 = constant 0 : index + %c-1_160 = constant -1 : index + %292 = cmpi "slt", %arg5, %c0_159 : index + %293 = subi %c-1_160, %arg5 : index + %294 = select %292, %293, %arg5 : index + %295 = divi_signed %294, %c16_158 : index + %296 = subi %c-1_160, %295 : index + %297 = select %292, %296, %295 : index + %c3_161 = constant 3 : index + %298 = addi %297, %c3_161 : index + %c16_162 = constant 16 : index + %c0_163 = constant 0 : index + %c-1_164 = constant -1 : index + %299 = cmpi "slt", %298, %c0_163 : index + %300 = subi %c-1_164, %298 : index + %301 = select %299, %300, %298 : index + %302 = divi_signed %301, %c16_162 : index + %303 = subi %c-1_164, %302 : index + %304 = select %299, %303, %302 : index + %c-16_165 = constant -16 : index + %305 = muli %304, %c-16_165 : index + %306 = addi %291, %305 : index + %c3_166 = constant 3 : index + %307 = addi %306, %c3_166 : index + %c0_167 = constant 0 : index + %c16_168 = constant 16 : index + %308 = remi_signed %arg5, %c16_168 : index + %c0_169 = constant 0 : index + %309 = cmpi "slt", %308, %c0_169 : index + %310 = addi %308, %c16_168 : index + %311 = select %309, %310, %308 : index + %c8_170 = constant 8 : index + %c0_171 = constant 0 : index + %c-1_172 = constant -1 : index + %312 = cmpi "slt", %311, %c0_171 : index + %313 = subi %c-1_172, %311 : index + %314 = select %312, %313, %311 : index + %315 = divi_signed %314, %c8_170 : index + %316 = subi %c-1_172, %315 : index + %317 = select %312, %316, %315 : index + %c2_173 = constant 2 : index + %318 = remi_signed %317, %c2_173 : index + %c0_174 = constant 0 : index + %319 = cmpi "slt", %318, %c0_174 : index + %320 = addi %318, %c2_173 : index + %321 = select %319, %320, %318 : index + %322 = load %2[%307, %c0_167, %321] : memref<16x6x2xvector<8xf32>> + %323 = addf %285, %322 : vector<8xf32> + store %323, 
%1[%c0, %c6] : memref<1x16xvector<8xf32>> + %324 = addi %arg3, %arg5 : index + %c56 = constant 56 : index + %325 = addi %324, %c56 : index + %326 = vector.transfer_read %arg2[%arg4, %325], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c56_175 = constant 56 : index + %327 = addi %arg5, %c56_175 : index + %c16_176 = constant 16 : index + %c0_177 = constant 0 : index + %c-1_178 = constant -1 : index + %328 = cmpi "slt", %327, %c0_177 : index + %329 = subi %c-1_178, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c16_176 : index + %332 = subi %c-1_178, %331 : index + %333 = select %328, %332, %331 : index + %c16_179 = constant 16 : index + %334 = remi_signed %333, %c16_179 : index + %c0_180 = constant 0 : index + %335 = cmpi "slt", %334, %c0_180 : index + %336 = addi %334, %c16_179 : index + %337 = select %335, %336, %334 : index + %c0_181 = constant 0 : index + %c8_182 = constant 8 : index + %c0_183 = constant 0 : index + %c-1_184 = constant -1 : index + %338 = cmpi "slt", %arg5, %c0_183 : index + %339 = subi %c-1_184, %arg5 : index + %340 = select %338, %339, %arg5 : index + %341 = divi_signed %340, %c8_182 : index + %342 = subi %c-1_184, %341 : index + %343 = select %338, %342, %341 : index + %c56_185 = constant 56 : index + %344 = addi %arg5, %c56_185 : index + %c16_186 = constant 16 : index + %c0_187 = constant 0 : index + %c-1_188 = constant -1 : index + %345 = cmpi "slt", %344, %c0_187 : index + %346 = subi %c-1_188, %344 : index + %347 = select %345, %346, %344 : index + %348 = divi_signed %347, %c16_186 : index + %349 = subi %c-1_188, %348 : index + %350 = select %345, %349, %348 : index + %c-2_189 = constant -2 : index + %351 = muli %350, %c-2_189 : index + %352 = addi %343, %351 : index + %c8_190 = constant 8 : index + %c0_191 = constant 0 : index + %c-1_192 = constant -1 : index + %353 = cmpi "slt", %arg5, %c0_191 : index + %354 = subi %c-1_192, %arg5 : index + %355 = select %353, %354, %arg5 : index + %356 = divi_signed %355, %c8_190 : index + %357 = subi %c-1_192, %356 : index + %358 = select %353, %357, %356 : index + %c56_193 = constant 56 : index + %359 = addi %arg5, %c56_193 : index + %c16_194 = constant 16 : index + %c0_195 = constant 0 : index + %c-1_196 = constant -1 : index + %360 = cmpi "slt", %359, %c0_195 : index + %361 = subi %c-1_196, %359 : index + %362 = select %360, %361, %359 : index + %363 = divi_signed %362, %c16_194 : index + %364 = subi %c-1_196, %363 : index + %365 = select %360, %364, %363 : index + %c-2_197 = constant -2 : index + %366 = muli %365, %c-2_197 : index + %367 = addi %358, %366 : index + %c7_198 = constant 7 : index + %368 = addi %367, %c7_198 : index + %c2_199 = constant 2 : index + %c0_200 = constant 0 : index + %c-1_201 = constant -1 : index + %369 = cmpi "slt", %368, %c0_200 : index + %370 = subi %c-1_201, %368 : index + %371 = select %369, %370, %368 : index + %372 = divi_signed %371, %c2_199 : index + %373 = subi %c-1_201, %372 : index + %374 = select %369, %373, %372 : index + %c-2_202 = constant -2 : index + %375 = muli %374, %c-2_202 : index + %376 = addi %352, %375 : index + %c7_203 = constant 7 : index + %377 = addi %376, %c7_203 : index + %378 = load %2[%337, %c0_181, %377] : memref<16x6x2xvector<8xf32>> + %379 = addf %326, %378 : vector<8xf32> + store %379, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %380 = addi %arg3, %arg5 : index + %c64 = constant 64 : index + %381 = addi %380, %c64 : index + %382 = vector.transfer_read %arg2[%arg4, 
%381], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_204 = constant 16 : index + %c0_205 = constant 0 : index + %c-1_206 = constant -1 : index + %383 = cmpi "slt", %arg5, %c0_205 : index + %384 = subi %c-1_206, %arg5 : index + %385 = select %383, %384, %arg5 : index + %386 = divi_signed %385, %c16_204 : index + %387 = subi %c-1_206, %386 : index + %388 = select %383, %387, %386 : index + %c16_207 = constant 16 : index + %c0_208 = constant 0 : index + %c-1_209 = constant -1 : index + %389 = cmpi "slt", %arg5, %c0_208 : index + %390 = subi %c-1_209, %arg5 : index + %391 = select %389, %390, %arg5 : index + %392 = divi_signed %391, %c16_207 : index + %393 = subi %c-1_209, %392 : index + %394 = select %389, %393, %392 : index + %c4_210 = constant 4 : index + %395 = addi %394, %c4_210 : index + %c16_211 = constant 16 : index + %c0_212 = constant 0 : index + %c-1_213 = constant -1 : index + %396 = cmpi "slt", %395, %c0_212 : index + %397 = subi %c-1_213, %395 : index + %398 = select %396, %397, %395 : index + %399 = divi_signed %398, %c16_211 : index + %400 = subi %c-1_213, %399 : index + %401 = select %396, %400, %399 : index + %c-16_214 = constant -16 : index + %402 = muli %401, %c-16_214 : index + %403 = addi %388, %402 : index + %c4_215 = constant 4 : index + %404 = addi %403, %c4_215 : index + %c0_216 = constant 0 : index + %c16_217 = constant 16 : index + %405 = remi_signed %arg5, %c16_217 : index + %c0_218 = constant 0 : index + %406 = cmpi "slt", %405, %c0_218 : index + %407 = addi %405, %c16_217 : index + %408 = select %406, %407, %405 : index + %c8_219 = constant 8 : index + %c0_220 = constant 0 : index + %c-1_221 = constant -1 : index + %409 = cmpi "slt", %408, %c0_220 : index + %410 = subi %c-1_221, %408 : index + %411 = select %409, %410, %408 : index + %412 = divi_signed %411, %c8_219 : index + %413 = subi %c-1_221, %412 : index + %414 = select %409, %413, %412 : index + %c2_222 = constant 2 : index + %415 = remi_signed %414, %c2_222 : index + %c0_223 = constant 0 : index + %416 = cmpi "slt", %415, %c0_223 : index + %417 = addi %415, %c2_222 : index + %418 = select %416, %417, %415 : index + %419 = load %2[%404, %c0_216, %418] : memref<16x6x2xvector<8xf32>> + %420 = addf %382, %419 : vector<8xf32> + store %420, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %421 = addi %arg3, %arg5 : index + %c72 = constant 72 : index + %422 = addi %421, %c72 : index + %423 = vector.transfer_read %arg2[%arg4, %422], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c72_224 = constant 72 : index + %424 = addi %arg5, %c72_224 : index + %c16_225 = constant 16 : index + %c0_226 = constant 0 : index + %c-1_227 = constant -1 : index + %425 = cmpi "slt", %424, %c0_226 : index + %426 = subi %c-1_227, %424 : index + %427 = select %425, %426, %424 : index + %428 = divi_signed %427, %c16_225 : index + %429 = subi %c-1_227, %428 : index + %430 = select %425, %429, %428 : index + %c16_228 = constant 16 : index + %431 = remi_signed %430, %c16_228 : index + %c0_229 = constant 0 : index + %432 = cmpi "slt", %431, %c0_229 : index + %433 = addi %431, %c16_228 : index + %434 = select %432, %433, %431 : index + %c0_230 = constant 0 : index + %c8_231 = constant 8 : index + %c0_232 = constant 0 : index + %c-1_233 = constant -1 : index + %435 = cmpi "slt", %arg5, %c0_232 : index + %436 = subi %c-1_233, %arg5 : index + %437 = select %435, %436, %arg5 : index + %438 = divi_signed %437, %c8_231 : index + %439 = 
subi %c-1_233, %438 : index + %440 = select %435, %439, %438 : index + %c72_234 = constant 72 : index + %441 = addi %arg5, %c72_234 : index + %c16_235 = constant 16 : index + %c0_236 = constant 0 : index + %c-1_237 = constant -1 : index + %442 = cmpi "slt", %441, %c0_236 : index + %443 = subi %c-1_237, %441 : index + %444 = select %442, %443, %441 : index + %445 = divi_signed %444, %c16_235 : index + %446 = subi %c-1_237, %445 : index + %447 = select %442, %446, %445 : index + %c-2_238 = constant -2 : index + %448 = muli %447, %c-2_238 : index + %449 = addi %440, %448 : index + %c8_239 = constant 8 : index + %c0_240 = constant 0 : index + %c-1_241 = constant -1 : index + %450 = cmpi "slt", %arg5, %c0_240 : index + %451 = subi %c-1_241, %arg5 : index + %452 = select %450, %451, %arg5 : index + %453 = divi_signed %452, %c8_239 : index + %454 = subi %c-1_241, %453 : index + %455 = select %450, %454, %453 : index + %c72_242 = constant 72 : index + %456 = addi %arg5, %c72_242 : index + %c16_243 = constant 16 : index + %c0_244 = constant 0 : index + %c-1_245 = constant -1 : index + %457 = cmpi "slt", %456, %c0_244 : index + %458 = subi %c-1_245, %456 : index + %459 = select %457, %458, %456 : index + %460 = divi_signed %459, %c16_243 : index + %461 = subi %c-1_245, %460 : index + %462 = select %457, %461, %460 : index + %c-2_246 = constant -2 : index + %463 = muli %462, %c-2_246 : index + %464 = addi %455, %463 : index + %c9_247 = constant 9 : index + %465 = addi %464, %c9_247 : index + %c2_248 = constant 2 : index + %c0_249 = constant 0 : index + %c-1_250 = constant -1 : index + %466 = cmpi "slt", %465, %c0_249 : index + %467 = subi %c-1_250, %465 : index + %468 = select %466, %467, %465 : index + %469 = divi_signed %468, %c2_248 : index + %470 = subi %c-1_250, %469 : index + %471 = select %466, %470, %469 : index + %c-2_251 = constant -2 : index + %472 = muli %471, %c-2_251 : index + %473 = addi %449, %472 : index + %c9_252 = constant 9 : index + %474 = addi %473, %c9_252 : index + %475 = load %2[%434, %c0_230, %474] : memref<16x6x2xvector<8xf32>> + %476 = addf %423, %475 : vector<8xf32> + store %476, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %477 = addi %arg3, %arg5 : index + %c80 = constant 80 : index + %478 = addi %477, %c80 : index + %479 = vector.transfer_read %arg2[%arg4, %478], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_253 = constant 16 : index + %c0_254 = constant 0 : index + %c-1_255 = constant -1 : index + %480 = cmpi "slt", %arg5, %c0_254 : index + %481 = subi %c-1_255, %arg5 : index + %482 = select %480, %481, %arg5 : index + %483 = divi_signed %482, %c16_253 : index + %484 = subi %c-1_255, %483 : index + %485 = select %480, %484, %483 : index + %c16_256 = constant 16 : index + %c0_257 = constant 0 : index + %c-1_258 = constant -1 : index + %486 = cmpi "slt", %arg5, %c0_257 : index + %487 = subi %c-1_258, %arg5 : index + %488 = select %486, %487, %arg5 : index + %489 = divi_signed %488, %c16_256 : index + %490 = subi %c-1_258, %489 : index + %491 = select %486, %490, %489 : index + %c5_259 = constant 5 : index + %492 = addi %491, %c5_259 : index + %c16_260 = constant 16 : index + %c0_261 = constant 0 : index + %c-1_262 = constant -1 : index + %493 = cmpi "slt", %492, %c0_261 : index + %494 = subi %c-1_262, %492 : index + %495 = select %493, %494, %492 : index + %496 = divi_signed %495, %c16_260 : index + %497 = subi %c-1_262, %496 : index + %498 = select %493, %497, %496 : index + %c-16_263 = constant -16 : index 
+ %499 = muli %498, %c-16_263 : index + %500 = addi %485, %499 : index + %c5_264 = constant 5 : index + %501 = addi %500, %c5_264 : index + %c0_265 = constant 0 : index + %c16_266 = constant 16 : index + %502 = remi_signed %arg5, %c16_266 : index + %c0_267 = constant 0 : index + %503 = cmpi "slt", %502, %c0_267 : index + %504 = addi %502, %c16_266 : index + %505 = select %503, %504, %502 : index + %c8_268 = constant 8 : index + %c0_269 = constant 0 : index + %c-1_270 = constant -1 : index + %506 = cmpi "slt", %505, %c0_269 : index + %507 = subi %c-1_270, %505 : index + %508 = select %506, %507, %505 : index + %509 = divi_signed %508, %c8_268 : index + %510 = subi %c-1_270, %509 : index + %511 = select %506, %510, %509 : index + %c2_271 = constant 2 : index + %512 = remi_signed %511, %c2_271 : index + %c0_272 = constant 0 : index + %513 = cmpi "slt", %512, %c0_272 : index + %514 = addi %512, %c2_271 : index + %515 = select %513, %514, %512 : index + %516 = load %2[%501, %c0_265, %515] : memref<16x6x2xvector<8xf32>> + %517 = addf %479, %516 : vector<8xf32> + store %517, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %518 = addi %arg3, %arg5 : index + %c88 = constant 88 : index + %519 = addi %518, %c88 : index + %520 = vector.transfer_read %arg2[%arg4, %519], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c88_273 = constant 88 : index + %521 = addi %arg5, %c88_273 : index + %c16_274 = constant 16 : index + %c0_275 = constant 0 : index + %c-1_276 = constant -1 : index + %522 = cmpi "slt", %521, %c0_275 : index + %523 = subi %c-1_276, %521 : index + %524 = select %522, %523, %521 : index + %525 = divi_signed %524, %c16_274 : index + %526 = subi %c-1_276, %525 : index + %527 = select %522, %526, %525 : index + %c16_277 = constant 16 : index + %528 = remi_signed %527, %c16_277 : index + %c0_278 = constant 0 : index + %529 = cmpi "slt", %528, %c0_278 : index + %530 = addi %528, %c16_277 : index + %531 = select %529, %530, %528 : index + %c0_279 = constant 0 : index + %c8_280 = constant 8 : index + %c0_281 = constant 0 : index + %c-1_282 = constant -1 : index + %532 = cmpi "slt", %arg5, %c0_281 : index + %533 = subi %c-1_282, %arg5 : index + %534 = select %532, %533, %arg5 : index + %535 = divi_signed %534, %c8_280 : index + %536 = subi %c-1_282, %535 : index + %537 = select %532, %536, %535 : index + %c88_283 = constant 88 : index + %538 = addi %arg5, %c88_283 : index + %c16_284 = constant 16 : index + %c0_285 = constant 0 : index + %c-1_286 = constant -1 : index + %539 = cmpi "slt", %538, %c0_285 : index + %540 = subi %c-1_286, %538 : index + %541 = select %539, %540, %538 : index + %542 = divi_signed %541, %c16_284 : index + %543 = subi %c-1_286, %542 : index + %544 = select %539, %543, %542 : index + %c-2_287 = constant -2 : index + %545 = muli %544, %c-2_287 : index + %546 = addi %537, %545 : index + %c8_288 = constant 8 : index + %c0_289 = constant 0 : index + %c-1_290 = constant -1 : index + %547 = cmpi "slt", %arg5, %c0_289 : index + %548 = subi %c-1_290, %arg5 : index + %549 = select %547, %548, %arg5 : index + %550 = divi_signed %549, %c8_288 : index + %551 = subi %c-1_290, %550 : index + %552 = select %547, %551, %550 : index + %c88_291 = constant 88 : index + %553 = addi %arg5, %c88_291 : index + %c16_292 = constant 16 : index + %c0_293 = constant 0 : index + %c-1_294 = constant -1 : index + %554 = cmpi "slt", %553, %c0_293 : index + %555 = subi %c-1_294, %553 : index + %556 = select %554, %555, %553 : index + %557 = divi_signed 
%556, %c16_292 : index + %558 = subi %c-1_294, %557 : index + %559 = select %554, %558, %557 : index + %c-2_295 = constant -2 : index + %560 = muli %559, %c-2_295 : index + %561 = addi %552, %560 : index + %c11_296 = constant 11 : index + %562 = addi %561, %c11_296 : index + %c2_297 = constant 2 : index + %c0_298 = constant 0 : index + %c-1_299 = constant -1 : index + %563 = cmpi "slt", %562, %c0_298 : index + %564 = subi %c-1_299, %562 : index + %565 = select %563, %564, %562 : index + %566 = divi_signed %565, %c2_297 : index + %567 = subi %c-1_299, %566 : index + %568 = select %563, %567, %566 : index + %c-2_300 = constant -2 : index + %569 = muli %568, %c-2_300 : index + %570 = addi %546, %569 : index + %c11_301 = constant 11 : index + %571 = addi %570, %c11_301 : index + %572 = load %2[%531, %c0_279, %571] : memref<16x6x2xvector<8xf32>> + %573 = addf %520, %572 : vector<8xf32> + store %573, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %574 = addi %arg3, %arg5 : index + %c96 = constant 96 : index + %575 = addi %574, %c96 : index + %576 = vector.transfer_read %arg2[%arg4, %575], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_302 = constant 16 : index + %c0_303 = constant 0 : index + %c-1_304 = constant -1 : index + %577 = cmpi "slt", %arg5, %c0_303 : index + %578 = subi %c-1_304, %arg5 : index + %579 = select %577, %578, %arg5 : index + %580 = divi_signed %579, %c16_302 : index + %581 = subi %c-1_304, %580 : index + %582 = select %577, %581, %580 : index + %c16_305 = constant 16 : index + %c0_306 = constant 0 : index + %c-1_307 = constant -1 : index + %583 = cmpi "slt", %arg5, %c0_306 : index + %584 = subi %c-1_307, %arg5 : index + %585 = select %583, %584, %arg5 : index + %586 = divi_signed %585, %c16_305 : index + %587 = subi %c-1_307, %586 : index + %588 = select %583, %587, %586 : index + %c6_308 = constant 6 : index + %589 = addi %588, %c6_308 : index + %c16_309 = constant 16 : index + %c0_310 = constant 0 : index + %c-1_311 = constant -1 : index + %590 = cmpi "slt", %589, %c0_310 : index + %591 = subi %c-1_311, %589 : index + %592 = select %590, %591, %589 : index + %593 = divi_signed %592, %c16_309 : index + %594 = subi %c-1_311, %593 : index + %595 = select %590, %594, %593 : index + %c-16_312 = constant -16 : index + %596 = muli %595, %c-16_312 : index + %597 = addi %582, %596 : index + %c6_313 = constant 6 : index + %598 = addi %597, %c6_313 : index + %c0_314 = constant 0 : index + %c16_315 = constant 16 : index + %599 = remi_signed %arg5, %c16_315 : index + %c0_316 = constant 0 : index + %600 = cmpi "slt", %599, %c0_316 : index + %601 = addi %599, %c16_315 : index + %602 = select %600, %601, %599 : index + %c8_317 = constant 8 : index + %c0_318 = constant 0 : index + %c-1_319 = constant -1 : index + %603 = cmpi "slt", %602, %c0_318 : index + %604 = subi %c-1_319, %602 : index + %605 = select %603, %604, %602 : index + %606 = divi_signed %605, %c8_317 : index + %607 = subi %c-1_319, %606 : index + %608 = select %603, %607, %606 : index + %c2_320 = constant 2 : index + %609 = remi_signed %608, %c2_320 : index + %c0_321 = constant 0 : index + %610 = cmpi "slt", %609, %c0_321 : index + %611 = addi %609, %c2_320 : index + %612 = select %610, %611, %609 : index + %613 = load %2[%598, %c0_314, %612] : memref<16x6x2xvector<8xf32>> + %614 = addf %576, %613 : vector<8xf32> + store %614, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %615 = addi %arg3, %arg5 : index + %c104 = constant 104 : index + %616 = addi %615, %c104 : 
index + %617 = vector.transfer_read %arg2[%arg4, %616], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c104_322 = constant 104 : index + %618 = addi %arg5, %c104_322 : index + %c16_323 = constant 16 : index + %c0_324 = constant 0 : index + %c-1_325 = constant -1 : index + %619 = cmpi "slt", %618, %c0_324 : index + %620 = subi %c-1_325, %618 : index + %621 = select %619, %620, %618 : index + %622 = divi_signed %621, %c16_323 : index + %623 = subi %c-1_325, %622 : index + %624 = select %619, %623, %622 : index + %c16_326 = constant 16 : index + %625 = remi_signed %624, %c16_326 : index + %c0_327 = constant 0 : index + %626 = cmpi "slt", %625, %c0_327 : index + %627 = addi %625, %c16_326 : index + %628 = select %626, %627, %625 : index + %c0_328 = constant 0 : index + %c8_329 = constant 8 : index + %c0_330 = constant 0 : index + %c-1_331 = constant -1 : index + %629 = cmpi "slt", %arg5, %c0_330 : index + %630 = subi %c-1_331, %arg5 : index + %631 = select %629, %630, %arg5 : index + %632 = divi_signed %631, %c8_329 : index + %633 = subi %c-1_331, %632 : index + %634 = select %629, %633, %632 : index + %c104_332 = constant 104 : index + %635 = addi %arg5, %c104_332 : index + %c16_333 = constant 16 : index + %c0_334 = constant 0 : index + %c-1_335 = constant -1 : index + %636 = cmpi "slt", %635, %c0_334 : index + %637 = subi %c-1_335, %635 : index + %638 = select %636, %637, %635 : index + %639 = divi_signed %638, %c16_333 : index + %640 = subi %c-1_335, %639 : index + %641 = select %636, %640, %639 : index + %c-2_336 = constant -2 : index + %642 = muli %641, %c-2_336 : index + %643 = addi %634, %642 : index + %c8_337 = constant 8 : index + %c0_338 = constant 0 : index + %c-1_339 = constant -1 : index + %644 = cmpi "slt", %arg5, %c0_338 : index + %645 = subi %c-1_339, %arg5 : index + %646 = select %644, %645, %arg5 : index + %647 = divi_signed %646, %c8_337 : index + %648 = subi %c-1_339, %647 : index + %649 = select %644, %648, %647 : index + %c104_340 = constant 104 : index + %650 = addi %arg5, %c104_340 : index + %c16_341 = constant 16 : index + %c0_342 = constant 0 : index + %c-1_343 = constant -1 : index + %651 = cmpi "slt", %650, %c0_342 : index + %652 = subi %c-1_343, %650 : index + %653 = select %651, %652, %650 : index + %654 = divi_signed %653, %c16_341 : index + %655 = subi %c-1_343, %654 : index + %656 = select %651, %655, %654 : index + %c-2_344 = constant -2 : index + %657 = muli %656, %c-2_344 : index + %658 = addi %649, %657 : index + %c13_345 = constant 13 : index + %659 = addi %658, %c13_345 : index + %c2_346 = constant 2 : index + %c0_347 = constant 0 : index + %c-1_348 = constant -1 : index + %660 = cmpi "slt", %659, %c0_347 : index + %661 = subi %c-1_348, %659 : index + %662 = select %660, %661, %659 : index + %663 = divi_signed %662, %c2_346 : index + %664 = subi %c-1_348, %663 : index + %665 = select %660, %664, %663 : index + %c-2_349 = constant -2 : index + %666 = muli %665, %c-2_349 : index + %667 = addi %643, %666 : index + %c13_350 = constant 13 : index + %668 = addi %667, %c13_350 : index + %669 = load %2[%628, %c0_328, %668] : memref<16x6x2xvector<8xf32>> + %670 = addf %617, %669 : vector<8xf32> + store %670, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %671 = addi %arg3, %arg5 : index + %c112 = constant 112 : index + %672 = addi %671, %c112 : index + %673 = vector.transfer_read %arg2[%arg4, %672], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + 
%c16_351 = constant 16 : index + %c0_352 = constant 0 : index + %c-1_353 = constant -1 : index + %674 = cmpi "slt", %arg5, %c0_352 : index + %675 = subi %c-1_353, %arg5 : index + %676 = select %674, %675, %arg5 : index + %677 = divi_signed %676, %c16_351 : index + %678 = subi %c-1_353, %677 : index + %679 = select %674, %678, %677 : index + %c16_354 = constant 16 : index + %c0_355 = constant 0 : index + %c-1_356 = constant -1 : index + %680 = cmpi "slt", %arg5, %c0_355 : index + %681 = subi %c-1_356, %arg5 : index + %682 = select %680, %681, %arg5 : index + %683 = divi_signed %682, %c16_354 : index + %684 = subi %c-1_356, %683 : index + %685 = select %680, %684, %683 : index + %c7_357 = constant 7 : index + %686 = addi %685, %c7_357 : index + %c16_358 = constant 16 : index + %c0_359 = constant 0 : index + %c-1_360 = constant -1 : index + %687 = cmpi "slt", %686, %c0_359 : index + %688 = subi %c-1_360, %686 : index + %689 = select %687, %688, %686 : index + %690 = divi_signed %689, %c16_358 : index + %691 = subi %c-1_360, %690 : index + %692 = select %687, %691, %690 : index + %c-16_361 = constant -16 : index + %693 = muli %692, %c-16_361 : index + %694 = addi %679, %693 : index + %c7_362 = constant 7 : index + %695 = addi %694, %c7_362 : index + %c0_363 = constant 0 : index + %c16_364 = constant 16 : index + %696 = remi_signed %arg5, %c16_364 : index + %c0_365 = constant 0 : index + %697 = cmpi "slt", %696, %c0_365 : index + %698 = addi %696, %c16_364 : index + %699 = select %697, %698, %696 : index + %c8_366 = constant 8 : index + %c0_367 = constant 0 : index + %c-1_368 = constant -1 : index + %700 = cmpi "slt", %699, %c0_367 : index + %701 = subi %c-1_368, %699 : index + %702 = select %700, %701, %699 : index + %703 = divi_signed %702, %c8_366 : index + %704 = subi %c-1_368, %703 : index + %705 = select %700, %704, %703 : index + %c2_369 = constant 2 : index + %706 = remi_signed %705, %c2_369 : index + %c0_370 = constant 0 : index + %707 = cmpi "slt", %706, %c0_370 : index + %708 = addi %706, %c2_369 : index + %709 = select %707, %708, %706 : index + %710 = load %2[%695, %c0_363, %709] : memref<16x6x2xvector<8xf32>> + %711 = addf %673, %710 : vector<8xf32> + store %711, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %712 = addi %arg3, %arg5 : index + %c120 = constant 120 : index + %713 = addi %712, %c120 : index + %714 = vector.transfer_read %arg2[%arg4, %713], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c120_371 = constant 120 : index + %715 = addi %arg5, %c120_371 : index + %c16_372 = constant 16 : index + %c0_373 = constant 0 : index + %c-1_374 = constant -1 : index + %716 = cmpi "slt", %715, %c0_373 : index + %717 = subi %c-1_374, %715 : index + %718 = select %716, %717, %715 : index + %719 = divi_signed %718, %c16_372 : index + %720 = subi %c-1_374, %719 : index + %721 = select %716, %720, %719 : index + %c16_375 = constant 16 : index + %722 = remi_signed %721, %c16_375 : index + %c0_376 = constant 0 : index + %723 = cmpi "slt", %722, %c0_376 : index + %724 = addi %722, %c16_375 : index + %725 = select %723, %724, %722 : index + %c0_377 = constant 0 : index + %c8_378 = constant 8 : index + %c0_379 = constant 0 : index + %c-1_380 = constant -1 : index + %726 = cmpi "slt", %arg5, %c0_379 : index + %727 = subi %c-1_380, %arg5 : index + %728 = select %726, %727, %arg5 : index + %729 = divi_signed %728, %c8_378 : index + %730 = subi %c-1_380, %729 : index + %731 = select %726, %730, %729 : index + %c120_381 = constant 120 : index 
+ %732 = addi %arg5, %c120_381 : index + %c16_382 = constant 16 : index + %c0_383 = constant 0 : index + %c-1_384 = constant -1 : index + %733 = cmpi "slt", %732, %c0_383 : index + %734 = subi %c-1_384, %732 : index + %735 = select %733, %734, %732 : index + %736 = divi_signed %735, %c16_382 : index + %737 = subi %c-1_384, %736 : index + %738 = select %733, %737, %736 : index + %c-2_385 = constant -2 : index + %739 = muli %738, %c-2_385 : index + %740 = addi %731, %739 : index + %c8_386 = constant 8 : index + %c0_387 = constant 0 : index + %c-1_388 = constant -1 : index + %741 = cmpi "slt", %arg5, %c0_387 : index + %742 = subi %c-1_388, %arg5 : index + %743 = select %741, %742, %arg5 : index + %744 = divi_signed %743, %c8_386 : index + %745 = subi %c-1_388, %744 : index + %746 = select %741, %745, %744 : index + %c120_389 = constant 120 : index + %747 = addi %arg5, %c120_389 : index + %c16_390 = constant 16 : index + %c0_391 = constant 0 : index + %c-1_392 = constant -1 : index + %748 = cmpi "slt", %747, %c0_391 : index + %749 = subi %c-1_392, %747 : index + %750 = select %748, %749, %747 : index + %751 = divi_signed %750, %c16_390 : index + %752 = subi %c-1_392, %751 : index + %753 = select %748, %752, %751 : index + %c-2_393 = constant -2 : index + %754 = muli %753, %c-2_393 : index + %755 = addi %746, %754 : index + %c15_394 = constant 15 : index + %756 = addi %755, %c15_394 : index + %c2_395 = constant 2 : index + %c0_396 = constant 0 : index + %c-1_397 = constant -1 : index + %757 = cmpi "slt", %756, %c0_396 : index + %758 = subi %c-1_397, %756 : index + %759 = select %757, %758, %756 : index + %760 = divi_signed %759, %c2_395 : index + %761 = subi %c-1_397, %760 : index + %762 = select %757, %761, %760 : index + %c-2_398 = constant -2 : index + %763 = muli %762, %c-2_398 : index + %764 = addi %740, %763 : index + %c15_399 = constant 15 : index + %765 = addi %764, %c15_399 : index + %766 = load %2[%725, %c0_377, %765] : memref<16x6x2xvector<8xf32>> + %767 = addf %714, %766 : vector<8xf32> + store %767, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + %c0_400 = constant 0 : index + %c16_401 = constant 16 : index + %c1_402 = constant 1 : index + scf.for %arg6 = %c0_400 to %c16_401 step %c1_402 { + %768 = addi %arg3, %arg5 : index + %c8_403 = constant 8 : index + %769 = muli %arg6, %c8_403 : index + %770 = addi %768, %769 : index + %771 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %771, %arg2[%arg4, %770] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %5 = addi %arg3, %arg5 : index + %6 = vector.transfer_read %arg2[%arg4, %5], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_16 = constant 16 : index + %c0_17 = constant 0 : index + %c-1 = constant -1 : index + %7 = cmpi "slt", %arg5, %c0_17 : index + %8 = subi %c-1, %arg5 : index + %9 = select %7, %8, %arg5 : index + %10 = divi_signed %9, %c16_16 : index + %11 = subi %c-1, %10 : index + %12 = select %7, %11, %10 : index + %c16_18 = constant 16 : index + %13 = remi_signed %12, %c16_18 : index + %c0_19 = constant 0 : index + %14 = cmpi "slt", %13, %c0_19 : index + %15 = addi %13, %c16_18 : index + %16 = select %14, %15, %13 : index + %c0_20 = constant 0 : index + %c16_21 = constant 16 : index + %17 = remi_signed %arg5, %c16_21 : index + %c0_22 = constant 0 : index + %18 = cmpi "slt", %17, %c0_22 : index + %19 = addi %17, %c16_21 : index + %20 = select %18, %19, %17 : index + %c8_23 = constant 8 : index + 
%c0_24 = constant 0 : index + %c-1_25 = constant -1 : index + %21 = cmpi "slt", %20, %c0_24 : index + %22 = subi %c-1_25, %20 : index + %23 = select %21, %22, %20 : index + %24 = divi_signed %23, %c8_23 : index + %25 = subi %c-1_25, %24 : index + %26 = select %21, %25, %24 : index + %c2_26 = constant 2 : index + %27 = remi_signed %26, %c2_26 : index + %c0_27 = constant 0 : index + %28 = cmpi "slt", %27, %c0_27 : index + %29 = addi %27, %c2_26 : index + %30 = select %28, %29, %27 : index + %31 = load %2[%16, %c0_20, %30] : memref<16x6x2xvector<8xf32>> + %32 = addf %6, %31 : vector<8xf32> + store %32, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %33 = addi %arg3, %arg5 : index + %c8_28 = constant 8 : index + %34 = addi %33, %c8_28 : index + %35 = vector.transfer_read %arg2[%arg4, %34], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c8_29 = constant 8 : index + %36 = addi %arg5, %c8_29 : index + %c16_30 = constant 16 : index + %c0_31 = constant 0 : index + %c-1_32 = constant -1 : index + %37 = cmpi "slt", %36, %c0_31 : index + %38 = subi %c-1_32, %36 : index + %39 = select %37, %38, %36 : index + %40 = divi_signed %39, %c16_30 : index + %41 = subi %c-1_32, %40 : index + %42 = select %37, %41, %40 : index + %c16_33 = constant 16 : index + %43 = remi_signed %42, %c16_33 : index + %c0_34 = constant 0 : index + %44 = cmpi "slt", %43, %c0_34 : index + %45 = addi %43, %c16_33 : index + %46 = select %44, %45, %43 : index + %c0_35 = constant 0 : index + %c8_36 = constant 8 : index + %c0_37 = constant 0 : index + %c-1_38 = constant -1 : index + %47 = cmpi "slt", %arg5, %c0_37 : index + %48 = subi %c-1_38, %arg5 : index + %49 = select %47, %48, %arg5 : index + %50 = divi_signed %49, %c8_36 : index + %51 = subi %c-1_38, %50 : index + %52 = select %47, %51, %50 : index + %c8_39 = constant 8 : index + %53 = addi %arg5, %c8_39 : index + %c16_40 = constant 16 : index + %c0_41 = constant 0 : index + %c-1_42 = constant -1 : index + %54 = cmpi "slt", %53, %c0_41 : index + %55 = subi %c-1_42, %53 : index + %56 = select %54, %55, %53 : index + %57 = divi_signed %56, %c16_40 : index + %58 = subi %c-1_42, %57 : index + %59 = select %54, %58, %57 : index + %c-2 = constant -2 : index + %60 = muli %59, %c-2 : index + %61 = addi %52, %60 : index + %c8_43 = constant 8 : index + %c0_44 = constant 0 : index + %c-1_45 = constant -1 : index + %62 = cmpi "slt", %arg5, %c0_44 : index + %63 = subi %c-1_45, %arg5 : index + %64 = select %62, %63, %arg5 : index + %65 = divi_signed %64, %c8_43 : index + %66 = subi %c-1_45, %65 : index + %67 = select %62, %66, %65 : index + %c8_46 = constant 8 : index + %68 = addi %arg5, %c8_46 : index + %c16_47 = constant 16 : index + %c0_48 = constant 0 : index + %c-1_49 = constant -1 : index + %69 = cmpi "slt", %68, %c0_48 : index + %70 = subi %c-1_49, %68 : index + %71 = select %69, %70, %68 : index + %72 = divi_signed %71, %c16_47 : index + %73 = subi %c-1_49, %72 : index + %74 = select %69, %73, %72 : index + %c-2_50 = constant -2 : index + %75 = muli %74, %c-2_50 : index + %76 = addi %67, %75 : index + %c1_51 = constant 1 : index + %77 = addi %76, %c1_51 : index + %c2_52 = constant 2 : index + %c0_53 = constant 0 : index + %c-1_54 = constant -1 : index + %78 = cmpi "slt", %77, %c0_53 : index + %79 = subi %c-1_54, %77 : index + %80 = select %78, %79, %77 : index + %81 = divi_signed %80, %c2_52 : index + %82 = subi %c-1_54, %81 : index + %83 = select %78, %82, %81 : index + %c-2_55 = constant -2 : index + %84 = muli %83, %c-2_55 : index + %85 = addi %61, 
%84 : index + %c1_56 = constant 1 : index + %86 = addi %85, %c1_56 : index + %87 = load %2[%46, %c0_35, %86] : memref<16x6x2xvector<8xf32>> + %88 = addf %35, %87 : vector<8xf32> + store %88, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %89 = addi %arg3, %arg5 : index + %c16_57 = constant 16 : index + %90 = addi %89, %c16_57 : index + %91 = vector.transfer_read %arg2[%arg4, %90], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_58 = constant 16 : index + %c0_59 = constant 0 : index + %c-1_60 = constant -1 : index + %92 = cmpi "slt", %arg5, %c0_59 : index + %93 = subi %c-1_60, %arg5 : index + %94 = select %92, %93, %arg5 : index + %95 = divi_signed %94, %c16_58 : index + %96 = subi %c-1_60, %95 : index + %97 = select %92, %96, %95 : index + %c16_61 = constant 16 : index + %c0_62 = constant 0 : index + %c-1_63 = constant -1 : index + %98 = cmpi "slt", %arg5, %c0_62 : index + %99 = subi %c-1_63, %arg5 : index + %100 = select %98, %99, %arg5 : index + %101 = divi_signed %100, %c16_61 : index + %102 = subi %c-1_63, %101 : index + %103 = select %98, %102, %101 : index + %c1_64 = constant 1 : index + %104 = addi %103, %c1_64 : index + %c16_65 = constant 16 : index + %c0_66 = constant 0 : index + %c-1_67 = constant -1 : index + %105 = cmpi "slt", %104, %c0_66 : index + %106 = subi %c-1_67, %104 : index + %107 = select %105, %106, %104 : index + %108 = divi_signed %107, %c16_65 : index + %109 = subi %c-1_67, %108 : index + %110 = select %105, %109, %108 : index + %c-16 = constant -16 : index + %111 = muli %110, %c-16 : index + %112 = addi %97, %111 : index + %c1_68 = constant 1 : index + %113 = addi %112, %c1_68 : index + %c0_69 = constant 0 : index + %c16_70 = constant 16 : index + %114 = remi_signed %arg5, %c16_70 : index + %c0_71 = constant 0 : index + %115 = cmpi "slt", %114, %c0_71 : index + %116 = addi %114, %c16_70 : index + %117 = select %115, %116, %114 : index + %c8_72 = constant 8 : index + %c0_73 = constant 0 : index + %c-1_74 = constant -1 : index + %118 = cmpi "slt", %117, %c0_73 : index + %119 = subi %c-1_74, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c8_72 : index + %122 = subi %c-1_74, %121 : index + %123 = select %118, %122, %121 : index + %c2_75 = constant 2 : index + %124 = remi_signed %123, %c2_75 : index + %c0_76 = constant 0 : index + %125 = cmpi "slt", %124, %c0_76 : index + %126 = addi %124, %c2_75 : index + %127 = select %125, %126, %124 : index + %128 = load %2[%113, %c0_69, %127] : memref<16x6x2xvector<8xf32>> + %129 = addf %91, %128 : vector<8xf32> + store %129, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %130 = addi %arg3, %arg5 : index + %c24 = constant 24 : index + %131 = addi %130, %c24 : index + %132 = vector.transfer_read %arg2[%arg4, %131], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c24_77 = constant 24 : index + %133 = addi %arg5, %c24_77 : index + %c16_78 = constant 16 : index + %c0_79 = constant 0 : index + %c-1_80 = constant -1 : index + %134 = cmpi "slt", %133, %c0_79 : index + %135 = subi %c-1_80, %133 : index + %136 = select %134, %135, %133 : index + %137 = divi_signed %136, %c16_78 : index + %138 = subi %c-1_80, %137 : index + %139 = select %134, %138, %137 : index + %c16_81 = constant 16 : index + %140 = remi_signed %139, %c16_81 : index + %c0_82 = constant 0 : index + %141 = cmpi "slt", %140, %c0_82 : index + %142 = addi %140, %c16_81 : index + %143 = select %141, %142, %140 : index + %c0_83 = constant 0 : index + %c8_84 = constant 8 
: index + %c0_85 = constant 0 : index + %c-1_86 = constant -1 : index + %144 = cmpi "slt", %arg5, %c0_85 : index + %145 = subi %c-1_86, %arg5 : index + %146 = select %144, %145, %arg5 : index + %147 = divi_signed %146, %c8_84 : index + %148 = subi %c-1_86, %147 : index + %149 = select %144, %148, %147 : index + %c24_87 = constant 24 : index + %150 = addi %arg5, %c24_87 : index + %c16_88 = constant 16 : index + %c0_89 = constant 0 : index + %c-1_90 = constant -1 : index + %151 = cmpi "slt", %150, %c0_89 : index + %152 = subi %c-1_90, %150 : index + %153 = select %151, %152, %150 : index + %154 = divi_signed %153, %c16_88 : index + %155 = subi %c-1_90, %154 : index + %156 = select %151, %155, %154 : index + %c-2_91 = constant -2 : index + %157 = muli %156, %c-2_91 : index + %158 = addi %149, %157 : index + %c8_92 = constant 8 : index + %c0_93 = constant 0 : index + %c-1_94 = constant -1 : index + %159 = cmpi "slt", %arg5, %c0_93 : index + %160 = subi %c-1_94, %arg5 : index + %161 = select %159, %160, %arg5 : index + %162 = divi_signed %161, %c8_92 : index + %163 = subi %c-1_94, %162 : index + %164 = select %159, %163, %162 : index + %c24_95 = constant 24 : index + %165 = addi %arg5, %c24_95 : index + %c16_96 = constant 16 : index + %c0_97 = constant 0 : index + %c-1_98 = constant -1 : index + %166 = cmpi "slt", %165, %c0_97 : index + %167 = subi %c-1_98, %165 : index + %168 = select %166, %167, %165 : index + %169 = divi_signed %168, %c16_96 : index + %170 = subi %c-1_98, %169 : index + %171 = select %166, %170, %169 : index + %c-2_99 = constant -2 : index + %172 = muli %171, %c-2_99 : index + %173 = addi %164, %172 : index + %c3_100 = constant 3 : index + %174 = addi %173, %c3_100 : index + %c2_101 = constant 2 : index + %c0_102 = constant 0 : index + %c-1_103 = constant -1 : index + %175 = cmpi "slt", %174, %c0_102 : index + %176 = subi %c-1_103, %174 : index + %177 = select %175, %176, %174 : index + %178 = divi_signed %177, %c2_101 : index + %179 = subi %c-1_103, %178 : index + %180 = select %175, %179, %178 : index + %c-2_104 = constant -2 : index + %181 = muli %180, %c-2_104 : index + %182 = addi %158, %181 : index + %c3_105 = constant 3 : index + %183 = addi %182, %c3_105 : index + %184 = load %2[%143, %c0_83, %183] : memref<16x6x2xvector<8xf32>> + %185 = addf %132, %184 : vector<8xf32> + store %185, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %186 = addi %arg3, %arg5 : index + %c32 = constant 32 : index + %187 = addi %186, %c32 : index + %188 = vector.transfer_read %arg2[%arg4, %187], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_106 = constant 16 : index + %c0_107 = constant 0 : index + %c-1_108 = constant -1 : index + %189 = cmpi "slt", %arg5, %c0_107 : index + %190 = subi %c-1_108, %arg5 : index + %191 = select %189, %190, %arg5 : index + %192 = divi_signed %191, %c16_106 : index + %193 = subi %c-1_108, %192 : index + %194 = select %189, %193, %192 : index + %c16_109 = constant 16 : index + %c0_110 = constant 0 : index + %c-1_111 = constant -1 : index + %195 = cmpi "slt", %arg5, %c0_110 : index + %196 = subi %c-1_111, %arg5 : index + %197 = select %195, %196, %arg5 : index + %198 = divi_signed %197, %c16_109 : index + %199 = subi %c-1_111, %198 : index + %200 = select %195, %199, %198 : index + %c2_112 = constant 2 : index + %201 = addi %200, %c2_112 : index + %c16_113 = constant 16 : index + %c0_114 = constant 0 : index + %c-1_115 = constant -1 : index + %202 = cmpi "slt", %201, %c0_114 : index + %203 = subi %c-1_115, %201 : index + 
%204 = select %202, %203, %201 : index + %205 = divi_signed %204, %c16_113 : index + %206 = subi %c-1_115, %205 : index + %207 = select %202, %206, %205 : index + %c-16_116 = constant -16 : index + %208 = muli %207, %c-16_116 : index + %209 = addi %194, %208 : index + %c2_117 = constant 2 : index + %210 = addi %209, %c2_117 : index + %c0_118 = constant 0 : index + %c16_119 = constant 16 : index + %211 = remi_signed %arg5, %c16_119 : index + %c0_120 = constant 0 : index + %212 = cmpi "slt", %211, %c0_120 : index + %213 = addi %211, %c16_119 : index + %214 = select %212, %213, %211 : index + %c8_121 = constant 8 : index + %c0_122 = constant 0 : index + %c-1_123 = constant -1 : index + %215 = cmpi "slt", %214, %c0_122 : index + %216 = subi %c-1_123, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c8_121 : index + %219 = subi %c-1_123, %218 : index + %220 = select %215, %219, %218 : index + %c2_124 = constant 2 : index + %221 = remi_signed %220, %c2_124 : index + %c0_125 = constant 0 : index + %222 = cmpi "slt", %221, %c0_125 : index + %223 = addi %221, %c2_124 : index + %224 = select %222, %223, %221 : index + %225 = load %2[%210, %c0_118, %224] : memref<16x6x2xvector<8xf32>> + %226 = addf %188, %225 : vector<8xf32> + store %226, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %227 = addi %arg3, %arg5 : index + %c40 = constant 40 : index + %228 = addi %227, %c40 : index + %229 = vector.transfer_read %arg2[%arg4, %228], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c40_126 = constant 40 : index + %230 = addi %arg5, %c40_126 : index + %c16_127 = constant 16 : index + %c0_128 = constant 0 : index + %c-1_129 = constant -1 : index + %231 = cmpi "slt", %230, %c0_128 : index + %232 = subi %c-1_129, %230 : index + %233 = select %231, %232, %230 : index + %234 = divi_signed %233, %c16_127 : index + %235 = subi %c-1_129, %234 : index + %236 = select %231, %235, %234 : index + %c16_130 = constant 16 : index + %237 = remi_signed %236, %c16_130 : index + %c0_131 = constant 0 : index + %238 = cmpi "slt", %237, %c0_131 : index + %239 = addi %237, %c16_130 : index + %240 = select %238, %239, %237 : index + %c0_132 = constant 0 : index + %c8_133 = constant 8 : index + %c0_134 = constant 0 : index + %c-1_135 = constant -1 : index + %241 = cmpi "slt", %arg5, %c0_134 : index + %242 = subi %c-1_135, %arg5 : index + %243 = select %241, %242, %arg5 : index + %244 = divi_signed %243, %c8_133 : index + %245 = subi %c-1_135, %244 : index + %246 = select %241, %245, %244 : index + %c40_136 = constant 40 : index + %247 = addi %arg5, %c40_136 : index + %c16_137 = constant 16 : index + %c0_138 = constant 0 : index + %c-1_139 = constant -1 : index + %248 = cmpi "slt", %247, %c0_138 : index + %249 = subi %c-1_139, %247 : index + %250 = select %248, %249, %247 : index + %251 = divi_signed %250, %c16_137 : index + %252 = subi %c-1_139, %251 : index + %253 = select %248, %252, %251 : index + %c-2_140 = constant -2 : index + %254 = muli %253, %c-2_140 : index + %255 = addi %246, %254 : index + %c8_141 = constant 8 : index + %c0_142 = constant 0 : index + %c-1_143 = constant -1 : index + %256 = cmpi "slt", %arg5, %c0_142 : index + %257 = subi %c-1_143, %arg5 : index + %258 = select %256, %257, %arg5 : index + %259 = divi_signed %258, %c8_141 : index + %260 = subi %c-1_143, %259 : index + %261 = select %256, %260, %259 : index + %c40_144 = constant 40 : index + %262 = addi %arg5, %c40_144 : index + %c16_145 = constant 16 : index + %c0_146 = constant 0 : index + 
%c-1_147 = constant -1 : index + %263 = cmpi "slt", %262, %c0_146 : index + %264 = subi %c-1_147, %262 : index + %265 = select %263, %264, %262 : index + %266 = divi_signed %265, %c16_145 : index + %267 = subi %c-1_147, %266 : index + %268 = select %263, %267, %266 : index + %c-2_148 = constant -2 : index + %269 = muli %268, %c-2_148 : index + %270 = addi %261, %269 : index + %c5_149 = constant 5 : index + %271 = addi %270, %c5_149 : index + %c2_150 = constant 2 : index + %c0_151 = constant 0 : index + %c-1_152 = constant -1 : index + %272 = cmpi "slt", %271, %c0_151 : index + %273 = subi %c-1_152, %271 : index + %274 = select %272, %273, %271 : index + %275 = divi_signed %274, %c2_150 : index + %276 = subi %c-1_152, %275 : index + %277 = select %272, %276, %275 : index + %c-2_153 = constant -2 : index + %278 = muli %277, %c-2_153 : index + %279 = addi %255, %278 : index + %c5_154 = constant 5 : index + %280 = addi %279, %c5_154 : index + %281 = load %2[%240, %c0_132, %280] : memref<16x6x2xvector<8xf32>> + %282 = addf %229, %281 : vector<8xf32> + store %282, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %283 = addi %arg3, %arg5 : index + %c48 = constant 48 : index + %284 = addi %283, %c48 : index + %285 = vector.transfer_read %arg2[%arg4, %284], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_155 = constant 16 : index + %c0_156 = constant 0 : index + %c-1_157 = constant -1 : index + %286 = cmpi "slt", %arg5, %c0_156 : index + %287 = subi %c-1_157, %arg5 : index + %288 = select %286, %287, %arg5 : index + %289 = divi_signed %288, %c16_155 : index + %290 = subi %c-1_157, %289 : index + %291 = select %286, %290, %289 : index + %c16_158 = constant 16 : index + %c0_159 = constant 0 : index + %c-1_160 = constant -1 : index + %292 = cmpi "slt", %arg5, %c0_159 : index + %293 = subi %c-1_160, %arg5 : index + %294 = select %292, %293, %arg5 : index + %295 = divi_signed %294, %c16_158 : index + %296 = subi %c-1_160, %295 : index + %297 = select %292, %296, %295 : index + %c3_161 = constant 3 : index + %298 = addi %297, %c3_161 : index + %c16_162 = constant 16 : index + %c0_163 = constant 0 : index + %c-1_164 = constant -1 : index + %299 = cmpi "slt", %298, %c0_163 : index + %300 = subi %c-1_164, %298 : index + %301 = select %299, %300, %298 : index + %302 = divi_signed %301, %c16_162 : index + %303 = subi %c-1_164, %302 : index + %304 = select %299, %303, %302 : index + %c-16_165 = constant -16 : index + %305 = muli %304, %c-16_165 : index + %306 = addi %291, %305 : index + %c3_166 = constant 3 : index + %307 = addi %306, %c3_166 : index + %c0_167 = constant 0 : index + %c16_168 = constant 16 : index + %308 = remi_signed %arg5, %c16_168 : index + %c0_169 = constant 0 : index + %309 = cmpi "slt", %308, %c0_169 : index + %310 = addi %308, %c16_168 : index + %311 = select %309, %310, %308 : index + %c8_170 = constant 8 : index + %c0_171 = constant 0 : index + %c-1_172 = constant -1 : index + %312 = cmpi "slt", %311, %c0_171 : index + %313 = subi %c-1_172, %311 : index + %314 = select %312, %313, %311 : index + %315 = divi_signed %314, %c8_170 : index + %316 = subi %c-1_172, %315 : index + %317 = select %312, %316, %315 : index + %c2_173 = constant 2 : index + %318 = remi_signed %317, %c2_173 : index + %c0_174 = constant 0 : index + %319 = cmpi "slt", %318, %c0_174 : index + %320 = addi %318, %c2_173 : index + %321 = select %319, %320, %318 : index + %322 = load %2[%307, %c0_167, %321] : memref<16x6x2xvector<8xf32>> + %323 = addf %285, %322 : vector<8xf32> + store 
%323, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %324 = addi %arg3, %arg5 : index + %c56 = constant 56 : index + %325 = addi %324, %c56 : index + %326 = vector.transfer_read %arg2[%arg4, %325], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c56_175 = constant 56 : index + %327 = addi %arg5, %c56_175 : index + %c16_176 = constant 16 : index + %c0_177 = constant 0 : index + %c-1_178 = constant -1 : index + %328 = cmpi "slt", %327, %c0_177 : index + %329 = subi %c-1_178, %327 : index + %330 = select %328, %329, %327 : index + %331 = divi_signed %330, %c16_176 : index + %332 = subi %c-1_178, %331 : index + %333 = select %328, %332, %331 : index + %c16_179 = constant 16 : index + %334 = remi_signed %333, %c16_179 : index + %c0_180 = constant 0 : index + %335 = cmpi "slt", %334, %c0_180 : index + %336 = addi %334, %c16_179 : index + %337 = select %335, %336, %334 : index + %c0_181 = constant 0 : index + %c8_182 = constant 8 : index + %c0_183 = constant 0 : index + %c-1_184 = constant -1 : index + %338 = cmpi "slt", %arg5, %c0_183 : index + %339 = subi %c-1_184, %arg5 : index + %340 = select %338, %339, %arg5 : index + %341 = divi_signed %340, %c8_182 : index + %342 = subi %c-1_184, %341 : index + %343 = select %338, %342, %341 : index + %c56_185 = constant 56 : index + %344 = addi %arg5, %c56_185 : index + %c16_186 = constant 16 : index + %c0_187 = constant 0 : index + %c-1_188 = constant -1 : index + %345 = cmpi "slt", %344, %c0_187 : index + %346 = subi %c-1_188, %344 : index + %347 = select %345, %346, %344 : index + %348 = divi_signed %347, %c16_186 : index + %349 = subi %c-1_188, %348 : index + %350 = select %345, %349, %348 : index + %c-2_189 = constant -2 : index + %351 = muli %350, %c-2_189 : index + %352 = addi %343, %351 : index + %c8_190 = constant 8 : index + %c0_191 = constant 0 : index + %c-1_192 = constant -1 : index + %353 = cmpi "slt", %arg5, %c0_191 : index + %354 = subi %c-1_192, %arg5 : index + %355 = select %353, %354, %arg5 : index + %356 = divi_signed %355, %c8_190 : index + %357 = subi %c-1_192, %356 : index + %358 = select %353, %357, %356 : index + %c56_193 = constant 56 : index + %359 = addi %arg5, %c56_193 : index + %c16_194 = constant 16 : index + %c0_195 = constant 0 : index + %c-1_196 = constant -1 : index + %360 = cmpi "slt", %359, %c0_195 : index + %361 = subi %c-1_196, %359 : index + %362 = select %360, %361, %359 : index + %363 = divi_signed %362, %c16_194 : index + %364 = subi %c-1_196, %363 : index + %365 = select %360, %364, %363 : index + %c-2_197 = constant -2 : index + %366 = muli %365, %c-2_197 : index + %367 = addi %358, %366 : index + %c7_198 = constant 7 : index + %368 = addi %367, %c7_198 : index + %c2_199 = constant 2 : index + %c0_200 = constant 0 : index + %c-1_201 = constant -1 : index + %369 = cmpi "slt", %368, %c0_200 : index + %370 = subi %c-1_201, %368 : index + %371 = select %369, %370, %368 : index + %372 = divi_signed %371, %c2_199 : index + %373 = subi %c-1_201, %372 : index + %374 = select %369, %373, %372 : index + %c-2_202 = constant -2 : index + %375 = muli %374, %c-2_202 : index + %376 = addi %352, %375 : index + %c7_203 = constant 7 : index + %377 = addi %376, %c7_203 : index + %378 = load %2[%337, %c0_181, %377] : memref<16x6x2xvector<8xf32>> + %379 = addf %326, %378 : vector<8xf32> + store %379, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %380 = addi %arg3, %arg5 : index + %c64 = constant 64 : index + %381 = addi %380, %c64 : index + %382 = vector.transfer_read %arg2[%arg4, %381], %cst : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_204 = constant 16 : index + %c0_205 = constant 0 : index + %c-1_206 = constant -1 : index + %383 = cmpi "slt", %arg5, %c0_205 : index + %384 = subi %c-1_206, %arg5 : index + %385 = select %383, %384, %arg5 : index + %386 = divi_signed %385, %c16_204 : index + %387 = subi %c-1_206, %386 : index + %388 = select %383, %387, %386 : index + %c16_207 = constant 16 : index + %c0_208 = constant 0 : index + %c-1_209 = constant -1 : index + %389 = cmpi "slt", %arg5, %c0_208 : index + %390 = subi %c-1_209, %arg5 : index + %391 = select %389, %390, %arg5 : index + %392 = divi_signed %391, %c16_207 : index + %393 = subi %c-1_209, %392 : index + %394 = select %389, %393, %392 : index + %c4_210 = constant 4 : index + %395 = addi %394, %c4_210 : index + %c16_211 = constant 16 : index + %c0_212 = constant 0 : index + %c-1_213 = constant -1 : index + %396 = cmpi "slt", %395, %c0_212 : index + %397 = subi %c-1_213, %395 : index + %398 = select %396, %397, %395 : index + %399 = divi_signed %398, %c16_211 : index + %400 = subi %c-1_213, %399 : index + %401 = select %396, %400, %399 : index + %c-16_214 = constant -16 : index + %402 = muli %401, %c-16_214 : index + %403 = addi %388, %402 : index + %c4_215 = constant 4 : index + %404 = addi %403, %c4_215 : index + %c0_216 = constant 0 : index + %c16_217 = constant 16 : index + %405 = remi_signed %arg5, %c16_217 : index + %c0_218 = constant 0 : index + %406 = cmpi "slt", %405, %c0_218 : index + %407 = addi %405, %c16_217 : index + %408 = select %406, %407, %405 : index + %c8_219 = constant 8 : index + %c0_220 = constant 0 : index + %c-1_221 = constant -1 : index + %409 = cmpi "slt", %408, %c0_220 : index + %410 = subi %c-1_221, %408 : index + %411 = select %409, %410, %408 : index + %412 = divi_signed %411, %c8_219 : index + %413 = subi %c-1_221, %412 : index + %414 = select %409, %413, %412 : index + %c2_222 = constant 2 : index + %415 = remi_signed %414, %c2_222 : index + %c0_223 = constant 0 : index + %416 = cmpi "slt", %415, %c0_223 : index + %417 = addi %415, %c2_222 : index + %418 = select %416, %417, %415 : index + %419 = load %2[%404, %c0_216, %418] : memref<16x6x2xvector<8xf32>> + %420 = addf %382, %419 : vector<8xf32> + store %420, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %421 = addi %arg3, %arg5 : index + %c72 = constant 72 : index + %422 = addi %421, %c72 : index + %423 = vector.transfer_read %arg2[%arg4, %422], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c72_224 = constant 72 : index + %424 = addi %arg5, %c72_224 : index + %c16_225 = constant 16 : index + %c0_226 = constant 0 : index + %c-1_227 = constant -1 : index + %425 = cmpi "slt", %424, %c0_226 : index + %426 = subi %c-1_227, %424 : index + %427 = select %425, %426, %424 : index + %428 = divi_signed %427, %c16_225 : index + %429 = subi %c-1_227, %428 : index + %430 = select %425, %429, %428 : index + %c16_228 = constant 16 : index + %431 = remi_signed %430, %c16_228 : index + %c0_229 = constant 0 : index + %432 = cmpi "slt", %431, %c0_229 : index + %433 = addi %431, %c16_228 : index + %434 = select %432, %433, %431 : index + %c0_230 = constant 0 : index + %c8_231 = constant 8 : index + %c0_232 = constant 0 : index + %c-1_233 = constant -1 : index + %435 = cmpi "slt", %arg5, %c0_232 : index + %436 = subi %c-1_233, %arg5 : index + %437 = select %435, %436, %arg5 : index + %438 = divi_signed %437, %c8_231 : index + %439 = subi %c-1_233, %438 : index + %440 = select %435, 
%439, %438 : index + %c72_234 = constant 72 : index + %441 = addi %arg5, %c72_234 : index + %c16_235 = constant 16 : index + %c0_236 = constant 0 : index + %c-1_237 = constant -1 : index + %442 = cmpi "slt", %441, %c0_236 : index + %443 = subi %c-1_237, %441 : index + %444 = select %442, %443, %441 : index + %445 = divi_signed %444, %c16_235 : index + %446 = subi %c-1_237, %445 : index + %447 = select %442, %446, %445 : index + %c-2_238 = constant -2 : index + %448 = muli %447, %c-2_238 : index + %449 = addi %440, %448 : index + %c8_239 = constant 8 : index + %c0_240 = constant 0 : index + %c-1_241 = constant -1 : index + %450 = cmpi "slt", %arg5, %c0_240 : index + %451 = subi %c-1_241, %arg5 : index + %452 = select %450, %451, %arg5 : index + %453 = divi_signed %452, %c8_239 : index + %454 = subi %c-1_241, %453 : index + %455 = select %450, %454, %453 : index + %c72_242 = constant 72 : index + %456 = addi %arg5, %c72_242 : index + %c16_243 = constant 16 : index + %c0_244 = constant 0 : index + %c-1_245 = constant -1 : index + %457 = cmpi "slt", %456, %c0_244 : index + %458 = subi %c-1_245, %456 : index + %459 = select %457, %458, %456 : index + %460 = divi_signed %459, %c16_243 : index + %461 = subi %c-1_245, %460 : index + %462 = select %457, %461, %460 : index + %c-2_246 = constant -2 : index + %463 = muli %462, %c-2_246 : index + %464 = addi %455, %463 : index + %c9_247 = constant 9 : index + %465 = addi %464, %c9_247 : index + %c2_248 = constant 2 : index + %c0_249 = constant 0 : index + %c-1_250 = constant -1 : index + %466 = cmpi "slt", %465, %c0_249 : index + %467 = subi %c-1_250, %465 : index + %468 = select %466, %467, %465 : index + %469 = divi_signed %468, %c2_248 : index + %470 = subi %c-1_250, %469 : index + %471 = select %466, %470, %469 : index + %c-2_251 = constant -2 : index + %472 = muli %471, %c-2_251 : index + %473 = addi %449, %472 : index + %c9_252 = constant 9 : index + %474 = addi %473, %c9_252 : index + %475 = load %2[%434, %c0_230, %474] : memref<16x6x2xvector<8xf32>> + %476 = addf %423, %475 : vector<8xf32> + store %476, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %477 = addi %arg3, %arg5 : index + %c80 = constant 80 : index + %478 = addi %477, %c80 : index + %479 = vector.transfer_read %arg2[%arg4, %478], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_253 = constant 16 : index + %c0_254 = constant 0 : index + %c-1_255 = constant -1 : index + %480 = cmpi "slt", %arg5, %c0_254 : index + %481 = subi %c-1_255, %arg5 : index + %482 = select %480, %481, %arg5 : index + %483 = divi_signed %482, %c16_253 : index + %484 = subi %c-1_255, %483 : index + %485 = select %480, %484, %483 : index + %c16_256 = constant 16 : index + %c0_257 = constant 0 : index + %c-1_258 = constant -1 : index + %486 = cmpi "slt", %arg5, %c0_257 : index + %487 = subi %c-1_258, %arg5 : index + %488 = select %486, %487, %arg5 : index + %489 = divi_signed %488, %c16_256 : index + %490 = subi %c-1_258, %489 : index + %491 = select %486, %490, %489 : index + %c5_259 = constant 5 : index + %492 = addi %491, %c5_259 : index + %c16_260 = constant 16 : index + %c0_261 = constant 0 : index + %c-1_262 = constant -1 : index + %493 = cmpi "slt", %492, %c0_261 : index + %494 = subi %c-1_262, %492 : index + %495 = select %493, %494, %492 : index + %496 = divi_signed %495, %c16_260 : index + %497 = subi %c-1_262, %496 : index + %498 = select %493, %497, %496 : index + %c-16_263 = constant -16 : index + %499 = muli %498, %c-16_263 : index + %500 = addi %485, %499 : 
index + %c5_264 = constant 5 : index + %501 = addi %500, %c5_264 : index + %c0_265 = constant 0 : index + %c16_266 = constant 16 : index + %502 = remi_signed %arg5, %c16_266 : index + %c0_267 = constant 0 : index + %503 = cmpi "slt", %502, %c0_267 : index + %504 = addi %502, %c16_266 : index + %505 = select %503, %504, %502 : index + %c8_268 = constant 8 : index + %c0_269 = constant 0 : index + %c-1_270 = constant -1 : index + %506 = cmpi "slt", %505, %c0_269 : index + %507 = subi %c-1_270, %505 : index + %508 = select %506, %507, %505 : index + %509 = divi_signed %508, %c8_268 : index + %510 = subi %c-1_270, %509 : index + %511 = select %506, %510, %509 : index + %c2_271 = constant 2 : index + %512 = remi_signed %511, %c2_271 : index + %c0_272 = constant 0 : index + %513 = cmpi "slt", %512, %c0_272 : index + %514 = addi %512, %c2_271 : index + %515 = select %513, %514, %512 : index + %516 = load %2[%501, %c0_265, %515] : memref<16x6x2xvector<8xf32>> + %517 = addf %479, %516 : vector<8xf32> + store %517, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %518 = addi %arg3, %arg5 : index + %c88 = constant 88 : index + %519 = addi %518, %c88 : index + %520 = vector.transfer_read %arg2[%arg4, %519], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c88_273 = constant 88 : index + %521 = addi %arg5, %c88_273 : index + %c16_274 = constant 16 : index + %c0_275 = constant 0 : index + %c-1_276 = constant -1 : index + %522 = cmpi "slt", %521, %c0_275 : index + %523 = subi %c-1_276, %521 : index + %524 = select %522, %523, %521 : index + %525 = divi_signed %524, %c16_274 : index + %526 = subi %c-1_276, %525 : index + %527 = select %522, %526, %525 : index + %c16_277 = constant 16 : index + %528 = remi_signed %527, %c16_277 : index + %c0_278 = constant 0 : index + %529 = cmpi "slt", %528, %c0_278 : index + %530 = addi %528, %c16_277 : index + %531 = select %529, %530, %528 : index + %c0_279 = constant 0 : index + %c8_280 = constant 8 : index + %c0_281 = constant 0 : index + %c-1_282 = constant -1 : index + %532 = cmpi "slt", %arg5, %c0_281 : index + %533 = subi %c-1_282, %arg5 : index + %534 = select %532, %533, %arg5 : index + %535 = divi_signed %534, %c8_280 : index + %536 = subi %c-1_282, %535 : index + %537 = select %532, %536, %535 : index + %c88_283 = constant 88 : index + %538 = addi %arg5, %c88_283 : index + %c16_284 = constant 16 : index + %c0_285 = constant 0 : index + %c-1_286 = constant -1 : index + %539 = cmpi "slt", %538, %c0_285 : index + %540 = subi %c-1_286, %538 : index + %541 = select %539, %540, %538 : index + %542 = divi_signed %541, %c16_284 : index + %543 = subi %c-1_286, %542 : index + %544 = select %539, %543, %542 : index + %c-2_287 = constant -2 : index + %545 = muli %544, %c-2_287 : index + %546 = addi %537, %545 : index + %c8_288 = constant 8 : index + %c0_289 = constant 0 : index + %c-1_290 = constant -1 : index + %547 = cmpi "slt", %arg5, %c0_289 : index + %548 = subi %c-1_290, %arg5 : index + %549 = select %547, %548, %arg5 : index + %550 = divi_signed %549, %c8_288 : index + %551 = subi %c-1_290, %550 : index + %552 = select %547, %551, %550 : index + %c88_291 = constant 88 : index + %553 = addi %arg5, %c88_291 : index + %c16_292 = constant 16 : index + %c0_293 = constant 0 : index + %c-1_294 = constant -1 : index + %554 = cmpi "slt", %553, %c0_293 : index + %555 = subi %c-1_294, %553 : index + %556 = select %554, %555, %553 : index + %557 = divi_signed %556, %c16_292 : index + %558 = subi %c-1_294, %557 : index + %559 = select %554, 
%558, %557 : index + %c-2_295 = constant -2 : index + %560 = muli %559, %c-2_295 : index + %561 = addi %552, %560 : index + %c11_296 = constant 11 : index + %562 = addi %561, %c11_296 : index + %c2_297 = constant 2 : index + %c0_298 = constant 0 : index + %c-1_299 = constant -1 : index + %563 = cmpi "slt", %562, %c0_298 : index + %564 = subi %c-1_299, %562 : index + %565 = select %563, %564, %562 : index + %566 = divi_signed %565, %c2_297 : index + %567 = subi %c-1_299, %566 : index + %568 = select %563, %567, %566 : index + %c-2_300 = constant -2 : index + %569 = muli %568, %c-2_300 : index + %570 = addi %546, %569 : index + %c11_301 = constant 11 : index + %571 = addi %570, %c11_301 : index + %572 = load %2[%531, %c0_279, %571] : memref<16x6x2xvector<8xf32>> + %573 = addf %520, %572 : vector<8xf32> + store %573, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %574 = addi %arg3, %arg5 : index + %c96 = constant 96 : index + %575 = addi %574, %c96 : index + %576 = vector.transfer_read %arg2[%arg4, %575], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_302 = constant 16 : index + %c0_303 = constant 0 : index + %c-1_304 = constant -1 : index + %577 = cmpi "slt", %arg5, %c0_303 : index + %578 = subi %c-1_304, %arg5 : index + %579 = select %577, %578, %arg5 : index + %580 = divi_signed %579, %c16_302 : index + %581 = subi %c-1_304, %580 : index + %582 = select %577, %581, %580 : index + %c16_305 = constant 16 : index + %c0_306 = constant 0 : index + %c-1_307 = constant -1 : index + %583 = cmpi "slt", %arg5, %c0_306 : index + %584 = subi %c-1_307, %arg5 : index + %585 = select %583, %584, %arg5 : index + %586 = divi_signed %585, %c16_305 : index + %587 = subi %c-1_307, %586 : index + %588 = select %583, %587, %586 : index + %c6_308 = constant 6 : index + %589 = addi %588, %c6_308 : index + %c16_309 = constant 16 : index + %c0_310 = constant 0 : index + %c-1_311 = constant -1 : index + %590 = cmpi "slt", %589, %c0_310 : index + %591 = subi %c-1_311, %589 : index + %592 = select %590, %591, %589 : index + %593 = divi_signed %592, %c16_309 : index + %594 = subi %c-1_311, %593 : index + %595 = select %590, %594, %593 : index + %c-16_312 = constant -16 : index + %596 = muli %595, %c-16_312 : index + %597 = addi %582, %596 : index + %c6_313 = constant 6 : index + %598 = addi %597, %c6_313 : index + %c0_314 = constant 0 : index + %c16_315 = constant 16 : index + %599 = remi_signed %arg5, %c16_315 : index + %c0_316 = constant 0 : index + %600 = cmpi "slt", %599, %c0_316 : index + %601 = addi %599, %c16_315 : index + %602 = select %600, %601, %599 : index + %c8_317 = constant 8 : index + %c0_318 = constant 0 : index + %c-1_319 = constant -1 : index + %603 = cmpi "slt", %602, %c0_318 : index + %604 = subi %c-1_319, %602 : index + %605 = select %603, %604, %602 : index + %606 = divi_signed %605, %c8_317 : index + %607 = subi %c-1_319, %606 : index + %608 = select %603, %607, %606 : index + %c2_320 = constant 2 : index + %609 = remi_signed %608, %c2_320 : index + %c0_321 = constant 0 : index + %610 = cmpi "slt", %609, %c0_321 : index + %611 = addi %609, %c2_320 : index + %612 = select %610, %611, %609 : index + %613 = load %2[%598, %c0_314, %612] : memref<16x6x2xvector<8xf32>> + %614 = addf %576, %613 : vector<8xf32> + store %614, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %615 = addi %arg3, %arg5 : index + %c104 = constant 104 : index + %616 = addi %615, %c104 : index + %617 = vector.transfer_read %arg2[%arg4, %616], %cst : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>>, vector<8xf32> + %c104_322 = constant 104 : index + %618 = addi %arg5, %c104_322 : index + %c16_323 = constant 16 : index + %c0_324 = constant 0 : index + %c-1_325 = constant -1 : index + %619 = cmpi "slt", %618, %c0_324 : index + %620 = subi %c-1_325, %618 : index + %621 = select %619, %620, %618 : index + %622 = divi_signed %621, %c16_323 : index + %623 = subi %c-1_325, %622 : index + %624 = select %619, %623, %622 : index + %c16_326 = constant 16 : index + %625 = remi_signed %624, %c16_326 : index + %c0_327 = constant 0 : index + %626 = cmpi "slt", %625, %c0_327 : index + %627 = addi %625, %c16_326 : index + %628 = select %626, %627, %625 : index + %c0_328 = constant 0 : index + %c8_329 = constant 8 : index + %c0_330 = constant 0 : index + %c-1_331 = constant -1 : index + %629 = cmpi "slt", %arg5, %c0_330 : index + %630 = subi %c-1_331, %arg5 : index + %631 = select %629, %630, %arg5 : index + %632 = divi_signed %631, %c8_329 : index + %633 = subi %c-1_331, %632 : index + %634 = select %629, %633, %632 : index + %c104_332 = constant 104 : index + %635 = addi %arg5, %c104_332 : index + %c16_333 = constant 16 : index + %c0_334 = constant 0 : index + %c-1_335 = constant -1 : index + %636 = cmpi "slt", %635, %c0_334 : index + %637 = subi %c-1_335, %635 : index + %638 = select %636, %637, %635 : index + %639 = divi_signed %638, %c16_333 : index + %640 = subi %c-1_335, %639 : index + %641 = select %636, %640, %639 : index + %c-2_336 = constant -2 : index + %642 = muli %641, %c-2_336 : index + %643 = addi %634, %642 : index + %c8_337 = constant 8 : index + %c0_338 = constant 0 : index + %c-1_339 = constant -1 : index + %644 = cmpi "slt", %arg5, %c0_338 : index + %645 = subi %c-1_339, %arg5 : index + %646 = select %644, %645, %arg5 : index + %647 = divi_signed %646, %c8_337 : index + %648 = subi %c-1_339, %647 : index + %649 = select %644, %648, %647 : index + %c104_340 = constant 104 : index + %650 = addi %arg5, %c104_340 : index + %c16_341 = constant 16 : index + %c0_342 = constant 0 : index + %c-1_343 = constant -1 : index + %651 = cmpi "slt", %650, %c0_342 : index + %652 = subi %c-1_343, %650 : index + %653 = select %651, %652, %650 : index + %654 = divi_signed %653, %c16_341 : index + %655 = subi %c-1_343, %654 : index + %656 = select %651, %655, %654 : index + %c-2_344 = constant -2 : index + %657 = muli %656, %c-2_344 : index + %658 = addi %649, %657 : index + %c13_345 = constant 13 : index + %659 = addi %658, %c13_345 : index + %c2_346 = constant 2 : index + %c0_347 = constant 0 : index + %c-1_348 = constant -1 : index + %660 = cmpi "slt", %659, %c0_347 : index + %661 = subi %c-1_348, %659 : index + %662 = select %660, %661, %659 : index + %663 = divi_signed %662, %c2_346 : index + %664 = subi %c-1_348, %663 : index + %665 = select %660, %664, %663 : index + %c-2_349 = constant -2 : index + %666 = muli %665, %c-2_349 : index + %667 = addi %643, %666 : index + %c13_350 = constant 13 : index + %668 = addi %667, %c13_350 : index + %669 = load %2[%628, %c0_328, %668] : memref<16x6x2xvector<8xf32>> + %670 = addf %617, %669 : vector<8xf32> + store %670, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %671 = addi %arg3, %arg5 : index + %c112 = constant 112 : index + %672 = addi %671, %c112 : index + %673 = vector.transfer_read %arg2[%arg4, %672], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c16_351 = constant 16 : index + %c0_352 = constant 0 : index + %c-1_353 = constant -1 : index + %674 = cmpi "slt", %arg5, %c0_352 : index + %675 = 
subi %c-1_353, %arg5 : index + %676 = select %674, %675, %arg5 : index + %677 = divi_signed %676, %c16_351 : index + %678 = subi %c-1_353, %677 : index + %679 = select %674, %678, %677 : index + %c16_354 = constant 16 : index + %c0_355 = constant 0 : index + %c-1_356 = constant -1 : index + %680 = cmpi "slt", %arg5, %c0_355 : index + %681 = subi %c-1_356, %arg5 : index + %682 = select %680, %681, %arg5 : index + %683 = divi_signed %682, %c16_354 : index + %684 = subi %c-1_356, %683 : index + %685 = select %680, %684, %683 : index + %c7_357 = constant 7 : index + %686 = addi %685, %c7_357 : index + %c16_358 = constant 16 : index + %c0_359 = constant 0 : index + %c-1_360 = constant -1 : index + %687 = cmpi "slt", %686, %c0_359 : index + %688 = subi %c-1_360, %686 : index + %689 = select %687, %688, %686 : index + %690 = divi_signed %689, %c16_358 : index + %691 = subi %c-1_360, %690 : index + %692 = select %687, %691, %690 : index + %c-16_361 = constant -16 : index + %693 = muli %692, %c-16_361 : index + %694 = addi %679, %693 : index + %c7_362 = constant 7 : index + %695 = addi %694, %c7_362 : index + %c0_363 = constant 0 : index + %c16_364 = constant 16 : index + %696 = remi_signed %arg5, %c16_364 : index + %c0_365 = constant 0 : index + %697 = cmpi "slt", %696, %c0_365 : index + %698 = addi %696, %c16_364 : index + %699 = select %697, %698, %696 : index + %c8_366 = constant 8 : index + %c0_367 = constant 0 : index + %c-1_368 = constant -1 : index + %700 = cmpi "slt", %699, %c0_367 : index + %701 = subi %c-1_368, %699 : index + %702 = select %700, %701, %699 : index + %703 = divi_signed %702, %c8_366 : index + %704 = subi %c-1_368, %703 : index + %705 = select %700, %704, %703 : index + %c2_369 = constant 2 : index + %706 = remi_signed %705, %c2_369 : index + %c0_370 = constant 0 : index + %707 = cmpi "slt", %706, %c0_370 : index + %708 = addi %706, %c2_369 : index + %709 = select %707, %708, %706 : index + %710 = load %2[%695, %c0_363, %709] : memref<16x6x2xvector<8xf32>> + %711 = addf %673, %710 : vector<8xf32> + store %711, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %712 = addi %arg3, %arg5 : index + %c120 = constant 120 : index + %713 = addi %712, %c120 : index + %714 = vector.transfer_read %arg2[%arg4, %713], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %c120_371 = constant 120 : index + %715 = addi %arg5, %c120_371 : index + %c16_372 = constant 16 : index + %c0_373 = constant 0 : index + %c-1_374 = constant -1 : index + %716 = cmpi "slt", %715, %c0_373 : index + %717 = subi %c-1_374, %715 : index + %718 = select %716, %717, %715 : index + %719 = divi_signed %718, %c16_372 : index + %720 = subi %c-1_374, %719 : index + %721 = select %716, %720, %719 : index + %c16_375 = constant 16 : index + %722 = remi_signed %721, %c16_375 : index + %c0_376 = constant 0 : index + %723 = cmpi "slt", %722, %c0_376 : index + %724 = addi %722, %c16_375 : index + %725 = select %723, %724, %722 : index + %c0_377 = constant 0 : index + %c8_378 = constant 8 : index + %c0_379 = constant 0 : index + %c-1_380 = constant -1 : index + %726 = cmpi "slt", %arg5, %c0_379 : index + %727 = subi %c-1_380, %arg5 : index + %728 = select %726, %727, %arg5 : index + %729 = divi_signed %728, %c8_378 : index + %730 = subi %c-1_380, %729 : index + %731 = select %726, %730, %729 : index + %c120_381 = constant 120 : index + %732 = addi %arg5, %c120_381 : index + %c16_382 = constant 16 : index + %c0_383 = constant 0 : index + %c-1_384 = constant -1 : index + %733 = cmpi "slt", %732, 
%c0_383 : index + %734 = subi %c-1_384, %732 : index + %735 = select %733, %734, %732 : index + %736 = divi_signed %735, %c16_382 : index + %737 = subi %c-1_384, %736 : index + %738 = select %733, %737, %736 : index + %c-2_385 = constant -2 : index + %739 = muli %738, %c-2_385 : index + %740 = addi %731, %739 : index + %c8_386 = constant 8 : index + %c0_387 = constant 0 : index + %c-1_388 = constant -1 : index + %741 = cmpi "slt", %arg5, %c0_387 : index + %742 = subi %c-1_388, %arg5 : index + %743 = select %741, %742, %arg5 : index + %744 = divi_signed %743, %c8_386 : index + %745 = subi %c-1_388, %744 : index + %746 = select %741, %745, %744 : index + %c120_389 = constant 120 : index + %747 = addi %arg5, %c120_389 : index + %c16_390 = constant 16 : index + %c0_391 = constant 0 : index + %c-1_392 = constant -1 : index + %748 = cmpi "slt", %747, %c0_391 : index + %749 = subi %c-1_392, %747 : index + %750 = select %748, %749, %747 : index + %751 = divi_signed %750, %c16_390 : index + %752 = subi %c-1_392, %751 : index + %753 = select %748, %752, %751 : index + %c-2_393 = constant -2 : index + %754 = muli %753, %c-2_393 : index + %755 = addi %746, %754 : index + %c15_394 = constant 15 : index + %756 = addi %755, %c15_394 : index + %c2_395 = constant 2 : index + %c0_396 = constant 0 : index + %c-1_397 = constant -1 : index + %757 = cmpi "slt", %756, %c0_396 : index + %758 = subi %c-1_397, %756 : index + %759 = select %757, %758, %756 : index + %760 = divi_signed %759, %c2_395 : index + %761 = subi %c-1_397, %760 : index + %762 = select %757, %761, %760 : index + %c-2_398 = constant -2 : index + %763 = muli %762, %c-2_398 : index + %764 = addi %740, %763 : index + %c15_399 = constant 15 : index + %765 = addi %764, %c15_399 : index + %766 = load %2[%725, %c0_377, %765] : memref<16x6x2xvector<8xf32>> + %767 = addf %714, %766 : vector<8xf32> + store %767, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + %c0_400 = constant 0 : index + %c16_401 = constant 16 : index + %c1_402 = constant 1 : index + scf.for %arg6 = %c0_400 to %c16_401 step %c1_402 { + %768 = addi %arg3, %arg5 : index + %c8_403 = constant 8 : index + %769 = muli %arg6, %c8_403 : index + %770 = addi %768, %769 : index + %771 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %771, %arg2[%arg4, %770] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + accv.launch_func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) {exec_target = 0 : i64} : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } + } +} diff --git a/Tutorials/optimized_matmul/mlir/8_ConvertValueToStd.mlir b/Tutorials/optimized_matmul/mlir/8_ConvertValueToStd.mlir new file mode 100644 index 00000000..3d4569bf --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/8_ConvertValueToStd.mlir @@ -0,0 +1,1879 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, 
%arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %arg3, %arg5 : index + %9 = addi %8, %c1 : index + %10 = addi %arg6, %arg7 : index + %11 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg1[%10, %9] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = mulf %11, %12 {RelaxedPrecision} : f32 + %14 = load %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addf %14, %13 {RelaxedPrecision} : f32 + store %15, %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = load %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %16, %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %17 = addi %arg3, %arg5 : index + %18 = addi %17, %c2 : index + %19 = addi %arg6, %arg7 : index + %20 = load %arg0[%arg4, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg1[%19, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = mulf %20, %21 {RelaxedPrecision} : f32 + %23 = load %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = addf %23, %22 {RelaxedPrecision} : f32 + store %24, %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = load %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %25, %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %26 = addi %arg3, %arg5 : index + %27 = addi %26, %c3 : index + %28 = addi %arg6, %arg7 : index + %29 = load %arg0[%arg4, %28] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg1[%28, %27] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = mulf %29, %30 {RelaxedPrecision} : f32 + %32 = load %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %33 = addf %32, %31 {RelaxedPrecision} : f32 + store %33, %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = load %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %34, %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = addi %arg3, %arg5 : index + %36 = addi %35, %c4 : index + %37 = addi %arg6, %arg7 : index + %38 = load %arg0[%arg4, %37] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %39 = load %arg1[%37, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = mulf %38, %39 {RelaxedPrecision} : f32 + %41 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = addf %41, %40 {RelaxedPrecision} : f32 + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %43, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = addi %arg3, %arg5 : index + %45 = addi %44, %c5 : index + %46 = addi %arg6, %arg7 : index + %47 = load %arg0[%arg4, %46] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %48 = load %arg1[%46, %45] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = mulf %47, %48 {RelaxedPrecision} : f32 + %50 = load %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %51 = addf %50, %49 {RelaxedPrecision} : f32 + store %51, %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = load %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %52, %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = addi %arg3, %arg5 : index + %54 = addi %53, %c6 : index + %55 = addi %arg6, %arg7 : index + %56 = load %arg0[%arg4, %55] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %57 = load %arg1[%55, %54] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %58 = mulf %56, %57 {RelaxedPrecision} : f32 + %59 = load %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = addf %59, %58 {RelaxedPrecision} : f32 + store %60, %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %61 = load %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %61, %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addi %arg3, %arg5 : index + %63 = addi %62, %c7 : index + %64 = addi %arg6, %arg7 : index + %65 = load %arg0[%arg4, %64] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%64, %63] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %arg3, %arg5 : 
index + %72 = addi %71, %c8 : index + %73 = addi %arg6, %arg7 : index + %74 = load %arg0[%arg4, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = mulf %74, %75 {RelaxedPrecision} : f32 + %77 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addf %77, %76 {RelaxedPrecision} : f32 + store %78, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = addi %arg3, %arg5 : index + %81 = addi %80, %c9 : index + %82 = addi %arg6, %arg7 : index + %83 = load %arg0[%arg4, %82] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %84 = load %arg1[%82, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = mulf %83, %84 {RelaxedPrecision} : f32 + %86 = load %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = addf %86, %85 {RelaxedPrecision} : f32 + store %87, %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = load %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %88, %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %89 = addi %arg3, %arg5 : index + %90 = addi %89, %c10 : index + %91 = addi %arg6, %arg7 : index + %92 = load %arg0[%arg4, %91] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %93 = load %arg1[%91, %90] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = mulf %92, %93 {RelaxedPrecision} : f32 + %95 = load %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = addf %95, %94 {RelaxedPrecision} : f32 + store %96, %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = load %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %97, %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = addi %arg3, %arg5 : index + %99 = addi %98, %c11 : index + %100 = addi %arg6, %arg7 : index + %101 = load %arg0[%arg4, %100] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %102 = load %arg1[%100, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = mulf %101, %102 {RelaxedPrecision} : f32 + %104 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = addf %104, %103 {RelaxedPrecision} : f32 + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %106, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %107 = addi %arg3, %arg5 : index + %108 = addi %107, %c12 : index + %109 = addi %arg6, %arg7 : index + %110 = load %arg0[%arg4, %109] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %111 = load %arg1[%109, %108] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = mulf %110, %111 {RelaxedPrecision} : f32 + %113 = load %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %114 = addf %113, %112 {RelaxedPrecision} : f32 + store %114, %arg2[%arg4, %108] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = load %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %115, %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = addi %arg3, %arg5 : index + %117 = addi %116, %c13 : index + %118 = addi %arg6, %arg7 : index + %119 = load %arg0[%arg4, %118] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%118, %117] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = addi %arg3, %arg5 : index + %126 = addi %125, %c14 : index + %127 = addi %arg6, %arg7 : index + %128 = load %arg0[%arg4, %127] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg1[%127, %126] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = mulf %128, %129 {RelaxedPrecision} : f32 + %131 = load %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = addf %131, %130 {RelaxedPrecision} : f32 + store %132, %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = load %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %133, %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = addi %arg3, %arg5 : index + %135 = addi %134, %c15 : index + %136 = addi %arg6, %arg7 : index + %137 = load %arg0[%arg4, %136] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%136, %135] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = addi %arg4, %c1 : index + %144 = addi %arg3, %arg5 : index + %145 = addi %arg6, %arg7 : index + %146 = load %arg0[%143, %145] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %147 = load %arg1[%145, %144] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = mulf %146, %147 {RelaxedPrecision} : f32 + %149 = load %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = addf %149, %148 {RelaxedPrecision} : f32 + store %150, %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = load %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %151, %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %152 = addi %arg4, %c1 : index + %153 = addi %arg3, %arg5 : index + %154 = addi %153, %c1 : index + %155 = addi %arg6, %arg7 : index + %156 = load %arg0[%152, %155] : memref<784x128xf32, affine_map<(d0, d1) -> 
(d0 * 128 + d1)>> + %157 = load %arg1[%155, %154] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = addi %arg4, %c1 : index + %163 = addi %arg3, %arg5 : index + %164 = addi %163, %c2 : index + %165 = addi %arg6, %arg7 : index + %166 = load %arg0[%162, %165] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %167 = load %arg1[%165, %164] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = mulf %166, %167 {RelaxedPrecision} : f32 + %169 = load %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = addf %169, %168 {RelaxedPrecision} : f32 + store %170, %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = load %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %171, %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addi %arg4, %c1 : index + %173 = addi %arg3, %arg5 : index + %174 = addi %173, %c3 : index + %175 = addi %arg6, %arg7 : index + %176 = load %arg0[%172, %175] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %177 = load %arg1[%175, %174] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = mulf %176, %177 {RelaxedPrecision} : f32 + %179 = load %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = addf %179, %178 {RelaxedPrecision} : f32 + store %180, %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = load %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %181, %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = addi %arg4, %c1 : index + %183 = addi %arg3, %arg5 : index + %184 = addi %183, %c4 : index + %185 = addi %arg6, %arg7 : index + %186 = load %arg0[%182, %185] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%185, %184] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = addi %arg4, %c1 : index + %193 = addi %arg3, %arg5 : index + %194 = addi %193, %c5 : index + %195 = addi %arg6, %arg7 : index + %196 = load %arg0[%192, %195] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %197 = load %arg1[%195, %194] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %198 = mulf %196, %197 {RelaxedPrecision} : f32 + %199 = load %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = addf %199, %198 {RelaxedPrecision} : f32 + store %200, %arg2[%192, 
%194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = load %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %201, %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addi %arg4, %c1 : index + %203 = addi %arg3, %arg5 : index + %204 = addi %203, %c6 : index + %205 = addi %arg6, %arg7 : index + %206 = load %arg0[%202, %205] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %207 = load %arg1[%205, %204] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = mulf %206, %207 {RelaxedPrecision} : f32 + %209 = load %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addf %209, %208 {RelaxedPrecision} : f32 + store %210, %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = load %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %211, %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = addi %arg4, %c1 : index + %213 = addi %arg3, %arg5 : index + %214 = addi %213, %c7 : index + %215 = addi %arg6, %arg7 : index + %216 = load %arg0[%212, %215] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %217 = load %arg1[%215, %214] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %218 = mulf %216, %217 {RelaxedPrecision} : f32 + %219 = load %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = addf %219, %218 {RelaxedPrecision} : f32 + store %220, %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %221, %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = addi %arg4, %c1 : index + %223 = addi %arg3, %arg5 : index + %224 = addi %223, %c8 : index + %225 = addi %arg6, %arg7 : index + %226 = load %arg0[%222, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = mulf %226, %227 {RelaxedPrecision} : f32 + %229 = load %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = addf %229, %228 {RelaxedPrecision} : f32 + store %230, %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = addi %arg4, %c1 : index + %233 = addi %arg3, %arg5 : index + %234 = addi %233, %c9 : index + %235 = addi %arg6, %arg7 : index + %236 = load %arg0[%232, %235] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %237 = load %arg1[%235, %234] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = mulf %236, %237 {RelaxedPrecision} : f32 + %239 = load %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = addf %239, %238 {RelaxedPrecision} : f32 + store %240, %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %241, %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %242 = addi %arg4, %c1 : index + %243 = addi %arg3, %arg5 : index + %244 
= addi %243, %c10 : index + %245 = addi %arg6, %arg7 : index + %246 = load %arg0[%242, %245] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %247 = load %arg1[%245, %244] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = mulf %246, %247 {RelaxedPrecision} : f32 + %249 = load %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = addf %249, %248 {RelaxedPrecision} : f32 + store %250, %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %251, %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = addi %arg4, %c1 : index + %253 = addi %arg3, %arg5 : index + %254 = addi %253, %c11 : index + %255 = addi %arg6, %arg7 : index + %256 = load %arg0[%252, %255] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %257 = load %arg1[%255, %254] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = mulf %256, %257 {RelaxedPrecision} : f32 + %259 = load %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %260 = addf %259, %258 {RelaxedPrecision} : f32 + store %260, %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = load %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %261, %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = addi %arg4, %c1 : index + %263 = addi %arg3, %arg5 : index + %264 = addi %263, %c12 : index + %265 = addi %arg6, %arg7 : index + %266 = load %arg0[%262, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = mulf %266, %267 {RelaxedPrecision} : f32 + %269 = load %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = addf %269, %268 {RelaxedPrecision} : f32 + store %270, %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %272 = addi %arg4, %c1 : index + %273 = addi %arg3, %arg5 : index + %274 = addi %273, %c13 : index + %275 = addi %arg6, %arg7 : index + %276 = load %arg0[%272, %275] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %277 = load %arg1[%275, %274] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %278 = mulf %276, %277 {RelaxedPrecision} : f32 + %279 = load %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = addf %279, %278 {RelaxedPrecision} : f32 + store %280, %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %281, %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = addi %arg4, %c1 : index + %283 = addi %arg3, %arg5 : index + %284 = addi %283, %c14 : index + %285 = addi %arg6, %arg7 : index + %286 = load %arg0[%282, %285] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %287 = load %arg1[%285, %284] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = mulf %286, %287 {RelaxedPrecision} : f32 + %289 = load %arg2[%282, 
%284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = addf %289, %288 {RelaxedPrecision} : f32 + store %290, %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = load %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %291, %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = addi %arg4, %c1 : index + %293 = addi %arg3, %arg5 : index + %294 = addi %293, %c15 : index + %295 = addi %arg6, %arg7 : index + %296 = load %arg0[%292, %295] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %297 = load %arg1[%295, %294] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = mulf %296, %297 {RelaxedPrecision} : f32 + %299 = load %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = addf %299, %298 {RelaxedPrecision} : f32 + store %300, %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %301, %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %302 = addi %arg4, %c2 : index + %303 = addi %arg3, %arg5 : index + %304 = addi %arg6, %arg7 : index + %305 = load %arg0[%302, %304] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%304, %303] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = addi %arg4, %c2 : index + %312 = addi %arg3, %arg5 : index + %313 = addi %312, %c1 : index + %314 = addi %arg6, %arg7 : index + %315 = load %arg0[%311, %314] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %316 = load %arg1[%314, %313] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = mulf %315, %316 {RelaxedPrecision} : f32 + %318 = load %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = addf %318, %317 {RelaxedPrecision} : f32 + store %319, %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %320, %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addi %arg4, %c2 : index + %322 = addi %arg3, %arg5 : index + %323 = addi %322, %c2 : index + %324 = addi %arg6, %arg7 : index + %325 = load %arg0[%321, %324] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %326 = load %arg1[%324, %323] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = mulf %325, %326 {RelaxedPrecision} : f32 + %328 = load %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = addf %328, %327 {RelaxedPrecision} : f32 + store %329, %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = load %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %330, %arg2[%321, %323] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = addi %arg4, %c2 : index + %332 = addi %arg3, %arg5 : index + %333 = addi %332, %c3 : index + %334 = addi %arg6, %arg7 : index + %335 = load %arg0[%331, %334] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%334, %333] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = addi %arg4, %c2 : index + %342 = addi %arg3, %arg5 : index + %343 = addi %342, %c4 : index + %344 = addi %arg6, %arg7 : index + %345 = load %arg0[%341, %344] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %346 = load %arg1[%344, %343] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = mulf %345, %346 {RelaxedPrecision} : f32 + %348 = load %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = addf %348, %347 {RelaxedPrecision} : f32 + store %349, %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %350, %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addi %arg4, %c2 : index + %352 = addi %arg3, %arg5 : index + %353 = addi %352, %c5 : index + %354 = addi %arg6, %arg7 : index + %355 = load %arg0[%351, %354] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %356 = load %arg1[%354, %353] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = mulf %355, %356 {RelaxedPrecision} : f32 + %358 = load %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = addf %358, %357 {RelaxedPrecision} : f32 + store %359, %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = load %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %360, %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = addi %arg4, %c2 : index + %362 = addi %arg3, %arg5 : index + %363 = addi %362, %c6 : index + %364 = addi %arg6, %arg7 : index + %365 = load %arg0[%361, %364] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%364, %363] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = addi %arg4, %c2 : index + %372 = addi %arg3, %arg5 : index + %373 = addi %372, %c7 : index + %374 = addi %arg6, %arg7 : index + %375 = load %arg0[%371, %374] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %376 = load %arg1[%374, %373] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = mulf %375, %376 {RelaxedPrecision} : f32 + %378 = load %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = addf %378, %377 {RelaxedPrecision} : f32 + store %379, %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %380, %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addi %arg4, %c2 : index + %382 = addi %arg3, %arg5 : index + %383 = addi %382, %c8 : index + %384 = addi %arg6, %arg7 : index + %385 = load %arg0[%381, %384] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %386 = load %arg1[%384, %383] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = mulf %385, %386 {RelaxedPrecision} : f32 + %388 = load %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = addf %388, %387 {RelaxedPrecision} : f32 + store %389, %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = load %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %390, %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = addi %arg4, %c2 : index + %392 = addi %arg3, %arg5 : index + %393 = addi %392, %c9 : index + %394 = addi %arg6, %arg7 : index + %395 = load %arg0[%391, %394] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%394, %393] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %401 = addi %arg4, %c2 : index + %402 = addi %arg3, %arg5 : index + %403 = addi %402, %c10 : index + %404 = addi %arg6, %arg7 : index + %405 = load %arg0[%401, %404] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%404, %403] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = addi %arg4, %c2 : index + %412 = addi %arg3, %arg5 : index + %413 = addi %412, %c11 : index + %414 = addi %arg6, %arg7 : index + %415 = load %arg0[%411, %414] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %416 = load %arg1[%414, %413] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = mulf %415, %416 {RelaxedPrecision} : f32 + %418 = load %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = addf %418, %417 {RelaxedPrecision} : f32 + store %419, %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
%420 = load %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %420, %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addi %arg4, %c2 : index + %422 = addi %arg3, %arg5 : index + %423 = addi %422, %c12 : index + %424 = addi %arg6, %arg7 : index + %425 = load %arg0[%421, %424] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %426 = load %arg1[%424, %423] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = mulf %425, %426 {RelaxedPrecision} : f32 + %428 = load %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = addf %428, %427 {RelaxedPrecision} : f32 + store %429, %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = load %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %430, %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = addi %arg4, %c2 : index + %432 = addi %arg3, %arg5 : index + %433 = addi %432, %c13 : index + %434 = addi %arg6, %arg7 : index + %435 = load %arg0[%431, %434] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%434, %433] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = addi %arg4, %c2 : index + %442 = addi %arg3, %arg5 : index + %443 = addi %442, %c14 : index + %444 = addi %arg6, %arg7 : index + %445 = load %arg0[%441, %444] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %446 = load %arg1[%444, %443] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = mulf %445, %446 {RelaxedPrecision} : f32 + %448 = load %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = addf %448, %447 {RelaxedPrecision} : f32 + store %449, %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %450 = load %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %450, %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addi %arg4, %c2 : index + %452 = addi %arg3, %arg5 : index + %453 = addi %452, %c15 : index + %454 = addi %arg6, %arg7 : index + %455 = load %arg0[%451, %454] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %456 = load %arg1[%454, %453] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = mulf %455, %456 {RelaxedPrecision} : f32 + %458 = load %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = addf %458, %457 {RelaxedPrecision} : f32 + store %459, %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = load %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %460, %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = addi %arg4, %c3 : index + %462 = addi %arg3, %arg5 : index + %463 = addi %arg6, %arg7 : index + %464 = load %arg0[%461, %463] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %465 = load %arg1[%463, %462] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %466 = mulf %464, %465 {RelaxedPrecision} : f32 + %467 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = addf %467, %466 {RelaxedPrecision} : f32 + store %468, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %469, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = addi %arg4, %c3 : index + %471 = addi %arg3, %arg5 : index + %472 = addi %471, %c1 : index + %473 = addi %arg6, %arg7 : index + %474 = load %arg0[%470, %473] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %475 = load %arg1[%473, %472] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = mulf %474, %475 {RelaxedPrecision} : f32 + %477 = load %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = addf %477, %476 {RelaxedPrecision} : f32 + store %478, %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = load %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %479, %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = addi %arg4, %c3 : index + %481 = addi %arg3, %arg5 : index + %482 = addi %481, %c2 : index + %483 = addi %arg6, %arg7 : index + %484 = load %arg0[%480, %483] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %485 = load %arg1[%483, %482] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = mulf %484, %485 {RelaxedPrecision} : f32 + %487 = load %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = addf %487, %486 {RelaxedPrecision} : f32 + store %488, %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %489, %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %490 = addi %arg4, %c3 : index + %491 = addi %arg3, %arg5 : index + %492 = addi %491, %c3 : index + %493 = addi %arg6, %arg7 : index + %494 = load %arg0[%490, %493] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %495 = load %arg1[%493, %492] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = mulf %494, %495 {RelaxedPrecision} : f32 + %497 = load %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %498 = addf %497, %496 {RelaxedPrecision} : f32 + store %498, %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = load %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %499, %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = addi %arg4, %c3 : index + %501 = addi %arg3, %arg5 : index + %502 = addi %501, %c4 : index + %503 = addi %arg6, %arg7 : index + %504 = load %arg0[%500, %503] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %505 = load %arg1[%503, %502] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = mulf %504, %505 {RelaxedPrecision} : f32 + %507 = load %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = addf %507, %506 
{RelaxedPrecision} : f32 + store %508, %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %509 = load %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %509, %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = addi %arg4, %c3 : index + %511 = addi %arg3, %arg5 : index + %512 = addi %511, %c5 : index + %513 = addi %arg6, %arg7 : index + %514 = load %arg0[%510, %513] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%513, %512] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = addi %arg4, %c3 : index + %521 = addi %arg3, %arg5 : index + %522 = addi %521, %c6 : index + %523 = addi %arg6, %arg7 : index + %524 = load %arg0[%520, %523] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %525 = load %arg1[%523, %522] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = mulf %524, %525 {RelaxedPrecision} : f32 + %527 = load %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = addf %527, %526 {RelaxedPrecision} : f32 + store %528, %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %529 = load %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %529, %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addi %arg4, %c3 : index + %531 = addi %arg3, %arg5 : index + %532 = addi %531, %c7 : index + %533 = addi %arg6, %arg7 : index + %534 = load %arg0[%530, %533] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %535 = load %arg1[%533, %532] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = mulf %534, %535 {RelaxedPrecision} : f32 + %537 = load %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = addf %537, %536 {RelaxedPrecision} : f32 + store %538, %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %539 = load %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %539, %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = addi %arg4, %c3 : index + %541 = addi %arg3, %arg5 : index + %542 = addi %541, %c8 : index + %543 = addi %arg6, %arg7 : index + %544 = load %arg0[%540, %543] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%543, %542] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %549 = load %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = addi %arg4, %c3 
: index + %551 = addi %arg3, %arg5 : index + %552 = addi %551, %c9 : index + %553 = addi %arg6, %arg7 : index + %554 = load %arg0[%550, %553] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %555 = load %arg1[%553, %552] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = mulf %554, %555 {RelaxedPrecision} : f32 + %557 = load %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = addf %557, %556 {RelaxedPrecision} : f32 + store %558, %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = load %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %559, %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addi %arg4, %c3 : index + %561 = addi %arg3, %arg5 : index + %562 = addi %561, %c10 : index + %563 = addi %arg6, %arg7 : index + %564 = load %arg0[%560, %563] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %565 = load %arg1[%563, %562] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = mulf %564, %565 {RelaxedPrecision} : f32 + %567 = load %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = addf %567, %566 {RelaxedPrecision} : f32 + store %568, %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %569 = load %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %569, %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = addi %arg4, %c3 : index + %571 = addi %arg3, %arg5 : index + %572 = addi %571, %c11 : index + %573 = addi %arg6, %arg7 : index + %574 = load %arg0[%570, %573] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%573, %572] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = addi %arg4, %c3 : index + %581 = addi %arg3, %arg5 : index + %582 = addi %581, %c12 : index + %583 = addi %arg6, %arg7 : index + %584 = load %arg0[%580, %583] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %585 = load %arg1[%583, %582] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = mulf %584, %585 {RelaxedPrecision} : f32 + %587 = load %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = addf %587, %586 {RelaxedPrecision} : f32 + store %588, %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %589 = load %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %589, %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addi %arg4, %c3 : index + %591 = addi %arg3, %arg5 : index + %592 = addi %591, %c13 : index + %593 = addi %arg6, %arg7 : index + %594 = load %arg0[%590, %593] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %595 = load %arg1[%593, %592] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = mulf %594, %595 
{RelaxedPrecision} : f32 + %597 = load %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %598 = addf %597, %596 {RelaxedPrecision} : f32 + store %598, %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %599 = load %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %599, %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %600 = addi %arg4, %c3 : index + %601 = addi %arg3, %arg5 : index + %602 = addi %601, %c14 : index + %603 = addi %arg6, %arg7 : index + %604 = load %arg0[%600, %603] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %605 = load %arg1[%603, %602] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %606 = mulf %604, %605 {RelaxedPrecision} : f32 + %607 = load %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %608 = addf %607, %606 {RelaxedPrecision} : f32 + store %608, %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %609 = load %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %609, %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %610 = addi %arg4, %c3 : index + %611 = addi %arg3, %arg5 : index + %612 = addi %611, %c15 : index + %613 = addi %arg6, %arg7 : index + %614 = load %arg0[%610, %613] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %615 = load %arg1[%613, %612] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %616 = mulf %614, %615 {RelaxedPrecision} : f32 + %617 = load %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %618 = addf %617, %616 {RelaxedPrecision} : f32 + store %618, %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %619 = load %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %619, %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %620 = addi %arg4, %c4 : index + %621 = addi %arg3, %arg5 : index + %622 = addi %arg6, %arg7 : index + %623 = load %arg0[%620, %622] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %624 = load %arg1[%622, %621] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %625 = mulf %623, %624 {RelaxedPrecision} : f32 + %626 = load %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %627 = addf %626, %625 {RelaxedPrecision} : f32 + store %627, %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %628 = load %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %628, %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %629 = addi %arg4, %c4 : index + %630 = addi %arg3, %arg5 : index + %631 = addi %630, %c1 : index + %632 = addi %arg6, %arg7 : index + %633 = load %arg0[%629, %632] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %634 = load %arg1[%632, %631] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %635 = mulf %633, %634 {RelaxedPrecision} : f32 + %636 = load %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %637 = addf %636, %635 {RelaxedPrecision} : f32 + store %637, %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %638 = load %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + 
store %638, %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %639 = addi %arg4, %c4 : index + %640 = addi %arg3, %arg5 : index + %641 = addi %640, %c2 : index + %642 = addi %arg6, %arg7 : index + %643 = load %arg0[%639, %642] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %644 = load %arg1[%642, %641] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %645 = mulf %643, %644 {RelaxedPrecision} : f32 + %646 = load %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %647 = addf %646, %645 {RelaxedPrecision} : f32 + store %647, %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %648 = load %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %648, %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %649 = addi %arg4, %c4 : index + %650 = addi %arg3, %arg5 : index + %651 = addi %650, %c3 : index + %652 = addi %arg6, %arg7 : index + %653 = load %arg0[%649, %652] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %654 = load %arg1[%652, %651] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %655 = mulf %653, %654 {RelaxedPrecision} : f32 + %656 = load %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %657 = addf %656, %655 {RelaxedPrecision} : f32 + store %657, %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %658 = load %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %658, %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %659 = addi %arg4, %c4 : index + %660 = addi %arg3, %arg5 : index + %661 = addi %660, %c4 : index + %662 = addi %arg6, %arg7 : index + %663 = load %arg0[%659, %662] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %664 = load %arg1[%662, %661] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %665 = mulf %663, %664 {RelaxedPrecision} : f32 + %666 = load %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %667 = addf %666, %665 {RelaxedPrecision} : f32 + store %667, %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %668 = load %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %668, %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %669 = addi %arg4, %c4 : index + %670 = addi %arg3, %arg5 : index + %671 = addi %670, %c5 : index + %672 = addi %arg6, %arg7 : index + %673 = load %arg0[%669, %672] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %674 = load %arg1[%672, %671] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %675 = mulf %673, %674 {RelaxedPrecision} : f32 + %676 = load %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %677 = addf %676, %675 {RelaxedPrecision} : f32 + store %677, %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %678 = load %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %678, %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %679 = addi %arg4, %c4 : index + %680 = addi %arg3, %arg5 : index + %681 = addi %680, %c6 : index + %682 = addi %arg6, %arg7 : index + %683 = load %arg0[%679, %682] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %684 = 
load %arg1[%682, %681] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %685 = mulf %683, %684 {RelaxedPrecision} : f32 + %686 = load %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %687 = addf %686, %685 {RelaxedPrecision} : f32 + store %687, %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %688 = load %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %688, %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %689 = addi %arg4, %c4 : index + %690 = addi %arg3, %arg5 : index + %691 = addi %690, %c7 : index + %692 = addi %arg6, %arg7 : index + %693 = load %arg0[%689, %692] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %694 = load %arg1[%692, %691] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %695 = mulf %693, %694 {RelaxedPrecision} : f32 + %696 = load %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %697 = addf %696, %695 {RelaxedPrecision} : f32 + store %697, %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %698 = load %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %698, %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %699 = addi %arg4, %c4 : index + %700 = addi %arg3, %arg5 : index + %701 = addi %700, %c8 : index + %702 = addi %arg6, %arg7 : index + %703 = load %arg0[%699, %702] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %704 = load %arg1[%702, %701] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %705 = mulf %703, %704 {RelaxedPrecision} : f32 + %706 = load %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %707 = addf %706, %705 {RelaxedPrecision} : f32 + store %707, %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %708 = load %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %708, %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %709 = addi %arg4, %c4 : index + %710 = addi %arg3, %arg5 : index + %711 = addi %710, %c9 : index + %712 = addi %arg6, %arg7 : index + %713 = load %arg0[%709, %712] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %714 = load %arg1[%712, %711] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %715 = mulf %713, %714 {RelaxedPrecision} : f32 + %716 = load %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %717 = addf %716, %715 {RelaxedPrecision} : f32 + store %717, %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %718 = load %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %718, %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %719 = addi %arg4, %c4 : index + %720 = addi %arg3, %arg5 : index + %721 = addi %720, %c10 : index + %722 = addi %arg6, %arg7 : index + %723 = load %arg0[%719, %722] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %724 = load %arg1[%722, %721] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %725 = mulf %723, %724 {RelaxedPrecision} : f32 + %726 = load %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %727 = addf %726, %725 {RelaxedPrecision} : f32 + store %727, %arg2[%719, %721] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %728 = load %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %728, %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %729 = addi %arg4, %c4 : index + %730 = addi %arg3, %arg5 : index + %731 = addi %730, %c11 : index + %732 = addi %arg6, %arg7 : index + %733 = load %arg0[%729, %732] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %734 = load %arg1[%732, %731] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %735 = mulf %733, %734 {RelaxedPrecision} : f32 + %736 = load %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %737 = addf %736, %735 {RelaxedPrecision} : f32 + store %737, %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %738 = load %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %738, %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %739 = addi %arg4, %c4 : index + %740 = addi %arg3, %arg5 : index + %741 = addi %740, %c12 : index + %742 = addi %arg6, %arg7 : index + %743 = load %arg0[%739, %742] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %744 = load %arg1[%742, %741] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %745 = mulf %743, %744 {RelaxedPrecision} : f32 + %746 = load %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %747 = addf %746, %745 {RelaxedPrecision} : f32 + store %747, %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %748 = load %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %748, %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %749 = addi %arg4, %c4 : index + %750 = addi %arg3, %arg5 : index + %751 = addi %750, %c13 : index + %752 = addi %arg6, %arg7 : index + %753 = load %arg0[%749, %752] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %754 = load %arg1[%752, %751] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %755 = mulf %753, %754 {RelaxedPrecision} : f32 + %756 = load %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %757 = addf %756, %755 {RelaxedPrecision} : f32 + store %757, %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %758 = load %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %758, %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %759 = addi %arg4, %c4 : index + %760 = addi %arg3, %arg5 : index + %761 = addi %760, %c14 : index + %762 = addi %arg6, %arg7 : index + %763 = load %arg0[%759, %762] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %764 = load %arg1[%762, %761] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %765 = mulf %763, %764 {RelaxedPrecision} : f32 + %766 = load %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %767 = addf %766, %765 {RelaxedPrecision} : f32 + store %767, %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %768 = load %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %768, %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %769 = addi %arg4, %c4 : index + %770 = addi %arg3, %arg5 : index + %771 = 
addi %770, %c15 : index + %772 = addi %arg6, %arg7 : index + %773 = load %arg0[%769, %772] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %774 = load %arg1[%772, %771] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %775 = mulf %773, %774 {RelaxedPrecision} : f32 + %776 = load %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %777 = addf %776, %775 {RelaxedPrecision} : f32 + store %777, %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %778 = load %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %778, %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %779 = addi %arg4, %c5 : index + %780 = addi %arg3, %arg5 : index + %781 = addi %arg6, %arg7 : index + %782 = load %arg0[%779, %781] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %783 = load %arg1[%781, %780] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %784 = mulf %782, %783 {RelaxedPrecision} : f32 + %785 = load %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %786 = addf %785, %784 {RelaxedPrecision} : f32 + store %786, %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %787 = load %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %787, %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %788 = addi %arg4, %c5 : index + %789 = addi %arg3, %arg5 : index + %790 = addi %789, %c1 : index + %791 = addi %arg6, %arg7 : index + %792 = load %arg0[%788, %791] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %793 = load %arg1[%791, %790] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %794 = mulf %792, %793 {RelaxedPrecision} : f32 + %795 = load %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %796 = addf %795, %794 {RelaxedPrecision} : f32 + store %796, %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %797 = load %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %797, %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %798 = addi %arg4, %c5 : index + %799 = addi %arg3, %arg5 : index + %800 = addi %799, %c2 : index + %801 = addi %arg6, %arg7 : index + %802 = load %arg0[%798, %801] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %803 = load %arg1[%801, %800] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %804 = mulf %802, %803 {RelaxedPrecision} : f32 + %805 = load %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %806 = addf %805, %804 {RelaxedPrecision} : f32 + store %806, %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %807 = load %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %807, %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %808 = addi %arg4, %c5 : index + %809 = addi %arg3, %arg5 : index + %810 = addi %809, %c3 : index + %811 = addi %arg6, %arg7 : index + %812 = load %arg0[%808, %811] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %813 = load %arg1[%811, %810] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %814 = mulf %812, %813 {RelaxedPrecision} : f32 + %815 = load %arg2[%808, %810] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %816 = addf %815, %814 {RelaxedPrecision} : f32 + store %816, %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %817 = load %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %817, %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %818 = addi %arg4, %c5 : index + %819 = addi %arg3, %arg5 : index + %820 = addi %819, %c4 : index + %821 = addi %arg6, %arg7 : index + %822 = load %arg0[%818, %821] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %823 = load %arg1[%821, %820] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %824 = mulf %822, %823 {RelaxedPrecision} : f32 + %825 = load %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %826 = addf %825, %824 {RelaxedPrecision} : f32 + store %826, %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %827 = load %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %827, %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %828 = addi %arg4, %c5 : index + %829 = addi %arg3, %arg5 : index + %830 = addi %829, %c5 : index + %831 = addi %arg6, %arg7 : index + %832 = load %arg0[%828, %831] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %833 = load %arg1[%831, %830] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %834 = mulf %832, %833 {RelaxedPrecision} : f32 + %835 = load %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %836 = addf %835, %834 {RelaxedPrecision} : f32 + store %836, %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %837 = load %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %837, %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %838 = addi %arg4, %c5 : index + %839 = addi %arg3, %arg5 : index + %840 = addi %839, %c6 : index + %841 = addi %arg6, %arg7 : index + %842 = load %arg0[%838, %841] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %843 = load %arg1[%841, %840] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %844 = mulf %842, %843 {RelaxedPrecision} : f32 + %845 = load %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %846 = addf %845, %844 {RelaxedPrecision} : f32 + store %846, %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %847 = load %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %847, %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %848 = addi %arg4, %c5 : index + %849 = addi %arg3, %arg5 : index + %850 = addi %849, %c7 : index + %851 = addi %arg6, %arg7 : index + %852 = load %arg0[%848, %851] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %853 = load %arg1[%851, %850] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %854 = mulf %852, %853 {RelaxedPrecision} : f32 + %855 = load %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %856 = addf %855, %854 {RelaxedPrecision} : f32 + store %856, %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %857 = load %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %857, %arg2[%848, %850] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %858 = addi %arg4, %c5 : index + %859 = addi %arg3, %arg5 : index + %860 = addi %859, %c8 : index + %861 = addi %arg6, %arg7 : index + %862 = load %arg0[%858, %861] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %863 = load %arg1[%861, %860] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %864 = mulf %862, %863 {RelaxedPrecision} : f32 + %865 = load %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %866 = addf %865, %864 {RelaxedPrecision} : f32 + store %866, %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %867 = load %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %867, %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %868 = addi %arg4, %c5 : index + %869 = addi %arg3, %arg5 : index + %870 = addi %869, %c9 : index + %871 = addi %arg6, %arg7 : index + %872 = load %arg0[%868, %871] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %873 = load %arg1[%871, %870] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %874 = mulf %872, %873 {RelaxedPrecision} : f32 + %875 = load %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %876 = addf %875, %874 {RelaxedPrecision} : f32 + store %876, %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %877 = load %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %877, %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %878 = addi %arg4, %c5 : index + %879 = addi %arg3, %arg5 : index + %880 = addi %879, %c10 : index + %881 = addi %arg6, %arg7 : index + %882 = load %arg0[%878, %881] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %883 = load %arg1[%881, %880] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %884 = mulf %882, %883 {RelaxedPrecision} : f32 + %885 = load %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %886 = addf %885, %884 {RelaxedPrecision} : f32 + store %886, %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %887 = load %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %887, %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %888 = addi %arg4, %c5 : index + %889 = addi %arg3, %arg5 : index + %890 = addi %889, %c11 : index + %891 = addi %arg6, %arg7 : index + %892 = load %arg0[%888, %891] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %893 = load %arg1[%891, %890] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %894 = mulf %892, %893 {RelaxedPrecision} : f32 + %895 = load %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %896 = addf %895, %894 {RelaxedPrecision} : f32 + store %896, %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %897 = load %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %897, %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %898 = addi %arg4, %c5 : index + %899 = addi %arg3, %arg5 : index + %900 = addi %899, %c12 : index + %901 = addi %arg6, %arg7 : index + %902 = load %arg0[%898, %901] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %903 = load %arg1[%901, %900] : memref<128x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %904 = mulf %902, %903 {RelaxedPrecision} : f32 + %905 = load %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %906 = addf %905, %904 {RelaxedPrecision} : f32 + store %906, %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %907 = load %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %907, %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %908 = addi %arg4, %c5 : index + %909 = addi %arg3, %arg5 : index + %910 = addi %909, %c13 : index + %911 = addi %arg6, %arg7 : index + %912 = load %arg0[%908, %911] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %913 = load %arg1[%911, %910] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %914 = mulf %912, %913 {RelaxedPrecision} : f32 + %915 = load %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %916 = addf %915, %914 {RelaxedPrecision} : f32 + store %916, %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %917 = load %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %917, %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %918 = addi %arg4, %c5 : index + %919 = addi %arg3, %arg5 : index + %920 = addi %919, %c14 : index + %921 = addi %arg6, %arg7 : index + %922 = load %arg0[%918, %921] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %923 = load %arg1[%921, %920] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %924 = mulf %922, %923 {RelaxedPrecision} : f32 + %925 = load %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %926 = addf %925, %924 {RelaxedPrecision} : f32 + store %926, %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %927 = load %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %927, %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %928 = addi %arg4, %c5 : index + %929 = addi %arg3, %arg5 : index + %930 = addi %929, %c15 : index + %931 = addi %arg6, %arg7 : index + %932 = load %arg0[%928, %931] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %933 = load %arg1[%931, %930] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %934 = mulf %932, %933 {RelaxedPrecision} : f32 + %935 = load %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %936 = addf %935, %934 {RelaxedPrecision} : f32 + store %936, %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %937 = load %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %937, %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %arg3, %arg4 : index + %9 = addi %8, %c1 : index + %10 = addi %arg5, %arg6 : index + %11 = load %arg0[%c780, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg1[%10, %9] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = mulf %11, %12 {RelaxedPrecision} : f32 + %14 = load %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addf %14, %13 {RelaxedPrecision} : f32 + store %15, %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = load %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %16, %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %17 = addi %arg3, %arg4 : index + %18 = addi %17, %c2 : index + %19 = addi %arg5, %arg6 : index + %20 = load %arg0[%c780, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg1[%19, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = mulf %20, %21 {RelaxedPrecision} : f32 + %23 = load %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = addf %23, %22 {RelaxedPrecision} : f32 + store %24, %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = load %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %25, %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %26 = addi %arg3, %arg4 : index + %27 = addi %26, %c3 : index + %28 = addi %arg5, %arg6 : index + %29 = load %arg0[%c780, %28] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg1[%28, %27] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = mulf %29, %30 {RelaxedPrecision} : f32 + %32 = load %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %33 = addf %32, %31 {RelaxedPrecision} : f32 + store %33, %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = load %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %34, %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = addi %arg3, %arg4 : index + %36 = addi %35, %c4 : index + %37 = addi %arg5, %arg6 : index + %38 = load %arg0[%c780, %37] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %39 = load %arg1[%37, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = mulf %38, %39 {RelaxedPrecision} : f32 + %41 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = addf %41, %40 {RelaxedPrecision} : f32 + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %43, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = addi %arg3, %arg4 : index + %45 = addi %44, %c5 : index + %46 = addi %arg5, %arg6 : index + %47 = load %arg0[%c780, %46] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %48 = load %arg1[%46, %45] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = mulf %47, 
%48 {RelaxedPrecision} : f32 + %50 = load %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %51 = addf %50, %49 {RelaxedPrecision} : f32 + store %51, %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = load %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %52, %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = addi %arg3, %arg4 : index + %54 = addi %53, %c6 : index + %55 = addi %arg5, %arg6 : index + %56 = load %arg0[%c780, %55] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %57 = load %arg1[%55, %54] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %58 = mulf %56, %57 {RelaxedPrecision} : f32 + %59 = load %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = addf %59, %58 {RelaxedPrecision} : f32 + store %60, %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %61 = load %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %61, %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addi %arg3, %arg4 : index + %63 = addi %62, %c7 : index + %64 = addi %arg5, %arg6 : index + %65 = load %arg0[%c780, %64] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%64, %63] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %arg3, %arg4 : index + %72 = addi %71, %c8 : index + %73 = addi %arg5, %arg6 : index + %74 = load %arg0[%c780, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = mulf %74, %75 {RelaxedPrecision} : f32 + %77 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addf %77, %76 {RelaxedPrecision} : f32 + store %78, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = addi %arg3, %arg4 : index + %81 = addi %80, %c9 : index + %82 = addi %arg5, %arg6 : index + %83 = load %arg0[%c780, %82] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %84 = load %arg1[%82, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = mulf %83, %84 {RelaxedPrecision} : f32 + %86 = load %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = addf %86, %85 {RelaxedPrecision} : f32 + store %87, %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = load %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %88, %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %89 = addi %arg3, %arg4 : index + %90 = addi %89, %c10 : index + %91 = addi %arg5, 
%arg6 : index + %92 = load %arg0[%c780, %91] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %93 = load %arg1[%91, %90] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = mulf %92, %93 {RelaxedPrecision} : f32 + %95 = load %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = addf %95, %94 {RelaxedPrecision} : f32 + store %96, %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = load %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %97, %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = addi %arg3, %arg4 : index + %99 = addi %98, %c11 : index + %100 = addi %arg5, %arg6 : index + %101 = load %arg0[%c780, %100] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %102 = load %arg1[%100, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = mulf %101, %102 {RelaxedPrecision} : f32 + %104 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = addf %104, %103 {RelaxedPrecision} : f32 + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %106, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %107 = addi %arg3, %arg4 : index + %108 = addi %107, %c12 : index + %109 = addi %arg5, %arg6 : index + %110 = load %arg0[%c780, %109] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %111 = load %arg1[%109, %108] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = mulf %110, %111 {RelaxedPrecision} : f32 + %113 = load %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %114 = addf %113, %112 {RelaxedPrecision} : f32 + store %114, %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = load %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %115, %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = addi %arg3, %arg4 : index + %117 = addi %116, %c13 : index + %118 = addi %arg5, %arg6 : index + %119 = load %arg0[%c780, %118] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%118, %117] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = addi %arg3, %arg4 : index + %126 = addi %125, %c14 : index + %127 = addi %arg5, %arg6 : index + %128 = load %arg0[%c780, %127] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg1[%127, %126] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = mulf %128, %129 {RelaxedPrecision} : f32 + %131 = load %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = addf %131, %130 {RelaxedPrecision} : f32 + store %132, %arg2[%c780, %126] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = load %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %133, %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = addi %arg3, %arg4 : index + %135 = addi %134, %c15 : index + %136 = addi %arg5, %arg6 : index + %137 = load %arg0[%c780, %136] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%136, %135] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = addi %arg3, %arg4 : index + %144 = addi %arg5, %arg6 : index + %145 = load %arg0[%c781, %144] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %146 = load %arg1[%144, %143] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = mulf %145, %146 {RelaxedPrecision} : f32 + %148 = load %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = addf %148, %147 {RelaxedPrecision} : f32 + store %149, %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %150, %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = addi %arg3, %arg4 : index + %152 = addi %151, %c1 : index + %153 = addi %arg5, %arg6 : index + %154 = load %arg0[%c781, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg1[%153, %152] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = mulf %154, %155 {RelaxedPrecision} : f32 + %157 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = addf %157, %156 {RelaxedPrecision} : f32 + store %158, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %159, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addi %arg3, %arg4 : index + %161 = addi %160, %c2 : index + %162 = addi %arg5, %arg6 : index + %163 = load %arg0[%c781, %162] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %164 = load %arg1[%162, %161] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = mulf %163, %164 {RelaxedPrecision} : f32 + %166 = load %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = addf %166, %165 {RelaxedPrecision} : f32 + store %167, %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %168, %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = addi %arg3, %arg4 : index + %170 = addi %169, %c3 : index + %171 = addi %arg5, %arg6 : index + %172 = load %arg0[%c781, %171] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %173 = load %arg1[%171, %170] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = mulf %172, %173 {RelaxedPrecision} : f32 + %175 = load %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = addf %175, %174 {RelaxedPrecision} : f32 + store %176, %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = load %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %177, %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addi %arg3, %arg4 : index + %179 = addi %178, %c4 : index + %180 = addi %arg5, %arg6 : index + %181 = load %arg0[%c781, %180] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %182 = load %arg1[%180, %179] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = mulf %181, %182 {RelaxedPrecision} : f32 + %184 = load %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = addf %184, %183 {RelaxedPrecision} : f32 + store %185, %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %186, %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = addi %arg3, %arg4 : index + %188 = addi %187, %c5 : index + %189 = addi %arg5, %arg6 : index + %190 = load %arg0[%c781, %189] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %191 = load %arg1[%189, %188] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = mulf %190, %191 {RelaxedPrecision} : f32 + %193 = load %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = addf %193, %192 {RelaxedPrecision} : f32 + store %194, %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = load %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %195, %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addi %arg3, %arg4 : index + %197 = addi %196, %c6 : index + %198 = addi %arg5, %arg6 : index + %199 = load %arg0[%c781, %198] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %200 = load %arg1[%198, %197] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = mulf %199, %200 {RelaxedPrecision} : f32 + %202 = load %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = addf %202, %201 {RelaxedPrecision} : f32 + store %203, %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %204, %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = addi %arg3, %arg4 : index + %206 = addi %205, %c7 : index + %207 = addi %arg5, %arg6 : index + %208 = load %arg0[%c781, %207] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %209 = load %arg1[%207, %206] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = mulf %208, %209 {RelaxedPrecision} : f32 + %211 = load %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = addf %211, %210 {RelaxedPrecision} : f32 + store %212, %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = load %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + store %213, %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = addi %arg3, %arg4 : index + %215 = addi %214, %c8 : index + %216 = addi %arg5, %arg6 : index + %217 = load %arg0[%c781, %216] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%216, %215] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = addi %arg3, %arg4 : index + %224 = addi %223, %c9 : index + %225 = addi %arg5, %arg6 : index + %226 = load %arg0[%c781, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = mulf %226, %227 {RelaxedPrecision} : f32 + %229 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = addf %229, %228 {RelaxedPrecision} : f32 + store %230, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = addi %arg3, %arg4 : index + %233 = addi %232, %c10 : index + %234 = addi %arg5, %arg6 : index + %235 = load %arg0[%c781, %234] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%234, %233] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = addi %arg3, %arg4 : index + %242 = addi %241, %c11 : index + %243 = addi %arg5, %arg6 : index + %244 = load %arg0[%c781, %243] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %245 = load %arg1[%243, %242] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = mulf %244, %245 {RelaxedPrecision} : f32 + %247 = load %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = addf %247, %246 {RelaxedPrecision} : f32 + store %248, %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = load %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %249, %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = addi %arg3, %arg4 : index + %251 = addi %250, %c12 : index + %252 = addi %arg5, %arg6 : index + %253 = load %arg0[%c781, %252] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%252, %251] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : 
f32 + %256 = load %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = addi %arg3, %arg4 : index + %260 = addi %259, %c13 : index + %261 = addi %arg5, %arg6 : index + %262 = load %arg0[%c781, %261] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %263 = load %arg1[%261, %260] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = mulf %262, %263 {RelaxedPrecision} : f32 + %265 = load %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %266 = addf %265, %264 {RelaxedPrecision} : f32 + store %266, %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = load %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %267, %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = addi %arg3, %arg4 : index + %269 = addi %268, %c14 : index + %270 = addi %arg5, %arg6 : index + %271 = load %arg0[%c781, %270] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%270, %269] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = addi %arg3, %arg4 : index + %278 = addi %277, %c15 : index + %279 = addi %arg5, %arg6 : index + %280 = load %arg0[%c781, %279] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %281 = load %arg1[%279, %278] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = mulf %280, %281 {RelaxedPrecision} : f32 + %283 = load %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %284 = addf %283, %282 {RelaxedPrecision} : f32 + store %284, %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = load %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %285, %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = addi %arg3, %arg4 : index + %287 = addi %arg5, %arg6 : index + %288 = load %arg0[%c782, %287] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %289 = load %arg1[%287, %286] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = mulf %288, %289 {RelaxedPrecision} : f32 + %291 = load %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = addf %291, %290 {RelaxedPrecision} : f32 + store %292, %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %293, %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = addi %arg3, %arg4 : index + 
%295 = addi %294, %c1 : index + %296 = addi %arg5, %arg6 : index + %297 = load %arg0[%c782, %296] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %298 = load %arg1[%296, %295] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = mulf %297, %298 {RelaxedPrecision} : f32 + %300 = load %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = addf %300, %299 {RelaxedPrecision} : f32 + store %301, %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %302 = load %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %302, %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addi %arg3, %arg4 : index + %304 = addi %303, %c2 : index + %305 = addi %arg5, %arg6 : index + %306 = load %arg0[%c782, %305] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %307 = load %arg1[%305, %304] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = mulf %306, %307 {RelaxedPrecision} : f32 + %309 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = addf %309, %308 {RelaxedPrecision} : f32 + store %310, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %311, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addi %arg3, %arg4 : index + %313 = addi %312, %c3 : index + %314 = addi %arg5, %arg6 : index + %315 = load %arg0[%c782, %314] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %316 = load %arg1[%314, %313] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = mulf %315, %316 {RelaxedPrecision} : f32 + %318 = load %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = addf %318, %317 {RelaxedPrecision} : f32 + store %319, %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %320, %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addi %arg3, %arg4 : index + %322 = addi %321, %c4 : index + %323 = addi %arg5, %arg6 : index + %324 = load %arg0[%c782, %323] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %325 = load %arg1[%323, %322] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = mulf %324, %325 {RelaxedPrecision} : f32 + %327 = load %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = addf %327, %326 {RelaxedPrecision} : f32 + store %328, %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %329, %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addi %arg3, %arg4 : index + %331 = addi %330, %c5 : index + %332 = addi %arg5, %arg6 : index + %333 = load %arg0[%c782, %332] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %334 = load %arg1[%332, %331] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %335 = mulf %333, %334 {RelaxedPrecision} : f32 + %336 = load %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = addf %336, %335 {RelaxedPrecision} 
: f32 + store %337, %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %338, %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addi %arg3, %arg4 : index + %340 = addi %339, %c6 : index + %341 = addi %arg5, %arg6 : index + %342 = load %arg0[%c782, %341] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %343 = load %arg1[%341, %340] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = mulf %342, %343 {RelaxedPrecision} : f32 + %345 = load %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = addf %345, %344 {RelaxedPrecision} : f32 + store %346, %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %347, %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addi %arg3, %arg4 : index + %349 = addi %348, %c7 : index + %350 = addi %arg5, %arg6 : index + %351 = load %arg0[%c782, %350] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %352 = load %arg1[%350, %349] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = mulf %351, %352 {RelaxedPrecision} : f32 + %354 = load %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = addf %354, %353 {RelaxedPrecision} : f32 + store %355, %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %356, %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addi %arg3, %arg4 : index + %358 = addi %357, %c8 : index + %359 = addi %arg5, %arg6 : index + %360 = load %arg0[%c782, %359] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %361 = load %arg1[%359, %358] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = mulf %360, %361 {RelaxedPrecision} : f32 + %363 = load %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = addf %363, %362 {RelaxedPrecision} : f32 + store %364, %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %365, %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addi %arg3, %arg4 : index + %367 = addi %366, %c9 : index + %368 = addi %arg5, %arg6 : index + %369 = load %arg0[%c782, %368] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %370 = load %arg1[%368, %367] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = mulf %369, %370 {RelaxedPrecision} : f32 + %372 = load %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = addf %372, %371 {RelaxedPrecision} : f32 + store %373, %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %374, %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addi %arg3, %arg4 : index + %376 = addi %375, %c10 : index + %377 = addi %arg5, %arg6 : index + %378 = load %arg0[%c782, %377] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %379 = load %arg1[%377, %376] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = mulf %378, %379 {RelaxedPrecision} : f32 + %381 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = addf %381, %380 {RelaxedPrecision} : f32 + store %382, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %383, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addi %arg3, %arg4 : index + %385 = addi %384, %c11 : index + %386 = addi %arg5, %arg6 : index + %387 = load %arg0[%c782, %386] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %388 = load %arg1[%386, %385] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = mulf %387, %388 {RelaxedPrecision} : f32 + %390 = load %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = addf %390, %389 {RelaxedPrecision} : f32 + store %391, %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %392, %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addi %arg3, %arg4 : index + %394 = addi %393, %c12 : index + %395 = addi %arg5, %arg6 : index + %396 = load %arg0[%c782, %395] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %397 = load %arg1[%395, %394] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = mulf %396, %397 {RelaxedPrecision} : f32 + %399 = load %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = addf %399, %398 {RelaxedPrecision} : f32 + store %400, %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %401 = load %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %401, %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addi %arg3, %arg4 : index + %403 = addi %402, %c13 : index + %404 = addi %arg5, %arg6 : index + %405 = load %arg0[%c782, %404] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%404, %403] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = addi %arg3, %arg4 : index + %412 = addi %411, %c14 : index + %413 = addi %arg5, %arg6 : index + %414 = load %arg0[%c782, %413] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %415 = load %arg1[%413, %412] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = mulf %414, %415 {RelaxedPrecision} : f32 + %417 = load %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %418 = addf %417, %416 {RelaxedPrecision} : f32 + store %418, %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %419 = load %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %419, %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = addi %arg3, %arg4 : index + %421 = addi %420, %c15 : index + %422 = addi %arg5, %arg6 : index + %423 = load %arg0[%c782, %422] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%422, %421] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = addi %arg3, %arg4 : index + %430 = addi %arg5, %arg6 : index + %431 = load %arg0[%c783, %430] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %432 = load %arg1[%430, %429] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = mulf %431, %432 {RelaxedPrecision} : f32 + %434 = load %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = addf %434, %433 {RelaxedPrecision} : f32 + store %435, %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %436 = load %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %436, %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = addi %arg3, %arg4 : index + %438 = addi %437, %c1 : index + %439 = addi %arg5, %arg6 : index + %440 = load %arg0[%c783, %439] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %441 = load %arg1[%439, %438] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %442 = mulf %440, %441 {RelaxedPrecision} : f32 + %443 = load %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %444 = addf %443, %442 {RelaxedPrecision} : f32 + store %444, %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = load %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %445, %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = addi %arg3, %arg4 : index + %447 = addi %446, %c2 : index + %448 = addi %arg5, %arg6 : index + %449 = load %arg0[%c783, %448] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %450 = load %arg1[%448, %447] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = mulf %449, %450 {RelaxedPrecision} : f32 + %452 = load %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = addf %452, %451 {RelaxedPrecision} : f32 + store %453, %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %454 = load %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %454, %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = addi %arg3, %arg4 : index + %456 = addi %455, %c3 : index + %457 = addi %arg5, %arg6 : index + %458 = load %arg0[%c783, %457] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %459 = load %arg1[%457, %456] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 
* 512 + d1)>> + %460 = mulf %458, %459 {RelaxedPrecision} : f32 + %461 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %462 = addf %461, %460 {RelaxedPrecision} : f32 + store %462, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %463, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = addi %arg3, %arg4 : index + %465 = addi %464, %c4 : index + %466 = addi %arg5, %arg6 : index + %467 = load %arg0[%c783, %466] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %468 = load %arg1[%466, %465] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = mulf %467, %468 {RelaxedPrecision} : f32 + %470 = load %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = addf %470, %469 {RelaxedPrecision} : f32 + store %471, %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %472 = load %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %472, %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = addi %arg3, %arg4 : index + %474 = addi %473, %c5 : index + %475 = addi %arg5, %arg6 : index + %476 = load %arg0[%c783, %475] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %477 = load %arg1[%475, %474] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = mulf %476, %477 {RelaxedPrecision} : f32 + %479 = load %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = addf %479, %478 {RelaxedPrecision} : f32 + store %480, %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = load %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %481, %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = addi %arg3, %arg4 : index + %483 = addi %482, %c6 : index + %484 = addi %arg5, %arg6 : index + %485 = load %arg0[%c783, %484] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %486 = load %arg1[%484, %483] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = mulf %485, %486 {RelaxedPrecision} : f32 + %488 = load %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = addf %488, %487 {RelaxedPrecision} : f32 + store %489, %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %490 = load %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %490, %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = addi %arg3, %arg4 : index + %492 = addi %491, %c7 : index + %493 = addi %arg5, %arg6 : index + %494 = load %arg0[%c783, %493] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %495 = load %arg1[%493, %492] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = mulf %494, %495 {RelaxedPrecision} : f32 + %497 = load %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %498 = addf %497, %496 {RelaxedPrecision} : f32 + store %498, %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = load %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %499, %arg2[%c783, %492] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = addi %arg3, %arg4 : index + %501 = addi %500, %c8 : index + %502 = addi %arg5, %arg6 : index + %503 = load %arg0[%c783, %502] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %504 = load %arg1[%502, %501] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %505 = mulf %503, %504 {RelaxedPrecision} : f32 + %506 = load %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = addf %506, %505 {RelaxedPrecision} : f32 + store %507, %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %508, %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %509 = addi %arg3, %arg4 : index + %510 = addi %509, %c9 : index + %511 = addi %arg5, %arg6 : index + %512 = load %arg0[%c783, %511] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %513 = load %arg1[%511, %510] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = mulf %512, %513 {RelaxedPrecision} : f32 + %515 = load %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = addf %515, %514 {RelaxedPrecision} : f32 + store %516, %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %517 = load %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %517, %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addi %arg3, %arg4 : index + %519 = addi %518, %c10 : index + %520 = addi %arg5, %arg6 : index + %521 = load %arg0[%c783, %520] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %522 = load %arg1[%520, %519] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %523 = mulf %521, %522 {RelaxedPrecision} : f32 + %524 = load %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = addf %524, %523 {RelaxedPrecision} : f32 + store %525, %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %526, %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %527 = addi %arg3, %arg4 : index + %528 = addi %527, %c11 : index + %529 = addi %arg5, %arg6 : index + %530 = load %arg0[%c783, %529] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %531 = load %arg1[%529, %528] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = mulf %530, %531 {RelaxedPrecision} : f32 + %533 = load %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = addf %533, %532 {RelaxedPrecision} : f32 + store %534, %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %535 = load %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %535, %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addi %arg3, %arg4 : index + %537 = addi %536, %c12 : index + %538 = addi %arg5, %arg6 : index + %539 = load %arg0[%c783, %538] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %540 = load %arg1[%538, %537] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %541 = mulf %539, %540 {RelaxedPrecision} : f32 + %542 = load %arg2[%c783, %537] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = addf %542, %541 {RelaxedPrecision} : f32 + store %543, %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %544, %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %545 = addi %arg3, %arg4 : index + %546 = addi %545, %c13 : index + %547 = addi %arg5, %arg6 : index + %548 = load %arg0[%c783, %547] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %549 = load %arg1[%547, %546] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = mulf %548, %549 {RelaxedPrecision} : f32 + %551 = load %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = addf %551, %550 {RelaxedPrecision} : f32 + store %552, %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %553 = load %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %553, %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addi %arg3, %arg4 : index + %555 = addi %554, %c14 : index + %556 = addi %arg5, %arg6 : index + %557 = load %arg0[%c783, %556] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %558 = load %arg1[%556, %555] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = mulf %557, %558 {RelaxedPrecision} : f32 + %560 = load %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = addf %560, %559 {RelaxedPrecision} : f32 + store %561, %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %562, %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %563 = addi %arg3, %arg4 : index + %564 = addi %563, %c15 : index + %565 = addi %arg5, %arg6 : index + %566 = load %arg0[%c783, %565] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %567 = load %arg1[%565, %564] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = mulf %566, %567 {RelaxedPrecision} : f32 + %569 = load %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = addf %569, %568 {RelaxedPrecision} : f32 + store %570, %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %571 = load %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %571, %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/9_Canonicalizer.mlir b/Tutorials/optimized_matmul/mlir/9_Canonicalizer.mlir new file mode 100644 index 00000000..3d4569bf --- /dev/null +++ 
b/Tutorials/optimized_matmul/mlir/9_Canonicalizer.mlir @@ -0,0 +1,1879 @@ +module @optimized_matmul { + func @optimized_matmul_py_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %c781 = constant 781 : index + %c782 = constant 782 : index + %c783 = constant 783 : index + %c512 = constant 512 : index + %c780 = constant 780 : index + %c256 = constant 256 : index + %c16 = constant 16 : index + %c128 = constant 128 : index + %c0 = constant 0 : index + %c1 = constant 1 : index + %c2 = constant 2 : index + %c3 = constant 3 : index + %c4 = constant 4 : index + %c5 = constant 5 : index + %c6 = constant 6 : index + %c7 = constant 7 : index + %c8 = constant 8 : index + %c9 = constant 9 : index + %c10 = constant 10 : index + %c11 = constant 11 : index + %c12 = constant 12 : index + %c13 = constant 13 : index + %c14 = constant 14 : index + %c15 = constant 15 : index + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c780 step %c6 { + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg5 : index + %1 = addi %arg6, %arg7 : index + %2 = load %arg0[%arg4, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 {RelaxedPrecision} : f32 + %5 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%arg4, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %arg3, %arg5 : index + %9 = addi %8, %c1 : index + %10 = addi %arg6, %arg7 : index + %11 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg1[%10, %9] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = mulf %11, %12 {RelaxedPrecision} : f32 + %14 = load %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addf %14, %13 {RelaxedPrecision} : f32 + store %15, %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = load %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %16, %arg2[%arg4, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %17 = addi %arg3, %arg5 : index + %18 = addi %17, %c2 : index + %19 = addi %arg6, %arg7 : index + %20 = load %arg0[%arg4, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg1[%19, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = mulf %20, %21 {RelaxedPrecision} : f32 + %23 = load %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = addf %23, %22 {RelaxedPrecision} : f32 + store %24, %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = load %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %25, %arg2[%arg4, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %26 = addi %arg3, 
%arg5 : index + %27 = addi %26, %c3 : index + %28 = addi %arg6, %arg7 : index + %29 = load %arg0[%arg4, %28] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg1[%28, %27] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = mulf %29, %30 {RelaxedPrecision} : f32 + %32 = load %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %33 = addf %32, %31 {RelaxedPrecision} : f32 + store %33, %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = load %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %34, %arg2[%arg4, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = addi %arg3, %arg5 : index + %36 = addi %35, %c4 : index + %37 = addi %arg6, %arg7 : index + %38 = load %arg0[%arg4, %37] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %39 = load %arg1[%37, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = mulf %38, %39 {RelaxedPrecision} : f32 + %41 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = addf %41, %40 {RelaxedPrecision} : f32 + store %42, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = load %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %43, %arg2[%arg4, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = addi %arg3, %arg5 : index + %45 = addi %44, %c5 : index + %46 = addi %arg6, %arg7 : index + %47 = load %arg0[%arg4, %46] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %48 = load %arg1[%46, %45] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = mulf %47, %48 {RelaxedPrecision} : f32 + %50 = load %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %51 = addf %50, %49 {RelaxedPrecision} : f32 + store %51, %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = load %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %52, %arg2[%arg4, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = addi %arg3, %arg5 : index + %54 = addi %53, %c6 : index + %55 = addi %arg6, %arg7 : index + %56 = load %arg0[%arg4, %55] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %57 = load %arg1[%55, %54] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %58 = mulf %56, %57 {RelaxedPrecision} : f32 + %59 = load %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = addf %59, %58 {RelaxedPrecision} : f32 + store %60, %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %61 = load %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %61, %arg2[%arg4, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addi %arg3, %arg5 : index + %63 = addi %62, %c7 : index + %64 = addi %arg6, %arg7 : index + %65 = load %arg0[%arg4, %64] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%64, %63] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 
+ d1)>> + %70 = load %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%arg4, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %arg3, %arg5 : index + %72 = addi %71, %c8 : index + %73 = addi %arg6, %arg7 : index + %74 = load %arg0[%arg4, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = mulf %74, %75 {RelaxedPrecision} : f32 + %77 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addf %77, %76 {RelaxedPrecision} : f32 + store %78, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%arg4, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = addi %arg3, %arg5 : index + %81 = addi %80, %c9 : index + %82 = addi %arg6, %arg7 : index + %83 = load %arg0[%arg4, %82] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %84 = load %arg1[%82, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = mulf %83, %84 {RelaxedPrecision} : f32 + %86 = load %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = addf %86, %85 {RelaxedPrecision} : f32 + store %87, %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = load %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %88, %arg2[%arg4, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %89 = addi %arg3, %arg5 : index + %90 = addi %89, %c10 : index + %91 = addi %arg6, %arg7 : index + %92 = load %arg0[%arg4, %91] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %93 = load %arg1[%91, %90] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = mulf %92, %93 {RelaxedPrecision} : f32 + %95 = load %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = addf %95, %94 {RelaxedPrecision} : f32 + store %96, %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = load %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %97, %arg2[%arg4, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = addi %arg3, %arg5 : index + %99 = addi %98, %c11 : index + %100 = addi %arg6, %arg7 : index + %101 = load %arg0[%arg4, %100] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %102 = load %arg1[%100, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = mulf %101, %102 {RelaxedPrecision} : f32 + %104 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = addf %104, %103 {RelaxedPrecision} : f32 + store %105, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = load %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %106, %arg2[%arg4, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %107 = addi %arg3, %arg5 : index + %108 = addi %107, %c12 : index + %109 = addi %arg6, %arg7 : index + %110 = load %arg0[%arg4, %109] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %111 = load %arg1[%109, %108] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = mulf %110, %111 
{RelaxedPrecision} : f32 + %113 = load %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %114 = addf %113, %112 {RelaxedPrecision} : f32 + store %114, %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = load %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %115, %arg2[%arg4, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = addi %arg3, %arg5 : index + %117 = addi %116, %c13 : index + %118 = addi %arg6, %arg7 : index + %119 = load %arg0[%arg4, %118] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%118, %117] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%arg4, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = addi %arg3, %arg5 : index + %126 = addi %125, %c14 : index + %127 = addi %arg6, %arg7 : index + %128 = load %arg0[%arg4, %127] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg1[%127, %126] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = mulf %128, %129 {RelaxedPrecision} : f32 + %131 = load %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = addf %131, %130 {RelaxedPrecision} : f32 + store %132, %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = load %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %133, %arg2[%arg4, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = addi %arg3, %arg5 : index + %135 = addi %134, %c15 : index + %136 = addi %arg6, %arg7 : index + %137 = load %arg0[%arg4, %136] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%136, %135] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%arg4, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = addi %arg4, %c1 : index + %144 = addi %arg3, %arg5 : index + %145 = addi %arg6, %arg7 : index + %146 = load %arg0[%143, %145] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %147 = load %arg1[%145, %144] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %148 = mulf %146, %147 {RelaxedPrecision} : f32 + %149 = load %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = addf %149, %148 {RelaxedPrecision} : f32 + store %150, %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = load %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %151, %arg2[%143, %144] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 
512 + d1)>> + %152 = addi %arg4, %c1 : index + %153 = addi %arg3, %arg5 : index + %154 = addi %153, %c1 : index + %155 = addi %arg6, %arg7 : index + %156 = load %arg0[%152, %155] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %157 = load %arg1[%155, %154] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = mulf %156, %157 {RelaxedPrecision} : f32 + %159 = load %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addf %159, %158 {RelaxedPrecision} : f32 + store %160, %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %161 = load %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %161, %arg2[%152, %154] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %162 = addi %arg4, %c1 : index + %163 = addi %arg3, %arg5 : index + %164 = addi %163, %c2 : index + %165 = addi %arg6, %arg7 : index + %166 = load %arg0[%162, %165] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %167 = load %arg1[%165, %164] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = mulf %166, %167 {RelaxedPrecision} : f32 + %169 = load %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %170 = addf %169, %168 {RelaxedPrecision} : f32 + store %170, %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %171 = load %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %171, %arg2[%162, %164] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %172 = addi %arg4, %c1 : index + %173 = addi %arg3, %arg5 : index + %174 = addi %173, %c3 : index + %175 = addi %arg6, %arg7 : index + %176 = load %arg0[%172, %175] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %177 = load %arg1[%175, %174] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = mulf %176, %177 {RelaxedPrecision} : f32 + %179 = load %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %180 = addf %179, %178 {RelaxedPrecision} : f32 + store %180, %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %181 = load %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %181, %arg2[%172, %174] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %182 = addi %arg4, %c1 : index + %183 = addi %arg3, %arg5 : index + %184 = addi %183, %c4 : index + %185 = addi %arg6, %arg7 : index + %186 = load %arg0[%182, %185] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %187 = load %arg1[%185, %184] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %188 = mulf %186, %187 {RelaxedPrecision} : f32 + %189 = load %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %190 = addf %189, %188 {RelaxedPrecision} : f32 + store %190, %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %191 = load %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %191, %arg2[%182, %184] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = addi %arg4, %c1 : index + %193 = addi %arg3, %arg5 : index + %194 = addi %193, %c5 : index + %195 = addi %arg6, %arg7 : index + %196 = load %arg0[%192, %195] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %197 = load %arg1[%195, %194] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %198 = mulf %196, %197 {RelaxedPrecision} : f32 + %199 = load %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %200 = addf %199, %198 {RelaxedPrecision} : f32 + store %200, %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = load %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %201, %arg2[%192, %194] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %202 = addi %arg4, %c1 : index + %203 = addi %arg3, %arg5 : index + %204 = addi %203, %c6 : index + %205 = addi %arg6, %arg7 : index + %206 = load %arg0[%202, %205] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %207 = load %arg1[%205, %204] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %208 = mulf %206, %207 {RelaxedPrecision} : f32 + %209 = load %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = addf %209, %208 {RelaxedPrecision} : f32 + store %210, %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %211 = load %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %211, %arg2[%202, %204] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = addi %arg4, %c1 : index + %213 = addi %arg3, %arg5 : index + %214 = addi %213, %c7 : index + %215 = addi %arg6, %arg7 : index + %216 = load %arg0[%212, %215] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %217 = load %arg1[%215, %214] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %218 = mulf %216, %217 {RelaxedPrecision} : f32 + %219 = load %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %220 = addf %219, %218 {RelaxedPrecision} : f32 + store %220, %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = load %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %221, %arg2[%212, %214] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = addi %arg4, %c1 : index + %223 = addi %arg3, %arg5 : index + %224 = addi %223, %c8 : index + %225 = addi %arg6, %arg7 : index + %226 = load %arg0[%222, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = mulf %226, %227 {RelaxedPrecision} : f32 + %229 = load %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = addf %229, %228 {RelaxedPrecision} : f32 + store %230, %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%222, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = addi %arg4, %c1 : index + %233 = addi %arg3, %arg5 : index + %234 = addi %233, %c9 : index + %235 = addi %arg6, %arg7 : index + %236 = load %arg0[%232, %235] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %237 = load %arg1[%235, %234] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %238 = mulf %236, %237 {RelaxedPrecision} : f32 + %239 = load %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = addf %239, %238 {RelaxedPrecision} : f32 + store %240, %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = load %arg2[%232, %234] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %241, %arg2[%232, %234] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %242 = addi %arg4, %c1 : index + %243 = addi %arg3, %arg5 : index + %244 = addi %243, %c10 : index + %245 = addi %arg6, %arg7 : index + %246 = load %arg0[%242, %245] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %247 = load %arg1[%245, %244] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = mulf %246, %247 {RelaxedPrecision} : f32 + %249 = load %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = addf %249, %248 {RelaxedPrecision} : f32 + store %250, %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %251 = load %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %251, %arg2[%242, %244] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %252 = addi %arg4, %c1 : index + %253 = addi %arg3, %arg5 : index + %254 = addi %253, %c11 : index + %255 = addi %arg6, %arg7 : index + %256 = load %arg0[%252, %255] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %257 = load %arg1[%255, %254] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = mulf %256, %257 {RelaxedPrecision} : f32 + %259 = load %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %260 = addf %259, %258 {RelaxedPrecision} : f32 + store %260, %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %261 = load %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %261, %arg2[%252, %254] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %262 = addi %arg4, %c1 : index + %263 = addi %arg3, %arg5 : index + %264 = addi %263, %c12 : index + %265 = addi %arg6, %arg7 : index + %266 = load %arg0[%262, %265] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %267 = load %arg1[%265, %264] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = mulf %266, %267 {RelaxedPrecision} : f32 + %269 = load %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %270 = addf %269, %268 {RelaxedPrecision} : f32 + store %270, %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %271 = load %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %271, %arg2[%262, %264] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %272 = addi %arg4, %c1 : index + %273 = addi %arg3, %arg5 : index + %274 = addi %273, %c13 : index + %275 = addi %arg6, %arg7 : index + %276 = load %arg0[%272, %275] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %277 = load %arg1[%275, %274] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %278 = mulf %276, %277 {RelaxedPrecision} : f32 + %279 = load %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %280 = addf %279, %278 {RelaxedPrecision} : f32 + store %280, %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %281 = load %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %281, %arg2[%272, %274] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = addi %arg4, %c1 : index + %283 = addi %arg3, %arg5 : index + %284 = addi %283, %c14 : index + %285 = addi %arg6, %arg7 : index + %286 = load %arg0[%282, %285] : 
memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %287 = load %arg1[%285, %284] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %288 = mulf %286, %287 {RelaxedPrecision} : f32 + %289 = load %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = addf %289, %288 {RelaxedPrecision} : f32 + store %290, %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %291 = load %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %291, %arg2[%282, %284] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = addi %arg4, %c1 : index + %293 = addi %arg3, %arg5 : index + %294 = addi %293, %c15 : index + %295 = addi %arg6, %arg7 : index + %296 = load %arg0[%292, %295] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %297 = load %arg1[%295, %294] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %298 = mulf %296, %297 {RelaxedPrecision} : f32 + %299 = load %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %300 = addf %299, %298 {RelaxedPrecision} : f32 + store %300, %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = load %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %301, %arg2[%292, %294] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %302 = addi %arg4, %c2 : index + %303 = addi %arg3, %arg5 : index + %304 = addi %arg6, %arg7 : index + %305 = load %arg0[%302, %304] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %306 = load %arg1[%304, %303] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %307 = mulf %305, %306 {RelaxedPrecision} : f32 + %308 = load %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %309 = addf %308, %307 {RelaxedPrecision} : f32 + store %309, %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = load %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %310, %arg2[%302, %303] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = addi %arg4, %c2 : index + %312 = addi %arg3, %arg5 : index + %313 = addi %312, %c1 : index + %314 = addi %arg6, %arg7 : index + %315 = load %arg0[%311, %314] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %316 = load %arg1[%314, %313] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = mulf %315, %316 {RelaxedPrecision} : f32 + %318 = load %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = addf %318, %317 {RelaxedPrecision} : f32 + store %319, %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %320, %arg2[%311, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addi %arg4, %c2 : index + %322 = addi %arg3, %arg5 : index + %323 = addi %322, %c2 : index + %324 = addi %arg6, %arg7 : index + %325 = load %arg0[%321, %324] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %326 = load %arg1[%324, %323] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %327 = mulf %325, %326 {RelaxedPrecision} : f32 + %328 = load %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = addf %328, %327 {RelaxedPrecision} : f32 + store %329, 
%arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = load %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %330, %arg2[%321, %323] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %331 = addi %arg4, %c2 : index + %332 = addi %arg3, %arg5 : index + %333 = addi %332, %c3 : index + %334 = addi %arg6, %arg7 : index + %335 = load %arg0[%331, %334] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %336 = load %arg1[%334, %333] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = mulf %335, %336 {RelaxedPrecision} : f32 + %338 = load %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addf %338, %337 {RelaxedPrecision} : f32 + store %339, %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %340 = load %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %340, %arg2[%331, %333] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %341 = addi %arg4, %c2 : index + %342 = addi %arg3, %arg5 : index + %343 = addi %342, %c4 : index + %344 = addi %arg6, %arg7 : index + %345 = load %arg0[%341, %344] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %346 = load %arg1[%344, %343] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = mulf %345, %346 {RelaxedPrecision} : f32 + %348 = load %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %349 = addf %348, %347 {RelaxedPrecision} : f32 + store %349, %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %350 = load %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %350, %arg2[%341, %343] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %351 = addi %arg4, %c2 : index + %352 = addi %arg3, %arg5 : index + %353 = addi %352, %c5 : index + %354 = addi %arg6, %arg7 : index + %355 = load %arg0[%351, %354] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %356 = load %arg1[%354, %353] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = mulf %355, %356 {RelaxedPrecision} : f32 + %358 = load %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %359 = addf %358, %357 {RelaxedPrecision} : f32 + store %359, %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %360 = load %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %360, %arg2[%351, %353] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %361 = addi %arg4, %c2 : index + %362 = addi %arg3, %arg5 : index + %363 = addi %362, %c6 : index + %364 = addi %arg6, %arg7 : index + %365 = load %arg0[%361, %364] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %366 = load %arg1[%364, %363] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %367 = mulf %365, %366 {RelaxedPrecision} : f32 + %368 = load %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %369 = addf %368, %367 {RelaxedPrecision} : f32 + store %369, %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %370 = load %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %370, %arg2[%361, %363] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = addi %arg4, %c2 : index + %372 = addi %arg3, %arg5 : 
index + %373 = addi %372, %c7 : index + %374 = addi %arg6, %arg7 : index + %375 = load %arg0[%371, %374] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %376 = load %arg1[%374, %373] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %377 = mulf %375, %376 {RelaxedPrecision} : f32 + %378 = load %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %379 = addf %378, %377 {RelaxedPrecision} : f32 + store %379, %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = load %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %380, %arg2[%371, %373] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %381 = addi %arg4, %c2 : index + %382 = addi %arg3, %arg5 : index + %383 = addi %382, %c8 : index + %384 = addi %arg6, %arg7 : index + %385 = load %arg0[%381, %384] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %386 = load %arg1[%384, %383] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %387 = mulf %385, %386 {RelaxedPrecision} : f32 + %388 = load %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = addf %388, %387 {RelaxedPrecision} : f32 + store %389, %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %390 = load %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %390, %arg2[%381, %383] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = addi %arg4, %c2 : index + %392 = addi %arg3, %arg5 : index + %393 = addi %392, %c9 : index + %394 = addi %arg6, %arg7 : index + %395 = load %arg0[%391, %394] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %396 = load %arg1[%394, %393] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %397 = mulf %395, %396 {RelaxedPrecision} : f32 + %398 = load %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %399 = addf %398, %397 {RelaxedPrecision} : f32 + store %399, %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = load %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %400, %arg2[%391, %393] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %401 = addi %arg4, %c2 : index + %402 = addi %arg3, %arg5 : index + %403 = addi %402, %c10 : index + %404 = addi %arg6, %arg7 : index + %405 = load %arg0[%401, %404] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%404, %403] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%401, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = addi %arg4, %c2 : index + %412 = addi %arg3, %arg5 : index + %413 = addi %412, %c11 : index + %414 = addi %arg6, %arg7 : index + %415 = load %arg0[%411, %414] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %416 = load %arg1[%414, %413] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %417 = mulf %415, %416 {RelaxedPrecision} : f32 + %418 = load 
%arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = addf %418, %417 {RelaxedPrecision} : f32 + store %419, %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = load %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %420, %arg2[%411, %413] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %421 = addi %arg4, %c2 : index + %422 = addi %arg3, %arg5 : index + %423 = addi %422, %c12 : index + %424 = addi %arg6, %arg7 : index + %425 = load %arg0[%421, %424] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %426 = load %arg1[%424, %423] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = mulf %425, %426 {RelaxedPrecision} : f32 + %428 = load %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = addf %428, %427 {RelaxedPrecision} : f32 + store %429, %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %430 = load %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %430, %arg2[%421, %423] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %431 = addi %arg4, %c2 : index + %432 = addi %arg3, %arg5 : index + %433 = addi %432, %c13 : index + %434 = addi %arg6, %arg7 : index + %435 = load %arg0[%431, %434] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %436 = load %arg1[%434, %433] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = mulf %435, %436 {RelaxedPrecision} : f32 + %438 = load %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %439 = addf %438, %437 {RelaxedPrecision} : f32 + store %439, %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %440 = load %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %440, %arg2[%431, %433] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %441 = addi %arg4, %c2 : index + %442 = addi %arg3, %arg5 : index + %443 = addi %442, %c14 : index + %444 = addi %arg6, %arg7 : index + %445 = load %arg0[%441, %444] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %446 = load %arg1[%444, %443] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %447 = mulf %445, %446 {RelaxedPrecision} : f32 + %448 = load %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %449 = addf %448, %447 {RelaxedPrecision} : f32 + store %449, %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %450 = load %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %450, %arg2[%441, %443] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = addi %arg4, %c2 : index + %452 = addi %arg3, %arg5 : index + %453 = addi %452, %c15 : index + %454 = addi %arg6, %arg7 : index + %455 = load %arg0[%451, %454] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %456 = load %arg1[%454, %453] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %457 = mulf %455, %456 {RelaxedPrecision} : f32 + %458 = load %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %459 = addf %458, %457 {RelaxedPrecision} : f32 + store %459, %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = load %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store 
%460, %arg2[%451, %453] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %461 = addi %arg4, %c3 : index + %462 = addi %arg3, %arg5 : index + %463 = addi %arg6, %arg7 : index + %464 = load %arg0[%461, %463] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %465 = load %arg1[%463, %462] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %466 = mulf %464, %465 {RelaxedPrecision} : f32 + %467 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %468 = addf %467, %466 {RelaxedPrecision} : f32 + store %468, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = load %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %469, %arg2[%461, %462] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %470 = addi %arg4, %c3 : index + %471 = addi %arg3, %arg5 : index + %472 = addi %471, %c1 : index + %473 = addi %arg6, %arg7 : index + %474 = load %arg0[%470, %473] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %475 = load %arg1[%473, %472] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %476 = mulf %474, %475 {RelaxedPrecision} : f32 + %477 = load %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = addf %477, %476 {RelaxedPrecision} : f32 + store %478, %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %479 = load %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %479, %arg2[%470, %472] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = addi %arg4, %c3 : index + %481 = addi %arg3, %arg5 : index + %482 = addi %481, %c2 : index + %483 = addi %arg6, %arg7 : index + %484 = load %arg0[%480, %483] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %485 = load %arg1[%483, %482] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %486 = mulf %484, %485 {RelaxedPrecision} : f32 + %487 = load %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %488 = addf %487, %486 {RelaxedPrecision} : f32 + store %488, %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = load %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %489, %arg2[%480, %482] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %490 = addi %arg4, %c3 : index + %491 = addi %arg3, %arg5 : index + %492 = addi %491, %c3 : index + %493 = addi %arg6, %arg7 : index + %494 = load %arg0[%490, %493] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %495 = load %arg1[%493, %492] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = mulf %494, %495 {RelaxedPrecision} : f32 + %497 = load %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %498 = addf %497, %496 {RelaxedPrecision} : f32 + store %498, %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = load %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %499, %arg2[%490, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = addi %arg4, %c3 : index + %501 = addi %arg3, %arg5 : index + %502 = addi %501, %c4 : index + %503 = addi %arg6, %arg7 : index + %504 = load %arg0[%500, %503] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %505 = load %arg1[%503, %502] : 
memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %506 = mulf %504, %505 {RelaxedPrecision} : f32 + %507 = load %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = addf %507, %506 {RelaxedPrecision} : f32 + store %508, %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %509 = load %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %509, %arg2[%500, %502] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %510 = addi %arg4, %c3 : index + %511 = addi %arg3, %arg5 : index + %512 = addi %511, %c5 : index + %513 = addi %arg6, %arg7 : index + %514 = load %arg0[%510, %513] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %515 = load %arg1[%513, %512] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = mulf %514, %515 {RelaxedPrecision} : f32 + %517 = load %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addf %517, %516 {RelaxedPrecision} : f32 + store %518, %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %519 = load %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %519, %arg2[%510, %512] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %520 = addi %arg4, %c3 : index + %521 = addi %arg3, %arg5 : index + %522 = addi %521, %c6 : index + %523 = addi %arg6, %arg7 : index + %524 = load %arg0[%520, %523] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %525 = load %arg1[%523, %522] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = mulf %524, %525 {RelaxedPrecision} : f32 + %527 = load %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %528 = addf %527, %526 {RelaxedPrecision} : f32 + store %528, %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %529 = load %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %529, %arg2[%520, %522] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %530 = addi %arg4, %c3 : index + %531 = addi %arg3, %arg5 : index + %532 = addi %531, %c7 : index + %533 = addi %arg6, %arg7 : index + %534 = load %arg0[%530, %533] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %535 = load %arg1[%533, %532] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = mulf %534, %535 {RelaxedPrecision} : f32 + %537 = load %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %538 = addf %537, %536 {RelaxedPrecision} : f32 + store %538, %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %539 = load %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %539, %arg2[%530, %532] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %540 = addi %arg4, %c3 : index + %541 = addi %arg3, %arg5 : index + %542 = addi %541, %c8 : index + %543 = addi %arg6, %arg7 : index + %544 = load %arg0[%540, %543] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %545 = load %arg1[%543, %542] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %546 = mulf %544, %545 {RelaxedPrecision} : f32 + %547 = load %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %548 = addf %547, %546 {RelaxedPrecision} : f32 + store %548, %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + %549 = load %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %549, %arg2[%540, %542] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = addi %arg4, %c3 : index + %551 = addi %arg3, %arg5 : index + %552 = addi %551, %c9 : index + %553 = addi %arg6, %arg7 : index + %554 = load %arg0[%550, %553] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %555 = load %arg1[%553, %552] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %556 = mulf %554, %555 {RelaxedPrecision} : f32 + %557 = load %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %558 = addf %557, %556 {RelaxedPrecision} : f32 + store %558, %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = load %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %559, %arg2[%550, %552] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %560 = addi %arg4, %c3 : index + %561 = addi %arg3, %arg5 : index + %562 = addi %561, %c10 : index + %563 = addi %arg6, %arg7 : index + %564 = load %arg0[%560, %563] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %565 = load %arg1[%563, %562] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %566 = mulf %564, %565 {RelaxedPrecision} : f32 + %567 = load %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = addf %567, %566 {RelaxedPrecision} : f32 + store %568, %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %569 = load %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %569, %arg2[%560, %562] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = addi %arg4, %c3 : index + %571 = addi %arg3, %arg5 : index + %572 = addi %571, %c11 : index + %573 = addi %arg6, %arg7 : index + %574 = load %arg0[%570, %573] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %575 = load %arg1[%573, %572] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %576 = mulf %574, %575 {RelaxedPrecision} : f32 + %577 = load %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %578 = addf %577, %576 {RelaxedPrecision} : f32 + store %578, %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %579 = load %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %579, %arg2[%570, %572] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %580 = addi %arg4, %c3 : index + %581 = addi %arg3, %arg5 : index + %582 = addi %581, %c12 : index + %583 = addi %arg6, %arg7 : index + %584 = load %arg0[%580, %583] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %585 = load %arg1[%583, %582] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %586 = mulf %584, %585 {RelaxedPrecision} : f32 + %587 = load %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %588 = addf %587, %586 {RelaxedPrecision} : f32 + store %588, %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %589 = load %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %589, %arg2[%580, %582] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %590 = addi %arg4, %c3 : index + %591 = addi %arg3, %arg5 : index + %592 = addi %591, %c13 : index + %593 = addi %arg6, 
%arg7 : index + %594 = load %arg0[%590, %593] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %595 = load %arg1[%593, %592] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %596 = mulf %594, %595 {RelaxedPrecision} : f32 + %597 = load %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %598 = addf %597, %596 {RelaxedPrecision} : f32 + store %598, %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %599 = load %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %599, %arg2[%590, %592] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %600 = addi %arg4, %c3 : index + %601 = addi %arg3, %arg5 : index + %602 = addi %601, %c14 : index + %603 = addi %arg6, %arg7 : index + %604 = load %arg0[%600, %603] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %605 = load %arg1[%603, %602] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %606 = mulf %604, %605 {RelaxedPrecision} : f32 + %607 = load %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %608 = addf %607, %606 {RelaxedPrecision} : f32 + store %608, %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %609 = load %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %609, %arg2[%600, %602] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %610 = addi %arg4, %c3 : index + %611 = addi %arg3, %arg5 : index + %612 = addi %611, %c15 : index + %613 = addi %arg6, %arg7 : index + %614 = load %arg0[%610, %613] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %615 = load %arg1[%613, %612] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %616 = mulf %614, %615 {RelaxedPrecision} : f32 + %617 = load %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %618 = addf %617, %616 {RelaxedPrecision} : f32 + store %618, %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %619 = load %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %619, %arg2[%610, %612] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %620 = addi %arg4, %c4 : index + %621 = addi %arg3, %arg5 : index + %622 = addi %arg6, %arg7 : index + %623 = load %arg0[%620, %622] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %624 = load %arg1[%622, %621] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %625 = mulf %623, %624 {RelaxedPrecision} : f32 + %626 = load %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %627 = addf %626, %625 {RelaxedPrecision} : f32 + store %627, %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %628 = load %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %628, %arg2[%620, %621] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %629 = addi %arg4, %c4 : index + %630 = addi %arg3, %arg5 : index + %631 = addi %630, %c1 : index + %632 = addi %arg6, %arg7 : index + %633 = load %arg0[%629, %632] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %634 = load %arg1[%632, %631] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %635 = mulf %633, %634 {RelaxedPrecision} : f32 + %636 = load %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %637 = addf 
%636, %635 {RelaxedPrecision} : f32 + store %637, %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %638 = load %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %638, %arg2[%629, %631] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %639 = addi %arg4, %c4 : index + %640 = addi %arg3, %arg5 : index + %641 = addi %640, %c2 : index + %642 = addi %arg6, %arg7 : index + %643 = load %arg0[%639, %642] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %644 = load %arg1[%642, %641] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %645 = mulf %643, %644 {RelaxedPrecision} : f32 + %646 = load %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %647 = addf %646, %645 {RelaxedPrecision} : f32 + store %647, %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %648 = load %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %648, %arg2[%639, %641] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %649 = addi %arg4, %c4 : index + %650 = addi %arg3, %arg5 : index + %651 = addi %650, %c3 : index + %652 = addi %arg6, %arg7 : index + %653 = load %arg0[%649, %652] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %654 = load %arg1[%652, %651] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %655 = mulf %653, %654 {RelaxedPrecision} : f32 + %656 = load %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %657 = addf %656, %655 {RelaxedPrecision} : f32 + store %657, %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %658 = load %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %658, %arg2[%649, %651] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %659 = addi %arg4, %c4 : index + %660 = addi %arg3, %arg5 : index + %661 = addi %660, %c4 : index + %662 = addi %arg6, %arg7 : index + %663 = load %arg0[%659, %662] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %664 = load %arg1[%662, %661] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %665 = mulf %663, %664 {RelaxedPrecision} : f32 + %666 = load %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %667 = addf %666, %665 {RelaxedPrecision} : f32 + store %667, %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %668 = load %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %668, %arg2[%659, %661] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %669 = addi %arg4, %c4 : index + %670 = addi %arg3, %arg5 : index + %671 = addi %670, %c5 : index + %672 = addi %arg6, %arg7 : index + %673 = load %arg0[%669, %672] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %674 = load %arg1[%672, %671] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %675 = mulf %673, %674 {RelaxedPrecision} : f32 + %676 = load %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %677 = addf %676, %675 {RelaxedPrecision} : f32 + store %677, %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %678 = load %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %678, %arg2[%669, %671] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %679 = addi 
%arg4, %c4 : index + %680 = addi %arg3, %arg5 : index + %681 = addi %680, %c6 : index + %682 = addi %arg6, %arg7 : index + %683 = load %arg0[%679, %682] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %684 = load %arg1[%682, %681] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %685 = mulf %683, %684 {RelaxedPrecision} : f32 + %686 = load %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %687 = addf %686, %685 {RelaxedPrecision} : f32 + store %687, %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %688 = load %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %688, %arg2[%679, %681] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %689 = addi %arg4, %c4 : index + %690 = addi %arg3, %arg5 : index + %691 = addi %690, %c7 : index + %692 = addi %arg6, %arg7 : index + %693 = load %arg0[%689, %692] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %694 = load %arg1[%692, %691] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %695 = mulf %693, %694 {RelaxedPrecision} : f32 + %696 = load %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %697 = addf %696, %695 {RelaxedPrecision} : f32 + store %697, %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %698 = load %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %698, %arg2[%689, %691] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %699 = addi %arg4, %c4 : index + %700 = addi %arg3, %arg5 : index + %701 = addi %700, %c8 : index + %702 = addi %arg6, %arg7 : index + %703 = load %arg0[%699, %702] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %704 = load %arg1[%702, %701] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %705 = mulf %703, %704 {RelaxedPrecision} : f32 + %706 = load %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %707 = addf %706, %705 {RelaxedPrecision} : f32 + store %707, %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %708 = load %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %708, %arg2[%699, %701] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %709 = addi %arg4, %c4 : index + %710 = addi %arg3, %arg5 : index + %711 = addi %710, %c9 : index + %712 = addi %arg6, %arg7 : index + %713 = load %arg0[%709, %712] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %714 = load %arg1[%712, %711] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %715 = mulf %713, %714 {RelaxedPrecision} : f32 + %716 = load %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %717 = addf %716, %715 {RelaxedPrecision} : f32 + store %717, %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %718 = load %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %718, %arg2[%709, %711] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %719 = addi %arg4, %c4 : index + %720 = addi %arg3, %arg5 : index + %721 = addi %720, %c10 : index + %722 = addi %arg6, %arg7 : index + %723 = load %arg0[%719, %722] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %724 = load %arg1[%722, %721] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %725 = mulf %723, 
%724 {RelaxedPrecision} : f32 + %726 = load %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %727 = addf %726, %725 {RelaxedPrecision} : f32 + store %727, %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %728 = load %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %728, %arg2[%719, %721] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %729 = addi %arg4, %c4 : index + %730 = addi %arg3, %arg5 : index + %731 = addi %730, %c11 : index + %732 = addi %arg6, %arg7 : index + %733 = load %arg0[%729, %732] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %734 = load %arg1[%732, %731] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %735 = mulf %733, %734 {RelaxedPrecision} : f32 + %736 = load %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %737 = addf %736, %735 {RelaxedPrecision} : f32 + store %737, %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %738 = load %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %738, %arg2[%729, %731] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %739 = addi %arg4, %c4 : index + %740 = addi %arg3, %arg5 : index + %741 = addi %740, %c12 : index + %742 = addi %arg6, %arg7 : index + %743 = load %arg0[%739, %742] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %744 = load %arg1[%742, %741] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %745 = mulf %743, %744 {RelaxedPrecision} : f32 + %746 = load %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %747 = addf %746, %745 {RelaxedPrecision} : f32 + store %747, %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %748 = load %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %748, %arg2[%739, %741] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %749 = addi %arg4, %c4 : index + %750 = addi %arg3, %arg5 : index + %751 = addi %750, %c13 : index + %752 = addi %arg6, %arg7 : index + %753 = load %arg0[%749, %752] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %754 = load %arg1[%752, %751] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %755 = mulf %753, %754 {RelaxedPrecision} : f32 + %756 = load %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %757 = addf %756, %755 {RelaxedPrecision} : f32 + store %757, %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %758 = load %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %758, %arg2[%749, %751] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %759 = addi %arg4, %c4 : index + %760 = addi %arg3, %arg5 : index + %761 = addi %760, %c14 : index + %762 = addi %arg6, %arg7 : index + %763 = load %arg0[%759, %762] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %764 = load %arg1[%762, %761] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %765 = mulf %763, %764 {RelaxedPrecision} : f32 + %766 = load %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %767 = addf %766, %765 {RelaxedPrecision} : f32 + store %767, %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %768 = load %arg2[%759, %761] : memref<784x512xf32, 
affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %768, %arg2[%759, %761] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %769 = addi %arg4, %c4 : index + %770 = addi %arg3, %arg5 : index + %771 = addi %770, %c15 : index + %772 = addi %arg6, %arg7 : index + %773 = load %arg0[%769, %772] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %774 = load %arg1[%772, %771] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %775 = mulf %773, %774 {RelaxedPrecision} : f32 + %776 = load %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %777 = addf %776, %775 {RelaxedPrecision} : f32 + store %777, %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %778 = load %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %778, %arg2[%769, %771] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %779 = addi %arg4, %c5 : index + %780 = addi %arg3, %arg5 : index + %781 = addi %arg6, %arg7 : index + %782 = load %arg0[%779, %781] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %783 = load %arg1[%781, %780] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %784 = mulf %782, %783 {RelaxedPrecision} : f32 + %785 = load %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %786 = addf %785, %784 {RelaxedPrecision} : f32 + store %786, %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %787 = load %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %787, %arg2[%779, %780] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %788 = addi %arg4, %c5 : index + %789 = addi %arg3, %arg5 : index + %790 = addi %789, %c1 : index + %791 = addi %arg6, %arg7 : index + %792 = load %arg0[%788, %791] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %793 = load %arg1[%791, %790] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %794 = mulf %792, %793 {RelaxedPrecision} : f32 + %795 = load %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %796 = addf %795, %794 {RelaxedPrecision} : f32 + store %796, %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %797 = load %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %797, %arg2[%788, %790] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %798 = addi %arg4, %c5 : index + %799 = addi %arg3, %arg5 : index + %800 = addi %799, %c2 : index + %801 = addi %arg6, %arg7 : index + %802 = load %arg0[%798, %801] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %803 = load %arg1[%801, %800] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %804 = mulf %802, %803 {RelaxedPrecision} : f32 + %805 = load %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %806 = addf %805, %804 {RelaxedPrecision} : f32 + store %806, %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %807 = load %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %807, %arg2[%798, %800] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %808 = addi %arg4, %c5 : index + %809 = addi %arg3, %arg5 : index + %810 = addi %809, %c3 : index + %811 = addi %arg6, %arg7 : index + %812 = load %arg0[%808, %811] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> 
+ %813 = load %arg1[%811, %810] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %814 = mulf %812, %813 {RelaxedPrecision} : f32 + %815 = load %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %816 = addf %815, %814 {RelaxedPrecision} : f32 + store %816, %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %817 = load %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %817, %arg2[%808, %810] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %818 = addi %arg4, %c5 : index + %819 = addi %arg3, %arg5 : index + %820 = addi %819, %c4 : index + %821 = addi %arg6, %arg7 : index + %822 = load %arg0[%818, %821] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %823 = load %arg1[%821, %820] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %824 = mulf %822, %823 {RelaxedPrecision} : f32 + %825 = load %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %826 = addf %825, %824 {RelaxedPrecision} : f32 + store %826, %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %827 = load %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %827, %arg2[%818, %820] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %828 = addi %arg4, %c5 : index + %829 = addi %arg3, %arg5 : index + %830 = addi %829, %c5 : index + %831 = addi %arg6, %arg7 : index + %832 = load %arg0[%828, %831] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %833 = load %arg1[%831, %830] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %834 = mulf %832, %833 {RelaxedPrecision} : f32 + %835 = load %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %836 = addf %835, %834 {RelaxedPrecision} : f32 + store %836, %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %837 = load %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %837, %arg2[%828, %830] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %838 = addi %arg4, %c5 : index + %839 = addi %arg3, %arg5 : index + %840 = addi %839, %c6 : index + %841 = addi %arg6, %arg7 : index + %842 = load %arg0[%838, %841] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %843 = load %arg1[%841, %840] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %844 = mulf %842, %843 {RelaxedPrecision} : f32 + %845 = load %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %846 = addf %845, %844 {RelaxedPrecision} : f32 + store %846, %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %847 = load %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %847, %arg2[%838, %840] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %848 = addi %arg4, %c5 : index + %849 = addi %arg3, %arg5 : index + %850 = addi %849, %c7 : index + %851 = addi %arg6, %arg7 : index + %852 = load %arg0[%848, %851] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %853 = load %arg1[%851, %850] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %854 = mulf %852, %853 {RelaxedPrecision} : f32 + %855 = load %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %856 = addf %855, %854 {RelaxedPrecision} : f32 + store %856, %arg2[%848, %850] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %857 = load %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %857, %arg2[%848, %850] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %858 = addi %arg4, %c5 : index + %859 = addi %arg3, %arg5 : index + %860 = addi %859, %c8 : index + %861 = addi %arg6, %arg7 : index + %862 = load %arg0[%858, %861] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %863 = load %arg1[%861, %860] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %864 = mulf %862, %863 {RelaxedPrecision} : f32 + %865 = load %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %866 = addf %865, %864 {RelaxedPrecision} : f32 + store %866, %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %867 = load %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %867, %arg2[%858, %860] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %868 = addi %arg4, %c5 : index + %869 = addi %arg3, %arg5 : index + %870 = addi %869, %c9 : index + %871 = addi %arg6, %arg7 : index + %872 = load %arg0[%868, %871] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %873 = load %arg1[%871, %870] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %874 = mulf %872, %873 {RelaxedPrecision} : f32 + %875 = load %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %876 = addf %875, %874 {RelaxedPrecision} : f32 + store %876, %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %877 = load %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %877, %arg2[%868, %870] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %878 = addi %arg4, %c5 : index + %879 = addi %arg3, %arg5 : index + %880 = addi %879, %c10 : index + %881 = addi %arg6, %arg7 : index + %882 = load %arg0[%878, %881] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %883 = load %arg1[%881, %880] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %884 = mulf %882, %883 {RelaxedPrecision} : f32 + %885 = load %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %886 = addf %885, %884 {RelaxedPrecision} : f32 + store %886, %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %887 = load %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %887, %arg2[%878, %880] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %888 = addi %arg4, %c5 : index + %889 = addi %arg3, %arg5 : index + %890 = addi %889, %c11 : index + %891 = addi %arg6, %arg7 : index + %892 = load %arg0[%888, %891] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %893 = load %arg1[%891, %890] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %894 = mulf %892, %893 {RelaxedPrecision} : f32 + %895 = load %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %896 = addf %895, %894 {RelaxedPrecision} : f32 + store %896, %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %897 = load %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %897, %arg2[%888, %890] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %898 = addi %arg4, %c5 : index + %899 = addi %arg3, %arg5 : index + %900 = addi 
%899, %c12 : index + %901 = addi %arg6, %arg7 : index + %902 = load %arg0[%898, %901] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %903 = load %arg1[%901, %900] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %904 = mulf %902, %903 {RelaxedPrecision} : f32 + %905 = load %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %906 = addf %905, %904 {RelaxedPrecision} : f32 + store %906, %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %907 = load %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %907, %arg2[%898, %900] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %908 = addi %arg4, %c5 : index + %909 = addi %arg3, %arg5 : index + %910 = addi %909, %c13 : index + %911 = addi %arg6, %arg7 : index + %912 = load %arg0[%908, %911] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %913 = load %arg1[%911, %910] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %914 = mulf %912, %913 {RelaxedPrecision} : f32 + %915 = load %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %916 = addf %915, %914 {RelaxedPrecision} : f32 + store %916, %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %917 = load %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %917, %arg2[%908, %910] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %918 = addi %arg4, %c5 : index + %919 = addi %arg3, %arg5 : index + %920 = addi %919, %c14 : index + %921 = addi %arg6, %arg7 : index + %922 = load %arg0[%918, %921] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %923 = load %arg1[%921, %920] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %924 = mulf %922, %923 {RelaxedPrecision} : f32 + %925 = load %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %926 = addf %925, %924 {RelaxedPrecision} : f32 + store %926, %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %927 = load %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %927, %arg2[%918, %920] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %928 = addi %arg4, %c5 : index + %929 = addi %arg3, %arg5 : index + %930 = addi %929, %c15 : index + %931 = addi %arg6, %arg7 : index + %932 = load %arg0[%928, %931] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %933 = load %arg1[%931, %930] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %934 = mulf %932, %933 {RelaxedPrecision} : f32 + %935 = load %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %936 = addf %935, %934 {RelaxedPrecision} : f32 + store %936, %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %937 = load %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %937, %arg2[%928, %930] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + scf.for %arg4 = %c0 to %c256 step %c16 { + scf.for %arg5 = %c0 to %c128 step %c4 { + scf.for %arg6 = %c0 to %c4 step %c1 { + %0 = addi %arg3, %arg4 : index + %1 = addi %arg5, %arg6 : index + %2 = load %arg0[%c780, %1] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %3 = load %arg1[%1, %0] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %4 = mulf %2, %3 
{RelaxedPrecision} : f32 + %5 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %6 = addf %5, %4 {RelaxedPrecision} : f32 + store %6, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %7 = load %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %7, %arg2[%c780, %0] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %8 = addi %arg3, %arg4 : index + %9 = addi %8, %c1 : index + %10 = addi %arg5, %arg6 : index + %11 = load %arg0[%c780, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %12 = load %arg1[%10, %9] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %13 = mulf %11, %12 {RelaxedPrecision} : f32 + %14 = load %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %15 = addf %14, %13 {RelaxedPrecision} : f32 + store %15, %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %16 = load %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %16, %arg2[%c780, %9] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %17 = addi %arg3, %arg4 : index + %18 = addi %17, %c2 : index + %19 = addi %arg5, %arg6 : index + %20 = load %arg0[%c780, %19] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %21 = load %arg1[%19, %18] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %22 = mulf %20, %21 {RelaxedPrecision} : f32 + %23 = load %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %24 = addf %23, %22 {RelaxedPrecision} : f32 + store %24, %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %25 = load %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %25, %arg2[%c780, %18] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %26 = addi %arg3, %arg4 : index + %27 = addi %26, %c3 : index + %28 = addi %arg5, %arg6 : index + %29 = load %arg0[%c780, %28] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg1[%28, %27] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %31 = mulf %29, %30 {RelaxedPrecision} : f32 + %32 = load %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %33 = addf %32, %31 {RelaxedPrecision} : f32 + store %33, %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %34 = load %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %34, %arg2[%c780, %27] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %35 = addi %arg3, %arg4 : index + %36 = addi %35, %c4 : index + %37 = addi %arg5, %arg6 : index + %38 = load %arg0[%c780, %37] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %39 = load %arg1[%37, %36] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %40 = mulf %38, %39 {RelaxedPrecision} : f32 + %41 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %42 = addf %41, %40 {RelaxedPrecision} : f32 + store %42, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %43 = load %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %43, %arg2[%c780, %36] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %44 = addi %arg3, %arg4 : index + %45 = addi %44, %c5 : index + %46 = addi %arg5, %arg6 : index + %47 = 
load %arg0[%c780, %46] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %48 = load %arg1[%46, %45] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %49 = mulf %47, %48 {RelaxedPrecision} : f32 + %50 = load %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %51 = addf %50, %49 {RelaxedPrecision} : f32 + store %51, %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %52 = load %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %52, %arg2[%c780, %45] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %53 = addi %arg3, %arg4 : index + %54 = addi %53, %c6 : index + %55 = addi %arg5, %arg6 : index + %56 = load %arg0[%c780, %55] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %57 = load %arg1[%55, %54] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %58 = mulf %56, %57 {RelaxedPrecision} : f32 + %59 = load %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %60 = addf %59, %58 {RelaxedPrecision} : f32 + store %60, %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %61 = load %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %61, %arg2[%c780, %54] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %62 = addi %arg3, %arg4 : index + %63 = addi %62, %c7 : index + %64 = addi %arg5, %arg6 : index + %65 = load %arg0[%c780, %64] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %66 = load %arg1[%64, %63] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %67 = mulf %65, %66 {RelaxedPrecision} : f32 + %68 = load %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %69 = addf %68, %67 {RelaxedPrecision} : f32 + store %69, %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %70 = load %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %70, %arg2[%c780, %63] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %71 = addi %arg3, %arg4 : index + %72 = addi %71, %c8 : index + %73 = addi %arg5, %arg6 : index + %74 = load %arg0[%c780, %73] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %75 = load %arg1[%73, %72] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %76 = mulf %74, %75 {RelaxedPrecision} : f32 + %77 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %78 = addf %77, %76 {RelaxedPrecision} : f32 + store %78, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %79 = load %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %79, %arg2[%c780, %72] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %80 = addi %arg3, %arg4 : index + %81 = addi %80, %c9 : index + %82 = addi %arg5, %arg6 : index + %83 = load %arg0[%c780, %82] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %84 = load %arg1[%82, %81] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %85 = mulf %83, %84 {RelaxedPrecision} : f32 + %86 = load %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %87 = addf %86, %85 {RelaxedPrecision} : f32 + store %87, %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %88 = load %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> 
(d0 * 512 + d1)>> + store %88, %arg2[%c780, %81] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %89 = addi %arg3, %arg4 : index + %90 = addi %89, %c10 : index + %91 = addi %arg5, %arg6 : index + %92 = load %arg0[%c780, %91] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %93 = load %arg1[%91, %90] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %94 = mulf %92, %93 {RelaxedPrecision} : f32 + %95 = load %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %96 = addf %95, %94 {RelaxedPrecision} : f32 + store %96, %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %97 = load %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %97, %arg2[%c780, %90] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %98 = addi %arg3, %arg4 : index + %99 = addi %98, %c11 : index + %100 = addi %arg5, %arg6 : index + %101 = load %arg0[%c780, %100] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %102 = load %arg1[%100, %99] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %103 = mulf %101, %102 {RelaxedPrecision} : f32 + %104 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %105 = addf %104, %103 {RelaxedPrecision} : f32 + store %105, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %106 = load %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %106, %arg2[%c780, %99] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %107 = addi %arg3, %arg4 : index + %108 = addi %107, %c12 : index + %109 = addi %arg5, %arg6 : index + %110 = load %arg0[%c780, %109] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %111 = load %arg1[%109, %108] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %112 = mulf %110, %111 {RelaxedPrecision} : f32 + %113 = load %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %114 = addf %113, %112 {RelaxedPrecision} : f32 + store %114, %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %115 = load %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %115, %arg2[%c780, %108] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %116 = addi %arg3, %arg4 : index + %117 = addi %116, %c13 : index + %118 = addi %arg5, %arg6 : index + %119 = load %arg0[%c780, %118] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %120 = load %arg1[%118, %117] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %121 = mulf %119, %120 {RelaxedPrecision} : f32 + %122 = load %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %123 = addf %122, %121 {RelaxedPrecision} : f32 + store %123, %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %124 = load %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %124, %arg2[%c780, %117] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %125 = addi %arg3, %arg4 : index + %126 = addi %125, %c14 : index + %127 = addi %arg5, %arg6 : index + %128 = load %arg0[%c780, %127] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %129 = load %arg1[%127, %126] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %130 = mulf %128, %129 {RelaxedPrecision} : f32 + %131 = load 
%arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %132 = addf %131, %130 {RelaxedPrecision} : f32 + store %132, %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %133 = load %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %133, %arg2[%c780, %126] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %134 = addi %arg3, %arg4 : index + %135 = addi %134, %c15 : index + %136 = addi %arg5, %arg6 : index + %137 = load %arg0[%c780, %136] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %138 = load %arg1[%136, %135] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %139 = mulf %137, %138 {RelaxedPrecision} : f32 + %140 = load %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %141 = addf %140, %139 {RelaxedPrecision} : f32 + store %141, %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %142 = load %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %142, %arg2[%c780, %135] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %143 = addi %arg3, %arg4 : index + %144 = addi %arg5, %arg6 : index + %145 = load %arg0[%c781, %144] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %146 = load %arg1[%144, %143] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %147 = mulf %145, %146 {RelaxedPrecision} : f32 + %148 = load %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %149 = addf %148, %147 {RelaxedPrecision} : f32 + store %149, %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %150 = load %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %150, %arg2[%c781, %143] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %151 = addi %arg3, %arg4 : index + %152 = addi %151, %c1 : index + %153 = addi %arg5, %arg6 : index + %154 = load %arg0[%c781, %153] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %155 = load %arg1[%153, %152] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %156 = mulf %154, %155 {RelaxedPrecision} : f32 + %157 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %158 = addf %157, %156 {RelaxedPrecision} : f32 + store %158, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %159 = load %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %159, %arg2[%c781, %152] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %160 = addi %arg3, %arg4 : index + %161 = addi %160, %c2 : index + %162 = addi %arg5, %arg6 : index + %163 = load %arg0[%c781, %162] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %164 = load %arg1[%162, %161] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %165 = mulf %163, %164 {RelaxedPrecision} : f32 + %166 = load %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %167 = addf %166, %165 {RelaxedPrecision} : f32 + store %167, %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %168 = load %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %168, %arg2[%c781, %161] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %169 = addi %arg3, %arg4 : index + %170 = addi %169, 
%c3 : index + %171 = addi %arg5, %arg6 : index + %172 = load %arg0[%c781, %171] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %173 = load %arg1[%171, %170] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %174 = mulf %172, %173 {RelaxedPrecision} : f32 + %175 = load %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %176 = addf %175, %174 {RelaxedPrecision} : f32 + store %176, %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %177 = load %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %177, %arg2[%c781, %170] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %178 = addi %arg3, %arg4 : index + %179 = addi %178, %c4 : index + %180 = addi %arg5, %arg6 : index + %181 = load %arg0[%c781, %180] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %182 = load %arg1[%180, %179] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %183 = mulf %181, %182 {RelaxedPrecision} : f32 + %184 = load %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %185 = addf %184, %183 {RelaxedPrecision} : f32 + store %185, %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %186 = load %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %186, %arg2[%c781, %179] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %187 = addi %arg3, %arg4 : index + %188 = addi %187, %c5 : index + %189 = addi %arg5, %arg6 : index + %190 = load %arg0[%c781, %189] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %191 = load %arg1[%189, %188] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %192 = mulf %190, %191 {RelaxedPrecision} : f32 + %193 = load %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %194 = addf %193, %192 {RelaxedPrecision} : f32 + store %194, %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %195 = load %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %195, %arg2[%c781, %188] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %196 = addi %arg3, %arg4 : index + %197 = addi %196, %c6 : index + %198 = addi %arg5, %arg6 : index + %199 = load %arg0[%c781, %198] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %200 = load %arg1[%198, %197] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %201 = mulf %199, %200 {RelaxedPrecision} : f32 + %202 = load %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %203 = addf %202, %201 {RelaxedPrecision} : f32 + store %203, %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %204 = load %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %204, %arg2[%c781, %197] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %205 = addi %arg3, %arg4 : index + %206 = addi %205, %c7 : index + %207 = addi %arg5, %arg6 : index + %208 = load %arg0[%c781, %207] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %209 = load %arg1[%207, %206] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %210 = mulf %208, %209 {RelaxedPrecision} : f32 + %211 = load %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %212 = addf %211, %210 {RelaxedPrecision} : f32 + store 
%212, %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %213 = load %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %213, %arg2[%c781, %206] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %214 = addi %arg3, %arg4 : index + %215 = addi %214, %c8 : index + %216 = addi %arg5, %arg6 : index + %217 = load %arg0[%c781, %216] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %218 = load %arg1[%216, %215] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %219 = mulf %217, %218 {RelaxedPrecision} : f32 + %220 = load %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %221 = addf %220, %219 {RelaxedPrecision} : f32 + store %221, %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %222 = load %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %222, %arg2[%c781, %215] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %223 = addi %arg3, %arg4 : index + %224 = addi %223, %c9 : index + %225 = addi %arg5, %arg6 : index + %226 = load %arg0[%c781, %225] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %227 = load %arg1[%225, %224] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %228 = mulf %226, %227 {RelaxedPrecision} : f32 + %229 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %230 = addf %229, %228 {RelaxedPrecision} : f32 + store %230, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %231 = load %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %231, %arg2[%c781, %224] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %232 = addi %arg3, %arg4 : index + %233 = addi %232, %c10 : index + %234 = addi %arg5, %arg6 : index + %235 = load %arg0[%c781, %234] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %236 = load %arg1[%234, %233] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %237 = mulf %235, %236 {RelaxedPrecision} : f32 + %238 = load %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %239 = addf %238, %237 {RelaxedPrecision} : f32 + store %239, %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %240 = load %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %240, %arg2[%c781, %233] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %241 = addi %arg3, %arg4 : index + %242 = addi %241, %c11 : index + %243 = addi %arg5, %arg6 : index + %244 = load %arg0[%c781, %243] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %245 = load %arg1[%243, %242] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %246 = mulf %244, %245 {RelaxedPrecision} : f32 + %247 = load %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %248 = addf %247, %246 {RelaxedPrecision} : f32 + store %248, %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %249 = load %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %249, %arg2[%c781, %242] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %250 = addi %arg3, %arg4 : index + %251 = addi %250, %c12 : index + %252 = addi %arg5, %arg6 : index + %253 = load %arg0[%c781, %252] : memref<784x128xf32, 
affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %254 = load %arg1[%252, %251] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %255 = mulf %253, %254 {RelaxedPrecision} : f32 + %256 = load %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %257 = addf %256, %255 {RelaxedPrecision} : f32 + store %257, %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %258 = load %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %258, %arg2[%c781, %251] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %259 = addi %arg3, %arg4 : index + %260 = addi %259, %c13 : index + %261 = addi %arg5, %arg6 : index + %262 = load %arg0[%c781, %261] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %263 = load %arg1[%261, %260] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %264 = mulf %262, %263 {RelaxedPrecision} : f32 + %265 = load %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %266 = addf %265, %264 {RelaxedPrecision} : f32 + store %266, %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %267 = load %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %267, %arg2[%c781, %260] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %268 = addi %arg3, %arg4 : index + %269 = addi %268, %c14 : index + %270 = addi %arg5, %arg6 : index + %271 = load %arg0[%c781, %270] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %272 = load %arg1[%270, %269] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %273 = mulf %271, %272 {RelaxedPrecision} : f32 + %274 = load %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %275 = addf %274, %273 {RelaxedPrecision} : f32 + store %275, %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %276 = load %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %276, %arg2[%c781, %269] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %277 = addi %arg3, %arg4 : index + %278 = addi %277, %c15 : index + %279 = addi %arg5, %arg6 : index + %280 = load %arg0[%c781, %279] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %281 = load %arg1[%279, %278] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %282 = mulf %280, %281 {RelaxedPrecision} : f32 + %283 = load %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %284 = addf %283, %282 {RelaxedPrecision} : f32 + store %284, %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %285 = load %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %285, %arg2[%c781, %278] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %286 = addi %arg3, %arg4 : index + %287 = addi %arg5, %arg6 : index + %288 = load %arg0[%c782, %287] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %289 = load %arg1[%287, %286] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %290 = mulf %288, %289 {RelaxedPrecision} : f32 + %291 = load %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %292 = addf %291, %290 {RelaxedPrecision} : f32 + store %292, %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %293 = load %arg2[%c782, %286] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %293, %arg2[%c782, %286] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %294 = addi %arg3, %arg4 : index + %295 = addi %294, %c1 : index + %296 = addi %arg5, %arg6 : index + %297 = load %arg0[%c782, %296] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %298 = load %arg1[%296, %295] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %299 = mulf %297, %298 {RelaxedPrecision} : f32 + %300 = load %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %301 = addf %300, %299 {RelaxedPrecision} : f32 + store %301, %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %302 = load %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %302, %arg2[%c782, %295] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %303 = addi %arg3, %arg4 : index + %304 = addi %303, %c2 : index + %305 = addi %arg5, %arg6 : index + %306 = load %arg0[%c782, %305] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %307 = load %arg1[%305, %304] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %308 = mulf %306, %307 {RelaxedPrecision} : f32 + %309 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %310 = addf %309, %308 {RelaxedPrecision} : f32 + store %310, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %311 = load %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %311, %arg2[%c782, %304] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %312 = addi %arg3, %arg4 : index + %313 = addi %312, %c3 : index + %314 = addi %arg5, %arg6 : index + %315 = load %arg0[%c782, %314] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %316 = load %arg1[%314, %313] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %317 = mulf %315, %316 {RelaxedPrecision} : f32 + %318 = load %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %319 = addf %318, %317 {RelaxedPrecision} : f32 + store %319, %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %320 = load %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %320, %arg2[%c782, %313] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %321 = addi %arg3, %arg4 : index + %322 = addi %321, %c4 : index + %323 = addi %arg5, %arg6 : index + %324 = load %arg0[%c782, %323] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %325 = load %arg1[%323, %322] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %326 = mulf %324, %325 {RelaxedPrecision} : f32 + %327 = load %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %328 = addf %327, %326 {RelaxedPrecision} : f32 + store %328, %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %329 = load %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %329, %arg2[%c782, %322] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %330 = addi %arg3, %arg4 : index + %331 = addi %330, %c5 : index + %332 = addi %arg5, %arg6 : index + %333 = load %arg0[%c782, %332] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %334 = load %arg1[%332, %331] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + 
d1)>> + %335 = mulf %333, %334 {RelaxedPrecision} : f32 + %336 = load %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %337 = addf %336, %335 {RelaxedPrecision} : f32 + store %337, %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %338 = load %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %338, %arg2[%c782, %331] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %339 = addi %arg3, %arg4 : index + %340 = addi %339, %c6 : index + %341 = addi %arg5, %arg6 : index + %342 = load %arg0[%c782, %341] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %343 = load %arg1[%341, %340] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %344 = mulf %342, %343 {RelaxedPrecision} : f32 + %345 = load %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %346 = addf %345, %344 {RelaxedPrecision} : f32 + store %346, %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %347 = load %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %347, %arg2[%c782, %340] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %348 = addi %arg3, %arg4 : index + %349 = addi %348, %c7 : index + %350 = addi %arg5, %arg6 : index + %351 = load %arg0[%c782, %350] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %352 = load %arg1[%350, %349] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %353 = mulf %351, %352 {RelaxedPrecision} : f32 + %354 = load %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %355 = addf %354, %353 {RelaxedPrecision} : f32 + store %355, %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %356 = load %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %356, %arg2[%c782, %349] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %357 = addi %arg3, %arg4 : index + %358 = addi %357, %c8 : index + %359 = addi %arg5, %arg6 : index + %360 = load %arg0[%c782, %359] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %361 = load %arg1[%359, %358] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %362 = mulf %360, %361 {RelaxedPrecision} : f32 + %363 = load %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %364 = addf %363, %362 {RelaxedPrecision} : f32 + store %364, %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %365 = load %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %365, %arg2[%c782, %358] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %366 = addi %arg3, %arg4 : index + %367 = addi %366, %c9 : index + %368 = addi %arg5, %arg6 : index + %369 = load %arg0[%c782, %368] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %370 = load %arg1[%368, %367] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %371 = mulf %369, %370 {RelaxedPrecision} : f32 + %372 = load %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %373 = addf %372, %371 {RelaxedPrecision} : f32 + store %373, %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %374 = load %arg2[%c782, %367] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %374, %arg2[%c782, %367] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %375 = addi %arg3, %arg4 : index + %376 = addi %375, %c10 : index + %377 = addi %arg5, %arg6 : index + %378 = load %arg0[%c782, %377] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %379 = load %arg1[%377, %376] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %380 = mulf %378, %379 {RelaxedPrecision} : f32 + %381 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %382 = addf %381, %380 {RelaxedPrecision} : f32 + store %382, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %383 = load %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %383, %arg2[%c782, %376] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %384 = addi %arg3, %arg4 : index + %385 = addi %384, %c11 : index + %386 = addi %arg5, %arg6 : index + %387 = load %arg0[%c782, %386] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %388 = load %arg1[%386, %385] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %389 = mulf %387, %388 {RelaxedPrecision} : f32 + %390 = load %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %391 = addf %390, %389 {RelaxedPrecision} : f32 + store %391, %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %392 = load %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %392, %arg2[%c782, %385] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %393 = addi %arg3, %arg4 : index + %394 = addi %393, %c12 : index + %395 = addi %arg5, %arg6 : index + %396 = load %arg0[%c782, %395] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %397 = load %arg1[%395, %394] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %398 = mulf %396, %397 {RelaxedPrecision} : f32 + %399 = load %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %400 = addf %399, %398 {RelaxedPrecision} : f32 + store %400, %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %401 = load %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %401, %arg2[%c782, %394] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %402 = addi %arg3, %arg4 : index + %403 = addi %402, %c13 : index + %404 = addi %arg5, %arg6 : index + %405 = load %arg0[%c782, %404] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %406 = load %arg1[%404, %403] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %407 = mulf %405, %406 {RelaxedPrecision} : f32 + %408 = load %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %409 = addf %408, %407 {RelaxedPrecision} : f32 + store %409, %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %410 = load %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %410, %arg2[%c782, %403] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %411 = addi %arg3, %arg4 : index + %412 = addi %411, %c14 : index + %413 = addi %arg5, %arg6 : index + %414 = load %arg0[%c782, %413] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %415 = load %arg1[%413, %412] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %416 = mulf %414, %415 {RelaxedPrecision} : f32 + %417 = load %arg2[%c782, %412] : 
memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %418 = addf %417, %416 {RelaxedPrecision} : f32 + store %418, %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %419 = load %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %419, %arg2[%c782, %412] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %420 = addi %arg3, %arg4 : index + %421 = addi %420, %c15 : index + %422 = addi %arg5, %arg6 : index + %423 = load %arg0[%c782, %422] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %424 = load %arg1[%422, %421] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %425 = mulf %423, %424 {RelaxedPrecision} : f32 + %426 = load %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %427 = addf %426, %425 {RelaxedPrecision} : f32 + store %427, %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %428 = load %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %428, %arg2[%c782, %421] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %429 = addi %arg3, %arg4 : index + %430 = addi %arg5, %arg6 : index + %431 = load %arg0[%c783, %430] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %432 = load %arg1[%430, %429] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %433 = mulf %431, %432 {RelaxedPrecision} : f32 + %434 = load %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %435 = addf %434, %433 {RelaxedPrecision} : f32 + store %435, %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %436 = load %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %436, %arg2[%c783, %429] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %437 = addi %arg3, %arg4 : index + %438 = addi %437, %c1 : index + %439 = addi %arg5, %arg6 : index + %440 = load %arg0[%c783, %439] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %441 = load %arg1[%439, %438] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %442 = mulf %440, %441 {RelaxedPrecision} : f32 + %443 = load %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %444 = addf %443, %442 {RelaxedPrecision} : f32 + store %444, %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %445 = load %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %445, %arg2[%c783, %438] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %446 = addi %arg3, %arg4 : index + %447 = addi %446, %c2 : index + %448 = addi %arg5, %arg6 : index + %449 = load %arg0[%c783, %448] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %450 = load %arg1[%448, %447] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %451 = mulf %449, %450 {RelaxedPrecision} : f32 + %452 = load %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %453 = addf %452, %451 {RelaxedPrecision} : f32 + store %453, %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %454 = load %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %454, %arg2[%c783, %447] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %455 = addi %arg3, %arg4 : index + %456 = addi %455, %c3 : index + %457 = 
addi %arg5, %arg6 : index + %458 = load %arg0[%c783, %457] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %459 = load %arg1[%457, %456] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %460 = mulf %458, %459 {RelaxedPrecision} : f32 + %461 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %462 = addf %461, %460 {RelaxedPrecision} : f32 + store %462, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %463 = load %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %463, %arg2[%c783, %456] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %464 = addi %arg3, %arg4 : index + %465 = addi %464, %c4 : index + %466 = addi %arg5, %arg6 : index + %467 = load %arg0[%c783, %466] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %468 = load %arg1[%466, %465] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %469 = mulf %467, %468 {RelaxedPrecision} : f32 + %470 = load %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %471 = addf %470, %469 {RelaxedPrecision} : f32 + store %471, %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %472 = load %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %472, %arg2[%c783, %465] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %473 = addi %arg3, %arg4 : index + %474 = addi %473, %c5 : index + %475 = addi %arg5, %arg6 : index + %476 = load %arg0[%c783, %475] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %477 = load %arg1[%475, %474] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %478 = mulf %476, %477 {RelaxedPrecision} : f32 + %479 = load %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %480 = addf %479, %478 {RelaxedPrecision} : f32 + store %480, %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %481 = load %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %481, %arg2[%c783, %474] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %482 = addi %arg3, %arg4 : index + %483 = addi %482, %c6 : index + %484 = addi %arg5, %arg6 : index + %485 = load %arg0[%c783, %484] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %486 = load %arg1[%484, %483] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %487 = mulf %485, %486 {RelaxedPrecision} : f32 + %488 = load %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %489 = addf %488, %487 {RelaxedPrecision} : f32 + store %489, %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %490 = load %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %490, %arg2[%c783, %483] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %491 = addi %arg3, %arg4 : index + %492 = addi %491, %c7 : index + %493 = addi %arg5, %arg6 : index + %494 = load %arg0[%c783, %493] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %495 = load %arg1[%493, %492] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %496 = mulf %494, %495 {RelaxedPrecision} : f32 + %497 = load %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %498 = addf %497, %496 {RelaxedPrecision} : f32 + store %498, %arg2[%c783, %492] 
: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %499 = load %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %499, %arg2[%c783, %492] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %500 = addi %arg3, %arg4 : index + %501 = addi %500, %c8 : index + %502 = addi %arg5, %arg6 : index + %503 = load %arg0[%c783, %502] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %504 = load %arg1[%502, %501] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %505 = mulf %503, %504 {RelaxedPrecision} : f32 + %506 = load %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %507 = addf %506, %505 {RelaxedPrecision} : f32 + store %507, %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %508 = load %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %508, %arg2[%c783, %501] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %509 = addi %arg3, %arg4 : index + %510 = addi %509, %c9 : index + %511 = addi %arg5, %arg6 : index + %512 = load %arg0[%c783, %511] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %513 = load %arg1[%511, %510] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %514 = mulf %512, %513 {RelaxedPrecision} : f32 + %515 = load %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %516 = addf %515, %514 {RelaxedPrecision} : f32 + store %516, %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %517 = load %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %517, %arg2[%c783, %510] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %518 = addi %arg3, %arg4 : index + %519 = addi %518, %c10 : index + %520 = addi %arg5, %arg6 : index + %521 = load %arg0[%c783, %520] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %522 = load %arg1[%520, %519] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %523 = mulf %521, %522 {RelaxedPrecision} : f32 + %524 = load %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %525 = addf %524, %523 {RelaxedPrecision} : f32 + store %525, %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %526 = load %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %526, %arg2[%c783, %519] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %527 = addi %arg3, %arg4 : index + %528 = addi %527, %c11 : index + %529 = addi %arg5, %arg6 : index + %530 = load %arg0[%c783, %529] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %531 = load %arg1[%529, %528] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %532 = mulf %530, %531 {RelaxedPrecision} : f32 + %533 = load %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %534 = addf %533, %532 {RelaxedPrecision} : f32 + store %534, %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %535 = load %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %535, %arg2[%c783, %528] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %536 = addi %arg3, %arg4 : index + %537 = addi %536, %c12 : index + %538 = addi %arg5, %arg6 : index + %539 = load %arg0[%c783, %538] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 
128 + d1)>> + %540 = load %arg1[%538, %537] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %541 = mulf %539, %540 {RelaxedPrecision} : f32 + %542 = load %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %543 = addf %542, %541 {RelaxedPrecision} : f32 + store %543, %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %544 = load %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %544, %arg2[%c783, %537] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %545 = addi %arg3, %arg4 : index + %546 = addi %545, %c13 : index + %547 = addi %arg5, %arg6 : index + %548 = load %arg0[%c783, %547] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %549 = load %arg1[%547, %546] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %550 = mulf %548, %549 {RelaxedPrecision} : f32 + %551 = load %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %552 = addf %551, %550 {RelaxedPrecision} : f32 + store %552, %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %553 = load %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %553, %arg2[%c783, %546] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %554 = addi %arg3, %arg4 : index + %555 = addi %554, %c14 : index + %556 = addi %arg5, %arg6 : index + %557 = load %arg0[%c783, %556] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %558 = load %arg1[%556, %555] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %559 = mulf %557, %558 {RelaxedPrecision} : f32 + %560 = load %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %561 = addf %560, %559 {RelaxedPrecision} : f32 + store %561, %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %562 = load %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %562, %arg2[%c783, %555] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %563 = addi %arg3, %arg4 : index + %564 = addi %563, %c15 : index + %565 = addi %arg5, %arg6 : index + %566 = load %arg0[%c783, %565] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %567 = load %arg1[%565, %564] : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %568 = mulf %566, %567 {RelaxedPrecision} : f32 + %569 = load %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %570 = addf %569, %568 {RelaxedPrecision} : f32 + store %570, %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + %571 = load %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + store %571, %arg2[%c783, %564] : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + return + } + func @optimized_matmul_py(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return 
+ } +} diff --git a/Tutorials/optimized_matmul/mlir/9_ConvertValueToStd.mlir b/Tutorials/optimized_matmul/mlir/9_ConvertValueToStd.mlir new file mode 100644 index 00000000..ff398abb --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/9_ConvertValueToStd.mlir @@ -0,0 +1,11596 @@ +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + "accv.global"() {sym_name = "cache_17", type = memref<16x128x2xvector<8xf32>>} : () -> () + "accv.global"() {sym_name = "cache_16", type = memref<16x6x2xvector<8xf32>>} : () -> () + func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %cst = constant 0.000000e+00 : f32 + %c0_i64 = constant 0 : i64 + %c1_i64 = constant 1 : i64 + %c2_i64 = constant 2 : i64 + %c3_i64 = constant 3 : i64 + %c4_i64 = constant 4 : i64 + %c5_i64 = constant 5 : i64 + %c6_i64 = constant 6 : i64 + %c7_i64 = constant 7 : i64 + %cst_0 = constant dense<0.000000e+00> : vector<8xf32> + %c10 = constant 10 : index + %c12 = constant 12 : index + %c14 = constant 14 : index + %c512 = constant 512 : index + %c784 = constant 784 : index + %c256 = constant 256 : index + %c128 = constant 128 : index + %true = constant true + %c24 = constant 24 : index + %c32 = constant 32 : index + %c40 = constant 40 : index + %c48 = constant 48 : index + %c3 = constant 3 : index + %c56 = constant 56 : index + %c64 = constant 64 : index + %c4 = constant 4 : index + %c72 = constant 72 : index + %c9 = constant 9 : index + %c80 = constant 80 : index + %c5 = constant 5 : index + %c88 = constant 88 : index + %c11 = constant 11 : index + %c96 = constant 96 : index + %c6 = constant 6 : index + %c104 = constant 104 : index + %c13 = constant 13 : index + %c112 = constant 112 : index + %c-16 = constant -16 : index + %c7 = constant 7 : index + %c120 = constant 120 : index + %c2 = constant 2 : index + %c-1 = constant -1 : index + %c-2 = constant -2 : index + %c15 = constant 15 : index + %c0 = constant 0 : index + %c16 = constant 16 : index + %c1 = constant 1 : index + %c8 = constant 8 : index + %0 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %1 = alloca() {alignment = 32 : i64} : memref<1x16xvector<8xf32>> + %2 = "accv.ref_global"() {global_name = @cache_16} : () -> memref<16x6x2xvector<8xf32>> + %3 = "accv.ref_global"() {global_name = @cache_17} : () -> memref<16x128x2xvector<8xf32>> + scf.for %arg3 = %c0 to %c512 step %c256 { + scf.for %arg4 = %c0 to %c128 step %c1 { + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %arg3, %arg5 : index + %7 = addi %6, %c8 : index + %8 = vector.transfer_read %arg1[%arg4, %7], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %8, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %9 = addi %arg3, %arg5 : index + %10 = addi %9, %c16 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c2] : 
memref<1x16xvector<8xf32>> + %12 = addi %arg3, %arg5 : index + %13 = addi %12, %c24 : index + %14 = vector.transfer_read %arg1[%arg4, %13], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %15 = addi %arg3, %arg5 : index + %16 = addi %15, %c32 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %18 = addi %arg3, %arg5 : index + %19 = addi %18, %c40 : index + %20 = vector.transfer_read %arg1[%arg4, %19], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %20, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %21 = addi %arg3, %arg5 : index + %22 = addi %21, %c48 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %24 = addi %arg3, %arg5 : index + %25 = addi %24, %c56 : index + %26 = vector.transfer_read %arg1[%arg4, %25], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %27 = addi %arg3, %arg5 : index + %28 = addi %27, %c64 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %30 = addi %arg3, %arg5 : index + %31 = addi %30, %c72 : index + %32 = vector.transfer_read %arg1[%arg4, %31], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %32, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %33 = addi %arg3, %arg5 : index + %34 = addi %33, %c80 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %36 = addi %arg3, %arg5 : index + %37 = addi %36, %c88 : index + %38 = vector.transfer_read %arg1[%arg4, %37], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %39 = addi %arg3, %arg5 : index + %40 = addi %39, %c96 : index + %41 = vector.transfer_read %arg1[%arg4, %40], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %41, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %42 = addi %arg3, %arg5 : index + %43 = addi %42, %c104 : index + %44 = vector.transfer_read %arg1[%arg4, %43], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %44, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %45 = addi %arg3, %arg5 : index + %46 = addi %45, %c112 : index + %47 = vector.transfer_read %arg1[%arg4, %46], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %47, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %48 = addi %arg3, %arg5 : index + %49 = addi %48, %c120 : index + %50 = vector.transfer_read %arg1[%arg4, %49], %cst {masked = [false]} : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %51 = 
load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %52 = cmpi "slt", %arg5, %c0 : index + %53 = subi %c-1, %arg5 : index + %54 = select %52, %53, %arg5 : index + %55 = divi_signed %54, %c16 : index + %56 = subi %c-1, %55 : index + %57 = select %52, %56, %55 : index + %58 = remi_signed %57, %c16 : index + %59 = cmpi "slt", %58, %c0 : index + %60 = addi %58, %c16 : index + %61 = select %59, %60, %58 : index + %62 = remi_signed %arg4, %c128 : index + %63 = cmpi "slt", %62, %c0 : index + %64 = addi %62, %c128 : index + %65 = select %63, %64, %62 : index + %66 = remi_signed %arg5, %c16 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = addi %66, %c16 : index + %69 = select %67, %68, %66 : index + %70 = cmpi "slt", %69, %c0 : index + %71 = subi %c-1, %69 : index + %72 = select %70, %71, %69 : index + %73 = divi_signed %72, %c8 : index + %74 = subi %c-1, %73 : index + %75 = select %70, %74, %73 : index + %76 = remi_signed %75, %c2 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = addi %76, %c2 : index + %79 = select %77, %78, %76 : index + store %51, %3[%61, %65, %79] : memref<16x128x2xvector<8xf32>> + %80 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %81 = addi %arg5, %c8 : index + %82 = cmpi "slt", %81, %c0 : index + %83 = subi %c-1, %81 : index + %84 = select %82, %83, %81 : index + %85 = divi_signed %84, %c16 : index + %86 = subi %c-1, %85 : index + %87 = select %82, %86, %85 : index + %88 = remi_signed %87, %c16 : index + %89 = cmpi "slt", %88, %c0 : index + %90 = addi %88, %c16 : index + %91 = select %89, %90, %88 : index + %92 = remi_signed %arg4, %c128 : index + %93 = cmpi "slt", %92, %c0 : index + %94 = addi %92, %c128 : index + %95 = select %93, %94, %92 : index + %96 = cmpi "slt", %arg5, %c0 : index + %97 = subi %c-1, %arg5 : index + %98 = select %96, %97, %arg5 : index + %99 = divi_signed %98, %c8 : index + %100 = subi %c-1, %99 : index + %101 = select %96, %100, %99 : index + %102 = addi %arg5, %c8 : index + %103 = cmpi "slt", %102, %c0 : index + %104 = subi %c-1, %102 : index + %105 = select %103, %104, %102 : index + %106 = divi_signed %105, %c16 : index + %107 = subi %c-1, %106 : index + %108 = select %103, %107, %106 : index + %109 = muli %108, %c-2 : index + %110 = addi %101, %109 : index + %111 = cmpi "slt", %arg5, %c0 : index + %112 = subi %c-1, %arg5 : index + %113 = select %111, %112, %arg5 : index + %114 = divi_signed %113, %c8 : index + %115 = subi %c-1, %114 : index + %116 = select %111, %115, %114 : index + %117 = addi %arg5, %c8 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c16 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c1 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = subi %c-1, %126 : index + %129 = select %127, %128, %126 : index + %130 = divi_signed %129, %c2 : index + %131 = subi %c-1, %130 : index + %132 = select %127, %131, %130 : index + %133 = muli %132, %c-2 : index + %134 = addi %110, %133 : index + %135 = addi %134, %c1 : index + store %80, %3[%91, %95, %135] : memref<16x128x2xvector<8xf32>> + %136 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %137 = cmpi "slt", %arg5, %c0 : index + %138 = subi %c-1, %arg5 : index + %139 = select %137, %138, %arg5 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = cmpi "slt", %arg5, %c0 : index + 
%144 = subi %c-1, %arg5 : index + %145 = select %143, %144, %arg5 : index + %146 = divi_signed %145, %c16 : index + %147 = subi %c-1, %146 : index + %148 = select %143, %147, %146 : index + %149 = addi %148, %c1 : index + %150 = cmpi "slt", %149, %c0 : index + %151 = subi %c-1, %149 : index + %152 = select %150, %151, %149 : index + %153 = divi_signed %152, %c16 : index + %154 = subi %c-1, %153 : index + %155 = select %150, %154, %153 : index + %156 = muli %155, %c-16 : index + %157 = addi %142, %156 : index + %158 = addi %157, %c1 : index + %159 = remi_signed %arg4, %c128 : index + %160 = cmpi "slt", %159, %c0 : index + %161 = addi %159, %c128 : index + %162 = select %160, %161, %159 : index + %163 = remi_signed %arg5, %c16 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = addi %163, %c16 : index + %166 = select %164, %165, %163 : index + %167 = cmpi "slt", %166, %c0 : index + %168 = subi %c-1, %166 : index + %169 = select %167, %168, %166 : index + %170 = divi_signed %169, %c8 : index + %171 = subi %c-1, %170 : index + %172 = select %167, %171, %170 : index + %173 = remi_signed %172, %c2 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = addi %173, %c2 : index + %176 = select %174, %175, %173 : index + store %136, %3[%158, %162, %176] : memref<16x128x2xvector<8xf32>> + %177 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %178 = addi %arg5, %c24 : index + %179 = cmpi "slt", %178, %c0 : index + %180 = subi %c-1, %178 : index + %181 = select %179, %180, %178 : index + %182 = divi_signed %181, %c16 : index + %183 = subi %c-1, %182 : index + %184 = select %179, %183, %182 : index + %185 = remi_signed %184, %c16 : index + %186 = cmpi "slt", %185, %c0 : index + %187 = addi %185, %c16 : index + %188 = select %186, %187, %185 : index + %189 = remi_signed %arg4, %c128 : index + %190 = cmpi "slt", %189, %c0 : index + %191 = addi %189, %c128 : index + %192 = select %190, %191, %189 : index + %193 = cmpi "slt", %arg5, %c0 : index + %194 = subi %c-1, %arg5 : index + %195 = select %193, %194, %arg5 : index + %196 = divi_signed %195, %c8 : index + %197 = subi %c-1, %196 : index + %198 = select %193, %197, %196 : index + %199 = addi %arg5, %c24 : index + %200 = cmpi "slt", %199, %c0 : index + %201 = subi %c-1, %199 : index + %202 = select %200, %201, %199 : index + %203 = divi_signed %202, %c16 : index + %204 = subi %c-1, %203 : index + %205 = select %200, %204, %203 : index + %206 = muli %205, %c-2 : index + %207 = addi %198, %206 : index + %208 = cmpi "slt", %arg5, %c0 : index + %209 = subi %c-1, %arg5 : index + %210 = select %208, %209, %arg5 : index + %211 = divi_signed %210, %c8 : index + %212 = subi %c-1, %211 : index + %213 = select %208, %212, %211 : index + %214 = addi %arg5, %c24 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c16 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c3 : index + %224 = cmpi "slt", %223, %c0 : index + %225 = subi %c-1, %223 : index + %226 = select %224, %225, %223 : index + %227 = divi_signed %226, %c2 : index + %228 = subi %c-1, %227 : index + %229 = select %224, %228, %227 : index + %230 = muli %229, %c-2 : index + %231 = addi %207, %230 : index + %232 = addi %231, %c3 : index + store %177, %3[%188, %192, %232] : memref<16x128x2xvector<8xf32>> + %233 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %234 = cmpi "slt", %arg5, %c0 : 
index + %235 = subi %c-1, %arg5 : index + %236 = select %234, %235, %arg5 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = cmpi "slt", %arg5, %c0 : index + %241 = subi %c-1, %arg5 : index + %242 = select %240, %241, %arg5 : index + %243 = divi_signed %242, %c16 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = addi %245, %c2 : index + %247 = cmpi "slt", %246, %c0 : index + %248 = subi %c-1, %246 : index + %249 = select %247, %248, %246 : index + %250 = divi_signed %249, %c16 : index + %251 = subi %c-1, %250 : index + %252 = select %247, %251, %250 : index + %253 = muli %252, %c-16 : index + %254 = addi %239, %253 : index + %255 = addi %254, %c2 : index + %256 = remi_signed %arg4, %c128 : index + %257 = cmpi "slt", %256, %c0 : index + %258 = addi %256, %c128 : index + %259 = select %257, %258, %256 : index + %260 = remi_signed %arg5, %c16 : index + %261 = cmpi "slt", %260, %c0 : index + %262 = addi %260, %c16 : index + %263 = select %261, %262, %260 : index + %264 = cmpi "slt", %263, %c0 : index + %265 = subi %c-1, %263 : index + %266 = select %264, %265, %263 : index + %267 = divi_signed %266, %c8 : index + %268 = subi %c-1, %267 : index + %269 = select %264, %268, %267 : index + %270 = remi_signed %269, %c2 : index + %271 = cmpi "slt", %270, %c0 : index + %272 = addi %270, %c2 : index + %273 = select %271, %272, %270 : index + store %233, %3[%255, %259, %273] : memref<16x128x2xvector<8xf32>> + %274 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %275 = addi %arg5, %c40 : index + %276 = cmpi "slt", %275, %c0 : index + %277 = subi %c-1, %275 : index + %278 = select %276, %277, %275 : index + %279 = divi_signed %278, %c16 : index + %280 = subi %c-1, %279 : index + %281 = select %276, %280, %279 : index + %282 = remi_signed %281, %c16 : index + %283 = cmpi "slt", %282, %c0 : index + %284 = addi %282, %c16 : index + %285 = select %283, %284, %282 : index + %286 = remi_signed %arg4, %c128 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c128 : index + %289 = select %287, %288, %286 : index + %290 = cmpi "slt", %arg5, %c0 : index + %291 = subi %c-1, %arg5 : index + %292 = select %290, %291, %arg5 : index + %293 = divi_signed %292, %c8 : index + %294 = subi %c-1, %293 : index + %295 = select %290, %294, %293 : index + %296 = addi %arg5, %c40 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c16 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = cmpi "slt", %arg5, %c0 : index + %306 = subi %c-1, %arg5 : index + %307 = select %305, %306, %arg5 : index + %308 = divi_signed %307, %c8 : index + %309 = subi %c-1, %308 : index + %310 = select %305, %309, %308 : index + %311 = addi %arg5, %c40 : index + %312 = cmpi "slt", %311, %c0 : index + %313 = subi %c-1, %311 : index + %314 = select %312, %313, %311 : index + %315 = divi_signed %314, %c16 : index + %316 = subi %c-1, %315 : index + %317 = select %312, %316, %315 : index + %318 = muli %317, %c-2 : index + %319 = addi %310, %318 : index + %320 = addi %319, %c5 : index + %321 = cmpi "slt", %320, %c0 : index + %322 = subi %c-1, %320 : index + %323 = select %321, %322, %320 : index + %324 = divi_signed %323, %c2 : index + %325 = subi %c-1, %324 : index + %326 = select %321, %325, %324 : index + %327 = muli %326, 
%c-2 : index + %328 = addi %304, %327 : index + %329 = addi %328, %c5 : index + store %274, %3[%285, %289, %329] : memref<16x128x2xvector<8xf32>> + %330 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %331 = cmpi "slt", %arg5, %c0 : index + %332 = subi %c-1, %arg5 : index + %333 = select %331, %332, %arg5 : index + %334 = divi_signed %333, %c16 : index + %335 = subi %c-1, %334 : index + %336 = select %331, %335, %334 : index + %337 = cmpi "slt", %arg5, %c0 : index + %338 = subi %c-1, %arg5 : index + %339 = select %337, %338, %arg5 : index + %340 = divi_signed %339, %c16 : index + %341 = subi %c-1, %340 : index + %342 = select %337, %341, %340 : index + %343 = addi %342, %c3 : index + %344 = cmpi "slt", %343, %c0 : index + %345 = subi %c-1, %343 : index + %346 = select %344, %345, %343 : index + %347 = divi_signed %346, %c16 : index + %348 = subi %c-1, %347 : index + %349 = select %344, %348, %347 : index + %350 = muli %349, %c-16 : index + %351 = addi %336, %350 : index + %352 = addi %351, %c3 : index + %353 = remi_signed %arg4, %c128 : index + %354 = cmpi "slt", %353, %c0 : index + %355 = addi %353, %c128 : index + %356 = select %354, %355, %353 : index + %357 = remi_signed %arg5, %c16 : index + %358 = cmpi "slt", %357, %c0 : index + %359 = addi %357, %c16 : index + %360 = select %358, %359, %357 : index + %361 = cmpi "slt", %360, %c0 : index + %362 = subi %c-1, %360 : index + %363 = select %361, %362, %360 : index + %364 = divi_signed %363, %c8 : index + %365 = subi %c-1, %364 : index + %366 = select %361, %365, %364 : index + %367 = remi_signed %366, %c2 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = addi %367, %c2 : index + %370 = select %368, %369, %367 : index + store %330, %3[%352, %356, %370] : memref<16x128x2xvector<8xf32>> + %371 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %372 = addi %arg5, %c56 : index + %373 = cmpi "slt", %372, %c0 : index + %374 = subi %c-1, %372 : index + %375 = select %373, %374, %372 : index + %376 = divi_signed %375, %c16 : index + %377 = subi %c-1, %376 : index + %378 = select %373, %377, %376 : index + %379 = remi_signed %378, %c16 : index + %380 = cmpi "slt", %379, %c0 : index + %381 = addi %379, %c16 : index + %382 = select %380, %381, %379 : index + %383 = remi_signed %arg4, %c128 : index + %384 = cmpi "slt", %383, %c0 : index + %385 = addi %383, %c128 : index + %386 = select %384, %385, %383 : index + %387 = cmpi "slt", %arg5, %c0 : index + %388 = subi %c-1, %arg5 : index + %389 = select %387, %388, %arg5 : index + %390 = divi_signed %389, %c8 : index + %391 = subi %c-1, %390 : index + %392 = select %387, %391, %390 : index + %393 = addi %arg5, %c56 : index + %394 = cmpi "slt", %393, %c0 : index + %395 = subi %c-1, %393 : index + %396 = select %394, %395, %393 : index + %397 = divi_signed %396, %c16 : index + %398 = subi %c-1, %397 : index + %399 = select %394, %398, %397 : index + %400 = muli %399, %c-2 : index + %401 = addi %392, %400 : index + %402 = cmpi "slt", %arg5, %c0 : index + %403 = subi %c-1, %arg5 : index + %404 = select %402, %403, %arg5 : index + %405 = divi_signed %404, %c8 : index + %406 = subi %c-1, %405 : index + %407 = select %402, %406, %405 : index + %408 = addi %arg5, %c56 : index + %409 = cmpi "slt", %408, %c0 : index + %410 = subi %c-1, %408 : index + %411 = select %409, %410, %408 : index + %412 = divi_signed %411, %c16 : index + %413 = subi %c-1, %412 : index + %414 = select %409, %413, %412 : index + %415 = muli %414, %c-2 : index + %416 = addi %407, %415 : index + %417 = addi %416, %c7 : index + %418 = 
cmpi "slt", %417, %c0 : index + %419 = subi %c-1, %417 : index + %420 = select %418, %419, %417 : index + %421 = divi_signed %420, %c2 : index + %422 = subi %c-1, %421 : index + %423 = select %418, %422, %421 : index + %424 = muli %423, %c-2 : index + %425 = addi %401, %424 : index + %426 = addi %425, %c7 : index + store %371, %3[%382, %386, %426] : memref<16x128x2xvector<8xf32>> + %427 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %428 = cmpi "slt", %arg5, %c0 : index + %429 = subi %c-1, %arg5 : index + %430 = select %428, %429, %arg5 : index + %431 = divi_signed %430, %c16 : index + %432 = subi %c-1, %431 : index + %433 = select %428, %432, %431 : index + %434 = cmpi "slt", %arg5, %c0 : index + %435 = subi %c-1, %arg5 : index + %436 = select %434, %435, %arg5 : index + %437 = divi_signed %436, %c16 : index + %438 = subi %c-1, %437 : index + %439 = select %434, %438, %437 : index + %440 = addi %439, %c4 : index + %441 = cmpi "slt", %440, %c0 : index + %442 = subi %c-1, %440 : index + %443 = select %441, %442, %440 : index + %444 = divi_signed %443, %c16 : index + %445 = subi %c-1, %444 : index + %446 = select %441, %445, %444 : index + %447 = muli %446, %c-16 : index + %448 = addi %433, %447 : index + %449 = addi %448, %c4 : index + %450 = remi_signed %arg4, %c128 : index + %451 = cmpi "slt", %450, %c0 : index + %452 = addi %450, %c128 : index + %453 = select %451, %452, %450 : index + %454 = remi_signed %arg5, %c16 : index + %455 = cmpi "slt", %454, %c0 : index + %456 = addi %454, %c16 : index + %457 = select %455, %456, %454 : index + %458 = cmpi "slt", %457, %c0 : index + %459 = subi %c-1, %457 : index + %460 = select %458, %459, %457 : index + %461 = divi_signed %460, %c8 : index + %462 = subi %c-1, %461 : index + %463 = select %458, %462, %461 : index + %464 = remi_signed %463, %c2 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = addi %464, %c2 : index + %467 = select %465, %466, %464 : index + store %427, %3[%449, %453, %467] : memref<16x128x2xvector<8xf32>> + %468 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %469 = addi %arg5, %c72 : index + %470 = cmpi "slt", %469, %c0 : index + %471 = subi %c-1, %469 : index + %472 = select %470, %471, %469 : index + %473 = divi_signed %472, %c16 : index + %474 = subi %c-1, %473 : index + %475 = select %470, %474, %473 : index + %476 = remi_signed %475, %c16 : index + %477 = cmpi "slt", %476, %c0 : index + %478 = addi %476, %c16 : index + %479 = select %477, %478, %476 : index + %480 = remi_signed %arg4, %c128 : index + %481 = cmpi "slt", %480, %c0 : index + %482 = addi %480, %c128 : index + %483 = select %481, %482, %480 : index + %484 = cmpi "slt", %arg5, %c0 : index + %485 = subi %c-1, %arg5 : index + %486 = select %484, %485, %arg5 : index + %487 = divi_signed %486, %c8 : index + %488 = subi %c-1, %487 : index + %489 = select %484, %488, %487 : index + %490 = addi %arg5, %c72 : index + %491 = cmpi "slt", %490, %c0 : index + %492 = subi %c-1, %490 : index + %493 = select %491, %492, %490 : index + %494 = divi_signed %493, %c16 : index + %495 = subi %c-1, %494 : index + %496 = select %491, %495, %494 : index + %497 = muli %496, %c-2 : index + %498 = addi %489, %497 : index + %499 = cmpi "slt", %arg5, %c0 : index + %500 = subi %c-1, %arg5 : index + %501 = select %499, %500, %arg5 : index + %502 = divi_signed %501, %c8 : index + %503 = subi %c-1, %502 : index + %504 = select %499, %503, %502 : index + %505 = addi %arg5, %c72 : index + %506 = cmpi "slt", %505, %c0 : index + %507 = subi %c-1, %505 : index + %508 = select %506, 
%507, %505 : index + %509 = divi_signed %508, %c16 : index + %510 = subi %c-1, %509 : index + %511 = select %506, %510, %509 : index + %512 = muli %511, %c-2 : index + %513 = addi %504, %512 : index + %514 = addi %513, %c9 : index + %515 = cmpi "slt", %514, %c0 : index + %516 = subi %c-1, %514 : index + %517 = select %515, %516, %514 : index + %518 = divi_signed %517, %c2 : index + %519 = subi %c-1, %518 : index + %520 = select %515, %519, %518 : index + %521 = muli %520, %c-2 : index + %522 = addi %498, %521 : index + %523 = addi %522, %c9 : index + store %468, %3[%479, %483, %523] : memref<16x128x2xvector<8xf32>> + %524 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %525 = cmpi "slt", %arg5, %c0 : index + %526 = subi %c-1, %arg5 : index + %527 = select %525, %526, %arg5 : index + %528 = divi_signed %527, %c16 : index + %529 = subi %c-1, %528 : index + %530 = select %525, %529, %528 : index + %531 = cmpi "slt", %arg5, %c0 : index + %532 = subi %c-1, %arg5 : index + %533 = select %531, %532, %arg5 : index + %534 = divi_signed %533, %c16 : index + %535 = subi %c-1, %534 : index + %536 = select %531, %535, %534 : index + %537 = addi %536, %c5 : index + %538 = cmpi "slt", %537, %c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-16 : index + %545 = addi %530, %544 : index + %546 = addi %545, %c5 : index + %547 = remi_signed %arg4, %c128 : index + %548 = cmpi "slt", %547, %c0 : index + %549 = addi %547, %c128 : index + %550 = select %548, %549, %547 : index + %551 = remi_signed %arg5, %c16 : index + %552 = cmpi "slt", %551, %c0 : index + %553 = addi %551, %c16 : index + %554 = select %552, %553, %551 : index + %555 = cmpi "slt", %554, %c0 : index + %556 = subi %c-1, %554 : index + %557 = select %555, %556, %554 : index + %558 = divi_signed %557, %c8 : index + %559 = subi %c-1, %558 : index + %560 = select %555, %559, %558 : index + %561 = remi_signed %560, %c2 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = addi %561, %c2 : index + %564 = select %562, %563, %561 : index + store %524, %3[%546, %550, %564] : memref<16x128x2xvector<8xf32>> + %565 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %566 = addi %arg5, %c88 : index + %567 = cmpi "slt", %566, %c0 : index + %568 = subi %c-1, %566 : index + %569 = select %567, %568, %566 : index + %570 = divi_signed %569, %c16 : index + %571 = subi %c-1, %570 : index + %572 = select %567, %571, %570 : index + %573 = remi_signed %572, %c16 : index + %574 = cmpi "slt", %573, %c0 : index + %575 = addi %573, %c16 : index + %576 = select %574, %575, %573 : index + %577 = remi_signed %arg4, %c128 : index + %578 = cmpi "slt", %577, %c0 : index + %579 = addi %577, %c128 : index + %580 = select %578, %579, %577 : index + %581 = cmpi "slt", %arg5, %c0 : index + %582 = subi %c-1, %arg5 : index + %583 = select %581, %582, %arg5 : index + %584 = divi_signed %583, %c8 : index + %585 = subi %c-1, %584 : index + %586 = select %581, %585, %584 : index + %587 = addi %arg5, %c88 : index + %588 = cmpi "slt", %587, %c0 : index + %589 = subi %c-1, %587 : index + %590 = select %588, %589, %587 : index + %591 = divi_signed %590, %c16 : index + %592 = subi %c-1, %591 : index + %593 = select %588, %592, %591 : index + %594 = muli %593, %c-2 : index + %595 = addi %586, %594 : index + %596 = cmpi "slt", %arg5, %c0 : index + %597 = subi %c-1, %arg5 : index + %598 = select %596, %597, %arg5 : index + 
%599 = divi_signed %598, %c8 : index + %600 = subi %c-1, %599 : index + %601 = select %596, %600, %599 : index + %602 = addi %arg5, %c88 : index + %603 = cmpi "slt", %602, %c0 : index + %604 = subi %c-1, %602 : index + %605 = select %603, %604, %602 : index + %606 = divi_signed %605, %c16 : index + %607 = subi %c-1, %606 : index + %608 = select %603, %607, %606 : index + %609 = muli %608, %c-2 : index + %610 = addi %601, %609 : index + %611 = addi %610, %c11 : index + %612 = cmpi "slt", %611, %c0 : index + %613 = subi %c-1, %611 : index + %614 = select %612, %613, %611 : index + %615 = divi_signed %614, %c2 : index + %616 = subi %c-1, %615 : index + %617 = select %612, %616, %615 : index + %618 = muli %617, %c-2 : index + %619 = addi %595, %618 : index + %620 = addi %619, %c11 : index + store %565, %3[%576, %580, %620] : memref<16x128x2xvector<8xf32>> + %621 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %622 = cmpi "slt", %arg5, %c0 : index + %623 = subi %c-1, %arg5 : index + %624 = select %622, %623, %arg5 : index + %625 = divi_signed %624, %c16 : index + %626 = subi %c-1, %625 : index + %627 = select %622, %626, %625 : index + %628 = cmpi "slt", %arg5, %c0 : index + %629 = subi %c-1, %arg5 : index + %630 = select %628, %629, %arg5 : index + %631 = divi_signed %630, %c16 : index + %632 = subi %c-1, %631 : index + %633 = select %628, %632, %631 : index + %634 = addi %633, %c6 : index + %635 = cmpi "slt", %634, %c0 : index + %636 = subi %c-1, %634 : index + %637 = select %635, %636, %634 : index + %638 = divi_signed %637, %c16 : index + %639 = subi %c-1, %638 : index + %640 = select %635, %639, %638 : index + %641 = muli %640, %c-16 : index + %642 = addi %627, %641 : index + %643 = addi %642, %c6 : index + %644 = remi_signed %arg4, %c128 : index + %645 = cmpi "slt", %644, %c0 : index + %646 = addi %644, %c128 : index + %647 = select %645, %646, %644 : index + %648 = remi_signed %arg5, %c16 : index + %649 = cmpi "slt", %648, %c0 : index + %650 = addi %648, %c16 : index + %651 = select %649, %650, %648 : index + %652 = cmpi "slt", %651, %c0 : index + %653 = subi %c-1, %651 : index + %654 = select %652, %653, %651 : index + %655 = divi_signed %654, %c8 : index + %656 = subi %c-1, %655 : index + %657 = select %652, %656, %655 : index + %658 = remi_signed %657, %c2 : index + %659 = cmpi "slt", %658, %c0 : index + %660 = addi %658, %c2 : index + %661 = select %659, %660, %658 : index + store %621, %3[%643, %647, %661] : memref<16x128x2xvector<8xf32>> + %662 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %663 = addi %arg5, %c104 : index + %664 = cmpi "slt", %663, %c0 : index + %665 = subi %c-1, %663 : index + %666 = select %664, %665, %663 : index + %667 = divi_signed %666, %c16 : index + %668 = subi %c-1, %667 : index + %669 = select %664, %668, %667 : index + %670 = remi_signed %669, %c16 : index + %671 = cmpi "slt", %670, %c0 : index + %672 = addi %670, %c16 : index + %673 = select %671, %672, %670 : index + %674 = remi_signed %arg4, %c128 : index + %675 = cmpi "slt", %674, %c0 : index + %676 = addi %674, %c128 : index + %677 = select %675, %676, %674 : index + %678 = cmpi "slt", %arg5, %c0 : index + %679 = subi %c-1, %arg5 : index + %680 = select %678, %679, %arg5 : index + %681 = divi_signed %680, %c8 : index + %682 = subi %c-1, %681 : index + %683 = select %678, %682, %681 : index + %684 = addi %arg5, %c104 : index + %685 = cmpi "slt", %684, %c0 : index + %686 = subi %c-1, %684 : index + %687 = select %685, %686, %684 : index + %688 = divi_signed %687, %c16 : index + %689 = subi 
%c-1, %688 : index + %690 = select %685, %689, %688 : index + %691 = muli %690, %c-2 : index + %692 = addi %683, %691 : index + %693 = cmpi "slt", %arg5, %c0 : index + %694 = subi %c-1, %arg5 : index + %695 = select %693, %694, %arg5 : index + %696 = divi_signed %695, %c8 : index + %697 = subi %c-1, %696 : index + %698 = select %693, %697, %696 : index + %699 = addi %arg5, %c104 : index + %700 = cmpi "slt", %699, %c0 : index + %701 = subi %c-1, %699 : index + %702 = select %700, %701, %699 : index + %703 = divi_signed %702, %c16 : index + %704 = subi %c-1, %703 : index + %705 = select %700, %704, %703 : index + %706 = muli %705, %c-2 : index + %707 = addi %698, %706 : index + %708 = addi %707, %c13 : index + %709 = cmpi "slt", %708, %c0 : index + %710 = subi %c-1, %708 : index + %711 = select %709, %710, %708 : index + %712 = divi_signed %711, %c2 : index + %713 = subi %c-1, %712 : index + %714 = select %709, %713, %712 : index + %715 = muli %714, %c-2 : index + %716 = addi %692, %715 : index + %717 = addi %716, %c13 : index + store %662, %3[%673, %677, %717] : memref<16x128x2xvector<8xf32>> + %718 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %719 = cmpi "slt", %arg5, %c0 : index + %720 = subi %c-1, %arg5 : index + %721 = select %719, %720, %arg5 : index + %722 = divi_signed %721, %c16 : index + %723 = subi %c-1, %722 : index + %724 = select %719, %723, %722 : index + %725 = cmpi "slt", %arg5, %c0 : index + %726 = subi %c-1, %arg5 : index + %727 = select %725, %726, %arg5 : index + %728 = divi_signed %727, %c16 : index + %729 = subi %c-1, %728 : index + %730 = select %725, %729, %728 : index + %731 = addi %730, %c7 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c16 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = muli %737, %c-16 : index + %739 = addi %724, %738 : index + %740 = addi %739, %c7 : index + %741 = remi_signed %arg4, %c128 : index + %742 = cmpi "slt", %741, %c0 : index + %743 = addi %741, %c128 : index + %744 = select %742, %743, %741 : index + %745 = remi_signed %arg5, %c16 : index + %746 = cmpi "slt", %745, %c0 : index + %747 = addi %745, %c16 : index + %748 = select %746, %747, %745 : index + %749 = cmpi "slt", %748, %c0 : index + %750 = subi %c-1, %748 : index + %751 = select %749, %750, %748 : index + %752 = divi_signed %751, %c8 : index + %753 = subi %c-1, %752 : index + %754 = select %749, %753, %752 : index + %755 = remi_signed %754, %c2 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = addi %755, %c2 : index + %758 = select %756, %757, %755 : index + store %718, %3[%740, %744, %758] : memref<16x128x2xvector<8xf32>> + %759 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %760 = addi %arg5, %c120 : index + %761 = cmpi "slt", %760, %c0 : index + %762 = subi %c-1, %760 : index + %763 = select %761, %762, %760 : index + %764 = divi_signed %763, %c16 : index + %765 = subi %c-1, %764 : index + %766 = select %761, %765, %764 : index + %767 = remi_signed %766, %c16 : index + %768 = cmpi "slt", %767, %c0 : index + %769 = addi %767, %c16 : index + %770 = select %768, %769, %767 : index + %771 = remi_signed %arg4, %c128 : index + %772 = cmpi "slt", %771, %c0 : index + %773 = addi %771, %c128 : index + %774 = select %772, %773, %771 : index + %775 = cmpi "slt", %arg5, %c0 : index + %776 = subi %c-1, %arg5 : index + %777 = select %775, %776, %arg5 : index + %778 = divi_signed %777, %c8 : index + %779 = subi %c-1, %778 : index 
+ %780 = select %775, %779, %778 : index + %781 = addi %arg5, %c120 : index + %782 = cmpi "slt", %781, %c0 : index + %783 = subi %c-1, %781 : index + %784 = select %782, %783, %781 : index + %785 = divi_signed %784, %c16 : index + %786 = subi %c-1, %785 : index + %787 = select %782, %786, %785 : index + %788 = muli %787, %c-2 : index + %789 = addi %780, %788 : index + %790 = cmpi "slt", %arg5, %c0 : index + %791 = subi %c-1, %arg5 : index + %792 = select %790, %791, %arg5 : index + %793 = divi_signed %792, %c8 : index + %794 = subi %c-1, %793 : index + %795 = select %790, %794, %793 : index + %796 = addi %arg5, %c120 : index + %797 = cmpi "slt", %796, %c0 : index + %798 = subi %c-1, %796 : index + %799 = select %797, %798, %796 : index + %800 = divi_signed %799, %c16 : index + %801 = subi %c-1, %800 : index + %802 = select %797, %801, %800 : index + %803 = muli %802, %c-2 : index + %804 = addi %795, %803 : index + %805 = addi %804, %c15 : index + %806 = cmpi "slt", %805, %c0 : index + %807 = subi %c-1, %805 : index + %808 = select %806, %807, %805 : index + %809 = divi_signed %808, %c2 : index + %810 = subi %c-1, %809 : index + %811 = select %806, %810, %809 : index + %812 = muli %811, %c-2 : index + %813 = addi %789, %812 : index + %814 = addi %813, %c15 : index + store %759, %3[%770, %774, %814] : memref<16x128x2xvector<8xf32>> + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg1[%arg4, %4], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %5, %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %6 = addi %arg3, %arg5 : index + %7 = addi %6, %c8 : index + %8 = vector.transfer_read %arg1[%arg4, %7], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %8, %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %9 = addi %arg3, %arg5 : index + %10 = addi %9, %c16 : index + %11 = vector.transfer_read %arg1[%arg4, %10], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %11, %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %12 = addi %arg3, %arg5 : index + %13 = addi %12, %c24 : index + %14 = vector.transfer_read %arg1[%arg4, %13], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %14, %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %15 = addi %arg3, %arg5 : index + %16 = addi %15, %c32 : index + %17 = vector.transfer_read %arg1[%arg4, %16], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %17, %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %18 = addi %arg3, %arg5 : index + %19 = addi %18, %c40 : index + %20 = vector.transfer_read %arg1[%arg4, %19], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %20, %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %21 = addi %arg3, %arg5 : index + %22 = addi %21, %c48 : index + %23 = vector.transfer_read %arg1[%arg4, %22], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %23, %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %24 = addi %arg3, %arg5 : index + %25 = addi %24, %c56 : index + %26 = vector.transfer_read %arg1[%arg4, %25], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %26, %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %27 = addi %arg3, %arg5 : index + %28 = addi %27, %c64 : index + %29 = vector.transfer_read %arg1[%arg4, %28], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %29, %0[%c0, %c8] : 
memref<1x16xvector<8xf32>> + %30 = addi %arg3, %arg5 : index + %31 = addi %30, %c72 : index + %32 = vector.transfer_read %arg1[%arg4, %31], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %32, %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %33 = addi %arg3, %arg5 : index + %34 = addi %33, %c80 : index + %35 = vector.transfer_read %arg1[%arg4, %34], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %35, %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %36 = addi %arg3, %arg5 : index + %37 = addi %36, %c88 : index + %38 = vector.transfer_read %arg1[%arg4, %37], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %38, %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %39 = addi %arg3, %arg5 : index + %40 = addi %39, %c96 : index + %41 = vector.transfer_read %arg1[%arg4, %40], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %41, %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %42 = addi %arg3, %arg5 : index + %43 = addi %42, %c104 : index + %44 = vector.transfer_read %arg1[%arg4, %43], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %44, %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %45 = addi %arg3, %arg5 : index + %46 = addi %45, %c112 : index + %47 = vector.transfer_read %arg1[%arg4, %46], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %47, %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %48 = addi %arg3, %arg5 : index + %49 = addi %48, %c120 : index + %50 = vector.transfer_read %arg1[%arg4, %49], %cst : memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + store %50, %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %51 = load %0[%c0, %c0] : memref<1x16xvector<8xf32>> + %52 = cmpi "slt", %arg5, %c0 : index + %53 = subi %c-1, %arg5 : index + %54 = select %52, %53, %arg5 : index + %55 = divi_signed %54, %c16 : index + %56 = subi %c-1, %55 : index + %57 = select %52, %56, %55 : index + %58 = remi_signed %57, %c16 : index + %59 = cmpi "slt", %58, %c0 : index + %60 = addi %58, %c16 : index + %61 = select %59, %60, %58 : index + %62 = remi_signed %arg4, %c128 : index + %63 = cmpi "slt", %62, %c0 : index + %64 = addi %62, %c128 : index + %65 = select %63, %64, %62 : index + %66 = remi_signed %arg5, %c16 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = addi %66, %c16 : index + %69 = select %67, %68, %66 : index + %70 = cmpi "slt", %69, %c0 : index + %71 = subi %c-1, %69 : index + %72 = select %70, %71, %69 : index + %73 = divi_signed %72, %c8 : index + %74 = subi %c-1, %73 : index + %75 = select %70, %74, %73 : index + %76 = remi_signed %75, %c2 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = addi %76, %c2 : index + %79 = select %77, %78, %76 : index + store %51, %3[%61, %65, %79] : memref<16x128x2xvector<8xf32>> + %80 = load %0[%c0, %c1] : memref<1x16xvector<8xf32>> + %81 = addi %arg5, %c8 : index + %82 = cmpi "slt", %81, %c0 : index + %83 = subi %c-1, %81 : index + %84 = select %82, %83, %81 : index + %85 = divi_signed %84, %c16 : index + %86 = subi %c-1, %85 : index + %87 = select %82, %86, %85 : index + %88 = remi_signed %87, %c16 : index + %89 = cmpi "slt", %88, %c0 : index + %90 = addi %88, %c16 : index + %91 = select %89, %90, %88 : index + %92 = remi_signed %arg4, %c128 : index + %93 = cmpi "slt", %92, %c0 : index + %94 = addi %92, %c128 : index + %95 = select %93, %94, %92 : index + %96 = cmpi "slt", %arg5, %c0 : index + %97 = subi 
%c-1, %arg5 : index + %98 = select %96, %97, %arg5 : index + %99 = divi_signed %98, %c8 : index + %100 = subi %c-1, %99 : index + %101 = select %96, %100, %99 : index + %102 = addi %arg5, %c8 : index + %103 = cmpi "slt", %102, %c0 : index + %104 = subi %c-1, %102 : index + %105 = select %103, %104, %102 : index + %106 = divi_signed %105, %c16 : index + %107 = subi %c-1, %106 : index + %108 = select %103, %107, %106 : index + %109 = muli %108, %c-2 : index + %110 = addi %101, %109 : index + %111 = cmpi "slt", %arg5, %c0 : index + %112 = subi %c-1, %arg5 : index + %113 = select %111, %112, %arg5 : index + %114 = divi_signed %113, %c8 : index + %115 = subi %c-1, %114 : index + %116 = select %111, %115, %114 : index + %117 = addi %arg5, %c8 : index + %118 = cmpi "slt", %117, %c0 : index + %119 = subi %c-1, %117 : index + %120 = select %118, %119, %117 : index + %121 = divi_signed %120, %c16 : index + %122 = subi %c-1, %121 : index + %123 = select %118, %122, %121 : index + %124 = muli %123, %c-2 : index + %125 = addi %116, %124 : index + %126 = addi %125, %c1 : index + %127 = cmpi "slt", %126, %c0 : index + %128 = subi %c-1, %126 : index + %129 = select %127, %128, %126 : index + %130 = divi_signed %129, %c2 : index + %131 = subi %c-1, %130 : index + %132 = select %127, %131, %130 : index + %133 = muli %132, %c-2 : index + %134 = addi %110, %133 : index + %135 = addi %134, %c1 : index + store %80, %3[%91, %95, %135] : memref<16x128x2xvector<8xf32>> + %136 = load %0[%c0, %c2] : memref<1x16xvector<8xf32>> + %137 = cmpi "slt", %arg5, %c0 : index + %138 = subi %c-1, %arg5 : index + %139 = select %137, %138, %arg5 : index + %140 = divi_signed %139, %c16 : index + %141 = subi %c-1, %140 : index + %142 = select %137, %141, %140 : index + %143 = cmpi "slt", %arg5, %c0 : index + %144 = subi %c-1, %arg5 : index + %145 = select %143, %144, %arg5 : index + %146 = divi_signed %145, %c16 : index + %147 = subi %c-1, %146 : index + %148 = select %143, %147, %146 : index + %149 = addi %148, %c1 : index + %150 = cmpi "slt", %149, %c0 : index + %151 = subi %c-1, %149 : index + %152 = select %150, %151, %149 : index + %153 = divi_signed %152, %c16 : index + %154 = subi %c-1, %153 : index + %155 = select %150, %154, %153 : index + %156 = muli %155, %c-16 : index + %157 = addi %142, %156 : index + %158 = addi %157, %c1 : index + %159 = remi_signed %arg4, %c128 : index + %160 = cmpi "slt", %159, %c0 : index + %161 = addi %159, %c128 : index + %162 = select %160, %161, %159 : index + %163 = remi_signed %arg5, %c16 : index + %164 = cmpi "slt", %163, %c0 : index + %165 = addi %163, %c16 : index + %166 = select %164, %165, %163 : index + %167 = cmpi "slt", %166, %c0 : index + %168 = subi %c-1, %166 : index + %169 = select %167, %168, %166 : index + %170 = divi_signed %169, %c8 : index + %171 = subi %c-1, %170 : index + %172 = select %167, %171, %170 : index + %173 = remi_signed %172, %c2 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = addi %173, %c2 : index + %176 = select %174, %175, %173 : index + store %136, %3[%158, %162, %176] : memref<16x128x2xvector<8xf32>> + %177 = load %0[%c0, %c3] : memref<1x16xvector<8xf32>> + %178 = addi %arg5, %c24 : index + %179 = cmpi "slt", %178, %c0 : index + %180 = subi %c-1, %178 : index + %181 = select %179, %180, %178 : index + %182 = divi_signed %181, %c16 : index + %183 = subi %c-1, %182 : index + %184 = select %179, %183, %182 : index + %185 = remi_signed %184, %c16 : index + %186 = cmpi "slt", %185, %c0 : index + %187 = addi %185, %c16 : index + %188 = select %186, %187, 
%185 : index + %189 = remi_signed %arg4, %c128 : index + %190 = cmpi "slt", %189, %c0 : index + %191 = addi %189, %c128 : index + %192 = select %190, %191, %189 : index + %193 = cmpi "slt", %arg5, %c0 : index + %194 = subi %c-1, %arg5 : index + %195 = select %193, %194, %arg5 : index + %196 = divi_signed %195, %c8 : index + %197 = subi %c-1, %196 : index + %198 = select %193, %197, %196 : index + %199 = addi %arg5, %c24 : index + %200 = cmpi "slt", %199, %c0 : index + %201 = subi %c-1, %199 : index + %202 = select %200, %201, %199 : index + %203 = divi_signed %202, %c16 : index + %204 = subi %c-1, %203 : index + %205 = select %200, %204, %203 : index + %206 = muli %205, %c-2 : index + %207 = addi %198, %206 : index + %208 = cmpi "slt", %arg5, %c0 : index + %209 = subi %c-1, %arg5 : index + %210 = select %208, %209, %arg5 : index + %211 = divi_signed %210, %c8 : index + %212 = subi %c-1, %211 : index + %213 = select %208, %212, %211 : index + %214 = addi %arg5, %c24 : index + %215 = cmpi "slt", %214, %c0 : index + %216 = subi %c-1, %214 : index + %217 = select %215, %216, %214 : index + %218 = divi_signed %217, %c16 : index + %219 = subi %c-1, %218 : index + %220 = select %215, %219, %218 : index + %221 = muli %220, %c-2 : index + %222 = addi %213, %221 : index + %223 = addi %222, %c3 : index + %224 = cmpi "slt", %223, %c0 : index + %225 = subi %c-1, %223 : index + %226 = select %224, %225, %223 : index + %227 = divi_signed %226, %c2 : index + %228 = subi %c-1, %227 : index + %229 = select %224, %228, %227 : index + %230 = muli %229, %c-2 : index + %231 = addi %207, %230 : index + %232 = addi %231, %c3 : index + store %177, %3[%188, %192, %232] : memref<16x128x2xvector<8xf32>> + %233 = load %0[%c0, %c4] : memref<1x16xvector<8xf32>> + %234 = cmpi "slt", %arg5, %c0 : index + %235 = subi %c-1, %arg5 : index + %236 = select %234, %235, %arg5 : index + %237 = divi_signed %236, %c16 : index + %238 = subi %c-1, %237 : index + %239 = select %234, %238, %237 : index + %240 = cmpi "slt", %arg5, %c0 : index + %241 = subi %c-1, %arg5 : index + %242 = select %240, %241, %arg5 : index + %243 = divi_signed %242, %c16 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = addi %245, %c2 : index + %247 = cmpi "slt", %246, %c0 : index + %248 = subi %c-1, %246 : index + %249 = select %247, %248, %246 : index + %250 = divi_signed %249, %c16 : index + %251 = subi %c-1, %250 : index + %252 = select %247, %251, %250 : index + %253 = muli %252, %c-16 : index + %254 = addi %239, %253 : index + %255 = addi %254, %c2 : index + %256 = remi_signed %arg4, %c128 : index + %257 = cmpi "slt", %256, %c0 : index + %258 = addi %256, %c128 : index + %259 = select %257, %258, %256 : index + %260 = remi_signed %arg5, %c16 : index + %261 = cmpi "slt", %260, %c0 : index + %262 = addi %260, %c16 : index + %263 = select %261, %262, %260 : index + %264 = cmpi "slt", %263, %c0 : index + %265 = subi %c-1, %263 : index + %266 = select %264, %265, %263 : index + %267 = divi_signed %266, %c8 : index + %268 = subi %c-1, %267 : index + %269 = select %264, %268, %267 : index + %270 = remi_signed %269, %c2 : index + %271 = cmpi "slt", %270, %c0 : index + %272 = addi %270, %c2 : index + %273 = select %271, %272, %270 : index + store %233, %3[%255, %259, %273] : memref<16x128x2xvector<8xf32>> + %274 = load %0[%c0, %c5] : memref<1x16xvector<8xf32>> + %275 = addi %arg5, %c40 : index + %276 = cmpi "slt", %275, %c0 : index + %277 = subi %c-1, %275 : index + %278 = select %276, %277, %275 : index + %279 = 
divi_signed %278, %c16 : index + %280 = subi %c-1, %279 : index + %281 = select %276, %280, %279 : index + %282 = remi_signed %281, %c16 : index + %283 = cmpi "slt", %282, %c0 : index + %284 = addi %282, %c16 : index + %285 = select %283, %284, %282 : index + %286 = remi_signed %arg4, %c128 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c128 : index + %289 = select %287, %288, %286 : index + %290 = cmpi "slt", %arg5, %c0 : index + %291 = subi %c-1, %arg5 : index + %292 = select %290, %291, %arg5 : index + %293 = divi_signed %292, %c8 : index + %294 = subi %c-1, %293 : index + %295 = select %290, %294, %293 : index + %296 = addi %arg5, %c40 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = subi %c-1, %296 : index + %299 = select %297, %298, %296 : index + %300 = divi_signed %299, %c16 : index + %301 = subi %c-1, %300 : index + %302 = select %297, %301, %300 : index + %303 = muli %302, %c-2 : index + %304 = addi %295, %303 : index + %305 = cmpi "slt", %arg5, %c0 : index + %306 = subi %c-1, %arg5 : index + %307 = select %305, %306, %arg5 : index + %308 = divi_signed %307, %c8 : index + %309 = subi %c-1, %308 : index + %310 = select %305, %309, %308 : index + %311 = addi %arg5, %c40 : index + %312 = cmpi "slt", %311, %c0 : index + %313 = subi %c-1, %311 : index + %314 = select %312, %313, %311 : index + %315 = divi_signed %314, %c16 : index + %316 = subi %c-1, %315 : index + %317 = select %312, %316, %315 : index + %318 = muli %317, %c-2 : index + %319 = addi %310, %318 : index + %320 = addi %319, %c5 : index + %321 = cmpi "slt", %320, %c0 : index + %322 = subi %c-1, %320 : index + %323 = select %321, %322, %320 : index + %324 = divi_signed %323, %c2 : index + %325 = subi %c-1, %324 : index + %326 = select %321, %325, %324 : index + %327 = muli %326, %c-2 : index + %328 = addi %304, %327 : index + %329 = addi %328, %c5 : index + store %274, %3[%285, %289, %329] : memref<16x128x2xvector<8xf32>> + %330 = load %0[%c0, %c6] : memref<1x16xvector<8xf32>> + %331 = cmpi "slt", %arg5, %c0 : index + %332 = subi %c-1, %arg5 : index + %333 = select %331, %332, %arg5 : index + %334 = divi_signed %333, %c16 : index + %335 = subi %c-1, %334 : index + %336 = select %331, %335, %334 : index + %337 = cmpi "slt", %arg5, %c0 : index + %338 = subi %c-1, %arg5 : index + %339 = select %337, %338, %arg5 : index + %340 = divi_signed %339, %c16 : index + %341 = subi %c-1, %340 : index + %342 = select %337, %341, %340 : index + %343 = addi %342, %c3 : index + %344 = cmpi "slt", %343, %c0 : index + %345 = subi %c-1, %343 : index + %346 = select %344, %345, %343 : index + %347 = divi_signed %346, %c16 : index + %348 = subi %c-1, %347 : index + %349 = select %344, %348, %347 : index + %350 = muli %349, %c-16 : index + %351 = addi %336, %350 : index + %352 = addi %351, %c3 : index + %353 = remi_signed %arg4, %c128 : index + %354 = cmpi "slt", %353, %c0 : index + %355 = addi %353, %c128 : index + %356 = select %354, %355, %353 : index + %357 = remi_signed %arg5, %c16 : index + %358 = cmpi "slt", %357, %c0 : index + %359 = addi %357, %c16 : index + %360 = select %358, %359, %357 : index + %361 = cmpi "slt", %360, %c0 : index + %362 = subi %c-1, %360 : index + %363 = select %361, %362, %360 : index + %364 = divi_signed %363, %c8 : index + %365 = subi %c-1, %364 : index + %366 = select %361, %365, %364 : index + %367 = remi_signed %366, %c2 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = addi %367, %c2 : index + %370 = select %368, %369, %367 : index + store %330, %3[%352, %356, %370] : 
memref<16x128x2xvector<8xf32>> + %371 = load %0[%c0, %c7] : memref<1x16xvector<8xf32>> + %372 = addi %arg5, %c56 : index + %373 = cmpi "slt", %372, %c0 : index + %374 = subi %c-1, %372 : index + %375 = select %373, %374, %372 : index + %376 = divi_signed %375, %c16 : index + %377 = subi %c-1, %376 : index + %378 = select %373, %377, %376 : index + %379 = remi_signed %378, %c16 : index + %380 = cmpi "slt", %379, %c0 : index + %381 = addi %379, %c16 : index + %382 = select %380, %381, %379 : index + %383 = remi_signed %arg4, %c128 : index + %384 = cmpi "slt", %383, %c0 : index + %385 = addi %383, %c128 : index + %386 = select %384, %385, %383 : index + %387 = cmpi "slt", %arg5, %c0 : index + %388 = subi %c-1, %arg5 : index + %389 = select %387, %388, %arg5 : index + %390 = divi_signed %389, %c8 : index + %391 = subi %c-1, %390 : index + %392 = select %387, %391, %390 : index + %393 = addi %arg5, %c56 : index + %394 = cmpi "slt", %393, %c0 : index + %395 = subi %c-1, %393 : index + %396 = select %394, %395, %393 : index + %397 = divi_signed %396, %c16 : index + %398 = subi %c-1, %397 : index + %399 = select %394, %398, %397 : index + %400 = muli %399, %c-2 : index + %401 = addi %392, %400 : index + %402 = cmpi "slt", %arg5, %c0 : index + %403 = subi %c-1, %arg5 : index + %404 = select %402, %403, %arg5 : index + %405 = divi_signed %404, %c8 : index + %406 = subi %c-1, %405 : index + %407 = select %402, %406, %405 : index + %408 = addi %arg5, %c56 : index + %409 = cmpi "slt", %408, %c0 : index + %410 = subi %c-1, %408 : index + %411 = select %409, %410, %408 : index + %412 = divi_signed %411, %c16 : index + %413 = subi %c-1, %412 : index + %414 = select %409, %413, %412 : index + %415 = muli %414, %c-2 : index + %416 = addi %407, %415 : index + %417 = addi %416, %c7 : index + %418 = cmpi "slt", %417, %c0 : index + %419 = subi %c-1, %417 : index + %420 = select %418, %419, %417 : index + %421 = divi_signed %420, %c2 : index + %422 = subi %c-1, %421 : index + %423 = select %418, %422, %421 : index + %424 = muli %423, %c-2 : index + %425 = addi %401, %424 : index + %426 = addi %425, %c7 : index + store %371, %3[%382, %386, %426] : memref<16x128x2xvector<8xf32>> + %427 = load %0[%c0, %c8] : memref<1x16xvector<8xf32>> + %428 = cmpi "slt", %arg5, %c0 : index + %429 = subi %c-1, %arg5 : index + %430 = select %428, %429, %arg5 : index + %431 = divi_signed %430, %c16 : index + %432 = subi %c-1, %431 : index + %433 = select %428, %432, %431 : index + %434 = cmpi "slt", %arg5, %c0 : index + %435 = subi %c-1, %arg5 : index + %436 = select %434, %435, %arg5 : index + %437 = divi_signed %436, %c16 : index + %438 = subi %c-1, %437 : index + %439 = select %434, %438, %437 : index + %440 = addi %439, %c4 : index + %441 = cmpi "slt", %440, %c0 : index + %442 = subi %c-1, %440 : index + %443 = select %441, %442, %440 : index + %444 = divi_signed %443, %c16 : index + %445 = subi %c-1, %444 : index + %446 = select %441, %445, %444 : index + %447 = muli %446, %c-16 : index + %448 = addi %433, %447 : index + %449 = addi %448, %c4 : index + %450 = remi_signed %arg4, %c128 : index + %451 = cmpi "slt", %450, %c0 : index + %452 = addi %450, %c128 : index + %453 = select %451, %452, %450 : index + %454 = remi_signed %arg5, %c16 : index + %455 = cmpi "slt", %454, %c0 : index + %456 = addi %454, %c16 : index + %457 = select %455, %456, %454 : index + %458 = cmpi "slt", %457, %c0 : index + %459 = subi %c-1, %457 : index + %460 = select %458, %459, %457 : index + %461 = divi_signed %460, %c8 : index + %462 = subi %c-1, %461 
: index + %463 = select %458, %462, %461 : index + %464 = remi_signed %463, %c2 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = addi %464, %c2 : index + %467 = select %465, %466, %464 : index + store %427, %3[%449, %453, %467] : memref<16x128x2xvector<8xf32>> + %468 = load %0[%c0, %c9] : memref<1x16xvector<8xf32>> + %469 = addi %arg5, %c72 : index + %470 = cmpi "slt", %469, %c0 : index + %471 = subi %c-1, %469 : index + %472 = select %470, %471, %469 : index + %473 = divi_signed %472, %c16 : index + %474 = subi %c-1, %473 : index + %475 = select %470, %474, %473 : index + %476 = remi_signed %475, %c16 : index + %477 = cmpi "slt", %476, %c0 : index + %478 = addi %476, %c16 : index + %479 = select %477, %478, %476 : index + %480 = remi_signed %arg4, %c128 : index + %481 = cmpi "slt", %480, %c0 : index + %482 = addi %480, %c128 : index + %483 = select %481, %482, %480 : index + %484 = cmpi "slt", %arg5, %c0 : index + %485 = subi %c-1, %arg5 : index + %486 = select %484, %485, %arg5 : index + %487 = divi_signed %486, %c8 : index + %488 = subi %c-1, %487 : index + %489 = select %484, %488, %487 : index + %490 = addi %arg5, %c72 : index + %491 = cmpi "slt", %490, %c0 : index + %492 = subi %c-1, %490 : index + %493 = select %491, %492, %490 : index + %494 = divi_signed %493, %c16 : index + %495 = subi %c-1, %494 : index + %496 = select %491, %495, %494 : index + %497 = muli %496, %c-2 : index + %498 = addi %489, %497 : index + %499 = cmpi "slt", %arg5, %c0 : index + %500 = subi %c-1, %arg5 : index + %501 = select %499, %500, %arg5 : index + %502 = divi_signed %501, %c8 : index + %503 = subi %c-1, %502 : index + %504 = select %499, %503, %502 : index + %505 = addi %arg5, %c72 : index + %506 = cmpi "slt", %505, %c0 : index + %507 = subi %c-1, %505 : index + %508 = select %506, %507, %505 : index + %509 = divi_signed %508, %c16 : index + %510 = subi %c-1, %509 : index + %511 = select %506, %510, %509 : index + %512 = muli %511, %c-2 : index + %513 = addi %504, %512 : index + %514 = addi %513, %c9 : index + %515 = cmpi "slt", %514, %c0 : index + %516 = subi %c-1, %514 : index + %517 = select %515, %516, %514 : index + %518 = divi_signed %517, %c2 : index + %519 = subi %c-1, %518 : index + %520 = select %515, %519, %518 : index + %521 = muli %520, %c-2 : index + %522 = addi %498, %521 : index + %523 = addi %522, %c9 : index + store %468, %3[%479, %483, %523] : memref<16x128x2xvector<8xf32>> + %524 = load %0[%c0, %c10] : memref<1x16xvector<8xf32>> + %525 = cmpi "slt", %arg5, %c0 : index + %526 = subi %c-1, %arg5 : index + %527 = select %525, %526, %arg5 : index + %528 = divi_signed %527, %c16 : index + %529 = subi %c-1, %528 : index + %530 = select %525, %529, %528 : index + %531 = cmpi "slt", %arg5, %c0 : index + %532 = subi %c-1, %arg5 : index + %533 = select %531, %532, %arg5 : index + %534 = divi_signed %533, %c16 : index + %535 = subi %c-1, %534 : index + %536 = select %531, %535, %534 : index + %537 = addi %536, %c5 : index + %538 = cmpi "slt", %537, %c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-16 : index + %545 = addi %530, %544 : index + %546 = addi %545, %c5 : index + %547 = remi_signed %arg4, %c128 : index + %548 = cmpi "slt", %547, %c0 : index + %549 = addi %547, %c128 : index + %550 = select %548, %549, %547 : index + %551 = remi_signed %arg5, %c16 : index + %552 = cmpi "slt", %551, %c0 : index + %553 = addi 
%551, %c16 : index + %554 = select %552, %553, %551 : index + %555 = cmpi "slt", %554, %c0 : index + %556 = subi %c-1, %554 : index + %557 = select %555, %556, %554 : index + %558 = divi_signed %557, %c8 : index + %559 = subi %c-1, %558 : index + %560 = select %555, %559, %558 : index + %561 = remi_signed %560, %c2 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = addi %561, %c2 : index + %564 = select %562, %563, %561 : index + store %524, %3[%546, %550, %564] : memref<16x128x2xvector<8xf32>> + %565 = load %0[%c0, %c11] : memref<1x16xvector<8xf32>> + %566 = addi %arg5, %c88 : index + %567 = cmpi "slt", %566, %c0 : index + %568 = subi %c-1, %566 : index + %569 = select %567, %568, %566 : index + %570 = divi_signed %569, %c16 : index + %571 = subi %c-1, %570 : index + %572 = select %567, %571, %570 : index + %573 = remi_signed %572, %c16 : index + %574 = cmpi "slt", %573, %c0 : index + %575 = addi %573, %c16 : index + %576 = select %574, %575, %573 : index + %577 = remi_signed %arg4, %c128 : index + %578 = cmpi "slt", %577, %c0 : index + %579 = addi %577, %c128 : index + %580 = select %578, %579, %577 : index + %581 = cmpi "slt", %arg5, %c0 : index + %582 = subi %c-1, %arg5 : index + %583 = select %581, %582, %arg5 : index + %584 = divi_signed %583, %c8 : index + %585 = subi %c-1, %584 : index + %586 = select %581, %585, %584 : index + %587 = addi %arg5, %c88 : index + %588 = cmpi "slt", %587, %c0 : index + %589 = subi %c-1, %587 : index + %590 = select %588, %589, %587 : index + %591 = divi_signed %590, %c16 : index + %592 = subi %c-1, %591 : index + %593 = select %588, %592, %591 : index + %594 = muli %593, %c-2 : index + %595 = addi %586, %594 : index + %596 = cmpi "slt", %arg5, %c0 : index + %597 = subi %c-1, %arg5 : index + %598 = select %596, %597, %arg5 : index + %599 = divi_signed %598, %c8 : index + %600 = subi %c-1, %599 : index + %601 = select %596, %600, %599 : index + %602 = addi %arg5, %c88 : index + %603 = cmpi "slt", %602, %c0 : index + %604 = subi %c-1, %602 : index + %605 = select %603, %604, %602 : index + %606 = divi_signed %605, %c16 : index + %607 = subi %c-1, %606 : index + %608 = select %603, %607, %606 : index + %609 = muli %608, %c-2 : index + %610 = addi %601, %609 : index + %611 = addi %610, %c11 : index + %612 = cmpi "slt", %611, %c0 : index + %613 = subi %c-1, %611 : index + %614 = select %612, %613, %611 : index + %615 = divi_signed %614, %c2 : index + %616 = subi %c-1, %615 : index + %617 = select %612, %616, %615 : index + %618 = muli %617, %c-2 : index + %619 = addi %595, %618 : index + %620 = addi %619, %c11 : index + store %565, %3[%576, %580, %620] : memref<16x128x2xvector<8xf32>> + %621 = load %0[%c0, %c12] : memref<1x16xvector<8xf32>> + %622 = cmpi "slt", %arg5, %c0 : index + %623 = subi %c-1, %arg5 : index + %624 = select %622, %623, %arg5 : index + %625 = divi_signed %624, %c16 : index + %626 = subi %c-1, %625 : index + %627 = select %622, %626, %625 : index + %628 = cmpi "slt", %arg5, %c0 : index + %629 = subi %c-1, %arg5 : index + %630 = select %628, %629, %arg5 : index + %631 = divi_signed %630, %c16 : index + %632 = subi %c-1, %631 : index + %633 = select %628, %632, %631 : index + %634 = addi %633, %c6 : index + %635 = cmpi "slt", %634, %c0 : index + %636 = subi %c-1, %634 : index + %637 = select %635, %636, %634 : index + %638 = divi_signed %637, %c16 : index + %639 = subi %c-1, %638 : index + %640 = select %635, %639, %638 : index + %641 = muli %640, %c-16 : index + %642 = addi %627, %641 : index + %643 = addi %642, %c6 : index + %644 = 
remi_signed %arg4, %c128 : index + %645 = cmpi "slt", %644, %c0 : index + %646 = addi %644, %c128 : index + %647 = select %645, %646, %644 : index + %648 = remi_signed %arg5, %c16 : index + %649 = cmpi "slt", %648, %c0 : index + %650 = addi %648, %c16 : index + %651 = select %649, %650, %648 : index + %652 = cmpi "slt", %651, %c0 : index + %653 = subi %c-1, %651 : index + %654 = select %652, %653, %651 : index + %655 = divi_signed %654, %c8 : index + %656 = subi %c-1, %655 : index + %657 = select %652, %656, %655 : index + %658 = remi_signed %657, %c2 : index + %659 = cmpi "slt", %658, %c0 : index + %660 = addi %658, %c2 : index + %661 = select %659, %660, %658 : index + store %621, %3[%643, %647, %661] : memref<16x128x2xvector<8xf32>> + %662 = load %0[%c0, %c13] : memref<1x16xvector<8xf32>> + %663 = addi %arg5, %c104 : index + %664 = cmpi "slt", %663, %c0 : index + %665 = subi %c-1, %663 : index + %666 = select %664, %665, %663 : index + %667 = divi_signed %666, %c16 : index + %668 = subi %c-1, %667 : index + %669 = select %664, %668, %667 : index + %670 = remi_signed %669, %c16 : index + %671 = cmpi "slt", %670, %c0 : index + %672 = addi %670, %c16 : index + %673 = select %671, %672, %670 : index + %674 = remi_signed %arg4, %c128 : index + %675 = cmpi "slt", %674, %c0 : index + %676 = addi %674, %c128 : index + %677 = select %675, %676, %674 : index + %678 = cmpi "slt", %arg5, %c0 : index + %679 = subi %c-1, %arg5 : index + %680 = select %678, %679, %arg5 : index + %681 = divi_signed %680, %c8 : index + %682 = subi %c-1, %681 : index + %683 = select %678, %682, %681 : index + %684 = addi %arg5, %c104 : index + %685 = cmpi "slt", %684, %c0 : index + %686 = subi %c-1, %684 : index + %687 = select %685, %686, %684 : index + %688 = divi_signed %687, %c16 : index + %689 = subi %c-1, %688 : index + %690 = select %685, %689, %688 : index + %691 = muli %690, %c-2 : index + %692 = addi %683, %691 : index + %693 = cmpi "slt", %arg5, %c0 : index + %694 = subi %c-1, %arg5 : index + %695 = select %693, %694, %arg5 : index + %696 = divi_signed %695, %c8 : index + %697 = subi %c-1, %696 : index + %698 = select %693, %697, %696 : index + %699 = addi %arg5, %c104 : index + %700 = cmpi "slt", %699, %c0 : index + %701 = subi %c-1, %699 : index + %702 = select %700, %701, %699 : index + %703 = divi_signed %702, %c16 : index + %704 = subi %c-1, %703 : index + %705 = select %700, %704, %703 : index + %706 = muli %705, %c-2 : index + %707 = addi %698, %706 : index + %708 = addi %707, %c13 : index + %709 = cmpi "slt", %708, %c0 : index + %710 = subi %c-1, %708 : index + %711 = select %709, %710, %708 : index + %712 = divi_signed %711, %c2 : index + %713 = subi %c-1, %712 : index + %714 = select %709, %713, %712 : index + %715 = muli %714, %c-2 : index + %716 = addi %692, %715 : index + %717 = addi %716, %c13 : index + store %662, %3[%673, %677, %717] : memref<16x128x2xvector<8xf32>> + %718 = load %0[%c0, %c14] : memref<1x16xvector<8xf32>> + %719 = cmpi "slt", %arg5, %c0 : index + %720 = subi %c-1, %arg5 : index + %721 = select %719, %720, %arg5 : index + %722 = divi_signed %721, %c16 : index + %723 = subi %c-1, %722 : index + %724 = select %719, %723, %722 : index + %725 = cmpi "slt", %arg5, %c0 : index + %726 = subi %c-1, %arg5 : index + %727 = select %725, %726, %arg5 : index + %728 = divi_signed %727, %c16 : index + %729 = subi %c-1, %728 : index + %730 = select %725, %729, %728 : index + %731 = addi %730, %c7 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select 
%732, %733, %731 : index + %735 = divi_signed %734, %c16 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = muli %737, %c-16 : index + %739 = addi %724, %738 : index + %740 = addi %739, %c7 : index + %741 = remi_signed %arg4, %c128 : index + %742 = cmpi "slt", %741, %c0 : index + %743 = addi %741, %c128 : index + %744 = select %742, %743, %741 : index + %745 = remi_signed %arg5, %c16 : index + %746 = cmpi "slt", %745, %c0 : index + %747 = addi %745, %c16 : index + %748 = select %746, %747, %745 : index + %749 = cmpi "slt", %748, %c0 : index + %750 = subi %c-1, %748 : index + %751 = select %749, %750, %748 : index + %752 = divi_signed %751, %c8 : index + %753 = subi %c-1, %752 : index + %754 = select %749, %753, %752 : index + %755 = remi_signed %754, %c2 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = addi %755, %c2 : index + %758 = select %756, %757, %755 : index + store %718, %3[%740, %744, %758] : memref<16x128x2xvector<8xf32>> + %759 = load %0[%c0, %c15] : memref<1x16xvector<8xf32>> + %760 = addi %arg5, %c120 : index + %761 = cmpi "slt", %760, %c0 : index + %762 = subi %c-1, %760 : index + %763 = select %761, %762, %760 : index + %764 = divi_signed %763, %c16 : index + %765 = subi %c-1, %764 : index + %766 = select %761, %765, %764 : index + %767 = remi_signed %766, %c16 : index + %768 = cmpi "slt", %767, %c0 : index + %769 = addi %767, %c16 : index + %770 = select %768, %769, %767 : index + %771 = remi_signed %arg4, %c128 : index + %772 = cmpi "slt", %771, %c0 : index + %773 = addi %771, %c128 : index + %774 = select %772, %773, %771 : index + %775 = cmpi "slt", %arg5, %c0 : index + %776 = subi %c-1, %arg5 : index + %777 = select %775, %776, %arg5 : index + %778 = divi_signed %777, %c8 : index + %779 = subi %c-1, %778 : index + %780 = select %775, %779, %778 : index + %781 = addi %arg5, %c120 : index + %782 = cmpi "slt", %781, %c0 : index + %783 = subi %c-1, %781 : index + %784 = select %782, %783, %781 : index + %785 = divi_signed %784, %c16 : index + %786 = subi %c-1, %785 : index + %787 = select %782, %786, %785 : index + %788 = muli %787, %c-2 : index + %789 = addi %780, %788 : index + %790 = cmpi "slt", %arg5, %c0 : index + %791 = subi %c-1, %arg5 : index + %792 = select %790, %791, %arg5 : index + %793 = divi_signed %792, %c8 : index + %794 = subi %c-1, %793 : index + %795 = select %790, %794, %793 : index + %796 = addi %arg5, %c120 : index + %797 = cmpi "slt", %796, %c0 : index + %798 = subi %c-1, %796 : index + %799 = select %797, %798, %796 : index + %800 = divi_signed %799, %c16 : index + %801 = subi %c-1, %800 : index + %802 = select %797, %801, %800 : index + %803 = muli %802, %c-2 : index + %804 = addi %795, %803 : index + %805 = addi %804, %c15 : index + %806 = cmpi "slt", %805, %c0 : index + %807 = subi %c-1, %805 : index + %808 = select %806, %807, %805 : index + %809 = divi_signed %808, %c2 : index + %810 = subi %c-1, %809 : index + %811 = select %806, %810, %809 : index + %812 = muli %811, %c-2 : index + %813 = addi %789, %812 : index + %814 = addi %813, %c15 : index + store %759, %3[%770, %774, %814] : memref<16x128x2xvector<8xf32>> + } + } + } + scf.for %arg4 = %c0 to %c784 step %c1 { + scf.for %arg5 = %c0 to %c16 step %c1 { + scf.for %arg6 = %c0 to %c6 step %c1 { + scf.for %arg7 = %c0 to %c2 step %c1 { + store %cst_0, %2[%arg5, %arg6, %arg7] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c16 { + scf.for %arg6 = %c0 to %c128 step %c4 { + scf.for %arg7 = %c0 to %c0 step %c6 { + scf.for 
%arg8 = %c0 to %c4 step %c1 { + scf.for %arg9 = %c0 to %c0 step %c1 { + %4 = addi %arg4, %arg7 : index + %5 = addi %4, %arg9 : index + %6 = addi %arg4, %arg7 : index + %7 = addi %6, %arg9 : index + %8 = addi %arg4, %arg7 : index + %9 = addi %8, %arg9 : index + %10 = addi %arg4, %arg7 : index + %11 = addi %10, %arg9 : index + %12 = addi %arg4, %arg7 : index + %13 = addi %12, %arg9 : index + %14 = addi %arg4, %arg7 : index + %15 = addi %14, %arg9 : index + %16 = addi %arg4, %arg7 : index + %17 = addi %16, %arg9 : index + %18 = addi %arg4, %arg7 : index + %19 = addi %18, %arg9 : index + %20 = addi %arg6, %arg8 : index + %21 = addi %arg6, %arg8 : index + %22 = addi %arg6, %arg8 : index + %23 = addi %arg6, %arg8 : index + %24 = addi %arg6, %arg8 : index + %25 = addi %arg6, %arg8 : index + %26 = addi %arg6, %arg8 : index + %27 = addi %arg6, %arg8 : index + %28 = load %arg0[%5, %20] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %29 = load %arg0[%7, %21] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %30 = load %arg0[%9, %22] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %31 = load %arg0[%11, %23] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %32 = load %arg0[%13, %24] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %33 = load %arg0[%15, %25] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %34 = load %arg0[%17, %26] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %35 = load %arg0[%19, %27] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %36 = cmpi "slt", %arg5, %c0 : index + %37 = subi %c-1, %arg5 : index + %38 = select %36, %37, %arg5 : index + %39 = divi_signed %38, %c16 : index + %40 = subi %c-1, %39 : index + %41 = select %36, %40, %39 : index + %42 = remi_signed %41, %c16 : index + %43 = cmpi "slt", %42, %c0 : index + %44 = addi %42, %c16 : index + %45 = select %43, %44, %42 : index + %46 = addi %arg6, %arg8 : index + %47 = remi_signed %46, %c128 : index + %48 = cmpi "slt", %47, %c0 : index + %49 = addi %47, %c128 : index + %50 = select %48, %49, %47 : index + %51 = remi_signed %arg5, %c16 : index + %52 = cmpi "slt", %51, %c0 : index + %53 = addi %51, %c16 : index + %54 = select %52, %53, %51 : index + %55 = cmpi "slt", %54, %c0 : index + %56 = subi %c-1, %54 : index + %57 = select %55, %56, %54 : index + %58 = divi_signed %57, %c8 : index + %59 = subi %c-1, %58 : index + %60 = select %55, %59, %58 : index + %61 = remi_signed %60, %c2 : index + %62 = cmpi "slt", %61, %c0 : index + %63 = addi %61, %c2 : index + %64 = select %62, %63, %61 : index + %65 = load %3[%45, %50, %64] : memref<16x128x2xvector<8xf32>> + %66 = vector.extractelement %65[%c0_i64 : i64] : vector<8xf32> + %67 = cmpi "slt", %arg5, %c0 : index + %68 = subi %c-1, %arg5 : index + %69 = select %67, %68, %arg5 : index + %70 = divi_signed %69, %c16 : index + %71 = subi %c-1, %70 : index + %72 = select %67, %71, %70 : index + %73 = remi_signed %72, %c16 : index + %74 = cmpi "slt", %73, %c0 : index + %75 = addi %73, %c16 : index + %76 = select %74, %75, %73 : index + %77 = addi %arg6, %arg8 : index + %78 = remi_signed %77, %c128 : index + %79 = cmpi "slt", %78, %c0 : index + %80 = addi %78, %c128 : index + %81 = select %79, %80, %78 : index + %82 = remi_signed %arg5, %c16 : index + %83 = cmpi "slt", %82, %c0 : index + %84 = addi %82, %c16 : index + %85 = select %83, %84, %82 : index + %86 = cmpi "slt", %85, %c0 : index + %87 = subi %c-1, %85 : index + %88 = select %86, %87, %85 : index + %89 = 
divi_signed %88, %c8 : index + %90 = subi %c-1, %89 : index + %91 = select %86, %90, %89 : index + %92 = remi_signed %91, %c2 : index + %93 = cmpi "slt", %92, %c0 : index + %94 = addi %92, %c2 : index + %95 = select %93, %94, %92 : index + %96 = load %3[%76, %81, %95] : memref<16x128x2xvector<8xf32>> + %97 = vector.extractelement %96[%c1_i64 : i64] : vector<8xf32> + %98 = cmpi "slt", %arg5, %c0 : index + %99 = subi %c-1, %arg5 : index + %100 = select %98, %99, %arg5 : index + %101 = divi_signed %100, %c16 : index + %102 = subi %c-1, %101 : index + %103 = select %98, %102, %101 : index + %104 = remi_signed %103, %c16 : index + %105 = cmpi "slt", %104, %c0 : index + %106 = addi %104, %c16 : index + %107 = select %105, %106, %104 : index + %108 = addi %arg6, %arg8 : index + %109 = remi_signed %108, %c128 : index + %110 = cmpi "slt", %109, %c0 : index + %111 = addi %109, %c128 : index + %112 = select %110, %111, %109 : index + %113 = remi_signed %arg5, %c16 : index + %114 = cmpi "slt", %113, %c0 : index + %115 = addi %113, %c16 : index + %116 = select %114, %115, %113 : index + %117 = cmpi "slt", %116, %c0 : index + %118 = subi %c-1, %116 : index + %119 = select %117, %118, %116 : index + %120 = divi_signed %119, %c8 : index + %121 = subi %c-1, %120 : index + %122 = select %117, %121, %120 : index + %123 = remi_signed %122, %c2 : index + %124 = cmpi "slt", %123, %c0 : index + %125 = addi %123, %c2 : index + %126 = select %124, %125, %123 : index + %127 = load %3[%107, %112, %126] : memref<16x128x2xvector<8xf32>> + %128 = vector.extractelement %127[%c2_i64 : i64] : vector<8xf32> + %129 = cmpi "slt", %arg5, %c0 : index + %130 = subi %c-1, %arg5 : index + %131 = select %129, %130, %arg5 : index + %132 = divi_signed %131, %c16 : index + %133 = subi %c-1, %132 : index + %134 = select %129, %133, %132 : index + %135 = remi_signed %134, %c16 : index + %136 = cmpi "slt", %135, %c0 : index + %137 = addi %135, %c16 : index + %138 = select %136, %137, %135 : index + %139 = addi %arg6, %arg8 : index + %140 = remi_signed %139, %c128 : index + %141 = cmpi "slt", %140, %c0 : index + %142 = addi %140, %c128 : index + %143 = select %141, %142, %140 : index + %144 = remi_signed %arg5, %c16 : index + %145 = cmpi "slt", %144, %c0 : index + %146 = addi %144, %c16 : index + %147 = select %145, %146, %144 : index + %148 = cmpi "slt", %147, %c0 : index + %149 = subi %c-1, %147 : index + %150 = select %148, %149, %147 : index + %151 = divi_signed %150, %c8 : index + %152 = subi %c-1, %151 : index + %153 = select %148, %152, %151 : index + %154 = remi_signed %153, %c2 : index + %155 = cmpi "slt", %154, %c0 : index + %156 = addi %154, %c2 : index + %157 = select %155, %156, %154 : index + %158 = load %3[%138, %143, %157] : memref<16x128x2xvector<8xf32>> + %159 = vector.extractelement %158[%c3_i64 : i64] : vector<8xf32> + %160 = cmpi "slt", %arg5, %c0 : index + %161 = subi %c-1, %arg5 : index + %162 = select %160, %161, %arg5 : index + %163 = divi_signed %162, %c16 : index + %164 = subi %c-1, %163 : index + %165 = select %160, %164, %163 : index + %166 = remi_signed %165, %c16 : index + %167 = cmpi "slt", %166, %c0 : index + %168 = addi %166, %c16 : index + %169 = select %167, %168, %166 : index + %170 = addi %arg6, %arg8 : index + %171 = remi_signed %170, %c128 : index + %172 = cmpi "slt", %171, %c0 : index + %173 = addi %171, %c128 : index + %174 = select %172, %173, %171 : index + %175 = remi_signed %arg5, %c16 : index + %176 = cmpi "slt", %175, %c0 : index + %177 = addi %175, %c16 : index + %178 = select %176, %177, 
%175 : index + %179 = cmpi "slt", %178, %c0 : index + %180 = subi %c-1, %178 : index + %181 = select %179, %180, %178 : index + %182 = divi_signed %181, %c8 : index + %183 = subi %c-1, %182 : index + %184 = select %179, %183, %182 : index + %185 = remi_signed %184, %c2 : index + %186 = cmpi "slt", %185, %c0 : index + %187 = addi %185, %c2 : index + %188 = select %186, %187, %185 : index + %189 = load %3[%169, %174, %188] : memref<16x128x2xvector<8xf32>> + %190 = vector.extractelement %189[%c4_i64 : i64] : vector<8xf32> + %191 = cmpi "slt", %arg5, %c0 : index + %192 = subi %c-1, %arg5 : index + %193 = select %191, %192, %arg5 : index + %194 = divi_signed %193, %c16 : index + %195 = subi %c-1, %194 : index + %196 = select %191, %195, %194 : index + %197 = remi_signed %196, %c16 : index + %198 = cmpi "slt", %197, %c0 : index + %199 = addi %197, %c16 : index + %200 = select %198, %199, %197 : index + %201 = addi %arg6, %arg8 : index + %202 = remi_signed %201, %c128 : index + %203 = cmpi "slt", %202, %c0 : index + %204 = addi %202, %c128 : index + %205 = select %203, %204, %202 : index + %206 = remi_signed %arg5, %c16 : index + %207 = cmpi "slt", %206, %c0 : index + %208 = addi %206, %c16 : index + %209 = select %207, %208, %206 : index + %210 = cmpi "slt", %209, %c0 : index + %211 = subi %c-1, %209 : index + %212 = select %210, %211, %209 : index + %213 = divi_signed %212, %c8 : index + %214 = subi %c-1, %213 : index + %215 = select %210, %214, %213 : index + %216 = remi_signed %215, %c2 : index + %217 = cmpi "slt", %216, %c0 : index + %218 = addi %216, %c2 : index + %219 = select %217, %218, %216 : index + %220 = load %3[%200, %205, %219] : memref<16x128x2xvector<8xf32>> + %221 = vector.extractelement %220[%c5_i64 : i64] : vector<8xf32> + %222 = cmpi "slt", %arg5, %c0 : index + %223 = subi %c-1, %arg5 : index + %224 = select %222, %223, %arg5 : index + %225 = divi_signed %224, %c16 : index + %226 = subi %c-1, %225 : index + %227 = select %222, %226, %225 : index + %228 = remi_signed %227, %c16 : index + %229 = cmpi "slt", %228, %c0 : index + %230 = addi %228, %c16 : index + %231 = select %229, %230, %228 : index + %232 = addi %arg6, %arg8 : index + %233 = remi_signed %232, %c128 : index + %234 = cmpi "slt", %233, %c0 : index + %235 = addi %233, %c128 : index + %236 = select %234, %235, %233 : index + %237 = remi_signed %arg5, %c16 : index + %238 = cmpi "slt", %237, %c0 : index + %239 = addi %237, %c16 : index + %240 = select %238, %239, %237 : index + %241 = cmpi "slt", %240, %c0 : index + %242 = subi %c-1, %240 : index + %243 = select %241, %242, %240 : index + %244 = divi_signed %243, %c8 : index + %245 = subi %c-1, %244 : index + %246 = select %241, %245, %244 : index + %247 = remi_signed %246, %c2 : index + %248 = cmpi "slt", %247, %c0 : index + %249 = addi %247, %c2 : index + %250 = select %248, %249, %247 : index + %251 = load %3[%231, %236, %250] : memref<16x128x2xvector<8xf32>> + %252 = vector.extractelement %251[%c6_i64 : i64] : vector<8xf32> + %253 = cmpi "slt", %arg5, %c0 : index + %254 = subi %c-1, %arg5 : index + %255 = select %253, %254, %arg5 : index + %256 = divi_signed %255, %c16 : index + %257 = subi %c-1, %256 : index + %258 = select %253, %257, %256 : index + %259 = remi_signed %258, %c16 : index + %260 = cmpi "slt", %259, %c0 : index + %261 = addi %259, %c16 : index + %262 = select %260, %261, %259 : index + %263 = addi %arg6, %arg8 : index + %264 = remi_signed %263, %c128 : index + %265 = cmpi "slt", %264, %c0 : index + %266 = addi %264, %c128 : index + %267 = select 
%265, %266, %264 : index + %268 = remi_signed %arg5, %c16 : index + %269 = cmpi "slt", %268, %c0 : index + %270 = addi %268, %c16 : index + %271 = select %269, %270, %268 : index + %272 = cmpi "slt", %271, %c0 : index + %273 = subi %c-1, %271 : index + %274 = select %272, %273, %271 : index + %275 = divi_signed %274, %c8 : index + %276 = subi %c-1, %275 : index + %277 = select %272, %276, %275 : index + %278 = remi_signed %277, %c2 : index + %279 = cmpi "slt", %278, %c0 : index + %280 = addi %278, %c2 : index + %281 = select %279, %280, %278 : index + %282 = load %3[%262, %267, %281] : memref<16x128x2xvector<8xf32>> + %283 = vector.extractelement %282[%c7_i64 : i64] : vector<8xf32> + %284 = mulf %28, %66 {RelaxedPrecision} : f32 + %285 = mulf %29, %97 {RelaxedPrecision} : f32 + %286 = mulf %30, %128 {RelaxedPrecision} : f32 + %287 = mulf %31, %159 {RelaxedPrecision} : f32 + %288 = mulf %32, %190 {RelaxedPrecision} : f32 + %289 = mulf %33, %221 {RelaxedPrecision} : f32 + %290 = mulf %34, %252 {RelaxedPrecision} : f32 + %291 = mulf %35, %283 {RelaxedPrecision} : f32 + %292 = cmpi "slt", %arg5, %c0 : index + %293 = subi %c-1, %arg5 : index + %294 = select %292, %293, %arg5 : index + %295 = divi_signed %294, %c16 : index + %296 = subi %c-1, %295 : index + %297 = select %292, %296, %295 : index + %298 = remi_signed %297, %c16 : index + %299 = cmpi "slt", %298, %c0 : index + %300 = addi %298, %c16 : index + %301 = select %299, %300, %298 : index + %302 = addi %arg7, %arg9 : index + %303 = remi_signed %302, %c6 : index + %304 = cmpi "slt", %303, %c0 : index + %305 = addi %303, %c6 : index + %306 = select %304, %305, %303 : index + %307 = remi_signed %arg5, %c16 : index + %308 = cmpi "slt", %307, %c0 : index + %309 = addi %307, %c16 : index + %310 = select %308, %309, %307 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c8 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = remi_signed %316, %c2 : index + %318 = cmpi "slt", %317, %c0 : index + %319 = addi %317, %c2 : index + %320 = select %318, %319, %317 : index + %321 = load %2[%301, %306, %320] : memref<16x6x2xvector<8xf32>> + %322 = vector.extractelement %321[%c0_i64 : i64] : vector<8xf32> + %323 = cmpi "slt", %arg5, %c0 : index + %324 = subi %c-1, %arg5 : index + %325 = select %323, %324, %arg5 : index + %326 = divi_signed %325, %c16 : index + %327 = subi %c-1, %326 : index + %328 = select %323, %327, %326 : index + %329 = remi_signed %328, %c16 : index + %330 = cmpi "slt", %329, %c0 : index + %331 = addi %329, %c16 : index + %332 = select %330, %331, %329 : index + %333 = addi %arg7, %arg9 : index + %334 = remi_signed %333, %c6 : index + %335 = cmpi "slt", %334, %c0 : index + %336 = addi %334, %c6 : index + %337 = select %335, %336, %334 : index + %338 = remi_signed %arg5, %c16 : index + %339 = cmpi "slt", %338, %c0 : index + %340 = addi %338, %c16 : index + %341 = select %339, %340, %338 : index + %342 = cmpi "slt", %341, %c0 : index + %343 = subi %c-1, %341 : index + %344 = select %342, %343, %341 : index + %345 = divi_signed %344, %c8 : index + %346 = subi %c-1, %345 : index + %347 = select %342, %346, %345 : index + %348 = remi_signed %347, %c2 : index + %349 = cmpi "slt", %348, %c0 : index + %350 = addi %348, %c2 : index + %351 = select %349, %350, %348 : index + %352 = load %2[%332, %337, %351] : memref<16x6x2xvector<8xf32>> + %353 = vector.extractelement %352[%c1_i64 : i64] : vector<8xf32> + %354 
= cmpi "slt", %arg5, %c0 : index + %355 = subi %c-1, %arg5 : index + %356 = select %354, %355, %arg5 : index + %357 = divi_signed %356, %c16 : index + %358 = subi %c-1, %357 : index + %359 = select %354, %358, %357 : index + %360 = remi_signed %359, %c16 : index + %361 = cmpi "slt", %360, %c0 : index + %362 = addi %360, %c16 : index + %363 = select %361, %362, %360 : index + %364 = addi %arg7, %arg9 : index + %365 = remi_signed %364, %c6 : index + %366 = cmpi "slt", %365, %c0 : index + %367 = addi %365, %c6 : index + %368 = select %366, %367, %365 : index + %369 = remi_signed %arg5, %c16 : index + %370 = cmpi "slt", %369, %c0 : index + %371 = addi %369, %c16 : index + %372 = select %370, %371, %369 : index + %373 = cmpi "slt", %372, %c0 : index + %374 = subi %c-1, %372 : index + %375 = select %373, %374, %372 : index + %376 = divi_signed %375, %c8 : index + %377 = subi %c-1, %376 : index + %378 = select %373, %377, %376 : index + %379 = remi_signed %378, %c2 : index + %380 = cmpi "slt", %379, %c0 : index + %381 = addi %379, %c2 : index + %382 = select %380, %381, %379 : index + %383 = load %2[%363, %368, %382] : memref<16x6x2xvector<8xf32>> + %384 = vector.extractelement %383[%c2_i64 : i64] : vector<8xf32> + %385 = cmpi "slt", %arg5, %c0 : index + %386 = subi %c-1, %arg5 : index + %387 = select %385, %386, %arg5 : index + %388 = divi_signed %387, %c16 : index + %389 = subi %c-1, %388 : index + %390 = select %385, %389, %388 : index + %391 = remi_signed %390, %c16 : index + %392 = cmpi "slt", %391, %c0 : index + %393 = addi %391, %c16 : index + %394 = select %392, %393, %391 : index + %395 = addi %arg7, %arg9 : index + %396 = remi_signed %395, %c6 : index + %397 = cmpi "slt", %396, %c0 : index + %398 = addi %396, %c6 : index + %399 = select %397, %398, %396 : index + %400 = remi_signed %arg5, %c16 : index + %401 = cmpi "slt", %400, %c0 : index + %402 = addi %400, %c16 : index + %403 = select %401, %402, %400 : index + %404 = cmpi "slt", %403, %c0 : index + %405 = subi %c-1, %403 : index + %406 = select %404, %405, %403 : index + %407 = divi_signed %406, %c8 : index + %408 = subi %c-1, %407 : index + %409 = select %404, %408, %407 : index + %410 = remi_signed %409, %c2 : index + %411 = cmpi "slt", %410, %c0 : index + %412 = addi %410, %c2 : index + %413 = select %411, %412, %410 : index + %414 = load %2[%394, %399, %413] : memref<16x6x2xvector<8xf32>> + %415 = vector.extractelement %414[%c3_i64 : i64] : vector<8xf32> + %416 = cmpi "slt", %arg5, %c0 : index + %417 = subi %c-1, %arg5 : index + %418 = select %416, %417, %arg5 : index + %419 = divi_signed %418, %c16 : index + %420 = subi %c-1, %419 : index + %421 = select %416, %420, %419 : index + %422 = remi_signed %421, %c16 : index + %423 = cmpi "slt", %422, %c0 : index + %424 = addi %422, %c16 : index + %425 = select %423, %424, %422 : index + %426 = addi %arg7, %arg9 : index + %427 = remi_signed %426, %c6 : index + %428 = cmpi "slt", %427, %c0 : index + %429 = addi %427, %c6 : index + %430 = select %428, %429, %427 : index + %431 = remi_signed %arg5, %c16 : index + %432 = cmpi "slt", %431, %c0 : index + %433 = addi %431, %c16 : index + %434 = select %432, %433, %431 : index + %435 = cmpi "slt", %434, %c0 : index + %436 = subi %c-1, %434 : index + %437 = select %435, %436, %434 : index + %438 = divi_signed %437, %c8 : index + %439 = subi %c-1, %438 : index + %440 = select %435, %439, %438 : index + %441 = remi_signed %440, %c2 : index + %442 = cmpi "slt", %441, %c0 : index + %443 = addi %441, %c2 : index + %444 = select %442, %443, %441 : 
index + %445 = load %2[%425, %430, %444] : memref<16x6x2xvector<8xf32>> + %446 = vector.extractelement %445[%c4_i64 : i64] : vector<8xf32> + %447 = cmpi "slt", %arg5, %c0 : index + %448 = subi %c-1, %arg5 : index + %449 = select %447, %448, %arg5 : index + %450 = divi_signed %449, %c16 : index + %451 = subi %c-1, %450 : index + %452 = select %447, %451, %450 : index + %453 = remi_signed %452, %c16 : index + %454 = cmpi "slt", %453, %c0 : index + %455 = addi %453, %c16 : index + %456 = select %454, %455, %453 : index + %457 = addi %arg7, %arg9 : index + %458 = remi_signed %457, %c6 : index + %459 = cmpi "slt", %458, %c0 : index + %460 = addi %458, %c6 : index + %461 = select %459, %460, %458 : index + %462 = remi_signed %arg5, %c16 : index + %463 = cmpi "slt", %462, %c0 : index + %464 = addi %462, %c16 : index + %465 = select %463, %464, %462 : index + %466 = cmpi "slt", %465, %c0 : index + %467 = subi %c-1, %465 : index + %468 = select %466, %467, %465 : index + %469 = divi_signed %468, %c8 : index + %470 = subi %c-1, %469 : index + %471 = select %466, %470, %469 : index + %472 = remi_signed %471, %c2 : index + %473 = cmpi "slt", %472, %c0 : index + %474 = addi %472, %c2 : index + %475 = select %473, %474, %472 : index + %476 = load %2[%456, %461, %475] : memref<16x6x2xvector<8xf32>> + %477 = vector.extractelement %476[%c5_i64 : i64] : vector<8xf32> + %478 = cmpi "slt", %arg5, %c0 : index + %479 = subi %c-1, %arg5 : index + %480 = select %478, %479, %arg5 : index + %481 = divi_signed %480, %c16 : index + %482 = subi %c-1, %481 : index + %483 = select %478, %482, %481 : index + %484 = remi_signed %483, %c16 : index + %485 = cmpi "slt", %484, %c0 : index + %486 = addi %484, %c16 : index + %487 = select %485, %486, %484 : index + %488 = addi %arg7, %arg9 : index + %489 = remi_signed %488, %c6 : index + %490 = cmpi "slt", %489, %c0 : index + %491 = addi %489, %c6 : index + %492 = select %490, %491, %489 : index + %493 = remi_signed %arg5, %c16 : index + %494 = cmpi "slt", %493, %c0 : index + %495 = addi %493, %c16 : index + %496 = select %494, %495, %493 : index + %497 = cmpi "slt", %496, %c0 : index + %498 = subi %c-1, %496 : index + %499 = select %497, %498, %496 : index + %500 = divi_signed %499, %c8 : index + %501 = subi %c-1, %500 : index + %502 = select %497, %501, %500 : index + %503 = remi_signed %502, %c2 : index + %504 = cmpi "slt", %503, %c0 : index + %505 = addi %503, %c2 : index + %506 = select %504, %505, %503 : index + %507 = load %2[%487, %492, %506] : memref<16x6x2xvector<8xf32>> + %508 = vector.extractelement %507[%c6_i64 : i64] : vector<8xf32> + %509 = cmpi "slt", %arg5, %c0 : index + %510 = subi %c-1, %arg5 : index + %511 = select %509, %510, %arg5 : index + %512 = divi_signed %511, %c16 : index + %513 = subi %c-1, %512 : index + %514 = select %509, %513, %512 : index + %515 = remi_signed %514, %c16 : index + %516 = cmpi "slt", %515, %c0 : index + %517 = addi %515, %c16 : index + %518 = select %516, %517, %515 : index + %519 = addi %arg7, %arg9 : index + %520 = remi_signed %519, %c6 : index + %521 = cmpi "slt", %520, %c0 : index + %522 = addi %520, %c6 : index + %523 = select %521, %522, %520 : index + %524 = remi_signed %arg5, %c16 : index + %525 = cmpi "slt", %524, %c0 : index + %526 = addi %524, %c16 : index + %527 = select %525, %526, %524 : index + %528 = cmpi "slt", %527, %c0 : index + %529 = subi %c-1, %527 : index + %530 = select %528, %529, %527 : index + %531 = divi_signed %530, %c8 : index + %532 = subi %c-1, %531 : index + %533 = select %528, %532, %531 : index 
+ %534 = remi_signed %533, %c2 : index + %535 = cmpi "slt", %534, %c0 : index + %536 = addi %534, %c2 : index + %537 = select %535, %536, %534 : index + %538 = load %2[%518, %523, %537] : memref<16x6x2xvector<8xf32>> + %539 = vector.extractelement %538[%c7_i64 : i64] : vector<8xf32> + %540 = addf %322, %284 {RelaxedPrecision} : f32 + %541 = addf %353, %285 {RelaxedPrecision} : f32 + %542 = addf %384, %286 {RelaxedPrecision} : f32 + %543 = addf %415, %287 {RelaxedPrecision} : f32 + %544 = addf %446, %288 {RelaxedPrecision} : f32 + %545 = addf %477, %289 {RelaxedPrecision} : f32 + %546 = addf %508, %290 {RelaxedPrecision} : f32 + %547 = addf %539, %291 {RelaxedPrecision} : f32 + %548 = cmpi "slt", %arg5, %c0 : index + %549 = subi %c-1, %arg5 : index + %550 = select %548, %549, %arg5 : index + %551 = divi_signed %550, %c16 : index + %552 = subi %c-1, %551 : index + %553 = select %548, %552, %551 : index + %554 = remi_signed %553, %c16 : index + %555 = cmpi "slt", %554, %c0 : index + %556 = addi %554, %c16 : index + %557 = select %555, %556, %554 : index + %558 = addi %arg7, %arg9 : index + %559 = remi_signed %558, %c6 : index + %560 = cmpi "slt", %559, %c0 : index + %561 = addi %559, %c6 : index + %562 = select %560, %561, %559 : index + %563 = remi_signed %arg5, %c16 : index + %564 = cmpi "slt", %563, %c0 : index + %565 = addi %563, %c16 : index + %566 = select %564, %565, %563 : index + %567 = cmpi "slt", %566, %c0 : index + %568 = subi %c-1, %566 : index + %569 = select %567, %568, %566 : index + %570 = divi_signed %569, %c8 : index + %571 = subi %c-1, %570 : index + %572 = select %567, %571, %570 : index + %573 = remi_signed %572, %c2 : index + %574 = cmpi "slt", %573, %c0 : index + %575 = addi %573, %c2 : index + %576 = select %574, %575, %573 : index + %577 = load %2[%557, %562, %576] : memref<16x6x2xvector<8xf32>> + %578 = vector.insertelement %540, %577[%c0_i64 : i64] : vector<8xf32> + %579 = cmpi "slt", %arg5, %c0 : index + %580 = subi %c-1, %arg5 : index + %581 = select %579, %580, %arg5 : index + %582 = divi_signed %581, %c16 : index + %583 = subi %c-1, %582 : index + %584 = select %579, %583, %582 : index + %585 = remi_signed %584, %c16 : index + %586 = cmpi "slt", %585, %c0 : index + %587 = addi %585, %c16 : index + %588 = select %586, %587, %585 : index + %589 = addi %arg7, %arg9 : index + %590 = remi_signed %589, %c6 : index + %591 = cmpi "slt", %590, %c0 : index + %592 = addi %590, %c6 : index + %593 = select %591, %592, %590 : index + %594 = remi_signed %arg5, %c16 : index + %595 = cmpi "slt", %594, %c0 : index + %596 = addi %594, %c16 : index + %597 = select %595, %596, %594 : index + %598 = cmpi "slt", %597, %c0 : index + %599 = subi %c-1, %597 : index + %600 = select %598, %599, %597 : index + %601 = divi_signed %600, %c8 : index + %602 = subi %c-1, %601 : index + %603 = select %598, %602, %601 : index + %604 = remi_signed %603, %c2 : index + %605 = cmpi "slt", %604, %c0 : index + %606 = addi %604, %c2 : index + %607 = select %605, %606, %604 : index + store %578, %2[%588, %593, %607] : memref<16x6x2xvector<8xf32>> + %608 = cmpi "slt", %arg5, %c0 : index + %609 = subi %c-1, %arg5 : index + %610 = select %608, %609, %arg5 : index + %611 = divi_signed %610, %c16 : index + %612 = subi %c-1, %611 : index + %613 = select %608, %612, %611 : index + %614 = remi_signed %613, %c16 : index + %615 = cmpi "slt", %614, %c0 : index + %616 = addi %614, %c16 : index + %617 = select %615, %616, %614 : index + %618 = addi %arg7, %arg9 : index + %619 = remi_signed %618, %c6 : index + %620 = 
cmpi "slt", %619, %c0 : index + %621 = addi %619, %c6 : index + %622 = select %620, %621, %619 : index + %623 = remi_signed %arg5, %c16 : index + %624 = cmpi "slt", %623, %c0 : index + %625 = addi %623, %c16 : index + %626 = select %624, %625, %623 : index + %627 = cmpi "slt", %626, %c0 : index + %628 = subi %c-1, %626 : index + %629 = select %627, %628, %626 : index + %630 = divi_signed %629, %c8 : index + %631 = subi %c-1, %630 : index + %632 = select %627, %631, %630 : index + %633 = remi_signed %632, %c2 : index + %634 = cmpi "slt", %633, %c0 : index + %635 = addi %633, %c2 : index + %636 = select %634, %635, %633 : index + %637 = load %2[%617, %622, %636] : memref<16x6x2xvector<8xf32>> + %638 = vector.insertelement %541, %637[%c1_i64 : i64] : vector<8xf32> + %639 = cmpi "slt", %arg5, %c0 : index + %640 = subi %c-1, %arg5 : index + %641 = select %639, %640, %arg5 : index + %642 = divi_signed %641, %c16 : index + %643 = subi %c-1, %642 : index + %644 = select %639, %643, %642 : index + %645 = remi_signed %644, %c16 : index + %646 = cmpi "slt", %645, %c0 : index + %647 = addi %645, %c16 : index + %648 = select %646, %647, %645 : index + %649 = addi %arg7, %arg9 : index + %650 = remi_signed %649, %c6 : index + %651 = cmpi "slt", %650, %c0 : index + %652 = addi %650, %c6 : index + %653 = select %651, %652, %650 : index + %654 = remi_signed %arg5, %c16 : index + %655 = cmpi "slt", %654, %c0 : index + %656 = addi %654, %c16 : index + %657 = select %655, %656, %654 : index + %658 = cmpi "slt", %657, %c0 : index + %659 = subi %c-1, %657 : index + %660 = select %658, %659, %657 : index + %661 = divi_signed %660, %c8 : index + %662 = subi %c-1, %661 : index + %663 = select %658, %662, %661 : index + %664 = remi_signed %663, %c2 : index + %665 = cmpi "slt", %664, %c0 : index + %666 = addi %664, %c2 : index + %667 = select %665, %666, %664 : index + store %638, %2[%648, %653, %667] : memref<16x6x2xvector<8xf32>> + %668 = cmpi "slt", %arg5, %c0 : index + %669 = subi %c-1, %arg5 : index + %670 = select %668, %669, %arg5 : index + %671 = divi_signed %670, %c16 : index + %672 = subi %c-1, %671 : index + %673 = select %668, %672, %671 : index + %674 = remi_signed %673, %c16 : index + %675 = cmpi "slt", %674, %c0 : index + %676 = addi %674, %c16 : index + %677 = select %675, %676, %674 : index + %678 = addi %arg7, %arg9 : index + %679 = remi_signed %678, %c6 : index + %680 = cmpi "slt", %679, %c0 : index + %681 = addi %679, %c6 : index + %682 = select %680, %681, %679 : index + %683 = remi_signed %arg5, %c16 : index + %684 = cmpi "slt", %683, %c0 : index + %685 = addi %683, %c16 : index + %686 = select %684, %685, %683 : index + %687 = cmpi "slt", %686, %c0 : index + %688 = subi %c-1, %686 : index + %689 = select %687, %688, %686 : index + %690 = divi_signed %689, %c8 : index + %691 = subi %c-1, %690 : index + %692 = select %687, %691, %690 : index + %693 = remi_signed %692, %c2 : index + %694 = cmpi "slt", %693, %c0 : index + %695 = addi %693, %c2 : index + %696 = select %694, %695, %693 : index + %697 = load %2[%677, %682, %696] : memref<16x6x2xvector<8xf32>> + %698 = vector.insertelement %542, %697[%c2_i64 : i64] : vector<8xf32> + %699 = cmpi "slt", %arg5, %c0 : index + %700 = subi %c-1, %arg5 : index + %701 = select %699, %700, %arg5 : index + %702 = divi_signed %701, %c16 : index + %703 = subi %c-1, %702 : index + %704 = select %699, %703, %702 : index + %705 = remi_signed %704, %c16 : index + %706 = cmpi "slt", %705, %c0 : index + %707 = addi %705, %c16 : index + %708 = select %706, %707, %705 : 
index + %709 = addi %arg7, %arg9 : index + %710 = remi_signed %709, %c6 : index + %711 = cmpi "slt", %710, %c0 : index + %712 = addi %710, %c6 : index + %713 = select %711, %712, %710 : index + %714 = remi_signed %arg5, %c16 : index + %715 = cmpi "slt", %714, %c0 : index + %716 = addi %714, %c16 : index + %717 = select %715, %716, %714 : index + %718 = cmpi "slt", %717, %c0 : index + %719 = subi %c-1, %717 : index + %720 = select %718, %719, %717 : index + %721 = divi_signed %720, %c8 : index + %722 = subi %c-1, %721 : index + %723 = select %718, %722, %721 : index + %724 = remi_signed %723, %c2 : index + %725 = cmpi "slt", %724, %c0 : index + %726 = addi %724, %c2 : index + %727 = select %725, %726, %724 : index + store %698, %2[%708, %713, %727] : memref<16x6x2xvector<8xf32>> + %728 = cmpi "slt", %arg5, %c0 : index + %729 = subi %c-1, %arg5 : index + %730 = select %728, %729, %arg5 : index + %731 = divi_signed %730, %c16 : index + %732 = subi %c-1, %731 : index + %733 = select %728, %732, %731 : index + %734 = remi_signed %733, %c16 : index + %735 = cmpi "slt", %734, %c0 : index + %736 = addi %734, %c16 : index + %737 = select %735, %736, %734 : index + %738 = addi %arg7, %arg9 : index + %739 = remi_signed %738, %c6 : index + %740 = cmpi "slt", %739, %c0 : index + %741 = addi %739, %c6 : index + %742 = select %740, %741, %739 : index + %743 = remi_signed %arg5, %c16 : index + %744 = cmpi "slt", %743, %c0 : index + %745 = addi %743, %c16 : index + %746 = select %744, %745, %743 : index + %747 = cmpi "slt", %746, %c0 : index + %748 = subi %c-1, %746 : index + %749 = select %747, %748, %746 : index + %750 = divi_signed %749, %c8 : index + %751 = subi %c-1, %750 : index + %752 = select %747, %751, %750 : index + %753 = remi_signed %752, %c2 : index + %754 = cmpi "slt", %753, %c0 : index + %755 = addi %753, %c2 : index + %756 = select %754, %755, %753 : index + %757 = load %2[%737, %742, %756] : memref<16x6x2xvector<8xf32>> + %758 = vector.insertelement %543, %757[%c3_i64 : i64] : vector<8xf32> + %759 = cmpi "slt", %arg5, %c0 : index + %760 = subi %c-1, %arg5 : index + %761 = select %759, %760, %arg5 : index + %762 = divi_signed %761, %c16 : index + %763 = subi %c-1, %762 : index + %764 = select %759, %763, %762 : index + %765 = remi_signed %764, %c16 : index + %766 = cmpi "slt", %765, %c0 : index + %767 = addi %765, %c16 : index + %768 = select %766, %767, %765 : index + %769 = addi %arg7, %arg9 : index + %770 = remi_signed %769, %c6 : index + %771 = cmpi "slt", %770, %c0 : index + %772 = addi %770, %c6 : index + %773 = select %771, %772, %770 : index + %774 = remi_signed %arg5, %c16 : index + %775 = cmpi "slt", %774, %c0 : index + %776 = addi %774, %c16 : index + %777 = select %775, %776, %774 : index + %778 = cmpi "slt", %777, %c0 : index + %779 = subi %c-1, %777 : index + %780 = select %778, %779, %777 : index + %781 = divi_signed %780, %c8 : index + %782 = subi %c-1, %781 : index + %783 = select %778, %782, %781 : index + %784 = remi_signed %783, %c2 : index + %785 = cmpi "slt", %784, %c0 : index + %786 = addi %784, %c2 : index + %787 = select %785, %786, %784 : index + store %758, %2[%768, %773, %787] : memref<16x6x2xvector<8xf32>> + %788 = cmpi "slt", %arg5, %c0 : index + %789 = subi %c-1, %arg5 : index + %790 = select %788, %789, %arg5 : index + %791 = divi_signed %790, %c16 : index + %792 = subi %c-1, %791 : index + %793 = select %788, %792, %791 : index + %794 = remi_signed %793, %c16 : index + %795 = cmpi "slt", %794, %c0 : index + %796 = addi %794, %c16 : index + %797 = select 
%795, %796, %794 : index + %798 = addi %arg7, %arg9 : index + %799 = remi_signed %798, %c6 : index + %800 = cmpi "slt", %799, %c0 : index + %801 = addi %799, %c6 : index + %802 = select %800, %801, %799 : index + %803 = remi_signed %arg5, %c16 : index + %804 = cmpi "slt", %803, %c0 : index + %805 = addi %803, %c16 : index + %806 = select %804, %805, %803 : index + %807 = cmpi "slt", %806, %c0 : index + %808 = subi %c-1, %806 : index + %809 = select %807, %808, %806 : index + %810 = divi_signed %809, %c8 : index + %811 = subi %c-1, %810 : index + %812 = select %807, %811, %810 : index + %813 = remi_signed %812, %c2 : index + %814 = cmpi "slt", %813, %c0 : index + %815 = addi %813, %c2 : index + %816 = select %814, %815, %813 : index + %817 = load %2[%797, %802, %816] : memref<16x6x2xvector<8xf32>> + %818 = vector.insertelement %544, %817[%c4_i64 : i64] : vector<8xf32> + %819 = cmpi "slt", %arg5, %c0 : index + %820 = subi %c-1, %arg5 : index + %821 = select %819, %820, %arg5 : index + %822 = divi_signed %821, %c16 : index + %823 = subi %c-1, %822 : index + %824 = select %819, %823, %822 : index + %825 = remi_signed %824, %c16 : index + %826 = cmpi "slt", %825, %c0 : index + %827 = addi %825, %c16 : index + %828 = select %826, %827, %825 : index + %829 = addi %arg7, %arg9 : index + %830 = remi_signed %829, %c6 : index + %831 = cmpi "slt", %830, %c0 : index + %832 = addi %830, %c6 : index + %833 = select %831, %832, %830 : index + %834 = remi_signed %arg5, %c16 : index + %835 = cmpi "slt", %834, %c0 : index + %836 = addi %834, %c16 : index + %837 = select %835, %836, %834 : index + %838 = cmpi "slt", %837, %c0 : index + %839 = subi %c-1, %837 : index + %840 = select %838, %839, %837 : index + %841 = divi_signed %840, %c8 : index + %842 = subi %c-1, %841 : index + %843 = select %838, %842, %841 : index + %844 = remi_signed %843, %c2 : index + %845 = cmpi "slt", %844, %c0 : index + %846 = addi %844, %c2 : index + %847 = select %845, %846, %844 : index + store %818, %2[%828, %833, %847] : memref<16x6x2xvector<8xf32>> + %848 = cmpi "slt", %arg5, %c0 : index + %849 = subi %c-1, %arg5 : index + %850 = select %848, %849, %arg5 : index + %851 = divi_signed %850, %c16 : index + %852 = subi %c-1, %851 : index + %853 = select %848, %852, %851 : index + %854 = remi_signed %853, %c16 : index + %855 = cmpi "slt", %854, %c0 : index + %856 = addi %854, %c16 : index + %857 = select %855, %856, %854 : index + %858 = addi %arg7, %arg9 : index + %859 = remi_signed %858, %c6 : index + %860 = cmpi "slt", %859, %c0 : index + %861 = addi %859, %c6 : index + %862 = select %860, %861, %859 : index + %863 = remi_signed %arg5, %c16 : index + %864 = cmpi "slt", %863, %c0 : index + %865 = addi %863, %c16 : index + %866 = select %864, %865, %863 : index + %867 = cmpi "slt", %866, %c0 : index + %868 = subi %c-1, %866 : index + %869 = select %867, %868, %866 : index + %870 = divi_signed %869, %c8 : index + %871 = subi %c-1, %870 : index + %872 = select %867, %871, %870 : index + %873 = remi_signed %872, %c2 : index + %874 = cmpi "slt", %873, %c0 : index + %875 = addi %873, %c2 : index + %876 = select %874, %875, %873 : index + %877 = load %2[%857, %862, %876] : memref<16x6x2xvector<8xf32>> + %878 = vector.insertelement %545, %877[%c5_i64 : i64] : vector<8xf32> + %879 = cmpi "slt", %arg5, %c0 : index + %880 = subi %c-1, %arg5 : index + %881 = select %879, %880, %arg5 : index + %882 = divi_signed %881, %c16 : index + %883 = subi %c-1, %882 : index + %884 = select %879, %883, %882 : index + %885 = remi_signed %884, %c16 : index 
+ %886 = cmpi "slt", %885, %c0 : index + %887 = addi %885, %c16 : index + %888 = select %886, %887, %885 : index + %889 = addi %arg7, %arg9 : index + %890 = remi_signed %889, %c6 : index + %891 = cmpi "slt", %890, %c0 : index + %892 = addi %890, %c6 : index + %893 = select %891, %892, %890 : index + %894 = remi_signed %arg5, %c16 : index + %895 = cmpi "slt", %894, %c0 : index + %896 = addi %894, %c16 : index + %897 = select %895, %896, %894 : index + %898 = cmpi "slt", %897, %c0 : index + %899 = subi %c-1, %897 : index + %900 = select %898, %899, %897 : index + %901 = divi_signed %900, %c8 : index + %902 = subi %c-1, %901 : index + %903 = select %898, %902, %901 : index + %904 = remi_signed %903, %c2 : index + %905 = cmpi "slt", %904, %c0 : index + %906 = addi %904, %c2 : index + %907 = select %905, %906, %904 : index + store %878, %2[%888, %893, %907] : memref<16x6x2xvector<8xf32>> + %908 = cmpi "slt", %arg5, %c0 : index + %909 = subi %c-1, %arg5 : index + %910 = select %908, %909, %arg5 : index + %911 = divi_signed %910, %c16 : index + %912 = subi %c-1, %911 : index + %913 = select %908, %912, %911 : index + %914 = remi_signed %913, %c16 : index + %915 = cmpi "slt", %914, %c0 : index + %916 = addi %914, %c16 : index + %917 = select %915, %916, %914 : index + %918 = addi %arg7, %arg9 : index + %919 = remi_signed %918, %c6 : index + %920 = cmpi "slt", %919, %c0 : index + %921 = addi %919, %c6 : index + %922 = select %920, %921, %919 : index + %923 = remi_signed %arg5, %c16 : index + %924 = cmpi "slt", %923, %c0 : index + %925 = addi %923, %c16 : index + %926 = select %924, %925, %923 : index + %927 = cmpi "slt", %926, %c0 : index + %928 = subi %c-1, %926 : index + %929 = select %927, %928, %926 : index + %930 = divi_signed %929, %c8 : index + %931 = subi %c-1, %930 : index + %932 = select %927, %931, %930 : index + %933 = remi_signed %932, %c2 : index + %934 = cmpi "slt", %933, %c0 : index + %935 = addi %933, %c2 : index + %936 = select %934, %935, %933 : index + %937 = load %2[%917, %922, %936] : memref<16x6x2xvector<8xf32>> + %938 = vector.insertelement %546, %937[%c6_i64 : i64] : vector<8xf32> + %939 = cmpi "slt", %arg5, %c0 : index + %940 = subi %c-1, %arg5 : index + %941 = select %939, %940, %arg5 : index + %942 = divi_signed %941, %c16 : index + %943 = subi %c-1, %942 : index + %944 = select %939, %943, %942 : index + %945 = remi_signed %944, %c16 : index + %946 = cmpi "slt", %945, %c0 : index + %947 = addi %945, %c16 : index + %948 = select %946, %947, %945 : index + %949 = addi %arg7, %arg9 : index + %950 = remi_signed %949, %c6 : index + %951 = cmpi "slt", %950, %c0 : index + %952 = addi %950, %c6 : index + %953 = select %951, %952, %950 : index + %954 = remi_signed %arg5, %c16 : index + %955 = cmpi "slt", %954, %c0 : index + %956 = addi %954, %c16 : index + %957 = select %955, %956, %954 : index + %958 = cmpi "slt", %957, %c0 : index + %959 = subi %c-1, %957 : index + %960 = select %958, %959, %957 : index + %961 = divi_signed %960, %c8 : index + %962 = subi %c-1, %961 : index + %963 = select %958, %962, %961 : index + %964 = remi_signed %963, %c2 : index + %965 = cmpi "slt", %964, %c0 : index + %966 = addi %964, %c2 : index + %967 = select %965, %966, %964 : index + store %938, %2[%948, %953, %967] : memref<16x6x2xvector<8xf32>> + %968 = cmpi "slt", %arg5, %c0 : index + %969 = subi %c-1, %arg5 : index + %970 = select %968, %969, %arg5 : index + %971 = divi_signed %970, %c16 : index + %972 = subi %c-1, %971 : index + %973 = select %968, %972, %971 : index + %974 = remi_signed 
%973, %c16 : index + %975 = cmpi "slt", %974, %c0 : index + %976 = addi %974, %c16 : index + %977 = select %975, %976, %974 : index + %978 = addi %arg7, %arg9 : index + %979 = remi_signed %978, %c6 : index + %980 = cmpi "slt", %979, %c0 : index + %981 = addi %979, %c6 : index + %982 = select %980, %981, %979 : index + %983 = remi_signed %arg5, %c16 : index + %984 = cmpi "slt", %983, %c0 : index + %985 = addi %983, %c16 : index + %986 = select %984, %985, %983 : index + %987 = cmpi "slt", %986, %c0 : index + %988 = subi %c-1, %986 : index + %989 = select %987, %988, %986 : index + %990 = divi_signed %989, %c8 : index + %991 = subi %c-1, %990 : index + %992 = select %987, %991, %990 : index + %993 = remi_signed %992, %c2 : index + %994 = cmpi "slt", %993, %c0 : index + %995 = addi %993, %c2 : index + %996 = select %994, %995, %993 : index + %997 = load %2[%977, %982, %996] : memref<16x6x2xvector<8xf32>> + %998 = vector.insertelement %547, %997[%c7_i64 : i64] : vector<8xf32> + %999 = cmpi "slt", %arg5, %c0 : index + %1000 = subi %c-1, %arg5 : index + %1001 = select %999, %1000, %arg5 : index + %1002 = divi_signed %1001, %c16 : index + %1003 = subi %c-1, %1002 : index + %1004 = select %999, %1003, %1002 : index + %1005 = remi_signed %1004, %c16 : index + %1006 = cmpi "slt", %1005, %c0 : index + %1007 = addi %1005, %c16 : index + %1008 = select %1006, %1007, %1005 : index + %1009 = addi %arg7, %arg9 : index + %1010 = remi_signed %1009, %c6 : index + %1011 = cmpi "slt", %1010, %c0 : index + %1012 = addi %1010, %c6 : index + %1013 = select %1011, %1012, %1010 : index + %1014 = remi_signed %arg5, %c16 : index + %1015 = cmpi "slt", %1014, %c0 : index + %1016 = addi %1014, %c16 : index + %1017 = select %1015, %1016, %1014 : index + %1018 = cmpi "slt", %1017, %c0 : index + %1019 = subi %c-1, %1017 : index + %1020 = select %1018, %1019, %1017 : index + %1021 = divi_signed %1020, %c8 : index + %1022 = subi %c-1, %1021 : index + %1023 = select %1018, %1022, %1021 : index + %1024 = remi_signed %1023, %c2 : index + %1025 = cmpi "slt", %1024, %c0 : index + %1026 = addi %1024, %c2 : index + %1027 = select %1025, %1026, %1024 : index + store %998, %2[%1008, %1013, %1027] : memref<16x6x2xvector<8xf32>> + %1028 = cmpi "slt", %arg5, %c0 : index + %1029 = subi %c-1, %arg5 : index + %1030 = select %1028, %1029, %arg5 : index + %1031 = divi_signed %1030, %c16 : index + %1032 = subi %c-1, %1031 : index + %1033 = select %1028, %1032, %1031 : index + %1034 = remi_signed %1033, %c16 : index + %1035 = cmpi "slt", %1034, %c0 : index + %1036 = addi %1034, %c16 : index + %1037 = select %1035, %1036, %1034 : index + %1038 = addi %arg7, %arg9 : index + %1039 = remi_signed %1038, %c6 : index + %1040 = cmpi "slt", %1039, %c0 : index + %1041 = addi %1039, %c6 : index + %1042 = select %1040, %1041, %1039 : index + %1043 = remi_signed %arg5, %c16 : index + %1044 = cmpi "slt", %1043, %c0 : index + %1045 = addi %1043, %c16 : index + %1046 = select %1044, %1045, %1043 : index + %1047 = cmpi "slt", %1046, %c0 : index + %1048 = subi %c-1, %1046 : index + %1049 = select %1047, %1048, %1046 : index + %1050 = divi_signed %1049, %c8 : index + %1051 = subi %c-1, %1050 : index + %1052 = select %1047, %1051, %1050 : index + %1053 = remi_signed %1052, %c2 : index + %1054 = cmpi "slt", %1053, %c0 : index + %1055 = addi %1053, %c2 : index + %1056 = select %1054, %1055, %1053 : index + %1057 = load %2[%1037, %1042, %1056] : memref<16x6x2xvector<8xf32>> + %1058 = vector.insertelement %540, %1057[%c0_i64 : i64] : vector<8xf32> + %1059 = cmpi 
"slt", %arg5, %c0 : index + %1060 = subi %c-1, %arg5 : index + %1061 = select %1059, %1060, %arg5 : index + %1062 = divi_signed %1061, %c16 : index + %1063 = subi %c-1, %1062 : index + %1064 = select %1059, %1063, %1062 : index + %1065 = remi_signed %1064, %c16 : index + %1066 = cmpi "slt", %1065, %c0 : index + %1067 = addi %1065, %c16 : index + %1068 = select %1066, %1067, %1065 : index + %1069 = addi %arg7, %arg9 : index + %1070 = remi_signed %1069, %c6 : index + %1071 = cmpi "slt", %1070, %c0 : index + %1072 = addi %1070, %c6 : index + %1073 = select %1071, %1072, %1070 : index + %1074 = remi_signed %arg5, %c16 : index + %1075 = cmpi "slt", %1074, %c0 : index + %1076 = addi %1074, %c16 : index + %1077 = select %1075, %1076, %1074 : index + %1078 = cmpi "slt", %1077, %c0 : index + %1079 = subi %c-1, %1077 : index + %1080 = select %1078, %1079, %1077 : index + %1081 = divi_signed %1080, %c8 : index + %1082 = subi %c-1, %1081 : index + %1083 = select %1078, %1082, %1081 : index + %1084 = remi_signed %1083, %c2 : index + %1085 = cmpi "slt", %1084, %c0 : index + %1086 = addi %1084, %c2 : index + %1087 = select %1085, %1086, %1084 : index + store %1058, %2[%1068, %1073, %1087] : memref<16x6x2xvector<8xf32>> + %1088 = cmpi "slt", %arg5, %c0 : index + %1089 = subi %c-1, %arg5 : index + %1090 = select %1088, %1089, %arg5 : index + %1091 = divi_signed %1090, %c16 : index + %1092 = subi %c-1, %1091 : index + %1093 = select %1088, %1092, %1091 : index + %1094 = remi_signed %1093, %c16 : index + %1095 = cmpi "slt", %1094, %c0 : index + %1096 = addi %1094, %c16 : index + %1097 = select %1095, %1096, %1094 : index + %1098 = addi %arg7, %arg9 : index + %1099 = remi_signed %1098, %c6 : index + %1100 = cmpi "slt", %1099, %c0 : index + %1101 = addi %1099, %c6 : index + %1102 = select %1100, %1101, %1099 : index + %1103 = remi_signed %arg5, %c16 : index + %1104 = cmpi "slt", %1103, %c0 : index + %1105 = addi %1103, %c16 : index + %1106 = select %1104, %1105, %1103 : index + %1107 = cmpi "slt", %1106, %c0 : index + %1108 = subi %c-1, %1106 : index + %1109 = select %1107, %1108, %1106 : index + %1110 = divi_signed %1109, %c8 : index + %1111 = subi %c-1, %1110 : index + %1112 = select %1107, %1111, %1110 : index + %1113 = remi_signed %1112, %c2 : index + %1114 = cmpi "slt", %1113, %c0 : index + %1115 = addi %1113, %c2 : index + %1116 = select %1114, %1115, %1113 : index + %1117 = load %2[%1097, %1102, %1116] : memref<16x6x2xvector<8xf32>> + %1118 = vector.insertelement %541, %1117[%c1_i64 : i64] : vector<8xf32> + %1119 = cmpi "slt", %arg5, %c0 : index + %1120 = subi %c-1, %arg5 : index + %1121 = select %1119, %1120, %arg5 : index + %1122 = divi_signed %1121, %c16 : index + %1123 = subi %c-1, %1122 : index + %1124 = select %1119, %1123, %1122 : index + %1125 = remi_signed %1124, %c16 : index + %1126 = cmpi "slt", %1125, %c0 : index + %1127 = addi %1125, %c16 : index + %1128 = select %1126, %1127, %1125 : index + %1129 = addi %arg7, %arg9 : index + %1130 = remi_signed %1129, %c6 : index + %1131 = cmpi "slt", %1130, %c0 : index + %1132 = addi %1130, %c6 : index + %1133 = select %1131, %1132, %1130 : index + %1134 = remi_signed %arg5, %c16 : index + %1135 = cmpi "slt", %1134, %c0 : index + %1136 = addi %1134, %c16 : index + %1137 = select %1135, %1136, %1134 : index + %1138 = cmpi "slt", %1137, %c0 : index + %1139 = subi %c-1, %1137 : index + %1140 = select %1138, %1139, %1137 : index + %1141 = divi_signed %1140, %c8 : index + %1142 = subi %c-1, %1141 : index + %1143 = select %1138, %1142, %1141 : index + %1144 = 
remi_signed %1143, %c2 : index + %1145 = cmpi "slt", %1144, %c0 : index + %1146 = addi %1144, %c2 : index + %1147 = select %1145, %1146, %1144 : index + store %1118, %2[%1128, %1133, %1147] : memref<16x6x2xvector<8xf32>> + %1148 = cmpi "slt", %arg5, %c0 : index + %1149 = subi %c-1, %arg5 : index + %1150 = select %1148, %1149, %arg5 : index + %1151 = divi_signed %1150, %c16 : index + %1152 = subi %c-1, %1151 : index + %1153 = select %1148, %1152, %1151 : index + %1154 = remi_signed %1153, %c16 : index + %1155 = cmpi "slt", %1154, %c0 : index + %1156 = addi %1154, %c16 : index + %1157 = select %1155, %1156, %1154 : index + %1158 = addi %arg7, %arg9 : index + %1159 = remi_signed %1158, %c6 : index + %1160 = cmpi "slt", %1159, %c0 : index + %1161 = addi %1159, %c6 : index + %1162 = select %1160, %1161, %1159 : index + %1163 = remi_signed %arg5, %c16 : index + %1164 = cmpi "slt", %1163, %c0 : index + %1165 = addi %1163, %c16 : index + %1166 = select %1164, %1165, %1163 : index + %1167 = cmpi "slt", %1166, %c0 : index + %1168 = subi %c-1, %1166 : index + %1169 = select %1167, %1168, %1166 : index + %1170 = divi_signed %1169, %c8 : index + %1171 = subi %c-1, %1170 : index + %1172 = select %1167, %1171, %1170 : index + %1173 = remi_signed %1172, %c2 : index + %1174 = cmpi "slt", %1173, %c0 : index + %1175 = addi %1173, %c2 : index + %1176 = select %1174, %1175, %1173 : index + %1177 = load %2[%1157, %1162, %1176] : memref<16x6x2xvector<8xf32>> + %1178 = vector.insertelement %542, %1177[%c2_i64 : i64] : vector<8xf32> + %1179 = cmpi "slt", %arg5, %c0 : index + %1180 = subi %c-1, %arg5 : index + %1181 = select %1179, %1180, %arg5 : index + %1182 = divi_signed %1181, %c16 : index + %1183 = subi %c-1, %1182 : index + %1184 = select %1179, %1183, %1182 : index + %1185 = remi_signed %1184, %c16 : index + %1186 = cmpi "slt", %1185, %c0 : index + %1187 = addi %1185, %c16 : index + %1188 = select %1186, %1187, %1185 : index + %1189 = addi %arg7, %arg9 : index + %1190 = remi_signed %1189, %c6 : index + %1191 = cmpi "slt", %1190, %c0 : index + %1192 = addi %1190, %c6 : index + %1193 = select %1191, %1192, %1190 : index + %1194 = remi_signed %arg5, %c16 : index + %1195 = cmpi "slt", %1194, %c0 : index + %1196 = addi %1194, %c16 : index + %1197 = select %1195, %1196, %1194 : index + %1198 = cmpi "slt", %1197, %c0 : index + %1199 = subi %c-1, %1197 : index + %1200 = select %1198, %1199, %1197 : index + %1201 = divi_signed %1200, %c8 : index + %1202 = subi %c-1, %1201 : index + %1203 = select %1198, %1202, %1201 : index + %1204 = remi_signed %1203, %c2 : index + %1205 = cmpi "slt", %1204, %c0 : index + %1206 = addi %1204, %c2 : index + %1207 = select %1205, %1206, %1204 : index + store %1178, %2[%1188, %1193, %1207] : memref<16x6x2xvector<8xf32>> + %1208 = cmpi "slt", %arg5, %c0 : index + %1209 = subi %c-1, %arg5 : index + %1210 = select %1208, %1209, %arg5 : index + %1211 = divi_signed %1210, %c16 : index + %1212 = subi %c-1, %1211 : index + %1213 = select %1208, %1212, %1211 : index + %1214 = remi_signed %1213, %c16 : index + %1215 = cmpi "slt", %1214, %c0 : index + %1216 = addi %1214, %c16 : index + %1217 = select %1215, %1216, %1214 : index + %1218 = addi %arg7, %arg9 : index + %1219 = remi_signed %1218, %c6 : index + %1220 = cmpi "slt", %1219, %c0 : index + %1221 = addi %1219, %c6 : index + %1222 = select %1220, %1221, %1219 : index + %1223 = remi_signed %arg5, %c16 : index + %1224 = cmpi "slt", %1223, %c0 : index + %1225 = addi %1223, %c16 : index + %1226 = select %1224, %1225, %1223 : index + %1227 = cmpi 
"slt", %1226, %c0 : index + %1228 = subi %c-1, %1226 : index + %1229 = select %1227, %1228, %1226 : index + %1230 = divi_signed %1229, %c8 : index + %1231 = subi %c-1, %1230 : index + %1232 = select %1227, %1231, %1230 : index + %1233 = remi_signed %1232, %c2 : index + %1234 = cmpi "slt", %1233, %c0 : index + %1235 = addi %1233, %c2 : index + %1236 = select %1234, %1235, %1233 : index + %1237 = load %2[%1217, %1222, %1236] : memref<16x6x2xvector<8xf32>> + %1238 = vector.insertelement %543, %1237[%c3_i64 : i64] : vector<8xf32> + %1239 = cmpi "slt", %arg5, %c0 : index + %1240 = subi %c-1, %arg5 : index + %1241 = select %1239, %1240, %arg5 : index + %1242 = divi_signed %1241, %c16 : index + %1243 = subi %c-1, %1242 : index + %1244 = select %1239, %1243, %1242 : index + %1245 = remi_signed %1244, %c16 : index + %1246 = cmpi "slt", %1245, %c0 : index + %1247 = addi %1245, %c16 : index + %1248 = select %1246, %1247, %1245 : index + %1249 = addi %arg7, %arg9 : index + %1250 = remi_signed %1249, %c6 : index + %1251 = cmpi "slt", %1250, %c0 : index + %1252 = addi %1250, %c6 : index + %1253 = select %1251, %1252, %1250 : index + %1254 = remi_signed %arg5, %c16 : index + %1255 = cmpi "slt", %1254, %c0 : index + %1256 = addi %1254, %c16 : index + %1257 = select %1255, %1256, %1254 : index + %1258 = cmpi "slt", %1257, %c0 : index + %1259 = subi %c-1, %1257 : index + %1260 = select %1258, %1259, %1257 : index + %1261 = divi_signed %1260, %c8 : index + %1262 = subi %c-1, %1261 : index + %1263 = select %1258, %1262, %1261 : index + %1264 = remi_signed %1263, %c2 : index + %1265 = cmpi "slt", %1264, %c0 : index + %1266 = addi %1264, %c2 : index + %1267 = select %1265, %1266, %1264 : index + store %1238, %2[%1248, %1253, %1267] : memref<16x6x2xvector<8xf32>> + %1268 = cmpi "slt", %arg5, %c0 : index + %1269 = subi %c-1, %arg5 : index + %1270 = select %1268, %1269, %arg5 : index + %1271 = divi_signed %1270, %c16 : index + %1272 = subi %c-1, %1271 : index + %1273 = select %1268, %1272, %1271 : index + %1274 = remi_signed %1273, %c16 : index + %1275 = cmpi "slt", %1274, %c0 : index + %1276 = addi %1274, %c16 : index + %1277 = select %1275, %1276, %1274 : index + %1278 = addi %arg7, %arg9 : index + %1279 = remi_signed %1278, %c6 : index + %1280 = cmpi "slt", %1279, %c0 : index + %1281 = addi %1279, %c6 : index + %1282 = select %1280, %1281, %1279 : index + %1283 = remi_signed %arg5, %c16 : index + %1284 = cmpi "slt", %1283, %c0 : index + %1285 = addi %1283, %c16 : index + %1286 = select %1284, %1285, %1283 : index + %1287 = cmpi "slt", %1286, %c0 : index + %1288 = subi %c-1, %1286 : index + %1289 = select %1287, %1288, %1286 : index + %1290 = divi_signed %1289, %c8 : index + %1291 = subi %c-1, %1290 : index + %1292 = select %1287, %1291, %1290 : index + %1293 = remi_signed %1292, %c2 : index + %1294 = cmpi "slt", %1293, %c0 : index + %1295 = addi %1293, %c2 : index + %1296 = select %1294, %1295, %1293 : index + %1297 = load %2[%1277, %1282, %1296] : memref<16x6x2xvector<8xf32>> + %1298 = vector.insertelement %544, %1297[%c4_i64 : i64] : vector<8xf32> + %1299 = cmpi "slt", %arg5, %c0 : index + %1300 = subi %c-1, %arg5 : index + %1301 = select %1299, %1300, %arg5 : index + %1302 = divi_signed %1301, %c16 : index + %1303 = subi %c-1, %1302 : index + %1304 = select %1299, %1303, %1302 : index + %1305 = remi_signed %1304, %c16 : index + %1306 = cmpi "slt", %1305, %c0 : index + %1307 = addi %1305, %c16 : index + %1308 = select %1306, %1307, %1305 : index + %1309 = addi %arg7, %arg9 : index + %1310 = remi_signed %1309, 
%c6 : index + %1311 = cmpi "slt", %1310, %c0 : index + %1312 = addi %1310, %c6 : index + %1313 = select %1311, %1312, %1310 : index + %1314 = remi_signed %arg5, %c16 : index + %1315 = cmpi "slt", %1314, %c0 : index + %1316 = addi %1314, %c16 : index + %1317 = select %1315, %1316, %1314 : index + %1318 = cmpi "slt", %1317, %c0 : index + %1319 = subi %c-1, %1317 : index + %1320 = select %1318, %1319, %1317 : index + %1321 = divi_signed %1320, %c8 : index + %1322 = subi %c-1, %1321 : index + %1323 = select %1318, %1322, %1321 : index + %1324 = remi_signed %1323, %c2 : index + %1325 = cmpi "slt", %1324, %c0 : index + %1326 = addi %1324, %c2 : index + %1327 = select %1325, %1326, %1324 : index + store %1298, %2[%1308, %1313, %1327] : memref<16x6x2xvector<8xf32>> + %1328 = cmpi "slt", %arg5, %c0 : index + %1329 = subi %c-1, %arg5 : index + %1330 = select %1328, %1329, %arg5 : index + %1331 = divi_signed %1330, %c16 : index + %1332 = subi %c-1, %1331 : index + %1333 = select %1328, %1332, %1331 : index + %1334 = remi_signed %1333, %c16 : index + %1335 = cmpi "slt", %1334, %c0 : index + %1336 = addi %1334, %c16 : index + %1337 = select %1335, %1336, %1334 : index + %1338 = addi %arg7, %arg9 : index + %1339 = remi_signed %1338, %c6 : index + %1340 = cmpi "slt", %1339, %c0 : index + %1341 = addi %1339, %c6 : index + %1342 = select %1340, %1341, %1339 : index + %1343 = remi_signed %arg5, %c16 : index + %1344 = cmpi "slt", %1343, %c0 : index + %1345 = addi %1343, %c16 : index + %1346 = select %1344, %1345, %1343 : index + %1347 = cmpi "slt", %1346, %c0 : index + %1348 = subi %c-1, %1346 : index + %1349 = select %1347, %1348, %1346 : index + %1350 = divi_signed %1349, %c8 : index + %1351 = subi %c-1, %1350 : index + %1352 = select %1347, %1351, %1350 : index + %1353 = remi_signed %1352, %c2 : index + %1354 = cmpi "slt", %1353, %c0 : index + %1355 = addi %1353, %c2 : index + %1356 = select %1354, %1355, %1353 : index + %1357 = load %2[%1337, %1342, %1356] : memref<16x6x2xvector<8xf32>> + %1358 = vector.insertelement %545, %1357[%c5_i64 : i64] : vector<8xf32> + %1359 = cmpi "slt", %arg5, %c0 : index + %1360 = subi %c-1, %arg5 : index + %1361 = select %1359, %1360, %arg5 : index + %1362 = divi_signed %1361, %c16 : index + %1363 = subi %c-1, %1362 : index + %1364 = select %1359, %1363, %1362 : index + %1365 = remi_signed %1364, %c16 : index + %1366 = cmpi "slt", %1365, %c0 : index + %1367 = addi %1365, %c16 : index + %1368 = select %1366, %1367, %1365 : index + %1369 = addi %arg7, %arg9 : index + %1370 = remi_signed %1369, %c6 : index + %1371 = cmpi "slt", %1370, %c0 : index + %1372 = addi %1370, %c6 : index + %1373 = select %1371, %1372, %1370 : index + %1374 = remi_signed %arg5, %c16 : index + %1375 = cmpi "slt", %1374, %c0 : index + %1376 = addi %1374, %c16 : index + %1377 = select %1375, %1376, %1374 : index + %1378 = cmpi "slt", %1377, %c0 : index + %1379 = subi %c-1, %1377 : index + %1380 = select %1378, %1379, %1377 : index + %1381 = divi_signed %1380, %c8 : index + %1382 = subi %c-1, %1381 : index + %1383 = select %1378, %1382, %1381 : index + %1384 = remi_signed %1383, %c2 : index + %1385 = cmpi "slt", %1384, %c0 : index + %1386 = addi %1384, %c2 : index + %1387 = select %1385, %1386, %1384 : index + store %1358, %2[%1368, %1373, %1387] : memref<16x6x2xvector<8xf32>> + %1388 = cmpi "slt", %arg5, %c0 : index + %1389 = subi %c-1, %arg5 : index + %1390 = select %1388, %1389, %arg5 : index + %1391 = divi_signed %1390, %c16 : index + %1392 = subi %c-1, %1391 : index + %1393 = select %1388, %1392, %1391 
: index + %1394 = remi_signed %1393, %c16 : index + %1395 = cmpi "slt", %1394, %c0 : index + %1396 = addi %1394, %c16 : index + %1397 = select %1395, %1396, %1394 : index + %1398 = addi %arg7, %arg9 : index + %1399 = remi_signed %1398, %c6 : index + %1400 = cmpi "slt", %1399, %c0 : index + %1401 = addi %1399, %c6 : index + %1402 = select %1400, %1401, %1399 : index + %1403 = remi_signed %arg5, %c16 : index + %1404 = cmpi "slt", %1403, %c0 : index + %1405 = addi %1403, %c16 : index + %1406 = select %1404, %1405, %1403 : index + %1407 = cmpi "slt", %1406, %c0 : index + %1408 = subi %c-1, %1406 : index + %1409 = select %1407, %1408, %1406 : index + %1410 = divi_signed %1409, %c8 : index + %1411 = subi %c-1, %1410 : index + %1412 = select %1407, %1411, %1410 : index + %1413 = remi_signed %1412, %c2 : index + %1414 = cmpi "slt", %1413, %c0 : index + %1415 = addi %1413, %c2 : index + %1416 = select %1414, %1415, %1413 : index + %1417 = load %2[%1397, %1402, %1416] : memref<16x6x2xvector<8xf32>> + %1418 = vector.insertelement %546, %1417[%c6_i64 : i64] : vector<8xf32> + %1419 = cmpi "slt", %arg5, %c0 : index + %1420 = subi %c-1, %arg5 : index + %1421 = select %1419, %1420, %arg5 : index + %1422 = divi_signed %1421, %c16 : index + %1423 = subi %c-1, %1422 : index + %1424 = select %1419, %1423, %1422 : index + %1425 = remi_signed %1424, %c16 : index + %1426 = cmpi "slt", %1425, %c0 : index + %1427 = addi %1425, %c16 : index + %1428 = select %1426, %1427, %1425 : index + %1429 = addi %arg7, %arg9 : index + %1430 = remi_signed %1429, %c6 : index + %1431 = cmpi "slt", %1430, %c0 : index + %1432 = addi %1430, %c6 : index + %1433 = select %1431, %1432, %1430 : index + %1434 = remi_signed %arg5, %c16 : index + %1435 = cmpi "slt", %1434, %c0 : index + %1436 = addi %1434, %c16 : index + %1437 = select %1435, %1436, %1434 : index + %1438 = cmpi "slt", %1437, %c0 : index + %1439 = subi %c-1, %1437 : index + %1440 = select %1438, %1439, %1437 : index + %1441 = divi_signed %1440, %c8 : index + %1442 = subi %c-1, %1441 : index + %1443 = select %1438, %1442, %1441 : index + %1444 = remi_signed %1443, %c2 : index + %1445 = cmpi "slt", %1444, %c0 : index + %1446 = addi %1444, %c2 : index + %1447 = select %1445, %1446, %1444 : index + store %1418, %2[%1428, %1433, %1447] : memref<16x6x2xvector<8xf32>> + %1448 = cmpi "slt", %arg5, %c0 : index + %1449 = subi %c-1, %arg5 : index + %1450 = select %1448, %1449, %arg5 : index + %1451 = divi_signed %1450, %c16 : index + %1452 = subi %c-1, %1451 : index + %1453 = select %1448, %1452, %1451 : index + %1454 = remi_signed %1453, %c16 : index + %1455 = cmpi "slt", %1454, %c0 : index + %1456 = addi %1454, %c16 : index + %1457 = select %1455, %1456, %1454 : index + %1458 = addi %arg7, %arg9 : index + %1459 = remi_signed %1458, %c6 : index + %1460 = cmpi "slt", %1459, %c0 : index + %1461 = addi %1459, %c6 : index + %1462 = select %1460, %1461, %1459 : index + %1463 = remi_signed %arg5, %c16 : index + %1464 = cmpi "slt", %1463, %c0 : index + %1465 = addi %1463, %c16 : index + %1466 = select %1464, %1465, %1463 : index + %1467 = cmpi "slt", %1466, %c0 : index + %1468 = subi %c-1, %1466 : index + %1469 = select %1467, %1468, %1466 : index + %1470 = divi_signed %1469, %c8 : index + %1471 = subi %c-1, %1470 : index + %1472 = select %1467, %1471, %1470 : index + %1473 = remi_signed %1472, %c2 : index + %1474 = cmpi "slt", %1473, %c0 : index + %1475 = addi %1473, %c2 : index + %1476 = select %1474, %1475, %1473 : index + %1477 = load %2[%1457, %1462, %1476] : 
memref<16x6x2xvector<8xf32>> + %1478 = vector.insertelement %547, %1477[%c7_i64 : i64] : vector<8xf32> + %1479 = cmpi "slt", %arg5, %c0 : index + %1480 = subi %c-1, %arg5 : index + %1481 = select %1479, %1480, %arg5 : index + %1482 = divi_signed %1481, %c16 : index + %1483 = subi %c-1, %1482 : index + %1484 = select %1479, %1483, %1482 : index + %1485 = remi_signed %1484, %c16 : index + %1486 = cmpi "slt", %1485, %c0 : index + %1487 = addi %1485, %c16 : index + %1488 = select %1486, %1487, %1485 : index + %1489 = addi %arg7, %arg9 : index + %1490 = remi_signed %1489, %c6 : index + %1491 = cmpi "slt", %1490, %c0 : index + %1492 = addi %1490, %c6 : index + %1493 = select %1491, %1492, %1490 : index + %1494 = remi_signed %arg5, %c16 : index + %1495 = cmpi "slt", %1494, %c0 : index + %1496 = addi %1494, %c16 : index + %1497 = select %1495, %1496, %1494 : index + %1498 = cmpi "slt", %1497, %c0 : index + %1499 = subi %c-1, %1497 : index + %1500 = select %1498, %1499, %1497 : index + %1501 = divi_signed %1500, %c8 : index + %1502 = subi %c-1, %1501 : index + %1503 = select %1498, %1502, %1501 : index + %1504 = remi_signed %1503, %c2 : index + %1505 = cmpi "slt", %1504, %c0 : index + %1506 = addi %1504, %c2 : index + %1507 = select %1505, %1506, %1504 : index + store %1478, %2[%1488, %1493, %1507] : memref<16x6x2xvector<8xf32>> + %1508 = addi %arg4, %arg7 : index + %1509 = addi %1508, %arg9 : index + %1510 = addi %arg4, %arg7 : index + %1511 = addi %1510, %arg9 : index + %1512 = addi %arg4, %arg7 : index + %1513 = addi %1512, %arg9 : index + %1514 = addi %arg4, %arg7 : index + %1515 = addi %1514, %arg9 : index + %1516 = addi %arg4, %arg7 : index + %1517 = addi %1516, %arg9 : index + %1518 = addi %arg4, %arg7 : index + %1519 = addi %1518, %arg9 : index + %1520 = addi %arg4, %arg7 : index + %1521 = addi %1520, %arg9 : index + %1522 = addi %arg4, %arg7 : index + %1523 = addi %1522, %arg9 : index + %1524 = addi %arg6, %arg8 : index + %1525 = addi %arg6, %arg8 : index + %1526 = addi %arg6, %arg8 : index + %1527 = addi %arg6, %arg8 : index + %1528 = addi %arg6, %arg8 : index + %1529 = addi %arg6, %arg8 : index + %1530 = addi %arg6, %arg8 : index + %1531 = addi %arg6, %arg8 : index + %1532 = load %arg0[%1509, %1524] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1533 = load %arg0[%1511, %1525] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1534 = load %arg0[%1513, %1526] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1535 = load %arg0[%1515, %1527] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1536 = load %arg0[%1517, %1528] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1537 = load %arg0[%1519, %1529] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1538 = load %arg0[%1521, %1530] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1539 = load %arg0[%1523, %1531] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1540 = addi %arg5, %c8 : index + %1541 = cmpi "slt", %1540, %c0 : index + %1542 = subi %c-1, %1540 : index + %1543 = select %1541, %1542, %1540 : index + %1544 = divi_signed %1543, %c16 : index + %1545 = subi %c-1, %1544 : index + %1546 = select %1541, %1545, %1544 : index + %1547 = remi_signed %1546, %c16 : index + %1548 = cmpi "slt", %1547, %c0 : index + %1549 = addi %1547, %c16 : index + %1550 = select %1548, %1549, %1547 : index + %1551 = addi %arg6, %arg8 : index + %1552 = remi_signed %1551, %c128 : index + %1553 = cmpi "slt", %1552, %c0 : index + 
%1554 = addi %1552, %c128 : index + %1555 = select %1553, %1554, %1552 : index + %1556 = cmpi "slt", %arg5, %c0 : index + %1557 = subi %c-1, %arg5 : index + %1558 = select %1556, %1557, %arg5 : index + %1559 = divi_signed %1558, %c8 : index + %1560 = subi %c-1, %1559 : index + %1561 = select %1556, %1560, %1559 : index + %1562 = addi %arg5, %c8 : index + %1563 = cmpi "slt", %1562, %c0 : index + %1564 = subi %c-1, %1562 : index + %1565 = select %1563, %1564, %1562 : index + %1566 = divi_signed %1565, %c16 : index + %1567 = subi %c-1, %1566 : index + %1568 = select %1563, %1567, %1566 : index + %1569 = muli %1568, %c-2 : index + %1570 = addi %1561, %1569 : index + %1571 = cmpi "slt", %arg5, %c0 : index + %1572 = subi %c-1, %arg5 : index + %1573 = select %1571, %1572, %arg5 : index + %1574 = divi_signed %1573, %c8 : index + %1575 = subi %c-1, %1574 : index + %1576 = select %1571, %1575, %1574 : index + %1577 = addi %arg5, %c8 : index + %1578 = cmpi "slt", %1577, %c0 : index + %1579 = subi %c-1, %1577 : index + %1580 = select %1578, %1579, %1577 : index + %1581 = divi_signed %1580, %c16 : index + %1582 = subi %c-1, %1581 : index + %1583 = select %1578, %1582, %1581 : index + %1584 = muli %1583, %c-2 : index + %1585 = addi %1576, %1584 : index + %1586 = addi %1585, %c1 : index + %1587 = cmpi "slt", %1586, %c0 : index + %1588 = subi %c-1, %1586 : index + %1589 = select %1587, %1588, %1586 : index + %1590 = divi_signed %1589, %c2 : index + %1591 = subi %c-1, %1590 : index + %1592 = select %1587, %1591, %1590 : index + %1593 = muli %1592, %c-2 : index + %1594 = addi %1570, %1593 : index + %1595 = addi %1594, %c1 : index + %1596 = load %3[%1550, %1555, %1595] : memref<16x128x2xvector<8xf32>> + %1597 = vector.extractelement %1596[%c0_i64 : i64] : vector<8xf32> + %1598 = addi %arg5, %c8 : index + %1599 = cmpi "slt", %1598, %c0 : index + %1600 = subi %c-1, %1598 : index + %1601 = select %1599, %1600, %1598 : index + %1602 = divi_signed %1601, %c16 : index + %1603 = subi %c-1, %1602 : index + %1604 = select %1599, %1603, %1602 : index + %1605 = remi_signed %1604, %c16 : index + %1606 = cmpi "slt", %1605, %c0 : index + %1607 = addi %1605, %c16 : index + %1608 = select %1606, %1607, %1605 : index + %1609 = addi %arg6, %arg8 : index + %1610 = remi_signed %1609, %c128 : index + %1611 = cmpi "slt", %1610, %c0 : index + %1612 = addi %1610, %c128 : index + %1613 = select %1611, %1612, %1610 : index + %1614 = cmpi "slt", %arg5, %c0 : index + %1615 = subi %c-1, %arg5 : index + %1616 = select %1614, %1615, %arg5 : index + %1617 = divi_signed %1616, %c8 : index + %1618 = subi %c-1, %1617 : index + %1619 = select %1614, %1618, %1617 : index + %1620 = addi %arg5, %c8 : index + %1621 = cmpi "slt", %1620, %c0 : index + %1622 = subi %c-1, %1620 : index + %1623 = select %1621, %1622, %1620 : index + %1624 = divi_signed %1623, %c16 : index + %1625 = subi %c-1, %1624 : index + %1626 = select %1621, %1625, %1624 : index + %1627 = muli %1626, %c-2 : index + %1628 = addi %1619, %1627 : index + %1629 = cmpi "slt", %arg5, %c0 : index + %1630 = subi %c-1, %arg5 : index + %1631 = select %1629, %1630, %arg5 : index + %1632 = divi_signed %1631, %c8 : index + %1633 = subi %c-1, %1632 : index + %1634 = select %1629, %1633, %1632 : index + %1635 = addi %arg5, %c8 : index + %1636 = cmpi "slt", %1635, %c0 : index + %1637 = subi %c-1, %1635 : index + %1638 = select %1636, %1637, %1635 : index + %1639 = divi_signed %1638, %c16 : index + %1640 = subi %c-1, %1639 : index + %1641 = select %1636, %1640, %1639 : index + %1642 = muli %1641, 
%c-2 : index + %1643 = addi %1634, %1642 : index + %1644 = addi %1643, %c1 : index + %1645 = cmpi "slt", %1644, %c0 : index + %1646 = subi %c-1, %1644 : index + %1647 = select %1645, %1646, %1644 : index + %1648 = divi_signed %1647, %c2 : index + %1649 = subi %c-1, %1648 : index + %1650 = select %1645, %1649, %1648 : index + %1651 = muli %1650, %c-2 : index + %1652 = addi %1628, %1651 : index + %1653 = addi %1652, %c1 : index + %1654 = load %3[%1608, %1613, %1653] : memref<16x128x2xvector<8xf32>> + %1655 = vector.extractelement %1654[%c1_i64 : i64] : vector<8xf32> + %1656 = addi %arg5, %c8 : index + %1657 = cmpi "slt", %1656, %c0 : index + %1658 = subi %c-1, %1656 : index + %1659 = select %1657, %1658, %1656 : index + %1660 = divi_signed %1659, %c16 : index + %1661 = subi %c-1, %1660 : index + %1662 = select %1657, %1661, %1660 : index + %1663 = remi_signed %1662, %c16 : index + %1664 = cmpi "slt", %1663, %c0 : index + %1665 = addi %1663, %c16 : index + %1666 = select %1664, %1665, %1663 : index + %1667 = addi %arg6, %arg8 : index + %1668 = remi_signed %1667, %c128 : index + %1669 = cmpi "slt", %1668, %c0 : index + %1670 = addi %1668, %c128 : index + %1671 = select %1669, %1670, %1668 : index + %1672 = cmpi "slt", %arg5, %c0 : index + %1673 = subi %c-1, %arg5 : index + %1674 = select %1672, %1673, %arg5 : index + %1675 = divi_signed %1674, %c8 : index + %1676 = subi %c-1, %1675 : index + %1677 = select %1672, %1676, %1675 : index + %1678 = addi %arg5, %c8 : index + %1679 = cmpi "slt", %1678, %c0 : index + %1680 = subi %c-1, %1678 : index + %1681 = select %1679, %1680, %1678 : index + %1682 = divi_signed %1681, %c16 : index + %1683 = subi %c-1, %1682 : index + %1684 = select %1679, %1683, %1682 : index + %1685 = muli %1684, %c-2 : index + %1686 = addi %1677, %1685 : index + %1687 = cmpi "slt", %arg5, %c0 : index + %1688 = subi %c-1, %arg5 : index + %1689 = select %1687, %1688, %arg5 : index + %1690 = divi_signed %1689, %c8 : index + %1691 = subi %c-1, %1690 : index + %1692 = select %1687, %1691, %1690 : index + %1693 = addi %arg5, %c8 : index + %1694 = cmpi "slt", %1693, %c0 : index + %1695 = subi %c-1, %1693 : index + %1696 = select %1694, %1695, %1693 : index + %1697 = divi_signed %1696, %c16 : index + %1698 = subi %c-1, %1697 : index + %1699 = select %1694, %1698, %1697 : index + %1700 = muli %1699, %c-2 : index + %1701 = addi %1692, %1700 : index + %1702 = addi %1701, %c1 : index + %1703 = cmpi "slt", %1702, %c0 : index + %1704 = subi %c-1, %1702 : index + %1705 = select %1703, %1704, %1702 : index + %1706 = divi_signed %1705, %c2 : index + %1707 = subi %c-1, %1706 : index + %1708 = select %1703, %1707, %1706 : index + %1709 = muli %1708, %c-2 : index + %1710 = addi %1686, %1709 : index + %1711 = addi %1710, %c1 : index + %1712 = load %3[%1666, %1671, %1711] : memref<16x128x2xvector<8xf32>> + %1713 = vector.extractelement %1712[%c2_i64 : i64] : vector<8xf32> + %1714 = addi %arg5, %c8 : index + %1715 = cmpi "slt", %1714, %c0 : index + %1716 = subi %c-1, %1714 : index + %1717 = select %1715, %1716, %1714 : index + %1718 = divi_signed %1717, %c16 : index + %1719 = subi %c-1, %1718 : index + %1720 = select %1715, %1719, %1718 : index + %1721 = remi_signed %1720, %c16 : index + %1722 = cmpi "slt", %1721, %c0 : index + %1723 = addi %1721, %c16 : index + %1724 = select %1722, %1723, %1721 : index + %1725 = addi %arg6, %arg8 : index + %1726 = remi_signed %1725, %c128 : index + %1727 = cmpi "slt", %1726, %c0 : index + %1728 = addi %1726, %c128 : index + %1729 = select %1727, %1728, %1726 : index 
+ %1730 = cmpi "slt", %arg5, %c0 : index + %1731 = subi %c-1, %arg5 : index + %1732 = select %1730, %1731, %arg5 : index + %1733 = divi_signed %1732, %c8 : index + %1734 = subi %c-1, %1733 : index + %1735 = select %1730, %1734, %1733 : index + %1736 = addi %arg5, %c8 : index + %1737 = cmpi "slt", %1736, %c0 : index + %1738 = subi %c-1, %1736 : index + %1739 = select %1737, %1738, %1736 : index + %1740 = divi_signed %1739, %c16 : index + %1741 = subi %c-1, %1740 : index + %1742 = select %1737, %1741, %1740 : index + %1743 = muli %1742, %c-2 : index + %1744 = addi %1735, %1743 : index + %1745 = cmpi "slt", %arg5, %c0 : index + %1746 = subi %c-1, %arg5 : index + %1747 = select %1745, %1746, %arg5 : index + %1748 = divi_signed %1747, %c8 : index + %1749 = subi %c-1, %1748 : index + %1750 = select %1745, %1749, %1748 : index + %1751 = addi %arg5, %c8 : index + %1752 = cmpi "slt", %1751, %c0 : index + %1753 = subi %c-1, %1751 : index + %1754 = select %1752, %1753, %1751 : index + %1755 = divi_signed %1754, %c16 : index + %1756 = subi %c-1, %1755 : index + %1757 = select %1752, %1756, %1755 : index + %1758 = muli %1757, %c-2 : index + %1759 = addi %1750, %1758 : index + %1760 = addi %1759, %c1 : index + %1761 = cmpi "slt", %1760, %c0 : index + %1762 = subi %c-1, %1760 : index + %1763 = select %1761, %1762, %1760 : index + %1764 = divi_signed %1763, %c2 : index + %1765 = subi %c-1, %1764 : index + %1766 = select %1761, %1765, %1764 : index + %1767 = muli %1766, %c-2 : index + %1768 = addi %1744, %1767 : index + %1769 = addi %1768, %c1 : index + %1770 = load %3[%1724, %1729, %1769] : memref<16x128x2xvector<8xf32>> + %1771 = vector.extractelement %1770[%c3_i64 : i64] : vector<8xf32> + %1772 = addi %arg5, %c8 : index + %1773 = cmpi "slt", %1772, %c0 : index + %1774 = subi %c-1, %1772 : index + %1775 = select %1773, %1774, %1772 : index + %1776 = divi_signed %1775, %c16 : index + %1777 = subi %c-1, %1776 : index + %1778 = select %1773, %1777, %1776 : index + %1779 = remi_signed %1778, %c16 : index + %1780 = cmpi "slt", %1779, %c0 : index + %1781 = addi %1779, %c16 : index + %1782 = select %1780, %1781, %1779 : index + %1783 = addi %arg6, %arg8 : index + %1784 = remi_signed %1783, %c128 : index + %1785 = cmpi "slt", %1784, %c0 : index + %1786 = addi %1784, %c128 : index + %1787 = select %1785, %1786, %1784 : index + %1788 = cmpi "slt", %arg5, %c0 : index + %1789 = subi %c-1, %arg5 : index + %1790 = select %1788, %1789, %arg5 : index + %1791 = divi_signed %1790, %c8 : index + %1792 = subi %c-1, %1791 : index + %1793 = select %1788, %1792, %1791 : index + %1794 = addi %arg5, %c8 : index + %1795 = cmpi "slt", %1794, %c0 : index + %1796 = subi %c-1, %1794 : index + %1797 = select %1795, %1796, %1794 : index + %1798 = divi_signed %1797, %c16 : index + %1799 = subi %c-1, %1798 : index + %1800 = select %1795, %1799, %1798 : index + %1801 = muli %1800, %c-2 : index + %1802 = addi %1793, %1801 : index + %1803 = cmpi "slt", %arg5, %c0 : index + %1804 = subi %c-1, %arg5 : index + %1805 = select %1803, %1804, %arg5 : index + %1806 = divi_signed %1805, %c8 : index + %1807 = subi %c-1, %1806 : index + %1808 = select %1803, %1807, %1806 : index + %1809 = addi %arg5, %c8 : index + %1810 = cmpi "slt", %1809, %c0 : index + %1811 = subi %c-1, %1809 : index + %1812 = select %1810, %1811, %1809 : index + %1813 = divi_signed %1812, %c16 : index + %1814 = subi %c-1, %1813 : index + %1815 = select %1810, %1814, %1813 : index + %1816 = muli %1815, %c-2 : index + %1817 = addi %1808, %1816 : index + %1818 = addi %1817, %c1 : 
index + %1819 = cmpi "slt", %1818, %c0 : index + %1820 = subi %c-1, %1818 : index + %1821 = select %1819, %1820, %1818 : index + %1822 = divi_signed %1821, %c2 : index + %1823 = subi %c-1, %1822 : index + %1824 = select %1819, %1823, %1822 : index + %1825 = muli %1824, %c-2 : index + %1826 = addi %1802, %1825 : index + %1827 = addi %1826, %c1 : index + %1828 = load %3[%1782, %1787, %1827] : memref<16x128x2xvector<8xf32>> + %1829 = vector.extractelement %1828[%c4_i64 : i64] : vector<8xf32> + %1830 = addi %arg5, %c8 : index + %1831 = cmpi "slt", %1830, %c0 : index + %1832 = subi %c-1, %1830 : index + %1833 = select %1831, %1832, %1830 : index + %1834 = divi_signed %1833, %c16 : index + %1835 = subi %c-1, %1834 : index + %1836 = select %1831, %1835, %1834 : index + %1837 = remi_signed %1836, %c16 : index + %1838 = cmpi "slt", %1837, %c0 : index + %1839 = addi %1837, %c16 : index + %1840 = select %1838, %1839, %1837 : index + %1841 = addi %arg6, %arg8 : index + %1842 = remi_signed %1841, %c128 : index + %1843 = cmpi "slt", %1842, %c0 : index + %1844 = addi %1842, %c128 : index + %1845 = select %1843, %1844, %1842 : index + %1846 = cmpi "slt", %arg5, %c0 : index + %1847 = subi %c-1, %arg5 : index + %1848 = select %1846, %1847, %arg5 : index + %1849 = divi_signed %1848, %c8 : index + %1850 = subi %c-1, %1849 : index + %1851 = select %1846, %1850, %1849 : index + %1852 = addi %arg5, %c8 : index + %1853 = cmpi "slt", %1852, %c0 : index + %1854 = subi %c-1, %1852 : index + %1855 = select %1853, %1854, %1852 : index + %1856 = divi_signed %1855, %c16 : index + %1857 = subi %c-1, %1856 : index + %1858 = select %1853, %1857, %1856 : index + %1859 = muli %1858, %c-2 : index + %1860 = addi %1851, %1859 : index + %1861 = cmpi "slt", %arg5, %c0 : index + %1862 = subi %c-1, %arg5 : index + %1863 = select %1861, %1862, %arg5 : index + %1864 = divi_signed %1863, %c8 : index + %1865 = subi %c-1, %1864 : index + %1866 = select %1861, %1865, %1864 : index + %1867 = addi %arg5, %c8 : index + %1868 = cmpi "slt", %1867, %c0 : index + %1869 = subi %c-1, %1867 : index + %1870 = select %1868, %1869, %1867 : index + %1871 = divi_signed %1870, %c16 : index + %1872 = subi %c-1, %1871 : index + %1873 = select %1868, %1872, %1871 : index + %1874 = muli %1873, %c-2 : index + %1875 = addi %1866, %1874 : index + %1876 = addi %1875, %c1 : index + %1877 = cmpi "slt", %1876, %c0 : index + %1878 = subi %c-1, %1876 : index + %1879 = select %1877, %1878, %1876 : index + %1880 = divi_signed %1879, %c2 : index + %1881 = subi %c-1, %1880 : index + %1882 = select %1877, %1881, %1880 : index + %1883 = muli %1882, %c-2 : index + %1884 = addi %1860, %1883 : index + %1885 = addi %1884, %c1 : index + %1886 = load %3[%1840, %1845, %1885] : memref<16x128x2xvector<8xf32>> + %1887 = vector.extractelement %1886[%c5_i64 : i64] : vector<8xf32> + %1888 = addi %arg5, %c8 : index + %1889 = cmpi "slt", %1888, %c0 : index + %1890 = subi %c-1, %1888 : index + %1891 = select %1889, %1890, %1888 : index + %1892 = divi_signed %1891, %c16 : index + %1893 = subi %c-1, %1892 : index + %1894 = select %1889, %1893, %1892 : index + %1895 = remi_signed %1894, %c16 : index + %1896 = cmpi "slt", %1895, %c0 : index + %1897 = addi %1895, %c16 : index + %1898 = select %1896, %1897, %1895 : index + %1899 = addi %arg6, %arg8 : index + %1900 = remi_signed %1899, %c128 : index + %1901 = cmpi "slt", %1900, %c0 : index + %1902 = addi %1900, %c128 : index + %1903 = select %1901, %1902, %1900 : index + %1904 = cmpi "slt", %arg5, %c0 : index + %1905 = subi %c-1, %arg5 : index 
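// Annotation (editorial; not part of the generated IR): each load from %3
// followed by vector.extractelement reads a single f32 lane (lanes %c1_i64
// through %c7_i64 in this unrolled stretch) out of a cached vector<8xf32>.
// The extracted lanes are then multiplied below (mulf ... {RelaxedPrecision})
// by the scalars %1532..%1539 defined earlier in the function, and the
// products are accumulated into the buffer %2 : memref<16x6x2xvector<8xf32>>.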
+ %1906 = select %1904, %1905, %arg5 : index + %1907 = divi_signed %1906, %c8 : index + %1908 = subi %c-1, %1907 : index + %1909 = select %1904, %1908, %1907 : index + %1910 = addi %arg5, %c8 : index + %1911 = cmpi "slt", %1910, %c0 : index + %1912 = subi %c-1, %1910 : index + %1913 = select %1911, %1912, %1910 : index + %1914 = divi_signed %1913, %c16 : index + %1915 = subi %c-1, %1914 : index + %1916 = select %1911, %1915, %1914 : index + %1917 = muli %1916, %c-2 : index + %1918 = addi %1909, %1917 : index + %1919 = cmpi "slt", %arg5, %c0 : index + %1920 = subi %c-1, %arg5 : index + %1921 = select %1919, %1920, %arg5 : index + %1922 = divi_signed %1921, %c8 : index + %1923 = subi %c-1, %1922 : index + %1924 = select %1919, %1923, %1922 : index + %1925 = addi %arg5, %c8 : index + %1926 = cmpi "slt", %1925, %c0 : index + %1927 = subi %c-1, %1925 : index + %1928 = select %1926, %1927, %1925 : index + %1929 = divi_signed %1928, %c16 : index + %1930 = subi %c-1, %1929 : index + %1931 = select %1926, %1930, %1929 : index + %1932 = muli %1931, %c-2 : index + %1933 = addi %1924, %1932 : index + %1934 = addi %1933, %c1 : index + %1935 = cmpi "slt", %1934, %c0 : index + %1936 = subi %c-1, %1934 : index + %1937 = select %1935, %1936, %1934 : index + %1938 = divi_signed %1937, %c2 : index + %1939 = subi %c-1, %1938 : index + %1940 = select %1935, %1939, %1938 : index + %1941 = muli %1940, %c-2 : index + %1942 = addi %1918, %1941 : index + %1943 = addi %1942, %c1 : index + %1944 = load %3[%1898, %1903, %1943] : memref<16x128x2xvector<8xf32>> + %1945 = vector.extractelement %1944[%c6_i64 : i64] : vector<8xf32> + %1946 = addi %arg5, %c8 : index + %1947 = cmpi "slt", %1946, %c0 : index + %1948 = subi %c-1, %1946 : index + %1949 = select %1947, %1948, %1946 : index + %1950 = divi_signed %1949, %c16 : index + %1951 = subi %c-1, %1950 : index + %1952 = select %1947, %1951, %1950 : index + %1953 = remi_signed %1952, %c16 : index + %1954 = cmpi "slt", %1953, %c0 : index + %1955 = addi %1953, %c16 : index + %1956 = select %1954, %1955, %1953 : index + %1957 = addi %arg6, %arg8 : index + %1958 = remi_signed %1957, %c128 : index + %1959 = cmpi "slt", %1958, %c0 : index + %1960 = addi %1958, %c128 : index + %1961 = select %1959, %1960, %1958 : index + %1962 = cmpi "slt", %arg5, %c0 : index + %1963 = subi %c-1, %arg5 : index + %1964 = select %1962, %1963, %arg5 : index + %1965 = divi_signed %1964, %c8 : index + %1966 = subi %c-1, %1965 : index + %1967 = select %1962, %1966, %1965 : index + %1968 = addi %arg5, %c8 : index + %1969 = cmpi "slt", %1968, %c0 : index + %1970 = subi %c-1, %1968 : index + %1971 = select %1969, %1970, %1968 : index + %1972 = divi_signed %1971, %c16 : index + %1973 = subi %c-1, %1972 : index + %1974 = select %1969, %1973, %1972 : index + %1975 = muli %1974, %c-2 : index + %1976 = addi %1967, %1975 : index + %1977 = cmpi "slt", %arg5, %c0 : index + %1978 = subi %c-1, %arg5 : index + %1979 = select %1977, %1978, %arg5 : index + %1980 = divi_signed %1979, %c8 : index + %1981 = subi %c-1, %1980 : index + %1982 = select %1977, %1981, %1980 : index + %1983 = addi %arg5, %c8 : index + %1984 = cmpi "slt", %1983, %c0 : index + %1985 = subi %c-1, %1983 : index + %1986 = select %1984, %1985, %1983 : index + %1987 = divi_signed %1986, %c16 : index + %1988 = subi %c-1, %1987 : index + %1989 = select %1984, %1988, %1987 : index + %1990 = muli %1989, %c-2 : index + %1991 = addi %1982, %1990 : index + %1992 = addi %1991, %c1 : index + %1993 = cmpi "slt", %1992, %c0 : index + %1994 = subi %c-1, %1992 : 
index + %1995 = select %1993, %1994, %1992 : index + %1996 = divi_signed %1995, %c2 : index + %1997 = subi %c-1, %1996 : index + %1998 = select %1993, %1997, %1996 : index + %1999 = muli %1998, %c-2 : index + %2000 = addi %1976, %1999 : index + %2001 = addi %2000, %c1 : index + %2002 = load %3[%1956, %1961, %2001] : memref<16x128x2xvector<8xf32>> + %2003 = vector.extractelement %2002[%c7_i64 : i64] : vector<8xf32> + %2004 = mulf %1532, %1597 {RelaxedPrecision} : f32 + %2005 = mulf %1533, %1655 {RelaxedPrecision} : f32 + %2006 = mulf %1534, %1713 {RelaxedPrecision} : f32 + %2007 = mulf %1535, %1771 {RelaxedPrecision} : f32 + %2008 = mulf %1536, %1829 {RelaxedPrecision} : f32 + %2009 = mulf %1537, %1887 {RelaxedPrecision} : f32 + %2010 = mulf %1538, %1945 {RelaxedPrecision} : f32 + %2011 = mulf %1539, %2003 {RelaxedPrecision} : f32 + %2012 = addi %arg5, %c8 : index + %2013 = cmpi "slt", %2012, %c0 : index + %2014 = subi %c-1, %2012 : index + %2015 = select %2013, %2014, %2012 : index + %2016 = divi_signed %2015, %c16 : index + %2017 = subi %c-1, %2016 : index + %2018 = select %2013, %2017, %2016 : index + %2019 = remi_signed %2018, %c16 : index + %2020 = cmpi "slt", %2019, %c0 : index + %2021 = addi %2019, %c16 : index + %2022 = select %2020, %2021, %2019 : index + %2023 = addi %arg7, %arg9 : index + %2024 = remi_signed %2023, %c6 : index + %2025 = cmpi "slt", %2024, %c0 : index + %2026 = addi %2024, %c6 : index + %2027 = select %2025, %2026, %2024 : index + %2028 = cmpi "slt", %arg5, %c0 : index + %2029 = subi %c-1, %arg5 : index + %2030 = select %2028, %2029, %arg5 : index + %2031 = divi_signed %2030, %c8 : index + %2032 = subi %c-1, %2031 : index + %2033 = select %2028, %2032, %2031 : index + %2034 = addi %arg5, %c8 : index + %2035 = cmpi "slt", %2034, %c0 : index + %2036 = subi %c-1, %2034 : index + %2037 = select %2035, %2036, %2034 : index + %2038 = divi_signed %2037, %c16 : index + %2039 = subi %c-1, %2038 : index + %2040 = select %2035, %2039, %2038 : index + %2041 = muli %2040, %c-2 : index + %2042 = addi %2033, %2041 : index + %2043 = cmpi "slt", %arg5, %c0 : index + %2044 = subi %c-1, %arg5 : index + %2045 = select %2043, %2044, %arg5 : index + %2046 = divi_signed %2045, %c8 : index + %2047 = subi %c-1, %2046 : index + %2048 = select %2043, %2047, %2046 : index + %2049 = addi %arg5, %c8 : index + %2050 = cmpi "slt", %2049, %c0 : index + %2051 = subi %c-1, %2049 : index + %2052 = select %2050, %2051, %2049 : index + %2053 = divi_signed %2052, %c16 : index + %2054 = subi %c-1, %2053 : index + %2055 = select %2050, %2054, %2053 : index + %2056 = muli %2055, %c-2 : index + %2057 = addi %2048, %2056 : index + %2058 = addi %2057, %c1 : index + %2059 = cmpi "slt", %2058, %c0 : index + %2060 = subi %c-1, %2058 : index + %2061 = select %2059, %2060, %2058 : index + %2062 = divi_signed %2061, %c2 : index + %2063 = subi %c-1, %2062 : index + %2064 = select %2059, %2063, %2062 : index + %2065 = muli %2064, %c-2 : index + %2066 = addi %2042, %2065 : index + %2067 = addi %2066, %c1 : index + %2068 = load %2[%2022, %2027, %2067] : memref<16x6x2xvector<8xf32>> + %2069 = vector.extractelement %2068[%c0_i64 : i64] : vector<8xf32> + %2070 = addi %arg5, %c8 : index + %2071 = cmpi "slt", %2070, %c0 : index + %2072 = subi %c-1, %2070 : index + %2073 = select %2071, %2072, %2070 : index + %2074 = divi_signed %2073, %c16 : index + %2075 = subi %c-1, %2074 : index + %2076 = select %2071, %2075, %2074 : index + %2077 = remi_signed %2076, %c16 : index + %2078 = cmpi "slt", %2077, %c0 : index + %2079 = addi 
%2077, %c16 : index + %2080 = select %2078, %2079, %2077 : index + %2081 = addi %arg7, %arg9 : index + %2082 = remi_signed %2081, %c6 : index + %2083 = cmpi "slt", %2082, %c0 : index + %2084 = addi %2082, %c6 : index + %2085 = select %2083, %2084, %2082 : index + %2086 = cmpi "slt", %arg5, %c0 : index + %2087 = subi %c-1, %arg5 : index + %2088 = select %2086, %2087, %arg5 : index + %2089 = divi_signed %2088, %c8 : index + %2090 = subi %c-1, %2089 : index + %2091 = select %2086, %2090, %2089 : index + %2092 = addi %arg5, %c8 : index + %2093 = cmpi "slt", %2092, %c0 : index + %2094 = subi %c-1, %2092 : index + %2095 = select %2093, %2094, %2092 : index + %2096 = divi_signed %2095, %c16 : index + %2097 = subi %c-1, %2096 : index + %2098 = select %2093, %2097, %2096 : index + %2099 = muli %2098, %c-2 : index + %2100 = addi %2091, %2099 : index + %2101 = cmpi "slt", %arg5, %c0 : index + %2102 = subi %c-1, %arg5 : index + %2103 = select %2101, %2102, %arg5 : index + %2104 = divi_signed %2103, %c8 : index + %2105 = subi %c-1, %2104 : index + %2106 = select %2101, %2105, %2104 : index + %2107 = addi %arg5, %c8 : index + %2108 = cmpi "slt", %2107, %c0 : index + %2109 = subi %c-1, %2107 : index + %2110 = select %2108, %2109, %2107 : index + %2111 = divi_signed %2110, %c16 : index + %2112 = subi %c-1, %2111 : index + %2113 = select %2108, %2112, %2111 : index + %2114 = muli %2113, %c-2 : index + %2115 = addi %2106, %2114 : index + %2116 = addi %2115, %c1 : index + %2117 = cmpi "slt", %2116, %c0 : index + %2118 = subi %c-1, %2116 : index + %2119 = select %2117, %2118, %2116 : index + %2120 = divi_signed %2119, %c2 : index + %2121 = subi %c-1, %2120 : index + %2122 = select %2117, %2121, %2120 : index + %2123 = muli %2122, %c-2 : index + %2124 = addi %2100, %2123 : index + %2125 = addi %2124, %c1 : index + %2126 = load %2[%2080, %2085, %2125] : memref<16x6x2xvector<8xf32>> + %2127 = vector.extractelement %2126[%c1_i64 : i64] : vector<8xf32> + %2128 = addi %arg5, %c8 : index + %2129 = cmpi "slt", %2128, %c0 : index + %2130 = subi %c-1, %2128 : index + %2131 = select %2129, %2130, %2128 : index + %2132 = divi_signed %2131, %c16 : index + %2133 = subi %c-1, %2132 : index + %2134 = select %2129, %2133, %2132 : index + %2135 = remi_signed %2134, %c16 : index + %2136 = cmpi "slt", %2135, %c0 : index + %2137 = addi %2135, %c16 : index + %2138 = select %2136, %2137, %2135 : index + %2139 = addi %arg7, %arg9 : index + %2140 = remi_signed %2139, %c6 : index + %2141 = cmpi "slt", %2140, %c0 : index + %2142 = addi %2140, %c6 : index + %2143 = select %2141, %2142, %2140 : index + %2144 = cmpi "slt", %arg5, %c0 : index + %2145 = subi %c-1, %arg5 : index + %2146 = select %2144, %2145, %arg5 : index + %2147 = divi_signed %2146, %c8 : index + %2148 = subi %c-1, %2147 : index + %2149 = select %2144, %2148, %2147 : index + %2150 = addi %arg5, %c8 : index + %2151 = cmpi "slt", %2150, %c0 : index + %2152 = subi %c-1, %2150 : index + %2153 = select %2151, %2152, %2150 : index + %2154 = divi_signed %2153, %c16 : index + %2155 = subi %c-1, %2154 : index + %2156 = select %2151, %2155, %2154 : index + %2157 = muli %2156, %c-2 : index + %2158 = addi %2149, %2157 : index + %2159 = cmpi "slt", %arg5, %c0 : index + %2160 = subi %c-1, %arg5 : index + %2161 = select %2159, %2160, %arg5 : index + %2162 = divi_signed %2161, %c8 : index + %2163 = subi %c-1, %2162 : index + %2164 = select %2159, %2163, %2162 : index + %2165 = addi %arg5, %c8 : index + %2166 = cmpi "slt", %2165, %c0 : index + %2167 = subi %c-1, %2165 : index + %2168 = 
select %2166, %2167, %2165 : index + %2169 = divi_signed %2168, %c16 : index + %2170 = subi %c-1, %2169 : index + %2171 = select %2166, %2170, %2169 : index + %2172 = muli %2171, %c-2 : index + %2173 = addi %2164, %2172 : index + %2174 = addi %2173, %c1 : index + %2175 = cmpi "slt", %2174, %c0 : index + %2176 = subi %c-1, %2174 : index + %2177 = select %2175, %2176, %2174 : index + %2178 = divi_signed %2177, %c2 : index + %2179 = subi %c-1, %2178 : index + %2180 = select %2175, %2179, %2178 : index + %2181 = muli %2180, %c-2 : index + %2182 = addi %2158, %2181 : index + %2183 = addi %2182, %c1 : index + %2184 = load %2[%2138, %2143, %2183] : memref<16x6x2xvector<8xf32>> + %2185 = vector.extractelement %2184[%c2_i64 : i64] : vector<8xf32> + %2186 = addi %arg5, %c8 : index + %2187 = cmpi "slt", %2186, %c0 : index + %2188 = subi %c-1, %2186 : index + %2189 = select %2187, %2188, %2186 : index + %2190 = divi_signed %2189, %c16 : index + %2191 = subi %c-1, %2190 : index + %2192 = select %2187, %2191, %2190 : index + %2193 = remi_signed %2192, %c16 : index + %2194 = cmpi "slt", %2193, %c0 : index + %2195 = addi %2193, %c16 : index + %2196 = select %2194, %2195, %2193 : index + %2197 = addi %arg7, %arg9 : index + %2198 = remi_signed %2197, %c6 : index + %2199 = cmpi "slt", %2198, %c0 : index + %2200 = addi %2198, %c6 : index + %2201 = select %2199, %2200, %2198 : index + %2202 = cmpi "slt", %arg5, %c0 : index + %2203 = subi %c-1, %arg5 : index + %2204 = select %2202, %2203, %arg5 : index + %2205 = divi_signed %2204, %c8 : index + %2206 = subi %c-1, %2205 : index + %2207 = select %2202, %2206, %2205 : index + %2208 = addi %arg5, %c8 : index + %2209 = cmpi "slt", %2208, %c0 : index + %2210 = subi %c-1, %2208 : index + %2211 = select %2209, %2210, %2208 : index + %2212 = divi_signed %2211, %c16 : index + %2213 = subi %c-1, %2212 : index + %2214 = select %2209, %2213, %2212 : index + %2215 = muli %2214, %c-2 : index + %2216 = addi %2207, %2215 : index + %2217 = cmpi "slt", %arg5, %c0 : index + %2218 = subi %c-1, %arg5 : index + %2219 = select %2217, %2218, %arg5 : index + %2220 = divi_signed %2219, %c8 : index + %2221 = subi %c-1, %2220 : index + %2222 = select %2217, %2221, %2220 : index + %2223 = addi %arg5, %c8 : index + %2224 = cmpi "slt", %2223, %c0 : index + %2225 = subi %c-1, %2223 : index + %2226 = select %2224, %2225, %2223 : index + %2227 = divi_signed %2226, %c16 : index + %2228 = subi %c-1, %2227 : index + %2229 = select %2224, %2228, %2227 : index + %2230 = muli %2229, %c-2 : index + %2231 = addi %2222, %2230 : index + %2232 = addi %2231, %c1 : index + %2233 = cmpi "slt", %2232, %c0 : index + %2234 = subi %c-1, %2232 : index + %2235 = select %2233, %2234, %2232 : index + %2236 = divi_signed %2235, %c2 : index + %2237 = subi %c-1, %2236 : index + %2238 = select %2233, %2237, %2236 : index + %2239 = muli %2238, %c-2 : index + %2240 = addi %2216, %2239 : index + %2241 = addi %2240, %c1 : index + %2242 = load %2[%2196, %2201, %2241] : memref<16x6x2xvector<8xf32>> + %2243 = vector.extractelement %2242[%c3_i64 : i64] : vector<8xf32> + %2244 = addi %arg5, %c8 : index + %2245 = cmpi "slt", %2244, %c0 : index + %2246 = subi %c-1, %2244 : index + %2247 = select %2245, %2246, %2244 : index + %2248 = divi_signed %2247, %c16 : index + %2249 = subi %c-1, %2248 : index + %2250 = select %2245, %2249, %2248 : index + %2251 = remi_signed %2250, %c16 : index + %2252 = cmpi "slt", %2251, %c0 : index + %2253 = addi %2251, %c16 : index + %2254 = select %2252, %2253, %2251 : index + %2255 = addi %arg7, %arg9 : 
index + %2256 = remi_signed %2255, %c6 : index + %2257 = cmpi "slt", %2256, %c0 : index + %2258 = addi %2256, %c6 : index + %2259 = select %2257, %2258, %2256 : index + %2260 = cmpi "slt", %arg5, %c0 : index + %2261 = subi %c-1, %arg5 : index + %2262 = select %2260, %2261, %arg5 : index + %2263 = divi_signed %2262, %c8 : index + %2264 = subi %c-1, %2263 : index + %2265 = select %2260, %2264, %2263 : index + %2266 = addi %arg5, %c8 : index + %2267 = cmpi "slt", %2266, %c0 : index + %2268 = subi %c-1, %2266 : index + %2269 = select %2267, %2268, %2266 : index + %2270 = divi_signed %2269, %c16 : index + %2271 = subi %c-1, %2270 : index + %2272 = select %2267, %2271, %2270 : index + %2273 = muli %2272, %c-2 : index + %2274 = addi %2265, %2273 : index + %2275 = cmpi "slt", %arg5, %c0 : index + %2276 = subi %c-1, %arg5 : index + %2277 = select %2275, %2276, %arg5 : index + %2278 = divi_signed %2277, %c8 : index + %2279 = subi %c-1, %2278 : index + %2280 = select %2275, %2279, %2278 : index + %2281 = addi %arg5, %c8 : index + %2282 = cmpi "slt", %2281, %c0 : index + %2283 = subi %c-1, %2281 : index + %2284 = select %2282, %2283, %2281 : index + %2285 = divi_signed %2284, %c16 : index + %2286 = subi %c-1, %2285 : index + %2287 = select %2282, %2286, %2285 : index + %2288 = muli %2287, %c-2 : index + %2289 = addi %2280, %2288 : index + %2290 = addi %2289, %c1 : index + %2291 = cmpi "slt", %2290, %c0 : index + %2292 = subi %c-1, %2290 : index + %2293 = select %2291, %2292, %2290 : index + %2294 = divi_signed %2293, %c2 : index + %2295 = subi %c-1, %2294 : index + %2296 = select %2291, %2295, %2294 : index + %2297 = muli %2296, %c-2 : index + %2298 = addi %2274, %2297 : index + %2299 = addi %2298, %c1 : index + %2300 = load %2[%2254, %2259, %2299] : memref<16x6x2xvector<8xf32>> + %2301 = vector.extractelement %2300[%c4_i64 : i64] : vector<8xf32> + %2302 = addi %arg5, %c8 : index + %2303 = cmpi "slt", %2302, %c0 : index + %2304 = subi %c-1, %2302 : index + %2305 = select %2303, %2304, %2302 : index + %2306 = divi_signed %2305, %c16 : index + %2307 = subi %c-1, %2306 : index + %2308 = select %2303, %2307, %2306 : index + %2309 = remi_signed %2308, %c16 : index + %2310 = cmpi "slt", %2309, %c0 : index + %2311 = addi %2309, %c16 : index + %2312 = select %2310, %2311, %2309 : index + %2313 = addi %arg7, %arg9 : index + %2314 = remi_signed %2313, %c6 : index + %2315 = cmpi "slt", %2314, %c0 : index + %2316 = addi %2314, %c6 : index + %2317 = select %2315, %2316, %2314 : index + %2318 = cmpi "slt", %arg5, %c0 : index + %2319 = subi %c-1, %arg5 : index + %2320 = select %2318, %2319, %arg5 : index + %2321 = divi_signed %2320, %c8 : index + %2322 = subi %c-1, %2321 : index + %2323 = select %2318, %2322, %2321 : index + %2324 = addi %arg5, %c8 : index + %2325 = cmpi "slt", %2324, %c0 : index + %2326 = subi %c-1, %2324 : index + %2327 = select %2325, %2326, %2324 : index + %2328 = divi_signed %2327, %c16 : index + %2329 = subi %c-1, %2328 : index + %2330 = select %2325, %2329, %2328 : index + %2331 = muli %2330, %c-2 : index + %2332 = addi %2323, %2331 : index + %2333 = cmpi "slt", %arg5, %c0 : index + %2334 = subi %c-1, %arg5 : index + %2335 = select %2333, %2334, %arg5 : index + %2336 = divi_signed %2335, %c8 : index + %2337 = subi %c-1, %2336 : index + %2338 = select %2333, %2337, %2336 : index + %2339 = addi %arg5, %c8 : index + %2340 = cmpi "slt", %2339, %c0 : index + %2341 = subi %c-1, %2339 : index + %2342 = select %2340, %2341, %2339 : index + %2343 = divi_signed %2342, %c16 : index + %2344 = subi %c-1, 
%2343 : index + %2345 = select %2340, %2344, %2343 : index + %2346 = muli %2345, %c-2 : index + %2347 = addi %2338, %2346 : index + %2348 = addi %2347, %c1 : index + %2349 = cmpi "slt", %2348, %c0 : index + %2350 = subi %c-1, %2348 : index + %2351 = select %2349, %2350, %2348 : index + %2352 = divi_signed %2351, %c2 : index + %2353 = subi %c-1, %2352 : index + %2354 = select %2349, %2353, %2352 : index + %2355 = muli %2354, %c-2 : index + %2356 = addi %2332, %2355 : index + %2357 = addi %2356, %c1 : index + %2358 = load %2[%2312, %2317, %2357] : memref<16x6x2xvector<8xf32>> + %2359 = vector.extractelement %2358[%c5_i64 : i64] : vector<8xf32> + %2360 = addi %arg5, %c8 : index + %2361 = cmpi "slt", %2360, %c0 : index + %2362 = subi %c-1, %2360 : index + %2363 = select %2361, %2362, %2360 : index + %2364 = divi_signed %2363, %c16 : index + %2365 = subi %c-1, %2364 : index + %2366 = select %2361, %2365, %2364 : index + %2367 = remi_signed %2366, %c16 : index + %2368 = cmpi "slt", %2367, %c0 : index + %2369 = addi %2367, %c16 : index + %2370 = select %2368, %2369, %2367 : index + %2371 = addi %arg7, %arg9 : index + %2372 = remi_signed %2371, %c6 : index + %2373 = cmpi "slt", %2372, %c0 : index + %2374 = addi %2372, %c6 : index + %2375 = select %2373, %2374, %2372 : index + %2376 = cmpi "slt", %arg5, %c0 : index + %2377 = subi %c-1, %arg5 : index + %2378 = select %2376, %2377, %arg5 : index + %2379 = divi_signed %2378, %c8 : index + %2380 = subi %c-1, %2379 : index + %2381 = select %2376, %2380, %2379 : index + %2382 = addi %arg5, %c8 : index + %2383 = cmpi "slt", %2382, %c0 : index + %2384 = subi %c-1, %2382 : index + %2385 = select %2383, %2384, %2382 : index + %2386 = divi_signed %2385, %c16 : index + %2387 = subi %c-1, %2386 : index + %2388 = select %2383, %2387, %2386 : index + %2389 = muli %2388, %c-2 : index + %2390 = addi %2381, %2389 : index + %2391 = cmpi "slt", %arg5, %c0 : index + %2392 = subi %c-1, %arg5 : index + %2393 = select %2391, %2392, %arg5 : index + %2394 = divi_signed %2393, %c8 : index + %2395 = subi %c-1, %2394 : index + %2396 = select %2391, %2395, %2394 : index + %2397 = addi %arg5, %c8 : index + %2398 = cmpi "slt", %2397, %c0 : index + %2399 = subi %c-1, %2397 : index + %2400 = select %2398, %2399, %2397 : index + %2401 = divi_signed %2400, %c16 : index + %2402 = subi %c-1, %2401 : index + %2403 = select %2398, %2402, %2401 : index + %2404 = muli %2403, %c-2 : index + %2405 = addi %2396, %2404 : index + %2406 = addi %2405, %c1 : index + %2407 = cmpi "slt", %2406, %c0 : index + %2408 = subi %c-1, %2406 : index + %2409 = select %2407, %2408, %2406 : index + %2410 = divi_signed %2409, %c2 : index + %2411 = subi %c-1, %2410 : index + %2412 = select %2407, %2411, %2410 : index + %2413 = muli %2412, %c-2 : index + %2414 = addi %2390, %2413 : index + %2415 = addi %2414, %c1 : index + %2416 = load %2[%2370, %2375, %2415] : memref<16x6x2xvector<8xf32>> + %2417 = vector.extractelement %2416[%c6_i64 : i64] : vector<8xf32> + %2418 = addi %arg5, %c8 : index + %2419 = cmpi "slt", %2418, %c0 : index + %2420 = subi %c-1, %2418 : index + %2421 = select %2419, %2420, %2418 : index + %2422 = divi_signed %2421, %c16 : index + %2423 = subi %c-1, %2422 : index + %2424 = select %2419, %2423, %2422 : index + %2425 = remi_signed %2424, %c16 : index + %2426 = cmpi "slt", %2425, %c0 : index + %2427 = addi %2425, %c16 : index + %2428 = select %2426, %2427, %2425 : index + %2429 = addi %arg7, %arg9 : index + %2430 = remi_signed %2429, %c6 : index + %2431 = cmpi "slt", %2430, %c0 : index + %2432 = 
addi %2430, %c6 : index + %2433 = select %2431, %2432, %2430 : index + %2434 = cmpi "slt", %arg5, %c0 : index + %2435 = subi %c-1, %arg5 : index + %2436 = select %2434, %2435, %arg5 : index + %2437 = divi_signed %2436, %c8 : index + %2438 = subi %c-1, %2437 : index + %2439 = select %2434, %2438, %2437 : index + %2440 = addi %arg5, %c8 : index + %2441 = cmpi "slt", %2440, %c0 : index + %2442 = subi %c-1, %2440 : index + %2443 = select %2441, %2442, %2440 : index + %2444 = divi_signed %2443, %c16 : index + %2445 = subi %c-1, %2444 : index + %2446 = select %2441, %2445, %2444 : index + %2447 = muli %2446, %c-2 : index + %2448 = addi %2439, %2447 : index + %2449 = cmpi "slt", %arg5, %c0 : index + %2450 = subi %c-1, %arg5 : index + %2451 = select %2449, %2450, %arg5 : index + %2452 = divi_signed %2451, %c8 : index + %2453 = subi %c-1, %2452 : index + %2454 = select %2449, %2453, %2452 : index + %2455 = addi %arg5, %c8 : index + %2456 = cmpi "slt", %2455, %c0 : index + %2457 = subi %c-1, %2455 : index + %2458 = select %2456, %2457, %2455 : index + %2459 = divi_signed %2458, %c16 : index + %2460 = subi %c-1, %2459 : index + %2461 = select %2456, %2460, %2459 : index + %2462 = muli %2461, %c-2 : index + %2463 = addi %2454, %2462 : index + %2464 = addi %2463, %c1 : index + %2465 = cmpi "slt", %2464, %c0 : index + %2466 = subi %c-1, %2464 : index + %2467 = select %2465, %2466, %2464 : index + %2468 = divi_signed %2467, %c2 : index + %2469 = subi %c-1, %2468 : index + %2470 = select %2465, %2469, %2468 : index + %2471 = muli %2470, %c-2 : index + %2472 = addi %2448, %2471 : index + %2473 = addi %2472, %c1 : index + %2474 = load %2[%2428, %2433, %2473] : memref<16x6x2xvector<8xf32>> + %2475 = vector.extractelement %2474[%c7_i64 : i64] : vector<8xf32> + %2476 = addf %2069, %2004 {RelaxedPrecision} : f32 + %2477 = addf %2127, %2005 {RelaxedPrecision} : f32 + %2478 = addf %2185, %2006 {RelaxedPrecision} : f32 + %2479 = addf %2243, %2007 {RelaxedPrecision} : f32 + %2480 = addf %2301, %2008 {RelaxedPrecision} : f32 + %2481 = addf %2359, %2009 {RelaxedPrecision} : f32 + %2482 = addf %2417, %2010 {RelaxedPrecision} : f32 + %2483 = addf %2475, %2011 {RelaxedPrecision} : f32 + %2484 = addi %arg5, %c8 : index + %2485 = cmpi "slt", %2484, %c0 : index + %2486 = subi %c-1, %2484 : index + %2487 = select %2485, %2486, %2484 : index + %2488 = divi_signed %2487, %c16 : index + %2489 = subi %c-1, %2488 : index + %2490 = select %2485, %2489, %2488 : index + %2491 = remi_signed %2490, %c16 : index + %2492 = cmpi "slt", %2491, %c0 : index + %2493 = addi %2491, %c16 : index + %2494 = select %2492, %2493, %2491 : index + %2495 = addi %arg7, %arg9 : index + %2496 = remi_signed %2495, %c6 : index + %2497 = cmpi "slt", %2496, %c0 : index + %2498 = addi %2496, %c6 : index + %2499 = select %2497, %2498, %2496 : index + %2500 = cmpi "slt", %arg5, %c0 : index + %2501 = subi %c-1, %arg5 : index + %2502 = select %2500, %2501, %arg5 : index + %2503 = divi_signed %2502, %c8 : index + %2504 = subi %c-1, %2503 : index + %2505 = select %2500, %2504, %2503 : index + %2506 = addi %arg5, %c8 : index + %2507 = cmpi "slt", %2506, %c0 : index + %2508 = subi %c-1, %2506 : index + %2509 = select %2507, %2508, %2506 : index + %2510 = divi_signed %2509, %c16 : index + %2511 = subi %c-1, %2510 : index + %2512 = select %2507, %2511, %2510 : index + %2513 = muli %2512, %c-2 : index + %2514 = addi %2505, %2513 : index + %2515 = cmpi "slt", %arg5, %c0 : index + %2516 = subi %c-1, %arg5 : index + %2517 = select %2515, %2516, %arg5 : index + %2518 = 
divi_signed %2517, %c8 : index + %2519 = subi %c-1, %2518 : index + %2520 = select %2515, %2519, %2518 : index + %2521 = addi %arg5, %c8 : index + %2522 = cmpi "slt", %2521, %c0 : index + %2523 = subi %c-1, %2521 : index + %2524 = select %2522, %2523, %2521 : index + %2525 = divi_signed %2524, %c16 : index + %2526 = subi %c-1, %2525 : index + %2527 = select %2522, %2526, %2525 : index + %2528 = muli %2527, %c-2 : index + %2529 = addi %2520, %2528 : index + %2530 = addi %2529, %c1 : index + %2531 = cmpi "slt", %2530, %c0 : index + %2532 = subi %c-1, %2530 : index + %2533 = select %2531, %2532, %2530 : index + %2534 = divi_signed %2533, %c2 : index + %2535 = subi %c-1, %2534 : index + %2536 = select %2531, %2535, %2534 : index + %2537 = muli %2536, %c-2 : index + %2538 = addi %2514, %2537 : index + %2539 = addi %2538, %c1 : index + %2540 = load %2[%2494, %2499, %2539] : memref<16x6x2xvector<8xf32>> + %2541 = vector.insertelement %2476, %2540[%c0_i64 : i64] : vector<8xf32> + %2542 = addi %arg5, %c8 : index + %2543 = cmpi "slt", %2542, %c0 : index + %2544 = subi %c-1, %2542 : index + %2545 = select %2543, %2544, %2542 : index + %2546 = divi_signed %2545, %c16 : index + %2547 = subi %c-1, %2546 : index + %2548 = select %2543, %2547, %2546 : index + %2549 = remi_signed %2548, %c16 : index + %2550 = cmpi "slt", %2549, %c0 : index + %2551 = addi %2549, %c16 : index + %2552 = select %2550, %2551, %2549 : index + %2553 = addi %arg7, %arg9 : index + %2554 = remi_signed %2553, %c6 : index + %2555 = cmpi "slt", %2554, %c0 : index + %2556 = addi %2554, %c6 : index + %2557 = select %2555, %2556, %2554 : index + %2558 = cmpi "slt", %arg5, %c0 : index + %2559 = subi %c-1, %arg5 : index + %2560 = select %2558, %2559, %arg5 : index + %2561 = divi_signed %2560, %c8 : index + %2562 = subi %c-1, %2561 : index + %2563 = select %2558, %2562, %2561 : index + %2564 = addi %arg5, %c8 : index + %2565 = cmpi "slt", %2564, %c0 : index + %2566 = subi %c-1, %2564 : index + %2567 = select %2565, %2566, %2564 : index + %2568 = divi_signed %2567, %c16 : index + %2569 = subi %c-1, %2568 : index + %2570 = select %2565, %2569, %2568 : index + %2571 = muli %2570, %c-2 : index + %2572 = addi %2563, %2571 : index + %2573 = cmpi "slt", %arg5, %c0 : index + %2574 = subi %c-1, %arg5 : index + %2575 = select %2573, %2574, %arg5 : index + %2576 = divi_signed %2575, %c8 : index + %2577 = subi %c-1, %2576 : index + %2578 = select %2573, %2577, %2576 : index + %2579 = addi %arg5, %c8 : index + %2580 = cmpi "slt", %2579, %c0 : index + %2581 = subi %c-1, %2579 : index + %2582 = select %2580, %2581, %2579 : index + %2583 = divi_signed %2582, %c16 : index + %2584 = subi %c-1, %2583 : index + %2585 = select %2580, %2584, %2583 : index + %2586 = muli %2585, %c-2 : index + %2587 = addi %2578, %2586 : index + %2588 = addi %2587, %c1 : index + %2589 = cmpi "slt", %2588, %c0 : index + %2590 = subi %c-1, %2588 : index + %2591 = select %2589, %2590, %2588 : index + %2592 = divi_signed %2591, %c2 : index + %2593 = subi %c-1, %2592 : index + %2594 = select %2589, %2593, %2592 : index + %2595 = muli %2594, %c-2 : index + %2596 = addi %2572, %2595 : index + %2597 = addi %2596, %c1 : index + store %2541, %2[%2552, %2557, %2597] : memref<16x6x2xvector<8xf32>> + %2598 = addi %arg5, %c8 : index + %2599 = cmpi "slt", %2598, %c0 : index + %2600 = subi %c-1, %2598 : index + %2601 = select %2599, %2600, %2598 : index + %2602 = divi_signed %2601, %c16 : index + %2603 = subi %c-1, %2602 : index + %2604 = select %2599, %2603, %2602 : index + %2605 = remi_signed 
%2604, %c16 : index + %2606 = cmpi "slt", %2605, %c0 : index + %2607 = addi %2605, %c16 : index + %2608 = select %2606, %2607, %2605 : index + %2609 = addi %arg7, %arg9 : index + %2610 = remi_signed %2609, %c6 : index + %2611 = cmpi "slt", %2610, %c0 : index + %2612 = addi %2610, %c6 : index + %2613 = select %2611, %2612, %2610 : index + %2614 = cmpi "slt", %arg5, %c0 : index + %2615 = subi %c-1, %arg5 : index + %2616 = select %2614, %2615, %arg5 : index + %2617 = divi_signed %2616, %c8 : index + %2618 = subi %c-1, %2617 : index + %2619 = select %2614, %2618, %2617 : index + %2620 = addi %arg5, %c8 : index + %2621 = cmpi "slt", %2620, %c0 : index + %2622 = subi %c-1, %2620 : index + %2623 = select %2621, %2622, %2620 : index + %2624 = divi_signed %2623, %c16 : index + %2625 = subi %c-1, %2624 : index + %2626 = select %2621, %2625, %2624 : index + %2627 = muli %2626, %c-2 : index + %2628 = addi %2619, %2627 : index + %2629 = cmpi "slt", %arg5, %c0 : index + %2630 = subi %c-1, %arg5 : index + %2631 = select %2629, %2630, %arg5 : index + %2632 = divi_signed %2631, %c8 : index + %2633 = subi %c-1, %2632 : index + %2634 = select %2629, %2633, %2632 : index + %2635 = addi %arg5, %c8 : index + %2636 = cmpi "slt", %2635, %c0 : index + %2637 = subi %c-1, %2635 : index + %2638 = select %2636, %2637, %2635 : index + %2639 = divi_signed %2638, %c16 : index + %2640 = subi %c-1, %2639 : index + %2641 = select %2636, %2640, %2639 : index + %2642 = muli %2641, %c-2 : index + %2643 = addi %2634, %2642 : index + %2644 = addi %2643, %c1 : index + %2645 = cmpi "slt", %2644, %c0 : index + %2646 = subi %c-1, %2644 : index + %2647 = select %2645, %2646, %2644 : index + %2648 = divi_signed %2647, %c2 : index + %2649 = subi %c-1, %2648 : index + %2650 = select %2645, %2649, %2648 : index + %2651 = muli %2650, %c-2 : index + %2652 = addi %2628, %2651 : index + %2653 = addi %2652, %c1 : index + %2654 = load %2[%2608, %2613, %2653] : memref<16x6x2xvector<8xf32>> + %2655 = vector.insertelement %2477, %2654[%c1_i64 : i64] : vector<8xf32> + %2656 = addi %arg5, %c8 : index + %2657 = cmpi "slt", %2656, %c0 : index + %2658 = subi %c-1, %2656 : index + %2659 = select %2657, %2658, %2656 : index + %2660 = divi_signed %2659, %c16 : index + %2661 = subi %c-1, %2660 : index + %2662 = select %2657, %2661, %2660 : index + %2663 = remi_signed %2662, %c16 : index + %2664 = cmpi "slt", %2663, %c0 : index + %2665 = addi %2663, %c16 : index + %2666 = select %2664, %2665, %2663 : index + %2667 = addi %arg7, %arg9 : index + %2668 = remi_signed %2667, %c6 : index + %2669 = cmpi "slt", %2668, %c0 : index + %2670 = addi %2668, %c6 : index + %2671 = select %2669, %2670, %2668 : index + %2672 = cmpi "slt", %arg5, %c0 : index + %2673 = subi %c-1, %arg5 : index + %2674 = select %2672, %2673, %arg5 : index + %2675 = divi_signed %2674, %c8 : index + %2676 = subi %c-1, %2675 : index + %2677 = select %2672, %2676, %2675 : index + %2678 = addi %arg5, %c8 : index + %2679 = cmpi "slt", %2678, %c0 : index + %2680 = subi %c-1, %2678 : index + %2681 = select %2679, %2680, %2678 : index + %2682 = divi_signed %2681, %c16 : index + %2683 = subi %c-1, %2682 : index + %2684 = select %2679, %2683, %2682 : index + %2685 = muli %2684, %c-2 : index + %2686 = addi %2677, %2685 : index + %2687 = cmpi "slt", %arg5, %c0 : index + %2688 = subi %c-1, %arg5 : index + %2689 = select %2687, %2688, %arg5 : index + %2690 = divi_signed %2689, %c8 : index + %2691 = subi %c-1, %2690 : index + %2692 = select %2687, %2691, %2690 : index + %2693 = addi %arg5, %c8 : index + %2694 
= cmpi "slt", %2693, %c0 : index + %2695 = subi %c-1, %2693 : index + %2696 = select %2694, %2695, %2693 : index + %2697 = divi_signed %2696, %c16 : index + %2698 = subi %c-1, %2697 : index + %2699 = select %2694, %2698, %2697 : index + %2700 = muli %2699, %c-2 : index + %2701 = addi %2692, %2700 : index + %2702 = addi %2701, %c1 : index + %2703 = cmpi "slt", %2702, %c0 : index + %2704 = subi %c-1, %2702 : index + %2705 = select %2703, %2704, %2702 : index + %2706 = divi_signed %2705, %c2 : index + %2707 = subi %c-1, %2706 : index + %2708 = select %2703, %2707, %2706 : index + %2709 = muli %2708, %c-2 : index + %2710 = addi %2686, %2709 : index + %2711 = addi %2710, %c1 : index + store %2655, %2[%2666, %2671, %2711] : memref<16x6x2xvector<8xf32>> + %2712 = addi %arg5, %c8 : index + %2713 = cmpi "slt", %2712, %c0 : index + %2714 = subi %c-1, %2712 : index + %2715 = select %2713, %2714, %2712 : index + %2716 = divi_signed %2715, %c16 : index + %2717 = subi %c-1, %2716 : index + %2718 = select %2713, %2717, %2716 : index + %2719 = remi_signed %2718, %c16 : index + %2720 = cmpi "slt", %2719, %c0 : index + %2721 = addi %2719, %c16 : index + %2722 = select %2720, %2721, %2719 : index + %2723 = addi %arg7, %arg9 : index + %2724 = remi_signed %2723, %c6 : index + %2725 = cmpi "slt", %2724, %c0 : index + %2726 = addi %2724, %c6 : index + %2727 = select %2725, %2726, %2724 : index + %2728 = cmpi "slt", %arg5, %c0 : index + %2729 = subi %c-1, %arg5 : index + %2730 = select %2728, %2729, %arg5 : index + %2731 = divi_signed %2730, %c8 : index + %2732 = subi %c-1, %2731 : index + %2733 = select %2728, %2732, %2731 : index + %2734 = addi %arg5, %c8 : index + %2735 = cmpi "slt", %2734, %c0 : index + %2736 = subi %c-1, %2734 : index + %2737 = select %2735, %2736, %2734 : index + %2738 = divi_signed %2737, %c16 : index + %2739 = subi %c-1, %2738 : index + %2740 = select %2735, %2739, %2738 : index + %2741 = muli %2740, %c-2 : index + %2742 = addi %2733, %2741 : index + %2743 = cmpi "slt", %arg5, %c0 : index + %2744 = subi %c-1, %arg5 : index + %2745 = select %2743, %2744, %arg5 : index + %2746 = divi_signed %2745, %c8 : index + %2747 = subi %c-1, %2746 : index + %2748 = select %2743, %2747, %2746 : index + %2749 = addi %arg5, %c8 : index + %2750 = cmpi "slt", %2749, %c0 : index + %2751 = subi %c-1, %2749 : index + %2752 = select %2750, %2751, %2749 : index + %2753 = divi_signed %2752, %c16 : index + %2754 = subi %c-1, %2753 : index + %2755 = select %2750, %2754, %2753 : index + %2756 = muli %2755, %c-2 : index + %2757 = addi %2748, %2756 : index + %2758 = addi %2757, %c1 : index + %2759 = cmpi "slt", %2758, %c0 : index + %2760 = subi %c-1, %2758 : index + %2761 = select %2759, %2760, %2758 : index + %2762 = divi_signed %2761, %c2 : index + %2763 = subi %c-1, %2762 : index + %2764 = select %2759, %2763, %2762 : index + %2765 = muli %2764, %c-2 : index + %2766 = addi %2742, %2765 : index + %2767 = addi %2766, %c1 : index + %2768 = load %2[%2722, %2727, %2767] : memref<16x6x2xvector<8xf32>> + %2769 = vector.insertelement %2478, %2768[%c2_i64 : i64] : vector<8xf32> + %2770 = addi %arg5, %c8 : index + %2771 = cmpi "slt", %2770, %c0 : index + %2772 = subi %c-1, %2770 : index + %2773 = select %2771, %2772, %2770 : index + %2774 = divi_signed %2773, %c16 : index + %2775 = subi %c-1, %2774 : index + %2776 = select %2771, %2775, %2774 : index + %2777 = remi_signed %2776, %c16 : index + %2778 = cmpi "slt", %2777, %c0 : index + %2779 = addi %2777, %c16 : index + %2780 = select %2778, %2779, %2777 : index + %2781 = addi 
%arg7, %arg9 : index + %2782 = remi_signed %2781, %c6 : index + %2783 = cmpi "slt", %2782, %c0 : index + %2784 = addi %2782, %c6 : index + %2785 = select %2783, %2784, %2782 : index + %2786 = cmpi "slt", %arg5, %c0 : index + %2787 = subi %c-1, %arg5 : index + %2788 = select %2786, %2787, %arg5 : index + %2789 = divi_signed %2788, %c8 : index + %2790 = subi %c-1, %2789 : index + %2791 = select %2786, %2790, %2789 : index + %2792 = addi %arg5, %c8 : index + %2793 = cmpi "slt", %2792, %c0 : index + %2794 = subi %c-1, %2792 : index + %2795 = select %2793, %2794, %2792 : index + %2796 = divi_signed %2795, %c16 : index + %2797 = subi %c-1, %2796 : index + %2798 = select %2793, %2797, %2796 : index + %2799 = muli %2798, %c-2 : index + %2800 = addi %2791, %2799 : index + %2801 = cmpi "slt", %arg5, %c0 : index + %2802 = subi %c-1, %arg5 : index + %2803 = select %2801, %2802, %arg5 : index + %2804 = divi_signed %2803, %c8 : index + %2805 = subi %c-1, %2804 : index + %2806 = select %2801, %2805, %2804 : index + %2807 = addi %arg5, %c8 : index + %2808 = cmpi "slt", %2807, %c0 : index + %2809 = subi %c-1, %2807 : index + %2810 = select %2808, %2809, %2807 : index + %2811 = divi_signed %2810, %c16 : index + %2812 = subi %c-1, %2811 : index + %2813 = select %2808, %2812, %2811 : index + %2814 = muli %2813, %c-2 : index + %2815 = addi %2806, %2814 : index + %2816 = addi %2815, %c1 : index + %2817 = cmpi "slt", %2816, %c0 : index + %2818 = subi %c-1, %2816 : index + %2819 = select %2817, %2818, %2816 : index + %2820 = divi_signed %2819, %c2 : index + %2821 = subi %c-1, %2820 : index + %2822 = select %2817, %2821, %2820 : index + %2823 = muli %2822, %c-2 : index + %2824 = addi %2800, %2823 : index + %2825 = addi %2824, %c1 : index + store %2769, %2[%2780, %2785, %2825] : memref<16x6x2xvector<8xf32>> + %2826 = addi %arg5, %c8 : index + %2827 = cmpi "slt", %2826, %c0 : index + %2828 = subi %c-1, %2826 : index + %2829 = select %2827, %2828, %2826 : index + %2830 = divi_signed %2829, %c16 : index + %2831 = subi %c-1, %2830 : index + %2832 = select %2827, %2831, %2830 : index + %2833 = remi_signed %2832, %c16 : index + %2834 = cmpi "slt", %2833, %c0 : index + %2835 = addi %2833, %c16 : index + %2836 = select %2834, %2835, %2833 : index + %2837 = addi %arg7, %arg9 : index + %2838 = remi_signed %2837, %c6 : index + %2839 = cmpi "slt", %2838, %c0 : index + %2840 = addi %2838, %c6 : index + %2841 = select %2839, %2840, %2838 : index + %2842 = cmpi "slt", %arg5, %c0 : index + %2843 = subi %c-1, %arg5 : index + %2844 = select %2842, %2843, %arg5 : index + %2845 = divi_signed %2844, %c8 : index + %2846 = subi %c-1, %2845 : index + %2847 = select %2842, %2846, %2845 : index + %2848 = addi %arg5, %c8 : index + %2849 = cmpi "slt", %2848, %c0 : index + %2850 = subi %c-1, %2848 : index + %2851 = select %2849, %2850, %2848 : index + %2852 = divi_signed %2851, %c16 : index + %2853 = subi %c-1, %2852 : index + %2854 = select %2849, %2853, %2852 : index + %2855 = muli %2854, %c-2 : index + %2856 = addi %2847, %2855 : index + %2857 = cmpi "slt", %arg5, %c0 : index + %2858 = subi %c-1, %arg5 : index + %2859 = select %2857, %2858, %arg5 : index + %2860 = divi_signed %2859, %c8 : index + %2861 = subi %c-1, %2860 : index + %2862 = select %2857, %2861, %2860 : index + %2863 = addi %arg5, %c8 : index + %2864 = cmpi "slt", %2863, %c0 : index + %2865 = subi %c-1, %2863 : index + %2866 = select %2864, %2865, %2863 : index + %2867 = divi_signed %2866, %c16 : index + %2868 = subi %c-1, %2867 : index + %2869 = select %2864, %2868, %2867 : 
index + %2870 = muli %2869, %c-2 : index + %2871 = addi %2862, %2870 : index + %2872 = addi %2871, %c1 : index + %2873 = cmpi "slt", %2872, %c0 : index + %2874 = subi %c-1, %2872 : index + %2875 = select %2873, %2874, %2872 : index + %2876 = divi_signed %2875, %c2 : index + %2877 = subi %c-1, %2876 : index + %2878 = select %2873, %2877, %2876 : index + %2879 = muli %2878, %c-2 : index + %2880 = addi %2856, %2879 : index + %2881 = addi %2880, %c1 : index + %2882 = load %2[%2836, %2841, %2881] : memref<16x6x2xvector<8xf32>> + %2883 = vector.insertelement %2479, %2882[%c3_i64 : i64] : vector<8xf32> + %2884 = addi %arg5, %c8 : index + %2885 = cmpi "slt", %2884, %c0 : index + %2886 = subi %c-1, %2884 : index + %2887 = select %2885, %2886, %2884 : index + %2888 = divi_signed %2887, %c16 : index + %2889 = subi %c-1, %2888 : index + %2890 = select %2885, %2889, %2888 : index + %2891 = remi_signed %2890, %c16 : index + %2892 = cmpi "slt", %2891, %c0 : index + %2893 = addi %2891, %c16 : index + %2894 = select %2892, %2893, %2891 : index + %2895 = addi %arg7, %arg9 : index + %2896 = remi_signed %2895, %c6 : index + %2897 = cmpi "slt", %2896, %c0 : index + %2898 = addi %2896, %c6 : index + %2899 = select %2897, %2898, %2896 : index + %2900 = cmpi "slt", %arg5, %c0 : index + %2901 = subi %c-1, %arg5 : index + %2902 = select %2900, %2901, %arg5 : index + %2903 = divi_signed %2902, %c8 : index + %2904 = subi %c-1, %2903 : index + %2905 = select %2900, %2904, %2903 : index + %2906 = addi %arg5, %c8 : index + %2907 = cmpi "slt", %2906, %c0 : index + %2908 = subi %c-1, %2906 : index + %2909 = select %2907, %2908, %2906 : index + %2910 = divi_signed %2909, %c16 : index + %2911 = subi %c-1, %2910 : index + %2912 = select %2907, %2911, %2910 : index + %2913 = muli %2912, %c-2 : index + %2914 = addi %2905, %2913 : index + %2915 = cmpi "slt", %arg5, %c0 : index + %2916 = subi %c-1, %arg5 : index + %2917 = select %2915, %2916, %arg5 : index + %2918 = divi_signed %2917, %c8 : index + %2919 = subi %c-1, %2918 : index + %2920 = select %2915, %2919, %2918 : index + %2921 = addi %arg5, %c8 : index + %2922 = cmpi "slt", %2921, %c0 : index + %2923 = subi %c-1, %2921 : index + %2924 = select %2922, %2923, %2921 : index + %2925 = divi_signed %2924, %c16 : index + %2926 = subi %c-1, %2925 : index + %2927 = select %2922, %2926, %2925 : index + %2928 = muli %2927, %c-2 : index + %2929 = addi %2920, %2928 : index + %2930 = addi %2929, %c1 : index + %2931 = cmpi "slt", %2930, %c0 : index + %2932 = subi %c-1, %2930 : index + %2933 = select %2931, %2932, %2930 : index + %2934 = divi_signed %2933, %c2 : index + %2935 = subi %c-1, %2934 : index + %2936 = select %2931, %2935, %2934 : index + %2937 = muli %2936, %c-2 : index + %2938 = addi %2914, %2937 : index + %2939 = addi %2938, %c1 : index + store %2883, %2[%2894, %2899, %2939] : memref<16x6x2xvector<8xf32>> + %2940 = addi %arg5, %c8 : index + %2941 = cmpi "slt", %2940, %c0 : index + %2942 = subi %c-1, %2940 : index + %2943 = select %2941, %2942, %2940 : index + %2944 = divi_signed %2943, %c16 : index + %2945 = subi %c-1, %2944 : index + %2946 = select %2941, %2945, %2944 : index + %2947 = remi_signed %2946, %c16 : index + %2948 = cmpi "slt", %2947, %c0 : index + %2949 = addi %2947, %c16 : index + %2950 = select %2948, %2949, %2947 : index + %2951 = addi %arg7, %arg9 : index + %2952 = remi_signed %2951, %c6 : index + %2953 = cmpi "slt", %2952, %c0 : index + %2954 = addi %2952, %c6 : index + %2955 = select %2953, %2954, %2952 : index + %2956 = cmpi "slt", %arg5, %c0 : index + 
%2957 = subi %c-1, %arg5 : index + %2958 = select %2956, %2957, %arg5 : index + %2959 = divi_signed %2958, %c8 : index + %2960 = subi %c-1, %2959 : index + %2961 = select %2956, %2960, %2959 : index + %2962 = addi %arg5, %c8 : index + %2963 = cmpi "slt", %2962, %c0 : index + %2964 = subi %c-1, %2962 : index + %2965 = select %2963, %2964, %2962 : index + %2966 = divi_signed %2965, %c16 : index + %2967 = subi %c-1, %2966 : index + %2968 = select %2963, %2967, %2966 : index + %2969 = muli %2968, %c-2 : index + %2970 = addi %2961, %2969 : index + %2971 = cmpi "slt", %arg5, %c0 : index + %2972 = subi %c-1, %arg5 : index + %2973 = select %2971, %2972, %arg5 : index + %2974 = divi_signed %2973, %c8 : index + %2975 = subi %c-1, %2974 : index + %2976 = select %2971, %2975, %2974 : index + %2977 = addi %arg5, %c8 : index + %2978 = cmpi "slt", %2977, %c0 : index + %2979 = subi %c-1, %2977 : index + %2980 = select %2978, %2979, %2977 : index + %2981 = divi_signed %2980, %c16 : index + %2982 = subi %c-1, %2981 : index + %2983 = select %2978, %2982, %2981 : index + %2984 = muli %2983, %c-2 : index + %2985 = addi %2976, %2984 : index + %2986 = addi %2985, %c1 : index + %2987 = cmpi "slt", %2986, %c0 : index + %2988 = subi %c-1, %2986 : index + %2989 = select %2987, %2988, %2986 : index + %2990 = divi_signed %2989, %c2 : index + %2991 = subi %c-1, %2990 : index + %2992 = select %2987, %2991, %2990 : index + %2993 = muli %2992, %c-2 : index + %2994 = addi %2970, %2993 : index + %2995 = addi %2994, %c1 : index + %2996 = load %2[%2950, %2955, %2995] : memref<16x6x2xvector<8xf32>> + %2997 = vector.insertelement %2480, %2996[%c4_i64 : i64] : vector<8xf32> + %2998 = addi %arg5, %c8 : index + %2999 = cmpi "slt", %2998, %c0 : index + %3000 = subi %c-1, %2998 : index + %3001 = select %2999, %3000, %2998 : index + %3002 = divi_signed %3001, %c16 : index + %3003 = subi %c-1, %3002 : index + %3004 = select %2999, %3003, %3002 : index + %3005 = remi_signed %3004, %c16 : index + %3006 = cmpi "slt", %3005, %c0 : index + %3007 = addi %3005, %c16 : index + %3008 = select %3006, %3007, %3005 : index + %3009 = addi %arg7, %arg9 : index + %3010 = remi_signed %3009, %c6 : index + %3011 = cmpi "slt", %3010, %c0 : index + %3012 = addi %3010, %c6 : index + %3013 = select %3011, %3012, %3010 : index + %3014 = cmpi "slt", %arg5, %c0 : index + %3015 = subi %c-1, %arg5 : index + %3016 = select %3014, %3015, %arg5 : index + %3017 = divi_signed %3016, %c8 : index + %3018 = subi %c-1, %3017 : index + %3019 = select %3014, %3018, %3017 : index + %3020 = addi %arg5, %c8 : index + %3021 = cmpi "slt", %3020, %c0 : index + %3022 = subi %c-1, %3020 : index + %3023 = select %3021, %3022, %3020 : index + %3024 = divi_signed %3023, %c16 : index + %3025 = subi %c-1, %3024 : index + %3026 = select %3021, %3025, %3024 : index + %3027 = muli %3026, %c-2 : index + %3028 = addi %3019, %3027 : index + %3029 = cmpi "slt", %arg5, %c0 : index + %3030 = subi %c-1, %arg5 : index + %3031 = select %3029, %3030, %arg5 : index + %3032 = divi_signed %3031, %c8 : index + %3033 = subi %c-1, %3032 : index + %3034 = select %3029, %3033, %3032 : index + %3035 = addi %arg5, %c8 : index + %3036 = cmpi "slt", %3035, %c0 : index + %3037 = subi %c-1, %3035 : index + %3038 = select %3036, %3037, %3035 : index + %3039 = divi_signed %3038, %c16 : index + %3040 = subi %c-1, %3039 : index + %3041 = select %3036, %3040, %3039 : index + %3042 = muli %3041, %c-2 : index + %3043 = addi %3034, %3042 : index + %3044 = addi %3043, %c1 : index + %3045 = cmpi "slt", %3044, %c0 : index 
+ %3046 = subi %c-1, %3044 : index + %3047 = select %3045, %3046, %3044 : index + %3048 = divi_signed %3047, %c2 : index + %3049 = subi %c-1, %3048 : index + %3050 = select %3045, %3049, %3048 : index + %3051 = muli %3050, %c-2 : index + %3052 = addi %3028, %3051 : index + %3053 = addi %3052, %c1 : index + store %2997, %2[%3008, %3013, %3053] : memref<16x6x2xvector<8xf32>> + %3054 = addi %arg5, %c8 : index + %3055 = cmpi "slt", %3054, %c0 : index + %3056 = subi %c-1, %3054 : index + %3057 = select %3055, %3056, %3054 : index + %3058 = divi_signed %3057, %c16 : index + %3059 = subi %c-1, %3058 : index + %3060 = select %3055, %3059, %3058 : index + %3061 = remi_signed %3060, %c16 : index + %3062 = cmpi "slt", %3061, %c0 : index + %3063 = addi %3061, %c16 : index + %3064 = select %3062, %3063, %3061 : index + %3065 = addi %arg7, %arg9 : index + %3066 = remi_signed %3065, %c6 : index + %3067 = cmpi "slt", %3066, %c0 : index + %3068 = addi %3066, %c6 : index + %3069 = select %3067, %3068, %3066 : index + %3070 = cmpi "slt", %arg5, %c0 : index + %3071 = subi %c-1, %arg5 : index + %3072 = select %3070, %3071, %arg5 : index + %3073 = divi_signed %3072, %c8 : index + %3074 = subi %c-1, %3073 : index + %3075 = select %3070, %3074, %3073 : index + %3076 = addi %arg5, %c8 : index + %3077 = cmpi "slt", %3076, %c0 : index + %3078 = subi %c-1, %3076 : index + %3079 = select %3077, %3078, %3076 : index + %3080 = divi_signed %3079, %c16 : index + %3081 = subi %c-1, %3080 : index + %3082 = select %3077, %3081, %3080 : index + %3083 = muli %3082, %c-2 : index + %3084 = addi %3075, %3083 : index + %3085 = cmpi "slt", %arg5, %c0 : index + %3086 = subi %c-1, %arg5 : index + %3087 = select %3085, %3086, %arg5 : index + %3088 = divi_signed %3087, %c8 : index + %3089 = subi %c-1, %3088 : index + %3090 = select %3085, %3089, %3088 : index + %3091 = addi %arg5, %c8 : index + %3092 = cmpi "slt", %3091, %c0 : index + %3093 = subi %c-1, %3091 : index + %3094 = select %3092, %3093, %3091 : index + %3095 = divi_signed %3094, %c16 : index + %3096 = subi %c-1, %3095 : index + %3097 = select %3092, %3096, %3095 : index + %3098 = muli %3097, %c-2 : index + %3099 = addi %3090, %3098 : index + %3100 = addi %3099, %c1 : index + %3101 = cmpi "slt", %3100, %c0 : index + %3102 = subi %c-1, %3100 : index + %3103 = select %3101, %3102, %3100 : index + %3104 = divi_signed %3103, %c2 : index + %3105 = subi %c-1, %3104 : index + %3106 = select %3101, %3105, %3104 : index + %3107 = muli %3106, %c-2 : index + %3108 = addi %3084, %3107 : index + %3109 = addi %3108, %c1 : index + %3110 = load %2[%3064, %3069, %3109] : memref<16x6x2xvector<8xf32>> + %3111 = vector.insertelement %2481, %3110[%c5_i64 : i64] : vector<8xf32> + %3112 = addi %arg5, %c8 : index + %3113 = cmpi "slt", %3112, %c0 : index + %3114 = subi %c-1, %3112 : index + %3115 = select %3113, %3114, %3112 : index + %3116 = divi_signed %3115, %c16 : index + %3117 = subi %c-1, %3116 : index + %3118 = select %3113, %3117, %3116 : index + %3119 = remi_signed %3118, %c16 : index + %3120 = cmpi "slt", %3119, %c0 : index + %3121 = addi %3119, %c16 : index + %3122 = select %3120, %3121, %3119 : index + %3123 = addi %arg7, %arg9 : index + %3124 = remi_signed %3123, %c6 : index + %3125 = cmpi "slt", %3124, %c0 : index + %3126 = addi %3124, %c6 : index + %3127 = select %3125, %3126, %3124 : index + %3128 = cmpi "slt", %arg5, %c0 : index + %3129 = subi %c-1, %arg5 : index + %3130 = select %3128, %3129, %arg5 : index + %3131 = divi_signed %3130, %c8 : index + %3132 = subi %c-1, %3131 : index + 
%3133 = select %3128, %3132, %3131 : index + %3134 = addi %arg5, %c8 : index + %3135 = cmpi "slt", %3134, %c0 : index + %3136 = subi %c-1, %3134 : index + %3137 = select %3135, %3136, %3134 : index + %3138 = divi_signed %3137, %c16 : index + %3139 = subi %c-1, %3138 : index + %3140 = select %3135, %3139, %3138 : index + %3141 = muli %3140, %c-2 : index + %3142 = addi %3133, %3141 : index + %3143 = cmpi "slt", %arg5, %c0 : index + %3144 = subi %c-1, %arg5 : index + %3145 = select %3143, %3144, %arg5 : index + %3146 = divi_signed %3145, %c8 : index + %3147 = subi %c-1, %3146 : index + %3148 = select %3143, %3147, %3146 : index + %3149 = addi %arg5, %c8 : index + %3150 = cmpi "slt", %3149, %c0 : index + %3151 = subi %c-1, %3149 : index + %3152 = select %3150, %3151, %3149 : index + %3153 = divi_signed %3152, %c16 : index + %3154 = subi %c-1, %3153 : index + %3155 = select %3150, %3154, %3153 : index + %3156 = muli %3155, %c-2 : index + %3157 = addi %3148, %3156 : index + %3158 = addi %3157, %c1 : index + %3159 = cmpi "slt", %3158, %c0 : index + %3160 = subi %c-1, %3158 : index + %3161 = select %3159, %3160, %3158 : index + %3162 = divi_signed %3161, %c2 : index + %3163 = subi %c-1, %3162 : index + %3164 = select %3159, %3163, %3162 : index + %3165 = muli %3164, %c-2 : index + %3166 = addi %3142, %3165 : index + %3167 = addi %3166, %c1 : index + store %3111, %2[%3122, %3127, %3167] : memref<16x6x2xvector<8xf32>> + %3168 = addi %arg5, %c8 : index + %3169 = cmpi "slt", %3168, %c0 : index + %3170 = subi %c-1, %3168 : index + %3171 = select %3169, %3170, %3168 : index + %3172 = divi_signed %3171, %c16 : index + %3173 = subi %c-1, %3172 : index + %3174 = select %3169, %3173, %3172 : index + %3175 = remi_signed %3174, %c16 : index + %3176 = cmpi "slt", %3175, %c0 : index + %3177 = addi %3175, %c16 : index + %3178 = select %3176, %3177, %3175 : index + %3179 = addi %arg7, %arg9 : index + %3180 = remi_signed %3179, %c6 : index + %3181 = cmpi "slt", %3180, %c0 : index + %3182 = addi %3180, %c6 : index + %3183 = select %3181, %3182, %3180 : index + %3184 = cmpi "slt", %arg5, %c0 : index + %3185 = subi %c-1, %arg5 : index + %3186 = select %3184, %3185, %arg5 : index + %3187 = divi_signed %3186, %c8 : index + %3188 = subi %c-1, %3187 : index + %3189 = select %3184, %3188, %3187 : index + %3190 = addi %arg5, %c8 : index + %3191 = cmpi "slt", %3190, %c0 : index + %3192 = subi %c-1, %3190 : index + %3193 = select %3191, %3192, %3190 : index + %3194 = divi_signed %3193, %c16 : index + %3195 = subi %c-1, %3194 : index + %3196 = select %3191, %3195, %3194 : index + %3197 = muli %3196, %c-2 : index + %3198 = addi %3189, %3197 : index + %3199 = cmpi "slt", %arg5, %c0 : index + %3200 = subi %c-1, %arg5 : index + %3201 = select %3199, %3200, %arg5 : index + %3202 = divi_signed %3201, %c8 : index + %3203 = subi %c-1, %3202 : index + %3204 = select %3199, %3203, %3202 : index + %3205 = addi %arg5, %c8 : index + %3206 = cmpi "slt", %3205, %c0 : index + %3207 = subi %c-1, %3205 : index + %3208 = select %3206, %3207, %3205 : index + %3209 = divi_signed %3208, %c16 : index + %3210 = subi %c-1, %3209 : index + %3211 = select %3206, %3210, %3209 : index + %3212 = muli %3211, %c-2 : index + %3213 = addi %3204, %3212 : index + %3214 = addi %3213, %c1 : index + %3215 = cmpi "slt", %3214, %c0 : index + %3216 = subi %c-1, %3214 : index + %3217 = select %3215, %3216, %3214 : index + %3218 = divi_signed %3217, %c2 : index + %3219 = subi %c-1, %3218 : index + %3220 = select %3215, %3219, %3218 : index + %3221 = muli %3220, %c-2 : 
index + %3222 = addi %3198, %3221 : index + %3223 = addi %3222, %c1 : index + %3224 = load %2[%3178, %3183, %3223] : memref<16x6x2xvector<8xf32>> + %3225 = vector.insertelement %2482, %3224[%c6_i64 : i64] : vector<8xf32> + %3226 = addi %arg5, %c8 : index + %3227 = cmpi "slt", %3226, %c0 : index + %3228 = subi %c-1, %3226 : index + %3229 = select %3227, %3228, %3226 : index + %3230 = divi_signed %3229, %c16 : index + %3231 = subi %c-1, %3230 : index + %3232 = select %3227, %3231, %3230 : index + %3233 = remi_signed %3232, %c16 : index + %3234 = cmpi "slt", %3233, %c0 : index + %3235 = addi %3233, %c16 : index + %3236 = select %3234, %3235, %3233 : index + %3237 = addi %arg7, %arg9 : index + %3238 = remi_signed %3237, %c6 : index + %3239 = cmpi "slt", %3238, %c0 : index + %3240 = addi %3238, %c6 : index + %3241 = select %3239, %3240, %3238 : index + %3242 = cmpi "slt", %arg5, %c0 : index + %3243 = subi %c-1, %arg5 : index + %3244 = select %3242, %3243, %arg5 : index + %3245 = divi_signed %3244, %c8 : index + %3246 = subi %c-1, %3245 : index + %3247 = select %3242, %3246, %3245 : index + %3248 = addi %arg5, %c8 : index + %3249 = cmpi "slt", %3248, %c0 : index + %3250 = subi %c-1, %3248 : index + %3251 = select %3249, %3250, %3248 : index + %3252 = divi_signed %3251, %c16 : index + %3253 = subi %c-1, %3252 : index + %3254 = select %3249, %3253, %3252 : index + %3255 = muli %3254, %c-2 : index + %3256 = addi %3247, %3255 : index + %3257 = cmpi "slt", %arg5, %c0 : index + %3258 = subi %c-1, %arg5 : index + %3259 = select %3257, %3258, %arg5 : index + %3260 = divi_signed %3259, %c8 : index + %3261 = subi %c-1, %3260 : index + %3262 = select %3257, %3261, %3260 : index + %3263 = addi %arg5, %c8 : index + %3264 = cmpi "slt", %3263, %c0 : index + %3265 = subi %c-1, %3263 : index + %3266 = select %3264, %3265, %3263 : index + %3267 = divi_signed %3266, %c16 : index + %3268 = subi %c-1, %3267 : index + %3269 = select %3264, %3268, %3267 : index + %3270 = muli %3269, %c-2 : index + %3271 = addi %3262, %3270 : index + %3272 = addi %3271, %c1 : index + %3273 = cmpi "slt", %3272, %c0 : index + %3274 = subi %c-1, %3272 : index + %3275 = select %3273, %3274, %3272 : index + %3276 = divi_signed %3275, %c2 : index + %3277 = subi %c-1, %3276 : index + %3278 = select %3273, %3277, %3276 : index + %3279 = muli %3278, %c-2 : index + %3280 = addi %3256, %3279 : index + %3281 = addi %3280, %c1 : index + store %3225, %2[%3236, %3241, %3281] : memref<16x6x2xvector<8xf32>> + %3282 = addi %arg5, %c8 : index + %3283 = cmpi "slt", %3282, %c0 : index + %3284 = subi %c-1, %3282 : index + %3285 = select %3283, %3284, %3282 : index + %3286 = divi_signed %3285, %c16 : index + %3287 = subi %c-1, %3286 : index + %3288 = select %3283, %3287, %3286 : index + %3289 = remi_signed %3288, %c16 : index + %3290 = cmpi "slt", %3289, %c0 : index + %3291 = addi %3289, %c16 : index + %3292 = select %3290, %3291, %3289 : index + %3293 = addi %arg7, %arg9 : index + %3294 = remi_signed %3293, %c6 : index + %3295 = cmpi "slt", %3294, %c0 : index + %3296 = addi %3294, %c6 : index + %3297 = select %3295, %3296, %3294 : index + %3298 = cmpi "slt", %arg5, %c0 : index + %3299 = subi %c-1, %arg5 : index + %3300 = select %3298, %3299, %arg5 : index + %3301 = divi_signed %3300, %c8 : index + %3302 = subi %c-1, %3301 : index + %3303 = select %3298, %3302, %3301 : index + %3304 = addi %arg5, %c8 : index + %3305 = cmpi "slt", %3304, %c0 : index + %3306 = subi %c-1, %3304 : index + %3307 = select %3305, %3306, %3304 : index + %3308 = divi_signed %3307, 
%c16 : index + %3309 = subi %c-1, %3308 : index + %3310 = select %3305, %3309, %3308 : index + %3311 = muli %3310, %c-2 : index + %3312 = addi %3303, %3311 : index + %3313 = cmpi "slt", %arg5, %c0 : index + %3314 = subi %c-1, %arg5 : index + %3315 = select %3313, %3314, %arg5 : index + %3316 = divi_signed %3315, %c8 : index + %3317 = subi %c-1, %3316 : index + %3318 = select %3313, %3317, %3316 : index + %3319 = addi %arg5, %c8 : index + %3320 = cmpi "slt", %3319, %c0 : index + %3321 = subi %c-1, %3319 : index + %3322 = select %3320, %3321, %3319 : index + %3323 = divi_signed %3322, %c16 : index + %3324 = subi %c-1, %3323 : index + %3325 = select %3320, %3324, %3323 : index + %3326 = muli %3325, %c-2 : index + %3327 = addi %3318, %3326 : index + %3328 = addi %3327, %c1 : index + %3329 = cmpi "slt", %3328, %c0 : index + %3330 = subi %c-1, %3328 : index + %3331 = select %3329, %3330, %3328 : index + %3332 = divi_signed %3331, %c2 : index + %3333 = subi %c-1, %3332 : index + %3334 = select %3329, %3333, %3332 : index + %3335 = muli %3334, %c-2 : index + %3336 = addi %3312, %3335 : index + %3337 = addi %3336, %c1 : index + %3338 = load %2[%3292, %3297, %3337] : memref<16x6x2xvector<8xf32>> + %3339 = vector.insertelement %2483, %3338[%c7_i64 : i64] : vector<8xf32> + %3340 = addi %arg5, %c8 : index + %3341 = cmpi "slt", %3340, %c0 : index + %3342 = subi %c-1, %3340 : index + %3343 = select %3341, %3342, %3340 : index + %3344 = divi_signed %3343, %c16 : index + %3345 = subi %c-1, %3344 : index + %3346 = select %3341, %3345, %3344 : index + %3347 = remi_signed %3346, %c16 : index + %3348 = cmpi "slt", %3347, %c0 : index + %3349 = addi %3347, %c16 : index + %3350 = select %3348, %3349, %3347 : index + %3351 = addi %arg7, %arg9 : index + %3352 = remi_signed %3351, %c6 : index + %3353 = cmpi "slt", %3352, %c0 : index + %3354 = addi %3352, %c6 : index + %3355 = select %3353, %3354, %3352 : index + %3356 = cmpi "slt", %arg5, %c0 : index + %3357 = subi %c-1, %arg5 : index + %3358 = select %3356, %3357, %arg5 : index + %3359 = divi_signed %3358, %c8 : index + %3360 = subi %c-1, %3359 : index + %3361 = select %3356, %3360, %3359 : index + %3362 = addi %arg5, %c8 : index + %3363 = cmpi "slt", %3362, %c0 : index + %3364 = subi %c-1, %3362 : index + %3365 = select %3363, %3364, %3362 : index + %3366 = divi_signed %3365, %c16 : index + %3367 = subi %c-1, %3366 : index + %3368 = select %3363, %3367, %3366 : index + %3369 = muli %3368, %c-2 : index + %3370 = addi %3361, %3369 : index + %3371 = cmpi "slt", %arg5, %c0 : index + %3372 = subi %c-1, %arg5 : index + %3373 = select %3371, %3372, %arg5 : index + %3374 = divi_signed %3373, %c8 : index + %3375 = subi %c-1, %3374 : index + %3376 = select %3371, %3375, %3374 : index + %3377 = addi %arg5, %c8 : index + %3378 = cmpi "slt", %3377, %c0 : index + %3379 = subi %c-1, %3377 : index + %3380 = select %3378, %3379, %3377 : index + %3381 = divi_signed %3380, %c16 : index + %3382 = subi %c-1, %3381 : index + %3383 = select %3378, %3382, %3381 : index + %3384 = muli %3383, %c-2 : index + %3385 = addi %3376, %3384 : index + %3386 = addi %3385, %c1 : index + %3387 = cmpi "slt", %3386, %c0 : index + %3388 = subi %c-1, %3386 : index + %3389 = select %3387, %3388, %3386 : index + %3390 = divi_signed %3389, %c2 : index + %3391 = subi %c-1, %3390 : index + %3392 = select %3387, %3391, %3390 : index + %3393 = muli %3392, %c-2 : index + %3394 = addi %3370, %3393 : index + %3395 = addi %3394, %c1 : index + store %3339, %2[%3350, %3355, %3395] : memref<16x6x2xvector<8xf32>> + 
%3396 = addi %arg5, %c8 : index + %3397 = cmpi "slt", %3396, %c0 : index + %3398 = subi %c-1, %3396 : index + %3399 = select %3397, %3398, %3396 : index + %3400 = divi_signed %3399, %c16 : index + %3401 = subi %c-1, %3400 : index + %3402 = select %3397, %3401, %3400 : index + %3403 = remi_signed %3402, %c16 : index + %3404 = cmpi "slt", %3403, %c0 : index + %3405 = addi %3403, %c16 : index + %3406 = select %3404, %3405, %3403 : index + %3407 = addi %arg7, %arg9 : index + %3408 = remi_signed %3407, %c6 : index + %3409 = cmpi "slt", %3408, %c0 : index + %3410 = addi %3408, %c6 : index + %3411 = select %3409, %3410, %3408 : index + %3412 = cmpi "slt", %arg5, %c0 : index + %3413 = subi %c-1, %arg5 : index + %3414 = select %3412, %3413, %arg5 : index + %3415 = divi_signed %3414, %c8 : index + %3416 = subi %c-1, %3415 : index + %3417 = select %3412, %3416, %3415 : index + %3418 = addi %arg5, %c8 : index + %3419 = cmpi "slt", %3418, %c0 : index + %3420 = subi %c-1, %3418 : index + %3421 = select %3419, %3420, %3418 : index + %3422 = divi_signed %3421, %c16 : index + %3423 = subi %c-1, %3422 : index + %3424 = select %3419, %3423, %3422 : index + %3425 = muli %3424, %c-2 : index + %3426 = addi %3417, %3425 : index + %3427 = cmpi "slt", %arg5, %c0 : index + %3428 = subi %c-1, %arg5 : index + %3429 = select %3427, %3428, %arg5 : index + %3430 = divi_signed %3429, %c8 : index + %3431 = subi %c-1, %3430 : index + %3432 = select %3427, %3431, %3430 : index + %3433 = addi %arg5, %c8 : index + %3434 = cmpi "slt", %3433, %c0 : index + %3435 = subi %c-1, %3433 : index + %3436 = select %3434, %3435, %3433 : index + %3437 = divi_signed %3436, %c16 : index + %3438 = subi %c-1, %3437 : index + %3439 = select %3434, %3438, %3437 : index + %3440 = muli %3439, %c-2 : index + %3441 = addi %3432, %3440 : index + %3442 = addi %3441, %c1 : index + %3443 = cmpi "slt", %3442, %c0 : index + %3444 = subi %c-1, %3442 : index + %3445 = select %3443, %3444, %3442 : index + %3446 = divi_signed %3445, %c2 : index + %3447 = subi %c-1, %3446 : index + %3448 = select %3443, %3447, %3446 : index + %3449 = muli %3448, %c-2 : index + %3450 = addi %3426, %3449 : index + %3451 = addi %3450, %c1 : index + %3452 = load %2[%3406, %3411, %3451] : memref<16x6x2xvector<8xf32>> + %3453 = vector.insertelement %2476, %3452[%c0_i64 : i64] : vector<8xf32> + %3454 = addi %arg5, %c8 : index + %3455 = cmpi "slt", %3454, %c0 : index + %3456 = subi %c-1, %3454 : index + %3457 = select %3455, %3456, %3454 : index + %3458 = divi_signed %3457, %c16 : index + %3459 = subi %c-1, %3458 : index + %3460 = select %3455, %3459, %3458 : index + %3461 = remi_signed %3460, %c16 : index + %3462 = cmpi "slt", %3461, %c0 : index + %3463 = addi %3461, %c16 : index + %3464 = select %3462, %3463, %3461 : index + %3465 = addi %arg7, %arg9 : index + %3466 = remi_signed %3465, %c6 : index + %3467 = cmpi "slt", %3466, %c0 : index + %3468 = addi %3466, %c6 : index + %3469 = select %3467, %3468, %3466 : index + %3470 = cmpi "slt", %arg5, %c0 : index + %3471 = subi %c-1, %arg5 : index + %3472 = select %3470, %3471, %arg5 : index + %3473 = divi_signed %3472, %c8 : index + %3474 = subi %c-1, %3473 : index + %3475 = select %3470, %3474, %3473 : index + %3476 = addi %arg5, %c8 : index + %3477 = cmpi "slt", %3476, %c0 : index + %3478 = subi %c-1, %3476 : index + %3479 = select %3477, %3478, %3476 : index + %3480 = divi_signed %3479, %c16 : index + %3481 = subi %c-1, %3480 : index + %3482 = select %3477, %3481, %3480 : index + %3483 = muli %3482, %c-2 : index + %3484 = addi %3475, 
%3483 : index + %3485 = cmpi "slt", %arg5, %c0 : index + %3486 = subi %c-1, %arg5 : index + %3487 = select %3485, %3486, %arg5 : index + %3488 = divi_signed %3487, %c8 : index + %3489 = subi %c-1, %3488 : index + %3490 = select %3485, %3489, %3488 : index + %3491 = addi %arg5, %c8 : index + %3492 = cmpi "slt", %3491, %c0 : index + %3493 = subi %c-1, %3491 : index + %3494 = select %3492, %3493, %3491 : index + %3495 = divi_signed %3494, %c16 : index + %3496 = subi %c-1, %3495 : index + %3497 = select %3492, %3496, %3495 : index + %3498 = muli %3497, %c-2 : index + %3499 = addi %3490, %3498 : index + %3500 = addi %3499, %c1 : index + %3501 = cmpi "slt", %3500, %c0 : index + %3502 = subi %c-1, %3500 : index + %3503 = select %3501, %3502, %3500 : index + %3504 = divi_signed %3503, %c2 : index + %3505 = subi %c-1, %3504 : index + %3506 = select %3501, %3505, %3504 : index + %3507 = muli %3506, %c-2 : index + %3508 = addi %3484, %3507 : index + %3509 = addi %3508, %c1 : index + store %3453, %2[%3464, %3469, %3509] : memref<16x6x2xvector<8xf32>> + %3510 = addi %arg5, %c8 : index + %3511 = cmpi "slt", %3510, %c0 : index + %3512 = subi %c-1, %3510 : index + %3513 = select %3511, %3512, %3510 : index + %3514 = divi_signed %3513, %c16 : index + %3515 = subi %c-1, %3514 : index + %3516 = select %3511, %3515, %3514 : index + %3517 = remi_signed %3516, %c16 : index + %3518 = cmpi "slt", %3517, %c0 : index + %3519 = addi %3517, %c16 : index + %3520 = select %3518, %3519, %3517 : index + %3521 = addi %arg7, %arg9 : index + %3522 = remi_signed %3521, %c6 : index + %3523 = cmpi "slt", %3522, %c0 : index + %3524 = addi %3522, %c6 : index + %3525 = select %3523, %3524, %3522 : index + %3526 = cmpi "slt", %arg5, %c0 : index + %3527 = subi %c-1, %arg5 : index + %3528 = select %3526, %3527, %arg5 : index + %3529 = divi_signed %3528, %c8 : index + %3530 = subi %c-1, %3529 : index + %3531 = select %3526, %3530, %3529 : index + %3532 = addi %arg5, %c8 : index + %3533 = cmpi "slt", %3532, %c0 : index + %3534 = subi %c-1, %3532 : index + %3535 = select %3533, %3534, %3532 : index + %3536 = divi_signed %3535, %c16 : index + %3537 = subi %c-1, %3536 : index + %3538 = select %3533, %3537, %3536 : index + %3539 = muli %3538, %c-2 : index + %3540 = addi %3531, %3539 : index + %3541 = cmpi "slt", %arg5, %c0 : index + %3542 = subi %c-1, %arg5 : index + %3543 = select %3541, %3542, %arg5 : index + %3544 = divi_signed %3543, %c8 : index + %3545 = subi %c-1, %3544 : index + %3546 = select %3541, %3545, %3544 : index + %3547 = addi %arg5, %c8 : index + %3548 = cmpi "slt", %3547, %c0 : index + %3549 = subi %c-1, %3547 : index + %3550 = select %3548, %3549, %3547 : index + %3551 = divi_signed %3550, %c16 : index + %3552 = subi %c-1, %3551 : index + %3553 = select %3548, %3552, %3551 : index + %3554 = muli %3553, %c-2 : index + %3555 = addi %3546, %3554 : index + %3556 = addi %3555, %c1 : index + %3557 = cmpi "slt", %3556, %c0 : index + %3558 = subi %c-1, %3556 : index + %3559 = select %3557, %3558, %3556 : index + %3560 = divi_signed %3559, %c2 : index + %3561 = subi %c-1, %3560 : index + %3562 = select %3557, %3561, %3560 : index + %3563 = muli %3562, %c-2 : index + %3564 = addi %3540, %3563 : index + %3565 = addi %3564, %c1 : index + %3566 = load %2[%3520, %3525, %3565] : memref<16x6x2xvector<8xf32>> + %3567 = vector.insertelement %2477, %3566[%c1_i64 : i64] : vector<8xf32> + %3568 = addi %arg5, %c8 : index + %3569 = cmpi "slt", %3568, %c0 : index + %3570 = subi %c-1, %3568 : index + %3571 = select %3569, %3570, %3568 : index + 
%3572 = divi_signed %3571, %c16 : index + %3573 = subi %c-1, %3572 : index + %3574 = select %3569, %3573, %3572 : index + %3575 = remi_signed %3574, %c16 : index + %3576 = cmpi "slt", %3575, %c0 : index + %3577 = addi %3575, %c16 : index + %3578 = select %3576, %3577, %3575 : index + %3579 = addi %arg7, %arg9 : index + %3580 = remi_signed %3579, %c6 : index + %3581 = cmpi "slt", %3580, %c0 : index + %3582 = addi %3580, %c6 : index + %3583 = select %3581, %3582, %3580 : index + %3584 = cmpi "slt", %arg5, %c0 : index + %3585 = subi %c-1, %arg5 : index + %3586 = select %3584, %3585, %arg5 : index + %3587 = divi_signed %3586, %c8 : index + %3588 = subi %c-1, %3587 : index + %3589 = select %3584, %3588, %3587 : index + %3590 = addi %arg5, %c8 : index + %3591 = cmpi "slt", %3590, %c0 : index + %3592 = subi %c-1, %3590 : index + %3593 = select %3591, %3592, %3590 : index + %3594 = divi_signed %3593, %c16 : index + %3595 = subi %c-1, %3594 : index + %3596 = select %3591, %3595, %3594 : index + %3597 = muli %3596, %c-2 : index + %3598 = addi %3589, %3597 : index + %3599 = cmpi "slt", %arg5, %c0 : index + %3600 = subi %c-1, %arg5 : index + %3601 = select %3599, %3600, %arg5 : index + %3602 = divi_signed %3601, %c8 : index + %3603 = subi %c-1, %3602 : index + %3604 = select %3599, %3603, %3602 : index + %3605 = addi %arg5, %c8 : index + %3606 = cmpi "slt", %3605, %c0 : index + %3607 = subi %c-1, %3605 : index + %3608 = select %3606, %3607, %3605 : index + %3609 = divi_signed %3608, %c16 : index + %3610 = subi %c-1, %3609 : index + %3611 = select %3606, %3610, %3609 : index + %3612 = muli %3611, %c-2 : index + %3613 = addi %3604, %3612 : index + %3614 = addi %3613, %c1 : index + %3615 = cmpi "slt", %3614, %c0 : index + %3616 = subi %c-1, %3614 : index + %3617 = select %3615, %3616, %3614 : index + %3618 = divi_signed %3617, %c2 : index + %3619 = subi %c-1, %3618 : index + %3620 = select %3615, %3619, %3618 : index + %3621 = muli %3620, %c-2 : index + %3622 = addi %3598, %3621 : index + %3623 = addi %3622, %c1 : index + store %3567, %2[%3578, %3583, %3623] : memref<16x6x2xvector<8xf32>> + %3624 = addi %arg5, %c8 : index + %3625 = cmpi "slt", %3624, %c0 : index + %3626 = subi %c-1, %3624 : index + %3627 = select %3625, %3626, %3624 : index + %3628 = divi_signed %3627, %c16 : index + %3629 = subi %c-1, %3628 : index + %3630 = select %3625, %3629, %3628 : index + %3631 = remi_signed %3630, %c16 : index + %3632 = cmpi "slt", %3631, %c0 : index + %3633 = addi %3631, %c16 : index + %3634 = select %3632, %3633, %3631 : index + %3635 = addi %arg7, %arg9 : index + %3636 = remi_signed %3635, %c6 : index + %3637 = cmpi "slt", %3636, %c0 : index + %3638 = addi %3636, %c6 : index + %3639 = select %3637, %3638, %3636 : index + %3640 = cmpi "slt", %arg5, %c0 : index + %3641 = subi %c-1, %arg5 : index + %3642 = select %3640, %3641, %arg5 : index + %3643 = divi_signed %3642, %c8 : index + %3644 = subi %c-1, %3643 : index + %3645 = select %3640, %3644, %3643 : index + %3646 = addi %arg5, %c8 : index + %3647 = cmpi "slt", %3646, %c0 : index + %3648 = subi %c-1, %3646 : index + %3649 = select %3647, %3648, %3646 : index + %3650 = divi_signed %3649, %c16 : index + %3651 = subi %c-1, %3650 : index + %3652 = select %3647, %3651, %3650 : index + %3653 = muli %3652, %c-2 : index + %3654 = addi %3645, %3653 : index + %3655 = cmpi "slt", %arg5, %c0 : index + %3656 = subi %c-1, %arg5 : index + %3657 = select %3655, %3656, %arg5 : index + %3658 = divi_signed %3657, %c8 : index + %3659 = subi %c-1, %3658 : index + %3660 = select 
%3655, %3659, %3658 : index + %3661 = addi %arg5, %c8 : index + %3662 = cmpi "slt", %3661, %c0 : index + %3663 = subi %c-1, %3661 : index + %3664 = select %3662, %3663, %3661 : index + %3665 = divi_signed %3664, %c16 : index + %3666 = subi %c-1, %3665 : index + %3667 = select %3662, %3666, %3665 : index + %3668 = muli %3667, %c-2 : index + %3669 = addi %3660, %3668 : index + %3670 = addi %3669, %c1 : index + %3671 = cmpi "slt", %3670, %c0 : index + %3672 = subi %c-1, %3670 : index + %3673 = select %3671, %3672, %3670 : index + %3674 = divi_signed %3673, %c2 : index + %3675 = subi %c-1, %3674 : index + %3676 = select %3671, %3675, %3674 : index + %3677 = muli %3676, %c-2 : index + %3678 = addi %3654, %3677 : index + %3679 = addi %3678, %c1 : index + %3680 = load %2[%3634, %3639, %3679] : memref<16x6x2xvector<8xf32>> + %3681 = vector.insertelement %2478, %3680[%c2_i64 : i64] : vector<8xf32> + %3682 = addi %arg5, %c8 : index + %3683 = cmpi "slt", %3682, %c0 : index + %3684 = subi %c-1, %3682 : index + %3685 = select %3683, %3684, %3682 : index + %3686 = divi_signed %3685, %c16 : index + %3687 = subi %c-1, %3686 : index + %3688 = select %3683, %3687, %3686 : index + %3689 = remi_signed %3688, %c16 : index + %3690 = cmpi "slt", %3689, %c0 : index + %3691 = addi %3689, %c16 : index + %3692 = select %3690, %3691, %3689 : index + %3693 = addi %arg7, %arg9 : index + %3694 = remi_signed %3693, %c6 : index + %3695 = cmpi "slt", %3694, %c0 : index + %3696 = addi %3694, %c6 : index + %3697 = select %3695, %3696, %3694 : index + %3698 = cmpi "slt", %arg5, %c0 : index + %3699 = subi %c-1, %arg5 : index + %3700 = select %3698, %3699, %arg5 : index + %3701 = divi_signed %3700, %c8 : index + %3702 = subi %c-1, %3701 : index + %3703 = select %3698, %3702, %3701 : index + %3704 = addi %arg5, %c8 : index + %3705 = cmpi "slt", %3704, %c0 : index + %3706 = subi %c-1, %3704 : index + %3707 = select %3705, %3706, %3704 : index + %3708 = divi_signed %3707, %c16 : index + %3709 = subi %c-1, %3708 : index + %3710 = select %3705, %3709, %3708 : index + %3711 = muli %3710, %c-2 : index + %3712 = addi %3703, %3711 : index + %3713 = cmpi "slt", %arg5, %c0 : index + %3714 = subi %c-1, %arg5 : index + %3715 = select %3713, %3714, %arg5 : index + %3716 = divi_signed %3715, %c8 : index + %3717 = subi %c-1, %3716 : index + %3718 = select %3713, %3717, %3716 : index + %3719 = addi %arg5, %c8 : index + %3720 = cmpi "slt", %3719, %c0 : index + %3721 = subi %c-1, %3719 : index + %3722 = select %3720, %3721, %3719 : index + %3723 = divi_signed %3722, %c16 : index + %3724 = subi %c-1, %3723 : index + %3725 = select %3720, %3724, %3723 : index + %3726 = muli %3725, %c-2 : index + %3727 = addi %3718, %3726 : index + %3728 = addi %3727, %c1 : index + %3729 = cmpi "slt", %3728, %c0 : index + %3730 = subi %c-1, %3728 : index + %3731 = select %3729, %3730, %3728 : index + %3732 = divi_signed %3731, %c2 : index + %3733 = subi %c-1, %3732 : index + %3734 = select %3729, %3733, %3732 : index + %3735 = muli %3734, %c-2 : index + %3736 = addi %3712, %3735 : index + %3737 = addi %3736, %c1 : index + store %3681, %2[%3692, %3697, %3737] : memref<16x6x2xvector<8xf32>> + %3738 = addi %arg5, %c8 : index + %3739 = cmpi "slt", %3738, %c0 : index + %3740 = subi %c-1, %3738 : index + %3741 = select %3739, %3740, %3738 : index + %3742 = divi_signed %3741, %c16 : index + %3743 = subi %c-1, %3742 : index + %3744 = select %3739, %3743, %3742 : index + %3745 = remi_signed %3744, %c16 : index + %3746 = cmpi "slt", %3745, %c0 : index + %3747 = addi %3745, 
%c16 : index + %3748 = select %3746, %3747, %3745 : index + %3749 = addi %arg7, %arg9 : index + %3750 = remi_signed %3749, %c6 : index + %3751 = cmpi "slt", %3750, %c0 : index + %3752 = addi %3750, %c6 : index + %3753 = select %3751, %3752, %3750 : index + %3754 = cmpi "slt", %arg5, %c0 : index + %3755 = subi %c-1, %arg5 : index + %3756 = select %3754, %3755, %arg5 : index + %3757 = divi_signed %3756, %c8 : index + %3758 = subi %c-1, %3757 : index + %3759 = select %3754, %3758, %3757 : index + %3760 = addi %arg5, %c8 : index + %3761 = cmpi "slt", %3760, %c0 : index + %3762 = subi %c-1, %3760 : index + %3763 = select %3761, %3762, %3760 : index + %3764 = divi_signed %3763, %c16 : index + %3765 = subi %c-1, %3764 : index + %3766 = select %3761, %3765, %3764 : index + %3767 = muli %3766, %c-2 : index + %3768 = addi %3759, %3767 : index + %3769 = cmpi "slt", %arg5, %c0 : index + %3770 = subi %c-1, %arg5 : index + %3771 = select %3769, %3770, %arg5 : index + %3772 = divi_signed %3771, %c8 : index + %3773 = subi %c-1, %3772 : index + %3774 = select %3769, %3773, %3772 : index + %3775 = addi %arg5, %c8 : index + %3776 = cmpi "slt", %3775, %c0 : index + %3777 = subi %c-1, %3775 : index + %3778 = select %3776, %3777, %3775 : index + %3779 = divi_signed %3778, %c16 : index + %3780 = subi %c-1, %3779 : index + %3781 = select %3776, %3780, %3779 : index + %3782 = muli %3781, %c-2 : index + %3783 = addi %3774, %3782 : index + %3784 = addi %3783, %c1 : index + %3785 = cmpi "slt", %3784, %c0 : index + %3786 = subi %c-1, %3784 : index + %3787 = select %3785, %3786, %3784 : index + %3788 = divi_signed %3787, %c2 : index + %3789 = subi %c-1, %3788 : index + %3790 = select %3785, %3789, %3788 : index + %3791 = muli %3790, %c-2 : index + %3792 = addi %3768, %3791 : index + %3793 = addi %3792, %c1 : index + %3794 = load %2[%3748, %3753, %3793] : memref<16x6x2xvector<8xf32>> + %3795 = vector.insertelement %2479, %3794[%c3_i64 : i64] : vector<8xf32> + %3796 = addi %arg5, %c8 : index + %3797 = cmpi "slt", %3796, %c0 : index + %3798 = subi %c-1, %3796 : index + %3799 = select %3797, %3798, %3796 : index + %3800 = divi_signed %3799, %c16 : index + %3801 = subi %c-1, %3800 : index + %3802 = select %3797, %3801, %3800 : index + %3803 = remi_signed %3802, %c16 : index + %3804 = cmpi "slt", %3803, %c0 : index + %3805 = addi %3803, %c16 : index + %3806 = select %3804, %3805, %3803 : index + %3807 = addi %arg7, %arg9 : index + %3808 = remi_signed %3807, %c6 : index + %3809 = cmpi "slt", %3808, %c0 : index + %3810 = addi %3808, %c6 : index + %3811 = select %3809, %3810, %3808 : index + %3812 = cmpi "slt", %arg5, %c0 : index + %3813 = subi %c-1, %arg5 : index + %3814 = select %3812, %3813, %arg5 : index + %3815 = divi_signed %3814, %c8 : index + %3816 = subi %c-1, %3815 : index + %3817 = select %3812, %3816, %3815 : index + %3818 = addi %arg5, %c8 : index + %3819 = cmpi "slt", %3818, %c0 : index + %3820 = subi %c-1, %3818 : index + %3821 = select %3819, %3820, %3818 : index + %3822 = divi_signed %3821, %c16 : index + %3823 = subi %c-1, %3822 : index + %3824 = select %3819, %3823, %3822 : index + %3825 = muli %3824, %c-2 : index + %3826 = addi %3817, %3825 : index + %3827 = cmpi "slt", %arg5, %c0 : index + %3828 = subi %c-1, %arg5 : index + %3829 = select %3827, %3828, %arg5 : index + %3830 = divi_signed %3829, %c8 : index + %3831 = subi %c-1, %3830 : index + %3832 = select %3827, %3831, %3830 : index + %3833 = addi %arg5, %c8 : index + %3834 = cmpi "slt", %3833, %c0 : index + %3835 = subi %c-1, %3833 : index + %3836 = 
select %3834, %3835, %3833 : index + %3837 = divi_signed %3836, %c16 : index + %3838 = subi %c-1, %3837 : index + %3839 = select %3834, %3838, %3837 : index + %3840 = muli %3839, %c-2 : index + %3841 = addi %3832, %3840 : index + %3842 = addi %3841, %c1 : index + %3843 = cmpi "slt", %3842, %c0 : index + %3844 = subi %c-1, %3842 : index + %3845 = select %3843, %3844, %3842 : index + %3846 = divi_signed %3845, %c2 : index + %3847 = subi %c-1, %3846 : index + %3848 = select %3843, %3847, %3846 : index + %3849 = muli %3848, %c-2 : index + %3850 = addi %3826, %3849 : index + %3851 = addi %3850, %c1 : index + store %3795, %2[%3806, %3811, %3851] : memref<16x6x2xvector<8xf32>> + %3852 = addi %arg5, %c8 : index + %3853 = cmpi "slt", %3852, %c0 : index + %3854 = subi %c-1, %3852 : index + %3855 = select %3853, %3854, %3852 : index + %3856 = divi_signed %3855, %c16 : index + %3857 = subi %c-1, %3856 : index + %3858 = select %3853, %3857, %3856 : index + %3859 = remi_signed %3858, %c16 : index + %3860 = cmpi "slt", %3859, %c0 : index + %3861 = addi %3859, %c16 : index + %3862 = select %3860, %3861, %3859 : index + %3863 = addi %arg7, %arg9 : index + %3864 = remi_signed %3863, %c6 : index + %3865 = cmpi "slt", %3864, %c0 : index + %3866 = addi %3864, %c6 : index + %3867 = select %3865, %3866, %3864 : index + %3868 = cmpi "slt", %arg5, %c0 : index + %3869 = subi %c-1, %arg5 : index + %3870 = select %3868, %3869, %arg5 : index + %3871 = divi_signed %3870, %c8 : index + %3872 = subi %c-1, %3871 : index + %3873 = select %3868, %3872, %3871 : index + %3874 = addi %arg5, %c8 : index + %3875 = cmpi "slt", %3874, %c0 : index + %3876 = subi %c-1, %3874 : index + %3877 = select %3875, %3876, %3874 : index + %3878 = divi_signed %3877, %c16 : index + %3879 = subi %c-1, %3878 : index + %3880 = select %3875, %3879, %3878 : index + %3881 = muli %3880, %c-2 : index + %3882 = addi %3873, %3881 : index + %3883 = cmpi "slt", %arg5, %c0 : index + %3884 = subi %c-1, %arg5 : index + %3885 = select %3883, %3884, %arg5 : index + %3886 = divi_signed %3885, %c8 : index + %3887 = subi %c-1, %3886 : index + %3888 = select %3883, %3887, %3886 : index + %3889 = addi %arg5, %c8 : index + %3890 = cmpi "slt", %3889, %c0 : index + %3891 = subi %c-1, %3889 : index + %3892 = select %3890, %3891, %3889 : index + %3893 = divi_signed %3892, %c16 : index + %3894 = subi %c-1, %3893 : index + %3895 = select %3890, %3894, %3893 : index + %3896 = muli %3895, %c-2 : index + %3897 = addi %3888, %3896 : index + %3898 = addi %3897, %c1 : index + %3899 = cmpi "slt", %3898, %c0 : index + %3900 = subi %c-1, %3898 : index + %3901 = select %3899, %3900, %3898 : index + %3902 = divi_signed %3901, %c2 : index + %3903 = subi %c-1, %3902 : index + %3904 = select %3899, %3903, %3902 : index + %3905 = muli %3904, %c-2 : index + %3906 = addi %3882, %3905 : index + %3907 = addi %3906, %c1 : index + %3908 = load %2[%3862, %3867, %3907] : memref<16x6x2xvector<8xf32>> + %3909 = vector.insertelement %2480, %3908[%c4_i64 : i64] : vector<8xf32> + %3910 = addi %arg5, %c8 : index + %3911 = cmpi "slt", %3910, %c0 : index + %3912 = subi %c-1, %3910 : index + %3913 = select %3911, %3912, %3910 : index + %3914 = divi_signed %3913, %c16 : index + %3915 = subi %c-1, %3914 : index + %3916 = select %3911, %3915, %3914 : index + %3917 = remi_signed %3916, %c16 : index + %3918 = cmpi "slt", %3917, %c0 : index + %3919 = addi %3917, %c16 : index + %3920 = select %3918, %3919, %3917 : index + %3921 = addi %arg7, %arg9 : index + %3922 = remi_signed %3921, %c6 : index + %3923 = cmpi 
"slt", %3922, %c0 : index + %3924 = addi %3922, %c6 : index + %3925 = select %3923, %3924, %3922 : index + %3926 = cmpi "slt", %arg5, %c0 : index + %3927 = subi %c-1, %arg5 : index + %3928 = select %3926, %3927, %arg5 : index + %3929 = divi_signed %3928, %c8 : index + %3930 = subi %c-1, %3929 : index + %3931 = select %3926, %3930, %3929 : index + %3932 = addi %arg5, %c8 : index + %3933 = cmpi "slt", %3932, %c0 : index + %3934 = subi %c-1, %3932 : index + %3935 = select %3933, %3934, %3932 : index + %3936 = divi_signed %3935, %c16 : index + %3937 = subi %c-1, %3936 : index + %3938 = select %3933, %3937, %3936 : index + %3939 = muli %3938, %c-2 : index + %3940 = addi %3931, %3939 : index + %3941 = cmpi "slt", %arg5, %c0 : index + %3942 = subi %c-1, %arg5 : index + %3943 = select %3941, %3942, %arg5 : index + %3944 = divi_signed %3943, %c8 : index + %3945 = subi %c-1, %3944 : index + %3946 = select %3941, %3945, %3944 : index + %3947 = addi %arg5, %c8 : index + %3948 = cmpi "slt", %3947, %c0 : index + %3949 = subi %c-1, %3947 : index + %3950 = select %3948, %3949, %3947 : index + %3951 = divi_signed %3950, %c16 : index + %3952 = subi %c-1, %3951 : index + %3953 = select %3948, %3952, %3951 : index + %3954 = muli %3953, %c-2 : index + %3955 = addi %3946, %3954 : index + %3956 = addi %3955, %c1 : index + %3957 = cmpi "slt", %3956, %c0 : index + %3958 = subi %c-1, %3956 : index + %3959 = select %3957, %3958, %3956 : index + %3960 = divi_signed %3959, %c2 : index + %3961 = subi %c-1, %3960 : index + %3962 = select %3957, %3961, %3960 : index + %3963 = muli %3962, %c-2 : index + %3964 = addi %3940, %3963 : index + %3965 = addi %3964, %c1 : index + store %3909, %2[%3920, %3925, %3965] : memref<16x6x2xvector<8xf32>> + %3966 = addi %arg5, %c8 : index + %3967 = cmpi "slt", %3966, %c0 : index + %3968 = subi %c-1, %3966 : index + %3969 = select %3967, %3968, %3966 : index + %3970 = divi_signed %3969, %c16 : index + %3971 = subi %c-1, %3970 : index + %3972 = select %3967, %3971, %3970 : index + %3973 = remi_signed %3972, %c16 : index + %3974 = cmpi "slt", %3973, %c0 : index + %3975 = addi %3973, %c16 : index + %3976 = select %3974, %3975, %3973 : index + %3977 = addi %arg7, %arg9 : index + %3978 = remi_signed %3977, %c6 : index + %3979 = cmpi "slt", %3978, %c0 : index + %3980 = addi %3978, %c6 : index + %3981 = select %3979, %3980, %3978 : index + %3982 = cmpi "slt", %arg5, %c0 : index + %3983 = subi %c-1, %arg5 : index + %3984 = select %3982, %3983, %arg5 : index + %3985 = divi_signed %3984, %c8 : index + %3986 = subi %c-1, %3985 : index + %3987 = select %3982, %3986, %3985 : index + %3988 = addi %arg5, %c8 : index + %3989 = cmpi "slt", %3988, %c0 : index + %3990 = subi %c-1, %3988 : index + %3991 = select %3989, %3990, %3988 : index + %3992 = divi_signed %3991, %c16 : index + %3993 = subi %c-1, %3992 : index + %3994 = select %3989, %3993, %3992 : index + %3995 = muli %3994, %c-2 : index + %3996 = addi %3987, %3995 : index + %3997 = cmpi "slt", %arg5, %c0 : index + %3998 = subi %c-1, %arg5 : index + %3999 = select %3997, %3998, %arg5 : index + %4000 = divi_signed %3999, %c8 : index + %4001 = subi %c-1, %4000 : index + %4002 = select %3997, %4001, %4000 : index + %4003 = addi %arg5, %c8 : index + %4004 = cmpi "slt", %4003, %c0 : index + %4005 = subi %c-1, %4003 : index + %4006 = select %4004, %4005, %4003 : index + %4007 = divi_signed %4006, %c16 : index + %4008 = subi %c-1, %4007 : index + %4009 = select %4004, %4008, %4007 : index + %4010 = muli %4009, %c-2 : index + %4011 = addi %4002, %4010 : index + 
%4012 = addi %4011, %c1 : index + %4013 = cmpi "slt", %4012, %c0 : index + %4014 = subi %c-1, %4012 : index + %4015 = select %4013, %4014, %4012 : index + %4016 = divi_signed %4015, %c2 : index + %4017 = subi %c-1, %4016 : index + %4018 = select %4013, %4017, %4016 : index + %4019 = muli %4018, %c-2 : index + %4020 = addi %3996, %4019 : index + %4021 = addi %4020, %c1 : index + %4022 = load %2[%3976, %3981, %4021] : memref<16x6x2xvector<8xf32>> + %4023 = vector.insertelement %2481, %4022[%c5_i64 : i64] : vector<8xf32> + %4024 = addi %arg5, %c8 : index + %4025 = cmpi "slt", %4024, %c0 : index + %4026 = subi %c-1, %4024 : index + %4027 = select %4025, %4026, %4024 : index + %4028 = divi_signed %4027, %c16 : index + %4029 = subi %c-1, %4028 : index + %4030 = select %4025, %4029, %4028 : index + %4031 = remi_signed %4030, %c16 : index + %4032 = cmpi "slt", %4031, %c0 : index + %4033 = addi %4031, %c16 : index + %4034 = select %4032, %4033, %4031 : index + %4035 = addi %arg7, %arg9 : index + %4036 = remi_signed %4035, %c6 : index + %4037 = cmpi "slt", %4036, %c0 : index + %4038 = addi %4036, %c6 : index + %4039 = select %4037, %4038, %4036 : index + %4040 = cmpi "slt", %arg5, %c0 : index + %4041 = subi %c-1, %arg5 : index + %4042 = select %4040, %4041, %arg5 : index + %4043 = divi_signed %4042, %c8 : index + %4044 = subi %c-1, %4043 : index + %4045 = select %4040, %4044, %4043 : index + %4046 = addi %arg5, %c8 : index + %4047 = cmpi "slt", %4046, %c0 : index + %4048 = subi %c-1, %4046 : index + %4049 = select %4047, %4048, %4046 : index + %4050 = divi_signed %4049, %c16 : index + %4051 = subi %c-1, %4050 : index + %4052 = select %4047, %4051, %4050 : index + %4053 = muli %4052, %c-2 : index + %4054 = addi %4045, %4053 : index + %4055 = cmpi "slt", %arg5, %c0 : index + %4056 = subi %c-1, %arg5 : index + %4057 = select %4055, %4056, %arg5 : index + %4058 = divi_signed %4057, %c8 : index + %4059 = subi %c-1, %4058 : index + %4060 = select %4055, %4059, %4058 : index + %4061 = addi %arg5, %c8 : index + %4062 = cmpi "slt", %4061, %c0 : index + %4063 = subi %c-1, %4061 : index + %4064 = select %4062, %4063, %4061 : index + %4065 = divi_signed %4064, %c16 : index + %4066 = subi %c-1, %4065 : index + %4067 = select %4062, %4066, %4065 : index + %4068 = muli %4067, %c-2 : index + %4069 = addi %4060, %4068 : index + %4070 = addi %4069, %c1 : index + %4071 = cmpi "slt", %4070, %c0 : index + %4072 = subi %c-1, %4070 : index + %4073 = select %4071, %4072, %4070 : index + %4074 = divi_signed %4073, %c2 : index + %4075 = subi %c-1, %4074 : index + %4076 = select %4071, %4075, %4074 : index + %4077 = muli %4076, %c-2 : index + %4078 = addi %4054, %4077 : index + %4079 = addi %4078, %c1 : index + store %4023, %2[%4034, %4039, %4079] : memref<16x6x2xvector<8xf32>> + %4080 = addi %arg5, %c8 : index + %4081 = cmpi "slt", %4080, %c0 : index + %4082 = subi %c-1, %4080 : index + %4083 = select %4081, %4082, %4080 : index + %4084 = divi_signed %4083, %c16 : index + %4085 = subi %c-1, %4084 : index + %4086 = select %4081, %4085, %4084 : index + %4087 = remi_signed %4086, %c16 : index + %4088 = cmpi "slt", %4087, %c0 : index + %4089 = addi %4087, %c16 : index + %4090 = select %4088, %4089, %4087 : index + %4091 = addi %arg7, %arg9 : index + %4092 = remi_signed %4091, %c6 : index + %4093 = cmpi "slt", %4092, %c0 : index + %4094 = addi %4092, %c6 : index + %4095 = select %4093, %4094, %4092 : index + %4096 = cmpi "slt", %arg5, %c0 : index + %4097 = subi %c-1, %arg5 : index + %4098 = select %4096, %4097, %arg5 : index + 
%4099 = divi_signed %4098, %c8 : index + %4100 = subi %c-1, %4099 : index + %4101 = select %4096, %4100, %4099 : index + %4102 = addi %arg5, %c8 : index + %4103 = cmpi "slt", %4102, %c0 : index + %4104 = subi %c-1, %4102 : index + %4105 = select %4103, %4104, %4102 : index + %4106 = divi_signed %4105, %c16 : index + %4107 = subi %c-1, %4106 : index + %4108 = select %4103, %4107, %4106 : index + %4109 = muli %4108, %c-2 : index + %4110 = addi %4101, %4109 : index + %4111 = cmpi "slt", %arg5, %c0 : index + %4112 = subi %c-1, %arg5 : index + %4113 = select %4111, %4112, %arg5 : index + %4114 = divi_signed %4113, %c8 : index + %4115 = subi %c-1, %4114 : index + %4116 = select %4111, %4115, %4114 : index + %4117 = addi %arg5, %c8 : index + %4118 = cmpi "slt", %4117, %c0 : index + %4119 = subi %c-1, %4117 : index + %4120 = select %4118, %4119, %4117 : index + %4121 = divi_signed %4120, %c16 : index + %4122 = subi %c-1, %4121 : index + %4123 = select %4118, %4122, %4121 : index + %4124 = muli %4123, %c-2 : index + %4125 = addi %4116, %4124 : index + %4126 = addi %4125, %c1 : index + %4127 = cmpi "slt", %4126, %c0 : index + %4128 = subi %c-1, %4126 : index + %4129 = select %4127, %4128, %4126 : index + %4130 = divi_signed %4129, %c2 : index + %4131 = subi %c-1, %4130 : index + %4132 = select %4127, %4131, %4130 : index + %4133 = muli %4132, %c-2 : index + %4134 = addi %4110, %4133 : index + %4135 = addi %4134, %c1 : index + %4136 = load %2[%4090, %4095, %4135] : memref<16x6x2xvector<8xf32>> + %4137 = vector.insertelement %2482, %4136[%c6_i64 : i64] : vector<8xf32> + %4138 = addi %arg5, %c8 : index + %4139 = cmpi "slt", %4138, %c0 : index + %4140 = subi %c-1, %4138 : index + %4141 = select %4139, %4140, %4138 : index + %4142 = divi_signed %4141, %c16 : index + %4143 = subi %c-1, %4142 : index + %4144 = select %4139, %4143, %4142 : index + %4145 = remi_signed %4144, %c16 : index + %4146 = cmpi "slt", %4145, %c0 : index + %4147 = addi %4145, %c16 : index + %4148 = select %4146, %4147, %4145 : index + %4149 = addi %arg7, %arg9 : index + %4150 = remi_signed %4149, %c6 : index + %4151 = cmpi "slt", %4150, %c0 : index + %4152 = addi %4150, %c6 : index + %4153 = select %4151, %4152, %4150 : index + %4154 = cmpi "slt", %arg5, %c0 : index + %4155 = subi %c-1, %arg5 : index + %4156 = select %4154, %4155, %arg5 : index + %4157 = divi_signed %4156, %c8 : index + %4158 = subi %c-1, %4157 : index + %4159 = select %4154, %4158, %4157 : index + %4160 = addi %arg5, %c8 : index + %4161 = cmpi "slt", %4160, %c0 : index + %4162 = subi %c-1, %4160 : index + %4163 = select %4161, %4162, %4160 : index + %4164 = divi_signed %4163, %c16 : index + %4165 = subi %c-1, %4164 : index + %4166 = select %4161, %4165, %4164 : index + %4167 = muli %4166, %c-2 : index + %4168 = addi %4159, %4167 : index + %4169 = cmpi "slt", %arg5, %c0 : index + %4170 = subi %c-1, %arg5 : index + %4171 = select %4169, %4170, %arg5 : index + %4172 = divi_signed %4171, %c8 : index + %4173 = subi %c-1, %4172 : index + %4174 = select %4169, %4173, %4172 : index + %4175 = addi %arg5, %c8 : index + %4176 = cmpi "slt", %4175, %c0 : index + %4177 = subi %c-1, %4175 : index + %4178 = select %4176, %4177, %4175 : index + %4179 = divi_signed %4178, %c16 : index + %4180 = subi %c-1, %4179 : index + %4181 = select %4176, %4180, %4179 : index + %4182 = muli %4181, %c-2 : index + %4183 = addi %4174, %4182 : index + %4184 = addi %4183, %c1 : index + %4185 = cmpi "slt", %4184, %c0 : index + %4186 = subi %c-1, %4184 : index + %4187 = select %4185, %4186, %4184 : index 
+ %4188 = divi_signed %4187, %c2 : index + %4189 = subi %c-1, %4188 : index + %4190 = select %4185, %4189, %4188 : index + %4191 = muli %4190, %c-2 : index + %4192 = addi %4168, %4191 : index + %4193 = addi %4192, %c1 : index + store %4137, %2[%4148, %4153, %4193] : memref<16x6x2xvector<8xf32>> + %4194 = addi %arg5, %c8 : index + %4195 = cmpi "slt", %4194, %c0 : index + %4196 = subi %c-1, %4194 : index + %4197 = select %4195, %4196, %4194 : index + %4198 = divi_signed %4197, %c16 : index + %4199 = subi %c-1, %4198 : index + %4200 = select %4195, %4199, %4198 : index + %4201 = remi_signed %4200, %c16 : index + %4202 = cmpi "slt", %4201, %c0 : index + %4203 = addi %4201, %c16 : index + %4204 = select %4202, %4203, %4201 : index + %4205 = addi %arg7, %arg9 : index + %4206 = remi_signed %4205, %c6 : index + %4207 = cmpi "slt", %4206, %c0 : index + %4208 = addi %4206, %c6 : index + %4209 = select %4207, %4208, %4206 : index + %4210 = cmpi "slt", %arg5, %c0 : index + %4211 = subi %c-1, %arg5 : index + %4212 = select %4210, %4211, %arg5 : index + %4213 = divi_signed %4212, %c8 : index + %4214 = subi %c-1, %4213 : index + %4215 = select %4210, %4214, %4213 : index + %4216 = addi %arg5, %c8 : index + %4217 = cmpi "slt", %4216, %c0 : index + %4218 = subi %c-1, %4216 : index + %4219 = select %4217, %4218, %4216 : index + %4220 = divi_signed %4219, %c16 : index + %4221 = subi %c-1, %4220 : index + %4222 = select %4217, %4221, %4220 : index + %4223 = muli %4222, %c-2 : index + %4224 = addi %4215, %4223 : index + %4225 = cmpi "slt", %arg5, %c0 : index + %4226 = subi %c-1, %arg5 : index + %4227 = select %4225, %4226, %arg5 : index + %4228 = divi_signed %4227, %c8 : index + %4229 = subi %c-1, %4228 : index + %4230 = select %4225, %4229, %4228 : index + %4231 = addi %arg5, %c8 : index + %4232 = cmpi "slt", %4231, %c0 : index + %4233 = subi %c-1, %4231 : index + %4234 = select %4232, %4233, %4231 : index + %4235 = divi_signed %4234, %c16 : index + %4236 = subi %c-1, %4235 : index + %4237 = select %4232, %4236, %4235 : index + %4238 = muli %4237, %c-2 : index + %4239 = addi %4230, %4238 : index + %4240 = addi %4239, %c1 : index + %4241 = cmpi "slt", %4240, %c0 : index + %4242 = subi %c-1, %4240 : index + %4243 = select %4241, %4242, %4240 : index + %4244 = divi_signed %4243, %c2 : index + %4245 = subi %c-1, %4244 : index + %4246 = select %4241, %4245, %4244 : index + %4247 = muli %4246, %c-2 : index + %4248 = addi %4224, %4247 : index + %4249 = addi %4248, %c1 : index + %4250 = load %2[%4204, %4209, %4249] : memref<16x6x2xvector<8xf32>> + %4251 = vector.insertelement %2483, %4250[%c7_i64 : i64] : vector<8xf32> + %4252 = addi %arg5, %c8 : index + %4253 = cmpi "slt", %4252, %c0 : index + %4254 = subi %c-1, %4252 : index + %4255 = select %4253, %4254, %4252 : index + %4256 = divi_signed %4255, %c16 : index + %4257 = subi %c-1, %4256 : index + %4258 = select %4253, %4257, %4256 : index + %4259 = remi_signed %4258, %c16 : index + %4260 = cmpi "slt", %4259, %c0 : index + %4261 = addi %4259, %c16 : index + %4262 = select %4260, %4261, %4259 : index + %4263 = addi %arg7, %arg9 : index + %4264 = remi_signed %4263, %c6 : index + %4265 = cmpi "slt", %4264, %c0 : index + %4266 = addi %4264, %c6 : index + %4267 = select %4265, %4266, %4264 : index + %4268 = cmpi "slt", %arg5, %c0 : index + %4269 = subi %c-1, %arg5 : index + %4270 = select %4268, %4269, %arg5 : index + %4271 = divi_signed %4270, %c8 : index + %4272 = subi %c-1, %4271 : index + %4273 = select %4268, %4272, %4271 : index + %4274 = addi %arg5, %c8 : index + 
%4275 = cmpi "slt", %4274, %c0 : index + %4276 = subi %c-1, %4274 : index + %4277 = select %4275, %4276, %4274 : index + %4278 = divi_signed %4277, %c16 : index + %4279 = subi %c-1, %4278 : index + %4280 = select %4275, %4279, %4278 : index + %4281 = muli %4280, %c-2 : index + %4282 = addi %4273, %4281 : index + %4283 = cmpi "slt", %arg5, %c0 : index + %4284 = subi %c-1, %arg5 : index + %4285 = select %4283, %4284, %arg5 : index + %4286 = divi_signed %4285, %c8 : index + %4287 = subi %c-1, %4286 : index + %4288 = select %4283, %4287, %4286 : index + %4289 = addi %arg5, %c8 : index + %4290 = cmpi "slt", %4289, %c0 : index + %4291 = subi %c-1, %4289 : index + %4292 = select %4290, %4291, %4289 : index + %4293 = divi_signed %4292, %c16 : index + %4294 = subi %c-1, %4293 : index + %4295 = select %4290, %4294, %4293 : index + %4296 = muli %4295, %c-2 : index + %4297 = addi %4288, %4296 : index + %4298 = addi %4297, %c1 : index + %4299 = cmpi "slt", %4298, %c0 : index + %4300 = subi %c-1, %4298 : index + %4301 = select %4299, %4300, %4298 : index + %4302 = divi_signed %4301, %c2 : index + %4303 = subi %c-1, %4302 : index + %4304 = select %4299, %4303, %4302 : index + %4305 = muli %4304, %c-2 : index + %4306 = addi %4282, %4305 : index + %4307 = addi %4306, %c1 : index + store %4251, %2[%4262, %4267, %4307] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg7 = %c0 to %c4 step %c1 { + %4 = addi %arg6, %arg7 : index + %5 = addi %arg6, %arg7 : index + %6 = addi %arg6, %arg7 : index + %7 = addi %arg6, %arg7 : index + %8 = addi %arg6, %arg7 : index + %9 = addi %arg6, %arg7 : index + %10 = addi %arg6, %arg7 : index + %11 = addi %arg6, %arg7 : index + %12 = load %arg0[%arg4, %4] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %13 = load %arg0[%arg4, %5] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %14 = load %arg0[%arg4, %6] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %15 = load %arg0[%arg4, %7] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %16 = load %arg0[%arg4, %8] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %17 = load %arg0[%arg4, %9] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %18 = load %arg0[%arg4, %10] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %19 = load %arg0[%arg4, %11] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %20 = cmpi "slt", %arg5, %c0 : index + %21 = subi %c-1, %arg5 : index + %22 = select %20, %21, %arg5 : index + %23 = divi_signed %22, %c16 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c16 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c16 : index + %29 = select %27, %28, %26 : index + %30 = addi %arg6, %arg7 : index + %31 = remi_signed %30, %c128 : index + %32 = cmpi "slt", %31, %c0 : index + %33 = addi %31, %c128 : index + %34 = select %32, %33, %31 : index + %35 = remi_signed %arg5, %c16 : index + %36 = cmpi "slt", %35, %c0 : index + %37 = addi %35, %c16 : index + %38 = select %36, %37, %35 : index + %39 = cmpi "slt", %38, %c0 : index + %40 = subi %c-1, %38 : index + %41 = select %39, %40, %38 : index + %42 = divi_signed %41, %c8 : index + %43 = subi %c-1, %42 : index + %44 = select %39, %43, %42 : index + %45 = remi_signed %44, %c2 : index + %46 = cmpi "slt", %45, %c0 : index + %47 = addi %45, %c2 : index + %48 = select %46, %47, %45 : index + %49 = load %3[%29, %34, %48] : memref<16x128x2xvector<8xf32>> + %50 = vector.extractelement 
%49[%c0_i64 : i64] : vector<8xf32> + %51 = cmpi "slt", %arg5, %c0 : index + %52 = subi %c-1, %arg5 : index + %53 = select %51, %52, %arg5 : index + %54 = divi_signed %53, %c16 : index + %55 = subi %c-1, %54 : index + %56 = select %51, %55, %54 : index + %57 = remi_signed %56, %c16 : index + %58 = cmpi "slt", %57, %c0 : index + %59 = addi %57, %c16 : index + %60 = select %58, %59, %57 : index + %61 = addi %arg6, %arg7 : index + %62 = remi_signed %61, %c128 : index + %63 = cmpi "slt", %62, %c0 : index + %64 = addi %62, %c128 : index + %65 = select %63, %64, %62 : index + %66 = remi_signed %arg5, %c16 : index + %67 = cmpi "slt", %66, %c0 : index + %68 = addi %66, %c16 : index + %69 = select %67, %68, %66 : index + %70 = cmpi "slt", %69, %c0 : index + %71 = subi %c-1, %69 : index + %72 = select %70, %71, %69 : index + %73 = divi_signed %72, %c8 : index + %74 = subi %c-1, %73 : index + %75 = select %70, %74, %73 : index + %76 = remi_signed %75, %c2 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = addi %76, %c2 : index + %79 = select %77, %78, %76 : index + %80 = load %3[%60, %65, %79] : memref<16x128x2xvector<8xf32>> + %81 = vector.extractelement %80[%c1_i64 : i64] : vector<8xf32> + %82 = cmpi "slt", %arg5, %c0 : index + %83 = subi %c-1, %arg5 : index + %84 = select %82, %83, %arg5 : index + %85 = divi_signed %84, %c16 : index + %86 = subi %c-1, %85 : index + %87 = select %82, %86, %85 : index + %88 = remi_signed %87, %c16 : index + %89 = cmpi "slt", %88, %c0 : index + %90 = addi %88, %c16 : index + %91 = select %89, %90, %88 : index + %92 = addi %arg6, %arg7 : index + %93 = remi_signed %92, %c128 : index + %94 = cmpi "slt", %93, %c0 : index + %95 = addi %93, %c128 : index + %96 = select %94, %95, %93 : index + %97 = remi_signed %arg5, %c16 : index + %98 = cmpi "slt", %97, %c0 : index + %99 = addi %97, %c16 : index + %100 = select %98, %99, %97 : index + %101 = cmpi "slt", %100, %c0 : index + %102 = subi %c-1, %100 : index + %103 = select %101, %102, %100 : index + %104 = divi_signed %103, %c8 : index + %105 = subi %c-1, %104 : index + %106 = select %101, %105, %104 : index + %107 = remi_signed %106, %c2 : index + %108 = cmpi "slt", %107, %c0 : index + %109 = addi %107, %c2 : index + %110 = select %108, %109, %107 : index + %111 = load %3[%91, %96, %110] : memref<16x128x2xvector<8xf32>> + %112 = vector.extractelement %111[%c2_i64 : i64] : vector<8xf32> + %113 = cmpi "slt", %arg5, %c0 : index + %114 = subi %c-1, %arg5 : index + %115 = select %113, %114, %arg5 : index + %116 = divi_signed %115, %c16 : index + %117 = subi %c-1, %116 : index + %118 = select %113, %117, %116 : index + %119 = remi_signed %118, %c16 : index + %120 = cmpi "slt", %119, %c0 : index + %121 = addi %119, %c16 : index + %122 = select %120, %121, %119 : index + %123 = addi %arg6, %arg7 : index + %124 = remi_signed %123, %c128 : index + %125 = cmpi "slt", %124, %c0 : index + %126 = addi %124, %c128 : index + %127 = select %125, %126, %124 : index + %128 = remi_signed %arg5, %c16 : index + %129 = cmpi "slt", %128, %c0 : index + %130 = addi %128, %c16 : index + %131 = select %129, %130, %128 : index + %132 = cmpi "slt", %131, %c0 : index + %133 = subi %c-1, %131 : index + %134 = select %132, %133, %131 : index + %135 = divi_signed %134, %c8 : index + %136 = subi %c-1, %135 : index + %137 = select %132, %136, %135 : index + %138 = remi_signed %137, %c2 : index + %139 = cmpi "slt", %138, %c0 : index + %140 = addi %138, %c2 : index + %141 = select %139, %140, %138 : index + %142 = load %3[%122, %127, %141] : 
memref<16x128x2xvector<8xf32>> + %143 = vector.extractelement %142[%c3_i64 : i64] : vector<8xf32> + %144 = cmpi "slt", %arg5, %c0 : index + %145 = subi %c-1, %arg5 : index + %146 = select %144, %145, %arg5 : index + %147 = divi_signed %146, %c16 : index + %148 = subi %c-1, %147 : index + %149 = select %144, %148, %147 : index + %150 = remi_signed %149, %c16 : index + %151 = cmpi "slt", %150, %c0 : index + %152 = addi %150, %c16 : index + %153 = select %151, %152, %150 : index + %154 = addi %arg6, %arg7 : index + %155 = remi_signed %154, %c128 : index + %156 = cmpi "slt", %155, %c0 : index + %157 = addi %155, %c128 : index + %158 = select %156, %157, %155 : index + %159 = remi_signed %arg5, %c16 : index + %160 = cmpi "slt", %159, %c0 : index + %161 = addi %159, %c16 : index + %162 = select %160, %161, %159 : index + %163 = cmpi "slt", %162, %c0 : index + %164 = subi %c-1, %162 : index + %165 = select %163, %164, %162 : index + %166 = divi_signed %165, %c8 : index + %167 = subi %c-1, %166 : index + %168 = select %163, %167, %166 : index + %169 = remi_signed %168, %c2 : index + %170 = cmpi "slt", %169, %c0 : index + %171 = addi %169, %c2 : index + %172 = select %170, %171, %169 : index + %173 = load %3[%153, %158, %172] : memref<16x128x2xvector<8xf32>> + %174 = vector.extractelement %173[%c4_i64 : i64] : vector<8xf32> + %175 = cmpi "slt", %arg5, %c0 : index + %176 = subi %c-1, %arg5 : index + %177 = select %175, %176, %arg5 : index + %178 = divi_signed %177, %c16 : index + %179 = subi %c-1, %178 : index + %180 = select %175, %179, %178 : index + %181 = remi_signed %180, %c16 : index + %182 = cmpi "slt", %181, %c0 : index + %183 = addi %181, %c16 : index + %184 = select %182, %183, %181 : index + %185 = addi %arg6, %arg7 : index + %186 = remi_signed %185, %c128 : index + %187 = cmpi "slt", %186, %c0 : index + %188 = addi %186, %c128 : index + %189 = select %187, %188, %186 : index + %190 = remi_signed %arg5, %c16 : index + %191 = cmpi "slt", %190, %c0 : index + %192 = addi %190, %c16 : index + %193 = select %191, %192, %190 : index + %194 = cmpi "slt", %193, %c0 : index + %195 = subi %c-1, %193 : index + %196 = select %194, %195, %193 : index + %197 = divi_signed %196, %c8 : index + %198 = subi %c-1, %197 : index + %199 = select %194, %198, %197 : index + %200 = remi_signed %199, %c2 : index + %201 = cmpi "slt", %200, %c0 : index + %202 = addi %200, %c2 : index + %203 = select %201, %202, %200 : index + %204 = load %3[%184, %189, %203] : memref<16x128x2xvector<8xf32>> + %205 = vector.extractelement %204[%c5_i64 : i64] : vector<8xf32> + %206 = cmpi "slt", %arg5, %c0 : index + %207 = subi %c-1, %arg5 : index + %208 = select %206, %207, %arg5 : index + %209 = divi_signed %208, %c16 : index + %210 = subi %c-1, %209 : index + %211 = select %206, %210, %209 : index + %212 = remi_signed %211, %c16 : index + %213 = cmpi "slt", %212, %c0 : index + %214 = addi %212, %c16 : index + %215 = select %213, %214, %212 : index + %216 = addi %arg6, %arg7 : index + %217 = remi_signed %216, %c128 : index + %218 = cmpi "slt", %217, %c0 : index + %219 = addi %217, %c128 : index + %220 = select %218, %219, %217 : index + %221 = remi_signed %arg5, %c16 : index + %222 = cmpi "slt", %221, %c0 : index + %223 = addi %221, %c16 : index + %224 = select %222, %223, %221 : index + %225 = cmpi "slt", %224, %c0 : index + %226 = subi %c-1, %224 : index + %227 = select %225, %226, %224 : index + %228 = divi_signed %227, %c8 : index + %229 = subi %c-1, %228 : index + %230 = select %225, %229, %228 : index + %231 = remi_signed 
%230, %c2 : index + %232 = cmpi "slt", %231, %c0 : index + %233 = addi %231, %c2 : index + %234 = select %232, %233, %231 : index + %235 = load %3[%215, %220, %234] : memref<16x128x2xvector<8xf32>> + %236 = vector.extractelement %235[%c6_i64 : i64] : vector<8xf32> + %237 = cmpi "slt", %arg5, %c0 : index + %238 = subi %c-1, %arg5 : index + %239 = select %237, %238, %arg5 : index + %240 = divi_signed %239, %c16 : index + %241 = subi %c-1, %240 : index + %242 = select %237, %241, %240 : index + %243 = remi_signed %242, %c16 : index + %244 = cmpi "slt", %243, %c0 : index + %245 = addi %243, %c16 : index + %246 = select %244, %245, %243 : index + %247 = addi %arg6, %arg7 : index + %248 = remi_signed %247, %c128 : index + %249 = cmpi "slt", %248, %c0 : index + %250 = addi %248, %c128 : index + %251 = select %249, %250, %248 : index + %252 = remi_signed %arg5, %c16 : index + %253 = cmpi "slt", %252, %c0 : index + %254 = addi %252, %c16 : index + %255 = select %253, %254, %252 : index + %256 = cmpi "slt", %255, %c0 : index + %257 = subi %c-1, %255 : index + %258 = select %256, %257, %255 : index + %259 = divi_signed %258, %c8 : index + %260 = subi %c-1, %259 : index + %261 = select %256, %260, %259 : index + %262 = remi_signed %261, %c2 : index + %263 = cmpi "slt", %262, %c0 : index + %264 = addi %262, %c2 : index + %265 = select %263, %264, %262 : index + %266 = load %3[%246, %251, %265] : memref<16x128x2xvector<8xf32>> + %267 = vector.extractelement %266[%c7_i64 : i64] : vector<8xf32> + %268 = mulf %12, %50 {RelaxedPrecision} : f32 + %269 = mulf %13, %81 {RelaxedPrecision} : f32 + %270 = mulf %14, %112 {RelaxedPrecision} : f32 + %271 = mulf %15, %143 {RelaxedPrecision} : f32 + %272 = mulf %16, %174 {RelaxedPrecision} : f32 + %273 = mulf %17, %205 {RelaxedPrecision} : f32 + %274 = mulf %18, %236 {RelaxedPrecision} : f32 + %275 = mulf %19, %267 {RelaxedPrecision} : f32 + %276 = cmpi "slt", %arg5, %c0 : index + %277 = subi %c-1, %arg5 : index + %278 = select %276, %277, %arg5 : index + %279 = divi_signed %278, %c16 : index + %280 = subi %c-1, %279 : index + %281 = select %276, %280, %279 : index + %282 = remi_signed %281, %c16 : index + %283 = cmpi "slt", %282, %c0 : index + %284 = addi %282, %c16 : index + %285 = select %283, %284, %282 : index + %286 = remi_signed %arg5, %c16 : index + %287 = cmpi "slt", %286, %c0 : index + %288 = addi %286, %c16 : index + %289 = select %287, %288, %286 : index + %290 = cmpi "slt", %289, %c0 : index + %291 = subi %c-1, %289 : index + %292 = select %290, %291, %289 : index + %293 = divi_signed %292, %c8 : index + %294 = subi %c-1, %293 : index + %295 = select %290, %294, %293 : index + %296 = remi_signed %295, %c2 : index + %297 = cmpi "slt", %296, %c0 : index + %298 = addi %296, %c2 : index + %299 = select %297, %298, %296 : index + %300 = load %2[%285, %c0, %299] : memref<16x6x2xvector<8xf32>> + %301 = vector.extractelement %300[%c0_i64 : i64] : vector<8xf32> + %302 = cmpi "slt", %arg5, %c0 : index + %303 = subi %c-1, %arg5 : index + %304 = select %302, %303, %arg5 : index + %305 = divi_signed %304, %c16 : index + %306 = subi %c-1, %305 : index + %307 = select %302, %306, %305 : index + %308 = remi_signed %307, %c16 : index + %309 = cmpi "slt", %308, %c0 : index + %310 = addi %308, %c16 : index + %311 = select %309, %310, %308 : index + %312 = remi_signed %arg5, %c16 : index + %313 = cmpi "slt", %312, %c0 : index + %314 = addi %312, %c16 : index + %315 = select %313, %314, %312 : index + %316 = cmpi "slt", %315, %c0 : index + %317 = subi %c-1, %315 : index + 
%318 = select %316, %317, %315 : index + %319 = divi_signed %318, %c8 : index + %320 = subi %c-1, %319 : index + %321 = select %316, %320, %319 : index + %322 = remi_signed %321, %c2 : index + %323 = cmpi "slt", %322, %c0 : index + %324 = addi %322, %c2 : index + %325 = select %323, %324, %322 : index + %326 = load %2[%311, %c0, %325] : memref<16x6x2xvector<8xf32>> + %327 = vector.extractelement %326[%c1_i64 : i64] : vector<8xf32> + %328 = cmpi "slt", %arg5, %c0 : index + %329 = subi %c-1, %arg5 : index + %330 = select %328, %329, %arg5 : index + %331 = divi_signed %330, %c16 : index + %332 = subi %c-1, %331 : index + %333 = select %328, %332, %331 : index + %334 = remi_signed %333, %c16 : index + %335 = cmpi "slt", %334, %c0 : index + %336 = addi %334, %c16 : index + %337 = select %335, %336, %334 : index + %338 = remi_signed %arg5, %c16 : index + %339 = cmpi "slt", %338, %c0 : index + %340 = addi %338, %c16 : index + %341 = select %339, %340, %338 : index + %342 = cmpi "slt", %341, %c0 : index + %343 = subi %c-1, %341 : index + %344 = select %342, %343, %341 : index + %345 = divi_signed %344, %c8 : index + %346 = subi %c-1, %345 : index + %347 = select %342, %346, %345 : index + %348 = remi_signed %347, %c2 : index + %349 = cmpi "slt", %348, %c0 : index + %350 = addi %348, %c2 : index + %351 = select %349, %350, %348 : index + %352 = load %2[%337, %c0, %351] : memref<16x6x2xvector<8xf32>> + %353 = vector.extractelement %352[%c2_i64 : i64] : vector<8xf32> + %354 = cmpi "slt", %arg5, %c0 : index + %355 = subi %c-1, %arg5 : index + %356 = select %354, %355, %arg5 : index + %357 = divi_signed %356, %c16 : index + %358 = subi %c-1, %357 : index + %359 = select %354, %358, %357 : index + %360 = remi_signed %359, %c16 : index + %361 = cmpi "slt", %360, %c0 : index + %362 = addi %360, %c16 : index + %363 = select %361, %362, %360 : index + %364 = remi_signed %arg5, %c16 : index + %365 = cmpi "slt", %364, %c0 : index + %366 = addi %364, %c16 : index + %367 = select %365, %366, %364 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = subi %c-1, %367 : index + %370 = select %368, %369, %367 : index + %371 = divi_signed %370, %c8 : index + %372 = subi %c-1, %371 : index + %373 = select %368, %372, %371 : index + %374 = remi_signed %373, %c2 : index + %375 = cmpi "slt", %374, %c0 : index + %376 = addi %374, %c2 : index + %377 = select %375, %376, %374 : index + %378 = load %2[%363, %c0, %377] : memref<16x6x2xvector<8xf32>> + %379 = vector.extractelement %378[%c3_i64 : i64] : vector<8xf32> + %380 = cmpi "slt", %arg5, %c0 : index + %381 = subi %c-1, %arg5 : index + %382 = select %380, %381, %arg5 : index + %383 = divi_signed %382, %c16 : index + %384 = subi %c-1, %383 : index + %385 = select %380, %384, %383 : index + %386 = remi_signed %385, %c16 : index + %387 = cmpi "slt", %386, %c0 : index + %388 = addi %386, %c16 : index + %389 = select %387, %388, %386 : index + %390 = remi_signed %arg5, %c16 : index + %391 = cmpi "slt", %390, %c0 : index + %392 = addi %390, %c16 : index + %393 = select %391, %392, %390 : index + %394 = cmpi "slt", %393, %c0 : index + %395 = subi %c-1, %393 : index + %396 = select %394, %395, %393 : index + %397 = divi_signed %396, %c8 : index + %398 = subi %c-1, %397 : index + %399 = select %394, %398, %397 : index + %400 = remi_signed %399, %c2 : index + %401 = cmpi "slt", %400, %c0 : index + %402 = addi %400, %c2 : index + %403 = select %401, %402, %400 : index + %404 = load %2[%389, %c0, %403] : memref<16x6x2xvector<8xf32>> + %405 = vector.extractelement %404[%c4_i64 : 
i64] : vector<8xf32> + %406 = cmpi "slt", %arg5, %c0 : index + %407 = subi %c-1, %arg5 : index + %408 = select %406, %407, %arg5 : index + %409 = divi_signed %408, %c16 : index + %410 = subi %c-1, %409 : index + %411 = select %406, %410, %409 : index + %412 = remi_signed %411, %c16 : index + %413 = cmpi "slt", %412, %c0 : index + %414 = addi %412, %c16 : index + %415 = select %413, %414, %412 : index + %416 = remi_signed %arg5, %c16 : index + %417 = cmpi "slt", %416, %c0 : index + %418 = addi %416, %c16 : index + %419 = select %417, %418, %416 : index + %420 = cmpi "slt", %419, %c0 : index + %421 = subi %c-1, %419 : index + %422 = select %420, %421, %419 : index + %423 = divi_signed %422, %c8 : index + %424 = subi %c-1, %423 : index + %425 = select %420, %424, %423 : index + %426 = remi_signed %425, %c2 : index + %427 = cmpi "slt", %426, %c0 : index + %428 = addi %426, %c2 : index + %429 = select %427, %428, %426 : index + %430 = load %2[%415, %c0, %429] : memref<16x6x2xvector<8xf32>> + %431 = vector.extractelement %430[%c5_i64 : i64] : vector<8xf32> + %432 = cmpi "slt", %arg5, %c0 : index + %433 = subi %c-1, %arg5 : index + %434 = select %432, %433, %arg5 : index + %435 = divi_signed %434, %c16 : index + %436 = subi %c-1, %435 : index + %437 = select %432, %436, %435 : index + %438 = remi_signed %437, %c16 : index + %439 = cmpi "slt", %438, %c0 : index + %440 = addi %438, %c16 : index + %441 = select %439, %440, %438 : index + %442 = remi_signed %arg5, %c16 : index + %443 = cmpi "slt", %442, %c0 : index + %444 = addi %442, %c16 : index + %445 = select %443, %444, %442 : index + %446 = cmpi "slt", %445, %c0 : index + %447 = subi %c-1, %445 : index + %448 = select %446, %447, %445 : index + %449 = divi_signed %448, %c8 : index + %450 = subi %c-1, %449 : index + %451 = select %446, %450, %449 : index + %452 = remi_signed %451, %c2 : index + %453 = cmpi "slt", %452, %c0 : index + %454 = addi %452, %c2 : index + %455 = select %453, %454, %452 : index + %456 = load %2[%441, %c0, %455] : memref<16x6x2xvector<8xf32>> + %457 = vector.extractelement %456[%c6_i64 : i64] : vector<8xf32> + %458 = cmpi "slt", %arg5, %c0 : index + %459 = subi %c-1, %arg5 : index + %460 = select %458, %459, %arg5 : index + %461 = divi_signed %460, %c16 : index + %462 = subi %c-1, %461 : index + %463 = select %458, %462, %461 : index + %464 = remi_signed %463, %c16 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = addi %464, %c16 : index + %467 = select %465, %466, %464 : index + %468 = remi_signed %arg5, %c16 : index + %469 = cmpi "slt", %468, %c0 : index + %470 = addi %468, %c16 : index + %471 = select %469, %470, %468 : index + %472 = cmpi "slt", %471, %c0 : index + %473 = subi %c-1, %471 : index + %474 = select %472, %473, %471 : index + %475 = divi_signed %474, %c8 : index + %476 = subi %c-1, %475 : index + %477 = select %472, %476, %475 : index + %478 = remi_signed %477, %c2 : index + %479 = cmpi "slt", %478, %c0 : index + %480 = addi %478, %c2 : index + %481 = select %479, %480, %478 : index + %482 = load %2[%467, %c0, %481] : memref<16x6x2xvector<8xf32>> + %483 = vector.extractelement %482[%c7_i64 : i64] : vector<8xf32> + %484 = addf %301, %268 {RelaxedPrecision} : f32 + %485 = addf %327, %269 {RelaxedPrecision} : f32 + %486 = addf %353, %270 {RelaxedPrecision} : f32 + %487 = addf %379, %271 {RelaxedPrecision} : f32 + %488 = addf %405, %272 {RelaxedPrecision} : f32 + %489 = addf %431, %273 {RelaxedPrecision} : f32 + %490 = addf %457, %274 {RelaxedPrecision} : f32 + %491 = addf %483, %275 {RelaxedPrecision} : 
f32 + %492 = cmpi "slt", %arg5, %c0 : index + %493 = subi %c-1, %arg5 : index + %494 = select %492, %493, %arg5 : index + %495 = divi_signed %494, %c16 : index + %496 = subi %c-1, %495 : index + %497 = select %492, %496, %495 : index + %498 = remi_signed %497, %c16 : index + %499 = cmpi "slt", %498, %c0 : index + %500 = addi %498, %c16 : index + %501 = select %499, %500, %498 : index + %502 = remi_signed %arg5, %c16 : index + %503 = cmpi "slt", %502, %c0 : index + %504 = addi %502, %c16 : index + %505 = select %503, %504, %502 : index + %506 = cmpi "slt", %505, %c0 : index + %507 = subi %c-1, %505 : index + %508 = select %506, %507, %505 : index + %509 = divi_signed %508, %c8 : index + %510 = subi %c-1, %509 : index + %511 = select %506, %510, %509 : index + %512 = remi_signed %511, %c2 : index + %513 = cmpi "slt", %512, %c0 : index + %514 = addi %512, %c2 : index + %515 = select %513, %514, %512 : index + %516 = load %2[%501, %c0, %515] : memref<16x6x2xvector<8xf32>> + %517 = vector.insertelement %484, %516[%c0_i64 : i64] : vector<8xf32> + %518 = cmpi "slt", %arg5, %c0 : index + %519 = subi %c-1, %arg5 : index + %520 = select %518, %519, %arg5 : index + %521 = divi_signed %520, %c16 : index + %522 = subi %c-1, %521 : index + %523 = select %518, %522, %521 : index + %524 = remi_signed %523, %c16 : index + %525 = cmpi "slt", %524, %c0 : index + %526 = addi %524, %c16 : index + %527 = select %525, %526, %524 : index + %528 = remi_signed %arg5, %c16 : index + %529 = cmpi "slt", %528, %c0 : index + %530 = addi %528, %c16 : index + %531 = select %529, %530, %528 : index + %532 = cmpi "slt", %531, %c0 : index + %533 = subi %c-1, %531 : index + %534 = select %532, %533, %531 : index + %535 = divi_signed %534, %c8 : index + %536 = subi %c-1, %535 : index + %537 = select %532, %536, %535 : index + %538 = remi_signed %537, %c2 : index + %539 = cmpi "slt", %538, %c0 : index + %540 = addi %538, %c2 : index + %541 = select %539, %540, %538 : index + store %517, %2[%527, %c0, %541] : memref<16x6x2xvector<8xf32>> + %542 = cmpi "slt", %arg5, %c0 : index + %543 = subi %c-1, %arg5 : index + %544 = select %542, %543, %arg5 : index + %545 = divi_signed %544, %c16 : index + %546 = subi %c-1, %545 : index + %547 = select %542, %546, %545 : index + %548 = remi_signed %547, %c16 : index + %549 = cmpi "slt", %548, %c0 : index + %550 = addi %548, %c16 : index + %551 = select %549, %550, %548 : index + %552 = remi_signed %arg5, %c16 : index + %553 = cmpi "slt", %552, %c0 : index + %554 = addi %552, %c16 : index + %555 = select %553, %554, %552 : index + %556 = cmpi "slt", %555, %c0 : index + %557 = subi %c-1, %555 : index + %558 = select %556, %557, %555 : index + %559 = divi_signed %558, %c8 : index + %560 = subi %c-1, %559 : index + %561 = select %556, %560, %559 : index + %562 = remi_signed %561, %c2 : index + %563 = cmpi "slt", %562, %c0 : index + %564 = addi %562, %c2 : index + %565 = select %563, %564, %562 : index + %566 = load %2[%551, %c0, %565] : memref<16x6x2xvector<8xf32>> + %567 = vector.insertelement %485, %566[%c1_i64 : i64] : vector<8xf32> + %568 = cmpi "slt", %arg5, %c0 : index + %569 = subi %c-1, %arg5 : index + %570 = select %568, %569, %arg5 : index + %571 = divi_signed %570, %c16 : index + %572 = subi %c-1, %571 : index + %573 = select %568, %572, %571 : index + %574 = remi_signed %573, %c16 : index + %575 = cmpi "slt", %574, %c0 : index + %576 = addi %574, %c16 : index + %577 = select %575, %576, %574 : index + %578 = remi_signed %arg5, %c16 : index + %579 = cmpi "slt", %578, %c0 : index + %580 
= addi %578, %c16 : index + %581 = select %579, %580, %578 : index + %582 = cmpi "slt", %581, %c0 : index + %583 = subi %c-1, %581 : index + %584 = select %582, %583, %581 : index + %585 = divi_signed %584, %c8 : index + %586 = subi %c-1, %585 : index + %587 = select %582, %586, %585 : index + %588 = remi_signed %587, %c2 : index + %589 = cmpi "slt", %588, %c0 : index + %590 = addi %588, %c2 : index + %591 = select %589, %590, %588 : index + store %567, %2[%577, %c0, %591] : memref<16x6x2xvector<8xf32>> + %592 = cmpi "slt", %arg5, %c0 : index + %593 = subi %c-1, %arg5 : index + %594 = select %592, %593, %arg5 : index + %595 = divi_signed %594, %c16 : index + %596 = subi %c-1, %595 : index + %597 = select %592, %596, %595 : index + %598 = remi_signed %597, %c16 : index + %599 = cmpi "slt", %598, %c0 : index + %600 = addi %598, %c16 : index + %601 = select %599, %600, %598 : index + %602 = remi_signed %arg5, %c16 : index + %603 = cmpi "slt", %602, %c0 : index + %604 = addi %602, %c16 : index + %605 = select %603, %604, %602 : index + %606 = cmpi "slt", %605, %c0 : index + %607 = subi %c-1, %605 : index + %608 = select %606, %607, %605 : index + %609 = divi_signed %608, %c8 : index + %610 = subi %c-1, %609 : index + %611 = select %606, %610, %609 : index + %612 = remi_signed %611, %c2 : index + %613 = cmpi "slt", %612, %c0 : index + %614 = addi %612, %c2 : index + %615 = select %613, %614, %612 : index + %616 = load %2[%601, %c0, %615] : memref<16x6x2xvector<8xf32>> + %617 = vector.insertelement %486, %616[%c2_i64 : i64] : vector<8xf32> + %618 = cmpi "slt", %arg5, %c0 : index + %619 = subi %c-1, %arg5 : index + %620 = select %618, %619, %arg5 : index + %621 = divi_signed %620, %c16 : index + %622 = subi %c-1, %621 : index + %623 = select %618, %622, %621 : index + %624 = remi_signed %623, %c16 : index + %625 = cmpi "slt", %624, %c0 : index + %626 = addi %624, %c16 : index + %627 = select %625, %626, %624 : index + %628 = remi_signed %arg5, %c16 : index + %629 = cmpi "slt", %628, %c0 : index + %630 = addi %628, %c16 : index + %631 = select %629, %630, %628 : index + %632 = cmpi "slt", %631, %c0 : index + %633 = subi %c-1, %631 : index + %634 = select %632, %633, %631 : index + %635 = divi_signed %634, %c8 : index + %636 = subi %c-1, %635 : index + %637 = select %632, %636, %635 : index + %638 = remi_signed %637, %c2 : index + %639 = cmpi "slt", %638, %c0 : index + %640 = addi %638, %c2 : index + %641 = select %639, %640, %638 : index + store %617, %2[%627, %c0, %641] : memref<16x6x2xvector<8xf32>> + %642 = cmpi "slt", %arg5, %c0 : index + %643 = subi %c-1, %arg5 : index + %644 = select %642, %643, %arg5 : index + %645 = divi_signed %644, %c16 : index + %646 = subi %c-1, %645 : index + %647 = select %642, %646, %645 : index + %648 = remi_signed %647, %c16 : index + %649 = cmpi "slt", %648, %c0 : index + %650 = addi %648, %c16 : index + %651 = select %649, %650, %648 : index + %652 = remi_signed %arg5, %c16 : index + %653 = cmpi "slt", %652, %c0 : index + %654 = addi %652, %c16 : index + %655 = select %653, %654, %652 : index + %656 = cmpi "slt", %655, %c0 : index + %657 = subi %c-1, %655 : index + %658 = select %656, %657, %655 : index + %659 = divi_signed %658, %c8 : index + %660 = subi %c-1, %659 : index + %661 = select %656, %660, %659 : index + %662 = remi_signed %661, %c2 : index + %663 = cmpi "slt", %662, %c0 : index + %664 = addi %662, %c2 : index + %665 = select %663, %664, %662 : index + %666 = load %2[%651, %c0, %665] : memref<16x6x2xvector<8xf32>> + %667 = vector.insertelement %487, 
%666[%c3_i64 : i64] : vector<8xf32> + %668 = cmpi "slt", %arg5, %c0 : index + %669 = subi %c-1, %arg5 : index + %670 = select %668, %669, %arg5 : index + %671 = divi_signed %670, %c16 : index + %672 = subi %c-1, %671 : index + %673 = select %668, %672, %671 : index + %674 = remi_signed %673, %c16 : index + %675 = cmpi "slt", %674, %c0 : index + %676 = addi %674, %c16 : index + %677 = select %675, %676, %674 : index + %678 = remi_signed %arg5, %c16 : index + %679 = cmpi "slt", %678, %c0 : index + %680 = addi %678, %c16 : index + %681 = select %679, %680, %678 : index + %682 = cmpi "slt", %681, %c0 : index + %683 = subi %c-1, %681 : index + %684 = select %682, %683, %681 : index + %685 = divi_signed %684, %c8 : index + %686 = subi %c-1, %685 : index + %687 = select %682, %686, %685 : index + %688 = remi_signed %687, %c2 : index + %689 = cmpi "slt", %688, %c0 : index + %690 = addi %688, %c2 : index + %691 = select %689, %690, %688 : index + store %667, %2[%677, %c0, %691] : memref<16x6x2xvector<8xf32>> + %692 = cmpi "slt", %arg5, %c0 : index + %693 = subi %c-1, %arg5 : index + %694 = select %692, %693, %arg5 : index + %695 = divi_signed %694, %c16 : index + %696 = subi %c-1, %695 : index + %697 = select %692, %696, %695 : index + %698 = remi_signed %697, %c16 : index + %699 = cmpi "slt", %698, %c0 : index + %700 = addi %698, %c16 : index + %701 = select %699, %700, %698 : index + %702 = remi_signed %arg5, %c16 : index + %703 = cmpi "slt", %702, %c0 : index + %704 = addi %702, %c16 : index + %705 = select %703, %704, %702 : index + %706 = cmpi "slt", %705, %c0 : index + %707 = subi %c-1, %705 : index + %708 = select %706, %707, %705 : index + %709 = divi_signed %708, %c8 : index + %710 = subi %c-1, %709 : index + %711 = select %706, %710, %709 : index + %712 = remi_signed %711, %c2 : index + %713 = cmpi "slt", %712, %c0 : index + %714 = addi %712, %c2 : index + %715 = select %713, %714, %712 : index + %716 = load %2[%701, %c0, %715] : memref<16x6x2xvector<8xf32>> + %717 = vector.insertelement %488, %716[%c4_i64 : i64] : vector<8xf32> + %718 = cmpi "slt", %arg5, %c0 : index + %719 = subi %c-1, %arg5 : index + %720 = select %718, %719, %arg5 : index + %721 = divi_signed %720, %c16 : index + %722 = subi %c-1, %721 : index + %723 = select %718, %722, %721 : index + %724 = remi_signed %723, %c16 : index + %725 = cmpi "slt", %724, %c0 : index + %726 = addi %724, %c16 : index + %727 = select %725, %726, %724 : index + %728 = remi_signed %arg5, %c16 : index + %729 = cmpi "slt", %728, %c0 : index + %730 = addi %728, %c16 : index + %731 = select %729, %730, %728 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c8 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = remi_signed %737, %c2 : index + %739 = cmpi "slt", %738, %c0 : index + %740 = addi %738, %c2 : index + %741 = select %739, %740, %738 : index + store %717, %2[%727, %c0, %741] : memref<16x6x2xvector<8xf32>> + %742 = cmpi "slt", %arg5, %c0 : index + %743 = subi %c-1, %arg5 : index + %744 = select %742, %743, %arg5 : index + %745 = divi_signed %744, %c16 : index + %746 = subi %c-1, %745 : index + %747 = select %742, %746, %745 : index + %748 = remi_signed %747, %c16 : index + %749 = cmpi "slt", %748, %c0 : index + %750 = addi %748, %c16 : index + %751 = select %749, %750, %748 : index + %752 = remi_signed %arg5, %c16 : index + %753 = cmpi "slt", %752, %c0 : index + %754 = addi %752, %c16 : index + %755 = 
select %753, %754, %752 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = subi %c-1, %755 : index + %758 = select %756, %757, %755 : index + %759 = divi_signed %758, %c8 : index + %760 = subi %c-1, %759 : index + %761 = select %756, %760, %759 : index + %762 = remi_signed %761, %c2 : index + %763 = cmpi "slt", %762, %c0 : index + %764 = addi %762, %c2 : index + %765 = select %763, %764, %762 : index + %766 = load %2[%751, %c0, %765] : memref<16x6x2xvector<8xf32>> + %767 = vector.insertelement %489, %766[%c5_i64 : i64] : vector<8xf32> + %768 = cmpi "slt", %arg5, %c0 : index + %769 = subi %c-1, %arg5 : index + %770 = select %768, %769, %arg5 : index + %771 = divi_signed %770, %c16 : index + %772 = subi %c-1, %771 : index + %773 = select %768, %772, %771 : index + %774 = remi_signed %773, %c16 : index + %775 = cmpi "slt", %774, %c0 : index + %776 = addi %774, %c16 : index + %777 = select %775, %776, %774 : index + %778 = remi_signed %arg5, %c16 : index + %779 = cmpi "slt", %778, %c0 : index + %780 = addi %778, %c16 : index + %781 = select %779, %780, %778 : index + %782 = cmpi "slt", %781, %c0 : index + %783 = subi %c-1, %781 : index + %784 = select %782, %783, %781 : index + %785 = divi_signed %784, %c8 : index + %786 = subi %c-1, %785 : index + %787 = select %782, %786, %785 : index + %788 = remi_signed %787, %c2 : index + %789 = cmpi "slt", %788, %c0 : index + %790 = addi %788, %c2 : index + %791 = select %789, %790, %788 : index + store %767, %2[%777, %c0, %791] : memref<16x6x2xvector<8xf32>> + %792 = cmpi "slt", %arg5, %c0 : index + %793 = subi %c-1, %arg5 : index + %794 = select %792, %793, %arg5 : index + %795 = divi_signed %794, %c16 : index + %796 = subi %c-1, %795 : index + %797 = select %792, %796, %795 : index + %798 = remi_signed %797, %c16 : index + %799 = cmpi "slt", %798, %c0 : index + %800 = addi %798, %c16 : index + %801 = select %799, %800, %798 : index + %802 = remi_signed %arg5, %c16 : index + %803 = cmpi "slt", %802, %c0 : index + %804 = addi %802, %c16 : index + %805 = select %803, %804, %802 : index + %806 = cmpi "slt", %805, %c0 : index + %807 = subi %c-1, %805 : index + %808 = select %806, %807, %805 : index + %809 = divi_signed %808, %c8 : index + %810 = subi %c-1, %809 : index + %811 = select %806, %810, %809 : index + %812 = remi_signed %811, %c2 : index + %813 = cmpi "slt", %812, %c0 : index + %814 = addi %812, %c2 : index + %815 = select %813, %814, %812 : index + %816 = load %2[%801, %c0, %815] : memref<16x6x2xvector<8xf32>> + %817 = vector.insertelement %490, %816[%c6_i64 : i64] : vector<8xf32> + %818 = cmpi "slt", %arg5, %c0 : index + %819 = subi %c-1, %arg5 : index + %820 = select %818, %819, %arg5 : index + %821 = divi_signed %820, %c16 : index + %822 = subi %c-1, %821 : index + %823 = select %818, %822, %821 : index + %824 = remi_signed %823, %c16 : index + %825 = cmpi "slt", %824, %c0 : index + %826 = addi %824, %c16 : index + %827 = select %825, %826, %824 : index + %828 = remi_signed %arg5, %c16 : index + %829 = cmpi "slt", %828, %c0 : index + %830 = addi %828, %c16 : index + %831 = select %829, %830, %828 : index + %832 = cmpi "slt", %831, %c0 : index + %833 = subi %c-1, %831 : index + %834 = select %832, %833, %831 : index + %835 = divi_signed %834, %c8 : index + %836 = subi %c-1, %835 : index + %837 = select %832, %836, %835 : index + %838 = remi_signed %837, %c2 : index + %839 = cmpi "slt", %838, %c0 : index + %840 = addi %838, %c2 : index + %841 = select %839, %840, %838 : index + store %817, %2[%827, %c0, %841] : memref<16x6x2xvector<8xf32>> 
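+ // Note (annotation, not part of the emitted IR): the recurring cmpi "slt" / subi %c-1 / select chains followed by divi_signed or remi_signed are the lowered form of affine floordiv/mod index arithmetic - floor division that rounds toward negative infinity and a remainder forced non-negative - used to map %arg5 and %arg6 + %arg7 onto the packed memref<16x128x2xvector<8xf32>> and memref<16x6x2xvector<8xf32>> caches.
+ // The groups of eight mulf/addf pairs tagged {RelaxedPrecision} perform the unrolled multiply-accumulate of this kernel's inner loop; the surrounding load/vector.extractelement and vector.insertelement/store pairs read and write the corresponding lanes of the accumulator cache %2.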
+ %842 = cmpi "slt", %arg5, %c0 : index + %843 = subi %c-1, %arg5 : index + %844 = select %842, %843, %arg5 : index + %845 = divi_signed %844, %c16 : index + %846 = subi %c-1, %845 : index + %847 = select %842, %846, %845 : index + %848 = remi_signed %847, %c16 : index + %849 = cmpi "slt", %848, %c0 : index + %850 = addi %848, %c16 : index + %851 = select %849, %850, %848 : index + %852 = remi_signed %arg5, %c16 : index + %853 = cmpi "slt", %852, %c0 : index + %854 = addi %852, %c16 : index + %855 = select %853, %854, %852 : index + %856 = cmpi "slt", %855, %c0 : index + %857 = subi %c-1, %855 : index + %858 = select %856, %857, %855 : index + %859 = divi_signed %858, %c8 : index + %860 = subi %c-1, %859 : index + %861 = select %856, %860, %859 : index + %862 = remi_signed %861, %c2 : index + %863 = cmpi "slt", %862, %c0 : index + %864 = addi %862, %c2 : index + %865 = select %863, %864, %862 : index + %866 = load %2[%851, %c0, %865] : memref<16x6x2xvector<8xf32>> + %867 = vector.insertelement %491, %866[%c7_i64 : i64] : vector<8xf32> + %868 = cmpi "slt", %arg5, %c0 : index + %869 = subi %c-1, %arg5 : index + %870 = select %868, %869, %arg5 : index + %871 = divi_signed %870, %c16 : index + %872 = subi %c-1, %871 : index + %873 = select %868, %872, %871 : index + %874 = remi_signed %873, %c16 : index + %875 = cmpi "slt", %874, %c0 : index + %876 = addi %874, %c16 : index + %877 = select %875, %876, %874 : index + %878 = remi_signed %arg5, %c16 : index + %879 = cmpi "slt", %878, %c0 : index + %880 = addi %878, %c16 : index + %881 = select %879, %880, %878 : index + %882 = cmpi "slt", %881, %c0 : index + %883 = subi %c-1, %881 : index + %884 = select %882, %883, %881 : index + %885 = divi_signed %884, %c8 : index + %886 = subi %c-1, %885 : index + %887 = select %882, %886, %885 : index + %888 = remi_signed %887, %c2 : index + %889 = cmpi "slt", %888, %c0 : index + %890 = addi %888, %c2 : index + %891 = select %889, %890, %888 : index + store %867, %2[%877, %c0, %891] : memref<16x6x2xvector<8xf32>> + %892 = cmpi "slt", %arg5, %c0 : index + %893 = subi %c-1, %arg5 : index + %894 = select %892, %893, %arg5 : index + %895 = divi_signed %894, %c16 : index + %896 = subi %c-1, %895 : index + %897 = select %892, %896, %895 : index + %898 = remi_signed %897, %c16 : index + %899 = cmpi "slt", %898, %c0 : index + %900 = addi %898, %c16 : index + %901 = select %899, %900, %898 : index + %902 = remi_signed %arg5, %c16 : index + %903 = cmpi "slt", %902, %c0 : index + %904 = addi %902, %c16 : index + %905 = select %903, %904, %902 : index + %906 = cmpi "slt", %905, %c0 : index + %907 = subi %c-1, %905 : index + %908 = select %906, %907, %905 : index + %909 = divi_signed %908, %c8 : index + %910 = subi %c-1, %909 : index + %911 = select %906, %910, %909 : index + %912 = remi_signed %911, %c2 : index + %913 = cmpi "slt", %912, %c0 : index + %914 = addi %912, %c2 : index + %915 = select %913, %914, %912 : index + %916 = load %2[%901, %c0, %915] : memref<16x6x2xvector<8xf32>> + %917 = vector.insertelement %484, %916[%c0_i64 : i64] : vector<8xf32> + %918 = cmpi "slt", %arg5, %c0 : index + %919 = subi %c-1, %arg5 : index + %920 = select %918, %919, %arg5 : index + %921 = divi_signed %920, %c16 : index + %922 = subi %c-1, %921 : index + %923 = select %918, %922, %921 : index + %924 = remi_signed %923, %c16 : index + %925 = cmpi "slt", %924, %c0 : index + %926 = addi %924, %c16 : index + %927 = select %925, %926, %924 : index + %928 = remi_signed %arg5, %c16 : index + %929 = cmpi "slt", %928, %c0 : index + %930 = 
addi %928, %c16 : index + %931 = select %929, %930, %928 : index + %932 = cmpi "slt", %931, %c0 : index + %933 = subi %c-1, %931 : index + %934 = select %932, %933, %931 : index + %935 = divi_signed %934, %c8 : index + %936 = subi %c-1, %935 : index + %937 = select %932, %936, %935 : index + %938 = remi_signed %937, %c2 : index + %939 = cmpi "slt", %938, %c0 : index + %940 = addi %938, %c2 : index + %941 = select %939, %940, %938 : index + store %917, %2[%927, %c0, %941] : memref<16x6x2xvector<8xf32>> + %942 = cmpi "slt", %arg5, %c0 : index + %943 = subi %c-1, %arg5 : index + %944 = select %942, %943, %arg5 : index + %945 = divi_signed %944, %c16 : index + %946 = subi %c-1, %945 : index + %947 = select %942, %946, %945 : index + %948 = remi_signed %947, %c16 : index + %949 = cmpi "slt", %948, %c0 : index + %950 = addi %948, %c16 : index + %951 = select %949, %950, %948 : index + %952 = remi_signed %arg5, %c16 : index + %953 = cmpi "slt", %952, %c0 : index + %954 = addi %952, %c16 : index + %955 = select %953, %954, %952 : index + %956 = cmpi "slt", %955, %c0 : index + %957 = subi %c-1, %955 : index + %958 = select %956, %957, %955 : index + %959 = divi_signed %958, %c8 : index + %960 = subi %c-1, %959 : index + %961 = select %956, %960, %959 : index + %962 = remi_signed %961, %c2 : index + %963 = cmpi "slt", %962, %c0 : index + %964 = addi %962, %c2 : index + %965 = select %963, %964, %962 : index + %966 = load %2[%951, %c0, %965] : memref<16x6x2xvector<8xf32>> + %967 = vector.insertelement %485, %966[%c1_i64 : i64] : vector<8xf32> + %968 = cmpi "slt", %arg5, %c0 : index + %969 = subi %c-1, %arg5 : index + %970 = select %968, %969, %arg5 : index + %971 = divi_signed %970, %c16 : index + %972 = subi %c-1, %971 : index + %973 = select %968, %972, %971 : index + %974 = remi_signed %973, %c16 : index + %975 = cmpi "slt", %974, %c0 : index + %976 = addi %974, %c16 : index + %977 = select %975, %976, %974 : index + %978 = remi_signed %arg5, %c16 : index + %979 = cmpi "slt", %978, %c0 : index + %980 = addi %978, %c16 : index + %981 = select %979, %980, %978 : index + %982 = cmpi "slt", %981, %c0 : index + %983 = subi %c-1, %981 : index + %984 = select %982, %983, %981 : index + %985 = divi_signed %984, %c8 : index + %986 = subi %c-1, %985 : index + %987 = select %982, %986, %985 : index + %988 = remi_signed %987, %c2 : index + %989 = cmpi "slt", %988, %c0 : index + %990 = addi %988, %c2 : index + %991 = select %989, %990, %988 : index + store %967, %2[%977, %c0, %991] : memref<16x6x2xvector<8xf32>> + %992 = cmpi "slt", %arg5, %c0 : index + %993 = subi %c-1, %arg5 : index + %994 = select %992, %993, %arg5 : index + %995 = divi_signed %994, %c16 : index + %996 = subi %c-1, %995 : index + %997 = select %992, %996, %995 : index + %998 = remi_signed %997, %c16 : index + %999 = cmpi "slt", %998, %c0 : index + %1000 = addi %998, %c16 : index + %1001 = select %999, %1000, %998 : index + %1002 = remi_signed %arg5, %c16 : index + %1003 = cmpi "slt", %1002, %c0 : index + %1004 = addi %1002, %c16 : index + %1005 = select %1003, %1004, %1002 : index + %1006 = cmpi "slt", %1005, %c0 : index + %1007 = subi %c-1, %1005 : index + %1008 = select %1006, %1007, %1005 : index + %1009 = divi_signed %1008, %c8 : index + %1010 = subi %c-1, %1009 : index + %1011 = select %1006, %1010, %1009 : index + %1012 = remi_signed %1011, %c2 : index + %1013 = cmpi "slt", %1012, %c0 : index + %1014 = addi %1012, %c2 : index + %1015 = select %1013, %1014, %1012 : index + %1016 = load %2[%1001, %c0, %1015] : 
memref<16x6x2xvector<8xf32>> + %1017 = vector.insertelement %486, %1016[%c2_i64 : i64] : vector<8xf32> + %1018 = cmpi "slt", %arg5, %c0 : index + %1019 = subi %c-1, %arg5 : index + %1020 = select %1018, %1019, %arg5 : index + %1021 = divi_signed %1020, %c16 : index + %1022 = subi %c-1, %1021 : index + %1023 = select %1018, %1022, %1021 : index + %1024 = remi_signed %1023, %c16 : index + %1025 = cmpi "slt", %1024, %c0 : index + %1026 = addi %1024, %c16 : index + %1027 = select %1025, %1026, %1024 : index + %1028 = remi_signed %arg5, %c16 : index + %1029 = cmpi "slt", %1028, %c0 : index + %1030 = addi %1028, %c16 : index + %1031 = select %1029, %1030, %1028 : index + %1032 = cmpi "slt", %1031, %c0 : index + %1033 = subi %c-1, %1031 : index + %1034 = select %1032, %1033, %1031 : index + %1035 = divi_signed %1034, %c8 : index + %1036 = subi %c-1, %1035 : index + %1037 = select %1032, %1036, %1035 : index + %1038 = remi_signed %1037, %c2 : index + %1039 = cmpi "slt", %1038, %c0 : index + %1040 = addi %1038, %c2 : index + %1041 = select %1039, %1040, %1038 : index + store %1017, %2[%1027, %c0, %1041] : memref<16x6x2xvector<8xf32>> + %1042 = cmpi "slt", %arg5, %c0 : index + %1043 = subi %c-1, %arg5 : index + %1044 = select %1042, %1043, %arg5 : index + %1045 = divi_signed %1044, %c16 : index + %1046 = subi %c-1, %1045 : index + %1047 = select %1042, %1046, %1045 : index + %1048 = remi_signed %1047, %c16 : index + %1049 = cmpi "slt", %1048, %c0 : index + %1050 = addi %1048, %c16 : index + %1051 = select %1049, %1050, %1048 : index + %1052 = remi_signed %arg5, %c16 : index + %1053 = cmpi "slt", %1052, %c0 : index + %1054 = addi %1052, %c16 : index + %1055 = select %1053, %1054, %1052 : index + %1056 = cmpi "slt", %1055, %c0 : index + %1057 = subi %c-1, %1055 : index + %1058 = select %1056, %1057, %1055 : index + %1059 = divi_signed %1058, %c8 : index + %1060 = subi %c-1, %1059 : index + %1061 = select %1056, %1060, %1059 : index + %1062 = remi_signed %1061, %c2 : index + %1063 = cmpi "slt", %1062, %c0 : index + %1064 = addi %1062, %c2 : index + %1065 = select %1063, %1064, %1062 : index + %1066 = load %2[%1051, %c0, %1065] : memref<16x6x2xvector<8xf32>> + %1067 = vector.insertelement %487, %1066[%c3_i64 : i64] : vector<8xf32> + %1068 = cmpi "slt", %arg5, %c0 : index + %1069 = subi %c-1, %arg5 : index + %1070 = select %1068, %1069, %arg5 : index + %1071 = divi_signed %1070, %c16 : index + %1072 = subi %c-1, %1071 : index + %1073 = select %1068, %1072, %1071 : index + %1074 = remi_signed %1073, %c16 : index + %1075 = cmpi "slt", %1074, %c0 : index + %1076 = addi %1074, %c16 : index + %1077 = select %1075, %1076, %1074 : index + %1078 = remi_signed %arg5, %c16 : index + %1079 = cmpi "slt", %1078, %c0 : index + %1080 = addi %1078, %c16 : index + %1081 = select %1079, %1080, %1078 : index + %1082 = cmpi "slt", %1081, %c0 : index + %1083 = subi %c-1, %1081 : index + %1084 = select %1082, %1083, %1081 : index + %1085 = divi_signed %1084, %c8 : index + %1086 = subi %c-1, %1085 : index + %1087 = select %1082, %1086, %1085 : index + %1088 = remi_signed %1087, %c2 : index + %1089 = cmpi "slt", %1088, %c0 : index + %1090 = addi %1088, %c2 : index + %1091 = select %1089, %1090, %1088 : index + store %1067, %2[%1077, %c0, %1091] : memref<16x6x2xvector<8xf32>> + %1092 = cmpi "slt", %arg5, %c0 : index + %1093 = subi %c-1, %arg5 : index + %1094 = select %1092, %1093, %arg5 : index + %1095 = divi_signed %1094, %c16 : index + %1096 = subi %c-1, %1095 : index + %1097 = select %1092, %1096, %1095 : index + %1098 = 
remi_signed %1097, %c16 : index + %1099 = cmpi "slt", %1098, %c0 : index + %1100 = addi %1098, %c16 : index + %1101 = select %1099, %1100, %1098 : index + %1102 = remi_signed %arg5, %c16 : index + %1103 = cmpi "slt", %1102, %c0 : index + %1104 = addi %1102, %c16 : index + %1105 = select %1103, %1104, %1102 : index + %1106 = cmpi "slt", %1105, %c0 : index + %1107 = subi %c-1, %1105 : index + %1108 = select %1106, %1107, %1105 : index + %1109 = divi_signed %1108, %c8 : index + %1110 = subi %c-1, %1109 : index + %1111 = select %1106, %1110, %1109 : index + %1112 = remi_signed %1111, %c2 : index + %1113 = cmpi "slt", %1112, %c0 : index + %1114 = addi %1112, %c2 : index + %1115 = select %1113, %1114, %1112 : index + %1116 = load %2[%1101, %c0, %1115] : memref<16x6x2xvector<8xf32>> + %1117 = vector.insertelement %488, %1116[%c4_i64 : i64] : vector<8xf32> + %1118 = cmpi "slt", %arg5, %c0 : index + %1119 = subi %c-1, %arg5 : index + %1120 = select %1118, %1119, %arg5 : index + %1121 = divi_signed %1120, %c16 : index + %1122 = subi %c-1, %1121 : index + %1123 = select %1118, %1122, %1121 : index + %1124 = remi_signed %1123, %c16 : index + %1125 = cmpi "slt", %1124, %c0 : index + %1126 = addi %1124, %c16 : index + %1127 = select %1125, %1126, %1124 : index + %1128 = remi_signed %arg5, %c16 : index + %1129 = cmpi "slt", %1128, %c0 : index + %1130 = addi %1128, %c16 : index + %1131 = select %1129, %1130, %1128 : index + %1132 = cmpi "slt", %1131, %c0 : index + %1133 = subi %c-1, %1131 : index + %1134 = select %1132, %1133, %1131 : index + %1135 = divi_signed %1134, %c8 : index + %1136 = subi %c-1, %1135 : index + %1137 = select %1132, %1136, %1135 : index + %1138 = remi_signed %1137, %c2 : index + %1139 = cmpi "slt", %1138, %c0 : index + %1140 = addi %1138, %c2 : index + %1141 = select %1139, %1140, %1138 : index + store %1117, %2[%1127, %c0, %1141] : memref<16x6x2xvector<8xf32>> + %1142 = cmpi "slt", %arg5, %c0 : index + %1143 = subi %c-1, %arg5 : index + %1144 = select %1142, %1143, %arg5 : index + %1145 = divi_signed %1144, %c16 : index + %1146 = subi %c-1, %1145 : index + %1147 = select %1142, %1146, %1145 : index + %1148 = remi_signed %1147, %c16 : index + %1149 = cmpi "slt", %1148, %c0 : index + %1150 = addi %1148, %c16 : index + %1151 = select %1149, %1150, %1148 : index + %1152 = remi_signed %arg5, %c16 : index + %1153 = cmpi "slt", %1152, %c0 : index + %1154 = addi %1152, %c16 : index + %1155 = select %1153, %1154, %1152 : index + %1156 = cmpi "slt", %1155, %c0 : index + %1157 = subi %c-1, %1155 : index + %1158 = select %1156, %1157, %1155 : index + %1159 = divi_signed %1158, %c8 : index + %1160 = subi %c-1, %1159 : index + %1161 = select %1156, %1160, %1159 : index + %1162 = remi_signed %1161, %c2 : index + %1163 = cmpi "slt", %1162, %c0 : index + %1164 = addi %1162, %c2 : index + %1165 = select %1163, %1164, %1162 : index + %1166 = load %2[%1151, %c0, %1165] : memref<16x6x2xvector<8xf32>> + %1167 = vector.insertelement %489, %1166[%c5_i64 : i64] : vector<8xf32> + %1168 = cmpi "slt", %arg5, %c0 : index + %1169 = subi %c-1, %arg5 : index + %1170 = select %1168, %1169, %arg5 : index + %1171 = divi_signed %1170, %c16 : index + %1172 = subi %c-1, %1171 : index + %1173 = select %1168, %1172, %1171 : index + %1174 = remi_signed %1173, %c16 : index + %1175 = cmpi "slt", %1174, %c0 : index + %1176 = addi %1174, %c16 : index + %1177 = select %1175, %1176, %1174 : index + %1178 = remi_signed %arg5, %c16 : index + %1179 = cmpi "slt", %1178, %c0 : index + %1180 = addi %1178, %c16 : index + %1181 = 
select %1179, %1180, %1178 : index + %1182 = cmpi "slt", %1181, %c0 : index + %1183 = subi %c-1, %1181 : index + %1184 = select %1182, %1183, %1181 : index + %1185 = divi_signed %1184, %c8 : index + %1186 = subi %c-1, %1185 : index + %1187 = select %1182, %1186, %1185 : index + %1188 = remi_signed %1187, %c2 : index + %1189 = cmpi "slt", %1188, %c0 : index + %1190 = addi %1188, %c2 : index + %1191 = select %1189, %1190, %1188 : index + store %1167, %2[%1177, %c0, %1191] : memref<16x6x2xvector<8xf32>> + %1192 = cmpi "slt", %arg5, %c0 : index + %1193 = subi %c-1, %arg5 : index + %1194 = select %1192, %1193, %arg5 : index + %1195 = divi_signed %1194, %c16 : index + %1196 = subi %c-1, %1195 : index + %1197 = select %1192, %1196, %1195 : index + %1198 = remi_signed %1197, %c16 : index + %1199 = cmpi "slt", %1198, %c0 : index + %1200 = addi %1198, %c16 : index + %1201 = select %1199, %1200, %1198 : index + %1202 = remi_signed %arg5, %c16 : index + %1203 = cmpi "slt", %1202, %c0 : index + %1204 = addi %1202, %c16 : index + %1205 = select %1203, %1204, %1202 : index + %1206 = cmpi "slt", %1205, %c0 : index + %1207 = subi %c-1, %1205 : index + %1208 = select %1206, %1207, %1205 : index + %1209 = divi_signed %1208, %c8 : index + %1210 = subi %c-1, %1209 : index + %1211 = select %1206, %1210, %1209 : index + %1212 = remi_signed %1211, %c2 : index + %1213 = cmpi "slt", %1212, %c0 : index + %1214 = addi %1212, %c2 : index + %1215 = select %1213, %1214, %1212 : index + %1216 = load %2[%1201, %c0, %1215] : memref<16x6x2xvector<8xf32>> + %1217 = vector.insertelement %490, %1216[%c6_i64 : i64] : vector<8xf32> + %1218 = cmpi "slt", %arg5, %c0 : index + %1219 = subi %c-1, %arg5 : index + %1220 = select %1218, %1219, %arg5 : index + %1221 = divi_signed %1220, %c16 : index + %1222 = subi %c-1, %1221 : index + %1223 = select %1218, %1222, %1221 : index + %1224 = remi_signed %1223, %c16 : index + %1225 = cmpi "slt", %1224, %c0 : index + %1226 = addi %1224, %c16 : index + %1227 = select %1225, %1226, %1224 : index + %1228 = remi_signed %arg5, %c16 : index + %1229 = cmpi "slt", %1228, %c0 : index + %1230 = addi %1228, %c16 : index + %1231 = select %1229, %1230, %1228 : index + %1232 = cmpi "slt", %1231, %c0 : index + %1233 = subi %c-1, %1231 : index + %1234 = select %1232, %1233, %1231 : index + %1235 = divi_signed %1234, %c8 : index + %1236 = subi %c-1, %1235 : index + %1237 = select %1232, %1236, %1235 : index + %1238 = remi_signed %1237, %c2 : index + %1239 = cmpi "slt", %1238, %c0 : index + %1240 = addi %1238, %c2 : index + %1241 = select %1239, %1240, %1238 : index + store %1217, %2[%1227, %c0, %1241] : memref<16x6x2xvector<8xf32>> + %1242 = cmpi "slt", %arg5, %c0 : index + %1243 = subi %c-1, %arg5 : index + %1244 = select %1242, %1243, %arg5 : index + %1245 = divi_signed %1244, %c16 : index + %1246 = subi %c-1, %1245 : index + %1247 = select %1242, %1246, %1245 : index + %1248 = remi_signed %1247, %c16 : index + %1249 = cmpi "slt", %1248, %c0 : index + %1250 = addi %1248, %c16 : index + %1251 = select %1249, %1250, %1248 : index + %1252 = remi_signed %arg5, %c16 : index + %1253 = cmpi "slt", %1252, %c0 : index + %1254 = addi %1252, %c16 : index + %1255 = select %1253, %1254, %1252 : index + %1256 = cmpi "slt", %1255, %c0 : index + %1257 = subi %c-1, %1255 : index + %1258 = select %1256, %1257, %1255 : index + %1259 = divi_signed %1258, %c8 : index + %1260 = subi %c-1, %1259 : index + %1261 = select %1256, %1260, %1259 : index + %1262 = remi_signed %1261, %c2 : index + %1263 = cmpi "slt", %1262, %c0 : index + 
%1264 = addi %1262, %c2 : index + %1265 = select %1263, %1264, %1262 : index + %1266 = load %2[%1251, %c0, %1265] : memref<16x6x2xvector<8xf32>> + %1267 = vector.insertelement %491, %1266[%c7_i64 : i64] : vector<8xf32> + %1268 = cmpi "slt", %arg5, %c0 : index + %1269 = subi %c-1, %arg5 : index + %1270 = select %1268, %1269, %arg5 : index + %1271 = divi_signed %1270, %c16 : index + %1272 = subi %c-1, %1271 : index + %1273 = select %1268, %1272, %1271 : index + %1274 = remi_signed %1273, %c16 : index + %1275 = cmpi "slt", %1274, %c0 : index + %1276 = addi %1274, %c16 : index + %1277 = select %1275, %1276, %1274 : index + %1278 = remi_signed %arg5, %c16 : index + %1279 = cmpi "slt", %1278, %c0 : index + %1280 = addi %1278, %c16 : index + %1281 = select %1279, %1280, %1278 : index + %1282 = cmpi "slt", %1281, %c0 : index + %1283 = subi %c-1, %1281 : index + %1284 = select %1282, %1283, %1281 : index + %1285 = divi_signed %1284, %c8 : index + %1286 = subi %c-1, %1285 : index + %1287 = select %1282, %1286, %1285 : index + %1288 = remi_signed %1287, %c2 : index + %1289 = cmpi "slt", %1288, %c0 : index + %1290 = addi %1288, %c2 : index + %1291 = select %1289, %1290, %1288 : index + store %1267, %2[%1277, %c0, %1291] : memref<16x6x2xvector<8xf32>> + %1292 = addi %arg6, %arg7 : index + %1293 = addi %arg6, %arg7 : index + %1294 = addi %arg6, %arg7 : index + %1295 = addi %arg6, %arg7 : index + %1296 = addi %arg6, %arg7 : index + %1297 = addi %arg6, %arg7 : index + %1298 = addi %arg6, %arg7 : index + %1299 = addi %arg6, %arg7 : index + %1300 = load %arg0[%arg4, %1292] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1301 = load %arg0[%arg4, %1293] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1302 = load %arg0[%arg4, %1294] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1303 = load %arg0[%arg4, %1295] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1304 = load %arg0[%arg4, %1296] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1305 = load %arg0[%arg4, %1297] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1306 = load %arg0[%arg4, %1298] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1307 = load %arg0[%arg4, %1299] : memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>> + %1308 = addi %arg5, %c8 : index + %1309 = cmpi "slt", %1308, %c0 : index + %1310 = subi %c-1, %1308 : index + %1311 = select %1309, %1310, %1308 : index + %1312 = divi_signed %1311, %c16 : index + %1313 = subi %c-1, %1312 : index + %1314 = select %1309, %1313, %1312 : index + %1315 = remi_signed %1314, %c16 : index + %1316 = cmpi "slt", %1315, %c0 : index + %1317 = addi %1315, %c16 : index + %1318 = select %1316, %1317, %1315 : index + %1319 = addi %arg6, %arg7 : index + %1320 = remi_signed %1319, %c128 : index + %1321 = cmpi "slt", %1320, %c0 : index + %1322 = addi %1320, %c128 : index + %1323 = select %1321, %1322, %1320 : index + %1324 = cmpi "slt", %arg5, %c0 : index + %1325 = subi %c-1, %arg5 : index + %1326 = select %1324, %1325, %arg5 : index + %1327 = divi_signed %1326, %c8 : index + %1328 = subi %c-1, %1327 : index + %1329 = select %1324, %1328, %1327 : index + %1330 = addi %arg5, %c8 : index + %1331 = cmpi "slt", %1330, %c0 : index + %1332 = subi %c-1, %1330 : index + %1333 = select %1331, %1332, %1330 : index + %1334 = divi_signed %1333, %c16 : index + %1335 = subi %c-1, %1334 : index + %1336 = select %1331, %1335, %1334 : index + %1337 = muli %1336, %c-2 : index + %1338 = addi %1329, %1337 
: index + %1339 = cmpi "slt", %arg5, %c0 : index + %1340 = subi %c-1, %arg5 : index + %1341 = select %1339, %1340, %arg5 : index + %1342 = divi_signed %1341, %c8 : index + %1343 = subi %c-1, %1342 : index + %1344 = select %1339, %1343, %1342 : index + %1345 = addi %arg5, %c8 : index + %1346 = cmpi "slt", %1345, %c0 : index + %1347 = subi %c-1, %1345 : index + %1348 = select %1346, %1347, %1345 : index + %1349 = divi_signed %1348, %c16 : index + %1350 = subi %c-1, %1349 : index + %1351 = select %1346, %1350, %1349 : index + %1352 = muli %1351, %c-2 : index + %1353 = addi %1344, %1352 : index + %1354 = addi %1353, %c1 : index + %1355 = cmpi "slt", %1354, %c0 : index + %1356 = subi %c-1, %1354 : index + %1357 = select %1355, %1356, %1354 : index + %1358 = divi_signed %1357, %c2 : index + %1359 = subi %c-1, %1358 : index + %1360 = select %1355, %1359, %1358 : index + %1361 = muli %1360, %c-2 : index + %1362 = addi %1338, %1361 : index + %1363 = addi %1362, %c1 : index + %1364 = load %3[%1318, %1323, %1363] : memref<16x128x2xvector<8xf32>> + %1365 = vector.extractelement %1364[%c0_i64 : i64] : vector<8xf32> + %1366 = addi %arg5, %c8 : index + %1367 = cmpi "slt", %1366, %c0 : index + %1368 = subi %c-1, %1366 : index + %1369 = select %1367, %1368, %1366 : index + %1370 = divi_signed %1369, %c16 : index + %1371 = subi %c-1, %1370 : index + %1372 = select %1367, %1371, %1370 : index + %1373 = remi_signed %1372, %c16 : index + %1374 = cmpi "slt", %1373, %c0 : index + %1375 = addi %1373, %c16 : index + %1376 = select %1374, %1375, %1373 : index + %1377 = addi %arg6, %arg7 : index + %1378 = remi_signed %1377, %c128 : index + %1379 = cmpi "slt", %1378, %c0 : index + %1380 = addi %1378, %c128 : index + %1381 = select %1379, %1380, %1378 : index + %1382 = cmpi "slt", %arg5, %c0 : index + %1383 = subi %c-1, %arg5 : index + %1384 = select %1382, %1383, %arg5 : index + %1385 = divi_signed %1384, %c8 : index + %1386 = subi %c-1, %1385 : index + %1387 = select %1382, %1386, %1385 : index + %1388 = addi %arg5, %c8 : index + %1389 = cmpi "slt", %1388, %c0 : index + %1390 = subi %c-1, %1388 : index + %1391 = select %1389, %1390, %1388 : index + %1392 = divi_signed %1391, %c16 : index + %1393 = subi %c-1, %1392 : index + %1394 = select %1389, %1393, %1392 : index + %1395 = muli %1394, %c-2 : index + %1396 = addi %1387, %1395 : index + %1397 = cmpi "slt", %arg5, %c0 : index + %1398 = subi %c-1, %arg5 : index + %1399 = select %1397, %1398, %arg5 : index + %1400 = divi_signed %1399, %c8 : index + %1401 = subi %c-1, %1400 : index + %1402 = select %1397, %1401, %1400 : index + %1403 = addi %arg5, %c8 : index + %1404 = cmpi "slt", %1403, %c0 : index + %1405 = subi %c-1, %1403 : index + %1406 = select %1404, %1405, %1403 : index + %1407 = divi_signed %1406, %c16 : index + %1408 = subi %c-1, %1407 : index + %1409 = select %1404, %1408, %1407 : index + %1410 = muli %1409, %c-2 : index + %1411 = addi %1402, %1410 : index + %1412 = addi %1411, %c1 : index + %1413 = cmpi "slt", %1412, %c0 : index + %1414 = subi %c-1, %1412 : index + %1415 = select %1413, %1414, %1412 : index + %1416 = divi_signed %1415, %c2 : index + %1417 = subi %c-1, %1416 : index + %1418 = select %1413, %1417, %1416 : index + %1419 = muli %1418, %c-2 : index + %1420 = addi %1396, %1419 : index + %1421 = addi %1420, %c1 : index + %1422 = load %3[%1376, %1381, %1421] : memref<16x128x2xvector<8xf32>> + %1423 = vector.extractelement %1422[%c1_i64 : i64] : vector<8xf32> + %1424 = addi %arg5, %c8 : index + %1425 = cmpi "slt", %1424, %c0 : index + %1426 = subi 
%c-1, %1424 : index + %1427 = select %1425, %1426, %1424 : index + %1428 = divi_signed %1427, %c16 : index + %1429 = subi %c-1, %1428 : index + %1430 = select %1425, %1429, %1428 : index + %1431 = remi_signed %1430, %c16 : index + %1432 = cmpi "slt", %1431, %c0 : index + %1433 = addi %1431, %c16 : index + %1434 = select %1432, %1433, %1431 : index + %1435 = addi %arg6, %arg7 : index + %1436 = remi_signed %1435, %c128 : index + %1437 = cmpi "slt", %1436, %c0 : index + %1438 = addi %1436, %c128 : index + %1439 = select %1437, %1438, %1436 : index + %1440 = cmpi "slt", %arg5, %c0 : index + %1441 = subi %c-1, %arg5 : index + %1442 = select %1440, %1441, %arg5 : index + %1443 = divi_signed %1442, %c8 : index + %1444 = subi %c-1, %1443 : index + %1445 = select %1440, %1444, %1443 : index + %1446 = addi %arg5, %c8 : index + %1447 = cmpi "slt", %1446, %c0 : index + %1448 = subi %c-1, %1446 : index + %1449 = select %1447, %1448, %1446 : index + %1450 = divi_signed %1449, %c16 : index + %1451 = subi %c-1, %1450 : index + %1452 = select %1447, %1451, %1450 : index + %1453 = muli %1452, %c-2 : index + %1454 = addi %1445, %1453 : index + %1455 = cmpi "slt", %arg5, %c0 : index + %1456 = subi %c-1, %arg5 : index + %1457 = select %1455, %1456, %arg5 : index + %1458 = divi_signed %1457, %c8 : index + %1459 = subi %c-1, %1458 : index + %1460 = select %1455, %1459, %1458 : index + %1461 = addi %arg5, %c8 : index + %1462 = cmpi "slt", %1461, %c0 : index + %1463 = subi %c-1, %1461 : index + %1464 = select %1462, %1463, %1461 : index + %1465 = divi_signed %1464, %c16 : index + %1466 = subi %c-1, %1465 : index + %1467 = select %1462, %1466, %1465 : index + %1468 = muli %1467, %c-2 : index + %1469 = addi %1460, %1468 : index + %1470 = addi %1469, %c1 : index + %1471 = cmpi "slt", %1470, %c0 : index + %1472 = subi %c-1, %1470 : index + %1473 = select %1471, %1472, %1470 : index + %1474 = divi_signed %1473, %c2 : index + %1475 = subi %c-1, %1474 : index + %1476 = select %1471, %1475, %1474 : index + %1477 = muli %1476, %c-2 : index + %1478 = addi %1454, %1477 : index + %1479 = addi %1478, %c1 : index + %1480 = load %3[%1434, %1439, %1479] : memref<16x128x2xvector<8xf32>> + %1481 = vector.extractelement %1480[%c2_i64 : i64] : vector<8xf32> + %1482 = addi %arg5, %c8 : index + %1483 = cmpi "slt", %1482, %c0 : index + %1484 = subi %c-1, %1482 : index + %1485 = select %1483, %1484, %1482 : index + %1486 = divi_signed %1485, %c16 : index + %1487 = subi %c-1, %1486 : index + %1488 = select %1483, %1487, %1486 : index + %1489 = remi_signed %1488, %c16 : index + %1490 = cmpi "slt", %1489, %c0 : index + %1491 = addi %1489, %c16 : index + %1492 = select %1490, %1491, %1489 : index + %1493 = addi %arg6, %arg7 : index + %1494 = remi_signed %1493, %c128 : index + %1495 = cmpi "slt", %1494, %c0 : index + %1496 = addi %1494, %c128 : index + %1497 = select %1495, %1496, %1494 : index + %1498 = cmpi "slt", %arg5, %c0 : index + %1499 = subi %c-1, %arg5 : index + %1500 = select %1498, %1499, %arg5 : index + %1501 = divi_signed %1500, %c8 : index + %1502 = subi %c-1, %1501 : index + %1503 = select %1498, %1502, %1501 : index + %1504 = addi %arg5, %c8 : index + %1505 = cmpi "slt", %1504, %c0 : index + %1506 = subi %c-1, %1504 : index + %1507 = select %1505, %1506, %1504 : index + %1508 = divi_signed %1507, %c16 : index + %1509 = subi %c-1, %1508 : index + %1510 = select %1505, %1509, %1508 : index + %1511 = muli %1510, %c-2 : index + %1512 = addi %1503, %1511 : index + %1513 = cmpi "slt", %arg5, %c0 : index + %1514 = subi %c-1, %arg5 : 
index + %1515 = select %1513, %1514, %arg5 : index + %1516 = divi_signed %1515, %c8 : index + %1517 = subi %c-1, %1516 : index + %1518 = select %1513, %1517, %1516 : index + %1519 = addi %arg5, %c8 : index + %1520 = cmpi "slt", %1519, %c0 : index + %1521 = subi %c-1, %1519 : index + %1522 = select %1520, %1521, %1519 : index + %1523 = divi_signed %1522, %c16 : index + %1524 = subi %c-1, %1523 : index + %1525 = select %1520, %1524, %1523 : index + %1526 = muli %1525, %c-2 : index + %1527 = addi %1518, %1526 : index + %1528 = addi %1527, %c1 : index + %1529 = cmpi "slt", %1528, %c0 : index + %1530 = subi %c-1, %1528 : index + %1531 = select %1529, %1530, %1528 : index + %1532 = divi_signed %1531, %c2 : index + %1533 = subi %c-1, %1532 : index + %1534 = select %1529, %1533, %1532 : index + %1535 = muli %1534, %c-2 : index + %1536 = addi %1512, %1535 : index + %1537 = addi %1536, %c1 : index + %1538 = load %3[%1492, %1497, %1537] : memref<16x128x2xvector<8xf32>> + %1539 = vector.extractelement %1538[%c3_i64 : i64] : vector<8xf32> + %1540 = addi %arg5, %c8 : index + %1541 = cmpi "slt", %1540, %c0 : index + %1542 = subi %c-1, %1540 : index + %1543 = select %1541, %1542, %1540 : index + %1544 = divi_signed %1543, %c16 : index + %1545 = subi %c-1, %1544 : index + %1546 = select %1541, %1545, %1544 : index + %1547 = remi_signed %1546, %c16 : index + %1548 = cmpi "slt", %1547, %c0 : index + %1549 = addi %1547, %c16 : index + %1550 = select %1548, %1549, %1547 : index + %1551 = addi %arg6, %arg7 : index + %1552 = remi_signed %1551, %c128 : index + %1553 = cmpi "slt", %1552, %c0 : index + %1554 = addi %1552, %c128 : index + %1555 = select %1553, %1554, %1552 : index + %1556 = cmpi "slt", %arg5, %c0 : index + %1557 = subi %c-1, %arg5 : index + %1558 = select %1556, %1557, %arg5 : index + %1559 = divi_signed %1558, %c8 : index + %1560 = subi %c-1, %1559 : index + %1561 = select %1556, %1560, %1559 : index + %1562 = addi %arg5, %c8 : index + %1563 = cmpi "slt", %1562, %c0 : index + %1564 = subi %c-1, %1562 : index + %1565 = select %1563, %1564, %1562 : index + %1566 = divi_signed %1565, %c16 : index + %1567 = subi %c-1, %1566 : index + %1568 = select %1563, %1567, %1566 : index + %1569 = muli %1568, %c-2 : index + %1570 = addi %1561, %1569 : index + %1571 = cmpi "slt", %arg5, %c0 : index + %1572 = subi %c-1, %arg5 : index + %1573 = select %1571, %1572, %arg5 : index + %1574 = divi_signed %1573, %c8 : index + %1575 = subi %c-1, %1574 : index + %1576 = select %1571, %1575, %1574 : index + %1577 = addi %arg5, %c8 : index + %1578 = cmpi "slt", %1577, %c0 : index + %1579 = subi %c-1, %1577 : index + %1580 = select %1578, %1579, %1577 : index + %1581 = divi_signed %1580, %c16 : index + %1582 = subi %c-1, %1581 : index + %1583 = select %1578, %1582, %1581 : index + %1584 = muli %1583, %c-2 : index + %1585 = addi %1576, %1584 : index + %1586 = addi %1585, %c1 : index + %1587 = cmpi "slt", %1586, %c0 : index + %1588 = subi %c-1, %1586 : index + %1589 = select %1587, %1588, %1586 : index + %1590 = divi_signed %1589, %c2 : index + %1591 = subi %c-1, %1590 : index + %1592 = select %1587, %1591, %1590 : index + %1593 = muli %1592, %c-2 : index + %1594 = addi %1570, %1593 : index + %1595 = addi %1594, %c1 : index + %1596 = load %3[%1550, %1555, %1595] : memref<16x128x2xvector<8xf32>> + %1597 = vector.extractelement %1596[%c4_i64 : i64] : vector<8xf32> + %1598 = addi %arg5, %c8 : index + %1599 = cmpi "slt", %1598, %c0 : index + %1600 = subi %c-1, %1598 : index + %1601 = select %1599, %1600, %1598 : index + %1602 = 
divi_signed %1601, %c16 : index + %1603 = subi %c-1, %1602 : index + %1604 = select %1599, %1603, %1602 : index + %1605 = remi_signed %1604, %c16 : index + %1606 = cmpi "slt", %1605, %c0 : index + %1607 = addi %1605, %c16 : index + %1608 = select %1606, %1607, %1605 : index + %1609 = addi %arg6, %arg7 : index + %1610 = remi_signed %1609, %c128 : index + %1611 = cmpi "slt", %1610, %c0 : index + %1612 = addi %1610, %c128 : index + %1613 = select %1611, %1612, %1610 : index + %1614 = cmpi "slt", %arg5, %c0 : index + %1615 = subi %c-1, %arg5 : index + %1616 = select %1614, %1615, %arg5 : index + %1617 = divi_signed %1616, %c8 : index + %1618 = subi %c-1, %1617 : index + %1619 = select %1614, %1618, %1617 : index + %1620 = addi %arg5, %c8 : index + %1621 = cmpi "slt", %1620, %c0 : index + %1622 = subi %c-1, %1620 : index + %1623 = select %1621, %1622, %1620 : index + %1624 = divi_signed %1623, %c16 : index + %1625 = subi %c-1, %1624 : index + %1626 = select %1621, %1625, %1624 : index + %1627 = muli %1626, %c-2 : index + %1628 = addi %1619, %1627 : index + %1629 = cmpi "slt", %arg5, %c0 : index + %1630 = subi %c-1, %arg5 : index + %1631 = select %1629, %1630, %arg5 : index + %1632 = divi_signed %1631, %c8 : index + %1633 = subi %c-1, %1632 : index + %1634 = select %1629, %1633, %1632 : index + %1635 = addi %arg5, %c8 : index + %1636 = cmpi "slt", %1635, %c0 : index + %1637 = subi %c-1, %1635 : index + %1638 = select %1636, %1637, %1635 : index + %1639 = divi_signed %1638, %c16 : index + %1640 = subi %c-1, %1639 : index + %1641 = select %1636, %1640, %1639 : index + %1642 = muli %1641, %c-2 : index + %1643 = addi %1634, %1642 : index + %1644 = addi %1643, %c1 : index + %1645 = cmpi "slt", %1644, %c0 : index + %1646 = subi %c-1, %1644 : index + %1647 = select %1645, %1646, %1644 : index + %1648 = divi_signed %1647, %c2 : index + %1649 = subi %c-1, %1648 : index + %1650 = select %1645, %1649, %1648 : index + %1651 = muli %1650, %c-2 : index + %1652 = addi %1628, %1651 : index + %1653 = addi %1652, %c1 : index + %1654 = load %3[%1608, %1613, %1653] : memref<16x128x2xvector<8xf32>> + %1655 = vector.extractelement %1654[%c5_i64 : i64] : vector<8xf32> + %1656 = addi %arg5, %c8 : index + %1657 = cmpi "slt", %1656, %c0 : index + %1658 = subi %c-1, %1656 : index + %1659 = select %1657, %1658, %1656 : index + %1660 = divi_signed %1659, %c16 : index + %1661 = subi %c-1, %1660 : index + %1662 = select %1657, %1661, %1660 : index + %1663 = remi_signed %1662, %c16 : index + %1664 = cmpi "slt", %1663, %c0 : index + %1665 = addi %1663, %c16 : index + %1666 = select %1664, %1665, %1663 : index + %1667 = addi %arg6, %arg7 : index + %1668 = remi_signed %1667, %c128 : index + %1669 = cmpi "slt", %1668, %c0 : index + %1670 = addi %1668, %c128 : index + %1671 = select %1669, %1670, %1668 : index + %1672 = cmpi "slt", %arg5, %c0 : index + %1673 = subi %c-1, %arg5 : index + %1674 = select %1672, %1673, %arg5 : index + %1675 = divi_signed %1674, %c8 : index + %1676 = subi %c-1, %1675 : index + %1677 = select %1672, %1676, %1675 : index + %1678 = addi %arg5, %c8 : index + %1679 = cmpi "slt", %1678, %c0 : index + %1680 = subi %c-1, %1678 : index + %1681 = select %1679, %1680, %1678 : index + %1682 = divi_signed %1681, %c16 : index + %1683 = subi %c-1, %1682 : index + %1684 = select %1679, %1683, %1682 : index + %1685 = muli %1684, %c-2 : index + %1686 = addi %1677, %1685 : index + %1687 = cmpi "slt", %arg5, %c0 : index + %1688 = subi %c-1, %arg5 : index + %1689 = select %1687, %1688, %arg5 : index + %1690 = divi_signed 
%1689, %c8 : index + %1691 = subi %c-1, %1690 : index + %1692 = select %1687, %1691, %1690 : index + %1693 = addi %arg5, %c8 : index + %1694 = cmpi "slt", %1693, %c0 : index + %1695 = subi %c-1, %1693 : index + %1696 = select %1694, %1695, %1693 : index + %1697 = divi_signed %1696, %c16 : index + %1698 = subi %c-1, %1697 : index + %1699 = select %1694, %1698, %1697 : index + %1700 = muli %1699, %c-2 : index + %1701 = addi %1692, %1700 : index + %1702 = addi %1701, %c1 : index + %1703 = cmpi "slt", %1702, %c0 : index + %1704 = subi %c-1, %1702 : index + %1705 = select %1703, %1704, %1702 : index + %1706 = divi_signed %1705, %c2 : index + %1707 = subi %c-1, %1706 : index + %1708 = select %1703, %1707, %1706 : index + %1709 = muli %1708, %c-2 : index + %1710 = addi %1686, %1709 : index + %1711 = addi %1710, %c1 : index + %1712 = load %3[%1666, %1671, %1711] : memref<16x128x2xvector<8xf32>> + %1713 = vector.extractelement %1712[%c6_i64 : i64] : vector<8xf32> + %1714 = addi %arg5, %c8 : index + %1715 = cmpi "slt", %1714, %c0 : index + %1716 = subi %c-1, %1714 : index + %1717 = select %1715, %1716, %1714 : index + %1718 = divi_signed %1717, %c16 : index + %1719 = subi %c-1, %1718 : index + %1720 = select %1715, %1719, %1718 : index + %1721 = remi_signed %1720, %c16 : index + %1722 = cmpi "slt", %1721, %c0 : index + %1723 = addi %1721, %c16 : index + %1724 = select %1722, %1723, %1721 : index + %1725 = addi %arg6, %arg7 : index + %1726 = remi_signed %1725, %c128 : index + %1727 = cmpi "slt", %1726, %c0 : index + %1728 = addi %1726, %c128 : index + %1729 = select %1727, %1728, %1726 : index + %1730 = cmpi "slt", %arg5, %c0 : index + %1731 = subi %c-1, %arg5 : index + %1732 = select %1730, %1731, %arg5 : index + %1733 = divi_signed %1732, %c8 : index + %1734 = subi %c-1, %1733 : index + %1735 = select %1730, %1734, %1733 : index + %1736 = addi %arg5, %c8 : index + %1737 = cmpi "slt", %1736, %c0 : index + %1738 = subi %c-1, %1736 : index + %1739 = select %1737, %1738, %1736 : index + %1740 = divi_signed %1739, %c16 : index + %1741 = subi %c-1, %1740 : index + %1742 = select %1737, %1741, %1740 : index + %1743 = muli %1742, %c-2 : index + %1744 = addi %1735, %1743 : index + %1745 = cmpi "slt", %arg5, %c0 : index + %1746 = subi %c-1, %arg5 : index + %1747 = select %1745, %1746, %arg5 : index + %1748 = divi_signed %1747, %c8 : index + %1749 = subi %c-1, %1748 : index + %1750 = select %1745, %1749, %1748 : index + %1751 = addi %arg5, %c8 : index + %1752 = cmpi "slt", %1751, %c0 : index + %1753 = subi %c-1, %1751 : index + %1754 = select %1752, %1753, %1751 : index + %1755 = divi_signed %1754, %c16 : index + %1756 = subi %c-1, %1755 : index + %1757 = select %1752, %1756, %1755 : index + %1758 = muli %1757, %c-2 : index + %1759 = addi %1750, %1758 : index + %1760 = addi %1759, %c1 : index + %1761 = cmpi "slt", %1760, %c0 : index + %1762 = subi %c-1, %1760 : index + %1763 = select %1761, %1762, %1760 : index + %1764 = divi_signed %1763, %c2 : index + %1765 = subi %c-1, %1764 : index + %1766 = select %1761, %1765, %1764 : index + %1767 = muli %1766, %c-2 : index + %1768 = addi %1744, %1767 : index + %1769 = addi %1768, %c1 : index + %1770 = load %3[%1724, %1729, %1769] : memref<16x128x2xvector<8xf32>> + %1771 = vector.extractelement %1770[%c7_i64 : i64] : vector<8xf32> + %1772 = mulf %1300, %1365 {RelaxedPrecision} : f32 + %1773 = mulf %1301, %1423 {RelaxedPrecision} : f32 + %1774 = mulf %1302, %1481 {RelaxedPrecision} : f32 + %1775 = mulf %1303, %1539 {RelaxedPrecision} : f32 + %1776 = mulf %1304, %1597 
{RelaxedPrecision} : f32 + %1777 = mulf %1305, %1655 {RelaxedPrecision} : f32 + %1778 = mulf %1306, %1713 {RelaxedPrecision} : f32 + %1779 = mulf %1307, %1771 {RelaxedPrecision} : f32 + %1780 = addi %arg5, %c8 : index + %1781 = cmpi "slt", %1780, %c0 : index + %1782 = subi %c-1, %1780 : index + %1783 = select %1781, %1782, %1780 : index + %1784 = divi_signed %1783, %c16 : index + %1785 = subi %c-1, %1784 : index + %1786 = select %1781, %1785, %1784 : index + %1787 = remi_signed %1786, %c16 : index + %1788 = cmpi "slt", %1787, %c0 : index + %1789 = addi %1787, %c16 : index + %1790 = select %1788, %1789, %1787 : index + %1791 = cmpi "slt", %arg5, %c0 : index + %1792 = subi %c-1, %arg5 : index + %1793 = select %1791, %1792, %arg5 : index + %1794 = divi_signed %1793, %c8 : index + %1795 = subi %c-1, %1794 : index + %1796 = select %1791, %1795, %1794 : index + %1797 = addi %arg5, %c8 : index + %1798 = cmpi "slt", %1797, %c0 : index + %1799 = subi %c-1, %1797 : index + %1800 = select %1798, %1799, %1797 : index + %1801 = divi_signed %1800, %c16 : index + %1802 = subi %c-1, %1801 : index + %1803 = select %1798, %1802, %1801 : index + %1804 = muli %1803, %c-2 : index + %1805 = addi %1796, %1804 : index + %1806 = cmpi "slt", %arg5, %c0 : index + %1807 = subi %c-1, %arg5 : index + %1808 = select %1806, %1807, %arg5 : index + %1809 = divi_signed %1808, %c8 : index + %1810 = subi %c-1, %1809 : index + %1811 = select %1806, %1810, %1809 : index + %1812 = addi %arg5, %c8 : index + %1813 = cmpi "slt", %1812, %c0 : index + %1814 = subi %c-1, %1812 : index + %1815 = select %1813, %1814, %1812 : index + %1816 = divi_signed %1815, %c16 : index + %1817 = subi %c-1, %1816 : index + %1818 = select %1813, %1817, %1816 : index + %1819 = muli %1818, %c-2 : index + %1820 = addi %1811, %1819 : index + %1821 = addi %1820, %c1 : index + %1822 = cmpi "slt", %1821, %c0 : index + %1823 = subi %c-1, %1821 : index + %1824 = select %1822, %1823, %1821 : index + %1825 = divi_signed %1824, %c2 : index + %1826 = subi %c-1, %1825 : index + %1827 = select %1822, %1826, %1825 : index + %1828 = muli %1827, %c-2 : index + %1829 = addi %1805, %1828 : index + %1830 = addi %1829, %c1 : index + %1831 = load %2[%1790, %c0, %1830] : memref<16x6x2xvector<8xf32>> + %1832 = vector.extractelement %1831[%c0_i64 : i64] : vector<8xf32> + %1833 = addi %arg5, %c8 : index + %1834 = cmpi "slt", %1833, %c0 : index + %1835 = subi %c-1, %1833 : index + %1836 = select %1834, %1835, %1833 : index + %1837 = divi_signed %1836, %c16 : index + %1838 = subi %c-1, %1837 : index + %1839 = select %1834, %1838, %1837 : index + %1840 = remi_signed %1839, %c16 : index + %1841 = cmpi "slt", %1840, %c0 : index + %1842 = addi %1840, %c16 : index + %1843 = select %1841, %1842, %1840 : index + %1844 = cmpi "slt", %arg5, %c0 : index + %1845 = subi %c-1, %arg5 : index + %1846 = select %1844, %1845, %arg5 : index + %1847 = divi_signed %1846, %c8 : index + %1848 = subi %c-1, %1847 : index + %1849 = select %1844, %1848, %1847 : index + %1850 = addi %arg5, %c8 : index + %1851 = cmpi "slt", %1850, %c0 : index + %1852 = subi %c-1, %1850 : index + %1853 = select %1851, %1852, %1850 : index + %1854 = divi_signed %1853, %c16 : index + %1855 = subi %c-1, %1854 : index + %1856 = select %1851, %1855, %1854 : index + %1857 = muli %1856, %c-2 : index + %1858 = addi %1849, %1857 : index + %1859 = cmpi "slt", %arg5, %c0 : index + %1860 = subi %c-1, %arg5 : index + %1861 = select %1859, %1860, %arg5 : index + %1862 = divi_signed %1861, %c8 : index + %1863 = subi %c-1, %1862 : index + 
%1864 = select %1859, %1863, %1862 : index + %1865 = addi %arg5, %c8 : index + %1866 = cmpi "slt", %1865, %c0 : index + %1867 = subi %c-1, %1865 : index + %1868 = select %1866, %1867, %1865 : index + %1869 = divi_signed %1868, %c16 : index + %1870 = subi %c-1, %1869 : index + %1871 = select %1866, %1870, %1869 : index + %1872 = muli %1871, %c-2 : index + %1873 = addi %1864, %1872 : index + %1874 = addi %1873, %c1 : index + %1875 = cmpi "slt", %1874, %c0 : index + %1876 = subi %c-1, %1874 : index + %1877 = select %1875, %1876, %1874 : index + %1878 = divi_signed %1877, %c2 : index + %1879 = subi %c-1, %1878 : index + %1880 = select %1875, %1879, %1878 : index + %1881 = muli %1880, %c-2 : index + %1882 = addi %1858, %1881 : index + %1883 = addi %1882, %c1 : index + %1884 = load %2[%1843, %c0, %1883] : memref<16x6x2xvector<8xf32>> + %1885 = vector.extractelement %1884[%c1_i64 : i64] : vector<8xf32> + %1886 = addi %arg5, %c8 : index + %1887 = cmpi "slt", %1886, %c0 : index + %1888 = subi %c-1, %1886 : index + %1889 = select %1887, %1888, %1886 : index + %1890 = divi_signed %1889, %c16 : index + %1891 = subi %c-1, %1890 : index + %1892 = select %1887, %1891, %1890 : index + %1893 = remi_signed %1892, %c16 : index + %1894 = cmpi "slt", %1893, %c0 : index + %1895 = addi %1893, %c16 : index + %1896 = select %1894, %1895, %1893 : index + %1897 = cmpi "slt", %arg5, %c0 : index + %1898 = subi %c-1, %arg5 : index + %1899 = select %1897, %1898, %arg5 : index + %1900 = divi_signed %1899, %c8 : index + %1901 = subi %c-1, %1900 : index + %1902 = select %1897, %1901, %1900 : index + %1903 = addi %arg5, %c8 : index + %1904 = cmpi "slt", %1903, %c0 : index + %1905 = subi %c-1, %1903 : index + %1906 = select %1904, %1905, %1903 : index + %1907 = divi_signed %1906, %c16 : index + %1908 = subi %c-1, %1907 : index + %1909 = select %1904, %1908, %1907 : index + %1910 = muli %1909, %c-2 : index + %1911 = addi %1902, %1910 : index + %1912 = cmpi "slt", %arg5, %c0 : index + %1913 = subi %c-1, %arg5 : index + %1914 = select %1912, %1913, %arg5 : index + %1915 = divi_signed %1914, %c8 : index + %1916 = subi %c-1, %1915 : index + %1917 = select %1912, %1916, %1915 : index + %1918 = addi %arg5, %c8 : index + %1919 = cmpi "slt", %1918, %c0 : index + %1920 = subi %c-1, %1918 : index + %1921 = select %1919, %1920, %1918 : index + %1922 = divi_signed %1921, %c16 : index + %1923 = subi %c-1, %1922 : index + %1924 = select %1919, %1923, %1922 : index + %1925 = muli %1924, %c-2 : index + %1926 = addi %1917, %1925 : index + %1927 = addi %1926, %c1 : index + %1928 = cmpi "slt", %1927, %c0 : index + %1929 = subi %c-1, %1927 : index + %1930 = select %1928, %1929, %1927 : index + %1931 = divi_signed %1930, %c2 : index + %1932 = subi %c-1, %1931 : index + %1933 = select %1928, %1932, %1931 : index + %1934 = muli %1933, %c-2 : index + %1935 = addi %1911, %1934 : index + %1936 = addi %1935, %c1 : index + %1937 = load %2[%1896, %c0, %1936] : memref<16x6x2xvector<8xf32>> + %1938 = vector.extractelement %1937[%c2_i64 : i64] : vector<8xf32> + %1939 = addi %arg5, %c8 : index + %1940 = cmpi "slt", %1939, %c0 : index + %1941 = subi %c-1, %1939 : index + %1942 = select %1940, %1941, %1939 : index + %1943 = divi_signed %1942, %c16 : index + %1944 = subi %c-1, %1943 : index + %1945 = select %1940, %1944, %1943 : index + %1946 = remi_signed %1945, %c16 : index + %1947 = cmpi "slt", %1946, %c0 : index + %1948 = addi %1946, %c16 : index + %1949 = select %1947, %1948, %1946 : index + %1950 = cmpi "slt", %arg5, %c0 : index + %1951 = subi %c-1, %arg5 
: index + %1952 = select %1950, %1951, %arg5 : index + %1953 = divi_signed %1952, %c8 : index + %1954 = subi %c-1, %1953 : index + %1955 = select %1950, %1954, %1953 : index + %1956 = addi %arg5, %c8 : index + %1957 = cmpi "slt", %1956, %c0 : index + %1958 = subi %c-1, %1956 : index + %1959 = select %1957, %1958, %1956 : index + %1960 = divi_signed %1959, %c16 : index + %1961 = subi %c-1, %1960 : index + %1962 = select %1957, %1961, %1960 : index + %1963 = muli %1962, %c-2 : index + %1964 = addi %1955, %1963 : index + %1965 = cmpi "slt", %arg5, %c0 : index + %1966 = subi %c-1, %arg5 : index + %1967 = select %1965, %1966, %arg5 : index + %1968 = divi_signed %1967, %c8 : index + %1969 = subi %c-1, %1968 : index + %1970 = select %1965, %1969, %1968 : index + %1971 = addi %arg5, %c8 : index + %1972 = cmpi "slt", %1971, %c0 : index + %1973 = subi %c-1, %1971 : index + %1974 = select %1972, %1973, %1971 : index + %1975 = divi_signed %1974, %c16 : index + %1976 = subi %c-1, %1975 : index + %1977 = select %1972, %1976, %1975 : index + %1978 = muli %1977, %c-2 : index + %1979 = addi %1970, %1978 : index + %1980 = addi %1979, %c1 : index + %1981 = cmpi "slt", %1980, %c0 : index + %1982 = subi %c-1, %1980 : index + %1983 = select %1981, %1982, %1980 : index + %1984 = divi_signed %1983, %c2 : index + %1985 = subi %c-1, %1984 : index + %1986 = select %1981, %1985, %1984 : index + %1987 = muli %1986, %c-2 : index + %1988 = addi %1964, %1987 : index + %1989 = addi %1988, %c1 : index + %1990 = load %2[%1949, %c0, %1989] : memref<16x6x2xvector<8xf32>> + %1991 = vector.extractelement %1990[%c3_i64 : i64] : vector<8xf32> + %1992 = addi %arg5, %c8 : index + %1993 = cmpi "slt", %1992, %c0 : index + %1994 = subi %c-1, %1992 : index + %1995 = select %1993, %1994, %1992 : index + %1996 = divi_signed %1995, %c16 : index + %1997 = subi %c-1, %1996 : index + %1998 = select %1993, %1997, %1996 : index + %1999 = remi_signed %1998, %c16 : index + %2000 = cmpi "slt", %1999, %c0 : index + %2001 = addi %1999, %c16 : index + %2002 = select %2000, %2001, %1999 : index + %2003 = cmpi "slt", %arg5, %c0 : index + %2004 = subi %c-1, %arg5 : index + %2005 = select %2003, %2004, %arg5 : index + %2006 = divi_signed %2005, %c8 : index + %2007 = subi %c-1, %2006 : index + %2008 = select %2003, %2007, %2006 : index + %2009 = addi %arg5, %c8 : index + %2010 = cmpi "slt", %2009, %c0 : index + %2011 = subi %c-1, %2009 : index + %2012 = select %2010, %2011, %2009 : index + %2013 = divi_signed %2012, %c16 : index + %2014 = subi %c-1, %2013 : index + %2015 = select %2010, %2014, %2013 : index + %2016 = muli %2015, %c-2 : index + %2017 = addi %2008, %2016 : index + %2018 = cmpi "slt", %arg5, %c0 : index + %2019 = subi %c-1, %arg5 : index + %2020 = select %2018, %2019, %arg5 : index + %2021 = divi_signed %2020, %c8 : index + %2022 = subi %c-1, %2021 : index + %2023 = select %2018, %2022, %2021 : index + %2024 = addi %arg5, %c8 : index + %2025 = cmpi "slt", %2024, %c0 : index + %2026 = subi %c-1, %2024 : index + %2027 = select %2025, %2026, %2024 : index + %2028 = divi_signed %2027, %c16 : index + %2029 = subi %c-1, %2028 : index + %2030 = select %2025, %2029, %2028 : index + %2031 = muli %2030, %c-2 : index + %2032 = addi %2023, %2031 : index + %2033 = addi %2032, %c1 : index + %2034 = cmpi "slt", %2033, %c0 : index + %2035 = subi %c-1, %2033 : index + %2036 = select %2034, %2035, %2033 : index + %2037 = divi_signed %2036, %c2 : index + %2038 = subi %c-1, %2037 : index + %2039 = select %2034, %2038, %2037 : index + %2040 = muli %2039, %c-2 : 
index + %2041 = addi %2017, %2040 : index + %2042 = addi %2041, %c1 : index + %2043 = load %2[%2002, %c0, %2042] : memref<16x6x2xvector<8xf32>> + %2044 = vector.extractelement %2043[%c4_i64 : i64] : vector<8xf32> + %2045 = addi %arg5, %c8 : index + %2046 = cmpi "slt", %2045, %c0 : index + %2047 = subi %c-1, %2045 : index + %2048 = select %2046, %2047, %2045 : index + %2049 = divi_signed %2048, %c16 : index + %2050 = subi %c-1, %2049 : index + %2051 = select %2046, %2050, %2049 : index + %2052 = remi_signed %2051, %c16 : index + %2053 = cmpi "slt", %2052, %c0 : index + %2054 = addi %2052, %c16 : index + %2055 = select %2053, %2054, %2052 : index + %2056 = cmpi "slt", %arg5, %c0 : index + %2057 = subi %c-1, %arg5 : index + %2058 = select %2056, %2057, %arg5 : index + %2059 = divi_signed %2058, %c8 : index + %2060 = subi %c-1, %2059 : index + %2061 = select %2056, %2060, %2059 : index + %2062 = addi %arg5, %c8 : index + %2063 = cmpi "slt", %2062, %c0 : index + %2064 = subi %c-1, %2062 : index + %2065 = select %2063, %2064, %2062 : index + %2066 = divi_signed %2065, %c16 : index + %2067 = subi %c-1, %2066 : index + %2068 = select %2063, %2067, %2066 : index + %2069 = muli %2068, %c-2 : index + %2070 = addi %2061, %2069 : index + %2071 = cmpi "slt", %arg5, %c0 : index + %2072 = subi %c-1, %arg5 : index + %2073 = select %2071, %2072, %arg5 : index + %2074 = divi_signed %2073, %c8 : index + %2075 = subi %c-1, %2074 : index + %2076 = select %2071, %2075, %2074 : index + %2077 = addi %arg5, %c8 : index + %2078 = cmpi "slt", %2077, %c0 : index + %2079 = subi %c-1, %2077 : index + %2080 = select %2078, %2079, %2077 : index + %2081 = divi_signed %2080, %c16 : index + %2082 = subi %c-1, %2081 : index + %2083 = select %2078, %2082, %2081 : index + %2084 = muli %2083, %c-2 : index + %2085 = addi %2076, %2084 : index + %2086 = addi %2085, %c1 : index + %2087 = cmpi "slt", %2086, %c0 : index + %2088 = subi %c-1, %2086 : index + %2089 = select %2087, %2088, %2086 : index + %2090 = divi_signed %2089, %c2 : index + %2091 = subi %c-1, %2090 : index + %2092 = select %2087, %2091, %2090 : index + %2093 = muli %2092, %c-2 : index + %2094 = addi %2070, %2093 : index + %2095 = addi %2094, %c1 : index + %2096 = load %2[%2055, %c0, %2095] : memref<16x6x2xvector<8xf32>> + %2097 = vector.extractelement %2096[%c5_i64 : i64] : vector<8xf32> + %2098 = addi %arg5, %c8 : index + %2099 = cmpi "slt", %2098, %c0 : index + %2100 = subi %c-1, %2098 : index + %2101 = select %2099, %2100, %2098 : index + %2102 = divi_signed %2101, %c16 : index + %2103 = subi %c-1, %2102 : index + %2104 = select %2099, %2103, %2102 : index + %2105 = remi_signed %2104, %c16 : index + %2106 = cmpi "slt", %2105, %c0 : index + %2107 = addi %2105, %c16 : index + %2108 = select %2106, %2107, %2105 : index + %2109 = cmpi "slt", %arg5, %c0 : index + %2110 = subi %c-1, %arg5 : index + %2111 = select %2109, %2110, %arg5 : index + %2112 = divi_signed %2111, %c8 : index + %2113 = subi %c-1, %2112 : index + %2114 = select %2109, %2113, %2112 : index + %2115 = addi %arg5, %c8 : index + %2116 = cmpi "slt", %2115, %c0 : index + %2117 = subi %c-1, %2115 : index + %2118 = select %2116, %2117, %2115 : index + %2119 = divi_signed %2118, %c16 : index + %2120 = subi %c-1, %2119 : index + %2121 = select %2116, %2120, %2119 : index + %2122 = muli %2121, %c-2 : index + %2123 = addi %2114, %2122 : index + %2124 = cmpi "slt", %arg5, %c0 : index + %2125 = subi %c-1, %arg5 : index + %2126 = select %2124, %2125, %arg5 : index + %2127 = divi_signed %2126, %c8 : index + %2128 = 
subi %c-1, %2127 : index + %2129 = select %2124, %2128, %2127 : index + %2130 = addi %arg5, %c8 : index + %2131 = cmpi "slt", %2130, %c0 : index + %2132 = subi %c-1, %2130 : index + %2133 = select %2131, %2132, %2130 : index + %2134 = divi_signed %2133, %c16 : index + %2135 = subi %c-1, %2134 : index + %2136 = select %2131, %2135, %2134 : index + %2137 = muli %2136, %c-2 : index + %2138 = addi %2129, %2137 : index + %2139 = addi %2138, %c1 : index + %2140 = cmpi "slt", %2139, %c0 : index + %2141 = subi %c-1, %2139 : index + %2142 = select %2140, %2141, %2139 : index + %2143 = divi_signed %2142, %c2 : index + %2144 = subi %c-1, %2143 : index + %2145 = select %2140, %2144, %2143 : index + %2146 = muli %2145, %c-2 : index + %2147 = addi %2123, %2146 : index + %2148 = addi %2147, %c1 : index + %2149 = load %2[%2108, %c0, %2148] : memref<16x6x2xvector<8xf32>> + %2150 = vector.extractelement %2149[%c6_i64 : i64] : vector<8xf32> + %2151 = addi %arg5, %c8 : index + %2152 = cmpi "slt", %2151, %c0 : index + %2153 = subi %c-1, %2151 : index + %2154 = select %2152, %2153, %2151 : index + %2155 = divi_signed %2154, %c16 : index + %2156 = subi %c-1, %2155 : index + %2157 = select %2152, %2156, %2155 : index + %2158 = remi_signed %2157, %c16 : index + %2159 = cmpi "slt", %2158, %c0 : index + %2160 = addi %2158, %c16 : index + %2161 = select %2159, %2160, %2158 : index + %2162 = cmpi "slt", %arg5, %c0 : index + %2163 = subi %c-1, %arg5 : index + %2164 = select %2162, %2163, %arg5 : index + %2165 = divi_signed %2164, %c8 : index + %2166 = subi %c-1, %2165 : index + %2167 = select %2162, %2166, %2165 : index + %2168 = addi %arg5, %c8 : index + %2169 = cmpi "slt", %2168, %c0 : index + %2170 = subi %c-1, %2168 : index + %2171 = select %2169, %2170, %2168 : index + %2172 = divi_signed %2171, %c16 : index + %2173 = subi %c-1, %2172 : index + %2174 = select %2169, %2173, %2172 : index + %2175 = muli %2174, %c-2 : index + %2176 = addi %2167, %2175 : index + %2177 = cmpi "slt", %arg5, %c0 : index + %2178 = subi %c-1, %arg5 : index + %2179 = select %2177, %2178, %arg5 : index + %2180 = divi_signed %2179, %c8 : index + %2181 = subi %c-1, %2180 : index + %2182 = select %2177, %2181, %2180 : index + %2183 = addi %arg5, %c8 : index + %2184 = cmpi "slt", %2183, %c0 : index + %2185 = subi %c-1, %2183 : index + %2186 = select %2184, %2185, %2183 : index + %2187 = divi_signed %2186, %c16 : index + %2188 = subi %c-1, %2187 : index + %2189 = select %2184, %2188, %2187 : index + %2190 = muli %2189, %c-2 : index + %2191 = addi %2182, %2190 : index + %2192 = addi %2191, %c1 : index + %2193 = cmpi "slt", %2192, %c0 : index + %2194 = subi %c-1, %2192 : index + %2195 = select %2193, %2194, %2192 : index + %2196 = divi_signed %2195, %c2 : index + %2197 = subi %c-1, %2196 : index + %2198 = select %2193, %2197, %2196 : index + %2199 = muli %2198, %c-2 : index + %2200 = addi %2176, %2199 : index + %2201 = addi %2200, %c1 : index + %2202 = load %2[%2161, %c0, %2201] : memref<16x6x2xvector<8xf32>> + %2203 = vector.extractelement %2202[%c7_i64 : i64] : vector<8xf32> + %2204 = addf %1832, %1772 {RelaxedPrecision} : f32 + %2205 = addf %1885, %1773 {RelaxedPrecision} : f32 + %2206 = addf %1938, %1774 {RelaxedPrecision} : f32 + %2207 = addf %1991, %1775 {RelaxedPrecision} : f32 + %2208 = addf %2044, %1776 {RelaxedPrecision} : f32 + %2209 = addf %2097, %1777 {RelaxedPrecision} : f32 + %2210 = addf %2150, %1778 {RelaxedPrecision} : f32 + %2211 = addf %2203, %1779 {RelaxedPrecision} : f32 + %2212 = addi %arg5, %c8 : index + %2213 = cmpi "slt", 
%2212, %c0 : index + %2214 = subi %c-1, %2212 : index + %2215 = select %2213, %2214, %2212 : index + %2216 = divi_signed %2215, %c16 : index + %2217 = subi %c-1, %2216 : index + %2218 = select %2213, %2217, %2216 : index + %2219 = remi_signed %2218, %c16 : index + %2220 = cmpi "slt", %2219, %c0 : index + %2221 = addi %2219, %c16 : index + %2222 = select %2220, %2221, %2219 : index + %2223 = cmpi "slt", %arg5, %c0 : index + %2224 = subi %c-1, %arg5 : index + %2225 = select %2223, %2224, %arg5 : index + %2226 = divi_signed %2225, %c8 : index + %2227 = subi %c-1, %2226 : index + %2228 = select %2223, %2227, %2226 : index + %2229 = addi %arg5, %c8 : index + %2230 = cmpi "slt", %2229, %c0 : index + %2231 = subi %c-1, %2229 : index + %2232 = select %2230, %2231, %2229 : index + %2233 = divi_signed %2232, %c16 : index + %2234 = subi %c-1, %2233 : index + %2235 = select %2230, %2234, %2233 : index + %2236 = muli %2235, %c-2 : index + %2237 = addi %2228, %2236 : index + %2238 = cmpi "slt", %arg5, %c0 : index + %2239 = subi %c-1, %arg5 : index + %2240 = select %2238, %2239, %arg5 : index + %2241 = divi_signed %2240, %c8 : index + %2242 = subi %c-1, %2241 : index + %2243 = select %2238, %2242, %2241 : index + %2244 = addi %arg5, %c8 : index + %2245 = cmpi "slt", %2244, %c0 : index + %2246 = subi %c-1, %2244 : index + %2247 = select %2245, %2246, %2244 : index + %2248 = divi_signed %2247, %c16 : index + %2249 = subi %c-1, %2248 : index + %2250 = select %2245, %2249, %2248 : index + %2251 = muli %2250, %c-2 : index + %2252 = addi %2243, %2251 : index + %2253 = addi %2252, %c1 : index + %2254 = cmpi "slt", %2253, %c0 : index + %2255 = subi %c-1, %2253 : index + %2256 = select %2254, %2255, %2253 : index + %2257 = divi_signed %2256, %c2 : index + %2258 = subi %c-1, %2257 : index + %2259 = select %2254, %2258, %2257 : index + %2260 = muli %2259, %c-2 : index + %2261 = addi %2237, %2260 : index + %2262 = addi %2261, %c1 : index + %2263 = load %2[%2222, %c0, %2262] : memref<16x6x2xvector<8xf32>> + %2264 = vector.insertelement %2204, %2263[%c0_i64 : i64] : vector<8xf32> + %2265 = addi %arg5, %c8 : index + %2266 = cmpi "slt", %2265, %c0 : index + %2267 = subi %c-1, %2265 : index + %2268 = select %2266, %2267, %2265 : index + %2269 = divi_signed %2268, %c16 : index + %2270 = subi %c-1, %2269 : index + %2271 = select %2266, %2270, %2269 : index + %2272 = remi_signed %2271, %c16 : index + %2273 = cmpi "slt", %2272, %c0 : index + %2274 = addi %2272, %c16 : index + %2275 = select %2273, %2274, %2272 : index + %2276 = cmpi "slt", %arg5, %c0 : index + %2277 = subi %c-1, %arg5 : index + %2278 = select %2276, %2277, %arg5 : index + %2279 = divi_signed %2278, %c8 : index + %2280 = subi %c-1, %2279 : index + %2281 = select %2276, %2280, %2279 : index + %2282 = addi %arg5, %c8 : index + %2283 = cmpi "slt", %2282, %c0 : index + %2284 = subi %c-1, %2282 : index + %2285 = select %2283, %2284, %2282 : index + %2286 = divi_signed %2285, %c16 : index + %2287 = subi %c-1, %2286 : index + %2288 = select %2283, %2287, %2286 : index + %2289 = muli %2288, %c-2 : index + %2290 = addi %2281, %2289 : index + %2291 = cmpi "slt", %arg5, %c0 : index + %2292 = subi %c-1, %arg5 : index + %2293 = select %2291, %2292, %arg5 : index + %2294 = divi_signed %2293, %c8 : index + %2295 = subi %c-1, %2294 : index + %2296 = select %2291, %2295, %2294 : index + %2297 = addi %arg5, %c8 : index + %2298 = cmpi "slt", %2297, %c0 : index + %2299 = subi %c-1, %2297 : index + %2300 = select %2298, %2299, %2297 : index + %2301 = divi_signed %2300, %c16 : 
index + %2302 = subi %c-1, %2301 : index + %2303 = select %2298, %2302, %2301 : index + %2304 = muli %2303, %c-2 : index + %2305 = addi %2296, %2304 : index + %2306 = addi %2305, %c1 : index + %2307 = cmpi "slt", %2306, %c0 : index + %2308 = subi %c-1, %2306 : index + %2309 = select %2307, %2308, %2306 : index + %2310 = divi_signed %2309, %c2 : index + %2311 = subi %c-1, %2310 : index + %2312 = select %2307, %2311, %2310 : index + %2313 = muli %2312, %c-2 : index + %2314 = addi %2290, %2313 : index + %2315 = addi %2314, %c1 : index + store %2264, %2[%2275, %c0, %2315] : memref<16x6x2xvector<8xf32>> + %2316 = addi %arg5, %c8 : index + %2317 = cmpi "slt", %2316, %c0 : index + %2318 = subi %c-1, %2316 : index + %2319 = select %2317, %2318, %2316 : index + %2320 = divi_signed %2319, %c16 : index + %2321 = subi %c-1, %2320 : index + %2322 = select %2317, %2321, %2320 : index + %2323 = remi_signed %2322, %c16 : index + %2324 = cmpi "slt", %2323, %c0 : index + %2325 = addi %2323, %c16 : index + %2326 = select %2324, %2325, %2323 : index + %2327 = cmpi "slt", %arg5, %c0 : index + %2328 = subi %c-1, %arg5 : index + %2329 = select %2327, %2328, %arg5 : index + %2330 = divi_signed %2329, %c8 : index + %2331 = subi %c-1, %2330 : index + %2332 = select %2327, %2331, %2330 : index + %2333 = addi %arg5, %c8 : index + %2334 = cmpi "slt", %2333, %c0 : index + %2335 = subi %c-1, %2333 : index + %2336 = select %2334, %2335, %2333 : index + %2337 = divi_signed %2336, %c16 : index + %2338 = subi %c-1, %2337 : index + %2339 = select %2334, %2338, %2337 : index + %2340 = muli %2339, %c-2 : index + %2341 = addi %2332, %2340 : index + %2342 = cmpi "slt", %arg5, %c0 : index + %2343 = subi %c-1, %arg5 : index + %2344 = select %2342, %2343, %arg5 : index + %2345 = divi_signed %2344, %c8 : index + %2346 = subi %c-1, %2345 : index + %2347 = select %2342, %2346, %2345 : index + %2348 = addi %arg5, %c8 : index + %2349 = cmpi "slt", %2348, %c0 : index + %2350 = subi %c-1, %2348 : index + %2351 = select %2349, %2350, %2348 : index + %2352 = divi_signed %2351, %c16 : index + %2353 = subi %c-1, %2352 : index + %2354 = select %2349, %2353, %2352 : index + %2355 = muli %2354, %c-2 : index + %2356 = addi %2347, %2355 : index + %2357 = addi %2356, %c1 : index + %2358 = cmpi "slt", %2357, %c0 : index + %2359 = subi %c-1, %2357 : index + %2360 = select %2358, %2359, %2357 : index + %2361 = divi_signed %2360, %c2 : index + %2362 = subi %c-1, %2361 : index + %2363 = select %2358, %2362, %2361 : index + %2364 = muli %2363, %c-2 : index + %2365 = addi %2341, %2364 : index + %2366 = addi %2365, %c1 : index + %2367 = load %2[%2326, %c0, %2366] : memref<16x6x2xvector<8xf32>> + %2368 = vector.insertelement %2205, %2367[%c1_i64 : i64] : vector<8xf32> + %2369 = addi %arg5, %c8 : index + %2370 = cmpi "slt", %2369, %c0 : index + %2371 = subi %c-1, %2369 : index + %2372 = select %2370, %2371, %2369 : index + %2373 = divi_signed %2372, %c16 : index + %2374 = subi %c-1, %2373 : index + %2375 = select %2370, %2374, %2373 : index + %2376 = remi_signed %2375, %c16 : index + %2377 = cmpi "slt", %2376, %c0 : index + %2378 = addi %2376, %c16 : index + %2379 = select %2377, %2378, %2376 : index + %2380 = cmpi "slt", %arg5, %c0 : index + %2381 = subi %c-1, %arg5 : index + %2382 = select %2380, %2381, %arg5 : index + %2383 = divi_signed %2382, %c8 : index + %2384 = subi %c-1, %2383 : index + %2385 = select %2380, %2384, %2383 : index + %2386 = addi %arg5, %c8 : index + %2387 = cmpi "slt", %2386, %c0 : index + %2388 = subi %c-1, %2386 : index + %2389 = 
select %2387, %2388, %2386 : index + %2390 = divi_signed %2389, %c16 : index + %2391 = subi %c-1, %2390 : index + %2392 = select %2387, %2391, %2390 : index + %2393 = muli %2392, %c-2 : index + %2394 = addi %2385, %2393 : index + %2395 = cmpi "slt", %arg5, %c0 : index + %2396 = subi %c-1, %arg5 : index + %2397 = select %2395, %2396, %arg5 : index + %2398 = divi_signed %2397, %c8 : index + %2399 = subi %c-1, %2398 : index + %2400 = select %2395, %2399, %2398 : index + %2401 = addi %arg5, %c8 : index + %2402 = cmpi "slt", %2401, %c0 : index + %2403 = subi %c-1, %2401 : index + %2404 = select %2402, %2403, %2401 : index + %2405 = divi_signed %2404, %c16 : index + %2406 = subi %c-1, %2405 : index + %2407 = select %2402, %2406, %2405 : index + %2408 = muli %2407, %c-2 : index + %2409 = addi %2400, %2408 : index + %2410 = addi %2409, %c1 : index + %2411 = cmpi "slt", %2410, %c0 : index + %2412 = subi %c-1, %2410 : index + %2413 = select %2411, %2412, %2410 : index + %2414 = divi_signed %2413, %c2 : index + %2415 = subi %c-1, %2414 : index + %2416 = select %2411, %2415, %2414 : index + %2417 = muli %2416, %c-2 : index + %2418 = addi %2394, %2417 : index + %2419 = addi %2418, %c1 : index + store %2368, %2[%2379, %c0, %2419] : memref<16x6x2xvector<8xf32>> + %2420 = addi %arg5, %c8 : index + %2421 = cmpi "slt", %2420, %c0 : index + %2422 = subi %c-1, %2420 : index + %2423 = select %2421, %2422, %2420 : index + %2424 = divi_signed %2423, %c16 : index + %2425 = subi %c-1, %2424 : index + %2426 = select %2421, %2425, %2424 : index + %2427 = remi_signed %2426, %c16 : index + %2428 = cmpi "slt", %2427, %c0 : index + %2429 = addi %2427, %c16 : index + %2430 = select %2428, %2429, %2427 : index + %2431 = cmpi "slt", %arg5, %c0 : index + %2432 = subi %c-1, %arg5 : index + %2433 = select %2431, %2432, %arg5 : index + %2434 = divi_signed %2433, %c8 : index + %2435 = subi %c-1, %2434 : index + %2436 = select %2431, %2435, %2434 : index + %2437 = addi %arg5, %c8 : index + %2438 = cmpi "slt", %2437, %c0 : index + %2439 = subi %c-1, %2437 : index + %2440 = select %2438, %2439, %2437 : index + %2441 = divi_signed %2440, %c16 : index + %2442 = subi %c-1, %2441 : index + %2443 = select %2438, %2442, %2441 : index + %2444 = muli %2443, %c-2 : index + %2445 = addi %2436, %2444 : index + %2446 = cmpi "slt", %arg5, %c0 : index + %2447 = subi %c-1, %arg5 : index + %2448 = select %2446, %2447, %arg5 : index + %2449 = divi_signed %2448, %c8 : index + %2450 = subi %c-1, %2449 : index + %2451 = select %2446, %2450, %2449 : index + %2452 = addi %arg5, %c8 : index + %2453 = cmpi "slt", %2452, %c0 : index + %2454 = subi %c-1, %2452 : index + %2455 = select %2453, %2454, %2452 : index + %2456 = divi_signed %2455, %c16 : index + %2457 = subi %c-1, %2456 : index + %2458 = select %2453, %2457, %2456 : index + %2459 = muli %2458, %c-2 : index + %2460 = addi %2451, %2459 : index + %2461 = addi %2460, %c1 : index + %2462 = cmpi "slt", %2461, %c0 : index + %2463 = subi %c-1, %2461 : index + %2464 = select %2462, %2463, %2461 : index + %2465 = divi_signed %2464, %c2 : index + %2466 = subi %c-1, %2465 : index + %2467 = select %2462, %2466, %2465 : index + %2468 = muli %2467, %c-2 : index + %2469 = addi %2445, %2468 : index + %2470 = addi %2469, %c1 : index + %2471 = load %2[%2430, %c0, %2470] : memref<16x6x2xvector<8xf32>> + %2472 = vector.insertelement %2206, %2471[%c2_i64 : i64] : vector<8xf32> + %2473 = addi %arg5, %c8 : index + %2474 = cmpi "slt", %2473, %c0 : index + %2475 = subi %c-1, %2473 : index + %2476 = select %2474, %2475, 
%2473 : index + %2477 = divi_signed %2476, %c16 : index + %2478 = subi %c-1, %2477 : index + %2479 = select %2474, %2478, %2477 : index + %2480 = remi_signed %2479, %c16 : index + %2481 = cmpi "slt", %2480, %c0 : index + %2482 = addi %2480, %c16 : index + %2483 = select %2481, %2482, %2480 : index + %2484 = cmpi "slt", %arg5, %c0 : index + %2485 = subi %c-1, %arg5 : index + %2486 = select %2484, %2485, %arg5 : index + %2487 = divi_signed %2486, %c8 : index + %2488 = subi %c-1, %2487 : index + %2489 = select %2484, %2488, %2487 : index + %2490 = addi %arg5, %c8 : index + %2491 = cmpi "slt", %2490, %c0 : index + %2492 = subi %c-1, %2490 : index + %2493 = select %2491, %2492, %2490 : index + %2494 = divi_signed %2493, %c16 : index + %2495 = subi %c-1, %2494 : index + %2496 = select %2491, %2495, %2494 : index + %2497 = muli %2496, %c-2 : index + %2498 = addi %2489, %2497 : index + %2499 = cmpi "slt", %arg5, %c0 : index + %2500 = subi %c-1, %arg5 : index + %2501 = select %2499, %2500, %arg5 : index + %2502 = divi_signed %2501, %c8 : index + %2503 = subi %c-1, %2502 : index + %2504 = select %2499, %2503, %2502 : index + %2505 = addi %arg5, %c8 : index + %2506 = cmpi "slt", %2505, %c0 : index + %2507 = subi %c-1, %2505 : index + %2508 = select %2506, %2507, %2505 : index + %2509 = divi_signed %2508, %c16 : index + %2510 = subi %c-1, %2509 : index + %2511 = select %2506, %2510, %2509 : index + %2512 = muli %2511, %c-2 : index + %2513 = addi %2504, %2512 : index + %2514 = addi %2513, %c1 : index + %2515 = cmpi "slt", %2514, %c0 : index + %2516 = subi %c-1, %2514 : index + %2517 = select %2515, %2516, %2514 : index + %2518 = divi_signed %2517, %c2 : index + %2519 = subi %c-1, %2518 : index + %2520 = select %2515, %2519, %2518 : index + %2521 = muli %2520, %c-2 : index + %2522 = addi %2498, %2521 : index + %2523 = addi %2522, %c1 : index + store %2472, %2[%2483, %c0, %2523] : memref<16x6x2xvector<8xf32>> + %2524 = addi %arg5, %c8 : index + %2525 = cmpi "slt", %2524, %c0 : index + %2526 = subi %c-1, %2524 : index + %2527 = select %2525, %2526, %2524 : index + %2528 = divi_signed %2527, %c16 : index + %2529 = subi %c-1, %2528 : index + %2530 = select %2525, %2529, %2528 : index + %2531 = remi_signed %2530, %c16 : index + %2532 = cmpi "slt", %2531, %c0 : index + %2533 = addi %2531, %c16 : index + %2534 = select %2532, %2533, %2531 : index + %2535 = cmpi "slt", %arg5, %c0 : index + %2536 = subi %c-1, %arg5 : index + %2537 = select %2535, %2536, %arg5 : index + %2538 = divi_signed %2537, %c8 : index + %2539 = subi %c-1, %2538 : index + %2540 = select %2535, %2539, %2538 : index + %2541 = addi %arg5, %c8 : index + %2542 = cmpi "slt", %2541, %c0 : index + %2543 = subi %c-1, %2541 : index + %2544 = select %2542, %2543, %2541 : index + %2545 = divi_signed %2544, %c16 : index + %2546 = subi %c-1, %2545 : index + %2547 = select %2542, %2546, %2545 : index + %2548 = muli %2547, %c-2 : index + %2549 = addi %2540, %2548 : index + %2550 = cmpi "slt", %arg5, %c0 : index + %2551 = subi %c-1, %arg5 : index + %2552 = select %2550, %2551, %arg5 : index + %2553 = divi_signed %2552, %c8 : index + %2554 = subi %c-1, %2553 : index + %2555 = select %2550, %2554, %2553 : index + %2556 = addi %arg5, %c8 : index + %2557 = cmpi "slt", %2556, %c0 : index + %2558 = subi %c-1, %2556 : index + %2559 = select %2557, %2558, %2556 : index + %2560 = divi_signed %2559, %c16 : index + %2561 = subi %c-1, %2560 : index + %2562 = select %2557, %2561, %2560 : index + %2563 = muli %2562, %c-2 : index + %2564 = addi %2555, %2563 : index + 
%2565 = addi %2564, %c1 : index + %2566 = cmpi "slt", %2565, %c0 : index + %2567 = subi %c-1, %2565 : index + %2568 = select %2566, %2567, %2565 : index + %2569 = divi_signed %2568, %c2 : index + %2570 = subi %c-1, %2569 : index + %2571 = select %2566, %2570, %2569 : index + %2572 = muli %2571, %c-2 : index + %2573 = addi %2549, %2572 : index + %2574 = addi %2573, %c1 : index + %2575 = load %2[%2534, %c0, %2574] : memref<16x6x2xvector<8xf32>> + %2576 = vector.insertelement %2207, %2575[%c3_i64 : i64] : vector<8xf32> + %2577 = addi %arg5, %c8 : index + %2578 = cmpi "slt", %2577, %c0 : index + %2579 = subi %c-1, %2577 : index + %2580 = select %2578, %2579, %2577 : index + %2581 = divi_signed %2580, %c16 : index + %2582 = subi %c-1, %2581 : index + %2583 = select %2578, %2582, %2581 : index + %2584 = remi_signed %2583, %c16 : index + %2585 = cmpi "slt", %2584, %c0 : index + %2586 = addi %2584, %c16 : index + %2587 = select %2585, %2586, %2584 : index + %2588 = cmpi "slt", %arg5, %c0 : index + %2589 = subi %c-1, %arg5 : index + %2590 = select %2588, %2589, %arg5 : index + %2591 = divi_signed %2590, %c8 : index + %2592 = subi %c-1, %2591 : index + %2593 = select %2588, %2592, %2591 : index + %2594 = addi %arg5, %c8 : index + %2595 = cmpi "slt", %2594, %c0 : index + %2596 = subi %c-1, %2594 : index + %2597 = select %2595, %2596, %2594 : index + %2598 = divi_signed %2597, %c16 : index + %2599 = subi %c-1, %2598 : index + %2600 = select %2595, %2599, %2598 : index + %2601 = muli %2600, %c-2 : index + %2602 = addi %2593, %2601 : index + %2603 = cmpi "slt", %arg5, %c0 : index + %2604 = subi %c-1, %arg5 : index + %2605 = select %2603, %2604, %arg5 : index + %2606 = divi_signed %2605, %c8 : index + %2607 = subi %c-1, %2606 : index + %2608 = select %2603, %2607, %2606 : index + %2609 = addi %arg5, %c8 : index + %2610 = cmpi "slt", %2609, %c0 : index + %2611 = subi %c-1, %2609 : index + %2612 = select %2610, %2611, %2609 : index + %2613 = divi_signed %2612, %c16 : index + %2614 = subi %c-1, %2613 : index + %2615 = select %2610, %2614, %2613 : index + %2616 = muli %2615, %c-2 : index + %2617 = addi %2608, %2616 : index + %2618 = addi %2617, %c1 : index + %2619 = cmpi "slt", %2618, %c0 : index + %2620 = subi %c-1, %2618 : index + %2621 = select %2619, %2620, %2618 : index + %2622 = divi_signed %2621, %c2 : index + %2623 = subi %c-1, %2622 : index + %2624 = select %2619, %2623, %2622 : index + %2625 = muli %2624, %c-2 : index + %2626 = addi %2602, %2625 : index + %2627 = addi %2626, %c1 : index + store %2576, %2[%2587, %c0, %2627] : memref<16x6x2xvector<8xf32>> + %2628 = addi %arg5, %c8 : index + %2629 = cmpi "slt", %2628, %c0 : index + %2630 = subi %c-1, %2628 : index + %2631 = select %2629, %2630, %2628 : index + %2632 = divi_signed %2631, %c16 : index + %2633 = subi %c-1, %2632 : index + %2634 = select %2629, %2633, %2632 : index + %2635 = remi_signed %2634, %c16 : index + %2636 = cmpi "slt", %2635, %c0 : index + %2637 = addi %2635, %c16 : index + %2638 = select %2636, %2637, %2635 : index + %2639 = cmpi "slt", %arg5, %c0 : index + %2640 = subi %c-1, %arg5 : index + %2641 = select %2639, %2640, %arg5 : index + %2642 = divi_signed %2641, %c8 : index + %2643 = subi %c-1, %2642 : index + %2644 = select %2639, %2643, %2642 : index + %2645 = addi %arg5, %c8 : index + %2646 = cmpi "slt", %2645, %c0 : index + %2647 = subi %c-1, %2645 : index + %2648 = select %2646, %2647, %2645 : index + %2649 = divi_signed %2648, %c16 : index + %2650 = subi %c-1, %2649 : index + %2651 = select %2646, %2650, %2649 : index + 
%2652 = muli %2651, %c-2 : index + %2653 = addi %2644, %2652 : index + %2654 = cmpi "slt", %arg5, %c0 : index + %2655 = subi %c-1, %arg5 : index + %2656 = select %2654, %2655, %arg5 : index + %2657 = divi_signed %2656, %c8 : index + %2658 = subi %c-1, %2657 : index + %2659 = select %2654, %2658, %2657 : index + %2660 = addi %arg5, %c8 : index + %2661 = cmpi "slt", %2660, %c0 : index + %2662 = subi %c-1, %2660 : index + %2663 = select %2661, %2662, %2660 : index + %2664 = divi_signed %2663, %c16 : index + %2665 = subi %c-1, %2664 : index + %2666 = select %2661, %2665, %2664 : index + %2667 = muli %2666, %c-2 : index + %2668 = addi %2659, %2667 : index + %2669 = addi %2668, %c1 : index + %2670 = cmpi "slt", %2669, %c0 : index + %2671 = subi %c-1, %2669 : index + %2672 = select %2670, %2671, %2669 : index + %2673 = divi_signed %2672, %c2 : index + %2674 = subi %c-1, %2673 : index + %2675 = select %2670, %2674, %2673 : index + %2676 = muli %2675, %c-2 : index + %2677 = addi %2653, %2676 : index + %2678 = addi %2677, %c1 : index + %2679 = load %2[%2638, %c0, %2678] : memref<16x6x2xvector<8xf32>> + %2680 = vector.insertelement %2208, %2679[%c4_i64 : i64] : vector<8xf32> + %2681 = addi %arg5, %c8 : index + %2682 = cmpi "slt", %2681, %c0 : index + %2683 = subi %c-1, %2681 : index + %2684 = select %2682, %2683, %2681 : index + %2685 = divi_signed %2684, %c16 : index + %2686 = subi %c-1, %2685 : index + %2687 = select %2682, %2686, %2685 : index + %2688 = remi_signed %2687, %c16 : index + %2689 = cmpi "slt", %2688, %c0 : index + %2690 = addi %2688, %c16 : index + %2691 = select %2689, %2690, %2688 : index + %2692 = cmpi "slt", %arg5, %c0 : index + %2693 = subi %c-1, %arg5 : index + %2694 = select %2692, %2693, %arg5 : index + %2695 = divi_signed %2694, %c8 : index + %2696 = subi %c-1, %2695 : index + %2697 = select %2692, %2696, %2695 : index + %2698 = addi %arg5, %c8 : index + %2699 = cmpi "slt", %2698, %c0 : index + %2700 = subi %c-1, %2698 : index + %2701 = select %2699, %2700, %2698 : index + %2702 = divi_signed %2701, %c16 : index + %2703 = subi %c-1, %2702 : index + %2704 = select %2699, %2703, %2702 : index + %2705 = muli %2704, %c-2 : index + %2706 = addi %2697, %2705 : index + %2707 = cmpi "slt", %arg5, %c0 : index + %2708 = subi %c-1, %arg5 : index + %2709 = select %2707, %2708, %arg5 : index + %2710 = divi_signed %2709, %c8 : index + %2711 = subi %c-1, %2710 : index + %2712 = select %2707, %2711, %2710 : index + %2713 = addi %arg5, %c8 : index + %2714 = cmpi "slt", %2713, %c0 : index + %2715 = subi %c-1, %2713 : index + %2716 = select %2714, %2715, %2713 : index + %2717 = divi_signed %2716, %c16 : index + %2718 = subi %c-1, %2717 : index + %2719 = select %2714, %2718, %2717 : index + %2720 = muli %2719, %c-2 : index + %2721 = addi %2712, %2720 : index + %2722 = addi %2721, %c1 : index + %2723 = cmpi "slt", %2722, %c0 : index + %2724 = subi %c-1, %2722 : index + %2725 = select %2723, %2724, %2722 : index + %2726 = divi_signed %2725, %c2 : index + %2727 = subi %c-1, %2726 : index + %2728 = select %2723, %2727, %2726 : index + %2729 = muli %2728, %c-2 : index + %2730 = addi %2706, %2729 : index + %2731 = addi %2730, %c1 : index + store %2680, %2[%2691, %c0, %2731] : memref<16x6x2xvector<8xf32>> + %2732 = addi %arg5, %c8 : index + %2733 = cmpi "slt", %2732, %c0 : index + %2734 = subi %c-1, %2732 : index + %2735 = select %2733, %2734, %2732 : index + %2736 = divi_signed %2735, %c16 : index + %2737 = subi %c-1, %2736 : index + %2738 = select %2733, %2737, %2736 : index + %2739 = remi_signed 
%2738, %c16 : index + %2740 = cmpi "slt", %2739, %c0 : index + %2741 = addi %2739, %c16 : index + %2742 = select %2740, %2741, %2739 : index + %2743 = cmpi "slt", %arg5, %c0 : index + %2744 = subi %c-1, %arg5 : index + %2745 = select %2743, %2744, %arg5 : index + %2746 = divi_signed %2745, %c8 : index + %2747 = subi %c-1, %2746 : index + %2748 = select %2743, %2747, %2746 : index + %2749 = addi %arg5, %c8 : index + %2750 = cmpi "slt", %2749, %c0 : index + %2751 = subi %c-1, %2749 : index + %2752 = select %2750, %2751, %2749 : index + %2753 = divi_signed %2752, %c16 : index + %2754 = subi %c-1, %2753 : index + %2755 = select %2750, %2754, %2753 : index + %2756 = muli %2755, %c-2 : index + %2757 = addi %2748, %2756 : index + %2758 = cmpi "slt", %arg5, %c0 : index + %2759 = subi %c-1, %arg5 : index + %2760 = select %2758, %2759, %arg5 : index + %2761 = divi_signed %2760, %c8 : index + %2762 = subi %c-1, %2761 : index + %2763 = select %2758, %2762, %2761 : index + %2764 = addi %arg5, %c8 : index + %2765 = cmpi "slt", %2764, %c0 : index + %2766 = subi %c-1, %2764 : index + %2767 = select %2765, %2766, %2764 : index + %2768 = divi_signed %2767, %c16 : index + %2769 = subi %c-1, %2768 : index + %2770 = select %2765, %2769, %2768 : index + %2771 = muli %2770, %c-2 : index + %2772 = addi %2763, %2771 : index + %2773 = addi %2772, %c1 : index + %2774 = cmpi "slt", %2773, %c0 : index + %2775 = subi %c-1, %2773 : index + %2776 = select %2774, %2775, %2773 : index + %2777 = divi_signed %2776, %c2 : index + %2778 = subi %c-1, %2777 : index + %2779 = select %2774, %2778, %2777 : index + %2780 = muli %2779, %c-2 : index + %2781 = addi %2757, %2780 : index + %2782 = addi %2781, %c1 : index + %2783 = load %2[%2742, %c0, %2782] : memref<16x6x2xvector<8xf32>> + %2784 = vector.insertelement %2209, %2783[%c5_i64 : i64] : vector<8xf32> + %2785 = addi %arg5, %c8 : index + %2786 = cmpi "slt", %2785, %c0 : index + %2787 = subi %c-1, %2785 : index + %2788 = select %2786, %2787, %2785 : index + %2789 = divi_signed %2788, %c16 : index + %2790 = subi %c-1, %2789 : index + %2791 = select %2786, %2790, %2789 : index + %2792 = remi_signed %2791, %c16 : index + %2793 = cmpi "slt", %2792, %c0 : index + %2794 = addi %2792, %c16 : index + %2795 = select %2793, %2794, %2792 : index + %2796 = cmpi "slt", %arg5, %c0 : index + %2797 = subi %c-1, %arg5 : index + %2798 = select %2796, %2797, %arg5 : index + %2799 = divi_signed %2798, %c8 : index + %2800 = subi %c-1, %2799 : index + %2801 = select %2796, %2800, %2799 : index + %2802 = addi %arg5, %c8 : index + %2803 = cmpi "slt", %2802, %c0 : index + %2804 = subi %c-1, %2802 : index + %2805 = select %2803, %2804, %2802 : index + %2806 = divi_signed %2805, %c16 : index + %2807 = subi %c-1, %2806 : index + %2808 = select %2803, %2807, %2806 : index + %2809 = muli %2808, %c-2 : index + %2810 = addi %2801, %2809 : index + %2811 = cmpi "slt", %arg5, %c0 : index + %2812 = subi %c-1, %arg5 : index + %2813 = select %2811, %2812, %arg5 : index + %2814 = divi_signed %2813, %c8 : index + %2815 = subi %c-1, %2814 : index + %2816 = select %2811, %2815, %2814 : index + %2817 = addi %arg5, %c8 : index + %2818 = cmpi "slt", %2817, %c0 : index + %2819 = subi %c-1, %2817 : index + %2820 = select %2818, %2819, %2817 : index + %2821 = divi_signed %2820, %c16 : index + %2822 = subi %c-1, %2821 : index + %2823 = select %2818, %2822, %2821 : index + %2824 = muli %2823, %c-2 : index + %2825 = addi %2816, %2824 : index + %2826 = addi %2825, %c1 : index + %2827 = cmpi "slt", %2826, %c0 : index + %2828 = subi 
%c-1, %2826 : index + %2829 = select %2827, %2828, %2826 : index + %2830 = divi_signed %2829, %c2 : index + %2831 = subi %c-1, %2830 : index + %2832 = select %2827, %2831, %2830 : index + %2833 = muli %2832, %c-2 : index + %2834 = addi %2810, %2833 : index + %2835 = addi %2834, %c1 : index + store %2784, %2[%2795, %c0, %2835] : memref<16x6x2xvector<8xf32>> + %2836 = addi %arg5, %c8 : index + %2837 = cmpi "slt", %2836, %c0 : index + %2838 = subi %c-1, %2836 : index + %2839 = select %2837, %2838, %2836 : index + %2840 = divi_signed %2839, %c16 : index + %2841 = subi %c-1, %2840 : index + %2842 = select %2837, %2841, %2840 : index + %2843 = remi_signed %2842, %c16 : index + %2844 = cmpi "slt", %2843, %c0 : index + %2845 = addi %2843, %c16 : index + %2846 = select %2844, %2845, %2843 : index + %2847 = cmpi "slt", %arg5, %c0 : index + %2848 = subi %c-1, %arg5 : index + %2849 = select %2847, %2848, %arg5 : index + %2850 = divi_signed %2849, %c8 : index + %2851 = subi %c-1, %2850 : index + %2852 = select %2847, %2851, %2850 : index + %2853 = addi %arg5, %c8 : index + %2854 = cmpi "slt", %2853, %c0 : index + %2855 = subi %c-1, %2853 : index + %2856 = select %2854, %2855, %2853 : index + %2857 = divi_signed %2856, %c16 : index + %2858 = subi %c-1, %2857 : index + %2859 = select %2854, %2858, %2857 : index + %2860 = muli %2859, %c-2 : index + %2861 = addi %2852, %2860 : index + %2862 = cmpi "slt", %arg5, %c0 : index + %2863 = subi %c-1, %arg5 : index + %2864 = select %2862, %2863, %arg5 : index + %2865 = divi_signed %2864, %c8 : index + %2866 = subi %c-1, %2865 : index + %2867 = select %2862, %2866, %2865 : index + %2868 = addi %arg5, %c8 : index + %2869 = cmpi "slt", %2868, %c0 : index + %2870 = subi %c-1, %2868 : index + %2871 = select %2869, %2870, %2868 : index + %2872 = divi_signed %2871, %c16 : index + %2873 = subi %c-1, %2872 : index + %2874 = select %2869, %2873, %2872 : index + %2875 = muli %2874, %c-2 : index + %2876 = addi %2867, %2875 : index + %2877 = addi %2876, %c1 : index + %2878 = cmpi "slt", %2877, %c0 : index + %2879 = subi %c-1, %2877 : index + %2880 = select %2878, %2879, %2877 : index + %2881 = divi_signed %2880, %c2 : index + %2882 = subi %c-1, %2881 : index + %2883 = select %2878, %2882, %2881 : index + %2884 = muli %2883, %c-2 : index + %2885 = addi %2861, %2884 : index + %2886 = addi %2885, %c1 : index + %2887 = load %2[%2846, %c0, %2886] : memref<16x6x2xvector<8xf32>> + %2888 = vector.insertelement %2210, %2887[%c6_i64 : i64] : vector<8xf32> + %2889 = addi %arg5, %c8 : index + %2890 = cmpi "slt", %2889, %c0 : index + %2891 = subi %c-1, %2889 : index + %2892 = select %2890, %2891, %2889 : index + %2893 = divi_signed %2892, %c16 : index + %2894 = subi %c-1, %2893 : index + %2895 = select %2890, %2894, %2893 : index + %2896 = remi_signed %2895, %c16 : index + %2897 = cmpi "slt", %2896, %c0 : index + %2898 = addi %2896, %c16 : index + %2899 = select %2897, %2898, %2896 : index + %2900 = cmpi "slt", %arg5, %c0 : index + %2901 = subi %c-1, %arg5 : index + %2902 = select %2900, %2901, %arg5 : index + %2903 = divi_signed %2902, %c8 : index + %2904 = subi %c-1, %2903 : index + %2905 = select %2900, %2904, %2903 : index + %2906 = addi %arg5, %c8 : index + %2907 = cmpi "slt", %2906, %c0 : index + %2908 = subi %c-1, %2906 : index + %2909 = select %2907, %2908, %2906 : index + %2910 = divi_signed %2909, %c16 : index + %2911 = subi %c-1, %2910 : index + %2912 = select %2907, %2911, %2910 : index + %2913 = muli %2912, %c-2 : index + %2914 = addi %2905, %2913 : index + %2915 = cmpi "slt", 
%arg5, %c0 : index + %2916 = subi %c-1, %arg5 : index + %2917 = select %2915, %2916, %arg5 : index + %2918 = divi_signed %2917, %c8 : index + %2919 = subi %c-1, %2918 : index + %2920 = select %2915, %2919, %2918 : index + %2921 = addi %arg5, %c8 : index + %2922 = cmpi "slt", %2921, %c0 : index + %2923 = subi %c-1, %2921 : index + %2924 = select %2922, %2923, %2921 : index + %2925 = divi_signed %2924, %c16 : index + %2926 = subi %c-1, %2925 : index + %2927 = select %2922, %2926, %2925 : index + %2928 = muli %2927, %c-2 : index + %2929 = addi %2920, %2928 : index + %2930 = addi %2929, %c1 : index + %2931 = cmpi "slt", %2930, %c0 : index + %2932 = subi %c-1, %2930 : index + %2933 = select %2931, %2932, %2930 : index + %2934 = divi_signed %2933, %c2 : index + %2935 = subi %c-1, %2934 : index + %2936 = select %2931, %2935, %2934 : index + %2937 = muli %2936, %c-2 : index + %2938 = addi %2914, %2937 : index + %2939 = addi %2938, %c1 : index + store %2888, %2[%2899, %c0, %2939] : memref<16x6x2xvector<8xf32>> + %2940 = addi %arg5, %c8 : index + %2941 = cmpi "slt", %2940, %c0 : index + %2942 = subi %c-1, %2940 : index + %2943 = select %2941, %2942, %2940 : index + %2944 = divi_signed %2943, %c16 : index + %2945 = subi %c-1, %2944 : index + %2946 = select %2941, %2945, %2944 : index + %2947 = remi_signed %2946, %c16 : index + %2948 = cmpi "slt", %2947, %c0 : index + %2949 = addi %2947, %c16 : index + %2950 = select %2948, %2949, %2947 : index + %2951 = cmpi "slt", %arg5, %c0 : index + %2952 = subi %c-1, %arg5 : index + %2953 = select %2951, %2952, %arg5 : index + %2954 = divi_signed %2953, %c8 : index + %2955 = subi %c-1, %2954 : index + %2956 = select %2951, %2955, %2954 : index + %2957 = addi %arg5, %c8 : index + %2958 = cmpi "slt", %2957, %c0 : index + %2959 = subi %c-1, %2957 : index + %2960 = select %2958, %2959, %2957 : index + %2961 = divi_signed %2960, %c16 : index + %2962 = subi %c-1, %2961 : index + %2963 = select %2958, %2962, %2961 : index + %2964 = muli %2963, %c-2 : index + %2965 = addi %2956, %2964 : index + %2966 = cmpi "slt", %arg5, %c0 : index + %2967 = subi %c-1, %arg5 : index + %2968 = select %2966, %2967, %arg5 : index + %2969 = divi_signed %2968, %c8 : index + %2970 = subi %c-1, %2969 : index + %2971 = select %2966, %2970, %2969 : index + %2972 = addi %arg5, %c8 : index + %2973 = cmpi "slt", %2972, %c0 : index + %2974 = subi %c-1, %2972 : index + %2975 = select %2973, %2974, %2972 : index + %2976 = divi_signed %2975, %c16 : index + %2977 = subi %c-1, %2976 : index + %2978 = select %2973, %2977, %2976 : index + %2979 = muli %2978, %c-2 : index + %2980 = addi %2971, %2979 : index + %2981 = addi %2980, %c1 : index + %2982 = cmpi "slt", %2981, %c0 : index + %2983 = subi %c-1, %2981 : index + %2984 = select %2982, %2983, %2981 : index + %2985 = divi_signed %2984, %c2 : index + %2986 = subi %c-1, %2985 : index + %2987 = select %2982, %2986, %2985 : index + %2988 = muli %2987, %c-2 : index + %2989 = addi %2965, %2988 : index + %2990 = addi %2989, %c1 : index + %2991 = load %2[%2950, %c0, %2990] : memref<16x6x2xvector<8xf32>> + %2992 = vector.insertelement %2211, %2991[%c7_i64 : i64] : vector<8xf32> + %2993 = addi %arg5, %c8 : index + %2994 = cmpi "slt", %2993, %c0 : index + %2995 = subi %c-1, %2993 : index + %2996 = select %2994, %2995, %2993 : index + %2997 = divi_signed %2996, %c16 : index + %2998 = subi %c-1, %2997 : index + %2999 = select %2994, %2998, %2997 : index + %3000 = remi_signed %2999, %c16 : index + %3001 = cmpi "slt", %3000, %c0 : index + %3002 = addi %3000, %c16 : 
index + %3003 = select %3001, %3002, %3000 : index + %3004 = cmpi "slt", %arg5, %c0 : index + %3005 = subi %c-1, %arg5 : index + %3006 = select %3004, %3005, %arg5 : index + %3007 = divi_signed %3006, %c8 : index + %3008 = subi %c-1, %3007 : index + %3009 = select %3004, %3008, %3007 : index + %3010 = addi %arg5, %c8 : index + %3011 = cmpi "slt", %3010, %c0 : index + %3012 = subi %c-1, %3010 : index + %3013 = select %3011, %3012, %3010 : index + %3014 = divi_signed %3013, %c16 : index + %3015 = subi %c-1, %3014 : index + %3016 = select %3011, %3015, %3014 : index + %3017 = muli %3016, %c-2 : index + %3018 = addi %3009, %3017 : index + %3019 = cmpi "slt", %arg5, %c0 : index + %3020 = subi %c-1, %arg5 : index + %3021 = select %3019, %3020, %arg5 : index + %3022 = divi_signed %3021, %c8 : index + %3023 = subi %c-1, %3022 : index + %3024 = select %3019, %3023, %3022 : index + %3025 = addi %arg5, %c8 : index + %3026 = cmpi "slt", %3025, %c0 : index + %3027 = subi %c-1, %3025 : index + %3028 = select %3026, %3027, %3025 : index + %3029 = divi_signed %3028, %c16 : index + %3030 = subi %c-1, %3029 : index + %3031 = select %3026, %3030, %3029 : index + %3032 = muli %3031, %c-2 : index + %3033 = addi %3024, %3032 : index + %3034 = addi %3033, %c1 : index + %3035 = cmpi "slt", %3034, %c0 : index + %3036 = subi %c-1, %3034 : index + %3037 = select %3035, %3036, %3034 : index + %3038 = divi_signed %3037, %c2 : index + %3039 = subi %c-1, %3038 : index + %3040 = select %3035, %3039, %3038 : index + %3041 = muli %3040, %c-2 : index + %3042 = addi %3018, %3041 : index + %3043 = addi %3042, %c1 : index + store %2992, %2[%3003, %c0, %3043] : memref<16x6x2xvector<8xf32>> + %3044 = addi %arg5, %c8 : index + %3045 = cmpi "slt", %3044, %c0 : index + %3046 = subi %c-1, %3044 : index + %3047 = select %3045, %3046, %3044 : index + %3048 = divi_signed %3047, %c16 : index + %3049 = subi %c-1, %3048 : index + %3050 = select %3045, %3049, %3048 : index + %3051 = remi_signed %3050, %c16 : index + %3052 = cmpi "slt", %3051, %c0 : index + %3053 = addi %3051, %c16 : index + %3054 = select %3052, %3053, %3051 : index + %3055 = cmpi "slt", %arg5, %c0 : index + %3056 = subi %c-1, %arg5 : index + %3057 = select %3055, %3056, %arg5 : index + %3058 = divi_signed %3057, %c8 : index + %3059 = subi %c-1, %3058 : index + %3060 = select %3055, %3059, %3058 : index + %3061 = addi %arg5, %c8 : index + %3062 = cmpi "slt", %3061, %c0 : index + %3063 = subi %c-1, %3061 : index + %3064 = select %3062, %3063, %3061 : index + %3065 = divi_signed %3064, %c16 : index + %3066 = subi %c-1, %3065 : index + %3067 = select %3062, %3066, %3065 : index + %3068 = muli %3067, %c-2 : index + %3069 = addi %3060, %3068 : index + %3070 = cmpi "slt", %arg5, %c0 : index + %3071 = subi %c-1, %arg5 : index + %3072 = select %3070, %3071, %arg5 : index + %3073 = divi_signed %3072, %c8 : index + %3074 = subi %c-1, %3073 : index + %3075 = select %3070, %3074, %3073 : index + %3076 = addi %arg5, %c8 : index + %3077 = cmpi "slt", %3076, %c0 : index + %3078 = subi %c-1, %3076 : index + %3079 = select %3077, %3078, %3076 : index + %3080 = divi_signed %3079, %c16 : index + %3081 = subi %c-1, %3080 : index + %3082 = select %3077, %3081, %3080 : index + %3083 = muli %3082, %c-2 : index + %3084 = addi %3075, %3083 : index + %3085 = addi %3084, %c1 : index + %3086 = cmpi "slt", %3085, %c0 : index + %3087 = subi %c-1, %3085 : index + %3088 = select %3086, %3087, %3085 : index + %3089 = divi_signed %3088, %c2 : index + %3090 = subi %c-1, %3089 : index + %3091 = select %3086, 
%3090, %3089 : index + %3092 = muli %3091, %c-2 : index + %3093 = addi %3069, %3092 : index + %3094 = addi %3093, %c1 : index + %3095 = load %2[%3054, %c0, %3094] : memref<16x6x2xvector<8xf32>> + %3096 = vector.insertelement %2204, %3095[%c0_i64 : i64] : vector<8xf32> + %3097 = addi %arg5, %c8 : index + %3098 = cmpi "slt", %3097, %c0 : index + %3099 = subi %c-1, %3097 : index + %3100 = select %3098, %3099, %3097 : index + %3101 = divi_signed %3100, %c16 : index + %3102 = subi %c-1, %3101 : index + %3103 = select %3098, %3102, %3101 : index + %3104 = remi_signed %3103, %c16 : index + %3105 = cmpi "slt", %3104, %c0 : index + %3106 = addi %3104, %c16 : index + %3107 = select %3105, %3106, %3104 : index + %3108 = cmpi "slt", %arg5, %c0 : index + %3109 = subi %c-1, %arg5 : index + %3110 = select %3108, %3109, %arg5 : index + %3111 = divi_signed %3110, %c8 : index + %3112 = subi %c-1, %3111 : index + %3113 = select %3108, %3112, %3111 : index + %3114 = addi %arg5, %c8 : index + %3115 = cmpi "slt", %3114, %c0 : index + %3116 = subi %c-1, %3114 : index + %3117 = select %3115, %3116, %3114 : index + %3118 = divi_signed %3117, %c16 : index + %3119 = subi %c-1, %3118 : index + %3120 = select %3115, %3119, %3118 : index + %3121 = muli %3120, %c-2 : index + %3122 = addi %3113, %3121 : index + %3123 = cmpi "slt", %arg5, %c0 : index + %3124 = subi %c-1, %arg5 : index + %3125 = select %3123, %3124, %arg5 : index + %3126 = divi_signed %3125, %c8 : index + %3127 = subi %c-1, %3126 : index + %3128 = select %3123, %3127, %3126 : index + %3129 = addi %arg5, %c8 : index + %3130 = cmpi "slt", %3129, %c0 : index + %3131 = subi %c-1, %3129 : index + %3132 = select %3130, %3131, %3129 : index + %3133 = divi_signed %3132, %c16 : index + %3134 = subi %c-1, %3133 : index + %3135 = select %3130, %3134, %3133 : index + %3136 = muli %3135, %c-2 : index + %3137 = addi %3128, %3136 : index + %3138 = addi %3137, %c1 : index + %3139 = cmpi "slt", %3138, %c0 : index + %3140 = subi %c-1, %3138 : index + %3141 = select %3139, %3140, %3138 : index + %3142 = divi_signed %3141, %c2 : index + %3143 = subi %c-1, %3142 : index + %3144 = select %3139, %3143, %3142 : index + %3145 = muli %3144, %c-2 : index + %3146 = addi %3122, %3145 : index + %3147 = addi %3146, %c1 : index + store %3096, %2[%3107, %c0, %3147] : memref<16x6x2xvector<8xf32>> + %3148 = addi %arg5, %c8 : index + %3149 = cmpi "slt", %3148, %c0 : index + %3150 = subi %c-1, %3148 : index + %3151 = select %3149, %3150, %3148 : index + %3152 = divi_signed %3151, %c16 : index + %3153 = subi %c-1, %3152 : index + %3154 = select %3149, %3153, %3152 : index + %3155 = remi_signed %3154, %c16 : index + %3156 = cmpi "slt", %3155, %c0 : index + %3157 = addi %3155, %c16 : index + %3158 = select %3156, %3157, %3155 : index + %3159 = cmpi "slt", %arg5, %c0 : index + %3160 = subi %c-1, %arg5 : index + %3161 = select %3159, %3160, %arg5 : index + %3162 = divi_signed %3161, %c8 : index + %3163 = subi %c-1, %3162 : index + %3164 = select %3159, %3163, %3162 : index + %3165 = addi %arg5, %c8 : index + %3166 = cmpi "slt", %3165, %c0 : index + %3167 = subi %c-1, %3165 : index + %3168 = select %3166, %3167, %3165 : index + %3169 = divi_signed %3168, %c16 : index + %3170 = subi %c-1, %3169 : index + %3171 = select %3166, %3170, %3169 : index + %3172 = muli %3171, %c-2 : index + %3173 = addi %3164, %3172 : index + %3174 = cmpi "slt", %arg5, %c0 : index + %3175 = subi %c-1, %arg5 : index + %3176 = select %3174, %3175, %arg5 : index + %3177 = divi_signed %3176, %c8 : index + %3178 = subi %c-1, 
%3177 : index + %3179 = select %3174, %3178, %3177 : index + %3180 = addi %arg5, %c8 : index + %3181 = cmpi "slt", %3180, %c0 : index + %3182 = subi %c-1, %3180 : index + %3183 = select %3181, %3182, %3180 : index + %3184 = divi_signed %3183, %c16 : index + %3185 = subi %c-1, %3184 : index + %3186 = select %3181, %3185, %3184 : index + %3187 = muli %3186, %c-2 : index + %3188 = addi %3179, %3187 : index + %3189 = addi %3188, %c1 : index + %3190 = cmpi "slt", %3189, %c0 : index + %3191 = subi %c-1, %3189 : index + %3192 = select %3190, %3191, %3189 : index + %3193 = divi_signed %3192, %c2 : index + %3194 = subi %c-1, %3193 : index + %3195 = select %3190, %3194, %3193 : index + %3196 = muli %3195, %c-2 : index + %3197 = addi %3173, %3196 : index + %3198 = addi %3197, %c1 : index + %3199 = load %2[%3158, %c0, %3198] : memref<16x6x2xvector<8xf32>> + %3200 = vector.insertelement %2205, %3199[%c1_i64 : i64] : vector<8xf32> + %3201 = addi %arg5, %c8 : index + %3202 = cmpi "slt", %3201, %c0 : index + %3203 = subi %c-1, %3201 : index + %3204 = select %3202, %3203, %3201 : index + %3205 = divi_signed %3204, %c16 : index + %3206 = subi %c-1, %3205 : index + %3207 = select %3202, %3206, %3205 : index + %3208 = remi_signed %3207, %c16 : index + %3209 = cmpi "slt", %3208, %c0 : index + %3210 = addi %3208, %c16 : index + %3211 = select %3209, %3210, %3208 : index + %3212 = cmpi "slt", %arg5, %c0 : index + %3213 = subi %c-1, %arg5 : index + %3214 = select %3212, %3213, %arg5 : index + %3215 = divi_signed %3214, %c8 : index + %3216 = subi %c-1, %3215 : index + %3217 = select %3212, %3216, %3215 : index + %3218 = addi %arg5, %c8 : index + %3219 = cmpi "slt", %3218, %c0 : index + %3220 = subi %c-1, %3218 : index + %3221 = select %3219, %3220, %3218 : index + %3222 = divi_signed %3221, %c16 : index + %3223 = subi %c-1, %3222 : index + %3224 = select %3219, %3223, %3222 : index + %3225 = muli %3224, %c-2 : index + %3226 = addi %3217, %3225 : index + %3227 = cmpi "slt", %arg5, %c0 : index + %3228 = subi %c-1, %arg5 : index + %3229 = select %3227, %3228, %arg5 : index + %3230 = divi_signed %3229, %c8 : index + %3231 = subi %c-1, %3230 : index + %3232 = select %3227, %3231, %3230 : index + %3233 = addi %arg5, %c8 : index + %3234 = cmpi "slt", %3233, %c0 : index + %3235 = subi %c-1, %3233 : index + %3236 = select %3234, %3235, %3233 : index + %3237 = divi_signed %3236, %c16 : index + %3238 = subi %c-1, %3237 : index + %3239 = select %3234, %3238, %3237 : index + %3240 = muli %3239, %c-2 : index + %3241 = addi %3232, %3240 : index + %3242 = addi %3241, %c1 : index + %3243 = cmpi "slt", %3242, %c0 : index + %3244 = subi %c-1, %3242 : index + %3245 = select %3243, %3244, %3242 : index + %3246 = divi_signed %3245, %c2 : index + %3247 = subi %c-1, %3246 : index + %3248 = select %3243, %3247, %3246 : index + %3249 = muli %3248, %c-2 : index + %3250 = addi %3226, %3249 : index + %3251 = addi %3250, %c1 : index + store %3200, %2[%3211, %c0, %3251] : memref<16x6x2xvector<8xf32>> + %3252 = addi %arg5, %c8 : index + %3253 = cmpi "slt", %3252, %c0 : index + %3254 = subi %c-1, %3252 : index + %3255 = select %3253, %3254, %3252 : index + %3256 = divi_signed %3255, %c16 : index + %3257 = subi %c-1, %3256 : index + %3258 = select %3253, %3257, %3256 : index + %3259 = remi_signed %3258, %c16 : index + %3260 = cmpi "slt", %3259, %c0 : index + %3261 = addi %3259, %c16 : index + %3262 = select %3260, %3261, %3259 : index + %3263 = cmpi "slt", %arg5, %c0 : index + %3264 = subi %c-1, %arg5 : index + %3265 = select %3263, %3264, %arg5 : 
index + %3266 = divi_signed %3265, %c8 : index + %3267 = subi %c-1, %3266 : index + %3268 = select %3263, %3267, %3266 : index + %3269 = addi %arg5, %c8 : index + %3270 = cmpi "slt", %3269, %c0 : index + %3271 = subi %c-1, %3269 : index + %3272 = select %3270, %3271, %3269 : index + %3273 = divi_signed %3272, %c16 : index + %3274 = subi %c-1, %3273 : index + %3275 = select %3270, %3274, %3273 : index + %3276 = muli %3275, %c-2 : index + %3277 = addi %3268, %3276 : index + %3278 = cmpi "slt", %arg5, %c0 : index + %3279 = subi %c-1, %arg5 : index + %3280 = select %3278, %3279, %arg5 : index + %3281 = divi_signed %3280, %c8 : index + %3282 = subi %c-1, %3281 : index + %3283 = select %3278, %3282, %3281 : index + %3284 = addi %arg5, %c8 : index + %3285 = cmpi "slt", %3284, %c0 : index + %3286 = subi %c-1, %3284 : index + %3287 = select %3285, %3286, %3284 : index + %3288 = divi_signed %3287, %c16 : index + %3289 = subi %c-1, %3288 : index + %3290 = select %3285, %3289, %3288 : index + %3291 = muli %3290, %c-2 : index + %3292 = addi %3283, %3291 : index + %3293 = addi %3292, %c1 : index + %3294 = cmpi "slt", %3293, %c0 : index + %3295 = subi %c-1, %3293 : index + %3296 = select %3294, %3295, %3293 : index + %3297 = divi_signed %3296, %c2 : index + %3298 = subi %c-1, %3297 : index + %3299 = select %3294, %3298, %3297 : index + %3300 = muli %3299, %c-2 : index + %3301 = addi %3277, %3300 : index + %3302 = addi %3301, %c1 : index + %3303 = load %2[%3262, %c0, %3302] : memref<16x6x2xvector<8xf32>> + %3304 = vector.insertelement %2206, %3303[%c2_i64 : i64] : vector<8xf32> + %3305 = addi %arg5, %c8 : index + %3306 = cmpi "slt", %3305, %c0 : index + %3307 = subi %c-1, %3305 : index + %3308 = select %3306, %3307, %3305 : index + %3309 = divi_signed %3308, %c16 : index + %3310 = subi %c-1, %3309 : index + %3311 = select %3306, %3310, %3309 : index + %3312 = remi_signed %3311, %c16 : index + %3313 = cmpi "slt", %3312, %c0 : index + %3314 = addi %3312, %c16 : index + %3315 = select %3313, %3314, %3312 : index + %3316 = cmpi "slt", %arg5, %c0 : index + %3317 = subi %c-1, %arg5 : index + %3318 = select %3316, %3317, %arg5 : index + %3319 = divi_signed %3318, %c8 : index + %3320 = subi %c-1, %3319 : index + %3321 = select %3316, %3320, %3319 : index + %3322 = addi %arg5, %c8 : index + %3323 = cmpi "slt", %3322, %c0 : index + %3324 = subi %c-1, %3322 : index + %3325 = select %3323, %3324, %3322 : index + %3326 = divi_signed %3325, %c16 : index + %3327 = subi %c-1, %3326 : index + %3328 = select %3323, %3327, %3326 : index + %3329 = muli %3328, %c-2 : index + %3330 = addi %3321, %3329 : index + %3331 = cmpi "slt", %arg5, %c0 : index + %3332 = subi %c-1, %arg5 : index + %3333 = select %3331, %3332, %arg5 : index + %3334 = divi_signed %3333, %c8 : index + %3335 = subi %c-1, %3334 : index + %3336 = select %3331, %3335, %3334 : index + %3337 = addi %arg5, %c8 : index + %3338 = cmpi "slt", %3337, %c0 : index + %3339 = subi %c-1, %3337 : index + %3340 = select %3338, %3339, %3337 : index + %3341 = divi_signed %3340, %c16 : index + %3342 = subi %c-1, %3341 : index + %3343 = select %3338, %3342, %3341 : index + %3344 = muli %3343, %c-2 : index + %3345 = addi %3336, %3344 : index + %3346 = addi %3345, %c1 : index + %3347 = cmpi "slt", %3346, %c0 : index + %3348 = subi %c-1, %3346 : index + %3349 = select %3347, %3348, %3346 : index + %3350 = divi_signed %3349, %c2 : index + %3351 = subi %c-1, %3350 : index + %3352 = select %3347, %3351, %3350 : index + %3353 = muli %3352, %c-2 : index + %3354 = addi %3330, %3353 : 
index + %3355 = addi %3354, %c1 : index + store %3304, %2[%3315, %c0, %3355] : memref<16x6x2xvector<8xf32>> + %3356 = addi %arg5, %c8 : index + %3357 = cmpi "slt", %3356, %c0 : index + %3358 = subi %c-1, %3356 : index + %3359 = select %3357, %3358, %3356 : index + %3360 = divi_signed %3359, %c16 : index + %3361 = subi %c-1, %3360 : index + %3362 = select %3357, %3361, %3360 : index + %3363 = remi_signed %3362, %c16 : index + %3364 = cmpi "slt", %3363, %c0 : index + %3365 = addi %3363, %c16 : index + %3366 = select %3364, %3365, %3363 : index + %3367 = cmpi "slt", %arg5, %c0 : index + %3368 = subi %c-1, %arg5 : index + %3369 = select %3367, %3368, %arg5 : index + %3370 = divi_signed %3369, %c8 : index + %3371 = subi %c-1, %3370 : index + %3372 = select %3367, %3371, %3370 : index + %3373 = addi %arg5, %c8 : index + %3374 = cmpi "slt", %3373, %c0 : index + %3375 = subi %c-1, %3373 : index + %3376 = select %3374, %3375, %3373 : index + %3377 = divi_signed %3376, %c16 : index + %3378 = subi %c-1, %3377 : index + %3379 = select %3374, %3378, %3377 : index + %3380 = muli %3379, %c-2 : index + %3381 = addi %3372, %3380 : index + %3382 = cmpi "slt", %arg5, %c0 : index + %3383 = subi %c-1, %arg5 : index + %3384 = select %3382, %3383, %arg5 : index + %3385 = divi_signed %3384, %c8 : index + %3386 = subi %c-1, %3385 : index + %3387 = select %3382, %3386, %3385 : index + %3388 = addi %arg5, %c8 : index + %3389 = cmpi "slt", %3388, %c0 : index + %3390 = subi %c-1, %3388 : index + %3391 = select %3389, %3390, %3388 : index + %3392 = divi_signed %3391, %c16 : index + %3393 = subi %c-1, %3392 : index + %3394 = select %3389, %3393, %3392 : index + %3395 = muli %3394, %c-2 : index + %3396 = addi %3387, %3395 : index + %3397 = addi %3396, %c1 : index + %3398 = cmpi "slt", %3397, %c0 : index + %3399 = subi %c-1, %3397 : index + %3400 = select %3398, %3399, %3397 : index + %3401 = divi_signed %3400, %c2 : index + %3402 = subi %c-1, %3401 : index + %3403 = select %3398, %3402, %3401 : index + %3404 = muli %3403, %c-2 : index + %3405 = addi %3381, %3404 : index + %3406 = addi %3405, %c1 : index + %3407 = load %2[%3366, %c0, %3406] : memref<16x6x2xvector<8xf32>> + %3408 = vector.insertelement %2207, %3407[%c3_i64 : i64] : vector<8xf32> + %3409 = addi %arg5, %c8 : index + %3410 = cmpi "slt", %3409, %c0 : index + %3411 = subi %c-1, %3409 : index + %3412 = select %3410, %3411, %3409 : index + %3413 = divi_signed %3412, %c16 : index + %3414 = subi %c-1, %3413 : index + %3415 = select %3410, %3414, %3413 : index + %3416 = remi_signed %3415, %c16 : index + %3417 = cmpi "slt", %3416, %c0 : index + %3418 = addi %3416, %c16 : index + %3419 = select %3417, %3418, %3416 : index + %3420 = cmpi "slt", %arg5, %c0 : index + %3421 = subi %c-1, %arg5 : index + %3422 = select %3420, %3421, %arg5 : index + %3423 = divi_signed %3422, %c8 : index + %3424 = subi %c-1, %3423 : index + %3425 = select %3420, %3424, %3423 : index + %3426 = addi %arg5, %c8 : index + %3427 = cmpi "slt", %3426, %c0 : index + %3428 = subi %c-1, %3426 : index + %3429 = select %3427, %3428, %3426 : index + %3430 = divi_signed %3429, %c16 : index + %3431 = subi %c-1, %3430 : index + %3432 = select %3427, %3431, %3430 : index + %3433 = muli %3432, %c-2 : index + %3434 = addi %3425, %3433 : index + %3435 = cmpi "slt", %arg5, %c0 : index + %3436 = subi %c-1, %arg5 : index + %3437 = select %3435, %3436, %arg5 : index + %3438 = divi_signed %3437, %c8 : index + %3439 = subi %c-1, %3438 : index + %3440 = select %3435, %3439, %3438 : index + %3441 = addi %arg5, %c8 : 
index + %3442 = cmpi "slt", %3441, %c0 : index + %3443 = subi %c-1, %3441 : index + %3444 = select %3442, %3443, %3441 : index + %3445 = divi_signed %3444, %c16 : index + %3446 = subi %c-1, %3445 : index + %3447 = select %3442, %3446, %3445 : index + %3448 = muli %3447, %c-2 : index + %3449 = addi %3440, %3448 : index + %3450 = addi %3449, %c1 : index + %3451 = cmpi "slt", %3450, %c0 : index + %3452 = subi %c-1, %3450 : index + %3453 = select %3451, %3452, %3450 : index + %3454 = divi_signed %3453, %c2 : index + %3455 = subi %c-1, %3454 : index + %3456 = select %3451, %3455, %3454 : index + %3457 = muli %3456, %c-2 : index + %3458 = addi %3434, %3457 : index + %3459 = addi %3458, %c1 : index + store %3408, %2[%3419, %c0, %3459] : memref<16x6x2xvector<8xf32>> + %3460 = addi %arg5, %c8 : index + %3461 = cmpi "slt", %3460, %c0 : index + %3462 = subi %c-1, %3460 : index + %3463 = select %3461, %3462, %3460 : index + %3464 = divi_signed %3463, %c16 : index + %3465 = subi %c-1, %3464 : index + %3466 = select %3461, %3465, %3464 : index + %3467 = remi_signed %3466, %c16 : index + %3468 = cmpi "slt", %3467, %c0 : index + %3469 = addi %3467, %c16 : index + %3470 = select %3468, %3469, %3467 : index + %3471 = cmpi "slt", %arg5, %c0 : index + %3472 = subi %c-1, %arg5 : index + %3473 = select %3471, %3472, %arg5 : index + %3474 = divi_signed %3473, %c8 : index + %3475 = subi %c-1, %3474 : index + %3476 = select %3471, %3475, %3474 : index + %3477 = addi %arg5, %c8 : index + %3478 = cmpi "slt", %3477, %c0 : index + %3479 = subi %c-1, %3477 : index + %3480 = select %3478, %3479, %3477 : index + %3481 = divi_signed %3480, %c16 : index + %3482 = subi %c-1, %3481 : index + %3483 = select %3478, %3482, %3481 : index + %3484 = muli %3483, %c-2 : index + %3485 = addi %3476, %3484 : index + %3486 = cmpi "slt", %arg5, %c0 : index + %3487 = subi %c-1, %arg5 : index + %3488 = select %3486, %3487, %arg5 : index + %3489 = divi_signed %3488, %c8 : index + %3490 = subi %c-1, %3489 : index + %3491 = select %3486, %3490, %3489 : index + %3492 = addi %arg5, %c8 : index + %3493 = cmpi "slt", %3492, %c0 : index + %3494 = subi %c-1, %3492 : index + %3495 = select %3493, %3494, %3492 : index + %3496 = divi_signed %3495, %c16 : index + %3497 = subi %c-1, %3496 : index + %3498 = select %3493, %3497, %3496 : index + %3499 = muli %3498, %c-2 : index + %3500 = addi %3491, %3499 : index + %3501 = addi %3500, %c1 : index + %3502 = cmpi "slt", %3501, %c0 : index + %3503 = subi %c-1, %3501 : index + %3504 = select %3502, %3503, %3501 : index + %3505 = divi_signed %3504, %c2 : index + %3506 = subi %c-1, %3505 : index + %3507 = select %3502, %3506, %3505 : index + %3508 = muli %3507, %c-2 : index + %3509 = addi %3485, %3508 : index + %3510 = addi %3509, %c1 : index + %3511 = load %2[%3470, %c0, %3510] : memref<16x6x2xvector<8xf32>> + %3512 = vector.insertelement %2208, %3511[%c4_i64 : i64] : vector<8xf32> + %3513 = addi %arg5, %c8 : index + %3514 = cmpi "slt", %3513, %c0 : index + %3515 = subi %c-1, %3513 : index + %3516 = select %3514, %3515, %3513 : index + %3517 = divi_signed %3516, %c16 : index + %3518 = subi %c-1, %3517 : index + %3519 = select %3514, %3518, %3517 : index + %3520 = remi_signed %3519, %c16 : index + %3521 = cmpi "slt", %3520, %c0 : index + %3522 = addi %3520, %c16 : index + %3523 = select %3521, %3522, %3520 : index + %3524 = cmpi "slt", %arg5, %c0 : index + %3525 = subi %c-1, %arg5 : index + %3526 = select %3524, %3525, %arg5 : index + %3527 = divi_signed %3526, %c8 : index + %3528 = subi %c-1, %3527 : index + 
%3529 = select %3524, %3528, %3527 : index + %3530 = addi %arg5, %c8 : index + %3531 = cmpi "slt", %3530, %c0 : index + %3532 = subi %c-1, %3530 : index + %3533 = select %3531, %3532, %3530 : index + %3534 = divi_signed %3533, %c16 : index + %3535 = subi %c-1, %3534 : index + %3536 = select %3531, %3535, %3534 : index + %3537 = muli %3536, %c-2 : index + %3538 = addi %3529, %3537 : index + %3539 = cmpi "slt", %arg5, %c0 : index + %3540 = subi %c-1, %arg5 : index + %3541 = select %3539, %3540, %arg5 : index + %3542 = divi_signed %3541, %c8 : index + %3543 = subi %c-1, %3542 : index + %3544 = select %3539, %3543, %3542 : index + %3545 = addi %arg5, %c8 : index + %3546 = cmpi "slt", %3545, %c0 : index + %3547 = subi %c-1, %3545 : index + %3548 = select %3546, %3547, %3545 : index + %3549 = divi_signed %3548, %c16 : index + %3550 = subi %c-1, %3549 : index + %3551 = select %3546, %3550, %3549 : index + %3552 = muli %3551, %c-2 : index + %3553 = addi %3544, %3552 : index + %3554 = addi %3553, %c1 : index + %3555 = cmpi "slt", %3554, %c0 : index + %3556 = subi %c-1, %3554 : index + %3557 = select %3555, %3556, %3554 : index + %3558 = divi_signed %3557, %c2 : index + %3559 = subi %c-1, %3558 : index + %3560 = select %3555, %3559, %3558 : index + %3561 = muli %3560, %c-2 : index + %3562 = addi %3538, %3561 : index + %3563 = addi %3562, %c1 : index + store %3512, %2[%3523, %c0, %3563] : memref<16x6x2xvector<8xf32>> + %3564 = addi %arg5, %c8 : index + %3565 = cmpi "slt", %3564, %c0 : index + %3566 = subi %c-1, %3564 : index + %3567 = select %3565, %3566, %3564 : index + %3568 = divi_signed %3567, %c16 : index + %3569 = subi %c-1, %3568 : index + %3570 = select %3565, %3569, %3568 : index + %3571 = remi_signed %3570, %c16 : index + %3572 = cmpi "slt", %3571, %c0 : index + %3573 = addi %3571, %c16 : index + %3574 = select %3572, %3573, %3571 : index + %3575 = cmpi "slt", %arg5, %c0 : index + %3576 = subi %c-1, %arg5 : index + %3577 = select %3575, %3576, %arg5 : index + %3578 = divi_signed %3577, %c8 : index + %3579 = subi %c-1, %3578 : index + %3580 = select %3575, %3579, %3578 : index + %3581 = addi %arg5, %c8 : index + %3582 = cmpi "slt", %3581, %c0 : index + %3583 = subi %c-1, %3581 : index + %3584 = select %3582, %3583, %3581 : index + %3585 = divi_signed %3584, %c16 : index + %3586 = subi %c-1, %3585 : index + %3587 = select %3582, %3586, %3585 : index + %3588 = muli %3587, %c-2 : index + %3589 = addi %3580, %3588 : index + %3590 = cmpi "slt", %arg5, %c0 : index + %3591 = subi %c-1, %arg5 : index + %3592 = select %3590, %3591, %arg5 : index + %3593 = divi_signed %3592, %c8 : index + %3594 = subi %c-1, %3593 : index + %3595 = select %3590, %3594, %3593 : index + %3596 = addi %arg5, %c8 : index + %3597 = cmpi "slt", %3596, %c0 : index + %3598 = subi %c-1, %3596 : index + %3599 = select %3597, %3598, %3596 : index + %3600 = divi_signed %3599, %c16 : index + %3601 = subi %c-1, %3600 : index + %3602 = select %3597, %3601, %3600 : index + %3603 = muli %3602, %c-2 : index + %3604 = addi %3595, %3603 : index + %3605 = addi %3604, %c1 : index + %3606 = cmpi "slt", %3605, %c0 : index + %3607 = subi %c-1, %3605 : index + %3608 = select %3606, %3607, %3605 : index + %3609 = divi_signed %3608, %c2 : index + %3610 = subi %c-1, %3609 : index + %3611 = select %3606, %3610, %3609 : index + %3612 = muli %3611, %c-2 : index + %3613 = addi %3589, %3612 : index + %3614 = addi %3613, %c1 : index + %3615 = load %2[%3574, %c0, %3614] : memref<16x6x2xvector<8xf32>> + %3616 = vector.insertelement %2209, %3615[%c5_i64 : 
i64] : vector<8xf32> + %3617 = addi %arg5, %c8 : index + %3618 = cmpi "slt", %3617, %c0 : index + %3619 = subi %c-1, %3617 : index + %3620 = select %3618, %3619, %3617 : index + %3621 = divi_signed %3620, %c16 : index + %3622 = subi %c-1, %3621 : index + %3623 = select %3618, %3622, %3621 : index + %3624 = remi_signed %3623, %c16 : index + %3625 = cmpi "slt", %3624, %c0 : index + %3626 = addi %3624, %c16 : index + %3627 = select %3625, %3626, %3624 : index + %3628 = cmpi "slt", %arg5, %c0 : index + %3629 = subi %c-1, %arg5 : index + %3630 = select %3628, %3629, %arg5 : index + %3631 = divi_signed %3630, %c8 : index + %3632 = subi %c-1, %3631 : index + %3633 = select %3628, %3632, %3631 : index + %3634 = addi %arg5, %c8 : index + %3635 = cmpi "slt", %3634, %c0 : index + %3636 = subi %c-1, %3634 : index + %3637 = select %3635, %3636, %3634 : index + %3638 = divi_signed %3637, %c16 : index + %3639 = subi %c-1, %3638 : index + %3640 = select %3635, %3639, %3638 : index + %3641 = muli %3640, %c-2 : index + %3642 = addi %3633, %3641 : index + %3643 = cmpi "slt", %arg5, %c0 : index + %3644 = subi %c-1, %arg5 : index + %3645 = select %3643, %3644, %arg5 : index + %3646 = divi_signed %3645, %c8 : index + %3647 = subi %c-1, %3646 : index + %3648 = select %3643, %3647, %3646 : index + %3649 = addi %arg5, %c8 : index + %3650 = cmpi "slt", %3649, %c0 : index + %3651 = subi %c-1, %3649 : index + %3652 = select %3650, %3651, %3649 : index + %3653 = divi_signed %3652, %c16 : index + %3654 = subi %c-1, %3653 : index + %3655 = select %3650, %3654, %3653 : index + %3656 = muli %3655, %c-2 : index + %3657 = addi %3648, %3656 : index + %3658 = addi %3657, %c1 : index + %3659 = cmpi "slt", %3658, %c0 : index + %3660 = subi %c-1, %3658 : index + %3661 = select %3659, %3660, %3658 : index + %3662 = divi_signed %3661, %c2 : index + %3663 = subi %c-1, %3662 : index + %3664 = select %3659, %3663, %3662 : index + %3665 = muli %3664, %c-2 : index + %3666 = addi %3642, %3665 : index + %3667 = addi %3666, %c1 : index + store %3616, %2[%3627, %c0, %3667] : memref<16x6x2xvector<8xf32>> + %3668 = addi %arg5, %c8 : index + %3669 = cmpi "slt", %3668, %c0 : index + %3670 = subi %c-1, %3668 : index + %3671 = select %3669, %3670, %3668 : index + %3672 = divi_signed %3671, %c16 : index + %3673 = subi %c-1, %3672 : index + %3674 = select %3669, %3673, %3672 : index + %3675 = remi_signed %3674, %c16 : index + %3676 = cmpi "slt", %3675, %c0 : index + %3677 = addi %3675, %c16 : index + %3678 = select %3676, %3677, %3675 : index + %3679 = cmpi "slt", %arg5, %c0 : index + %3680 = subi %c-1, %arg5 : index + %3681 = select %3679, %3680, %arg5 : index + %3682 = divi_signed %3681, %c8 : index + %3683 = subi %c-1, %3682 : index + %3684 = select %3679, %3683, %3682 : index + %3685 = addi %arg5, %c8 : index + %3686 = cmpi "slt", %3685, %c0 : index + %3687 = subi %c-1, %3685 : index + %3688 = select %3686, %3687, %3685 : index + %3689 = divi_signed %3688, %c16 : index + %3690 = subi %c-1, %3689 : index + %3691 = select %3686, %3690, %3689 : index + %3692 = muli %3691, %c-2 : index + %3693 = addi %3684, %3692 : index + %3694 = cmpi "slt", %arg5, %c0 : index + %3695 = subi %c-1, %arg5 : index + %3696 = select %3694, %3695, %arg5 : index + %3697 = divi_signed %3696, %c8 : index + %3698 = subi %c-1, %3697 : index + %3699 = select %3694, %3698, %3697 : index + %3700 = addi %arg5, %c8 : index + %3701 = cmpi "slt", %3700, %c0 : index + %3702 = subi %c-1, %3700 : index + %3703 = select %3701, %3702, %3700 : index + %3704 = divi_signed %3703, %c16 : 
index + %3705 = subi %c-1, %3704 : index + %3706 = select %3701, %3705, %3704 : index + %3707 = muli %3706, %c-2 : index + %3708 = addi %3699, %3707 : index + %3709 = addi %3708, %c1 : index + %3710 = cmpi "slt", %3709, %c0 : index + %3711 = subi %c-1, %3709 : index + %3712 = select %3710, %3711, %3709 : index + %3713 = divi_signed %3712, %c2 : index + %3714 = subi %c-1, %3713 : index + %3715 = select %3710, %3714, %3713 : index + %3716 = muli %3715, %c-2 : index + %3717 = addi %3693, %3716 : index + %3718 = addi %3717, %c1 : index + %3719 = load %2[%3678, %c0, %3718] : memref<16x6x2xvector<8xf32>> + %3720 = vector.insertelement %2210, %3719[%c6_i64 : i64] : vector<8xf32> + %3721 = addi %arg5, %c8 : index + %3722 = cmpi "slt", %3721, %c0 : index + %3723 = subi %c-1, %3721 : index + %3724 = select %3722, %3723, %3721 : index + %3725 = divi_signed %3724, %c16 : index + %3726 = subi %c-1, %3725 : index + %3727 = select %3722, %3726, %3725 : index + %3728 = remi_signed %3727, %c16 : index + %3729 = cmpi "slt", %3728, %c0 : index + %3730 = addi %3728, %c16 : index + %3731 = select %3729, %3730, %3728 : index + %3732 = cmpi "slt", %arg5, %c0 : index + %3733 = subi %c-1, %arg5 : index + %3734 = select %3732, %3733, %arg5 : index + %3735 = divi_signed %3734, %c8 : index + %3736 = subi %c-1, %3735 : index + %3737 = select %3732, %3736, %3735 : index + %3738 = addi %arg5, %c8 : index + %3739 = cmpi "slt", %3738, %c0 : index + %3740 = subi %c-1, %3738 : index + %3741 = select %3739, %3740, %3738 : index + %3742 = divi_signed %3741, %c16 : index + %3743 = subi %c-1, %3742 : index + %3744 = select %3739, %3743, %3742 : index + %3745 = muli %3744, %c-2 : index + %3746 = addi %3737, %3745 : index + %3747 = cmpi "slt", %arg5, %c0 : index + %3748 = subi %c-1, %arg5 : index + %3749 = select %3747, %3748, %arg5 : index + %3750 = divi_signed %3749, %c8 : index + %3751 = subi %c-1, %3750 : index + %3752 = select %3747, %3751, %3750 : index + %3753 = addi %arg5, %c8 : index + %3754 = cmpi "slt", %3753, %c0 : index + %3755 = subi %c-1, %3753 : index + %3756 = select %3754, %3755, %3753 : index + %3757 = divi_signed %3756, %c16 : index + %3758 = subi %c-1, %3757 : index + %3759 = select %3754, %3758, %3757 : index + %3760 = muli %3759, %c-2 : index + %3761 = addi %3752, %3760 : index + %3762 = addi %3761, %c1 : index + %3763 = cmpi "slt", %3762, %c0 : index + %3764 = subi %c-1, %3762 : index + %3765 = select %3763, %3764, %3762 : index + %3766 = divi_signed %3765, %c2 : index + %3767 = subi %c-1, %3766 : index + %3768 = select %3763, %3767, %3766 : index + %3769 = muli %3768, %c-2 : index + %3770 = addi %3746, %3769 : index + %3771 = addi %3770, %c1 : index + store %3720, %2[%3731, %c0, %3771] : memref<16x6x2xvector<8xf32>> + %3772 = addi %arg5, %c8 : index + %3773 = cmpi "slt", %3772, %c0 : index + %3774 = subi %c-1, %3772 : index + %3775 = select %3773, %3774, %3772 : index + %3776 = divi_signed %3775, %c16 : index + %3777 = subi %c-1, %3776 : index + %3778 = select %3773, %3777, %3776 : index + %3779 = remi_signed %3778, %c16 : index + %3780 = cmpi "slt", %3779, %c0 : index + %3781 = addi %3779, %c16 : index + %3782 = select %3780, %3781, %3779 : index + %3783 = cmpi "slt", %arg5, %c0 : index + %3784 = subi %c-1, %arg5 : index + %3785 = select %3783, %3784, %arg5 : index + %3786 = divi_signed %3785, %c8 : index + %3787 = subi %c-1, %3786 : index + %3788 = select %3783, %3787, %3786 : index + %3789 = addi %arg5, %c8 : index + %3790 = cmpi "slt", %3789, %c0 : index + %3791 = subi %c-1, %3789 : index + %3792 = 
select %3790, %3791, %3789 : index + %3793 = divi_signed %3792, %c16 : index + %3794 = subi %c-1, %3793 : index + %3795 = select %3790, %3794, %3793 : index + %3796 = muli %3795, %c-2 : index + %3797 = addi %3788, %3796 : index + %3798 = cmpi "slt", %arg5, %c0 : index + %3799 = subi %c-1, %arg5 : index + %3800 = select %3798, %3799, %arg5 : index + %3801 = divi_signed %3800, %c8 : index + %3802 = subi %c-1, %3801 : index + %3803 = select %3798, %3802, %3801 : index + %3804 = addi %arg5, %c8 : index + %3805 = cmpi "slt", %3804, %c0 : index + %3806 = subi %c-1, %3804 : index + %3807 = select %3805, %3806, %3804 : index + %3808 = divi_signed %3807, %c16 : index + %3809 = subi %c-1, %3808 : index + %3810 = select %3805, %3809, %3808 : index + %3811 = muli %3810, %c-2 : index + %3812 = addi %3803, %3811 : index + %3813 = addi %3812, %c1 : index + %3814 = cmpi "slt", %3813, %c0 : index + %3815 = subi %c-1, %3813 : index + %3816 = select %3814, %3815, %3813 : index + %3817 = divi_signed %3816, %c2 : index + %3818 = subi %c-1, %3817 : index + %3819 = select %3814, %3818, %3817 : index + %3820 = muli %3819, %c-2 : index + %3821 = addi %3797, %3820 : index + %3822 = addi %3821, %c1 : index + %3823 = load %2[%3782, %c0, %3822] : memref<16x6x2xvector<8xf32>> + %3824 = vector.insertelement %2211, %3823[%c7_i64 : i64] : vector<8xf32> + %3825 = addi %arg5, %c8 : index + %3826 = cmpi "slt", %3825, %c0 : index + %3827 = subi %c-1, %3825 : index + %3828 = select %3826, %3827, %3825 : index + %3829 = divi_signed %3828, %c16 : index + %3830 = subi %c-1, %3829 : index + %3831 = select %3826, %3830, %3829 : index + %3832 = remi_signed %3831, %c16 : index + %3833 = cmpi "slt", %3832, %c0 : index + %3834 = addi %3832, %c16 : index + %3835 = select %3833, %3834, %3832 : index + %3836 = cmpi "slt", %arg5, %c0 : index + %3837 = subi %c-1, %arg5 : index + %3838 = select %3836, %3837, %arg5 : index + %3839 = divi_signed %3838, %c8 : index + %3840 = subi %c-1, %3839 : index + %3841 = select %3836, %3840, %3839 : index + %3842 = addi %arg5, %c8 : index + %3843 = cmpi "slt", %3842, %c0 : index + %3844 = subi %c-1, %3842 : index + %3845 = select %3843, %3844, %3842 : index + %3846 = divi_signed %3845, %c16 : index + %3847 = subi %c-1, %3846 : index + %3848 = select %3843, %3847, %3846 : index + %3849 = muli %3848, %c-2 : index + %3850 = addi %3841, %3849 : index + %3851 = cmpi "slt", %arg5, %c0 : index + %3852 = subi %c-1, %arg5 : index + %3853 = select %3851, %3852, %arg5 : index + %3854 = divi_signed %3853, %c8 : index + %3855 = subi %c-1, %3854 : index + %3856 = select %3851, %3855, %3854 : index + %3857 = addi %arg5, %c8 : index + %3858 = cmpi "slt", %3857, %c0 : index + %3859 = subi %c-1, %3857 : index + %3860 = select %3858, %3859, %3857 : index + %3861 = divi_signed %3860, %c16 : index + %3862 = subi %c-1, %3861 : index + %3863 = select %3858, %3862, %3861 : index + %3864 = muli %3863, %c-2 : index + %3865 = addi %3856, %3864 : index + %3866 = addi %3865, %c1 : index + %3867 = cmpi "slt", %3866, %c0 : index + %3868 = subi %c-1, %3866 : index + %3869 = select %3867, %3868, %3866 : index + %3870 = divi_signed %3869, %c2 : index + %3871 = subi %c-1, %3870 : index + %3872 = select %3867, %3871, %3870 : index + %3873 = muli %3872, %c-2 : index + %3874 = addi %3850, %3873 : index + %3875 = addi %3874, %c1 : index + store %3824, %2[%3835, %c0, %3875] : memref<16x6x2xvector<8xf32>> + } + } + } + scf.for %arg5 = %c0 to %c256 step %c128 { + scf.if %true { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read 
%arg2[%arg4, %4], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed %11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %arg3, %arg5 : index + %33 = addi %32, %c8 : index + %34 = vector.transfer_read %arg2[%arg4, %33], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %35 = addi %arg5, %c8 : index + %36 = cmpi "slt", %35, %c0 : index + %37 = subi %c-1, %35 : index + %38 = select %36, %37, %35 : index + %39 = divi_signed %38, %c16 : index + %40 = subi %c-1, %39 : index + %41 = select %36, %40, %39 : index + %42 = remi_signed %41, %c16 : index + %43 = cmpi "slt", %42, %c0 : index + %44 = addi %42, %c16 : index + %45 = select %43, %44, %42 : index + %46 = cmpi "slt", %arg5, %c0 : index + %47 = subi %c-1, %arg5 : index + %48 = select %46, %47, %arg5 : index + %49 = divi_signed %48, %c8 : index + %50 = subi %c-1, %49 : index + %51 = select %46, %50, %49 : index + %52 = addi %arg5, %c8 : index + %53 = cmpi "slt", %52, %c0 : index + %54 = subi %c-1, %52 : index + %55 = select %53, %54, %52 : index + %56 = divi_signed %55, %c16 : index + %57 = subi %c-1, %56 : index + %58 = select %53, %57, %56 : index + %59 = muli %58, %c-2 : index + %60 = addi %51, %59 : index + %61 = cmpi "slt", %arg5, %c0 : index + %62 = subi %c-1, %arg5 : index + %63 = select %61, %62, %arg5 : index + %64 = divi_signed %63, %c8 : index + %65 = subi %c-1, %64 : index + %66 = select %61, %65, %64 : index + %67 = addi %arg5, %c8 : index + %68 = cmpi "slt", %67, %c0 : index + %69 = subi %c-1, %67 : index + %70 = select %68, %69, %67 : index + %71 = divi_signed %70, %c16 : index + %72 = subi %c-1, %71 : index + %73 = select %68, %72, %71 : index + %74 = muli %73, %c-2 : index + %75 = addi %66, %74 : index + %76 = addi %75, %c1 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = subi %c-1, %76 : index + %79 = select %77, %78, %76 : index + %80 = divi_signed %79, %c2 : index + %81 = subi %c-1, %80 : index + %82 = select %77, %81, %80 : index + %83 = muli %82, %c-2 : index + %84 = addi %60, %83 : index + %85 = addi %84, %c1 : index + %86 = load %2[%45, %c0, %85] : memref<16x6x2xvector<8xf32>> + %87 = addf %34, %86 : vector<8xf32> + store %87, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %88 = addi %arg3, %arg5 : index + %89 = addi %88, %c16 : index + %90 = vector.transfer_read %arg2[%arg4, %89], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %91 = cmpi "slt", %arg5, %c0 : index + %92 = subi %c-1, %arg5 : index + %93 = select %91, 
%92, %arg5 : index + %94 = divi_signed %93, %c16 : index + %95 = subi %c-1, %94 : index + %96 = select %91, %95, %94 : index + %97 = cmpi "slt", %arg5, %c0 : index + %98 = subi %c-1, %arg5 : index + %99 = select %97, %98, %arg5 : index + %100 = divi_signed %99, %c16 : index + %101 = subi %c-1, %100 : index + %102 = select %97, %101, %100 : index + %103 = addi %102, %c1 : index + %104 = cmpi "slt", %103, %c0 : index + %105 = subi %c-1, %103 : index + %106 = select %104, %105, %103 : index + %107 = divi_signed %106, %c16 : index + %108 = subi %c-1, %107 : index + %109 = select %104, %108, %107 : index + %110 = muli %109, %c-16 : index + %111 = addi %96, %110 : index + %112 = addi %111, %c1 : index + %113 = remi_signed %arg5, %c16 : index + %114 = cmpi "slt", %113, %c0 : index + %115 = addi %113, %c16 : index + %116 = select %114, %115, %113 : index + %117 = cmpi "slt", %116, %c0 : index + %118 = subi %c-1, %116 : index + %119 = select %117, %118, %116 : index + %120 = divi_signed %119, %c8 : index + %121 = subi %c-1, %120 : index + %122 = select %117, %121, %120 : index + %123 = remi_signed %122, %c2 : index + %124 = cmpi "slt", %123, %c0 : index + %125 = addi %123, %c2 : index + %126 = select %124, %125, %123 : index + %127 = load %2[%112, %c0, %126] : memref<16x6x2xvector<8xf32>> + %128 = addf %90, %127 : vector<8xf32> + store %128, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %129 = addi %arg3, %arg5 : index + %130 = addi %129, %c24 : index + %131 = vector.transfer_read %arg2[%arg4, %130], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %132 = addi %arg5, %c24 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c16 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = remi_signed %138, %c16 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = addi %139, %c16 : index + %142 = select %140, %141, %139 : index + %143 = cmpi "slt", %arg5, %c0 : index + %144 = subi %c-1, %arg5 : index + %145 = select %143, %144, %arg5 : index + %146 = divi_signed %145, %c8 : index + %147 = subi %c-1, %146 : index + %148 = select %143, %147, %146 : index + %149 = addi %arg5, %c24 : index + %150 = cmpi "slt", %149, %c0 : index + %151 = subi %c-1, %149 : index + %152 = select %150, %151, %149 : index + %153 = divi_signed %152, %c16 : index + %154 = subi %c-1, %153 : index + %155 = select %150, %154, %153 : index + %156 = muli %155, %c-2 : index + %157 = addi %148, %156 : index + %158 = cmpi "slt", %arg5, %c0 : index + %159 = subi %c-1, %arg5 : index + %160 = select %158, %159, %arg5 : index + %161 = divi_signed %160, %c8 : index + %162 = subi %c-1, %161 : index + %163 = select %158, %162, %161 : index + %164 = addi %arg5, %c24 : index + %165 = cmpi "slt", %164, %c0 : index + %166 = subi %c-1, %164 : index + %167 = select %165, %166, %164 : index + %168 = divi_signed %167, %c16 : index + %169 = subi %c-1, %168 : index + %170 = select %165, %169, %168 : index + %171 = muli %170, %c-2 : index + %172 = addi %163, %171 : index + %173 = addi %172, %c3 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %157, %180 : index + %182 = addi %181, %c3 : index + %183 = load %2[%142, %c0, %182] : 
memref<16x6x2xvector<8xf32>> + %184 = addf %131, %183 : vector<8xf32> + store %184, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %185 = addi %arg3, %arg5 : index + %186 = addi %185, %c32 : index + %187 = vector.transfer_read %arg2[%arg4, %186], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %188 = cmpi "slt", %arg5, %c0 : index + %189 = subi %c-1, %arg5 : index + %190 = select %188, %189, %arg5 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = cmpi "slt", %arg5, %c0 : index + %195 = subi %c-1, %arg5 : index + %196 = select %194, %195, %arg5 : index + %197 = divi_signed %196, %c16 : index + %198 = subi %c-1, %197 : index + %199 = select %194, %198, %197 : index + %200 = addi %199, %c2 : index + %201 = cmpi "slt", %200, %c0 : index + %202 = subi %c-1, %200 : index + %203 = select %201, %202, %200 : index + %204 = divi_signed %203, %c16 : index + %205 = subi %c-1, %204 : index + %206 = select %201, %205, %204 : index + %207 = muli %206, %c-16 : index + %208 = addi %193, %207 : index + %209 = addi %208, %c2 : index + %210 = remi_signed %arg5, %c16 : index + %211 = cmpi "slt", %210, %c0 : index + %212 = addi %210, %c16 : index + %213 = select %211, %212, %210 : index + %214 = cmpi "slt", %213, %c0 : index + %215 = subi %c-1, %213 : index + %216 = select %214, %215, %213 : index + %217 = divi_signed %216, %c8 : index + %218 = subi %c-1, %217 : index + %219 = select %214, %218, %217 : index + %220 = remi_signed %219, %c2 : index + %221 = cmpi "slt", %220, %c0 : index + %222 = addi %220, %c2 : index + %223 = select %221, %222, %220 : index + %224 = load %2[%209, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %187, %224 : vector<8xf32> + store %225, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %226 = addi %arg3, %arg5 : index + %227 = addi %226, %c40 : index + %228 = vector.transfer_read %arg2[%arg4, %227], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %229 = addi %arg5, %c40 : index + %230 = cmpi "slt", %229, %c0 : index + %231 = subi %c-1, %229 : index + %232 = select %230, %231, %229 : index + %233 = divi_signed %232, %c16 : index + %234 = subi %c-1, %233 : index + %235 = select %230, %234, %233 : index + %236 = remi_signed %235, %c16 : index + %237 = cmpi "slt", %236, %c0 : index + %238 = addi %236, %c16 : index + %239 = select %237, %238, %236 : index + %240 = cmpi "slt", %arg5, %c0 : index + %241 = subi %c-1, %arg5 : index + %242 = select %240, %241, %arg5 : index + %243 = divi_signed %242, %c8 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = addi %arg5, %c40 : index + %247 = cmpi "slt", %246, %c0 : index + %248 = subi %c-1, %246 : index + %249 = select %247, %248, %246 : index + %250 = divi_signed %249, %c16 : index + %251 = subi %c-1, %250 : index + %252 = select %247, %251, %250 : index + %253 = muli %252, %c-2 : index + %254 = addi %245, %253 : index + %255 = cmpi "slt", %arg5, %c0 : index + %256 = subi %c-1, %arg5 : index + %257 = select %255, %256, %arg5 : index + %258 = divi_signed %257, %c8 : index + %259 = subi %c-1, %258 : index + %260 = select %255, %259, %258 : index + %261 = addi %arg5, %c40 : index + %262 = cmpi "slt", %261, %c0 : index + %263 = subi %c-1, %261 : index + %264 = select %262, %263, %261 : index + %265 = divi_signed %264, %c16 : index + %266 = subi %c-1, %265 : index + %267 = select %262, %266, %265 : index + %268 = muli 
%267, %c-2 : index + %269 = addi %260, %268 : index + %270 = addi %269, %c5 : index + %271 = cmpi "slt", %270, %c0 : index + %272 = subi %c-1, %270 : index + %273 = select %271, %272, %270 : index + %274 = divi_signed %273, %c2 : index + %275 = subi %c-1, %274 : index + %276 = select %271, %275, %274 : index + %277 = muli %276, %c-2 : index + %278 = addi %254, %277 : index + %279 = addi %278, %c5 : index + %280 = load %2[%239, %c0, %279] : memref<16x6x2xvector<8xf32>> + %281 = addf %228, %280 : vector<8xf32> + store %281, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %282 = addi %arg3, %arg5 : index + %283 = addi %282, %c48 : index + %284 = vector.transfer_read %arg2[%arg4, %283], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %285 = cmpi "slt", %arg5, %c0 : index + %286 = subi %c-1, %arg5 : index + %287 = select %285, %286, %arg5 : index + %288 = divi_signed %287, %c16 : index + %289 = subi %c-1, %288 : index + %290 = select %285, %289, %288 : index + %291 = cmpi "slt", %arg5, %c0 : index + %292 = subi %c-1, %arg5 : index + %293 = select %291, %292, %arg5 : index + %294 = divi_signed %293, %c16 : index + %295 = subi %c-1, %294 : index + %296 = select %291, %295, %294 : index + %297 = addi %296, %c3 : index + %298 = cmpi "slt", %297, %c0 : index + %299 = subi %c-1, %297 : index + %300 = select %298, %299, %297 : index + %301 = divi_signed %300, %c16 : index + %302 = subi %c-1, %301 : index + %303 = select %298, %302, %301 : index + %304 = muli %303, %c-16 : index + %305 = addi %290, %304 : index + %306 = addi %305, %c3 : index + %307 = remi_signed %arg5, %c16 : index + %308 = cmpi "slt", %307, %c0 : index + %309 = addi %307, %c16 : index + %310 = select %308, %309, %307 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c8 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = remi_signed %316, %c2 : index + %318 = cmpi "slt", %317, %c0 : index + %319 = addi %317, %c2 : index + %320 = select %318, %319, %317 : index + %321 = load %2[%306, %c0, %320] : memref<16x6x2xvector<8xf32>> + %322 = addf %284, %321 : vector<8xf32> + store %322, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %323 = addi %arg3, %arg5 : index + %324 = addi %323, %c56 : index + %325 = vector.transfer_read %arg2[%arg4, %324], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %326 = addi %arg5, %c56 : index + %327 = cmpi "slt", %326, %c0 : index + %328 = subi %c-1, %326 : index + %329 = select %327, %328, %326 : index + %330 = divi_signed %329, %c16 : index + %331 = subi %c-1, %330 : index + %332 = select %327, %331, %330 : index + %333 = remi_signed %332, %c16 : index + %334 = cmpi "slt", %333, %c0 : index + %335 = addi %333, %c16 : index + %336 = select %334, %335, %333 : index + %337 = cmpi "slt", %arg5, %c0 : index + %338 = subi %c-1, %arg5 : index + %339 = select %337, %338, %arg5 : index + %340 = divi_signed %339, %c8 : index + %341 = subi %c-1, %340 : index + %342 = select %337, %341, %340 : index + %343 = addi %arg5, %c56 : index + %344 = cmpi "slt", %343, %c0 : index + %345 = subi %c-1, %343 : index + %346 = select %344, %345, %343 : index + %347 = divi_signed %346, %c16 : index + %348 = subi %c-1, %347 : index + %349 = select %344, %348, %347 : index + %350 = muli %349, %c-2 : index + %351 = addi %342, %350 : index + %352 = cmpi "slt", %arg5, %c0 : index + %353 = subi %c-1, 
%arg5 : index + %354 = select %352, %353, %arg5 : index + %355 = divi_signed %354, %c8 : index + %356 = subi %c-1, %355 : index + %357 = select %352, %356, %355 : index + %358 = addi %arg5, %c56 : index + %359 = cmpi "slt", %358, %c0 : index + %360 = subi %c-1, %358 : index + %361 = select %359, %360, %358 : index + %362 = divi_signed %361, %c16 : index + %363 = subi %c-1, %362 : index + %364 = select %359, %363, %362 : index + %365 = muli %364, %c-2 : index + %366 = addi %357, %365 : index + %367 = addi %366, %c7 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = subi %c-1, %367 : index + %370 = select %368, %369, %367 : index + %371 = divi_signed %370, %c2 : index + %372 = subi %c-1, %371 : index + %373 = select %368, %372, %371 : index + %374 = muli %373, %c-2 : index + %375 = addi %351, %374 : index + %376 = addi %375, %c7 : index + %377 = load %2[%336, %c0, %376] : memref<16x6x2xvector<8xf32>> + %378 = addf %325, %377 : vector<8xf32> + store %378, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %379 = addi %arg3, %arg5 : index + %380 = addi %379, %c64 : index + %381 = vector.transfer_read %arg2[%arg4, %380], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %382 = cmpi "slt", %arg5, %c0 : index + %383 = subi %c-1, %arg5 : index + %384 = select %382, %383, %arg5 : index + %385 = divi_signed %384, %c16 : index + %386 = subi %c-1, %385 : index + %387 = select %382, %386, %385 : index + %388 = cmpi "slt", %arg5, %c0 : index + %389 = subi %c-1, %arg5 : index + %390 = select %388, %389, %arg5 : index + %391 = divi_signed %390, %c16 : index + %392 = subi %c-1, %391 : index + %393 = select %388, %392, %391 : index + %394 = addi %393, %c4 : index + %395 = cmpi "slt", %394, %c0 : index + %396 = subi %c-1, %394 : index + %397 = select %395, %396, %394 : index + %398 = divi_signed %397, %c16 : index + %399 = subi %c-1, %398 : index + %400 = select %395, %399, %398 : index + %401 = muli %400, %c-16 : index + %402 = addi %387, %401 : index + %403 = addi %402, %c4 : index + %404 = remi_signed %arg5, %c16 : index + %405 = cmpi "slt", %404, %c0 : index + %406 = addi %404, %c16 : index + %407 = select %405, %406, %404 : index + %408 = cmpi "slt", %407, %c0 : index + %409 = subi %c-1, %407 : index + %410 = select %408, %409, %407 : index + %411 = divi_signed %410, %c8 : index + %412 = subi %c-1, %411 : index + %413 = select %408, %412, %411 : index + %414 = remi_signed %413, %c2 : index + %415 = cmpi "slt", %414, %c0 : index + %416 = addi %414, %c2 : index + %417 = select %415, %416, %414 : index + %418 = load %2[%403, %c0, %417] : memref<16x6x2xvector<8xf32>> + %419 = addf %381, %418 : vector<8xf32> + store %419, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %420 = addi %arg3, %arg5 : index + %421 = addi %420, %c72 : index + %422 = vector.transfer_read %arg2[%arg4, %421], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %423 = addi %arg5, %c72 : index + %424 = cmpi "slt", %423, %c0 : index + %425 = subi %c-1, %423 : index + %426 = select %424, %425, %423 : index + %427 = divi_signed %426, %c16 : index + %428 = subi %c-1, %427 : index + %429 = select %424, %428, %427 : index + %430 = remi_signed %429, %c16 : index + %431 = cmpi "slt", %430, %c0 : index + %432 = addi %430, %c16 : index + %433 = select %431, %432, %430 : index + %434 = cmpi "slt", %arg5, %c0 : index + %435 = subi %c-1, %arg5 : index + %436 = select %434, %435, %arg5 : index + %437 = divi_signed %436, %c8 : index + %438 = subi %c-1, 
%437 : index + %439 = select %434, %438, %437 : index + %440 = addi %arg5, %c72 : index + %441 = cmpi "slt", %440, %c0 : index + %442 = subi %c-1, %440 : index + %443 = select %441, %442, %440 : index + %444 = divi_signed %443, %c16 : index + %445 = subi %c-1, %444 : index + %446 = select %441, %445, %444 : index + %447 = muli %446, %c-2 : index + %448 = addi %439, %447 : index + %449 = cmpi "slt", %arg5, %c0 : index + %450 = subi %c-1, %arg5 : index + %451 = select %449, %450, %arg5 : index + %452 = divi_signed %451, %c8 : index + %453 = subi %c-1, %452 : index + %454 = select %449, %453, %452 : index + %455 = addi %arg5, %c72 : index + %456 = cmpi "slt", %455, %c0 : index + %457 = subi %c-1, %455 : index + %458 = select %456, %457, %455 : index + %459 = divi_signed %458, %c16 : index + %460 = subi %c-1, %459 : index + %461 = select %456, %460, %459 : index + %462 = muli %461, %c-2 : index + %463 = addi %454, %462 : index + %464 = addi %463, %c9 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = subi %c-1, %464 : index + %467 = select %465, %466, %464 : index + %468 = divi_signed %467, %c2 : index + %469 = subi %c-1, %468 : index + %470 = select %465, %469, %468 : index + %471 = muli %470, %c-2 : index + %472 = addi %448, %471 : index + %473 = addi %472, %c9 : index + %474 = load %2[%433, %c0, %473] : memref<16x6x2xvector<8xf32>> + %475 = addf %422, %474 : vector<8xf32> + store %475, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %476 = addi %arg3, %arg5 : index + %477 = addi %476, %c80 : index + %478 = vector.transfer_read %arg2[%arg4, %477], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %479 = cmpi "slt", %arg5, %c0 : index + %480 = subi %c-1, %arg5 : index + %481 = select %479, %480, %arg5 : index + %482 = divi_signed %481, %c16 : index + %483 = subi %c-1, %482 : index + %484 = select %479, %483, %482 : index + %485 = cmpi "slt", %arg5, %c0 : index + %486 = subi %c-1, %arg5 : index + %487 = select %485, %486, %arg5 : index + %488 = divi_signed %487, %c16 : index + %489 = subi %c-1, %488 : index + %490 = select %485, %489, %488 : index + %491 = addi %490, %c5 : index + %492 = cmpi "slt", %491, %c0 : index + %493 = subi %c-1, %491 : index + %494 = select %492, %493, %491 : index + %495 = divi_signed %494, %c16 : index + %496 = subi %c-1, %495 : index + %497 = select %492, %496, %495 : index + %498 = muli %497, %c-16 : index + %499 = addi %484, %498 : index + %500 = addi %499, %c5 : index + %501 = remi_signed %arg5, %c16 : index + %502 = cmpi "slt", %501, %c0 : index + %503 = addi %501, %c16 : index + %504 = select %502, %503, %501 : index + %505 = cmpi "slt", %504, %c0 : index + %506 = subi %c-1, %504 : index + %507 = select %505, %506, %504 : index + %508 = divi_signed %507, %c8 : index + %509 = subi %c-1, %508 : index + %510 = select %505, %509, %508 : index + %511 = remi_signed %510, %c2 : index + %512 = cmpi "slt", %511, %c0 : index + %513 = addi %511, %c2 : index + %514 = select %512, %513, %511 : index + %515 = load %2[%500, %c0, %514] : memref<16x6x2xvector<8xf32>> + %516 = addf %478, %515 : vector<8xf32> + store %516, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %517 = addi %arg3, %arg5 : index + %518 = addi %517, %c88 : index + %519 = vector.transfer_read %arg2[%arg4, %518], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %520 = addi %arg5, %c88 : index + %521 = cmpi "slt", %520, %c0 : index + %522 = subi %c-1, %520 : index + %523 = select %521, %522, %520 : index 
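+ // Annotation (not emitted by the compiler): in the surrounding generated IR, each
+ // cmpi "slt" / subi %c-1 / select / divi_signed / subi / select sequence is the expanded form of a
+ // signed floor-division, and remi_signed followed by cmpi / addi / select produces a non-negative
+ // modulo. These expressions compute indices into the cached accumulator tile
+ // %2 : memref<16x6x2xvector<8xf32>>; its vectors are added to rows read from %arg2 and staged in
+ // %1 : memref<1x16xvector<8xf32>> before being written back to the output.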
+ %524 = divi_signed %523, %c16 : index + %525 = subi %c-1, %524 : index + %526 = select %521, %525, %524 : index + %527 = remi_signed %526, %c16 : index + %528 = cmpi "slt", %527, %c0 : index + %529 = addi %527, %c16 : index + %530 = select %528, %529, %527 : index + %531 = cmpi "slt", %arg5, %c0 : index + %532 = subi %c-1, %arg5 : index + %533 = select %531, %532, %arg5 : index + %534 = divi_signed %533, %c8 : index + %535 = subi %c-1, %534 : index + %536 = select %531, %535, %534 : index + %537 = addi %arg5, %c88 : index + %538 = cmpi "slt", %537, %c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-2 : index + %545 = addi %536, %544 : index + %546 = cmpi "slt", %arg5, %c0 : index + %547 = subi %c-1, %arg5 : index + %548 = select %546, %547, %arg5 : index + %549 = divi_signed %548, %c8 : index + %550 = subi %c-1, %549 : index + %551 = select %546, %550, %549 : index + %552 = addi %arg5, %c88 : index + %553 = cmpi "slt", %552, %c0 : index + %554 = subi %c-1, %552 : index + %555 = select %553, %554, %552 : index + %556 = divi_signed %555, %c16 : index + %557 = subi %c-1, %556 : index + %558 = select %553, %557, %556 : index + %559 = muli %558, %c-2 : index + %560 = addi %551, %559 : index + %561 = addi %560, %c11 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = subi %c-1, %561 : index + %564 = select %562, %563, %561 : index + %565 = divi_signed %564, %c2 : index + %566 = subi %c-1, %565 : index + %567 = select %562, %566, %565 : index + %568 = muli %567, %c-2 : index + %569 = addi %545, %568 : index + %570 = addi %569, %c11 : index + %571 = load %2[%530, %c0, %570] : memref<16x6x2xvector<8xf32>> + %572 = addf %519, %571 : vector<8xf32> + store %572, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %573 = addi %arg3, %arg5 : index + %574 = addi %573, %c96 : index + %575 = vector.transfer_read %arg2[%arg4, %574], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %576 = cmpi "slt", %arg5, %c0 : index + %577 = subi %c-1, %arg5 : index + %578 = select %576, %577, %arg5 : index + %579 = divi_signed %578, %c16 : index + %580 = subi %c-1, %579 : index + %581 = select %576, %580, %579 : index + %582 = cmpi "slt", %arg5, %c0 : index + %583 = subi %c-1, %arg5 : index + %584 = select %582, %583, %arg5 : index + %585 = divi_signed %584, %c16 : index + %586 = subi %c-1, %585 : index + %587 = select %582, %586, %585 : index + %588 = addi %587, %c6 : index + %589 = cmpi "slt", %588, %c0 : index + %590 = subi %c-1, %588 : index + %591 = select %589, %590, %588 : index + %592 = divi_signed %591, %c16 : index + %593 = subi %c-1, %592 : index + %594 = select %589, %593, %592 : index + %595 = muli %594, %c-16 : index + %596 = addi %581, %595 : index + %597 = addi %596, %c6 : index + %598 = remi_signed %arg5, %c16 : index + %599 = cmpi "slt", %598, %c0 : index + %600 = addi %598, %c16 : index + %601 = select %599, %600, %598 : index + %602 = cmpi "slt", %601, %c0 : index + %603 = subi %c-1, %601 : index + %604 = select %602, %603, %601 : index + %605 = divi_signed %604, %c8 : index + %606 = subi %c-1, %605 : index + %607 = select %602, %606, %605 : index + %608 = remi_signed %607, %c2 : index + %609 = cmpi "slt", %608, %c0 : index + %610 = addi %608, %c2 : index + %611 = select %609, %610, %608 : index + %612 = load %2[%597, %c0, %611] : memref<16x6x2xvector<8xf32>> + %613 = addf %575, 
%612 : vector<8xf32> + store %613, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %614 = addi %arg3, %arg5 : index + %615 = addi %614, %c104 : index + %616 = vector.transfer_read %arg2[%arg4, %615], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %617 = addi %arg5, %c104 : index + %618 = cmpi "slt", %617, %c0 : index + %619 = subi %c-1, %617 : index + %620 = select %618, %619, %617 : index + %621 = divi_signed %620, %c16 : index + %622 = subi %c-1, %621 : index + %623 = select %618, %622, %621 : index + %624 = remi_signed %623, %c16 : index + %625 = cmpi "slt", %624, %c0 : index + %626 = addi %624, %c16 : index + %627 = select %625, %626, %624 : index + %628 = cmpi "slt", %arg5, %c0 : index + %629 = subi %c-1, %arg5 : index + %630 = select %628, %629, %arg5 : index + %631 = divi_signed %630, %c8 : index + %632 = subi %c-1, %631 : index + %633 = select %628, %632, %631 : index + %634 = addi %arg5, %c104 : index + %635 = cmpi "slt", %634, %c0 : index + %636 = subi %c-1, %634 : index + %637 = select %635, %636, %634 : index + %638 = divi_signed %637, %c16 : index + %639 = subi %c-1, %638 : index + %640 = select %635, %639, %638 : index + %641 = muli %640, %c-2 : index + %642 = addi %633, %641 : index + %643 = cmpi "slt", %arg5, %c0 : index + %644 = subi %c-1, %arg5 : index + %645 = select %643, %644, %arg5 : index + %646 = divi_signed %645, %c8 : index + %647 = subi %c-1, %646 : index + %648 = select %643, %647, %646 : index + %649 = addi %arg5, %c104 : index + %650 = cmpi "slt", %649, %c0 : index + %651 = subi %c-1, %649 : index + %652 = select %650, %651, %649 : index + %653 = divi_signed %652, %c16 : index + %654 = subi %c-1, %653 : index + %655 = select %650, %654, %653 : index + %656 = muli %655, %c-2 : index + %657 = addi %648, %656 : index + %658 = addi %657, %c13 : index + %659 = cmpi "slt", %658, %c0 : index + %660 = subi %c-1, %658 : index + %661 = select %659, %660, %658 : index + %662 = divi_signed %661, %c2 : index + %663 = subi %c-1, %662 : index + %664 = select %659, %663, %662 : index + %665 = muli %664, %c-2 : index + %666 = addi %642, %665 : index + %667 = addi %666, %c13 : index + %668 = load %2[%627, %c0, %667] : memref<16x6x2xvector<8xf32>> + %669 = addf %616, %668 : vector<8xf32> + store %669, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %670 = addi %arg3, %arg5 : index + %671 = addi %670, %c112 : index + %672 = vector.transfer_read %arg2[%arg4, %671], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %673 = cmpi "slt", %arg5, %c0 : index + %674 = subi %c-1, %arg5 : index + %675 = select %673, %674, %arg5 : index + %676 = divi_signed %675, %c16 : index + %677 = subi %c-1, %676 : index + %678 = select %673, %677, %676 : index + %679 = cmpi "slt", %arg5, %c0 : index + %680 = subi %c-1, %arg5 : index + %681 = select %679, %680, %arg5 : index + %682 = divi_signed %681, %c16 : index + %683 = subi %c-1, %682 : index + %684 = select %679, %683, %682 : index + %685 = addi %684, %c7 : index + %686 = cmpi "slt", %685, %c0 : index + %687 = subi %c-1, %685 : index + %688 = select %686, %687, %685 : index + %689 = divi_signed %688, %c16 : index + %690 = subi %c-1, %689 : index + %691 = select %686, %690, %689 : index + %692 = muli %691, %c-16 : index + %693 = addi %678, %692 : index + %694 = addi %693, %c7 : index + %695 = remi_signed %arg5, %c16 : index + %696 = cmpi "slt", %695, %c0 : index + %697 = addi %695, %c16 : index + %698 = select %696, %697, %695 : index + %699 = 
cmpi "slt", %698, %c0 : index + %700 = subi %c-1, %698 : index + %701 = select %699, %700, %698 : index + %702 = divi_signed %701, %c8 : index + %703 = subi %c-1, %702 : index + %704 = select %699, %703, %702 : index + %705 = remi_signed %704, %c2 : index + %706 = cmpi "slt", %705, %c0 : index + %707 = addi %705, %c2 : index + %708 = select %706, %707, %705 : index + %709 = load %2[%694, %c0, %708] : memref<16x6x2xvector<8xf32>> + %710 = addf %672, %709 : vector<8xf32> + store %710, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %711 = addi %arg3, %arg5 : index + %712 = addi %711, %c120 : index + %713 = vector.transfer_read %arg2[%arg4, %712], %cst {masked = [false]} : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %714 = addi %arg5, %c120 : index + %715 = cmpi "slt", %714, %c0 : index + %716 = subi %c-1, %714 : index + %717 = select %715, %716, %714 : index + %718 = divi_signed %717, %c16 : index + %719 = subi %c-1, %718 : index + %720 = select %715, %719, %718 : index + %721 = remi_signed %720, %c16 : index + %722 = cmpi "slt", %721, %c0 : index + %723 = addi %721, %c16 : index + %724 = select %722, %723, %721 : index + %725 = cmpi "slt", %arg5, %c0 : index + %726 = subi %c-1, %arg5 : index + %727 = select %725, %726, %arg5 : index + %728 = divi_signed %727, %c8 : index + %729 = subi %c-1, %728 : index + %730 = select %725, %729, %728 : index + %731 = addi %arg5, %c120 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c16 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = muli %737, %c-2 : index + %739 = addi %730, %738 : index + %740 = cmpi "slt", %arg5, %c0 : index + %741 = subi %c-1, %arg5 : index + %742 = select %740, %741, %arg5 : index + %743 = divi_signed %742, %c8 : index + %744 = subi %c-1, %743 : index + %745 = select %740, %744, %743 : index + %746 = addi %arg5, %c120 : index + %747 = cmpi "slt", %746, %c0 : index + %748 = subi %c-1, %746 : index + %749 = select %747, %748, %746 : index + %750 = divi_signed %749, %c16 : index + %751 = subi %c-1, %750 : index + %752 = select %747, %751, %750 : index + %753 = muli %752, %c-2 : index + %754 = addi %745, %753 : index + %755 = addi %754, %c15 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = subi %c-1, %755 : index + %758 = select %756, %757, %755 : index + %759 = divi_signed %758, %c2 : index + %760 = subi %c-1, %759 : index + %761 = select %756, %760, %759 : index + %762 = muli %761, %c-2 : index + %763 = addi %739, %762 : index + %764 = addi %763, %c15 : index + %765 = load %2[%724, %c0, %764] : memref<16x6x2xvector<8xf32>> + %766 = addf %713, %765 : vector<8xf32> + store %766, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %767 = addi %arg3, %arg5 : index + %768 = muli %arg6, %c8 : index + %769 = addi %767, %768 : index + %770 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %770, %arg2[%arg4, %769] {masked = [false]} : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } else { + %4 = addi %arg3, %arg5 : index + %5 = vector.transfer_read %arg2[%arg4, %4], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %6 = cmpi "slt", %arg5, %c0 : index + %7 = subi %c-1, %arg5 : index + %8 = select %6, %7, %arg5 : index + %9 = divi_signed %8, %c16 : index + %10 = subi %c-1, %9 : index + %11 = select %6, %10, %9 : index + %12 = remi_signed 
%11, %c16 : index + %13 = cmpi "slt", %12, %c0 : index + %14 = addi %12, %c16 : index + %15 = select %13, %14, %12 : index + %16 = remi_signed %arg5, %c16 : index + %17 = cmpi "slt", %16, %c0 : index + %18 = addi %16, %c16 : index + %19 = select %17, %18, %16 : index + %20 = cmpi "slt", %19, %c0 : index + %21 = subi %c-1, %19 : index + %22 = select %20, %21, %19 : index + %23 = divi_signed %22, %c8 : index + %24 = subi %c-1, %23 : index + %25 = select %20, %24, %23 : index + %26 = remi_signed %25, %c2 : index + %27 = cmpi "slt", %26, %c0 : index + %28 = addi %26, %c2 : index + %29 = select %27, %28, %26 : index + %30 = load %2[%15, %c0, %29] : memref<16x6x2xvector<8xf32>> + %31 = addf %5, %30 : vector<8xf32> + store %31, %1[%c0, %c0] : memref<1x16xvector<8xf32>> + %32 = addi %arg3, %arg5 : index + %33 = addi %32, %c8 : index + %34 = vector.transfer_read %arg2[%arg4, %33], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %35 = addi %arg5, %c8 : index + %36 = cmpi "slt", %35, %c0 : index + %37 = subi %c-1, %35 : index + %38 = select %36, %37, %35 : index + %39 = divi_signed %38, %c16 : index + %40 = subi %c-1, %39 : index + %41 = select %36, %40, %39 : index + %42 = remi_signed %41, %c16 : index + %43 = cmpi "slt", %42, %c0 : index + %44 = addi %42, %c16 : index + %45 = select %43, %44, %42 : index + %46 = cmpi "slt", %arg5, %c0 : index + %47 = subi %c-1, %arg5 : index + %48 = select %46, %47, %arg5 : index + %49 = divi_signed %48, %c8 : index + %50 = subi %c-1, %49 : index + %51 = select %46, %50, %49 : index + %52 = addi %arg5, %c8 : index + %53 = cmpi "slt", %52, %c0 : index + %54 = subi %c-1, %52 : index + %55 = select %53, %54, %52 : index + %56 = divi_signed %55, %c16 : index + %57 = subi %c-1, %56 : index + %58 = select %53, %57, %56 : index + %59 = muli %58, %c-2 : index + %60 = addi %51, %59 : index + %61 = cmpi "slt", %arg5, %c0 : index + %62 = subi %c-1, %arg5 : index + %63 = select %61, %62, %arg5 : index + %64 = divi_signed %63, %c8 : index + %65 = subi %c-1, %64 : index + %66 = select %61, %65, %64 : index + %67 = addi %arg5, %c8 : index + %68 = cmpi "slt", %67, %c0 : index + %69 = subi %c-1, %67 : index + %70 = select %68, %69, %67 : index + %71 = divi_signed %70, %c16 : index + %72 = subi %c-1, %71 : index + %73 = select %68, %72, %71 : index + %74 = muli %73, %c-2 : index + %75 = addi %66, %74 : index + %76 = addi %75, %c1 : index + %77 = cmpi "slt", %76, %c0 : index + %78 = subi %c-1, %76 : index + %79 = select %77, %78, %76 : index + %80 = divi_signed %79, %c2 : index + %81 = subi %c-1, %80 : index + %82 = select %77, %81, %80 : index + %83 = muli %82, %c-2 : index + %84 = addi %60, %83 : index + %85 = addi %84, %c1 : index + %86 = load %2[%45, %c0, %85] : memref<16x6x2xvector<8xf32>> + %87 = addf %34, %86 : vector<8xf32> + store %87, %1[%c0, %c1] : memref<1x16xvector<8xf32>> + %88 = addi %arg3, %arg5 : index + %89 = addi %88, %c16 : index + %90 = vector.transfer_read %arg2[%arg4, %89], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %91 = cmpi "slt", %arg5, %c0 : index + %92 = subi %c-1, %arg5 : index + %93 = select %91, %92, %arg5 : index + %94 = divi_signed %93, %c16 : index + %95 = subi %c-1, %94 : index + %96 = select %91, %95, %94 : index + %97 = cmpi "slt", %arg5, %c0 : index + %98 = subi %c-1, %arg5 : index + %99 = select %97, %98, %arg5 : index + %100 = divi_signed %99, %c16 : index + %101 = subi %c-1, %100 : index + %102 = select %97, %101, %100 : index + %103 = addi %102, %c1 : index + %104 
= cmpi "slt", %103, %c0 : index + %105 = subi %c-1, %103 : index + %106 = select %104, %105, %103 : index + %107 = divi_signed %106, %c16 : index + %108 = subi %c-1, %107 : index + %109 = select %104, %108, %107 : index + %110 = muli %109, %c-16 : index + %111 = addi %96, %110 : index + %112 = addi %111, %c1 : index + %113 = remi_signed %arg5, %c16 : index + %114 = cmpi "slt", %113, %c0 : index + %115 = addi %113, %c16 : index + %116 = select %114, %115, %113 : index + %117 = cmpi "slt", %116, %c0 : index + %118 = subi %c-1, %116 : index + %119 = select %117, %118, %116 : index + %120 = divi_signed %119, %c8 : index + %121 = subi %c-1, %120 : index + %122 = select %117, %121, %120 : index + %123 = remi_signed %122, %c2 : index + %124 = cmpi "slt", %123, %c0 : index + %125 = addi %123, %c2 : index + %126 = select %124, %125, %123 : index + %127 = load %2[%112, %c0, %126] : memref<16x6x2xvector<8xf32>> + %128 = addf %90, %127 : vector<8xf32> + store %128, %1[%c0, %c2] : memref<1x16xvector<8xf32>> + %129 = addi %arg3, %arg5 : index + %130 = addi %129, %c24 : index + %131 = vector.transfer_read %arg2[%arg4, %130], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %132 = addi %arg5, %c24 : index + %133 = cmpi "slt", %132, %c0 : index + %134 = subi %c-1, %132 : index + %135 = select %133, %134, %132 : index + %136 = divi_signed %135, %c16 : index + %137 = subi %c-1, %136 : index + %138 = select %133, %137, %136 : index + %139 = remi_signed %138, %c16 : index + %140 = cmpi "slt", %139, %c0 : index + %141 = addi %139, %c16 : index + %142 = select %140, %141, %139 : index + %143 = cmpi "slt", %arg5, %c0 : index + %144 = subi %c-1, %arg5 : index + %145 = select %143, %144, %arg5 : index + %146 = divi_signed %145, %c8 : index + %147 = subi %c-1, %146 : index + %148 = select %143, %147, %146 : index + %149 = addi %arg5, %c24 : index + %150 = cmpi "slt", %149, %c0 : index + %151 = subi %c-1, %149 : index + %152 = select %150, %151, %149 : index + %153 = divi_signed %152, %c16 : index + %154 = subi %c-1, %153 : index + %155 = select %150, %154, %153 : index + %156 = muli %155, %c-2 : index + %157 = addi %148, %156 : index + %158 = cmpi "slt", %arg5, %c0 : index + %159 = subi %c-1, %arg5 : index + %160 = select %158, %159, %arg5 : index + %161 = divi_signed %160, %c8 : index + %162 = subi %c-1, %161 : index + %163 = select %158, %162, %161 : index + %164 = addi %arg5, %c24 : index + %165 = cmpi "slt", %164, %c0 : index + %166 = subi %c-1, %164 : index + %167 = select %165, %166, %164 : index + %168 = divi_signed %167, %c16 : index + %169 = subi %c-1, %168 : index + %170 = select %165, %169, %168 : index + %171 = muli %170, %c-2 : index + %172 = addi %163, %171 : index + %173 = addi %172, %c3 : index + %174 = cmpi "slt", %173, %c0 : index + %175 = subi %c-1, %173 : index + %176 = select %174, %175, %173 : index + %177 = divi_signed %176, %c2 : index + %178 = subi %c-1, %177 : index + %179 = select %174, %178, %177 : index + %180 = muli %179, %c-2 : index + %181 = addi %157, %180 : index + %182 = addi %181, %c3 : index + %183 = load %2[%142, %c0, %182] : memref<16x6x2xvector<8xf32>> + %184 = addf %131, %183 : vector<8xf32> + store %184, %1[%c0, %c3] : memref<1x16xvector<8xf32>> + %185 = addi %arg3, %arg5 : index + %186 = addi %185, %c32 : index + %187 = vector.transfer_read %arg2[%arg4, %186], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %188 = cmpi "slt", %arg5, %c0 : index + %189 = subi %c-1, %arg5 : index + %190 = select %188, 
%189, %arg5 : index + %191 = divi_signed %190, %c16 : index + %192 = subi %c-1, %191 : index + %193 = select %188, %192, %191 : index + %194 = cmpi "slt", %arg5, %c0 : index + %195 = subi %c-1, %arg5 : index + %196 = select %194, %195, %arg5 : index + %197 = divi_signed %196, %c16 : index + %198 = subi %c-1, %197 : index + %199 = select %194, %198, %197 : index + %200 = addi %199, %c2 : index + %201 = cmpi "slt", %200, %c0 : index + %202 = subi %c-1, %200 : index + %203 = select %201, %202, %200 : index + %204 = divi_signed %203, %c16 : index + %205 = subi %c-1, %204 : index + %206 = select %201, %205, %204 : index + %207 = muli %206, %c-16 : index + %208 = addi %193, %207 : index + %209 = addi %208, %c2 : index + %210 = remi_signed %arg5, %c16 : index + %211 = cmpi "slt", %210, %c0 : index + %212 = addi %210, %c16 : index + %213 = select %211, %212, %210 : index + %214 = cmpi "slt", %213, %c0 : index + %215 = subi %c-1, %213 : index + %216 = select %214, %215, %213 : index + %217 = divi_signed %216, %c8 : index + %218 = subi %c-1, %217 : index + %219 = select %214, %218, %217 : index + %220 = remi_signed %219, %c2 : index + %221 = cmpi "slt", %220, %c0 : index + %222 = addi %220, %c2 : index + %223 = select %221, %222, %220 : index + %224 = load %2[%209, %c0, %223] : memref<16x6x2xvector<8xf32>> + %225 = addf %187, %224 : vector<8xf32> + store %225, %1[%c0, %c4] : memref<1x16xvector<8xf32>> + %226 = addi %arg3, %arg5 : index + %227 = addi %226, %c40 : index + %228 = vector.transfer_read %arg2[%arg4, %227], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %229 = addi %arg5, %c40 : index + %230 = cmpi "slt", %229, %c0 : index + %231 = subi %c-1, %229 : index + %232 = select %230, %231, %229 : index + %233 = divi_signed %232, %c16 : index + %234 = subi %c-1, %233 : index + %235 = select %230, %234, %233 : index + %236 = remi_signed %235, %c16 : index + %237 = cmpi "slt", %236, %c0 : index + %238 = addi %236, %c16 : index + %239 = select %237, %238, %236 : index + %240 = cmpi "slt", %arg5, %c0 : index + %241 = subi %c-1, %arg5 : index + %242 = select %240, %241, %arg5 : index + %243 = divi_signed %242, %c8 : index + %244 = subi %c-1, %243 : index + %245 = select %240, %244, %243 : index + %246 = addi %arg5, %c40 : index + %247 = cmpi "slt", %246, %c0 : index + %248 = subi %c-1, %246 : index + %249 = select %247, %248, %246 : index + %250 = divi_signed %249, %c16 : index + %251 = subi %c-1, %250 : index + %252 = select %247, %251, %250 : index + %253 = muli %252, %c-2 : index + %254 = addi %245, %253 : index + %255 = cmpi "slt", %arg5, %c0 : index + %256 = subi %c-1, %arg5 : index + %257 = select %255, %256, %arg5 : index + %258 = divi_signed %257, %c8 : index + %259 = subi %c-1, %258 : index + %260 = select %255, %259, %258 : index + %261 = addi %arg5, %c40 : index + %262 = cmpi "slt", %261, %c0 : index + %263 = subi %c-1, %261 : index + %264 = select %262, %263, %261 : index + %265 = divi_signed %264, %c16 : index + %266 = subi %c-1, %265 : index + %267 = select %262, %266, %265 : index + %268 = muli %267, %c-2 : index + %269 = addi %260, %268 : index + %270 = addi %269, %c5 : index + %271 = cmpi "slt", %270, %c0 : index + %272 = subi %c-1, %270 : index + %273 = select %271, %272, %270 : index + %274 = divi_signed %273, %c2 : index + %275 = subi %c-1, %274 : index + %276 = select %271, %275, %274 : index + %277 = muli %276, %c-2 : index + %278 = addi %254, %277 : index + %279 = addi %278, %c5 : index + %280 = load %2[%239, %c0, %279] : 
memref<16x6x2xvector<8xf32>> + %281 = addf %228, %280 : vector<8xf32> + store %281, %1[%c0, %c5] : memref<1x16xvector<8xf32>> + %282 = addi %arg3, %arg5 : index + %283 = addi %282, %c48 : index + %284 = vector.transfer_read %arg2[%arg4, %283], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %285 = cmpi "slt", %arg5, %c0 : index + %286 = subi %c-1, %arg5 : index + %287 = select %285, %286, %arg5 : index + %288 = divi_signed %287, %c16 : index + %289 = subi %c-1, %288 : index + %290 = select %285, %289, %288 : index + %291 = cmpi "slt", %arg5, %c0 : index + %292 = subi %c-1, %arg5 : index + %293 = select %291, %292, %arg5 : index + %294 = divi_signed %293, %c16 : index + %295 = subi %c-1, %294 : index + %296 = select %291, %295, %294 : index + %297 = addi %296, %c3 : index + %298 = cmpi "slt", %297, %c0 : index + %299 = subi %c-1, %297 : index + %300 = select %298, %299, %297 : index + %301 = divi_signed %300, %c16 : index + %302 = subi %c-1, %301 : index + %303 = select %298, %302, %301 : index + %304 = muli %303, %c-16 : index + %305 = addi %290, %304 : index + %306 = addi %305, %c3 : index + %307 = remi_signed %arg5, %c16 : index + %308 = cmpi "slt", %307, %c0 : index + %309 = addi %307, %c16 : index + %310 = select %308, %309, %307 : index + %311 = cmpi "slt", %310, %c0 : index + %312 = subi %c-1, %310 : index + %313 = select %311, %312, %310 : index + %314 = divi_signed %313, %c8 : index + %315 = subi %c-1, %314 : index + %316 = select %311, %315, %314 : index + %317 = remi_signed %316, %c2 : index + %318 = cmpi "slt", %317, %c0 : index + %319 = addi %317, %c2 : index + %320 = select %318, %319, %317 : index + %321 = load %2[%306, %c0, %320] : memref<16x6x2xvector<8xf32>> + %322 = addf %284, %321 : vector<8xf32> + store %322, %1[%c0, %c6] : memref<1x16xvector<8xf32>> + %323 = addi %arg3, %arg5 : index + %324 = addi %323, %c56 : index + %325 = vector.transfer_read %arg2[%arg4, %324], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %326 = addi %arg5, %c56 : index + %327 = cmpi "slt", %326, %c0 : index + %328 = subi %c-1, %326 : index + %329 = select %327, %328, %326 : index + %330 = divi_signed %329, %c16 : index + %331 = subi %c-1, %330 : index + %332 = select %327, %331, %330 : index + %333 = remi_signed %332, %c16 : index + %334 = cmpi "slt", %333, %c0 : index + %335 = addi %333, %c16 : index + %336 = select %334, %335, %333 : index + %337 = cmpi "slt", %arg5, %c0 : index + %338 = subi %c-1, %arg5 : index + %339 = select %337, %338, %arg5 : index + %340 = divi_signed %339, %c8 : index + %341 = subi %c-1, %340 : index + %342 = select %337, %341, %340 : index + %343 = addi %arg5, %c56 : index + %344 = cmpi "slt", %343, %c0 : index + %345 = subi %c-1, %343 : index + %346 = select %344, %345, %343 : index + %347 = divi_signed %346, %c16 : index + %348 = subi %c-1, %347 : index + %349 = select %344, %348, %347 : index + %350 = muli %349, %c-2 : index + %351 = addi %342, %350 : index + %352 = cmpi "slt", %arg5, %c0 : index + %353 = subi %c-1, %arg5 : index + %354 = select %352, %353, %arg5 : index + %355 = divi_signed %354, %c8 : index + %356 = subi %c-1, %355 : index + %357 = select %352, %356, %355 : index + %358 = addi %arg5, %c56 : index + %359 = cmpi "slt", %358, %c0 : index + %360 = subi %c-1, %358 : index + %361 = select %359, %360, %358 : index + %362 = divi_signed %361, %c16 : index + %363 = subi %c-1, %362 : index + %364 = select %359, %363, %362 : index + %365 = muli %364, %c-2 : index + %366 = addi %357, 
%365 : index + %367 = addi %366, %c7 : index + %368 = cmpi "slt", %367, %c0 : index + %369 = subi %c-1, %367 : index + %370 = select %368, %369, %367 : index + %371 = divi_signed %370, %c2 : index + %372 = subi %c-1, %371 : index + %373 = select %368, %372, %371 : index + %374 = muli %373, %c-2 : index + %375 = addi %351, %374 : index + %376 = addi %375, %c7 : index + %377 = load %2[%336, %c0, %376] : memref<16x6x2xvector<8xf32>> + %378 = addf %325, %377 : vector<8xf32> + store %378, %1[%c0, %c7] : memref<1x16xvector<8xf32>> + %379 = addi %arg3, %arg5 : index + %380 = addi %379, %c64 : index + %381 = vector.transfer_read %arg2[%arg4, %380], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %382 = cmpi "slt", %arg5, %c0 : index + %383 = subi %c-1, %arg5 : index + %384 = select %382, %383, %arg5 : index + %385 = divi_signed %384, %c16 : index + %386 = subi %c-1, %385 : index + %387 = select %382, %386, %385 : index + %388 = cmpi "slt", %arg5, %c0 : index + %389 = subi %c-1, %arg5 : index + %390 = select %388, %389, %arg5 : index + %391 = divi_signed %390, %c16 : index + %392 = subi %c-1, %391 : index + %393 = select %388, %392, %391 : index + %394 = addi %393, %c4 : index + %395 = cmpi "slt", %394, %c0 : index + %396 = subi %c-1, %394 : index + %397 = select %395, %396, %394 : index + %398 = divi_signed %397, %c16 : index + %399 = subi %c-1, %398 : index + %400 = select %395, %399, %398 : index + %401 = muli %400, %c-16 : index + %402 = addi %387, %401 : index + %403 = addi %402, %c4 : index + %404 = remi_signed %arg5, %c16 : index + %405 = cmpi "slt", %404, %c0 : index + %406 = addi %404, %c16 : index + %407 = select %405, %406, %404 : index + %408 = cmpi "slt", %407, %c0 : index + %409 = subi %c-1, %407 : index + %410 = select %408, %409, %407 : index + %411 = divi_signed %410, %c8 : index + %412 = subi %c-1, %411 : index + %413 = select %408, %412, %411 : index + %414 = remi_signed %413, %c2 : index + %415 = cmpi "slt", %414, %c0 : index + %416 = addi %414, %c2 : index + %417 = select %415, %416, %414 : index + %418 = load %2[%403, %c0, %417] : memref<16x6x2xvector<8xf32>> + %419 = addf %381, %418 : vector<8xf32> + store %419, %1[%c0, %c8] : memref<1x16xvector<8xf32>> + %420 = addi %arg3, %arg5 : index + %421 = addi %420, %c72 : index + %422 = vector.transfer_read %arg2[%arg4, %421], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %423 = addi %arg5, %c72 : index + %424 = cmpi "slt", %423, %c0 : index + %425 = subi %c-1, %423 : index + %426 = select %424, %425, %423 : index + %427 = divi_signed %426, %c16 : index + %428 = subi %c-1, %427 : index + %429 = select %424, %428, %427 : index + %430 = remi_signed %429, %c16 : index + %431 = cmpi "slt", %430, %c0 : index + %432 = addi %430, %c16 : index + %433 = select %431, %432, %430 : index + %434 = cmpi "slt", %arg5, %c0 : index + %435 = subi %c-1, %arg5 : index + %436 = select %434, %435, %arg5 : index + %437 = divi_signed %436, %c8 : index + %438 = subi %c-1, %437 : index + %439 = select %434, %438, %437 : index + %440 = addi %arg5, %c72 : index + %441 = cmpi "slt", %440, %c0 : index + %442 = subi %c-1, %440 : index + %443 = select %441, %442, %440 : index + %444 = divi_signed %443, %c16 : index + %445 = subi %c-1, %444 : index + %446 = select %441, %445, %444 : index + %447 = muli %446, %c-2 : index + %448 = addi %439, %447 : index + %449 = cmpi "slt", %arg5, %c0 : index + %450 = subi %c-1, %arg5 : index + %451 = select %449, %450, %arg5 : index + %452 = divi_signed 
%451, %c8 : index + %453 = subi %c-1, %452 : index + %454 = select %449, %453, %452 : index + %455 = addi %arg5, %c72 : index + %456 = cmpi "slt", %455, %c0 : index + %457 = subi %c-1, %455 : index + %458 = select %456, %457, %455 : index + %459 = divi_signed %458, %c16 : index + %460 = subi %c-1, %459 : index + %461 = select %456, %460, %459 : index + %462 = muli %461, %c-2 : index + %463 = addi %454, %462 : index + %464 = addi %463, %c9 : index + %465 = cmpi "slt", %464, %c0 : index + %466 = subi %c-1, %464 : index + %467 = select %465, %466, %464 : index + %468 = divi_signed %467, %c2 : index + %469 = subi %c-1, %468 : index + %470 = select %465, %469, %468 : index + %471 = muli %470, %c-2 : index + %472 = addi %448, %471 : index + %473 = addi %472, %c9 : index + %474 = load %2[%433, %c0, %473] : memref<16x6x2xvector<8xf32>> + %475 = addf %422, %474 : vector<8xf32> + store %475, %1[%c0, %c9] : memref<1x16xvector<8xf32>> + %476 = addi %arg3, %arg5 : index + %477 = addi %476, %c80 : index + %478 = vector.transfer_read %arg2[%arg4, %477], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %479 = cmpi "slt", %arg5, %c0 : index + %480 = subi %c-1, %arg5 : index + %481 = select %479, %480, %arg5 : index + %482 = divi_signed %481, %c16 : index + %483 = subi %c-1, %482 : index + %484 = select %479, %483, %482 : index + %485 = cmpi "slt", %arg5, %c0 : index + %486 = subi %c-1, %arg5 : index + %487 = select %485, %486, %arg5 : index + %488 = divi_signed %487, %c16 : index + %489 = subi %c-1, %488 : index + %490 = select %485, %489, %488 : index + %491 = addi %490, %c5 : index + %492 = cmpi "slt", %491, %c0 : index + %493 = subi %c-1, %491 : index + %494 = select %492, %493, %491 : index + %495 = divi_signed %494, %c16 : index + %496 = subi %c-1, %495 : index + %497 = select %492, %496, %495 : index + %498 = muli %497, %c-16 : index + %499 = addi %484, %498 : index + %500 = addi %499, %c5 : index + %501 = remi_signed %arg5, %c16 : index + %502 = cmpi "slt", %501, %c0 : index + %503 = addi %501, %c16 : index + %504 = select %502, %503, %501 : index + %505 = cmpi "slt", %504, %c0 : index + %506 = subi %c-1, %504 : index + %507 = select %505, %506, %504 : index + %508 = divi_signed %507, %c8 : index + %509 = subi %c-1, %508 : index + %510 = select %505, %509, %508 : index + %511 = remi_signed %510, %c2 : index + %512 = cmpi "slt", %511, %c0 : index + %513 = addi %511, %c2 : index + %514 = select %512, %513, %511 : index + %515 = load %2[%500, %c0, %514] : memref<16x6x2xvector<8xf32>> + %516 = addf %478, %515 : vector<8xf32> + store %516, %1[%c0, %c10] : memref<1x16xvector<8xf32>> + %517 = addi %arg3, %arg5 : index + %518 = addi %517, %c88 : index + %519 = vector.transfer_read %arg2[%arg4, %518], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %520 = addi %arg5, %c88 : index + %521 = cmpi "slt", %520, %c0 : index + %522 = subi %c-1, %520 : index + %523 = select %521, %522, %520 : index + %524 = divi_signed %523, %c16 : index + %525 = subi %c-1, %524 : index + %526 = select %521, %525, %524 : index + %527 = remi_signed %526, %c16 : index + %528 = cmpi "slt", %527, %c0 : index + %529 = addi %527, %c16 : index + %530 = select %528, %529, %527 : index + %531 = cmpi "slt", %arg5, %c0 : index + %532 = subi %c-1, %arg5 : index + %533 = select %531, %532, %arg5 : index + %534 = divi_signed %533, %c8 : index + %535 = subi %c-1, %534 : index + %536 = select %531, %535, %534 : index + %537 = addi %arg5, %c88 : index + %538 = cmpi "slt", %537, 
%c0 : index + %539 = subi %c-1, %537 : index + %540 = select %538, %539, %537 : index + %541 = divi_signed %540, %c16 : index + %542 = subi %c-1, %541 : index + %543 = select %538, %542, %541 : index + %544 = muli %543, %c-2 : index + %545 = addi %536, %544 : index + %546 = cmpi "slt", %arg5, %c0 : index + %547 = subi %c-1, %arg5 : index + %548 = select %546, %547, %arg5 : index + %549 = divi_signed %548, %c8 : index + %550 = subi %c-1, %549 : index + %551 = select %546, %550, %549 : index + %552 = addi %arg5, %c88 : index + %553 = cmpi "slt", %552, %c0 : index + %554 = subi %c-1, %552 : index + %555 = select %553, %554, %552 : index + %556 = divi_signed %555, %c16 : index + %557 = subi %c-1, %556 : index + %558 = select %553, %557, %556 : index + %559 = muli %558, %c-2 : index + %560 = addi %551, %559 : index + %561 = addi %560, %c11 : index + %562 = cmpi "slt", %561, %c0 : index + %563 = subi %c-1, %561 : index + %564 = select %562, %563, %561 : index + %565 = divi_signed %564, %c2 : index + %566 = subi %c-1, %565 : index + %567 = select %562, %566, %565 : index + %568 = muli %567, %c-2 : index + %569 = addi %545, %568 : index + %570 = addi %569, %c11 : index + %571 = load %2[%530, %c0, %570] : memref<16x6x2xvector<8xf32>> + %572 = addf %519, %571 : vector<8xf32> + store %572, %1[%c0, %c11] : memref<1x16xvector<8xf32>> + %573 = addi %arg3, %arg5 : index + %574 = addi %573, %c96 : index + %575 = vector.transfer_read %arg2[%arg4, %574], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %576 = cmpi "slt", %arg5, %c0 : index + %577 = subi %c-1, %arg5 : index + %578 = select %576, %577, %arg5 : index + %579 = divi_signed %578, %c16 : index + %580 = subi %c-1, %579 : index + %581 = select %576, %580, %579 : index + %582 = cmpi "slt", %arg5, %c0 : index + %583 = subi %c-1, %arg5 : index + %584 = select %582, %583, %arg5 : index + %585 = divi_signed %584, %c16 : index + %586 = subi %c-1, %585 : index + %587 = select %582, %586, %585 : index + %588 = addi %587, %c6 : index + %589 = cmpi "slt", %588, %c0 : index + %590 = subi %c-1, %588 : index + %591 = select %589, %590, %588 : index + %592 = divi_signed %591, %c16 : index + %593 = subi %c-1, %592 : index + %594 = select %589, %593, %592 : index + %595 = muli %594, %c-16 : index + %596 = addi %581, %595 : index + %597 = addi %596, %c6 : index + %598 = remi_signed %arg5, %c16 : index + %599 = cmpi "slt", %598, %c0 : index + %600 = addi %598, %c16 : index + %601 = select %599, %600, %598 : index + %602 = cmpi "slt", %601, %c0 : index + %603 = subi %c-1, %601 : index + %604 = select %602, %603, %601 : index + %605 = divi_signed %604, %c8 : index + %606 = subi %c-1, %605 : index + %607 = select %602, %606, %605 : index + %608 = remi_signed %607, %c2 : index + %609 = cmpi "slt", %608, %c0 : index + %610 = addi %608, %c2 : index + %611 = select %609, %610, %608 : index + %612 = load %2[%597, %c0, %611] : memref<16x6x2xvector<8xf32>> + %613 = addf %575, %612 : vector<8xf32> + store %613, %1[%c0, %c12] : memref<1x16xvector<8xf32>> + %614 = addi %arg3, %arg5 : index + %615 = addi %614, %c104 : index + %616 = vector.transfer_read %arg2[%arg4, %615], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %617 = addi %arg5, %c104 : index + %618 = cmpi "slt", %617, %c0 : index + %619 = subi %c-1, %617 : index + %620 = select %618, %619, %617 : index + %621 = divi_signed %620, %c16 : index + %622 = subi %c-1, %621 : index + %623 = select %618, %622, %621 : index + %624 = remi_signed %623, %c16 : 
index + %625 = cmpi "slt", %624, %c0 : index + %626 = addi %624, %c16 : index + %627 = select %625, %626, %624 : index + %628 = cmpi "slt", %arg5, %c0 : index + %629 = subi %c-1, %arg5 : index + %630 = select %628, %629, %arg5 : index + %631 = divi_signed %630, %c8 : index + %632 = subi %c-1, %631 : index + %633 = select %628, %632, %631 : index + %634 = addi %arg5, %c104 : index + %635 = cmpi "slt", %634, %c0 : index + %636 = subi %c-1, %634 : index + %637 = select %635, %636, %634 : index + %638 = divi_signed %637, %c16 : index + %639 = subi %c-1, %638 : index + %640 = select %635, %639, %638 : index + %641 = muli %640, %c-2 : index + %642 = addi %633, %641 : index + %643 = cmpi "slt", %arg5, %c0 : index + %644 = subi %c-1, %arg5 : index + %645 = select %643, %644, %arg5 : index + %646 = divi_signed %645, %c8 : index + %647 = subi %c-1, %646 : index + %648 = select %643, %647, %646 : index + %649 = addi %arg5, %c104 : index + %650 = cmpi "slt", %649, %c0 : index + %651 = subi %c-1, %649 : index + %652 = select %650, %651, %649 : index + %653 = divi_signed %652, %c16 : index + %654 = subi %c-1, %653 : index + %655 = select %650, %654, %653 : index + %656 = muli %655, %c-2 : index + %657 = addi %648, %656 : index + %658 = addi %657, %c13 : index + %659 = cmpi "slt", %658, %c0 : index + %660 = subi %c-1, %658 : index + %661 = select %659, %660, %658 : index + %662 = divi_signed %661, %c2 : index + %663 = subi %c-1, %662 : index + %664 = select %659, %663, %662 : index + %665 = muli %664, %c-2 : index + %666 = addi %642, %665 : index + %667 = addi %666, %c13 : index + %668 = load %2[%627, %c0, %667] : memref<16x6x2xvector<8xf32>> + %669 = addf %616, %668 : vector<8xf32> + store %669, %1[%c0, %c13] : memref<1x16xvector<8xf32>> + %670 = addi %arg3, %arg5 : index + %671 = addi %670, %c112 : index + %672 = vector.transfer_read %arg2[%arg4, %671], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %673 = cmpi "slt", %arg5, %c0 : index + %674 = subi %c-1, %arg5 : index + %675 = select %673, %674, %arg5 : index + %676 = divi_signed %675, %c16 : index + %677 = subi %c-1, %676 : index + %678 = select %673, %677, %676 : index + %679 = cmpi "slt", %arg5, %c0 : index + %680 = subi %c-1, %arg5 : index + %681 = select %679, %680, %arg5 : index + %682 = divi_signed %681, %c16 : index + %683 = subi %c-1, %682 : index + %684 = select %679, %683, %682 : index + %685 = addi %684, %c7 : index + %686 = cmpi "slt", %685, %c0 : index + %687 = subi %c-1, %685 : index + %688 = select %686, %687, %685 : index + %689 = divi_signed %688, %c16 : index + %690 = subi %c-1, %689 : index + %691 = select %686, %690, %689 : index + %692 = muli %691, %c-16 : index + %693 = addi %678, %692 : index + %694 = addi %693, %c7 : index + %695 = remi_signed %arg5, %c16 : index + %696 = cmpi "slt", %695, %c0 : index + %697 = addi %695, %c16 : index + %698 = select %696, %697, %695 : index + %699 = cmpi "slt", %698, %c0 : index + %700 = subi %c-1, %698 : index + %701 = select %699, %700, %698 : index + %702 = divi_signed %701, %c8 : index + %703 = subi %c-1, %702 : index + %704 = select %699, %703, %702 : index + %705 = remi_signed %704, %c2 : index + %706 = cmpi "slt", %705, %c0 : index + %707 = addi %705, %c2 : index + %708 = select %706, %707, %705 : index + %709 = load %2[%694, %c0, %708] : memref<16x6x2xvector<8xf32>> + %710 = addf %672, %709 : vector<8xf32> + store %710, %1[%c0, %c14] : memref<1x16xvector<8xf32>> + %711 = addi %arg3, %arg5 : index + %712 = addi %711, %c120 : index + %713 = 
vector.transfer_read %arg2[%arg4, %712], %cst : memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, vector<8xf32> + %714 = addi %arg5, %c120 : index + %715 = cmpi "slt", %714, %c0 : index + %716 = subi %c-1, %714 : index + %717 = select %715, %716, %714 : index + %718 = divi_signed %717, %c16 : index + %719 = subi %c-1, %718 : index + %720 = select %715, %719, %718 : index + %721 = remi_signed %720, %c16 : index + %722 = cmpi "slt", %721, %c0 : index + %723 = addi %721, %c16 : index + %724 = select %722, %723, %721 : index + %725 = cmpi "slt", %arg5, %c0 : index + %726 = subi %c-1, %arg5 : index + %727 = select %725, %726, %arg5 : index + %728 = divi_signed %727, %c8 : index + %729 = subi %c-1, %728 : index + %730 = select %725, %729, %728 : index + %731 = addi %arg5, %c120 : index + %732 = cmpi "slt", %731, %c0 : index + %733 = subi %c-1, %731 : index + %734 = select %732, %733, %731 : index + %735 = divi_signed %734, %c16 : index + %736 = subi %c-1, %735 : index + %737 = select %732, %736, %735 : index + %738 = muli %737, %c-2 : index + %739 = addi %730, %738 : index + %740 = cmpi "slt", %arg5, %c0 : index + %741 = subi %c-1, %arg5 : index + %742 = select %740, %741, %arg5 : index + %743 = divi_signed %742, %c8 : index + %744 = subi %c-1, %743 : index + %745 = select %740, %744, %743 : index + %746 = addi %arg5, %c120 : index + %747 = cmpi "slt", %746, %c0 : index + %748 = subi %c-1, %746 : index + %749 = select %747, %748, %746 : index + %750 = divi_signed %749, %c16 : index + %751 = subi %c-1, %750 : index + %752 = select %747, %751, %750 : index + %753 = muli %752, %c-2 : index + %754 = addi %745, %753 : index + %755 = addi %754, %c15 : index + %756 = cmpi "slt", %755, %c0 : index + %757 = subi %c-1, %755 : index + %758 = select %756, %757, %755 : index + %759 = divi_signed %758, %c2 : index + %760 = subi %c-1, %759 : index + %761 = select %756, %760, %759 : index + %762 = muli %761, %c-2 : index + %763 = addi %739, %762 : index + %764 = addi %763, %c15 : index + %765 = load %2[%724, %c0, %764] : memref<16x6x2xvector<8xf32>> + %766 = addf %713, %765 : vector<8xf32> + store %766, %1[%c0, %c15] : memref<1x16xvector<8xf32>> + scf.for %arg6 = %c0 to %c16 step %c1 { + %767 = addi %arg3, %arg5 : index + %768 = muli %arg6, %c8 : index + %769 = addi %767, %768 : index + %770 = load %1[%c0, %arg6] : memref<1x16xvector<8xf32>> + vector.transfer_write %770, %arg2[%arg4, %769] : vector<8xf32>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>> + } + } + } + } + } + return + } + func @optimized_matmul_py_4a6286d9(%arg0: memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, %arg1: memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, %arg2: memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", accv.emit_header_decl, accv.emit_raw_pointer_api} { + call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0, %arg1, %arg2) : (memref<784x128xf32, affine_map<(d0, d1) -> (d0 * 128 + d1)>>, memref<128x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>, memref<784x512xf32, affine_map<(d0, d1) -> (d0 * 512 + d1)>>) -> () + return + } +} diff --git a/Tutorials/optimized_matmul/mlir/optimized_matmul.s b/Tutorials/optimized_matmul/mlir/optimized_matmul.s new file mode 100644 index 00000000..6c7c7b5e --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/optimized_matmul.s @@ -0,0 +1,896 @@ + .text + .def @feat.00; + .scl 3; + .type 0; + .endef + .globl @feat.00 +.set @feat.00, 0 + .file 
"LLVMDialectModule" + .def optimized_matmul_py_4a6286d9_impl_17630232307017152746; + .scl 2; + .type 32; + .endef + .globl optimized_matmul_py_4a6286d9_impl_17630232307017152746 # -- Begin function optimized_matmul_py_4a6286d9_impl_17630232307017152746 + .p2align 4, 0x90 +optimized_matmul_py_4a6286d9_impl_17630232307017152746: # @optimized_matmul_py_4a6286d9_impl_17630232307017152746 +.Lfunc_begin0: + .file 1 "D:\\win\\repos\\accera-samples\\tutorials\\optimized_matmul\\_tmp\\optimized_matmul\\optimized_matmul_llvm.mlir" + .loc 1 8 0 # optimized_matmul\optimized_matmul_llvm.mlir:8:0 +# %bb.0: + pushq %r15 + pushq %r14 + pushq %r13 + pushq %r12 + pushq %rsi + pushq %rdi + pushq %rbp + pushq %rbx + subq $328, %rsp # imm = 0x148 + movl $992, %ecx # imm = 0x3E0 +.Ltmp0: + .loc 1 163 5 prologue_end # optimized_matmul\optimized_matmul_llvm.mlir:163:5 + addq 464(%rsp), %rcx + addq $12, %rdx + movq %rdx, 32(%rsp) # 8-byte Spill + leaq cache_17(%rip), %rdx + leaq cache_16(%rip), %r15 + xorl %eax, %eax + xorl %ebp, %ebp + .p2align 4, 0x90 +.LBB0_1: # %.preheader40 + # =>This Loop Header: Depth=1 + # Child Loop BB0_2 Depth 2 + # Child Loop BB0_4 Depth 2 + # Child Loop BB0_5 Depth 3 + # Child Loop BB0_6 Depth 4 + .loc 1 0 5 is_stmt 0 # optimized_matmul\optimized_matmul_llvm.mlir:0:5 + movq %rax, 40(%rsp) # 8-byte Spill + .loc 1 168 5 is_stmt 1 # optimized_matmul\optimized_matmul_llvm.mlir:168:5 + shlq $8, %rax + movq %rax, 304(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $8, %rax + movq %rax, 296(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $16, %rax + movq %rax, 288(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $24, %rax + movq %rax, 280(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $32, %rax + movq %rax, 272(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $40, %rax + movq %rax, 264(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $48, %rax + movq %rax, 256(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $56, %rax + movq %rax, 248(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $64, %rax + movq %rax, 240(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $72, %rax + movq %rax, 232(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $80, %rax + movq %rax, 224(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $88, %rax + movq %rax, 216(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $96, %rax + movq %rax, 208(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $104, %rax + movq %rax, 200(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $112, %rax + movq %rax, 192(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $120, %rax + movq %rax, 184(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $128, %rax + movq %rax, 176(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $136, %rax + movq %rax, 168(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $144, %rax + movq %rax, 160(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $152, %rax + movq %rax, 152(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $160, %rax + movq %rax, 144(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $168, %rax + movq %rax, 136(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $176, %rax + movq %rax, 128(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $184, %rax + movq %rax, 120(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $192, %rax + movq %rax, 112(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $200, %rax + movq %rax, 104(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $208, %rax + movq %rax, 96(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $216, %rax + movq %rax, 88(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $224, %rax + movq %rax, 80(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $232, %rax + movq %rax, 72(%rsp) # 8-byte 
Spill + movq %rbp, %rax + orq $240, %rax + movq %rax, 64(%rsp) # 8-byte Spill + movq %rbp, %rax + orq $248, %rax + movq %rax, 56(%rsp) # 8-byte Spill + movq $-8192, %rax # imm = 0xE000 + movq %rcx, 48(%rsp) # 8-byte Spill + .p2align 4, 0x90 +.LBB0_2: # %.preheader38 + # Parent Loop BB0_1 Depth=1 + # => This Inner Loop Header: Depth=2 + .loc 1 188 12 # optimized_matmul\optimized_matmul_llvm.mlir:188:12 + vmovups -992(%rcx), %ymm0 + .loc 1 210 12 # optimized_matmul\optimized_matmul_llvm.mlir:210:12 + vmovups -960(%rcx), %ymm1 + .loc 1 232 12 # optimized_matmul\optimized_matmul_llvm.mlir:232:12 + vmovups -928(%rcx), %ymm2 + .loc 1 254 12 # optimized_matmul\optimized_matmul_llvm.mlir:254:12 + vmovups -896(%rcx), %ymm3 + .loc 1 276 12 # optimized_matmul\optimized_matmul_llvm.mlir:276:12 + vmovups -864(%rcx), %ymm4 + .loc 1 298 12 # optimized_matmul\optimized_matmul_llvm.mlir:298:12 + vmovups -832(%rcx), %ymm5 + .loc 1 320 12 # optimized_matmul\optimized_matmul_llvm.mlir:320:12 + vmovups -800(%rcx), %ymm16 + .loc 1 342 12 # optimized_matmul\optimized_matmul_llvm.mlir:342:12 + vmovups -768(%rcx), %ymm17 + .loc 1 364 12 # optimized_matmul\optimized_matmul_llvm.mlir:364:12 + vmovups -736(%rcx), %ymm18 + .loc 1 386 12 # optimized_matmul\optimized_matmul_llvm.mlir:386:12 + vmovups -704(%rcx), %ymm19 + .loc 1 408 12 # optimized_matmul\optimized_matmul_llvm.mlir:408:12 + vmovups -672(%rcx), %ymm20 + .loc 1 430 12 # optimized_matmul\optimized_matmul_llvm.mlir:430:12 + vmovups -640(%rcx), %ymm21 + .loc 1 452 12 # optimized_matmul\optimized_matmul_llvm.mlir:452:12 + vmovups -608(%rcx), %ymm22 + .loc 1 474 12 # optimized_matmul\optimized_matmul_llvm.mlir:474:12 + vmovups -576(%rcx), %ymm23 + .loc 1 496 12 # optimized_matmul\optimized_matmul_llvm.mlir:496:12 + vmovups -544(%rcx), %ymm24 + .loc 1 518 12 # optimized_matmul\optimized_matmul_llvm.mlir:518:12 + vmovups -512(%rcx), %ymm25 + .loc 1 579 5 # optimized_matmul\optimized_matmul_llvm.mlir:579:5 + vmovaps %ymm0, 8192(%rax,%rdx) + .loc 1 628 5 # optimized_matmul\optimized_matmul_llvm.mlir:628:5 + vmovaps %ymm1, 8224(%rax,%rdx) + .loc 1 661 5 # optimized_matmul\optimized_matmul_llvm.mlir:661:5 + vmovaps %ymm2, 16384(%rax,%rdx) + .loc 1 707 5 # optimized_matmul\optimized_matmul_llvm.mlir:707:5 + vmovaps %ymm3, 16416(%rax,%rdx) + .loc 1 740 5 # optimized_matmul\optimized_matmul_llvm.mlir:740:5 + vmovaps %ymm4, 24576(%rax,%rdx) + .loc 1 786 5 # optimized_matmul\optimized_matmul_llvm.mlir:786:5 + vmovaps %ymm5, 24608(%rax,%rdx) + .loc 1 819 5 # optimized_matmul\optimized_matmul_llvm.mlir:819:5 + vmovaps %ymm16, 32768(%rax,%rdx) + .loc 1 865 5 # optimized_matmul\optimized_matmul_llvm.mlir:865:5 + vmovaps %ymm17, 32800(%rax,%rdx) + .loc 1 898 5 # optimized_matmul\optimized_matmul_llvm.mlir:898:5 + vmovaps %ymm18, 40960(%rax,%rdx) + .loc 1 944 5 # optimized_matmul\optimized_matmul_llvm.mlir:944:5 + vmovaps %ymm19, 40992(%rax,%rdx) + .loc 1 977 5 # optimized_matmul\optimized_matmul_llvm.mlir:977:5 + vmovaps %ymm20, 49152(%rax,%rdx) + .loc 1 1023 5 # optimized_matmul\optimized_matmul_llvm.mlir:1023:5 + vmovaps %ymm21, 49184(%rax,%rdx) + .loc 1 1056 5 # optimized_matmul\optimized_matmul_llvm.mlir:1056:5 + vmovaps %ymm22, 57344(%rax,%rdx) + .loc 1 1102 5 # optimized_matmul\optimized_matmul_llvm.mlir:1102:5 + vmovaps %ymm23, 57376(%rax,%rdx) + .loc 1 1135 5 # optimized_matmul\optimized_matmul_llvm.mlir:1135:5 + vmovaps %ymm24, 65536(%rax,%rdx) + .loc 1 1181 5 # optimized_matmul\optimized_matmul_llvm.mlir:1181:5 + vmovaps %ymm25, 65568(%rax,%rdx) + .loc 1 188 12 # 
optimized_matmul\optimized_matmul_llvm.mlir:188:12 + vmovups -480(%rcx), %ymm0 + .loc 1 210 12 # optimized_matmul\optimized_matmul_llvm.mlir:210:12 + vmovups -448(%rcx), %ymm1 + .loc 1 232 12 # optimized_matmul\optimized_matmul_llvm.mlir:232:12 + vmovups -416(%rcx), %ymm2 + .loc 1 254 12 # optimized_matmul\optimized_matmul_llvm.mlir:254:12 + vmovups -384(%rcx), %ymm3 + .loc 1 276 12 # optimized_matmul\optimized_matmul_llvm.mlir:276:12 + vmovups -352(%rcx), %ymm4 + .loc 1 298 12 # optimized_matmul\optimized_matmul_llvm.mlir:298:12 + vmovups -320(%rcx), %ymm5 + .loc 1 320 12 # optimized_matmul\optimized_matmul_llvm.mlir:320:12 + vmovups -288(%rcx), %ymm16 + .loc 1 342 12 # optimized_matmul\optimized_matmul_llvm.mlir:342:12 + vmovups -256(%rcx), %ymm17 + .loc 1 364 12 # optimized_matmul\optimized_matmul_llvm.mlir:364:12 + vmovups -224(%rcx), %ymm18 + .loc 1 386 12 # optimized_matmul\optimized_matmul_llvm.mlir:386:12 + vmovups -192(%rcx), %ymm19 + .loc 1 408 12 # optimized_matmul\optimized_matmul_llvm.mlir:408:12 + vmovups -160(%rcx), %ymm20 + .loc 1 430 12 # optimized_matmul\optimized_matmul_llvm.mlir:430:12 + vmovups -128(%rcx), %ymm21 + .loc 1 452 12 # optimized_matmul\optimized_matmul_llvm.mlir:452:12 + vmovups -96(%rcx), %ymm22 + .loc 1 474 12 # optimized_matmul\optimized_matmul_llvm.mlir:474:12 + vmovups -64(%rcx), %ymm23 + .loc 1 496 12 # optimized_matmul\optimized_matmul_llvm.mlir:496:12 + vmovups -32(%rcx), %ymm24 + .loc 1 518 12 # optimized_matmul\optimized_matmul_llvm.mlir:518:12 + vmovups (%rcx), %ymm25 + .loc 1 579 5 # optimized_matmul\optimized_matmul_llvm.mlir:579:5 + vmovaps %ymm0, 73728(%rax,%rdx) + .loc 1 628 5 # optimized_matmul\optimized_matmul_llvm.mlir:628:5 + vmovaps %ymm1, 73760(%rax,%rdx) + .loc 1 661 5 # optimized_matmul\optimized_matmul_llvm.mlir:661:5 + vmovaps %ymm2, 81920(%rax,%rdx) + .loc 1 707 5 # optimized_matmul\optimized_matmul_llvm.mlir:707:5 + vmovaps %ymm3, 81952(%rax,%rdx) + .loc 1 740 5 # optimized_matmul\optimized_matmul_llvm.mlir:740:5 + vmovaps %ymm4, 90112(%rax,%rdx) + .loc 1 786 5 # optimized_matmul\optimized_matmul_llvm.mlir:786:5 + vmovaps %ymm5, 90144(%rax,%rdx) + .loc 1 819 5 # optimized_matmul\optimized_matmul_llvm.mlir:819:5 + vmovaps %ymm16, 98304(%rax,%rdx) + .loc 1 865 5 # optimized_matmul\optimized_matmul_llvm.mlir:865:5 + vmovaps %ymm17, 98336(%rax,%rdx) + .loc 1 898 5 # optimized_matmul\optimized_matmul_llvm.mlir:898:5 + vmovaps %ymm18, 106496(%rax,%rdx) + .loc 1 944 5 # optimized_matmul\optimized_matmul_llvm.mlir:944:5 + vmovaps %ymm19, 106528(%rax,%rdx) + .loc 1 977 5 # optimized_matmul\optimized_matmul_llvm.mlir:977:5 + vmovaps %ymm20, 114688(%rax,%rdx) + .loc 1 1023 5 # optimized_matmul\optimized_matmul_llvm.mlir:1023:5 + vmovaps %ymm21, 114720(%rax,%rdx) + .loc 1 1056 5 # optimized_matmul\optimized_matmul_llvm.mlir:1056:5 + vmovaps %ymm22, 122880(%rax,%rdx) + .loc 1 1102 5 # optimized_matmul\optimized_matmul_llvm.mlir:1102:5 + vmovaps %ymm23, 122912(%rax,%rdx) + .loc 1 1135 5 # optimized_matmul\optimized_matmul_llvm.mlir:1135:5 + vmovaps %ymm24, 131072(%rax,%rdx) + .loc 1 1181 5 # optimized_matmul\optimized_matmul_llvm.mlir:1181:5 + vmovaps %ymm25, 131104(%rax,%rdx) + .loc 1 167 12 # optimized_matmul\optimized_matmul_llvm.mlir:167:12 + addq $2048, %rcx # imm = 0x800 + addq $64, %rax + .loc 1 168 5 # optimized_matmul\optimized_matmul_llvm.mlir:168:5 + jne .LBB0_2 +# %bb.3: # %.preheader37.preheader + # in Loop: Header=BB0_1 Depth=1 + .loc 1 0 5 is_stmt 0 # optimized_matmul\optimized_matmul_llvm.mlir:0:5 + movq 32(%rsp), %r12 # 8-byte 
Reload + xorl %r13d, %r13d + movq %rbp, 312(%rsp) # 8-byte Spill + .p2align 4, 0x90 +.LBB0_4: # %.preheader37 + # Parent Loop BB0_1 Depth=1 + # => This Loop Header: Depth=2 + # Child Loop BB0_5 Depth 3 + # Child Loop BB0_6 Depth 4 + movq %r13, 320(%rsp) # 8-byte Spill + .loc 1 2469 5 is_stmt 1 # optimized_matmul\optimized_matmul_llvm.mlir:2469:5 + shlq $9, %r13 + movq 304(%rsp), %rax # 8-byte Reload + leaq (%rax,%r13), %r14 + movl $6144, %r8d # imm = 0x1800 + movq %r15, %rcx + xorl %edx, %edx + vzeroupper + callq memset + movq $-2, %r9 + leaq cache_17+160(%rip), %rbp + xorl %ecx, %ecx + .p2align 4, 0x90 +.LBB0_5: # %.preheader33 + # Parent Loop BB0_1 Depth=1 + # Parent Loop BB0_4 Depth=2 + # => This Loop Header: Depth=3 + # Child Loop BB0_6 Depth 4 + .loc 1 0 5 is_stmt 0 # optimized_matmul\optimized_matmul_llvm.mlir:0:5 + movl %ecx, %eax + shrl $4, %eax + andl $15, %eax + shlq $2, %rax + leaq (%rax,%rax,2), %rbx + movq %rbx, %rax + shlq $5, %rax + leaq (%r15,%rax), %r8 + xorl %esi, %esi + movq $-1, %rdi + negq %rdi + setl %sil + movl $1, %edx + cmovlq %r9, %rdx + movq %rdx, %rdi + shrq $63, %rdi + addq %rdx, %rdi + sarq %rdi + negq %rsi + xorq %rdi, %rsi + leaq (%rsi,%rsi), %rdx + movl $1, %edi + subq %rdx, %rdi + addq %rbx, %rdi + shlq $5, %rdi + vmovaps (%rax,%r15), %ymm1 + leaq (%r15,%rdi), %rbx + vmovaps (%rdi,%r15), %ymm0 + .loc 1 2487 5 is_stmt 1 # optimized_matmul\optimized_matmul_llvm.mlir:2487:5 + shlq $6, %rsi + negq %rsi + movq $-4, %rdi + movq %rbp, %rax + .p2align 4, 0x90 +.LBB0_6: # %.preheader + # Parent Loop BB0_1 Depth=1 + # Parent Loop BB0_4 Depth=2 + # Parent Loop BB0_5 Depth=3 + # => This Inner Loop Header: Depth=4 + .loc 1 4305 13 # optimized_matmul\optimized_matmul_llvm.mlir:4305:13 + vbroadcastss 4(%r12,%rdi,4), %ymm2 + .loc 1 4425 13 # optimized_matmul\optimized_matmul_llvm.mlir:4425:13 + vfmadd231ps -160(%rax), %ymm2, %ymm1 # ymm1 = (ymm2 * mem) + ymm1 + .loc 1 5203 13 # optimized_matmul\optimized_matmul_llvm.mlir:5203:13 + vfmadd231ps -128(%rax,%rsi), %ymm2, %ymm0 # ymm0 = (ymm2 * mem) + ymm0 + .loc 1 4305 13 # optimized_matmul\optimized_matmul_llvm.mlir:4305:13 + vbroadcastss 8(%r12,%rdi,4), %ymm2 + .loc 1 4425 13 # optimized_matmul\optimized_matmul_llvm.mlir:4425:13 + vfmadd231ps -96(%rax), %ymm2, %ymm1 # ymm1 = (ymm2 * mem) + ymm1 + .loc 1 5203 13 # optimized_matmul\optimized_matmul_llvm.mlir:5203:13 + vfmadd231ps -64(%rax,%rsi), %ymm2, %ymm0 # ymm0 = (ymm2 * mem) + ymm0 + .loc 1 4305 13 # optimized_matmul\optimized_matmul_llvm.mlir:4305:13 + vbroadcastss 12(%r12,%rdi,4), %ymm2 + .loc 1 4425 13 # optimized_matmul\optimized_matmul_llvm.mlir:4425:13 + vfmadd231ps -32(%rax), %ymm2, %ymm1 # ymm1 = (ymm2 * mem) + ymm1 + .loc 1 5203 13 # optimized_matmul\optimized_matmul_llvm.mlir:5203:13 + vfmadd231ps (%rax,%rsi), %ymm2, %ymm0 # ymm0 = (ymm2 * mem) + ymm0 + .loc 1 4305 13 # optimized_matmul\optimized_matmul_llvm.mlir:4305:13 + vbroadcastss 16(%r12,%rdi,4), %ymm2 + .loc 1 4425 13 # optimized_matmul\optimized_matmul_llvm.mlir:4425:13 + vfmadd231ps 32(%rax), %ymm2, %ymm1 # ymm1 = (ymm2 * mem) + ymm1 + .loc 1 5203 13 # optimized_matmul\optimized_matmul_llvm.mlir:5203:13 + vfmadd231ps 64(%rax,%rsi), %ymm2, %ymm0 # ymm0 = (ymm2 * mem) + ymm0 + .loc 1 2486 13 # optimized_matmul\optimized_matmul_llvm.mlir:2486:13 + addq $4, %rdi + addq $256, %rax # imm = 0x100 + cmpq $124, %rdi + .loc 1 2487 5 # optimized_matmul\optimized_matmul_llvm.mlir:2487:5 + jb .LBB0_6 +# %bb.7: # in Loop: Header=BB0_5 Depth=3 + .loc 1 0 5 is_stmt 0 # optimized_matmul\optimized_matmul_llvm.mlir:0:5 + 
vmovaps %ymm1, (%r8) + vmovaps %ymm0, (%rbx) + .loc 1 5649 13 is_stmt 1 # optimized_matmul\optimized_matmul_llvm.mlir:5649:13 + leaq 16(%rcx), %rax + .loc 1 2482 5 # optimized_matmul\optimized_matmul_llvm.mlir:2482:5 + addq $8192, %rbp # imm = 0x2000 + .loc 1 2481 13 # optimized_matmul\optimized_matmul_llvm.mlir:2481:13 + cmpq $240, %rcx + movq %rax, %rcx + .loc 1 2482 5 # optimized_matmul\optimized_matmul_llvm.mlir:2482:5 + jb .LBB0_5 +# %bb.8: # %.preheader35 + # in Loop: Header=BB0_4 Depth=2 + .loc 1 0 5 is_stmt 0 # optimized_matmul\optimized_matmul_llvm.mlir:0:5 + movq 312(%rsp), %rbp # 8-byte Reload + .loc 1 5667 13 is_stmt 1 # optimized_matmul\optimized_matmul_llvm.mlir:5667:13 + movq %rbp, %rax + addq %r13, %rax + movq 520(%rsp), %rcx + .loc 1 5670 13 # optimized_matmul\optimized_matmul_llvm.mlir:5670:13 + vmovups (%rcx,%rax,4), %ymm0 + .loc 1 5708 13 # optimized_matmul\optimized_matmul_llvm.mlir:5708:13 + vaddps cache_16(%rip), %ymm0, %ymm0 + movq 296(%rsp), %rax # 8-byte Reload + .loc 1 5727 13 # optimized_matmul\optimized_matmul_llvm.mlir:5727:13 + leaq (%rax,%r13), %rax + .loc 1 5730 13 # optimized_matmul\optimized_matmul_llvm.mlir:5730:13 + vmovups (%rcx,%rax,4), %ymm1 + .loc 1 5770 13 # optimized_matmul\optimized_matmul_llvm.mlir:5770:13 + vaddps cache_16+32(%rip), %ymm1, %ymm1 + movq 288(%rsp), %rax # 8-byte Reload + .loc 1 5789 13 # optimized_matmul\optimized_matmul_llvm.mlir:5789:13 + leaq (%rax,%r13), %rax + .loc 1 5792 13 # optimized_matmul\optimized_matmul_llvm.mlir:5792:13 + vmovups (%rcx,%rax,4), %ymm2 + .loc 1 5816 13 # optimized_matmul\optimized_matmul_llvm.mlir:5816:13 + vaddps cache_16+384(%rip), %ymm2, %ymm2 + movq 280(%rsp), %rax # 8-byte Reload + .loc 1 5835 13 # optimized_matmul\optimized_matmul_llvm.mlir:5835:13 + leaq (%rax,%r13), %rax + .loc 1 5838 13 # optimized_matmul\optimized_matmul_llvm.mlir:5838:13 + vmovups (%rcx,%rax,4), %ymm3 + .loc 1 5875 13 # optimized_matmul\optimized_matmul_llvm.mlir:5875:13 + vaddps cache_16+416(%rip), %ymm3, %ymm3 + movq 272(%rsp), %rax # 8-byte Reload + .loc 1 5894 13 # optimized_matmul\optimized_matmul_llvm.mlir:5894:13 + leaq (%rax,%r13), %rax + .loc 1 5897 13 # optimized_matmul\optimized_matmul_llvm.mlir:5897:13 + vmovups (%rcx,%rax,4), %ymm4 + .loc 1 5921 13 # optimized_matmul\optimized_matmul_llvm.mlir:5921:13 + vaddps cache_16+768(%rip), %ymm4, %ymm4 + movq 264(%rsp), %rax # 8-byte Reload + .loc 1 5940 13 # optimized_matmul\optimized_matmul_llvm.mlir:5940:13 + leaq (%rax,%r13), %rax + .loc 1 5943 13 # optimized_matmul\optimized_matmul_llvm.mlir:5943:13 + vmovups (%rcx,%rax,4), %ymm5 + .loc 1 5980 13 # optimized_matmul\optimized_matmul_llvm.mlir:5980:13 + vaddps cache_16+800(%rip), %ymm5, %ymm5 + movq 256(%rsp), %rax # 8-byte Reload + .loc 1 5999 13 # optimized_matmul\optimized_matmul_llvm.mlir:5999:13 + leaq (%rax,%r13), %rax + .loc 1 6002 13 # optimized_matmul\optimized_matmul_llvm.mlir:6002:13 + vmovups (%rcx,%rax,4), %ymm16 + .loc 1 6026 13 # optimized_matmul\optimized_matmul_llvm.mlir:6026:13 + vaddps cache_16+1152(%rip), %ymm16, %ymm16 + movq 248(%rsp), %rax # 8-byte Reload + .loc 1 6045 13 # optimized_matmul\optimized_matmul_llvm.mlir:6045:13 + leaq (%rax,%r13), %rax + .loc 1 6048 13 # optimized_matmul\optimized_matmul_llvm.mlir:6048:13 + vmovups (%rcx,%rax,4), %ymm17 + .loc 1 6085 13 # optimized_matmul\optimized_matmul_llvm.mlir:6085:13 + vaddps cache_16+1184(%rip), %ymm17, %ymm17 + movq 240(%rsp), %rax # 8-byte Reload + .loc 1 6104 13 # optimized_matmul\optimized_matmul_llvm.mlir:6104:13 + leaq (%rax,%r13), %rax 
+ .loc 1 6107 13 # optimized_matmul\optimized_matmul_llvm.mlir:6107:13 + vmovups (%rcx,%rax,4), %ymm18 + .loc 1 6131 13 # optimized_matmul\optimized_matmul_llvm.mlir:6131:13 + vaddps cache_16+1536(%rip), %ymm18, %ymm18 + movq 232(%rsp), %rax # 8-byte Reload + .loc 1 6150 13 # optimized_matmul\optimized_matmul_llvm.mlir:6150:13 + leaq (%rax,%r13), %rax + .loc 1 6153 13 # optimized_matmul\optimized_matmul_llvm.mlir:6153:13 + vmovups (%rcx,%rax,4), %ymm19 + .loc 1 6190 13 # optimized_matmul\optimized_matmul_llvm.mlir:6190:13 + vaddps cache_16+1568(%rip), %ymm19, %ymm19 + movq 224(%rsp), %rax # 8-byte Reload + .loc 1 6209 13 # optimized_matmul\optimized_matmul_llvm.mlir:6209:13 + leaq (%rax,%r13), %rax + .loc 1 6212 13 # optimized_matmul\optimized_matmul_llvm.mlir:6212:13 + vmovups (%rcx,%rax,4), %ymm20 + .loc 1 6236 13 # optimized_matmul\optimized_matmul_llvm.mlir:6236:13 + vaddps cache_16+1920(%rip), %ymm20, %ymm20 + movq 216(%rsp), %rax # 8-byte Reload + .loc 1 6255 13 # optimized_matmul\optimized_matmul_llvm.mlir:6255:13 + leaq (%rax,%r13), %rax + .loc 1 6258 13 # optimized_matmul\optimized_matmul_llvm.mlir:6258:13 + vmovups (%rcx,%rax,4), %ymm21 + .loc 1 6295 13 # optimized_matmul\optimized_matmul_llvm.mlir:6295:13 + vaddps cache_16+1952(%rip), %ymm21, %ymm21 + movq 208(%rsp), %rax # 8-byte Reload + .loc 1 6314 13 # optimized_matmul\optimized_matmul_llvm.mlir:6314:13 + leaq (%rax,%r13), %rax + .loc 1 6317 13 # optimized_matmul\optimized_matmul_llvm.mlir:6317:13 + vmovups (%rcx,%rax,4), %ymm22 + .loc 1 6341 13 # optimized_matmul\optimized_matmul_llvm.mlir:6341:13 + vaddps cache_16+2304(%rip), %ymm22, %ymm22 + movq 200(%rsp), %rax # 8-byte Reload + .loc 1 6360 13 # optimized_matmul\optimized_matmul_llvm.mlir:6360:13 + leaq (%rax,%r13), %rax + .loc 1 6363 13 # optimized_matmul\optimized_matmul_llvm.mlir:6363:13 + vmovups (%rcx,%rax,4), %ymm23 + .loc 1 6400 13 # optimized_matmul\optimized_matmul_llvm.mlir:6400:13 + vaddps cache_16+2336(%rip), %ymm23, %ymm23 + movq 192(%rsp), %rax # 8-byte Reload + .loc 1 6419 13 # optimized_matmul\optimized_matmul_llvm.mlir:6419:13 + leaq (%rax,%r13), %rax + .loc 1 6422 13 # optimized_matmul\optimized_matmul_llvm.mlir:6422:13 + vmovups (%rcx,%rax,4), %ymm24 + .loc 1 6446 13 # optimized_matmul\optimized_matmul_llvm.mlir:6446:13 + vaddps cache_16+2688(%rip), %ymm24, %ymm24 + movq 184(%rsp), %rax # 8-byte Reload + .loc 1 6465 13 # optimized_matmul\optimized_matmul_llvm.mlir:6465:13 + leaq (%rax,%r13), %rax + .loc 1 6468 13 # optimized_matmul\optimized_matmul_llvm.mlir:6468:13 + vmovups (%rcx,%rax,4), %ymm25 + .loc 1 6505 13 # optimized_matmul\optimized_matmul_llvm.mlir:6505:13 + vaddps cache_16+2720(%rip), %ymm25, %ymm25 + .loc 1 6543 5 # optimized_matmul\optimized_matmul_llvm.mlir:6543:5 + vmovups %ymm0, (%rcx,%r14,4) + vmovups %ymm1, 32(%rcx,%r14,4) + vmovups %ymm2, 64(%rcx,%r14,4) + vmovups %ymm3, 96(%rcx,%r14,4) + vmovups %ymm4, 128(%rcx,%r14,4) + vmovups %ymm5, 160(%rcx,%r14,4) + vmovups %ymm16, 192(%rcx,%r14,4) + vmovups %ymm17, 224(%rcx,%r14,4) + vmovups %ymm18, 256(%rcx,%r14,4) + vmovups %ymm19, 288(%rcx,%r14,4) + vmovups %ymm20, 320(%rcx,%r14,4) + vmovups %ymm21, 352(%rcx,%r14,4) + vmovups %ymm22, 384(%rcx,%r14,4) + vmovups %ymm23, 416(%rcx,%r14,4) + vmovups %ymm24, 448(%rcx,%r14,4) + vmovups %ymm25, 480(%rcx,%r14,4) + movq 176(%rsp), %rax # 8-byte Reload + .loc 1 5667 13 # optimized_matmul\optimized_matmul_llvm.mlir:5667:13 + leaq (%rax,%r13), %rax + .loc 1 5670 13 # optimized_matmul\optimized_matmul_llvm.mlir:5670:13 + vmovups (%rcx,%rax,4), %ymm0 + 
movq 168(%rsp), %rax # 8-byte Reload + .loc 1 5727 13 # optimized_matmul\optimized_matmul_llvm.mlir:5727:13 + leaq (%rax,%r13), %rax + .loc 1 5730 13 # optimized_matmul\optimized_matmul_llvm.mlir:5730:13 + vmovups (%rcx,%rax,4), %ymm1 + movq 160(%rsp), %rax # 8-byte Reload + .loc 1 5789 13 # optimized_matmul\optimized_matmul_llvm.mlir:5789:13 + leaq (%rax,%r13), %rax + .loc 1 5792 13 # optimized_matmul\optimized_matmul_llvm.mlir:5792:13 + vmovups (%rcx,%rax,4), %ymm2 + movq 152(%rsp), %rax # 8-byte Reload + .loc 1 5835 13 # optimized_matmul\optimized_matmul_llvm.mlir:5835:13 + leaq (%rax,%r13), %rax + .loc 1 5838 13 # optimized_matmul\optimized_matmul_llvm.mlir:5838:13 + vmovups (%rcx,%rax,4), %ymm3 + movq 144(%rsp), %rax # 8-byte Reload + .loc 1 5894 13 # optimized_matmul\optimized_matmul_llvm.mlir:5894:13 + leaq (%rax,%r13), %rax + .loc 1 5897 13 # optimized_matmul\optimized_matmul_llvm.mlir:5897:13 + vmovups (%rcx,%rax,4), %ymm4 + movq 136(%rsp), %rax # 8-byte Reload + .loc 1 5940 13 # optimized_matmul\optimized_matmul_llvm.mlir:5940:13 + leaq (%rax,%r13), %rax + .loc 1 5943 13 # optimized_matmul\optimized_matmul_llvm.mlir:5943:13 + vmovups (%rcx,%rax,4), %ymm5 + movq 128(%rsp), %rax # 8-byte Reload + .loc 1 5999 13 # optimized_matmul\optimized_matmul_llvm.mlir:5999:13 + leaq (%rax,%r13), %rax + .loc 1 6002 13 # optimized_matmul\optimized_matmul_llvm.mlir:6002:13 + vmovups (%rcx,%rax,4), %ymm16 + movq 120(%rsp), %rax # 8-byte Reload + .loc 1 6045 13 # optimized_matmul\optimized_matmul_llvm.mlir:6045:13 + leaq (%rax,%r13), %rax + .loc 1 6048 13 # optimized_matmul\optimized_matmul_llvm.mlir:6048:13 + vmovups (%rcx,%rax,4), %ymm17 + movq 112(%rsp), %rax # 8-byte Reload + .loc 1 6104 13 # optimized_matmul\optimized_matmul_llvm.mlir:6104:13 + leaq (%rax,%r13), %rax + .loc 1 6107 13 # optimized_matmul\optimized_matmul_llvm.mlir:6107:13 + vmovups (%rcx,%rax,4), %ymm18 + movq 104(%rsp), %rax # 8-byte Reload + .loc 1 6150 13 # optimized_matmul\optimized_matmul_llvm.mlir:6150:13 + leaq (%rax,%r13), %rax + .loc 1 6153 13 # optimized_matmul\optimized_matmul_llvm.mlir:6153:13 + vmovups (%rcx,%rax,4), %ymm19 + movq 96(%rsp), %rax # 8-byte Reload + .loc 1 6209 13 # optimized_matmul\optimized_matmul_llvm.mlir:6209:13 + leaq (%rax,%r13), %rax + .loc 1 6212 13 # optimized_matmul\optimized_matmul_llvm.mlir:6212:13 + vmovups (%rcx,%rax,4), %ymm20 + movq 88(%rsp), %rax # 8-byte Reload + .loc 1 6255 13 # optimized_matmul\optimized_matmul_llvm.mlir:6255:13 + leaq (%rax,%r13), %rax + .loc 1 6258 13 # optimized_matmul\optimized_matmul_llvm.mlir:6258:13 + vmovups (%rcx,%rax,4), %ymm21 + movq 80(%rsp), %rax # 8-byte Reload + .loc 1 6314 13 # optimized_matmul\optimized_matmul_llvm.mlir:6314:13 + addq %r13, %rax + .loc 1 6317 13 # optimized_matmul\optimized_matmul_llvm.mlir:6317:13 + vmovups (%rcx,%rax,4), %ymm22 + movq 72(%rsp), %rax # 8-byte Reload + .loc 1 6360 13 # optimized_matmul\optimized_matmul_llvm.mlir:6360:13 + addq %r13, %rax + .loc 1 6363 13 # optimized_matmul\optimized_matmul_llvm.mlir:6363:13 + vmovups (%rcx,%rax,4), %ymm23 + movq 64(%rsp), %rax # 8-byte Reload + .loc 1 6419 13 # optimized_matmul\optimized_matmul_llvm.mlir:6419:13 + addq %r13, %rax + .loc 1 6422 13 # optimized_matmul\optimized_matmul_llvm.mlir:6422:13 + vmovups (%rcx,%rax,4), %ymm24 + .loc 1 6465 13 # optimized_matmul\optimized_matmul_llvm.mlir:6465:13 + addq 56(%rsp), %r13 # 8-byte Folded Reload + .loc 1 6468 13 # optimized_matmul\optimized_matmul_llvm.mlir:6468:13 + vmovups (%rcx,%r13,4), %ymm25 + .loc 1 5659 13 # 
optimized_matmul\optimized_matmul_llvm.mlir:5659:13 + leaq (,%r14,4), %rax + orq $512, %rax # imm = 0x200 + .loc 1 5708 13 # optimized_matmul\optimized_matmul_llvm.mlir:5708:13 + vaddps cache_16+3072(%rip), %ymm0, %ymm0 + .loc 1 5770 13 # optimized_matmul\optimized_matmul_llvm.mlir:5770:13 + vaddps cache_16+3104(%rip), %ymm1, %ymm1 + .loc 1 5816 13 # optimized_matmul\optimized_matmul_llvm.mlir:5816:13 + vaddps cache_16+3456(%rip), %ymm2, %ymm2 + .loc 1 5875 13 # optimized_matmul\optimized_matmul_llvm.mlir:5875:13 + vaddps cache_16+3488(%rip), %ymm3, %ymm3 + .loc 1 5921 13 # optimized_matmul\optimized_matmul_llvm.mlir:5921:13 + vaddps cache_16+3840(%rip), %ymm4, %ymm4 + .loc 1 5980 13 # optimized_matmul\optimized_matmul_llvm.mlir:5980:13 + vaddps cache_16+3872(%rip), %ymm5, %ymm5 + .loc 1 6026 13 # optimized_matmul\optimized_matmul_llvm.mlir:6026:13 + vaddps cache_16+4224(%rip), %ymm16, %ymm16 + .loc 1 6085 13 # optimized_matmul\optimized_matmul_llvm.mlir:6085:13 + vaddps cache_16+4256(%rip), %ymm17, %ymm17 + .loc 1 6131 13 # optimized_matmul\optimized_matmul_llvm.mlir:6131:13 + vaddps cache_16+4608(%rip), %ymm18, %ymm18 + .loc 1 6190 13 # optimized_matmul\optimized_matmul_llvm.mlir:6190:13 + vaddps cache_16+4640(%rip), %ymm19, %ymm19 + .loc 1 6236 13 # optimized_matmul\optimized_matmul_llvm.mlir:6236:13 + vaddps cache_16+4992(%rip), %ymm20, %ymm20 + .loc 1 6295 13 # optimized_matmul\optimized_matmul_llvm.mlir:6295:13 + vaddps cache_16+5024(%rip), %ymm21, %ymm21 + .loc 1 6341 13 # optimized_matmul\optimized_matmul_llvm.mlir:6341:13 + vaddps cache_16+5376(%rip), %ymm22, %ymm22 + .loc 1 6400 13 # optimized_matmul\optimized_matmul_llvm.mlir:6400:13 + vaddps cache_16+5408(%rip), %ymm23, %ymm23 + .loc 1 6446 13 # optimized_matmul\optimized_matmul_llvm.mlir:6446:13 + vaddps cache_16+5760(%rip), %ymm24, %ymm24 + .loc 1 6505 13 # optimized_matmul\optimized_matmul_llvm.mlir:6505:13 + vaddps cache_16+5792(%rip), %ymm25, %ymm25 + .loc 1 6543 5 # optimized_matmul\optimized_matmul_llvm.mlir:6543:5 + vmovups %ymm0, (%rcx,%rax) + vmovups %ymm1, 32(%rcx,%rax) + vmovups %ymm2, 64(%rcx,%rax) + vmovups %ymm3, 96(%rcx,%rax) + vmovups %ymm4, 128(%rcx,%rax) + vmovups %ymm5, 160(%rcx,%rax) + vmovups %ymm16, 192(%rcx,%rax) + vmovups %ymm17, 224(%rcx,%rax) + vmovups %ymm18, 256(%rcx,%rax) + vmovups %ymm19, 288(%rcx,%rax) + vmovups %ymm20, 320(%rcx,%rax) + vmovups %ymm21, 352(%rcx,%rax) + vmovups %ymm22, 384(%rcx,%rax) + vmovups %ymm23, 416(%rcx,%rax) + vmovups %ymm24, 448(%rcx,%rax) + vmovups %ymm25, 480(%rcx,%rax) + movq 320(%rsp), %r13 # 8-byte Reload + .loc 1 7694 13 # optimized_matmul\optimized_matmul_llvm.mlir:7694:13 + incq %r13 + .loc 1 2440 5 # optimized_matmul\optimized_matmul_llvm.mlir:2440:5 + addq $512, %r12 # imm = 0x200 + .loc 1 2439 13 # optimized_matmul\optimized_matmul_llvm.mlir:2439:13 + cmpq $784, %r13 # imm = 0x310 + .loc 1 2440 5 # optimized_matmul\optimized_matmul_llvm.mlir:2440:5 + jne .LBB0_4 +# %bb.9: # in Loop: Header=BB0_1 Depth=1 + .loc 1 7697 13 # optimized_matmul\optimized_matmul_llvm.mlir:7697:13 + addq $256, %rbp # imm = 0x100 + movq 40(%rsp), %rax # 8-byte Reload + .loc 1 163 5 # optimized_matmul\optimized_matmul_llvm.mlir:163:5 + incq %rax + movq 48(%rsp), %rcx # 8-byte Reload + addq $1024, %rcx # imm = 0x400 + .loc 1 162 12 # optimized_matmul\optimized_matmul_llvm.mlir:162:12 + cmpq $2, %rax + leaq cache_17(%rip), %rdx + .loc 1 163 5 # optimized_matmul\optimized_matmul_llvm.mlir:163:5 + jne .LBB0_1 +# %bb.10: + .loc 1 7700 5 # optimized_matmul\optimized_matmul_llvm.mlir:7700:5 + 
addq $328, %rsp # imm = 0x148 + popq %rbx + popq %rbp + popq %rdi + popq %rsi + popq %r12 + popq %r13 + popq %r14 + popq %r15 + vzeroupper + retq +.Ltmp1: +.Lfunc_end0: + # -- End function + .def optimized_matmul_py_4a6286d9; + .scl 2; + .type 32; + .endef + .globl __ymm@0000000000000200000000000000020000000000000003100000000000000000 # -- Begin function optimized_matmul_py_4a6286d9 + .section .rdata,"dr",discard,__ymm@0000000000000200000000000000020000000000000003100000000000000000 + .p2align 5 +__ymm@0000000000000200000000000000020000000000000003100000000000000000: + .quad 0 # 0x0 + .quad 784 # 0x310 + .quad 512 # 0x200 + .quad 512 # 0x200 + .globl __ymm@0000000000000200000000000000020000000000000000800000000000000000 + .section .rdata,"dr",discard,__ymm@0000000000000200000000000000020000000000000000800000000000000000 + .p2align 5 +__ymm@0000000000000200000000000000020000000000000000800000000000000000: + .quad 0 # 0x0 + .quad 128 # 0x80 + .quad 512 # 0x200 + .quad 512 # 0x200 + .text + .globl optimized_matmul_py_4a6286d9 + .p2align 4, 0x90 +optimized_matmul_py_4a6286d9: # @optimized_matmul_py_4a6286d9 +.Lfunc_begin1: + .loc 1 7702 0 # optimized_matmul\optimized_matmul_llvm.mlir:7702:0 +# %bb.0: + subq $168, %rsp +.Ltmp2: + .loc 1 7763 5 prologue_end # optimized_matmul\optimized_matmul_llvm.mlir:7763:5 + vmovaps __ymm@0000000000000200000000000000020000000000000003100000000000000000(%rip), %ymm0 # ymm0 = [0,784,512,512] + vmovups %ymm0, 128(%rsp) + movq %r8, 120(%rsp) + vmovaps __ymm@0000000000000200000000000000020000000000000000800000000000000000(%rip), %ymm0 # ymm0 = [0,128,512,512] + movq %r8, 112(%rsp) + vmovups %ymm0, 72(%rsp) + movq %rdx, 64(%rsp) + movq %rdx, 56(%rsp) + movq $1, 160(%rsp) + movq $1, 104(%rsp) + movq $1, 48(%rsp) + movq $128, 40(%rsp) + movq $128, 32(%rsp) + movl $784, %r9d # imm = 0x310 + movq %rcx, %rdx + xorl %r8d, %r8d + vzeroupper + callq optimized_matmul_py_4a6286d9_impl_17630232307017152746 + .loc 1 7764 5 # optimized_matmul\optimized_matmul_llvm.mlir:7764:5 + addq $168, %rsp + retq +.Ltmp3: +.Lfunc_end1: + # -- End function + .lcomm cache_17,131072,32 # @cache_17 + .lcomm cache_16,6144,32 # @cache_16 + .section .debug_abbrev,"dr" +.Lsection_abbrev: + .byte 1 # Abbreviation Code + .byte 17 # DW_TAG_compile_unit + .byte 1 # DW_CHILDREN_yes + .byte 37 # DW_AT_producer + .byte 14 # DW_FORM_strp + .byte 19 # DW_AT_language + .byte 5 # DW_FORM_data2 + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 16 # DW_AT_stmt_list + .byte 23 # DW_FORM_sec_offset + .byte 27 # DW_AT_comp_dir + .byte 14 # DW_FORM_strp + .ascii "\264B" # DW_AT_GNU_pubnames + .byte 25 # DW_FORM_flag_present + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 2 # Abbreviation Code + .byte 46 # DW_TAG_subprogram + .byte 0 # DW_CHILDREN_no + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # DW_FORM_data4 + .byte 64 # DW_AT_frame_base + .byte 24 # DW_FORM_exprloc + .byte 110 # DW_AT_linkage_name + .byte 14 # DW_FORM_strp + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 58 # DW_AT_decl_file + .byte 11 # DW_FORM_data1 + .byte 59 # DW_AT_decl_line + .byte 11 # DW_FORM_data1 + .byte 63 # DW_AT_external + .byte 25 # DW_FORM_flag_present + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 3 # Abbreviation Code + .byte 46 # DW_TAG_subprogram + .byte 0 # DW_CHILDREN_no + .byte 17 # DW_AT_low_pc + .byte 1 # DW_FORM_addr + .byte 18 # DW_AT_high_pc + .byte 6 # 
DW_FORM_data4 + .byte 64 # DW_AT_frame_base + .byte 24 # DW_FORM_exprloc + .byte 110 # DW_AT_linkage_name + .byte 14 # DW_FORM_strp + .byte 3 # DW_AT_name + .byte 14 # DW_FORM_strp + .byte 58 # DW_AT_decl_file + .byte 11 # DW_FORM_data1 + .byte 59 # DW_AT_decl_line + .byte 5 # DW_FORM_data2 + .byte 63 # DW_AT_external + .byte 25 # DW_FORM_flag_present + .byte 0 # EOM(1) + .byte 0 # EOM(2) + .byte 0 # EOM(3) + .section .debug_info,"dr" +.Lsection_info: +.Lcu_begin0: + .long .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit +.Ldebug_info_start0: + .short 4 # DWARF version number + .secrel32 .Lsection_abbrev # Offset Into Abbrev. Section + .byte 8 # Address Size (in bytes) + .byte 1 # Abbrev [1] 0xb:0x53 DW_TAG_compile_unit + .secrel32 .Linfo_string0 # DW_AT_producer + .short 2 # DW_AT_language + .secrel32 .Linfo_string1 # DW_AT_name + .secrel32 .Lline_table_start0 # DW_AT_stmt_list + .secrel32 .Linfo_string2 # DW_AT_comp_dir + # DW_AT_GNU_pubnames + .quad .Lfunc_begin0 # DW_AT_low_pc + .long .Lfunc_end1-.Lfunc_begin0 # DW_AT_high_pc + .byte 2 # Abbrev [2] 0x2a:0x19 DW_TAG_subprogram + .quad .Lfunc_begin0 # DW_AT_low_pc + .long .Lfunc_end0-.Lfunc_begin0 # DW_AT_high_pc + .byte 1 # DW_AT_frame_base + .byte 87 + .secrel32 .Linfo_string3 # DW_AT_linkage_name + .secrel32 .Linfo_string3 # DW_AT_name + .byte 1 # DW_AT_decl_file + .byte 8 # DW_AT_decl_line + # DW_AT_external + .byte 3 # Abbrev [3] 0x43:0x1a DW_TAG_subprogram + .quad .Lfunc_begin1 # DW_AT_low_pc + .long .Lfunc_end1-.Lfunc_begin1 # DW_AT_high_pc + .byte 1 # DW_AT_frame_base + .byte 87 + .secrel32 .Linfo_string4 # DW_AT_linkage_name + .secrel32 .Linfo_string4 # DW_AT_name + .byte 1 # DW_AT_decl_file + .short 7702 # DW_AT_decl_line + # DW_AT_external + .byte 0 # End Of Children Mark +.Ldebug_info_end0: + .section .debug_str,"dr" +.Linfo_string: +.Linfo_string0: + .asciz "mlir" # string offset=0 +.Linfo_string1: + .asciz "LLVMDialectModule" # string offset=5 +.Linfo_string2: + .asciz "/" # string offset=23 +.Linfo_string3: + .asciz "optimized_matmul_py_4a6286d9_impl_17630232307017152746" # string offset=25 +.Linfo_string4: + .asciz "optimized_matmul_py_4a6286d9" # string offset=80 + .section .debug_pubnames,"dr" + .long .LpubNames_end0-.LpubNames_begin0 # Length of Public Names Info +.LpubNames_begin0: + .short 2 # DWARF Version + .secrel32 .Lcu_begin0 # Offset of Compilation Unit Info + .long 94 # Compilation Unit Length + .long 67 # DIE offset + .asciz "optimized_matmul_py_4a6286d9" # External Name + .long 42 # DIE offset + .asciz "optimized_matmul_py_4a6286d9_impl_17630232307017152746" # External Name + .long 0 # End Mark +.LpubNames_end0: + .section .debug_pubtypes,"dr" + .long .LpubTypes_end0-.LpubTypes_begin0 # Length of Public Types Info +.LpubTypes_begin0: + .short 2 # DWARF Version + .secrel32 .Lcu_begin0 # Offset of Compilation Unit Info + .long 94 # Compilation Unit Length + .long 0 # End Mark +.LpubTypes_end0: + .globl _fltused + .section .debug_line,"dr" +.Lsection_line: +.Lline_table_start0: diff --git a/Tutorials/optimized_matmul/mlir/optimized_matmul_llvm.mlir b/Tutorials/optimized_matmul/mlir/optimized_matmul_llvm.mlir new file mode 100644 index 00000000..66233fd7 --- /dev/null +++ b/Tutorials/optimized_matmul/mlir/optimized_matmul_llvm.mlir @@ -0,0 +1,7766 @@ + + + + +module @optimized_matmul attributes {llvm.data_layout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"} { + llvm.mlir.global internal @cache_17() : !llvm.array<4096 x vec<8 x float>> + llvm.mlir.global internal @cache_16() : 
!llvm.array<192 x vec<8 x float>> + llvm.func @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i64, %arg6: !llvm.i64, %arg7: !llvm.ptr, %arg8: !llvm.ptr, %arg9: !llvm.i64, %arg10: !llvm.i64, %arg11: !llvm.i64, %arg12: !llvm.i64, %arg13: !llvm.i64, %arg14: !llvm.ptr, %arg15: !llvm.ptr, %arg16: !llvm.i64, %arg17: !llvm.i64, %arg18: !llvm.i64, %arg19: !llvm.i64, %arg20: !llvm.i64) attributes {exec_target = 0 : i64, sym_visibility = "nested"} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.insertvalue %arg7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %10 = llvm.insertvalue %arg8, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.insertvalue %arg9, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %12 = llvm.insertvalue %arg10, %11[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.insertvalue %arg12, %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg11, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg13, %14[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %17 = llvm.insertvalue %arg14, %16[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.insertvalue %arg15, %17[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %19 = llvm.insertvalue %arg16, %18[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.insertvalue %arg17, %19[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %21 = llvm.insertvalue %arg19, %20[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.insertvalue %arg18, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %23 = llvm.insertvalue %arg20, %22[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(0 : i64) : !llvm.i64 + %25 = llvm.mlir.constant(1 : i64) : !llvm.i64 + %26 = llvm.mlir.constant(2 : i64) : !llvm.i64 + %27 = llvm.mlir.constant(3 : i64) : !llvm.i64 + %28 = llvm.mlir.constant(4 : i64) : !llvm.i64 + %29 = llvm.mlir.constant(5 : i64) : !llvm.i64 + %30 = llvm.mlir.constant(6 : i64) : !llvm.i64 + %31 = llvm.mlir.constant(7 : i64) : !llvm.i64 + %32 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %33 = llvm.mlir.constant(10 : index) : !llvm.i64 + %34 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %35 = llvm.mlir.constant(14 : index) : !llvm.i64 + %36 = llvm.mlir.constant(512 : index) : !llvm.i64 + %37 = llvm.mlir.constant(784 : index) : !llvm.i64 + %38 = llvm.mlir.constant(256 : index) : !llvm.i64 + %39 = llvm.mlir.constant(128 : index) : !llvm.i64 + %40 = llvm.mlir.constant(true) : !llvm.i1 + %41 = llvm.mlir.constant(24 : index) : !llvm.i64 + %42 = llvm.mlir.constant(32 : index) : !llvm.i64 + %43 = llvm.mlir.constant(40 : index) : !llvm.i64 + %44 = llvm.mlir.constant(48 : index) : !llvm.i64 + %45 = llvm.mlir.constant(3 : index) : !llvm.i64 + %46 = llvm.mlir.constant(56 : index) : !llvm.i64 + %47 = llvm.mlir.constant(64 : index) : !llvm.i64 + %48 = llvm.mlir.constant(4 : index) : !llvm.i64 + %49 = llvm.mlir.constant(72 : index) : !llvm.i64 + %50 = llvm.mlir.constant(9 : index) : !llvm.i64 + %51 = llvm.mlir.constant(80 : index) : !llvm.i64 + %52 = llvm.mlir.constant(5 : index) : !llvm.i64 + %53 = llvm.mlir.constant(88 : index) : !llvm.i64 + %54 = llvm.mlir.constant(11 : index) : !llvm.i64 + %55 = llvm.mlir.constant(96 : index) : !llvm.i64 + %56 = llvm.mlir.constant(6 : index) : !llvm.i64 + %57 = llvm.mlir.constant(104 : index) : !llvm.i64 + %58 = llvm.mlir.constant(13 : index) : !llvm.i64 + %59 = llvm.mlir.constant(112 : index) : !llvm.i64 + %60 = llvm.mlir.constant(-16 : index) : !llvm.i64 + %61 = llvm.mlir.constant(7 : index) : !llvm.i64 + %62 = llvm.mlir.constant(120 : index) : !llvm.i64 + %63 = llvm.mlir.constant(2 : index) : !llvm.i64 + %64 = llvm.mlir.constant(-1 : index) : !llvm.i64 + %65 = llvm.mlir.constant(-2 : index) : !llvm.i64 + %66 = llvm.mlir.constant(15 : index) : !llvm.i64 + %67 = llvm.mlir.constant(0 : index) : !llvm.i64 + %68 = llvm.mlir.constant(16 : index) : !llvm.i64 + %69 = llvm.mlir.constant(1 : index) : !llvm.i64 + %70 = llvm.mlir.constant(8 : index) : !llvm.i64 + %71 = llvm.mlir.constant(1 : index) : !llvm.i64 + %72 = llvm.mlir.constant(16 : index) : !llvm.i64 + %73 = llvm.mul %71, %72 : !llvm.i64 + %74 = llvm.mlir.null : !llvm.ptr> + %75 = llvm.mlir.constant(1 : index) : !llvm.i64 + %76 = llvm.getelementptr %74[%75] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %77 = llvm.ptrtoint %76 : !llvm.ptr> to !llvm.i64 + %78 = llvm.mul %73, %77 : !llvm.i64 + %79 = llvm.alloca %78 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %80 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %81 = llvm.insertvalue %79, %80[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %82 = llvm.insertvalue %79, %81[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %83 = llvm.mlir.constant(0 : index) : !llvm.i64 + %84 = llvm.insertvalue %83, %82[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %85 = llvm.mlir.constant(1 : index) : !llvm.i64 + %86 = llvm.mlir.constant(16 : index) : !llvm.i64 + %87 = llvm.insertvalue %71, %84[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %88 = llvm.insertvalue %86, %87[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %89 = llvm.insertvalue %72, %88[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %90 = llvm.insertvalue %85, %89[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %91 = llvm.mlir.constant(1 : index) : !llvm.i64 + %92 = llvm.mlir.constant(16 : index) : !llvm.i64 + %93 = llvm.mul %91, %92 : !llvm.i64 + %94 = llvm.mlir.null : !llvm.ptr> + %95 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %96 = llvm.getelementptr %94[%95] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %97 = llvm.ptrtoint %96 : !llvm.ptr> to !llvm.i64 + %98 = llvm.mul %93, %97 : !llvm.i64 + %99 = llvm.alloca %98 x !llvm.vec<8 x float> {alignment = 32 : i64} : (!llvm.i64) -> !llvm.ptr> + %100 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %101 = llvm.insertvalue %99, %100[0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %102 = llvm.insertvalue %99, %101[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %103 = llvm.mlir.constant(0 : index) : !llvm.i64 + %104 = llvm.insertvalue %103, %102[2] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %105 = llvm.mlir.constant(1 : index) : !llvm.i64 + %106 = llvm.mlir.constant(16 : index) : !llvm.i64 + %107 = llvm.insertvalue %91, %104[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %108 = llvm.insertvalue %106, %107[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %109 = llvm.insertvalue %92, %108[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %110 = llvm.insertvalue %105, %109[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %111 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %112 = llvm.mlir.addressof @cache_16 : !llvm.ptr>> + %113 = llvm.getelementptr %112[%111, %111] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %114 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %115 = llvm.insertvalue %113, %114[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %116 = llvm.insertvalue %113, %115[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %117 = llvm.mlir.constant(0 : index) : !llvm.i64 + %118 = llvm.insertvalue %117, %116[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %119 = llvm.mlir.constant(16 : index) : !llvm.i64 + %120 = llvm.insertvalue %119, %118[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %121 = llvm.mlir.constant(12 : index) : !llvm.i64 + %122 = llvm.insertvalue %121, %120[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %123 = llvm.mlir.constant(6 : index) : !llvm.i64 + %124 = llvm.insertvalue %123, %122[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %126 = llvm.insertvalue %125, %124[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %127 = llvm.mlir.constant(2 : index) : !llvm.i64 + %128 = llvm.insertvalue %127, %126[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %129 = llvm.mlir.constant(1 : index) : !llvm.i64 + %130 = llvm.insertvalue %129, %128[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %131 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %132 = llvm.mlir.addressof @cache_17 : !llvm.ptr>> + %133 = llvm.getelementptr %132[%131, %131] : (!llvm.ptr>>, !llvm.i32, !llvm.i32) -> !llvm.ptr> + %134 = llvm.mlir.undef : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %135 = llvm.insertvalue %133, %134[0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %136 = llvm.insertvalue %133, %135[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %137 = llvm.mlir.constant(0 : index) : !llvm.i64 + %138 = llvm.insertvalue %137, %136[2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + 
%139 = llvm.mlir.constant(16 : index) : !llvm.i64 + %140 = llvm.insertvalue %139, %138[3, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %141 = llvm.mlir.constant(256 : index) : !llvm.i64 + %142 = llvm.insertvalue %141, %140[4, 0] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %143 = llvm.mlir.constant(128 : index) : !llvm.i64 + %144 = llvm.insertvalue %143, %142[3, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %145 = llvm.mlir.constant(2 : index) : !llvm.i64 + %146 = llvm.insertvalue %145, %144[4, 1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %147 = llvm.mlir.constant(2 : index) : !llvm.i64 + %148 = llvm.insertvalue %147, %146[3, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %150 = llvm.insertvalue %149, %148[4, 2] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + llvm.br ^bb1(%67 : !llvm.i64) + ^bb1(%151: !llvm.i64): // 2 preds: ^bb0, ^bb52 + %152 = llvm.icmp "slt" %151, %36 : !llvm.i64 + llvm.cond_br %152, ^bb2, ^bb53 + ^bb2: // pred: ^bb1 + llvm.br ^bb3(%67 : !llvm.i64) + ^bb3(%153: !llvm.i64): // 2 preds: ^bb2, ^bb10 + %154 = llvm.icmp "slt" %153, %39 : !llvm.i64 + llvm.cond_br %154, ^bb4, ^bb11 + ^bb4: // pred: ^bb3 + llvm.br ^bb5(%67 : !llvm.i64) + ^bb5(%155: !llvm.i64): // 2 preds: ^bb4, ^bb9 + %156 = llvm.icmp "slt" %155, %38 : !llvm.i64 + llvm.cond_br %156, ^bb6, ^bb10 + ^bb6: // pred: ^bb5 + llvm.cond_br %40, ^bb7, ^bb8 + ^bb7: // pred: ^bb6 + %157 = llvm.add %151, %155 : !llvm.i64 + %158 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %159 = llvm.mlir.constant(0 : index) : !llvm.i64 + %160 = llvm.mlir.constant(512 : index) : !llvm.i64 + %161 = llvm.mul %153, %160 : !llvm.i64 + %162 = llvm.add %159, %161 : !llvm.i64 + %163 = llvm.mlir.constant(1 : index) : !llvm.i64 + %164 = llvm.mul %157, %163 : !llvm.i64 + %165 = llvm.add %162, %164 : !llvm.i64 + %166 = llvm.getelementptr %158[%165] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %167 = llvm.bitcast %166 : !llvm.ptr to !llvm.ptr> + %168 = llvm.load %167 {alignment = 4 : i64} : !llvm.ptr> + %169 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %170 = llvm.mlir.constant(0 : index) : !llvm.i64 + %171 = llvm.mlir.constant(16 : index) : !llvm.i64 + %172 = llvm.mul %67, %171 : !llvm.i64 + %173 = llvm.add %170, %172 : !llvm.i64 + %174 = llvm.mlir.constant(1 : index) : !llvm.i64 + %175 = llvm.mul %67, %174 : !llvm.i64 + %176 = llvm.add %173, %175 : !llvm.i64 + %177 = llvm.getelementptr %169[%176] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %168, %177 : !llvm.ptr> + %178 = llvm.add %157, %70 : !llvm.i64 + %179 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %180 = llvm.mlir.constant(0 : index) : !llvm.i64 + %181 = llvm.mlir.constant(512 : index) : !llvm.i64 + %182 = llvm.mul %153, %181 : !llvm.i64 + %183 = llvm.add %180, %182 : !llvm.i64 + %184 = llvm.mlir.constant(1 : index) : !llvm.i64 + %185 = llvm.mul %178, %184 : !llvm.i64 + %186 = llvm.add %183, %185 : !llvm.i64 + %187 = llvm.getelementptr %179[%186] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %188 = llvm.bitcast %187 : !llvm.ptr to !llvm.ptr> + %189 = llvm.load %188 {alignment = 4 : i64} : !llvm.ptr> + %190 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %192 = 
llvm.mlir.constant(16 : index) : !llvm.i64 + %193 = llvm.mul %67, %192 : !llvm.i64 + %194 = llvm.add %191, %193 : !llvm.i64 + %195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %196 = llvm.mul %69, %195 : !llvm.i64 + %197 = llvm.add %194, %196 : !llvm.i64 + %198 = llvm.getelementptr %190[%197] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %189, %198 : !llvm.ptr> + %199 = llvm.add %157, %68 : !llvm.i64 + %200 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %202 = llvm.mlir.constant(512 : index) : !llvm.i64 + %203 = llvm.mul %153, %202 : !llvm.i64 + %204 = llvm.add %201, %203 : !llvm.i64 + %205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %206 = llvm.mul %199, %205 : !llvm.i64 + %207 = llvm.add %204, %206 : !llvm.i64 + %208 = llvm.getelementptr %200[%207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %209 = llvm.bitcast %208 : !llvm.ptr to !llvm.ptr> + %210 = llvm.load %209 {alignment = 4 : i64} : !llvm.ptr> + %211 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %212 = llvm.mlir.constant(0 : index) : !llvm.i64 + %213 = llvm.mlir.constant(16 : index) : !llvm.i64 + %214 = llvm.mul %67, %213 : !llvm.i64 + %215 = llvm.add %212, %214 : !llvm.i64 + %216 = llvm.mlir.constant(1 : index) : !llvm.i64 + %217 = llvm.mul %63, %216 : !llvm.i64 + %218 = llvm.add %215, %217 : !llvm.i64 + %219 = llvm.getelementptr %211[%218] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %210, %219 : !llvm.ptr> + %220 = llvm.add %157, %41 : !llvm.i64 + %221 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %223 = llvm.mlir.constant(512 : index) : !llvm.i64 + %224 = llvm.mul %153, %223 : !llvm.i64 + %225 = llvm.add %222, %224 : !llvm.i64 + %226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %227 = llvm.mul %220, %226 : !llvm.i64 + %228 = llvm.add %225, %227 : !llvm.i64 + %229 = llvm.getelementptr %221[%228] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %230 = llvm.bitcast %229 : !llvm.ptr to !llvm.ptr> + %231 = llvm.load %230 {alignment = 4 : i64} : !llvm.ptr> + %232 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %233 = llvm.mlir.constant(0 : index) : !llvm.i64 + %234 = llvm.mlir.constant(16 : index) : !llvm.i64 + %235 = llvm.mul %67, %234 : !llvm.i64 + %236 = llvm.add %233, %235 : !llvm.i64 + %237 = llvm.mlir.constant(1 : index) : !llvm.i64 + %238 = llvm.mul %45, %237 : !llvm.i64 + %239 = llvm.add %236, %238 : !llvm.i64 + %240 = llvm.getelementptr %232[%239] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %231, %240 : !llvm.ptr> + %241 = llvm.add %157, %42 : !llvm.i64 + %242 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %243 = llvm.mlir.constant(0 : index) : !llvm.i64 + %244 = llvm.mlir.constant(512 : index) : !llvm.i64 + %245 = llvm.mul %153, %244 : !llvm.i64 + %246 = llvm.add %243, %245 : !llvm.i64 + %247 = llvm.mlir.constant(1 : index) : !llvm.i64 + %248 = llvm.mul %241, %247 : !llvm.i64 + %249 = llvm.add %246, %248 : !llvm.i64 + %250 = llvm.getelementptr %242[%249] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %251 = llvm.bitcast %250 : !llvm.ptr to !llvm.ptr> + %252 = llvm.load %251 {alignment = 4 : i64} : !llvm.ptr> + %253 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %254 = llvm.mlir.constant(0 : index) : !llvm.i64 + %255 = 
llvm.mlir.constant(16 : index) : !llvm.i64 + %256 = llvm.mul %67, %255 : !llvm.i64 + %257 = llvm.add %254, %256 : !llvm.i64 + %258 = llvm.mlir.constant(1 : index) : !llvm.i64 + %259 = llvm.mul %48, %258 : !llvm.i64 + %260 = llvm.add %257, %259 : !llvm.i64 + %261 = llvm.getelementptr %253[%260] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %252, %261 : !llvm.ptr> + %262 = llvm.add %157, %43 : !llvm.i64 + %263 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %264 = llvm.mlir.constant(0 : index) : !llvm.i64 + %265 = llvm.mlir.constant(512 : index) : !llvm.i64 + %266 = llvm.mul %153, %265 : !llvm.i64 + %267 = llvm.add %264, %266 : !llvm.i64 + %268 = llvm.mlir.constant(1 : index) : !llvm.i64 + %269 = llvm.mul %262, %268 : !llvm.i64 + %270 = llvm.add %267, %269 : !llvm.i64 + %271 = llvm.getelementptr %263[%270] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %272 = llvm.bitcast %271 : !llvm.ptr to !llvm.ptr> + %273 = llvm.load %272 {alignment = 4 : i64} : !llvm.ptr> + %274 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %275 = llvm.mlir.constant(0 : index) : !llvm.i64 + %276 = llvm.mlir.constant(16 : index) : !llvm.i64 + %277 = llvm.mul %67, %276 : !llvm.i64 + %278 = llvm.add %275, %277 : !llvm.i64 + %279 = llvm.mlir.constant(1 : index) : !llvm.i64 + %280 = llvm.mul %52, %279 : !llvm.i64 + %281 = llvm.add %278, %280 : !llvm.i64 + %282 = llvm.getelementptr %274[%281] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %273, %282 : !llvm.ptr> + %283 = llvm.add %157, %44 : !llvm.i64 + %284 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %286 = llvm.mlir.constant(512 : index) : !llvm.i64 + %287 = llvm.mul %153, %286 : !llvm.i64 + %288 = llvm.add %285, %287 : !llvm.i64 + %289 = llvm.mlir.constant(1 : index) : !llvm.i64 + %290 = llvm.mul %283, %289 : !llvm.i64 + %291 = llvm.add %288, %290 : !llvm.i64 + %292 = llvm.getelementptr %284[%291] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %293 = llvm.bitcast %292 : !llvm.ptr to !llvm.ptr> + %294 = llvm.load %293 {alignment = 4 : i64} : !llvm.ptr> + %295 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %296 = llvm.mlir.constant(0 : index) : !llvm.i64 + %297 = llvm.mlir.constant(16 : index) : !llvm.i64 + %298 = llvm.mul %67, %297 : !llvm.i64 + %299 = llvm.add %296, %298 : !llvm.i64 + %300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %301 = llvm.mul %56, %300 : !llvm.i64 + %302 = llvm.add %299, %301 : !llvm.i64 + %303 = llvm.getelementptr %295[%302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %294, %303 : !llvm.ptr> + %304 = llvm.add %157, %46 : !llvm.i64 + %305 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %306 = llvm.mlir.constant(0 : index) : !llvm.i64 + %307 = llvm.mlir.constant(512 : index) : !llvm.i64 + %308 = llvm.mul %153, %307 : !llvm.i64 + %309 = llvm.add %306, %308 : !llvm.i64 + %310 = llvm.mlir.constant(1 : index) : !llvm.i64 + %311 = llvm.mul %304, %310 : !llvm.i64 + %312 = llvm.add %309, %311 : !llvm.i64 + %313 = llvm.getelementptr %305[%312] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %314 = llvm.bitcast %313 : !llvm.ptr to !llvm.ptr> + %315 = llvm.load %314 {alignment = 4 : i64} : !llvm.ptr> + %316 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %317 = llvm.mlir.constant(0 : index) : !llvm.i64 + %318 = 
llvm.mlir.constant(16 : index) : !llvm.i64 + %319 = llvm.mul %67, %318 : !llvm.i64 + %320 = llvm.add %317, %319 : !llvm.i64 + %321 = llvm.mlir.constant(1 : index) : !llvm.i64 + %322 = llvm.mul %61, %321 : !llvm.i64 + %323 = llvm.add %320, %322 : !llvm.i64 + %324 = llvm.getelementptr %316[%323] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %315, %324 : !llvm.ptr> + %325 = llvm.add %157, %47 : !llvm.i64 + %326 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %328 = llvm.mlir.constant(512 : index) : !llvm.i64 + %329 = llvm.mul %153, %328 : !llvm.i64 + %330 = llvm.add %327, %329 : !llvm.i64 + %331 = llvm.mlir.constant(1 : index) : !llvm.i64 + %332 = llvm.mul %325, %331 : !llvm.i64 + %333 = llvm.add %330, %332 : !llvm.i64 + %334 = llvm.getelementptr %326[%333] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %335 = llvm.bitcast %334 : !llvm.ptr to !llvm.ptr> + %336 = llvm.load %335 {alignment = 4 : i64} : !llvm.ptr> + %337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %340 = llvm.mul %67, %339 : !llvm.i64 + %341 = llvm.add %338, %340 : !llvm.i64 + %342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %343 = llvm.mul %70, %342 : !llvm.i64 + %344 = llvm.add %341, %343 : !llvm.i64 + %345 = llvm.getelementptr %337[%344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %336, %345 : !llvm.ptr> + %346 = llvm.add %157, %49 : !llvm.i64 + %347 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %350 = llvm.mul %153, %349 : !llvm.i64 + %351 = llvm.add %348, %350 : !llvm.i64 + %352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %353 = llvm.mul %346, %352 : !llvm.i64 + %354 = llvm.add %351, %353 : !llvm.i64 + %355 = llvm.getelementptr %347[%354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %356 = llvm.bitcast %355 : !llvm.ptr to !llvm.ptr> + %357 = llvm.load %356 {alignment = 4 : i64} : !llvm.ptr> + %358 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %360 = llvm.mlir.constant(16 : index) : !llvm.i64 + %361 = llvm.mul %67, %360 : !llvm.i64 + %362 = llvm.add %359, %361 : !llvm.i64 + %363 = llvm.mlir.constant(1 : index) : !llvm.i64 + %364 = llvm.mul %50, %363 : !llvm.i64 + %365 = llvm.add %362, %364 : !llvm.i64 + %366 = llvm.getelementptr %358[%365] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %357, %366 : !llvm.ptr> + %367 = llvm.add %157, %51 : !llvm.i64 + %368 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %369 = llvm.mlir.constant(0 : index) : !llvm.i64 + %370 = llvm.mlir.constant(512 : index) : !llvm.i64 + %371 = llvm.mul %153, %370 : !llvm.i64 + %372 = llvm.add %369, %371 : !llvm.i64 + %373 = llvm.mlir.constant(1 : index) : !llvm.i64 + %374 = llvm.mul %367, %373 : !llvm.i64 + %375 = llvm.add %372, %374 : !llvm.i64 + %376 = llvm.getelementptr %368[%375] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %377 = llvm.bitcast %376 : !llvm.ptr to !llvm.ptr> + %378 = llvm.load %377 {alignment = 4 : i64} : !llvm.ptr> + %379 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %380 = llvm.mlir.constant(0 : index) : !llvm.i64 + %381 = 
llvm.mlir.constant(16 : index) : !llvm.i64 + %382 = llvm.mul %67, %381 : !llvm.i64 + %383 = llvm.add %380, %382 : !llvm.i64 + %384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %385 = llvm.mul %33, %384 : !llvm.i64 + %386 = llvm.add %383, %385 : !llvm.i64 + %387 = llvm.getelementptr %379[%386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %378, %387 : !llvm.ptr> + %388 = llvm.add %157, %53 : !llvm.i64 + %389 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %390 = llvm.mlir.constant(0 : index) : !llvm.i64 + %391 = llvm.mlir.constant(512 : index) : !llvm.i64 + %392 = llvm.mul %153, %391 : !llvm.i64 + %393 = llvm.add %390, %392 : !llvm.i64 + %394 = llvm.mlir.constant(1 : index) : !llvm.i64 + %395 = llvm.mul %388, %394 : !llvm.i64 + %396 = llvm.add %393, %395 : !llvm.i64 + %397 = llvm.getelementptr %389[%396] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %398 = llvm.bitcast %397 : !llvm.ptr to !llvm.ptr> + %399 = llvm.load %398 {alignment = 4 : i64} : !llvm.ptr> + %400 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %403 = llvm.mul %67, %402 : !llvm.i64 + %404 = llvm.add %401, %403 : !llvm.i64 + %405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %406 = llvm.mul %54, %405 : !llvm.i64 + %407 = llvm.add %404, %406 : !llvm.i64 + %408 = llvm.getelementptr %400[%407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %399, %408 : !llvm.ptr> + %409 = llvm.add %157, %55 : !llvm.i64 + %410 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %413 = llvm.mul %153, %412 : !llvm.i64 + %414 = llvm.add %411, %413 : !llvm.i64 + %415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %416 = llvm.mul %409, %415 : !llvm.i64 + %417 = llvm.add %414, %416 : !llvm.i64 + %418 = llvm.getelementptr %410[%417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %419 = llvm.bitcast %418 : !llvm.ptr to !llvm.ptr> + %420 = llvm.load %419 {alignment = 4 : i64} : !llvm.ptr> + %421 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %423 = llvm.mlir.constant(16 : index) : !llvm.i64 + %424 = llvm.mul %67, %423 : !llvm.i64 + %425 = llvm.add %422, %424 : !llvm.i64 + %426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %427 = llvm.mul %34, %426 : !llvm.i64 + %428 = llvm.add %425, %427 : !llvm.i64 + %429 = llvm.getelementptr %421[%428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %420, %429 : !llvm.ptr> + %430 = llvm.add %157, %57 : !llvm.i64 + %431 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %433 = llvm.mlir.constant(512 : index) : !llvm.i64 + %434 = llvm.mul %153, %433 : !llvm.i64 + %435 = llvm.add %432, %434 : !llvm.i64 + %436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %437 = llvm.mul %430, %436 : !llvm.i64 + %438 = llvm.add %435, %437 : !llvm.i64 + %439 = llvm.getelementptr %431[%438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %440 = llvm.bitcast %439 : !llvm.ptr to !llvm.ptr> + %441 = llvm.load %440 {alignment = 4 : i64} : !llvm.ptr> + %442 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %443 = llvm.mlir.constant(0 : index) : !llvm.i64 + %444 = 
llvm.mlir.constant(16 : index) : !llvm.i64 + %445 = llvm.mul %67, %444 : !llvm.i64 + %446 = llvm.add %443, %445 : !llvm.i64 + %447 = llvm.mlir.constant(1 : index) : !llvm.i64 + %448 = llvm.mul %58, %447 : !llvm.i64 + %449 = llvm.add %446, %448 : !llvm.i64 + %450 = llvm.getelementptr %442[%449] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %441, %450 : !llvm.ptr> + %451 = llvm.add %157, %59 : !llvm.i64 + %452 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %453 = llvm.mlir.constant(0 : index) : !llvm.i64 + %454 = llvm.mlir.constant(512 : index) : !llvm.i64 + %455 = llvm.mul %153, %454 : !llvm.i64 + %456 = llvm.add %453, %455 : !llvm.i64 + %457 = llvm.mlir.constant(1 : index) : !llvm.i64 + %458 = llvm.mul %451, %457 : !llvm.i64 + %459 = llvm.add %456, %458 : !llvm.i64 + %460 = llvm.getelementptr %452[%459] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %461 = llvm.bitcast %460 : !llvm.ptr to !llvm.ptr> + %462 = llvm.load %461 {alignment = 4 : i64} : !llvm.ptr> + %463 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %464 = llvm.mlir.constant(0 : index) : !llvm.i64 + %465 = llvm.mlir.constant(16 : index) : !llvm.i64 + %466 = llvm.mul %67, %465 : !llvm.i64 + %467 = llvm.add %464, %466 : !llvm.i64 + %468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %469 = llvm.mul %35, %468 : !llvm.i64 + %470 = llvm.add %467, %469 : !llvm.i64 + %471 = llvm.getelementptr %463[%470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %462, %471 : !llvm.ptr> + %472 = llvm.add %157, %62 : !llvm.i64 + %473 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %474 = llvm.mlir.constant(0 : index) : !llvm.i64 + %475 = llvm.mlir.constant(512 : index) : !llvm.i64 + %476 = llvm.mul %153, %475 : !llvm.i64 + %477 = llvm.add %474, %476 : !llvm.i64 + %478 = llvm.mlir.constant(1 : index) : !llvm.i64 + %479 = llvm.mul %472, %478 : !llvm.i64 + %480 = llvm.add %477, %479 : !llvm.i64 + %481 = llvm.getelementptr %473[%480] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %482 = llvm.bitcast %481 : !llvm.ptr to !llvm.ptr> + %483 = llvm.load %482 {alignment = 4 : i64} : !llvm.ptr> + %484 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %485 = llvm.mlir.constant(0 : index) : !llvm.i64 + %486 = llvm.mlir.constant(16 : index) : !llvm.i64 + %487 = llvm.mul %67, %486 : !llvm.i64 + %488 = llvm.add %485, %487 : !llvm.i64 + %489 = llvm.mlir.constant(1 : index) : !llvm.i64 + %490 = llvm.mul %66, %489 : !llvm.i64 + %491 = llvm.add %488, %490 : !llvm.i64 + %492 = llvm.getelementptr %484[%491] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %483, %492 : !llvm.ptr> + %493 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %494 = llvm.mlir.constant(0 : index) : !llvm.i64 + %495 = llvm.mlir.constant(16 : index) : !llvm.i64 + %496 = llvm.mul %67, %495 : !llvm.i64 + %497 = llvm.add %494, %496 : !llvm.i64 + %498 = llvm.mlir.constant(1 : index) : !llvm.i64 + %499 = llvm.mul %67, %498 : !llvm.i64 + %500 = llvm.add %497, %499 : !llvm.i64 + %501 = llvm.getelementptr %493[%500] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %502 = llvm.load %501 : !llvm.ptr> + %503 = llvm.icmp "slt" %155, %67 : !llvm.i64 + %504 = llvm.sub %64, %155 : !llvm.i64 + %505 = llvm.select %503, %504, %155 : !llvm.i1, !llvm.i64 + %506 = llvm.sdiv %505, %68 : !llvm.i64 + %507 = llvm.sub %64, %506 : !llvm.i64 + %508 = llvm.select %503, %507, %506 : !llvm.i1, !llvm.i64 + %509 
= llvm.srem %508, %68 : !llvm.i64 + %510 = llvm.icmp "slt" %509, %67 : !llvm.i64 + %511 = llvm.add %509, %68 : !llvm.i64 + %512 = llvm.select %510, %511, %509 : !llvm.i1, !llvm.i64 + %513 = llvm.srem %153, %39 : !llvm.i64 + %514 = llvm.icmp "slt" %513, %67 : !llvm.i64 + %515 = llvm.add %513, %39 : !llvm.i64 + %516 = llvm.select %514, %515, %513 : !llvm.i1, !llvm.i64 + %517 = llvm.srem %155, %68 : !llvm.i64 + %518 = llvm.icmp "slt" %517, %67 : !llvm.i64 + %519 = llvm.add %517, %68 : !llvm.i64 + %520 = llvm.select %518, %519, %517 : !llvm.i1, !llvm.i64 + %521 = llvm.icmp "slt" %520, %67 : !llvm.i64 + %522 = llvm.sub %64, %520 : !llvm.i64 + %523 = llvm.select %521, %522, %520 : !llvm.i1, !llvm.i64 + %524 = llvm.sdiv %523, %70 : !llvm.i64 + %525 = llvm.sub %64, %524 : !llvm.i64 + %526 = llvm.select %521, %525, %524 : !llvm.i1, !llvm.i64 + %527 = llvm.srem %526, %63 : !llvm.i64 + %528 = llvm.icmp "slt" %527, %67 : !llvm.i64 + %529 = llvm.add %527, %63 : !llvm.i64 + %530 = llvm.select %528, %529, %527 : !llvm.i1, !llvm.i64 + %531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %534 = llvm.mul %512, %533 : !llvm.i64 + %535 = llvm.add %532, %534 : !llvm.i64 + %536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %537 = llvm.mul %516, %536 : !llvm.i64 + %538 = llvm.add %535, %537 : !llvm.i64 + %539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %540 = llvm.mul %530, %539 : !llvm.i64 + %541 = llvm.add %538, %540 : !llvm.i64 + %542 = llvm.getelementptr %531[%541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %502, %542 : !llvm.ptr> + %543 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %544 = llvm.mlir.constant(0 : index) : !llvm.i64 + %545 = llvm.mlir.constant(16 : index) : !llvm.i64 + %546 = llvm.mul %67, %545 : !llvm.i64 + %547 = llvm.add %544, %546 : !llvm.i64 + %548 = llvm.mlir.constant(1 : index) : !llvm.i64 + %549 = llvm.mul %69, %548 : !llvm.i64 + %550 = llvm.add %547, %549 : !llvm.i64 + %551 = llvm.getelementptr %543[%550] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %552 = llvm.load %551 : !llvm.ptr> + %553 = llvm.add %155, %70 : !llvm.i64 + %554 = llvm.icmp "slt" %553, %67 : !llvm.i64 + %555 = llvm.sub %64, %553 : !llvm.i64 + %556 = llvm.select %554, %555, %553 : !llvm.i1, !llvm.i64 + %557 = llvm.sdiv %556, %68 : !llvm.i64 + %558 = llvm.sub %64, %557 : !llvm.i64 + %559 = llvm.select %554, %558, %557 : !llvm.i1, !llvm.i64 + %560 = llvm.srem %559, %68 : !llvm.i64 + %561 = llvm.icmp "slt" %560, %67 : !llvm.i64 + %562 = llvm.add %560, %68 : !llvm.i64 + %563 = llvm.select %561, %562, %560 : !llvm.i1, !llvm.i64 + %564 = llvm.sdiv %505, %70 : !llvm.i64 + %565 = llvm.sub %64, %564 : !llvm.i64 + %566 = llvm.select %503, %565, %564 : !llvm.i1, !llvm.i64 + %567 = llvm.mul %559, %65 : !llvm.i64 + %568 = llvm.add %566, %567 : !llvm.i64 + %569 = llvm.add %568, %69 : !llvm.i64 + %570 = llvm.icmp "slt" %569, %67 : !llvm.i64 + %571 = llvm.sub %64, %569 : !llvm.i64 + %572 = llvm.select %570, %571, %569 : !llvm.i1, !llvm.i64 + %573 = llvm.sdiv %572, %63 : !llvm.i64 + %574 = llvm.sub %64, %573 : !llvm.i64 + %575 = llvm.select %570, %574, %573 : !llvm.i1, !llvm.i64 + %576 = llvm.mul %575, %65 : !llvm.i64 + %577 = llvm.add %568, %576 : !llvm.i64 + %578 = llvm.add %577, %69 : !llvm.i64 + %579 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %580 = llvm.mlir.constant(0 : 
index) : !llvm.i64 + %581 = llvm.mlir.constant(256 : index) : !llvm.i64 + %582 = llvm.mul %563, %581 : !llvm.i64 + %583 = llvm.add %580, %582 : !llvm.i64 + %584 = llvm.mlir.constant(2 : index) : !llvm.i64 + %585 = llvm.mul %516, %584 : !llvm.i64 + %586 = llvm.add %583, %585 : !llvm.i64 + %587 = llvm.mlir.constant(1 : index) : !llvm.i64 + %588 = llvm.mul %578, %587 : !llvm.i64 + %589 = llvm.add %586, %588 : !llvm.i64 + %590 = llvm.getelementptr %579[%589] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %552, %590 : !llvm.ptr> + %591 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %592 = llvm.mlir.constant(0 : index) : !llvm.i64 + %593 = llvm.mlir.constant(16 : index) : !llvm.i64 + %594 = llvm.mul %67, %593 : !llvm.i64 + %595 = llvm.add %592, %594 : !llvm.i64 + %596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %597 = llvm.mul %63, %596 : !llvm.i64 + %598 = llvm.add %595, %597 : !llvm.i64 + %599 = llvm.getelementptr %591[%598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %600 = llvm.load %599 : !llvm.ptr> + %601 = llvm.add %508, %69 : !llvm.i64 + %602 = llvm.icmp "slt" %601, %67 : !llvm.i64 + %603 = llvm.sub %64, %601 : !llvm.i64 + %604 = llvm.select %602, %603, %601 : !llvm.i1, !llvm.i64 + %605 = llvm.sdiv %604, %68 : !llvm.i64 + %606 = llvm.sub %64, %605 : !llvm.i64 + %607 = llvm.select %602, %606, %605 : !llvm.i1, !llvm.i64 + %608 = llvm.mul %607, %60 : !llvm.i64 + %609 = llvm.add %508, %608 : !llvm.i64 + %610 = llvm.add %609, %69 : !llvm.i64 + %611 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %612 = llvm.mlir.constant(0 : index) : !llvm.i64 + %613 = llvm.mlir.constant(256 : index) : !llvm.i64 + %614 = llvm.mul %610, %613 : !llvm.i64 + %615 = llvm.add %612, %614 : !llvm.i64 + %616 = llvm.mlir.constant(2 : index) : !llvm.i64 + %617 = llvm.mul %516, %616 : !llvm.i64 + %618 = llvm.add %615, %617 : !llvm.i64 + %619 = llvm.mlir.constant(1 : index) : !llvm.i64 + %620 = llvm.mul %530, %619 : !llvm.i64 + %621 = llvm.add %618, %620 : !llvm.i64 + %622 = llvm.getelementptr %611[%621] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %600, %622 : !llvm.ptr> + %623 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %624 = llvm.mlir.constant(0 : index) : !llvm.i64 + %625 = llvm.mlir.constant(16 : index) : !llvm.i64 + %626 = llvm.mul %67, %625 : !llvm.i64 + %627 = llvm.add %624, %626 : !llvm.i64 + %628 = llvm.mlir.constant(1 : index) : !llvm.i64 + %629 = llvm.mul %45, %628 : !llvm.i64 + %630 = llvm.add %627, %629 : !llvm.i64 + %631 = llvm.getelementptr %623[%630] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %632 = llvm.load %631 : !llvm.ptr> + %633 = llvm.add %155, %41 : !llvm.i64 + %634 = llvm.icmp "slt" %633, %67 : !llvm.i64 + %635 = llvm.sub %64, %633 : !llvm.i64 + %636 = llvm.select %634, %635, %633 : !llvm.i1, !llvm.i64 + %637 = llvm.sdiv %636, %68 : !llvm.i64 + %638 = llvm.sub %64, %637 : !llvm.i64 + %639 = llvm.select %634, %638, %637 : !llvm.i1, !llvm.i64 + %640 = llvm.srem %639, %68 : !llvm.i64 + %641 = llvm.icmp "slt" %640, %67 : !llvm.i64 + %642 = llvm.add %640, %68 : !llvm.i64 + %643 = llvm.select %641, %642, %640 : !llvm.i1, !llvm.i64 + %644 = llvm.mul %639, %65 : !llvm.i64 + %645 = llvm.add %566, %644 : !llvm.i64 + %646 = llvm.add %645, %45 : !llvm.i64 + %647 = llvm.icmp "slt" %646, %67 : !llvm.i64 + %648 = llvm.sub %64, %646 : !llvm.i64 + %649 = llvm.select %647, %648, %646 : !llvm.i1, !llvm.i64 + %650 = llvm.sdiv %649, %63 : !llvm.i64 + %651 = 
llvm.sub %64, %650 : !llvm.i64 + %652 = llvm.select %647, %651, %650 : !llvm.i1, !llvm.i64 + %653 = llvm.mul %652, %65 : !llvm.i64 + %654 = llvm.add %645, %653 : !llvm.i64 + %655 = llvm.add %654, %45 : !llvm.i64 + %656 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %658 = llvm.mlir.constant(256 : index) : !llvm.i64 + %659 = llvm.mul %643, %658 : !llvm.i64 + %660 = llvm.add %657, %659 : !llvm.i64 + %661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %662 = llvm.mul %516, %661 : !llvm.i64 + %663 = llvm.add %660, %662 : !llvm.i64 + %664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %665 = llvm.mul %655, %664 : !llvm.i64 + %666 = llvm.add %663, %665 : !llvm.i64 + %667 = llvm.getelementptr %656[%666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %632, %667 : !llvm.ptr> + %668 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %669 = llvm.mlir.constant(0 : index) : !llvm.i64 + %670 = llvm.mlir.constant(16 : index) : !llvm.i64 + %671 = llvm.mul %67, %670 : !llvm.i64 + %672 = llvm.add %669, %671 : !llvm.i64 + %673 = llvm.mlir.constant(1 : index) : !llvm.i64 + %674 = llvm.mul %48, %673 : !llvm.i64 + %675 = llvm.add %672, %674 : !llvm.i64 + %676 = llvm.getelementptr %668[%675] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %677 = llvm.load %676 : !llvm.ptr> + %678 = llvm.add %508, %63 : !llvm.i64 + %679 = llvm.icmp "slt" %678, %67 : !llvm.i64 + %680 = llvm.sub %64, %678 : !llvm.i64 + %681 = llvm.select %679, %680, %678 : !llvm.i1, !llvm.i64 + %682 = llvm.sdiv %681, %68 : !llvm.i64 + %683 = llvm.sub %64, %682 : !llvm.i64 + %684 = llvm.select %679, %683, %682 : !llvm.i1, !llvm.i64 + %685 = llvm.mul %684, %60 : !llvm.i64 + %686 = llvm.add %508, %685 : !llvm.i64 + %687 = llvm.add %686, %63 : !llvm.i64 + %688 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %689 = llvm.mlir.constant(0 : index) : !llvm.i64 + %690 = llvm.mlir.constant(256 : index) : !llvm.i64 + %691 = llvm.mul %687, %690 : !llvm.i64 + %692 = llvm.add %689, %691 : !llvm.i64 + %693 = llvm.mlir.constant(2 : index) : !llvm.i64 + %694 = llvm.mul %516, %693 : !llvm.i64 + %695 = llvm.add %692, %694 : !llvm.i64 + %696 = llvm.mlir.constant(1 : index) : !llvm.i64 + %697 = llvm.mul %530, %696 : !llvm.i64 + %698 = llvm.add %695, %697 : !llvm.i64 + %699 = llvm.getelementptr %688[%698] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %677, %699 : !llvm.ptr> + %700 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %701 = llvm.mlir.constant(0 : index) : !llvm.i64 + %702 = llvm.mlir.constant(16 : index) : !llvm.i64 + %703 = llvm.mul %67, %702 : !llvm.i64 + %704 = llvm.add %701, %703 : !llvm.i64 + %705 = llvm.mlir.constant(1 : index) : !llvm.i64 + %706 = llvm.mul %52, %705 : !llvm.i64 + %707 = llvm.add %704, %706 : !llvm.i64 + %708 = llvm.getelementptr %700[%707] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %709 = llvm.load %708 : !llvm.ptr> + %710 = llvm.add %155, %43 : !llvm.i64 + %711 = llvm.icmp "slt" %710, %67 : !llvm.i64 + %712 = llvm.sub %64, %710 : !llvm.i64 + %713 = llvm.select %711, %712, %710 : !llvm.i1, !llvm.i64 + %714 = llvm.sdiv %713, %68 : !llvm.i64 + %715 = llvm.sub %64, %714 : !llvm.i64 + %716 = llvm.select %711, %715, %714 : !llvm.i1, !llvm.i64 + %717 = llvm.srem %716, %68 : !llvm.i64 + %718 = llvm.icmp "slt" %717, %67 : !llvm.i64 + %719 = llvm.add %717, %68 : !llvm.i64 + %720 = llvm.select %718, %719, 
%717 : !llvm.i1, !llvm.i64 + %721 = llvm.mul %716, %65 : !llvm.i64 + %722 = llvm.add %566, %721 : !llvm.i64 + %723 = llvm.add %722, %52 : !llvm.i64 + %724 = llvm.icmp "slt" %723, %67 : !llvm.i64 + %725 = llvm.sub %64, %723 : !llvm.i64 + %726 = llvm.select %724, %725, %723 : !llvm.i1, !llvm.i64 + %727 = llvm.sdiv %726, %63 : !llvm.i64 + %728 = llvm.sub %64, %727 : !llvm.i64 + %729 = llvm.select %724, %728, %727 : !llvm.i1, !llvm.i64 + %730 = llvm.mul %729, %65 : !llvm.i64 + %731 = llvm.add %722, %730 : !llvm.i64 + %732 = llvm.add %731, %52 : !llvm.i64 + %733 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %734 = llvm.mlir.constant(0 : index) : !llvm.i64 + %735 = llvm.mlir.constant(256 : index) : !llvm.i64 + %736 = llvm.mul %720, %735 : !llvm.i64 + %737 = llvm.add %734, %736 : !llvm.i64 + %738 = llvm.mlir.constant(2 : index) : !llvm.i64 + %739 = llvm.mul %516, %738 : !llvm.i64 + %740 = llvm.add %737, %739 : !llvm.i64 + %741 = llvm.mlir.constant(1 : index) : !llvm.i64 + %742 = llvm.mul %732, %741 : !llvm.i64 + %743 = llvm.add %740, %742 : !llvm.i64 + %744 = llvm.getelementptr %733[%743] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %709, %744 : !llvm.ptr> + %745 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %746 = llvm.mlir.constant(0 : index) : !llvm.i64 + %747 = llvm.mlir.constant(16 : index) : !llvm.i64 + %748 = llvm.mul %67, %747 : !llvm.i64 + %749 = llvm.add %746, %748 : !llvm.i64 + %750 = llvm.mlir.constant(1 : index) : !llvm.i64 + %751 = llvm.mul %56, %750 : !llvm.i64 + %752 = llvm.add %749, %751 : !llvm.i64 + %753 = llvm.getelementptr %745[%752] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %754 = llvm.load %753 : !llvm.ptr> + %755 = llvm.add %508, %45 : !llvm.i64 + %756 = llvm.icmp "slt" %755, %67 : !llvm.i64 + %757 = llvm.sub %64, %755 : !llvm.i64 + %758 = llvm.select %756, %757, %755 : !llvm.i1, !llvm.i64 + %759 = llvm.sdiv %758, %68 : !llvm.i64 + %760 = llvm.sub %64, %759 : !llvm.i64 + %761 = llvm.select %756, %760, %759 : !llvm.i1, !llvm.i64 + %762 = llvm.mul %761, %60 : !llvm.i64 + %763 = llvm.add %508, %762 : !llvm.i64 + %764 = llvm.add %763, %45 : !llvm.i64 + %765 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %766 = llvm.mlir.constant(0 : index) : !llvm.i64 + %767 = llvm.mlir.constant(256 : index) : !llvm.i64 + %768 = llvm.mul %764, %767 : !llvm.i64 + %769 = llvm.add %766, %768 : !llvm.i64 + %770 = llvm.mlir.constant(2 : index) : !llvm.i64 + %771 = llvm.mul %516, %770 : !llvm.i64 + %772 = llvm.add %769, %771 : !llvm.i64 + %773 = llvm.mlir.constant(1 : index) : !llvm.i64 + %774 = llvm.mul %530, %773 : !llvm.i64 + %775 = llvm.add %772, %774 : !llvm.i64 + %776 = llvm.getelementptr %765[%775] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %754, %776 : !llvm.ptr> + %777 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %778 = llvm.mlir.constant(0 : index) : !llvm.i64 + %779 = llvm.mlir.constant(16 : index) : !llvm.i64 + %780 = llvm.mul %67, %779 : !llvm.i64 + %781 = llvm.add %778, %780 : !llvm.i64 + %782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %783 = llvm.mul %61, %782 : !llvm.i64 + %784 = llvm.add %781, %783 : !llvm.i64 + %785 = llvm.getelementptr %777[%784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %786 = llvm.load %785 : !llvm.ptr> + %787 = llvm.add %155, %46 : !llvm.i64 + %788 = llvm.icmp "slt" %787, %67 : !llvm.i64 + %789 = llvm.sub %64, %787 : !llvm.i64 + %790 = 
llvm.select %788, %789, %787 : !llvm.i1, !llvm.i64 + %791 = llvm.sdiv %790, %68 : !llvm.i64 + %792 = llvm.sub %64, %791 : !llvm.i64 + %793 = llvm.select %788, %792, %791 : !llvm.i1, !llvm.i64 + %794 = llvm.srem %793, %68 : !llvm.i64 + %795 = llvm.icmp "slt" %794, %67 : !llvm.i64 + %796 = llvm.add %794, %68 : !llvm.i64 + %797 = llvm.select %795, %796, %794 : !llvm.i1, !llvm.i64 + %798 = llvm.mul %793, %65 : !llvm.i64 + %799 = llvm.add %566, %798 : !llvm.i64 + %800 = llvm.add %799, %61 : !llvm.i64 + %801 = llvm.icmp "slt" %800, %67 : !llvm.i64 + %802 = llvm.sub %64, %800 : !llvm.i64 + %803 = llvm.select %801, %802, %800 : !llvm.i1, !llvm.i64 + %804 = llvm.sdiv %803, %63 : !llvm.i64 + %805 = llvm.sub %64, %804 : !llvm.i64 + %806 = llvm.select %801, %805, %804 : !llvm.i1, !llvm.i64 + %807 = llvm.mul %806, %65 : !llvm.i64 + %808 = llvm.add %799, %807 : !llvm.i64 + %809 = llvm.add %808, %61 : !llvm.i64 + %810 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %811 = llvm.mlir.constant(0 : index) : !llvm.i64 + %812 = llvm.mlir.constant(256 : index) : !llvm.i64 + %813 = llvm.mul %797, %812 : !llvm.i64 + %814 = llvm.add %811, %813 : !llvm.i64 + %815 = llvm.mlir.constant(2 : index) : !llvm.i64 + %816 = llvm.mul %516, %815 : !llvm.i64 + %817 = llvm.add %814, %816 : !llvm.i64 + %818 = llvm.mlir.constant(1 : index) : !llvm.i64 + %819 = llvm.mul %809, %818 : !llvm.i64 + %820 = llvm.add %817, %819 : !llvm.i64 + %821 = llvm.getelementptr %810[%820] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %786, %821 : !llvm.ptr> + %822 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %823 = llvm.mlir.constant(0 : index) : !llvm.i64 + %824 = llvm.mlir.constant(16 : index) : !llvm.i64 + %825 = llvm.mul %67, %824 : !llvm.i64 + %826 = llvm.add %823, %825 : !llvm.i64 + %827 = llvm.mlir.constant(1 : index) : !llvm.i64 + %828 = llvm.mul %70, %827 : !llvm.i64 + %829 = llvm.add %826, %828 : !llvm.i64 + %830 = llvm.getelementptr %822[%829] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %831 = llvm.load %830 : !llvm.ptr> + %832 = llvm.add %508, %48 : !llvm.i64 + %833 = llvm.icmp "slt" %832, %67 : !llvm.i64 + %834 = llvm.sub %64, %832 : !llvm.i64 + %835 = llvm.select %833, %834, %832 : !llvm.i1, !llvm.i64 + %836 = llvm.sdiv %835, %68 : !llvm.i64 + %837 = llvm.sub %64, %836 : !llvm.i64 + %838 = llvm.select %833, %837, %836 : !llvm.i1, !llvm.i64 + %839 = llvm.mul %838, %60 : !llvm.i64 + %840 = llvm.add %508, %839 : !llvm.i64 + %841 = llvm.add %840, %48 : !llvm.i64 + %842 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %843 = llvm.mlir.constant(0 : index) : !llvm.i64 + %844 = llvm.mlir.constant(256 : index) : !llvm.i64 + %845 = llvm.mul %841, %844 : !llvm.i64 + %846 = llvm.add %843, %845 : !llvm.i64 + %847 = llvm.mlir.constant(2 : index) : !llvm.i64 + %848 = llvm.mul %516, %847 : !llvm.i64 + %849 = llvm.add %846, %848 : !llvm.i64 + %850 = llvm.mlir.constant(1 : index) : !llvm.i64 + %851 = llvm.mul %530, %850 : !llvm.i64 + %852 = llvm.add %849, %851 : !llvm.i64 + %853 = llvm.getelementptr %842[%852] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %831, %853 : !llvm.ptr> + %854 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %855 = llvm.mlir.constant(0 : index) : !llvm.i64 + %856 = llvm.mlir.constant(16 : index) : !llvm.i64 + %857 = llvm.mul %67, %856 : !llvm.i64 + %858 = llvm.add %855, %857 : !llvm.i64 + %859 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %860 = llvm.mul %50, %859 : !llvm.i64 + %861 = llvm.add %858, %860 : !llvm.i64 + %862 = llvm.getelementptr %854[%861] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %863 = llvm.load %862 : !llvm.ptr> + %864 = llvm.add %155, %49 : !llvm.i64 + %865 = llvm.icmp "slt" %864, %67 : !llvm.i64 + %866 = llvm.sub %64, %864 : !llvm.i64 + %867 = llvm.select %865, %866, %864 : !llvm.i1, !llvm.i64 + %868 = llvm.sdiv %867, %68 : !llvm.i64 + %869 = llvm.sub %64, %868 : !llvm.i64 + %870 = llvm.select %865, %869, %868 : !llvm.i1, !llvm.i64 + %871 = llvm.srem %870, %68 : !llvm.i64 + %872 = llvm.icmp "slt" %871, %67 : !llvm.i64 + %873 = llvm.add %871, %68 : !llvm.i64 + %874 = llvm.select %872, %873, %871 : !llvm.i1, !llvm.i64 + %875 = llvm.mul %870, %65 : !llvm.i64 + %876 = llvm.add %566, %875 : !llvm.i64 + %877 = llvm.add %876, %50 : !llvm.i64 + %878 = llvm.icmp "slt" %877, %67 : !llvm.i64 + %879 = llvm.sub %64, %877 : !llvm.i64 + %880 = llvm.select %878, %879, %877 : !llvm.i1, !llvm.i64 + %881 = llvm.sdiv %880, %63 : !llvm.i64 + %882 = llvm.sub %64, %881 : !llvm.i64 + %883 = llvm.select %878, %882, %881 : !llvm.i1, !llvm.i64 + %884 = llvm.mul %883, %65 : !llvm.i64 + %885 = llvm.add %876, %884 : !llvm.i64 + %886 = llvm.add %885, %50 : !llvm.i64 + %887 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %888 = llvm.mlir.constant(0 : index) : !llvm.i64 + %889 = llvm.mlir.constant(256 : index) : !llvm.i64 + %890 = llvm.mul %874, %889 : !llvm.i64 + %891 = llvm.add %888, %890 : !llvm.i64 + %892 = llvm.mlir.constant(2 : index) : !llvm.i64 + %893 = llvm.mul %516, %892 : !llvm.i64 + %894 = llvm.add %891, %893 : !llvm.i64 + %895 = llvm.mlir.constant(1 : index) : !llvm.i64 + %896 = llvm.mul %886, %895 : !llvm.i64 + %897 = llvm.add %894, %896 : !llvm.i64 + %898 = llvm.getelementptr %887[%897] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %863, %898 : !llvm.ptr> + %899 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %900 = llvm.mlir.constant(0 : index) : !llvm.i64 + %901 = llvm.mlir.constant(16 : index) : !llvm.i64 + %902 = llvm.mul %67, %901 : !llvm.i64 + %903 = llvm.add %900, %902 : !llvm.i64 + %904 = llvm.mlir.constant(1 : index) : !llvm.i64 + %905 = llvm.mul %33, %904 : !llvm.i64 + %906 = llvm.add %903, %905 : !llvm.i64 + %907 = llvm.getelementptr %899[%906] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %908 = llvm.load %907 : !llvm.ptr> + %909 = llvm.add %508, %52 : !llvm.i64 + %910 = llvm.icmp "slt" %909, %67 : !llvm.i64 + %911 = llvm.sub %64, %909 : !llvm.i64 + %912 = llvm.select %910, %911, %909 : !llvm.i1, !llvm.i64 + %913 = llvm.sdiv %912, %68 : !llvm.i64 + %914 = llvm.sub %64, %913 : !llvm.i64 + %915 = llvm.select %910, %914, %913 : !llvm.i1, !llvm.i64 + %916 = llvm.mul %915, %60 : !llvm.i64 + %917 = llvm.add %508, %916 : !llvm.i64 + %918 = llvm.add %917, %52 : !llvm.i64 + %919 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %920 = llvm.mlir.constant(0 : index) : !llvm.i64 + %921 = llvm.mlir.constant(256 : index) : !llvm.i64 + %922 = llvm.mul %918, %921 : !llvm.i64 + %923 = llvm.add %920, %922 : !llvm.i64 + %924 = llvm.mlir.constant(2 : index) : !llvm.i64 + %925 = llvm.mul %516, %924 : !llvm.i64 + %926 = llvm.add %923, %925 : !llvm.i64 + %927 = llvm.mlir.constant(1 : index) : !llvm.i64 + %928 = llvm.mul %530, %927 : !llvm.i64 + %929 = llvm.add %926, %928 : !llvm.i64 + %930 = llvm.getelementptr %919[%929] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %908, 
%930 : !llvm.ptr> + %931 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %932 = llvm.mlir.constant(0 : index) : !llvm.i64 + %933 = llvm.mlir.constant(16 : index) : !llvm.i64 + %934 = llvm.mul %67, %933 : !llvm.i64 + %935 = llvm.add %932, %934 : !llvm.i64 + %936 = llvm.mlir.constant(1 : index) : !llvm.i64 + %937 = llvm.mul %54, %936 : !llvm.i64 + %938 = llvm.add %935, %937 : !llvm.i64 + %939 = llvm.getelementptr %931[%938] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %940 = llvm.load %939 : !llvm.ptr> + %941 = llvm.add %155, %53 : !llvm.i64 + %942 = llvm.icmp "slt" %941, %67 : !llvm.i64 + %943 = llvm.sub %64, %941 : !llvm.i64 + %944 = llvm.select %942, %943, %941 : !llvm.i1, !llvm.i64 + %945 = llvm.sdiv %944, %68 : !llvm.i64 + %946 = llvm.sub %64, %945 : !llvm.i64 + %947 = llvm.select %942, %946, %945 : !llvm.i1, !llvm.i64 + %948 = llvm.srem %947, %68 : !llvm.i64 + %949 = llvm.icmp "slt" %948, %67 : !llvm.i64 + %950 = llvm.add %948, %68 : !llvm.i64 + %951 = llvm.select %949, %950, %948 : !llvm.i1, !llvm.i64 + %952 = llvm.mul %947, %65 : !llvm.i64 + %953 = llvm.add %566, %952 : !llvm.i64 + %954 = llvm.add %953, %54 : !llvm.i64 + %955 = llvm.icmp "slt" %954, %67 : !llvm.i64 + %956 = llvm.sub %64, %954 : !llvm.i64 + %957 = llvm.select %955, %956, %954 : !llvm.i1, !llvm.i64 + %958 = llvm.sdiv %957, %63 : !llvm.i64 + %959 = llvm.sub %64, %958 : !llvm.i64 + %960 = llvm.select %955, %959, %958 : !llvm.i1, !llvm.i64 + %961 = llvm.mul %960, %65 : !llvm.i64 + %962 = llvm.add %953, %961 : !llvm.i64 + %963 = llvm.add %962, %54 : !llvm.i64 + %964 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %966 = llvm.mlir.constant(256 : index) : !llvm.i64 + %967 = llvm.mul %951, %966 : !llvm.i64 + %968 = llvm.add %965, %967 : !llvm.i64 + %969 = llvm.mlir.constant(2 : index) : !llvm.i64 + %970 = llvm.mul %516, %969 : !llvm.i64 + %971 = llvm.add %968, %970 : !llvm.i64 + %972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %973 = llvm.mul %963, %972 : !llvm.i64 + %974 = llvm.add %971, %973 : !llvm.i64 + %975 = llvm.getelementptr %964[%974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %940, %975 : !llvm.ptr> + %976 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %977 = llvm.mlir.constant(0 : index) : !llvm.i64 + %978 = llvm.mlir.constant(16 : index) : !llvm.i64 + %979 = llvm.mul %67, %978 : !llvm.i64 + %980 = llvm.add %977, %979 : !llvm.i64 + %981 = llvm.mlir.constant(1 : index) : !llvm.i64 + %982 = llvm.mul %34, %981 : !llvm.i64 + %983 = llvm.add %980, %982 : !llvm.i64 + %984 = llvm.getelementptr %976[%983] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %985 = llvm.load %984 : !llvm.ptr> + %986 = llvm.add %508, %56 : !llvm.i64 + %987 = llvm.icmp "slt" %986, %67 : !llvm.i64 + %988 = llvm.sub %64, %986 : !llvm.i64 + %989 = llvm.select %987, %988, %986 : !llvm.i1, !llvm.i64 + %990 = llvm.sdiv %989, %68 : !llvm.i64 + %991 = llvm.sub %64, %990 : !llvm.i64 + %992 = llvm.select %987, %991, %990 : !llvm.i1, !llvm.i64 + %993 = llvm.mul %992, %60 : !llvm.i64 + %994 = llvm.add %508, %993 : !llvm.i64 + %995 = llvm.add %994, %56 : !llvm.i64 + %996 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %997 = llvm.mlir.constant(0 : index) : !llvm.i64 + %998 = llvm.mlir.constant(256 : index) : !llvm.i64 + %999 = llvm.mul %995, %998 : !llvm.i64 + %1000 = llvm.add %997, %999 : !llvm.i64 + %1001 = 
llvm.mlir.constant(2 : index) : !llvm.i64 + %1002 = llvm.mul %516, %1001 : !llvm.i64 + %1003 = llvm.add %1000, %1002 : !llvm.i64 + %1004 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1005 = llvm.mul %530, %1004 : !llvm.i64 + %1006 = llvm.add %1003, %1005 : !llvm.i64 + %1007 = llvm.getelementptr %996[%1006] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %985, %1007 : !llvm.ptr> + %1008 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1010 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1011 = llvm.mul %67, %1010 : !llvm.i64 + %1012 = llvm.add %1009, %1011 : !llvm.i64 + %1013 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1014 = llvm.mul %58, %1013 : !llvm.i64 + %1015 = llvm.add %1012, %1014 : !llvm.i64 + %1016 = llvm.getelementptr %1008[%1015] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1017 = llvm.load %1016 : !llvm.ptr> + %1018 = llvm.add %155, %57 : !llvm.i64 + %1019 = llvm.icmp "slt" %1018, %67 : !llvm.i64 + %1020 = llvm.sub %64, %1018 : !llvm.i64 + %1021 = llvm.select %1019, %1020, %1018 : !llvm.i1, !llvm.i64 + %1022 = llvm.sdiv %1021, %68 : !llvm.i64 + %1023 = llvm.sub %64, %1022 : !llvm.i64 + %1024 = llvm.select %1019, %1023, %1022 : !llvm.i1, !llvm.i64 + %1025 = llvm.srem %1024, %68 : !llvm.i64 + %1026 = llvm.icmp "slt" %1025, %67 : !llvm.i64 + %1027 = llvm.add %1025, %68 : !llvm.i64 + %1028 = llvm.select %1026, %1027, %1025 : !llvm.i1, !llvm.i64 + %1029 = llvm.mul %1024, %65 : !llvm.i64 + %1030 = llvm.add %566, %1029 : !llvm.i64 + %1031 = llvm.add %1030, %58 : !llvm.i64 + %1032 = llvm.icmp "slt" %1031, %67 : !llvm.i64 + %1033 = llvm.sub %64, %1031 : !llvm.i64 + %1034 = llvm.select %1032, %1033, %1031 : !llvm.i1, !llvm.i64 + %1035 = llvm.sdiv %1034, %63 : !llvm.i64 + %1036 = llvm.sub %64, %1035 : !llvm.i64 + %1037 = llvm.select %1032, %1036, %1035 : !llvm.i1, !llvm.i64 + %1038 = llvm.mul %1037, %65 : !llvm.i64 + %1039 = llvm.add %1030, %1038 : !llvm.i64 + %1040 = llvm.add %1039, %58 : !llvm.i64 + %1041 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1042 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1043 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1044 = llvm.mul %1028, %1043 : !llvm.i64 + %1045 = llvm.add %1042, %1044 : !llvm.i64 + %1046 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1047 = llvm.mul %516, %1046 : !llvm.i64 + %1048 = llvm.add %1045, %1047 : !llvm.i64 + %1049 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1050 = llvm.mul %1040, %1049 : !llvm.i64 + %1051 = llvm.add %1048, %1050 : !llvm.i64 + %1052 = llvm.getelementptr %1041[%1051] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1017, %1052 : !llvm.ptr> + %1053 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1054 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1055 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1056 = llvm.mul %67, %1055 : !llvm.i64 + %1057 = llvm.add %1054, %1056 : !llvm.i64 + %1058 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1059 = llvm.mul %35, %1058 : !llvm.i64 + %1060 = llvm.add %1057, %1059 : !llvm.i64 + %1061 = llvm.getelementptr %1053[%1060] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1062 = llvm.load %1061 : !llvm.ptr> + %1063 = llvm.add %508, %61 : !llvm.i64 + %1064 = llvm.icmp "slt" %1063, %67 : !llvm.i64 + %1065 = llvm.sub %64, %1063 : !llvm.i64 + %1066 = llvm.select %1064, %1065, %1063 : !llvm.i1, !llvm.i64 + %1067 = llvm.sdiv %1066, %68 : !llvm.i64 + %1068 = llvm.sub %64, 
%1067 : !llvm.i64 + %1069 = llvm.select %1064, %1068, %1067 : !llvm.i1, !llvm.i64 + %1070 = llvm.mul %1069, %60 : !llvm.i64 + %1071 = llvm.add %508, %1070 : !llvm.i64 + %1072 = llvm.add %1071, %61 : !llvm.i64 + %1073 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1074 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1075 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1076 = llvm.mul %1072, %1075 : !llvm.i64 + %1077 = llvm.add %1074, %1076 : !llvm.i64 + %1078 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1079 = llvm.mul %516, %1078 : !llvm.i64 + %1080 = llvm.add %1077, %1079 : !llvm.i64 + %1081 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1082 = llvm.mul %530, %1081 : !llvm.i64 + %1083 = llvm.add %1080, %1082 : !llvm.i64 + %1084 = llvm.getelementptr %1073[%1083] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1062, %1084 : !llvm.ptr> + %1085 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1086 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1087 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1088 = llvm.mul %67, %1087 : !llvm.i64 + %1089 = llvm.add %1086, %1088 : !llvm.i64 + %1090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1091 = llvm.mul %66, %1090 : !llvm.i64 + %1092 = llvm.add %1089, %1091 : !llvm.i64 + %1093 = llvm.getelementptr %1085[%1092] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1094 = llvm.load %1093 : !llvm.ptr> + %1095 = llvm.add %155, %62 : !llvm.i64 + %1096 = llvm.icmp "slt" %1095, %67 : !llvm.i64 + %1097 = llvm.sub %64, %1095 : !llvm.i64 + %1098 = llvm.select %1096, %1097, %1095 : !llvm.i1, !llvm.i64 + %1099 = llvm.sdiv %1098, %68 : !llvm.i64 + %1100 = llvm.sub %64, %1099 : !llvm.i64 + %1101 = llvm.select %1096, %1100, %1099 : !llvm.i1, !llvm.i64 + %1102 = llvm.srem %1101, %68 : !llvm.i64 + %1103 = llvm.icmp "slt" %1102, %67 : !llvm.i64 + %1104 = llvm.add %1102, %68 : !llvm.i64 + %1105 = llvm.select %1103, %1104, %1102 : !llvm.i1, !llvm.i64 + %1106 = llvm.mul %1101, %65 : !llvm.i64 + %1107 = llvm.add %566, %1106 : !llvm.i64 + %1108 = llvm.add %1107, %66 : !llvm.i64 + %1109 = llvm.icmp "slt" %1108, %67 : !llvm.i64 + %1110 = llvm.sub %64, %1108 : !llvm.i64 + %1111 = llvm.select %1109, %1110, %1108 : !llvm.i1, !llvm.i64 + %1112 = llvm.sdiv %1111, %63 : !llvm.i64 + %1113 = llvm.sub %64, %1112 : !llvm.i64 + %1114 = llvm.select %1109, %1113, %1112 : !llvm.i1, !llvm.i64 + %1115 = llvm.mul %1114, %65 : !llvm.i64 + %1116 = llvm.add %1107, %1115 : !llvm.i64 + %1117 = llvm.add %1116, %66 : !llvm.i64 + %1118 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1119 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1120 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1121 = llvm.mul %1105, %1120 : !llvm.i64 + %1122 = llvm.add %1119, %1121 : !llvm.i64 + %1123 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1124 = llvm.mul %516, %1123 : !llvm.i64 + %1125 = llvm.add %1122, %1124 : !llvm.i64 + %1126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1127 = llvm.mul %1117, %1126 : !llvm.i64 + %1128 = llvm.add %1125, %1127 : !llvm.i64 + %1129 = llvm.getelementptr %1118[%1128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1094, %1129 : !llvm.ptr> + llvm.br ^bb9 + ^bb8: // pred: ^bb6 + %1130 = llvm.add %151, %155 : !llvm.i64 + %1131 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1134 = 
llvm.mul %153, %1133 : !llvm.i64 + %1135 = llvm.add %1132, %1134 : !llvm.i64 + %1136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1137 = llvm.mul %1130, %1136 : !llvm.i64 + %1138 = llvm.add %1135, %1137 : !llvm.i64 + %1139 = llvm.getelementptr %1131[%1138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1140 = llvm.bitcast %1139 : !llvm.ptr to !llvm.ptr> + %1141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1143 = llvm.trunc %1130 : !llvm.i64 to !llvm.i32 + %1144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1145 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1146 = llvm.insertelement %1143, %1144[%1145 : !llvm.i32] : !llvm.vec<8 x i32> + %1147 = llvm.shufflevector %1146, %1144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1148 = llvm.add %1147, %1142 : !llvm.vec<8 x i32> + %1149 = llvm.trunc %1141 : !llvm.i64 to !llvm.i32 + %1150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1152 = llvm.insertelement %1149, %1150[%1151 : !llvm.i32] : !llvm.vec<8 x i32> + %1153 = llvm.shufflevector %1152, %1150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1154 = llvm.icmp "slt" %1148, %1153 : !llvm.vec<8 x i32> + %1155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1156 = llvm.intr.masked.load %1140, %1154, %1155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1157 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1158 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1159 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1160 = llvm.mul %67, %1159 : !llvm.i64 + %1161 = llvm.add %1158, %1160 : !llvm.i64 + %1162 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1163 = llvm.mul %67, %1162 : !llvm.i64 + %1164 = llvm.add %1161, %1163 : !llvm.i64 + %1165 = llvm.getelementptr %1157[%1164] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1156, %1165 : !llvm.ptr> + %1166 = llvm.add %1130, %70 : !llvm.i64 + %1167 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1169 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1170 = llvm.mul %153, %1169 : !llvm.i64 + %1171 = llvm.add %1168, %1170 : !llvm.i64 + %1172 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1173 = llvm.mul %1166, %1172 : !llvm.i64 + %1174 = llvm.add %1171, %1173 : !llvm.i64 + %1175 = llvm.getelementptr %1167[%1174] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1176 = llvm.bitcast %1175 : !llvm.ptr to !llvm.ptr> + %1177 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1178 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1179 = llvm.trunc %1166 : !llvm.i64 to !llvm.i32 + %1180 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1181 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1182 = llvm.insertelement %1179, %1180[%1181 : !llvm.i32] : !llvm.vec<8 x i32> + %1183 = llvm.shufflevector %1182, %1180 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1184 = llvm.add %1183, %1178 : !llvm.vec<8 x i32> + %1185 = llvm.trunc %1177 : !llvm.i64 to !llvm.i32 + %1186 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1187 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1188 = 
llvm.insertelement %1185, %1186[%1187 : !llvm.i32] : !llvm.vec<8 x i32> + %1189 = llvm.shufflevector %1188, %1186 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1190 = llvm.icmp "slt" %1184, %1189 : !llvm.vec<8 x i32> + %1191 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1192 = llvm.intr.masked.load %1176, %1190, %1191 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1193 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1194 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1195 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1196 = llvm.mul %67, %1195 : !llvm.i64 + %1197 = llvm.add %1194, %1196 : !llvm.i64 + %1198 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1199 = llvm.mul %69, %1198 : !llvm.i64 + %1200 = llvm.add %1197, %1199 : !llvm.i64 + %1201 = llvm.getelementptr %1193[%1200] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1192, %1201 : !llvm.ptr> + %1202 = llvm.add %1130, %68 : !llvm.i64 + %1203 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1204 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1205 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1206 = llvm.mul %153, %1205 : !llvm.i64 + %1207 = llvm.add %1204, %1206 : !llvm.i64 + %1208 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1209 = llvm.mul %1202, %1208 : !llvm.i64 + %1210 = llvm.add %1207, %1209 : !llvm.i64 + %1211 = llvm.getelementptr %1203[%1210] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1212 = llvm.bitcast %1211 : !llvm.ptr to !llvm.ptr> + %1213 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1214 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1215 = llvm.trunc %1202 : !llvm.i64 to !llvm.i32 + %1216 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1217 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1218 = llvm.insertelement %1215, %1216[%1217 : !llvm.i32] : !llvm.vec<8 x i32> + %1219 = llvm.shufflevector %1218, %1216 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1220 = llvm.add %1219, %1214 : !llvm.vec<8 x i32> + %1221 = llvm.trunc %1213 : !llvm.i64 to !llvm.i32 + %1222 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1223 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1224 = llvm.insertelement %1221, %1222[%1223 : !llvm.i32] : !llvm.vec<8 x i32> + %1225 = llvm.shufflevector %1224, %1222 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1226 = llvm.icmp "slt" %1220, %1225 : !llvm.vec<8 x i32> + %1227 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1228 = llvm.intr.masked.load %1212, %1226, %1227 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1229 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1230 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1231 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1232 = llvm.mul %67, %1231 : !llvm.i64 + %1233 = llvm.add %1230, %1232 : !llvm.i64 + %1234 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1235 = llvm.mul %63, %1234 : !llvm.i64 + %1236 = llvm.add %1233, %1235 : !llvm.i64 + %1237 = llvm.getelementptr %1229[%1236] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1228, %1237 : !llvm.ptr> + %1238 = llvm.add %1130, %41 
: !llvm.i64 + %1239 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1240 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1241 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1242 = llvm.mul %153, %1241 : !llvm.i64 + %1243 = llvm.add %1240, %1242 : !llvm.i64 + %1244 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1245 = llvm.mul %1238, %1244 : !llvm.i64 + %1246 = llvm.add %1243, %1245 : !llvm.i64 + %1247 = llvm.getelementptr %1239[%1246] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1248 = llvm.bitcast %1247 : !llvm.ptr to !llvm.ptr> + %1249 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1250 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1251 = llvm.trunc %1238 : !llvm.i64 to !llvm.i32 + %1252 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1253 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1254 = llvm.insertelement %1251, %1252[%1253 : !llvm.i32] : !llvm.vec<8 x i32> + %1255 = llvm.shufflevector %1254, %1252 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1256 = llvm.add %1255, %1250 : !llvm.vec<8 x i32> + %1257 = llvm.trunc %1249 : !llvm.i64 to !llvm.i32 + %1258 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1259 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1260 = llvm.insertelement %1257, %1258[%1259 : !llvm.i32] : !llvm.vec<8 x i32> + %1261 = llvm.shufflevector %1260, %1258 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1262 = llvm.icmp "slt" %1256, %1261 : !llvm.vec<8 x i32> + %1263 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1264 = llvm.intr.masked.load %1248, %1262, %1263 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1265 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1266 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1267 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1268 = llvm.mul %67, %1267 : !llvm.i64 + %1269 = llvm.add %1266, %1268 : !llvm.i64 + %1270 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1271 = llvm.mul %45, %1270 : !llvm.i64 + %1272 = llvm.add %1269, %1271 : !llvm.i64 + %1273 = llvm.getelementptr %1265[%1272] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1264, %1273 : !llvm.ptr> + %1274 = llvm.add %1130, %42 : !llvm.i64 + %1275 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1276 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1277 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1278 = llvm.mul %153, %1277 : !llvm.i64 + %1279 = llvm.add %1276, %1278 : !llvm.i64 + %1280 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1281 = llvm.mul %1274, %1280 : !llvm.i64 + %1282 = llvm.add %1279, %1281 : !llvm.i64 + %1283 = llvm.getelementptr %1275[%1282] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1284 = llvm.bitcast %1283 : !llvm.ptr to !llvm.ptr> + %1285 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1286 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1287 = llvm.trunc %1274 : !llvm.i64 to !llvm.i32 + %1288 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1289 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1290 = llvm.insertelement %1287, %1288[%1289 : !llvm.i32] : !llvm.vec<8 x i32> + %1291 = llvm.shufflevector %1290, %1288 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 
x i32> + %1292 = llvm.add %1291, %1286 : !llvm.vec<8 x i32> + %1293 = llvm.trunc %1285 : !llvm.i64 to !llvm.i32 + %1294 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1295 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1296 = llvm.insertelement %1293, %1294[%1295 : !llvm.i32] : !llvm.vec<8 x i32> + %1297 = llvm.shufflevector %1296, %1294 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1298 = llvm.icmp "slt" %1292, %1297 : !llvm.vec<8 x i32> + %1299 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1300 = llvm.intr.masked.load %1284, %1298, %1299 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1301 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1302 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1303 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1304 = llvm.mul %67, %1303 : !llvm.i64 + %1305 = llvm.add %1302, %1304 : !llvm.i64 + %1306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1307 = llvm.mul %48, %1306 : !llvm.i64 + %1308 = llvm.add %1305, %1307 : !llvm.i64 + %1309 = llvm.getelementptr %1301[%1308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1300, %1309 : !llvm.ptr> + %1310 = llvm.add %1130, %43 : !llvm.i64 + %1311 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1312 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1313 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1314 = llvm.mul %153, %1313 : !llvm.i64 + %1315 = llvm.add %1312, %1314 : !llvm.i64 + %1316 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1317 = llvm.mul %1310, %1316 : !llvm.i64 + %1318 = llvm.add %1315, %1317 : !llvm.i64 + %1319 = llvm.getelementptr %1311[%1318] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1320 = llvm.bitcast %1319 : !llvm.ptr to !llvm.ptr> + %1321 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1322 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1323 = llvm.trunc %1310 : !llvm.i64 to !llvm.i32 + %1324 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1325 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1326 = llvm.insertelement %1323, %1324[%1325 : !llvm.i32] : !llvm.vec<8 x i32> + %1327 = llvm.shufflevector %1326, %1324 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1328 = llvm.add %1327, %1322 : !llvm.vec<8 x i32> + %1329 = llvm.trunc %1321 : !llvm.i64 to !llvm.i32 + %1330 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1331 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1332 = llvm.insertelement %1329, %1330[%1331 : !llvm.i32] : !llvm.vec<8 x i32> + %1333 = llvm.shufflevector %1332, %1330 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1334 = llvm.icmp "slt" %1328, %1333 : !llvm.vec<8 x i32> + %1335 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1336 = llvm.intr.masked.load %1320, %1334, %1335 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1337 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1338 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1339 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1340 = llvm.mul %67, %1339 : !llvm.i64 + %1341 = llvm.add %1338, %1340 : !llvm.i64 + %1342 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1343 = 
llvm.mul %52, %1342 : !llvm.i64 + %1344 = llvm.add %1341, %1343 : !llvm.i64 + %1345 = llvm.getelementptr %1337[%1344] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1336, %1345 : !llvm.ptr> + %1346 = llvm.add %1130, %44 : !llvm.i64 + %1347 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1348 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1349 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1350 = llvm.mul %153, %1349 : !llvm.i64 + %1351 = llvm.add %1348, %1350 : !llvm.i64 + %1352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1353 = llvm.mul %1346, %1352 : !llvm.i64 + %1354 = llvm.add %1351, %1353 : !llvm.i64 + %1355 = llvm.getelementptr %1347[%1354] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1356 = llvm.bitcast %1355 : !llvm.ptr to !llvm.ptr> + %1357 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1358 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1359 = llvm.trunc %1346 : !llvm.i64 to !llvm.i32 + %1360 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1361 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1362 = llvm.insertelement %1359, %1360[%1361 : !llvm.i32] : !llvm.vec<8 x i32> + %1363 = llvm.shufflevector %1362, %1360 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1364 = llvm.add %1363, %1358 : !llvm.vec<8 x i32> + %1365 = llvm.trunc %1357 : !llvm.i64 to !llvm.i32 + %1366 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1367 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1368 = llvm.insertelement %1365, %1366[%1367 : !llvm.i32] : !llvm.vec<8 x i32> + %1369 = llvm.shufflevector %1368, %1366 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1370 = llvm.icmp "slt" %1364, %1369 : !llvm.vec<8 x i32> + %1371 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1372 = llvm.intr.masked.load %1356, %1370, %1371 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1373 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1375 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1376 = llvm.mul %67, %1375 : !llvm.i64 + %1377 = llvm.add %1374, %1376 : !llvm.i64 + %1378 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1379 = llvm.mul %56, %1378 : !llvm.i64 + %1380 = llvm.add %1377, %1379 : !llvm.i64 + %1381 = llvm.getelementptr %1373[%1380] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1372, %1381 : !llvm.ptr> + %1382 = llvm.add %1130, %46 : !llvm.i64 + %1383 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1384 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1385 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1386 = llvm.mul %153, %1385 : !llvm.i64 + %1387 = llvm.add %1384, %1386 : !llvm.i64 + %1388 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1389 = llvm.mul %1382, %1388 : !llvm.i64 + %1390 = llvm.add %1387, %1389 : !llvm.i64 + %1391 = llvm.getelementptr %1383[%1390] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1392 = llvm.bitcast %1391 : !llvm.ptr to !llvm.ptr> + %1393 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1394 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1395 = llvm.trunc %1382 : !llvm.i64 to !llvm.i32 + %1396 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1397 = llvm.mlir.constant(0 : i32) : !llvm.i32 + 
%1398 = llvm.insertelement %1395, %1396[%1397 : !llvm.i32] : !llvm.vec<8 x i32> + %1399 = llvm.shufflevector %1398, %1396 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1400 = llvm.add %1399, %1394 : !llvm.vec<8 x i32> + %1401 = llvm.trunc %1393 : !llvm.i64 to !llvm.i32 + %1402 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1403 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1404 = llvm.insertelement %1401, %1402[%1403 : !llvm.i32] : !llvm.vec<8 x i32> + %1405 = llvm.shufflevector %1404, %1402 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1406 = llvm.icmp "slt" %1400, %1405 : !llvm.vec<8 x i32> + %1407 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1408 = llvm.intr.masked.load %1392, %1406, %1407 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1409 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1410 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1411 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1412 = llvm.mul %67, %1411 : !llvm.i64 + %1413 = llvm.add %1410, %1412 : !llvm.i64 + %1414 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1415 = llvm.mul %61, %1414 : !llvm.i64 + %1416 = llvm.add %1413, %1415 : !llvm.i64 + %1417 = llvm.getelementptr %1409[%1416] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1408, %1417 : !llvm.ptr> + %1418 = llvm.add %1130, %47 : !llvm.i64 + %1419 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1420 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1421 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1422 = llvm.mul %153, %1421 : !llvm.i64 + %1423 = llvm.add %1420, %1422 : !llvm.i64 + %1424 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1425 = llvm.mul %1418, %1424 : !llvm.i64 + %1426 = llvm.add %1423, %1425 : !llvm.i64 + %1427 = llvm.getelementptr %1419[%1426] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1428 = llvm.bitcast %1427 : !llvm.ptr to !llvm.ptr> + %1429 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1430 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1431 = llvm.trunc %1418 : !llvm.i64 to !llvm.i32 + %1432 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1433 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1434 = llvm.insertelement %1431, %1432[%1433 : !llvm.i32] : !llvm.vec<8 x i32> + %1435 = llvm.shufflevector %1434, %1432 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1436 = llvm.add %1435, %1430 : !llvm.vec<8 x i32> + %1437 = llvm.trunc %1429 : !llvm.i64 to !llvm.i32 + %1438 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1439 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1440 = llvm.insertelement %1437, %1438[%1439 : !llvm.i32] : !llvm.vec<8 x i32> + %1441 = llvm.shufflevector %1440, %1438 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1442 = llvm.icmp "slt" %1436, %1441 : !llvm.vec<8 x i32> + %1443 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1444 = llvm.intr.masked.load %1428, %1442, %1443 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1445 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1446 = llvm.mlir.constant(0 
: index) : !llvm.i64 + %1447 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1448 = llvm.mul %67, %1447 : !llvm.i64 + %1449 = llvm.add %1446, %1448 : !llvm.i64 + %1450 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1451 = llvm.mul %70, %1450 : !llvm.i64 + %1452 = llvm.add %1449, %1451 : !llvm.i64 + %1453 = llvm.getelementptr %1445[%1452] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1444, %1453 : !llvm.ptr> + %1454 = llvm.add %1130, %49 : !llvm.i64 + %1455 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1456 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1457 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1458 = llvm.mul %153, %1457 : !llvm.i64 + %1459 = llvm.add %1456, %1458 : !llvm.i64 + %1460 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1461 = llvm.mul %1454, %1460 : !llvm.i64 + %1462 = llvm.add %1459, %1461 : !llvm.i64 + %1463 = llvm.getelementptr %1455[%1462] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1464 = llvm.bitcast %1463 : !llvm.ptr to !llvm.ptr> + %1465 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1466 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1467 = llvm.trunc %1454 : !llvm.i64 to !llvm.i32 + %1468 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1469 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1470 = llvm.insertelement %1467, %1468[%1469 : !llvm.i32] : !llvm.vec<8 x i32> + %1471 = llvm.shufflevector %1470, %1468 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1472 = llvm.add %1471, %1466 : !llvm.vec<8 x i32> + %1473 = llvm.trunc %1465 : !llvm.i64 to !llvm.i32 + %1474 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1475 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1476 = llvm.insertelement %1473, %1474[%1475 : !llvm.i32] : !llvm.vec<8 x i32> + %1477 = llvm.shufflevector %1476, %1474 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1478 = llvm.icmp "slt" %1472, %1477 : !llvm.vec<8 x i32> + %1479 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1480 = llvm.intr.masked.load %1464, %1478, %1479 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1481 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1482 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1483 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1484 = llvm.mul %67, %1483 : !llvm.i64 + %1485 = llvm.add %1482, %1484 : !llvm.i64 + %1486 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1487 = llvm.mul %50, %1486 : !llvm.i64 + %1488 = llvm.add %1485, %1487 : !llvm.i64 + %1489 = llvm.getelementptr %1481[%1488] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1480, %1489 : !llvm.ptr> + %1490 = llvm.add %1130, %51 : !llvm.i64 + %1491 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1492 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1493 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1494 = llvm.mul %153, %1493 : !llvm.i64 + %1495 = llvm.add %1492, %1494 : !llvm.i64 + %1496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1497 = llvm.mul %1490, %1496 : !llvm.i64 + %1498 = llvm.add %1495, %1497 : !llvm.i64 + %1499 = llvm.getelementptr %1491[%1498] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1500 = llvm.bitcast %1499 : !llvm.ptr to !llvm.ptr> + %1501 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1502 = 
llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1503 = llvm.trunc %1490 : !llvm.i64 to !llvm.i32 + %1504 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1505 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1506 = llvm.insertelement %1503, %1504[%1505 : !llvm.i32] : !llvm.vec<8 x i32> + %1507 = llvm.shufflevector %1506, %1504 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1508 = llvm.add %1507, %1502 : !llvm.vec<8 x i32> + %1509 = llvm.trunc %1501 : !llvm.i64 to !llvm.i32 + %1510 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1511 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1512 = llvm.insertelement %1509, %1510[%1511 : !llvm.i32] : !llvm.vec<8 x i32> + %1513 = llvm.shufflevector %1512, %1510 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1514 = llvm.icmp "slt" %1508, %1513 : !llvm.vec<8 x i32> + %1515 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1516 = llvm.intr.masked.load %1500, %1514, %1515 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1517 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1519 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1520 = llvm.mul %67, %1519 : !llvm.i64 + %1521 = llvm.add %1518, %1520 : !llvm.i64 + %1522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1523 = llvm.mul %33, %1522 : !llvm.i64 + %1524 = llvm.add %1521, %1523 : !llvm.i64 + %1525 = llvm.getelementptr %1517[%1524] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1516, %1525 : !llvm.ptr> + %1526 = llvm.add %1130, %53 : !llvm.i64 + %1527 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1528 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1529 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1530 = llvm.mul %153, %1529 : !llvm.i64 + %1531 = llvm.add %1528, %1530 : !llvm.i64 + %1532 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1533 = llvm.mul %1526, %1532 : !llvm.i64 + %1534 = llvm.add %1531, %1533 : !llvm.i64 + %1535 = llvm.getelementptr %1527[%1534] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1536 = llvm.bitcast %1535 : !llvm.ptr to !llvm.ptr> + %1537 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1538 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1539 = llvm.trunc %1526 : !llvm.i64 to !llvm.i32 + %1540 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1541 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1542 = llvm.insertelement %1539, %1540[%1541 : !llvm.i32] : !llvm.vec<8 x i32> + %1543 = llvm.shufflevector %1542, %1540 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1544 = llvm.add %1543, %1538 : !llvm.vec<8 x i32> + %1545 = llvm.trunc %1537 : !llvm.i64 to !llvm.i32 + %1546 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1547 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1548 = llvm.insertelement %1545, %1546[%1547 : !llvm.i32] : !llvm.vec<8 x i32> + %1549 = llvm.shufflevector %1548, %1546 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1550 = llvm.icmp "slt" %1544, %1549 : !llvm.vec<8 x i32> + %1551 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1552 = llvm.intr.masked.load %1536, %1550, 
%1551 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1553 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1554 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1555 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1556 = llvm.mul %67, %1555 : !llvm.i64 + %1557 = llvm.add %1554, %1556 : !llvm.i64 + %1558 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1559 = llvm.mul %54, %1558 : !llvm.i64 + %1560 = llvm.add %1557, %1559 : !llvm.i64 + %1561 = llvm.getelementptr %1553[%1560] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1552, %1561 : !llvm.ptr> + %1562 = llvm.add %1130, %55 : !llvm.i64 + %1563 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1564 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1565 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1566 = llvm.mul %153, %1565 : !llvm.i64 + %1567 = llvm.add %1564, %1566 : !llvm.i64 + %1568 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1569 = llvm.mul %1562, %1568 : !llvm.i64 + %1570 = llvm.add %1567, %1569 : !llvm.i64 + %1571 = llvm.getelementptr %1563[%1570] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1572 = llvm.bitcast %1571 : !llvm.ptr to !llvm.ptr> + %1573 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1574 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1575 = llvm.trunc %1562 : !llvm.i64 to !llvm.i32 + %1576 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1577 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1578 = llvm.insertelement %1575, %1576[%1577 : !llvm.i32] : !llvm.vec<8 x i32> + %1579 = llvm.shufflevector %1578, %1576 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1580 = llvm.add %1579, %1574 : !llvm.vec<8 x i32> + %1581 = llvm.trunc %1573 : !llvm.i64 to !llvm.i32 + %1582 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1583 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1584 = llvm.insertelement %1581, %1582[%1583 : !llvm.i32] : !llvm.vec<8 x i32> + %1585 = llvm.shufflevector %1584, %1582 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1586 = llvm.icmp "slt" %1580, %1585 : !llvm.vec<8 x i32> + %1587 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1588 = llvm.intr.masked.load %1572, %1586, %1587 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1589 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1591 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1592 = llvm.mul %67, %1591 : !llvm.i64 + %1593 = llvm.add %1590, %1592 : !llvm.i64 + %1594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1595 = llvm.mul %34, %1594 : !llvm.i64 + %1596 = llvm.add %1593, %1595 : !llvm.i64 + %1597 = llvm.getelementptr %1589[%1596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1588, %1597 : !llvm.ptr> + %1598 = llvm.add %1130, %57 : !llvm.i64 + %1599 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1602 = llvm.mul %153, %1601 : !llvm.i64 + %1603 = llvm.add %1600, %1602 : !llvm.i64 + %1604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1605 = llvm.mul %1598, %1604 : !llvm.i64 + %1606 = llvm.add 
%1603, %1605 : !llvm.i64 + %1607 = llvm.getelementptr %1599[%1606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1608 = llvm.bitcast %1607 : !llvm.ptr to !llvm.ptr> + %1609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1610 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1611 = llvm.trunc %1598 : !llvm.i64 to !llvm.i32 + %1612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1614 = llvm.insertelement %1611, %1612[%1613 : !llvm.i32] : !llvm.vec<8 x i32> + %1615 = llvm.shufflevector %1614, %1612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1616 = llvm.add %1615, %1610 : !llvm.vec<8 x i32> + %1617 = llvm.trunc %1609 : !llvm.i64 to !llvm.i32 + %1618 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1620 = llvm.insertelement %1617, %1618[%1619 : !llvm.i32] : !llvm.vec<8 x i32> + %1621 = llvm.shufflevector %1620, %1618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1622 = llvm.icmp "slt" %1616, %1621 : !llvm.vec<8 x i32> + %1623 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1624 = llvm.intr.masked.load %1608, %1622, %1623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1625 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1626 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1627 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1628 = llvm.mul %67, %1627 : !llvm.i64 + %1629 = llvm.add %1626, %1628 : !llvm.i64 + %1630 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1631 = llvm.mul %58, %1630 : !llvm.i64 + %1632 = llvm.add %1629, %1631 : !llvm.i64 + %1633 = llvm.getelementptr %1625[%1632] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1624, %1633 : !llvm.ptr> + %1634 = llvm.add %1130, %59 : !llvm.i64 + %1635 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1637 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1638 = llvm.mul %153, %1637 : !llvm.i64 + %1639 = llvm.add %1636, %1638 : !llvm.i64 + %1640 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1641 = llvm.mul %1634, %1640 : !llvm.i64 + %1642 = llvm.add %1639, %1641 : !llvm.i64 + %1643 = llvm.getelementptr %1635[%1642] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1644 = llvm.bitcast %1643 : !llvm.ptr to !llvm.ptr> + %1645 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1646 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1647 = llvm.trunc %1634 : !llvm.i64 to !llvm.i32 + %1648 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1649 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1650 = llvm.insertelement %1647, %1648[%1649 : !llvm.i32] : !llvm.vec<8 x i32> + %1651 = llvm.shufflevector %1650, %1648 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1652 = llvm.add %1651, %1646 : !llvm.vec<8 x i32> + %1653 = llvm.trunc %1645 : !llvm.i64 to !llvm.i32 + %1654 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1655 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1656 = llvm.insertelement %1653, %1654[%1655 : !llvm.i32] : !llvm.vec<8 x i32> + %1657 = llvm.shufflevector %1656, %1654 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : 
!llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1658 = llvm.icmp "slt" %1652, %1657 : !llvm.vec<8 x i32> + %1659 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1660 = llvm.intr.masked.load %1644, %1658, %1659 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1661 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1662 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1663 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1664 = llvm.mul %67, %1663 : !llvm.i64 + %1665 = llvm.add %1662, %1664 : !llvm.i64 + %1666 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1667 = llvm.mul %35, %1666 : !llvm.i64 + %1668 = llvm.add %1665, %1667 : !llvm.i64 + %1669 = llvm.getelementptr %1661[%1668] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1660, %1669 : !llvm.ptr> + %1670 = llvm.add %1130, %62 : !llvm.i64 + %1671 = llvm.extractvalue %15[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1672 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1673 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1674 = llvm.mul %153, %1673 : !llvm.i64 + %1675 = llvm.add %1672, %1674 : !llvm.i64 + %1676 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1677 = llvm.mul %1670, %1676 : !llvm.i64 + %1678 = llvm.add %1675, %1677 : !llvm.i64 + %1679 = llvm.getelementptr %1671[%1678] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %1680 = llvm.bitcast %1679 : !llvm.ptr to !llvm.ptr> + %1681 = llvm.mlir.constant(512 : index) : !llvm.i64 + %1682 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %1683 = llvm.trunc %1670 : !llvm.i64 to !llvm.i32 + %1684 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1685 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1686 = llvm.insertelement %1683, %1684[%1685 : !llvm.i32] : !llvm.vec<8 x i32> + %1687 = llvm.shufflevector %1686, %1684 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1688 = llvm.add %1687, %1682 : !llvm.vec<8 x i32> + %1689 = llvm.trunc %1681 : !llvm.i64 to !llvm.i32 + %1690 = llvm.mlir.undef : !llvm.vec<8 x i32> + %1691 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %1692 = llvm.insertelement %1689, %1690[%1691 : !llvm.i32] : !llvm.vec<8 x i32> + %1693 = llvm.shufflevector %1692, %1690 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %1694 = llvm.icmp "slt" %1688, %1693 : !llvm.vec<8 x i32> + %1695 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %1696 = llvm.intr.masked.load %1680, %1694, %1695 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %1697 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1698 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1699 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1700 = llvm.mul %67, %1699 : !llvm.i64 + %1701 = llvm.add %1698, %1700 : !llvm.i64 + %1702 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1703 = llvm.mul %66, %1702 : !llvm.i64 + %1704 = llvm.add %1701, %1703 : !llvm.i64 + %1705 = llvm.getelementptr %1697[%1704] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1696, %1705 : !llvm.ptr> + %1706 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1707 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1708 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1709 = 
llvm.mul %67, %1708 : !llvm.i64 + %1710 = llvm.add %1707, %1709 : !llvm.i64 + %1711 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1712 = llvm.mul %67, %1711 : !llvm.i64 + %1713 = llvm.add %1710, %1712 : !llvm.i64 + %1714 = llvm.getelementptr %1706[%1713] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1715 = llvm.load %1714 : !llvm.ptr> + %1716 = llvm.icmp "slt" %155, %67 : !llvm.i64 + %1717 = llvm.sub %64, %155 : !llvm.i64 + %1718 = llvm.select %1716, %1717, %155 : !llvm.i1, !llvm.i64 + %1719 = llvm.sdiv %1718, %68 : !llvm.i64 + %1720 = llvm.sub %64, %1719 : !llvm.i64 + %1721 = llvm.select %1716, %1720, %1719 : !llvm.i1, !llvm.i64 + %1722 = llvm.srem %1721, %68 : !llvm.i64 + %1723 = llvm.icmp "slt" %1722, %67 : !llvm.i64 + %1724 = llvm.add %1722, %68 : !llvm.i64 + %1725 = llvm.select %1723, %1724, %1722 : !llvm.i1, !llvm.i64 + %1726 = llvm.srem %153, %39 : !llvm.i64 + %1727 = llvm.icmp "slt" %1726, %67 : !llvm.i64 + %1728 = llvm.add %1726, %39 : !llvm.i64 + %1729 = llvm.select %1727, %1728, %1726 : !llvm.i1, !llvm.i64 + %1730 = llvm.srem %155, %68 : !llvm.i64 + %1731 = llvm.icmp "slt" %1730, %67 : !llvm.i64 + %1732 = llvm.add %1730, %68 : !llvm.i64 + %1733 = llvm.select %1731, %1732, %1730 : !llvm.i1, !llvm.i64 + %1734 = llvm.icmp "slt" %1733, %67 : !llvm.i64 + %1735 = llvm.sub %64, %1733 : !llvm.i64 + %1736 = llvm.select %1734, %1735, %1733 : !llvm.i1, !llvm.i64 + %1737 = llvm.sdiv %1736, %70 : !llvm.i64 + %1738 = llvm.sub %64, %1737 : !llvm.i64 + %1739 = llvm.select %1734, %1738, %1737 : !llvm.i1, !llvm.i64 + %1740 = llvm.srem %1739, %63 : !llvm.i64 + %1741 = llvm.icmp "slt" %1740, %67 : !llvm.i64 + %1742 = llvm.add %1740, %63 : !llvm.i64 + %1743 = llvm.select %1741, %1742, %1740 : !llvm.i1, !llvm.i64 + %1744 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1746 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1747 = llvm.mul %1725, %1746 : !llvm.i64 + %1748 = llvm.add %1745, %1747 : !llvm.i64 + %1749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1750 = llvm.mul %1729, %1749 : !llvm.i64 + %1751 = llvm.add %1748, %1750 : !llvm.i64 + %1752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1753 = llvm.mul %1743, %1752 : !llvm.i64 + %1754 = llvm.add %1751, %1753 : !llvm.i64 + %1755 = llvm.getelementptr %1744[%1754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1715, %1755 : !llvm.ptr> + %1756 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1758 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1759 = llvm.mul %67, %1758 : !llvm.i64 + %1760 = llvm.add %1757, %1759 : !llvm.i64 + %1761 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1762 = llvm.mul %69, %1761 : !llvm.i64 + %1763 = llvm.add %1760, %1762 : !llvm.i64 + %1764 = llvm.getelementptr %1756[%1763] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1765 = llvm.load %1764 : !llvm.ptr> + %1766 = llvm.add %155, %70 : !llvm.i64 + %1767 = llvm.icmp "slt" %1766, %67 : !llvm.i64 + %1768 = llvm.sub %64, %1766 : !llvm.i64 + %1769 = llvm.select %1767, %1768, %1766 : !llvm.i1, !llvm.i64 + %1770 = llvm.sdiv %1769, %68 : !llvm.i64 + %1771 = llvm.sub %64, %1770 : !llvm.i64 + %1772 = llvm.select %1767, %1771, %1770 : !llvm.i1, !llvm.i64 + %1773 = llvm.srem %1772, %68 : !llvm.i64 + %1774 = llvm.icmp "slt" %1773, %67 : !llvm.i64 + %1775 = llvm.add %1773, %68 : !llvm.i64 + %1776 = llvm.select %1774, %1775, %1773 : !llvm.i1, !llvm.i64 + %1777 = llvm.sdiv %1718, %70 : 
!llvm.i64 + %1778 = llvm.sub %64, %1777 : !llvm.i64 + %1779 = llvm.select %1716, %1778, %1777 : !llvm.i1, !llvm.i64 + %1780 = llvm.mul %1772, %65 : !llvm.i64 + %1781 = llvm.add %1779, %1780 : !llvm.i64 + %1782 = llvm.add %1781, %69 : !llvm.i64 + %1783 = llvm.icmp "slt" %1782, %67 : !llvm.i64 + %1784 = llvm.sub %64, %1782 : !llvm.i64 + %1785 = llvm.select %1783, %1784, %1782 : !llvm.i1, !llvm.i64 + %1786 = llvm.sdiv %1785, %63 : !llvm.i64 + %1787 = llvm.sub %64, %1786 : !llvm.i64 + %1788 = llvm.select %1783, %1787, %1786 : !llvm.i1, !llvm.i64 + %1789 = llvm.mul %1788, %65 : !llvm.i64 + %1790 = llvm.add %1781, %1789 : !llvm.i64 + %1791 = llvm.add %1790, %69 : !llvm.i64 + %1792 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1794 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1795 = llvm.mul %1776, %1794 : !llvm.i64 + %1796 = llvm.add %1793, %1795 : !llvm.i64 + %1797 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1798 = llvm.mul %1729, %1797 : !llvm.i64 + %1799 = llvm.add %1796, %1798 : !llvm.i64 + %1800 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1801 = llvm.mul %1791, %1800 : !llvm.i64 + %1802 = llvm.add %1799, %1801 : !llvm.i64 + %1803 = llvm.getelementptr %1792[%1802] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1765, %1803 : !llvm.ptr> + %1804 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1805 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1806 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1807 = llvm.mul %67, %1806 : !llvm.i64 + %1808 = llvm.add %1805, %1807 : !llvm.i64 + %1809 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1810 = llvm.mul %63, %1809 : !llvm.i64 + %1811 = llvm.add %1808, %1810 : !llvm.i64 + %1812 = llvm.getelementptr %1804[%1811] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1813 = llvm.load %1812 : !llvm.ptr> + %1814 = llvm.add %1721, %69 : !llvm.i64 + %1815 = llvm.icmp "slt" %1814, %67 : !llvm.i64 + %1816 = llvm.sub %64, %1814 : !llvm.i64 + %1817 = llvm.select %1815, %1816, %1814 : !llvm.i1, !llvm.i64 + %1818 = llvm.sdiv %1817, %68 : !llvm.i64 + %1819 = llvm.sub %64, %1818 : !llvm.i64 + %1820 = llvm.select %1815, %1819, %1818 : !llvm.i1, !llvm.i64 + %1821 = llvm.mul %1820, %60 : !llvm.i64 + %1822 = llvm.add %1721, %1821 : !llvm.i64 + %1823 = llvm.add %1822, %69 : !llvm.i64 + %1824 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1825 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1826 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1827 = llvm.mul %1823, %1826 : !llvm.i64 + %1828 = llvm.add %1825, %1827 : !llvm.i64 + %1829 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1830 = llvm.mul %1729, %1829 : !llvm.i64 + %1831 = llvm.add %1828, %1830 : !llvm.i64 + %1832 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1833 = llvm.mul %1743, %1832 : !llvm.i64 + %1834 = llvm.add %1831, %1833 : !llvm.i64 + %1835 = llvm.getelementptr %1824[%1834] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1813, %1835 : !llvm.ptr> + %1836 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1837 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1838 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1839 = llvm.mul %67, %1838 : !llvm.i64 + %1840 = llvm.add %1837, %1839 : !llvm.i64 + %1841 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1842 = llvm.mul %45, %1841 : !llvm.i64 + %1843 = llvm.add %1840, %1842 : !llvm.i64 + %1844 = llvm.getelementptr 
%1836[%1843] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1845 = llvm.load %1844 : !llvm.ptr> + %1846 = llvm.add %155, %41 : !llvm.i64 + %1847 = llvm.icmp "slt" %1846, %67 : !llvm.i64 + %1848 = llvm.sub %64, %1846 : !llvm.i64 + %1849 = llvm.select %1847, %1848, %1846 : !llvm.i1, !llvm.i64 + %1850 = llvm.sdiv %1849, %68 : !llvm.i64 + %1851 = llvm.sub %64, %1850 : !llvm.i64 + %1852 = llvm.select %1847, %1851, %1850 : !llvm.i1, !llvm.i64 + %1853 = llvm.srem %1852, %68 : !llvm.i64 + %1854 = llvm.icmp "slt" %1853, %67 : !llvm.i64 + %1855 = llvm.add %1853, %68 : !llvm.i64 + %1856 = llvm.select %1854, %1855, %1853 : !llvm.i1, !llvm.i64 + %1857 = llvm.mul %1852, %65 : !llvm.i64 + %1858 = llvm.add %1779, %1857 : !llvm.i64 + %1859 = llvm.add %1858, %45 : !llvm.i64 + %1860 = llvm.icmp "slt" %1859, %67 : !llvm.i64 + %1861 = llvm.sub %64, %1859 : !llvm.i64 + %1862 = llvm.select %1860, %1861, %1859 : !llvm.i1, !llvm.i64 + %1863 = llvm.sdiv %1862, %63 : !llvm.i64 + %1864 = llvm.sub %64, %1863 : !llvm.i64 + %1865 = llvm.select %1860, %1864, %1863 : !llvm.i1, !llvm.i64 + %1866 = llvm.mul %1865, %65 : !llvm.i64 + %1867 = llvm.add %1858, %1866 : !llvm.i64 + %1868 = llvm.add %1867, %45 : !llvm.i64 + %1869 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1870 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1871 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1872 = llvm.mul %1856, %1871 : !llvm.i64 + %1873 = llvm.add %1870, %1872 : !llvm.i64 + %1874 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1875 = llvm.mul %1729, %1874 : !llvm.i64 + %1876 = llvm.add %1873, %1875 : !llvm.i64 + %1877 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1878 = llvm.mul %1868, %1877 : !llvm.i64 + %1879 = llvm.add %1876, %1878 : !llvm.i64 + %1880 = llvm.getelementptr %1869[%1879] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1845, %1880 : !llvm.ptr> + %1881 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1882 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1883 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1884 = llvm.mul %67, %1883 : !llvm.i64 + %1885 = llvm.add %1882, %1884 : !llvm.i64 + %1886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1887 = llvm.mul %48, %1886 : !llvm.i64 + %1888 = llvm.add %1885, %1887 : !llvm.i64 + %1889 = llvm.getelementptr %1881[%1888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1890 = llvm.load %1889 : !llvm.ptr> + %1891 = llvm.add %1721, %63 : !llvm.i64 + %1892 = llvm.icmp "slt" %1891, %67 : !llvm.i64 + %1893 = llvm.sub %64, %1891 : !llvm.i64 + %1894 = llvm.select %1892, %1893, %1891 : !llvm.i1, !llvm.i64 + %1895 = llvm.sdiv %1894, %68 : !llvm.i64 + %1896 = llvm.sub %64, %1895 : !llvm.i64 + %1897 = llvm.select %1892, %1896, %1895 : !llvm.i1, !llvm.i64 + %1898 = llvm.mul %1897, %60 : !llvm.i64 + %1899 = llvm.add %1721, %1898 : !llvm.i64 + %1900 = llvm.add %1899, %63 : !llvm.i64 + %1901 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1903 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1904 = llvm.mul %1900, %1903 : !llvm.i64 + %1905 = llvm.add %1902, %1904 : !llvm.i64 + %1906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1907 = llvm.mul %1729, %1906 : !llvm.i64 + %1908 = llvm.add %1905, %1907 : !llvm.i64 + %1909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1910 = llvm.mul %1743, %1909 : !llvm.i64 + %1911 = llvm.add %1908, %1910 : !llvm.i64 + %1912 = llvm.getelementptr %1901[%1911] : (!llvm.ptr>, 
!llvm.i64) -> !llvm.ptr> + llvm.store %1890, %1912 : !llvm.ptr> + %1913 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1914 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1915 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1916 = llvm.mul %67, %1915 : !llvm.i64 + %1917 = llvm.add %1914, %1916 : !llvm.i64 + %1918 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1919 = llvm.mul %52, %1918 : !llvm.i64 + %1920 = llvm.add %1917, %1919 : !llvm.i64 + %1921 = llvm.getelementptr %1913[%1920] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1922 = llvm.load %1921 : !llvm.ptr> + %1923 = llvm.add %155, %43 : !llvm.i64 + %1924 = llvm.icmp "slt" %1923, %67 : !llvm.i64 + %1925 = llvm.sub %64, %1923 : !llvm.i64 + %1926 = llvm.select %1924, %1925, %1923 : !llvm.i1, !llvm.i64 + %1927 = llvm.sdiv %1926, %68 : !llvm.i64 + %1928 = llvm.sub %64, %1927 : !llvm.i64 + %1929 = llvm.select %1924, %1928, %1927 : !llvm.i1, !llvm.i64 + %1930 = llvm.srem %1929, %68 : !llvm.i64 + %1931 = llvm.icmp "slt" %1930, %67 : !llvm.i64 + %1932 = llvm.add %1930, %68 : !llvm.i64 + %1933 = llvm.select %1931, %1932, %1930 : !llvm.i1, !llvm.i64 + %1934 = llvm.mul %1929, %65 : !llvm.i64 + %1935 = llvm.add %1779, %1934 : !llvm.i64 + %1936 = llvm.add %1935, %52 : !llvm.i64 + %1937 = llvm.icmp "slt" %1936, %67 : !llvm.i64 + %1938 = llvm.sub %64, %1936 : !llvm.i64 + %1939 = llvm.select %1937, %1938, %1936 : !llvm.i1, !llvm.i64 + %1940 = llvm.sdiv %1939, %63 : !llvm.i64 + %1941 = llvm.sub %64, %1940 : !llvm.i64 + %1942 = llvm.select %1937, %1941, %1940 : !llvm.i1, !llvm.i64 + %1943 = llvm.mul %1942, %65 : !llvm.i64 + %1944 = llvm.add %1935, %1943 : !llvm.i64 + %1945 = llvm.add %1944, %52 : !llvm.i64 + %1946 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1947 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1948 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1949 = llvm.mul %1933, %1948 : !llvm.i64 + %1950 = llvm.add %1947, %1949 : !llvm.i64 + %1951 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1952 = llvm.mul %1729, %1951 : !llvm.i64 + %1953 = llvm.add %1950, %1952 : !llvm.i64 + %1954 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1955 = llvm.mul %1945, %1954 : !llvm.i64 + %1956 = llvm.add %1953, %1955 : !llvm.i64 + %1957 = llvm.getelementptr %1946[%1956] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1922, %1957 : !llvm.ptr> + %1958 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1960 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1961 = llvm.mul %67, %1960 : !llvm.i64 + %1962 = llvm.add %1959, %1961 : !llvm.i64 + %1963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1964 = llvm.mul %56, %1963 : !llvm.i64 + %1965 = llvm.add %1962, %1964 : !llvm.i64 + %1966 = llvm.getelementptr %1958[%1965] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1967 = llvm.load %1966 : !llvm.ptr> + %1968 = llvm.add %1721, %45 : !llvm.i64 + %1969 = llvm.icmp "slt" %1968, %67 : !llvm.i64 + %1970 = llvm.sub %64, %1968 : !llvm.i64 + %1971 = llvm.select %1969, %1970, %1968 : !llvm.i1, !llvm.i64 + %1972 = llvm.sdiv %1971, %68 : !llvm.i64 + %1973 = llvm.sub %64, %1972 : !llvm.i64 + %1974 = llvm.select %1969, %1973, %1972 : !llvm.i1, !llvm.i64 + %1975 = llvm.mul %1974, %60 : !llvm.i64 + %1976 = llvm.add %1721, %1975 : !llvm.i64 + %1977 = llvm.add %1976, %45 : !llvm.i64 + %1978 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %1979 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %1980 = llvm.mlir.constant(256 : index) : !llvm.i64 + %1981 = llvm.mul %1977, %1980 : !llvm.i64 + %1982 = llvm.add %1979, %1981 : !llvm.i64 + %1983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %1984 = llvm.mul %1729, %1983 : !llvm.i64 + %1985 = llvm.add %1982, %1984 : !llvm.i64 + %1986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1987 = llvm.mul %1743, %1986 : !llvm.i64 + %1988 = llvm.add %1985, %1987 : !llvm.i64 + %1989 = llvm.getelementptr %1978[%1988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1967, %1989 : !llvm.ptr> + %1990 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %1991 = llvm.mlir.constant(0 : index) : !llvm.i64 + %1992 = llvm.mlir.constant(16 : index) : !llvm.i64 + %1993 = llvm.mul %67, %1992 : !llvm.i64 + %1994 = llvm.add %1991, %1993 : !llvm.i64 + %1995 = llvm.mlir.constant(1 : index) : !llvm.i64 + %1996 = llvm.mul %61, %1995 : !llvm.i64 + %1997 = llvm.add %1994, %1996 : !llvm.i64 + %1998 = llvm.getelementptr %1990[%1997] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %1999 = llvm.load %1998 : !llvm.ptr> + %2000 = llvm.add %155, %46 : !llvm.i64 + %2001 = llvm.icmp "slt" %2000, %67 : !llvm.i64 + %2002 = llvm.sub %64, %2000 : !llvm.i64 + %2003 = llvm.select %2001, %2002, %2000 : !llvm.i1, !llvm.i64 + %2004 = llvm.sdiv %2003, %68 : !llvm.i64 + %2005 = llvm.sub %64, %2004 : !llvm.i64 + %2006 = llvm.select %2001, %2005, %2004 : !llvm.i1, !llvm.i64 + %2007 = llvm.srem %2006, %68 : !llvm.i64 + %2008 = llvm.icmp "slt" %2007, %67 : !llvm.i64 + %2009 = llvm.add %2007, %68 : !llvm.i64 + %2010 = llvm.select %2008, %2009, %2007 : !llvm.i1, !llvm.i64 + %2011 = llvm.mul %2006, %65 : !llvm.i64 + %2012 = llvm.add %1779, %2011 : !llvm.i64 + %2013 = llvm.add %2012, %61 : !llvm.i64 + %2014 = llvm.icmp "slt" %2013, %67 : !llvm.i64 + %2015 = llvm.sub %64, %2013 : !llvm.i64 + %2016 = llvm.select %2014, %2015, %2013 : !llvm.i1, !llvm.i64 + %2017 = llvm.sdiv %2016, %63 : !llvm.i64 + %2018 = llvm.sub %64, %2017 : !llvm.i64 + %2019 = llvm.select %2014, %2018, %2017 : !llvm.i1, !llvm.i64 + %2020 = llvm.mul %2019, %65 : !llvm.i64 + %2021 = llvm.add %2012, %2020 : !llvm.i64 + %2022 = llvm.add %2021, %61 : !llvm.i64 + %2023 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2024 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2025 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2026 = llvm.mul %2010, %2025 : !llvm.i64 + %2027 = llvm.add %2024, %2026 : !llvm.i64 + %2028 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2029 = llvm.mul %1729, %2028 : !llvm.i64 + %2030 = llvm.add %2027, %2029 : !llvm.i64 + %2031 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2032 = llvm.mul %2022, %2031 : !llvm.i64 + %2033 = llvm.add %2030, %2032 : !llvm.i64 + %2034 = llvm.getelementptr %2023[%2033] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %1999, %2034 : !llvm.ptr> + %2035 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2036 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2037 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2038 = llvm.mul %67, %2037 : !llvm.i64 + %2039 = llvm.add %2036, %2038 : !llvm.i64 + %2040 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2041 = llvm.mul %70, %2040 : !llvm.i64 + %2042 = llvm.add %2039, %2041 : !llvm.i64 + %2043 = llvm.getelementptr %2035[%2042] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2044 = llvm.load %2043 : !llvm.ptr> + %2045 = llvm.add %1721, %48 : !llvm.i64 + %2046 = llvm.icmp "slt" 
%2045, %67 : !llvm.i64 + %2047 = llvm.sub %64, %2045 : !llvm.i64 + %2048 = llvm.select %2046, %2047, %2045 : !llvm.i1, !llvm.i64 + %2049 = llvm.sdiv %2048, %68 : !llvm.i64 + %2050 = llvm.sub %64, %2049 : !llvm.i64 + %2051 = llvm.select %2046, %2050, %2049 : !llvm.i1, !llvm.i64 + %2052 = llvm.mul %2051, %60 : !llvm.i64 + %2053 = llvm.add %1721, %2052 : !llvm.i64 + %2054 = llvm.add %2053, %48 : !llvm.i64 + %2055 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2056 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2057 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2058 = llvm.mul %2054, %2057 : !llvm.i64 + %2059 = llvm.add %2056, %2058 : !llvm.i64 + %2060 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2061 = llvm.mul %1729, %2060 : !llvm.i64 + %2062 = llvm.add %2059, %2061 : !llvm.i64 + %2063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2064 = llvm.mul %1743, %2063 : !llvm.i64 + %2065 = llvm.add %2062, %2064 : !llvm.i64 + %2066 = llvm.getelementptr %2055[%2065] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2044, %2066 : !llvm.ptr> + %2067 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2068 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2069 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2070 = llvm.mul %67, %2069 : !llvm.i64 + %2071 = llvm.add %2068, %2070 : !llvm.i64 + %2072 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2073 = llvm.mul %50, %2072 : !llvm.i64 + %2074 = llvm.add %2071, %2073 : !llvm.i64 + %2075 = llvm.getelementptr %2067[%2074] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2076 = llvm.load %2075 : !llvm.ptr> + %2077 = llvm.add %155, %49 : !llvm.i64 + %2078 = llvm.icmp "slt" %2077, %67 : !llvm.i64 + %2079 = llvm.sub %64, %2077 : !llvm.i64 + %2080 = llvm.select %2078, %2079, %2077 : !llvm.i1, !llvm.i64 + %2081 = llvm.sdiv %2080, %68 : !llvm.i64 + %2082 = llvm.sub %64, %2081 : !llvm.i64 + %2083 = llvm.select %2078, %2082, %2081 : !llvm.i1, !llvm.i64 + %2084 = llvm.srem %2083, %68 : !llvm.i64 + %2085 = llvm.icmp "slt" %2084, %67 : !llvm.i64 + %2086 = llvm.add %2084, %68 : !llvm.i64 + %2087 = llvm.select %2085, %2086, %2084 : !llvm.i1, !llvm.i64 + %2088 = llvm.mul %2083, %65 : !llvm.i64 + %2089 = llvm.add %1779, %2088 : !llvm.i64 + %2090 = llvm.add %2089, %50 : !llvm.i64 + %2091 = llvm.icmp "slt" %2090, %67 : !llvm.i64 + %2092 = llvm.sub %64, %2090 : !llvm.i64 + %2093 = llvm.select %2091, %2092, %2090 : !llvm.i1, !llvm.i64 + %2094 = llvm.sdiv %2093, %63 : !llvm.i64 + %2095 = llvm.sub %64, %2094 : !llvm.i64 + %2096 = llvm.select %2091, %2095, %2094 : !llvm.i1, !llvm.i64 + %2097 = llvm.mul %2096, %65 : !llvm.i64 + %2098 = llvm.add %2089, %2097 : !llvm.i64 + %2099 = llvm.add %2098, %50 : !llvm.i64 + %2100 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2101 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2102 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2103 = llvm.mul %2087, %2102 : !llvm.i64 + %2104 = llvm.add %2101, %2103 : !llvm.i64 + %2105 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2106 = llvm.mul %1729, %2105 : !llvm.i64 + %2107 = llvm.add %2104, %2106 : !llvm.i64 + %2108 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2109 = llvm.mul %2099, %2108 : !llvm.i64 + %2110 = llvm.add %2107, %2109 : !llvm.i64 + %2111 = llvm.getelementptr %2100[%2110] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2076, %2111 : !llvm.ptr> + %2112 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> 
+ %2113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2114 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2115 = llvm.mul %67, %2114 : !llvm.i64 + %2116 = llvm.add %2113, %2115 : !llvm.i64 + %2117 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2118 = llvm.mul %33, %2117 : !llvm.i64 + %2119 = llvm.add %2116, %2118 : !llvm.i64 + %2120 = llvm.getelementptr %2112[%2119] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2121 = llvm.load %2120 : !llvm.ptr> + %2122 = llvm.add %1721, %52 : !llvm.i64 + %2123 = llvm.icmp "slt" %2122, %67 : !llvm.i64 + %2124 = llvm.sub %64, %2122 : !llvm.i64 + %2125 = llvm.select %2123, %2124, %2122 : !llvm.i1, !llvm.i64 + %2126 = llvm.sdiv %2125, %68 : !llvm.i64 + %2127 = llvm.sub %64, %2126 : !llvm.i64 + %2128 = llvm.select %2123, %2127, %2126 : !llvm.i1, !llvm.i64 + %2129 = llvm.mul %2128, %60 : !llvm.i64 + %2130 = llvm.add %1721, %2129 : !llvm.i64 + %2131 = llvm.add %2130, %52 : !llvm.i64 + %2132 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2134 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2135 = llvm.mul %2131, %2134 : !llvm.i64 + %2136 = llvm.add %2133, %2135 : !llvm.i64 + %2137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2138 = llvm.mul %1729, %2137 : !llvm.i64 + %2139 = llvm.add %2136, %2138 : !llvm.i64 + %2140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2141 = llvm.mul %1743, %2140 : !llvm.i64 + %2142 = llvm.add %2139, %2141 : !llvm.i64 + %2143 = llvm.getelementptr %2132[%2142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2121, %2143 : !llvm.ptr> + %2144 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2145 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2146 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2147 = llvm.mul %67, %2146 : !llvm.i64 + %2148 = llvm.add %2145, %2147 : !llvm.i64 + %2149 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2150 = llvm.mul %54, %2149 : !llvm.i64 + %2151 = llvm.add %2148, %2150 : !llvm.i64 + %2152 = llvm.getelementptr %2144[%2151] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2153 = llvm.load %2152 : !llvm.ptr> + %2154 = llvm.add %155, %53 : !llvm.i64 + %2155 = llvm.icmp "slt" %2154, %67 : !llvm.i64 + %2156 = llvm.sub %64, %2154 : !llvm.i64 + %2157 = llvm.select %2155, %2156, %2154 : !llvm.i1, !llvm.i64 + %2158 = llvm.sdiv %2157, %68 : !llvm.i64 + %2159 = llvm.sub %64, %2158 : !llvm.i64 + %2160 = llvm.select %2155, %2159, %2158 : !llvm.i1, !llvm.i64 + %2161 = llvm.srem %2160, %68 : !llvm.i64 + %2162 = llvm.icmp "slt" %2161, %67 : !llvm.i64 + %2163 = llvm.add %2161, %68 : !llvm.i64 + %2164 = llvm.select %2162, %2163, %2161 : !llvm.i1, !llvm.i64 + %2165 = llvm.mul %2160, %65 : !llvm.i64 + %2166 = llvm.add %1779, %2165 : !llvm.i64 + %2167 = llvm.add %2166, %54 : !llvm.i64 + %2168 = llvm.icmp "slt" %2167, %67 : !llvm.i64 + %2169 = llvm.sub %64, %2167 : !llvm.i64 + %2170 = llvm.select %2168, %2169, %2167 : !llvm.i1, !llvm.i64 + %2171 = llvm.sdiv %2170, %63 : !llvm.i64 + %2172 = llvm.sub %64, %2171 : !llvm.i64 + %2173 = llvm.select %2168, %2172, %2171 : !llvm.i1, !llvm.i64 + %2174 = llvm.mul %2173, %65 : !llvm.i64 + %2175 = llvm.add %2166, %2174 : !llvm.i64 + %2176 = llvm.add %2175, %54 : !llvm.i64 + %2177 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2178 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2179 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2180 = llvm.mul %2164, %2179 : !llvm.i64 + %2181 = llvm.add %2178, 
%2180 : !llvm.i64 + %2182 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2183 = llvm.mul %1729, %2182 : !llvm.i64 + %2184 = llvm.add %2181, %2183 : !llvm.i64 + %2185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2186 = llvm.mul %2176, %2185 : !llvm.i64 + %2187 = llvm.add %2184, %2186 : !llvm.i64 + %2188 = llvm.getelementptr %2177[%2187] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2153, %2188 : !llvm.ptr> + %2189 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2190 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2191 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2192 = llvm.mul %67, %2191 : !llvm.i64 + %2193 = llvm.add %2190, %2192 : !llvm.i64 + %2194 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2195 = llvm.mul %34, %2194 : !llvm.i64 + %2196 = llvm.add %2193, %2195 : !llvm.i64 + %2197 = llvm.getelementptr %2189[%2196] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2198 = llvm.load %2197 : !llvm.ptr> + %2199 = llvm.add %1721, %56 : !llvm.i64 + %2200 = llvm.icmp "slt" %2199, %67 : !llvm.i64 + %2201 = llvm.sub %64, %2199 : !llvm.i64 + %2202 = llvm.select %2200, %2201, %2199 : !llvm.i1, !llvm.i64 + %2203 = llvm.sdiv %2202, %68 : !llvm.i64 + %2204 = llvm.sub %64, %2203 : !llvm.i64 + %2205 = llvm.select %2200, %2204, %2203 : !llvm.i1, !llvm.i64 + %2206 = llvm.mul %2205, %60 : !llvm.i64 + %2207 = llvm.add %1721, %2206 : !llvm.i64 + %2208 = llvm.add %2207, %56 : !llvm.i64 + %2209 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2210 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2211 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2212 = llvm.mul %2208, %2211 : !llvm.i64 + %2213 = llvm.add %2210, %2212 : !llvm.i64 + %2214 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2215 = llvm.mul %1729, %2214 : !llvm.i64 + %2216 = llvm.add %2213, %2215 : !llvm.i64 + %2217 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2218 = llvm.mul %1743, %2217 : !llvm.i64 + %2219 = llvm.add %2216, %2218 : !llvm.i64 + %2220 = llvm.getelementptr %2209[%2219] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2198, %2220 : !llvm.ptr> + %2221 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2222 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2223 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2224 = llvm.mul %67, %2223 : !llvm.i64 + %2225 = llvm.add %2222, %2224 : !llvm.i64 + %2226 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2227 = llvm.mul %58, %2226 : !llvm.i64 + %2228 = llvm.add %2225, %2227 : !llvm.i64 + %2229 = llvm.getelementptr %2221[%2228] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2230 = llvm.load %2229 : !llvm.ptr> + %2231 = llvm.add %155, %57 : !llvm.i64 + %2232 = llvm.icmp "slt" %2231, %67 : !llvm.i64 + %2233 = llvm.sub %64, %2231 : !llvm.i64 + %2234 = llvm.select %2232, %2233, %2231 : !llvm.i1, !llvm.i64 + %2235 = llvm.sdiv %2234, %68 : !llvm.i64 + %2236 = llvm.sub %64, %2235 : !llvm.i64 + %2237 = llvm.select %2232, %2236, %2235 : !llvm.i1, !llvm.i64 + %2238 = llvm.srem %2237, %68 : !llvm.i64 + %2239 = llvm.icmp "slt" %2238, %67 : !llvm.i64 + %2240 = llvm.add %2238, %68 : !llvm.i64 + %2241 = llvm.select %2239, %2240, %2238 : !llvm.i1, !llvm.i64 + %2242 = llvm.mul %2237, %65 : !llvm.i64 + %2243 = llvm.add %1779, %2242 : !llvm.i64 + %2244 = llvm.add %2243, %58 : !llvm.i64 + %2245 = llvm.icmp "slt" %2244, %67 : !llvm.i64 + %2246 = llvm.sub %64, %2244 : !llvm.i64 + %2247 = llvm.select %2245, %2246, %2244 : !llvm.i1, !llvm.i64 + %2248 = llvm.sdiv %2247, %63 : 
!llvm.i64 + %2249 = llvm.sub %64, %2248 : !llvm.i64 + %2250 = llvm.select %2245, %2249, %2248 : !llvm.i1, !llvm.i64 + %2251 = llvm.mul %2250, %65 : !llvm.i64 + %2252 = llvm.add %2243, %2251 : !llvm.i64 + %2253 = llvm.add %2252, %58 : !llvm.i64 + %2254 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2256 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2257 = llvm.mul %2241, %2256 : !llvm.i64 + %2258 = llvm.add %2255, %2257 : !llvm.i64 + %2259 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2260 = llvm.mul %1729, %2259 : !llvm.i64 + %2261 = llvm.add %2258, %2260 : !llvm.i64 + %2262 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2263 = llvm.mul %2253, %2262 : !llvm.i64 + %2264 = llvm.add %2261, %2263 : !llvm.i64 + %2265 = llvm.getelementptr %2254[%2264] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2230, %2265 : !llvm.ptr> + %2266 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2268 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2269 = llvm.mul %67, %2268 : !llvm.i64 + %2270 = llvm.add %2267, %2269 : !llvm.i64 + %2271 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2272 = llvm.mul %35, %2271 : !llvm.i64 + %2273 = llvm.add %2270, %2272 : !llvm.i64 + %2274 = llvm.getelementptr %2266[%2273] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2275 = llvm.load %2274 : !llvm.ptr> + %2276 = llvm.add %1721, %61 : !llvm.i64 + %2277 = llvm.icmp "slt" %2276, %67 : !llvm.i64 + %2278 = llvm.sub %64, %2276 : !llvm.i64 + %2279 = llvm.select %2277, %2278, %2276 : !llvm.i1, !llvm.i64 + %2280 = llvm.sdiv %2279, %68 : !llvm.i64 + %2281 = llvm.sub %64, %2280 : !llvm.i64 + %2282 = llvm.select %2277, %2281, %2280 : !llvm.i1, !llvm.i64 + %2283 = llvm.mul %2282, %60 : !llvm.i64 + %2284 = llvm.add %1721, %2283 : !llvm.i64 + %2285 = llvm.add %2284, %61 : !llvm.i64 + %2286 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2287 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2288 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2289 = llvm.mul %2285, %2288 : !llvm.i64 + %2290 = llvm.add %2287, %2289 : !llvm.i64 + %2291 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2292 = llvm.mul %1729, %2291 : !llvm.i64 + %2293 = llvm.add %2290, %2292 : !llvm.i64 + %2294 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2295 = llvm.mul %1743, %2294 : !llvm.i64 + %2296 = llvm.add %2293, %2295 : !llvm.i64 + %2297 = llvm.getelementptr %2286[%2296] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2275, %2297 : !llvm.ptr> + %2298 = llvm.extractvalue %90[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %2299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2300 = llvm.mlir.constant(16 : index) : !llvm.i64 + %2301 = llvm.mul %67, %2300 : !llvm.i64 + %2302 = llvm.add %2299, %2301 : !llvm.i64 + %2303 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2304 = llvm.mul %66, %2303 : !llvm.i64 + %2305 = llvm.add %2302, %2304 : !llvm.i64 + %2306 = llvm.getelementptr %2298[%2305] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2307 = llvm.load %2306 : !llvm.ptr> + %2308 = llvm.add %155, %62 : !llvm.i64 + %2309 = llvm.icmp "slt" %2308, %67 : !llvm.i64 + %2310 = llvm.sub %64, %2308 : !llvm.i64 + %2311 = llvm.select %2309, %2310, %2308 : !llvm.i1, !llvm.i64 + %2312 = llvm.sdiv %2311, %68 : !llvm.i64 + %2313 = llvm.sub %64, %2312 : !llvm.i64 + %2314 = llvm.select %2309, %2313, %2312 : !llvm.i1, 
!llvm.i64 + %2315 = llvm.srem %2314, %68 : !llvm.i64 + %2316 = llvm.icmp "slt" %2315, %67 : !llvm.i64 + %2317 = llvm.add %2315, %68 : !llvm.i64 + %2318 = llvm.select %2316, %2317, %2315 : !llvm.i1, !llvm.i64 + %2319 = llvm.mul %2314, %65 : !llvm.i64 + %2320 = llvm.add %1779, %2319 : !llvm.i64 + %2321 = llvm.add %2320, %66 : !llvm.i64 + %2322 = llvm.icmp "slt" %2321, %67 : !llvm.i64 + %2323 = llvm.sub %64, %2321 : !llvm.i64 + %2324 = llvm.select %2322, %2323, %2321 : !llvm.i1, !llvm.i64 + %2325 = llvm.sdiv %2324, %63 : !llvm.i64 + %2326 = llvm.sub %64, %2325 : !llvm.i64 + %2327 = llvm.select %2322, %2326, %2325 : !llvm.i1, !llvm.i64 + %2328 = llvm.mul %2327, %65 : !llvm.i64 + %2329 = llvm.add %2320, %2328 : !llvm.i64 + %2330 = llvm.add %2329, %66 : !llvm.i64 + %2331 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2332 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2333 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2334 = llvm.mul %2318, %2333 : !llvm.i64 + %2335 = llvm.add %2332, %2334 : !llvm.i64 + %2336 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2337 = llvm.mul %1729, %2336 : !llvm.i64 + %2338 = llvm.add %2335, %2337 : !llvm.i64 + %2339 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2340 = llvm.mul %2330, %2339 : !llvm.i64 + %2341 = llvm.add %2338, %2340 : !llvm.i64 + %2342 = llvm.getelementptr %2331[%2341] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2307, %2342 : !llvm.ptr> + llvm.br ^bb9 + ^bb9: // 2 preds: ^bb7, ^bb8 + %2343 = llvm.add %155, %39 : !llvm.i64 + llvm.br ^bb5(%2343 : !llvm.i64) + ^bb10: // pred: ^bb5 + %2344 = llvm.add %153, %69 : !llvm.i64 + llvm.br ^bb3(%2344 : !llvm.i64) + ^bb11: // pred: ^bb3 + llvm.br ^bb12(%67 : !llvm.i64) + ^bb12(%2345: !llvm.i64): // 2 preds: ^bb11, ^bb51 + %2346 = llvm.icmp "slt" %2345, %37 : !llvm.i64 + llvm.cond_br %2346, ^bb13, ^bb52 + ^bb13: // pred: ^bb12 + llvm.br ^bb14(%67 : !llvm.i64) + ^bb14(%2347: !llvm.i64): // 2 preds: ^bb13, ^bb21 + %2348 = llvm.icmp "slt" %2347, %68 : !llvm.i64 + llvm.cond_br %2348, ^bb15, ^bb22 + ^bb15: // pred: ^bb14 + llvm.br ^bb16(%67 : !llvm.i64) + ^bb16(%2349: !llvm.i64): // 2 preds: ^bb15, ^bb20 + %2350 = llvm.icmp "slt" %2349, %56 : !llvm.i64 + llvm.cond_br %2350, ^bb17, ^bb21 + ^bb17: // pred: ^bb16 + llvm.br ^bb18(%67 : !llvm.i64) + ^bb18(%2351: !llvm.i64): // 2 preds: ^bb17, ^bb19 + %2352 = llvm.icmp "slt" %2351, %63 : !llvm.i64 + llvm.cond_br %2352, ^bb19, ^bb20 + ^bb19: // pred: ^bb18 + %2353 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2354 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2355 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2356 = llvm.mul %2347, %2355 : !llvm.i64 + %2357 = llvm.add %2354, %2356 : !llvm.i64 + %2358 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2359 = llvm.mul %2349, %2358 : !llvm.i64 + %2360 = llvm.add %2357, %2359 : !llvm.i64 + %2361 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2362 = llvm.mul %2351, %2361 : !llvm.i64 + %2363 = llvm.add %2360, %2362 : !llvm.i64 + %2364 = llvm.getelementptr %2353[%2363] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %32, %2364 : !llvm.ptr> + %2365 = llvm.add %2351, %69 : !llvm.i64 + llvm.br ^bb18(%2365 : !llvm.i64) + ^bb20: // pred: ^bb18 + %2366 = llvm.add %2349, %69 : !llvm.i64 + llvm.br ^bb16(%2366 : !llvm.i64) + ^bb21: // pred: ^bb16 + %2367 = llvm.add %2347, %69 : !llvm.i64 + llvm.br ^bb14(%2367 : !llvm.i64) + ^bb22: // pred: ^bb14 + llvm.br ^bb23(%67 : !llvm.i64) + ^bb23(%2368: !llvm.i64): // 2 preds: ^bb22, 
^bb39 + %2369 = llvm.icmp "slt" %2368, %38 : !llvm.i64 + llvm.cond_br %2369, ^bb24, ^bb40 + ^bb24: // pred: ^bb23 + llvm.br ^bb25(%67 : !llvm.i64) + ^bb25(%2370: !llvm.i64): // 2 preds: ^bb24, ^bb38 + %2371 = llvm.icmp "slt" %2370, %39 : !llvm.i64 + llvm.cond_br %2371, ^bb26, ^bb39 + ^bb26: // pred: ^bb25 + llvm.br ^bb27(%67 : !llvm.i64) + ^bb27(%2372: !llvm.i64): // 2 preds: ^bb26, ^bb34 + %2373 = llvm.icmp "slt" %2372, %67 : !llvm.i64 + llvm.cond_br %2373, ^bb28, ^bb35 + ^bb28: // pred: ^bb27 + llvm.br ^bb29(%67 : !llvm.i64) + ^bb29(%2374: !llvm.i64): // 2 preds: ^bb28, ^bb33 + %2375 = llvm.icmp "slt" %2374, %48 : !llvm.i64 + llvm.cond_br %2375, ^bb30, ^bb34 + ^bb30: // pred: ^bb29 + llvm.br ^bb31(%67 : !llvm.i64) + ^bb31(%2376: !llvm.i64): // 2 preds: ^bb30, ^bb32 + %2377 = llvm.icmp "slt" %2376, %67 : !llvm.i64 + llvm.cond_br %2377, ^bb32, ^bb33 + ^bb32: // pred: ^bb31 + %2378 = llvm.add %2345, %2372 : !llvm.i64 + %2379 = llvm.add %2378, %2376 : !llvm.i64 + %2380 = llvm.add %2370, %2374 : !llvm.i64 + %2381 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2382 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2383 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2384 = llvm.mul %2379, %2383 : !llvm.i64 + %2385 = llvm.add %2382, %2384 : !llvm.i64 + %2386 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2387 = llvm.mul %2380, %2386 : !llvm.i64 + %2388 = llvm.add %2385, %2387 : !llvm.i64 + %2389 = llvm.getelementptr %2381[%2388] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2390 = llvm.load %2389 : !llvm.ptr + %2391 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2392 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2393 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2394 = llvm.mul %2379, %2393 : !llvm.i64 + %2395 = llvm.add %2392, %2394 : !llvm.i64 + %2396 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2397 = llvm.mul %2380, %2396 : !llvm.i64 + %2398 = llvm.add %2395, %2397 : !llvm.i64 + %2399 = llvm.getelementptr %2391[%2398] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2400 = llvm.load %2399 : !llvm.ptr + %2401 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2402 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2403 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2404 = llvm.mul %2379, %2403 : !llvm.i64 + %2405 = llvm.add %2402, %2404 : !llvm.i64 + %2406 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2407 = llvm.mul %2380, %2406 : !llvm.i64 + %2408 = llvm.add %2405, %2407 : !llvm.i64 + %2409 = llvm.getelementptr %2401[%2408] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2410 = llvm.load %2409 : !llvm.ptr + %2411 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2412 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2413 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2414 = llvm.mul %2379, %2413 : !llvm.i64 + %2415 = llvm.add %2412, %2414 : !llvm.i64 + %2416 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2417 = llvm.mul %2380, %2416 : !llvm.i64 + %2418 = llvm.add %2415, %2417 : !llvm.i64 + %2419 = llvm.getelementptr %2411[%2418] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2420 = llvm.load %2419 : !llvm.ptr + %2421 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2422 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2423 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2424 = llvm.mul %2379, %2423 : !llvm.i64 + %2425 = llvm.add %2422, %2424 : !llvm.i64 + %2426 = llvm.mlir.constant(1 : index) : 
!llvm.i64 + %2427 = llvm.mul %2380, %2426 : !llvm.i64 + %2428 = llvm.add %2425, %2427 : !llvm.i64 + %2429 = llvm.getelementptr %2421[%2428] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2430 = llvm.load %2429 : !llvm.ptr + %2431 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2432 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2433 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2434 = llvm.mul %2379, %2433 : !llvm.i64 + %2435 = llvm.add %2432, %2434 : !llvm.i64 + %2436 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2437 = llvm.mul %2380, %2436 : !llvm.i64 + %2438 = llvm.add %2435, %2437 : !llvm.i64 + %2439 = llvm.getelementptr %2431[%2438] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2440 = llvm.load %2439 : !llvm.ptr + %2441 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2442 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2443 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2444 = llvm.mul %2379, %2443 : !llvm.i64 + %2445 = llvm.add %2442, %2444 : !llvm.i64 + %2446 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2447 = llvm.mul %2380, %2446 : !llvm.i64 + %2448 = llvm.add %2445, %2447 : !llvm.i64 + %2449 = llvm.getelementptr %2441[%2448] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2450 = llvm.load %2449 : !llvm.ptr + %2451 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2452 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2453 = llvm.mlir.constant(128 : index) : !llvm.i64 + %2454 = llvm.mul %2379, %2453 : !llvm.i64 + %2455 = llvm.add %2452, %2454 : !llvm.i64 + %2456 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2457 = llvm.mul %2380, %2456 : !llvm.i64 + %2458 = llvm.add %2455, %2457 : !llvm.i64 + %2459 = llvm.getelementptr %2451[%2458] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %2460 = llvm.load %2459 : !llvm.ptr + %2461 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %2462 = llvm.sub %64, %2368 : !llvm.i64 + %2463 = llvm.select %2461, %2462, %2368 : !llvm.i1, !llvm.i64 + %2464 = llvm.sdiv %2463, %68 : !llvm.i64 + %2465 = llvm.sub %64, %2464 : !llvm.i64 + %2466 = llvm.select %2461, %2465, %2464 : !llvm.i1, !llvm.i64 + %2467 = llvm.srem %2466, %68 : !llvm.i64 + %2468 = llvm.icmp "slt" %2467, %67 : !llvm.i64 + %2469 = llvm.add %2467, %68 : !llvm.i64 + %2470 = llvm.select %2468, %2469, %2467 : !llvm.i1, !llvm.i64 + %2471 = llvm.srem %2380, %39 : !llvm.i64 + %2472 = llvm.icmp "slt" %2471, %67 : !llvm.i64 + %2473 = llvm.add %2471, %39 : !llvm.i64 + %2474 = llvm.select %2472, %2473, %2471 : !llvm.i1, !llvm.i64 + %2475 = llvm.srem %2368, %68 : !llvm.i64 + %2476 = llvm.icmp "slt" %2475, %67 : !llvm.i64 + %2477 = llvm.add %2475, %68 : !llvm.i64 + %2478 = llvm.select %2476, %2477, %2475 : !llvm.i1, !llvm.i64 + %2479 = llvm.icmp "slt" %2478, %67 : !llvm.i64 + %2480 = llvm.sub %64, %2478 : !llvm.i64 + %2481 = llvm.select %2479, %2480, %2478 : !llvm.i1, !llvm.i64 + %2482 = llvm.sdiv %2481, %70 : !llvm.i64 + %2483 = llvm.sub %64, %2482 : !llvm.i64 + %2484 = llvm.select %2479, %2483, %2482 : !llvm.i1, !llvm.i64 + %2485 = llvm.srem %2484, %63 : !llvm.i64 + %2486 = llvm.icmp "slt" %2485, %67 : !llvm.i64 + %2487 = llvm.add %2485, %63 : !llvm.i64 + %2488 = llvm.select %2486, %2487, %2485 : !llvm.i1, !llvm.i64 + %2489 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2490 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2491 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2492 = llvm.mul %2470, %2491 : !llvm.i64 + %2493 = llvm.add %2490, %2492 : 
!llvm.i64 + %2494 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2495 = llvm.mul %2474, %2494 : !llvm.i64 + %2496 = llvm.add %2493, %2495 : !llvm.i64 + %2497 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2498 = llvm.mul %2488, %2497 : !llvm.i64 + %2499 = llvm.add %2496, %2498 : !llvm.i64 + %2500 = llvm.getelementptr %2489[%2499] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2501 = llvm.load %2500 : !llvm.ptr> + %2502 = llvm.extractelement %2501[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2503 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2504 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2505 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2506 = llvm.mul %2470, %2505 : !llvm.i64 + %2507 = llvm.add %2504, %2506 : !llvm.i64 + %2508 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2509 = llvm.mul %2474, %2508 : !llvm.i64 + %2510 = llvm.add %2507, %2509 : !llvm.i64 + %2511 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2512 = llvm.mul %2488, %2511 : !llvm.i64 + %2513 = llvm.add %2510, %2512 : !llvm.i64 + %2514 = llvm.getelementptr %2503[%2513] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2515 = llvm.load %2514 : !llvm.ptr> + %2516 = llvm.extractelement %2515[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2517 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2518 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2519 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2520 = llvm.mul %2470, %2519 : !llvm.i64 + %2521 = llvm.add %2518, %2520 : !llvm.i64 + %2522 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2523 = llvm.mul %2474, %2522 : !llvm.i64 + %2524 = llvm.add %2521, %2523 : !llvm.i64 + %2525 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2526 = llvm.mul %2488, %2525 : !llvm.i64 + %2527 = llvm.add %2524, %2526 : !llvm.i64 + %2528 = llvm.getelementptr %2517[%2527] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2529 = llvm.load %2528 : !llvm.ptr> + %2530 = llvm.extractelement %2529[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2531 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2532 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2533 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2534 = llvm.mul %2470, %2533 : !llvm.i64 + %2535 = llvm.add %2532, %2534 : !llvm.i64 + %2536 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2537 = llvm.mul %2474, %2536 : !llvm.i64 + %2538 = llvm.add %2535, %2537 : !llvm.i64 + %2539 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2540 = llvm.mul %2488, %2539 : !llvm.i64 + %2541 = llvm.add %2538, %2540 : !llvm.i64 + %2542 = llvm.getelementptr %2531[%2541] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2543 = llvm.load %2542 : !llvm.ptr> + %2544 = llvm.extractelement %2543[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2545 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2546 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2547 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2548 = llvm.mul %2470, %2547 : !llvm.i64 + %2549 = llvm.add %2546, %2548 : !llvm.i64 + %2550 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2551 = llvm.mul %2474, %2550 : !llvm.i64 + %2552 = llvm.add %2549, %2551 : !llvm.i64 + %2553 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2554 = llvm.mul %2488, %2553 : !llvm.i64 + %2555 = llvm.add %2552, %2554 : !llvm.i64 + %2556 = llvm.getelementptr %2545[%2555] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2557 = llvm.load %2556 : !llvm.ptr> + %2558 = llvm.extractelement %2557[%28 : 
!llvm.i64] : !llvm.vec<8 x float> + %2559 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2560 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2561 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2562 = llvm.mul %2470, %2561 : !llvm.i64 + %2563 = llvm.add %2560, %2562 : !llvm.i64 + %2564 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2565 = llvm.mul %2474, %2564 : !llvm.i64 + %2566 = llvm.add %2563, %2565 : !llvm.i64 + %2567 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2568 = llvm.mul %2488, %2567 : !llvm.i64 + %2569 = llvm.add %2566, %2568 : !llvm.i64 + %2570 = llvm.getelementptr %2559[%2569] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2571 = llvm.load %2570 : !llvm.ptr> + %2572 = llvm.extractelement %2571[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2573 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2574 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2575 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2576 = llvm.mul %2470, %2575 : !llvm.i64 + %2577 = llvm.add %2574, %2576 : !llvm.i64 + %2578 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2579 = llvm.mul %2474, %2578 : !llvm.i64 + %2580 = llvm.add %2577, %2579 : !llvm.i64 + %2581 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2582 = llvm.mul %2488, %2581 : !llvm.i64 + %2583 = llvm.add %2580, %2582 : !llvm.i64 + %2584 = llvm.getelementptr %2573[%2583] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2585 = llvm.load %2584 : !llvm.ptr> + %2586 = llvm.extractelement %2585[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2587 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2588 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2589 = llvm.mlir.constant(256 : index) : !llvm.i64 + %2590 = llvm.mul %2470, %2589 : !llvm.i64 + %2591 = llvm.add %2588, %2590 : !llvm.i64 + %2592 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2593 = llvm.mul %2474, %2592 : !llvm.i64 + %2594 = llvm.add %2591, %2593 : !llvm.i64 + %2595 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2596 = llvm.mul %2488, %2595 : !llvm.i64 + %2597 = llvm.add %2594, %2596 : !llvm.i64 + %2598 = llvm.getelementptr %2587[%2597] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2599 = llvm.load %2598 : !llvm.ptr> + %2600 = llvm.extractelement %2599[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2601 = llvm.fmul %2390, %2502 {RelaxedPrecision} : !llvm.float + %2602 = llvm.fmul %2400, %2516 {RelaxedPrecision} : !llvm.float + %2603 = llvm.fmul %2410, %2530 {RelaxedPrecision} : !llvm.float + %2604 = llvm.fmul %2420, %2544 {RelaxedPrecision} : !llvm.float + %2605 = llvm.fmul %2430, %2558 {RelaxedPrecision} : !llvm.float + %2606 = llvm.fmul %2440, %2572 {RelaxedPrecision} : !llvm.float + %2607 = llvm.fmul %2450, %2586 {RelaxedPrecision} : !llvm.float + %2608 = llvm.fmul %2460, %2600 {RelaxedPrecision} : !llvm.float + %2609 = llvm.add %2372, %2376 : !llvm.i64 + %2610 = llvm.srem %2609, %56 : !llvm.i64 + %2611 = llvm.icmp "slt" %2610, %67 : !llvm.i64 + %2612 = llvm.add %2610, %56 : !llvm.i64 + %2613 = llvm.select %2611, %2612, %2610 : !llvm.i1, !llvm.i64 + %2614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2617 = llvm.mul %2470, %2616 : !llvm.i64 + %2618 = llvm.add %2615, %2617 : !llvm.i64 + %2619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2620 = llvm.mul %2613, %2619 : !llvm.i64 + %2621 = llvm.add %2618, %2620 : 
!llvm.i64 + %2622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2623 = llvm.mul %2488, %2622 : !llvm.i64 + %2624 = llvm.add %2621, %2623 : !llvm.i64 + %2625 = llvm.getelementptr %2614[%2624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2626 = llvm.load %2625 : !llvm.ptr> + %2627 = llvm.extractelement %2626[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2628 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2629 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2630 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2631 = llvm.mul %2470, %2630 : !llvm.i64 + %2632 = llvm.add %2629, %2631 : !llvm.i64 + %2633 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2634 = llvm.mul %2613, %2633 : !llvm.i64 + %2635 = llvm.add %2632, %2634 : !llvm.i64 + %2636 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2637 = llvm.mul %2488, %2636 : !llvm.i64 + %2638 = llvm.add %2635, %2637 : !llvm.i64 + %2639 = llvm.getelementptr %2628[%2638] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2640 = llvm.load %2639 : !llvm.ptr> + %2641 = llvm.extractelement %2640[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2642 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2643 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2644 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2645 = llvm.mul %2470, %2644 : !llvm.i64 + %2646 = llvm.add %2643, %2645 : !llvm.i64 + %2647 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2648 = llvm.mul %2613, %2647 : !llvm.i64 + %2649 = llvm.add %2646, %2648 : !llvm.i64 + %2650 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2651 = llvm.mul %2488, %2650 : !llvm.i64 + %2652 = llvm.add %2649, %2651 : !llvm.i64 + %2653 = llvm.getelementptr %2642[%2652] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2654 = llvm.load %2653 : !llvm.ptr> + %2655 = llvm.extractelement %2654[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2656 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2659 = llvm.mul %2470, %2658 : !llvm.i64 + %2660 = llvm.add %2657, %2659 : !llvm.i64 + %2661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2662 = llvm.mul %2613, %2661 : !llvm.i64 + %2663 = llvm.add %2660, %2662 : !llvm.i64 + %2664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2665 = llvm.mul %2488, %2664 : !llvm.i64 + %2666 = llvm.add %2663, %2665 : !llvm.i64 + %2667 = llvm.getelementptr %2656[%2666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2668 = llvm.load %2667 : !llvm.ptr> + %2669 = llvm.extractelement %2668[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2673 = llvm.mul %2470, %2672 : !llvm.i64 + %2674 = llvm.add %2671, %2673 : !llvm.i64 + %2675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2676 = llvm.mul %2613, %2675 : !llvm.i64 + %2677 = llvm.add %2674, %2676 : !llvm.i64 + %2678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2679 = llvm.mul %2488, %2678 : !llvm.i64 + %2680 = llvm.add %2677, %2679 : !llvm.i64 + %2681 = llvm.getelementptr %2670[%2680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2682 = llvm.load %2681 : !llvm.ptr> + %2683 = llvm.extractelement %2682[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2684 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2685 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %2686 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2687 = llvm.mul %2470, %2686 : !llvm.i64 + %2688 = llvm.add %2685, %2687 : !llvm.i64 + %2689 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2690 = llvm.mul %2613, %2689 : !llvm.i64 + %2691 = llvm.add %2688, %2690 : !llvm.i64 + %2692 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2693 = llvm.mul %2488, %2692 : !llvm.i64 + %2694 = llvm.add %2691, %2693 : !llvm.i64 + %2695 = llvm.getelementptr %2684[%2694] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2696 = llvm.load %2695 : !llvm.ptr> + %2697 = llvm.extractelement %2696[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2698 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2699 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2700 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2701 = llvm.mul %2470, %2700 : !llvm.i64 + %2702 = llvm.add %2699, %2701 : !llvm.i64 + %2703 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2704 = llvm.mul %2613, %2703 : !llvm.i64 + %2705 = llvm.add %2702, %2704 : !llvm.i64 + %2706 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2707 = llvm.mul %2488, %2706 : !llvm.i64 + %2708 = llvm.add %2705, %2707 : !llvm.i64 + %2709 = llvm.getelementptr %2698[%2708] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2710 = llvm.load %2709 : !llvm.ptr> + %2711 = llvm.extractelement %2710[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2712 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2714 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2715 = llvm.mul %2470, %2714 : !llvm.i64 + %2716 = llvm.add %2713, %2715 : !llvm.i64 + %2717 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2718 = llvm.mul %2613, %2717 : !llvm.i64 + %2719 = llvm.add %2716, %2718 : !llvm.i64 + %2720 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2721 = llvm.mul %2488, %2720 : !llvm.i64 + %2722 = llvm.add %2719, %2721 : !llvm.i64 + %2723 = llvm.getelementptr %2712[%2722] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2724 = llvm.load %2723 : !llvm.ptr> + %2725 = llvm.extractelement %2724[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2726 = llvm.fadd %2627, %2601 {RelaxedPrecision} : !llvm.float + %2727 = llvm.fadd %2641, %2602 {RelaxedPrecision} : !llvm.float + %2728 = llvm.fadd %2655, %2603 {RelaxedPrecision} : !llvm.float + %2729 = llvm.fadd %2669, %2604 {RelaxedPrecision} : !llvm.float + %2730 = llvm.fadd %2683, %2605 {RelaxedPrecision} : !llvm.float + %2731 = llvm.fadd %2697, %2606 {RelaxedPrecision} : !llvm.float + %2732 = llvm.fadd %2711, %2607 {RelaxedPrecision} : !llvm.float + %2733 = llvm.fadd %2725, %2608 {RelaxedPrecision} : !llvm.float + %2734 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2735 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2736 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2737 = llvm.mul %2470, %2736 : !llvm.i64 + %2738 = llvm.add %2735, %2737 : !llvm.i64 + %2739 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2740 = llvm.mul %2613, %2739 : !llvm.i64 + %2741 = llvm.add %2738, %2740 : !llvm.i64 + %2742 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2743 = llvm.mul %2488, %2742 : !llvm.i64 + %2744 = llvm.add %2741, %2743 : !llvm.i64 + %2745 = llvm.getelementptr %2734[%2744] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2746 = llvm.load %2745 : !llvm.ptr> + %2747 = llvm.insertelement %2726, %2746[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2748 = llvm.extractvalue %130[1] : 
!llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2749 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2750 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2751 = llvm.mul %2470, %2750 : !llvm.i64 + %2752 = llvm.add %2749, %2751 : !llvm.i64 + %2753 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2754 = llvm.mul %2613, %2753 : !llvm.i64 + %2755 = llvm.add %2752, %2754 : !llvm.i64 + %2756 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2757 = llvm.mul %2488, %2756 : !llvm.i64 + %2758 = llvm.add %2755, %2757 : !llvm.i64 + %2759 = llvm.getelementptr %2748[%2758] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2747, %2759 : !llvm.ptr> + %2760 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2761 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2762 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2763 = llvm.mul %2470, %2762 : !llvm.i64 + %2764 = llvm.add %2761, %2763 : !llvm.i64 + %2765 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2766 = llvm.mul %2613, %2765 : !llvm.i64 + %2767 = llvm.add %2764, %2766 : !llvm.i64 + %2768 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2769 = llvm.mul %2488, %2768 : !llvm.i64 + %2770 = llvm.add %2767, %2769 : !llvm.i64 + %2771 = llvm.getelementptr %2760[%2770] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2772 = llvm.load %2771 : !llvm.ptr> + %2773 = llvm.insertelement %2727, %2772[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2774 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2775 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2776 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2777 = llvm.mul %2470, %2776 : !llvm.i64 + %2778 = llvm.add %2775, %2777 : !llvm.i64 + %2779 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2780 = llvm.mul %2613, %2779 : !llvm.i64 + %2781 = llvm.add %2778, %2780 : !llvm.i64 + %2782 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2783 = llvm.mul %2488, %2782 : !llvm.i64 + %2784 = llvm.add %2781, %2783 : !llvm.i64 + %2785 = llvm.getelementptr %2774[%2784] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2773, %2785 : !llvm.ptr> + %2786 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2787 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2788 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2789 = llvm.mul %2470, %2788 : !llvm.i64 + %2790 = llvm.add %2787, %2789 : !llvm.i64 + %2791 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2792 = llvm.mul %2613, %2791 : !llvm.i64 + %2793 = llvm.add %2790, %2792 : !llvm.i64 + %2794 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2795 = llvm.mul %2488, %2794 : !llvm.i64 + %2796 = llvm.add %2793, %2795 : !llvm.i64 + %2797 = llvm.getelementptr %2786[%2796] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2798 = llvm.load %2797 : !llvm.ptr> + %2799 = llvm.insertelement %2728, %2798[%26 : !llvm.i64] : !llvm.vec<8 x float> + %2800 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2801 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2802 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2803 = llvm.mul %2470, %2802 : !llvm.i64 + %2804 = llvm.add %2801, %2803 : !llvm.i64 + %2805 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2806 = llvm.mul %2613, %2805 : !llvm.i64 + %2807 = llvm.add %2804, %2806 : !llvm.i64 + %2808 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2809 = llvm.mul %2488, %2808 : !llvm.i64 + %2810 = llvm.add %2807, %2809 : !llvm.i64 + %2811 = llvm.getelementptr %2800[%2810] : (!llvm.ptr>, !llvm.i64) 
-> !llvm.ptr> + llvm.store %2799, %2811 : !llvm.ptr> + %2812 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2813 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2814 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2815 = llvm.mul %2470, %2814 : !llvm.i64 + %2816 = llvm.add %2813, %2815 : !llvm.i64 + %2817 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2818 = llvm.mul %2613, %2817 : !llvm.i64 + %2819 = llvm.add %2816, %2818 : !llvm.i64 + %2820 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2821 = llvm.mul %2488, %2820 : !llvm.i64 + %2822 = llvm.add %2819, %2821 : !llvm.i64 + %2823 = llvm.getelementptr %2812[%2822] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2824 = llvm.load %2823 : !llvm.ptr> + %2825 = llvm.insertelement %2729, %2824[%27 : !llvm.i64] : !llvm.vec<8 x float> + %2826 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2827 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2828 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2829 = llvm.mul %2470, %2828 : !llvm.i64 + %2830 = llvm.add %2827, %2829 : !llvm.i64 + %2831 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2832 = llvm.mul %2613, %2831 : !llvm.i64 + %2833 = llvm.add %2830, %2832 : !llvm.i64 + %2834 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2835 = llvm.mul %2488, %2834 : !llvm.i64 + %2836 = llvm.add %2833, %2835 : !llvm.i64 + %2837 = llvm.getelementptr %2826[%2836] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2825, %2837 : !llvm.ptr> + %2838 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2839 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2840 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2841 = llvm.mul %2470, %2840 : !llvm.i64 + %2842 = llvm.add %2839, %2841 : !llvm.i64 + %2843 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2844 = llvm.mul %2613, %2843 : !llvm.i64 + %2845 = llvm.add %2842, %2844 : !llvm.i64 + %2846 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2847 = llvm.mul %2488, %2846 : !llvm.i64 + %2848 = llvm.add %2845, %2847 : !llvm.i64 + %2849 = llvm.getelementptr %2838[%2848] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2850 = llvm.load %2849 : !llvm.ptr> + %2851 = llvm.insertelement %2730, %2850[%28 : !llvm.i64] : !llvm.vec<8 x float> + %2852 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2853 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2854 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2855 = llvm.mul %2470, %2854 : !llvm.i64 + %2856 = llvm.add %2853, %2855 : !llvm.i64 + %2857 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2858 = llvm.mul %2613, %2857 : !llvm.i64 + %2859 = llvm.add %2856, %2858 : !llvm.i64 + %2860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2861 = llvm.mul %2488, %2860 : !llvm.i64 + %2862 = llvm.add %2859, %2861 : !llvm.i64 + %2863 = llvm.getelementptr %2852[%2862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2851, %2863 : !llvm.ptr> + %2864 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2865 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2866 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2867 = llvm.mul %2470, %2866 : !llvm.i64 + %2868 = llvm.add %2865, %2867 : !llvm.i64 + %2869 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2870 = llvm.mul %2613, %2869 : !llvm.i64 + %2871 = llvm.add %2868, %2870 : !llvm.i64 + %2872 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2873 = llvm.mul %2488, %2872 : !llvm.i64 + %2874 = llvm.add 
%2871, %2873 : !llvm.i64 + %2875 = llvm.getelementptr %2864[%2874] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2876 = llvm.load %2875 : !llvm.ptr> + %2877 = llvm.insertelement %2731, %2876[%29 : !llvm.i64] : !llvm.vec<8 x float> + %2878 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2879 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2880 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2881 = llvm.mul %2470, %2880 : !llvm.i64 + %2882 = llvm.add %2879, %2881 : !llvm.i64 + %2883 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2884 = llvm.mul %2613, %2883 : !llvm.i64 + %2885 = llvm.add %2882, %2884 : !llvm.i64 + %2886 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2887 = llvm.mul %2488, %2886 : !llvm.i64 + %2888 = llvm.add %2885, %2887 : !llvm.i64 + %2889 = llvm.getelementptr %2878[%2888] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2877, %2889 : !llvm.ptr> + %2890 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2891 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2892 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2893 = llvm.mul %2470, %2892 : !llvm.i64 + %2894 = llvm.add %2891, %2893 : !llvm.i64 + %2895 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2896 = llvm.mul %2613, %2895 : !llvm.i64 + %2897 = llvm.add %2894, %2896 : !llvm.i64 + %2898 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2899 = llvm.mul %2488, %2898 : !llvm.i64 + %2900 = llvm.add %2897, %2899 : !llvm.i64 + %2901 = llvm.getelementptr %2890[%2900] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2902 = llvm.load %2901 : !llvm.ptr> + %2903 = llvm.insertelement %2732, %2902[%30 : !llvm.i64] : !llvm.vec<8 x float> + %2904 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2905 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2906 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2907 = llvm.mul %2470, %2906 : !llvm.i64 + %2908 = llvm.add %2905, %2907 : !llvm.i64 + %2909 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2910 = llvm.mul %2613, %2909 : !llvm.i64 + %2911 = llvm.add %2908, %2910 : !llvm.i64 + %2912 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2913 = llvm.mul %2488, %2912 : !llvm.i64 + %2914 = llvm.add %2911, %2913 : !llvm.i64 + %2915 = llvm.getelementptr %2904[%2914] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2903, %2915 : !llvm.ptr> + %2916 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2917 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2918 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2919 = llvm.mul %2470, %2918 : !llvm.i64 + %2920 = llvm.add %2917, %2919 : !llvm.i64 + %2921 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2922 = llvm.mul %2613, %2921 : !llvm.i64 + %2923 = llvm.add %2920, %2922 : !llvm.i64 + %2924 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2925 = llvm.mul %2488, %2924 : !llvm.i64 + %2926 = llvm.add %2923, %2925 : !llvm.i64 + %2927 = llvm.getelementptr %2916[%2926] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2928 = llvm.load %2927 : !llvm.ptr> + %2929 = llvm.insertelement %2733, %2928[%31 : !llvm.i64] : !llvm.vec<8 x float> + %2930 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2931 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2932 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2933 = llvm.mul %2470, %2932 : !llvm.i64 + %2934 = llvm.add %2931, %2933 : !llvm.i64 + %2935 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2936 = llvm.mul %2613, %2935 
: !llvm.i64 + %2937 = llvm.add %2934, %2936 : !llvm.i64 + %2938 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2939 = llvm.mul %2488, %2938 : !llvm.i64 + %2940 = llvm.add %2937, %2939 : !llvm.i64 + %2941 = llvm.getelementptr %2930[%2940] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2929, %2941 : !llvm.ptr> + %2942 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2943 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2944 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2945 = llvm.mul %2470, %2944 : !llvm.i64 + %2946 = llvm.add %2943, %2945 : !llvm.i64 + %2947 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2948 = llvm.mul %2613, %2947 : !llvm.i64 + %2949 = llvm.add %2946, %2948 : !llvm.i64 + %2950 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2951 = llvm.mul %2488, %2950 : !llvm.i64 + %2952 = llvm.add %2949, %2951 : !llvm.i64 + %2953 = llvm.getelementptr %2942[%2952] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2954 = llvm.load %2953 : !llvm.ptr> + %2955 = llvm.insertelement %2726, %2954[%24 : !llvm.i64] : !llvm.vec<8 x float> + %2956 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2957 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2958 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2959 = llvm.mul %2470, %2958 : !llvm.i64 + %2960 = llvm.add %2957, %2959 : !llvm.i64 + %2961 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2962 = llvm.mul %2613, %2961 : !llvm.i64 + %2963 = llvm.add %2960, %2962 : !llvm.i64 + %2964 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2965 = llvm.mul %2488, %2964 : !llvm.i64 + %2966 = llvm.add %2963, %2965 : !llvm.i64 + %2967 = llvm.getelementptr %2956[%2966] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2955, %2967 : !llvm.ptr> + %2968 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2970 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2971 = llvm.mul %2470, %2970 : !llvm.i64 + %2972 = llvm.add %2969, %2971 : !llvm.i64 + %2973 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2974 = llvm.mul %2613, %2973 : !llvm.i64 + %2975 = llvm.add %2972, %2974 : !llvm.i64 + %2976 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2977 = llvm.mul %2488, %2976 : !llvm.i64 + %2978 = llvm.add %2975, %2977 : !llvm.i64 + %2979 = llvm.getelementptr %2968[%2978] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %2980 = llvm.load %2979 : !llvm.ptr> + %2981 = llvm.insertelement %2727, %2980[%25 : !llvm.i64] : !llvm.vec<8 x float> + %2982 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2983 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2984 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2985 = llvm.mul %2470, %2984 : !llvm.i64 + %2986 = llvm.add %2983, %2985 : !llvm.i64 + %2987 = llvm.mlir.constant(2 : index) : !llvm.i64 + %2988 = llvm.mul %2613, %2987 : !llvm.i64 + %2989 = llvm.add %2986, %2988 : !llvm.i64 + %2990 = llvm.mlir.constant(1 : index) : !llvm.i64 + %2991 = llvm.mul %2488, %2990 : !llvm.i64 + %2992 = llvm.add %2989, %2991 : !llvm.i64 + %2993 = llvm.getelementptr %2982[%2992] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %2981, %2993 : !llvm.ptr> + %2994 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %2995 = llvm.mlir.constant(0 : index) : !llvm.i64 + %2996 = llvm.mlir.constant(12 : index) : !llvm.i64 + %2997 = llvm.mul %2470, %2996 : !llvm.i64 + %2998 = llvm.add %2995, %2997 : 
!llvm.i64 + %2999 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3000 = llvm.mul %2613, %2999 : !llvm.i64 + %3001 = llvm.add %2998, %3000 : !llvm.i64 + %3002 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3003 = llvm.mul %2488, %3002 : !llvm.i64 + %3004 = llvm.add %3001, %3003 : !llvm.i64 + %3005 = llvm.getelementptr %2994[%3004] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3006 = llvm.load %3005 : !llvm.ptr> + %3007 = llvm.insertelement %2728, %3006[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3008 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3009 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3010 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3011 = llvm.mul %2470, %3010 : !llvm.i64 + %3012 = llvm.add %3009, %3011 : !llvm.i64 + %3013 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3014 = llvm.mul %2613, %3013 : !llvm.i64 + %3015 = llvm.add %3012, %3014 : !llvm.i64 + %3016 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3017 = llvm.mul %2488, %3016 : !llvm.i64 + %3018 = llvm.add %3015, %3017 : !llvm.i64 + %3019 = llvm.getelementptr %3008[%3018] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3007, %3019 : !llvm.ptr> + %3020 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3021 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3022 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3023 = llvm.mul %2470, %3022 : !llvm.i64 + %3024 = llvm.add %3021, %3023 : !llvm.i64 + %3025 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3026 = llvm.mul %2613, %3025 : !llvm.i64 + %3027 = llvm.add %3024, %3026 : !llvm.i64 + %3028 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3029 = llvm.mul %2488, %3028 : !llvm.i64 + %3030 = llvm.add %3027, %3029 : !llvm.i64 + %3031 = llvm.getelementptr %3020[%3030] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3032 = llvm.load %3031 : !llvm.ptr> + %3033 = llvm.insertelement %2729, %3032[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3037 = llvm.mul %2470, %3036 : !llvm.i64 + %3038 = llvm.add %3035, %3037 : !llvm.i64 + %3039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3040 = llvm.mul %2613, %3039 : !llvm.i64 + %3041 = llvm.add %3038, %3040 : !llvm.i64 + %3042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3043 = llvm.mul %2488, %3042 : !llvm.i64 + %3044 = llvm.add %3041, %3043 : !llvm.i64 + %3045 = llvm.getelementptr %3034[%3044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3033, %3045 : !llvm.ptr> + %3046 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3047 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3048 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3049 = llvm.mul %2470, %3048 : !llvm.i64 + %3050 = llvm.add %3047, %3049 : !llvm.i64 + %3051 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3052 = llvm.mul %2613, %3051 : !llvm.i64 + %3053 = llvm.add %3050, %3052 : !llvm.i64 + %3054 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3055 = llvm.mul %2488, %3054 : !llvm.i64 + %3056 = llvm.add %3053, %3055 : !llvm.i64 + %3057 = llvm.getelementptr %3046[%3056] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3058 = llvm.load %3057 : !llvm.ptr> + %3059 = llvm.insertelement %2730, %3058[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3060 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3061 = 
llvm.mlir.constant(0 : index) : !llvm.i64 + %3062 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3063 = llvm.mul %2470, %3062 : !llvm.i64 + %3064 = llvm.add %3061, %3063 : !llvm.i64 + %3065 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3066 = llvm.mul %2613, %3065 : !llvm.i64 + %3067 = llvm.add %3064, %3066 : !llvm.i64 + %3068 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3069 = llvm.mul %2488, %3068 : !llvm.i64 + %3070 = llvm.add %3067, %3069 : !llvm.i64 + %3071 = llvm.getelementptr %3060[%3070] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3059, %3071 : !llvm.ptr> + %3072 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3073 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3074 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3075 = llvm.mul %2470, %3074 : !llvm.i64 + %3076 = llvm.add %3073, %3075 : !llvm.i64 + %3077 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3078 = llvm.mul %2613, %3077 : !llvm.i64 + %3079 = llvm.add %3076, %3078 : !llvm.i64 + %3080 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3081 = llvm.mul %2488, %3080 : !llvm.i64 + %3082 = llvm.add %3079, %3081 : !llvm.i64 + %3083 = llvm.getelementptr %3072[%3082] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3084 = llvm.load %3083 : !llvm.ptr> + %3085 = llvm.insertelement %2731, %3084[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3086 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3087 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3088 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3089 = llvm.mul %2470, %3088 : !llvm.i64 + %3090 = llvm.add %3087, %3089 : !llvm.i64 + %3091 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3092 = llvm.mul %2613, %3091 : !llvm.i64 + %3093 = llvm.add %3090, %3092 : !llvm.i64 + %3094 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3095 = llvm.mul %2488, %3094 : !llvm.i64 + %3096 = llvm.add %3093, %3095 : !llvm.i64 + %3097 = llvm.getelementptr %3086[%3096] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3085, %3097 : !llvm.ptr> + %3098 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3099 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3100 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3101 = llvm.mul %2470, %3100 : !llvm.i64 + %3102 = llvm.add %3099, %3101 : !llvm.i64 + %3103 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3104 = llvm.mul %2613, %3103 : !llvm.i64 + %3105 = llvm.add %3102, %3104 : !llvm.i64 + %3106 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3107 = llvm.mul %2488, %3106 : !llvm.i64 + %3108 = llvm.add %3105, %3107 : !llvm.i64 + %3109 = llvm.getelementptr %3098[%3108] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3110 = llvm.load %3109 : !llvm.ptr> + %3111 = llvm.insertelement %2732, %3110[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3112 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3113 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3114 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3115 = llvm.mul %2470, %3114 : !llvm.i64 + %3116 = llvm.add %3113, %3115 : !llvm.i64 + %3117 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3118 = llvm.mul %2613, %3117 : !llvm.i64 + %3119 = llvm.add %3116, %3118 : !llvm.i64 + %3120 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3121 = llvm.mul %2488, %3120 : !llvm.i64 + %3122 = llvm.add %3119, %3121 : !llvm.i64 + %3123 = llvm.getelementptr %3112[%3122] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3111, %3123 : !llvm.ptr> + %3124 = 
llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3126 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3127 = llvm.mul %2470, %3126 : !llvm.i64 + %3128 = llvm.add %3125, %3127 : !llvm.i64 + %3129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3130 = llvm.mul %2613, %3129 : !llvm.i64 + %3131 = llvm.add %3128, %3130 : !llvm.i64 + %3132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3133 = llvm.mul %2488, %3132 : !llvm.i64 + %3134 = llvm.add %3131, %3133 : !llvm.i64 + %3135 = llvm.getelementptr %3124[%3134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3136 = llvm.load %3135 : !llvm.ptr> + %3137 = llvm.insertelement %2733, %3136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3138 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3139 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3140 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3141 = llvm.mul %2470, %3140 : !llvm.i64 + %3142 = llvm.add %3139, %3141 : !llvm.i64 + %3143 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3144 = llvm.mul %2613, %3143 : !llvm.i64 + %3145 = llvm.add %3142, %3144 : !llvm.i64 + %3146 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3147 = llvm.mul %2488, %3146 : !llvm.i64 + %3148 = llvm.add %3145, %3147 : !llvm.i64 + %3149 = llvm.getelementptr %3138[%3148] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3137, %3149 : !llvm.ptr> + %3150 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3151 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3152 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3153 = llvm.mul %2379, %3152 : !llvm.i64 + %3154 = llvm.add %3151, %3153 : !llvm.i64 + %3155 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3156 = llvm.mul %2380, %3155 : !llvm.i64 + %3157 = llvm.add %3154, %3156 : !llvm.i64 + %3158 = llvm.getelementptr %3150[%3157] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3159 = llvm.load %3158 : !llvm.ptr + %3160 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3162 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3163 = llvm.mul %2379, %3162 : !llvm.i64 + %3164 = llvm.add %3161, %3163 : !llvm.i64 + %3165 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3166 = llvm.mul %2380, %3165 : !llvm.i64 + %3167 = llvm.add %3164, %3166 : !llvm.i64 + %3168 = llvm.getelementptr %3160[%3167] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3169 = llvm.load %3168 : !llvm.ptr + %3170 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3171 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3172 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3173 = llvm.mul %2379, %3172 : !llvm.i64 + %3174 = llvm.add %3171, %3173 : !llvm.i64 + %3175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3176 = llvm.mul %2380, %3175 : !llvm.i64 + %3177 = llvm.add %3174, %3176 : !llvm.i64 + %3178 = llvm.getelementptr %3170[%3177] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3179 = llvm.load %3178 : !llvm.ptr + %3180 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3181 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3182 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3183 = llvm.mul %2379, %3182 : !llvm.i64 + %3184 = llvm.add %3181, %3183 : !llvm.i64 + %3185 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3186 = llvm.mul %2380, %3185 : !llvm.i64 + %3187 = llvm.add %3184, %3186 : !llvm.i64 + %3188 
= llvm.getelementptr %3180[%3187] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3189 = llvm.load %3188 : !llvm.ptr + %3190 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3191 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3192 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3193 = llvm.mul %2379, %3192 : !llvm.i64 + %3194 = llvm.add %3191, %3193 : !llvm.i64 + %3195 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3196 = llvm.mul %2380, %3195 : !llvm.i64 + %3197 = llvm.add %3194, %3196 : !llvm.i64 + %3198 = llvm.getelementptr %3190[%3197] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3199 = llvm.load %3198 : !llvm.ptr + %3200 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3201 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3202 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3203 = llvm.mul %2379, %3202 : !llvm.i64 + %3204 = llvm.add %3201, %3203 : !llvm.i64 + %3205 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3206 = llvm.mul %2380, %3205 : !llvm.i64 + %3207 = llvm.add %3204, %3206 : !llvm.i64 + %3208 = llvm.getelementptr %3200[%3207] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3209 = llvm.load %3208 : !llvm.ptr + %3210 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3211 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3212 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3213 = llvm.mul %2379, %3212 : !llvm.i64 + %3214 = llvm.add %3211, %3213 : !llvm.i64 + %3215 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3216 = llvm.mul %2380, %3215 : !llvm.i64 + %3217 = llvm.add %3214, %3216 : !llvm.i64 + %3218 = llvm.getelementptr %3210[%3217] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3219 = llvm.load %3218 : !llvm.ptr + %3220 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3221 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3222 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3223 = llvm.mul %2379, %3222 : !llvm.i64 + %3224 = llvm.add %3221, %3223 : !llvm.i64 + %3225 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3226 = llvm.mul %2380, %3225 : !llvm.i64 + %3227 = llvm.add %3224, %3226 : !llvm.i64 + %3228 = llvm.getelementptr %3220[%3227] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3229 = llvm.load %3228 : !llvm.ptr + %3230 = llvm.add %2368, %70 : !llvm.i64 + %3231 = llvm.icmp "slt" %3230, %67 : !llvm.i64 + %3232 = llvm.sub %64, %3230 : !llvm.i64 + %3233 = llvm.select %3231, %3232, %3230 : !llvm.i1, !llvm.i64 + %3234 = llvm.sdiv %3233, %68 : !llvm.i64 + %3235 = llvm.sub %64, %3234 : !llvm.i64 + %3236 = llvm.select %3231, %3235, %3234 : !llvm.i1, !llvm.i64 + %3237 = llvm.srem %3236, %68 : !llvm.i64 + %3238 = llvm.icmp "slt" %3237, %67 : !llvm.i64 + %3239 = llvm.add %3237, %68 : !llvm.i64 + %3240 = llvm.select %3238, %3239, %3237 : !llvm.i1, !llvm.i64 + %3241 = llvm.sdiv %2463, %70 : !llvm.i64 + %3242 = llvm.sub %64, %3241 : !llvm.i64 + %3243 = llvm.select %2461, %3242, %3241 : !llvm.i1, !llvm.i64 + %3244 = llvm.mul %3236, %65 : !llvm.i64 + %3245 = llvm.add %3243, %3244 : !llvm.i64 + %3246 = llvm.add %3245, %69 : !llvm.i64 + %3247 = llvm.icmp "slt" %3246, %67 : !llvm.i64 + %3248 = llvm.sub %64, %3246 : !llvm.i64 + %3249 = llvm.select %3247, %3248, %3246 : !llvm.i1, !llvm.i64 + %3250 = llvm.sdiv %3249, %63 : !llvm.i64 + %3251 = llvm.sub %64, %3250 : !llvm.i64 + %3252 = llvm.select %3247, %3251, %3250 : !llvm.i1, !llvm.i64 + %3253 = llvm.mul %3252, %65 : !llvm.i64 + %3254 = llvm.add %3245, %3253 : !llvm.i64 + %3255 = llvm.add %3254, %69 : 
!llvm.i64 + %3256 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3257 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3258 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3259 = llvm.mul %3240, %3258 : !llvm.i64 + %3260 = llvm.add %3257, %3259 : !llvm.i64 + %3261 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3262 = llvm.mul %2474, %3261 : !llvm.i64 + %3263 = llvm.add %3260, %3262 : !llvm.i64 + %3264 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3265 = llvm.mul %3255, %3264 : !llvm.i64 + %3266 = llvm.add %3263, %3265 : !llvm.i64 + %3267 = llvm.getelementptr %3256[%3266] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3268 = llvm.load %3267 : !llvm.ptr> + %3269 = llvm.extractelement %3268[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3270 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3271 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3272 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3273 = llvm.mul %3240, %3272 : !llvm.i64 + %3274 = llvm.add %3271, %3273 : !llvm.i64 + %3275 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3276 = llvm.mul %2474, %3275 : !llvm.i64 + %3277 = llvm.add %3274, %3276 : !llvm.i64 + %3278 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3279 = llvm.mul %3255, %3278 : !llvm.i64 + %3280 = llvm.add %3277, %3279 : !llvm.i64 + %3281 = llvm.getelementptr %3270[%3280] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3282 = llvm.load %3281 : !llvm.ptr> + %3283 = llvm.extractelement %3282[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3284 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3285 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3286 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3287 = llvm.mul %3240, %3286 : !llvm.i64 + %3288 = llvm.add %3285, %3287 : !llvm.i64 + %3289 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3290 = llvm.mul %2474, %3289 : !llvm.i64 + %3291 = llvm.add %3288, %3290 : !llvm.i64 + %3292 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3293 = llvm.mul %3255, %3292 : !llvm.i64 + %3294 = llvm.add %3291, %3293 : !llvm.i64 + %3295 = llvm.getelementptr %3284[%3294] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3296 = llvm.load %3295 : !llvm.ptr> + %3297 = llvm.extractelement %3296[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3298 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3299 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3300 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3301 = llvm.mul %3240, %3300 : !llvm.i64 + %3302 = llvm.add %3299, %3301 : !llvm.i64 + %3303 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3304 = llvm.mul %2474, %3303 : !llvm.i64 + %3305 = llvm.add %3302, %3304 : !llvm.i64 + %3306 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3307 = llvm.mul %3255, %3306 : !llvm.i64 + %3308 = llvm.add %3305, %3307 : !llvm.i64 + %3309 = llvm.getelementptr %3298[%3308] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3310 = llvm.load %3309 : !llvm.ptr> + %3311 = llvm.extractelement %3310[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3312 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3313 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3314 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3315 = llvm.mul %3240, %3314 : !llvm.i64 + %3316 = llvm.add %3313, %3315 : !llvm.i64 + %3317 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3318 = llvm.mul %2474, %3317 : !llvm.i64 + %3319 = llvm.add %3316, %3318 : !llvm.i64 + %3320 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %3321 = llvm.mul %3255, %3320 : !llvm.i64 + %3322 = llvm.add %3319, %3321 : !llvm.i64 + %3323 = llvm.getelementptr %3312[%3322] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3324 = llvm.load %3323 : !llvm.ptr> + %3325 = llvm.extractelement %3324[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3326 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3327 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3328 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3329 = llvm.mul %3240, %3328 : !llvm.i64 + %3330 = llvm.add %3327, %3329 : !llvm.i64 + %3331 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3332 = llvm.mul %2474, %3331 : !llvm.i64 + %3333 = llvm.add %3330, %3332 : !llvm.i64 + %3334 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3335 = llvm.mul %3255, %3334 : !llvm.i64 + %3336 = llvm.add %3333, %3335 : !llvm.i64 + %3337 = llvm.getelementptr %3326[%3336] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3338 = llvm.load %3337 : !llvm.ptr> + %3339 = llvm.extractelement %3338[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3340 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3341 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3342 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3343 = llvm.mul %3240, %3342 : !llvm.i64 + %3344 = llvm.add %3341, %3343 : !llvm.i64 + %3345 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3346 = llvm.mul %2474, %3345 : !llvm.i64 + %3347 = llvm.add %3344, %3346 : !llvm.i64 + %3348 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3349 = llvm.mul %3255, %3348 : !llvm.i64 + %3350 = llvm.add %3347, %3349 : !llvm.i64 + %3351 = llvm.getelementptr %3340[%3350] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3352 = llvm.load %3351 : !llvm.ptr> + %3353 = llvm.extractelement %3352[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3354 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3355 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3356 = llvm.mlir.constant(256 : index) : !llvm.i64 + %3357 = llvm.mul %3240, %3356 : !llvm.i64 + %3358 = llvm.add %3355, %3357 : !llvm.i64 + %3359 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3360 = llvm.mul %2474, %3359 : !llvm.i64 + %3361 = llvm.add %3358, %3360 : !llvm.i64 + %3362 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3363 = llvm.mul %3255, %3362 : !llvm.i64 + %3364 = llvm.add %3361, %3363 : !llvm.i64 + %3365 = llvm.getelementptr %3354[%3364] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3366 = llvm.load %3365 : !llvm.ptr> + %3367 = llvm.extractelement %3366[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3368 = llvm.fmul %3159, %3269 {RelaxedPrecision} : !llvm.float + %3369 = llvm.fmul %3169, %3283 {RelaxedPrecision} : !llvm.float + %3370 = llvm.fmul %3179, %3297 {RelaxedPrecision} : !llvm.float + %3371 = llvm.fmul %3189, %3311 {RelaxedPrecision} : !llvm.float + %3372 = llvm.fmul %3199, %3325 {RelaxedPrecision} : !llvm.float + %3373 = llvm.fmul %3209, %3339 {RelaxedPrecision} : !llvm.float + %3374 = llvm.fmul %3219, %3353 {RelaxedPrecision} : !llvm.float + %3375 = llvm.fmul %3229, %3367 {RelaxedPrecision} : !llvm.float + %3376 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3377 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3378 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3379 = llvm.mul %3240, %3378 : !llvm.i64 + %3380 = llvm.add %3377, %3379 : !llvm.i64 + %3381 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3382 = llvm.mul %2613, %3381 : 
!llvm.i64 + %3383 = llvm.add %3380, %3382 : !llvm.i64 + %3384 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3385 = llvm.mul %3255, %3384 : !llvm.i64 + %3386 = llvm.add %3383, %3385 : !llvm.i64 + %3387 = llvm.getelementptr %3376[%3386] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3388 = llvm.load %3387 : !llvm.ptr> + %3389 = llvm.extractelement %3388[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3390 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3391 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3392 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3393 = llvm.mul %3240, %3392 : !llvm.i64 + %3394 = llvm.add %3391, %3393 : !llvm.i64 + %3395 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3396 = llvm.mul %2613, %3395 : !llvm.i64 + %3397 = llvm.add %3394, %3396 : !llvm.i64 + %3398 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3399 = llvm.mul %3255, %3398 : !llvm.i64 + %3400 = llvm.add %3397, %3399 : !llvm.i64 + %3401 = llvm.getelementptr %3390[%3400] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3402 = llvm.load %3401 : !llvm.ptr> + %3403 = llvm.extractelement %3402[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3404 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3405 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3406 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3407 = llvm.mul %3240, %3406 : !llvm.i64 + %3408 = llvm.add %3405, %3407 : !llvm.i64 + %3409 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3410 = llvm.mul %2613, %3409 : !llvm.i64 + %3411 = llvm.add %3408, %3410 : !llvm.i64 + %3412 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3413 = llvm.mul %3255, %3412 : !llvm.i64 + %3414 = llvm.add %3411, %3413 : !llvm.i64 + %3415 = llvm.getelementptr %3404[%3414] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3416 = llvm.load %3415 : !llvm.ptr> + %3417 = llvm.extractelement %3416[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3418 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3419 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3420 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3421 = llvm.mul %3240, %3420 : !llvm.i64 + %3422 = llvm.add %3419, %3421 : !llvm.i64 + %3423 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3424 = llvm.mul %2613, %3423 : !llvm.i64 + %3425 = llvm.add %3422, %3424 : !llvm.i64 + %3426 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3427 = llvm.mul %3255, %3426 : !llvm.i64 + %3428 = llvm.add %3425, %3427 : !llvm.i64 + %3429 = llvm.getelementptr %3418[%3428] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3430 = llvm.load %3429 : !llvm.ptr> + %3431 = llvm.extractelement %3430[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3432 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3433 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3434 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3435 = llvm.mul %3240, %3434 : !llvm.i64 + %3436 = llvm.add %3433, %3435 : !llvm.i64 + %3437 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3438 = llvm.mul %2613, %3437 : !llvm.i64 + %3439 = llvm.add %3436, %3438 : !llvm.i64 + %3440 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3441 = llvm.mul %3255, %3440 : !llvm.i64 + %3442 = llvm.add %3439, %3441 : !llvm.i64 + %3443 = llvm.getelementptr %3432[%3442] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3444 = llvm.load %3443 : !llvm.ptr> + %3445 = llvm.extractelement %3444[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3446 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, 
array<3 x i64>, array<3 x i64>)> + %3447 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3448 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3449 = llvm.mul %3240, %3448 : !llvm.i64 + %3450 = llvm.add %3447, %3449 : !llvm.i64 + %3451 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3452 = llvm.mul %2613, %3451 : !llvm.i64 + %3453 = llvm.add %3450, %3452 : !llvm.i64 + %3454 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3455 = llvm.mul %3255, %3454 : !llvm.i64 + %3456 = llvm.add %3453, %3455 : !llvm.i64 + %3457 = llvm.getelementptr %3446[%3456] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3458 = llvm.load %3457 : !llvm.ptr> + %3459 = llvm.extractelement %3458[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3460 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3461 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3462 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3463 = llvm.mul %3240, %3462 : !llvm.i64 + %3464 = llvm.add %3461, %3463 : !llvm.i64 + %3465 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3466 = llvm.mul %2613, %3465 : !llvm.i64 + %3467 = llvm.add %3464, %3466 : !llvm.i64 + %3468 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3469 = llvm.mul %3255, %3468 : !llvm.i64 + %3470 = llvm.add %3467, %3469 : !llvm.i64 + %3471 = llvm.getelementptr %3460[%3470] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3472 = llvm.load %3471 : !llvm.ptr> + %3473 = llvm.extractelement %3472[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3477 = llvm.mul %3240, %3476 : !llvm.i64 + %3478 = llvm.add %3475, %3477 : !llvm.i64 + %3479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3480 = llvm.mul %2613, %3479 : !llvm.i64 + %3481 = llvm.add %3478, %3480 : !llvm.i64 + %3482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3483 = llvm.mul %3255, %3482 : !llvm.i64 + %3484 = llvm.add %3481, %3483 : !llvm.i64 + %3485 = llvm.getelementptr %3474[%3484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3486 = llvm.load %3485 : !llvm.ptr> + %3487 = llvm.extractelement %3486[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3488 = llvm.fadd %3389, %3368 {RelaxedPrecision} : !llvm.float + %3489 = llvm.fadd %3403, %3369 {RelaxedPrecision} : !llvm.float + %3490 = llvm.fadd %3417, %3370 {RelaxedPrecision} : !llvm.float + %3491 = llvm.fadd %3431, %3371 {RelaxedPrecision} : !llvm.float + %3492 = llvm.fadd %3445, %3372 {RelaxedPrecision} : !llvm.float + %3493 = llvm.fadd %3459, %3373 {RelaxedPrecision} : !llvm.float + %3494 = llvm.fadd %3473, %3374 {RelaxedPrecision} : !llvm.float + %3495 = llvm.fadd %3487, %3375 {RelaxedPrecision} : !llvm.float + %3496 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3497 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3498 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3499 = llvm.mul %3240, %3498 : !llvm.i64 + %3500 = llvm.add %3497, %3499 : !llvm.i64 + %3501 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3502 = llvm.mul %2613, %3501 : !llvm.i64 + %3503 = llvm.add %3500, %3502 : !llvm.i64 + %3504 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3505 = llvm.mul %3255, %3504 : !llvm.i64 + %3506 = llvm.add %3503, %3505 : !llvm.i64 + %3507 = llvm.getelementptr %3496[%3506] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3508 = llvm.load %3507 : !llvm.ptr> + %3509 = llvm.insertelement %3488, %3508[%24 : !llvm.i64] : !llvm.vec<8 x float> + 
%3510 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3511 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3512 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3513 = llvm.mul %3240, %3512 : !llvm.i64 + %3514 = llvm.add %3511, %3513 : !llvm.i64 + %3515 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3516 = llvm.mul %2613, %3515 : !llvm.i64 + %3517 = llvm.add %3514, %3516 : !llvm.i64 + %3518 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3519 = llvm.mul %3255, %3518 : !llvm.i64 + %3520 = llvm.add %3517, %3519 : !llvm.i64 + %3521 = llvm.getelementptr %3510[%3520] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3509, %3521 : !llvm.ptr> + %3522 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3523 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3524 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3525 = llvm.mul %3240, %3524 : !llvm.i64 + %3526 = llvm.add %3523, %3525 : !llvm.i64 + %3527 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3528 = llvm.mul %2613, %3527 : !llvm.i64 + %3529 = llvm.add %3526, %3528 : !llvm.i64 + %3530 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3531 = llvm.mul %3255, %3530 : !llvm.i64 + %3532 = llvm.add %3529, %3531 : !llvm.i64 + %3533 = llvm.getelementptr %3522[%3532] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3534 = llvm.load %3533 : !llvm.ptr> + %3535 = llvm.insertelement %3489, %3534[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3536 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3537 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3538 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3539 = llvm.mul %3240, %3538 : !llvm.i64 + %3540 = llvm.add %3537, %3539 : !llvm.i64 + %3541 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3542 = llvm.mul %2613, %3541 : !llvm.i64 + %3543 = llvm.add %3540, %3542 : !llvm.i64 + %3544 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3545 = llvm.mul %3255, %3544 : !llvm.i64 + %3546 = llvm.add %3543, %3545 : !llvm.i64 + %3547 = llvm.getelementptr %3536[%3546] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3535, %3547 : !llvm.ptr> + %3548 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3549 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3550 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3551 = llvm.mul %3240, %3550 : !llvm.i64 + %3552 = llvm.add %3549, %3551 : !llvm.i64 + %3553 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3554 = llvm.mul %2613, %3553 : !llvm.i64 + %3555 = llvm.add %3552, %3554 : !llvm.i64 + %3556 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3557 = llvm.mul %3255, %3556 : !llvm.i64 + %3558 = llvm.add %3555, %3557 : !llvm.i64 + %3559 = llvm.getelementptr %3548[%3558] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3560 = llvm.load %3559 : !llvm.ptr> + %3561 = llvm.insertelement %3490, %3560[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3562 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3563 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3564 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3565 = llvm.mul %3240, %3564 : !llvm.i64 + %3566 = llvm.add %3563, %3565 : !llvm.i64 + %3567 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3568 = llvm.mul %2613, %3567 : !llvm.i64 + %3569 = llvm.add %3566, %3568 : !llvm.i64 + %3570 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3571 = llvm.mul %3255, %3570 : !llvm.i64 + %3572 = llvm.add %3569, %3571 : !llvm.i64 + %3573 = llvm.getelementptr 
%3562[%3572] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3561, %3573 : !llvm.ptr> + %3574 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3575 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3576 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3577 = llvm.mul %3240, %3576 : !llvm.i64 + %3578 = llvm.add %3575, %3577 : !llvm.i64 + %3579 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3580 = llvm.mul %2613, %3579 : !llvm.i64 + %3581 = llvm.add %3578, %3580 : !llvm.i64 + %3582 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3583 = llvm.mul %3255, %3582 : !llvm.i64 + %3584 = llvm.add %3581, %3583 : !llvm.i64 + %3585 = llvm.getelementptr %3574[%3584] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3586 = llvm.load %3585 : !llvm.ptr> + %3587 = llvm.insertelement %3491, %3586[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3588 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3589 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3590 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3591 = llvm.mul %3240, %3590 : !llvm.i64 + %3592 = llvm.add %3589, %3591 : !llvm.i64 + %3593 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3594 = llvm.mul %2613, %3593 : !llvm.i64 + %3595 = llvm.add %3592, %3594 : !llvm.i64 + %3596 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3597 = llvm.mul %3255, %3596 : !llvm.i64 + %3598 = llvm.add %3595, %3597 : !llvm.i64 + %3599 = llvm.getelementptr %3588[%3598] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3587, %3599 : !llvm.ptr> + %3600 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3601 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3602 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3603 = llvm.mul %3240, %3602 : !llvm.i64 + %3604 = llvm.add %3601, %3603 : !llvm.i64 + %3605 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3606 = llvm.mul %2613, %3605 : !llvm.i64 + %3607 = llvm.add %3604, %3606 : !llvm.i64 + %3608 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3609 = llvm.mul %3255, %3608 : !llvm.i64 + %3610 = llvm.add %3607, %3609 : !llvm.i64 + %3611 = llvm.getelementptr %3600[%3610] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3612 = llvm.load %3611 : !llvm.ptr> + %3613 = llvm.insertelement %3492, %3612[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3614 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3615 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3616 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3617 = llvm.mul %3240, %3616 : !llvm.i64 + %3618 = llvm.add %3615, %3617 : !llvm.i64 + %3619 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3620 = llvm.mul %2613, %3619 : !llvm.i64 + %3621 = llvm.add %3618, %3620 : !llvm.i64 + %3622 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3623 = llvm.mul %3255, %3622 : !llvm.i64 + %3624 = llvm.add %3621, %3623 : !llvm.i64 + %3625 = llvm.getelementptr %3614[%3624] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3613, %3625 : !llvm.ptr> + %3626 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3627 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3628 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3629 = llvm.mul %3240, %3628 : !llvm.i64 + %3630 = llvm.add %3627, %3629 : !llvm.i64 + %3631 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3632 = llvm.mul %2613, %3631 : !llvm.i64 + %3633 = llvm.add %3630, %3632 : !llvm.i64 + %3634 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3635 = llvm.mul %3255, 
%3634 : !llvm.i64 + %3636 = llvm.add %3633, %3635 : !llvm.i64 + %3637 = llvm.getelementptr %3626[%3636] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3638 = llvm.load %3637 : !llvm.ptr> + %3639 = llvm.insertelement %3493, %3638[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3640 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3641 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3642 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3643 = llvm.mul %3240, %3642 : !llvm.i64 + %3644 = llvm.add %3641, %3643 : !llvm.i64 + %3645 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3646 = llvm.mul %2613, %3645 : !llvm.i64 + %3647 = llvm.add %3644, %3646 : !llvm.i64 + %3648 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3649 = llvm.mul %3255, %3648 : !llvm.i64 + %3650 = llvm.add %3647, %3649 : !llvm.i64 + %3651 = llvm.getelementptr %3640[%3650] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3639, %3651 : !llvm.ptr> + %3652 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3653 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3654 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3655 = llvm.mul %3240, %3654 : !llvm.i64 + %3656 = llvm.add %3653, %3655 : !llvm.i64 + %3657 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3658 = llvm.mul %2613, %3657 : !llvm.i64 + %3659 = llvm.add %3656, %3658 : !llvm.i64 + %3660 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3661 = llvm.mul %3255, %3660 : !llvm.i64 + %3662 = llvm.add %3659, %3661 : !llvm.i64 + %3663 = llvm.getelementptr %3652[%3662] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3664 = llvm.load %3663 : !llvm.ptr> + %3665 = llvm.insertelement %3494, %3664[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3666 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3667 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3668 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3669 = llvm.mul %3240, %3668 : !llvm.i64 + %3670 = llvm.add %3667, %3669 : !llvm.i64 + %3671 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3672 = llvm.mul %2613, %3671 : !llvm.i64 + %3673 = llvm.add %3670, %3672 : !llvm.i64 + %3674 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3675 = llvm.mul %3255, %3674 : !llvm.i64 + %3676 = llvm.add %3673, %3675 : !llvm.i64 + %3677 = llvm.getelementptr %3666[%3676] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3665, %3677 : !llvm.ptr> + %3678 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3679 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3680 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3681 = llvm.mul %3240, %3680 : !llvm.i64 + %3682 = llvm.add %3679, %3681 : !llvm.i64 + %3683 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3684 = llvm.mul %2613, %3683 : !llvm.i64 + %3685 = llvm.add %3682, %3684 : !llvm.i64 + %3686 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3687 = llvm.mul %3255, %3686 : !llvm.i64 + %3688 = llvm.add %3685, %3687 : !llvm.i64 + %3689 = llvm.getelementptr %3678[%3688] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3690 = llvm.load %3689 : !llvm.ptr> + %3691 = llvm.insertelement %3495, %3690[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3692 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3694 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3695 = llvm.mul %3240, %3694 : !llvm.i64 + %3696 = llvm.add %3693, %3695 : !llvm.i64 + %3697 = llvm.mlir.constant(2 : index) : 
!llvm.i64 + %3698 = llvm.mul %2613, %3697 : !llvm.i64 + %3699 = llvm.add %3696, %3698 : !llvm.i64 + %3700 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3701 = llvm.mul %3255, %3700 : !llvm.i64 + %3702 = llvm.add %3699, %3701 : !llvm.i64 + %3703 = llvm.getelementptr %3692[%3702] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3691, %3703 : !llvm.ptr> + %3704 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3705 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3706 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3707 = llvm.mul %3240, %3706 : !llvm.i64 + %3708 = llvm.add %3705, %3707 : !llvm.i64 + %3709 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3710 = llvm.mul %2613, %3709 : !llvm.i64 + %3711 = llvm.add %3708, %3710 : !llvm.i64 + %3712 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3713 = llvm.mul %3255, %3712 : !llvm.i64 + %3714 = llvm.add %3711, %3713 : !llvm.i64 + %3715 = llvm.getelementptr %3704[%3714] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3716 = llvm.load %3715 : !llvm.ptr> + %3717 = llvm.insertelement %3488, %3716[%24 : !llvm.i64] : !llvm.vec<8 x float> + %3718 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3719 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3720 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3721 = llvm.mul %3240, %3720 : !llvm.i64 + %3722 = llvm.add %3719, %3721 : !llvm.i64 + %3723 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3724 = llvm.mul %2613, %3723 : !llvm.i64 + %3725 = llvm.add %3722, %3724 : !llvm.i64 + %3726 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3727 = llvm.mul %3255, %3726 : !llvm.i64 + %3728 = llvm.add %3725, %3727 : !llvm.i64 + %3729 = llvm.getelementptr %3718[%3728] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3717, %3729 : !llvm.ptr> + %3730 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3731 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3732 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3733 = llvm.mul %3240, %3732 : !llvm.i64 + %3734 = llvm.add %3731, %3733 : !llvm.i64 + %3735 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3736 = llvm.mul %2613, %3735 : !llvm.i64 + %3737 = llvm.add %3734, %3736 : !llvm.i64 + %3738 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3739 = llvm.mul %3255, %3738 : !llvm.i64 + %3740 = llvm.add %3737, %3739 : !llvm.i64 + %3741 = llvm.getelementptr %3730[%3740] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3742 = llvm.load %3741 : !llvm.ptr> + %3743 = llvm.insertelement %3489, %3742[%25 : !llvm.i64] : !llvm.vec<8 x float> + %3744 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3745 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3746 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3747 = llvm.mul %3240, %3746 : !llvm.i64 + %3748 = llvm.add %3745, %3747 : !llvm.i64 + %3749 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3750 = llvm.mul %2613, %3749 : !llvm.i64 + %3751 = llvm.add %3748, %3750 : !llvm.i64 + %3752 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3753 = llvm.mul %3255, %3752 : !llvm.i64 + %3754 = llvm.add %3751, %3753 : !llvm.i64 + %3755 = llvm.getelementptr %3744[%3754] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3743, %3755 : !llvm.ptr> + %3756 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3757 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3758 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3759 = llvm.mul %3240, %3758 : 
!llvm.i64 + %3760 = llvm.add %3757, %3759 : !llvm.i64 + %3761 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3762 = llvm.mul %2613, %3761 : !llvm.i64 + %3763 = llvm.add %3760, %3762 : !llvm.i64 + %3764 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3765 = llvm.mul %3255, %3764 : !llvm.i64 + %3766 = llvm.add %3763, %3765 : !llvm.i64 + %3767 = llvm.getelementptr %3756[%3766] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3768 = llvm.load %3767 : !llvm.ptr> + %3769 = llvm.insertelement %3490, %3768[%26 : !llvm.i64] : !llvm.vec<8 x float> + %3770 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3771 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3772 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3773 = llvm.mul %3240, %3772 : !llvm.i64 + %3774 = llvm.add %3771, %3773 : !llvm.i64 + %3775 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3776 = llvm.mul %2613, %3775 : !llvm.i64 + %3777 = llvm.add %3774, %3776 : !llvm.i64 + %3778 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3779 = llvm.mul %3255, %3778 : !llvm.i64 + %3780 = llvm.add %3777, %3779 : !llvm.i64 + %3781 = llvm.getelementptr %3770[%3780] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3769, %3781 : !llvm.ptr> + %3782 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3784 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3785 = llvm.mul %3240, %3784 : !llvm.i64 + %3786 = llvm.add %3783, %3785 : !llvm.i64 + %3787 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3788 = llvm.mul %2613, %3787 : !llvm.i64 + %3789 = llvm.add %3786, %3788 : !llvm.i64 + %3790 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3791 = llvm.mul %3255, %3790 : !llvm.i64 + %3792 = llvm.add %3789, %3791 : !llvm.i64 + %3793 = llvm.getelementptr %3782[%3792] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3794 = llvm.load %3793 : !llvm.ptr> + %3795 = llvm.insertelement %3491, %3794[%27 : !llvm.i64] : !llvm.vec<8 x float> + %3796 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3797 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3798 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3799 = llvm.mul %3240, %3798 : !llvm.i64 + %3800 = llvm.add %3797, %3799 : !llvm.i64 + %3801 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3802 = llvm.mul %2613, %3801 : !llvm.i64 + %3803 = llvm.add %3800, %3802 : !llvm.i64 + %3804 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3805 = llvm.mul %3255, %3804 : !llvm.i64 + %3806 = llvm.add %3803, %3805 : !llvm.i64 + %3807 = llvm.getelementptr %3796[%3806] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3795, %3807 : !llvm.ptr> + %3808 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3809 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3810 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3811 = llvm.mul %3240, %3810 : !llvm.i64 + %3812 = llvm.add %3809, %3811 : !llvm.i64 + %3813 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3814 = llvm.mul %2613, %3813 : !llvm.i64 + %3815 = llvm.add %3812, %3814 : !llvm.i64 + %3816 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3817 = llvm.mul %3255, %3816 : !llvm.i64 + %3818 = llvm.add %3815, %3817 : !llvm.i64 + %3819 = llvm.getelementptr %3808[%3818] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3820 = llvm.load %3819 : !llvm.ptr> + %3821 = llvm.insertelement %3492, %3820[%28 : !llvm.i64] : !llvm.vec<8 x float> + %3822 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, 
array<3 x i64>, array<3 x i64>)> + %3823 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3824 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3825 = llvm.mul %3240, %3824 : !llvm.i64 + %3826 = llvm.add %3823, %3825 : !llvm.i64 + %3827 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3828 = llvm.mul %2613, %3827 : !llvm.i64 + %3829 = llvm.add %3826, %3828 : !llvm.i64 + %3830 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3831 = llvm.mul %3255, %3830 : !llvm.i64 + %3832 = llvm.add %3829, %3831 : !llvm.i64 + %3833 = llvm.getelementptr %3822[%3832] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3821, %3833 : !llvm.ptr> + %3834 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3835 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3836 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3837 = llvm.mul %3240, %3836 : !llvm.i64 + %3838 = llvm.add %3835, %3837 : !llvm.i64 + %3839 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3840 = llvm.mul %2613, %3839 : !llvm.i64 + %3841 = llvm.add %3838, %3840 : !llvm.i64 + %3842 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3843 = llvm.mul %3255, %3842 : !llvm.i64 + %3844 = llvm.add %3841, %3843 : !llvm.i64 + %3845 = llvm.getelementptr %3834[%3844] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3846 = llvm.load %3845 : !llvm.ptr> + %3847 = llvm.insertelement %3493, %3846[%29 : !llvm.i64] : !llvm.vec<8 x float> + %3848 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3849 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3850 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3851 = llvm.mul %3240, %3850 : !llvm.i64 + %3852 = llvm.add %3849, %3851 : !llvm.i64 + %3853 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3854 = llvm.mul %2613, %3853 : !llvm.i64 + %3855 = llvm.add %3852, %3854 : !llvm.i64 + %3856 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3857 = llvm.mul %3255, %3856 : !llvm.i64 + %3858 = llvm.add %3855, %3857 : !llvm.i64 + %3859 = llvm.getelementptr %3848[%3858] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3847, %3859 : !llvm.ptr> + %3860 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3861 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3862 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3863 = llvm.mul %3240, %3862 : !llvm.i64 + %3864 = llvm.add %3861, %3863 : !llvm.i64 + %3865 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3866 = llvm.mul %2613, %3865 : !llvm.i64 + %3867 = llvm.add %3864, %3866 : !llvm.i64 + %3868 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3869 = llvm.mul %3255, %3868 : !llvm.i64 + %3870 = llvm.add %3867, %3869 : !llvm.i64 + %3871 = llvm.getelementptr %3860[%3870] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3872 = llvm.load %3871 : !llvm.ptr> + %3873 = llvm.insertelement %3494, %3872[%30 : !llvm.i64] : !llvm.vec<8 x float> + %3874 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3875 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3876 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3877 = llvm.mul %3240, %3876 : !llvm.i64 + %3878 = llvm.add %3875, %3877 : !llvm.i64 + %3879 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3880 = llvm.mul %2613, %3879 : !llvm.i64 + %3881 = llvm.add %3878, %3880 : !llvm.i64 + %3882 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3883 = llvm.mul %3255, %3882 : !llvm.i64 + %3884 = llvm.add %3881, %3883 : !llvm.i64 + %3885 = llvm.getelementptr %3874[%3884] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store 
%3873, %3885 : !llvm.ptr> + %3886 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3888 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3889 = llvm.mul %3240, %3888 : !llvm.i64 + %3890 = llvm.add %3887, %3889 : !llvm.i64 + %3891 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3892 = llvm.mul %2613, %3891 : !llvm.i64 + %3893 = llvm.add %3890, %3892 : !llvm.i64 + %3894 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3895 = llvm.mul %3255, %3894 : !llvm.i64 + %3896 = llvm.add %3893, %3895 : !llvm.i64 + %3897 = llvm.getelementptr %3886[%3896] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %3898 = llvm.load %3897 : !llvm.ptr> + %3899 = llvm.insertelement %3495, %3898[%31 : !llvm.i64] : !llvm.vec<8 x float> + %3900 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %3901 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3902 = llvm.mlir.constant(12 : index) : !llvm.i64 + %3903 = llvm.mul %3240, %3902 : !llvm.i64 + %3904 = llvm.add %3901, %3903 : !llvm.i64 + %3905 = llvm.mlir.constant(2 : index) : !llvm.i64 + %3906 = llvm.mul %2613, %3905 : !llvm.i64 + %3907 = llvm.add %3904, %3906 : !llvm.i64 + %3908 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3909 = llvm.mul %3255, %3908 : !llvm.i64 + %3910 = llvm.add %3907, %3909 : !llvm.i64 + %3911 = llvm.getelementptr %3900[%3910] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %3899, %3911 : !llvm.ptr> + %3912 = llvm.add %2376, %69 : !llvm.i64 + llvm.br ^bb31(%3912 : !llvm.i64) + ^bb33: // pred: ^bb31 + %3913 = llvm.add %2374, %69 : !llvm.i64 + llvm.br ^bb29(%3913 : !llvm.i64) + ^bb34: // pred: ^bb29 + %3914 = llvm.add %2372, %56 : !llvm.i64 + llvm.br ^bb27(%3914 : !llvm.i64) + ^bb35: // pred: ^bb27 + llvm.br ^bb36(%67 : !llvm.i64) + ^bb36(%3915: !llvm.i64): // 2 preds: ^bb35, ^bb37 + %3916 = llvm.icmp "slt" %3915, %48 : !llvm.i64 + llvm.cond_br %3916, ^bb37, ^bb38 + ^bb37: // pred: ^bb36 + %3917 = llvm.add %2370, %3915 : !llvm.i64 + %3918 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3919 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3920 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3921 = llvm.mul %2345, %3920 : !llvm.i64 + %3922 = llvm.add %3919, %3921 : !llvm.i64 + %3923 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3924 = llvm.mul %3917, %3923 : !llvm.i64 + %3925 = llvm.add %3922, %3924 : !llvm.i64 + %3926 = llvm.getelementptr %3918[%3925] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3927 = llvm.load %3926 : !llvm.ptr + %3928 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3929 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3930 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3931 = llvm.mul %2345, %3930 : !llvm.i64 + %3932 = llvm.add %3929, %3931 : !llvm.i64 + %3933 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3934 = llvm.mul %3917, %3933 : !llvm.i64 + %3935 = llvm.add %3932, %3934 : !llvm.i64 + %3936 = llvm.getelementptr %3928[%3935] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3937 = llvm.load %3936 : !llvm.ptr + %3938 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3939 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3940 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3941 = llvm.mul %2345, %3940 : !llvm.i64 + %3942 = llvm.add %3939, %3941 : !llvm.i64 + %3943 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3944 = llvm.mul %3917, %3943 : !llvm.i64 + %3945 = llvm.add %3942, 
%3944 : !llvm.i64 + %3946 = llvm.getelementptr %3938[%3945] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3947 = llvm.load %3946 : !llvm.ptr + %3948 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3949 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3950 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3951 = llvm.mul %2345, %3950 : !llvm.i64 + %3952 = llvm.add %3949, %3951 : !llvm.i64 + %3953 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3954 = llvm.mul %3917, %3953 : !llvm.i64 + %3955 = llvm.add %3952, %3954 : !llvm.i64 + %3956 = llvm.getelementptr %3948[%3955] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3957 = llvm.load %3956 : !llvm.ptr + %3958 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3959 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3960 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3961 = llvm.mul %2345, %3960 : !llvm.i64 + %3962 = llvm.add %3959, %3961 : !llvm.i64 + %3963 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3964 = llvm.mul %3917, %3963 : !llvm.i64 + %3965 = llvm.add %3962, %3964 : !llvm.i64 + %3966 = llvm.getelementptr %3958[%3965] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3967 = llvm.load %3966 : !llvm.ptr + %3968 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3969 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3970 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3971 = llvm.mul %2345, %3970 : !llvm.i64 + %3972 = llvm.add %3969, %3971 : !llvm.i64 + %3973 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3974 = llvm.mul %3917, %3973 : !llvm.i64 + %3975 = llvm.add %3972, %3974 : !llvm.i64 + %3976 = llvm.getelementptr %3968[%3975] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3977 = llvm.load %3976 : !llvm.ptr + %3978 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3980 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3981 = llvm.mul %2345, %3980 : !llvm.i64 + %3982 = llvm.add %3979, %3981 : !llvm.i64 + %3983 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3984 = llvm.mul %3917, %3983 : !llvm.i64 + %3985 = llvm.add %3982, %3984 : !llvm.i64 + %3986 = llvm.getelementptr %3978[%3985] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3987 = llvm.load %3986 : !llvm.ptr + %3988 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %3990 = llvm.mlir.constant(128 : index) : !llvm.i64 + %3991 = llvm.mul %2345, %3990 : !llvm.i64 + %3992 = llvm.add %3989, %3991 : !llvm.i64 + %3993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %3994 = llvm.mul %3917, %3993 : !llvm.i64 + %3995 = llvm.add %3992, %3994 : !llvm.i64 + %3996 = llvm.getelementptr %3988[%3995] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %3997 = llvm.load %3996 : !llvm.ptr + %3998 = llvm.icmp "slt" %2368, %67 : !llvm.i64 + %3999 = llvm.sub %64, %2368 : !llvm.i64 + %4000 = llvm.select %3998, %3999, %2368 : !llvm.i1, !llvm.i64 + %4001 = llvm.sdiv %4000, %68 : !llvm.i64 + %4002 = llvm.sub %64, %4001 : !llvm.i64 + %4003 = llvm.select %3998, %4002, %4001 : !llvm.i1, !llvm.i64 + %4004 = llvm.srem %4003, %68 : !llvm.i64 + %4005 = llvm.icmp "slt" %4004, %67 : !llvm.i64 + %4006 = llvm.add %4004, %68 : !llvm.i64 + %4007 = llvm.select %4005, %4006, %4004 : !llvm.i1, !llvm.i64 + %4008 = llvm.srem %3917, %39 : !llvm.i64 + %4009 = llvm.icmp "slt" %4008, %67 : !llvm.i64 + %4010 = llvm.add %4008, %39 : !llvm.i64 + %4011 = llvm.select %4009, 
%4010, %4008 : !llvm.i1, !llvm.i64 + %4012 = llvm.srem %2368, %68 : !llvm.i64 + %4013 = llvm.icmp "slt" %4012, %67 : !llvm.i64 + %4014 = llvm.add %4012, %68 : !llvm.i64 + %4015 = llvm.select %4013, %4014, %4012 : !llvm.i1, !llvm.i64 + %4016 = llvm.icmp "slt" %4015, %67 : !llvm.i64 + %4017 = llvm.sub %64, %4015 : !llvm.i64 + %4018 = llvm.select %4016, %4017, %4015 : !llvm.i1, !llvm.i64 + %4019 = llvm.sdiv %4018, %70 : !llvm.i64 + %4020 = llvm.sub %64, %4019 : !llvm.i64 + %4021 = llvm.select %4016, %4020, %4019 : !llvm.i1, !llvm.i64 + %4022 = llvm.srem %4021, %63 : !llvm.i64 + %4023 = llvm.icmp "slt" %4022, %67 : !llvm.i64 + %4024 = llvm.add %4022, %63 : !llvm.i64 + %4025 = llvm.select %4023, %4024, %4022 : !llvm.i1, !llvm.i64 + %4026 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4027 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4028 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4029 = llvm.mul %4007, %4028 : !llvm.i64 + %4030 = llvm.add %4027, %4029 : !llvm.i64 + %4031 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4032 = llvm.mul %4011, %4031 : !llvm.i64 + %4033 = llvm.add %4030, %4032 : !llvm.i64 + %4034 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4035 = llvm.mul %4025, %4034 : !llvm.i64 + %4036 = llvm.add %4033, %4035 : !llvm.i64 + %4037 = llvm.getelementptr %4026[%4036] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4038 = llvm.load %4037 : !llvm.ptr> + %4039 = llvm.extractelement %4038[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4040 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4041 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4042 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4043 = llvm.mul %4007, %4042 : !llvm.i64 + %4044 = llvm.add %4041, %4043 : !llvm.i64 + %4045 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4046 = llvm.mul %4011, %4045 : !llvm.i64 + %4047 = llvm.add %4044, %4046 : !llvm.i64 + %4048 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4049 = llvm.mul %4025, %4048 : !llvm.i64 + %4050 = llvm.add %4047, %4049 : !llvm.i64 + %4051 = llvm.getelementptr %4040[%4050] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4052 = llvm.load %4051 : !llvm.ptr> + %4053 = llvm.extractelement %4052[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4054 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4056 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4057 = llvm.mul %4007, %4056 : !llvm.i64 + %4058 = llvm.add %4055, %4057 : !llvm.i64 + %4059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4060 = llvm.mul %4011, %4059 : !llvm.i64 + %4061 = llvm.add %4058, %4060 : !llvm.i64 + %4062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4063 = llvm.mul %4025, %4062 : !llvm.i64 + %4064 = llvm.add %4061, %4063 : !llvm.i64 + %4065 = llvm.getelementptr %4054[%4064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4066 = llvm.load %4065 : !llvm.ptr> + %4067 = llvm.extractelement %4066[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4068 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4070 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4071 = llvm.mul %4007, %4070 : !llvm.i64 + %4072 = llvm.add %4069, %4071 : !llvm.i64 + %4073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4074 = llvm.mul %4011, %4073 : !llvm.i64 + %4075 = llvm.add %4072, %4074 : !llvm.i64 + %4076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4077 = 
llvm.mul %4025, %4076 : !llvm.i64 + %4078 = llvm.add %4075, %4077 : !llvm.i64 + %4079 = llvm.getelementptr %4068[%4078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4080 = llvm.load %4079 : !llvm.ptr> + %4081 = llvm.extractelement %4080[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4082 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4083 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4084 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4085 = llvm.mul %4007, %4084 : !llvm.i64 + %4086 = llvm.add %4083, %4085 : !llvm.i64 + %4087 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4088 = llvm.mul %4011, %4087 : !llvm.i64 + %4089 = llvm.add %4086, %4088 : !llvm.i64 + %4090 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4091 = llvm.mul %4025, %4090 : !llvm.i64 + %4092 = llvm.add %4089, %4091 : !llvm.i64 + %4093 = llvm.getelementptr %4082[%4092] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4094 = llvm.load %4093 : !llvm.ptr> + %4095 = llvm.extractelement %4094[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4096 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4097 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4098 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4099 = llvm.mul %4007, %4098 : !llvm.i64 + %4100 = llvm.add %4097, %4099 : !llvm.i64 + %4101 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4102 = llvm.mul %4011, %4101 : !llvm.i64 + %4103 = llvm.add %4100, %4102 : !llvm.i64 + %4104 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4105 = llvm.mul %4025, %4104 : !llvm.i64 + %4106 = llvm.add %4103, %4105 : !llvm.i64 + %4107 = llvm.getelementptr %4096[%4106] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4108 = llvm.load %4107 : !llvm.ptr> + %4109 = llvm.extractelement %4108[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4110 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4111 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4112 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4113 = llvm.mul %4007, %4112 : !llvm.i64 + %4114 = llvm.add %4111, %4113 : !llvm.i64 + %4115 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4116 = llvm.mul %4011, %4115 : !llvm.i64 + %4117 = llvm.add %4114, %4116 : !llvm.i64 + %4118 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4119 = llvm.mul %4025, %4118 : !llvm.i64 + %4120 = llvm.add %4117, %4119 : !llvm.i64 + %4121 = llvm.getelementptr %4110[%4120] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4122 = llvm.load %4121 : !llvm.ptr> + %4123 = llvm.extractelement %4122[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4124 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4125 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4126 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4127 = llvm.mul %4007, %4126 : !llvm.i64 + %4128 = llvm.add %4125, %4127 : !llvm.i64 + %4129 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4130 = llvm.mul %4011, %4129 : !llvm.i64 + %4131 = llvm.add %4128, %4130 : !llvm.i64 + %4132 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4133 = llvm.mul %4025, %4132 : !llvm.i64 + %4134 = llvm.add %4131, %4133 : !llvm.i64 + %4135 = llvm.getelementptr %4124[%4134] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4136 = llvm.load %4135 : !llvm.ptr> + %4137 = llvm.extractelement %4136[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4138 = llvm.fmul %3927, %4039 {RelaxedPrecision} : !llvm.float + %4139 = llvm.fmul %3937, %4053 {RelaxedPrecision} : !llvm.float + %4140 = llvm.fmul %3947, %4067 {RelaxedPrecision} : 
!llvm.float + %4141 = llvm.fmul %3957, %4081 {RelaxedPrecision} : !llvm.float + %4142 = llvm.fmul %3967, %4095 {RelaxedPrecision} : !llvm.float + %4143 = llvm.fmul %3977, %4109 {RelaxedPrecision} : !llvm.float + %4144 = llvm.fmul %3987, %4123 {RelaxedPrecision} : !llvm.float + %4145 = llvm.fmul %3997, %4137 {RelaxedPrecision} : !llvm.float + %4146 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4149 = llvm.mul %4007, %4148 : !llvm.i64 + %4150 = llvm.add %4147, %4149 : !llvm.i64 + %4151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4152 = llvm.mul %67, %4151 : !llvm.i64 + %4153 = llvm.add %4150, %4152 : !llvm.i64 + %4154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4155 = llvm.mul %4025, %4154 : !llvm.i64 + %4156 = llvm.add %4153, %4155 : !llvm.i64 + %4157 = llvm.getelementptr %4146[%4156] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4158 = llvm.load %4157 : !llvm.ptr> + %4159 = llvm.extractelement %4158[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4160 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4161 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4162 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4163 = llvm.mul %4007, %4162 : !llvm.i64 + %4164 = llvm.add %4161, %4163 : !llvm.i64 + %4165 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4166 = llvm.mul %67, %4165 : !llvm.i64 + %4167 = llvm.add %4164, %4166 : !llvm.i64 + %4168 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4169 = llvm.mul %4025, %4168 : !llvm.i64 + %4170 = llvm.add %4167, %4169 : !llvm.i64 + %4171 = llvm.getelementptr %4160[%4170] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4172 = llvm.load %4171 : !llvm.ptr> + %4173 = llvm.extractelement %4172[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4174 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4175 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4176 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4177 = llvm.mul %4007, %4176 : !llvm.i64 + %4178 = llvm.add %4175, %4177 : !llvm.i64 + %4179 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4180 = llvm.mul %67, %4179 : !llvm.i64 + %4181 = llvm.add %4178, %4180 : !llvm.i64 + %4182 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4183 = llvm.mul %4025, %4182 : !llvm.i64 + %4184 = llvm.add %4181, %4183 : !llvm.i64 + %4185 = llvm.getelementptr %4174[%4184] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4186 = llvm.load %4185 : !llvm.ptr> + %4187 = llvm.extractelement %4186[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4188 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4189 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4190 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4191 = llvm.mul %4007, %4190 : !llvm.i64 + %4192 = llvm.add %4189, %4191 : !llvm.i64 + %4193 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4194 = llvm.mul %67, %4193 : !llvm.i64 + %4195 = llvm.add %4192, %4194 : !llvm.i64 + %4196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4197 = llvm.mul %4025, %4196 : !llvm.i64 + %4198 = llvm.add %4195, %4197 : !llvm.i64 + %4199 = llvm.getelementptr %4188[%4198] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4200 = llvm.load %4199 : !llvm.ptr> + %4201 = llvm.extractelement %4200[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4202 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4203 = llvm.mlir.constant(0 
: index) : !llvm.i64 + %4204 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4205 = llvm.mul %4007, %4204 : !llvm.i64 + %4206 = llvm.add %4203, %4205 : !llvm.i64 + %4207 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4208 = llvm.mul %67, %4207 : !llvm.i64 + %4209 = llvm.add %4206, %4208 : !llvm.i64 + %4210 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4211 = llvm.mul %4025, %4210 : !llvm.i64 + %4212 = llvm.add %4209, %4211 : !llvm.i64 + %4213 = llvm.getelementptr %4202[%4212] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4214 = llvm.load %4213 : !llvm.ptr> + %4215 = llvm.extractelement %4214[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4216 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4217 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4218 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4219 = llvm.mul %4007, %4218 : !llvm.i64 + %4220 = llvm.add %4217, %4219 : !llvm.i64 + %4221 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4222 = llvm.mul %67, %4221 : !llvm.i64 + %4223 = llvm.add %4220, %4222 : !llvm.i64 + %4224 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4225 = llvm.mul %4025, %4224 : !llvm.i64 + %4226 = llvm.add %4223, %4225 : !llvm.i64 + %4227 = llvm.getelementptr %4216[%4226] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4228 = llvm.load %4227 : !llvm.ptr> + %4229 = llvm.extractelement %4228[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4230 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4231 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4232 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4233 = llvm.mul %4007, %4232 : !llvm.i64 + %4234 = llvm.add %4231, %4233 : !llvm.i64 + %4235 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4236 = llvm.mul %67, %4235 : !llvm.i64 + %4237 = llvm.add %4234, %4236 : !llvm.i64 + %4238 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4239 = llvm.mul %4025, %4238 : !llvm.i64 + %4240 = llvm.add %4237, %4239 : !llvm.i64 + %4241 = llvm.getelementptr %4230[%4240] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4242 = llvm.load %4241 : !llvm.ptr> + %4243 = llvm.extractelement %4242[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4244 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4245 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4246 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4247 = llvm.mul %4007, %4246 : !llvm.i64 + %4248 = llvm.add %4245, %4247 : !llvm.i64 + %4249 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4250 = llvm.mul %67, %4249 : !llvm.i64 + %4251 = llvm.add %4248, %4250 : !llvm.i64 + %4252 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4253 = llvm.mul %4025, %4252 : !llvm.i64 + %4254 = llvm.add %4251, %4253 : !llvm.i64 + %4255 = llvm.getelementptr %4244[%4254] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4256 = llvm.load %4255 : !llvm.ptr> + %4257 = llvm.extractelement %4256[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4258 = llvm.fadd %4159, %4138 {RelaxedPrecision} : !llvm.float + %4259 = llvm.fadd %4173, %4139 {RelaxedPrecision} : !llvm.float + %4260 = llvm.fadd %4187, %4140 {RelaxedPrecision} : !llvm.float + %4261 = llvm.fadd %4201, %4141 {RelaxedPrecision} : !llvm.float + %4262 = llvm.fadd %4215, %4142 {RelaxedPrecision} : !llvm.float + %4263 = llvm.fadd %4229, %4143 {RelaxedPrecision} : !llvm.float + %4264 = llvm.fadd %4243, %4144 {RelaxedPrecision} : !llvm.float + %4265 = llvm.fadd %4257, %4145 {RelaxedPrecision} : !llvm.float + %4266 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x 
i64>, array<3 x i64>)> + %4267 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4268 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4269 = llvm.mul %4007, %4268 : !llvm.i64 + %4270 = llvm.add %4267, %4269 : !llvm.i64 + %4271 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4272 = llvm.mul %67, %4271 : !llvm.i64 + %4273 = llvm.add %4270, %4272 : !llvm.i64 + %4274 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4275 = llvm.mul %4025, %4274 : !llvm.i64 + %4276 = llvm.add %4273, %4275 : !llvm.i64 + %4277 = llvm.getelementptr %4266[%4276] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4278 = llvm.load %4277 : !llvm.ptr> + %4279 = llvm.insertelement %4258, %4278[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4280 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4281 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4282 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4283 = llvm.mul %4007, %4282 : !llvm.i64 + %4284 = llvm.add %4281, %4283 : !llvm.i64 + %4285 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4286 = llvm.mul %67, %4285 : !llvm.i64 + %4287 = llvm.add %4284, %4286 : !llvm.i64 + %4288 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4289 = llvm.mul %4025, %4288 : !llvm.i64 + %4290 = llvm.add %4287, %4289 : !llvm.i64 + %4291 = llvm.getelementptr %4280[%4290] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4279, %4291 : !llvm.ptr> + %4292 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4293 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4294 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4295 = llvm.mul %4007, %4294 : !llvm.i64 + %4296 = llvm.add %4293, %4295 : !llvm.i64 + %4297 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4298 = llvm.mul %67, %4297 : !llvm.i64 + %4299 = llvm.add %4296, %4298 : !llvm.i64 + %4300 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4301 = llvm.mul %4025, %4300 : !llvm.i64 + %4302 = llvm.add %4299, %4301 : !llvm.i64 + %4303 = llvm.getelementptr %4292[%4302] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4304 = llvm.load %4303 : !llvm.ptr> + %4305 = llvm.insertelement %4259, %4304[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4306 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4307 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4308 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4309 = llvm.mul %4007, %4308 : !llvm.i64 + %4310 = llvm.add %4307, %4309 : !llvm.i64 + %4311 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4312 = llvm.mul %67, %4311 : !llvm.i64 + %4313 = llvm.add %4310, %4312 : !llvm.i64 + %4314 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4315 = llvm.mul %4025, %4314 : !llvm.i64 + %4316 = llvm.add %4313, %4315 : !llvm.i64 + %4317 = llvm.getelementptr %4306[%4316] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4305, %4317 : !llvm.ptr> + %4318 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4319 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4320 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4321 = llvm.mul %4007, %4320 : !llvm.i64 + %4322 = llvm.add %4319, %4321 : !llvm.i64 + %4323 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4324 = llvm.mul %67, %4323 : !llvm.i64 + %4325 = llvm.add %4322, %4324 : !llvm.i64 + %4326 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4327 = llvm.mul %4025, %4326 : !llvm.i64 + %4328 = llvm.add %4325, %4327 : !llvm.i64 + %4329 = llvm.getelementptr %4318[%4328] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4330 = llvm.load %4329 : !llvm.ptr> 
+ %4331 = llvm.insertelement %4260, %4330[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4332 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4333 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4334 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4335 = llvm.mul %4007, %4334 : !llvm.i64 + %4336 = llvm.add %4333, %4335 : !llvm.i64 + %4337 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4338 = llvm.mul %67, %4337 : !llvm.i64 + %4339 = llvm.add %4336, %4338 : !llvm.i64 + %4340 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4341 = llvm.mul %4025, %4340 : !llvm.i64 + %4342 = llvm.add %4339, %4341 : !llvm.i64 + %4343 = llvm.getelementptr %4332[%4342] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4331, %4343 : !llvm.ptr> + %4344 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4345 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4346 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4347 = llvm.mul %4007, %4346 : !llvm.i64 + %4348 = llvm.add %4345, %4347 : !llvm.i64 + %4349 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4350 = llvm.mul %67, %4349 : !llvm.i64 + %4351 = llvm.add %4348, %4350 : !llvm.i64 + %4352 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4353 = llvm.mul %4025, %4352 : !llvm.i64 + %4354 = llvm.add %4351, %4353 : !llvm.i64 + %4355 = llvm.getelementptr %4344[%4354] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4356 = llvm.load %4355 : !llvm.ptr> + %4357 = llvm.insertelement %4261, %4356[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4358 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4359 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4360 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4361 = llvm.mul %4007, %4360 : !llvm.i64 + %4362 = llvm.add %4359, %4361 : !llvm.i64 + %4363 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4364 = llvm.mul %67, %4363 : !llvm.i64 + %4365 = llvm.add %4362, %4364 : !llvm.i64 + %4366 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4367 = llvm.mul %4025, %4366 : !llvm.i64 + %4368 = llvm.add %4365, %4367 : !llvm.i64 + %4369 = llvm.getelementptr %4358[%4368] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4357, %4369 : !llvm.ptr> + %4370 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4371 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4372 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4373 = llvm.mul %4007, %4372 : !llvm.i64 + %4374 = llvm.add %4371, %4373 : !llvm.i64 + %4375 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4376 = llvm.mul %67, %4375 : !llvm.i64 + %4377 = llvm.add %4374, %4376 : !llvm.i64 + %4378 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4379 = llvm.mul %4025, %4378 : !llvm.i64 + %4380 = llvm.add %4377, %4379 : !llvm.i64 + %4381 = llvm.getelementptr %4370[%4380] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4382 = llvm.load %4381 : !llvm.ptr> + %4383 = llvm.insertelement %4262, %4382[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4384 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4385 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4386 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4387 = llvm.mul %4007, %4386 : !llvm.i64 + %4388 = llvm.add %4385, %4387 : !llvm.i64 + %4389 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4390 = llvm.mul %67, %4389 : !llvm.i64 + %4391 = llvm.add %4388, %4390 : !llvm.i64 + %4392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4393 = llvm.mul %4025, %4392 : !llvm.i64 + 
%4394 = llvm.add %4391, %4393 : !llvm.i64 + %4395 = llvm.getelementptr %4384[%4394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4383, %4395 : !llvm.ptr> + %4396 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4397 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4398 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4399 = llvm.mul %4007, %4398 : !llvm.i64 + %4400 = llvm.add %4397, %4399 : !llvm.i64 + %4401 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4402 = llvm.mul %67, %4401 : !llvm.i64 + %4403 = llvm.add %4400, %4402 : !llvm.i64 + %4404 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4405 = llvm.mul %4025, %4404 : !llvm.i64 + %4406 = llvm.add %4403, %4405 : !llvm.i64 + %4407 = llvm.getelementptr %4396[%4406] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4408 = llvm.load %4407 : !llvm.ptr> + %4409 = llvm.insertelement %4263, %4408[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4410 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4412 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4413 = llvm.mul %4007, %4412 : !llvm.i64 + %4414 = llvm.add %4411, %4413 : !llvm.i64 + %4415 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4416 = llvm.mul %67, %4415 : !llvm.i64 + %4417 = llvm.add %4414, %4416 : !llvm.i64 + %4418 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4419 = llvm.mul %4025, %4418 : !llvm.i64 + %4420 = llvm.add %4417, %4419 : !llvm.i64 + %4421 = llvm.getelementptr %4410[%4420] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4409, %4421 : !llvm.ptr> + %4422 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4423 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4424 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4425 = llvm.mul %4007, %4424 : !llvm.i64 + %4426 = llvm.add %4423, %4425 : !llvm.i64 + %4427 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4428 = llvm.mul %67, %4427 : !llvm.i64 + %4429 = llvm.add %4426, %4428 : !llvm.i64 + %4430 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4431 = llvm.mul %4025, %4430 : !llvm.i64 + %4432 = llvm.add %4429, %4431 : !llvm.i64 + %4433 = llvm.getelementptr %4422[%4432] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4434 = llvm.load %4433 : !llvm.ptr> + %4435 = llvm.insertelement %4264, %4434[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4436 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4437 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4438 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4439 = llvm.mul %4007, %4438 : !llvm.i64 + %4440 = llvm.add %4437, %4439 : !llvm.i64 + %4441 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4442 = llvm.mul %67, %4441 : !llvm.i64 + %4443 = llvm.add %4440, %4442 : !llvm.i64 + %4444 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4445 = llvm.mul %4025, %4444 : !llvm.i64 + %4446 = llvm.add %4443, %4445 : !llvm.i64 + %4447 = llvm.getelementptr %4436[%4446] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4435, %4447 : !llvm.ptr> + %4448 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4449 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4450 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4451 = llvm.mul %4007, %4450 : !llvm.i64 + %4452 = llvm.add %4449, %4451 : !llvm.i64 + %4453 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4454 = llvm.mul %67, %4453 : !llvm.i64 + %4455 = llvm.add %4452, %4454 : !llvm.i64 + %4456 = 
llvm.mlir.constant(1 : index) : !llvm.i64 + %4457 = llvm.mul %4025, %4456 : !llvm.i64 + %4458 = llvm.add %4455, %4457 : !llvm.i64 + %4459 = llvm.getelementptr %4448[%4458] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4460 = llvm.load %4459 : !llvm.ptr> + %4461 = llvm.insertelement %4265, %4460[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4462 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4463 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4464 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4465 = llvm.mul %4007, %4464 : !llvm.i64 + %4466 = llvm.add %4463, %4465 : !llvm.i64 + %4467 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4468 = llvm.mul %67, %4467 : !llvm.i64 + %4469 = llvm.add %4466, %4468 : !llvm.i64 + %4470 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4471 = llvm.mul %4025, %4470 : !llvm.i64 + %4472 = llvm.add %4469, %4471 : !llvm.i64 + %4473 = llvm.getelementptr %4462[%4472] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4461, %4473 : !llvm.ptr> + %4474 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4475 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4476 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4477 = llvm.mul %4007, %4476 : !llvm.i64 + %4478 = llvm.add %4475, %4477 : !llvm.i64 + %4479 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4480 = llvm.mul %67, %4479 : !llvm.i64 + %4481 = llvm.add %4478, %4480 : !llvm.i64 + %4482 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4483 = llvm.mul %4025, %4482 : !llvm.i64 + %4484 = llvm.add %4481, %4483 : !llvm.i64 + %4485 = llvm.getelementptr %4474[%4484] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4486 = llvm.load %4485 : !llvm.ptr> + %4487 = llvm.insertelement %4258, %4486[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4488 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4489 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4490 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4491 = llvm.mul %4007, %4490 : !llvm.i64 + %4492 = llvm.add %4489, %4491 : !llvm.i64 + %4493 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4494 = llvm.mul %67, %4493 : !llvm.i64 + %4495 = llvm.add %4492, %4494 : !llvm.i64 + %4496 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4497 = llvm.mul %4025, %4496 : !llvm.i64 + %4498 = llvm.add %4495, %4497 : !llvm.i64 + %4499 = llvm.getelementptr %4488[%4498] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4487, %4499 : !llvm.ptr> + %4500 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4501 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4502 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4503 = llvm.mul %4007, %4502 : !llvm.i64 + %4504 = llvm.add %4501, %4503 : !llvm.i64 + %4505 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4506 = llvm.mul %67, %4505 : !llvm.i64 + %4507 = llvm.add %4504, %4506 : !llvm.i64 + %4508 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4509 = llvm.mul %4025, %4508 : !llvm.i64 + %4510 = llvm.add %4507, %4509 : !llvm.i64 + %4511 = llvm.getelementptr %4500[%4510] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4512 = llvm.load %4511 : !llvm.ptr> + %4513 = llvm.insertelement %4259, %4512[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4514 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4515 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4516 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4517 = llvm.mul %4007, %4516 : !llvm.i64 + %4518 = llvm.add %4515, 
%4517 : !llvm.i64 + %4519 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4520 = llvm.mul %67, %4519 : !llvm.i64 + %4521 = llvm.add %4518, %4520 : !llvm.i64 + %4522 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4523 = llvm.mul %4025, %4522 : !llvm.i64 + %4524 = llvm.add %4521, %4523 : !llvm.i64 + %4525 = llvm.getelementptr %4514[%4524] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4513, %4525 : !llvm.ptr> + %4526 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4528 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4529 = llvm.mul %4007, %4528 : !llvm.i64 + %4530 = llvm.add %4527, %4529 : !llvm.i64 + %4531 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4532 = llvm.mul %67, %4531 : !llvm.i64 + %4533 = llvm.add %4530, %4532 : !llvm.i64 + %4534 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4535 = llvm.mul %4025, %4534 : !llvm.i64 + %4536 = llvm.add %4533, %4535 : !llvm.i64 + %4537 = llvm.getelementptr %4526[%4536] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4538 = llvm.load %4537 : !llvm.ptr> + %4539 = llvm.insertelement %4260, %4538[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4540 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4541 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4542 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4543 = llvm.mul %4007, %4542 : !llvm.i64 + %4544 = llvm.add %4541, %4543 : !llvm.i64 + %4545 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4546 = llvm.mul %67, %4545 : !llvm.i64 + %4547 = llvm.add %4544, %4546 : !llvm.i64 + %4548 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4549 = llvm.mul %4025, %4548 : !llvm.i64 + %4550 = llvm.add %4547, %4549 : !llvm.i64 + %4551 = llvm.getelementptr %4540[%4550] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4539, %4551 : !llvm.ptr> + %4552 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4553 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4554 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4555 = llvm.mul %4007, %4554 : !llvm.i64 + %4556 = llvm.add %4553, %4555 : !llvm.i64 + %4557 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4558 = llvm.mul %67, %4557 : !llvm.i64 + %4559 = llvm.add %4556, %4558 : !llvm.i64 + %4560 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4561 = llvm.mul %4025, %4560 : !llvm.i64 + %4562 = llvm.add %4559, %4561 : !llvm.i64 + %4563 = llvm.getelementptr %4552[%4562] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4564 = llvm.load %4563 : !llvm.ptr> + %4565 = llvm.insertelement %4261, %4564[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4566 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4567 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4568 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4569 = llvm.mul %4007, %4568 : !llvm.i64 + %4570 = llvm.add %4567, %4569 : !llvm.i64 + %4571 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4572 = llvm.mul %67, %4571 : !llvm.i64 + %4573 = llvm.add %4570, %4572 : !llvm.i64 + %4574 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4575 = llvm.mul %4025, %4574 : !llvm.i64 + %4576 = llvm.add %4573, %4575 : !llvm.i64 + %4577 = llvm.getelementptr %4566[%4576] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4565, %4577 : !llvm.ptr> + %4578 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4579 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4580 = llvm.mlir.constant(12 : index) : 
!llvm.i64 + %4581 = llvm.mul %4007, %4580 : !llvm.i64 + %4582 = llvm.add %4579, %4581 : !llvm.i64 + %4583 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4584 = llvm.mul %67, %4583 : !llvm.i64 + %4585 = llvm.add %4582, %4584 : !llvm.i64 + %4586 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4587 = llvm.mul %4025, %4586 : !llvm.i64 + %4588 = llvm.add %4585, %4587 : !llvm.i64 + %4589 = llvm.getelementptr %4578[%4588] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4590 = llvm.load %4589 : !llvm.ptr> + %4591 = llvm.insertelement %4262, %4590[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4592 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4593 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4594 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4595 = llvm.mul %4007, %4594 : !llvm.i64 + %4596 = llvm.add %4593, %4595 : !llvm.i64 + %4597 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4598 = llvm.mul %67, %4597 : !llvm.i64 + %4599 = llvm.add %4596, %4598 : !llvm.i64 + %4600 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4601 = llvm.mul %4025, %4600 : !llvm.i64 + %4602 = llvm.add %4599, %4601 : !llvm.i64 + %4603 = llvm.getelementptr %4592[%4602] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4591, %4603 : !llvm.ptr> + %4604 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4605 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4606 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4607 = llvm.mul %4007, %4606 : !llvm.i64 + %4608 = llvm.add %4605, %4607 : !llvm.i64 + %4609 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4610 = llvm.mul %67, %4609 : !llvm.i64 + %4611 = llvm.add %4608, %4610 : !llvm.i64 + %4612 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4613 = llvm.mul %4025, %4612 : !llvm.i64 + %4614 = llvm.add %4611, %4613 : !llvm.i64 + %4615 = llvm.getelementptr %4604[%4614] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4616 = llvm.load %4615 : !llvm.ptr> + %4617 = llvm.insertelement %4263, %4616[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4618 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4619 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4620 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4621 = llvm.mul %4007, %4620 : !llvm.i64 + %4622 = llvm.add %4619, %4621 : !llvm.i64 + %4623 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4624 = llvm.mul %67, %4623 : !llvm.i64 + %4625 = llvm.add %4622, %4624 : !llvm.i64 + %4626 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4627 = llvm.mul %4025, %4626 : !llvm.i64 + %4628 = llvm.add %4625, %4627 : !llvm.i64 + %4629 = llvm.getelementptr %4618[%4628] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4617, %4629 : !llvm.ptr> + %4630 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4631 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4632 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4633 = llvm.mul %4007, %4632 : !llvm.i64 + %4634 = llvm.add %4631, %4633 : !llvm.i64 + %4635 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4636 = llvm.mul %67, %4635 : !llvm.i64 + %4637 = llvm.add %4634, %4636 : !llvm.i64 + %4638 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4639 = llvm.mul %4025, %4638 : !llvm.i64 + %4640 = llvm.add %4637, %4639 : !llvm.i64 + %4641 = llvm.getelementptr %4630[%4640] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4642 = llvm.load %4641 : !llvm.ptr> + %4643 = llvm.insertelement %4264, %4642[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4644 = llvm.extractvalue %130[1] 
: !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4645 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4646 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4647 = llvm.mul %4007, %4646 : !llvm.i64 + %4648 = llvm.add %4645, %4647 : !llvm.i64 + %4649 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4650 = llvm.mul %67, %4649 : !llvm.i64 + %4651 = llvm.add %4648, %4650 : !llvm.i64 + %4652 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4653 = llvm.mul %4025, %4652 : !llvm.i64 + %4654 = llvm.add %4651, %4653 : !llvm.i64 + %4655 = llvm.getelementptr %4644[%4654] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4643, %4655 : !llvm.ptr> + %4656 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4657 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4658 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4659 = llvm.mul %4007, %4658 : !llvm.i64 + %4660 = llvm.add %4657, %4659 : !llvm.i64 + %4661 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4662 = llvm.mul %67, %4661 : !llvm.i64 + %4663 = llvm.add %4660, %4662 : !llvm.i64 + %4664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4665 = llvm.mul %4025, %4664 : !llvm.i64 + %4666 = llvm.add %4663, %4665 : !llvm.i64 + %4667 = llvm.getelementptr %4656[%4666] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4668 = llvm.load %4667 : !llvm.ptr> + %4669 = llvm.insertelement %4265, %4668[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4670 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4671 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4672 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4673 = llvm.mul %4007, %4672 : !llvm.i64 + %4674 = llvm.add %4671, %4673 : !llvm.i64 + %4675 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4676 = llvm.mul %67, %4675 : !llvm.i64 + %4677 = llvm.add %4674, %4676 : !llvm.i64 + %4678 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4679 = llvm.mul %4025, %4678 : !llvm.i64 + %4680 = llvm.add %4677, %4679 : !llvm.i64 + %4681 = llvm.getelementptr %4670[%4680] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %4669, %4681 : !llvm.ptr> + %4682 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4683 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4684 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4685 = llvm.mul %2345, %4684 : !llvm.i64 + %4686 = llvm.add %4683, %4685 : !llvm.i64 + %4687 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4688 = llvm.mul %3917, %4687 : !llvm.i64 + %4689 = llvm.add %4686, %4688 : !llvm.i64 + %4690 = llvm.getelementptr %4682[%4689] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4691 = llvm.load %4690 : !llvm.ptr + %4692 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4693 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4694 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4695 = llvm.mul %2345, %4694 : !llvm.i64 + %4696 = llvm.add %4693, %4695 : !llvm.i64 + %4697 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4698 = llvm.mul %3917, %4697 : !llvm.i64 + %4699 = llvm.add %4696, %4698 : !llvm.i64 + %4700 = llvm.getelementptr %4692[%4699] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4701 = llvm.load %4700 : !llvm.ptr + %4702 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4703 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4704 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4705 = llvm.mul %2345, %4704 : !llvm.i64 + %4706 = llvm.add %4703, %4705 : !llvm.i64 + %4707 = llvm.mlir.constant(1 : 
index) : !llvm.i64 + %4708 = llvm.mul %3917, %4707 : !llvm.i64 + %4709 = llvm.add %4706, %4708 : !llvm.i64 + %4710 = llvm.getelementptr %4702[%4709] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4711 = llvm.load %4710 : !llvm.ptr + %4712 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4713 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4714 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4715 = llvm.mul %2345, %4714 : !llvm.i64 + %4716 = llvm.add %4713, %4715 : !llvm.i64 + %4717 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4718 = llvm.mul %3917, %4717 : !llvm.i64 + %4719 = llvm.add %4716, %4718 : !llvm.i64 + %4720 = llvm.getelementptr %4712[%4719] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4721 = llvm.load %4720 : !llvm.ptr + %4722 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4724 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4725 = llvm.mul %2345, %4724 : !llvm.i64 + %4726 = llvm.add %4723, %4725 : !llvm.i64 + %4727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4728 = llvm.mul %3917, %4727 : !llvm.i64 + %4729 = llvm.add %4726, %4728 : !llvm.i64 + %4730 = llvm.getelementptr %4722[%4729] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4731 = llvm.load %4730 : !llvm.ptr + %4732 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4734 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4735 = llvm.mul %2345, %4734 : !llvm.i64 + %4736 = llvm.add %4733, %4735 : !llvm.i64 + %4737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4738 = llvm.mul %3917, %4737 : !llvm.i64 + %4739 = llvm.add %4736, %4738 : !llvm.i64 + %4740 = llvm.getelementptr %4732[%4739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4741 = llvm.load %4740 : !llvm.ptr + %4742 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4743 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4744 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4745 = llvm.mul %2345, %4744 : !llvm.i64 + %4746 = llvm.add %4743, %4745 : !llvm.i64 + %4747 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4748 = llvm.mul %3917, %4747 : !llvm.i64 + %4749 = llvm.add %4746, %4748 : !llvm.i64 + %4750 = llvm.getelementptr %4742[%4749] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4751 = llvm.load %4750 : !llvm.ptr + %4752 = llvm.extractvalue %7[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %4753 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4754 = llvm.mlir.constant(128 : index) : !llvm.i64 + %4755 = llvm.mul %2345, %4754 : !llvm.i64 + %4756 = llvm.add %4753, %4755 : !llvm.i64 + %4757 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4758 = llvm.mul %3917, %4757 : !llvm.i64 + %4759 = llvm.add %4756, %4758 : !llvm.i64 + %4760 = llvm.getelementptr %4752[%4759] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %4761 = llvm.load %4760 : !llvm.ptr + %4762 = llvm.add %2368, %70 : !llvm.i64 + %4763 = llvm.icmp "slt" %4762, %67 : !llvm.i64 + %4764 = llvm.sub %64, %4762 : !llvm.i64 + %4765 = llvm.select %4763, %4764, %4762 : !llvm.i1, !llvm.i64 + %4766 = llvm.sdiv %4765, %68 : !llvm.i64 + %4767 = llvm.sub %64, %4766 : !llvm.i64 + %4768 = llvm.select %4763, %4767, %4766 : !llvm.i1, !llvm.i64 + %4769 = llvm.srem %4768, %68 : !llvm.i64 + %4770 = llvm.icmp "slt" %4769, %67 : !llvm.i64 + %4771 = llvm.add %4769, %68 : !llvm.i64 + %4772 = llvm.select %4770, %4771, %4769 : !llvm.i1, !llvm.i64 + %4773 = llvm.sdiv %4000, %70 : 
!llvm.i64 + %4774 = llvm.sub %64, %4773 : !llvm.i64 + %4775 = llvm.select %3998, %4774, %4773 : !llvm.i1, !llvm.i64 + %4776 = llvm.mul %4768, %65 : !llvm.i64 + %4777 = llvm.add %4775, %4776 : !llvm.i64 + %4778 = llvm.add %4777, %69 : !llvm.i64 + %4779 = llvm.icmp "slt" %4778, %67 : !llvm.i64 + %4780 = llvm.sub %64, %4778 : !llvm.i64 + %4781 = llvm.select %4779, %4780, %4778 : !llvm.i1, !llvm.i64 + %4782 = llvm.sdiv %4781, %63 : !llvm.i64 + %4783 = llvm.sub %64, %4782 : !llvm.i64 + %4784 = llvm.select %4779, %4783, %4782 : !llvm.i1, !llvm.i64 + %4785 = llvm.mul %4784, %65 : !llvm.i64 + %4786 = llvm.add %4777, %4785 : !llvm.i64 + %4787 = llvm.add %4786, %69 : !llvm.i64 + %4788 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4789 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4790 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4791 = llvm.mul %4772, %4790 : !llvm.i64 + %4792 = llvm.add %4789, %4791 : !llvm.i64 + %4793 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4794 = llvm.mul %4011, %4793 : !llvm.i64 + %4795 = llvm.add %4792, %4794 : !llvm.i64 + %4796 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4797 = llvm.mul %4787, %4796 : !llvm.i64 + %4798 = llvm.add %4795, %4797 : !llvm.i64 + %4799 = llvm.getelementptr %4788[%4798] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4800 = llvm.load %4799 : !llvm.ptr> + %4801 = llvm.extractelement %4800[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4802 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4803 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4804 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4805 = llvm.mul %4772, %4804 : !llvm.i64 + %4806 = llvm.add %4803, %4805 : !llvm.i64 + %4807 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4808 = llvm.mul %4011, %4807 : !llvm.i64 + %4809 = llvm.add %4806, %4808 : !llvm.i64 + %4810 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4811 = llvm.mul %4787, %4810 : !llvm.i64 + %4812 = llvm.add %4809, %4811 : !llvm.i64 + %4813 = llvm.getelementptr %4802[%4812] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4814 = llvm.load %4813 : !llvm.ptr> + %4815 = llvm.extractelement %4814[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4816 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4817 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4818 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4819 = llvm.mul %4772, %4818 : !llvm.i64 + %4820 = llvm.add %4817, %4819 : !llvm.i64 + %4821 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4822 = llvm.mul %4011, %4821 : !llvm.i64 + %4823 = llvm.add %4820, %4822 : !llvm.i64 + %4824 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4825 = llvm.mul %4787, %4824 : !llvm.i64 + %4826 = llvm.add %4823, %4825 : !llvm.i64 + %4827 = llvm.getelementptr %4816[%4826] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4828 = llvm.load %4827 : !llvm.ptr> + %4829 = llvm.extractelement %4828[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4830 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4831 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4832 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4833 = llvm.mul %4772, %4832 : !llvm.i64 + %4834 = llvm.add %4831, %4833 : !llvm.i64 + %4835 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4836 = llvm.mul %4011, %4835 : !llvm.i64 + %4837 = llvm.add %4834, %4836 : !llvm.i64 + %4838 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4839 = llvm.mul %4787, %4838 : !llvm.i64 + %4840 = llvm.add %4837, 
%4839 : !llvm.i64 + %4841 = llvm.getelementptr %4830[%4840] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4842 = llvm.load %4841 : !llvm.ptr> + %4843 = llvm.extractelement %4842[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4844 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4845 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4846 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4847 = llvm.mul %4772, %4846 : !llvm.i64 + %4848 = llvm.add %4845, %4847 : !llvm.i64 + %4849 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4850 = llvm.mul %4011, %4849 : !llvm.i64 + %4851 = llvm.add %4848, %4850 : !llvm.i64 + %4852 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4853 = llvm.mul %4787, %4852 : !llvm.i64 + %4854 = llvm.add %4851, %4853 : !llvm.i64 + %4855 = llvm.getelementptr %4844[%4854] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4856 = llvm.load %4855 : !llvm.ptr> + %4857 = llvm.extractelement %4856[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4858 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4859 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4860 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4861 = llvm.mul %4772, %4860 : !llvm.i64 + %4862 = llvm.add %4859, %4861 : !llvm.i64 + %4863 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4864 = llvm.mul %4011, %4863 : !llvm.i64 + %4865 = llvm.add %4862, %4864 : !llvm.i64 + %4866 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4867 = llvm.mul %4787, %4866 : !llvm.i64 + %4868 = llvm.add %4865, %4867 : !llvm.i64 + %4869 = llvm.getelementptr %4858[%4868] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4870 = llvm.load %4869 : !llvm.ptr> + %4871 = llvm.extractelement %4870[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4872 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4873 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4874 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4875 = llvm.mul %4772, %4874 : !llvm.i64 + %4876 = llvm.add %4873, %4875 : !llvm.i64 + %4877 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4878 = llvm.mul %4011, %4877 : !llvm.i64 + %4879 = llvm.add %4876, %4878 : !llvm.i64 + %4880 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4881 = llvm.mul %4787, %4880 : !llvm.i64 + %4882 = llvm.add %4879, %4881 : !llvm.i64 + %4883 = llvm.getelementptr %4872[%4882] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4884 = llvm.load %4883 : !llvm.ptr> + %4885 = llvm.extractelement %4884[%30 : !llvm.i64] : !llvm.vec<8 x float> + %4886 = llvm.extractvalue %150[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4887 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4888 = llvm.mlir.constant(256 : index) : !llvm.i64 + %4889 = llvm.mul %4772, %4888 : !llvm.i64 + %4890 = llvm.add %4887, %4889 : !llvm.i64 + %4891 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4892 = llvm.mul %4011, %4891 : !llvm.i64 + %4893 = llvm.add %4890, %4892 : !llvm.i64 + %4894 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4895 = llvm.mul %4787, %4894 : !llvm.i64 + %4896 = llvm.add %4893, %4895 : !llvm.i64 + %4897 = llvm.getelementptr %4886[%4896] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4898 = llvm.load %4897 : !llvm.ptr> + %4899 = llvm.extractelement %4898[%31 : !llvm.i64] : !llvm.vec<8 x float> + %4900 = llvm.fmul %4691, %4801 {RelaxedPrecision} : !llvm.float + %4901 = llvm.fmul %4701, %4815 {RelaxedPrecision} : !llvm.float + %4902 = llvm.fmul %4711, %4829 {RelaxedPrecision} : !llvm.float + %4903 = llvm.fmul %4721, %4843 
{RelaxedPrecision} : !llvm.float + %4904 = llvm.fmul %4731, %4857 {RelaxedPrecision} : !llvm.float + %4905 = llvm.fmul %4741, %4871 {RelaxedPrecision} : !llvm.float + %4906 = llvm.fmul %4751, %4885 {RelaxedPrecision} : !llvm.float + %4907 = llvm.fmul %4761, %4899 {RelaxedPrecision} : !llvm.float + %4908 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4909 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4910 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4911 = llvm.mul %4772, %4910 : !llvm.i64 + %4912 = llvm.add %4909, %4911 : !llvm.i64 + %4913 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4914 = llvm.mul %67, %4913 : !llvm.i64 + %4915 = llvm.add %4912, %4914 : !llvm.i64 + %4916 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4917 = llvm.mul %4787, %4916 : !llvm.i64 + %4918 = llvm.add %4915, %4917 : !llvm.i64 + %4919 = llvm.getelementptr %4908[%4918] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4920 = llvm.load %4919 : !llvm.ptr> + %4921 = llvm.extractelement %4920[%24 : !llvm.i64] : !llvm.vec<8 x float> + %4922 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4923 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4924 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4925 = llvm.mul %4772, %4924 : !llvm.i64 + %4926 = llvm.add %4923, %4925 : !llvm.i64 + %4927 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4928 = llvm.mul %67, %4927 : !llvm.i64 + %4929 = llvm.add %4926, %4928 : !llvm.i64 + %4930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4931 = llvm.mul %4787, %4930 : !llvm.i64 + %4932 = llvm.add %4929, %4931 : !llvm.i64 + %4933 = llvm.getelementptr %4922[%4932] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4934 = llvm.load %4933 : !llvm.ptr> + %4935 = llvm.extractelement %4934[%25 : !llvm.i64] : !llvm.vec<8 x float> + %4936 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4937 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4938 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4939 = llvm.mul %4772, %4938 : !llvm.i64 + %4940 = llvm.add %4937, %4939 : !llvm.i64 + %4941 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4942 = llvm.mul %67, %4941 : !llvm.i64 + %4943 = llvm.add %4940, %4942 : !llvm.i64 + %4944 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4945 = llvm.mul %4787, %4944 : !llvm.i64 + %4946 = llvm.add %4943, %4945 : !llvm.i64 + %4947 = llvm.getelementptr %4936[%4946] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4948 = llvm.load %4947 : !llvm.ptr> + %4949 = llvm.extractelement %4948[%26 : !llvm.i64] : !llvm.vec<8 x float> + %4950 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4951 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4952 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4953 = llvm.mul %4772, %4952 : !llvm.i64 + %4954 = llvm.add %4951, %4953 : !llvm.i64 + %4955 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4956 = llvm.mul %67, %4955 : !llvm.i64 + %4957 = llvm.add %4954, %4956 : !llvm.i64 + %4958 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4959 = llvm.mul %4787, %4958 : !llvm.i64 + %4960 = llvm.add %4957, %4959 : !llvm.i64 + %4961 = llvm.getelementptr %4950[%4960] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4962 = llvm.load %4961 : !llvm.ptr> + %4963 = llvm.extractelement %4962[%27 : !llvm.i64] : !llvm.vec<8 x float> + %4964 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4965 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4966 = 
llvm.mlir.constant(12 : index) : !llvm.i64 + %4967 = llvm.mul %4772, %4966 : !llvm.i64 + %4968 = llvm.add %4965, %4967 : !llvm.i64 + %4969 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4970 = llvm.mul %67, %4969 : !llvm.i64 + %4971 = llvm.add %4968, %4970 : !llvm.i64 + %4972 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4973 = llvm.mul %4787, %4972 : !llvm.i64 + %4974 = llvm.add %4971, %4973 : !llvm.i64 + %4975 = llvm.getelementptr %4964[%4974] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4976 = llvm.load %4975 : !llvm.ptr> + %4977 = llvm.extractelement %4976[%28 : !llvm.i64] : !llvm.vec<8 x float> + %4978 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4979 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4980 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4981 = llvm.mul %4772, %4980 : !llvm.i64 + %4982 = llvm.add %4979, %4981 : !llvm.i64 + %4983 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4984 = llvm.mul %67, %4983 : !llvm.i64 + %4985 = llvm.add %4982, %4984 : !llvm.i64 + %4986 = llvm.mlir.constant(1 : index) : !llvm.i64 + %4987 = llvm.mul %4787, %4986 : !llvm.i64 + %4988 = llvm.add %4985, %4987 : !llvm.i64 + %4989 = llvm.getelementptr %4978[%4988] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %4990 = llvm.load %4989 : !llvm.ptr> + %4991 = llvm.extractelement %4990[%29 : !llvm.i64] : !llvm.vec<8 x float> + %4992 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %4993 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4994 = llvm.mlir.constant(12 : index) : !llvm.i64 + %4995 = llvm.mul %4772, %4994 : !llvm.i64 + %4996 = llvm.add %4993, %4995 : !llvm.i64 + %4997 = llvm.mlir.constant(2 : index) : !llvm.i64 + %4998 = llvm.mul %67, %4997 : !llvm.i64 + %4999 = llvm.add %4996, %4998 : !llvm.i64 + %5000 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5001 = llvm.mul %4787, %5000 : !llvm.i64 + %5002 = llvm.add %4999, %5001 : !llvm.i64 + %5003 = llvm.getelementptr %4992[%5002] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5004 = llvm.load %5003 : !llvm.ptr> + %5005 = llvm.extractelement %5004[%30 : !llvm.i64] : !llvm.vec<8 x float> + %5006 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5007 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5008 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5009 = llvm.mul %4772, %5008 : !llvm.i64 + %5010 = llvm.add %5007, %5009 : !llvm.i64 + %5011 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5012 = llvm.mul %67, %5011 : !llvm.i64 + %5013 = llvm.add %5010, %5012 : !llvm.i64 + %5014 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5015 = llvm.mul %4787, %5014 : !llvm.i64 + %5016 = llvm.add %5013, %5015 : !llvm.i64 + %5017 = llvm.getelementptr %5006[%5016] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5018 = llvm.load %5017 : !llvm.ptr> + %5019 = llvm.extractelement %5018[%31 : !llvm.i64] : !llvm.vec<8 x float> + %5020 = llvm.fadd %4921, %4900 {RelaxedPrecision} : !llvm.float + %5021 = llvm.fadd %4935, %4901 {RelaxedPrecision} : !llvm.float + %5022 = llvm.fadd %4949, %4902 {RelaxedPrecision} : !llvm.float + %5023 = llvm.fadd %4963, %4903 {RelaxedPrecision} : !llvm.float + %5024 = llvm.fadd %4977, %4904 {RelaxedPrecision} : !llvm.float + %5025 = llvm.fadd %4991, %4905 {RelaxedPrecision} : !llvm.float + %5026 = llvm.fadd %5005, %4906 {RelaxedPrecision} : !llvm.float + %5027 = llvm.fadd %5019, %4907 {RelaxedPrecision} : !llvm.float + %5028 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5029 
= llvm.mlir.constant(0 : index) : !llvm.i64 + %5030 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5031 = llvm.mul %4772, %5030 : !llvm.i64 + %5032 = llvm.add %5029, %5031 : !llvm.i64 + %5033 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5034 = llvm.mul %67, %5033 : !llvm.i64 + %5035 = llvm.add %5032, %5034 : !llvm.i64 + %5036 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5037 = llvm.mul %4787, %5036 : !llvm.i64 + %5038 = llvm.add %5035, %5037 : !llvm.i64 + %5039 = llvm.getelementptr %5028[%5038] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5040 = llvm.load %5039 : !llvm.ptr> + %5041 = llvm.insertelement %5020, %5040[%24 : !llvm.i64] : !llvm.vec<8 x float> + %5042 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5043 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5044 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5045 = llvm.mul %4772, %5044 : !llvm.i64 + %5046 = llvm.add %5043, %5045 : !llvm.i64 + %5047 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5048 = llvm.mul %67, %5047 : !llvm.i64 + %5049 = llvm.add %5046, %5048 : !llvm.i64 + %5050 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5051 = llvm.mul %4787, %5050 : !llvm.i64 + %5052 = llvm.add %5049, %5051 : !llvm.i64 + %5053 = llvm.getelementptr %5042[%5052] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5041, %5053 : !llvm.ptr> + %5054 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5055 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5056 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5057 = llvm.mul %4772, %5056 : !llvm.i64 + %5058 = llvm.add %5055, %5057 : !llvm.i64 + %5059 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5060 = llvm.mul %67, %5059 : !llvm.i64 + %5061 = llvm.add %5058, %5060 : !llvm.i64 + %5062 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5063 = llvm.mul %4787, %5062 : !llvm.i64 + %5064 = llvm.add %5061, %5063 : !llvm.i64 + %5065 = llvm.getelementptr %5054[%5064] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5066 = llvm.load %5065 : !llvm.ptr> + %5067 = llvm.insertelement %5021, %5066[%25 : !llvm.i64] : !llvm.vec<8 x float> + %5068 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5069 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5070 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5071 = llvm.mul %4772, %5070 : !llvm.i64 + %5072 = llvm.add %5069, %5071 : !llvm.i64 + %5073 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5074 = llvm.mul %67, %5073 : !llvm.i64 + %5075 = llvm.add %5072, %5074 : !llvm.i64 + %5076 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5077 = llvm.mul %4787, %5076 : !llvm.i64 + %5078 = llvm.add %5075, %5077 : !llvm.i64 + %5079 = llvm.getelementptr %5068[%5078] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5067, %5079 : !llvm.ptr> + %5080 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5081 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5082 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5083 = llvm.mul %4772, %5082 : !llvm.i64 + %5084 = llvm.add %5081, %5083 : !llvm.i64 + %5085 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5086 = llvm.mul %67, %5085 : !llvm.i64 + %5087 = llvm.add %5084, %5086 : !llvm.i64 + %5088 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5089 = llvm.mul %4787, %5088 : !llvm.i64 + %5090 = llvm.add %5087, %5089 : !llvm.i64 + %5091 = llvm.getelementptr %5080[%5090] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5092 = llvm.load %5091 : !llvm.ptr> + %5093 = llvm.insertelement 
%5022, %5092[%26 : !llvm.i64] : !llvm.vec<8 x float> + %5094 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5095 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5096 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5097 = llvm.mul %4772, %5096 : !llvm.i64 + %5098 = llvm.add %5095, %5097 : !llvm.i64 + %5099 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5100 = llvm.mul %67, %5099 : !llvm.i64 + %5101 = llvm.add %5098, %5100 : !llvm.i64 + %5102 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5103 = llvm.mul %4787, %5102 : !llvm.i64 + %5104 = llvm.add %5101, %5103 : !llvm.i64 + %5105 = llvm.getelementptr %5094[%5104] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5093, %5105 : !llvm.ptr> + %5106 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5107 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5108 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5109 = llvm.mul %4772, %5108 : !llvm.i64 + %5110 = llvm.add %5107, %5109 : !llvm.i64 + %5111 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5112 = llvm.mul %67, %5111 : !llvm.i64 + %5113 = llvm.add %5110, %5112 : !llvm.i64 + %5114 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5115 = llvm.mul %4787, %5114 : !llvm.i64 + %5116 = llvm.add %5113, %5115 : !llvm.i64 + %5117 = llvm.getelementptr %5106[%5116] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5118 = llvm.load %5117 : !llvm.ptr> + %5119 = llvm.insertelement %5023, %5118[%27 : !llvm.i64] : !llvm.vec<8 x float> + %5120 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5121 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5122 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5123 = llvm.mul %4772, %5122 : !llvm.i64 + %5124 = llvm.add %5121, %5123 : !llvm.i64 + %5125 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5126 = llvm.mul %67, %5125 : !llvm.i64 + %5127 = llvm.add %5124, %5126 : !llvm.i64 + %5128 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5129 = llvm.mul %4787, %5128 : !llvm.i64 + %5130 = llvm.add %5127, %5129 : !llvm.i64 + %5131 = llvm.getelementptr %5120[%5130] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %5119, %5131 : !llvm.ptr> + %5132 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5133 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5134 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5135 = llvm.mul %4772, %5134 : !llvm.i64 + %5136 = llvm.add %5133, %5135 : !llvm.i64 + %5137 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5138 = llvm.mul %67, %5137 : !llvm.i64 + %5139 = llvm.add %5136, %5138 : !llvm.i64 + %5140 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5141 = llvm.mul %4787, %5140 : !llvm.i64 + %5142 = llvm.add %5139, %5141 : !llvm.i64 + %5143 = llvm.getelementptr %5132[%5142] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %5144 = llvm.load %5143 : !llvm.ptr> + %5145 = llvm.insertelement %5024, %5144[%28 : !llvm.i64] : !llvm.vec<8 x float> + %5146 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %5147 = llvm.mlir.constant(0 : index) : !llvm.i64 + %5148 = llvm.mlir.constant(12 : index) : !llvm.i64 + %5149 = llvm.mul %4772, %5148 : !llvm.i64 + %5150 = llvm.add %5147, %5149 : !llvm.i64 + %5151 = llvm.mlir.constant(2 : index) : !llvm.i64 + %5152 = llvm.mul %67, %5151 : !llvm.i64 + %5153 = llvm.add %5150, %5152 : !llvm.i64 + %5154 = llvm.mlir.constant(1 : index) : !llvm.i64 + %5155 = llvm.mul %4787, %5154 : !llvm.i64 + %5156 = llvm.add %5153, %5155 
+ [elided: continuation of the machine-generated LLVM-dialect MLIR listing; it repeats the same index-arithmetic, llvm.getelementptr / llvm.load / llvm.insertelement / llvm.store, llvm.fadd accumulation, and llvm.intr.masked.load boundary-handling sequences over !llvm.vec<8 x float> values across loop blocks ^bb36 through ^bb51]
llvm.sub %64, %6496 : !llvm.i64 + %6498 = llvm.select %6493, %6497, %6496 : !llvm.i1, !llvm.i64 + %6499 = llvm.mul %6498, %60 : !llvm.i64 + %6500 = llvm.add %6347, %6499 : !llvm.i64 + %6501 = llvm.add %6500, %69 : !llvm.i64 + %6502 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6503 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6504 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6505 = llvm.mul %6501, %6504 : !llvm.i64 + %6506 = llvm.add %6503, %6505 : !llvm.i64 + %6507 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6508 = llvm.mul %67, %6507 : !llvm.i64 + %6509 = llvm.add %6506, %6508 : !llvm.i64 + %6510 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6511 = llvm.mul %6365, %6510 : !llvm.i64 + %6512 = llvm.add %6509, %6511 : !llvm.i64 + %6513 = llvm.getelementptr %6502[%6512] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6514 = llvm.load %6513 : !llvm.ptr> + %6515 = llvm.fadd %6491, %6514 : !llvm.vec<8 x float> + %6516 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6517 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6518 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6519 = llvm.mul %67, %6518 : !llvm.i64 + %6520 = llvm.add %6517, %6519 : !llvm.i64 + %6521 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6522 = llvm.mul %63, %6521 : !llvm.i64 + %6523 = llvm.add %6520, %6522 : !llvm.i64 + %6524 = llvm.getelementptr %6516[%6523] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6515, %6524 : !llvm.ptr> + %6525 = llvm.add %6315, %41 : !llvm.i64 + %6526 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6527 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6528 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6529 = llvm.mul %2345, %6528 : !llvm.i64 + %6530 = llvm.add %6527, %6529 : !llvm.i64 + %6531 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6532 = llvm.mul %6525, %6531 : !llvm.i64 + %6533 = llvm.add %6530, %6532 : !llvm.i64 + %6534 = llvm.getelementptr %6526[%6533] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6535 = llvm.bitcast %6534 : !llvm.ptr to !llvm.ptr> + %6536 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6537 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6538 = llvm.trunc %6525 : !llvm.i64 to !llvm.i32 + %6539 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6540 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6541 = llvm.insertelement %6538, %6539[%6540 : !llvm.i32] : !llvm.vec<8 x i32> + %6542 = llvm.shufflevector %6541, %6539 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6543 = llvm.add %6542, %6537 : !llvm.vec<8 x i32> + %6544 = llvm.trunc %6536 : !llvm.i64 to !llvm.i32 + %6545 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6546 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6547 = llvm.insertelement %6544, %6545[%6546 : !llvm.i32] : !llvm.vec<8 x i32> + %6548 = llvm.shufflevector %6547, %6545 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6549 = llvm.icmp "slt" %6543, %6548 : !llvm.vec<8 x i32> + %6550 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6551 = llvm.intr.masked.load %6535, %6549, %6550 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6552 = llvm.add %5447, %41 : !llvm.i64 + %6553 = llvm.icmp "slt" %6552, %67 : !llvm.i64 + %6554 = llvm.sub %64, %6552 : !llvm.i64 + 
%6555 = llvm.select %6553, %6554, %6552 : !llvm.i1, !llvm.i64 + %6556 = llvm.sdiv %6555, %68 : !llvm.i64 + %6557 = llvm.sub %64, %6556 : !llvm.i64 + %6558 = llvm.select %6553, %6557, %6556 : !llvm.i1, !llvm.i64 + %6559 = llvm.srem %6558, %68 : !llvm.i64 + %6560 = llvm.icmp "slt" %6559, %67 : !llvm.i64 + %6561 = llvm.add %6559, %68 : !llvm.i64 + %6562 = llvm.select %6560, %6561, %6559 : !llvm.i1, !llvm.i64 + %6563 = llvm.mul %6558, %65 : !llvm.i64 + %6564 = llvm.add %6429, %6563 : !llvm.i64 + %6565 = llvm.add %6564, %45 : !llvm.i64 + %6566 = llvm.icmp "slt" %6565, %67 : !llvm.i64 + %6567 = llvm.sub %64, %6565 : !llvm.i64 + %6568 = llvm.select %6566, %6567, %6565 : !llvm.i1, !llvm.i64 + %6569 = llvm.sdiv %6568, %63 : !llvm.i64 + %6570 = llvm.sub %64, %6569 : !llvm.i64 + %6571 = llvm.select %6566, %6570, %6569 : !llvm.i1, !llvm.i64 + %6572 = llvm.mul %6571, %65 : !llvm.i64 + %6573 = llvm.add %6564, %6572 : !llvm.i64 + %6574 = llvm.add %6573, %45 : !llvm.i64 + %6575 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6576 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6577 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6578 = llvm.mul %6562, %6577 : !llvm.i64 + %6579 = llvm.add %6576, %6578 : !llvm.i64 + %6580 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6581 = llvm.mul %67, %6580 : !llvm.i64 + %6582 = llvm.add %6579, %6581 : !llvm.i64 + %6583 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6584 = llvm.mul %6574, %6583 : !llvm.i64 + %6585 = llvm.add %6582, %6584 : !llvm.i64 + %6586 = llvm.getelementptr %6575[%6585] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6587 = llvm.load %6586 : !llvm.ptr> + %6588 = llvm.fadd %6551, %6587 : !llvm.vec<8 x float> + %6589 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6590 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6591 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6592 = llvm.mul %67, %6591 : !llvm.i64 + %6593 = llvm.add %6590, %6592 : !llvm.i64 + %6594 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6595 = llvm.mul %45, %6594 : !llvm.i64 + %6596 = llvm.add %6593, %6595 : !llvm.i64 + %6597 = llvm.getelementptr %6589[%6596] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6588, %6597 : !llvm.ptr> + %6598 = llvm.add %6315, %42 : !llvm.i64 + %6599 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6600 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6601 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6602 = llvm.mul %2345, %6601 : !llvm.i64 + %6603 = llvm.add %6600, %6602 : !llvm.i64 + %6604 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6605 = llvm.mul %6598, %6604 : !llvm.i64 + %6606 = llvm.add %6603, %6605 : !llvm.i64 + %6607 = llvm.getelementptr %6599[%6606] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6608 = llvm.bitcast %6607 : !llvm.ptr to !llvm.ptr> + %6609 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6610 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6611 = llvm.trunc %6598 : !llvm.i64 to !llvm.i32 + %6612 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6613 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6614 = llvm.insertelement %6611, %6612[%6613 : !llvm.i32] : !llvm.vec<8 x i32> + %6615 = llvm.shufflevector %6614, %6612 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6616 = llvm.add %6615, %6610 : !llvm.vec<8 x i32> + %6617 = llvm.trunc %6609 : !llvm.i64 to !llvm.i32 + %6618 = llvm.mlir.undef : 
!llvm.vec<8 x i32> + %6619 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6620 = llvm.insertelement %6617, %6618[%6619 : !llvm.i32] : !llvm.vec<8 x i32> + %6621 = llvm.shufflevector %6620, %6618 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6622 = llvm.icmp "slt" %6616, %6621 : !llvm.vec<8 x i32> + %6623 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6624 = llvm.intr.masked.load %6608, %6622, %6623 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6625 = llvm.add %6347, %63 : !llvm.i64 + %6626 = llvm.icmp "slt" %6625, %67 : !llvm.i64 + %6627 = llvm.sub %64, %6625 : !llvm.i64 + %6628 = llvm.select %6626, %6627, %6625 : !llvm.i1, !llvm.i64 + %6629 = llvm.sdiv %6628, %68 : !llvm.i64 + %6630 = llvm.sub %64, %6629 : !llvm.i64 + %6631 = llvm.select %6626, %6630, %6629 : !llvm.i1, !llvm.i64 + %6632 = llvm.mul %6631, %60 : !llvm.i64 + %6633 = llvm.add %6347, %6632 : !llvm.i64 + %6634 = llvm.add %6633, %63 : !llvm.i64 + %6635 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6636 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6637 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6638 = llvm.mul %6634, %6637 : !llvm.i64 + %6639 = llvm.add %6636, %6638 : !llvm.i64 + %6640 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6641 = llvm.mul %67, %6640 : !llvm.i64 + %6642 = llvm.add %6639, %6641 : !llvm.i64 + %6643 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6644 = llvm.mul %6365, %6643 : !llvm.i64 + %6645 = llvm.add %6642, %6644 : !llvm.i64 + %6646 = llvm.getelementptr %6635[%6645] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6647 = llvm.load %6646 : !llvm.ptr> + %6648 = llvm.fadd %6624, %6647 : !llvm.vec<8 x float> + %6649 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6650 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6651 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6652 = llvm.mul %67, %6651 : !llvm.i64 + %6653 = llvm.add %6650, %6652 : !llvm.i64 + %6654 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6655 = llvm.mul %48, %6654 : !llvm.i64 + %6656 = llvm.add %6653, %6655 : !llvm.i64 + %6657 = llvm.getelementptr %6649[%6656] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6648, %6657 : !llvm.ptr> + %6658 = llvm.add %6315, %43 : !llvm.i64 + %6659 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6660 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6661 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6662 = llvm.mul %2345, %6661 : !llvm.i64 + %6663 = llvm.add %6660, %6662 : !llvm.i64 + %6664 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6665 = llvm.mul %6658, %6664 : !llvm.i64 + %6666 = llvm.add %6663, %6665 : !llvm.i64 + %6667 = llvm.getelementptr %6659[%6666] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6668 = llvm.bitcast %6667 : !llvm.ptr to !llvm.ptr> + %6669 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6670 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6671 = llvm.trunc %6658 : !llvm.i64 to !llvm.i32 + %6672 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6673 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6674 = llvm.insertelement %6671, %6672[%6673 : !llvm.i32] : !llvm.vec<8 x i32> + %6675 = llvm.shufflevector %6674, %6672 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6676 = llvm.add 
%6675, %6670 : !llvm.vec<8 x i32> + %6677 = llvm.trunc %6669 : !llvm.i64 to !llvm.i32 + %6678 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6679 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6680 = llvm.insertelement %6677, %6678[%6679 : !llvm.i32] : !llvm.vec<8 x i32> + %6681 = llvm.shufflevector %6680, %6678 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6682 = llvm.icmp "slt" %6676, %6681 : !llvm.vec<8 x i32> + %6683 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6684 = llvm.intr.masked.load %6668, %6682, %6683 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6685 = llvm.add %5447, %43 : !llvm.i64 + %6686 = llvm.icmp "slt" %6685, %67 : !llvm.i64 + %6687 = llvm.sub %64, %6685 : !llvm.i64 + %6688 = llvm.select %6686, %6687, %6685 : !llvm.i1, !llvm.i64 + %6689 = llvm.sdiv %6688, %68 : !llvm.i64 + %6690 = llvm.sub %64, %6689 : !llvm.i64 + %6691 = llvm.select %6686, %6690, %6689 : !llvm.i1, !llvm.i64 + %6692 = llvm.srem %6691, %68 : !llvm.i64 + %6693 = llvm.icmp "slt" %6692, %67 : !llvm.i64 + %6694 = llvm.add %6692, %68 : !llvm.i64 + %6695 = llvm.select %6693, %6694, %6692 : !llvm.i1, !llvm.i64 + %6696 = llvm.mul %6691, %65 : !llvm.i64 + %6697 = llvm.add %6429, %6696 : !llvm.i64 + %6698 = llvm.add %6697, %52 : !llvm.i64 + %6699 = llvm.icmp "slt" %6698, %67 : !llvm.i64 + %6700 = llvm.sub %64, %6698 : !llvm.i64 + %6701 = llvm.select %6699, %6700, %6698 : !llvm.i1, !llvm.i64 + %6702 = llvm.sdiv %6701, %63 : !llvm.i64 + %6703 = llvm.sub %64, %6702 : !llvm.i64 + %6704 = llvm.select %6699, %6703, %6702 : !llvm.i1, !llvm.i64 + %6705 = llvm.mul %6704, %65 : !llvm.i64 + %6706 = llvm.add %6697, %6705 : !llvm.i64 + %6707 = llvm.add %6706, %52 : !llvm.i64 + %6708 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6709 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6710 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6711 = llvm.mul %6695, %6710 : !llvm.i64 + %6712 = llvm.add %6709, %6711 : !llvm.i64 + %6713 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6714 = llvm.mul %67, %6713 : !llvm.i64 + %6715 = llvm.add %6712, %6714 : !llvm.i64 + %6716 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6717 = llvm.mul %6707, %6716 : !llvm.i64 + %6718 = llvm.add %6715, %6717 : !llvm.i64 + %6719 = llvm.getelementptr %6708[%6718] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6720 = llvm.load %6719 : !llvm.ptr> + %6721 = llvm.fadd %6684, %6720 : !llvm.vec<8 x float> + %6722 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6723 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6724 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6725 = llvm.mul %67, %6724 : !llvm.i64 + %6726 = llvm.add %6723, %6725 : !llvm.i64 + %6727 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6728 = llvm.mul %52, %6727 : !llvm.i64 + %6729 = llvm.add %6726, %6728 : !llvm.i64 + %6730 = llvm.getelementptr %6722[%6729] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6721, %6730 : !llvm.ptr> + %6731 = llvm.add %6315, %44 : !llvm.i64 + %6732 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6733 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6734 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6735 = llvm.mul %2345, %6734 : !llvm.i64 + %6736 = llvm.add %6733, %6735 : !llvm.i64 + %6737 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6738 = llvm.mul %6731, %6737 : 
!llvm.i64 + %6739 = llvm.add %6736, %6738 : !llvm.i64 + %6740 = llvm.getelementptr %6732[%6739] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6741 = llvm.bitcast %6740 : !llvm.ptr to !llvm.ptr> + %6742 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6743 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6744 = llvm.trunc %6731 : !llvm.i64 to !llvm.i32 + %6745 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6746 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6747 = llvm.insertelement %6744, %6745[%6746 : !llvm.i32] : !llvm.vec<8 x i32> + %6748 = llvm.shufflevector %6747, %6745 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6749 = llvm.add %6748, %6743 : !llvm.vec<8 x i32> + %6750 = llvm.trunc %6742 : !llvm.i64 to !llvm.i32 + %6751 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6752 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6753 = llvm.insertelement %6750, %6751[%6752 : !llvm.i32] : !llvm.vec<8 x i32> + %6754 = llvm.shufflevector %6753, %6751 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6755 = llvm.icmp "slt" %6749, %6754 : !llvm.vec<8 x i32> + %6756 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6757 = llvm.intr.masked.load %6741, %6755, %6756 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6758 = llvm.add %6347, %45 : !llvm.i64 + %6759 = llvm.icmp "slt" %6758, %67 : !llvm.i64 + %6760 = llvm.sub %64, %6758 : !llvm.i64 + %6761 = llvm.select %6759, %6760, %6758 : !llvm.i1, !llvm.i64 + %6762 = llvm.sdiv %6761, %68 : !llvm.i64 + %6763 = llvm.sub %64, %6762 : !llvm.i64 + %6764 = llvm.select %6759, %6763, %6762 : !llvm.i1, !llvm.i64 + %6765 = llvm.mul %6764, %60 : !llvm.i64 + %6766 = llvm.add %6347, %6765 : !llvm.i64 + %6767 = llvm.add %6766, %45 : !llvm.i64 + %6768 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6769 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6770 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6771 = llvm.mul %6767, %6770 : !llvm.i64 + %6772 = llvm.add %6769, %6771 : !llvm.i64 + %6773 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6774 = llvm.mul %67, %6773 : !llvm.i64 + %6775 = llvm.add %6772, %6774 : !llvm.i64 + %6776 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6777 = llvm.mul %6365, %6776 : !llvm.i64 + %6778 = llvm.add %6775, %6777 : !llvm.i64 + %6779 = llvm.getelementptr %6768[%6778] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6780 = llvm.load %6779 : !llvm.ptr> + %6781 = llvm.fadd %6757, %6780 : !llvm.vec<8 x float> + %6782 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6783 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6784 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6785 = llvm.mul %67, %6784 : !llvm.i64 + %6786 = llvm.add %6783, %6785 : !llvm.i64 + %6787 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6788 = llvm.mul %56, %6787 : !llvm.i64 + %6789 = llvm.add %6786, %6788 : !llvm.i64 + %6790 = llvm.getelementptr %6782[%6789] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6781, %6790 : !llvm.ptr> + %6791 = llvm.add %6315, %46 : !llvm.i64 + %6792 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6793 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6794 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6795 = llvm.mul %2345, %6794 : !llvm.i64 + %6796 = 
llvm.add %6793, %6795 : !llvm.i64 + %6797 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6798 = llvm.mul %6791, %6797 : !llvm.i64 + %6799 = llvm.add %6796, %6798 : !llvm.i64 + %6800 = llvm.getelementptr %6792[%6799] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6801 = llvm.bitcast %6800 : !llvm.ptr to !llvm.ptr> + %6802 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6803 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6804 = llvm.trunc %6791 : !llvm.i64 to !llvm.i32 + %6805 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6806 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6807 = llvm.insertelement %6804, %6805[%6806 : !llvm.i32] : !llvm.vec<8 x i32> + %6808 = llvm.shufflevector %6807, %6805 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6809 = llvm.add %6808, %6803 : !llvm.vec<8 x i32> + %6810 = llvm.trunc %6802 : !llvm.i64 to !llvm.i32 + %6811 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6812 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6813 = llvm.insertelement %6810, %6811[%6812 : !llvm.i32] : !llvm.vec<8 x i32> + %6814 = llvm.shufflevector %6813, %6811 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6815 = llvm.icmp "slt" %6809, %6814 : !llvm.vec<8 x i32> + %6816 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6817 = llvm.intr.masked.load %6801, %6815, %6816 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6818 = llvm.add %5447, %46 : !llvm.i64 + %6819 = llvm.icmp "slt" %6818, %67 : !llvm.i64 + %6820 = llvm.sub %64, %6818 : !llvm.i64 + %6821 = llvm.select %6819, %6820, %6818 : !llvm.i1, !llvm.i64 + %6822 = llvm.sdiv %6821, %68 : !llvm.i64 + %6823 = llvm.sub %64, %6822 : !llvm.i64 + %6824 = llvm.select %6819, %6823, %6822 : !llvm.i1, !llvm.i64 + %6825 = llvm.srem %6824, %68 : !llvm.i64 + %6826 = llvm.icmp "slt" %6825, %67 : !llvm.i64 + %6827 = llvm.add %6825, %68 : !llvm.i64 + %6828 = llvm.select %6826, %6827, %6825 : !llvm.i1, !llvm.i64 + %6829 = llvm.mul %6824, %65 : !llvm.i64 + %6830 = llvm.add %6429, %6829 : !llvm.i64 + %6831 = llvm.add %6830, %61 : !llvm.i64 + %6832 = llvm.icmp "slt" %6831, %67 : !llvm.i64 + %6833 = llvm.sub %64, %6831 : !llvm.i64 + %6834 = llvm.select %6832, %6833, %6831 : !llvm.i1, !llvm.i64 + %6835 = llvm.sdiv %6834, %63 : !llvm.i64 + %6836 = llvm.sub %64, %6835 : !llvm.i64 + %6837 = llvm.select %6832, %6836, %6835 : !llvm.i1, !llvm.i64 + %6838 = llvm.mul %6837, %65 : !llvm.i64 + %6839 = llvm.add %6830, %6838 : !llvm.i64 + %6840 = llvm.add %6839, %61 : !llvm.i64 + %6841 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6842 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6843 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6844 = llvm.mul %6828, %6843 : !llvm.i64 + %6845 = llvm.add %6842, %6844 : !llvm.i64 + %6846 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6847 = llvm.mul %67, %6846 : !llvm.i64 + %6848 = llvm.add %6845, %6847 : !llvm.i64 + %6849 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6850 = llvm.mul %6840, %6849 : !llvm.i64 + %6851 = llvm.add %6848, %6850 : !llvm.i64 + %6852 = llvm.getelementptr %6841[%6851] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6853 = llvm.load %6852 : !llvm.ptr> + %6854 = llvm.fadd %6817, %6853 : !llvm.vec<8 x float> + %6855 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + 
%6856 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6857 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6858 = llvm.mul %67, %6857 : !llvm.i64 + %6859 = llvm.add %6856, %6858 : !llvm.i64 + %6860 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6861 = llvm.mul %61, %6860 : !llvm.i64 + %6862 = llvm.add %6859, %6861 : !llvm.i64 + %6863 = llvm.getelementptr %6855[%6862] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6854, %6863 : !llvm.ptr> + %6864 = llvm.add %6315, %47 : !llvm.i64 + %6865 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6866 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6867 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6868 = llvm.mul %2345, %6867 : !llvm.i64 + %6869 = llvm.add %6866, %6868 : !llvm.i64 + %6870 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6871 = llvm.mul %6864, %6870 : !llvm.i64 + %6872 = llvm.add %6869, %6871 : !llvm.i64 + %6873 = llvm.getelementptr %6865[%6872] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6874 = llvm.bitcast %6873 : !llvm.ptr to !llvm.ptr> + %6875 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6876 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6877 = llvm.trunc %6864 : !llvm.i64 to !llvm.i32 + %6878 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6879 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6880 = llvm.insertelement %6877, %6878[%6879 : !llvm.i32] : !llvm.vec<8 x i32> + %6881 = llvm.shufflevector %6880, %6878 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6882 = llvm.add %6881, %6876 : !llvm.vec<8 x i32> + %6883 = llvm.trunc %6875 : !llvm.i64 to !llvm.i32 + %6884 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6885 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6886 = llvm.insertelement %6883, %6884[%6885 : !llvm.i32] : !llvm.vec<8 x i32> + %6887 = llvm.shufflevector %6886, %6884 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6888 = llvm.icmp "slt" %6882, %6887 : !llvm.vec<8 x i32> + %6889 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6890 = llvm.intr.masked.load %6874, %6888, %6889 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6891 = llvm.add %6347, %48 : !llvm.i64 + %6892 = llvm.icmp "slt" %6891, %67 : !llvm.i64 + %6893 = llvm.sub %64, %6891 : !llvm.i64 + %6894 = llvm.select %6892, %6893, %6891 : !llvm.i1, !llvm.i64 + %6895 = llvm.sdiv %6894, %68 : !llvm.i64 + %6896 = llvm.sub %64, %6895 : !llvm.i64 + %6897 = llvm.select %6892, %6896, %6895 : !llvm.i1, !llvm.i64 + %6898 = llvm.mul %6897, %60 : !llvm.i64 + %6899 = llvm.add %6347, %6898 : !llvm.i64 + %6900 = llvm.add %6899, %48 : !llvm.i64 + %6901 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6902 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6903 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6904 = llvm.mul %6900, %6903 : !llvm.i64 + %6905 = llvm.add %6902, %6904 : !llvm.i64 + %6906 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6907 = llvm.mul %67, %6906 : !llvm.i64 + %6908 = llvm.add %6905, %6907 : !llvm.i64 + %6909 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6910 = llvm.mul %6365, %6909 : !llvm.i64 + %6911 = llvm.add %6908, %6910 : !llvm.i64 + %6912 = llvm.getelementptr %6901[%6911] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6913 = llvm.load %6912 : !llvm.ptr> + %6914 = llvm.fadd %6890, %6913 : 
!llvm.vec<8 x float> + %6915 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6916 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6917 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6918 = llvm.mul %67, %6917 : !llvm.i64 + %6919 = llvm.add %6916, %6918 : !llvm.i64 + %6920 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6921 = llvm.mul %70, %6920 : !llvm.i64 + %6922 = llvm.add %6919, %6921 : !llvm.i64 + %6923 = llvm.getelementptr %6915[%6922] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6914, %6923 : !llvm.ptr> + %6924 = llvm.add %6315, %49 : !llvm.i64 + %6925 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6926 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6927 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6928 = llvm.mul %2345, %6927 : !llvm.i64 + %6929 = llvm.add %6926, %6928 : !llvm.i64 + %6930 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6931 = llvm.mul %6924, %6930 : !llvm.i64 + %6932 = llvm.add %6929, %6931 : !llvm.i64 + %6933 = llvm.getelementptr %6925[%6932] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %6934 = llvm.bitcast %6933 : !llvm.ptr to !llvm.ptr> + %6935 = llvm.mlir.constant(512 : index) : !llvm.i64 + %6936 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %6937 = llvm.trunc %6924 : !llvm.i64 to !llvm.i32 + %6938 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6939 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6940 = llvm.insertelement %6937, %6938[%6939 : !llvm.i32] : !llvm.vec<8 x i32> + %6941 = llvm.shufflevector %6940, %6938 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6942 = llvm.add %6941, %6936 : !llvm.vec<8 x i32> + %6943 = llvm.trunc %6935 : !llvm.i64 to !llvm.i32 + %6944 = llvm.mlir.undef : !llvm.vec<8 x i32> + %6945 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %6946 = llvm.insertelement %6943, %6944[%6945 : !llvm.i32] : !llvm.vec<8 x i32> + %6947 = llvm.shufflevector %6946, %6944 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %6948 = llvm.icmp "slt" %6942, %6947 : !llvm.vec<8 x i32> + %6949 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %6950 = llvm.intr.masked.load %6934, %6948, %6949 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %6951 = llvm.add %5447, %49 : !llvm.i64 + %6952 = llvm.icmp "slt" %6951, %67 : !llvm.i64 + %6953 = llvm.sub %64, %6951 : !llvm.i64 + %6954 = llvm.select %6952, %6953, %6951 : !llvm.i1, !llvm.i64 + %6955 = llvm.sdiv %6954, %68 : !llvm.i64 + %6956 = llvm.sub %64, %6955 : !llvm.i64 + %6957 = llvm.select %6952, %6956, %6955 : !llvm.i1, !llvm.i64 + %6958 = llvm.srem %6957, %68 : !llvm.i64 + %6959 = llvm.icmp "slt" %6958, %67 : !llvm.i64 + %6960 = llvm.add %6958, %68 : !llvm.i64 + %6961 = llvm.select %6959, %6960, %6958 : !llvm.i1, !llvm.i64 + %6962 = llvm.mul %6957, %65 : !llvm.i64 + %6963 = llvm.add %6429, %6962 : !llvm.i64 + %6964 = llvm.add %6963, %50 : !llvm.i64 + %6965 = llvm.icmp "slt" %6964, %67 : !llvm.i64 + %6966 = llvm.sub %64, %6964 : !llvm.i64 + %6967 = llvm.select %6965, %6966, %6964 : !llvm.i1, !llvm.i64 + %6968 = llvm.sdiv %6967, %63 : !llvm.i64 + %6969 = llvm.sub %64, %6968 : !llvm.i64 + %6970 = llvm.select %6965, %6969, %6968 : !llvm.i1, !llvm.i64 + %6971 = llvm.mul %6970, %65 : !llvm.i64 + %6972 = llvm.add %6963, %6971 : !llvm.i64 + %6973 = llvm.add %6972, 
%50 : !llvm.i64 + %6974 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %6975 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6976 = llvm.mlir.constant(12 : index) : !llvm.i64 + %6977 = llvm.mul %6961, %6976 : !llvm.i64 + %6978 = llvm.add %6975, %6977 : !llvm.i64 + %6979 = llvm.mlir.constant(2 : index) : !llvm.i64 + %6980 = llvm.mul %67, %6979 : !llvm.i64 + %6981 = llvm.add %6978, %6980 : !llvm.i64 + %6982 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6983 = llvm.mul %6973, %6982 : !llvm.i64 + %6984 = llvm.add %6981, %6983 : !llvm.i64 + %6985 = llvm.getelementptr %6974[%6984] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %6986 = llvm.load %6985 : !llvm.ptr> + %6987 = llvm.fadd %6950, %6986 : !llvm.vec<8 x float> + %6988 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %6989 = llvm.mlir.constant(0 : index) : !llvm.i64 + %6990 = llvm.mlir.constant(16 : index) : !llvm.i64 + %6991 = llvm.mul %67, %6990 : !llvm.i64 + %6992 = llvm.add %6989, %6991 : !llvm.i64 + %6993 = llvm.mlir.constant(1 : index) : !llvm.i64 + %6994 = llvm.mul %50, %6993 : !llvm.i64 + %6995 = llvm.add %6992, %6994 : !llvm.i64 + %6996 = llvm.getelementptr %6988[%6995] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %6987, %6996 : !llvm.ptr> + %6997 = llvm.add %6315, %51 : !llvm.i64 + %6998 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %6999 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7000 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7001 = llvm.mul %2345, %7000 : !llvm.i64 + %7002 = llvm.add %6999, %7001 : !llvm.i64 + %7003 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7004 = llvm.mul %6997, %7003 : !llvm.i64 + %7005 = llvm.add %7002, %7004 : !llvm.i64 + %7006 = llvm.getelementptr %6998[%7005] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7007 = llvm.bitcast %7006 : !llvm.ptr to !llvm.ptr> + %7008 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7009 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7010 = llvm.trunc %6997 : !llvm.i64 to !llvm.i32 + %7011 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7012 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7013 = llvm.insertelement %7010, %7011[%7012 : !llvm.i32] : !llvm.vec<8 x i32> + %7014 = llvm.shufflevector %7013, %7011 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7015 = llvm.add %7014, %7009 : !llvm.vec<8 x i32> + %7016 = llvm.trunc %7008 : !llvm.i64 to !llvm.i32 + %7017 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7018 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7019 = llvm.insertelement %7016, %7017[%7018 : !llvm.i32] : !llvm.vec<8 x i32> + %7020 = llvm.shufflevector %7019, %7017 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7021 = llvm.icmp "slt" %7015, %7020 : !llvm.vec<8 x i32> + %7022 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7023 = llvm.intr.masked.load %7007, %7021, %7022 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7024 = llvm.add %6347, %52 : !llvm.i64 + %7025 = llvm.icmp "slt" %7024, %67 : !llvm.i64 + %7026 = llvm.sub %64, %7024 : !llvm.i64 + %7027 = llvm.select %7025, %7026, %7024 : !llvm.i1, !llvm.i64 + %7028 = llvm.sdiv %7027, %68 : !llvm.i64 + %7029 = llvm.sub %64, %7028 : !llvm.i64 + %7030 = llvm.select %7025, %7029, %7028 : !llvm.i1, 
!llvm.i64 + %7031 = llvm.mul %7030, %60 : !llvm.i64 + %7032 = llvm.add %6347, %7031 : !llvm.i64 + %7033 = llvm.add %7032, %52 : !llvm.i64 + %7034 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7035 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7036 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7037 = llvm.mul %7033, %7036 : !llvm.i64 + %7038 = llvm.add %7035, %7037 : !llvm.i64 + %7039 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7040 = llvm.mul %67, %7039 : !llvm.i64 + %7041 = llvm.add %7038, %7040 : !llvm.i64 + %7042 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7043 = llvm.mul %6365, %7042 : !llvm.i64 + %7044 = llvm.add %7041, %7043 : !llvm.i64 + %7045 = llvm.getelementptr %7034[%7044] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7046 = llvm.load %7045 : !llvm.ptr> + %7047 = llvm.fadd %7023, %7046 : !llvm.vec<8 x float> + %7048 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7049 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7050 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7051 = llvm.mul %67, %7050 : !llvm.i64 + %7052 = llvm.add %7049, %7051 : !llvm.i64 + %7053 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7054 = llvm.mul %33, %7053 : !llvm.i64 + %7055 = llvm.add %7052, %7054 : !llvm.i64 + %7056 = llvm.getelementptr %7048[%7055] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7047, %7056 : !llvm.ptr> + %7057 = llvm.add %6315, %53 : !llvm.i64 + %7058 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7059 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7060 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7061 = llvm.mul %2345, %7060 : !llvm.i64 + %7062 = llvm.add %7059, %7061 : !llvm.i64 + %7063 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7064 = llvm.mul %7057, %7063 : !llvm.i64 + %7065 = llvm.add %7062, %7064 : !llvm.i64 + %7066 = llvm.getelementptr %7058[%7065] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7067 = llvm.bitcast %7066 : !llvm.ptr to !llvm.ptr> + %7068 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7069 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7070 = llvm.trunc %7057 : !llvm.i64 to !llvm.i32 + %7071 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7072 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7073 = llvm.insertelement %7070, %7071[%7072 : !llvm.i32] : !llvm.vec<8 x i32> + %7074 = llvm.shufflevector %7073, %7071 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7075 = llvm.add %7074, %7069 : !llvm.vec<8 x i32> + %7076 = llvm.trunc %7068 : !llvm.i64 to !llvm.i32 + %7077 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7078 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7079 = llvm.insertelement %7076, %7077[%7078 : !llvm.i32] : !llvm.vec<8 x i32> + %7080 = llvm.shufflevector %7079, %7077 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7081 = llvm.icmp "slt" %7075, %7080 : !llvm.vec<8 x i32> + %7082 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7083 = llvm.intr.masked.load %7067, %7081, %7082 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7084 = llvm.add %5447, %53 : !llvm.i64 + %7085 = llvm.icmp "slt" %7084, %67 : !llvm.i64 + %7086 = llvm.sub %64, %7084 : !llvm.i64 + %7087 = llvm.select %7085, %7086, %7084 : !llvm.i1, !llvm.i64 + %7088 = llvm.sdiv 
%7087, %68 : !llvm.i64 + %7089 = llvm.sub %64, %7088 : !llvm.i64 + %7090 = llvm.select %7085, %7089, %7088 : !llvm.i1, !llvm.i64 + %7091 = llvm.srem %7090, %68 : !llvm.i64 + %7092 = llvm.icmp "slt" %7091, %67 : !llvm.i64 + %7093 = llvm.add %7091, %68 : !llvm.i64 + %7094 = llvm.select %7092, %7093, %7091 : !llvm.i1, !llvm.i64 + %7095 = llvm.mul %7090, %65 : !llvm.i64 + %7096 = llvm.add %6429, %7095 : !llvm.i64 + %7097 = llvm.add %7096, %54 : !llvm.i64 + %7098 = llvm.icmp "slt" %7097, %67 : !llvm.i64 + %7099 = llvm.sub %64, %7097 : !llvm.i64 + %7100 = llvm.select %7098, %7099, %7097 : !llvm.i1, !llvm.i64 + %7101 = llvm.sdiv %7100, %63 : !llvm.i64 + %7102 = llvm.sub %64, %7101 : !llvm.i64 + %7103 = llvm.select %7098, %7102, %7101 : !llvm.i1, !llvm.i64 + %7104 = llvm.mul %7103, %65 : !llvm.i64 + %7105 = llvm.add %7096, %7104 : !llvm.i64 + %7106 = llvm.add %7105, %54 : !llvm.i64 + %7107 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7108 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7109 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7110 = llvm.mul %7094, %7109 : !llvm.i64 + %7111 = llvm.add %7108, %7110 : !llvm.i64 + %7112 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7113 = llvm.mul %67, %7112 : !llvm.i64 + %7114 = llvm.add %7111, %7113 : !llvm.i64 + %7115 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7116 = llvm.mul %7106, %7115 : !llvm.i64 + %7117 = llvm.add %7114, %7116 : !llvm.i64 + %7118 = llvm.getelementptr %7107[%7117] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7119 = llvm.load %7118 : !llvm.ptr> + %7120 = llvm.fadd %7083, %7119 : !llvm.vec<8 x float> + %7121 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7122 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7123 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7124 = llvm.mul %67, %7123 : !llvm.i64 + %7125 = llvm.add %7122, %7124 : !llvm.i64 + %7126 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7127 = llvm.mul %54, %7126 : !llvm.i64 + %7128 = llvm.add %7125, %7127 : !llvm.i64 + %7129 = llvm.getelementptr %7121[%7128] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7120, %7129 : !llvm.ptr> + %7130 = llvm.add %6315, %55 : !llvm.i64 + %7131 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7132 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7133 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7134 = llvm.mul %2345, %7133 : !llvm.i64 + %7135 = llvm.add %7132, %7134 : !llvm.i64 + %7136 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7137 = llvm.mul %7130, %7136 : !llvm.i64 + %7138 = llvm.add %7135, %7137 : !llvm.i64 + %7139 = llvm.getelementptr %7131[%7138] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7140 = llvm.bitcast %7139 : !llvm.ptr to !llvm.ptr> + %7141 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7142 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7143 = llvm.trunc %7130 : !llvm.i64 to !llvm.i32 + %7144 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7145 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7146 = llvm.insertelement %7143, %7144[%7145 : !llvm.i32] : !llvm.vec<8 x i32> + %7147 = llvm.shufflevector %7146, %7144 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7148 = llvm.add %7147, %7142 : !llvm.vec<8 x i32> + %7149 = llvm.trunc %7141 : !llvm.i64 to !llvm.i32 + %7150 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7151 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7152 = 
llvm.insertelement %7149, %7150[%7151 : !llvm.i32] : !llvm.vec<8 x i32> + %7153 = llvm.shufflevector %7152, %7150 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7154 = llvm.icmp "slt" %7148, %7153 : !llvm.vec<8 x i32> + %7155 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7156 = llvm.intr.masked.load %7140, %7154, %7155 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7157 = llvm.add %6347, %56 : !llvm.i64 + %7158 = llvm.icmp "slt" %7157, %67 : !llvm.i64 + %7159 = llvm.sub %64, %7157 : !llvm.i64 + %7160 = llvm.select %7158, %7159, %7157 : !llvm.i1, !llvm.i64 + %7161 = llvm.sdiv %7160, %68 : !llvm.i64 + %7162 = llvm.sub %64, %7161 : !llvm.i64 + %7163 = llvm.select %7158, %7162, %7161 : !llvm.i1, !llvm.i64 + %7164 = llvm.mul %7163, %60 : !llvm.i64 + %7165 = llvm.add %6347, %7164 : !llvm.i64 + %7166 = llvm.add %7165, %56 : !llvm.i64 + %7167 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7168 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7169 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7170 = llvm.mul %7166, %7169 : !llvm.i64 + %7171 = llvm.add %7168, %7170 : !llvm.i64 + %7172 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7173 = llvm.mul %67, %7172 : !llvm.i64 + %7174 = llvm.add %7171, %7173 : !llvm.i64 + %7175 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7176 = llvm.mul %6365, %7175 : !llvm.i64 + %7177 = llvm.add %7174, %7176 : !llvm.i64 + %7178 = llvm.getelementptr %7167[%7177] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7179 = llvm.load %7178 : !llvm.ptr> + %7180 = llvm.fadd %7156, %7179 : !llvm.vec<8 x float> + %7181 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7182 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7183 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7184 = llvm.mul %67, %7183 : !llvm.i64 + %7185 = llvm.add %7182, %7184 : !llvm.i64 + %7186 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7187 = llvm.mul %34, %7186 : !llvm.i64 + %7188 = llvm.add %7185, %7187 : !llvm.i64 + %7189 = llvm.getelementptr %7181[%7188] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7180, %7189 : !llvm.ptr> + %7190 = llvm.add %6315, %57 : !llvm.i64 + %7191 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7192 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7193 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7194 = llvm.mul %2345, %7193 : !llvm.i64 + %7195 = llvm.add %7192, %7194 : !llvm.i64 + %7196 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7197 = llvm.mul %7190, %7196 : !llvm.i64 + %7198 = llvm.add %7195, %7197 : !llvm.i64 + %7199 = llvm.getelementptr %7191[%7198] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7200 = llvm.bitcast %7199 : !llvm.ptr to !llvm.ptr> + %7201 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7202 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7203 = llvm.trunc %7190 : !llvm.i64 to !llvm.i32 + %7204 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7205 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7206 = llvm.insertelement %7203, %7204[%7205 : !llvm.i32] : !llvm.vec<8 x i32> + %7207 = llvm.shufflevector %7206, %7204 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7208 = llvm.add %7207, %7202 : !llvm.vec<8 x i32> + %7209 = llvm.trunc %7201 : !llvm.i64 to 
!llvm.i32 + %7210 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7211 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7212 = llvm.insertelement %7209, %7210[%7211 : !llvm.i32] : !llvm.vec<8 x i32> + %7213 = llvm.shufflevector %7212, %7210 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7214 = llvm.icmp "slt" %7208, %7213 : !llvm.vec<8 x i32> + %7215 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7216 = llvm.intr.masked.load %7200, %7214, %7215 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7217 = llvm.add %5447, %57 : !llvm.i64 + %7218 = llvm.icmp "slt" %7217, %67 : !llvm.i64 + %7219 = llvm.sub %64, %7217 : !llvm.i64 + %7220 = llvm.select %7218, %7219, %7217 : !llvm.i1, !llvm.i64 + %7221 = llvm.sdiv %7220, %68 : !llvm.i64 + %7222 = llvm.sub %64, %7221 : !llvm.i64 + %7223 = llvm.select %7218, %7222, %7221 : !llvm.i1, !llvm.i64 + %7224 = llvm.srem %7223, %68 : !llvm.i64 + %7225 = llvm.icmp "slt" %7224, %67 : !llvm.i64 + %7226 = llvm.add %7224, %68 : !llvm.i64 + %7227 = llvm.select %7225, %7226, %7224 : !llvm.i1, !llvm.i64 + %7228 = llvm.mul %7223, %65 : !llvm.i64 + %7229 = llvm.add %6429, %7228 : !llvm.i64 + %7230 = llvm.add %7229, %58 : !llvm.i64 + %7231 = llvm.icmp "slt" %7230, %67 : !llvm.i64 + %7232 = llvm.sub %64, %7230 : !llvm.i64 + %7233 = llvm.select %7231, %7232, %7230 : !llvm.i1, !llvm.i64 + %7234 = llvm.sdiv %7233, %63 : !llvm.i64 + %7235 = llvm.sub %64, %7234 : !llvm.i64 + %7236 = llvm.select %7231, %7235, %7234 : !llvm.i1, !llvm.i64 + %7237 = llvm.mul %7236, %65 : !llvm.i64 + %7238 = llvm.add %7229, %7237 : !llvm.i64 + %7239 = llvm.add %7238, %58 : !llvm.i64 + %7240 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7241 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7242 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7243 = llvm.mul %7227, %7242 : !llvm.i64 + %7244 = llvm.add %7241, %7243 : !llvm.i64 + %7245 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7246 = llvm.mul %67, %7245 : !llvm.i64 + %7247 = llvm.add %7244, %7246 : !llvm.i64 + %7248 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7249 = llvm.mul %7239, %7248 : !llvm.i64 + %7250 = llvm.add %7247, %7249 : !llvm.i64 + %7251 = llvm.getelementptr %7240[%7250] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7252 = llvm.load %7251 : !llvm.ptr> + %7253 = llvm.fadd %7216, %7252 : !llvm.vec<8 x float> + %7254 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7255 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7256 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7257 = llvm.mul %67, %7256 : !llvm.i64 + %7258 = llvm.add %7255, %7257 : !llvm.i64 + %7259 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7260 = llvm.mul %58, %7259 : !llvm.i64 + %7261 = llvm.add %7258, %7260 : !llvm.i64 + %7262 = llvm.getelementptr %7254[%7261] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7253, %7262 : !llvm.ptr> + %7263 = llvm.add %6315, %59 : !llvm.i64 + %7264 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7265 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7266 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7267 = llvm.mul %2345, %7266 : !llvm.i64 + %7268 = llvm.add %7265, %7267 : !llvm.i64 + %7269 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7270 = llvm.mul %7263, %7269 : !llvm.i64 + %7271 = llvm.add %7268, %7270 : !llvm.i64 + %7272 = 
llvm.getelementptr %7264[%7271] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7273 = llvm.bitcast %7272 : !llvm.ptr to !llvm.ptr> + %7274 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7275 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7276 = llvm.trunc %7263 : !llvm.i64 to !llvm.i32 + %7277 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7278 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7279 = llvm.insertelement %7276, %7277[%7278 : !llvm.i32] : !llvm.vec<8 x i32> + %7280 = llvm.shufflevector %7279, %7277 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7281 = llvm.add %7280, %7275 : !llvm.vec<8 x i32> + %7282 = llvm.trunc %7274 : !llvm.i64 to !llvm.i32 + %7283 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7284 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7285 = llvm.insertelement %7282, %7283[%7284 : !llvm.i32] : !llvm.vec<8 x i32> + %7286 = llvm.shufflevector %7285, %7283 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7287 = llvm.icmp "slt" %7281, %7286 : !llvm.vec<8 x i32> + %7288 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7289 = llvm.intr.masked.load %7273, %7287, %7288 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7290 = llvm.add %6347, %61 : !llvm.i64 + %7291 = llvm.icmp "slt" %7290, %67 : !llvm.i64 + %7292 = llvm.sub %64, %7290 : !llvm.i64 + %7293 = llvm.select %7291, %7292, %7290 : !llvm.i1, !llvm.i64 + %7294 = llvm.sdiv %7293, %68 : !llvm.i64 + %7295 = llvm.sub %64, %7294 : !llvm.i64 + %7296 = llvm.select %7291, %7295, %7294 : !llvm.i1, !llvm.i64 + %7297 = llvm.mul %7296, %60 : !llvm.i64 + %7298 = llvm.add %6347, %7297 : !llvm.i64 + %7299 = llvm.add %7298, %61 : !llvm.i64 + %7300 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7301 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7302 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7303 = llvm.mul %7299, %7302 : !llvm.i64 + %7304 = llvm.add %7301, %7303 : !llvm.i64 + %7305 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7306 = llvm.mul %67, %7305 : !llvm.i64 + %7307 = llvm.add %7304, %7306 : !llvm.i64 + %7308 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7309 = llvm.mul %6365, %7308 : !llvm.i64 + %7310 = llvm.add %7307, %7309 : !llvm.i64 + %7311 = llvm.getelementptr %7300[%7310] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7312 = llvm.load %7311 : !llvm.ptr> + %7313 = llvm.fadd %7289, %7312 : !llvm.vec<8 x float> + %7314 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7315 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7316 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7317 = llvm.mul %67, %7316 : !llvm.i64 + %7318 = llvm.add %7315, %7317 : !llvm.i64 + %7319 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7320 = llvm.mul %35, %7319 : !llvm.i64 + %7321 = llvm.add %7318, %7320 : !llvm.i64 + %7322 = llvm.getelementptr %7314[%7321] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7313, %7322 : !llvm.ptr> + %7323 = llvm.add %6315, %62 : !llvm.i64 + %7324 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7325 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7326 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7327 = llvm.mul %2345, %7326 : !llvm.i64 + %7328 = llvm.add %7325, %7327 : !llvm.i64 + %7329 = llvm.mlir.constant(1 : 
index) : !llvm.i64 + %7330 = llvm.mul %7323, %7329 : !llvm.i64 + %7331 = llvm.add %7328, %7330 : !llvm.i64 + %7332 = llvm.getelementptr %7324[%7331] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7333 = llvm.bitcast %7332 : !llvm.ptr to !llvm.ptr> + %7334 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7335 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7336 = llvm.trunc %7323 : !llvm.i64 to !llvm.i32 + %7337 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7338 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7339 = llvm.insertelement %7336, %7337[%7338 : !llvm.i32] : !llvm.vec<8 x i32> + %7340 = llvm.shufflevector %7339, %7337 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7341 = llvm.add %7340, %7335 : !llvm.vec<8 x i32> + %7342 = llvm.trunc %7334 : !llvm.i64 to !llvm.i32 + %7343 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7344 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7345 = llvm.insertelement %7342, %7343[%7344 : !llvm.i32] : !llvm.vec<8 x i32> + %7346 = llvm.shufflevector %7345, %7343 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7347 = llvm.icmp "slt" %7341, %7346 : !llvm.vec<8 x i32> + %7348 = llvm.mlir.constant(dense<0.000000e+00> : vector<8xf32>) : !llvm.vec<8 x float> + %7349 = llvm.intr.masked.load %7333, %7347, %7348 {alignment = 4 : i32} : (!llvm.ptr>, !llvm.vec<8 x i1>, !llvm.vec<8 x float>) -> !llvm.vec<8 x float> + %7350 = llvm.add %5447, %62 : !llvm.i64 + %7351 = llvm.icmp "slt" %7350, %67 : !llvm.i64 + %7352 = llvm.sub %64, %7350 : !llvm.i64 + %7353 = llvm.select %7351, %7352, %7350 : !llvm.i1, !llvm.i64 + %7354 = llvm.sdiv %7353, %68 : !llvm.i64 + %7355 = llvm.sub %64, %7354 : !llvm.i64 + %7356 = llvm.select %7351, %7355, %7354 : !llvm.i1, !llvm.i64 + %7357 = llvm.srem %7356, %68 : !llvm.i64 + %7358 = llvm.icmp "slt" %7357, %67 : !llvm.i64 + %7359 = llvm.add %7357, %68 : !llvm.i64 + %7360 = llvm.select %7358, %7359, %7357 : !llvm.i1, !llvm.i64 + %7361 = llvm.mul %7356, %65 : !llvm.i64 + %7362 = llvm.add %6429, %7361 : !llvm.i64 + %7363 = llvm.add %7362, %66 : !llvm.i64 + %7364 = llvm.icmp "slt" %7363, %67 : !llvm.i64 + %7365 = llvm.sub %64, %7363 : !llvm.i64 + %7366 = llvm.select %7364, %7365, %7363 : !llvm.i1, !llvm.i64 + %7367 = llvm.sdiv %7366, %63 : !llvm.i64 + %7368 = llvm.sub %64, %7367 : !llvm.i64 + %7369 = llvm.select %7364, %7368, %7367 : !llvm.i1, !llvm.i64 + %7370 = llvm.mul %7369, %65 : !llvm.i64 + %7371 = llvm.add %7362, %7370 : !llvm.i64 + %7372 = llvm.add %7371, %66 : !llvm.i64 + %7373 = llvm.extractvalue %130[1] : !llvm.struct<(ptr>, ptr>, i64, array<3 x i64>, array<3 x i64>)> + %7374 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7375 = llvm.mlir.constant(12 : index) : !llvm.i64 + %7376 = llvm.mul %7360, %7375 : !llvm.i64 + %7377 = llvm.add %7374, %7376 : !llvm.i64 + %7378 = llvm.mlir.constant(2 : index) : !llvm.i64 + %7379 = llvm.mul %67, %7378 : !llvm.i64 + %7380 = llvm.add %7377, %7379 : !llvm.i64 + %7381 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7382 = llvm.mul %7372, %7381 : !llvm.i64 + %7383 = llvm.add %7380, %7382 : !llvm.i64 + %7384 = llvm.getelementptr %7373[%7383] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7385 = llvm.load %7384 : !llvm.ptr> + %7386 = llvm.fadd %7349, %7385 : !llvm.vec<8 x float> + %7387 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7388 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7389 = 
llvm.mlir.constant(16 : index) : !llvm.i64 + %7390 = llvm.mul %67, %7389 : !llvm.i64 + %7391 = llvm.add %7388, %7390 : !llvm.i64 + %7392 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7393 = llvm.mul %66, %7392 : !llvm.i64 + %7394 = llvm.add %7391, %7393 : !llvm.i64 + %7395 = llvm.getelementptr %7387[%7394] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + llvm.store %7386, %7395 : !llvm.ptr> + llvm.br ^bb48(%67 : !llvm.i64) + ^bb48(%7396: !llvm.i64): // 2 preds: ^bb47, ^bb49 + %7397 = llvm.icmp "slt" %7396, %68 : !llvm.i64 + llvm.cond_br %7397, ^bb49, ^bb46 + ^bb49: // pred: ^bb48 + %7398 = llvm.mul %7396, %70 : !llvm.i64 + %7399 = llvm.add %6315, %7398 : !llvm.i64 + %7400 = llvm.extractvalue %110[1] : !llvm.struct<(ptr>, ptr>, i64, array<2 x i64>, array<2 x i64>)> + %7401 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7402 = llvm.mlir.constant(16 : index) : !llvm.i64 + %7403 = llvm.mul %67, %7402 : !llvm.i64 + %7404 = llvm.add %7401, %7403 : !llvm.i64 + %7405 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7406 = llvm.mul %7396, %7405 : !llvm.i64 + %7407 = llvm.add %7404, %7406 : !llvm.i64 + %7408 = llvm.getelementptr %7400[%7407] : (!llvm.ptr>, !llvm.i64) -> !llvm.ptr> + %7409 = llvm.load %7408 : !llvm.ptr> + %7410 = llvm.extractvalue %23[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7411 = llvm.mlir.constant(0 : index) : !llvm.i64 + %7412 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7413 = llvm.mul %2345, %7412 : !llvm.i64 + %7414 = llvm.add %7411, %7413 : !llvm.i64 + %7415 = llvm.mlir.constant(1 : index) : !llvm.i64 + %7416 = llvm.mul %7399, %7415 : !llvm.i64 + %7417 = llvm.add %7414, %7416 : !llvm.i64 + %7418 = llvm.getelementptr %7410[%7417] : (!llvm.ptr, !llvm.i64) -> !llvm.ptr + %7419 = llvm.bitcast %7418 : !llvm.ptr to !llvm.ptr> + %7420 = llvm.mlir.constant(512 : index) : !llvm.i64 + %7421 = llvm.mlir.constant(dense<[0, 1, 2, 3, 4, 5, 6, 7]> : vector<8xi32>) : !llvm.vec<8 x i32> + %7422 = llvm.trunc %7399 : !llvm.i64 to !llvm.i32 + %7423 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7424 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7425 = llvm.insertelement %7422, %7423[%7424 : !llvm.i32] : !llvm.vec<8 x i32> + %7426 = llvm.shufflevector %7425, %7423 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7427 = llvm.add %7426, %7421 : !llvm.vec<8 x i32> + %7428 = llvm.trunc %7420 : !llvm.i64 to !llvm.i32 + %7429 = llvm.mlir.undef : !llvm.vec<8 x i32> + %7430 = llvm.mlir.constant(0 : i32) : !llvm.i32 + %7431 = llvm.insertelement %7428, %7429[%7430 : !llvm.i32] : !llvm.vec<8 x i32> + %7432 = llvm.shufflevector %7431, %7429 [0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm.vec<8 x i32>, !llvm.vec<8 x i32> + %7433 = llvm.icmp "slt" %7427, %7432 : !llvm.vec<8 x i32> + llvm.intr.masked.store %7409, %7419, %7433 {alignment = 4 : i32} : !llvm.vec<8 x float>, !llvm.vec<8 x i1> into !llvm.ptr> + %7434 = llvm.add %7396, %69 : !llvm.i64 + llvm.br ^bb48(%7434 : !llvm.i64) + ^bb50: // pred: ^bb46 + %7435 = llvm.add %5447, %39 : !llvm.i64 + llvm.br ^bb41(%7435 : !llvm.i64) + ^bb51: // pred: ^bb41 + %7436 = llvm.add %2345, %69 : !llvm.i64 + llvm.br ^bb12(%7436 : !llvm.i64) + ^bb52: // pred: ^bb12 + %7437 = llvm.add %151, %38 : !llvm.i64 + llvm.br ^bb1(%7437 : !llvm.i64) + ^bb53: // pred: ^bb1 + llvm.return + } + llvm.func @optimized_matmul_py_4a6286d9(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: !llvm.ptr) attributes {exec_target = 0 : i64, accv.base_name = "optimized_matmul_py", 
accv.emit_header_decl, accv.emit_raw_pointer_api} { + %0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %2 = llvm.insertvalue %arg0, %1[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %3 = llvm.mlir.constant(0 : index) : !llvm.i64 + %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %5 = llvm.mlir.constant(784 : index) : !llvm.i64 + %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %7 = llvm.mlir.constant(128 : index) : !llvm.i64 + %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %9 = llvm.mlir.constant(128 : index) : !llvm.i64 + %10 = llvm.insertvalue %9, %8[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %11 = llvm.mlir.constant(1 : index) : !llvm.i64 + %12 = llvm.insertvalue %11, %10[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %13 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %14 = llvm.insertvalue %arg1, %13[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %15 = llvm.insertvalue %arg1, %14[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %16 = llvm.mlir.constant(0 : index) : !llvm.i64 + %17 = llvm.insertvalue %16, %15[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %18 = llvm.mlir.constant(128 : index) : !llvm.i64 + %19 = llvm.insertvalue %18, %17[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %20 = llvm.mlir.constant(512 : index) : !llvm.i64 + %21 = llvm.insertvalue %20, %19[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %22 = llvm.mlir.constant(512 : index) : !llvm.i64 + %23 = llvm.insertvalue %22, %21[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %24 = llvm.mlir.constant(1 : index) : !llvm.i64 + %25 = llvm.insertvalue %24, %23[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %26 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %27 = llvm.insertvalue %arg2, %26[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %28 = llvm.insertvalue %arg2, %27[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %29 = llvm.mlir.constant(0 : index) : !llvm.i64 + %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %31 = llvm.mlir.constant(784 : index) : !llvm.i64 + %32 = llvm.insertvalue %31, %30[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %33 = llvm.mlir.constant(512 : index) : !llvm.i64 + %34 = llvm.insertvalue %33, %32[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %35 = llvm.mlir.constant(512 : index) : !llvm.i64 + %36 = llvm.insertvalue %35, %34[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %37 = llvm.mlir.constant(1 : index) : !llvm.i64 + %38 = llvm.insertvalue %37, %36[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %39 = llvm.extractvalue %12[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %40 = llvm.extractvalue %12[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %41 = llvm.extractvalue %12[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %42 = llvm.extractvalue 
%12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %43 = llvm.extractvalue %12[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %44 = llvm.extractvalue %12[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %45 = llvm.extractvalue %12[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %46 = llvm.extractvalue %25[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %47 = llvm.extractvalue %25[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %48 = llvm.extractvalue %25[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %49 = llvm.extractvalue %25[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %50 = llvm.extractvalue %25[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %51 = llvm.extractvalue %25[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %52 = llvm.extractvalue %25[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %53 = llvm.extractvalue %38[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %54 = llvm.extractvalue %38[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %55 = llvm.extractvalue %38[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %56 = llvm.extractvalue %38[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %57 = llvm.extractvalue %38[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %58 = llvm.extractvalue %38[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + %59 = llvm.extractvalue %38[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> + llvm.call @optimized_matmul_py_4a6286d9_impl_17630232307017152746(%39, %40, %41, %42, %43, %44, %45, %46, %47, %48, %49, %50, %51, %52, %53, %54, %55, %56, %57, %58, %59) : (!llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.ptr, !llvm.ptr, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64) -> () + llvm.return + } +} diff --git a/Tutorials/optimized_matmul/optimized_matmul_generator.py b/Tutorials/optimized_matmul/optimized_matmul_generator.py new file mode 100644 index 00000000..58a78cd1 --- /dev/null +++ b/Tutorials/optimized_matmul/optimized_matmul_generator.py @@ -0,0 +1,70 @@ +#!/usr/bin/env python3 +# Accera Optimized MatMul sample: generator + +import accera as acc + +# Define our matrix sizes. These represent an arbitraily chosen layer in a +# Resnet-50 model. +M = 784 +N = 512 +K = 128 + +A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K)) +B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N)) +C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N)) + +# Define the loop nest +nest = acc.Nest(shape=(M, N, K)) + +# Get the loop nest indices +i, j, k = nest.get_indices() + +# Define the loop nest logic +@nest.iteration_logic +def _(): + C[i, j] += A[i, k] * B[k, j] + +schedule = nest.create_schedule() + +# Define constants used in the schedule and plan. The values are +# either hardware target characteristics or can be found through auto-tuning. 
+tile_size_i = 6 +tile_size_j = 128 +tile_size_k = 128 +inner_dim_unroll = 4 +num_rows_in_kernel = 6 + +# Create a CPU target which will define the hardware target characteristics +target = acc.Target(category=acc.Target.Category.CPU) + +# Transform the iteration space +ii = schedule.split(i, tile_size_i) +jj = schedule.split(j, tile_size_j) +kk = schedule.split(k, tile_size_k) + +kkk = schedule.split(kk, inner_dim_unroll) +iii = schedule.split(ii, num_rows_in_kernel) +jjj = schedule.split(jj, (target.vector_bytes // 4) * 2) # There are 2 vfma execution units, each holding (target.vector_bytes // 4) 32-bit float elements +jjjj = schedule.split(jjj, target.vector_bytes // 4) # Each SIMD register holds (target.vector_bytes // 4) 32-bit float elements + +schedule.reorder(j, k, i, jj, ii, kk, kkk, iii, jjj, jjjj) + +plan = schedule.create_plan(target) + +# Add caching +# Cache the B array by prefetching and packing the memory footprint along slices of the jj dimension. +plan.cache(B, jj) +# Cache the C array along slices of jj dimension. Since the C array is the output, its footprint is +# the size of the kernel. If the kernel is small enough, Accera will use registers for this +# accumulation before writing these values back to C. +plan.cache(C, ii) + +# Kernelize the inner dimensions, which applies unroll and vectorize transformations +plan.kernelize(unroll_indices=[jjj, iii, kkk], vectorize_indices=jjjj) + +# Create a package and add a function to the package based on the plan +package = acc.Package() +package.add(plan, args=(A, B, C), base_name="optimized_matmul_py") + +# Build a statically-linked HAT package to be consumed by the C++ runner +package.build(name="optimized_matmul", format=acc.Package.Format.HAT_STATIC) diff --git a/Tutorials/optimized_matmul/optimized_matmul_runner.cpp b/Tutorials/optimized_matmul/optimized_matmul_runner.cpp new file mode 100644 index 00000000..bbe96800 --- /dev/null +++ b/Tutorials/optimized_matmul/optimized_matmul_runner.cpp @@ -0,0 +1,37 @@ +#include +#include + +// Include the HAT file that declares our MatMul function +#include "optimized_matmul.hat" + +#define M 784 +#define N 512 +#define K 128 + +int main(int argc, const char** argv) +{ + // Prepare our matrices (using the heap for large matrices) + float* A = new float[M*K]; + float* B = new float[K*N]; + float* C = new float[M*N]; + + // Fill with data + std::fill_n(A, M*K, 2.0f); + std::fill_n(B, K*N, 3.0f); + std::fill_n(C, M*N, 0.42f); + + printf("Calling MatMul M=%d, K=%d, N=%d\n", M, K, N); + optimized_matmul_py(A, B, C); + + printf("Result (first 10 elements): "); + for (int i = 0; i < 10; ++i) + { + printf("%f ", C[i]); + } + printf("\n"); + + delete[] A; + delete[] B; + delete[] C; + return 0; +} diff --git a/assets/images/favicon.png b/assets/images/favicon.png new file mode 100644 index 00000000..1cf13b9f Binary files /dev/null and b/assets/images/favicon.png differ diff --git a/assets/javascripts/bundle.19047be9.min.js b/assets/javascripts/bundle.19047be9.min.js new file mode 100644 index 00000000..0e09ba9a --- /dev/null +++ b/assets/javascripts/bundle.19047be9.min.js @@ -0,0 +1,29 @@ +"use strict";(()=>{var Ri=Object.create;var gr=Object.defineProperty;var ki=Object.getOwnPropertyDescriptor;var Hi=Object.getOwnPropertyNames,Ht=Object.getOwnPropertySymbols,Pi=Object.getPrototypeOf,yr=Object.prototype.hasOwnProperty,on=Object.prototype.propertyIsEnumerable;var nn=(e,t,r)=>t in e?gr(e,t,{enumerable:!0,configurable:!0,writable:!0,value:r}):e[t]=r,P=(e,t)=>{for(var r in 
t||(t={}))yr.call(t,r)&&nn(e,r,t[r]);if(Ht)for(var r of Ht(t))on.call(t,r)&&nn(e,r,t[r]);return e};var an=(e,t)=>{var r={};for(var n in e)yr.call(e,n)&&t.indexOf(n)<0&&(r[n]=e[n]);if(e!=null&&Ht)for(var n of Ht(e))t.indexOf(n)<0&&on.call(e,n)&&(r[n]=e[n]);return r};var Pt=(e,t)=>()=>(t||e((t={exports:{}}).exports,t),t.exports);var $i=(e,t,r,n)=>{if(t&&typeof t=="object"||typeof t=="function")for(let o of Hi(t))!yr.call(e,o)&&o!==r&&gr(e,o,{get:()=>t[o],enumerable:!(n=ki(t,o))||n.enumerable});return e};var yt=(e,t,r)=>(r=e!=null?Ri(Pi(e)):{},$i(t||!e||!e.__esModule?gr(r,"default",{value:e,enumerable:!0}):r,e));var cn=Pt((xr,sn)=>{(function(e,t){typeof xr=="object"&&typeof sn!="undefined"?t():typeof define=="function"&&define.amd?define(t):t()})(xr,function(){"use strict";function e(r){var n=!0,o=!1,i=null,s={text:!0,search:!0,url:!0,tel:!0,email:!0,password:!0,number:!0,date:!0,month:!0,week:!0,time:!0,datetime:!0,"datetime-local":!0};function a(T){return!!(T&&T!==document&&T.nodeName!=="HTML"&&T.nodeName!=="BODY"&&"classList"in T&&"contains"in T.classList)}function c(T){var Qe=T.type,De=T.tagName;return!!(De==="INPUT"&&s[Qe]&&!T.readOnly||De==="TEXTAREA"&&!T.readOnly||T.isContentEditable)}function f(T){T.classList.contains("focus-visible")||(T.classList.add("focus-visible"),T.setAttribute("data-focus-visible-added",""))}function u(T){T.hasAttribute("data-focus-visible-added")&&(T.classList.remove("focus-visible"),T.removeAttribute("data-focus-visible-added"))}function p(T){T.metaKey||T.altKey||T.ctrlKey||(a(r.activeElement)&&f(r.activeElement),n=!0)}function m(T){n=!1}function d(T){a(T.target)&&(n||c(T.target))&&f(T.target)}function h(T){a(T.target)&&(T.target.classList.contains("focus-visible")||T.target.hasAttribute("data-focus-visible-added"))&&(o=!0,window.clearTimeout(i),i=window.setTimeout(function(){o=!1},100),u(T.target))}function v(T){document.visibilityState==="hidden"&&(o&&(n=!0),G())}function G(){document.addEventListener("mousemove",N),document.addEventListener("mousedown",N),document.addEventListener("mouseup",N),document.addEventListener("pointermove",N),document.addEventListener("pointerdown",N),document.addEventListener("pointerup",N),document.addEventListener("touchmove",N),document.addEventListener("touchstart",N),document.addEventListener("touchend",N)}function oe(){document.removeEventListener("mousemove",N),document.removeEventListener("mousedown",N),document.removeEventListener("mouseup",N),document.removeEventListener("pointermove",N),document.removeEventListener("pointerdown",N),document.removeEventListener("pointerup",N),document.removeEventListener("touchmove",N),document.removeEventListener("touchstart",N),document.removeEventListener("touchend",N)}function N(T){T.target.nodeName&&T.target.nodeName.toLowerCase()==="html"||(n=!1,oe())}document.addEventListener("keydown",p,!0),document.addEventListener("mousedown",m,!0),document.addEventListener("pointerdown",m,!0),document.addEventListener("touchstart",m,!0),document.addEventListener("visibilitychange",v,!0),G(),r.addEventListener("focus",d,!0),r.addEventListener("blur",h,!0),r.nodeType===Node.DOCUMENT_FRAGMENT_NODE&&r.host?r.host.setAttribute("data-js-focus-visible",""):r.nodeType===Node.DOCUMENT_NODE&&(document.documentElement.classList.add("js-focus-visible"),document.documentElement.setAttribute("data-js-focus-visible",""))}if(typeof window!="undefined"&&typeof document!="undefined"){window.applyFocusVisiblePolyfill=e;var t;try{t=new 
CustomEvent("focus-visible-polyfill-ready")}catch(r){t=document.createEvent("CustomEvent"),t.initCustomEvent("focus-visible-polyfill-ready",!1,!1,{})}window.dispatchEvent(t)}typeof document!="undefined"&&e(document)})});var fn=Pt(Er=>{(function(e){var t=function(){try{return!!Symbol.iterator}catch(f){return!1}},r=t(),n=function(f){var u={next:function(){var p=f.shift();return{done:p===void 0,value:p}}};return r&&(u[Symbol.iterator]=function(){return u}),u},o=function(f){return encodeURIComponent(f).replace(/%20/g,"+")},i=function(f){return decodeURIComponent(String(f).replace(/\+/g," "))},s=function(){var f=function(p){Object.defineProperty(this,"_entries",{writable:!0,value:{}});var m=typeof p;if(m!=="undefined")if(m==="string")p!==""&&this._fromString(p);else if(p instanceof f){var d=this;p.forEach(function(oe,N){d.append(N,oe)})}else if(p!==null&&m==="object")if(Object.prototype.toString.call(p)==="[object Array]")for(var h=0;hd[0]?1:0}),f._entries&&(f._entries={});for(var p=0;p1?i(d[1]):"")}})})(typeof global!="undefined"?global:typeof window!="undefined"?window:typeof self!="undefined"?self:Er);(function(e){var t=function(){try{var o=new e.URL("b","http://a");return o.pathname="c d",o.href==="http://a/c%20d"&&o.searchParams}catch(i){return!1}},r=function(){var o=e.URL,i=function(c,f){typeof c!="string"&&(c=String(c)),f&&typeof f!="string"&&(f=String(f));var u=document,p;if(f&&(e.location===void 0||f!==e.location.href)){f=f.toLowerCase(),u=document.implementation.createHTMLDocument(""),p=u.createElement("base"),p.href=f,u.head.appendChild(p);try{if(p.href.indexOf(f)!==0)throw new Error(p.href)}catch(T){throw new Error("URL unable to set base "+f+" due to "+T)}}var m=u.createElement("a");m.href=c,p&&(u.body.appendChild(m),m.href=m.href);var d=u.createElement("input");if(d.type="url",d.value=c,m.protocol===":"||!/:/.test(m.href)||!d.checkValidity()&&!f)throw new TypeError("Invalid URL");Object.defineProperty(this,"_anchorElement",{value:m});var h=new e.URLSearchParams(this.search),v=!0,G=!0,oe=this;["append","delete","set"].forEach(function(T){var Qe=h[T];h[T]=function(){Qe.apply(h,arguments),v&&(G=!1,oe.search=h.toString(),G=!0)}}),Object.defineProperty(this,"searchParams",{value:h,enumerable:!0});var N=void 0;Object.defineProperty(this,"_updateSearchParams",{enumerable:!1,configurable:!1,writable:!1,value:function(){this.search!==N&&(N=this.search,G&&(v=!1,this.searchParams._fromString(this.search),v=!0))}})},s=i.prototype,a=function(c){Object.defineProperty(s,c,{get:function(){return this._anchorElement[c]},set:function(f){this._anchorElement[c]=f},enumerable:!0})};["hash","host","hostname","port","protocol"].forEach(function(c){a(c)}),Object.defineProperty(s,"search",{get:function(){return this._anchorElement.search},set:function(c){this._anchorElement.search=c,this._updateSearchParams()},enumerable:!0}),Object.defineProperties(s,{toString:{get:function(){var c=this;return function(){return c.href}}},href:{get:function(){return this._anchorElement.href.replace(/\?$/,"")},set:function(c){this._anchorElement.href=c,this._updateSearchParams()},enumerable:!0},pathname:{get:function(){return this._anchorElement.pathname.replace(/(^\/?)/,"/")},set:function(c){this._anchorElement.pathname=c},enumerable:!0},origin:{get:function(){var c={"http:":80,"https:":443,"ftp:":21}[this._anchorElement.protocol],f=this._anchorElement.port!=c&&this._anchorElement.port!=="";return 
this._anchorElement.protocol+"//"+this._anchorElement.hostname+(f?":"+this._anchorElement.port:"")},enumerable:!0},password:{get:function(){return""},set:function(c){},enumerable:!0},username:{get:function(){return""},set:function(c){},enumerable:!0}}),i.createObjectURL=function(c){return o.createObjectURL.apply(o,arguments)},i.revokeObjectURL=function(c){return o.revokeObjectURL.apply(o,arguments)},e.URL=i};if(t()||r(),e.location!==void 0&&!("origin"in e.location)){var n=function(){return e.location.protocol+"//"+e.location.hostname+(e.location.port?":"+e.location.port:"")};try{Object.defineProperty(e.location,"origin",{get:n,enumerable:!0})}catch(o){setInterval(function(){e.location.origin=n()},100)}}})(typeof global!="undefined"?global:typeof window!="undefined"?window:typeof self!="undefined"?self:Er)});var Kr=Pt((Mt,qr)=>{/*! + * clipboard.js v2.0.11 + * https://clipboardjs.com/ + * + * Licensed MIT © Zeno Rocha + */(function(t,r){typeof Mt=="object"&&typeof qr=="object"?qr.exports=r():typeof define=="function"&&define.amd?define([],r):typeof Mt=="object"?Mt.ClipboardJS=r():t.ClipboardJS=r()})(Mt,function(){return function(){var e={686:function(n,o,i){"use strict";i.d(o,{default:function(){return Ci}});var s=i(279),a=i.n(s),c=i(370),f=i.n(c),u=i(817),p=i.n(u);function m(j){try{return document.execCommand(j)}catch(O){return!1}}var d=function(O){var E=p()(O);return m("cut"),E},h=d;function v(j){var O=document.documentElement.getAttribute("dir")==="rtl",E=document.createElement("textarea");E.style.fontSize="12pt",E.style.border="0",E.style.padding="0",E.style.margin="0",E.style.position="absolute",E.style[O?"right":"left"]="-9999px";var H=window.pageYOffset||document.documentElement.scrollTop;return E.style.top="".concat(H,"px"),E.setAttribute("readonly",""),E.value=j,E}var G=function(O,E){var H=v(O);E.container.appendChild(H);var I=p()(H);return m("copy"),H.remove(),I},oe=function(O){var E=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body},H="";return typeof O=="string"?H=G(O,E):O instanceof HTMLInputElement&&!["text","search","url","tel","password"].includes(O==null?void 0:O.type)?H=G(O.value,E):(H=p()(O),m("copy")),H},N=oe;function T(j){return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?T=function(E){return typeof E}:T=function(E){return E&&typeof Symbol=="function"&&E.constructor===Symbol&&E!==Symbol.prototype?"symbol":typeof E},T(j)}var Qe=function(){var O=arguments.length>0&&arguments[0]!==void 0?arguments[0]:{},E=O.action,H=E===void 0?"copy":E,I=O.container,q=O.target,Me=O.text;if(H!=="copy"&&H!=="cut")throw new Error('Invalid "action" value, use either "copy" or "cut"');if(q!==void 0)if(q&&T(q)==="object"&&q.nodeType===1){if(H==="copy"&&q.hasAttribute("disabled"))throw new Error('Invalid "target" attribute. Please use "readonly" instead of "disabled" attribute');if(H==="cut"&&(q.hasAttribute("readonly")||q.hasAttribute("disabled")))throw new Error(`Invalid "target" attribute. 
You can't cut text from elements with "readonly" or "disabled" attributes`)}else throw new Error('Invalid "target" value, use a valid Element');if(Me)return N(Me,{container:I});if(q)return H==="cut"?h(q):N(q,{container:I})},De=Qe;function $e(j){return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?$e=function(E){return typeof E}:$e=function(E){return E&&typeof Symbol=="function"&&E.constructor===Symbol&&E!==Symbol.prototype?"symbol":typeof E},$e(j)}function wi(j,O){if(!(j instanceof O))throw new TypeError("Cannot call a class as a function")}function rn(j,O){for(var E=0;E0&&arguments[0]!==void 0?arguments[0]:{};this.action=typeof I.action=="function"?I.action:this.defaultAction,this.target=typeof I.target=="function"?I.target:this.defaultTarget,this.text=typeof I.text=="function"?I.text:this.defaultText,this.container=$e(I.container)==="object"?I.container:document.body}},{key:"listenClick",value:function(I){var q=this;this.listener=f()(I,"click",function(Me){return q.onClick(Me)})}},{key:"onClick",value:function(I){var q=I.delegateTarget||I.currentTarget,Me=this.action(q)||"copy",kt=De({action:Me,container:this.container,target:this.target(q),text:this.text(q)});this.emit(kt?"success":"error",{action:Me,text:kt,trigger:q,clearSelection:function(){q&&q.focus(),window.getSelection().removeAllRanges()}})}},{key:"defaultAction",value:function(I){return vr("action",I)}},{key:"defaultTarget",value:function(I){var q=vr("target",I);if(q)return document.querySelector(q)}},{key:"defaultText",value:function(I){return vr("text",I)}},{key:"destroy",value:function(){this.listener.destroy()}}],[{key:"copy",value:function(I){var q=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body};return N(I,q)}},{key:"cut",value:function(I){return h(I)}},{key:"isSupported",value:function(){var I=arguments.length>0&&arguments[0]!==void 0?arguments[0]:["copy","cut"],q=typeof I=="string"?[I]:I,Me=!!document.queryCommandSupported;return q.forEach(function(kt){Me=Me&&!!document.queryCommandSupported(kt)}),Me}}]),E}(a()),Ci=Ai},828:function(n){var o=9;if(typeof Element!="undefined"&&!Element.prototype.matches){var i=Element.prototype;i.matches=i.matchesSelector||i.mozMatchesSelector||i.msMatchesSelector||i.oMatchesSelector||i.webkitMatchesSelector}function s(a,c){for(;a&&a.nodeType!==o;){if(typeof a.matches=="function"&&a.matches(c))return a;a=a.parentNode}}n.exports=s},438:function(n,o,i){var s=i(828);function a(u,p,m,d,h){var v=f.apply(this,arguments);return u.addEventListener(m,v,h),{destroy:function(){u.removeEventListener(m,v,h)}}}function c(u,p,m,d,h){return typeof u.addEventListener=="function"?a.apply(null,arguments):typeof m=="function"?a.bind(null,document).apply(null,arguments):(typeof u=="string"&&(u=document.querySelectorAll(u)),Array.prototype.map.call(u,function(v){return a(v,p,m,d,h)}))}function f(u,p,m,d){return function(h){h.delegateTarget=s(h.target,p),h.delegateTarget&&d.call(u,h)}}n.exports=c},879:function(n,o){o.node=function(i){return i!==void 0&&i instanceof HTMLElement&&i.nodeType===1},o.nodeList=function(i){var s=Object.prototype.toString.call(i);return i!==void 0&&(s==="[object NodeList]"||s==="[object HTMLCollection]")&&"length"in i&&(i.length===0||o.node(i[0]))},o.string=function(i){return typeof i=="string"||i instanceof String},o.fn=function(i){var s=Object.prototype.toString.call(i);return s==="[object Function]"}},370:function(n,o,i){var s=i(879),a=i(438);function c(m,d,h){if(!m&&!d&&!h)throw new Error("Missing required 
arguments");if(!s.string(d))throw new TypeError("Second argument must be a String");if(!s.fn(h))throw new TypeError("Third argument must be a Function");if(s.node(m))return f(m,d,h);if(s.nodeList(m))return u(m,d,h);if(s.string(m))return p(m,d,h);throw new TypeError("First argument must be a String, HTMLElement, HTMLCollection, or NodeList")}function f(m,d,h){return m.addEventListener(d,h),{destroy:function(){m.removeEventListener(d,h)}}}function u(m,d,h){return Array.prototype.forEach.call(m,function(v){v.addEventListener(d,h)}),{destroy:function(){Array.prototype.forEach.call(m,function(v){v.removeEventListener(d,h)})}}}function p(m,d,h){return a(document.body,m,d,h)}n.exports=c},817:function(n){function o(i){var s;if(i.nodeName==="SELECT")i.focus(),s=i.value;else if(i.nodeName==="INPUT"||i.nodeName==="TEXTAREA"){var a=i.hasAttribute("readonly");a||i.setAttribute("readonly",""),i.select(),i.setSelectionRange(0,i.value.length),a||i.removeAttribute("readonly"),s=i.value}else{i.hasAttribute("contenteditable")&&i.focus();var c=window.getSelection(),f=document.createRange();f.selectNodeContents(i),c.removeAllRanges(),c.addRange(f),s=c.toString()}return s}n.exports=o},279:function(n){function o(){}o.prototype={on:function(i,s,a){var c=this.e||(this.e={});return(c[i]||(c[i]=[])).push({fn:s,ctx:a}),this},once:function(i,s,a){var c=this;function f(){c.off(i,f),s.apply(a,arguments)}return f._=s,this.on(i,f,a)},emit:function(i){var s=[].slice.call(arguments,1),a=((this.e||(this.e={}))[i]||[]).slice(),c=0,f=a.length;for(c;c{"use strict";/*! + * escape-html + * Copyright(c) 2012-2013 TJ Holowaychuk + * Copyright(c) 2015 Andreas Lubbe + * Copyright(c) 2015 Tiancheng "Timothy" Gu + * MIT Licensed + */var ns=/["'&<>]/;Go.exports=os;function os(e){var t=""+e,r=ns.exec(t);if(!r)return t;var n,o="",i=0,s=0;for(i=r.index;i0&&i[i.length-1])&&(f[0]===6||f[0]===2)){r=0;continue}if(f[0]===3&&(!i||f[1]>i[0]&&f[1]=e.length&&(e=void 0),{value:e&&e[n++],done:!e}}};throw new TypeError(t?"Object is not iterable.":"Symbol.iterator is not defined.")}function W(e,t){var r=typeof Symbol=="function"&&e[Symbol.iterator];if(!r)return e;var n=r.call(e),o,i=[],s;try{for(;(t===void 0||t-- >0)&&!(o=n.next()).done;)i.push(o.value)}catch(a){s={error:a}}finally{try{o&&!o.done&&(r=n.return)&&r.call(n)}finally{if(s)throw s.error}}return i}function D(e,t,r){if(r||arguments.length===2)for(var n=0,o=t.length,i;n1||a(m,d)})})}function a(m,d){try{c(n[m](d))}catch(h){p(i[0][3],h)}}function c(m){m.value instanceof et?Promise.resolve(m.value.v).then(f,u):p(i[0][2],m)}function f(m){a("next",m)}function u(m){a("throw",m)}function p(m,d){m(d),i.shift(),i.length&&a(i[0][0],i[0][1])}}function ln(e){if(!Symbol.asyncIterator)throw new TypeError("Symbol.asyncIterator is not defined.");var t=e[Symbol.asyncIterator],r;return t?t.call(e):(e=typeof Ee=="function"?Ee(e):e[Symbol.iterator](),r={},n("next"),n("throw"),n("return"),r[Symbol.asyncIterator]=function(){return this},r);function n(i){r[i]=e[i]&&function(s){return new Promise(function(a,c){s=e[i](s),o(a,c,s.done,s.value)})}}function o(i,s,a,c){Promise.resolve(c).then(function(f){i({value:f,done:a})},s)}}function C(e){return typeof e=="function"}function at(e){var t=function(n){Error.call(n),n.stack=new Error().stack},r=e(t);return r.prototype=Object.create(Error.prototype),r.prototype.constructor=r,r}var It=at(function(e){return function(r){e(this),this.message=r?r.length+` errors occurred during unsubscription: +`+r.map(function(n,o){return o+1+") "+n.toString()}).join(` + 
`):"",this.name="UnsubscriptionError",this.errors=r}});function Ve(e,t){if(e){var r=e.indexOf(t);0<=r&&e.splice(r,1)}}var Ie=function(){function e(t){this.initialTeardown=t,this.closed=!1,this._parentage=null,this._finalizers=null}return e.prototype.unsubscribe=function(){var t,r,n,o,i;if(!this.closed){this.closed=!0;var s=this._parentage;if(s)if(this._parentage=null,Array.isArray(s))try{for(var a=Ee(s),c=a.next();!c.done;c=a.next()){var f=c.value;f.remove(this)}}catch(v){t={error:v}}finally{try{c&&!c.done&&(r=a.return)&&r.call(a)}finally{if(t)throw t.error}}else s.remove(this);var u=this.initialTeardown;if(C(u))try{u()}catch(v){i=v instanceof It?v.errors:[v]}var p=this._finalizers;if(p){this._finalizers=null;try{for(var m=Ee(p),d=m.next();!d.done;d=m.next()){var h=d.value;try{mn(h)}catch(v){i=i!=null?i:[],v instanceof It?i=D(D([],W(i)),W(v.errors)):i.push(v)}}}catch(v){n={error:v}}finally{try{d&&!d.done&&(o=m.return)&&o.call(m)}finally{if(n)throw n.error}}}if(i)throw new It(i)}},e.prototype.add=function(t){var r;if(t&&t!==this)if(this.closed)mn(t);else{if(t instanceof e){if(t.closed||t._hasParent(this))return;t._addParent(this)}(this._finalizers=(r=this._finalizers)!==null&&r!==void 0?r:[]).push(t)}},e.prototype._hasParent=function(t){var r=this._parentage;return r===t||Array.isArray(r)&&r.includes(t)},e.prototype._addParent=function(t){var r=this._parentage;this._parentage=Array.isArray(r)?(r.push(t),r):r?[r,t]:t},e.prototype._removeParent=function(t){var r=this._parentage;r===t?this._parentage=null:Array.isArray(r)&&Ve(r,t)},e.prototype.remove=function(t){var r=this._finalizers;r&&Ve(r,t),t instanceof e&&t._removeParent(this)},e.EMPTY=function(){var t=new e;return t.closed=!0,t}(),e}();var Sr=Ie.EMPTY;function jt(e){return e instanceof Ie||e&&"closed"in e&&C(e.remove)&&C(e.add)&&C(e.unsubscribe)}function mn(e){C(e)?e():e.unsubscribe()}var Le={onUnhandledError:null,onStoppedNotification:null,Promise:void 0,useDeprecatedSynchronousErrorHandling:!1,useDeprecatedNextContext:!1};var st={setTimeout:function(e,t){for(var r=[],n=2;n0},enumerable:!1,configurable:!0}),t.prototype._trySubscribe=function(r){return this._throwIfClosed(),e.prototype._trySubscribe.call(this,r)},t.prototype._subscribe=function(r){return this._throwIfClosed(),this._checkFinalizedStatuses(r),this._innerSubscribe(r)},t.prototype._innerSubscribe=function(r){var n=this,o=this,i=o.hasError,s=o.isStopped,a=o.observers;return i||s?Sr:(this.currentObservers=null,a.push(r),new Ie(function(){n.currentObservers=null,Ve(a,r)}))},t.prototype._checkFinalizedStatuses=function(r){var n=this,o=n.hasError,i=n.thrownError,s=n.isStopped;o?r.error(i):s&&r.complete()},t.prototype.asObservable=function(){var r=new F;return r.source=this,r},t.create=function(r,n){return new En(r,n)},t}(F);var En=function(e){ie(t,e);function t(r,n){var o=e.call(this)||this;return o.destination=r,o.source=n,o}return t.prototype.next=function(r){var n,o;(o=(n=this.destination)===null||n===void 0?void 0:n.next)===null||o===void 0||o.call(n,r)},t.prototype.error=function(r){var n,o;(o=(n=this.destination)===null||n===void 0?void 0:n.error)===null||o===void 0||o.call(n,r)},t.prototype.complete=function(){var r,n;(n=(r=this.destination)===null||r===void 0?void 0:r.complete)===null||n===void 0||n.call(r)},t.prototype._subscribe=function(r){var n,o;return(o=(n=this.source)===null||n===void 0?void 0:n.subscribe(r))!==null&&o!==void 0?o:Sr},t}(x);var Et={now:function(){return(Et.delegate||Date).now()},delegate:void 0};var wt=function(e){ie(t,e);function t(r,n,o){r===void 
0&&(r=1/0),n===void 0&&(n=1/0),o===void 0&&(o=Et);var i=e.call(this)||this;return i._bufferSize=r,i._windowTime=n,i._timestampProvider=o,i._buffer=[],i._infiniteTimeWindow=!0,i._infiniteTimeWindow=n===1/0,i._bufferSize=Math.max(1,r),i._windowTime=Math.max(1,n),i}return t.prototype.next=function(r){var n=this,o=n.isStopped,i=n._buffer,s=n._infiniteTimeWindow,a=n._timestampProvider,c=n._windowTime;o||(i.push(r),!s&&i.push(a.now()+c)),this._trimBuffer(),e.prototype.next.call(this,r)},t.prototype._subscribe=function(r){this._throwIfClosed(),this._trimBuffer();for(var n=this._innerSubscribe(r),o=this,i=o._infiniteTimeWindow,s=o._buffer,a=s.slice(),c=0;c0?e.prototype.requestAsyncId.call(this,r,n,o):(r.actions.push(this),r._scheduled||(r._scheduled=ut.requestAnimationFrame(function(){return r.flush(void 0)})))},t.prototype.recycleAsyncId=function(r,n,o){var i;if(o===void 0&&(o=0),o!=null?o>0:this.delay>0)return e.prototype.recycleAsyncId.call(this,r,n,o);var s=r.actions;n!=null&&((i=s[s.length-1])===null||i===void 0?void 0:i.id)!==n&&(ut.cancelAnimationFrame(n),r._scheduled=void 0)},t}(Wt);var Tn=function(e){ie(t,e);function t(){return e!==null&&e.apply(this,arguments)||this}return t.prototype.flush=function(r){this._active=!0;var n=this._scheduled;this._scheduled=void 0;var o=this.actions,i;r=r||o.shift();do if(i=r.execute(r.state,r.delay))break;while((r=o[0])&&r.id===n&&o.shift());if(this._active=!1,i){for(;(r=o[0])&&r.id===n&&o.shift();)r.unsubscribe();throw i}},t}(Dt);var Te=new Tn(Sn);var _=new F(function(e){return e.complete()});function Vt(e){return e&&C(e.schedule)}function Cr(e){return e[e.length-1]}function Ye(e){return C(Cr(e))?e.pop():void 0}function Oe(e){return Vt(Cr(e))?e.pop():void 0}function zt(e,t){return typeof Cr(e)=="number"?e.pop():t}var pt=function(e){return e&&typeof e.length=="number"&&typeof e!="function"};function Nt(e){return C(e==null?void 0:e.then)}function qt(e){return C(e[ft])}function Kt(e){return Symbol.asyncIterator&&C(e==null?void 0:e[Symbol.asyncIterator])}function Qt(e){return new TypeError("You provided "+(e!==null&&typeof e=="object"?"an invalid object":"'"+e+"'")+" where a stream was expected. 
You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.")}function Ni(){return typeof Symbol!="function"||!Symbol.iterator?"@@iterator":Symbol.iterator}var Yt=Ni();function Gt(e){return C(e==null?void 0:e[Yt])}function Bt(e){return pn(this,arguments,function(){var r,n,o,i;return $t(this,function(s){switch(s.label){case 0:r=e.getReader(),s.label=1;case 1:s.trys.push([1,,9,10]),s.label=2;case 2:return[4,et(r.read())];case 3:return n=s.sent(),o=n.value,i=n.done,i?[4,et(void 0)]:[3,5];case 4:return[2,s.sent()];case 5:return[4,et(o)];case 6:return[4,s.sent()];case 7:return s.sent(),[3,2];case 8:return[3,10];case 9:return r.releaseLock(),[7];case 10:return[2]}})})}function Jt(e){return C(e==null?void 0:e.getReader)}function U(e){if(e instanceof F)return e;if(e!=null){if(qt(e))return qi(e);if(pt(e))return Ki(e);if(Nt(e))return Qi(e);if(Kt(e))return On(e);if(Gt(e))return Yi(e);if(Jt(e))return Gi(e)}throw Qt(e)}function qi(e){return new F(function(t){var r=e[ft]();if(C(r.subscribe))return r.subscribe(t);throw new TypeError("Provided object does not correctly implement Symbol.observable")})}function Ki(e){return new F(function(t){for(var r=0;r=2;return function(n){return n.pipe(e?A(function(o,i){return e(o,i,n)}):de,ge(1),r?He(t):Vn(function(){return new Zt}))}}function zn(){for(var e=[],t=0;t=2,!0))}function pe(e){e===void 0&&(e={});var t=e.connector,r=t===void 0?function(){return new x}:t,n=e.resetOnError,o=n===void 0?!0:n,i=e.resetOnComplete,s=i===void 0?!0:i,a=e.resetOnRefCountZero,c=a===void 0?!0:a;return function(f){var u,p,m,d=0,h=!1,v=!1,G=function(){p==null||p.unsubscribe(),p=void 0},oe=function(){G(),u=m=void 0,h=v=!1},N=function(){var T=u;oe(),T==null||T.unsubscribe()};return y(function(T,Qe){d++,!v&&!h&&G();var De=m=m!=null?m:r();Qe.add(function(){d--,d===0&&!v&&!h&&(p=$r(N,c))}),De.subscribe(Qe),!u&&d>0&&(u=new rt({next:function($e){return De.next($e)},error:function($e){v=!0,G(),p=$r(oe,o,$e),De.error($e)},complete:function(){h=!0,G(),p=$r(oe,s),De.complete()}}),U(T).subscribe(u))})(f)}}function $r(e,t){for(var r=[],n=2;ne.next(document)),e}function K(e,t=document){return Array.from(t.querySelectorAll(e))}function z(e,t=document){let r=ce(e,t);if(typeof r=="undefined")throw new ReferenceError(`Missing element: expected "${e}" to be present`);return r}function ce(e,t=document){return t.querySelector(e)||void 0}function _e(){return document.activeElement instanceof HTMLElement&&document.activeElement||void 0}function tr(e){return L(b(document.body,"focusin"),b(document.body,"focusout")).pipe(ke(1),l(()=>{let t=_e();return typeof t!="undefined"?e.contains(t):!1}),V(e===_e()),B())}function Xe(e){return{x:e.offsetLeft,y:e.offsetTop}}function Qn(e){return L(b(window,"load"),b(window,"resize")).pipe(Ce(0,Te),l(()=>Xe(e)),V(Xe(e)))}function rr(e){return{x:e.scrollLeft,y:e.scrollTop}}function dt(e){return L(b(e,"scroll"),b(window,"resize")).pipe(Ce(0,Te),l(()=>rr(e)),V(rr(e)))}var Gn=function(){if(typeof Map!="undefined")return Map;function e(t,r){var n=-1;return t.some(function(o,i){return o[0]===r?(n=i,!0):!1}),n}return function(){function t(){this.__entries__=[]}return Object.defineProperty(t.prototype,"size",{get:function(){return this.__entries__.length},enumerable:!0,configurable:!0}),t.prototype.get=function(r){var n=e(this.__entries__,r),o=this.__entries__[n];return o&&o[1]},t.prototype.set=function(r,n){var o=e(this.__entries__,r);~o?this.__entries__[o][1]=n:this.__entries__.push([r,n])},t.prototype.delete=function(r){var 
n=this.__entries__,o=e(n,r);~o&&n.splice(o,1)},t.prototype.has=function(r){return!!~e(this.__entries__,r)},t.prototype.clear=function(){this.__entries__.splice(0)},t.prototype.forEach=function(r,n){n===void 0&&(n=null);for(var o=0,i=this.__entries__;o0},e.prototype.connect_=function(){!Dr||this.connected_||(document.addEventListener("transitionend",this.onTransitionEnd_),window.addEventListener("resize",this.refresh),ga?(this.mutationsObserver_=new MutationObserver(this.refresh),this.mutationsObserver_.observe(document,{attributes:!0,childList:!0,characterData:!0,subtree:!0})):(document.addEventListener("DOMSubtreeModified",this.refresh),this.mutationEventsAdded_=!0),this.connected_=!0)},e.prototype.disconnect_=function(){!Dr||!this.connected_||(document.removeEventListener("transitionend",this.onTransitionEnd_),window.removeEventListener("resize",this.refresh),this.mutationsObserver_&&this.mutationsObserver_.disconnect(),this.mutationEventsAdded_&&document.removeEventListener("DOMSubtreeModified",this.refresh),this.mutationsObserver_=null,this.mutationEventsAdded_=!1,this.connected_=!1)},e.prototype.onTransitionEnd_=function(t){var r=t.propertyName,n=r===void 0?"":r,o=va.some(function(i){return!!~n.indexOf(i)});o&&this.refresh()},e.getInstance=function(){return this.instance_||(this.instance_=new e),this.instance_},e.instance_=null,e}(),Bn=function(e,t){for(var r=0,n=Object.keys(t);r0},e}(),Xn=typeof WeakMap!="undefined"?new WeakMap:new Gn,Zn=function(){function e(t){if(!(this instanceof e))throw new TypeError("Cannot call a class as a function.");if(!arguments.length)throw new TypeError("1 argument required, but only 0 present.");var r=ya.getInstance(),n=new Aa(t,r,this);Xn.set(this,n)}return e}();["observe","unobserve","disconnect"].forEach(function(e){Zn.prototype[e]=function(){var t;return(t=Xn.get(this))[e].apply(t,arguments)}});var Ca=function(){return typeof nr.ResizeObserver!="undefined"?nr.ResizeObserver:Zn}(),eo=Ca;var to=new x,Ra=$(()=>k(new eo(e=>{for(let t of e)to.next(t)}))).pipe(g(e=>L(ze,k(e)).pipe(R(()=>e.disconnect()))),J(1));function he(e){return{width:e.offsetWidth,height:e.offsetHeight}}function ye(e){return Ra.pipe(S(t=>t.observe(e)),g(t=>to.pipe(A(({target:r})=>r===e),R(()=>t.unobserve(e)),l(()=>he(e)))),V(he(e)))}function bt(e){return{width:e.scrollWidth,height:e.scrollHeight}}function ar(e){let t=e.parentElement;for(;t&&(e.scrollWidth<=t.scrollWidth&&e.scrollHeight<=t.scrollHeight);)t=(e=t).parentElement;return t?e:void 0}var ro=new x,ka=$(()=>k(new IntersectionObserver(e=>{for(let t of e)ro.next(t)},{threshold:0}))).pipe(g(e=>L(ze,k(e)).pipe(R(()=>e.disconnect()))),J(1));function sr(e){return ka.pipe(S(t=>t.observe(e)),g(t=>ro.pipe(A(({target:r})=>r===e),R(()=>t.unobserve(e)),l(({isIntersecting:r})=>r))))}function no(e,t=16){return dt(e).pipe(l(({y:r})=>{let n=he(e),o=bt(e);return r>=o.height-n.height-t}),B())}var cr={drawer:z("[data-md-toggle=drawer]"),search:z("[data-md-toggle=search]")};function oo(e){return cr[e].checked}function Ke(e,t){cr[e].checked!==t&&cr[e].click()}function Ue(e){let t=cr[e];return b(t,"change").pipe(l(()=>t.checked),V(t.checked))}function Ha(e,t){switch(e.constructor){case HTMLInputElement:return e.type==="radio"?/^Arrow/.test(t):!0;case HTMLSelectElement:case HTMLTextAreaElement:return!0;default:return e.isContentEditable}}function Pa(){return L(b(window,"compositionstart").pipe(l(()=>!0)),b(window,"compositionend").pipe(l(()=>!1))).pipe(V(!1))}function io(){let 
e=b(window,"keydown").pipe(A(t=>!(t.metaKey||t.ctrlKey)),l(t=>({mode:oo("search")?"search":"global",type:t.key,claim(){t.preventDefault(),t.stopPropagation()}})),A(({mode:t,type:r})=>{if(t==="global"){let n=_e();if(typeof n!="undefined")return!Ha(n,r)}return!0}),pe());return Pa().pipe(g(t=>t?_:e))}function le(){return new URL(location.href)}function ot(e){location.href=e.href}function ao(){return new x}function so(e,t){if(typeof t=="string"||typeof t=="number")e.innerHTML+=t.toString();else if(t instanceof Node)e.appendChild(t);else if(Array.isArray(t))for(let r of t)so(e,r)}function M(e,t,...r){let n=document.createElement(e);if(t)for(let o of Object.keys(t))typeof t[o]!="undefined"&&(typeof t[o]!="boolean"?n.setAttribute(o,t[o]):n.setAttribute(o,""));for(let o of r)so(n,o);return n}function fr(e){if(e>999){let t=+((e-950)%1e3>99);return`${((e+1e-6)/1e3).toFixed(t)}k`}else return e.toString()}function co(){return location.hash.substring(1)}function Vr(e){let t=M("a",{href:e});t.addEventListener("click",r=>r.stopPropagation()),t.click()}function $a(e){return L(b(window,"hashchange"),e).pipe(l(co),V(co()),A(t=>t.length>0),J(1))}function fo(e){return $a(e).pipe(l(t=>ce(`[id="${t}"]`)),A(t=>typeof t!="undefined"))}function zr(e){let t=matchMedia(e);return er(r=>t.addListener(()=>r(t.matches))).pipe(V(t.matches))}function uo(){let e=matchMedia("print");return L(b(window,"beforeprint").pipe(l(()=>!0)),b(window,"afterprint").pipe(l(()=>!1))).pipe(V(e.matches))}function Nr(e,t){return e.pipe(g(r=>r?t():_))}function ur(e,t={credentials:"same-origin"}){return ue(fetch(`${e}`,t)).pipe(fe(()=>_),g(r=>r.status!==200?Tt(()=>new Error(r.statusText)):k(r)))}function We(e,t){return ur(e,t).pipe(g(r=>r.json()),J(1))}function po(e,t){let r=new DOMParser;return ur(e,t).pipe(g(n=>n.text()),l(n=>r.parseFromString(n,"text/xml")),J(1))}function pr(e){let t=M("script",{src:e});return $(()=>(document.head.appendChild(t),L(b(t,"load"),b(t,"error").pipe(g(()=>Tt(()=>new ReferenceError(`Invalid script: ${e}`))))).pipe(l(()=>{}),R(()=>document.head.removeChild(t)),ge(1))))}function lo(){return{x:Math.max(0,scrollX),y:Math.max(0,scrollY)}}function mo(){return L(b(window,"scroll",{passive:!0}),b(window,"resize",{passive:!0})).pipe(l(lo),V(lo()))}function ho(){return{width:innerWidth,height:innerHeight}}function bo(){return b(window,"resize",{passive:!0}).pipe(l(ho),V(ho()))}function vo(){return Q([mo(),bo()]).pipe(l(([e,t])=>({offset:e,size:t})),J(1))}function lr(e,{viewport$:t,header$:r}){let n=t.pipe(Z("size")),o=Q([n,r]).pipe(l(()=>Xe(e)));return Q([r,t,o]).pipe(l(([{height:i},{offset:s,size:a},{x:c,y:f}])=>({offset:{x:s.x-c,y:s.y-f+i},size:a})))}(()=>{function e(n,o){parent.postMessage(n,o||"*")}function t(...n){return n.reduce((o,i)=>o.then(()=>new Promise(s=>{let a=document.createElement("script");a.src=i,a.onload=s,document.body.appendChild(a)})),Promise.resolve())}var r=class extends EventTarget{constructor(n){super(),this.url=n,this.m=i=>{i.source===this.w&&(this.dispatchEvent(new MessageEvent("message",{data:i.data})),this.onmessage&&this.onmessage(i))},this.e=(i,s,a,c,f)=>{if(s===`${this.url}`){let u=new ErrorEvent("error",{message:i,filename:s,lineno:a,colno:c,error:f});this.dispatchEvent(u),this.onerror&&this.onerror(u)}};let o=document.createElement("iframe");o.hidden=!0,document.body.appendChild(this.iframe=o),this.w.document.open(),this.w.document.write(` + + + + + + + + + + + + + + + + + + + + + + + + + + +

Accera logo

+
+ +

PyPI package version Python versions MIT License

+

Problem at Hand

+

Writing highly optimized compute-intensive code in a traditional programming language is strenuous and time-consuming. It requires not only advanced engineering skills, such as fluency in Assembly language, but also a deep understanding of computer architecture. Manual optimization of even the simplest numerical algorithms demands significant engineering effort. Needless to say, highly optimized numerical code is often prone to bugs, lacks readability, and offers little to no usability. Code maintenance becomes a nightmare, requiring the same logic to be reimplemented every time an architecture-level change is introduced.

+

Accera: An Optimized Solution

+

Accera is a compiler that enables you to experiment with loop optimizations without hand-writing Assembly code. With Accera, these problems and impediments can be addressed in an optimized way. It is available as a Python library and supports cross-compiling to a wide range of processor targets.

+

Accera has THREE primary goals:

+
    +
  • Performance: To guarantee the fastest implementation for any compute-intensive algorithm.
  • Readability: To ensure effective implementation of algorithms without sacrificing the readability of code.
  • Writability: To provide a user-friendly programming model, designed for agility and maintainability.
+

Install

+

To install for Linux, macOS, or Windows (requires Python 3.7-3.10):

+
pip install accera
+
+

See the Install Instructions for more details on installing pre-built Python 3 packages and how to build Accera from source.

+

Quickstart

+

In this example, we will:

+
    +
  • Implement matrix multiplication with a ReLU activation (matmul + ReLU), commonly used in machine learning algorithms.
  • Generate two implementations: a naive algorithm and a version with loop-based transformations.
  • Compare the execution time of both implementations.
+

Run in your browser

+ +

Binder

+

No installation is required. This will launch a Jupyter notebook with the quickstart example running in the cloud.

+

Run on your machine

+
    +
  1.

    Create a Python 3 script called quickstart.py:

    +
    import accera as acc
    +
    +# define placeholder inputs/output
    +A = acc.Array(role=acc.Role.INPUT, shape=(512, 512))
    +B = acc.Array(role=acc.Role.INPUT, shape=(512, 512))
    +C = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(512, 512))
    +
    +# implement the logic for matmul and relu
    +matmul = acc.Nest(shape=(512, 512, 512))
    +i1, j1, k1 = matmul.get_indices()
    +@matmul.iteration_logic
    +def _():
    +    C[i1, j1] += A[i1, k1] * B[k1, j1]
    +
    +relu = acc.Nest(shape=(512, 512))
    +i2, j2 = relu.get_indices()
    +@relu.iteration_logic
    +def _():
    +    C[i2, j2] = acc.max(C[i2, j2], 0.0)
    +
    +package = acc.Package()
    +
    +# fuse the i and j indices of matmul and relu, add to the package
    +schedule = acc.fuse(matmul.create_schedule(), relu.create_schedule(), partial=2)
    +package.add(schedule, args=(A, B, C), base_name="matmul_relu_fusion_naive")
    +
    +# transform the schedule, add to the package
    +i, j, f, k = schedule.get_indices()
    +ii, jj = schedule.tile({
    +    i: 16,
    +    j: 16
    +}) # loop tiling
    +schedule.reorder(j, i, f, k, jj, ii) # loop reordering
    +plan = schedule.create_plan()
    +plan.unroll(ii) # loop unrolling
    +package.add(plan, args=(A, B, C), base_name="matmul_relu_fusion_transformed")
    +
    +# build a dynamically-linked package (a .dll or .so) that exports both functions
    +print(package.build(name="hello_accera", format=acc.Package.Format.HAT_DYNAMIC))
    +
    +
  2.

    Ensure that you have a compiler in your PATH:

    +
      +
    • Windows: Install Microsoft Visual Studio and run vcvars64.bat to set up the command prompt.
    • Linux/macOS: Install gcc
    +

    Don't have a compiler handy? We recommend trying Accera in your browser instead (see the Binder link above).

    +
  3.

    Install Accera:

    +
    pip install accera
    +
    +
  4.

    Generate the library that implements two versions of matmul + ReLU:

    +
    python quickstart.py
    +
    +
  5.

    To consume and compare the library functions, create a file called benchmark.py in the same location:

    +
    import hatlib as hat
    +import numpy as np
    +
    +# load the package
    +_, functions = hat.load("hello_accera.hat")
    +
    +# call one of the functions with test inputs
    +A_test = np.random.rand(512, 512).astype(np.float32)
    +B_test = np.random.rand(512, 512).astype(np.float32)
    +C_test = np.zeros((512, 512)).astype(np.float32)
    +C_numpy = np.maximum(C_test + A_test @ B_test, 0.0)
    +
    +matmul_relu = functions["matmul_relu_fusion_transformed"]
    +matmul_relu(A_test, B_test, C_test)
    +
    +# check correctness
    +np.testing.assert_allclose(C_test, C_numpy, atol=1e-3)
    +
    +# benchmark all functions
    +hat.run_benchmark("hello_accera.hat", batch_size=5, min_time_in_sec=5)
    +
    +
  6.

    Run the benchmark to get the execution time results:

    +
    python benchmark.py
    +
    +
+

Next Steps

+

The Manual is the best introductory resource for the Accera Python programming model.

+

In particular, the schedule transformations describe how you can experiment with different loop transformations with just a few lines of Python code.

+
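For example, starting from a schedule and loop indices like the ones in the quickstart above (a minimal sketch, not an excerpt from the Manual; the index names and tile size are illustrative), a tiling-and-reordering experiment takes only a couple of lines:

    # assumes `schedule` and indices i, j, k obtained from nest.get_indices(), as in the examples above
    ii = schedule.split(i, 16)      # tile the i dimension into blocks of 16 iterations
    schedule.reorder(i, k, j, ii)   # try a different loop order that keeps the new index innermost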

Finally, the .hat format is just a C header file containing the metadata. Learn more about the HAT format and benchmarking.

+

How it works

+

In a nutshell, Accera takes the Python code that defines the loop schedule and algorithm and converts it into MLIR intermediate representation (IR). Accera's compiler then takes this IR through a series of MLIR pipelines to perform transformations. The result is a binary library with a C header file. The library implements the algorithms defined in Python and is compatible with the target.

+

To peek into the stages of IR transformation that Accera does, try replacing format=acc.Package.Format.HAT_DYNAMIC with format=acc.Package.Format.MLIR_DYNAMIC in quickstart.py, re-run the script, and search the _tmp subfolder for the intermediate *.mlir files. We plan to document these IR constructs in the future.

+
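Concretely, that is a one-line change to the final build call in the quickstart script (sketch; `package` and the package name are the ones defined above):

    # emit the intermediate MLIR stages instead of only the final HAT package
    print(package.build(name="hello_accera", format=acc.Package.Format.MLIR_DYNAMIC))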

Documentation

+

Get familiar with Accera's concepts and Python constructs on the Documentation page.

+

Tutorials

+

Step-by-step examples are available on the Tutorials page. We're working on adding more complementary examples and tutorials.

+

Contributions

+

Accera is a research platform-in-progress that can certainly benefit from your contributions. We would love your feedback, recommendations, and feature requests, and we are excited to answer your questions. Let's collaborate! Please file a GitHub issue or send us a pull request. Please review the Microsoft Code of Conduct to learn more.

+

Credits

+

Accera is built using several open source libraries, including: LLVM, pybind11, toml++, tomlkit, vcpkg, pyyaml, and HAT. For testing, we used numpy and catch2.

+

License

+

This project is released under the MIT License.

+ + + + + + + + + \ No newline at end of file diff --git a/search/search_index.json b/search/search_index.json new file mode 100644 index 00000000..ccc8259c --- /dev/null +++ b/search/search_index.json @@ -0,0 +1 @@ +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#problem-at-hand","title":"Problem at Hand","text":"

Writing highly optimized compute-intensive code in a traditional programming language is strenuous and time-consuming. Not only does it require advanced engineering skills such as fluency in Assembly language, but a deep understanding of computer architecture is also indispensable. Manual optimization of even the simplest numerical algorithms demands a significant engineering effort. Needless to say, a highly optimized numerical code is often prone to bugs, lacks readability, and offers little to no usability. Code maintenance becomes a nightmare resulting in the reimplementation of the same logic every time an architecture level change is introduced.

"},{"location":"#accera-an-optimized-solution","title":"Accera: An Optimized Solution","text":"

Accera is a compiler that enables you to experiment with loop optimizations without hand-writing Assembly code. With Accera, these problems and impediments can be addressed in an optimized way. It is available as a Python library and supports cross-compiling to a wide range of processor targets.

Accera has THREE primary goals:

  • Performance: To guarantee the fastest implementation for any compute-intensive algorithm.
  • Readability: To ensure effective implementation of algorithms without sacrificing the readability of code.
  • Writability: To provide a user-friendly programming model, designed for agility and maintainability.
"},{"location":"#install","title":"Install","text":"

To install for Linux, macOS, or Windows (requires Python 3.7-3.10):

pip install accera\n

See the Install Instructions for more details on installing pre-built Python 3 packages and how to build Accera from the source.

"},{"location":"#quickstart","title":"Quickstart","text":"

In this example, we will:

  • Implement matrix multiplication with a ReLU activation (matmul + ReLU), commonly used in machine learning algorithms.
  • Generate two implementations: a naive algorithm and one with loop-based transformations.
  • Compare the execution time of both implementations.
"},{"location":"#run-in-your-browser","title":"Run in your browser","text":"

No installation is required. This will launch a Jupyter notebook with the quickstart example running in the cloud.

"},{"location":"#run-on-your-machine","title":"Run on your machine","text":"
  1. Create a Python 3 script called quickstart.py:

    import accera as acc\n\n# define placeholder inputs/output\nA = acc.Array(role=acc.Role.INPUT, shape=(512, 512))\nB = acc.Array(role=acc.Role.INPUT, shape=(512, 512))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(512, 512))\n\n# implement the logic for matmul and relu\nmatmul = acc.Nest(shape=(512, 512, 512))\ni1, j1, k1 = matmul.get_indices()\n@matmul.iteration_logic\ndef _():\n    C[i1, j1] += A[i1, k1] * B[k1, j1]\n\nrelu = acc.Nest(shape=(512, 512))\ni2, j2 = relu.get_indices()\n@relu.iteration_logic\ndef _():\n    C[i2, j2] = acc.max(C[i2, j2], 0.0)\n\npackage = acc.Package()\n\n# fuse the i and j indices of matmul and relu, add to the package\nschedule = acc.fuse(matmul.create_schedule(), relu.create_schedule(), partial=2)\npackage.add(schedule, args=(A, B, C), base_name=\"matmul_relu_fusion_naive\")\n\n# transform the schedule, add to the package\ni, j, f, k = schedule.get_indices()\nii, jj = schedule.tile({\n    i: 16,\n    j: 16\n}) # loop tiling\nschedule.reorder(j, i, f, k, jj, ii) # loop reordering\nplan = schedule.create_plan()\nplan.unroll(ii) # loop unrolling\npackage.add(plan, args=(A, B, C), base_name=\"matmul_relu_fusion_transformed\")\n\n# build a dynamically-linked package (a .dll or .so) that exports both functions\nprint(package.build(name=\"hello_accera\", format=acc.Package.Format.HAT_DYNAMIC))\n
  2. Ensure that you have a compiler in your PATH:

    • Windows: Install Microsoft Visual Studio and run vcvars64.bat to set up the command prompt.
    • Linux/macOS: Install gcc

    Don't have a compiler handy? We recommend trying Accera in your browser instead.

  3. Install Accera:

    pip install accera\n
  4. Generate the library that implements two versions of matmul + ReLU:

    python quickstart.py\n
  5. To consume and compare the library functions, create a file called benchmark.py in the same location:

    import hatlib as hat\nimport numpy as np\n\n# load the package\n_, functions = hat.load(\"hello_accera.hat\")\n\n# call one of the functions with test inputs\nA_test = np.random.rand(512, 512).astype(np.float32)\nB_test = np.random.rand(512, 512).astype(np.float32)\nC_test = np.zeros((512, 512)).astype(np.float32)\nC_numpy = np.maximum(C_test + A_test @ B_test, 0.0)\n\nmatmul_relu = functions[\"matmul_relu_fusion_transformed\"]\nmatmul_relu(A_test, B_test, C_test)\n\n# check correctness\nnp.testing.assert_allclose(C_test, C_numpy, atol=1e-3)\n\n# benchmark all functions\nhat.run_benchmark(\"hello_accera.hat\", batch_size=5, min_time_in_sec=5)\n
  6. Run the benchmark to get the execution time results:

    python benchmark.py\n
"},{"location":"#next-steps","title":"Next Steps","text":"

The Manual is the best introductory resource for the Accera Python programming model.

In particular, the schedule transformations describe how you can experiment with different loop transformations with just a few lines of Python code.
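For example, here is a minimal sketch of what such transformations look like, assuming a schedule created from a nest with indices i, j, and k (the block size of 8 is illustrative; split and reorder are described in the Manual's schedules section):

jj = schedule.split(j, 8)        # split loop j into blocks of 8, introducing a new inner index jj
schedule.reorder(i, j, k, jj)    # choose a new order over all of the schedule's indices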

Finally, the .hat format is just a C header file containing metadata. Learn more about the HAT format and benchmarking.

"},{"location":"#how-it-works","title":"How it works","text":"

In a nutshell, Accera takes the Python code that defines the loop schedule and algorithm and converts it into MLIR intermediate representation (IR). Accera's compiler then takes this IR through a series of MLIR pipelines to perform transformations. The result is a binary library with a C header file. The library implements the algorithms defined in Python and is compatible with the specified target.

To peek into the stages of IR transformation that Accera does, try replacing format=acc.Package.Format.HAT_DYNAMIC with format=acc.Package.Format.MLIR_DYNAMIC in quickstart.py, re-run the script, and search the _tmp subfolder for the intermediate *.mlir files. We plan to document these IR constructs in the future.
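For example, the quickstart's build line would become the following (the rest of quickstart.py stays the same):

# emit the package in MLIR form so the intermediate *.mlir files are kept under the _tmp subfolder
print(package.build(name=\"hello_accera\", format=acc.Package.Format.MLIR_DYNAMIC))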

"},{"location":"#documentation","title":"Documentation","text":"

Get familiar with Accera's concepts and Python constructs in the Documentation page.

"},{"location":"#tutorials","title":"Tutorials","text":"

Step-by-step examples are available on the Tutorials page. We're working on adding more complementary examples and tutorials.

"},{"location":"#contributions","title":"Contributions","text":"

Accera is a research platform-in-progress that can certainly benefit from your contributions. We would love your feedback, recommendations, and feature requests, and we are excited to answer your questions. Let's collaborate! Please file a GitHub issue or send us a pull request. Please review the Microsoft Code of Conduct to learn more.

"},{"location":"#credits","title":"Credits","text":"

Accera is built using several open-source libraries, including LLVM, pybind11, toml++, tomlkit, vcpkg, pyyaml, and HAT. For testing, we use numpy and catch2.

"},{"location":"#license","title":"License","text":"

This project is released under the MIT License.

"},{"location":"Case%20Studies/","title":"Accera Case Studies","text":"

Accera case studies are community-provided samples that showcase the Accera language and programming model. To contribute a case study of your own, follow these instructions.

"},{"location":"Case%20Studies/#matmul-grid-search","title":"MatMul Grid Search","text":"
  • MatMul Grid Search
"},{"location":"Case%20Studies/#convolution","title":"Convolution","text":"
  • NCHWc 2D Convolution Grid Search

  • Unrolled 2D Convolution Grid Search

"},{"location":"Case%20Studies/#three-matrix-multiplication","title":"Three Matrix Multiplication","text":"
  • (Coming soon)
"},{"location":"Case%20Studies/CONTRIBUTING/","title":"Contributing Guide","text":"

Thank you for investing your time contributing a community case study!

In this guide, you will get an overview of the contribution workflow.

"},{"location":"Case%20Studies/CONTRIBUTING/#getting-started","title":"Getting started","text":"

Read our Code of Conduct to keep our community approachable and respectable.

Refer to the Manual and Tutorials to familiarize yourself with the Accera language and programming model.

"},{"location":"Case%20Studies/CONTRIBUTING/#components-of-a-good-case-study","title":"Components of a good case study","text":"

A good case study should have these components and characteristics:

  1. Solves one specific task, such as matrix multiplication, matrix convolution, vector addition. If you have a series of tasks to solve, break them up into multiple case studies that reference one another.

  2. Includes working Accera Python code implementing that task. At the end of the case study, the code should produce a HAT package using accera.Package.build().

  3. Describes the thought process, considerations, pros and cons of your implementation in a README.md.

  4. If the case study generates several implementations (for example, using Parameter Grids), include the following:

    • Benchmark results on a target machine (for example, your laptop). You can run hatlib.run_benchmark on your HAT package.
    • A description of the make and model of that target machine you used (for example, Intel Xeon E5). If you are unsure, you can use the output of this command:

    python -m cpuinfo\n

For some examples, refer to the published case studies in the Table of Contents.

"},{"location":"Case%20Studies/CONTRIBUTING/#publishing-your-case-study","title":"Publishing your case study","text":"

All community case studies are published directly from the author's GitHub repository and linked to from the Accera GitHub repository.

Once you are ready to publish your case study:

  1. Make your case study GitHub repository public (if you haven't done so already).

  2. Edit Case Studies/README.md to add your case study to the Table of Contents. The link should point to the git SHA for your latest commit. The format to use is: https://github.com/user/repo/blob/git_sha/path_to_case_study/README.md.

  3. Create a Pull Request to submit your edits to Case Studies/README.md.

"},{"location":"Install/","title":"Install from PyPI","text":"

The quickest way to get up and running is to install the pre-built Python packages:

  • MacOS
  • Ubuntu
  • Windows
"},{"location":"Install/#build-and-install","title":"Build and Install","text":"

You can also build and install the latest version of Accera by following these instructions:

  • MacOS
  • Ubuntu
  • Windows
"},{"location":"Install/Building_on_MacOS/","title":"Building on MacOS","text":""},{"location":"Install/Building_on_MacOS/#installing-on-macos","title":"Installing on MacOS","text":""},{"location":"Install/Building_on_MacOS/#install-dependencies","title":"Install Dependencies","text":"

Accera requires the following tools and libraries:

  • A C++ compiler that supports C++ 17, such as clang, which is bundled in XCode
  • CMake 3.14 or newer
  • Python 3.7 or newer
  • Ninja
  • Ccache
  • LLVM OpenMP 5, if using parallelization

Homebrew is a package manager that makes it easy to install the prerequisites. Homebrew can be downloaded and installed by:

/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n

If you already have Homebrew installed, update it to the latest version by typing:

brew update\n

Install the dependencies:

Intel MacOS: brew install cmake python ninja-build ccache libomp pkg-config

Apple Silicon: brew install cmake python ninja ccache libomp pkg-config
"},{"location":"Install/Building_on_MacOS/#clang","title":"Clang","text":"

Select the clang compiler from XCode:

xcode-select --install\n
"},{"location":"Install/Building_on_MacOS/#clone-accera","title":"Clone Accera","text":"

A version of git should already be included in XCode.

Clone the git repository:

git clone --recurse-submodules https://github.com/microsoft/Accera\n
"},{"location":"Install/Building_on_MacOS/#build-and-install-accera","title":"Build and install Accera","text":"

Run the build.sh script to install dependencies and build the Accera Python package (replace <path_to_accera> with the path to the cloned Accera repository).

cd <path_to_accera>\nsh ./build.sh\n

Update or install the resulting .whl file from the dist subdirectory. The name depends on your Python version, your OS, and your CPU architecture.

pip install -U ./dist/accera-0.0.1-cp37-cp37-macosx_10_15_x86_64.whl --find-links=dist\n

"},{"location":"Install/Building_on_MacOS/#build-and-install-using-cmake","title":"Build and install using CMake","text":"

Accera can also be built using CMake (intended for expert users).

"},{"location":"Install/Building_on_MacOS/#install-dependencies_1","title":"Install dependencies","text":"
cd <path_to_accera>\ngit submodule init\ngit submodule update\n./external/vcpkg/bootstrap-vcpkg.sh\n./external/vcpkg/vcpkg install catch2 tomlplusplus accera-llvm --overlay-ports=external/llvm\n

The last command typically takes a few hours to build and then install Accera's fork of LLVM. We recommend reserving at least 20GB of disk space for the LLVM build.

"},{"location":"Install/Building_on_MacOS/#configure-cmake","title":"Configure CMake","text":"
cd <path_to_accera>\nmkdir build\ncd build\n\ncmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja\n
"},{"location":"Install/Building_on_MacOS/#build-and-run-tests","title":"Build and run tests","text":"
cmake --build . --config Release\nctest -C Release\n
"},{"location":"Install/Building_on_MacOS/#install","title":"Install","text":"
cmake --build . --config Release --target install\n
"},{"location":"Install/Building_on_Ubuntu/","title":"Building on Ubuntu","text":""},{"location":"Install/Building_on_Ubuntu/#installing-on-ubuntu","title":"Installing on Ubuntu","text":""},{"location":"Install/Building_on_Ubuntu/#quickstart","title":"Quickstart","text":"

If you have access to Codespaces, you can launch a Linux VM in the browser or in Visual Studio Code with all the pre-requisites installed:

  1. Go to https://github.com/microsoft/Accera, use the \"<> Code\" drop-down menu, and in the Codespaces tab, click Create codespace on main.
  2. sh build.sh

Step 2 will take some time to build Accera's LLVM fork. Grab a coffee and come back in about an hour or so.

"},{"location":"Install/Building_on_Ubuntu/#build-script","title":"Build Script","text":"

If you do not have access to Codespaces or prefer to build locally, you can use the build.sh script to build Accera.

"},{"location":"Install/Building_on_Ubuntu/#install-dependencies","title":"Install Dependencies","text":"

Accera requires the following tools and libraries:

  • A C++ compiler that supports C++ 17, such as GCC 8
  • CMake 3.14 or newer
  • Python 3.7 or newer
  • Ninja
  • Ccache
  • LLVM OpenMP 5, if using parallelization
sudo apt update\nsudo apt-get install gcc-8 g++-8 cmake python3 python3-pip ninja-build ccache libomp-11-dev pkg-config zip\n

Some Ubuntu distributions install an older version of CMake. Check the version of cmake using cmake --version, and download a newer version if older than 3.14.

"},{"location":"Install/Building_on_Ubuntu/#clone-accera","title":"Clone Accera","text":"

Install git if you don't already have it:

sudo apt-get install git\n

Clone the git repository

git clone --recurse-submodules https://github.com/microsoft/Accera\n
"},{"location":"Install/Building_on_Ubuntu/#build-and-install-accera","title":"Build and install Accera","text":"

Run the build.sh script to install dependencies and build the Accera Python package (replace <path_to_accera> with the path to the cloned Accera repository).

cd <path_to_accera>\nsh ./build.sh\n

Update or install the resulting .whl files from the dist subdirectory. The --find-links option tells pip to look at the dist subdirectory for the dependent packages. The name depends on your Python version, your OS and your CPU architecture.

pip install -U ./dist/accera-0.0.1-cp37-cp37m-linux_x86_64.whl --find-links=dist\n

"},{"location":"Install/Building_on_Ubuntu/#cmake-builds","title":"CMake Builds","text":"

Accera can also be built using CMake (intended for expert users).

"},{"location":"Install/Building_on_Ubuntu/#install-dependencies_1","title":"Install dependencies","text":"
cd <path_to_accera>\ngit submodule init\ngit submodule update\n./external/vcpkg/bootstrap-vcpkg.sh\n./external/vcpkg/vcpkg install catch2 tomlplusplus accera-llvm --overlay-ports=external/llvm\n

The last command typically takes a few hours to build and then install Accera's fork of LLVM. We recommend reserving at least 20GB of disk space for the LLVM build.

"},{"location":"Install/Building_on_Ubuntu/#configure-cmake","title":"Configure CMake","text":"
cd <path_to_accera>\nmkdir build\ncd build\n\ncmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja\n
"},{"location":"Install/Building_on_Ubuntu/#build-and-run-tests","title":"Build and run tests","text":"
cmake --build . --config Release\nctest -C Release\n
"},{"location":"Install/Building_on_Ubuntu/#install","title":"Install","text":"
cmake --build . --config Release --target install\n
"},{"location":"Install/Building_on_Windows/","title":"Building on Windows","text":""},{"location":"Install/Building_on_Windows/#installing-on-windows","title":"Installing on Windows","text":""},{"location":"Install/Building_on_Windows/#install-dependencies","title":"Install Dependencies","text":""},{"location":"Install/Building_on_Windows/#visual-studio","title":"Visual Studio","text":"

Accera requires a C++ compiler that supports C++ 17. You can download Visual Studio 2019 Enterprise Edition or Visual Studio 2022 Community Edition. For VS 2019, install Update 10 or later, which includes the LLVM OpenMP libraries.

Select Desktop Development with C++.

Accera requires Spectre-mitigated libraries:

  1. Go to Individual Components
  2. Type in \"Spectre\" in the search box
  3. Select the latest version of the MSVC libraries, e.g., MSVC v142 - VS 2019 C++ x64/x86 Spectre-mitigated libs (Latest) (your actual version may vary)
"},{"location":"Install/Building_on_Windows/#cmake","title":"CMake","text":"

Accera requires CMake 3.14 or newer. A version of CMake that satisfies this requirement is included with Visual Studio 2019 and Visual Studio 2022.

"},{"location":"Install/Building_on_Windows/#python","title":"Python","text":"

Accera's packages require Python 3.7 64-bit or newer, plus a version of pip that supports 64-bit packages (win_amd64). One way to obtain this is to download and install Miniconda. Download \"Miniconda3 Windows 64-bit\".

"},{"location":"Install/Building_on_Windows/#optional-create-a-conda-environment","title":"Optional: Create a conda environment","text":"

After installing Miniconda, you can optionally create an environment to manage different Python versions.

From an \"Anaconda Prompt\", create and then activate an environment for Python 3.7 (or a newer version if you prefer). Make sure to activate the environment from any other applications that you use to develop Accera.

conda create -n py37 python=3.7\nconda activate py37\n
"},{"location":"Install/Building_on_Windows/#clone-accera","title":"Clone Accera","text":"

Visual Studio 2019 and 2022 include a version of git. To use it, launch Visual Studio 2019 or 2022, and select Clone a repository.

Repository location:

https://github.com/microsoft/Accera\n
"},{"location":"Install/Building_on_Windows/#build-and-install-accera","title":"Build and install Accera","text":"

From a command line with Python in your PATH, such as an Anaconda Command Prompt, set up the Visual Studio command-line environment (vcvars64.bat) and then run build.bat to generate the Accera Python packages.

For Visual Studio 2022:

\"%ProgramFiles%\\Microsoft Visual Studio\\2022\\Community\\VC\\Auxiliary\\Build\\vcvars64.bat\"\n

For Visual Studio 2019:

\"%ProgramFiles(x86)%\\Microsoft Visual Studio\\2019\\Community\\VC\\Auxiliary\\Build\\vcvars64.bat\"\n

cd <path_to_accera>\nbuild.bat\n

Replace <path_to_accera> with the path to the cloned Accera repository.

Update or install the resulting .whl file from the dist subdirectory. The --find-links option tells pip to look at the dist subdirectory for the dependent packages. The whl filename depends on your Python version, your OS, and your CPU architecture.

pip install -U dist\\accera-0.0.1-cp37-cp37m-win_amd64.whl --find-links=dist\n
"},{"location":"Install/Building_on_Windows/#build-and-install-using-cmake","title":"Build and install using CMake","text":"

Accera can also be built using CMake (intended for expert users).

"},{"location":"Install/Building_on_Windows/#install-dependencies_1","title":"Install dependencies","text":"
cd <path_to_accera>\ngit submodule init\ngit submodule update\nexternal\\vcpkg\\bootstrap-vcpkg.bat\nexternal\\vcpkg\\vcpkg install catch2:x64-windows tomlplusplus:x64-windows accera-llvm:x64-windows --overlay-ports=external\\llvm\n

The last command typically takes a few hours to build and then install Accera's fork of LLVM. We recommend reserving at least 20GB of disk space for the LLVM build.

"},{"location":"Install/Building_on_Windows/#configure-cmake","title":"Configure CMake","text":"
cd <path_to_accera>\nmkdir build\ncd build\n\n# For Visual Studio 2019:\ncmake .. -DCMAKE_BUILD_TYPE=Release -G\"Visual Studio 16 2019\" -Ax64\n\n# For Visual Studio 2022:\ncmake .. -DCMAKE_BUILD_TYPE=Release -G\"Visual Studio 17 2022\" -Ax64\n
"},{"location":"Install/Building_on_Windows/#build-and-run-tests","title":"Build and run tests","text":"
cmake --build . --config Release -- /m\nctest -C Release\n
"},{"location":"Install/Building_on_Windows/#install","title":"Install","text":"
cmake --build . --config Release --target install -- /m\n
"},{"location":"Install/Installing_Accera_on_MacOS/","title":"Installing Accera on MacOS","text":""},{"location":"Install/Installing_Accera_on_MacOS/#installing-on-macos","title":"Installing on MacOS","text":""},{"location":"Install/Installing_Accera_on_MacOS/#install-dependencies","title":"Install dependencies","text":"

Accera requires the following tools and libraries for building the generated code:

  • A C++ compiler, such as clang, which is bundled in XCode
  • Python 3.7 or newer
  • OpenMP 5, if using parallelization

Homebrew is a package manager that makes it easy to install the prerequisites. Homebrew can be downloaded and installed by:

/usr/bin/ruby -e \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)\"\n

If you already have Homebrew installed, update it to the latest version by typing:

brew update\n

Install the dependencies:

brew install cmake python@3.7\n

Install the optional dependency if using parallelization:

brew install libomp\n
"},{"location":"Install/Installing_Accera_on_MacOS/#clang","title":"Clang","text":"

Select the clang compiler from XCode:

xcode-select --install\n
"},{"location":"Install/Installing_Accera_on_MacOS/#install-accera","title":"Install Accera","text":"

The accera Python package can be installed from PyPI:

pip install accera\n
"},{"location":"Install/Installing_Accera_on_Ubuntu/","title":"Installing Accera on Ubuntu","text":""},{"location":"Install/Installing_Accera_on_Ubuntu/#installing-on-ubuntu","title":"Installing on Ubuntu","text":""},{"location":"Install/Installing_Accera_on_Ubuntu/#install-dependencies","title":"Install dependencies","text":"

Accera requires the following tools and libraries for building the generated code:

  • A C++ compiler, such as GCC 8
  • Python 3.7 or newer
  • OpenMP 5, if using parallelization

Ubuntu 20.04 is recommended. A quick way to start is to use a new Docker container for Ubuntu 20.04:

docker run -v $PWD:/code -it --entrypoint \"/bin/bash\" ubuntu:focal\n

Install Accera's dependencies:

apt update\napt-get install gcc-8 g++-8 python3 python3-pip libncurses5\n

Install the optional dependency if using parallelization:

apt-get install libomp-11-dev\n
"},{"location":"Install/Installing_Accera_on_Ubuntu/#install-accera","title":"Install Accera","text":"

The accera Python package can be installed from PyPI:

pip install accera\n
"},{"location":"Install/Installing_Accera_on_Windows/","title":"Installing Accera on Windows","text":""},{"location":"Install/Installing_Accera_on_Windows/#installing-on-windows","title":"Installing on Windows","text":""},{"location":"Install/Installing_Accera_on_Windows/#install-dependencies","title":"Install dependencies","text":""},{"location":"Install/Installing_Accera_on_Windows/#visual-studio","title":"Visual Studio","text":"

Accera's generated code requires a C++ compiler. Download Visual Studio 2019 Enterprise Edition or Visual Studio 2022 Community Edition, and select Desktop development with C++ during installation.

If you've selected VS 2019 and would like to use parallelization, ensure that Update 10 or later is installed. Both VS 2019 Update 10 or later and VS 2022 include the LLVM OpenMP libraries.

"},{"location":"Install/Installing_Accera_on_Windows/#python","title":"Python","text":"

Accera's packages require Python 3.7 64-bit or newer, plus a version of pip that supports 64-bit packages (win_amd64). One way to obtain this is to download and install Miniconda. Download \"Miniconda3 Windows 64-bit\".

"},{"location":"Install/Installing_Accera_on_Windows/#optional-create-a-conda-environment","title":"Optional: Create a conda environment","text":"

After installing Miniconda, you can optionally create an environment to manage different Python versions.

From an \"Anaconda Prompt\", create and then activate an environment for Python 3.7 (or a newer version if you prefer):

conda create -n py37 python=3.7\nconda activate py37\n
"},{"location":"Install/Installing_Accera_on_Windows/#install-accera","title":"Install Accera","text":"

The accera Python package can be installed from PyPI:

pip install accera\n
"},{"location":"Manual/","title":"Accera v1.2 Manual","text":"
  • Introduction
  • Arrays and Scalars
  • Simple Affine Loop Nests
  • Schedules
  • Fusing
  • Targets
  • Plans - Caching
  • Plans - Vectorization and Parallelization
  • Deferred layout of constant arrays
  • Parameters
  • Packages
"},{"location":"Manual/00%20Introduction/","title":"Introduction","text":"

Accera is a framework with a Python-based embedded Domain-Specific Language (eDSL) that produces optimized compute-intensive code. Accera's primary focus is the optimization of affine and semi-affine nested for-loops for CPU and GPU targets.

Optimizing compute-intensive code in a traditional programming language is challenging and time-consuming: manual optimization of even the simplest numerical algorithms demands significant engineering effort, an advanced understanding of computer architecture, and fluency in C++, C, or assembly language. Even with all this effort, the implemented code is prone to critical bugs and is expensive to maintain. Accera aims to resolve these issues by producing implementations of compute-intensive algorithms that are highly efficient, readable, and maintainable.

Accera has THREE primary goals:

  • Performance: To generate the fastest implementation for any compute-intensive algorithm.
  • Readability: To ensure effective implementation of algorithms without sacrificing the readability of code.
  • Writability: To provide a user-friendly programming model designed for agility and maintainability.

Accera is designed based on the following guiding principles:

"},{"location":"Manual/00%20Introduction/#1-strict-separation-of-logic-from-implementation","title":"1: Strict separation of logic from implementation","text":"

Traditional programming languages are prone to the tight coupling of code logic (what the program does) with its implementation (how the program is implemented). Consider an example of multiplying a 16\u00d711 matrix A by an 11\u00d710 matrix B. The algorithm's logic calculates the sum over k of A[i,k]\u00b7B[k,j] for each value of i and j. In Python, this logic can be expressed as:

# C += A @ B\nfor i in range(16):\n    for j in range(10):\n        for k in range(11):\n            C[i, j] += A[i, k] * B[k, j]\n
The above code expresses more than just the logic of matrix multiplication. It insists on a specific execution flow: first perform all the steps required to calculate C(0,0) in ascending order of k; then proceed to C(0,1). However, in principle, a single order of execution should not be imposed because the iterations of this loop can be performed in any order while keeping the logic intact. Moreover, the above logic doesn't utilize important optimization techniques, such as double-buffered caching or vectorization.

Accera, on the other hand, provides a strict distinction between logic and its implementation. The programmer first implements the logic without performance considerations using a pseudocode-like syntax independent of the target platform. Once the logic is specified, only then does the programmer move to define the concrete implementation details.
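For illustration, here is a sketch of how the same 16×11 by 11×10 multiplication logic might be expressed in Accera, using the Nest and iteration_logic constructs from Section 2, with every implementation detail deferred to a later stage:

import accera as acc

A = acc.Array(role=acc.Role.INPUT, shape=(16, 11))
B = acc.Array(role=acc.Role.INPUT, shape=(11, 10))
C = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(16, 10))

# logical loops i, j, k; no execution order is implied yet
nest = acc.Nest(shape=(16, 10, 11))
i, j, k = nest.get_indices()

@nest.iteration_logic
def _():
    C[i, j] += A[i, k] * B[k, j]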

"},{"location":"Manual/00%20Introduction/#2-mindfully-trade-off-safety-versus-expressivity","title":"2: Mindfully trade-off safety versus expressivity","text":"

Accera offers a programming model where a default implementation of the specified logic can be transformed and manipulated in different ways. If used correctly, these transformations are safe, which means that the underlying logic remains intact. This allows the programmer to entirely focus on the performance of the logic without worrying about its correctness. Moreover, these safe transformations allow automatic search algorithms to aggressively search the space of transformations to converge faster and find better optima.

Traditionally, this safety is achieved by trading off the true potential of a programming language since it demands restricting its scope. Nevertheless, extensive constraints significantly restrict the expressivity and the power of the programming language, eventually preventing the end-users from developing highly-optimized and sophisticated implementations.

Accera moderates this trade-off between safety and expressivity by explicitly defining what level of safety guarantees are being given by each transformation under different circumstances. Some situations are safer than others. However, the programmer knows exactly what safeties are being guaranteed in all cases.

"},{"location":"Manual/00%20Introduction/#3-the-programmer-is-in-control","title":"3: The programmer is in control","text":"

Accera gives the programmer maximum control over the generated code by providing access to the underlying knobs that determine how algorithms are optimized. Convenience methods and carefully chosen default values keep this from becoming verbose. Depending on the use case, these helper methods can always be tuned or even overridden.

"},{"location":"Manual/01%20Arrays%20and%20Scalars/","title":"Section 1: Arrays and Scalars","text":""},{"location":"Manual/01%20Arrays%20and%20Scalars/#arrays","title":"Arrays","text":"

Accera stores data in multi-dimensional arrays of scalar elements where all the array elements share the same primary data type (e.g., float32, int8). An array has a constant number of dimensions d known at compile-time (e.g., a matrix is a 2-dimensional array). Each dimension has a positive size, and the sequence of d sizes is called the shape of the array. An element of an array is referred to by a d-coordinate zero-based index vector.

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#affine-memory-layout","title":"Affine memory layout","text":"

Arrays are multi-dimensional, while computer memories have a linear (one-dimensional) address space. There are many strategies to represent a multi-dimensional array in one-dimensional computer memory. Accera arrays must have an affine memory layout, where each array has an affine memory map that is a d-dimensional vector denoted by a and a memory offset value denoted by o. The array element that corresponds to the index vector i is stored at memory address i·a + o (where i·a denotes a vector dot product).

Affine memory maps are rich enough to represent many standard array layouts. For example, affine maps can represent 2-dimensional arrays (matrices) laid out as row-major, column-major, triangular, banded, and Toeplitz matrices. However, affine maps cannot represent z-ordering, or striped or blocked layouts.
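As a concrete illustration of the address arithmetic (plain Python, not Accera code): a row-major 3×4 matrix has the memory map a = [4, 1], so with offset o = 0 the element at index (i, j) lives at address 4·i + j:

# plain Python, for illustration only: affine address computation i·a + o
a = [4, 1]   # memory map of a row-major 3x4 matrix
o = 0        # memory offset
def address(index):
    return sum(i * s for i, s in zip(index, a)) + o

assert address((0, 0)) == 0
assert address((2, 3)) == 11   # the last element of the 3x4 matrix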

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#array-shape","title":"Array shape","text":"

In an affine memory map, each array dimension corresponds to one element of the map; the dimension whose map element has the largest absolute value is called the major dimension. The user must specify all dimension sizes except for the major dimension when constructing an Array. If the major dimension size is not specified, Accera assumes that it is arbitrary (or infinite). In other words, the iterations of the loops determine how much of the array is visited along this dimension.

For example, a row-major matrix must have a compile-time-constant number of columns. However, the number of rows can be left undefined, and the loops' sizes control how many rows are processed.

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#compile-time-and-runtime-dimension-sizes","title":"Compile-time and runtime dimension sizes","text":"

The number of dimensions of an Accera array is known at compile-time. However, the user can choose to specify the sizes of each dimension at compile-time or at runtime. Runtime dimension sizes are only resolved at runtime, typically as inputs to an Accera function.

For example, a function that implements generalized matrix multiply can receive the M, N, K dimension sizes as inputs along with the M × N, M × K, and K × N Arrays.

Furthermore, an Array can have a mixture of compile-time and runtime dimension sizes.
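A minimal sketch of such arrays, using acc.create_dimensions as shown in Section 2 (the names here are illustrative):

import accera as acc

# runtime dimension sizes, later passed as arguments to the Accera function
M, N, K = acc.create_dimensions()
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
B = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))
C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))

# a mixture of runtime and compile-time sizes in one array is also allowed
D = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, 512))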

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#default-and-inferred-memory-layout","title":"Default and inferred memory layout","text":"

Although the user can explicitly specify the memory map, Accera offers some conveniences. The user can set the layout as FIRST_MAJOR (e.g., for two-dimensional arrays, first-major is equivalent to row-major) or LAST_MAJOR. In both cases, the affine map is inferred from the array shape. Specifically, if the layout is LAST_MAJOR and the shape is denoted by the vector s, then the map a is set to [1, s0, s0×s1, s0×s1×s2, ...]. If the layout is FIRST_MAJOR and the number of dimensions equals 4, then a is set to [s1×s2×s3, s2×s3, s3, 1]. In both cases, the size of the major dimension is not used in the definition of a, which is why the major dimension size does not need to be specified. If no layout is specified, the default layout is FIRST_MAJOR.
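A sketch of selecting the layout explicitly (this assumes a layout keyword argument on the acc.Array constructor; if none is given, FIRST_MAJOR is used):

import accera as acc

# assumed: the layout is chosen via a `layout` keyword argument
A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20), layout=acc.Array.Layout.LAST_MAJOR)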

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#array-properties","title":"Array properties","text":"

Accera arrays are defined with either internal scope or external scope. An internal array is a private array that exists inside a specific Accera function only and cannot be accessed outside of that function. An external array is defined outside of an Accera function and passed in as an argument. The memory layout of an external array is specified as a part of the Accera function signature. Moreover, external arrays are assumed to be disjoint, i.e., they do not share any memory.

Accera arrays are either mutable or immutable. The elements of a mutable array can be set by an Accera function, while an immutable array is read-only.

Array properties are not explicitly set by the user but are implied by the role of the array (see below).

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#array-roles","title":"Array roles","text":"

Accera supports the following five array roles, where each role is treated differently.

  • Input
  • Input/Output
  • Output
  • Constant
  • Temporary
"},{"location":"Manual/01%20Arrays%20and%20Scalars/#input-arrays","title":"Input arrays","text":"

Input arrays are immutable external arrays whose element type, shape, and affine layout can be known at compile-time. However, their contents are only available at runtime. If the Accera function is emitted as a function in C, each input array is passed as a const pointer argument. For example, we can construct a 10\u00d720 input array of 32-bit floating-point numbers by writing

import accera as acc\n\nA = acc.Array(shape=(10, 20), role=acc.Role.INPUT, element_type=acc.ScalarType.float32)\n
The layout of this array would be the default layout, which is acc.Array.Layout.FIRST_MAJOR.

The shape (and similarly, the layout) of Input arrays can also be set at runtime:

N = acc.create_dimensions()\nA = acc.Array(shape=(N, 20), role=acc.Role.INPUT, element_type=acc.ScalarType.float32)\n
"},{"location":"Manual/01%20Arrays%20and%20Scalars/#inputoutput-arrays","title":"Input/output arrays","text":"

Input/Output arrays are similar to input arrays except that they are mutable external arrays, i.e., their values can be changed. This type of array is used to output the results of the loop-nest computation. If the Accera function is emitted as a function in C, each input/output array is passed as a non-const pointer argument.

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#output-arrays","title":"Output arrays","text":"

Output arrays are variable-shaped mutable external arrays whose shapes and affine layout are known at runtime. The key differences from Input/Output arrays are:

  • Output arrays are dynamically allocated at runtime. The caller of an Accera function that uses Output arrays will need to implement the __accera_allocate function to allocate memory (and also perform the subsequent deallocation).
  • Output arrays are uninitialized by default. Accera will produce an error if operators such as += are used on an Output array without prior initialization through assignment.
  • For simplicity, output dimensions (acc.Role.OUTPUT) must be used for specifying an Output array shape or layout (this limitation may be lifted in the future).

Output arrays are useful for operations that adjust the array shape depending on the input values. For example, the Range operation generates variable output sizes based on the start, end, and step inputs:

import accera as acc\n\n# inputs\nStart = acc.Scalar()\nEnd = acc.Scalar()\nStep = acc.Scalar()\n\n# compute the variable output size\nN = acc.create_dimensions(role=acc.Role.OUTPUT)\nN.value = acc.floor((End - Start) / Step)\n\n# create an Output array with the variable output size\nA = acc.Array(shape=(N, ), role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32)\n

The layout of this array is the default layout, which is acc.Array.Layout.FIRST_MAJOR.

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#constant-arrays","title":"Constant arrays","text":"

These are the only Accera arrays whose contents are known at compile-time. Constant arrays are immutable internal arrays whose memory layout can be chosen automatically without any external constraints since they are internally scoped. For example, a constant array can be automatically laid out according to the loop nest's memory access pattern. The layout of a constant array could even depend on its contents (e.g., its sparsity pattern). The dimension sizes of a constant array must be known at compile-time.

We must provide the constant array data (the element values) when constructing it. This data can be any Python buffer or a numpy array:

import accera as acc\nimport numpy as np\n\nmatrix = np.random.rand(16, 16)\nB = acc.Array(role=acc.Role.CONST, data=matrix)\n

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#temporary-arrays","title":"Temporary arrays","text":"

Temporary arrays are mutable internal arrays that are used when two Accera schedules are fused into one (more on fusing in Section 4). The elements of a temporary array are initialized to zeros and used to store intermediate values. Similar to constant arrays, temporary arrays can be laid out arbitrarily. In fact, the Accera compiler can even choose not to store them in physical memory at all.

"},{"location":"Manual/01%20Arrays%20and%20Scalars/#scalars","title":"Scalars","text":"

A scalar represents a single number whose value is mutable and set at runtime. Scalars are useful as input arguments to functions or when computing a single-valued numeric result.

Section 2 lists the operations that can be performed on scalars.
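As a minimal sketch (this assumes that a Scalar can be passed in args alongside arrays, and the base_name is hypothetical):

import accera as acc

alpha = acc.Scalar()   # a runtime scalar input
C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))

nest = acc.Nest(shape=(16, 16))
i, j = nest.get_indices()

@nest.iteration_logic
def _():
    C[i, j] *= alpha   # scale every element by the runtime scalar value

package = acc.Package()
package.add(nest, args=(alpha, C), base_name=\"scale_by_alpha\")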

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/","title":"Section 2: Simple affine loop nests","text":"

This section introduces loop nests and the different types of loop nests provided in the Accera programming model.

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#affine-loop-nests","title":"Affine loop nests","text":"

Many important compute-intensive workloads can be expressed using nested for-loops. An algorithm that can be defined using nested for-loops is called a loop nest. Accera only supports the class of affine loop nests. A loop nest is affine if the indices of the elements accessed on each iteration are an affine function of the loop iterator variables. For example, the following loop nest is affine:

for i in range(M):\n    for j in range(N):\n        C[2*i+2, j+2] += A[3*i, j] + B[j, i]\n
because 2*i+2, j+2, 3*i, j and i are all affine functions of the iterator variables i and j.

On the other hand, the following loop nest is not affine:

for i in range(M):\n    for j in range(N):\n        C[i*i, j] += A[i*i, j] + B[i*j, i]\n
because i*i and i*j are quadratic (non-affine) functions of i and j.

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#simple-affine-loops-nests-aka-simple-nests","title":"Simple affine loops nests, a.k.a. simple nests","text":"

Simple Affine Loop Nests, hereinafter referred to as simple nests, form an important subclass of affine loop nests that satisfies the following properties:
  1. The loops are perfectly nested: all the computation is entirely contained within the deepest loop.
  2. All the loops are normalized: each loop starts at 0, increments by 1, and ends at a compile-time constant size.
  3. The loop iterations are order invariant: the logic doesn't change if the loop iterations are executed in a different sequential order.
  4. No conditional exit: the loop doesn't contain break or continue commands.

The matrix-matrix multiplication example given in the introduction is an example of a simple nest. Another example is 2-dimensional convolution, which is the fundamental operation in convolutional neural networks, and can be written in Python as:

# Convolve M x N data matrix A with S x T filter matrix B and add output to matrix C\nfor i in range(M):\n    for j in range(N):\n        for k in range(S):\n            for l in range(T):\n                C[i, j] += A[i + k, j + l] * B[k, l]\n

While Accera supports arbitrary affine loop nests, the programmer defines the logic of their algorithms using simple nests. More complex nests are obtained by applying schedule transformations (see Section 3) or by fusing multiple schedules (see Section 4).

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#defining-the-loop-nest-logic","title":"Defining the loop nest logic","text":"

The programmer's goal is to create a highly optimized target-specific implementation of an affine loop nest. The first step towards this goal is to define the logic of one or more simple nests. The logic is a target-independent pseudo-code of a simple nest, written without considering performance. For example, the following code defines the logic of the matrix-matrix multiplication loop nest:

# Import accera\nimport accera as acc\n\n# Define matrix sizes\nM = 16\nN = 10\nS = 11\n\nA = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, S))\nB = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(S, N))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n\n# Define a simple affine loop nest and name its loops i, j, k\nnest = acc.Nest(shape=(M, N, S))\ni, j, k = nest.get_indices()\n\n# Define the logic of each iteration in the nest\n@nest.iteration_logic\ndef _():\n    C[i,j] += A[i,k] * B[k,j]\n
We start by defining the arrays that participate in the computation: A and B are input arrays and C is an input/output array. Next, we initialize nest to be an empty skeleton of a loop nest, with nested loops of sizes M, N, S. These loops are logical -- think of them as pseudo-code loops -- they do not define the execution order of the iterations. The index variables that correspond to the three loops are named i, j, k respectively.

The last part of the example sets the iteration logic to C[i, j] += A[i, k] * B[k, j]. Note that this iteration logic follows an affine memory access pattern. The syntax in the example makes use of Python decorators and is shorthand for the more explicit syntax:

def logic_fn():\n    C[i, j] += A[i, k] * B[k, j]\n\nnest.iteration_logic(logic_fn)\n

The iteration spaces above have compile-time shapes. We can define runtime shapes by replacing any or all of the constant matrix sizes M, N, and S with an acc.Dimension placeholder:

M = acc.create_dimensions() # replace M with a runtime dimension\nN = 10 # a compile-time dimension\nS = 11\n\nA = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, S))\nB = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(S, N))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n\n# Define a simple affine loop nest and name its loops i, j, k\nnest = acc.Nest(shape=(M, N, S))\n

The iteration space dimensions will now be runtime variables that need to be provided to the function (more on this later).

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#supported-operations","title":"Supported operations","text":"

The iteration logic can include the following operations (assuming accera was imported as acc):

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#assignment-operators","title":"Assignment operators","text":"Operation Types (Operands must be of same type) Description a = b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Assigns the value of scalar b to scalar a"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#arithmetic-operators","title":"Arithmetic operators","text":"Operation Types (Operands must be of same type) Description a + b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the sum of scalars a and b a - b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the difference between scalars a and b a * b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the product of scalars a and b a / b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the quotient of scalars a and b. If the operands are integers, an integer division result is returned a ** b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the b'th power of scalar a a // b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the floor of the quotient of scalars a and b a % b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the signed remainder after dividing scalar a by scalar b -a acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the additive inverse of scalar a

Comment: Accera also supports the corresponding compound-assignment operators, such as a += b, a -= b, etc.

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#relational-operators","title":"Relational operators","text":"Operation Types (Operands must be of same type) Description a == b acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if scalar a equals scalar b, else False a != b acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if scalar a is not equal to scalar b, else False a < b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if scalar a is strictly smaller than scalar b, else False a <= b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if scalar a is smaller than or equal to scalar b, else False a > b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if scalar a is strictly greater than scalar b, else False a >= b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if scalar a is greater than or equal to scalar b, else False"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#logical-operators","title":"Logical operators","text":"Operation Types (Operands must be of same type) Description acc.logical_and(a, b) acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if scalars a and b are non-zero, else False acc.logical_or(a, b) acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if either scalar a or scalar b are non-zero, else False acc.logical_not(a) acc.ScalarType.bool, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if a is zero, else False"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#bitwise-operators","title":"Bitwise operators","text":"Operation Types (Operands must be of same type) Description a & b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns the bitwise AND of the bits in scalars a and b a \\| b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns the bitwise OR of the bits in scalars a and b a ^ b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns the bitwise XOR of the bits in scalars a and b ~a acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns the bitwise inverse of the bits in scalar a a << b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns scalar a whose bitwise representation is shifted left by b bits a >> b acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns scalar a whose bitwise representation is shifted right by b bits

Comment: Accera also supports the corresponding compound-assignment operators, such as a &= b, a |= b, etc.

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#intrinsics","title":"Intrinsics","text":"Operation Types (Operands must be of same type) Description acc.abs(a) acc.ScalarType.float16/32/64 Returns the absolute value of scalar a acc.max(a, b) acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the larger of the two scalars a and b acc.min(a, b) acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the smaller of the two scalars a and b acc.ceil(a) acc.ScalarType.float16/32/64 Returns the value of scalar a rounded up to the nearest integer as an int64 type acc.floor(a) acc.ScalarType.float16/32/64 Returns the value of scalar a rounded down to the nearest integer as an int64 type acc.sqrt(a) acc.ScalarType.float16/32/64 Returns the square root of scalar a acc.exp(a) acc.ScalarType.float16/32/64 Returns the exponential e raised to the scalar a acc.log(a) acc.ScalarType.float16/32/64 Returns the natural logarithm (base e) of scalar a acc.log10(a) acc.ScalarType.float16/32/64 Returns the common logarithm (base 10) of scalar a acc.log2(a) acc.ScalarType.float16/32/64 Returns the binary logarithm (base 2) of scalar a acc.sin(a) acc.ScalarType.float16/32/64 Returns the sine of scalar a, where a is in radians acc.cos(a) acc.ScalarType.float16/32/64 Returns the cosine of scalar a, where a is in radians acc.tan(a) acc.ScalarType.float16/32/64 Returns the tangent of scalar a, where a is in radians acc.sinh(a) acc.ScalarType.float16/32/64 Returns the hyperbolic sine of scalar a, where a is in radians acc.cosh(a) acc.ScalarType.float16/32/64 Returns the hyperbolic cosine of scalar a, where a is in radians acc.tanh(a) acc.ScalarType.float16/32/64 Returns the hyperbolic tangent of scalar a, where a is in radians"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#implicit-type-casting","title":"Implicit type casting","text":"

Accera operators require operands to be the same type. Computations that use multiple types can take advantage of Accera's implicit type casting support when converting from smaller-sized types to larger-sized types.

To do implicit casting, simply assign a source type to its implicitly-castable destination type. No additional casting operation is needed for converting between these types.

Source types Destination type (implicitly-castable) acc.ScalarType.bool, acc.ScalarType.uint8 acc.ScalarType.int8 acc.ScalarType.bool, acc.ScalarType.int8 acc.ScalarType.uint8 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.uint16 acc.ScalarType.int16 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16 acc.ScalarType.uint16 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.uint32 acc.ScalarType.int32 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32 acc.ScalarType.uint32 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32, acc.ScalarType.uint32, acc.ScalarType.uint64 acc.ScalarType.int64 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32, acc.ScalarType.uint32, acc.ScalarType.int64 acc.ScalarType.uint64 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16 acc.ScalarType.float16 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16 acc.ScalarType.bfloat16 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32, acc.ScalarType.uint32, acc.ScalarType.int64, acc.ScalarType.float16, acc.ScalarType.bfloat16 acc.ScalarType.float32 acc.ScalarType.bool, acc.ScalarType.int8, acc.ScalarType.uint8, acc.ScalarType.int16, acc.ScalarType.uint16, acc.ScalarType.int32, acc.ScalarType.uint32, acc.ScalarType.int64, acc.ScalarType.float16, acc.ScalarType.bfloat16, acc.ScalarType.float32 acc.ScalarType.float64

To override the casting behavior above, or cast a larger-sized type to a smaller-sized type, use the acc.cast operation.

Comment: implicit casting of constants may result in truncation.
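A sketch of both behaviors in one iteration logic (this assumes acc.cast takes the value and the destination type):

import accera as acc

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.int8, shape=(16, 16))
B = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))
C = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.int8, shape=(16, 16))

nest = acc.Nest(shape=(16, 16))
i, j = nest.get_indices()

@nest.iteration_logic
def _():
    B[i, j] = A[i, j]                                  # int8 -> float32: implicit widening cast
    C[i, j] = acc.cast(B[i, j], acc.ScalarType.int8)   # float32 -> int8: explicit cast (assumed signature)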

"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#accera-program-stages","title":"Accera program stages","text":"

Let's take a step back to describe the stages of an Accera program:

  • Nest: A nest captures the logic of a simple nest, without any optimizations or implementation details.
  • Schedule: A Nest is used to create a schedule. The schedule controls the order in which the nest iterations are visited. Multiple schedules can be fused into a single schedule, which may no longer represent a simple nest.
  • Plan: A Schedule is used to create a plan. A plan controls the implementation details that are specific for a target platform (e.g., data caching strategy, vectorization, assignment of arrays and caches to different types of memory).
  • Package: A Plan is used to create a function in a function package. The package is then compiled and emitted.

Once a package is emitted, the Accera functions contained in it can be called from external client code. This external code is typically not written using Accera.

Accera currently supports the following package formats:

  • HAT, which is a schematized version of a standard C library. The external client code can be written in C or C++ and linked with the HAT package.
  • MLIR, which uses standard MLIR dialects. The external code must also be in MLIR.

Overall, to build and emit nest (defined above), we would write:

# create a default schedule from the nest\nschedule = nest.create_schedule()\n\n# create a default plan from the schedule\nplan = schedule.create_plan()\n\n# create a HAT package. Create a function in the package based on the plan\npackage = acc.Package()\npackage.add(plan, args=(A, B, C), base_name=\"simple_matmul\")\n\n# build the HAT package\npackage.build(format=acc.Package.Format.HAT_DYNAMIC, name=\"linear_algebra\")\n

It may not be immediately clear why so many stages are needed just to compile a simple nest. Therefore, let\u2019s discuss each stage in detail to understand their importance.

In the example above, the call to package.add takes three arguments: the first is the plan that defines the function's implementation; the second is the order of the input and input/output arrays in the function signature; and the third is a base name for the function. The full name of the function is the base name followed by an automatically-generated unique identifier. For example, the function in the example could appear in the package as simple_matmul_8f24bef5. The automatically-generated suffix ensures that each function in the package has a unique name. More details on function packages can be found in Section 10.

The Array shapes above are known at compile-time. If one or all of the shapes are known at runtime, we provide dimensions as arguments to the function:

M, N, S = acc.create_dimensions() # runtime dimensions\n\nA = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, S))\nB = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(S, N))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n\n...\n\n# create a default schedule from the nest\nschedule = nest.create_schedule()\n\n# create a default plan from the schedule\nplan = schedule.create_plan()\n\n# create a HAT package. Create a function in the package based on the plan, with\n# the dimensions as additional arguments (in any order)\npackage = acc.Package()\npackage.add(plan, args=(M, N, S, A, B, C), base_name=\"simple_matmul_runtime_shapes\")\n\n# build the HAT package\npackage.build(format=acc.Package.Format.HAT_DYNAMIC, name=\"linear_algebra\")\n
"},{"location":"Manual/02%20Simple%20Affine%20Loop%20Nests/#convenience-syntax","title":"Convenience syntax","text":"

For convenience, Accera also provides shortcuts to avoid unnecessary verbosity. Specifically, we can create a function in a package directly from a nest, as follows:

package.add(nest, args=(A, B, C), base_name=\"simple_matmul\")\n
The abbreviated syntax makes it seem like a callable function is generated directly from nest. However, what actually happens behind the scenes is that nest creates a default schedule, which creates a default plan, which is added as a function in the package. Accera has a similar convenience syntax to create a function from a schedule:
package.add(schedule, args=(A, B, C), base_name=\"simple_matmul\")\n
and to create a plan directly from a nest:
plan = nest.create_plan()\n
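To make the equivalence concrete, the following sketch (using the same names as above) shows what the nest-based shortcut expands to behind the scenes:
# the convenience form\npackage.add(nest, args=(A, B, C), base_name=\"simple_matmul\")\n\n# is roughly equivalent to the explicit form\nschedule = nest.create_schedule()\nplan = schedule.create_plan()\npackage.add(plan, args=(A, B, C), base_name=\"simple_matmul\")\n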

"},{"location":"Manual/03%20Schedules/","title":"Section 3: Schedules","text":"

We begin with nest from Section 2 which captures the logic of matrix-matrix multiplication. We use nest to create a Schedule that controls the execution order of the nest's iterations. Schedules are target-independent in the sense that the same schedule can be used to emit code for multiple target platforms.

We create a default schedule as follows:

schedule = nest.create_schedule()\n

The default schedule is equivalent to the following straightforward for-loop version of the loop nest:

for i in range(3):\n    for j in range(12):\n        for k in range(15):\n            C[i, j] += A[i, k] * B[k, j]\n
In other words, each of the logical pseudo-code loops in nest becomes an actual for-loop in the default schedule. The for-loop sizes can be known at compile-time or at runtime.

We can now transform this schedule in various ways. However, these transformations do not change the underlying logic defined in nest and merely change the order of the loop iterations. We can even generate as many independent schedules as we want by calling nest.create_schedule().

"},{"location":"Manual/03%20Schedules/#iteration-spaces-a-geometric-representation-of-schedules","title":"Iteration spaces: a geometric representation of schedules","text":"

In the Accera programming model, a schedule is geometrically interpreted as a multi-dimensional discrete hypercube called the iteration space of the nest. The elements of the iteration space represent the individual iterations of the loop nest. Initially, the dimensions of the iteration space correspond to the logical loops defined in nest.

For example, the default iteration space for the matrix-matrix multiplication nest forms a three-dimensional discrete hypercube, whose shape is (3, 12, 15):

The (3, 12, 15) iteration space. The arrows labelled 1, 2, and 3 indicate the dimension order and direction.

"},{"location":"Manual/03%20Schedules/#how-does-an-iteration-space-imply-an-order-over-the-iterations","title":"How does an iteration space imply an order over the iterations?","text":"

The dimensions of the iteration space are ordered, and this order corresponds to the original order of the logical loops in nest by default. In fact, the order over the dimensions induces a lexicographic sequence over the individual elements of the iteration space.

Video showing sequence of iterations for the (3, 12, 15) iteration space.

This geometric interpretation of schedules helps us visualize how different transformations modify them. While some transformations merely rearrange the elements of the iteration space, others increase its dimensions, and some even pad the space with empty (no-op) elements. The transformed iteration space defines a new lexicographic order over the individual iterations.

Comment: It is important not to confuse arrays, like A, B, C, with iteration spaces, like schedule. A possible source of confusion could be that both arrays and iteration spaces have a multidimensional rectilinear structure (i.e., they both look like hypercubes). However, arrays and iteration spaces are fundamentally different. Arrays are data structures whose elements are scalars. Iteration spaces are abstract geometric representations of schedules and their elements represent individual iterations of a loop nest. Transformations apply to iteration spaces, not to arrays.

Comment: Accera's geometric interpretation of schedules resembles the iteration domain polyhedron, which is the cornerstone of the polyhedral model of compiler optimization. However, unlike polyhedrons, Accera iteration spaces are not embedded in a continuous space and cannot be manipulated by algebraic transformations. Accera iteration spaces always remain rectilinear and are inherently discrete objects.

"},{"location":"Manual/03%20Schedules/#iteration-space-slices","title":"Iteration space slices","text":"

An iteration space slice is an abstract concept that appears in several aspects of the Accera programming model. Since the iteration space dimensions are ordered, each element of the iteration space can be identified by a vector of coordinates. For example, the vector (1, 6, 7) identifies the iteration at position 1 along the first dimension, position 6 along the second dimension, and position 7 along the third dimension. If one or more coordinates are replaced with the wildcard symbol *, we get an iteration space slice, which is a set of iterations obtained by replacing the wildcard with all possible values. For example, (*, *, 2) represents a slice containing all the elements with 2 as their last coordinate. The dimension of a slice equals the number of wildcards in its definition.

The (3, 12, 15) iteration space. Highlighted elements belong to the (*, *, 2) slice.

Iteration space slices in four dimensions, denoted by indices (i, j, jj, k):

(1, *, *, *) (*, *, *, 3) (2, *, 0, *)"},{"location":"Manual/03%20Schedules/#loops-indices-and-dimensions","title":"Loops, indices, and dimensions","text":"

When we defined nest, we used variables such as i, j, and k to name the loops in the loop-nest. When we described the default schedule using equivalent for-loops, i, j, and k became the index variables of those loops. Now, when we represent a schedule as an iteration space, these variables are used as the names of the corresponding iteration space dimensions. From here on, we move seamlessly between these different representations and use the terms loop, index, and dimension interchangeably.

"},{"location":"Manual/03%20Schedules/#schedule-transformations","title":"Schedule transformations","text":"

Iteration space transformations change the shape of the iteration space, possibly by adding dimensions or padding the space with empty elements.

The iteration space always retains its rectilinear (hypercube) shape. In some cases, Accera transformations must pad the iteration space with empty elements to avoid creating a jagged iteration space structure.

"},{"location":"Manual/03%20Schedules/#reorder","title":"reorder","text":"
# Reorder the indices.\nschedule.reorder(k, i, j)\n

The reorder transformation sets the order of indices in the schedule. From the iteration space point-of-view, reorder performs a pivot rotation of the iteration space, which orients its dimensions in a specified order. Since the iteration space elements are executed in lexicographic order, pivoting the iteration space is equivalent to reordering the loops.

For example, we can write:

schedule.reorder(j, k, i)\n
After this transformation, schedule becomes equivalent to:
for j in range(12):\n    for k in range(15):\n        for i in range(3):\n            C[i, j] += A[i, k] * B[k, j]\n

Default schedule, order is (i, j, k) After reorder(j, k, i), order is (j, k, i)"},{"location":"Manual/03%20Schedules/#invalid-orders","title":"Invalid orders","text":"

Some orders are not allowed. Describing these restrictions in full requires concepts that have not been introduced yet, so we state the restrictions here and discuss them in the upcoming sections:
  1. The inner dimension created by a split transformation (see below) must be ordered later than its corresponding outer dimension.
  2. The fusing dimension created by a fuse operation (see Section 4) must always precede any unfused dimensions.
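For example, once the split transformation (described below) has created an inner dimension ii from i, the first restriction rules out any order that places ii before i:
ii = schedule.split(i, 2)\n# schedule.reorder(ii, i, j, k)  # not allowed: the inner dimension ii would precede its outer dimension i\nschedule.reorder(i, j, ii, k)    # allowed: ii comes after its outer dimension i\n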

Also note that reorder can also have the following overloaded form:

schedule.reorder(order=(j, k, i))\n
This form is better suited for use with parameters (see Section 9).

"},{"location":"Manual/03%20Schedules/#split","title":"split","text":"
# Splits dimension i into equally-sized parts, orients those parts along a new dimension ii, and stacks those parts along dimension i\nii = schedule.split(i, size)\n

From the iteration space point-of-view, the split transformation takes a dimension i and a size, modifies i, and creates a new dimension ii. Assume that the original size of dimension i was n: The split transformation splits the dimension i into ceil(n/size) parts of size size, orients each of these parts along dimension ii, and stacks the ceil(n/size) parts along the dimension i. If the split size does not divide the dimension size, empty elements are added such that the split size does divide the dimension size. As a result of the split, the size of i becomes ceil(n/size), the size of the new dimension ii equals size, and the iteration space remains rectilinear.

In loop terms, ii = split(i, size) splits loop i into two loops: an inner loop ii and an outer loop, which inherits the original name i. Note that the outer loop always precedes the corresponding inner loop in the loop ordering.

For example, starting from nest defined in Section 2, we could write:

schedule = nest.create_schedule()\njj = schedule.split(j, 3)\n
The resulting iteration space has a shape of (3, 4, 3, 15) and corresponds to the following python code:
for i in range(3):\n    for j in range(0, 12, 3): # length 4, stride 3\n        for jj in range(3):\n            for k in range(15):\n                C[i, j+jj] += A[i, k] * B[k, j+jj]\n
Note that loop j is no longer normalized (it has a stride of 3 rather than 1), which means that the nest is no longer a simple nest. As mentioned in the previous section, Nest objects always represent simple nests, but Schedule objects can represent more complex affine loop nests.

Default schedule After split(j, 3)

After performing a split, both the outer index and the inner index can be split again. For example,

schedule = nest.create_schedule()\njj = schedule.split(j, 3)\njjj = schedule.split(j, 2) # split the outer index j again\n
After the first split, the iteration space has the shape (3, 4, 3, 15). After the second split, the shape becomes (3, 2, 2, 3, 15). The transformed schedule corresponds to the following Python code:
for i in range(3):\n    for j in range(0, 12, 6): # length 2, stride 6\n        for jjj in range(0, 6, 3): # length 2, stride 3\n            for jj in range(3):\n                for k in range(15):\n                    C[i, j+jj+jjj] += A[i, k] * B[k, j+jj+jjj]\n

The split does not necessarily need to divide the dimension size. For example, consider the following code:

schedule = nest.create_schedule()\njj = schedule.split(j, 5)  # original size of dimension j was 12\n
From the iteration space point-of-view, this code splits dimension j into three parts of size 5, where the last part is padded with empty (no-op) elements. Before the transformation, the iteration space shape is (3, 12, 15), and after the transformation, the shape is (3, 3, 5, 15) (so, 135 empty elements were added).

Default schedule (no-op elements in blue) After split(j, 5)

In loop form, the transformed iteration space corresponds to the following Python code:

for i in range(3):\n    for j in range(0, 12, 5):\n        for jj in range(5):\n            for k in range(15):\n                if j+jj < 12:\n                    C[i, j+jj] += A[i, k] * B[k, j+jj]\n
Note that Accera optimizes away costly if statements by unswitching the loops, which results in code that looks more like this:
for i in range(3):\n    for j in range(0, 10, 5):\n        for jj in range(5):\n            for k in range(15):\n                C[i, j+jj] += A[i, k] * B[k, j+jj]\n    # loop unswitching: handle the last iterations of the j loop separately\n    for j in range(10, 12):\n        for k in range(15):\n            C[i, j] += A[i, k] * B[k, j]\n

"},{"location":"Manual/03%20Schedules/#meaningless-splits","title":"Meaningless splits","text":"

Next, we will describe Accera\u2019s behavior in a few degenerate cases. If the split size equals the dimension size, the transformation simply renames the split dimension. For example,

schedule = nest.create_schedule()\njj = schedule.split(j, 12) # original size of dimension j was 12\n
After the split, the size of j becomes 1 and the size of jj is 12. The new shape of the iteration space is (3, 1, 12, 15). The dimension j becomes meaningless and therefore the schedule is basically unchanged.

If the split size exceeds the dimension size, Accera will treat it as if the split size doesn't divide the dimension size. This special case is handled by adding empty elements. For example,

schedule = nest.create_schedule()\njj = schedule.split(j, 13)  # original size of dimension j was 12\n
After the split, the size of j becomes 1 and the size of jj, 13. The new shape of the iteration space is (3, 1, 13, 15), which means that 45 empty elements were added. These empty elements are removed during code generation, which means that the schedule is basically unchanged.

Finally, note that jj = schedule.split(j, 1) simply adds a meaningless new dimension jj of size 1, and again, the schedule is unchanged.

"},{"location":"Manual/03%20Schedules/#convenience-syntax-tile","title":"Convenience syntax: tile","text":"

The tile transformation is a convenience syntax and does not provide any unique functionality. Consider the following code:

schedule = nest.create_schedule()\njj, kk = schedule.tile({\n    j: 2,\n    k: 3\n})\n
The tile transformation above is shorthand for the following sequence of transformations:
jj = schedule.split(j, 2)\nkk = schedule.split(k, 3)\n

It will result in a sequence of indices that are ordered as:

(i, j, jj, k, kk)\n
In other words, the tile transformation takes a mapping of indices to sizes and splits each index by the corresponding size. The indices involved in the split are then ordered such that each outer (parent) index precedes its inner (child) indices. Indices that did not participate in the transformation retain their relative positions.

"},{"location":"Manual/03%20Schedules/#skew","title":"skew","text":"
# Skew dimension i with respect to dimension j.\nschedule.skew(i, j)\n

The skew transformation is the easiest to explain for a two-dimensional iteration space of shape (N, M). Skewing dimension i (the row dimension) with respect to j (the column dimension) modifies the iteration space column-by-column: column j gets j empty elements added to its start and M-j-1 empty elements to its end. As a result, each column grows from size N to size N+M-1. Geometrically, the original iteration space elements take the form of a 45-degree parallelogram, embedded within a bounding rectangle of shape (N+M-1, M). The element that used to be at coordinate (i, j) moves to coordinate (i+j, j).

Similarly, skewing j with respect to i adds empty elements at the beginning and end of each row, resulting in an iteration space of shape (N, N+M-1). In higher dimensions, we simply apply the two-dimensional skew transformation independently to each two-dimensional slice along the two specified dimensions.

To demonstrate the importance of this transformation, consider convolving a 10-element vector with a 3-element filter. The loop logic for this operation is defined as follows:

import accera as acc\n\nN = 10  # input size\nK = 3  # filter size\nM = N - K + 1  # output size = 8\n\nA = acc.Array(role=acc.Role.INPUT, shape=(N,))\nB = acc.Array(role=acc.Role.INPUT, shape=(K,))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(M,))\n\nnest = acc.Nest(shape=(M, K))\ni, j = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    C[i] += A[i+j] * B[j]\n\nschedule = nest.create_schedule()\n
schedule corresponds to an iteration space of shape (8,3), where the first dimension corresponds to the 8 elements of the output vector. This schedule calculates the outputs one by one: first C[0], then C[1], etc.

Here is the equivalent Python code:

for i in range(8):\n    for j in range(3):\n        C[i] += A[i+j] * B[j]\n

Now, say that we apply the skew transformation as follows:

schedule.skew(i, j)\n
This transformation results in an iteration space of shape (10, 3), where the first dimension corresponds to the 10 elements of the input. This transformed schedule processes the input elements one-by-one: it extracts all the information from A[0] (A[0] is only used in the calculation of C[0]), then moves on to A[1] (which contributes to both C[0] and C[1]), and so on.

In this example, the default schedule achieves memory locality with respect to array C, whereas the skewed schedule achieves memory locality with respect to array A.

In loop form, the transformed iteration space corresponds to the following Python code:

for i in range(10):\n    for j in range(3):\n        if (i-j) >= 0 and (i-j) < 8:\n            C[i-j] += A[i] * B[j]\n

Behind the scenes, unswitching the loops results in code that looks more like this:

# triangle of height 2, width 3\nfor j in range(1):\n    C[0-j] += A[0] * B[j]\nfor j in range(2):\n    C[1-j] += A[1] * B[j]\n\n# rectangle of shape (6, 3)\nfor i in range(2, 8):\n    for j in range(3):\n        C[i-j] += A[i] * B[j]\n\n# upside-down triangle of height 2, width 3\nfor j in range(2):\n    C[6+j] += A[8] * B[2-j]\nfor j in range(1):\n    C[7+j] += A[9] * B[2-j]\n

Finally, note that some loops have small sizes that can be replaced by unrolls. To enable the unrolling of these small loops, we can use this optional parameter:

schedule.skew(i, j, unroll_loops_smaller_than=3)\n

This will unroll all loops that are smaller than 3, which include the range(2) and range(1) loops in the example above.

"},{"location":"Manual/03%20Schedules/#pad","title":"pad","text":"
# Adds empty elements to the beginning of dimension i.\nschedule.pad(i, size)\n

The pad transformation pads the beginning of dimension i with empty elements. This operation is meaningless by itself, but can be useful when used with splitting or fusing.
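For example, here is a sketch (using the matrix multiplication schedule from above) that pads a dimension before splitting it, so that the split size divides the padded dimension size evenly:
schedule = nest.create_schedule()\nschedule.pad(j, 2)          # dimension j grows from 12 to 14 by adding 2 empty elements at its beginning\njj = schedule.split(j, 7)   # 7 divides 14, so the split itself introduces no further padding\n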

"},{"location":"Manual/03%20Schedules/#order-invariant-schedules-and-safety","title":"Order-invariant schedules and safety","text":"

A schedule is order-invariant if its underlying logic doesn't depend on the execution order of its iterations. For example, schedules created from a single Nest (via create_schedule()) are order-invariant. All of the schedules discussed so far have been order-invariant.

A schedule is safe if its underlying logic is guaranteed to remain intact regardless of the applied transformations. Not all schedules are safe, but order-invariant schedules are. This is because the transformations introduced in this section only change the execution order of iterations without adding or removing any work.

In Section 4, we introduce fused schedules, which are not order-invariant, but may still be safe.

"},{"location":"Manual/04%20Fusing/","title":"Section 4: Fusing","text":"

With the fuse operation, multiple schedules can be combined into a single schedule representing the union of the work in the original schedules. These fused schedules can be transformed by any of the transformations presented in Section 3.

"},{"location":"Manual/04%20Fusing/#full-fusing","title":"Full fusing","text":"
import accera as acc\n\n# Fuse three schedules to create a fused schedule\nschedule = acc.fuse(schedule0, schedule1, ...)\n

Full fusing is the most straightforward case: each dimension of one schedule is fused with the corresponding dimension of the other schedules.

"},{"location":"Manual/04%20Fusing/#full-fusing-of-same-shaped-iteration-spaces","title":"Full fusing of same-shaped iteration spaces","text":"

First, consider the simplest case where we fuse schedules with identical iteration space shapes. Fusing adds a new dimension to the fused schedule, called the fusing dimension, which does not exist in the original schedules. By default, the fusing dimension is the last dimension of the fused schedule, and its size equals the number of fused schedules. Each slice along the fusing dimension contains a copy of the iteration logic of one of the fused schedules: the first slice contains the logic of schedule0, the second slice contains that of schedule1, and so on. Since the fusing dimension is the last dimension, the fused schedule is logically equivalent to executing an iteration of schedule0, followed by the corresponding iteration of schedule1, and so on.

Consider a scenario where we want first to shift and then scale each element of a matrix. In other words, we want to perform the equivalent of the below Python code:

C = (C + A) * B\n

If all three matrices are 10 by 10, one way to do this without fusing is to write:

A = acc.Array(role=acc.Role.INPUT, shape=(10, 10))\nB = acc.Array(role=acc.Role.INPUT, shape=(10, 10))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(10, 10))\n\n# Create nest_simple and schedule_simple\nnest_simple = acc.Nest(shape=(10, 10))\ni, j = nest_simple.get_indices()\n\n@nest_simple.iteration_logic\ndef _():\n    C[i, j] = (C[i, j] + A[i, j]) * B[i, j]\n\nschedule_simple = nest_simple.create_schedule()\n
Note that each iteration of schedule_simple operates on all three arrays at once. In some cases, operating on three arrays simultaneously creates excessive pressure on the computer\u2019s memory cache and lowers performance. In such a case, operating on two arrays at a time instead of three has a computational advantage.

Therefore, we may first want to compute C += A and then compute C *= B. Better yet, we may want to compute C in 2\u00d72 blocks. We first compute C[0:2, 0:2] += A[0:2, 0:2]. Subsequently, we compute C[0:2, 0:2] *= B[0:2, 0:2]. Finally, we move on to the next block and compute C[2:4, 0:2] += A[2:4, 0:2], and so on. This way, fusing offers remarkable flexibility to explore all of these different execution possibilities.

First, we define two separate nests, one for the C += A logic and one for the C *= B logic, and get their corresponding default schedules:

# Create nest0 and schedule0\nnest0 = acc.Nest(shape=(10, 10))\ni0, j0 = nest0.get_indices()\n\n@nest0.iteration_logic\ndef _():\n    C[i0, j0] += A[i0, j0]\n\nschedule0 = nest0.create_schedule()\n\n# Create nest1 and schedule1\nnest1 = acc.Nest(shape=(10, 10))\ni1, j1 = nest1.get_indices()\n\n@nest1.iteration_logic\ndef _():\n    C[i1, j1] *= B[i1, j1]\n\nschedule1 = nest1.create_schedule()\n

Before fusing, both schedule0 and schedule1 have a shape (10, 10). Now, let\u2019s fuse them:

# Create a fused schedule\nschedule = acc.fuse(schedule0, schedule1)\ni, j, f = schedule.get_indices()\n
Fusing creates a new fused schedule schedule with a shape (10, 10, 2). It does not change schedule0 and schedule1. The last dimension in schedule is the so-called fusing dimension f. Its slice (*, *, 0) contains a copy of schedule0, and its slice (*, *, 1) contains a copy of schedule1.

Before fusing After fuse(schedule0, schedule1)

In loop form, schedule is now equivalent to the following Python code:

for i in range(10):\n    for j in range(10):\n        # f = 0\n        C[i, j] += A[i, j]\n        # f = 1\n        C[i, j] *= B[i, j]\n

Resulting iteration sequence for C = (C + A) * B. (White elements represent C + A; purple elements are C * B)

"},{"location":"Manual/04%20Fusing/#tiling","title":"Tiling","text":"

Recall that we discussed computing the output block-by-block: first computing C[0:2, 0:2] += A[0:2, 0:2], then computing C[0:2, 0:2] *= B[0:2, 0:2], and so on. This can be achieved with the following sequence of transformations:

ii, jj = schedule.tile({\n    i: 2,\n    j: 2\n})\nschedule.reorder(i, j, f, ii, jj)\n
The resulting schedule is equivalent to the following Python code:
for i in range(0, 10, 2):\n    for j in range(0, 10, 2):\n        # f = 0\n        for ii in range(2):\n            for jj in range(2):\n                C[i+ii, j+jj] += A[i+ii, j+jj]\n        # f = 1\n        for ii in range(2):\n            for jj in range(2):\n                C[i+ii, j+jj] *= B[i+ii, j+jj]\n

"},{"location":"Manual/04%20Fusing/#constraints-of-fusing-dimension","title":"Constraints of Fusing Dimension","text":"

The fusing dimension comes with certain constraints that are discussed from the safety perspective with examples.

"},{"location":"Manual/04%20Fusing/#constraint-1-the-fusing-dimension-is-executed-sequentially","title":"Constraint 1: the fusing dimension is executed sequentially","text":"

Unlike other dimensions, which allow parallelization, vectorization, or tensorization (see Section 7), none of these operations can be applied to the fusing dimension. The fusing dimension must be executed sequentially. This constraint enables the safety guarantee discussed below.

"},{"location":"Manual/04%20Fusing/#safety","title":"Safety","text":"

Before applying any subsequent transformations, the fused schedule is always logically equivalent to executing the original schedules sequentially for each value of the fused dimensions. However, is it safe? Recall that a schedule is considered safe if the underlying logic is guaranteed to be unchanged regardless of the applied transformation. The safety of a fused schedule depends on circumstances that may break logic equivalence:

Accera preserves the order of the fused schedules for each value of the fused dimensions, regardless of how the fused schedule is transformed. For example, in the example above, the fused dimensions are i and j. Therefore, for any concrete value of i and j, the corresponding operation from schedule0 is guaranteed to execute before the corresponding operation from schedule1, regardless of how the fused schedule is transformed. More specifically, for each i and j, the operation C[i, j] += A[i, j] is guaranteed to execute before the operation C[i, j] *= B[i, j], no matter how we transform the fused schedule. Since those are the only operations that interact with C[i,j], the Accera guarantee is sufficient, and we can claim that the fused schedule is safe. With this assurance, the programmer can apply any sequence of transformations without worrying about the correctness of the resulting implementation.

However, not every fusing operation creates a safe schedule. For example, consider a scenario where we fused schedule0 and schedule1 differently:

# Reorder schedule1 before fusing\nschedule1.reorder(j1, i1)\n# Fuse schedule0 with the reordered schedule1\nschedule_t = acc.fuse(schedule0, schedule1)\na, b, f = schedule_t.get_indices()\n
In this unnatural example, i0 and j1 are fused and named a. Similarly, i1 and j0 are fused and named b. As mentioned above, Accera guarantees that, for each value of a and b, the operation C[a, b] += A[a, b] is executed before C[b, a] *= B[b, a]. The fusing operation itself preserves the logical equivalence. However, the underlying logic is changed by the transformation performed before fusing:
schedule1.reorder(j1, i1)\n
To understand this change in the logic, note that the resulting schedule is equivalent to the following Python code:
for a in range(10):\n    for b in range(10):\n        C[a, b] += A[a, b]\n        C[b, a] *= B[b, a]\n
The above code sets C[1,0] to C[1,0] * B[1,0] + A[1,0], whereas the original fused logic set C[1,0] to (C[1,0] + A[1,0]) * B[1,0]. In this case, we can conclude that schedule_t is definitely not safe. If the programmer decides to create an unsafe schedule, they take upon themselves the responsibility of maintaining logical equivalence.

"},{"location":"Manual/04%20Fusing/#fusing-iteration-spaces-with-different-shapes","title":"Fusing iteration spaces with different shapes","text":"

If the iteration spaces have different shapes, Accera matches their shapes by padding them appropriately with empty cells.
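For example, a sketch with two differently-shaped element-wise schedules (the nests are assumed to be defined as in the earlier examples):
# schedule0 has shape (10, 10) and schedule1 has shape (8, 8)\n# schedule1 is padded with empty cells to shape (10, 10) before fusing,\n# so the fused iteration space has shape (10, 10, 2)\nschedule = acc.fuse(schedule0, schedule1)\n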

"},{"location":"Manual/04%20Fusing/#partial-fusing","title":"Partial fusing","text":"

Instead of fusing all the dimensions, we may want to fuse a subset of dimensions, leaving the rest unfused. To fuse the first s dimensions, we use the syntax:

# Fuse the first s dimensions of three schedules\nschedule = acc.fuse((schedule0, schedule1, ...), partial=s)\n
The order of the dimensions in the fused schedule is as follows: first the s fused dimensions, then the fusing dimension f, followed by the unfused dimensions of schedule0, the unfused dimensions of schedule1, and so on.

We can easily calculate the number of dimensions in the fused schedule. For example, if we fuse the first s dimensions of a d0-dimensional space schedule0 and a d1-dimensional space schedule1, the fused iteration space will have s fused dimensions, d0 + d1 - 2s unfused dimensions, and the special fusing dimension f, for a total of d0 + d1 - s + 1 dimensions.

The fuse operation uses padding to ensure that the fused iteration space is not jagged in any direction. For example, say that we partially fuse the first 2 dimensions of schedule0, which is 4-dimensional, and schedule1, which is 3-dimensional:

schedule = acc.fuse((schedule0, schedule1), partial=2)\ni, j = schedule.get_fused_indices()\nf = schedule.get_fusing_index()\nk, l, m = schedule.get_unfused_indices()\n# Alternative way:\n# i, j, f, k, l, m = schedule.get_indices()\n
First come the fused dimensions i and j. Next is the fusing dimension f of size 2, followed by the unfused dimensions k and l from schedule0 and m from schedule1. The slice (*, *, 0, *, *, 0) contains a copy of schedule0, the slice (*, *, 1, 0, 0, *) contains a copy of schedule1, and the rest of schedule is padded with empty elements. Note that full fusing is a special case of partial fusing, where s is the larger of the number of dimensions of schedule0 and schedule1.

"},{"location":"Manual/04%20Fusing/#constraint-2-the-fusing-dimension-always-precedes-unfused-dimensions","title":"Constraint 2: the fusing dimension always precedes unfused dimensions","text":"

Another constraint introduced by partial fusing is that the fusing dimension must precede all of the unfused dimensions in its dimension order. This constraint applies to dimensions derived from the fusing dimension and the unfused dimensions via splitting.

"},{"location":"Manual/04%20Fusing/#safety_1","title":"Safety","text":"

The safety guarantees for partial fusing are a natural extension of the guarantees for full fusing. For each value of the fused dimensions, Accera preserves the fused schedules' order regardless of how the fused schedule is transformed. In other words, for each concrete value of the fused dimensions, all the corresponding work in schedule0 (across all of its unfused dimensions) is performed before the corresponding work in schedule1 (across all of its unfused dimensions). This remains true no matter how we transform the fused schedule. When fusing, the programmer needs to consider whether this property implies safety. The examples below show how this can be done.

"},{"location":"Manual/04%20Fusing/#partial-fusing-example-fully-connected-neural-layer-with-activation","title":"Partial fusing example: fully-connected neural layer with activation","text":"

Consider applying an element-wise operation, such as the ReLU function used in neural networks, to the result of a matrix-matrix multiplication. In the language of neural networks, this is called a fully-connected layer with a ReLU activation. The function relu(x) is simply max(x, 0).

Imagine that we have an element-wise operator relu, and we want to implement the equivalent Python code:

C = relu(C + A @ B)\n
Here, A has a shape of (8, 4), B has a shape of (4, 8), and C has a shape of (8, 8). Let\u2019s now define two nests, one for C += A @ B and the other for C = relu(C), and obtain their corresponding default schedules:
# Create nest0 and schedule0\nnest0 = acc.Nest(shape=(8, 8, 4))\ni0, j0, k0 = nest0.get_indices()\n\n# nest0 performs C += A @ B\n@nest0.iteration_logic\ndef _():\n    C[i0, j0] += A[i0, k0] * B[k0, j0]\n\nschedule0 = nest0.create_schedule()\n\n# Create nest1 and schedule1\nnest1 = acc.Nest(shape=(8, 8))\ni1, j1 = nest1.get_indices()\n\n# nest1 performs C = relu(C)\n@nest1.iteration_logic\ndef _():\n    C[i1, j1] = acc.max(C[i1, j1], 0)\n\nschedule1 = nest1.create_schedule()\n
In schedule0 and schedule1, the first dimension represents the rows of C and the second dimension represents the columns of C. Additionally, schedule0 has a third dimension that schedule1 does not have. Therefore, we fuse the first two dimensions of the iteration spaces and leave the third dimension of schedule0 unfused.
schedule = acc.fuse((schedule0, schedule1), partial=2)\ni, j = schedule.get_fused_indices()\nf = schedule.get_fusing_index()\nk0 = schedule.get_unfused_indices()[0]\n# Alternative way:\n# i, j, f, k0 = schedule.get_indices()\n

The fused iteration space schedule has a shape of (8, 8, 2, 4). Its slice (*, *, 0, *) contains a copy of schedule0, the slice (*, *, 1, 0) contains a copy of schedule1, and the rest of its elements are padded. Note that the code above overwrites the index k0, which initially was an index of schedule0; it now refers to the corresponding unfused index in schedule. Reusing the name k0 is a stylistic choice, and we could have chosen a different name.

Before fusing After fuse((schedule0, schedule1), partial=2) (padded elements in blue)"},{"location":"Manual/04%20Fusing/#safety_2","title":"Safety","text":"

Is schedule safe? Recall that for each value of i and j, Accera guarantees that the corresponding work in schedule0 (C[i,j] += A[i,k0] * B[k0,j] for all values of k0) is executed before the corresponding work in schedule1 (C[i,j] = max(C[i,j], 0)), and this holds regardless of how the fused schedule is transformed. Since these are the only operations that touch C[i,j] and the ReLU operation is always executed last, this warrants that schedule is safe. Therefore, we can focus all of our attention on optimizing performance without worrying about correctness from this point onwards.

The resulting schedule is now equivalent to the following Python code:

for i in range(8):\n    for j in range(8):\n        # f = 0\n        for k0 in range(4):\n            C[i,j] += A[i,k0] * B[k0,j]\n        # f = 1\n        C[i,j] = max(C[i,j], 0)\n

Iteration sequence for C = relu(C + A @ B). (White elements represent C + A @ B; purple elements are relu(C); blue elements are padding.)

"},{"location":"Manual/04%20Fusing/#partial-fusing-example-multiplying-three-matrices","title":"Partial fusing example: multiplying three matrices","text":"

Consider fusing two matrix-matrix multiplications to get matrix-matrix-matrix multiplication. Specifically, say that our goal is to calculate the equivalent of the following Python code:

E += A @ B @ D\n
where A has a shape of (4, 5), B of (5, 6), D of (6, 10), and E of (4, 10).

We start by defining the arrays. In addition to A, B, D, and E, we define a temporary array C to store the intermediate result of A@B.

A = acc.Array(role=acc.Role.INPUT, shape=(4, 5))\nB = acc.Array(role=acc.Role.INPUT, shape=(5, 6))\nC = acc.Array(role=acc.Role.TEMP, shape=(4, 6))\nD = acc.Array(role=acc.Role.INPUT, shape=(6, 10))\nE = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(4, 10))\n
Note that C has the role of TEMP. Temporary arrays are mutable and initialized with zeros. Moreover, these arrays are logical objects that may not exist in memory during the entire computation.

Next, define a simple nest to compute C += A @ B and another simple nest to compute E += C @ D.

# Create nest0 and schedule0 for C += A @ B\nnest0 = acc.Nest(shape=(4, 6, 5))\ni0, j0, k0 = nest0.get_indices()\n\n@nest0.iteration_logic\ndef _():\n    C[i0, j0] += A[i0, k0] * B[k0, j0]\n\nschedule0 = nest0.create_schedule()\n\n# Create nest1 and schedule1 for E += C @ D\nnest1 = acc.Nest(shape=(4, 10, 6))\ni1, j1, k1 = nest1.get_indices()\n\n@nest1.iteration_logic\ndef _():\n    E[i1, j1] += C[i1, k1] * D[k1, j1]\n\nschedule1 = nest1.create_schedule()\n
The temporary array C stores the output of schedule0, which is then used as one of the inputs of schedule1. Dimensions i0 and j0 correspond to the rows and columns of C in schedule0. Similarly, dimensions i1 and k1 correspond to the rows and columns of C in schedule1. Therefore, we fuse i0 with i1 and j0 with k1. We need to correctly line up the dimensions of the two iteration spaces and perform partial fusing.
schedule1.reorder(i1, k1, j1)\nschedule = acc.fuse((schedule0, schedule1), partial=2)\ni, j = schedule.get_fused_indices()\nf = schedule.get_fusing_index()\nk0, j1 = schedule.get_unfused_indices()\n# Alternative way:\n# i, j, f, k0, j1 = schedule.get_indices()\n

Before reorder(i1, k1, j1) After reorder(i1, k1, j1)

The fused iteration space has a shape of (4, 6, 2, 5, 10). i is the result of fusing i0 and i1, j is the result of fusing j0 and k1 and f is the fusing dimension. On the other hand, k0 is the unfused dimension from schedule0, and j1 is the unfused dimension from schedule1. The slice (*, *, 0, *, 0) contains a copy of schedule0 and the slice (*, *, 1, 0, *) contains a copy of schedule1. The rest of the iteration space is padded with empty elements.

After fuse((schedule0, schedule1), partial=2) (White elements represent C += A @ B; purple elements are E += C @ D; blue elements are padding.)

"},{"location":"Manual/04%20Fusing/#safety_3","title":"Safety","text":"

Is schedule safe? Again, recall that for each value of i and j, Accera guarantees that all of the corresponding work in schedule0 (C[i, j] += A[i, k0] * B[k0, j] for all values of k0) is executed before any of the corresponding work in schedule1 (E[i, j1] += C[i, j] * D[j, j1] for all values of j1). In other words, each element of C is entirely computed before it is used. This confirms that the schedule is safe.

Initially, the fused schedule is equivalent to the following Python code:

for i in range(4):\n    for j in range(6):\n        for f in range(2):\n            for k0 in range(5):\n                for j1 in range(10):\n                    if f == 0 and j1 == 0:\n                        # f = 0, create C[i, j]\n                        C[i, j] += A[i, k0] * B[k0, j]\n                    if f == 1 and k0 == 0:\n                        # f = 1, use C[i, j]\n                        E[i, j1] += C[i, j] * D[j, j1]\n

The simplified loops after unswitching:

for i in range(4):\n    for j in range(6):\n        # f = 0, create C[i, j]\n        for k0 in range(5):\n            C[i, j] += A[i, k0] * B[k0, j]\n        # f = 1, use C[i, j]\n        for j1 in range(10):\n            E[i, j1] += C[i, j] * D[j, j1]\n

The advantage of this schedule is that only one element of C is active at any time in the computation. Accera can reuse the same memory location to store the active element of C instead of storing all of C in physical memory.

"},{"location":"Manual/04%20Fusing/#tiling_1","title":"Tiling","text":"

As a further optimization, we can compute a 2\u00d73 block of C, do all of the work that uses this block, and then move on to the next block:

ii, jj = schedule.tile({\n    i: 2,\n    j: 3\n})\nschedule.reorder(i, j, f, ii, jj, k0, j1)\n
This schedule is equivalent to the following Python code:
for i in range(0, 4, 2):\n    for j in range(0, 6, 3):\n        # f = 0\n        for ii in range(2):\n            for jj in range(3):\n                for k0 in range(5):\n                    C[i+ii, j+jj] += A[i+ii, k0] * B[k0, j+jj]\n        # f = 1\n        for ii in range(2):\n            for jj in range(3):\n                for j1 in range(10):\n                    E[i+ii, j1] += C[i+ii, j+jj] * D[j+jj, j1]\n

"},{"location":"Manual/05%20Targets/","title":"Section 5: Targets","text":"

Accera is a cross compiler, which means that it can generate executable code for different target platforms. A target is described using the Target class. Accera already supports many different targets, for example:

import accera as acc\n\ncorei9 = acc.Target(acc.Target.Model.INTEL_7960X, num_threads=44)\n
or

v100 = acc.Target(acc.Target.Model.NVIDIA_V100)\n
or

corei7 = acc.Target(known_name=\"Intel 7700T\")\n

To query the list of known names:

dir(acc.Targets.Model)\n

We can also define custom targets:

my_target = acc.Target(name=\"Custom processor\", category=acc.Target.Category.CPU, architecture=acc.Target.Architecture.X86_64, family=\"Broadwell\", extensions=[\"MMX\", \"SSE\", \"SSE2\", \"SSE3\", \"SSSE3\", \"SSE4\", \"SSE4.1\", \"SSE4.2\", \"AVX\", \"AVX2\", \"FMA3\"], num_cores=22, num_threads=44, frequency_GHz=3.2, turbo_frequency_GHz=3.8, cache_sizes=[32, 256, 56320], cache_lines=[64, 64, 64])\n

One benefit of targets is that they provide a standard way of accessing useful constants. For example, we may want to split an iteration space dimension by the number of elements that fit in a vector register.

ii = schedule.split(i, size=corei9.vector_bytes // 4)  # number of float32 elements that fit in a vector register\n
We may tile the iteration space for GPU targets based on input shapes and available resources like shared memory. If you are not sure of what to use, try starting with the default:
import math\n\n# find block_x and block_y in powers of two, such that block_x * block_y = v100.default_block_size\nblock_x = int(pow(2, math.log2(v100.default_block_size) // 2))\nblock_y = v100.default_block_size // block_x\nii, jj = schedule.tile({\n    i: block_x,\n    j: block_y\n})\n

"},{"location":"Manual/06%20Plans%20-%20Caching/","title":"Section 6: Plans - Caching","text":"

In the previous sections, we defined the logic and then scheduled its iterations. Now, let's move on to completing the implementation with target-specific options.

First, we create a plan from the schedule:

plan = schedule.create_plan()\n
The Accera programming model allows us to create multiple plans from a single schedule. More importantly, we can modify individual plans without changing the schedule. We can manually specify the target platform by calling create_plan that takes a target argument. The default value of this target argument is acc.Target.HOST, which refers to the current host computer.
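For example, using the v100 target defined in Section 5:
plan = schedule.create_plan(v100)\n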

In this section, we discuss how to add data caching strategies to a plan.

Not yet implemented: Data caching strategies are not supported when one or more of the Array's dimension sizes are specified at runtime.

"},{"location":"Manual/06%20Plans%20-%20Caching/#key-slices","title":"Key slices","text":"

Recall that a slice is a set of iteration space elements that match a coordinate template with wildcards, such as (1, *, 3). A key-slice is a slice with only right-aligned wildcards, such as (1, 2, *) and (3, *, *). The level of a key-slice is determined by the number of wildcards in its definition. For example, (1, 2, *) is a level 1 key-slice and (3, *, *) is a level 2 key-slice.

Note that the key-slices are changed by reordering the dimensions of the iteration space. However, it is always true that the entire d-dimensional iteration space is a level d key-slice and that each individual element is a level zero key-slice. Each iteration belongs to one key-slice from each level, zero to d, for a total of d+1 key-slices that contain it. When the schedule is executed, the key-slices containing the current iteration are called the current key-slices.

In the Accera programming model, key-slices are significant because they partition the iteration space into sets of consecutive iterations. Therefore, they can describe the phases of computation at different levels of granularity. The term key-slice suggests using them to key (trigger) different actions. Specifically, each time the current level-l key slice changes, we use this event to trigger a cache update.

As mentioned above, a key-slice can be identified by its level. Another way to specify a key-slice is to take advantage of the fact that the iteration space dimensions are named and ordered. To specify a key-slice for a dimension, replace that dimension and all subsequent dimensions with wildcard symbols. For example, if the names of the iteration space dimensions are (i, j, k), then a key-slice that corresponds to the dimension j is one of (0, *, *), (1, *, *), etc. Both ways of specifying a key-slice are useful, and Accera uses them interchangeably.

"},{"location":"Manual/06%20Plans%20-%20Caching/#active-elements-and-active-blocks","title":"Active elements and active blocks","text":"

A loop nest operates on the data that is stored in arrays. Each key-slice can access a subset of the array elements, which we call the active elements that correspond to that specific key-slice. Since the current iteration belongs to key-slices at different levels, we need to define corresponding sets of active elements at different levels.

More precisely, the elements of array A that are read from or written to by the iterations of the current level l key-slice are called the level l active elements of A. This set of elements does not necessarily take the shape of a block. Therefore, the level l active block of A is defined as the smallest block of elements that contains all of the level l active elements of A. Accera uses active blocks to define caching strategies.

Just like we can specify a key-slice using a dimension, we can also refer to the active block that corresponds to a specific dimension. For example, if the names of the iteration space dimensions are (i, j, k) and the current iteration is one of the iterations for which i=3, then the active block in A that corresponds to dimension j is the block that includes all the elements touched by the key-slice (3, *, *).

"},{"location":"Manual/06%20Plans%20-%20Caching/#caches","title":"Caches","text":"

An Accera cache is a local copy of an active block. A cache is contiguous in memory and its memory layout may differ from the layout of the original array. The loop nest iterations operate on the cache elements instead of the original array elements.

The contents of the active block are copied into the cache at the start of the corresponding key-slice. If the array is mutable (namely, an input/output array or a temporary array), the cache contents are copied back into the original array at the end of the key-slice.

"},{"location":"Manual/06%20Plans%20-%20Caching/#caching-by-level","title":"Caching by level","text":"

To define a cache for a given array, all we need to do is specify the desired level. For example:

AA = plan.cache(A, level=2)\n
The return value AA is a handle that can be used to refer to the cache in subsequent operations. We can choose the cache layout, just as we did when we defined the original array.
AA = plan.cache(A, level=2, layout=acc.Array.Layout.FIRST_MAJOR)\n

"},{"location":"Manual/06%20Plans%20-%20Caching/#caching-by-dimension","title":"Caching by dimension","text":"

As mentioned above, we can specify an active block using a dimension. We use this to define a cache as follows:

AA = plan.cache(A, index=j)\n

"},{"location":"Manual/06%20Plans%20-%20Caching/#caching-by-element-budget","title":"Caching by element budget","text":"

Note that the current active blocks of an array are nested, and their sizes are monotonic (nondecreasing) in their level. Therefore, we can also select the largest active block that does not exceed a certain number of elements:

AA = plan.cache(A, max_elements=1024)\n

"},{"location":"Manual/06%20Plans%20-%20Caching/#thrifty-caching","title":"Thrifty caching","text":"

By default, Accera caching strategies are thrifty in the sense that the data is physically copied into an allocated cache only if the cached data somehow differs from the original active block. If the original active block already has the correct memory layout and resides contiguously in memory, Accera skips the caching steps and uses the original array instead. Note that on a GPU, a physical copy is still created if the cache is supposed to be allocated in a different type of memory than the original array (e.g., the array is in global memory, but the cache is supposed to be in shared memory).

For example, assume that A is a two-dimensional array and its active block at the chosen level is always one of its rows. If A is row-major, the rows are already stored contiguously. Additionally, the data in the active block and the data that would be copied into the cache are identical: both are contiguous and share the same memory layout. In this case, there is no benefit to using a cache over the original array. The thrifty caching strategy will skip the caching steps and use the original array instead.

On the other hand, if A is column-major, its rows are not stored contiguously. In this case, copying the active row into a contiguous temporary location could be computationally advantageous. Therefore, the thrifty caching strategy would create the cache and populate it with the data.

Thrifty caching can be turned off using the optional argument thrifty=False. If turned off, a physical copy is always created.
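For example:
# always create a physical cache copy, even when the active block is already contiguous\nAA = plan.cache(A, level=2, thrifty=False)\n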

"},{"location":"Manual/06%20Plans%20-%20Caching/#hierarchical-caching","title":"Hierarchical caching","text":"

Caches can be composed hierarchically. Namely, a high-level key-slice can trigger a copy from the original array into a big cache, and a lower level key-slice can be used to trigger a copy from the big cache into a smaller cache.

For example,

AA = plan.cache(A, level=4)\nAAA = plan.cache(AA, level=2)\n

"},{"location":"Manual/06%20Plans%20-%20Caching/#multicaching","title":"Multicaching","text":"

While a cache is defined by a key-slice level, a higher-level key-slice, specified by trigger_level, can be used as the trigger for copying multiple successive active blocks into a local copy at once. These copied active blocks have their layouts defined as usual; only the trigger level for copying them changes. Since active blocks are not mutually exclusive, the same element may be copied into multiple locations as separate caches. Therefore, a trigger_level may only be specified on an INPUT or CONST array because Accera does not support multicache write coherence.

For example,

AA = plan.cache(A, level=2, trigger_level=4)\n

"},{"location":"Manual/06%20Plans%20-%20Caching/#mapping-caches-to-specific-types-of-memory","title":"Mapping caches to specific types of memory","text":"

Some target platforms have different types of memory that can hold Accera caches. In the case of a GPU target, caches can be located in global or shared memory. To explicitly choose the location of the cache, we write:

AA = plan.cache(A, level=4, location=v100.MemorySpace.SHARED)\n

"},{"location":"Manual/06%20Plans%20-%20Caching/#double-buffering","title":"Double buffering","text":"

Caches can double-buffer data by loading the next active block into a temporary buffer while the current active block is being used, and then moving that data into the cache buffer once the current active block is done being used. If the cache trigger level is the highest level in the loop nest, double-buffering does nothing because it depends on having another loop outside of the cache trigger loop. For shared-memory caches on a GPU, the temporary buffer is automatically allocated in private memory. Since the next iteration's data is loaded into a temporary buffer while the current iteration's data is in the cache buffer, any overlap between these active blocks would cause a write-coherency issue similar to the one described for multicaching. Because of this, double_buffer may only be specified on an INPUT or CONST array, as Accera does not perform multicache write coherence.

AA = plan.cache(A, level=3, double_buffer=True)\n

Full schedule with equivalent pseudo-code:

...\nM, N, K = 1024, 1024, 1024\nm_tile, n_tile, k_tile = 32, 64, 128\nnest = Nest(shape=(M, N, K))\ni, j, k = nest.get_indices()\n@nest.iteration_logic\ndef _():\n    C[i,j] += A[i,k] * B[k,j]\nschedule = nest.create_schedule()\nii, jj, kk = schedule.tile({\n    i: m_tile,\n    j: n_tile,\n    k: k_tile\n})\nschedule.reorder(i, j, k, ii, jj, kk)\n\nplan = schedule.create_plan()\nplan.cache(A, index=ii, double_buffer=True)\n...\n
equivalent to:
for i in range(0, M, m_tile):\n    for j in range(0, N, n_tile):\n        for ii_cache in range(0, m_tile):\n            for kk_cache in range(0, k_tile):\n                cache_A[ii_cache, kk_cache] = A[i+ii_cache, kk_cache]\n        for k in range(0, K-k_tile, k_tile): # Note: this loop doesn't run for the final K tile\n            for ii_cache in range(0, m_tile):\n                for kk_cache in range(0, k_tile):\n                    temp_A[ii_cache, kk_cache] = A[i+ii_cache, (k + k_tile) + kk_cache]\n            for ii in range(0, m_tile):\n                for jj in range(0, n_tile):\n                    for kk in range(0, k_tile):\n                        C[i+ii, j+jj] += cache_A[ii, kk] * B[k+kk, j+jj]\n            for ii_cache in range(0, m_tile):\n                for kk_cache in range(0, k_tile):\n                    cache_A[ii_cache, kk_cache] = temp_A[ii_cache, kk_cache]\n        # final K tile: cache_A now holds the data for k = K - k_tile\n        for ii in range(0, m_tile):\n            for jj in range(0, n_tile):\n                for kk in range(0, k_tile):\n                    C[i+ii, j+jj] += cache_A[ii, kk] * B[(K - k_tile) + kk, j+jj]\n

"},{"location":"Manual/06%20Plans%20-%20Caching/#caching-strategy","title":"Caching strategy","text":"

In GPUs, the mapping between threads and data can be controlled by specifying the strategy option. Currently, we support the BLOCKED and the STRIPED access patterns and they are explained in detail at accera.CacheStrategy. The choice of which pattern to use will depend on the hardware architecture and the intended algorithm the cache will be used for, since different access patterns incur different performance overhead.

AA = plan.cache(A, level=3, double_buffer=True, strategy=acc.CacheStrategy.BLOCKED)\n

The above example creates a cache where each thread copies a contiguous chunk (block) of elements based on its thread index.
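To use the STRIPED pattern instead, pass the other documented value:
AA = plan.cache(A, level=3, double_buffer=True, strategy=acc.CacheStrategy.STRIPED)\n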

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/","title":"Section 7: Plans - Operations and Optimizations","text":"

We can control target-specific operations and optimizations using a plan. Examples include instruction pipelining, applying SIMD vector instructions, and so on.

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#unroll","title":"unroll","text":"

By default, each dimension of the iteration space is implemented as a for-loop. The unroll instruction marks a dimension for unrolling rather than looping. Imagine the following nest that multiplies the entries of an array by a constant:

import accera as acc\n\nmy_target = acc.Target(category=acc.Target.Category.CPU)\n\nA = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(3, 5))\n\nnest = acc.Nest(shape=(3, 5))\ni, j = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    A[i, j] *= 2.0\n\nplan = nest.create_plan(my_target)\n
If we build plan as is, the resulting implementation would be equivalent to the following Python code:
for i in range(3):\n    for j in range(5):\n        A[i, j] *= 2.0\n
If we add the instruction plan.unroll(index=j), the resulting implementation becomes equivalent to:
for i in range(3):\n    A[i, 0] *= 2.0\n    A[i, 1] *= 2.0\n    A[i, 2] *= 2.0\n    A[i, 3] *= 2.0\n    A[i, 4] *= 2.0\n
If, instead of unrolling j, we add the instruction plan.unroll(index=i), the resulting implementation becomes equivalent to:
for j in range(5):\n    A[0, j] *= 2.0\nfor j in range(5):\n    A[1, j] *= 2.0\nfor j in range(5):\n    A[2, j] *= 2.0\n
And, of course, we can also unroll both dimensions, removing for-loops completely.
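For example, unrolling both dimensions of the 3\u00d75 nest turns all 15 iterations into straight-line statements:
plan.unroll(index=i)\nplan.unroll(index=j)\n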

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#vectorize","title":"vectorize","text":"

Modern target platforms support SIMD vector instructions. These instructions perform the same operation on an entire vector of elements, all at once. By default, each dimension of an iteration space becomes a for-loop. The vectorize instruction instead labels a dimension for vectorized execution, rather than for-looping.

For example, assume that a host supports 256-bit vector instructions, indicating that its vector instructions operate on eight floating-point elements at once. Also, consider that we already have arrays A, B, and C, and we write the following code:

nest = acc.Nest(shape=(64,))\ni = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    C[i] = A[i] * B[i]\n\nschedule = nest.create_schedule()\nii = schedule.split(i, 8)\n\nplan = nest.create_plan()\nplan.vectorize(index=ii)\n
The dimension marked for vectorization is of size 8, which is a supported vector size on this target platform. Therefore, the resulting binary will contain something like:
  00000001400010B0: C5 FC 10 0C 11     vmovups     ymm1,ymmword ptr [rcx+rdx]\n  00000001400010B5: C5 F4 59 0A        vmulps      ymm1,ymm1,ymmword ptr [rdx]\n  00000001400010B9: C4 C1 7C 11 0C 10  vmovups     ymmword ptr [r8+rdx],ymm1\n  00000001400010BF: 48 8D 52 20        lea         rdx,[rdx+20h]\n  00000001400010C3: 48 83 E8 01        sub         rax,1\n  00000001400010C7: 75 E7              jne         00000001400010B0\n
Note how the multiplication instruction vmulps and the memory move instruction vmovups deal with eight 32-bit floating-point values at a time.

Different targets support different vector instructions with different vector sizes. The following table includes iteration logic that vectorizes correctly on most targets with vectorization support, such as Intel Haswell, Broadwell, or newer, and ARM v7/A32. Other examples of iteration logic may or may not vectorize correctly. Variables prefixed with v are vector types, and those prefixed with s are scalar types.

Vector pseudocode Equivalent to Supported types v1 += s0 * v0 for i in range(vector_size): v1[i] += s0 * v0[i] float32 v1 += v0 * s0 for i in range(vector_size): v1[i] += v0[i] * s0 float32 v1 += v0 / s0 for i in range(vector_size): v1[i] += v0[i] / s0 float32 v1 -= s0 * v0 for i in range(vector_size): v1[i] -= s0 * v0[i] float32 v1 -= v0 * s0 for i in range(vector_size): v1[i] -= v0[i] * s0 float32 v1 -= v0 / s0 for i in range(vector_size): v1[i] -= v0[i] / s0 float32 v2 += v0 * v1 for i in range(vector_size): v2[i] += v0[i] * v1[i] float32 vector inner (dot) product: s0 += dot(v0, v1) for i in range(vector_size): s0 += v0[i] * v1[i] float32 v2 = v0 + v1 for i in range(vector_size): v2[i] = v0[i] + v1[i] int8/16/32/64, float32 v2 = v0 - v1 for i in range(vector_size): v2[i] = v0[i] - v1[i] int8/16/32/64, float32 v2 = v0 * v1 for i in range(vector_size): v2[i] = v0[i] * v1[i] int8/16/32/64, float32 v2 = v0 / v1 for i in range(vector_size): v2[i] = v0[i] / v1[i] float32 v1 = abs(v0) for i in range(vector_size): v1[i] = abs(v0[i]) int8/16/32/64, float32 v2 = (v0 == v1) for i in range(vector_size): v2[i] = 0XF..F if v0[i] == v1[i] else 0 int8/16/32/64, float32 v2 = (v0 > v1) for i in range(vector_size): v2[i] = 0XF..F if v0[i] > v1[i] else 0 int8/16/32/64, float32 v2 = (v0 >= v1) for i in range(vector_size): v2[i] = 0XF..F if v0[i] >= v1[i] else 0 int8/16/32/64, float32 v2 = (v0 < v1) for i in range(vector_size): v2[i] = 0XF..F if v0[i] < v1[i] else 0 int8/16/32/64, float32 v2 = (v0 <= v1) for i in range(vector_size): v2[i] = 0XF..F if v0[i] <= v1[i] else 0 int8/16/32/64, float32 v1 = v0 << s0 for i in range(vector_size): v1[i] = v0[i] << s0 int16/32/64, float32 v1 = v0 >> s0 for i in range(vector_size): v1[i] = v0[i] >> s0 int16/32/64, float32 s0 = sum(v0) for i in range(vector_size): s0 += v0[i] int8/16/32/64, float32 s0 = max(v0 + v1) for i in range(vector_size): s0 = max(v0[i] + v1[i], s0) int8/16/32/64, float32 s0 = max(v0 - v1) for i in range(vector_size): s0 = max(v0[i] - v1[i], s0) int8/16/32/64, float32

Additionally, Accera can perform vectorized load and store operations to/from vector registers and memory if the memory locations are contiguous.

To vectorize dimension i, the number of active elements that corresponds to dimension i must exactly match the vector instruction width of the target processor. For example, if the target processor has vector instructions that operate on either 4 or 8 floating-point elements at once, then the number of active elements can either be 4 or 8. Additionally, those active elements must occupy adjacent memory locations (they cannot be spread out).

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#tensorize","title":"tensorize","text":"

Some hardware also has specialized instructions for performing matrix multiplication. These instructions operate on specific matrix dimensions and data types. The tensorization instructions take tiles of the A, B, and C matrices and compute the C = A * B + C operation.

The tensorize operation takes 3 indices:

plan.tensorize(indices=(i,j,k))\n

Tensorization is limited and is only valid on loop structures of the form

for i in range(M):\n    for k in range(K):\n        for j in range(N):\n            C[i, j] += A[i, k] * B[k, j]\n

where the hardware provides MxNxK tensorization support for the A, B, and C element data types.

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#convenience-syntax-kernelize","title":"Convenience syntax: kernelize","text":"

The kernelize instruction is a convenience syntax that does not provide any unique functionality. Specifically, kernelize is equivalent to a sequence of unroll instructions, followed by an optional vectorize instruction.

A typical Accera design pattern is to first break a loop-nest into tiles and then apply an optimized kernel to each tile. For example, imagine that the loop nest multiplies two 256\u00d7256 matrices and the kernel is a highly optimized procedure for multiplying 4\u00d74 matrices. Accera will introduce different ways to write highly optimized kernels in the future. However, currently, it only supports automatic kernelization using the kernelize instruction. As mentioned above, kernelize is shorthand for unrolling and vectorizing. These instructions structure the code in a way that makes it easy for downstream compiler heuristics to automatically generate kernels.

Consider, once again, the matrix multiplication example we discussed previously in Section 2. Assume that we declare the schedule and reorder as follows:

schedule = nest.create_schedule()\nschedule.reorder(i, k, j)\n
Notice that i, k, j are the last three dimensions in the iteration space and the resulting implementation becomes equivalent to:

for i in range(M):\n    for k in range(S):\n        for j in range(N):\n            C[i, j] += A[i, k] * B[k, j]\n

The instruction:

plan.kernelize(unroll_indices=(i, k), vectorize_indices=j)\n
is just shorthand for
plan.unroll(i)\nplan.unroll(k)\nplan.vectorize(j)\n
Applying this sequence of instructions allows the compiler to automatically create an optimized kernel from loops i, k, j.

For simplicity, assume that the matrix sizes defined by M, N, and S are 3, 4, and 2 respectively.

After applying kernelize, the schedule is equivalent to the following Python code:

C[0,0:4] += A[0,0] * B[0,0:4] # vectorized\nC[0,0:4] += A[0,1] * B[1,0:4] # vectorized\nC[1,0:4] += A[1,0] * B[0,0:4] # vectorized\nC[1,0:4] += A[1,1] * B[1,0:4] # vectorized\nC[2,0:4] += A[2,0] * B[0,0:4] # vectorized\nC[2,0:4] += A[2,1] * B[1,0:4] # vectorized\n

This would result in the following vectorized instructions on an Intel Haswell CPU:

  0000000000000200: C4 C1 78 10 00     vmovups     xmm0,xmmword ptr [r8]\n  0000000000000205: C4 E2 79 18 09     vbroadcastss xmm1,dword ptr [rcx]\n  000000000000020A: C5 F8 10 12        vmovups     xmm2,xmmword ptr [rdx]\n  000000000000020E: C4 E2 69 A8 C8     vfmadd213ps xmm1,xmm2,xmm0\n  0000000000000213: C5 F8 10 5A 10     vmovups     xmm3,xmmword ptr [rdx+10h]\n  0000000000000218: C4 E2 79 18 61 04  vbroadcastss xmm4,dword ptr [rcx+4]\n  000000000000021E: C4 E2 61 A8 E1     vfmadd213ps xmm4,xmm3,xmm1\n  0000000000000223: C4 E2 79 18 49 08  vbroadcastss xmm1,dword ptr [rcx+8]\n  0000000000000229: C4 E2 69 A8 C8     vfmadd213ps xmm1,xmm2,xmm0\n  000000000000022E: C4 E2 79 18 69 0C  vbroadcastss xmm5,dword ptr [rcx+0Ch]\n  0000000000000234: C4 E2 61 A8 E9     vfmadd213ps xmm5,xmm3,xmm1\n  0000000000000239: C4 E2 79 18 49 10  vbroadcastss xmm1,dword ptr [rcx+10h]\n  000000000000023F: C4 E2 69 A8 C8     vfmadd213ps xmm1,xmm2,xmm0\n  0000000000000244: C4 E2 79 18 41 14  vbroadcastss xmm0,dword ptr [rcx+14h]\n  000000000000024A: C4 E2 61 A8 C1     vfmadd213ps xmm0,xmm3,xmm1\n  000000000000024F: C4 C1 58 58 09     vaddps      xmm1,xmm4,xmmword ptr [r9]\n  0000000000000254: C4 C1 50 58 51 10  vaddps      xmm2,xmm5,xmmword ptr [r9+10h]\n  000000000000025A: C4 C1 78 58 41 20  vaddps      xmm0,xmm0,xmmword ptr [r9+20h]\n

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#parallelize","title":"parallelize","text":"

The parallelize instruction executes one or more loops in parallel across multiple cores.

xeonPlat = acc.Target(\"Intel 9221\", num_threads=16)\nplan = schedule.create_plan(xeonPlat)\nplan.parallelize(indices=(i,j,k))\n
Specifying multiple dimensions is equivalent to the collapse argument in OpenMP. Therefore, the dimensions must be contiguous in the iteration space dimension order.

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#static-scheduling-policy","title":"Static scheduling policy","text":"

The static scheduling policy is selected by setting the argument policy=\"static\" in the call to parallelize. If n iterations are parallelized across c cores, static scheduling partitions the work into c fixed parts, some of size floor(n/c) and some of size ceil(n/c), and executes each part on a different core.
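For example, a minimal sketch (assuming the plan and indices from the example above) that requests static scheduling:

plan.parallelize(indices=(i, j, k), policy=\"static\")\n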

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#dynamic-scheduling-policy","title":"Dynamic scheduling policy","text":"

The dynamic scheduling policy is selected by setting the argument policy=\"dynamic\" in the call to parallelize. Dynamic scheduling creates a single work queue that is shared across the cores.

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#specifying-thread-limit","title":"Specifying thread limit","text":"

Setting the argument max_threads to a positive integer places an upper bound on the number of threads used to distribute the workload.
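For example, a minimal sketch (again assuming the plan and indices from above) that combines dynamic scheduling with a thread limit:

plan.parallelize(indices=(i, j, k), policy=\"dynamic\", max_threads=4)\n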

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#not-yet-implemented-pinning-to-specific-cores","title":"Not yet implemented: Pinning to specific cores","text":"

The pin argument allows the parallel work to be pinned to specific cores.

"},{"location":"Manual/07%20Plans%20-%20Operations%20and%20Optimizations/#bind","title":"bind","text":"

Some target platforms, such as GPUs, are specifically designed to execute nested loops. They can take an entire grid of work and schedule its execution on multiple cores. On a GPU, this grid is broken up into multiple blocks, where each block contains multiple threads. Block iterators and thread iterators are identified by special variables in the Target object. To take advantage of a target platform's ability to execute grids, we must bind dimensions of the iteration space with these special iterator variables.

For example,

v100 = acc.Target(\"Tesla V100\")\nplan.bind(mapping={\n        i: v100.GridUnit.BLOCK_X,\n        j: v100.GridUnit.THREAD_X,\n        k: v100.GridUnit.THREAD_Y\n    }\n)\n

"},{"location":"Manual/08%20Deferred%20Layout%20of%20Constant%20Arrays/","title":"Section 8: Deferred layout of constant arrays","text":"

Let's revisit the memory layout of constant arrays. As explained in Section 1, the contents of constant arrays are known at compile-time, and these contents are immutable. Accera stores constant arrays in a non-standard memory layout optimized for a particular plan. In some cases, storing multiple copies of each array element may even prove advantageous (e.g., storing a matrix in row-major and column-major layouts).

"},{"location":"Manual/08%20Deferred%20Layout%20of%20Constant%20Arrays/#deferred-layout-based-on-a-cache","title":"Deferred layout based on a cache","text":"

Accera's cache strategy creates local copies of an array's active blocks. The constant array can be arranged based on the defined cache. Specifically, the array is stored by serializing the active blocks consecutively. If the caching strategy is thrifty=True, the active blocks are ready to use without copying the data.

To define an array layout based on a cache, the Accera DSL has to overcome a chicken-and-egg paradox: on the one hand, arrays need to be defined before the nest logic; on the other hand, the array layout depends on a cache, which is defined only as part of a plan. Accera overcomes this by splitting the array definition into two parts: we still define the constant array upfront, but we avoid committing to a specific layout.

import accera as acc\nimport numpy as np\n\nmatrix = np.random.rand(16, 16)\nA = acc.Array(role=acc.Role.CONST, data=matrix, layout=acc.Array.Layout.DEFERRED)\n
Now we define the nest logic, the schedule, and the plan. Say that we define a plan named plan and use it to define a cache of A at dimension i:
AA = plan.cache(A, i, layout=acc.Array.Layout.FIRST_MAJOR, thrifty=True)\n
We can now use the cache AA to determine the layout of the original array A:
A.deferred_layout(cache=AA)\n

"},{"location":"Manual/09%20Parameters/","title":"Section 9: Parameters","text":"

Accera's parameters are placeholders that get replaced with concrete values when adding a function to a package. A parameter can be used in a Nest, a Schedule, or a Plan.

"},{"location":"Manual/09%20Parameters/#parameterized-nests","title":"Parameterized nests","text":"

Recall that a Nest represents the loop-nest logic. We can parameterize the nest's shape and iteration logic. For example, consider the following parameterized version of matrix multiplication:

# Create parameters\nP0, P1, P2, P3 = acc.create_parameters()\n\nA = acc.Array(role=acc.Role.INPUT, shape=(P0, P2))\nB = acc.Array(role=acc.Role.INPUT, shape=(P2, P1))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(P0, P1))\n\n# Define a simple nest\nnest = acc.Nest(shape=(P0, P1, P2))\ni, j, k = nest.get_indices()\n\n# Define the loop nest logic and add it to the nest\n@nest.iteration_logic\ndef _():\n    C[i, j] += P3 * A[i, k] * B[k, j]\n\n# create a package\npackage = acc.Package()\n\n# Use the templated nest to add two different functions to the package\npackage.add(nest, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1.0}, base_name=\"matmul_16_16_16_1\")\npackage.add(nest, args=(A, B, C), parameters={P0:32, P1:32, P2:32, P3:2.0}, base_name=\"matmul_32_32_32_2\")\n
In the above scenario, the shape of the nest is parameterized by (P0, P1, P2) and its iteration logic includes the parameter P3. The nest is used twice with different settings of these parameters to create two separate functions in the package.

"},{"location":"Manual/09%20Parameters/#parameterized-schedules-and-plans","title":"Parameterized schedules and plans","text":"

Parameters can also appear in schedules and plans. For example, we can add the following code snippet:

P4, P5 = acc.create_parameters()\n\n# Create a parameterized schedule\nschedule = nest.create_schedule()\nii = schedule.split(i, size=P4)\n\n# Create a parameterized plan\nplan = schedule.create_plan()\nplan.cache(A, level=P5)\n\n# Add another function to the package\npackage.add(plan, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1.0, P4:4, P5:2}, base_name=\"alternative_matmul_16_16_16\")\n
"},{"location":"Manual/09%20Parameters/#supported-operations","title":"Supported operations","text":"

Accera's parameters support the basic arithmetic operations as well as other relational, bitwise, and intrinsic operations. For example, we can add the following code snippet instead:

fma_unit_count, vector_size, P5 = acc.create_parameters()\n\n# Create a parameterized schedule\nschedule = nest.create_schedule()\nii = schedule.split(i, size=fma_unit_count * vector_size)\niii = schedule.split(ii, size=vector_size)\n\n# Create a parameterized plan\nplan = schedule.create_plan()\nplan.cache(A, level=P5)\n\n# Add another function to the package\npackage.add(plan, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1.0, fma_unit_count:4, vector_size:16, P5:2}, base_name=\"alternative_matmul_16_16_16\")\n

The supported operations include the following:

"},{"location":"Manual/09%20Parameters/#arithmetic-operators","title":"Arithmetic operators","text":"Operation Types Description a + b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the sum of parameters (or parameter and scalar) a and b a - b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the difference between parameters (or parameter and scalar) a and b a * b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the product of parameters (or parameter and scalar) a and b a / b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the quotient of parameters (or parameter and scalar) a and b a ** b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the b'th power of parameter a a // b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the floor of the quotient of parameters (or parameter and scalar) a and b a % b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the signed remainder after dividing parameter a by parameter or scalar b -a acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns the additive inverse of parameter a"},{"location":"Manual/09%20Parameters/#comparison-operations","title":"Comparison Operations","text":"Operation Types Description a == b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if parameter or scalar a equals parameter or scalar b, else False a != b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if parameter or scalar a is not equal to parameter or scalar b, else False a < b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if parameter or scalar a is strictly smaller than parameter or scalar b, else False a <= b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if parameter or scalar a is smaller than or equal to parameter or scalar b, else False a > b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if parameter or scalar a is strictly greater than parameter or scalar b, else False a >= b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64, acc.ScalarType.float16/32/64 Returns True if parameter or scalar a is greater than or equal to parameter or scalar b, else False"},{"location":"Manual/09%20Parameters/#bitwise-operators","title":"Bitwise operators","text":"Operation Types Description a & b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns the bitwise AND of the bits in parameters (or parameter and scalar) a and b a \\| b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns the bitwise OR of the bits in parameters (or parameter and scalar) a and b a ^ b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 
Returns the bitwise XOR of the bits in parameters (or parameter and scalar) a and b ~a acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns the bitwise inverse of the bits in parameter a a << b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns parameter a whose bitwise representation is shifted left by b bits a >> b acc.DelayedParameter, acc.ScalarType.int8/16/32/64, acc.ScalarType.uint8/16/32/64 Returns parameter a whose bitwise representation is shifted right by b bits"},{"location":"Manual/09%20Parameters/#intrinsics","title":"Intrinsics","text":"Operation Types Description acc.abs(a) acc.ScalarType.float16/32/64 Returns the absolute value of parameter a"},{"location":"Manual/09%20Parameters/#tuple-parameter-values","title":"Tuple parameter values","text":"

Parameters can be used as placeholders for tuples, specifically for tuples of indices. For example, assume that we want to parameterize the order of the iteration space dimensions. We can then write:

P6 = acc.create_parameters()\nschedule.reorder(order=P6)\n
Later, we can set the value of P6 to the index tuple (j,k,i).
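For instance, a minimal sketch (assuming the parameterized nest, the parameters P0-P3 from above, and an illustrative base name) that supplies the tuple value when adding the function to a package:

package.add(schedule, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1.0, P6:(j, k, i)}, base_name=\"matmul_16_16_16_1_jki\")\n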

"},{"location":"Manual/09%20Parameters/#create-parameters-from-an-entire-parameter-grid","title":"Create parameters from an entire parameter grid","text":"

Consider the parameterized nest defined above. Rather than setting a specific value for each parameter, imagine that we have a set of different values for each parameter. For example, consider that we want P0 to have a value in the set {8, 16}, P1 in {16, 32}, P2 to always be 16, and P3 in {1.0, 2.0}. We can define the parameter grid with this data, which lists all the valid parameter combinations. In our case, this grid includes the following parameter settings:

{P0:8, P1:16, P2:16, P3:1.0}\n{P0:8, P1:16, P2:16, P3:2.0}\n{P0:8, P1:32, P2:16, P3:1.0}\n{P0:8, P1:32, P2:16, P3:2.0}\n{P0:16, P1:16, P2:16, P3:1.0}\n{P0:16, P1:16, P2:16, P3:2.0}\n{P0:16, P1:32, P2:16, P3:1.0}\n{P0:16, P1:32, P2:16, P3:2.0}\n

Accera provides an easy way to add all the functions that correspond to the parameter grid at once:

parameters = acc.create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]})\npackage.add(nest, args=(A, B, C), parameters=parameters, base_name=\"matmul\")\n
In this case, package.add generates eight functions, one for each parameter combination in the grid. Other than a nest, package.add can alternatively accept a Schedule (if we are performing schedule transformations) or a Plan (if we are setting target-specific options). All eight functions share the same base name. However, Accera automatically adds a unique suffix to each function name to prevent duplication. This pattern also allows optional filtering: you can inspect the generated list of parameter values before calling package.add, as sketched below.
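For instance, a minimal sketch of such filtering, assuming (as the Package.add reference describes) that the grid is a list of parameter-to-value dictionaries keyed by the Parameter objects, with an illustrative condition:

# keep only combinations where P0 does not exceed P1 (illustrative condition)\nparameters = [p for p in parameters if p[P0] <= p[P1]]\npackage.add(nest, args=(A, B, C), parameters=parameters, base_name=\"matmul\")\n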

You can define a lambda or function to filter out combinations from the parameter grid. The arguments to the filter are the values of a parameter combination, and it should return True if the combination should be included, and False otherwise:

parameters = create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, filter_func=lambda p0, p1, p2, p3: p2 < p1 and 4 * (p0 * p3 + p1 * p2 + p1 * p3 + p2 * p3) / 1024 < 256)\n

To limit the size of the parameter grid (and therefore the number of functions generated) to at most 5:

parameters = create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, sample=5)\n

If the parameter is a loop order which is a list or tuple of indices, create_parameter_grid can generate all the permutations of loop order. Furthermore, you can pass in a filter function to filter out invalid loop orders:

parameters = create_parameter_grid({P0:(i, j, k, ii, jj, kk)}, filter_func = lambda *p : schedule.is_valid_loop_order(p[0][0]))\n

Schedule.is_valid_loop_order() is a pre-defined filter function that determines if a given loop order is valid for that schedule.

Note that the order of the list or tuple of indices provided to create_parameter_grid does not matter.

To filter parameters with more complicated logic, you can define your own filter function that wraps Schedule.is_valid_loop_order():

def my_filter(parameters_choice):\n    P1, P2, P3, P4, P5, loop_order = parameters_choice\n\n    return P1 > P2 \\\n        and P3 > P4 \\\n        and P1 * P5 < P3 \\\n        and P2 * P5 < P4 \\\n        and schedule.is_valid_loop_order(loop_order)\n\nparameters = acc.create_parameter_grid({\n        P1: [64, 128, 256],\n        P2: [32, 128],\n        P3: [16, 32, 128],\n        P4: [8, 64],\n        P5: [4],\n        loop_order: (i, j, k, ii, jj, kk)\n    }, my_filter)\n
"},{"location":"Manual/10%20Packages/","title":"Section 10: Building Packages","text":"

The Package class represents a collection of Accera-generated functions. Whenever a package is built, it creates a stand-alone function library that other pieces of software can use. Currently, Accera supports two package formats: HAT and MLIR.

"},{"location":"Manual/10%20Packages/#hat-package-format","title":"HAT package format","text":"

HAT \"Header Annotated with TOML\" is a format for packaging compiled libraries in the C programming language. HAT implies that a standard C header is styled with useful metadata in the TOML markup language.

Consider a nest that holds some loop-nest logic. To build a HAT package containing a function with this logic for the Windows operating system, we write the following lines of code:

package = acc.Package()\npackage.add(nest, args=(A, B), base_name=\"myFunc\")\npackage.build(format=acc.Package.Format.HAT_DYNAMIC, name=\"myPackage\", platform=acc.Package.Platform.WINDOWS)\n

The result is two files: myPackage.hat and myPackage.dll. The output directory defaults to the current working directory. We can change the output directory by setting output_dir to a relative or absolute path:

package.build(format=acc.Package.Format.HAT_DYNAMIC, name=\"myPackage\", platform=acc.Package.Platform.WINDOWS, output_dir=\"hat_packages\")\n
"},{"location":"Manual/10%20Packages/#mlir-package-format","title":"MLIR package format","text":"

The MLIR format is used for debugging the multiple stages of MLIR lowering, from the Accera DSL all the way down to runnable code.

package.build(format=acc.Package.Format.MLIR, name=\"myPackage\")\n

"},{"location":"Manual/10%20Packages/#function-names-in-packages","title":"Function names in packages","text":"

We can specify the base name of a function when it is added to a package. The full function name is the base name followed by an automatically generated unique identifier. For example, if the base name is \"myFunc\" then the function name could be \"myFunc_8f24bef5\". If no base name is defined, the automatically-generated unique identifier becomes the function name.

The unique identifier ensures that no two functions share the same name. However, invoking the function from client code becomes cumbersome because the function name changes each time the Accera package is updated and rebuilt. Therefore, the HAT file also includes a declaration that lets client code call the function without the unique identifier. Concretely, if the function signature in C is:

void myFunc_8f24bef5(const float* A, float* B);\n
then the HAT file also contains the line:
void (*myFunc)(const float* A, float* B) = myFunc_8f24bef5;\n
The above code makes the abbreviated name myFunc an alias of the full function name myFunc_8f24bef5. If multiple functions share the same base name, the first function in the HAT file gets the alias.

"},{"location":"Manual/10%20Packages/#debug-mode","title":"Debug mode","text":"

A package can be built with mode=acc.Package.Mode.DEBUG. Doing so creates a special version of each function that validates its own correctness every time the function is called. From the outside, a debugging package looks identical to a standard package. However, each of its functions actually contains two different implementations: the Accera implementation (with all of the fancy scheduling and planning) and the trivial default implementation (without any scheduling or planning). When called, the function runs both implementations and asserts that their outputs are within the predefined tolerance. If the outputs don't match, the function prints error messages to stderr.

package.build(format=acc.Package.Format.HAT_DYNAMIC, name=\"myPackage\", mode=acc.Package.Mode.DEBUG, tolerance=1.0e-6)\n

Not yet implemented: Debug mode is not supported for GPU targets.

"},{"location":"Manual/10%20Packages/#adding-descriptions","title":"Adding descriptions","text":"

Accera allows us to specify some standard descriptive fields in a package:

package.add_description(version=\"1.0\", license=\"https://mit-license.org/\", author=\"Microsoft Research\")\n
Additionally, we can add arbitrary metadata to the package description as follows:
package.add_description(other={\"title\": \"My Package Title\", \"source\": \"https://github.com/\", \"citations\": [\"https://arxiv.org/2021.12345/\", \"https://arxiv.org/2021.56789/\"]})\n

"},{"location":"Reference/accera/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/accera/#module-functions","title":"Module functions","text":"
  • accera.cast (value, type)
  • accera.create_dimensions ([role])
  • accera.create_parameters ()
  • accera.create_parameter_grid (parameter_choices[, filter_func, sample, seed])
  • accera.fuse (schedules[, partial])
"},{"location":"Reference/accera/#top-level-enumerations","title":"Top level enumerations","text":"
  • accera.CacheStrategy
  • accera.ScalarType
  • accera.MMASchedulingPolicy
  • accera.MMAShape
  • accera.Role
"},{"location":"Reference/accera/#classes","title":"Classes","text":""},{"location":"Reference/accera/#class-acceraarray","title":"class accera.Array","text":"

A multidimensional array of scalar elements.

"},{"location":"Reference/accera/#constructors","title":"Constructors","text":"
  • Array (role[, data, element_type, layout, offset, shape])
"},{"location":"Reference/accera/#enumerations","title":"Enumerations","text":"
  • accera.Array.Layout
"},{"location":"Reference/accera/#methods","title":"Methods","text":"
  • deferred_layout (layout)
  • sub_array (offsets, shape[, strides])
"},{"location":"Reference/accera/#class-acceracache","title":"class accera.Cache","text":"

A local copy of an Array block.

"},{"location":"Reference/accera/#class-acceraindex","title":"class accera.Index","text":"

An index representing one of the loops in a Nest or one of the iteration-space dimensions of a Schedule or a Plan.

"},{"location":"Reference/accera/#class-acceranest","title":"class accera.Nest","text":"

The logic of a loop nest.

"},{"location":"Reference/accera/#constructors_1","title":"Constructors","text":"
  • Nest (shape)
"},{"location":"Reference/accera/#methods_1","title":"Methods","text":"
  • iteration_logic (logic)
  • create_plan ([target])
  • create_schedule ()
  • get_indices ()
"},{"location":"Reference/accera/#class-accerapackage","title":"class accera.Package","text":"

Represents a collection of functions that can be built and emitted for use in client code.

"},{"location":"Reference/accera/#constructors_2","title":"Constructors","text":"
  • Package ()
"},{"location":"Reference/accera/#enumerations_1","title":"Enumerations","text":"
  • accera.Package.Format
  • accera.Package.Mode
  • accera.Package.Platform
"},{"location":"Reference/accera/#methods_2","title":"Methods","text":"
  • add_description ([author, license, other, version])
  • add (source, args[, base_name, parameters])
  • build (name[, format, mode, platform, tolerance, output_dir])
"},{"location":"Reference/accera/#class-acceraparameter","title":"class accera.Parameter","text":"

A placeholder that can be used instead of concrete values when constructing or calling the methods of a Nest, Schedule, or Plan.

"},{"location":"Reference/accera/#class-acceraplan","title":"class accera.Plan","text":"

A scheduled (ordered) loop nest with target-specific implementation details.

"},{"location":"Reference/accera/#methods_3","title":"Methods","text":"
  • cache (source[, index, trigger_index, layout, level, trigger_level, max_elements, thrifty, location, double_buffer, double_buffer_location, vectorize])
  • bind (mapping)
  • kernelize (unroll_indices[, vectorize_indices])
  • parallelize (indices[, pin, policy, max_threads])
  • tensorize (indices, mma_shape[, use_static_offsets, num_total_passes, num_fused_passes, scheduling_policy])
  • unroll (index)
  • vectorize (index)
"},{"location":"Reference/accera/#class-accerascalar","title":"class accera.Scalar","text":"

A scalar element.

"},{"location":"Reference/accera/#constructors_3","title":"Constructors","text":"
  • Scalar ([value])
  • Scalar ([value, name, role])
  • Scalar ([element_type, role])
"},{"location":"Reference/accera/#class-acceradimension","title":"class accera.Dimension","text":"

A specialization of Scalar with element_type as ScalarType.index.

"},{"location":"Reference/accera/#constructors_4","title":"Constructors","text":"
  • Dimension ([role])
  • Dimension ([name, role])
  • Dimension ([value, name, role])
"},{"location":"Reference/accera/#class-acceraschedule","title":"class accera.Schedule","text":"

A scheduled (ordered) loop nest with no target-specific implementation details.

"},{"location":"Reference/accera/#methods_4","title":"Methods","text":"
  • create_plan ([target])
  • pad (index, size)
  • reorder (indices)
  • skew (index, reference_index)
  • split (index, size)
  • tile (indices, sizes)
  • get_indices ()
"},{"location":"Reference/accera/#class-accerafusedschedule","title":"class accera.FusedSchedule","text":"

Child class of class accera.Schedule created as a result of fusing multiple schedules.

"},{"location":"Reference/accera/#methods-in-addition-to-the-inherited-functions-from-class-acceraschedule","title":"Methods (in addition to the inherited functions from class accera.Schedule)","text":"
  • get_fusing_index ()
  • get_fused_indices ()
  • get_unfused_indices ()
"},{"location":"Reference/accera/#class-acceratarget","title":"class accera.Target","text":"

A target platform for the cross-compiler.

"},{"location":"Reference/accera/#constructors_5","title":"Constructors","text":"
  • Target ([architecture, cache_lines, cache_sizes, category, extensions, family, frequency_GHz, model, name, num_cores, num_threads, turbo_frequency_GHz])
"},{"location":"Reference/accera/#enumerations_2","title":"Enumerations","text":"
  • accera.Target.Architecture
  • accera.Target.Category
  • accera.Target.Models
"},{"location":"Reference/safety_analysis/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/safety_analysis/#safety-analysis","title":"Safety Analysis","text":"

One of the most important features of Accera is its ability to provide safety guarantees that preserve the underlying logic no matter how we transform a schedule. Not all Accera schedules are safe, but those that are safe are much easier to work with.

"},{"location":"Reference/safety_analysis/#order-invariant-schedules","title":"Order-invariant Schedules","text":"

Order-invariant schedules are always safe because Accera transformations never remove any iterations. They only change the order of the loop-nest iterations, or add empty iterations in the form of padding when necessary. Recall that a Nest represents a simple nest. A simple nest is assumed to be order-invariant, and therefore any schedule created by a call to create_schedule() is safe.

"},{"location":"Reference/safety_analysis/#safety-and-fusing","title":"Safety and Fusing","text":"

Fusing is another way to create a schedule (see Section 4 of the Accera manual). Say that we have a sequence of n schedules: schedule0, schedule1, ... and we partially fuse their first m dimensions. Namely:

schedule = acc.fuse((schedule0, schedule1, ...), partial=m)\n
At this point, schedule is equivalent to sequentially executing the individual schedules for each iteration of the fused dimensions. However, is the fused schedule safe? In other words, does schedule guarantee the preservation of underlying logic, regardless of the applied transformation?

The dimensions of schedule fall into three categories:

  • Fused dimensions: at first, this category contains the m fused dimensions of schedule. If any of these dimensions are split, the derived dimensions are also added to this category.
  • Fusing dimensions: at first, this category contains a single dimension, the fusing dimension f. However, if this dimension is split, its derived dimensions are added to this category.
  • Unfused dimensions: all the remaining dimensions.

Note that the individual schedules being fused may have been created by previous fusing operations. The categories above relate to the role of each dimension in the current fusing operation.

"},{"location":"Reference/safety_analysis/#theorem","title":"Theorem","text":"

Imagine that we apply a sequence of transformations to schedule, which may derive new dimensions. Derived dimensions belong to the same category as the dimension from which they were derived. Suppose the fusing dimension (and its derived dimensions) precedes all the unfused dimensions. In that case, for any value of the fused dimensions, all the corresponding work from schedule0 is executed before any of the corresponding work from schedule1. Similarly, all the corresponding work from schedule1 is executed before any of the corresponding work from schedule2; and so on.

"},{"location":"Reference/safety_analysis/#proof","title":"Proof","text":"

For simplicity, assume that there is only one fusing dimension, f. Also, assume that we've only fused two schedules, schedule0 and schedule1. Note that these simplifying assumptions can easily be relaxed.

Assume that f precedes all of the unfused dimensions. Therefore, dimensions that precede f are necessarily fused dimensions. Let U be a sequence of concrete values for all the fused dimensions, and let V denote only those values that correspond to dimensions that precede f. The work from schedule0 that corresponds to the concrete values in U is contained in the slice (V, 0, *, ..., *). Similarly, the work from schedule1 that corresponds to the values in U is contained in (V, 1, *, ..., *). Finally, note that the former slice lexicographically precedes the latter, concluding the proof.

"},{"location":"Reference/safety_analysis/#an-example","title":"An example","text":"

To make the theorem less abstract, we demonstrate how it applies to a simple example. Assume that we start with two three-dimensional schedules, schedule0 and schedule1, and we fuse their first two dimensions:

i0, j0, k0 = schedule0.get_indices() # redundant operation, included for clarity\ni1, j1, k1 = schedule1.get_indices() # redundant operation, included for clarity\nschedule = acc.fuse((schedule0, schedule1), partial=2)\ni, j, f, k0, k1 = schedule.get_indices()\n
Next, say that we transform schedule by tiling dimensions j and k0 and then reordering the dimensions as follows:
jj, kk0 = schedule.tile({\n    j: 4,\n    k0: 4\n})\nschedule.reorder(j, i, f, k0, k1, kk0, jj)\n
Dimensions i, j, and jj are fused dimensions, while k0, kk0, and k1 are unfused dimensions. Note that the fusing dimension f precedes all of the unfused dimensions, satisfying the theorem's condition. Next, choose concrete values for the fused dimensions, say, i=4, j=3, and jj=2. The work from schedule0 that corresponds to these values is contained in the slice (3, 4, 0, *, *, *, *). Similarly, the work from schedule1 that corresponds to these values is contained in the slice (3, 4, 1, *, *, *, *). The former slice lexicographically precedes the latter and is therefore executed first.

"},{"location":"Reference/safety_analysis/#safety","title":"Safety","text":"

The theorem holds for any schedule, but it does not imply that every schedule is safe. Additional effort is required to prove whether a specific schedule is safe. When performing a fuse operation, we must examine the specific circumstances and consider whether the theorem provides a sufficient condition for safety.

"},{"location":"Reference/classes/Array/Array/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Array/Array/#acceraarrayrole-data-element_type-layout-offset-shape","title":"accera.Array(role[, data, element_type, layout, offset, shape])","text":"

Constructs an array.

"},{"location":"Reference/classes/Array/Array/#arguments","title":"Arguments","text":"argument description type/default role The role of the array determines if the array scope is internal or external, if the array is mutable or immutable, and if the array memory is dynamically allocated. accera.Role data The contents of a constant array. Required for accera.Role.CONST arrays but should not be specified for other roles. Python buffer or numpy.ndarray. element_type The array element type. accera.ScalarType, default: accera.ScalarType.float32. layout The affine memory map. accera.Array.Layout, or tuple of (integers or accera.Dimension), default: accera.Array.Layout.FIRST_MAJOR. offset The offset of the affine memory map integer (positive, zero, or negative), default: 0. shape The array shape. Required for roles other than accera.Role.CONST, should not be specified for accera.Role.CONST. tuple of (integers or accera.Dimension)."},{"location":"Reference/classes/Array/Array/#examples","title":"Examples","text":"

Construct an input array:

import accera as acc\nA = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20))  # the default layout is acc.Array.Layout.FIRST_MAJOR\n

Construct an input array with an explicit standard layout:

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20), layout=acc.Array.Layout.LAST_MAJOR)\n

Construct an input array with an explicit affine memory map:

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20), layout=(1, 10))\n

Construct an input array with an infinite (undefined) major dimension:

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, acc.inf), layout=acc.Array.Layout.LAST_MAJOR)\n

Construct an input array with both runtime and compile-time dimension sizes:

M = acc.create_dimensions()\nA = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, 20))\n

Construct an input/output array:

A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(10, 20))\n

Construct an input/output array with runtime input dimension sizes:

M, N = acc.create_dimensions()\nA = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n

Construct an output array with runtime output dimension sizes:

M, N = acc.create_dimensions(role=acc.Role.OUTPUT)\nA = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n

Construct an output array with explicit affine memory map:

M, N = acc.create_dimensions(role=acc.Role.OUTPUT)\nA = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N), layout=(1, M))\n

Construct a constant array:

D = np.random.rand(10, 16)\nA = acc.Array(role=acc.Role.CONST, data=D)\n

Construct a constant array with an explicit element type and layout that do not necessarily match those of the input data:

D = np.random.rand(10, 16)\nA = acc.Array(role=acc.Role.CONST, element_type=acc.ScalarType.float32, layout=acc.Array.Layout.LAST_MAJOR, data=D)\n

Construct a temporary array:

A = acc.Array(role=acc.Role.TEMP, element_type=acc.ScalarType.float32, shape=(10, 20), layout=acc.Array.Layout.LAST_MAJOR)\n

"},{"location":"Reference/classes/Array/Layout/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Array/Layout/#acceraarraylayout","title":"accera.Array.Layout","text":"type description accera.Array.Layout.FIRST_MAJOR Specifies a memory layout where the first major axis is in contiguous memory. For example, in a matrix, this corresponds to \"row-major\". accera.Array.Layout.LAST_MAJOR Specifies a memory layout where the last major axis is in contiguous memory. For example, in a matrix, this corresponds to \"column-major\". accera.Array.Layout.DEFERRED Defer specifying the memory layout for an Role.CONST array until a cache is created."},{"location":"Reference/classes/Array/deferred_layout/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Array/deferred_layout/#acceraarraydeferred_layoutcache","title":"accera.Array.deferred_layout(cache)","text":"

Specifies the layout for a Role.CONST array based on a Cache. For more details, see Deferred layout of constant arrays

"},{"location":"Reference/classes/Array/deferred_layout/#arguments","title":"Arguments","text":"argument description type/default cache The cache that defines the layout to set. accera.Cache"},{"location":"Reference/classes/Array/deferred_layout/#examples","title":"Examples","text":"

Create a constant 16x16 array without specifying a layout. Later on, define its layout based on a cache:

import numpy as np\nimport accera as acc\n\nmatrix = np.random.rand(16, 16)\n\n# Create a constant array with a deferred layout\nA = acc.Array(role=acc.Role.CONST, data=matrix, layout=acc.Array.Layout.DEFERRED)\nB = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=matrix.shape)\n\nnest = acc.Nest(shape=matrix.shape)\ni, j = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    B[i, j] += A[i, j]\n\nplan = nest.create_plan()\n\n# create a cache for the constant array\nAA = plan.cache(A, i, layout=acc.Array.Layout.FIRST_MAJOR, thrifty=True)\n\n# update the constant array's layout based on the cache\nA.deferred_layout(cache=AA)\n
"},{"location":"Reference/classes/Array/sub_array/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Array/sub_array/#acceraarraysub_arrayoffsets-shape-strides","title":"accera.Array.sub_array(offsets, shape[, strides])","text":"

Creates a sub-array of a specific shape from an array. The sub-array is created from elements at specified offsets and strides into the original array.

"},{"location":"Reference/classes/Array/sub_array/#arguments","title":"Arguments","text":"argument description type/default offsets The offsets into the original array. Tuple[int] shape The size of the sub-array. Tuple[int] strides (Optional) The strides in the original array used to create the sub-array. Tuple[int]"},{"location":"Reference/classes/Array/sub_array/#examples","title":"Examples","text":"

Create a sub-array of size 2x3 from an array of size 5x5 at an offset of {1, 1} and a stride of {2, 1}:

import numpy as np\nimport accera as acc\n\nN = 5\nsubArrayNumRows = 2\nsubArrayNumCols = 3\n\nmatrix = np.random.rand(N, N)\nArr = acc.Array(role=acc.Role.INPUT, data=matrix)\n\n# Zero out a sub array of size [2, 3] such that the resulting array looks like this:\n# xxxxx\n# x000x\n# xxxxx\n# x000x\n# xxxxx\n\nnest = acc.Nest(shape=(subArrayNumRows, subArrayNumCols))\ni, j = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    SubArr = Arr.sub_array([1, 1], [subArrayNumRows, subArrayNumCols], [2, 1])\n    SubArr[i, j] = 0.0\n\nschedule = nest.create_schedule()\n
"},{"location":"Reference/classes/Dimension/Dimension/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Dimension/Dimension/#acceradimensionrole-value","title":"accera.Dimension([role, value])","text":"

Constructs a runtime dimension size with optional initialization.

Note: This constructor is meant for advanced use cases that involve Python generator expressions. For the simplified syntax to create dimensions, see create_dimensions.

"},{"location":"Reference/classes/Dimension/Dimension/#arguments","title":"Arguments","text":"argument description type/default role The role of the dimension determines if it is mutable or immutable. accera.Role. default: accera.Role.INPUT. name The name of the dimension variable. Default is an empty string. string value The optional value to initialize the dimension. Only applies to mutable dimensions (accera.Role.OUTPUT) integer or Dimension"},{"location":"Reference/classes/Dimension/Dimension/#returns","title":"Returns","text":"

Dimension

"},{"location":"Reference/classes/Dimension/Dimension/#examples","title":"Examples","text":"

Construct an output array with runtime dimensions using Python tuple comprehension over an input shape:

import accera as acc\n\n# input_shape is a tuple or list of acc.Dimensions or integers\noutput_shape = tuple(acc.Dimension(role=acc.Role.OUTPUT, value=i) for i in input_shape)\nA = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=output_shape)\n

"},{"location":"Reference/classes/FusedSchedule/get_fused_indices/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/FusedSchedule/get_fused_indices/#accerafusedscheduleget_fused_indices","title":"accera.FusedSchedule.get_fused_indices()","text":"

Gets the fused indices of a fused schedule.

"},{"location":"Reference/classes/FusedSchedule/get_fused_indices/#returns","title":"Returns","text":"

Tuple of Index

"},{"location":"Reference/classes/FusedSchedule/get_fused_indices/#examples","title":"Examples","text":"
i, j = fused_schedule.get_fused_indices()\n
"},{"location":"Reference/classes/FusedSchedule/get_fusing_index/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/FusedSchedule/get_fusing_index/#accerafusedscheduleget_fusing_index","title":"accera.FusedSchedule.get_fusing_index()","text":"

Gets the fusing index of a fused schedule.

"},{"location":"Reference/classes/FusedSchedule/get_fusing_index/#returns","title":"Returns","text":"

Instance of Index

"},{"location":"Reference/classes/FusedSchedule/get_fusing_index/#examples","title":"Examples","text":"
f = fused_schedule.get_fusing_index()\n
"},{"location":"Reference/classes/FusedSchedule/get_unfused_indices/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/FusedSchedule/get_unfused_indices/#accerafusedscheduleget_unfused_indices","title":"accera.FusedSchedule.get_unfused_indices()","text":"

Gets the unfused indices of a fused schedule.

"},{"location":"Reference/classes/FusedSchedule/get_unfused_indices/#returns","title":"Returns","text":"

Tuple of Index

"},{"location":"Reference/classes/FusedSchedule/get_unfused_indices/#examples","title":"Examples","text":"
 k, l = fused_schedule.get_unfused_indices()\n
"},{"location":"Reference/classes/Nest/Nest/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Nest/Nest/#acceranestshape","title":"accera.Nest(shape)","text":"

Creates an affine loop nest.

"},{"location":"Reference/classes/Nest/Nest/#arguments","title":"Arguments","text":"argument description type/default shape The shape of the iteration space tuple of positive integers"},{"location":"Reference/classes/Nest/Nest/#examples","title":"Examples","text":"

Create a nest with 3 nested for-loops of sizes 16, 10, and 11:

nest = acc.Nest(shape=(16, 10, 11))\n
"},{"location":"Reference/classes/Nest/create_plan/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Nest/create_plan/#acceranestcreate_plantarget","title":"accera.Nest.create_plan([target])","text":"

Creates a plan using the default schedule for the nest.

"},{"location":"Reference/classes/Nest/create_plan/#arguments","title":"Arguments","text":"argument description type/default target The target platform. Defaults to acc.Target.HOST Target"},{"location":"Reference/classes/Nest/create_plan/#returns","title":"Returns","text":"

Plan

"},{"location":"Reference/classes/Nest/create_plan/#examples","title":"Examples","text":"

Create a plan for the host computer, using the default schedule for a nest:

plan = nest.create_plan()\n

Create a plan for an Intel Core 7th Generation, using the default schedule for a nest:

corei9 = acc.Target(\"Intel 7900X\", num_threads=44)\nplan = nest.create_plan(corei9)\n
"},{"location":"Reference/classes/Nest/create_schedule/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Nest/create_schedule/#acceranestcreate_schedule","title":"accera.Nest.create_schedule()","text":"

Create a default schedule for a nest.

"},{"location":"Reference/classes/Nest/create_schedule/#returns","title":"Returns","text":"

Schedule

"},{"location":"Reference/classes/Nest/create_schedule/#examples","title":"Examples","text":"
schedule = nest.create_schedule()\n
"},{"location":"Reference/classes/Nest/get_indices/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Nest/get_indices/#acceranestget_indices","title":"accera.Nest.get_indices()","text":"

Gets the iteration space dimensions for a nest.

"},{"location":"Reference/classes/Nest/get_indices/#returns","title":"Returns","text":"

Tuple of Index

"},{"location":"Reference/classes/Nest/get_indices/#examples","title":"Examples","text":"

Get the iteration space dimensions for a 3-dimensional nest:

i, j, k = nest.get_indices()\n
"},{"location":"Reference/classes/Nest/iteration_logic/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Nest/iteration_logic/#acceranestiteration_logiclogic","title":"accera.Nest.iteration_logic(logic)","text":"

Adds an iteration logic function to a Nest.

"},{"location":"Reference/classes/Nest/iteration_logic/#arguments","title":"Arguments","text":"argument description type/default logic Python function that represents the logic to run in the innermost loop of the nest."},{"location":"Reference/classes/Nest/iteration_logic/#examples","title":"Examples","text":"

The preferred syntax uses Python decorators, as follows:

import accera as acc\n\nA = acc.Array(role=acc.Role.INPUT, shape=(16, 64))\nB = acc.Array(role=acc.Role.INPUT, shape=(64, 32))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(16, 32))\n\nnest = acc.Nest(shape=(16, 32, 64))\ni, j, k = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    C[i,j] += A[i,k] * B[k,j]\n

The alternative syntax avoids decorators and instead defines the logic in a function:

import accera as acc\n\nA = acc.Array(role=acc.Role.INPUT, shape=(16, 64))\nB = acc.Array(role=acc.Role.INPUT, shape=(64, 32))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(16, 32))\n\nnest = acc.Nest(shape=(16, 32, 64))\ni, j, k = nest.get_indices()\n\ndef logic_fn():\n    C[i, j] += A[i, k] * B[k, j]\n\nnest.iteration_logic(logic_fn)\n

"},{"location":"Reference/classes/Package/Format/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Package/Format/#accerapackageformat","title":"accera.Package.Format","text":"type description accera.Package.Format.HAT_DYNAMIC HAT package format, dynamically linked. accera.Package.Format.HAT_STATIC HAT package format, statically linked. accera.Package.Format.MLIR_DYNAMIC MLIR (debugging) package format, dynamically linked. accera.Package.Format.MLIR_STATIC MLIR (debugging) package format, statically linked.

When cross-compiling, use either accera.Package.Format.HAT_STATIC or accera.Package.Format.MLIR_STATIC.
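For example, a minimal sketch (assuming a package assembled as in the Manual) that cross-compiles a statically-linked HAT package for the Raspbian platform:

package.build(format=acc.Package.Format.HAT_STATIC, name=\"myPackage\", platform=acc.Package.Platform.RASPBIAN)\n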

"},{"location":"Reference/classes/Package/Mode/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Package/Mode/#accerapackagemode","title":"accera.Package.Mode","text":"type description accera.Package.Mode.DEBUG Debug mode (automatically tests logical equivalence). accera.Package.Mode.RELEASE Release (maximally optimized)."},{"location":"Reference/classes/Package/Package/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Package/Package/#accerapackagepackage","title":"accera.Package.Package()","text":"

A package of functions that can be built and linked with client code.

"},{"location":"Reference/classes/Package/Package/#examples","title":"Examples","text":"

Create a package:

package = acc.Package()\n
"},{"location":"Reference/classes/Package/Platform/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Package/Platform/#accerapackageplatform","title":"accera.Package.Platform","text":"type description accera.Package.Platform.HOST The host computer's platform accera.Package.Platform.WINDOWS The Windows platform accera.Package.Platform.LINUX The Linux platform accera.Package.Platform.MACOS The MacOS platform accera.Package.Platform.ANDRIOD The Android platform accera.Package.Platform.IOS The iOS platform accera.Package.Platform.RASPBIAN The Raspbian platform"},{"location":"Reference/classes/Package/add/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Package/add/#accerapackageaddsource-args-base_name-parameters","title":"accera.Package.add(source, args[, base_name, parameters])","text":"

Adds one or more functions to the package.

"},{"location":"Reference/classes/Package/add/#arguments","title":"Arguments","text":"argument description type source The source which defines the function's implementation. Nest or Schedule or Plan args The order of external-scope arrays, scalars, and dimensions used in the function signature. tuple of Array, Scalar, or Dim base_name A base name for the function. The full name for the function will be the base name followed by an automatically-generated unique identifier. string parameters A value for each parameter if the function's implementation is parameterized. See Parameters. A list of dictionaries can also be provided, in which case, multiple functions are generated. Parameter to value dictionary or a list of Parameter to value dictionaries."},{"location":"Reference/classes/Package/add/#examples","title":"Examples","text":"

Adding a function defined by a Plan:

package.add(plan, args=(A, B, C), base_name=\"simple_matmul\")\n

Convenience syntax to add a function defined by a Schedule. A default Plan will be created automatically:

package.add(schedule, args=(A, B, C), base_name=\"simple_matmul\")\n

Convenience syntax to add a function defined by a Nest. A default Schedule and Plan will be created internally:

package.add(nest, args=(A, B, C), base_name=\"simple_matmul\")\n

Adding a function with concrete values specified for its parameters (P0, P1, P2, P3).

package.add(nest, args=(A, B, C), parameters={P0:16, P1:16, P2:16, P3:1}, base_name=\"matmul_16_16_16_1\")\n

Adding a function with runtime dimension sizes M, N, K and arrays A, B, and C:

package.add(nest, args=(M, N, K, A, B, C), base_name=\"matmul_M_N_K\")\n
"},{"location":"Reference/classes/Package/add_description/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Package/add_description/#accerapackageadd_descriptionauthor-license-other-version","title":"accera.Package.add_description([author, license, other, version])","text":"

Adds descriptive metadata to the HAT package.

"},{"location":"Reference/classes/Package/add_description/#arguments","title":"Arguments","text":"argument description type/default author Name of the individual or group that authored the package. string license The internet URL of the license used to release the package. string other User-specific descriptive metadata dictionary version The package version. string"},{"location":"Reference/classes/Package/add_description/#examples","title":"Examples","text":"

Adds the standard version, license, and author description fields to the package:

package.add_description(version=\"1.0\", license=\"https://mit-license.org/\", author=\"Microsoft Research\")\n

Adds arbitrary user-defined metadata to describe the package:

package.add_description(other={\"title\": \"My Package Title\", \"source\": \"https://github.com/\", \"citations\": [\"https://arxiv.org/2021.12345/\", \"https://arxiv.org/2021.56789/\"]})\n

"},{"location":"Reference/classes/Package/build/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Package/build/#accerapackagebuildname-format-mode-platform-tolerance-output_dir","title":"accera.Package.build(name[, format, mode, platform, tolerance, output_dir])","text":"

Builds a HAT package.

"},{"location":"Reference/classes/Package/build/#arguments","title":"Arguments","text":"argument description type/default name The package name. string format The format of the package. accera.Package.Format, defaults to Package.Format.HAT_STATIC mode The package mode, such as whether it is optimized or used for debugging. robopy.Package.Mode, defaults to Package.Mode.Release platform The platform where the package runs. accera.Package.Platform tolerance The tolerance for correctness checking when mode = Package.Mode.Debug. float, defaults to 1e-5 output_dir The path to an output directory. Defaults to the current directory if unspecified. string"},{"location":"Reference/classes/Package/build/#examples","title":"Examples","text":"

Build a dynamically-linked HAT package called myPackage containing func1 for the host platform in the current directory:

package = acc.Package()\npackage.add(plan, base_name=\"func1\")\npackage.build(format=acc.Package.Format.HAT_DYNAMIC, name=\"myPackage\")\n

Build a statically-linked HAT package called myPackage containing func1 for the host platform in the hat_packages subdirectory:

package = acc.Package()\npackage.add(plan, base_name=\"func1\")\npackage.build(format=acc.Package.Format.HAT_STATIC, name=\"myPackage\", output_dir=\"hat_packages\")\n

Build a statically-linked myPackage with additional intermediate MLIR files for debugging purposes. To build a dynamically-linked package, use acc.Package.Format.MLIR_DYNAMIC:

package = acc.Package()\npackage.add(plan, base_name=\"func1\")\npackage.build(format=acc.Package.Format.MLIR_STATIC, name=\"myPackage\")\n

Build a package with error checking for func1, outputting error messages to stderr if the default implementation and the Accera implementation do not match within a tolerance of 1.0e-6:

package = acc.Package()\npackage.add(plan, base_name=\"func1\")\npackage.build(format=acc.Package.Format.HAT_DYNAMIC, name=\"myPackage\", mode=acc.Package.Mode.DEBUG, tolerance=1.0e-6)\n

Cross-compile a statically-linked HAT package called myPackagePi3 containing func1 for the Raspberry Pi 3B. Note that dynamically-linked HAT packages are not supported for cross-compilation:

pi3 = Target(\"Raspberry Pi 3B\", category=Target.Category.CPU)\nplan = schedule.create_plan(target=pi3)\npackage = acc.Package()\npackage.add(plan, base_name=\"func1\")\npackage.build(format=acc.Package.Format.HAT_STATIC, name=\"myPackagePi3\", platform=acc.Package.Platform.RASPBIAN)\n
"},{"location":"Reference/classes/Plan/bind/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Plan/bind/#acceraplanbindmapping","title":"accera.Plan.bind(mapping)","text":"

Only available for targets that can execute a grid of work (such as GPUs). The bind function binds dimensions of the iteration space to axes of the target-specific grid (such as v100.GridUnit.BLOCK_X, v100.GridUnit.THREAD_X or v100.GridUnit.WARP_X on an Nvidia GPU).

"},{"location":"Reference/classes/Plan/bind/#arguments","title":"Arguments","text":"argument description type/default mapping Mapping of indices to GPU thread or block identifiers. dict of Index to target-specific identifiers"},{"location":"Reference/classes/Plan/bind/#examples","title":"Examples","text":"

Mark the i, j, and k indices to execute on NVidia V100's BLOCK_X, THREAD_X, and THREAD_Y grid axes, respectively.

v100 = acc.Target(Target.Model.NVIDIA_V100)\nplan.bind({\n    i: v100.GridUnit.BLOCK_X,\n    j: v100.GridUnit.THREAD_X,\n    k: v100.GridUnit.THREAD_Y\n})\n

In some cases, such as tensorization, where it might be non-trivial to assign threads to their respective data, it can be simpler to bind iteration-space indices to warps (Nvidia) or waves (AMD) in the x and y dimensions rather than to individual threads. This also abstracts the computation at a level higher than individual threads: instead of each thread performing its calculation independently, a group of threads (a warp) works collaboratively to solve a bigger computational problem, as in warp-synchronous primitives like CUDA's WMMA API. For example,

v100 = acc.Target(Target.Model.NVIDIA_V100)\nplan.bind({\n    i: v100.GridUnit.BLOCK_X,\n    j: v100.GridUnit.BLOCK_Y,\n    ii: v100.GridUnit.WARP_Y,\n    jj: v100.GridUnit.WARP_X\n})\n

In this case, we assign a warp/wave of threads to each unique combination of (ii, jj) in the iteration space. The spatial arrangement of the warps over their data is defined by the ranges assigned to these individual indices. For example, if both ii and jj have the range [0, 32) with a step size of 16, there will be a total of 4 warps (2 in the x-dimension and 2 in the y-dimension) covering a 32x32 data region.
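
One way to obtain ii and jj with the range [0, 32) and a step size of 16 is to split each dimension twice (a sketch based on the split transformation described later in this reference; it assumes i and j each span at least 32 elements):

ii = schedule.split(i, 32)\njj = schedule.split(j, 32)\niii = schedule.split(ii, 16) # ii now roughly covers [0, 32) with step 16\njjj = schedule.split(jj, 16) # likewise for jj\n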

"},{"location":"Reference/classes/Plan/cache/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Plan/cache/#acceraplancachesource-index-trigger_index-layout-level-trigger_level-max_elements-element_type-strategy-thrifty-location-double_buffer-double_buffer_location-vectorize","title":"accera.Plan.cache(source[, index, trigger_index, layout, level, trigger_level, max_elements, element_type, strategy, thrifty, location, double_buffer, double_buffer_location, vectorize])","text":"

Adds a caching strategy to a plan.

"},{"location":"Reference/classes/Plan/cache/#arguments","title":"Arguments","text":"argument description type source The array or cache from which this cache is copied. Array or Cache. index The index used to determine the cache level. Specify one and only one of index, level, max_elements. Index. trigger_index The index used to determine what level to fill the cache at. trigger_index can't come after index in the schedule order and will default to index if not specified. Specify at most one of trigger_index or trigger_level. Index. layout The affine memory map, if different from the source. accera.Layout. level The key-slice level to cache (the number of wildcard dimensions in a key-slice). Specify one and only one of index, level, max_elements. positive integer. trigger_level The key-slice level to fill the cache at. trigger_level can't be smaller than level, and will default to level if not specified. Specify at most one of trigger_index or trigger_level. positive integer max_elements The maximum elements to include in the cached region. Specify one and only one of index, level, max_elements. positive integer element_type The element type to use in the cache. Defaults to the element type of the cached array ScalarType strategy The thread to data mapping pattern to use when collaboratively caching by multiple threads. Defaults to AUTO which will resolve to the strategy best suited for the current target environment. CacheStrategy thrifty Use thrifty caching (copy data into a cache only if the cached data differs from the original active block). bool location The type of memory used to store the cache. MemorySpace double_buffer Whether to make this cache a double-buffering cache. Only valid on INPUT and CONST arrays. bool double_buffer_location Which memory space to put the double buffer temp array in. Requires that double_buffer is set to True. Defaults to AUTO. MemorySpace or AUTO vectorize Whether to vectorize the cache operations. Defaults to AUTO, which will behave like vectorize=True if the loop-nest has any vectorized loop via plan.vectorize(index) or vectorize=False if the loop-nest has no vectorized loops. bool

AUTO will configure the double buffering location based on the following: location | double_buffer | double_buffer_location = AUTO --- | --- | --- MemorySpace.SHARED | True | MemorySpace.PRIVATE !MemorySpace.SHARED | True | Same value as location

"},{"location":"Reference/classes/Plan/cache/#returns","title":"Returns","text":"

A Cache handle that represents the created cache.

"},{"location":"Reference/classes/Plan/cache/#examples","title":"Examples","text":"

Create a cache of array A at level 2.

AA = plan.cache(A, level=2)\n

Create a cache of array A with the Array.Layout.FIRST_MAJOR layout:

AA = plan.cache(A, level=2, layout=acc.Array.Layout.FIRST_MAJOR)\n

Create a cache of array A for dimension j:

AA = plan.cache(A, index=j)\n

Create a cache of array A for the largest active block that does not exceed 1024 elements:

AA = plan.cache(A, max_elements=1024)\n

Create a level 2 cache of array A from its level 4 cache:

AA = plan.cache(A, level=4)\nAAA = plan.cache(AA, level=2)\n
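
Create a double-buffering cache of the INPUT array A at level 2 (a sketch based on the double_buffer argument described above):

AA = plan.cache(A, level=2, double_buffer=True)\n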

Not yet implemented: Create a cache of array A at index i in GPU shared memory:

v100 = Target(Target.Model.NVIDIA_V100)\nAA = plan.cache(A, i, location=v100.MemorySpace.SHARED)\n

"},{"location":"Reference/classes/Plan/kernelize/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Plan/kernelize/#acceraplankernelizeunroll_indices-vectorize_indices","title":"accera.Plan.kernelize(unroll_indices[, vectorize_indices])","text":"

A convenience method for a sequence of unroll instructions followed by an optional sequence of vectorize instructions.

"},{"location":"Reference/classes/Plan/kernelize/#arguments","title":"Arguments","text":"argument description type/default unroll_indices The iteration-space dimensions to unroll tuple of accera.Index. vectorize_indices The optional iteration-space dimensions to vectorize accera.Index or tuple of accera.Index."},{"location":"Reference/classes/Plan/kernelize/#examples","title":"Examples","text":"

Unroll i and k, and then vectorize j:

schedule.reorder(i, k, j)\nplan = schedule.create_plan()\nplan.kernelize(unroll_indices=(i, k), vectorize_indices=j)\n
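
Per the description above, this call is shorthand for roughly the following sequence of individual unroll and vectorize instructions (a sketch; it assumes the same plan and indices):

plan.unroll(index=i)\nplan.unroll(index=k)\nplan.vectorize(index=j)\n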

Another example is to unroll i and then vectorize j and k:

schedule.reorder(i, j, k)\nplan = schedule.create_plan()\nplan.kernelize(unroll_indices=(i,), vectorize_indices=(j, k))\n
"},{"location":"Reference/classes/Plan/parallelize/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Plan/parallelize/#acceraplanparallelizeindices-pin-policy-max_threads","title":"accera.Plan.parallelize(indices[, pin, policy, max_threads])","text":"

Executes one or more loops in parallel on multiple cores or processors.

Only available for targets with multiple cores or processors.

"},{"location":"Reference/classes/Plan/parallelize/#arguments","title":"Arguments","text":"argument description type/default indices The iteration-space dimensions to run in parallel. To assign multiple threads to an index, first split that index, then parallelize its split indices. Unsplit indices will be assigned one thread each, split indices will be assigned threads based on the number of split blocks. This is limited by the number of threads supported by the target. tuple of accera.Index pin Pin the computation to a subset of cores or processors. tuple of target-specific identifiers policy The scheduling policy to apply (\"dynamic\" or \"static\"). string. Defaults to \"static\". max_threads The maximum number of threads to use when distributing the workload. The actual number of threads used is the lowest value among (a) max_threads, (b) the number of threads supported by the target and (c) the number of iterations in the domain as specified by indices. int. Defaults to None."},{"location":"Reference/classes/Plan/parallelize/#examples","title":"Examples","text":""},{"location":"Reference/classes/Plan/parallelize/#parallelize-the-i-j-and-k-dimensions-using-default-number-of-threads","title":"Parallelize the i, j, and k dimensions using default number of threads:","text":"
nest = Nest(shape=(2, 3, 4))\ni, j, k = nest.get_indices()\nplan.parallelize(indices=(i, j, k)) # This will use 2 x 3 x 4 = 24 threads\n
"},{"location":"Reference/classes/Plan/parallelize/#parallelize-the-i-dimension-after-splitting-using-default-number-of-threads","title":"Parallelize the i dimension after splitting using default number of threads:","text":"
nest = Nest(shape=(20,))\nschedule = nest.create_schedule()\ni = schedule.get_indices()\nii = schedule.split(i, 4)\nplan.parallelize(indices=i) # This will use 20 / 4 = 5 threads\n
"},{"location":"Reference/classes/Plan/parallelize/#parallelize-the-i-j-and-k-dimensions-using-thread-limit","title":"Parallelize the i, j, and k dimensions using thread limit:","text":"
nest = Nest(shape=(2, 3, 4))\ni, j, k = nest.get_indices()\nplan.parallelize(indices=(i, j, k), max_threads=4) # This will use 4 threads\n
"},{"location":"Reference/classes/Plan/parallelize/#parallelize-the-i-dimension-with-thread-limit-set-higher-than-the-number-of-iterations","title":"Parallelize the i dimension with thread limit set higher than the number of iterations:","text":"
nest = Nest(shape=(2, 3, 4))\ni, j, k = nest.get_indices()\nplan.parallelize(indices=i, max_threads=4) # This will use 2 threads since 'i' has only 2 iterations\n

Not yet implemented: Parallelize the i, j, and k dimensions by pinning them to specific cores on an Intel Xeon E5:

plan.parallelize(indices=(i, j, k), pin=(xeonE5.cores[0], xeonE5.cores[1], xeonE5.cores[2]))\n

Apply a dynamic scheduling policy, which uses a queue to partition the work across multiple cores:

plan.parallelize(indices=(i, j, k), policy=\"dynamic\")\n
"},{"location":"Reference/classes/Plan/tensorize/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Plan/tensorize/#acceraplantensorizeindices-mma_shape-use_static_offsets-num_total_passes-num_fused_passes-scheduling_policy-prologue_op-epilogue_op","title":"accera.Plan.tensorize(indices, mma_shape [, use_static_offsets, num_total_passes, num_fused_passes, scheduling_policy, prologue_op, epilogue_op])","text":"

Only available for targets with native matrix multiplication instruction (tensor core) support. Marks the dimensions of the iteration-space for tensorization. Only perfectly nested loops of the following form can be tensorized:

for i in range(M):\n    for k in range(K):\n        for j in range(N):\n            C[i, j] += A[i, k] * B[k, j]\n
"},{"location":"Reference/classes/Plan/tensorize/#arguments","title":"Arguments","text":"argument description type/default indices The 3-dimensional iteration space to tensorize. 3-D tuple of accera.Index mma_shape The type of MMA operation to use. accera.MMAShape use_static_offsets This is an optimization flag, which when enabled will use precomputed offset maps stored in device constant memory. Defaults to False. bool num_total_passes This controls the total number of passes to run. Defaults to 1. positive integer num_fused_passes This controls the number of passes for which register allocation is done, higher the value more the number of registers that are allocated. Defaults to None which will fuse all the passes as specified by num_total_passes. positive integer scheduling_policy For multi-block MMA operations, this controls whether matrix multiplication is done block-by-block or pass-by-pass (affects register usage). Default value is accera.MMASchedulingPolicy.PASS_ORDER accera.MMASchedulingPolicy prologue_op The element-wise operation to apply on matrix fragment data as a part of initialization (pre-tensorization). Default value is accera.MMAFragmentOp.NONE accera.MMAFragmentOp epilogue_op The element-wise operation to apply on matrix fragment data as a part of the final store (post-tensorization). Default value is accera.MMAFragmentOp.NONE accera.MMAFragmentOp

The different values of the enum MMAShape are explained here: accera.MMAShape

The different values of the enum MMASchedulingPolicy (applicable only for AMD targets supporting MFMA ops, such as accera.Target.Model.AMD_MI100) are mentioned here: accera.MMASchedulingPolicy

The different values of the enum MMAFragmentOp are explained here: accera.MMAFragmentOp

"},{"location":"Reference/classes/Plan/tensorize/#examples","title":"Examples","text":"

Mark the dimensions ii, jj, and kk for tensorized execution:

plan.tensorize(indices=(ii,jj,kk))\n
"},{"location":"Reference/classes/Plan/unroll/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Plan/unroll/#acceraplanunrollindex","title":"accera.Plan.unroll(index)","text":"

Marks a dimension of the iteration-space for unrolling.

"},{"location":"Reference/classes/Plan/unroll/#arguments","title":"Arguments","text":"argument description type/default index The index to unroll. Index"},{"location":"Reference/classes/Plan/unroll/#examples","title":"Examples","text":"

Mark the i dimension for unrolling:

plan.unroll(index=i)\n
"},{"location":"Reference/classes/Plan/vectorize/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Plan/vectorize/#acceraplanvectorizeindex","title":"accera.Plan.vectorize(index)","text":"

Only available for targets that have SIMD registers and support vector instructions. Marks a dimension of the iteration-space for vectorization.

"},{"location":"Reference/classes/Plan/vectorize/#arguments","title":"Arguments","text":"argument description type/default index The index to vectorize. Index"},{"location":"Reference/classes/Plan/vectorize/#examples","title":"Examples","text":"

Mark the dimension ii for vectorized execution:

plan.vectorize(index=ii)\n
"},{"location":"Reference/classes/Scalar/Scalar/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Scalar/Scalar/#accerascalarelement_type-value","title":"accera.Scalar([element_type, value])","text":"

Constructs a scalar that holds a number.

"},{"location":"Reference/classes/Scalar/Scalar/#arguments","title":"Arguments","text":"argument description type/default element_type The element type. accera.ScalarType, default: accera.ScalarType.float32. value An optional value. A number."},{"location":"Reference/classes/Scalar/Scalar/#examples","title":"Examples","text":"

Construct a float32 scalar:

import accera as acc\n\nX = acc.Scalar()\n

Construct a float32 scalar and initialize it:

Pi = acc.Scalar(value=3.14)\n

Construct integer scalars and perform arithmetic operations on them:

X = acc.Scalar(element_type=acc.ScalarType.int32)\nY = acc.Scalar(element_type=acc.ScalarType.int32)\nY.value = X + 2\n

"},{"location":"Reference/classes/Schedule/create_plan/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Schedule/create_plan/#acceraschedulecreate_plantarget","title":"accera.Schedule.create_plan([target])","text":"

Creates a plan for running this schedule.

"},{"location":"Reference/classes/Schedule/create_plan/#arguments","title":"Arguments","text":"argument description type/default target The target platform. Defaults to acc.Target.HOST Target"},{"location":"Reference/classes/Schedule/create_plan/#returns","title":"Returns","text":"

Plan

"},{"location":"Reference/classes/Schedule/create_plan/#examples","title":"Examples","text":"

Create a plan for the host target from an existing schedule (a minimal example; it assumes the schedule was created via nest.create_schedule()):
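
plan = schedule.create_plan()\n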

"},{"location":"Reference/classes/Schedule/get_indices/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Schedule/get_indices/#accerascheduleget_indices","title":"accera.Schedule.get_indices()","text":"

Gets the iteration space dimensions for a schedule.

"},{"location":"Reference/classes/Schedule/get_indices/#returns","title":"Returns","text":"

Tuple of Index

"},{"location":"Reference/classes/Schedule/get_indices/#examples","title":"Examples","text":"

Get the iteration space dimensions for a 3-dimensional nest:

i, j, k = schedule.get_indices()\n
"},{"location":"Reference/classes/Schedule/is_valid_loop_order/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Schedule/is_valid_loop_order/#accerascheduleis_valid_loop_orderorder","title":"accera.Schedule.is_valid_loop_order(*order)","text":"

The is_valid_loop_order function determines if an order of indices is valid. For a description of valid schedule orders, refer to reorder.

"},{"location":"Reference/classes/Schedule/is_valid_loop_order/#arguments","title":"Arguments","text":"argument description type/default *order The order of indices to check for validity variable Index arguments"},{"location":"Reference/classes/Schedule/is_valid_loop_order/#examples","title":"Examples","text":"

Checks if an order is valid:

print(schedule.is_valid_loop_order(k, i, j))\n

Uses this function as part of a parameter filter to determine which permutations of loop order parameters are valid:

P1, P2, P3, P4, P5, loop_order = acc.create_parameters()\nschedule.reorder(order=loop_order)\n\ndef my_filter(parameters_choice):\n    P1, P2, P3, P4, P5, loop_order = parameters_choice\n\n    return P1 > P2 \\\n        and P3 > P4 \\\n        and P1 * P5 < P3 \\\n        and P2 * P5 < P4 \\\n        and schedule.is_valid_loop_order(loop_order)\n\nparameters = acc.create_parameter_grid({\n        P1: [64, 128, 256],\n        P2: [32, 128],\n        P3: [16, 32, 128],\n        P4: [8, 64],\n        P5: [4],\n        loop_order: (i, j, k, ii, jj, kk)\n    }, my_filter)\n
"},{"location":"Reference/classes/Schedule/pad/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Schedule/pad/#acceraschedulepadindex-size","title":"accera.Schedule.pad(index, size)","text":"

Pads the beginning of a specified dimension of the iteration-space with empty (no-op) elements.

"},{"location":"Reference/classes/Schedule/pad/#arguments","title":"Arguments","text":"argument description type/default index The dimension to pad Index size The number of elements to pad non-negative integer"},{"location":"Reference/classes/Schedule/pad/#examples","title":"Examples","text":"

Pads the beginning of dimension i with 10 empty elements

schedule.pad(i, 10)\n
"},{"location":"Reference/classes/Schedule/reorder/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Schedule/reorder/#acceraschedulereorderorder-args","title":"accera.Schedule.reorder(order, *args)","text":"

The reorder transformation sets the order of the indices in the schedule.

These orders are not allowed: 1. The outer dimension created by a split transformation must always precede the corresponding inner dimension. 2. The fusing dimension created by a fuse operation must always precede any unfused dimensions.

"},{"location":"Reference/classes/Schedule/reorder/#arguments","title":"Arguments","text":"argument description type/default order Either the order of indices to set or the outermost index if using variable arguments tuple of Index or Index. *args Optional variable arguments containing subsequent indices to set variable Index arguments"},{"location":"Reference/classes/Schedule/reorder/#examples","title":"Examples","text":"

Reorder a schedule by moving the k dimension to the outermost loop:

schedule.reorder(k, i, j)\n

Using a tuple to reorder a schedule. This overloaded form is better suited for parameters:

schedule.reorder(order=(k, i, j))\n
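
Per restriction 1 above, an inner dimension created by split cannot precede its outer dimension. This can be checked with is_valid_loop_order (a sketch assuming a 3-dimensional schedule with indices i, j, k):

ii = schedule.split(i, 4)\nprint(schedule.is_valid_loop_order(ii, i, j, k)) # False: ii precedes its outer dimension i\nprint(schedule.is_valid_loop_order(i, j, k, ii)) # True\n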
"},{"location":"Reference/classes/Schedule/skew/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Schedule/skew/#accerascheduleskewindex-reference_index-unroll_loops_smaller_than","title":"accera.Schedule.skew(index, reference_index [, unroll_loops_smaller_than])","text":"

Transforms a dimension with respect to a reference dimension into a parallelogram by padding with empty elements.

"},{"location":"Reference/classes/Schedule/skew/#arguments","title":"Arguments","text":"argument description type/default index The dimension to skew Index reference_index The reference dimension Index unroll_loops_smaller_than Unroll loops that are smaller than this range (non-inclusive) non-negative integer"},{"location":"Reference/classes/Schedule/skew/#examples","title":"Examples","text":"

Skew dimension i with respect to dimension j:

schedule.skew(i, j)\n

Skew dimension j with respect to dimension i, and unroll if the resulting loops are smaller than 3:

schedule.skew(j, i, unroll_loops_smaller_than=3)\n
"},{"location":"Reference/classes/Schedule/split/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Schedule/split/#acceraschedulesplitindex-size","title":"accera.Schedule.split(index, size)","text":"

The split transformation takes a dimension i and a size, modifies i, and creates a new dimension ii.

Assume that the original size of dimension i was n: The split transformation splits dimension i into ceil(n/size) parts of size size, arranges each of those parts along dimension ii, and stacks the ceil(n/size) parts along dimension i.

If the split size does not divide the dimension size, empty elements are added such that the split size does divide the dimension size.

"},{"location":"Reference/classes/Schedule/split/#arguments","title":"Arguments","text":"argument description type/default index The dimension to split Index size The split size non-negative integer"},{"location":"Reference/classes/Schedule/split/#returns","title":"Returns","text":"

Index for the new inner dimension

"},{"location":"Reference/classes/Schedule/split/#examples","title":"Examples","text":"

Split the i dimension by 5, creating a new dimension ii:

ii = schedule.split(i, 5)\n
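
Split a dimension whose size is not divisible by the split size; per the description above, empty elements are added (a sketch assuming dimension i has size 10):

ii = schedule.split(i, 4) # ceil(10/4) = 3 parts of size 4; 2 empty (no-op) elements are added\n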
"},{"location":"Reference/classes/Schedule/tile/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Schedule/tile/#accerascheduletileshape","title":"accera.Schedule.tile(shape)","text":"

The tile transformation is a convenience syntax that takes a tuple of indices and a tuple of sizes, and splits each index by the corresponding size. The indices involved in the split are then ordered such that all the outer indices precede all of their respective inner indices.

"},{"location":"Reference/classes/Schedule/tile/#arguments","title":"Arguments","text":"argument description type/default shape Mapping of indices to tile sizes dict of Index and non-negative integers"},{"location":"Reference/classes/Schedule/tile/#returns","title":"Returns","text":"

Tuple of Index representing the new inner dimensions.

"},{"location":"Reference/classes/Schedule/tile/#examples","title":"Examples","text":"

Tile the i, j, and k dimensions by 8, 2, and 3, respectively.

ii, jj, kk = schedule.tile({\n    i: 8,\n    j: 2,\n    k: 3\n})\n
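
Per the description above, this tile call is roughly equivalent to splitting each index and reordering so that the outer indices precede their respective inner indices (a sketch):

ii = schedule.split(i, 8)\njj = schedule.split(j, 2)\nkk = schedule.split(k, 3)\nschedule.reorder(i, j, k, ii, jj, kk)\n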
"},{"location":"Reference/classes/Target/Architecture/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Target/Architecture/#acceratargetarchitecture","title":"accera.Target.Architecture","text":"

Defines the supported target architectures.

type description accera.Target.Architecture.HOST The host computer's architecture accera.Target.Architecture.ARM The ARM architecture accera.Target.Architecture.AARCH64 The 64-bit ARM architecture accera.Target.Architecture.X86 The 32-bit x86 architecture accera.Target.Architecture.X86_64 The 64-bit x86 architecture

"},{"location":"Reference/classes/Target/Category/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Target/Category/#acceratargetcategory","title":"accera.Target.Category","text":"

Defines the target processor category.

type description accera.Target.Category.CPU accera.Target.Category.GPU"},{"location":"Reference/classes/Target/Model/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Target/Model/#acceratargetmodel","title":"accera.Target.Model","text":"

Defines constants for some well-known CPU models.

type description accera.Target.Model.AMD_1200 AMD 1200 accera.Target.Model.AMD_1300X AMD 1300X accera.Target.Model.AMD_1400 AMD 1400 accera.Target.Model.AMD_1500X AMD 1500X accera.Target.Model.AMD_1600 AMD 1600 accera.Target.Model.AMD_1600X AMD 1600X accera.Target.Model.AMD_1700 AMD 1700 accera.Target.Model.AMD_1700X AMD 1700X accera.Target.Model.AMD_1800X AMD 1800X accera.Target.Model.AMD_1900X AMD 1900X accera.Target.Model.AMD_1920X AMD 1920X accera.Target.Model.AMD_1950X AMD 1950X accera.Target.Model.AMD_200GE AMD 200GE accera.Target.Model.AMD_2200G AMD 2200G accera.Target.Model.AMD_2200GE AMD 2200GE accera.Target.Model.AMD_2200U AMD 2200U accera.Target.Model.AMD_220GE AMD 220GE accera.Target.Model.AMD_2300U AMD 2300U accera.Target.Model.AMD_2300X AMD 2300X accera.Target.Model.AMD_2400G AMD 2400G accera.Target.Model.AMD_2400GE AMD 2400GE accera.Target.Model.AMD_240GE AMD 240GE accera.Target.Model.AMD_2500U AMD 2500U accera.Target.Model.AMD_2500X AMD 2500X accera.Target.Model.AMD_2600 AMD 2600 accera.Target.Model.AMD_2600E AMD 2600E accera.Target.Model.AMD_2600H AMD 2600H accera.Target.Model.AMD_2600X AMD 2600X accera.Target.Model.AMD_2700 AMD 2700 accera.Target.Model.AMD_2700E AMD 2700E accera.Target.Model.AMD_2700U AMD 2700U accera.Target.Model.AMD_2700X AMD 2700X accera.Target.Model.AMD_2700X_GOLD_EDITION AMD 2700X Gold Edition accera.Target.Model.AMD_2800H AMD 2800H accera.Target.Model.AMD_2920X AMD 2920X accera.Target.Model.AMD_2950X AMD 2950X accera.Target.Model.AMD_2970WX AMD 2970WX accera.Target.Model.AMD_2990WX AMD 2990WX accera.Target.Model.AMD_3000G AMD 3000G accera.Target.Model.AMD_300U AMD 300U accera.Target.Model.AMD_3050U AMD 3050U accera.Target.Model.AMD_3101 AMD 3101 accera.Target.Model.AMD_3150U AMD 3150U accera.Target.Model.AMD_3151 AMD 3151 accera.Target.Model.AMD_3200G AMD 3200G accera.Target.Model.AMD_3200U AMD 3200U accera.Target.Model.AMD_3201 AMD 3201 accera.Target.Model.AMD_3250U AMD 3250U accera.Target.Model.AMD_3251 AMD 3251 accera.Target.Model.AMD_3255 AMD 3255 accera.Target.Model.AMD_3300U AMD 3300U accera.Target.Model.AMD_3301 AMD 3301 accera.Target.Model.AMD_3351 AMD 3351 accera.Target.Model.AMD_3400G AMD 3400G accera.Target.Model.AMD_3401 AMD 3401 accera.Target.Model.AMD_3451 AMD 3451 accera.Target.Model.AMD_3500 AMD 3500 accera.Target.Model.AMD_3500U AMD 3500U accera.Target.Model.AMD_3500X AMD 3500X accera.Target.Model.AMD_3550H AMD 3550H accera.Target.Model.AMD_3580U AMD 3580U accera.Target.Model.AMD_3600 AMD 3600 accera.Target.Model.AMD_3600X AMD 3600X accera.Target.Model.AMD_3600XT AMD 3600XT accera.Target.Model.AMD_3700U AMD 3700U accera.Target.Model.AMD_3700X AMD 3700X accera.Target.Model.AMD_3750H AMD 3750H accera.Target.Model.AMD_3780U AMD 3780U accera.Target.Model.AMD_3800X AMD 3800X accera.Target.Model.AMD_3800XT AMD 3800XT accera.Target.Model.AMD_3900 AMD 3900 accera.Target.Model.AMD_3900X AMD 3900X accera.Target.Model.AMD_3900XT AMD 3900XT accera.Target.Model.AMD_3950X AMD 3950X accera.Target.Model.AMD_3960X AMD 3960X accera.Target.Model.AMD_3970X AMD 3970X accera.Target.Model.AMD_3980X AMD 3980X accera.Target.Model.AMD_3990X AMD 3990X accera.Target.Model.AMD_4300G AMD 4300G accera.Target.Model.AMD_4300GE AMD 4300GE accera.Target.Model.AMD_4300U AMD 4300U accera.Target.Model.AMD_4500U AMD 4500U accera.Target.Model.AMD_4600G AMD 4600G accera.Target.Model.AMD_4600GE AMD 4600GE accera.Target.Model.AMD_4600H AMD 4600H accera.Target.Model.AMD_4600HS AMD 4600HS accera.Target.Model.AMD_4600U AMD 4600U accera.Target.Model.AMD_4680U AMD 4680U 
accera.Target.Model.AMD_4700G AMD 4700G accera.Target.Model.AMD_4700GE AMD 4700GE accera.Target.Model.AMD_4700U AMD 4700U accera.Target.Model.AMD_4800H AMD 4800H accera.Target.Model.AMD_4800HS AMD 4800HS accera.Target.Model.AMD_4800U AMD 4800U accera.Target.Model.AMD_4900H AMD 4900H accera.Target.Model.AMD_4900HS AMD 4900HS accera.Target.Model.AMD_4980U AMD 4980U accera.Target.Model.AMD_5300G AMD 5300G accera.Target.Model.AMD_5300GE AMD 5300GE accera.Target.Model.AMD_5300U AMD 5300U accera.Target.Model.AMD_5400U AMD 5400U accera.Target.Model.AMD_5500U AMD 5500U accera.Target.Model.AMD_5600G AMD 5600G accera.Target.Model.AMD_5600GE AMD 5600GE accera.Target.Model.AMD_5600H AMD 5600H accera.Target.Model.AMD_5600HS AMD 5600HS accera.Target.Model.AMD_5600U AMD 5600U accera.Target.Model.AMD_5600X AMD 5600X accera.Target.Model.AMD_5700G AMD 5700G accera.Target.Model.AMD_5700GE AMD 5700GE accera.Target.Model.AMD_5700U AMD 5700U accera.Target.Model.AMD_5800 AMD 5800 accera.Target.Model.AMD_5800H AMD 5800H accera.Target.Model.AMD_5800HS AMD 5800HS accera.Target.Model.AMD_5800U AMD 5800U accera.Target.Model.AMD_5800X AMD 5800X accera.Target.Model.AMD_5900 AMD 5900 accera.Target.Model.AMD_5900HS AMD 5900HS accera.Target.Model.AMD_5900HX AMD 5900HX accera.Target.Model.AMD_5900X AMD 5900X accera.Target.Model.AMD_5950X AMD 5950X accera.Target.Model.AMD_5980HS AMD 5980HS accera.Target.Model.AMD_5980HX AMD 5980HX accera.Target.Model.AMD_7232P AMD 7232P accera.Target.Model.AMD_7251 AMD 7251 accera.Target.Model.AMD_7252 AMD 7252 accera.Target.Model.AMD_7261 AMD 7261 accera.Target.Model.AMD_7262 AMD 7262 accera.Target.Model.AMD_7272 AMD 7272 accera.Target.Model.AMD_7281 AMD 7281 accera.Target.Model.AMD_7282 AMD 7282 accera.Target.Model.AMD_72F3 AMD 72F3 accera.Target.Model.AMD_7301 AMD 7301 accera.Target.Model.AMD_7302 AMD 7302 accera.Target.Model.AMD_7302P AMD 7302P accera.Target.Model.AMD_7313 AMD 7313 accera.Target.Model.AMD_7313P AMD 7313P accera.Target.Model.AMD_7343 AMD 7343 accera.Target.Model.AMD_7351 AMD 7351 accera.Target.Model.AMD_7351P AMD 7351P accera.Target.Model.AMD_7352 AMD 7352 accera.Target.Model.AMD_7371 AMD 7371 accera.Target.Model.AMD_73F3 AMD 73F3 accera.Target.Model.AMD_7401 AMD 7401 accera.Target.Model.AMD_7401P AMD 7401P accera.Target.Model.AMD_7402 AMD 7402 accera.Target.Model.AMD_7402P AMD 7402P accera.Target.Model.AMD_7413 AMD 7413 accera.Target.Model.AMD_7443 AMD 7443 accera.Target.Model.AMD_7443P AMD 7443P accera.Target.Model.AMD_7451 AMD 7451 accera.Target.Model.AMD_7452 AMD 7452 accera.Target.Model.AMD_7453 AMD 7453 accera.Target.Model.AMD_74F3 AMD 74F3 accera.Target.Model.AMD_7501 AMD 7501 accera.Target.Model.AMD_7502 AMD 7502 accera.Target.Model.AMD_7502P AMD 7502P accera.Target.Model.AMD_7513 AMD 7513 accera.Target.Model.AMD_7532 AMD 7532 accera.Target.Model.AMD_7542 AMD 7542 accera.Target.Model.AMD_7543 AMD 7543 accera.Target.Model.AMD_7543P AMD 7543P accera.Target.Model.AMD_7551 AMD 7551 accera.Target.Model.AMD_7551P AMD 7551P accera.Target.Model.AMD_7552 AMD 7552 accera.Target.Model.AMD_75F3 AMD 75F3 accera.Target.Model.AMD_7601 AMD 7601 accera.Target.Model.AMD_7642 AMD 7642 accera.Target.Model.AMD_7643 AMD 7643 accera.Target.Model.AMD_7662 AMD 7662 accera.Target.Model.AMD_7663 AMD 7663 accera.Target.Model.AMD_7702 AMD 7702 accera.Target.Model.AMD_7702P AMD 7702P accera.Target.Model.AMD_7713 AMD 7713 accera.Target.Model.AMD_7713P AMD 7713P accera.Target.Model.AMD_7742 AMD 7742 accera.Target.Model.AMD_7763 AMD 7763 accera.Target.Model.AMD_7F32 AMD 7F32 
accera.Target.Model.AMD_7F52 AMD 7F52 accera.Target.Model.AMD_7F72 AMD 7F72 accera.Target.Model.AMD_7H12 AMD 7H12 accera.Target.Model.AMD_7V12 AMD 7V12 accera.Target.Model.AMD_FIREFLIGHT AMD FireFlight accera.Target.Model.AMD_PRO_1200 AMD PRO 1200 accera.Target.Model.AMD_PRO_1300 AMD PRO 1300 accera.Target.Model.AMD_PRO_1500 AMD PRO 1500 accera.Target.Model.AMD_PRO_1600 AMD PRO 1600 accera.Target.Model.AMD_PRO_1700 AMD PRO 1700 accera.Target.Model.AMD_PRO_1700X AMD PRO 1700X accera.Target.Model.AMD_PRO_200GE AMD PRO 200GE accera.Target.Model.AMD_PRO_2200G AMD PRO 2200G accera.Target.Model.AMD_PRO_2200GE AMD PRO 2200GE accera.Target.Model.AMD_PRO_2300U AMD PRO 2300U accera.Target.Model.AMD_PRO_2400G AMD PRO 2400G accera.Target.Model.AMD_PRO_2400GE AMD PRO 2400GE accera.Target.Model.AMD_PRO_2500U AMD PRO 2500U accera.Target.Model.AMD_PRO_2600 AMD PRO 2600 accera.Target.Model.AMD_PRO_2700 AMD PRO 2700 accera.Target.Model.AMD_PRO_2700U AMD PRO 2700U accera.Target.Model.AMD_PRO_2700X AMD PRO 2700X accera.Target.Model.AMD_PRO_300GE AMD PRO 300GE accera.Target.Model.AMD_PRO_300U AMD PRO 300U accera.Target.Model.AMD_PRO_3200G AMD PRO 3200G accera.Target.Model.AMD_PRO_3200GE AMD PRO 3200GE accera.Target.Model.AMD_PRO_3300U AMD PRO 3300U accera.Target.Model.AMD_PRO_3400G AMD PRO 3400G accera.Target.Model.AMD_PRO_3400GE AMD PRO 3400GE accera.Target.Model.AMD_PRO_3500U AMD PRO 3500U accera.Target.Model.AMD_PRO_3600 AMD PRO 3600 accera.Target.Model.AMD_PRO_3700 AMD PRO 3700 accera.Target.Model.AMD_PRO_3700U AMD PRO 3700U accera.Target.Model.AMD_PRO_3900 AMD PRO 3900 accera.Target.Model.AMD_PRO_4350G AMD PRO 4350G accera.Target.Model.AMD_PRO_4350GE AMD PRO 4350GE accera.Target.Model.AMD_PRO_4450U AMD PRO 4450U accera.Target.Model.AMD_PRO_4650G AMD PRO 4650G accera.Target.Model.AMD_PRO_4650GE AMD PRO 4650GE accera.Target.Model.AMD_PRO_4650U AMD PRO 4650U accera.Target.Model.AMD_PRO_4750G AMD PRO 4750G accera.Target.Model.AMD_PRO_4750GE AMD PRO 4750GE accera.Target.Model.AMD_PRO_4750U AMD PRO 4750U accera.Target.Model.AMD_PRO_5350G AMD PRO 5350G accera.Target.Model.AMD_PRO_5350GE AMD PRO 5350GE accera.Target.Model.AMD_PRO_5450U AMD PRO 5450U accera.Target.Model.AMD_PRO_5650G AMD PRO 5650G accera.Target.Model.AMD_PRO_5650GE AMD PRO 5650GE accera.Target.Model.AMD_PRO_5650U AMD PRO 5650U accera.Target.Model.AMD_PRO_5750G AMD PRO 5750G accera.Target.Model.AMD_PRO_5750GE AMD PRO 5750GE accera.Target.Model.AMD_PRO_5850U AMD PRO 5850U accera.Target.Model.AMD_R1102G AMD R1102G accera.Target.Model.AMD_R1305G AMD R1305G accera.Target.Model.AMD_R1505G AMD R1505G accera.Target.Model.AMD_R1606G AMD R1606G accera.Target.Model.AMD_V1202B AMD V1202B accera.Target.Model.AMD_V1404I AMD V1404I accera.Target.Model.AMD_V1500B AMD V1500B accera.Target.Model.AMD_V1605B AMD V1605B accera.Target.Model.AMD_V1756B AMD V1756B accera.Target.Model.AMD_V1780B AMD V1780B accera.Target.Model.AMD_V1807B AMD V1807B accera.Target.Model.AMD_V2516 AMD V2516 accera.Target.Model.AMD_V2546 AMD V2546 accera.Target.Model.AMD_V2718 AMD V2718 accera.Target.Model.AMD_V2748 AMD V2748 accera.Target.Model.ARM_CORTEX_M4 ARM Cortex-M4 accera.Target.Model.ARM_CORTEX_M4F ARM Cortex-M4F accera.Target.Model.APPLE_M1_MAX Apple M1 Max accera.Target.Model.INTEL_1000G1 Intel 1000G1 accera.Target.Model.INTEL_1000G4 Intel 1000G4 accera.Target.Model.INTEL_1005G1 Intel 1005G1 accera.Target.Model.INTEL_10100 Intel 10100 accera.Target.Model.INTEL_10100F Intel 10100F accera.Target.Model.INTEL_10100T Intel 10100T accera.Target.Model.INTEL_10300 Intel 10300 
accera.Target.Model.INTEL_10300T Intel 10300T accera.Target.Model.INTEL_1030G4 Intel 1030G4 accera.Target.Model.INTEL_1030G7 Intel 1030G7 accera.Target.Model.INTEL_10320 Intel 10320 accera.Target.Model.INTEL_1035G1 Intel 1035G1 accera.Target.Model.INTEL_1035G4 Intel 1035G4 accera.Target.Model.INTEL_1035G7 Intel 1035G7 accera.Target.Model.INTEL_10400 Intel 10400 accera.Target.Model.INTEL_10400F Intel 10400F accera.Target.Model.INTEL_10400T Intel 10400T accera.Target.Model.INTEL_10500 Intel 10500 accera.Target.Model.INTEL_10500T Intel 10500T accera.Target.Model.INTEL_10600 Intel 10600 accera.Target.Model.INTEL_10600K Intel 10600K accera.Target.Model.INTEL_10600KF Intel 10600KF accera.Target.Model.INTEL_10600T Intel 10600T accera.Target.Model.INTEL_1060G7 Intel 1060G7 accera.Target.Model.INTEL_1065G7 Intel 1065G7 accera.Target.Model.INTEL_1068G7 Intel 1068G7 accera.Target.Model.INTEL_10700 Intel 10700 accera.Target.Model.INTEL_10700F Intel 10700F accera.Target.Model.INTEL_10700K Intel 10700K accera.Target.Model.INTEL_10700KF Intel 10700KF accera.Target.Model.INTEL_10700T Intel 10700T accera.Target.Model.INTEL_10850K Intel 10850K accera.Target.Model.INTEL_10900 Intel 10900 accera.Target.Model.INTEL_10900F Intel 10900F accera.Target.Model.INTEL_10900K Intel 10900K accera.Target.Model.INTEL_10900KF Intel 10900KF accera.Target.Model.INTEL_10900T Intel 10900T accera.Target.Model.INTEL_10910 Intel 10910 accera.Target.Model.INTEL_11100B Intel 11100B accera.Target.Model.INTEL_1115G7 Intel 1115G7 accera.Target.Model.INTEL_1125G7 Intel 1125G7 accera.Target.Model.INTEL_1135G7 Intel 1135G7 accera.Target.Model.INTEL_11400 Intel 11400 accera.Target.Model.INTEL_11400F Intel 11400F accera.Target.Model.INTEL_11400T Intel 11400T accera.Target.Model.INTEL_1145G7 Intel 1145G7 accera.Target.Model.INTEL_11500 Intel 11500 accera.Target.Model.INTEL_11500B Intel 11500B accera.Target.Model.INTEL_11500T Intel 11500T accera.Target.Model.INTEL_1155G7 Intel 1155G7 accera.Target.Model.INTEL_11600 Intel 11600 accera.Target.Model.INTEL_11600K Intel 11600K accera.Target.Model.INTEL_11600KF Intel 11600KF accera.Target.Model.INTEL_11600T Intel 11600T accera.Target.Model.INTEL_1165G7 Intel 1165G7 accera.Target.Model.INTEL_11700 Intel 11700 accera.Target.Model.INTEL_11700B Intel 11700B accera.Target.Model.INTEL_11700F Intel 11700F accera.Target.Model.INTEL_11700K Intel 11700K accera.Target.Model.INTEL_11700KF Intel 11700KF accera.Target.Model.INTEL_11700T Intel 11700T accera.Target.Model.INTEL_11850H Intel 11850H accera.Target.Model.INTEL_1185G7 Intel 1185G7 accera.Target.Model.INTEL_11900 Intel 11900 accera.Target.Model.INTEL_11900F Intel 11900F accera.Target.Model.INTEL_11900K Intel 11900K accera.Target.Model.INTEL_11900KB Intel 11900KB accera.Target.Model.INTEL_11900KF Intel 11900KF accera.Target.Model.INTEL_11900T Intel 11900T accera.Target.Model.INTEL_1195G7 Intel 1195G7 accera.Target.Model.INTEL_2104G Intel 2104G accera.Target.Model.INTEL_2124 Intel 2124 accera.Target.Model.INTEL_2124G Intel 2124G accera.Target.Model.INTEL_2126G Intel 2126G accera.Target.Model.INTEL_2134 Intel 2134 accera.Target.Model.INTEL_2136 Intel 2136 accera.Target.Model.INTEL_2144G Intel 2144G accera.Target.Model.INTEL_2146G Intel 2146G accera.Target.Model.INTEL_2174G Intel 2174G accera.Target.Model.INTEL_2176G Intel 2176G accera.Target.Model.INTEL_2186G Intel 2186G accera.Target.Model.INTEL_2314 Intel 2314 accera.Target.Model.INTEL_2324G Intel 2324G accera.Target.Model.INTEL_2334 Intel 2334 accera.Target.Model.INTEL_2336 Intel 2336 
accera.Target.Model.INTEL_2356G Intel 2356G accera.Target.Model.INTEL_2374G Intel 2374G accera.Target.Model.INTEL_2378 Intel 2378 accera.Target.Model.INTEL_2378G Intel 2378G accera.Target.Model.INTEL_2386G Intel 2386G accera.Target.Model.INTEL_2388G Intel 2388G accera.Target.Model.INTEL_3204 Intel 3204 accera.Target.Model.INTEL_4108 Intel 4108 accera.Target.Model.INTEL_4109T Intel 4109T accera.Target.Model.INTEL_4110 Intel 4110 accera.Target.Model.INTEL_4112 Intel 4112 accera.Target.Model.INTEL_4114 Intel 4114 accera.Target.Model.INTEL_4208 Intel 4208 accera.Target.Model.INTEL_4209T Intel 4209T accera.Target.Model.INTEL_4210 Intel 4210 accera.Target.Model.INTEL_4210R Intel 4210R accera.Target.Model.INTEL_4214 Intel 4214 accera.Target.Model.INTEL_4214R Intel 4214R accera.Target.Model.INTEL_4214Y Intel 4214Y accera.Target.Model.INTEL_4215 Intel 4215 accera.Target.Model.INTEL_4215R Intel 4215R accera.Target.Model.INTEL_4216 Intel 4216 accera.Target.Model.INTEL_5215 Intel 5215 accera.Target.Model.INTEL_5215L Intel 5215L accera.Target.Model.INTEL_5215M Intel 5215M accera.Target.Model.INTEL_5217 Intel 5217 accera.Target.Model.INTEL_5218 Intel 5218 accera.Target.Model.INTEL_5218B Intel 5218B accera.Target.Model.INTEL_5218N Intel 5218N accera.Target.Model.INTEL_5218R Intel 5218R accera.Target.Model.INTEL_5218T Intel 5218T accera.Target.Model.INTEL_5220 Intel 5220 accera.Target.Model.INTEL_5220R Intel 5220R accera.Target.Model.INTEL_5220S Intel 5220S accera.Target.Model.INTEL_5220T Intel 5220T accera.Target.Model.INTEL_5222 Intel 5222 accera.Target.Model.INTEL_6035 Intel 6035 accera.Target.Model.INTEL_6098P Intel 6098P accera.Target.Model.INTEL_6100 Intel 6100 accera.Target.Model.INTEL_6100T Intel 6100T accera.Target.Model.INTEL_6209U Intel 6209U accera.Target.Model.INTEL_6210U Intel 6210U accera.Target.Model.INTEL_6212U Intel 6212U accera.Target.Model.INTEL_6222V Intel 6222V accera.Target.Model.INTEL_6226 Intel 6226 accera.Target.Model.INTEL_6226R Intel 6226R accera.Target.Model.INTEL_6230 Intel 6230 accera.Target.Model.INTEL_6230N Intel 6230N accera.Target.Model.INTEL_6230R Intel 6230R accera.Target.Model.INTEL_6230T Intel 6230T accera.Target.Model.INTEL_6234 Intel 6234 accera.Target.Model.INTEL_6238 Intel 6238 accera.Target.Model.INTEL_6238L Intel 6238L accera.Target.Model.INTEL_6238M Intel 6238M accera.Target.Model.INTEL_6238R Intel 6238R accera.Target.Model.INTEL_6238T Intel 6238T accera.Target.Model.INTEL_6240 Intel 6240 accera.Target.Model.INTEL_6240L Intel 6240L accera.Target.Model.INTEL_6240M Intel 6240M accera.Target.Model.INTEL_6240R Intel 6240R accera.Target.Model.INTEL_6240Y Intel 6240Y accera.Target.Model.INTEL_6242 Intel 6242 accera.Target.Model.INTEL_6242R Intel 6242R accera.Target.Model.INTEL_6244 Intel 6244 accera.Target.Model.INTEL_6246 Intel 6246 accera.Target.Model.INTEL_6246R Intel 6246R accera.Target.Model.INTEL_6248 Intel 6248 accera.Target.Model.INTEL_6248R Intel 6248R accera.Target.Model.INTEL_6252 Intel 6252 accera.Target.Model.INTEL_6252N Intel 6252N accera.Target.Model.INTEL_6254 Intel 6254 accera.Target.Model.INTEL_6258R Intel 6258R accera.Target.Model.INTEL_6262V Intel 6262V accera.Target.Model.INTEL_6300 Intel 6300 accera.Target.Model.INTEL_6300T Intel 6300T accera.Target.Model.INTEL_6320 Intel 6320 accera.Target.Model.INTEL_6400 Intel 6400 accera.Target.Model.INTEL_6400T Intel 6400T accera.Target.Model.INTEL_6402P Intel 6402P accera.Target.Model.INTEL_6500 Intel 6500 accera.Target.Model.INTEL_6500T Intel 6500T accera.Target.Model.INTEL_6585R Intel 6585R 
accera.Target.Model.INTEL_6600 Intel 6600 accera.Target.Model.INTEL_6600K Intel 6600K accera.Target.Model.INTEL_6600T Intel 6600T accera.Target.Model.INTEL_6685R Intel 6685R accera.Target.Model.INTEL_6700 Intel 6700 accera.Target.Model.INTEL_6700K Intel 6700K accera.Target.Model.INTEL_6700T Intel 6700T accera.Target.Model.INTEL_6785R Intel 6785R accera.Target.Model.INTEL_6820HQ Intel 6820HQ accera.Target.Model.INTEL_7100 Intel 7100 accera.Target.Model.INTEL_7100T Intel 7100T accera.Target.Model.INTEL_7101E Intel 7101E accera.Target.Model.INTEL_7101TE Intel 7101TE accera.Target.Model.INTEL_7300 Intel 7300 accera.Target.Model.INTEL_7300T Intel 7300T accera.Target.Model.INTEL_7320 Intel 7320 accera.Target.Model.INTEL_7350K Intel 7350K accera.Target.Model.INTEL_7400 Intel 7400 accera.Target.Model.INTEL_7400T Intel 7400T accera.Target.Model.INTEL_7500 Intel 7500 accera.Target.Model.INTEL_7500T Intel 7500T accera.Target.Model.INTEL_7505 Intel 7505 accera.Target.Model.INTEL_7600 Intel 7600 accera.Target.Model.INTEL_7600K Intel 7600K accera.Target.Model.INTEL_7600T Intel 7600T accera.Target.Model.INTEL_7640X Intel 7640X accera.Target.Model.INTEL_7700 Intel 7700 accera.Target.Model.INTEL_7700K Intel 7700K accera.Target.Model.INTEL_7700T Intel 7700T accera.Target.Model.INTEL_7740X Intel 7740X accera.Target.Model.INTEL_7800X Intel 7800X accera.Target.Model.INTEL_7820X Intel 7820X accera.Target.Model.INTEL_7900X Intel 7900X accera.Target.Model.INTEL_7920X Intel 7920X accera.Target.Model.INTEL_7940X Intel 7940X accera.Target.Model.INTEL_7960X Intel 7960X accera.Target.Model.INTEL_7980XE Intel 7980XE accera.Target.Model.INTEL_8086K Intel 8086K accera.Target.Model.INTEL_8100 Intel 8100 accera.Target.Model.INTEL_8100F Intel 8100F accera.Target.Model.INTEL_8100T Intel 8100T accera.Target.Model.INTEL_8253 Intel 8253 accera.Target.Model.INTEL_8256 Intel 8256 accera.Target.Model.INTEL_8260 Intel 8260 accera.Target.Model.INTEL_8260L Intel 8260L accera.Target.Model.INTEL_8260M Intel 8260M accera.Target.Model.INTEL_8260Y Intel 8260Y accera.Target.Model.INTEL_8268 Intel 8268 accera.Target.Model.INTEL_8270 Intel 8270 accera.Target.Model.INTEL_8272CL Intel 8272CL accera.Target.Model.INTEL_8273CL Intel 8273CL accera.Target.Model.INTEL_8276 Intel 8276 accera.Target.Model.INTEL_8276L Intel 8276L accera.Target.Model.INTEL_8276M Intel 8276M accera.Target.Model.INTEL_8280 Intel 8280 accera.Target.Model.INTEL_8280L Intel 8280L accera.Target.Model.INTEL_8280M Intel 8280M accera.Target.Model.INTEL_8284 Intel 8284 accera.Target.Model.INTEL_8300 Intel 8300 accera.Target.Model.INTEL_8300T Intel 8300T accera.Target.Model.INTEL_8350K Intel 8350K accera.Target.Model.INTEL_8351N Intel 8351N accera.Target.Model.INTEL_8352S Intel 8352S accera.Target.Model.INTEL_8352V Intel 8352V accera.Target.Model.INTEL_8352Y Intel 8352Y accera.Target.Model.INTEL_8358 Intel 8358 accera.Target.Model.INTEL_8358P Intel 8358P accera.Target.Model.INTEL_8360Y Intel 8360Y accera.Target.Model.INTEL_8362 Intel 8362 accera.Target.Model.INTEL_8368 Intel 8368 accera.Target.Model.INTEL_8368Q Intel 8368Q accera.Target.Model.INTEL_8380 Intel 8380 accera.Target.Model.INTEL_8400 Intel 8400 accera.Target.Model.INTEL_8400T Intel 8400T accera.Target.Model.INTEL_8500 Intel 8500 accera.Target.Model.INTEL_8500T Intel 8500T accera.Target.Model.INTEL_8550U Intel 8550U accera.Target.Model.INTEL_8600 Intel 8600 accera.Target.Model.INTEL_8600K Intel 8600K accera.Target.Model.INTEL_8600T Intel 8600T accera.Target.Model.INTEL_8650U Intel 8650U accera.Target.Model.INTEL_8700 
Intel 8700 accera.Target.Model.INTEL_8700K Intel 8700K accera.Target.Model.INTEL_8700T Intel 8700T accera.Target.Model.INTEL_9221 Intel 9221 accera.Target.Model.INTEL_9222 Intel 9222 accera.Target.Model.INTEL_9242 Intel 9242 accera.Target.Model.INTEL_9282 Intel 9282 accera.Target.Model.INTEL_9800X Intel 9800X accera.Target.Model.INTEL_9820X Intel 9820X accera.Target.Model.INTEL_9900X Intel 9900X accera.Target.Model.INTEL_9920X Intel 9920X accera.Target.Model.INTEL_9940X Intel 9940X accera.Target.Model.INTEL_9960X Intel 9960X accera.Target.Model.INTEL_9980XE Intel 9980XE accera.Target.Model.INTEL_9990XE Intel 9990XE accera.Target.Model.INTEL_E3_1220_V6 Intel E3-1220 v6 accera.Target.Model.INTEL_E3_1225_V6 Intel E3-1225 v6 accera.Target.Model.INTEL_E3_1230_V6 Intel E3-1230 v6 accera.Target.Model.INTEL_E3_1240_V6 Intel E3-1240 v6 accera.Target.Model.INTEL_E3_1245_V6 Intel E3-1245 v6 accera.Target.Model.INTEL_E3_1270_V6 Intel E3-1270 v6 accera.Target.Model.INTEL_E3_1275_V6 Intel E3-1275 v6 accera.Target.Model.INTEL_E3_1280_V6 Intel E3-1280 v6 accera.Target.Model.INTEL_E3_1285_V6 Intel E3-1285 v6 accera.Target.Model.INTEL_E5_1607_V2 Intel E5-1607 v2 accera.Target.Model.INTEL_E5_1620_V2 Intel E5-1620 v2 accera.Target.Model.INTEL_E5_1650_V2 Intel E5-1650 v2 accera.Target.Model.INTEL_E5_1650_V3 Intel E5-1650 v3 accera.Target.Model.INTEL_E5_1660_V2 Intel E5-1660 v2 accera.Target.Model.INTEL_E5_1660_V3 Intel E5-1660 v3 accera.Target.Model.INTEL_E5_1680_V2 Intel E5-1680 v2 accera.Target.Model.INTEL_E5_1680_V3 Intel E5-1680 v3 accera.Target.Model.INTEL_E5_2620_V3 Intel E5-2620 v3 accera.Target.Model.INTEL_E5_2673_V4 Intel E5-2673 v4 accera.Target.Model.INTEL_G3900 Intel G3900 accera.Target.Model.INTEL_G3900T Intel G3900T accera.Target.Model.INTEL_G3900TE Intel G3900TE accera.Target.Model.INTEL_G3920 Intel G3920 accera.Target.Model.INTEL_G4400 Intel G4400 accera.Target.Model.INTEL_G4400T Intel G4400T accera.Target.Model.INTEL_G4400TE Intel G4400TE accera.Target.Model.INTEL_G4500 Intel G4500 accera.Target.Model.INTEL_G4500T Intel G4500T accera.Target.Model.INTEL_G4520 Intel G4520 accera.Target.Model.INTEL_W_1250 Intel W-1250 accera.Target.Model.INTEL_W_1250P Intel W-1250P accera.Target.Model.INTEL_W_1270 Intel W-1270 accera.Target.Model.INTEL_W_1270P Intel W-1270P accera.Target.Model.INTEL_W_1290 Intel W-1290 accera.Target.Model.INTEL_W_1290P Intel W-1290P accera.Target.Model.INTEL_W_1290T Intel W-1290T accera.Target.Model.INTEL_W_1350 Intel W-1350 accera.Target.Model.INTEL_W_1350P Intel W-1350P accera.Target.Model.INTEL_W_1370 Intel W-1370 accera.Target.Model.INTEL_W_1370P Intel W-1370P accera.Target.Model.INTEL_W_1390 Intel W-1390 accera.Target.Model.INTEL_W_1390P Intel W-1390P accera.Target.Model.INTEL_W_1390T Intel W-1390T accera.Target.Model.INTEL_W_2102 Intel W-2102 accera.Target.Model.INTEL_W_2104 Intel W-2104 accera.Target.Model.INTEL_W_2123 Intel W-2123 accera.Target.Model.INTEL_W_2125 Intel W-2125 accera.Target.Model.INTEL_W_2133 Intel W-2133 accera.Target.Model.INTEL_W_2135 Intel W-2135 accera.Target.Model.INTEL_W_2140B Intel W-2140B accera.Target.Model.INTEL_W_2150B Intel W-2150B accera.Target.Model.INTEL_W_3175X Intel W-3175X accera.Target.Model.INTEL_W_3223 Intel W-3223 accera.Target.Model.INTEL_W_3225 Intel W-3225 accera.Target.Model.INTEL_W_3235 Intel W-3235 accera.Target.Model.INTEL_W_3245 Intel W-3245 accera.Target.Model.INTEL_W_3245M Intel W-3245M accera.Target.Model.INTEL_W_3265 Intel W-3265 accera.Target.Model.INTEL_W_3265M Intel W-3265M accera.Target.Model.INTEL_W_3275 Intel W-3275 
accera.Target.Model.INTEL_W_3275M Intel W-3275M accera.Target.Model.RASPBERRY_PI_3B Raspberry Pi 3B accera.Target.Model.RASPBERRY_PI_4B Raspberry Pi 4B accera.Target.Model.RASPBERRY_PI_ZERO Raspberry Pi Zero

The enum also defines constants for some well-known GPU models.

type description accera.Target.Model.AMD_MI100 AMD MI100 accera.Target.Model.AMD_MI200 AMD MI200 accera.Target.Model.AMD_MI50 AMD MI50 accera.Target.Model.AMD_RADEON7 AMD Radeon7 accera.Target.Model.NVIDIA_A100 NVidia A100 accera.Target.Model.NVIDIA_P100 NVidia P100 accera.Target.Model.NVIDIA_RTX_A6000 NVidia RTX A6000 accera.Target.Model.NVIDIA_V100 NVidia V100"},{"location":"Reference/classes/Target/Runtime/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Target/Runtime/#acceratargetruntime","title":"accera.Target.Runtime","text":"

The runtime for code generation and/or compilation.

type description accera.Target.Runtime.CUDA The NVidia CUDA runtime. accera.Target.Runtime.ROCM The AMD ROCm runtime. accera.Target.Runtime.VULKAN The Vulkan runtime. accera.Target.Runtime.OPENMP The OpenMP runtime."},{"location":"Reference/classes/Target/Target/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/classes/Target/Target/#acceratargetarchitecture-cache_lines-cache_sizes-category-extensions-family-frequency_ghz-known_name-model-name-num_cores-num_threads-runtime-tensor_core_info-turbo_frequency_ghz-vector_bytes-vector_registers","title":"accera.Target([architecture, cache_lines, cache_sizes, category, extensions, family, frequency_GHz, known_name, model, name, num_cores, num_threads, runtime, tensor_core_info, turbo_frequency_GHz, vector_bytes, vector_registers)","text":"

Defines the capabilities of a target processor.

"},{"location":"Reference/classes/Target/Target/#arguments","title":"Arguments","text":"argument description type/default architecture The processor architecture accera.Target.Architecture cache_lines Cache lines (kilobytes) list of positive integers cache_sizes Cache sizes (bytes) list of positive integers category The processor category accera.Target.Category extensions Supported processor extensions list of extension codes family The processor family string frequency_GHz The processor frequency (GHz) positive number known_name A name of a device known to Accera string | accera.Target.Model / \"HOST\" model The processor model accera.Target.Model name The processor name string num_cores Number of cores positive integer num_threads Number of threads positive integer runtime The runtime accera.Target.Runtime tensor_core_info The tensor core capabilities, such as the supported input type, output type, and shapes accera.Targets.TensorCoreInformation turbo_frequency_GHz Turbo frequency (GHz) positive number vector_bytes Bytes per vector register positive number vector_registers total number of SIMD registers positive number"},{"location":"Reference/classes/Target/Target/#known-device-names","title":"Known device names","text":"

Accera provides a pre-defined list of known targets through the accera.Target.Model enumeration.

These known targets provide typical hardware settings and may not fit your specific hardware characteristics exactly. If your target matches closely with (but not exactly to) one of these targets, you can always start with a known target and update the properties accordingly.

If your target is your host machine, Accera will first try to find your host machine's CPU in the list of known devices and then use its corresponding capabilities. If none is found, we recommend that you inspect the closest matching device in the accera.Target.Model enumeration in order to generate optimal code. If there is no closely matching device for your host machine, we suggest you look at the following section to define a CPU target in Accera.

"},{"location":"Reference/classes/Target/Target/#examples","title":"Examples","text":"

Let's have a look at some examples to understand how to define a CPU target in Accera.

Create a custom CPU target:

cpu_target = acc.Target(name=\"Custom processor\", category=acc.Target.Category.CPU, architecture=acc.Target.Architecture.X86_64, num_cores=10)\n

Next, we create a target from a known CPU and selectively override some of its fields:

gen10 = acc.Target(\n                known_name=\"Intel 7940X\",\n                category=acc.Target.Category.CPU,\n                extensions=[\"SSE4.1\", \"SSE4.2\", \"AVX2\"])\n

In this example, we created a target device from a known CPU model but overrode its extensions to remove AVX-512 support.

You can use this example as a starting point to define any other Intel Core Processor. Their specifications are listed in the table above.

Create a pre-defined GPU target representing an NVidia Tesla v100 processor:

v100 = acc.Target(model=acc.Target.Model.NVIDIA_V100)\n

Here is another example to create a custom GPU target:

gpu_target = acc.Target(name=\"Custom GPU processor\", category=acc.Target.Category.GPU, default_block_size=16)\n
"},{"location":"Reference/classes/Target/Target/#additional-notes-on-instruction-set-extensions","title":"Additional Notes on Instruction Set Extensions","text":"

It is important to identify the number of vector (SIMD) registers and the number of bytes in each register. These values help you determine whether you are using the vector units of the underlying hardware to their full capability.
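
As a rough sketch of how these values can be used (this assumes the constructor arguments above are also exposed as attributes on the target, as target.vector_bytes is in the Optimized MatMul tutorial below; the "Intel 7940X" known name is reused from the earlier example):

import accera as acc\n\n# Start from a known CPU target and read its SIMD description\ntarget = acc.Target(known_name=\"Intel 7940X\", category=acc.Target.Category.CPU)\n\n# How many 32-bit floats fit in one SIMD register, and in all registers combined\nfloat32_lanes = target.vector_bytes // 4\nfloats_in_registers = float32_lanes * target.vector_registers\nprint(float32_lanes, floats_in_registers)\n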

"},{"location":"Reference/classes/Target/Target/#avx","title":"AVX","text":"

Advanced Vector Extensions (AVX) promotes legacy 128-bit SIMD instructions that operate on XMM registers to use a vector-extension (VEX) prefix and operate on 256-bit YMM registers.

Intel AVX introduced support for 256-bit wide SIMD registers (YMM0-YMM7 in operating modes that are 32-bit or less, and YMM0-YMM15 in 64-bit mode). For Accera targets, 64-bit mode is the default. The lower 128 bits of each YMM register are aliased to the corresponding 128-bit XMM register. In other words, an Intel AVX target provides 16 XMM registers and 16 256-bit YMM registers that extend them.

"},{"location":"Reference/classes/Target/Target/#avx512","title":"AVX512","text":"

AVX-512 is a further extension offering 32 ZMM registers, and each SIMD register is 512 bits (64 bytes) wide.

"},{"location":"Reference/classes/Target/Target/#sse4-extension","title":"SSE4 Extension","text":"

SSE4 operates on 128-bit wide XMM registers. In 32-bit mode, eight registers (XMM0 to XMM7) are available; in 64-bit mode, eight additional registers (XMM8 to XMM15) are accessible using REX prefixes, for a total of 16.

"},{"location":"Reference/enumerations/CacheStrategy/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/enumerations/CacheStrategy/#acceracachestrategy","title":"accera.CacheStrategy","text":"type description accera.CacheStrategy.BLOCKED Every thread copies a contiguous block of memory based on their thread index. e.g. If 100 elements are cached by 10 threads, thread 0 copies elements [0, 10), thread 1 copies elements [10, 20) and so on. accera.CacheStrategy.STRIPED Every thread copies a part of their contribution in a round-robin fashion. e.g. In the previous example, thread 0 will now copy elements [0, 2), [20, 22), [40, 42), [60, 62) and [80, 82), thread 1 will copy [2, 4), [22, 24), [42, 44), [62, 64) and [82, 84) and so on. The minimum number of contiguous elements that each thread copies is governed by the vectorization parameter, which in this example is 2.

The choice of caching strategy affects performance through overheads such as shared-memory bank conflicts and the degree of memory coalescing.
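
The following plain-Python sketch (illustration only, not Accera code) shows which elements each thread would copy in the 100-element, 10-thread example above:

# Illustration of the element assignment described above (100 elements, 10 threads, vectorization = 2)\nnum_elements, num_threads, vec = 100, 10, 2\nper_thread = num_elements // num_threads\n\n# BLOCKED: thread t copies one contiguous block of 10 elements\nblocked = {t: list(range(t * per_thread, (t + 1) * per_thread)) for t in range(num_threads)}\n\n# STRIPED: thread t copies 2-element chunks in a round-robin fashion\nstriped = {t: [e for start in range(t * vec, num_elements, num_threads * vec)\n               for e in range(start, start + vec)] for t in range(num_threads)}\n\nprint(blocked[0])  # [0, 1, 2, ..., 9]\nprint(striped[0])  # [0, 1, 20, 21, 40, 41, 60, 61, 80, 81]\n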

"},{"location":"Reference/enumerations/MMAFragmentOp/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/enumerations/MMAFragmentOp/#accerammafragmentop","title":"accera.MMAFragmentOp","text":"type description accera.MMAFragmentOp.NONE No-op which does not modify the fragment data, i.e. f(x) = x. accera.MMAFragmentOp.ReLU Rectified linear unit activation function (details), i.e. f(x) = max(0, x). accera.MMAFragmentOp.ReLU_NoConditional Rectified linear unit activation function which does not generate divergent code, i.e. f(x) = x * bool(x > 0). accera.MMAFragmentOp.CLEAR Sets the data to constant 0, i.e. f(x) = 0."},{"location":"Reference/enumerations/MMASchedulingPolicy/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/enumerations/MMASchedulingPolicy/#accerammaschedulingpolicy","title":"accera.MMASchedulingPolicy","text":"type description accera.MMASchedulingPolicy.PASS_ORDER Process pass groups (fused passed) sequentially, within each pass group compute all the MFMA blocks. This allocates Accmulator registers required for all the blocks, however it only allocates input (A, B) registers which are only required for the current pass group. accera.MMASchedulingPolicy.BLOCK_ORDER Process MFMA blocks sequentially, for each block iterate over all the passes. This allocates Accumulator registers required for only 1 block and input (A, B) registers required for the entire pass group currently being processed. In this mode, input data for the same pass group is loaded into registers multiple times, once per block."},{"location":"Reference/enumerations/MMAShape/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/enumerations/MMAShape/#accerammashape","title":"accera.MMAShape","text":"

The following table shows the matrix multiplication parameters associated with the different enum values, for different data types for a single pass. So for example a single pass of the M32xN32xK2_B1 operation would take input matrices of dimensions [32x2] (A) and [2x32] (B) to produce a matrix multiplication result of dimensions [32x32] (C). These operations can then be composed together to perform matrix multiplication of larger matrices.

More information about the corresponding Matrix Arithmetic Instructions (MAI) can be found here.

Supported MMA shapes and their compatible types for AMD targets accera.MMAShape MFMA Instruction M, N, K Input Type (ScalarType) Output Type (ScalarType) Compute Type (C++) M64xN64xK1_B4 V_MFMA_F32_16x16x1F32 64, 64, 1 float32 float32 float M64xN64xK1_B2 V_MFMA_F32_32x32x1F32 M32xN32xK2_B1 V_MFMA_F32_32x32x2F32 32, 32, 2 M16xN16xK4_B1 V_MFMA_F32_16x16x4F32 16, 16, 4 M64xN64xK2_B4 V_MFMA_F32_16X16X2BF16 64, 64, 2 bfloat16 bfloat16/float32 M64xN64xK2_B2 V_MFMA_F32_32X32X2BF16 bfloat16/float32 M32xN32xK4_B1 V_MFMA_F32_32X32X4BF16 32, 32, 4 bfloat16/float32 M16xN16xK8_B1 V_MFMA_F32_16X16X8BF16 16, 16, 8 bfloat16/float32 M64xN64xK4_B4 V_MFMA_F32_16x16x4F16 64, 64, 4 float16 float16/32 V_MFMA_I32_16X16X4I8 int8 int8/16/32 int M64xN64xK4_B2 V_MFMA_F32_32x32x4F16 float16 float16/32 float V_MFMA_I32_32X32X4I8 int8 int8/16/32 int M32xN32xK8_B1 V_MFMA_F32_32x32x8F16 32, 32, 8 float16 float16/32 float V_MFMA_I32_32X32X8I8 int8 int8/16/32 int M16xN16xK16_B1 V_MFMA_F32_16x16x16F16 16, 16, 16 float16 float16/32 float V_MFMA_I32_16X16X16I8 int8 int8/16/32 int Supported MMA shapes and their compatible types for Nvidia targets accera.MMAShape M, N, K Input Type (ScalarType) Output Type (ScalarType) Compute Type (C++) M16xN16xK8_B1 16, 16, 8 float32 float32 tf32* M16xN16xK16_B1 16, 16, 16 float16 float16/32 float bfloat16 float32 u/int8 int32 int M32xN8xK16_B1 32, 8, 16 float16 float16/32 float bfloat16 float32 u/int8 int32 int M8xN32xK16_B1 8, 32, 16 float16 float16/32 float bfloat16 float32 u/int8 int32 int

*TensorFloat-32 is a floating-point type introduced in the Nvidia Ampere architecture for accelerating FP32 performance. Information about this can be found here and in more detail in the architecture whitepaper. In this mode, multiplication is performed in TF32 precision and accumulation happens in FP32 precision.
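
To make the pass-composition idea above concrete, here is a plain NumPy sketch (illustration only, not Accera code) that accumulates a K=64 product from K=2 slices, the slice shape a single M32xN32xK2_B1 pass consumes:

import numpy as np\n\n# Illustration: a (32 x 64) @ (64 x 32) product computed as 32 passes, each consuming K=2\nM, N, K, K_PER_PASS = 32, 32, 64, 2\nA = np.random.rand(M, K).astype(np.float32)\nB = np.random.rand(K, N).astype(np.float32)\n\nC = np.zeros((M, N), dtype=np.float32)\nfor k in range(0, K, K_PER_PASS):  # each iteration corresponds to one MMA pass\n    C += A[:, k:k + K_PER_PASS] @ B[k:k + K_PER_PASS, :]\n\nassert np.allclose(C, A @ B, atol=1e-3)\n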

"},{"location":"Reference/enumerations/Role/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/enumerations/Role/#accerarole","title":"accera.Role","text":"type description accera.Role.CONST A constant array (immutable internally scoped) whose contents are known at compile-time. accera.Role.INPUT An input array (immutable external-scope). accera.Role.INPUT_OUTPUT An input/output array (mutable external-scope). accera.Role.OUTPUT An output array (mutable external-scope) which is allocated at runtime. accera.Role.TEMP A temporary array (mutable internal-scope)."},{"location":"Reference/enumerations/ScalarType/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/enumerations/ScalarType/#accerascalartype","title":"accera.ScalarType","text":"type description accera.ScalarType.bool boolean accera.ScalarType.float16 16-bit floating point number accera.ScalarType.float32 32-bit floating point number accera.ScalarType.float64 64-bit floating point number accera.ScalarType.bfloat16 16-bit Brain floating point number accera.ScalarType.int8 8-bit signed integer accera.ScalarType.int16 16-bit signed integer accera.ScalarType.int32 32-bit signed integer accera.ScalarType.int64 64-bit signed integer accera.ScalarType.uint8 8-bit unsigned integer accera.ScalarType.uint16 16-bit unsigned integer accera.ScalarType.uint32 32-bit unsigned integer accera.ScalarType.uint64 64-bit unsigned integer"},{"location":"Reference/functions/cast/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/functions/cast/#acceracastvalue-type","title":"accera.cast(value, type)","text":"

The cast operation converts a value from one acc.ScalarType to another.

Accera performs implicit casting between most types. Therefore, this operation should only be used to override the implicit casting behavior documented in Section 2.

Limitation: casting constants may result in truncation.

"},{"location":"Reference/functions/cast/#arguments","title":"Arguments","text":"argument description type/default value The value to cast type The destination type acc.ScalarType"},{"location":"Reference/functions/cast/#returns","title":"Returns","text":"

The result after casting

"},{"location":"Reference/functions/cast/#examples","title":"Examples","text":"

Casting from float32 to int16:

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(10, 20))\nB = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.int16, shape=(10, 20))\n\nnest = acc.Nest(10, 20)\ni, j = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    B[i, j] = acc.cast(A[i, j], acc.ScalarType.int16) # explicit cast to int16\n    ...\n

In comparison, casting from int16 to float32 is implicit, which means the cast operation can be omitted:

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.int16, shape=(10, 20))\nB = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(10, 20))\n\nnest = acc.Nest(10, 20)\ni, j = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    B[i, j] = A[i, j] # implicit cast to float32\n    ...\n

Casting a constant to int8:

A = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.int8, shape=(10, 20))\n\nnest = acc.Nest(10, 20)\ni, j = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    A[i, j] = acc.cast(10, acc.ScalarType.int8)\n    ...\n
"},{"location":"Reference/functions/create_dimensions/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/functions/create_dimensions/#acceracreate_dimensionsrole","title":"accera.create_dimensions([role])","text":"

Creates placeholder dimensions of the specified role. These typically represent runtime Array and Nest dimensions.

There are two roles for runtime dimensions:

  • accera.Role.INPUT - immutable dimension that is provided by an input parameter to an Accera function
  • accera.Role.OUTPUT - mutable dimension that is set within an Accera function

A third type of dimension, the compile-time dimension, is not covered here because it is just a constant.

"},{"location":"Reference/functions/create_dimensions/#arguments","title":"Arguments","text":"argument description type/default role The role of the dimension determines if it is mutable or immutable. accera.Role. default: accera.Role.INPUT. Must be set to accera.Role.OUTPUT if intended for an accera.Role.OUTPUT Array."},{"location":"Reference/functions/create_dimensions/#returns","title":"Returns","text":"

Tuple of Dimension

"},{"location":"Reference/functions/create_dimensions/#examples","title":"Examples","text":"

Construct an input array with runtime input dimensions:

import accera as acc\nM, K = acc.create_dimensions()\nA = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))\n

Construct an input/output array using a combination of runtime and compile-time dimensions:

M = acc.create_dimensions()\nA = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, 20))\n

Adding a function for an input/output array with runtime input dimensions:

M, N = acc.create_dimensions()\nA = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n\nnest = acc.Nest(M, N)\n...\n\npackage = acc.Package()\npackage.add(nest, args=(A, M, N), base_name=\"myFunc\")\n

Construct an output array with runtime (mutable) output dimensions:

M, N = acc.create_dimensions(role=acc.Role.OUTPUT)\nA = acc.Array(role=acc.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n

Assign the value of a runtime input dimension to a runtime output dimension:

M = acc.create_dimensions()\nN = acc.create_dimensions(role=acc.Role.OUTPUT)\n\nN.value = M\n

Assign an integer value to a runtime output dimension:

N = acc.create_dimensions(role=acc.Role.OUTPUT)\nN.value = 100\n

Assign a value to a runtime output dimension using an expression of runtime input dimensions:

M, K = acc.create_dimensions()\nN = acc.create_dimensions(role=acc.Role.OUTPUT)\n\nN.value = M + K + 1\n

"},{"location":"Reference/functions/create_parameter_grid/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/functions/create_parameter_grid/#acceracreate_parameter_gridparameter_choices-filter_func-sample-seed","title":"accera.create_parameter_grid(parameter_choices, [filter_func, sample, seed])","text":"

Create a parameter grid from a dictionary that maps each parameter to its possible values.

"},{"location":"Reference/functions/create_parameter_grid/#arguments","title":"Arguments","text":"argument description type/default parameter_choices A dictionary that maps each parameter to its possible values dictionary filter_func A callable to filter parameter_choices which returns a bool to indicate whether a given parameter combination should be included in the grid Callable sample A number to limit the size of the parameter grid. The grid is randomly sampled. integer seed The seed value for random sampling. integer"},{"location":"Reference/functions/create_parameter_grid/#returns","title":"Returns","text":"

List of dictionaries

"},{"location":"Reference/functions/create_parameter_grid/#examples","title":"Examples","text":"

Create a parameter grid from a dictionary that maps each parameter to its possible values:

parameters = acc.create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]})\npackage.add(nest, args=(A, B, C), base_name=\"matmul\", parameters=parameters)\n

Define a lambda or function to filter out combinations from the parameter grid. The arguments to the filter are the values of a parameter combination. The filter function should return True if the combination should be included, and False otherwise:

parameters = acc.create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, filter_func=lambda p0, p1, p2, p3: p2 < p1 and 4 * (p0 * p3 + p1 * p2 + p1 * p3 + p2 * p3) / 1024 < 256)\n

Parameter grids can result in a large number of possible combinations. We can limit the number of combinations by random sampling:

parameters = acc.create_parameter_grid(parameter_choices={P0:[8,16], P1:[16,32], P2:[16], P3:[1.0,2.0]}, sample=5)\n
"},{"location":"Reference/functions/create_parameters/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/functions/create_parameters/#acceracreate_parameters","title":"accera.create_parameters()","text":"

Creates placeholder parameters.

"},{"location":"Reference/functions/create_parameters/#returns","title":"Returns","text":"

Tuple of Parameter

"},{"location":"Reference/functions/create_parameters/#examples","title":"Examples","text":"

Create 3 parameters m, n, k. Use them to parameterize the nest shape:

m, n, k = acc.create_parameters()\nnest = acc.Nest(shape=(m, n, k))\n
"},{"location":"Reference/functions/fuse/","title":"Accera v1.2 Reference","text":""},{"location":"Reference/functions/fuse/#accerafuseschedules-args-partial","title":"accera.fuse(schedules[, *args, partial])","text":"

The fuse operation combines multiple iteration spaces into a single \"fused\" iteration space. The fused iteration space represents the union of the work in the original spaces.

In cases where it doesn't make sense to fuse all of the iteration space dimensions, we can choose to fuse a prefix of the dimensions and leave the rest unfused.

"},{"location":"Reference/functions/fuse/#arguments","title":"Arguments","text":"argument description type/default schedules If performing partial fusing, this is a tuple of the schedules to fuse. If performing full fusing, this contains the first schedule to fuse, while args will contain the subsequent schedules. *args Optional variable arguments containing subsequent schedules to fuse variable Schedule arguments partial The number of dimensions to fuse. If not specified, all dimensions will be fused non-negative integer"},{"location":"Reference/functions/fuse/#returns","title":"Returns","text":"

Instance of FusedSchedule

"},{"location":"Reference/functions/fuse/#examples","title":"Examples","text":"

Full fusing of same-shaped iteration spaces:

# Fuse all dimensions of schedule0 and schedule1\nschedule = acc.fuse(schedule0, schedule1)\n

Partial iteration space fusing:

# Fuse the first two dimensions of schedule0 and schedule1\nschedule = acc.fuse((schedule0, schedule1), partial=2)\n
"},{"location":"Tutorials/","title":"Accera Tutorials","text":"Tutorial Description Hello Matrix Multiplication Start here if you are completely new to Accera and would like to learn more about the workflow Optimized Matrix Multiplication Once you understand the basics, we'll look at how to optimize matrix multiplication for a specific hardware target Hello Matrix Multiplication on GPU We'll look at how to apply the basic concepts for GPU targets Cross Compilation for Raspberry Pi 3 After you know how to generate code for the host target, we'll look at how to generate code for other targets"},{"location":"Tutorials/Hello_MatMul/","title":"Hello MatMul","text":""},{"location":"Tutorials/Hello_MatMul/#hello-matmul","title":"Hello MatMul","text":"

By the end of this tutorial, you will learn how to:

  • Implement a simple Matrix Multiplication (MatMul) function using Accera's Domain Specific Language (DSL)
  • Produce a HAT package containing the MatMul function
  • Call the function from C or C++ code
"},{"location":"Tutorials/Hello_MatMul/#prerequisites","title":"Prerequisites","text":"
  • This tutorial assumes you already have Accera installed. If not, you can find the instructions here
  • You should also be familiar with writing Python and C++
"},{"location":"Tutorials/Hello_MatMul/#a-naive-matmul-algorithm","title":"A naive MatMul algorithm","text":"

Let's consider the example of multiplying matrices A and B, and adding the result into matrix C. In NumPy syntax, this can be expressed as:

C += A @ B\n

A naive algorithm for matrix multiplication typically contains 3 nested for loops. Expressed in Python, this could look like:

# A.shape = (M, K), B.shape = (K, N), C.shape = (M, N)\n\nfor i in range(M):\n    for j in range(N):\n        for k in range(K):\n            C[i, j] += A[i, k] * B[k, j]\n
"},{"location":"Tutorials/Hello_MatMul/#accera-python-dsl","title":"Accera Python DSL","text":"

We will now walk through a naive Matrix Multiplication (MatMul) using Accera.

Create an empty file called hello_matmul_generator.py. First we'll import Accera's module.

import accera as acc\n

Define some matrix sizes. A will be M by K, B will be K by N, and C will be M by N.

# Define our matrix sizes\nM = 128\nN = 256\nK = 256\n

Declare arrays A, B, and C. These are our input and input/output matrices.

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))\nB = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n

Here, we will use the Nest class to define our 3-layered nested for loop. The range indices are M, N, and K, with the outermost loop (M) listed first. We can get the loop nest indices in order to perform the computation.

# Define the loop nest\nnest = acc.Nest(shape=(M, N, K))\n\n# Get the loop nest indices\ni, j, k = nest.get_indices()\n

Next we define the logic of each iteration of the loop nest:

# Define the loop nest logic\n@nest.iteration_logic\ndef _():\n    C[i, j] += A[i, k] * B[k, j]\n

We have finished defining the logic of MatMul. Now let's define the schedule, which controls how the logic is executed. To do this, we first create the schedule from the nest:

sched = nest.create_schedule()\n

At this point, sched represents the default schedule for our algorithm. We can also perform some basic transformations on this schedule. For example, the following line of code splits the k index into blocks of 4 (so k, k+4, k+8, and so on).

# Split the k loop into blocks of 4, effectively doing this\n# (assuming K is divisible by 4):\n#\n# for i in range(M):\n#    for j in range(N):\n#        # Split k into two loops\n#        for k in range(0, K, 4):\n#            for kk in range(4):\n#                C[i, j] += A[i, k + kk] * B[k + kk, j]\n#\n# If k is not divisible by 4, Accera will take care of the boundary\n# case for you.\nkk = sched.split(k, 4)\n

The split indices are now k and kk.

The next step is to create a plan from the schedule. For instance, we can use this plan to unroll the innermost loop.

plan = sched.create_plan()\n# Unroll kk, effectively doing this\n# (assuming K is divisible by 4):\n#\n# for i in range(M):\n#    for j in range(N):\n#        for k in range(0, K, 4):\n#            # Unrolled kk\n#            C[i, j] += A[i, k + 0] * B[k + 0, j]\n#            C[i, j] += A[i, k + 1] * B[k + 1, j]\n#            C[i, j] += A[i, k + 2] * B[k + 2, j]\n#            C[i, j] += A[i, k + 3] * B[k + 3, j]\n#\n# If k is not divisible by 4, Accera will take care of the boundary\n# case for you.\nplan.unroll(kk)\n

Use the plan to add a callable function named hello_matmul_py to a HAT package.

# Create a package and add a function to the package based on the plan\npackage = acc.Package()\npackage.add(plan, args=(A, B, C), base_name=\"hello_matmul_py\")\n

Finally, we build the HAT package:

# Build the HAT package\npackage.build(name=\"hello_matmul\")\n

By now, you should have all the code necessary to generate your first Accera MatMul function. You can also find the complete Python script here.
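
For convenience, here is a sketch of the snippets above assembled into a single generator script (it simply restates the steps already shown):

import accera as acc\n\n# Define our matrix sizes\nM = 128\nN = 256\nK = 256\n\nA = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))\nB = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n\n# Define the loop nest and its logic\nnest = acc.Nest(shape=(M, N, K))\ni, j, k = nest.get_indices()\n\n@nest.iteration_logic\ndef _():\n    C[i, j] += A[i, k] * B[k, j]\n\n# Create and transform the schedule, then create the plan\nsched = nest.create_schedule()\nkk = sched.split(k, 4)\nplan = sched.create_plan()\nplan.unroll(kk)\n\n# Package and build\npackage = acc.Package()\npackage.add(plan, args=(A, B, C), base_name=\"hello_matmul_py\")\npackage.build(name=\"hello_matmul\")\n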

"},{"location":"Tutorials/Hello_MatMul/#generate-hat-package","title":"Generate HAT package","text":"

Next, we run the generator script to produce a HAT package.

"},{"location":"Tutorials/Hello_MatMul/#windowsmacos","title":"Windows/MacOS","text":"
python hello_matmul_generator.py\n
"},{"location":"Tutorials/Hello_MatMul/#ubuntu","title":"Ubuntu","text":"
python3 hello_matmul_generator.py\n

After this runs, you should see a header file hello_matmul.hat and some object files (such as hello_matmul.obj or hello_matmul.o). The .hat file format is described here. In Accera, we call these files the \"HAT package\".

"},{"location":"Tutorials/Hello_MatMul/#runner-code","title":"Runner code","text":"

We will now walk through how to call our MatMul implementation from the HAT package.

Create a file called hello_matmul_runner.cpp with the code below. You can also find it here.

#include <stdio.h>\n#include <algorithm>\n\n// Include the HAT file that declares our MatMul function\n#include \"hello_matmul.hat\"\n\n#define M 128\n#define N 256\n#define K 256\n\nint main(int argc, const char** argv)\n{\n// Prepare our matrices\nfloat A[M*K];\nfloat B[K*N];\nfloat C[M*N];\n\n// Fill with data\nstd::fill_n(A, M*K, 2.0f);\nstd::fill_n(B, K*N, 3.0f);\nstd::fill_n(C, M*N, 0.42f);\n\nprintf(\"Calling MatMul M=%d, K=%d, N=%d\\n\", M, K, N);\nhello_matmul_py(A, B, C);\n\nprintf(\"Result (first few elements): \");\nfor (int i = 0; i < 10; ++i)\n{\nprintf(\"%f \", C[i]);\n}\nprintf(\"\\n\");\nreturn 0;\n}\n

The code above creates the A, B, and C matrices, and calls the function hello_matmul_py to perform MatMul.

Now that we have written the code, we will compile and link it with the HAT package to create an executable. Save the file to your working directory, in the same location as hello_matmul_generator.py and the generated *.hat and object files.

"},{"location":"Tutorials/Hello_MatMul/#build-and-run","title":"Build and run","text":""},{"location":"Tutorials/Hello_MatMul/#windows","title":"Windows","text":"

We will need the 64-bit Visual C++ tools to link against the generated 64-bit .obj. From an \"x64 Native Tools Command Prompt\":

cl.exe hello_matmul_runner.cpp *.lib\nhello_matmul_runner.exe\n
"},{"location":"Tutorials/Hello_MatMul/#macos","title":"MacOS","text":"
clang hello_matmul_runner.cpp *.a -o hello_matmul_runner\n./hello_matmul_runner\n
"},{"location":"Tutorials/Hello_MatMul/#ubuntu_1","title":"Ubuntu","text":"
gcc hello_matmul_runner.cpp *.a -o hello_matmul_runner\n./hello_matmul_runner\n

The output should look like:

Calling MatMul M=128, K=256, N=256\nResult (first few elements): 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922\n

You can now experiment with the generated MatMul function with your own inputs.

"},{"location":"Tutorials/Hello_MatMul/#optimized-matmul-algorithm","title":"Optimized MatMul algorithm","text":"

The above example illustrates a naive algorithm. To see what a more optimized version could look like, see the Optimized MatMul tutorial.

"},{"location":"Tutorials/Hello_MatMul_GPU/","title":"Hello MatMul GPU","text":""},{"location":"Tutorials/Hello_MatMul_GPU/#hello-matmul-gpu","title":"Hello MatMul GPU","text":"

In this tutorial, you will learn how to implement a simple Matrix Multiplication (MatMul) function for execution on a GPU. We will use the Accera's Domain Specific Language (DSL) to produce a HAT package containing the MatMul function that can be called from the host to launch the MatMul function on the GPU.

"},{"location":"Tutorials/Hello_MatMul_GPU/#prerequisites","title":"Prerequisites","text":"
  • You should have Accera installed. If not, you can find the instructions here.
  • Be familiar with writing Python and C++ code.
  • Be familiar with basic GPU programming and concepts.
  • You have completed the Hello_MatMul tutorial.
  • You have installed the Vulkan SDK and runtime.
"},{"location":"Tutorials/Hello_MatMul_GPU/#review-the-naive-matmul-algorithm","title":"Review: the naive MatMul algorithm","text":"

As in the Hello_MatMul tutorial, we'll consider the example of multiplying matrices A and B and adding the result into matrix C. In NumPy syntax, this can be expressed as:

C += A @ B\n

A naive algorithm for matrix multiplication typically contains 3 nested for-loops. Expressed in Python, this can look like:

for i in range(M):\n    for j in range(N):\n        for k in range(K):\n            C[i, j] += A[i, k] * B[k, j]\n
"},{"location":"Tutorials/Hello_MatMul_GPU/#accera-python-dsl","title":"Accera Python DSL","text":"

We will now walk through a basic Matrix Multiplication (MatMul) using Accera. Additionally, we will direct Accera to execute this MatMul function on the default GPU.

Create an empty file called hello_matmul_gpu_generator.py. Import dependent modules:

import accera as acc\n

Define some matrix sizes, where A's shape is M by K, B's is K by N, and C's, M by N.

# Define our matrix sizes\nM = 1024\nN = 512\nK = 256\n

Declare arrays A, B, and C. These are our input and input/output matrices and hold 32-bit floating-point elements.

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))\nB = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n

Use the Nest class to define our 3-layered nested for-loop and get the indices:

# Define the loop nest\nnest = acc.Nest(shape=(M, N, K))\n\n# Get the loop nest indices\ni, j, k = nest.get_indices()\n

Next, we define the logic for every iteration of the loop nest:

# Define the loop nest logic\n@nest.iteration_logic\ndef _():\n    C[i, j] += A[i, k] * B[k, j]\n

We have finished defining the logic of MatMul. Notice how, up to this point, it is identical to what we did for the CPU example. Let's now define the schedule to control the execution logic. To do this, we first create the schedule from the nest:

schedule = nest.create_schedule()\n

We will transform the iteration space and change the plan according to some predefined constants to execute this efficiently on our chosen hardware target. The values of these constants can come either from hardware target characteristics and the shapes of the arrays or can be found through auto-tuning. These will be explained in detail in a subsequent tutorial. For now, define:

block_x = 16\nblock_y = 16\n

Transform the iteration space to specify the thread block behavior. See the GPU blocks section to learn more about optimizing block sizes on GPU:

ii = schedule.split(i, block_x)\njj = schedule.split(j, block_y)\n

Set the order to traverse the iteration space. Note that the precise order of execution on GPU targets will be unknown due to the parallel nature of the hardware. Nevertheless, setting the order here is important since the coarse grain parallelization (e.g., grid) should precede the more fine-grained (e.g., warps/wavefronts):

schedule.reorder(i, j, ii, jj, k)\n

Create a plan from the schedule. The plan allows us to control specific execution behavior on the hardware target, such as grid launch dimensions and thread block sizes, which are essential for high performance:

target = acc.Target(category=acc.Target.Category.GPU, runtime=acc.Target.Runtime.VULKAN)\nplan = schedule.create_plan(target)\n

Bind dimensions of the schedule to execution units on the GPU. Use the outer dimensions i, j to be the block indices x,y in the grid, and the ii and jj dimensions to be the thread indices x,y in the block:

plan.bind({\n    i: target.GridUnit.BLOCK_X,\n    j: target.GridUnit.BLOCK_Y,\n    ii: target.GridUnit.THREAD_X,\n    jj: target.GridUnit.THREAD_Y\n})\n

Use the plan to add a callable function named hello_matmul_gpu to a HAT package.

# Create a package and add a function to the package based on the plan\npackage = acc.Package()\npackage.add(plan, args=(A, B, C), base_name=\"hello_matmul_gpu\")\n

Finally, we build the HAT package:

# Build a statically-linked HAT package to be consumed by the C++ runner\npackage.build(name=\"hello_matmul_gpu\", format=acc.Package.Format.HAT_STATIC)\n

By now, you have all the code necessary to generate an Accera MatMul function that runs on the GPU. You can find the complete Python script here.

"},{"location":"Tutorials/Hello_MatMul_GPU/#generate-hat-package","title":"Generate HAT package","text":"

Next, we run the generator script to produce a HAT package.

"},{"location":"Tutorials/Hello_MatMul_GPU/#windowsmacos","title":"Windows/MacOS","text":"
python hello_matmul_gpu_generator.py\n
"},{"location":"Tutorials/Hello_MatMul_GPU/#ubuntu","title":"Ubuntu","text":"
python3 hello_matmul_gpu_generator.py\n

After this script runs, you should see a header file hello_matmul_gpu.hat and some object files (such as hello_matmul_gpu.obj or hello_matmul_gpu.o). The build process also generates a supporting module, AcceraGPUUtilities.hat and its object file for GPU initialization and uninitialization. In Accera, we call these files the \"HAT package\".

"},{"location":"Tutorials/Hello_MatMul_GPU/#runner-code","title":"Runner code","text":"

Let's see how we can call our MatMul implementation from the HAT package.

Create a file called hello_matmul_gpu_runner.cpp containing the code below. You can find it here.

#include <stdio.h>\n#include <algorithm>\n\n// Include the HAT file that declares GPU initialization/uninitialization functions\n#include \"AcceraGPUUtilities.hat\"\n\n// Include the HAT file that declares our MatMul function\n#include \"hello_matmul_gpu.hat\"\n\n#define M 1024\n#define N 512\n#define K 256\n\nint main(int argc, const char** argv)\n{\n// Prepare our matrices (using the heap for large matrices)\nfloat* A = new float[M*K];\nfloat* B = new float[K*N];\nfloat* C = new float[M*N];\n\n// Fill with data\nstd::fill_n(A, M*K, 2.0f);\nstd::fill_n(B, K*N, 3.0f);\nstd::fill_n(C, M*N, 0.42f);\n\n// Initialize the GPU\nAcceraGPUInitialize();\n\nprintf(\"Calling MatMul M=%d, K=%d, N=%d\\n\", M, K, N);\nhello_matmul_gpu(A, B, C);\n\nprintf(\"Result (first 10 elements): \");\nfor (int i = 0; i < 10; ++i)\n{\nprintf(\"%f \", C[i]);\n}\nprintf(\"\\n\");\n\n// Uninitialize the GPU\nAcceraGPUDeInitialize();\n\ndelete[] A;\ndelete[] B;\ndelete[] C;\nreturn 0;\n}\n

The above code creates the A, B, and C matrices and calls the function hello_matmul_gpu to perform MatMul.

Now that we have the code, compile and link it with the HAT package to create an executable. Save the file to your working directory, in the same location as hello_matmul_gpu_generator.py and the generated *.hat and object files.

"},{"location":"Tutorials/Hello_MatMul_GPU/#build-and-run","title":"Build and run","text":"

Accera includes a shared library that wraps the Vulkan APIs (acc-vulkan-runtime-wrappers.so, acc-vulkan-runtime-wrappers.dll, or acc-vulkan-runtime-wrappers.dylib). We need to provide the path to this shared library when building and running the executable.

Find the installed path to the \"accera\" package:

"},{"location":"Tutorials/Hello_MatMul_GPU/#windowsmacos_1","title":"Windows/MacOS","text":"
pip show accera\n
"},{"location":"Tutorials/Hello_MatMul_GPU/#ubuntu_1","title":"Ubuntu","text":"
pip3 show accera\n

From the output above, find the Location entry, for example:

Location: /usr/local/lib/python3.8/dist-packages\n

We will use this path below.

"},{"location":"Tutorials/Hello_MatMul_GPU/#windows","title":"Windows","text":"

We will need the 64-bit Visual C++ tools to link against the generated 64-bit .obj. From an \"x64 Native Tools Command Prompt\":

Set the ACCERA_PATH environment variable to the full install path of the \"accera\" package (derived from pip show accera to locate acc-vulkan-runtime-wrappers.dll):

set ACCERA_PATH=<Location_path>\\accera\n

Set the PATH environment variable to allow the runner to locate acc-vulkan-runtime-wrappers.dll:

set PATH=%PATH%;%ACCERA_PATH%\n

Now build and run:

cl.exe hello_matmul_gpu_runner.cpp *.lib %ACCERA_PATH%/*.lib\nhello_matmul_gpu_runner.exe\n
"},{"location":"Tutorials/Hello_MatMul_GPU/#macos","title":"MacOS","text":"

Set the ACCERA_PATH environment variable to the full install path of the \"accera\" package (derived from pip show accera to locate acc-vulkan-runtime-wrappers.dylib):

export ACCERA_PATH=<Location_path>/accera\n

Now build and run:

clang++ hello_matmul_gpu_runner.cpp *.a $ACCERA_PATH/*.dylib -o hello_matmul_gpu_runner\nDYLD_LIBRARY_PATH=$ACCERA_PATH ./hello_matmul_gpu_runner\n
"},{"location":"Tutorials/Hello_MatMul_GPU/#ubuntu_2","title":"Ubuntu","text":"

Set the ACCERA_PATH environment variable to the full install path of the \"accera\" package (derived from pip3 show accera to locate acc-vulkan-runtime-wrappers.so):

export ACCERA_PATH=<Location_path>/accera\n

Now build and run:

g++ hello_matmul_gpu_runner.cpp *.a $ACCERA_PATH/*.so -o hello_matmul_gpu_runner\nLD_LIBRARY_PATH=$ACCERA_PATH ./hello_matmul_gpu_runner\n

The output should look like this:

Calling MatMul M=1024, K=256, N=512\nResult (first 10 elements): 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922\n

You can now experiment with the generated MatMul function with your own inputs.

"},{"location":"Tutorials/Optimized_MatMul/","title":"Optimized MatMul","text":""},{"location":"Tutorials/Optimized_MatMul/#optimized-matmul","title":"Optimized MatMul","text":"

Optimizing MatMul is highly dependent on the target platform. The code in the example below is optimized specifically for an Intel Xeon E5-2673 v3 CPU. However, it should work equally well on CPUs with similar hardware characteristics, such as an AMD Epyc 7551.

By the end of this tutorial, you will learn how to:

  • Implement a performant Matrix Multiplication (MatMul) function targeting AVX2 FMA3 CPUs like Intel Haswell or the AMD Epyc families.
  • Produce a HAT package containing the optimized MatMul function.
  • Call the function from C or C++ code.

"},{"location":"Tutorials/Optimized_MatMul/#prerequisites","title":"Prerequisites","text":"
  • You should have Accera installed. If not, you can find the instructions here.
  • You are familiar with writing Python and C++.
  • You know about SIMD instructions and registers.
  • You have completed the Hello_MatMul tutorial.
"},{"location":"Tutorials/Optimized_MatMul/#review-the-naive-matmul-algorithm","title":"Review: the naive MatMul algorithm","text":"

As in the Hello_MatMul tutorial, we'll consider the example of multiplying matrices A and B and adding the result into matrix C. In NumPy syntax, this can be expressed as:

C += A @ B\n

A naive algorithm for matrix multiplication typically contains 3 nested for-loops. Expressed in Python, this will look like:

for i in range(M):\n    for j in range(N):\n        for k in range(K):\n            C[i, j] += A[i, k] * B[k, j]\n
"},{"location":"Tutorials/Optimized_MatMul/#accera-python-dsl","title":"Accera Python DSL","text":"

We will walk through how to specify an optimized Matrix Multiplication (MatMul) using Accera. This tutorial assumes the following:

  • Specific matrix sizes. Inputs A and B are 784 x 128 and 128 x 512 matrices, respectively. The output C is a 784 x 512 matrix. These can represent a mid-level layer in a Resnet-50 model. The A matrix contains the activation values from the previous layer, and the B matrix contains the weights of the neural network layer.
  • Row-major layout of the matrix elements.
  • The target hardware is capable of AVX2 FMA3 instructions, such as the Intel Xeon E5-2673 v3 or the AMD Epyc 7551.

Create an empty file called optimized_matmul_generator.py. Import dependent modules:

import accera as acc\n

Define some matrix sizes, where A's shape is M by K, B's is K by N, and C's, M by N.

# Define our matrix sizes\nM = 784\nN = 512\nK = 128\n

Declare arrays A, B, and C. These are our input and input/output matrices and hold 32-bit floating-point elements.

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))\nB = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n

Use the Nest class to define a 3-layered nested for-loop and get the indices:

# Define the loop nest\nnest = acc.Nest(shape=(M, N, K))\n\n# Get the loop nest indices\ni, j, k = nest.get_indices()\n

Next, we define the logic for every iteration of the loop nest:

# Define the loop nest logic\n@nest.iteration_logic\ndef _():\n    C[i, j] += A[i, k] * B[k, j]\n

We have finished defining the logic of MatMul. Let's now define the schedule which controls execution logic. To do this, we first create the schedule from the nest:

schedule = nest.create_schedule()\n

We will transform the iteration space and change the plan according to some predefined constants to execute this efficiently on our chosen hardware target. Values of these constants come either from hardware target characteristics and the shapes of the arrays or can be found through auto-tuning. These will be explained in detail in a subsequent tutorial. For now, define:

tile_size_i = 6\ntile_size_j = 256\ntile_size_k = 128\ninner_dim_unroll = 4\nnum_rows_in_kernel = 6\n

We create a CPU target so that we can use its hardware characteristics, such as the SIMD vector sizes and the number of vector execution units, in the transformations that follow.

target = acc.Target(category=acc.Target.Category.CPU)\n

Transform the iteration space to specify the tiling behavior:

ii = schedule.split(i, tile_size_i)\njj = schedule.split(j, tile_size_j)\nkk = schedule.split(k, tile_size_k)\n

Next, let's split the iteration space to match the kernel characteristics:

kkk = schedule.split(kk, inner_dim_unroll)\niii = schedule.split(ii, num_rows_in_kernel)\njjj = schedule.split(jj, (target.vector_bytes // 4) * 2) # There are 2 vfma execution units, each holding (target.vector_bytes // 4) 32-bit float elements\njjjj = schedule.split(jjj, target.vector_bytes // 4) # Each SIMD register holds (target.vector_bytes // 4) 32-bit float elements\n

Accera will handle any boundary conditions arising from these splits and apply appropriate optimizations, such as loop un-switching, to ensure that efficient code is generated in those cases.
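
To get a concrete feel for these split sizes: on an AVX2 target the SIMD registers are 256 bits wide, so target.vector_bytes is expected to be 32 and the splits above work out as follows (a sketch under that assumption):

# Assuming an AVX2 target where vector_bytes == 32 (256-bit YMM registers)\nvector_bytes = 32\nfloats_per_register = vector_bytes // 4   # 8 float32 elements per SIMD register\njjjj_split = floats_per_register          # innermost split: one full register -> 8\njjj_split = 2 * floats_per_register       # two vfma execution units -> 16\nprint(jjjj_split, jjj_split)              # 8 16\n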

Set the order to traverse the iteration space. We start with the outer indices that control the tiling, then move to the innermost indices that are used in the kernel:

schedule.reorder(j, k, i, jj, kk, ii, kkk, iii, jjj, jjjj)\n

Create a plan from the schedule and the current target. The plan allows us to control specific execution behavior on the hardware target, such as vectorization and caching, which are essential for high performance:

plan = schedule.create_plan(target)\n

Add caching. We use an input cache for array B because its memory footprint exceeds our caching threshold. The B cache will be packed according to the access pattern specified by the schedule. We use an input/output cache for array C. See caching for more information:

# Cache the B array by prefetching and packing the memory footprint along slices of the jj dimension.\nplan.cache(B, jj)\n# Cache the C array along slices of jj dimension. Since the C array is the output, its footprint is\n# the size of the kernel. If the kernel is small enough, Accera will use registers for this\n# accumulation before writing these values back to C.\nplan.cache(C, jj)\n

Kernelize the inner dimensions, which applies unroll and vectorize transformations allowing use of SIMD registers:

plan.kernelize(unroll_indices=[jjj, iii, kkk], vectorize_indices=jjjj)\n

Use the plan to add a function named optimized_matmul_py to a HAT package.

# Create a package and add a function to the package based on the plan\npackage = acc.Package()\npackage.add(plan, args=(A, B, C), base_name=\"optimized_matmul_py\")\n

Finally, we build the HAT package:

# Build a statically-linked HAT package to be consumed by the C++ runner\npackage.build(name=\"optimized_matmul\", format=acc.Package.Format.HAT_STATIC)\n

By now, you should have all the code necessary to generate an optimized Accera MatMul function. You can find the complete Python script here.

"},{"location":"Tutorials/Optimized_MatMul/#generate-hat-package","title":"Generate HAT package","text":"

Next, we run the generator script to produce a HAT package.

"},{"location":"Tutorials/Optimized_MatMul/#windowsmacos","title":"Windows/MacOS","text":"
python optimized_matmul_generator.py\n
"},{"location":"Tutorials/Optimized_MatMul/#ubuntu","title":"Ubuntu","text":"
python3 optimized_matmul_generator.py\n

The generator script produces a HAT file (optimized_matmul.hat). Examine this file, and you will see that it contains the exported function with the following meta-data:

[functions.optimized_matmul_py_4a6286d9]\nname = 'optimized_matmul_py_4a6286d9'\ndescription = ''\ncalling_convention = \"cdecl\"\narguments = [\n{name = '', description = '', logical_type = \"affine_array\", declared_type = 'float*', element_type = 'float', usage = \"input_output\", shape = [ 784, 128 ], affine_map = [ 128, 1 ], affine_offset = 0},\n{name = '', description = '', logical_type = \"affine_array\", declared_type = 'float*', element_type = 'float', usage = \"input_output\", shape = [ 128, 512 ], affine_map = [ 512, 1 ], affine_offset = 0},\n{name = '', description = '', logical_type = \"affine_array\", declared_type = 'float*', element_type = 'float', usage = \"input_output\", shape = [ 784, 512 ], affine_map = [ 512, 1 ], affine_offset = 0}\n]\nreturn = {name = '', description = '', logical_type = \"void\", declared_type = 'void', element_type = 'void', usage = \"output\"}\n

The C declaration from the header is:

void optimized_matmul_py_4a6286d9(float*, float*, float*);\n

Accera automatically appends a unique identifier to the function implementation, such as optimized_matmul_py_4a6286d9 to support auto-tuning. This name is re-generated every time the HAT package is rebuilt. To make it easier for client code to use the function, Accera also provides a fixed-name alias, optimized_matmul_py, for the same function.

To see how Accera generates code for the iteration space transformations and the plan, you can change the format from HAT to MLIR in the package.build call, which will output MLIR for each lowering phase. Stepping through the progression of lowerings, you can see how Accera moves from a simple representation of the Accera DSL to the final optimized assembly.

Compare this to the previous tutorial, whose naive DSL is given here and whose final assembly can be viewed here.

"},{"location":"Tutorials/Optimized_MatMul/#runner-code","title":"Runner code","text":"

Let's see how to call our MatMul implementation from the HAT package.

Create a file called optimized_matmul_runner.cpp with the code below. You can also find it here.

#include <stdio.h>\n#include <algorithm>\n\n// Include the HAT file that declares our MatMul function\n#include \"optimized_matmul.hat\"\n\n#define M 784\n#define N 512\n#define K 128\n\nint main(int argc, const char** argv)\n{\n// Prepare our matrices (using the heap for large matrices)\nfloat* A = new float[M*K];\nfloat* B = new float[K*N];\nfloat* C = new float[M*N];\n\n// Fill with data\nstd::fill_n(A, M*K, 2.0f);\nstd::fill_n(B, K*N, 3.0f);\nstd::fill_n(C, M*N, 0.42f);\n\nprintf(\"Calling MatMul M=%d, K=%d, N=%d\\n\", M, K, N);\noptimized_matmul_py(A, B, C);\n\nprintf(\"Result (first 10 elements): \");\nfor (int i = 0; i < 10; ++i)\n{\nprintf(\"%f \", C[i]);\n}\nprintf(\"\\n\");\n\ndelete[] A;\ndelete[] B;\ndelete[] C;\nreturn 0;\n}\n

The above code creates the matrices A, B, and C and calls the function optimized_matmul_py to perform MatMul.

Now that we have the code, let's compile and link it with the HAT package to create an executable. Save the file to your working directory, in the same location as optimized_matmul_generator.py and the generated *.hat and object files.

"},{"location":"Tutorials/Optimized_MatMul/#build-and-run","title":"Build and run","text":""},{"location":"Tutorials/Optimized_MatMul/#windows","title":"Windows","text":"

We will need the 64-bit Visual C++ tools to link against the generated 64-bit .obj. From an \"x64 Native Tools Command Prompt\":

cl.exe optimized_matmul_runner.cpp *.lib\noptimized_matmul_runner.exe\n
"},{"location":"Tutorials/Optimized_MatMul/#macos","title":"MacOS","text":"
clang++ optimized_matmul_runner.cpp *.a -o optimized_matmul_runner\n./optimized_matmul_runner\n
"},{"location":"Tutorials/Optimized_MatMul/#ubuntu_1","title":"Ubuntu","text":"
g++ optimized_matmul_runner.cpp *.a -o optimized_matmul_runner\n./optimized_matmul_runner\n

The output should look like:

Calling MatMul M=784, K=128, N=512\nResult (first 10 elements): 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983 768.419983\n

You can now experiment with the generated MatMul function with your own inputs.

"},{"location":"Tutorials/Pi3_Cross_Compilation/","title":"Cross Compiling for the Raspberry Pi 3","text":"

By the end of this tutorial, you will learn how to:

  • Cross compile a simple Matrix Multiplication (MatMul) function for execution on a Raspberry Pi 3.
  • Produce a HAT package containing the MatMul function that can be called on the Pi 3 target.
  • Call the function on a Raspberry Pi 3 from C/C++ code.
"},{"location":"Tutorials/Pi3_Cross_Compilation/#prerequisites","title":"Prerequisites","text":"
  • You should have Accera installed. If not, you can find the instructions here.
  • Be familiar with writing Python and C++ code.
  • Have access to a Raspberry Pi 3 device.
"},{"location":"Tutorials/Pi3_Cross_Compilation/#a-naive-matmul-algorithm","title":"A naive MatMul algorithm","text":"

Consider the example of multiplying matrices A and B and adding the result into matrix C. In NumPy syntax, this can be expressed as:

C += A @ B\n

A naive algorithm for matrix multiplication typically contains 3 nested for-loops. In Python, this can be expressed as:

# A.shape = (M, K), B.shape = (K, N), C.shape = (M, N)\n\nfor i in range(M):\n    for j in range(N):\n        for k in range(K):\n            C[i, j] += A[i, k] * B[k, j]\n
"},{"location":"Tutorials/Pi3_Cross_Compilation/#accera-python-dsl","title":"Accera Python DSL","text":"

Let's walk through a naive Matrix Multiplication (MatMul) using Accera. Instead of using the default target (the host machine), we specify a target representing a Raspberry Pi 3 so that the host cross-compiles for a different device.

Create an empty file called hello_matmul_pi3_generator.py. First, we import Accera's module:

import accera as acc\n

Define some matrix sizes, where A's shape is M by K, B's is K by N, and C's, M by N.

# Define our matrix sizes\nM = 128\nN = 256\nK = 256\n

Declare arrays A, B, and C. These are our input and input/output matrices.

A = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))\nB = acc.Array(role=acc.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))\nC = acc.Array(role=acc.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))\n

We now use the Nest class to define our 3-layered nested for-loop. The range indices are M, N, and K, with the outermost loop (M) listed first. We can get the loop nest indices to perform the computation.

# Define the loop nest\nnest = acc.Nest(shape=(M, N, K))\n\n# Get the loop nest indices\ni, j, k = nest.get_indices()\n

Next, we define the logic for every iteration of the loop nest:

# Define the loop nest logic\n@nest.iteration_logic\ndef _():\n    C[i, j] += A[i, k] * B[k, j]\n

We have finished defining the logic of MatMul. Let's now define the schedule, which controls how the logic is executed. To do this, we first create the schedule from the nest:

sched = nest.create_schedule()\n

At this point, sched represents the default schedule for our algorithm. We can also perform some basic transformations on this schedule. For example, the following lines of code split the k index into blocks of 4 (k, k+4, k+8, and so on).

# Split the k loop into blocks of 4, effectively doing this\n# (assuming K is divisible by 4):\n#\n# for i in range(M):\n#    for j in range(N):\n#        # Split k into two loops\n#        for k in range(0, K, 4):\n#            for kk in range(4):\n#                C[i, j] += A[i, k + kk] * B[k + kk, j]\n#\n# If k is not divisible by 4, Accera will take care of the boundary\n# case for you.\nkk = sched.split(k, 4)\n

The split indices are now k and kk.

The next step is to create a plan from the schedule. For instance, we can use this plan to unroll the innermost loop.

# Create a plan, specify the target to be a Raspberry Pi 3\npi3 = acc.Target(acc.Target.Model.RASPBERRY_PI_3B)\nplan = sched.create_plan(pi3)\n\n# Unroll kk, effectively doing this\n# (assuming K is divisible by 4):\n#\n# for i in range(M):\n#    for j in range(N):\n#        for k in range(0, K, 4):\n#            # Unrolled kk\n#            C[i, j] += A[i, k + 0] * B[k + 0, j]\n#            C[i, j] += A[i, k + 1] * B[k + 1, j]\n#            C[i, j] += A[i, k + 2] * B[k + 2, j]\n#            C[i, j] += A[i, k + 3] * B[k + 3, j]\n#\n# If k is not divisible by 4, Accera will take care of the boundary\n# case for you.\nplan.unroll(kk)\n

Use the plan to add a callable function named hello_matmul_pi3_py to a HAT package.

# Create a package and add a function to the package based on the plan\npackage = acc.Package()\npackage.add(plan, args=(A, B, C), base_name=\"hello_matmul_pi3_py\")\n

Finally, we build the statically-linked HAT package for the Raspbian platform:

# Build the HAT package\npackage.build(name=\"hello_matmul_pi3\", format=acc.Package.Format.HAT_STATIC, platform=acc.Package.Platform.RASPBIAN)\n
After following the above steps, you should now have all the code necessary to generate your Accera MatMul function that can be called on a Raspberry Pi 3 target. You can find the complete Python script here.

"},{"location":"Tutorials/Pi3_Cross_Compilation/#generate-hat-package","title":"Generate HAT package","text":"

Next, we run the generator script to produce a HAT package for the Raspberry Pi 3 target.

"},{"location":"Tutorials/Pi3_Cross_Compilation/#windowsmacos","title":"Windows/MacOS","text":"
python hello_matmul_pi3_generator.py\n
"},{"location":"Tutorials/Pi3_Cross_Compilation/#ubuntu","title":"Ubuntu","text":"
python3 hello_matmul_pi3_generator.py\n

After we run the script, there should be a header file hello_matmul_pi3.hat and an object file hello_matmul_pi3.o in the ELF format. The .hat file format is described here. Collectively, we call the .hat file and object file a \"HAT package\".

"},{"location":"Tutorials/Pi3_Cross_Compilation/#runner-code","title":"Runner code","text":"

Let's now see how we can call our MatMul implementation from the HAT package on the Raspberry Pi 3.

Create a file called hello_matmul_pi3_runner.cpp with the code below. You can find it here.

#include <stdio.h>\n#include <algorithm>\n\n// Include the HAT file that declares our MatMul function\n#include \"hello_matmul_pi3.hat\"\n\n#define M 128\n#define N 256\n#define K 256\n\nint main(int argc, const char** argv)\n{\n// Prepare our matrices\nfloat A[M*K];\nfloat B[K*N];\nfloat C[M*N];\n\n// Fill with data\nstd::fill_n(A, M*K, 2.0f);\nstd::fill_n(B, K*N, 3.0f);\nstd::fill_n(C, M*N, 0.42f);\n\nprintf(\"Calling MatMul M=%d, K=%d, N=%d\\n\", M, K, N);\nhello_matmul_pi3_py(A, B, C);\n\nprintf(\"Result (first few elements): \");\nfor (int i = 0; i < 10; ++i)\n{\nprintf(\"%f \", C[i]);\n}\nprintf(\"\\n\");\nreturn 0;\n}\n

The above code creates the A, B, and C matrices and calls the function hello_matmul_pi3_py to perform MatMul.

Now that we have written the code, we compile and link it with the HAT package to create an executable file. Save this file to your working directory, in the same location as hello_matmul_pi3_generator.py and the generated *.hat and *.o files.

"},{"location":"Tutorials/Pi3_Cross_Compilation/#build-and-run","title":"Build and run","text":""},{"location":"Tutorials/Pi3_Cross_Compilation/#on-the-raspberry-pi-3-device","title":"On the Raspberry Pi 3 device","text":"

For this step, you'll be working with your Raspberry Pi device. If your Pi device is accessible over the network, copy hello_matmul_pi3_runner.cpp, hello_matmul_pi3.hat, and hello_matmul_pi3.o using the Unix scp tool or the Windows WinSCP tool; otherwise, use a USB thumb drive to transfer the files manually. You do not need to copy the other generated files and folders.

You also need gcc. It is often installed by default on Raspberry Pi 3 systems, but run this command to make sure it is present:

sudo apt-get install -y gcc\n

This has been verified with \"Raspbian GNU/Linux 9 (stretch)\" and gcc 4:6.3.0-4, and it should work with subsequent versions. Now run the following commands to build and run the executable.

gcc hello_matmul_pi3_runner.cpp hello_matmul_pi3.o -o hello_matmul_pi3_runner\n./hello_matmul_pi3_runner\n

The output should look like:

Calling MatMul M=128, K=256, N=256\nResult (first few elements): 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922 1536.419922\n
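
The value 1536.419922 is what you should expect: each element of C starts at 0.42 and accumulates K = 256 products of 2.0 * 3.0, i.e. 0.42 + 256 * 6.0 = 1536.42 (printed with float32 rounding). If NumPy is available on your host machine, you can reproduce the same reference result there; this check is not part of the tutorial:

import numpy as np\n\nM, N, K = 128, 256, 256\nA = np.full((M, K), 2.0, dtype=np.float32)\nB = np.full((K, N), 3.0, dtype=np.float32)\nC = np.full((M, N), 0.42, dtype=np.float32)\n\n# Same computation as the generated MatMul function: C += A @ B\nC += A @ B\nprint(C[0, :10])  # ~1536.42 in every position\n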

You can now experiment with the generated MatMul function using your own inputs: modify hello_matmul_pi3_runner.cpp on the Raspberry Pi 3 and recompile it against the existing HAT package.

"}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 00000000..d5d0dee5 --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,408 @@ + + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + + None + 2023-03-27 + daily + + \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz new file mode 100644 index 00000000..c989a5f5 Binary files /dev/null and b/sitemap.xml.gz differ diff --git a/update_versions.sh b/update_versions.sh new file mode 100644 index 00000000..b008c29c --- /dev/null +++ b/update_versions.sh @@ -0,0 +1,14 @@ +#!/bin/sh +#################################################################################################### +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. See LICENSE in the project root for license information. 
+####################################################################################################
+
+pip install bump2version
+
+# Usage: bump2version <part>, where <part> is major, minor, or patch
+# Note: Run this from the docs folder
+# Note: This will override .bumpversion.cfg
+# TODO: tie the version with git tags
+
+bump2version patch --allow-dirty
\ No newline at end of file